PostgreSQL LIKE Performance: CAST, Concat, and OR Comparison

I started thinking about a query I came across recently. There were two columns: LongColumn (bigint) and Description (text). The search worked like this:

// Old code: CAST + concat
context.Documents
    .Where(d => EF.Functions.Like(
        d.LongColumn.ToString() + " " + d.Description, pattern))

EF Core sent this to PostgreSQL as LongColumn::text || ' ' || Description. To run a LIKE search on a bigint column, you needed to cast with ::text and concatenate it with Description. Later, the data in LongColumn was moved to a new text column called TextColumn. The CAST was no longer needed.

So how should we write the query now? Six different approaches:

// A: CAST + concat (old way)
.Where(d => EF.Functions.Like(
    d.LongColumn.ToString() + " " + d.Description, pattern))

// B: CAST, single column
.Where(d => EF.Functions.Like(d.LongColumn.ToString(), pattern))

// C: TextColumn concat
.Where(d => EF.Functions.Like(
    d.TextColumn + " " + d.Description, pattern))

// D: TextColumn single column
.Where(d => EF.Functions.Like(d.TextColumn, pattern))

// E: TextColumn OR Description (separate LIKE)
.Where(d => EF.Functions.Like(d.TextColumn, pattern)
         || EF.Functions.Like(d.Description, pattern))

// F: LongColumn::text OR Description (CAST OR)
.Where(d => EF.Functions.Like(d.LongColumn.ToString(), pattern)
         || EF.Functions.Like(d.Description, pattern))

Which one is faster? How expensive is CAST? Does concat break index usage? Can the OR approach use an index? To answer these questions, I set up a benchmark with 2 million rows.

Benchmark Setup

  • PostgreSQL 16 (Docker), port 5433
  • documents table: Id, LongColumn (bigint), TextColumn (text), Description (text) - 2 million rows
  • TextColumn: string representation of LongColumn + Turkish text. Average 439 bytes
  • LongColumn: random 10-digit bigint. String representation averages 14 bytes
  • Description: Turkish text, average 423 bytes
  • Indexes on TextColumn and Description (you cannot create a text index on a bigint column)
  • 3 index strategies: None, B-tree, GIN trigram (pg_trgm)
  • 3 patterns: %ankara% (%42 rows), %de% (%99 rows), %xyzqwerty% (none)
  • Each scenario run 5 times, best/worst discarded, average of 3
  • Measured with EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)

SQL Generated by EF Core

A - CAST + concat

-- @__pattern_1='%ankara%'
SELECT d."Id", d."Description", d."LongColumn", d."TextColumn"
FROM documents AS d
WHERE d."LongColumn"::text || ' ' || d."Description" LIKE @__pattern_1 ESCAPE ''

B - CAST, single column

WHERE d."LongColumn"::text LIKE @__pattern_1 ESCAPE ''

C - TextColumn concat

WHERE d."TextColumn" || ' ' || d."Description" LIKE @__pattern_1 ESCAPE ''

D - TextColumn single column

WHERE d."TextColumn" LIKE @__pattern_1 ESCAPE ''

E - TextColumn OR Description

WHERE d."TextColumn" LIKE @__pattern_1 ESCAPE '' OR d."Description" LIKE @__pattern_1 ESCAPE ''

F - LongColumn::text OR Description

WHERE d."LongColumn"::text LIKE @__pattern_1 ESCAPE '' OR d."Description" LIKE @__pattern_1 ESCAPE ''

d.LongColumn.ToString() becomes LongColumn::text. string + string becomes ||. ESCAPE '' is EF Core's default, no performance impact.

Results - With GIN Trigram Index

The most critical scenario is with a GIN index. I skip B-tree since it gives the same result as None - LIKE '%x%' is fundamentally incompatible with B-tree.

ApproachPatternTime (ms)Read BlocksRows
A: CAST+concat%ankara%702.35465,4001,036,239
A: CAST+concat%de%645.73232,3041,991,384
A: CAST+concat%xyzqwerty%468.41464,4160
B: CAST single%ankara%131.93459,5680
B: CAST single%de%328.45229,5560
B: CAST single%xyzqwerty%137.11457,6800
C: TextConcat%ankara%996.65453,2241,729,401
C: TextConcat%de%640.01226,2441,999,945
C: TextConcat%xyzqwerty%780.89451,4800
D: TextSingle%ankara%1,644.34224,260850,373
D: TextSingle%de%552.69224,1481,987,804
D: TextSingle%xyzqwerty%0.0100
E: TextOR%ankara%2,480.59223,4841,297,051
E: TextOR%de%543.10223,2041,999,945
E: TextOR%xyzqwerty%0.0100
F: CAST OR%ankara%1,776.36222,064777,179
F: CAST OR%de%627.46221,8481,991,384
F: CAST OR%xyzqwerty%459.11443,0160

Key Finding 1: GIN Index - Who Uses It, Who Can't?

Focus on the %xyzqwerty% pattern. This pattern exists in zero rows - it shows the index's real effect most clearly:

#ApproachSQL SummaryTimeScan TypeIndex?
ACAST+concatLongCol::text || Desc468msSeq Scan
BCAST singleLongCol::text137msSeq Scan
CTextConcatTextCol || Desc781msSeq Scan
DTextSingleTextCol0.01msBitmap Index Scan
EText ORTextCol OR Desc0.01msBitmapOr✅✅
FCAST ORLongCol::text OR Desc459msSeq Scan

Only D and E can use the GIN index. Why?

A, B, C: || concat and ::text CAST are expressions. The GIN index is on the TextColumn and Description columns. Expressions don't match these indexes - the planner falls back to Seq Scan.

D: TextColumn is a direct column reference. ix_text_gin is exactly on this column. The planner immediately uses Bitmap Index Scan.

E: Two direct column references connected by OR. PostgreSQL scans ix_text_gin and ix_desc_gin separately, then combines results with BitmapOr. Two indexes, one pass.

F: LongColumn::text is CAST, Description is direct. The CAST part cannot use an index. PostgreSQL drops the entire OR expression to Seq Scan - even though Description has a GIN index.

Key Finding 2: E's EXPLAIN Plan - BitmapOr

Approach E (TextColumn OR Description) with GIN index produces this plan:

Bitmap Heap Scan (time=0.003ms, rows=0)
  -> BitmapOr
    -> Bitmap Index Scan on ix_text_gin (time=0.003ms)
       Index Cond: ("TextColumn" ~~ '%xyzqwerty%')
    -> Bitmap Index Scan on ix_desc_gin (time=0.003ms)
       Index Cond: ("Description" ~~ '%xyzqwerty%')

Two indexes scanned in parallel, combined with BitmapOr, heap never touched. Result: 0.01ms.

This shows why separate LIKE + OR is so much better than concat. Concat (A, C) turns the expression into a single expression and disables the index, while OR lets each column use its index independently.

Key Finding 3: CAST Poisons OR

F (LongColumn::text OR Description) does a Seq Scan even with a GIN index:

Gather (workers=2) -> Parallel Seq Scan
  Filter: ((LongColumn::text ~~ '%xyzqwerty%') OR (Description ~~ '%xyzqwerty%'))
  Rows Removed by Filter: 666,667 (x3 workers = 2M)
  Execution Time: 459ms

Description has a GIN index (ix_desc_gin). But because LongColumn::text CAST is non-indexable, PostgreSQL plans the entire OR expression as a Seq Scan. A single non-indexable term leaves the whole query without index benefits.

This shows why the migration was critical: as long as LongColumn (bigint) remains, whether you use concat or OR, you get no index benefit.

Key Finding 4: Speed Ranking (GIN, sparse pattern)

Winners:

  1. D and E: 0.01ms - uses index, sub-millisecond
  2. B: 137ms - Seq Scan but shortest data (14 bytes)
  3. A: 468ms - Seq Scan, CAST + concat (~438 bytes)
  4. F: 459ms - Seq Scan, CAST OR (~438 bytes)
  5. C: 781ms - Seq Scan, TextConcat (~863 bytes)

C is slowest because it scans the largest data (TextColumn 439B + Description 423B = 862B). B is the fastest Seq Scan because it scans only 14 bytes of LongColumn::text.

But when the index is in play, data size no longer matters - D scans 439 bytes but still takes 0.01ms thanks to the index.

Other Findings

Index is not used for frequent patterns. For %de% and %ankara%, all approaches use Seq Scan. The planner knows that when many rows match, heap lookup cost exceeds a direct scan.

CAST cost is low. The bigint::text conversion is microsecond-level. The real cost is concatenating two large columns.

A is faster than C. CAST+concat (468ms) is faster than TextConcat (781ms). Reason: LongColumn::text is 14 bytes, TextColumn is 439 bytes. Data size is the deciding factor, not CAST cost.

Practical Recommendations

  1. Never use concat in a LIKE query. The col1 || ' ' || col2 expression completely disables column-level indexes. Always use separate LIKE + OR instead.

  2. When using OR, don't include CAST columns. Even a single CAST term makes the entire OR fall to Seq Scan. Consider writing the CAST column as a separate WHERE condition and combining with UNION.

  3. Migrate non-text columns to text. If you need LIKE search on types like bigint, migrate to a text column. CAST disappears and GIN index becomes usable.

  4. GIN trigram index is devastating for sparse patterns. But the planner won't use it for frequent patterns. Evaluate the INSERT/UPDATE cost of the index.

  5. Separate LIKE + OR is as fast as single-column LIKE. Both E (TextColumn OR Description) and D (TextColumn) take 0.01ms. Adding the second column has negligible index cost (BitmapOr).

  6. Measure on your own data. This benchmark is 2M rows, Docker macOS. Test with EXPLAIN ANALYZE in your own environment.

Sources

Share