PostgreSQL LIKE Performance: CAST, Concat, and OR Comparison
I started thinking about a query I came across recently. There were two columns: LongColumn (bigint) and Description (text). The search worked like this:
// Old code: CAST + concat
context.Documents
.Where(d => EF.Functions.Like(
d.LongColumn.ToString() + " " + d.Description, pattern))
EF Core sent this to PostgreSQL as LongColumn::text || ' ' || Description. To run a LIKE search on a bigint column, you needed to cast with ::text and concatenate it with Description. Later, the data in LongColumn was moved to a new text column called TextColumn. The CAST was no longer needed.
So how should we write the query now? Six different approaches:
// A: CAST + concat (old way)
.Where(d => EF.Functions.Like(
d.LongColumn.ToString() + " " + d.Description, pattern))
// B: CAST, single column
.Where(d => EF.Functions.Like(d.LongColumn.ToString(), pattern))
// C: TextColumn concat
.Where(d => EF.Functions.Like(
d.TextColumn + " " + d.Description, pattern))
// D: TextColumn single column
.Where(d => EF.Functions.Like(d.TextColumn, pattern))
// E: TextColumn OR Description (separate LIKE)
.Where(d => EF.Functions.Like(d.TextColumn, pattern)
|| EF.Functions.Like(d.Description, pattern))
// F: LongColumn::text OR Description (CAST OR)
.Where(d => EF.Functions.Like(d.LongColumn.ToString(), pattern)
|| EF.Functions.Like(d.Description, pattern))
Which one is faster? How expensive is CAST? Does concat break index usage? Can the OR approach use an index? To answer these questions, I set up a benchmark with 2 million rows.
Benchmark Setup
- PostgreSQL 16 (Docker), port 5433
documentstable:Id,LongColumn (bigint),TextColumn (text),Description (text)- 2 million rowsTextColumn: string representation ofLongColumn+ Turkish text. Average 439 bytesLongColumn: random 10-digit bigint. String representation averages 14 bytesDescription: Turkish text, average 423 bytes- Indexes on
TextColumnandDescription(you cannot create a text index on abigintcolumn) - 3 index strategies: None, B-tree, GIN trigram (
pg_trgm) - 3 patterns:
%ankara%(%42 rows),%99 rows),%de%(%xyzqwerty%(none) - Each scenario run 5 times, best/worst discarded, average of 3
- Measured with
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SQL Generated by EF Core
A - CAST + concat
-- @__pattern_1='%ankara%'
SELECT d."Id", d."Description", d."LongColumn", d."TextColumn"
FROM documents AS d
WHERE d."LongColumn"::text || ' ' || d."Description" LIKE @__pattern_1 ESCAPE ''
B - CAST, single column
WHERE d."LongColumn"::text LIKE @__pattern_1 ESCAPE ''
C - TextColumn concat
WHERE d."TextColumn" || ' ' || d."Description" LIKE @__pattern_1 ESCAPE ''
D - TextColumn single column
WHERE d."TextColumn" LIKE @__pattern_1 ESCAPE ''
E - TextColumn OR Description
WHERE d."TextColumn" LIKE @__pattern_1 ESCAPE '' OR d."Description" LIKE @__pattern_1 ESCAPE ''
F - LongColumn::text OR Description
WHERE d."LongColumn"::text LIKE @__pattern_1 ESCAPE '' OR d."Description" LIKE @__pattern_1 ESCAPE ''
d.LongColumn.ToString() becomes LongColumn::text. string + string becomes ||. ESCAPE '' is EF Core's default, no performance impact.
Results - With GIN Trigram Index
The most critical scenario is with a GIN index. I skip B-tree since it gives the same result as None - LIKE '%x%' is fundamentally incompatible with B-tree.
| Approach | Pattern | Time (ms) | Read Blocks | Rows |
|---|---|---|---|---|
| A: CAST+concat | %ankara% | 702.35 | 465,400 | 1,036,239 |
| A: CAST+concat | %de% | 645.73 | 232,304 | 1,991,384 |
| A: CAST+concat | %xyzqwerty% | 468.41 | 464,416 | 0 |
| B: CAST single | %ankara% | 131.93 | 459,568 | 0 |
| B: CAST single | %de% | 328.45 | 229,556 | 0 |
| B: CAST single | %xyzqwerty% | 137.11 | 457,680 | 0 |
| C: TextConcat | %ankara% | 996.65 | 453,224 | 1,729,401 |
| C: TextConcat | %de% | 640.01 | 226,244 | 1,999,945 |
| C: TextConcat | %xyzqwerty% | 780.89 | 451,480 | 0 |
| D: TextSingle | %ankara% | 1,644.34 | 224,260 | 850,373 |
| D: TextSingle | %de% | 552.69 | 224,148 | 1,987,804 |
| D: TextSingle | %xyzqwerty% | 0.01 | 0 | 0 |
| E: TextOR | %ankara% | 2,480.59 | 223,484 | 1,297,051 |
| E: TextOR | %de% | 543.10 | 223,204 | 1,999,945 |
| E: TextOR | %xyzqwerty% | 0.01 | 0 | 0 |
| F: CAST OR | %ankara% | 1,776.36 | 222,064 | 777,179 |
| F: CAST OR | %de% | 627.46 | 221,848 | 1,991,384 |
| F: CAST OR | %xyzqwerty% | 459.11 | 443,016 | 0 |
Key Finding 1: GIN Index - Who Uses It, Who Can't?
Focus on the %xyzqwerty% pattern. This pattern exists in zero rows - it shows the index's real effect most clearly:
| # | Approach | SQL Summary | Time | Scan Type | Index? |
|---|---|---|---|---|---|
| A | CAST+concat | LongCol::text || Desc | 468ms | Seq Scan | ❌ |
| B | CAST single | LongCol::text | 137ms | Seq Scan | ❌ |
| C | TextConcat | TextCol || Desc | 781ms | Seq Scan | ❌ |
| D | TextSingle | TextCol | 0.01ms | Bitmap Index Scan | ✅ |
| E | Text OR | TextCol OR Desc | 0.01ms | BitmapOr | ✅✅ |
| F | CAST OR | LongCol::text OR Desc | 459ms | Seq Scan | ❌ |
Only D and E can use the GIN index. Why?
A, B, C: || concat and ::text CAST are expressions. The GIN index is on the TextColumn and Description columns. Expressions don't match these indexes - the planner falls back to Seq Scan.
D: TextColumn is a direct column reference. ix_text_gin is exactly on this column. The planner immediately uses Bitmap Index Scan.
E: Two direct column references connected by OR. PostgreSQL scans ix_text_gin and ix_desc_gin separately, then combines results with BitmapOr. Two indexes, one pass.
F: LongColumn::text is CAST, Description is direct. The CAST part cannot use an index. PostgreSQL drops the entire OR expression to Seq Scan - even though Description has a GIN index.
Key Finding 2: E's EXPLAIN Plan - BitmapOr
Approach E (TextColumn OR Description) with GIN index produces this plan:
Bitmap Heap Scan (time=0.003ms, rows=0)
-> BitmapOr
-> Bitmap Index Scan on ix_text_gin (time=0.003ms)
Index Cond: ("TextColumn" ~~ '%xyzqwerty%')
-> Bitmap Index Scan on ix_desc_gin (time=0.003ms)
Index Cond: ("Description" ~~ '%xyzqwerty%')
Two indexes scanned in parallel, combined with BitmapOr, heap never touched. Result: 0.01ms.
This shows why separate LIKE + OR is so much better than concat. Concat (A, C) turns the expression into a single expression and disables the index, while OR lets each column use its index independently.
Key Finding 3: CAST Poisons OR
F (LongColumn::text OR Description) does a Seq Scan even with a GIN index:
Gather (workers=2) -> Parallel Seq Scan
Filter: ((LongColumn::text ~~ '%xyzqwerty%') OR (Description ~~ '%xyzqwerty%'))
Rows Removed by Filter: 666,667 (x3 workers = 2M)
Execution Time: 459ms
Description has a GIN index (ix_desc_gin). But because LongColumn::text CAST is non-indexable, PostgreSQL plans the entire OR expression as a Seq Scan. A single non-indexable term leaves the whole query without index benefits.
This shows why the migration was critical: as long as LongColumn (bigint) remains, whether you use concat or OR, you get no index benefit.
Key Finding 4: Speed Ranking (GIN, sparse pattern)
Winners:
- D and E: 0.01ms - uses index, sub-millisecond
- B: 137ms - Seq Scan but shortest data (14 bytes)
- A: 468ms - Seq Scan, CAST + concat (~438 bytes)
- F: 459ms - Seq Scan, CAST OR (~438 bytes)
- C: 781ms - Seq Scan, TextConcat (~863 bytes)
C is slowest because it scans the largest data (TextColumn 439B + Description 423B = 862B). B is the fastest Seq Scan because it scans only 14 bytes of LongColumn::text.
But when the index is in play, data size no longer matters - D scans 439 bytes but still takes 0.01ms thanks to the index.
Other Findings
Index is not used for frequent patterns. For %de% and %ankara%, all approaches use Seq Scan. The planner knows that when many rows match, heap lookup cost exceeds a direct scan.
CAST cost is low. The bigint::text conversion is microsecond-level. The real cost is concatenating two large columns.
A is faster than C. CAST+concat (468ms) is faster than TextConcat (781ms). Reason: LongColumn::text is 14 bytes, TextColumn is 439 bytes. Data size is the deciding factor, not CAST cost.
Practical Recommendations
-
Never use concat in a LIKE query. The
col1 || ' ' || col2expression completely disables column-level indexes. Always use separate LIKE + OR instead. -
When using OR, don't include CAST columns. Even a single CAST term makes the entire OR fall to Seq Scan. Consider writing the CAST column as a separate WHERE condition and combining with
UNION. -
Migrate non-text columns to text. If you need LIKE search on types like
bigint, migrate to a text column. CAST disappears and GIN index becomes usable. -
GIN trigram index is devastating for sparse patterns. But the planner won't use it for frequent patterns. Evaluate the INSERT/UPDATE cost of the index.
-
Separate LIKE + OR is as fast as single-column LIKE. Both E (
TextColumn OR Description) and D (TextColumn) take 0.01ms. Adding the second column has negligible index cost (BitmapOr). -
Measure on your own data. This benchmark is 2M rows, Docker macOS. Test with
EXPLAIN ANALYZEin your own environment.