Skip to content

Instantly share code, notes, and snippets.

@jon-whit
Last active May 17, 2024 09:18

Setup

  1. Start a Postgres instance, for example
docker run -p 5432:5432 -e POSTGRES_PASSWORD=password -d docker
  1. Open a psql client shell
psql -h localhost -p 5432  -U postgres -d postgres
Password for user postgres: <enter password>
  1. CREATE TABLE test (object_id TEXT, PRIMARY KEY (object_id));
postgres=# \d+ test;
                                           Table "public.test"
  Column   | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-----------+------+-----------+----------+---------+----------+-------------+--------------+-------------
 object_id | text |           | not null |         | extended |             |              |
Indexes:
    "test_pkey" PRIMARY KEY, btree (object_id)
Access method: heap

Query Plan Scenarios

Scenario 1

Description: With 1k object_ids, select the middle most object in the table

  1. go run main.go -num_ids=1000 | pbcopy
  2. go run main.go -num_ids=1000 -query | pbcopy
postgres=# EXPLAIN ANALYZE SELECT object_id FROM test WHERE object_id IN ('500');
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Only Scan using test_pkey on test  (cost=0.28..4.29 rows=1 width=3) (actual time=0.039..0.040 rows=1 loops=1)
   Index Cond: (object_id = '500'::text)
   Heap Fetches: 0
 Planning Time: 0.454 ms
 Execution Time: 0.076 ms
(5 rows)

Scenario 2

Description: With 5k object_ids, select the middle most object in the table

  1. go run main.go -num_ids=5000 | pbcopy
  2. go run main.go -num_ids=5000 -query | pbcopy
postgres=# EXPLAIN ANALYZE SELECT object_id FROM test WHERE object_id IN ('2500');
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Only Scan using test_pkey on test  (cost=0.28..4.30 rows=1 width=4) (actual time=0.067..0.069 rows=1 loops=1)
   Index Cond: (object_id = '2500'::text)
   Heap Fetches: 0
 Planning Time: 0.634 ms
 Execution Time: 0.543 ms
(5 rows)

Scenario 3

Description: With 10k object_ids, select the middle most object in the table

  1. go run main.go -num_ids=10000 | pbcopy
  2. go run main.go -num_ids=10000 -query | pbcopy
postgres=# EXPLAIN ANALYZE SELECT object_id FROM test WHERE object_id IN ('5000');
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Only Scan using test_pkey on test  (cost=0.29..8.30 rows=1 width=4) (actual time=0.100..0.103 rows=1 loops=1)
   Index Cond: (object_id = '5000'::text)
   Heap Fetches: 1
 Planning Time: 0.218 ms
 Execution Time: 0.165 ms
(5 rows)

Scenario 4

Description: With 20k object_ids, select the middle most object in the table

  1. go run main.go -num_ids=20000 | pbcopy
  2. go run main.go -num_ids=20000 -query | pbcopy
postgres=# EXPLAIN ANALYZE SELECT object_id FROM test WHERE object_id IN ('10000');
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Only Scan using test_pkey on test  (cost=0.29..8.30 rows=1 width=4) (actual time=0.093..0.095 rows=1 loops=1)
   Index Cond: (object_id = '10000'::text)
   Heap Fetches: 1
 Planning Time: 0.141 ms
 Execution Time: 0.128 ms
(5 rows)

Scenario 5

Description: With 100k object_ids, select the middle most object in the table

  1. go run main.go -num_ids=100000 | pbcopy
  2. go run main.go -num_ids=100000 -query | pbcopy
EXPLAIN ANALYZE SELECT object_id FROM test WHERE object_id IN ('50000');
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Only Scan using test_pkey on test  (cost=0.42..8.44 rows=1 width=5) (actual time=0.041..0.041 rows=1 loops=1)
   Index Cond: (object_id = '50000'::text)
   Heap Fetches: 1
 Planning Time: 0.130 ms
 Execution Time: 0.063 ms
(5 rows)

Summary: Scenarios 1-5 demonstrate that as the input table size grows, selecting from a row in the middle of the b-tree remains relatively amortized in cost due to the logarithmic complexity of the b-tree traversal. This is exactly what we'd expect from a b-tree.


Scenario 6

Description: 100k rows, 1 row in the filter matching in the middle of the table, 10 object_ids provided in the predicate filter

  1. go run main.go -num_ids=100000 | pbcopy
  2. go run main.go -num_ids=100000 -list_size=10 -query | pbcopy
EXPLAIN ANALYZE SELECT object_id FROM test WHERE object_id IN ('50000','100000', '100001', '100002', '100003', '100004', '100005', '100006', '100007', '100008', '100009');
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using test_pkey on test  (cost=0.42..48.79 rows=11 width=5) (actual time=0.360..0.361 rows=1 loops=1)
   Index Cond: (object_id = ANY ('{50000,100000,100001,100002,100003,100004,100005,100006,100007,100008,100009}'::text[]))
   Heap Fetches: 0
 Planning Time: 0.247 ms
 Execution Time: 0.401 ms
(5 rows)

Scenario 7

Description: 100k rows, 1 row in the filter matching in the middle of the table, 100 object_ids provided in the predicate filter

  1. go run main.go -num_ids=100000 | pbcopy
  2. go run main.go -num_ids=100000 -list_size=100 -query | pbcopy
                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using test_pkey on test  (cost=0.42..415.94 rows=101 width=5) (actual time=0.507..0.508 rows=1 loops=1)
   Index Cond: (object_id = ANY ('{50000,100000,100001,100002,100003,100004,100005,100006,100007,100008,100009,100010,100011,100012,100013,100014,100015,100016,100017,100018,100019,100020,100021,100022,100023,100024,100025,100026,100027,100028,100029,100030,100031,100032,100033,100034,100035,100036,100037,100038,100039,100040,100041,100042,100043,100044,100045,100046,100047,100048,100049,100050,100051,100052,100053,100054,100055,100056,100057,100058,100059,100060,100061,100062,100063,100064,100065,100066,100067,100068,100069,100070,100071,100072,100073,100074,100075,100076,100077,100078,100079,100080,100081,100082,100083,100084,100085,100086,100087,100088,100089,100090,100091,100092,100093,100094,100095,100096,100097,100098,100099}'::text[]))
   Heap Fetches: 0
 Planning Time: 0.166 ms
 Execution Time: 0.529 ms
(5 rows)

Scenario 8

Description: 100k rows, 1 row in the filter matching in the middle of the table, 1000 object_ids provided in the predicate filter

  1. go run main.go -num_ids=100000 | pbcopy
  2. go run main.go -num_ids=100000 -list_size=1000 -query | pbcopy
 Seq Scan on test  (cost=2.50..1945.50 rows=1001 width=5) (actual time=3.322..9.139 rows=1 loops=1)
   Filter: (object_id = ANY ('{50000,100000,100001,100002,100003,100004,100005,100006,100007,100008,100009,100010,100011,100012,100013,100014,100015,100016,100017,100018,100019,100020,100021,100022,100023,100024,100025,100026,100027,100028,100029,100030,100031,100032,100033,100034,100035,100036,100037,100038,100039,100040,100041,100042,100043,100044,100045,100046,100047,100048,100049,100050,100051,100052,100053,100054,100055,100056,100057,100058,100059,100060,100061,100062,100063,100064,100065,100066,100067,100068,100069,100070,100071,100072,100073,100074,100075,100076,100077,100078,100079,100080,100081,100082,100083,100084,100085,100086,100087,100088,100089,100090,100091,100092,100093,100094,100095,100096,100097,100098,100099,100100,100101,100102,100103,100104,100105,100106,100107,100108,100109,100110,100111,100112,100113,100114,100115,100116,100117,100118,100119,100120,100121,100122,100123,100124,100125,100126,100127,100128,100129,100130,100131,100132,100133,100134,100135,100136,100137,100138,100139,100140,100141,100142,100143,100144,100145,100146,100147,100148,100149,100150,100151,100152,100153,100154,100155,100156,100157,100158,100159,100160,100161,100162,100163,100164,100165,100166,100167,100168,100169,100170,100171,100172,100173,100174,100175,100176,100177,100178,100179,100180,100181,100182,100183,100184,100185,100186,100187,100188,100189,100190,100191,100192,100193,100194,100195,100196,100197,100198,100199,100200,100201,100202,100203,100204,100205,100206,100207,100208,100209,100210,100211,100212,100213,100214,100215,100216,100217,100218,100219,100220,100221,100222,100223,100224,100225,100226,100227,100228,100229,100230,100231,100232,100233,100234,100235,100236,100237,100238,100239,100240,100241,100242,100243,100244,100245,100246,100247,100248,100249,100250,100251,100252,100253,100254,100255,100256,100257,100258,100259,100260,100261,100262,100263,100264,100265,100266,100267,100268,100269,100270,100271,100272,100273,100274,100275,100276,100277,100278,100279,100280,100281,100282,100283,100284,100285,100286,100287,100288,100289,100290,100291,100292,100293,100294,100295,100296,100297,100298,100299,100300,100301,100302,100303,100304,100305,100306,100307,100308,100309,100310,100311,100312,100313,100314,100315,100316,100317,100318,100319,100320,100321,100322,100323,100324,100325,100326,100327,100328,100329,100330,100331,100332,100333,100334,100335,100336,100337,100338,100339,100340,100341,100342,100343,100344,100345,100346,100347,100348,100349,100350,100351,100352,100353,100354,100355,100356,100357,100358,100359,100360,100361,100362,100363,100364,100365,100366,100367,100368,100369,100370,100371,100372,100373,100374,100375,100376,100377,100378,100379,100380,100381,100382,100383,100384,100385,100386,100387,100388,100389,100390,100391,100392,100393,100394,100395,100396,100397,100398,100399,100400,100401,100402,100403,100404,100405,100406,100407,100408,100409,100410,100411,100412,100413,100414,100415,100416,100417,100418,100419,100420,100421,100422,100423,100424,100425,100426,100427,100428,100429,100430,100431,100432,100433,100434,100435,100436,100437,100438,100439,100440,100441,100442,100443,100444,100445,100446,100447,100448,100449,100450,100451,100452,100453,100454,100455,100456,100457,100458,100459,100460,100461,100462,100463,100464,100465,100466,100467,100468,100469,100470,100471,100472,100473,100474,100475,100476,100477,100478,100479,100480,100481,100482,100483,100484,100485,100486,100487,100488,100489,100490,100491,100492,100493,100494,100495,100496,100497,100498,100499,100500,100501,100502,100503,100504,100505,100506,100507,100508,100509,100510,100511,100512,100513,100514,100515,100516,100517,100518,100519,100520,100521,100522,100523,100524,100525,100526,100527,100528,100529,100530,100531,100532,100533,100534,100535,100536,100537,100538,100539,100540,100541,100542,100543,100544,100545,100546,100547,100548,100549,100550,100551,100552,100553,10055
4,100995,100996,100997,100998,100999}'::text[]))
   Rows Removed by Filter: 99999
 Planning Time: 0.538 ms
 Execution Time: 9.365 ms

If roughly 5-10% of all rows in the table have to be evaluated, then a sequential scan is much faster than an index scan because of the sequential access pattern to disk. This isn't always true, but in general it is if you assume a higher cache ratio for sequential access. See https://www.postgresql.org/docs/current/runtime-config-query.html for more info on 'random_page_cost'. The query planner here determines that a sequential scan of the table is actually lower cost than the alternative index scan as we saw previously.

Summary: Scenario 6-8 demonstrate that, for a given sparse query (that is one with a single matching predicate in the middle of the b-tree and all others non-matching predicates), the query planner and cost of the query degrades with respect to the range query and the b-tree on the primary table. Care should be taken to ensure we're serving index scans instead of sequence scans in these read query patterns, which practically means that the length of these predicate lists in these range queries should be relatively low ratio of the number of tuples in that FGA store. So long as that is the case we'll see solid logarithmic query performance out of the query engine.

package main
import (
"fmt"
"flag"
)
var numIDsFlag = flag.Int("num_ids", 1, "number of object ids to produce")
var objectListSizeFlag = flag.Int("list_size", 0, "number of object ids to include in the query filter predicate")
var queryFlag = flag.Bool("query", false, "return the query or otherwise produce the insert statement")
func main() {
flag.Parse()
numIDs := *numIDsFlag
if !*queryFlag {
sql := `INSERT INTO test VALUES `
for i := 0; i < numIDs; i++ {
sql += fmt.Sprintf("('%d')", i)
if i == numIDs - 1 {
sql += ";"
} else {
sql += ", "
}
}
fmt.Println(sql)
}
objectListSize := *objectListSizeFlag
sql := `EXPLAIN ANALYZE SELECT object_id FROM test WHERE object_id IN (`
sql += fmt.Sprintf("'%d'", numIDs / 2)
if objectListSize > 0 {
sql += ", "
}
for i := numIDs; i < numIDs + objectListSize; i++ {
sql += fmt.Sprintf("'%d'", i)
if i < numIDs + objectListSize - 1 {
sql += ", "
}
}
sql += ");"
if *queryFlag {
fmt.Println(sql)
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment