GiST Index

GiST stands for Generalized Search Tree and is a balanced tree structure.

A GiST index is lossy, which means that it can produce false matches. The query must take the results from the index scan and then check the actual table to filter out any bad results.

Each document is represented in the index by a fixed-length signature created by hashing each word into a single bit in an n-bit string. Each of these bits are strung together to produce an n-bit document signature. When two words hash to the same bit position, there will be a false match.

The more unique words in the dataset, the more often collisions will occur; therefore, it is advisable to use a dictionary or normalize words to reduce the number of variations. A good rule of thumb is to try and keep GiST indexes under 100,000 unique words (lexemes).1

Because GiST indexes are storing a hash rather than actual values, the size of the index is much smaller than a comparable Gin index. Therefore, writes are going to be faster, but because of its lossy nature, reads will be slower.

Comparison

Creating a GiST index

CREATE INDEX name ON table USING GIST(column);


  1. GiST and GIN Index Types