1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>65.4. GIN Indexes</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="spgist.html" title="65.3. SP-GiST Indexes" /><link rel="next" href="brin.html" title="65.5. BRIN Indexes" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">65.4. GIN Indexes</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="spgist.html" title="65.3. SP-GiST Indexes">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="indextypes.html" title="Chapter 65. Built-in Index Access Methods">Up</a></td><th width="60%" align="center">Chapter 65. Built-in Index Access Methods</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="brin.html" title="65.5. BRIN Indexes">Next</a></td></tr></table><hr /></div><div class="sect1" id="GIN"><div class="titlepage"><div><div><h2 class="title" style="clear: both">65.4. GIN Indexes <a href="#GIN" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="gin.html#GIN-INTRO">65.4.1. Introduction</a></span></dt><dt><span class="sect2"><a href="gin.html#GIN-BUILTIN-OPCLASSES">65.4.2. Built-in Operator Classes</a></span></dt><dt><span class="sect2"><a href="gin.html#GIN-EXTENSIBILITY">65.4.3. Extensibility</a></span></dt><dt><span class="sect2"><a href="gin.html#GIN-IMPLEMENTATION">65.4.4. Implementation</a></span></dt><dt><span class="sect2"><a href="gin.html#GIN-TIPS">65.4.5. GIN Tips and Tricks</a></span></dt><dt><span class="sect2"><a href="gin.html#GIN-LIMIT">65.4.6. Limitations</a></span></dt><dt><span class="sect2"><a href="gin.html#GIN-EXAMPLES">65.4.7. Examples</a></span></dt></dl></div><a id="id-1.10.17.5.2" class="indexterm"></a><div class="sect2" id="GIN-INTRO"><div class="titlepage"><div><div><h3 class="title">65.4.1. Introduction <a href="#GIN-INTRO" class="id_link">#</a></h3></div></div></div><p>
3 <acronym class="acronym">GIN</acronym> stands for Generalized Inverted Index.
4 <acronym class="acronym">GIN</acronym> is designed for handling cases where the items
5 to be indexed are composite values, and the queries to be handled by
6 the index need to search for element values that appear within
7 the composite items. For example, the items could be documents,
8 and the queries could be searches for documents containing specific words.
10 We use the word <em class="firstterm">item</em> to refer to a composite value that
11 is to be indexed, and the word <em class="firstterm">key</em> to refer to an element
12 value. <acronym class="acronym">GIN</acronym> always stores and searches for keys,
13 not item values per se.
15 A <acronym class="acronym">GIN</acronym> index stores a set of (key, posting list) pairs,
16 where a <em class="firstterm">posting list</em> is a set of row IDs in which the key
17 occurs. The same row ID can appear in multiple posting lists, since
18 an item can contain more than one key. Each key value is stored only
19 once, so a <acronym class="acronym">GIN</acronym> index is very compact for cases
20 where the same key appears many times.
22 <acronym class="acronym">GIN</acronym> is generalized in the sense that the
23 <acronym class="acronym">GIN</acronym> access method code does not need to know the
24 specific operations that it accelerates.
25 Instead, it uses custom strategies defined for particular data types.
26 The strategy defines how keys are extracted from indexed items and
27 query conditions, and how to determine whether a row that contains
28 some of the key values in a query actually satisfies the query.
30 One advantage of <acronym class="acronym">GIN</acronym> is that it allows the development
31 of custom data types with the appropriate access methods, by
32 an expert in the domain of the data type, rather than a database expert.
33 This is much the same advantage as using <acronym class="acronym">GiST</acronym>.
35 The <acronym class="acronym">GIN</acronym>
36 implementation in <span class="productname">PostgreSQL</span> is primarily
37 maintained by Teodor Sigaev and Oleg Bartunov. There is more
38 information about <acronym class="acronym">GIN</acronym> on their
39 <a class="ulink" href="http://www.sai.msu.su/~megera/wiki/Gin" target="_top">website</a>.
40 </p></div><div class="sect2" id="GIN-BUILTIN-OPCLASSES"><div class="titlepage"><div><div><h3 class="title">65.4.2. Built-in Operator Classes <a href="#GIN-BUILTIN-OPCLASSES" class="id_link">#</a></h3></div></div></div><p>
41 The core <span class="productname">PostgreSQL</span> distribution
42 includes the <acronym class="acronym">GIN</acronym> operator classes shown in
43 <a class="xref" href="gin.html#GIN-BUILTIN-OPCLASSES-TABLE" title="Table 65.3. Built-in GIN Operator Classes">Table 65.3</a>.
44 (Some of the optional modules described in <a class="xref" href="contrib.html" title="Appendix F. Additional Supplied Modules and Extensions">Appendix F</a>
45 provide additional <acronym class="acronym">GIN</acronym> operator classes.)
46 </p><div class="table" id="GIN-BUILTIN-OPCLASSES-TABLE"><p class="title"><strong>Table 65.3. Built-in <acronym class="acronym">GIN</acronym> Operator Classes</strong></p><div class="table-contents"><table class="table" summary="Built-in GIN Operator Classes" border="1"><colgroup><col /><col /></colgroup><thead><tr><th>Name</th><th>Indexable Operators</th></tr></thead><tbody><tr><td rowspan="4" valign="middle"><code class="literal">array_ops</code></td><td><code class="literal">&& (anyarray,anyarray)</code></td></tr><tr><td><code class="literal">@> (anyarray,anyarray)</code></td></tr><tr><td><code class="literal"><@ (anyarray,anyarray)</code></td></tr><tr><td><code class="literal">= (anyarray,anyarray)</code></td></tr><tr><td rowspan="6" valign="middle"><code class="literal">jsonb_ops</code></td><td><code class="literal">@> (jsonb,jsonb)</code></td></tr><tr><td><code class="literal">@? (jsonb,jsonpath)</code></td></tr><tr><td><code class="literal">@@ (jsonb,jsonpath)</code></td></tr><tr><td><code class="literal">? (jsonb,text)</code></td></tr><tr><td><code class="literal">?| (jsonb,text[])</code></td></tr><tr><td><code class="literal">?& (jsonb,text[])</code></td></tr><tr><td rowspan="3" valign="middle"><code class="literal">jsonb_path_ops</code></td><td><code class="literal">@> (jsonb,jsonb)</code></td></tr><tr><td><code class="literal">@? (jsonb,jsonpath)</code></td></tr><tr><td><code class="literal">@@ (jsonb,jsonpath)</code></td></tr><tr><td valign="middle"><code class="literal">tsvector_ops</code></td><td><code class="literal">@@ (tsvector,tsquery)</code></td></tr></tbody></table></div></div><br class="table-break" /><p>
47 Of the two operator classes for type <code class="type">jsonb</code>, <code class="literal">jsonb_ops</code>
48 is the default. <code class="literal">jsonb_path_ops</code> supports fewer operators but
49 offers better performance for those operators.
50 See <a class="xref" href="datatype-json.html#JSON-INDEXING" title="8.14.4. jsonb Indexing">Section 8.14.4</a> for details.
51 </p></div><div class="sect2" id="GIN-EXTENSIBILITY"><div class="titlepage"><div><div><h3 class="title">65.4.3. Extensibility <a href="#GIN-EXTENSIBILITY" class="id_link">#</a></h3></div></div></div><p>
52 The <acronym class="acronym">GIN</acronym> interface has a high level of abstraction,
53 requiring the access method implementer only to implement the semantics of
54 the data type being accessed. The <acronym class="acronym">GIN</acronym> layer itself
55 takes care of concurrency, logging and searching the tree structure.
57 All it takes to get a <acronym class="acronym">GIN</acronym> access method working is to
58 implement a few user-defined methods, which define the behavior of
59 keys in the tree and the relationships between keys, indexed items,
60 and indexable queries. In short, <acronym class="acronym">GIN</acronym> combines
61 extensibility with generality, code reuse, and a clean interface.
63 There are two methods that an operator class for
64 <acronym class="acronym">GIN</acronym> must provide:
66 </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="function">Datum *extractValue(Datum itemValue, int32 *nkeys,
67 bool **nullFlags)</code></span></dt><dd><p>
68 Returns a palloc'd array of keys given an item to be indexed. The
69 number of returned keys must be stored into <code class="literal">*nkeys</code>.
70 If any of the keys can be null, also palloc an array of
71 <code class="literal">*nkeys</code> <code class="type">bool</code> fields, store its address at
72 <code class="literal">*nullFlags</code>, and set these null flags as needed.
73 <code class="literal">*nullFlags</code> can be left <code class="symbol">NULL</code> (its initial value)
74 if all keys are non-null.
75 The return value can be <code class="symbol">NULL</code> if the item contains no keys.
76 </p></dd><dt><span class="term"><code class="function">Datum *extractQuery(Datum query, int32 *nkeys,
77 StrategyNumber n, bool **pmatch, Pointer **extra_data,
78 bool **nullFlags, int32 *searchMode)</code></span></dt><dd><p>
79 Returns a palloc'd array of keys given a value to be queried; that is,
80 <code class="literal">query</code> is the value on the right-hand side of an
81 indexable operator whose left-hand side is the indexed column.
82 <code class="literal">n</code> is the strategy number of the operator within the
83 operator class (see <a class="xref" href="xindex.html#XINDEX-STRATEGIES" title="36.16.2. Index Method Strategies">Section 36.16.2</a>).
84 Often, <code class="function">extractQuery</code> will need
85 to consult <code class="literal">n</code> to determine the data type of
86 <code class="literal">query</code> and the method it should use to extract key values.
87 The number of returned keys must be stored into <code class="literal">*nkeys</code>.
88 If any of the keys can be null, also palloc an array of
89 <code class="literal">*nkeys</code> <code class="type">bool</code> fields, store its address at
90 <code class="literal">*nullFlags</code>, and set these null flags as needed.
91 <code class="literal">*nullFlags</code> can be left <code class="symbol">NULL</code> (its initial value)
92 if all keys are non-null.
93 The return value can be <code class="symbol">NULL</code> if the <code class="literal">query</code> contains no keys.
95 <code class="literal">searchMode</code> is an output argument that allows
96 <code class="function">extractQuery</code> to specify details about how the search
98 If <code class="literal">*searchMode</code> is set to
99 <code class="literal">GIN_SEARCH_MODE_DEFAULT</code> (which is the value it is
100 initialized to before call), only items that match at least one of
101 the returned keys are considered candidate matches.
102 If <code class="literal">*searchMode</code> is set to
103 <code class="literal">GIN_SEARCH_MODE_INCLUDE_EMPTY</code>, then in addition to items
104 containing at least one matching key, items that contain no keys at
105 all are considered candidate matches. (This mode is useful for
106 implementing is-subset-of operators, for example.)
107 If <code class="literal">*searchMode</code> is set to <code class="literal">GIN_SEARCH_MODE_ALL</code>,
108 then all non-null items in the index are considered candidate
109 matches, whether they match any of the returned keys or not. (This
110 mode is much slower than the other two choices, since it requires
111 scanning essentially the entire index, but it may be necessary to
112 implement corner cases correctly. An operator that needs this mode
113 in most cases is probably not a good candidate for a GIN operator
115 The symbols to use for setting this mode are defined in
116 <code class="filename">access/gin.h</code>.
118 <code class="literal">pmatch</code> is an output argument for use when partial match
119 is supported. To use it, <code class="function">extractQuery</code> must allocate
120 an array of <code class="literal">*nkeys</code> <code class="type">bool</code>s and store its address at
121 <code class="literal">*pmatch</code>. Each element of the array should be set to true
122 if the corresponding key requires partial match, false if not.
123 If <code class="literal">*pmatch</code> is set to <code class="symbol">NULL</code> then GIN assumes partial match
124 is not required. The variable is initialized to <code class="symbol">NULL</code> before call,
125 so this argument can simply be ignored by operator classes that do
126 not support partial match.
128 <code class="literal">extra_data</code> is an output argument that allows
129 <code class="function">extractQuery</code> to pass additional data to the
130 <code class="function">consistent</code> and <code class="function">comparePartial</code> methods.
131 To use it, <code class="function">extractQuery</code> must allocate
132 an array of <code class="literal">*nkeys</code> pointers and store its address at
133 <code class="literal">*extra_data</code>, then store whatever it wants to into the
134 individual pointers. The variable is initialized to <code class="symbol">NULL</code> before
135 call, so this argument can simply be ignored by operator classes that
136 do not require extra data. If <code class="literal">*extra_data</code> is set, the
137 whole array is passed to the <code class="function">consistent</code> method, and
138 the appropriate element to the <code class="function">comparePartial</code> method.
139 </p></dd></dl></div><p>
141 An operator class must also provide a function to check if an indexed item
142 matches the query. It comes in two flavors, a Boolean <code class="function">consistent</code>
143 function, and a ternary <code class="function">triConsistent</code> function.
144 <code class="function">triConsistent</code> covers the functionality of both, so providing
145 <code class="function">triConsistent</code> alone is sufficient. However, if the Boolean
146 variant is significantly cheaper to calculate, it can be advantageous to
147 provide both. If only the Boolean variant is provided, some optimizations
148 that depend on refuting index items before fetching all the keys are
151 </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="function">bool consistent(bool check[], StrategyNumber n, Datum query,
152 int32 nkeys, Pointer extra_data[], bool *recheck,
153 Datum queryKeys[], bool nullFlags[])</code></span></dt><dd><p>
154 Returns true if an indexed item satisfies the query operator with
155 strategy number <code class="literal">n</code> (or might satisfy it, if the recheck
156 indication is returned). This function does not have direct access
157 to the indexed item's value, since <acronym class="acronym">GIN</acronym> does not
158 store items explicitly. Rather, what is available is knowledge
159 about which key values extracted from the query appear in a given
160 indexed item. The <code class="literal">check</code> array has length
161 <code class="literal">nkeys</code>, which is the same as the number of keys previously
162 returned by <code class="function">extractQuery</code> for this <code class="literal">query</code> datum.
164 <code class="literal">check</code> array is true if the indexed item contains the
165 corresponding query key, i.e., if (check[i] == true) the i-th key of the
166 <code class="function">extractQuery</code> result array is present in the indexed item.
167 The original <code class="literal">query</code> datum is
168 passed in case the <code class="function">consistent</code> method needs to consult it,
169 and so are the <code class="literal">queryKeys[]</code> and <code class="literal">nullFlags[]</code>
170 arrays previously returned by <code class="function">extractQuery</code>.
171 <code class="literal">extra_data</code> is the extra-data array returned by
172 <code class="function">extractQuery</code>, or <code class="symbol">NULL</code> if none.
174 When <code class="function">extractQuery</code> returns a null key in
175 <code class="literal">queryKeys[]</code>, the corresponding <code class="literal">check[]</code> element
176 is true if the indexed item contains a null key; that is, the
177 semantics of <code class="literal">check[]</code> are like <code class="literal">IS NOT DISTINCT
178 FROM</code>. The <code class="function">consistent</code> function can examine the
179 corresponding <code class="literal">nullFlags[]</code> element if it needs to tell
180 the difference between a regular value match and a null match.
182 On success, <code class="literal">*recheck</code> should be set to true if the heap
183 tuple needs to be rechecked against the query operator, or false if
184 the index test is exact. That is, a false return value guarantees
185 that the heap tuple does not match the query; a true return value with
186 <code class="literal">*recheck</code> set to false guarantees that the heap tuple does
187 match the query; and a true return value with
188 <code class="literal">*recheck</code> set to true means that the heap tuple might match
189 the query, so it needs to be fetched and rechecked by evaluating the
190 query operator directly against the originally indexed item.
191 </p></dd><dt><span class="term"><code class="function">GinTernaryValue triConsistent(GinTernaryValue check[], StrategyNumber n, Datum query,
192 int32 nkeys, Pointer extra_data[],
193 Datum queryKeys[], bool nullFlags[])</code></span></dt><dd><p>
194 <code class="function">triConsistent</code> is similar to <code class="function">consistent</code>,
195 but instead of Booleans in the <code class="literal">check</code> vector, there are
196 three possible values for each
197 key: <code class="literal">GIN_TRUE</code>, <code class="literal">GIN_FALSE</code> and
198 <code class="literal">GIN_MAYBE</code>. <code class="literal">GIN_FALSE</code> and <code class="literal">GIN_TRUE</code>
199 have the same meaning as regular Boolean values, while
200 <code class="literal">GIN_MAYBE</code> means that the presence of that key is not known.
201 When <code class="literal">GIN_MAYBE</code> values are present, the function should only
202 return <code class="literal">GIN_TRUE</code> if the item certainly matches whether or
203 not the index item contains the corresponding query keys. Likewise, the
204 function must return <code class="literal">GIN_FALSE</code> only if the item certainly
205 does not match, whether or not it contains the <code class="literal">GIN_MAYBE</code>
206 keys. If the result depends on the <code class="literal">GIN_MAYBE</code> entries, i.e.,
207 the match cannot be confirmed or refuted based on the known query keys,
208 the function must return <code class="literal">GIN_MAYBE</code>.
210 When there are no <code class="literal">GIN_MAYBE</code> values in the <code class="literal">check</code>
211 vector, a <code class="literal">GIN_MAYBE</code> return value is the equivalent of
212 setting the <code class="literal">recheck</code> flag in the
213 Boolean <code class="function">consistent</code> function.
214 </p></dd></dl></div><p>
216 In addition, GIN must have a way to sort the key values stored in the index.
217 The operator class can define the sort ordering by specifying a comparison
220 </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="function">int compare(Datum a, Datum b)</code></span></dt><dd><p>
221 Compares two keys (not indexed items!) and returns an integer less than
222 zero, zero, or greater than zero, indicating whether the first key is
223 less than, equal to, or greater than the second. Null keys are never
224 passed to this function.
225 </p></dd></dl></div><p>
227 Alternatively, if the operator class does not provide a <code class="function">compare</code>
228 method, GIN will look up the default btree operator class for the index
229 key data type, and use its comparison function. It is recommended to
230 specify the comparison function in a GIN operator class that is meant for
231 just one data type, as looking up the btree operator class costs a few
232 cycles. However, polymorphic GIN operator classes (such
233 as <code class="literal">array_ops</code>) typically cannot specify a single comparison
236 An operator class for <acronym class="acronym">GIN</acronym> can optionally supply the
239 </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="function">int comparePartial(Datum partial_key, Datum key, StrategyNumber n,
240 Pointer extra_data)</code></span></dt><dd><p>
241 Compare a partial-match query key to an index key. Returns an integer
242 whose sign indicates the result: less than zero means the index key
243 does not match the query, but the index scan should continue; zero
244 means that the index key does match the query; greater than zero
245 indicates that the index scan should stop because no more matches
246 are possible. The strategy number <code class="literal">n</code> of the operator
247 that generated the partial match query is provided, in case its
248 semantics are needed to determine when to end the scan. Also,
249 <code class="literal">extra_data</code> is the corresponding element of the extra-data
250 array made by <code class="function">extractQuery</code>, or <code class="symbol">NULL</code> if none.
251 Null keys are never passed to this function.
252 </p></dd><dt><span class="term"><code class="function">void options(local_relopts *relopts)</code></span></dt><dd><p>
253 Defines a set of user-visible parameters that control operator class
256 The <code class="function">options</code> function is passed a pointer to a
257 <code class="structname">local_relopts</code> struct, which needs to be
258 filled with a set of operator class specific options. The options
259 can be accessed from other support functions using the
260 <code class="literal">PG_HAS_OPCLASS_OPTIONS()</code> and
261 <code class="literal">PG_GET_OPCLASS_OPTIONS()</code> macros.
263 Since both key extraction of indexed values and representation of the
264 key in <acronym class="acronym">GIN</acronym> are flexible, they may depend on
265 user-specified parameters.
266 </p></dd></dl></div><p>
268 To support <span class="quote">“<span class="quote">partial match</span>”</span> queries, an operator class must
269 provide the <code class="function">comparePartial</code> method, and its
270 <code class="function">extractQuery</code> method must set the <code class="literal">pmatch</code>
271 parameter when a partial-match query is encountered. See
272 <a class="xref" href="gin.html#GIN-PARTIAL-MATCH" title="65.4.4.2. Partial Match Algorithm">Section 65.4.4.2</a> for details.
274 The actual data types of the various <code class="literal">Datum</code> values mentioned
275 above vary depending on the operator class. The item values passed to
276 <code class="function">extractValue</code> are always of the operator class's input type, and
277 all key values must be of the class's <code class="literal">STORAGE</code> type. The type of
278 the <code class="literal">query</code> argument passed to <code class="function">extractQuery</code>,
279 <code class="function">consistent</code> and <code class="function">triConsistent</code> is whatever is the
280 right-hand input type of the class member operator identified by the
281 strategy number. This need not be the same as the indexed type, so long as
282 key values of the correct type can be extracted from it. However, it is
283 recommended that the SQL declarations of these three support functions use
284 the opclass's indexed data type for the <code class="literal">query</code> argument, even
285 though the actual type might be something else depending on the operator.
286 </p></div><div class="sect2" id="GIN-IMPLEMENTATION"><div class="titlepage"><div><div><h3 class="title">65.4.4. Implementation <a href="#GIN-IMPLEMENTATION" class="id_link">#</a></h3></div></div></div><p>
287 Internally, a <acronym class="acronym">GIN</acronym> index contains a B-tree index
288 constructed over keys, where each key is an element of one or more indexed
289 items (a member of an array, for example) and where each tuple in a leaf
290 page contains either a pointer to a B-tree of heap pointers (a
291 <span class="quote">“<span class="quote">posting tree</span>”</span>), or a simple list of heap pointers (a <span class="quote">“<span class="quote">posting
292 list</span>”</span>) when the list is small enough to fit into a single index tuple along
293 with the key value. <a class="xref" href="gin.html#GIN-INTERNALS-FIGURE" title="Figure 65.1. GIN Internals">Figure 65.1</a> illustrates
294 these components of a GIN index.
296 As of <span class="productname">PostgreSQL</span> 9.1, null key values can be
297 included in the index. Also, placeholder nulls are included in the index
298 for indexed items that are null or contain no keys according to
299 <code class="function">extractValue</code>. This allows searches that should find empty
302 Multicolumn <acronym class="acronym">GIN</acronym> indexes are implemented by building
303 a single B-tree over composite values (column number, key value). The
304 key values for different columns can be of different types.
305 </p><div class="figure" id="GIN-INTERNALS-FIGURE"><p class="title"><strong>Figure 65.1. GIN Internals</strong></p><div class="figure-contents"><div class="mediaobject"><object type="image/svg+xml" data="gin.svg" width="100%"></object></div></div></div><br class="figure-break" /><div class="sect3" id="GIN-FAST-UPDATE"><div class="titlepage"><div><div><h4 class="title">65.4.4.1. GIN Fast Update Technique <a href="#GIN-FAST-UPDATE" class="id_link">#</a></h4></div></div></div><p>
306 Updating a <acronym class="acronym">GIN</acronym> index tends to be slow because of the
307 intrinsic nature of inverted indexes: inserting or updating one heap row
308 can cause many inserts into the index (one for each key extracted
309 from the indexed item).
310 <acronym class="acronym">GIN</acronym> is capable of postponing much of this work by inserting
311 new tuples into a temporary, unsorted list of pending entries.
312 When the table is vacuumed or autoanalyzed, or when
313 <code class="function">gin_clean_pending_list</code> function is called, or if the
314 pending list becomes larger than
315 <a class="xref" href="runtime-config-client.html#GUC-GIN-PENDING-LIST-LIMIT">gin_pending_list_limit</a>, the entries are moved to the
316 main <acronym class="acronym">GIN</acronym> data structure using the same bulk insert
317 techniques used during initial index creation. This greatly improves
318 <acronym class="acronym">GIN</acronym> index update speed, even counting the additional
319 vacuum overhead. Moreover the overhead work can be done by a background
320 process instead of in foreground query processing.
322 The main disadvantage of this approach is that searches must scan the list
323 of pending entries in addition to searching the regular index, and so
324 a large list of pending entries will slow searches significantly.
325 Another disadvantage is that, while most updates are fast, an update
326 that causes the pending list to become <span class="quote">“<span class="quote">too large</span>”</span> will incur an
327 immediate cleanup cycle and thus be much slower than other updates.
328 Proper use of autovacuum can minimize both of these problems.
330 If consistent response time is more important than update speed,
331 use of pending entries can be disabled by turning off the
332 <code class="literal">fastupdate</code> storage parameter for a
333 <acronym class="acronym">GIN</acronym> index. See <a class="xref" href="sql-createindex.html" title="CREATE INDEX"><span class="refentrytitle">CREATE INDEX</span></a>
335 </p></div><div class="sect3" id="GIN-PARTIAL-MATCH"><div class="titlepage"><div><div><h4 class="title">65.4.4.2. Partial Match Algorithm <a href="#GIN-PARTIAL-MATCH" class="id_link">#</a></h4></div></div></div><p>
336 GIN can support <span class="quote">“<span class="quote">partial match</span>”</span> queries, in which the query
337 does not determine an exact match for one or more keys, but the possible
338 matches fall within a reasonably narrow range of key values (within the
339 key sorting order determined by the <code class="function">compare</code> support method).
340 The <code class="function">extractQuery</code> method, instead of returning a key value
341 to be matched exactly, returns a key value that is the lower bound of
342 the range to be searched, and sets the <code class="literal">pmatch</code> flag true.
343 The key range is then scanned using the <code class="function">comparePartial</code>
344 method. <code class="function">comparePartial</code> must return zero for a matching
345 index key, less than zero for a non-match that is still within the range
346 to be searched, or greater than zero if the index key is past the range
348 </p></div></div><div class="sect2" id="GIN-TIPS"><div class="titlepage"><div><div><h3 class="title">65.4.5. GIN Tips and Tricks <a href="#GIN-TIPS" class="id_link">#</a></h3></div></div></div><div class="variablelist"><dl class="variablelist"><dt><span class="term">Create vs. insert</span></dt><dd><p>
349 Insertion into a <acronym class="acronym">GIN</acronym> index can be slow
350 due to the likelihood of many keys being inserted for each item.
351 So, for bulk insertions into a table it is advisable to drop the GIN
352 index and recreate it after finishing bulk insertion.
354 When <code class="literal">fastupdate</code> is enabled for <acronym class="acronym">GIN</acronym>
355 (see <a class="xref" href="gin.html#GIN-FAST-UPDATE" title="65.4.4.1. GIN Fast Update Technique">Section 65.4.4.1</a> for details), the penalty is
356 less than when it is not. But for very large updates it may still be
357 best to drop and recreate the index.
358 </p></dd><dt><span class="term"><a class="xref" href="runtime-config-resource.html#GUC-MAINTENANCE-WORK-MEM">maintenance_work_mem</a></span></dt><dd><p>
359 Build time for a <acronym class="acronym">GIN</acronym> index is very sensitive to
360 the <code class="varname">maintenance_work_mem</code> setting; it doesn't pay to
361 skimp on work memory during index creation.
362 </p></dd><dt><span class="term"><a class="xref" href="runtime-config-client.html#GUC-GIN-PENDING-LIST-LIMIT">gin_pending_list_limit</a></span></dt><dd><p>
363 During a series of insertions into an existing <acronym class="acronym">GIN</acronym>
364 index that has <code class="literal">fastupdate</code> enabled, the system will clean up
365 the pending-entry list whenever the list grows larger than
366 <code class="varname">gin_pending_list_limit</code>. To avoid fluctuations in observed
367 response time, it's desirable to have pending-list cleanup occur in the
368 background (i.e., via autovacuum). Foreground cleanup operations
369 can be avoided by increasing <code class="varname">gin_pending_list_limit</code>
370 or making autovacuum more aggressive.
371 However, enlarging the threshold of the cleanup operation means that
372 if a foreground cleanup does occur, it will take even longer.
374 <code class="varname">gin_pending_list_limit</code> can be overridden for individual
375 GIN indexes by changing storage parameters, which allows each
376 GIN index to have its own cleanup threshold.
377 For example, it's possible to increase the threshold only for the GIN
378 index which can be updated heavily, and decrease it otherwise.
379 </p></dd><dt><span class="term"><a class="xref" href="runtime-config-client.html#GUC-GIN-FUZZY-SEARCH-LIMIT">gin_fuzzy_search_limit</a></span></dt><dd><p>
380 The primary goal of developing <acronym class="acronym">GIN</acronym> indexes was
381 to create support for highly scalable full-text search in
382 <span class="productname">PostgreSQL</span>, and there are often situations when
383 a full-text search returns a very large set of results. Moreover, this
384 often happens when the query contains very frequent words, so that the
385 large result set is not even useful. Since reading many
386 tuples from the disk and sorting them could take a lot of time, this is
387 unacceptable for production. (Note that the index search itself is very
390 To facilitate controlled execution of such queries,
391 <acronym class="acronym">GIN</acronym> has a configurable soft upper limit on the
392 number of rows returned: the
393 <code class="varname">gin_fuzzy_search_limit</code> configuration parameter.
394 It is set to 0 (meaning no limit) by default.
395 If a non-zero limit is set, then the returned set is a subset of
396 the whole result set, chosen at random.
398 <span class="quote">“<span class="quote">Soft</span>”</span> means that the actual number of returned results
399 could differ somewhat from the specified limit, depending on the query
400 and the quality of the system's random number generator.
402 From experience, values in the thousands (e.g., 5000 — 20000)
404 </p></dd></dl></div></div><div class="sect2" id="GIN-LIMIT"><div class="titlepage"><div><div><h3 class="title">65.4.6. Limitations <a href="#GIN-LIMIT" class="id_link">#</a></h3></div></div></div><p>
405 <acronym class="acronym">GIN</acronym> assumes that indexable operators are strict. This
406 means that <code class="function">extractValue</code> will not be called at all on a null
407 item value (instead, a placeholder index entry is created automatically),
408 and <code class="function">extractQuery</code> will not be called on a null query
409 value either (instead, the query is presumed to be unsatisfiable). Note
410 however that null key values contained within a non-null composite item
411 or query value are supported.
412 </p></div><div class="sect2" id="GIN-EXAMPLES"><div class="titlepage"><div><div><h3 class="title">65.4.7. Examples <a href="#GIN-EXAMPLES" class="id_link">#</a></h3></div></div></div><p>
413 The core <span class="productname">PostgreSQL</span> distribution
414 includes the <acronym class="acronym">GIN</acronym> operator classes previously shown in
415 <a class="xref" href="gin.html#GIN-BUILTIN-OPCLASSES-TABLE" title="Table 65.3. Built-in GIN Operator Classes">Table 65.3</a>.
416 The following <code class="filename">contrib</code> modules also contain
417 <acronym class="acronym">GIN</acronym> operator classes:
419 </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="filename">btree_gin</code></span></dt><dd><p>B-tree equivalent functionality for several data types</p></dd><dt><span class="term"><code class="filename">hstore</code></span></dt><dd><p>Module for storing (key, value) pairs</p></dd><dt><span class="term"><code class="filename">intarray</code></span></dt><dd><p>Enhanced support for <code class="type">int[]</code></p></dd><dt><span class="term"><code class="filename">pg_trgm</code></span></dt><dd><p>Text similarity using trigram matching</p></dd></dl></div><p>
420 </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="spgist.html" title="65.3. SP-GiST Indexes">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="indextypes.html" title="Chapter 65. Built-in Index Access Methods">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="brin.html" title="65.5. BRIN Indexes">Next</a></td></tr><tr><td width="40%" align="left" valign="top">65.3. SP-GiST Indexes </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 65.5. BRIN Indexes</td></tr></table></div></body></html>