begriffs open source - ai-pg/blob - full-docs/html/gist.html

   1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
   2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>65.2. GiST Indexes</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="btree.html" title="65.1. B-Tree Indexes" /><link rel="next" href="spgist.html" title="65.3. SP-GiST Indexes" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">65.2. GiST Indexes</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="btree.html" title="65.1. B-Tree Indexes">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="indextypes.html" title="Chapter 65. Built-in Index Access Methods">Up</a></td><th width="60%" align="center">Chapter 65. Built-in Index Access Methods</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="spgist.html" title="65.3. SP-GiST Indexes">Next</a></td></tr></table><hr /></div><div class="sect1" id="GIST"><div class="titlepage"><div><div><h2 class="title" style="clear: both">65.2. GiST Indexes <a href="#GIST" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="gist.html#GIST-INTRO">65.2.1. Introduction</a></span></dt><dt><span class="sect2"><a href="gist.html#GIST-BUILTIN-OPCLASSES">65.2.2. Built-in Operator Classes</a></span></dt><dt><span class="sect2"><a href="gist.html#GIST-EXTENSIBILITY">65.2.3. Extensibility</a></span></dt><dt><span class="sect2"><a href="gist.html#GIST-IMPLEMENTATION">65.2.4. Implementation</a></span></dt><dt><span class="sect2"><a href="gist.html#GIST-EXAMPLES">65.2.5. Examples</a></span></dt></dl></div><a id="id-1.10.17.3.2" class="indexterm"></a><div class="sect2" id="GIST-INTRO"><div class="titlepage"><div><div><h3 class="title">65.2.1. Introduction <a href="#GIST-INTRO" class="id_link">#</a></h3></div></div></div><p>
   3    <acronym class="acronym">GiST</acronym> stands for Generalized Search Tree.  It is a
   4    balanced, tree-structured access method, that acts as a base template in
   5    which to implement arbitrary indexing schemes. B-trees, R-trees and many
   6    other indexing schemes can be implemented in <acronym class="acronym">GiST</acronym>.
   7  </p><p>
   8   One advantage of <acronym class="acronym">GiST</acronym> is that it allows the development
   9   of custom data types with the appropriate access methods, by
  10   an expert in the domain of the data type, rather than a database expert.
  11  </p><p>
  12     Some of the information here is derived from the University of California
  13     at Berkeley's GiST Indexing Project
  14     <a class="ulink" href="http://gist.cs.berkeley.edu/" target="_top">web site</a> and
  15     Marcel Kornacker's thesis,
  16     <a class="ulink" href="http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/access-methods-for-next-generation.pdf.gz" target="_top">
  17     Access Methods for Next-Generation Database Systems</a>.
  18     The <acronym class="acronym">GiST</acronym>
  19     implementation in <span class="productname">PostgreSQL</span> is primarily
  20     maintained by Teodor Sigaev and Oleg Bartunov, and there is more
  21     information on their
  22     <a class="ulink" href="http://www.sai.msu.su/~megera/postgres/gist/" target="_top">web site</a>.
  23   </p></div><div class="sect2" id="GIST-BUILTIN-OPCLASSES"><div class="titlepage"><div><div><h3 class="title">65.2.2. Built-in Operator Classes <a href="#GIST-BUILTIN-OPCLASSES" class="id_link">#</a></h3></div></div></div><p>
  24   The core <span class="productname">PostgreSQL</span> distribution
  25   includes the <acronym class="acronym">GiST</acronym> operator classes shown in
  26   <a class="xref" href="gist.html#GIST-BUILTIN-OPCLASSES-TABLE" title="Table 65.1. Built-in GiST Operator Classes">Table 65.1</a>.
  27   (Some of the optional modules described in <a class="xref" href="contrib.html" title="Appendix F. Additional Supplied Modules and Extensions">Appendix F</a>
  28   provide additional <acronym class="acronym">GiST</acronym> operator classes.)
  29  </p><div class="table" id="GIST-BUILTIN-OPCLASSES-TABLE"><p class="title"><strong>Table 65.1. Built-in <acronym class="acronym">GiST</acronym> Operator Classes</strong></p><div class="table-contents"><table class="table" summary="Built-in GiST Operator Classes" border="1"><colgroup><col class="col1" /><col class="col2" /><col class="col3" /></colgroup><thead><tr><th>Name</th><th>Indexable Operators</th><th>Ordering Operators</th></tr></thead><tbody><tr><td rowspan="12" valign="middle"><code class="literal">box_ops</code></td><td><code class="literal">&lt;&lt; (box, box)</code></td><td rowspan="12" valign="middle"><code class="literal">&lt;-&gt; (box, point)</code></td></tr><tr><td><code class="literal">&amp;&lt; (box, box)</code></td></tr><tr><td><code class="literal">&amp;&amp; (box, box)</code></td></tr><tr><td><code class="literal">&amp;&gt; (box, box)</code></td></tr><tr><td><code class="literal">&gt;&gt; (box, box)</code></td></tr><tr><td><code class="literal">~= (box, box)</code></td></tr><tr><td><code class="literal">@&gt; (box, box)</code></td></tr><tr><td><code class="literal">&lt;@ (box, box)</code></td></tr><tr><td><code class="literal">&amp;&lt;| (box, box)</code></td></tr><tr><td><code class="literal">&lt;&lt;| (box, box)</code></td></tr><tr><td><code class="literal">|&gt;&gt; (box, box)</code></td></tr><tr><td><code class="literal">|&amp;&gt; (box, box)</code></td></tr><tr><td rowspan="12" valign="middle"><code class="literal">circle_ops</code></td><td><code class="literal">&lt;&lt; (circle, circle)</code></td><td rowspan="12" valign="middle"><code class="literal">&lt;-&gt; (circle, point)</code></td></tr><tr><td><code class="literal">&amp;&lt; (circle, circle)</code></td></tr><tr><td><code class="literal">&amp;&gt; (circle, circle)</code></td></tr><tr><td><code class="literal">&gt;&gt; (circle, circle)</code></td></tr><tr><td><code class="literal">&lt;@ (circle, circle)</code></td></tr><tr><td><code class="literal">@&gt; (circle, circle)</code></td></tr><tr><td><code class="literal">~= (circle, circle)</code></td></tr><tr><td><code class="literal">&amp;&amp; (circle, circle)</code></td></tr><tr><td><code class="literal">|&gt;&gt; (circle, circle)</code></td></tr><tr><td><code class="literal">&lt;&lt;| (circle, circle)</code></td></tr><tr><td><code class="literal">&amp;&lt;| (circle, circle)</code></td></tr><tr><td><code class="literal">|&amp;&gt; (circle, circle)</code></td></tr><tr><td rowspan="11" valign="middle"><code class="literal">inet_ops</code></td><td><code class="literal">&lt;&lt; (inet, inet)</code></td><td rowspan="11" valign="middle"> </td></tr><tr><td><code class="literal">&lt;&lt;= (inet, inet)</code></td></tr><tr><td><code class="literal">&gt;&gt; (inet, inet)</code></td></tr><tr><td><code class="literal">&gt;&gt;= (inet, inet)</code></td></tr><tr><td><code class="literal">= (inet, inet)</code></td></tr><tr><td><code class="literal">&lt;&gt; (inet, inet)</code></td></tr><tr><td><code class="literal">&lt; (inet, inet)</code></td></tr><tr><td><code class="literal">&lt;= (inet, inet)</code></td></tr><tr><td><code class="literal">&gt; (inet, inet)</code></td></tr><tr><td><code class="literal">&gt;= (inet, inet)</code></td></tr><tr><td><code class="literal">&amp;&amp; (inet, inet)</code></td></tr><tr><td rowspan="18" valign="middle"><code class="literal">multirange_ops</code></td><td><code class="literal">= (anymultirange, anymultirange)</code></td><td rowspan="18" valign="middle"> </td></tr><tr><td><code class="literal">&amp;&amp; (anymultirange, anymultirange)</code></td></tr><tr><td><code class="literal">&amp;&amp; (anymultirange, anyrange)</code></td></tr><tr><td><code class="literal">@&gt; (anymultirange, anyelement)</code></td></tr><tr><td><code class="literal">@&gt; (anymultirange, anymultirange)</code></td></tr><tr><td><code class="literal">@&gt; (anymultirange, anyrange)</code></td></tr><tr><td><code class="literal">&lt;@ (anymultirange, anymultirange)</code></td></tr><tr><td><code class="literal">&lt;@ (anymultirange, anyrange)</code></td></tr><tr><td><code class="literal">&lt;&lt; (anymultirange, anymultirange)</code></td></tr><tr><td><code class="literal">&lt;&lt; (anymultirange, anyrange)</code></td></tr><tr><td><code class="literal">&gt;&gt; (anymultirange, anymultirange)</code></td></tr><tr><td><code class="literal">&gt;&gt; (anymultirange, anyrange)</code></td></tr><tr><td><code class="literal">&amp;&lt; (anymultirange, anymultirange)</code></td></tr><tr><td><code class="literal">&amp;&lt; (anymultirange, anyrange)</code></td></tr><tr><td><code class="literal">&amp;&gt; (anymultirange, anymultirange)</code></td></tr><tr><td><code class="literal">&amp;&gt; (anymultirange, anyrange)</code></td></tr><tr><td><code class="literal">-|- (anymultirange, anymultirange)</code></td></tr><tr><td><code class="literal">-|- (anymultirange, anyrange)</code></td></tr><tr><td rowspan="8" valign="middle"><code class="literal">point_ops</code></td><td><code class="literal">|&gt;&gt; (point, point)</code></td><td rowspan="8" valign="middle"><code class="literal">&lt;-&gt; (point, point)</code></td></tr><tr><td><code class="literal">&lt;&lt; (point, point)</code></td></tr><tr><td><code class="literal">&gt;&gt; (point, point)</code></td></tr><tr><td><code class="literal">&lt;&lt;| (point, point)</code></td></tr><tr><td><code class="literal">~= (point, point)</code></td></tr><tr><td><code class="literal">&lt;@ (point, box)</code></td></tr><tr><td><code class="literal">&lt;@ (point, polygon)</code></td></tr><tr><td><code class="literal">&lt;@ (point, circle)</code></td></tr><tr><td rowspan="12" valign="middle"><code class="literal">poly_ops</code></td><td><code class="literal">&lt;&lt; (polygon, polygon)</code></td><td rowspan="12" valign="middle"><code class="literal">&lt;-&gt; (polygon, point)</code></td></tr><tr><td><code class="literal">&amp;&lt; (polygon, polygon)</code></td></tr><tr><td><code class="literal">&amp;&gt; (polygon, polygon)</code></td></tr><tr><td><code class="literal">&gt;&gt; (polygon, polygon)</code></td></tr><tr><td><code class="literal">&lt;@ (polygon, polygon)</code></td></tr><tr><td><code class="literal">@&gt; (polygon, polygon)</code></td></tr><tr><td><code class="literal">~= (polygon, polygon)</code></td></tr><tr><td><code class="literal">&amp;&amp; (polygon, polygon)</code></td></tr><tr><td><code class="literal">&lt;&lt;| (polygon, polygon)</code></td></tr><tr><td><code class="literal">&amp;&lt;| (polygon, polygon)</code></td></tr><tr><td><code class="literal">|&amp;&gt; (polygon, polygon)</code></td></tr><tr><td><code class="literal">|&gt;&gt; (polygon, polygon)</code></td></tr><tr><td rowspan="18" valign="middle"><code class="literal">range_ops</code></td><td><code class="literal">= (anyrange, anyrange)</code></td><td rowspan="18" valign="middle"> </td></tr><tr><td><code class="literal">&amp;&amp; (anyrange, anyrange)</code></td></tr><tr><td><code class="literal">&amp;&amp; (anyrange, anymultirange)</code></td></tr><tr><td><code class="literal">@&gt; (anyrange, anyelement)</code></td></tr><tr><td><code class="literal">@&gt; (anyrange, anyrange)</code></td></tr><tr><td><code class="literal">@&gt; (anyrange, anymultirange)</code></td></tr><tr><td><code class="literal">&lt;@ (anyrange, anyrange)</code></td></tr><tr><td><code class="literal">&lt;@ (anyrange, anymultirange)</code></td></tr><tr><td><code class="literal">&lt;&lt; (anyrange, anyrange)</code></td></tr><tr><td><code class="literal">&lt;&lt; (anyrange, anymultirange)</code></td></tr><tr><td><code class="literal">&gt;&gt; (anyrange, anyrange)</code></td></tr><tr><td><code class="literal">&gt;&gt; (anyrange, anymultirange)</code></td></tr><tr><td><code class="literal">&amp;&lt; (anyrange, anyrange)</code></td></tr><tr><td><code class="literal">&amp;&lt; (anyrange, anymultirange)</code></td></tr><tr><td><code class="literal">&amp;&gt; (anyrange, anyrange)</code></td></tr><tr><td><code class="literal">&amp;&gt; (anyrange, anymultirange)</code></td></tr><tr><td><code class="literal">-|- (anyrange, anyrange)</code></td></tr><tr><td><code class="literal">-|- (anyrange, anymultirange)</code></td></tr><tr><td rowspan="2" valign="middle"><code class="literal">tsquery_ops</code></td><td><code class="literal">&lt;@ (tsquery, tsquery)</code></td><td rowspan="2" valign="middle"> </td></tr><tr><td><code class="literal">@&gt; (tsquery, tsquery)</code></td></tr><tr><td valign="middle"><code class="literal">tsvector_ops</code></td><td><code class="literal">@@ (tsvector, tsquery)</code></td><td> </td></tr></tbody></table></div></div><br class="table-break" /><p>
  30   For historical reasons, the <code class="literal">inet_ops</code> operator class is
  31   not the default class for types <code class="type">inet</code> and <code class="type">cidr</code>.
  32   To use it, mention the class name in <code class="command">CREATE INDEX</code>,
  33   for example
  34 </p><pre class="programlisting">
  35 CREATE INDEX ON my_table USING GIST (my_inet_column inet_ops);
  36 </pre><p>
  37  </p></div><div class="sect2" id="GIST-EXTENSIBILITY"><div class="titlepage"><div><div><h3 class="title">65.2.3. Extensibility <a href="#GIST-EXTENSIBILITY" class="id_link">#</a></h3></div></div></div><p>
  38    Traditionally, implementing a new index access method meant a lot of
  39    difficult work.  It was necessary to understand the inner workings of the
  40    database, such as the lock manager and Write-Ahead Log.  The
  41    <acronym class="acronym">GiST</acronym> interface has a high level of abstraction,
  42    requiring the access method implementer only to implement the semantics of
  43    the data type being accessed.  The <acronym class="acronym">GiST</acronym> layer itself
  44    takes care of concurrency, logging and searching the tree structure.
  45  </p><p>
  46    This extensibility should not be confused with the extensibility of the
  47    other standard search trees in terms of the data they can handle.  For
  48    example, <span class="productname">PostgreSQL</span> supports extensible B-trees
  49    and hash indexes. That means that you can use
  50    <span class="productname">PostgreSQL</span> to build a B-tree or hash over any
  51    data type you want. But B-trees only support range predicates
  52    (<code class="literal">&lt;</code>, <code class="literal">=</code>, <code class="literal">&gt;</code>),
  53    and hash indexes only support equality queries.
  54  </p><p>
  55    So if you index, say, an image collection with a
  56    <span class="productname">PostgreSQL</span> B-tree, you can only issue queries
  57    such as <span class="quote">“<span class="quote">is imagex equal to imagey</span>”</span>, <span class="quote">“<span class="quote">is imagex less
  58    than imagey</span>”</span> and <span class="quote">“<span class="quote">is imagex greater than imagey</span>”</span>.
  59    Depending on how you define <span class="quote">“<span class="quote">equals</span>”</span>, <span class="quote">“<span class="quote">less than</span>”</span>
  60    and <span class="quote">“<span class="quote">greater than</span>”</span> in this context, this could be useful.
  61    However, by using a <acronym class="acronym">GiST</acronym> based index, you could create
  62    ways to ask domain-specific questions, perhaps <span class="quote">“<span class="quote">find all images of
  63    horses</span>”</span> or <span class="quote">“<span class="quote">find all over-exposed images</span>”</span>.
  64  </p><p>
  65    All it takes to get a <acronym class="acronym">GiST</acronym> access method up and running
  66    is to implement several user-defined methods, which define the behavior of
  67    keys in the tree. Of course these methods have to be pretty fancy to
  68    support fancy queries, but for all the standard queries (B-trees,
  69    R-trees, etc.) they're relatively straightforward. In short,
  70    <acronym class="acronym">GiST</acronym> combines extensibility along with generality, code
  71    reuse, and a clean interface.
  72   </p><p>
  73    There are five methods that an index operator class for
  74    <acronym class="acronym">GiST</acronym> must provide, and seven that are optional.
  75    Correctness of the index is ensured
  76    by proper implementation of the <code class="function">same</code>, <code class="function">consistent</code>
  77    and <code class="function">union</code> methods, while efficiency (size and speed) of the
  78    index will depend on the <code class="function">penalty</code> and <code class="function">picksplit</code>
  79    methods.
  80    Two optional methods are <code class="function">compress</code> and
  81    <code class="function">decompress</code>, which allow an index to have internal tree data of
  82    a different type than the data it indexes. The leaves are to be of the
  83    indexed data type, while the other tree nodes can be of any C struct (but
  84    you still have to follow <span class="productname">PostgreSQL</span> data type rules here,
  85    see about <code class="literal">varlena</code> for variable sized data). If the tree's
  86    internal data type exists at the SQL level, the <code class="literal">STORAGE</code> option
  87    of the <code class="command">CREATE OPERATOR CLASS</code> command can be used.
  88    The optional eighth method is <code class="function">distance</code>, which is needed
  89    if the operator class wishes to support ordered scans (nearest-neighbor
  90    searches). The optional ninth method <code class="function">fetch</code> is needed if the
  91    operator class wishes to support index-only scans, except when the
  92    <code class="function">compress</code> method is omitted. The optional tenth method
  93    <code class="function">options</code> is needed if the operator class has
  94    user-specified parameters.
  95    The optional eleventh method <code class="function">sortsupport</code> is used to
  96    speed up building a <acronym class="acronym">GiST</acronym> index.
  97    The optional twelfth method <code class="function">stratnum</code> is used to
  98    translate compare types (from
  99    <code class="filename">src/include/nodes/primnodes.h</code>) into strategy numbers
 100    used by the operator class.  This lets the core code look up operators for
 101    temporal constraint indexes.
 102  </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="function">consistent</code></span></dt><dd><p>
 103        Given an index entry <code class="literal">p</code> and a query value <code class="literal">q</code>,
 104        this function determines whether the index entry is
 105        <span class="quote">“<span class="quote">consistent</span>”</span> with the query; that is, could the predicate
 106        <span class="quote">“<span class="quote"><em class="replaceable"><code>indexed_column</code></em>
 107        <em class="replaceable"><code>indexable_operator</code></em> <code class="literal">q</code></span>”</span> be true for
 108        any row represented by the index entry?  For a leaf index entry this is
 109        equivalent to testing the indexable condition, while for an internal
 110        tree node this determines whether it is necessary to scan the subtree
 111        of the index represented by the tree node.  When the result is
 112        <code class="literal">true</code>, a <code class="literal">recheck</code> flag must also be returned.
 113        This indicates whether the predicate is certainly true or only possibly
 114        true.  If <code class="literal">recheck</code> = <code class="literal">false</code> then the index has
 115        tested the predicate condition exactly, whereas if <code class="literal">recheck</code>
 116        = <code class="literal">true</code> the row is only a candidate match.  In that case the
 117        system will automatically evaluate the
 118        <em class="replaceable"><code>indexable_operator</code></em> against the actual row value to see
 119        if it is really a match.  This convention allows
 120        <acronym class="acronym">GiST</acronym> to support both lossless and lossy index
 121        structures.
 122       </p><p>
 123         The <acronym class="acronym">SQL</acronym> declaration of the function must look like this:
 124
 125 </p><pre class="programlisting">
 126 CREATE OR REPLACE FUNCTION my_consistent(internal, data_type, smallint, oid, internal)
 127 RETURNS bool
 128 AS 'MODULE_PATHNAME'
 129 LANGUAGE C STRICT;
 130 </pre><p>
 131
 132         And the matching code in the C module could then follow this skeleton:
 133
 134 </p><pre class="programlisting">
 135 PG_FUNCTION_INFO_V1(my_consistent);
 136
 137 Datum
 138 my_consistent(PG_FUNCTION_ARGS)
 139 {
 140     GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
 141     data_type  *query = PG_GETARG_DATA_TYPE_P(1);
 142     StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
 143     /* Oid subtype = PG_GETARG_OID(3); */
 144     bool       *recheck = (bool *) PG_GETARG_POINTER(4);
 145     data_type  *key = DatumGetDataType(entry-&gt;key);
 146     bool        retval;
 147
 148     /*
 149      * determine return value as a function of strategy, key and query.
 150      *
 151      * Use GIST_LEAF(entry) to know where you're called in the index tree,
 152      * which comes handy when supporting the = operator for example (you could
 153      * check for non empty union() in non-leaf nodes and equality in leaf
 154      * nodes).
 155      */
 156
 157     *recheck = true;        /* or false if check is exact */
 158
 159     PG_RETURN_BOOL(retval);
 160 }
 161 </pre><p>
 162
 163        Here, <code class="varname">key</code> is an element in the index and <code class="varname">query</code>
 164        the value being looked up in the index. The <code class="literal">StrategyNumber</code>
 165        parameter indicates which operator of your operator class is being
 166        applied — it matches one of the operator numbers in the
 167        <code class="command">CREATE OPERATOR CLASS</code> command.
 168       </p><p>
 169        Depending on which operators you have included in the class, the data
 170        type of <code class="varname">query</code> could vary with the operator, since it will
 171        be whatever type is on the right-hand side of the operator, which might
 172        be different from the indexed data type appearing on the left-hand side.
 173        (The above code skeleton assumes that only one type is possible; if
 174        not, fetching the <code class="varname">query</code> argument value would have to depend
 175        on the operator.)  It is recommended that the SQL declaration of
 176        the <code class="function">consistent</code> function use the opclass's indexed data
 177        type for the <code class="varname">query</code> argument, even though the actual type
 178        might be something else depending on the operator.
 179       </p></dd><dt><span class="term"><code class="function">union</code></span></dt><dd><p>
 180        This method consolidates information in the tree.  Given a set of
 181        entries, this function generates a new index entry that represents
 182        all the given entries.
 183       </p><p>
 184         The <acronym class="acronym">SQL</acronym> declaration of the function must look like this:
 185
 186 </p><pre class="programlisting">
 187 CREATE OR REPLACE FUNCTION my_union(internal, internal)
 188 RETURNS storage_type
 189 AS 'MODULE_PATHNAME'
 190 LANGUAGE C STRICT;
 191 </pre><p>
 192
 193         And the matching code in the C module could then follow this skeleton:
 194
 195 </p><pre class="programlisting">
 196 PG_FUNCTION_INFO_V1(my_union);
 197
 198 Datum
 199 my_union(PG_FUNCTION_ARGS)
 200 {
 201     GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0);
 202     GISTENTRY  *ent = entryvec-&gt;vector;
 203     data_type  *out,
 204                *tmp,
 205                *old;
 206     int         numranges,
 207                 i = 0;
 208
 209     numranges = entryvec-&gt;n;
 210     tmp = DatumGetDataType(ent[0].key);
 211     out = tmp;
 212
 213     if (numranges == 1)
 214     {
 215         out = data_type_deep_copy(tmp);
 216
 217         PG_RETURN_DATA_TYPE_P(out);
 218     }
 219
 220     for (i = 1; i &lt; numranges; i++)
 221     {
 222         old = out;
 223         tmp = DatumGetDataType(ent[i].key);
 224         out = my_union_implementation(out, tmp);
 225     }
 226
 227     PG_RETURN_DATA_TYPE_P(out);
 228 }
 229 </pre><p>
 230       </p><p>
 231         As you can see, in this skeleton we're dealing with a data type
 232         where <code class="literal">union(X, Y, Z) = union(union(X, Y), Z)</code>. It's easy
 233         enough to support data types where this is not the case, by
 234         implementing the proper union algorithm in this
 235         <acronym class="acronym">GiST</acronym> support method.
 236       </p><p>
 237         The result of the <code class="function">union</code> function must be a value of the
 238         index's storage type, whatever that is (it might or might not be
 239         different from the indexed column's type).  The <code class="function">union</code>
 240         function should return a pointer to newly <code class="function">palloc()</code>ed
 241         memory. You can't just return the input value as-is, even if there is
 242         no type change.
 243       </p><p>
 244        As shown above, the <code class="function">union</code> function's
 245        first <code class="type">internal</code> argument is actually
 246        a <code class="structname">GistEntryVector</code> pointer.  The second argument is a
 247        pointer to an integer variable, which can be ignored.  (It used to be
 248        required that the <code class="function">union</code> function store the size of its
 249        result value into that variable, but this is no longer necessary.)
 250       </p></dd><dt><span class="term"><code class="function">compress</code></span></dt><dd><p>
 251        Converts a data item into a format suitable for physical storage in
 252        an index page.
 253        If the <code class="function">compress</code> method is omitted, data items are stored
 254        in the index without modification.
 255       </p><p>
 256         The <acronym class="acronym">SQL</acronym> declaration of the function must look like this:
 257
 258 </p><pre class="programlisting">
 259 CREATE OR REPLACE FUNCTION my_compress(internal)
 260 RETURNS internal
 261 AS 'MODULE_PATHNAME'
 262 LANGUAGE C STRICT;
 263 </pre><p>
 264
 265         And the matching code in the C module could then follow this skeleton:
 266
 267 </p><pre class="programlisting">
 268 PG_FUNCTION_INFO_V1(my_compress);
 269
 270 Datum
 271 my_compress(PG_FUNCTION_ARGS)
 272 {
 273     GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
 274     GISTENTRY  *retval;
 275
 276     if (entry-&gt;leafkey)
 277     {
 278         /* replace entry-&gt;key with a compressed version */
 279         compressed_data_type *compressed_data = palloc(sizeof(compressed_data_type));
 280
 281         /* fill *compressed_data from entry-&gt;key ... */
 282
 283         retval = palloc(sizeof(GISTENTRY));
 284         gistentryinit(*retval, PointerGetDatum(compressed_data),
 285                       entry-&gt;rel, entry-&gt;page, entry-&gt;offset, FALSE);
 286     }
 287     else
 288     {
 289         /* typically we needn't do anything with non-leaf entries */
 290         retval = entry;
 291     }
 292
 293     PG_RETURN_POINTER(retval);
 294 }
 295 </pre><p>
 296       </p><p>
 297        You have to adapt <em class="replaceable"><code>compressed_data_type</code></em> to the specific
 298        type you're converting to in order to compress your leaf nodes, of
 299        course.
 300       </p></dd><dt><span class="term"><code class="function">decompress</code></span></dt><dd><p>
 301        Converts the stored representation of a data item into a format that
 302        can be manipulated by the other GiST methods in the operator class.
 303        If the <code class="function">decompress</code> method is omitted, it is assumed that
 304        the other GiST methods can work directly on the stored data format.
 305        (<code class="function">decompress</code> is not necessarily the reverse of
 306        the <code class="function">compress</code> method; in particular,
 307        if <code class="function">compress</code> is lossy then it's impossible
 308        for <code class="function">decompress</code> to exactly reconstruct the original
 309        data.  <code class="function">decompress</code> is not necessarily equivalent
 310        to <code class="function">fetch</code>, either, since the other GiST methods might not
 311        require full reconstruction of the data.)
 312       </p><p>
 313         The <acronym class="acronym">SQL</acronym> declaration of the function must look like this:
 314
 315 </p><pre class="programlisting">
 316 CREATE OR REPLACE FUNCTION my_decompress(internal)
 317 RETURNS internal
 318 AS 'MODULE_PATHNAME'
 319 LANGUAGE C STRICT;
 320 </pre><p>
 321
 322         And the matching code in the C module could then follow this skeleton:
 323
 324 </p><pre class="programlisting">
 325 PG_FUNCTION_INFO_V1(my_decompress);
 326
 327 Datum
 328 my_decompress(PG_FUNCTION_ARGS)
 329 {
 330     PG_RETURN_POINTER(PG_GETARG_POINTER(0));
 331 }
 332 </pre><p>
 333
 334         The above skeleton is suitable for the case where no decompression
 335         is needed.  (But, of course, omitting the method altogether is even
 336         easier, and is recommended in such cases.)
 337       </p></dd><dt><span class="term"><code class="function">penalty</code></span></dt><dd><p>
 338        Returns a value indicating the <span class="quote">“<span class="quote">cost</span>”</span> of inserting the new
 339        entry into a particular branch of the tree.  Items will be inserted
 340        down the path of least <code class="function">penalty</code> in the tree.
 341        Values returned by <code class="function">penalty</code> should be non-negative.
 342        If a negative value is returned, it will be treated as zero.
 343       </p><p>
 344         The <acronym class="acronym">SQL</acronym> declaration of the function must look like this:
 345
 346 </p><pre class="programlisting">
 347 CREATE OR REPLACE FUNCTION my_penalty(internal, internal, internal)
 348 RETURNS internal
 349 AS 'MODULE_PATHNAME'
 350 LANGUAGE C STRICT;  -- in some cases penalty functions need not be strict
 351 </pre><p>
 352
 353         And the matching code in the C module could then follow this skeleton:
 354
 355 </p><pre class="programlisting">
 356 PG_FUNCTION_INFO_V1(my_penalty);
 357
 358 Datum
 359 my_penalty(PG_FUNCTION_ARGS)
 360 {
 361     GISTENTRY  *origentry = (GISTENTRY *) PG_GETARG_POINTER(0);
 362     GISTENTRY  *newentry = (GISTENTRY *) PG_GETARG_POINTER(1);
 363     float      *penalty = (float *) PG_GETARG_POINTER(2);
 364     data_type  *orig = DatumGetDataType(origentry-&gt;key);
 365     data_type  *new = DatumGetDataType(newentry-&gt;key);
 366
 367     *penalty = my_penalty_implementation(orig, new);
 368     PG_RETURN_POINTER(penalty);
 369 }
 370 </pre><p>
 371
 372         For historical reasons, the <code class="function">penalty</code> function doesn't
 373         just return a <code class="type">float</code> result; instead it has to store the value
 374         at the location indicated by the third argument.  The return
 375         value per se is ignored, though it's conventional to pass back the
 376         address of that argument.
 377       </p><p>
 378         The <code class="function">penalty</code> function is crucial to good performance of
 379         the index. It'll get used at insertion time to determine which branch
 380         to follow when choosing where to add the new entry in the tree. At
 381         query time, the more balanced the index, the quicker the lookup.
 382       </p></dd><dt><span class="term"><code class="function">picksplit</code></span></dt><dd><p>
 383        When an index page split is necessary, this function decides which
 384        entries on the page are to stay on the old page, and which are to move
 385        to the new page.
 386       </p><p>
 387         The <acronym class="acronym">SQL</acronym> declaration of the function must look like this:
 388
 389 </p><pre class="programlisting">
 390 CREATE OR REPLACE FUNCTION my_picksplit(internal, internal)
 391 RETURNS internal
 392 AS 'MODULE_PATHNAME'
 393 LANGUAGE C STRICT;
 394 </pre><p>
 395
 396         And the matching code in the C module could then follow this skeleton:
 397
 398 </p><pre class="programlisting">
 399 PG_FUNCTION_INFO_V1(my_picksplit);
 400
 401 Datum
 402 my_picksplit(PG_FUNCTION_ARGS)
 403 {
 404     GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0);
 405     GIST_SPLITVEC *v = (GIST_SPLITVEC *) PG_GETARG_POINTER(1);
 406     OffsetNumber maxoff = entryvec-&gt;n - 1;
 407     GISTENTRY  *ent = entryvec-&gt;vector;
 408     int         i,
 409                 nbytes;
 410     OffsetNumber *left,
 411                *right;
 412     data_type  *tmp_union;
 413     data_type  *unionL;
 414     data_type  *unionR;
 415     GISTENTRY **raw_entryvec;
 416
 417     maxoff = entryvec-&gt;n - 1;
 418     nbytes = (maxoff + 1) * sizeof(OffsetNumber);
 419
 420     v-&gt;spl_left = (OffsetNumber *) palloc(nbytes);
 421     left = v-&gt;spl_left;
 422     v-&gt;spl_nleft = 0;
 423
 424     v-&gt;spl_right = (OffsetNumber *) palloc(nbytes);
 425     right = v-&gt;spl_right;
 426     v-&gt;spl_nright = 0;
 427
 428     unionL = NULL;
 429     unionR = NULL;
 430
 431     /* Initialize the raw entry vector. */
 432     raw_entryvec = (GISTENTRY **) malloc(entryvec-&gt;n * sizeof(void *));
 433     for (i = FirstOffsetNumber; i &lt;= maxoff; i = OffsetNumberNext(i))
 434         raw_entryvec[i] = &amp;(entryvec-&gt;vector[i]);
 435
 436     for (i = FirstOffsetNumber; i &lt;= maxoff; i = OffsetNumberNext(i))
 437     {
 438         int         real_index = raw_entryvec[i] - entryvec-&gt;vector;
 439
 440         tmp_union = DatumGetDataType(entryvec-&gt;vector[real_index].key);
 441         Assert(tmp_union != NULL);
 442
 443         /*
 444          * Choose where to put the index entries and update unionL and unionR
 445          * accordingly. Append the entries to either v-&gt;spl_left or
 446          * v-&gt;spl_right, and care about the counters.
 447          */
 448
 449         if (my_choice_is_left(unionL, curl, unionR, curr))
 450         {
 451             if (unionL == NULL)
 452                 unionL = tmp_union;
 453             else
 454                 unionL = my_union_implementation(unionL, tmp_union);
 455
 456             *left = real_index;
 457             ++left;
 458             ++(v-&gt;spl_nleft);
 459         }
 460         else
 461         {
 462             /*
 463              * Same on the right
 464              */
 465         }
 466     }
 467
 468     v-&gt;spl_ldatum = DataTypeGetDatum(unionL);
 469     v-&gt;spl_rdatum = DataTypeGetDatum(unionR);
 470     PG_RETURN_POINTER(v);
 471 }
 472 </pre><p>
 473
 474        Notice that the <code class="function">picksplit</code> function's result is delivered
 475        by modifying the passed-in <code class="structname">v</code> structure.  The return
 476        value per se is ignored, though it's conventional to pass back the
 477        address of <code class="structname">v</code>.
 478       </p><p>
 479         Like <code class="function">penalty</code>, the <code class="function">picksplit</code> function
 480         is crucial to good performance of the index.  Designing suitable
 481         <code class="function">penalty</code> and <code class="function">picksplit</code> implementations
 482         is where the challenge of implementing well-performing
 483         <acronym class="acronym">GiST</acronym> indexes lies.
 484       </p></dd><dt><span class="term"><code class="function">same</code></span></dt><dd><p>
 485        Returns true if two index entries are identical, false otherwise.
 486        (An <span class="quote">“<span class="quote">index entry</span>”</span> is a value of the index's storage type,
 487        not necessarily the original indexed column's type.)
 488       </p><p>
 489         The <acronym class="acronym">SQL</acronym> declaration of the function must look like this:
 490
 491 </p><pre class="programlisting">
 492 CREATE OR REPLACE FUNCTION my_same(storage_type, storage_type, internal)
 493 RETURNS internal
 494 AS 'MODULE_PATHNAME'
 495 LANGUAGE C STRICT;
 496 </pre><p>
 497
 498         And the matching code in the C module could then follow this skeleton:
 499
 500 </p><pre class="programlisting">
 501 PG_FUNCTION_INFO_V1(my_same);
 502
 503 Datum
 504 my_same(PG_FUNCTION_ARGS)
 505 {
 506     prefix_range *v1 = PG_GETARG_PREFIX_RANGE_P(0);
 507     prefix_range *v2 = PG_GETARG_PREFIX_RANGE_P(1);
 508     bool       *result = (bool *) PG_GETARG_POINTER(2);
 509
 510     *result = my_eq(v1, v2);
 511     PG_RETURN_POINTER(result);
 512 }
 513 </pre><p>
 514
 515         For historical reasons, the <code class="function">same</code> function doesn't
 516         just return a Boolean result; instead it has to store the flag
 517         at the location indicated by the third argument.  The return
 518         value per se is ignored, though it's conventional to pass back the
 519         address of that argument.
 520       </p></dd><dt><span class="term"><code class="function">distance</code></span></dt><dd><p>
 521        Given an index entry <code class="literal">p</code> and a query value <code class="literal">q</code>,
 522        this function determines the index entry's
 523        <span class="quote">“<span class="quote">distance</span>”</span> from the query value.  This function must be
 524        supplied if the operator class contains any ordering operators.
 525        A query using the ordering operator will be implemented by returning
 526        index entries with the smallest <span class="quote">“<span class="quote">distance</span>”</span> values first,
 527        so the results must be consistent with the operator's semantics.
 528        For a leaf index entry the result just represents the distance to
 529        the index entry; for an internal tree node, the result must be the
 530        smallest distance that any child entry could have.
 531       </p><p>
 532         The <acronym class="acronym">SQL</acronym> declaration of the function must look like this:
 533
 534 </p><pre class="programlisting">
 535 CREATE OR REPLACE FUNCTION my_distance(internal, data_type, smallint, oid, internal)
 536 RETURNS float8
 537 AS 'MODULE_PATHNAME'
 538 LANGUAGE C STRICT;
 539 </pre><p>
 540
 541         And the matching code in the C module could then follow this skeleton:
 542
 543 </p><pre class="programlisting">
 544 PG_FUNCTION_INFO_V1(my_distance);
 545
 546 Datum
 547 my_distance(PG_FUNCTION_ARGS)
 548 {
 549     GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
 550     data_type  *query = PG_GETARG_DATA_TYPE_P(1);
 551     StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
 552     /* Oid subtype = PG_GETARG_OID(3); */
 553     /* bool *recheck = (bool *) PG_GETARG_POINTER(4); */
 554     data_type  *key = DatumGetDataType(entry-&gt;key);
 555     double      retval;
 556
 557     /*
 558      * determine return value as a function of strategy, key and query.
 559      */
 560
 561     PG_RETURN_FLOAT8(retval);
 562 }
 563 </pre><p>
 564
 565        The arguments to the <code class="function">distance</code> function are identical to
 566        the arguments of the <code class="function">consistent</code> function.
 567       </p><p>
 568        Some approximation is allowed when determining the distance, so long
 569        as the result is never greater than the entry's actual distance. Thus,
 570        for example, distance to a bounding box is usually sufficient in
 571        geometric applications.  For an internal tree node, the distance
 572        returned must not be greater than the distance to any of the child
 573        nodes. If the returned distance is not exact, the function must set
 574        <code class="literal">*recheck</code> to true. (This is not necessary for internal tree
 575        nodes; for them, the calculation is always assumed to be inexact.) In
 576        this case the executor will calculate the accurate distance after
 577        fetching the tuple from the heap, and reorder the tuples if necessary.
 578       </p><p>
 579        If the distance function returns <code class="literal">*recheck = true</code> for any
 580        leaf node, the original ordering operator's return type must
 581        be <code class="type">float8</code> or <code class="type">float4</code>, and the distance function's
 582        result values must be comparable to those of the original ordering
 583        operator, since the executor will sort using both distance function
 584        results and recalculated ordering-operator results.  Otherwise, the
 585        distance function's result values can be any finite <code class="type">float8</code>
 586        values, so long as the relative order of the result values matches the
 587        order returned by the ordering operator.  (Infinity and minus infinity
 588        are used internally to handle cases such as nulls, so it is not
 589        recommended that <code class="function">distance</code> functions return these values.)
 590       </p></dd><dt><span class="term"><code class="function">fetch</code></span></dt><dd><p>
 591        Converts the compressed index representation of a data item into the
 592        original data type, for index-only scans. The returned data must be an
 593        exact, non-lossy copy of the originally indexed value.
 594       </p><p>
 595         The <acronym class="acronym">SQL</acronym> declaration of the function must look like this:
 596
 597 </p><pre class="programlisting">
 598 CREATE OR REPLACE FUNCTION my_fetch(internal)
 599 RETURNS internal
 600 AS 'MODULE_PATHNAME'
 601 LANGUAGE C STRICT;
 602 </pre><p>
 603
 604         The argument is a pointer to a <code class="structname">GISTENTRY</code> struct. On
 605         entry, its <code class="structfield">key</code> field contains a non-NULL leaf datum in
 606         compressed form. The return value is another <code class="structname">GISTENTRY</code>
 607         struct, whose <code class="structfield">key</code> field contains the same datum in its
 608         original, uncompressed form. If the opclass's compress function does
 609         nothing for leaf entries, the <code class="function">fetch</code> method can return the
 610         argument as-is.  Or, if the opclass does not have a compress function,
 611         the <code class="function">fetch</code> method can be omitted as well, since it would
 612         necessarily be a no-op.
 613        </p><p>
 614         The matching code in the C module could then follow this skeleton:
 615
 616 </p><pre class="programlisting">
 617 PG_FUNCTION_INFO_V1(my_fetch);
 618
 619 Datum
 620 my_fetch(PG_FUNCTION_ARGS)
 621 {
 622     GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
 623     input_data_type *in = DatumGetPointer(entry-&gt;key);
 624     fetched_data_type *fetched_data;
 625     GISTENTRY  *retval;
 626
 627     retval = palloc(sizeof(GISTENTRY));
 628     fetched_data = palloc(sizeof(fetched_data_type));
 629
 630     /*
 631      * Convert 'fetched_data' into the a Datum of the original datatype.
 632      */
 633
 634     /* fill *retval from fetched_data. */
 635     gistentryinit(*retval, PointerGetDatum(converted_datum),
 636                   entry-&gt;rel, entry-&gt;page, entry-&gt;offset, FALSE);
 637
 638     PG_RETURN_POINTER(retval);
 639 }
 640 </pre><p>
 641       </p><p>
 642        If the compress method is lossy for leaf entries, the operator class
 643        cannot support index-only scans, and must not define
 644        a <code class="function">fetch</code> function.
 645       </p></dd><dt><span class="term"><code class="function">options</code></span></dt><dd><p>
 646        Allows definition of user-visible parameters that control operator
 647        class behavior.
 648       </p><p>
 649         The <acronym class="acronym">SQL</acronym> declaration of the function must look like this:
 650
 651 </p><pre class="programlisting">
 652 CREATE OR REPLACE FUNCTION my_options(internal)
 653 RETURNS void
 654 AS 'MODULE_PATHNAME'
 655 LANGUAGE C STRICT;
 656 </pre><p>
 657       </p><p>
 658        The function is passed a pointer to a <code class="structname">local_relopts</code>
 659        struct, which needs to be filled with a set of operator class
 660        specific options.  The options can be accessed from other support
 661        functions using the <code class="literal">PG_HAS_OPCLASS_OPTIONS()</code> and
 662        <code class="literal">PG_GET_OPCLASS_OPTIONS()</code> macros.
 663       </p><p>
 664         An example implementation of my_options() and parameters use
 665         from other support functions are given below:
 666
 667 </p><pre class="programlisting">
 668 typedef enum MyEnumType
 669 {
 670     MY_ENUM_ON,
 671     MY_ENUM_OFF,
 672     MY_ENUM_AUTO
 673 } MyEnumType;
 674
 675 typedef struct
 676 {
 677     int32   vl_len_;    /* varlena header (do not touch directly!) */
 678     int     int_param;  /* integer parameter */
 679     double  real_param; /* real parameter */
 680     MyEnumType enum_param; /* enum parameter */
 681     int     str_param;  /* string parameter */
 682 } MyOptionsStruct;
 683
 684 /* String representation of enum values */
 685 static relopt_enum_elt_def myEnumValues[] =
 686 {
 687     {"on", MY_ENUM_ON},
 688     {"off", MY_ENUM_OFF},
 689     {"auto", MY_ENUM_AUTO},
 690     {(const char *) NULL}   /* list terminator */
 691 };
 692
 693 static char *str_param_default = "default";
 694
 695 /*
 696  * Sample validator: checks that string is not longer than 8 bytes.
 697  */
 698 static void
 699 validate_my_string_relopt(const char *value)
 700 {
 701     if (strlen(value) &gt; 8)
 702         ereport(ERROR,
 703                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 704                  errmsg("str_param must be at most 8 bytes")));
 705 }
 706
 707 /*
 708  * Sample filler: switches characters to lower case.
 709  */
 710 static Size
 711 fill_my_string_relopt(const char *value, void *ptr)
 712 {
 713     char   *tmp = str_tolower(value, strlen(value), DEFAULT_COLLATION_OID);
 714     int     len = strlen(tmp);
 715
 716     if (ptr)
 717         strcpy(ptr, tmp);
 718
 719     pfree(tmp);
 720     return len + 1;
 721 }
 722
 723 PG_FUNCTION_INFO_V1(my_options);
 724
 725 Datum
 726 my_options(PG_FUNCTION_ARGS)
 727 {
 728     local_relopts *relopts = (local_relopts *) PG_GETARG_POINTER(0);
 729
 730     init_local_reloptions(relopts, sizeof(MyOptionsStruct));
 731     add_local_int_reloption(relopts, "int_param", "integer parameter",
 732                             100, 0, 1000000,
 733                             offsetof(MyOptionsStruct, int_param));
 734     add_local_real_reloption(relopts, "real_param", "real parameter",
 735                              1.0, 0.0, 1000000.0,
 736                              offsetof(MyOptionsStruct, real_param));
 737     add_local_enum_reloption(relopts, "enum_param", "enum parameter",
 738                              myEnumValues, MY_ENUM_ON,
 739                              "Valid values are: \"on\", \"off\" and \"auto\".",
 740                              offsetof(MyOptionsStruct, enum_param));
 741     add_local_string_reloption(relopts, "str_param", "string parameter",
 742                                str_param_default,
 743                                &amp;validate_my_string_relopt,
 744                                &amp;fill_my_string_relopt,
 745                                offsetof(MyOptionsStruct, str_param));
 746
 747     PG_RETURN_VOID();
 748 }
 749
 750 PG_FUNCTION_INFO_V1(my_compress);
 751
 752 Datum
 753 my_compress(PG_FUNCTION_ARGS)
 754 {
 755     int     int_param = 100;
 756     double  real_param = 1.0;
 757     MyEnumType enum_param = MY_ENUM_ON;
 758     char   *str_param = str_param_default;
 759
 760     /*
 761      * Normally, when opclass contains 'options' method, then options are always
 762      * passed to support functions.  However, if you add 'options' method to
 763      * existing opclass, previously defined indexes have no options, so the
 764      * check is required.
 765      */
 766     if (PG_HAS_OPCLASS_OPTIONS())
 767     {
 768         MyOptionsStruct *options = (MyOptionsStruct *) PG_GET_OPCLASS_OPTIONS();
 769
 770         int_param = options-&gt;int_param;
 771         real_param = options-&gt;real_param;
 772         enum_param = options-&gt;enum_param;
 773         str_param = GET_STRING_RELOPTION(options, str_param);
 774     }
 775
 776     /* the rest implementation of support function */
 777 }
 778
 779 </pre><p>
 780       </p><p>
 781        Since the representation of the key in <acronym class="acronym">GiST</acronym> is
 782        flexible, it may depend on user-specified parameters.  For instance,
 783        the length of key signature may be specified.  See
 784        <code class="literal">gtsvector_options()</code> for example.
 785       </p></dd><dt><span class="term"><code class="function">sortsupport</code></span></dt><dd><p>
 786        Returns a comparator function to sort data in a way that preserves
 787        locality. It is used by <code class="command">CREATE INDEX</code> and
 788        <code class="command">REINDEX</code> commands. The quality of the created index
 789        depends on how well the sort order determined by the comparator function
 790        preserves locality of the inputs.
 791       </p><p>
 792        The <code class="function">sortsupport</code> method is optional. If it is not
 793        provided, <code class="command">CREATE INDEX</code> builds the index by inserting
 794        each tuple to the tree using the <code class="function">penalty</code> and
 795        <code class="function">picksplit</code> functions, which is much slower.
 796       </p><p>
 797        The <acronym class="acronym">SQL</acronym> declaration of the function must look like
 798        this:
 799
 800 </p><pre class="programlisting">
 801 CREATE OR REPLACE FUNCTION my_sortsupport(internal)
 802 RETURNS void
 803 AS 'MODULE_PATHNAME'
 804 LANGUAGE C STRICT;
 805 </pre><p>
 806
 807        The argument is a pointer to a <code class="structname">SortSupport</code>
 808        struct. At a minimum, the function must fill in its comparator field.
 809        The comparator takes three arguments: two Datums to compare, and
 810        a pointer to the <code class="structname">SortSupport</code> struct. The
 811        Datums are the two indexed values in the format that they are stored
 812        in the index; that is, in the format returned by the
 813        <code class="function">compress</code> method. The full API is defined in
 814        <code class="filename">src/include/utils/sortsupport.h</code>.
 815        </p><p>
 816         The matching code in the C module could then follow this skeleton:
 817
 818 </p><pre class="programlisting">
 819 PG_FUNCTION_INFO_V1(my_sortsupport);
 820
 821 static int
 822 my_fastcmp(Datum x, Datum y, SortSupport ssup)
 823 {
 824   /* establish order between x and y by computing some sorting value z */
 825
 826   int z1 = ComputeSpatialCode(x);
 827   int z2 = ComputeSpatialCode(y);
 828
 829   return z1 == z2 ? 0 : z1 &gt; z2 ? 1 : -1;
 830 }
 831
 832 Datum
 833 my_sortsupport(PG_FUNCTION_ARGS)
 834 {
 835   SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
 836
 837   ssup-&gt;comparator = my_fastcmp;
 838   PG_RETURN_VOID();
 839 }
 840 </pre><p>
 841       </p></dd><dt><span class="term"><code class="function">translate_cmptype</code></span></dt><dd><p>
 842        Given a <code class="literal">CompareType</code> value from
 843        <code class="filename">src/include/nodes/primnodes.h</code>, returns a strategy
 844        number used by this operator class for matching functionality.  The
 845        function should return <code class="literal">InvalidStrategy</code> if the
 846        operator class has no matching strategy.
 847       </p><p>
 848        This is used for temporal index constraints (i.e., <code class="literal">PRIMARY
 849        KEY</code> and <code class="literal">UNIQUE</code>).  If the operator class
 850        provides this function and it returns results for
 851        <code class="literal">COMPARE_EQ</code>, it can be used in the
 852        non-<code class="literal">WITHOUT OVERLAPS</code> part(s) of an index constraint.
 853       </p><p>
 854        This support function corresponds to the index access method callback
 855        function <code class="structfield">amtranslatecmptype</code> (see <a class="xref" href="index-functions.html" title="63.2. Index Access Method Functions">Section 63.2</a>).  The
 856        <code class="structfield">amtranslatecmptype</code> callback function for
 857        GiST indexes merely calls down to the
 858        <code class="function">translate_cmptype</code> support function of the
 859        respective operator family, since the GiST index access method has no
 860        fixed strategy numbers itself.
 861       </p><p>
 862        The <acronym class="acronym">SQL</acronym> declaration of the function must look like
 863        this:
 864
 865 </p><pre class="programlisting">
 866 CREATE OR REPLACE FUNCTION my_translate_cmptype(integer)
 867 RETURNS smallint
 868 AS 'MODULE_PATHNAME'
 869 LANGUAGE C STRICT;
 870 </pre><p>
 871
 872        And the operator family registration must look like this:
 873 </p><pre class="programlisting">
 874 ALTER OPERATOR FAMILY my_opfamily USING gist ADD
 875     FUNCTION 12 ("any", "any") my_translate_cmptype(int);
 876 </pre><p>
 877       </p><p>
 878         The matching code in the C module could then follow this skeleton:
 879
 880 </p><pre class="programlisting">
 881 PG_FUNCTION_INFO_V1(my_translate_cmptype);
 882
 883 Datum
 884 my_translate_cmptype(PG_FUNCTION_ARGS)
 885 {
 886     CompareType cmptype = PG_GETARG_INT32(0);
 887     StrategyNumber ret = InvalidStrategy;
 888
 889     switch (cmptype)
 890     {
 891         case COMPARE_EQ:
 892             ret = BTEqualStrategyNumber;
 893     }
 894
 895     PG_RETURN_UINT16(ret);
 896 }
 897 </pre><p>
 898       </p><p>
 899        One translation function is provided by
 900        <span class="productname">PostgreSQL</span>:
 901        <code class="literal">gist_translate_cmptype_common</code> is for operator classes that
 902        use the <code class="literal">RT*StrategyNumber</code> constants.
 903        The <code class="literal">btree_gist</code>
 904        extension defines a second translation function,
 905        <code class="literal">gist_translate_cmptype_btree</code>, for operator classes that use
 906        the <code class="literal">BT*StrategyNumber</code> constants.
 907       </p></dd></dl></div><p>
 908    All the GiST support methods are normally called in short-lived memory
 909    contexts; that is, <code class="varname">CurrentMemoryContext</code> will get reset after
 910    each tuple is processed.  It is therefore not very important to worry about
 911    pfree'ing everything you palloc.  However, in some cases it's useful for a
 912    support method to cache data across repeated calls.  To do that, allocate
 913    the longer-lived data in <code class="literal">fcinfo-&gt;flinfo-&gt;fn_mcxt</code>, and
 914    keep a pointer to it in <code class="literal">fcinfo-&gt;flinfo-&gt;fn_extra</code>.  Such
 915    data will survive for the life of the index operation (e.g., a single GiST
 916    index scan, index build, or index tuple insertion).  Be careful to pfree
 917    the previous value when replacing a <code class="literal">fn_extra</code> value, or the leak
 918    will accumulate for the duration of the operation.
 919   </p></div><div class="sect2" id="GIST-IMPLEMENTATION"><div class="titlepage"><div><div><h3 class="title">65.2.4. Implementation <a href="#GIST-IMPLEMENTATION" class="id_link">#</a></h3></div></div></div><div class="sect3" id="GIST-BUFFERING-BUILD"><div class="titlepage"><div><div><h4 class="title">65.2.4.1. GiST Index Build Methods <a href="#GIST-BUFFERING-BUILD" class="id_link">#</a></h4></div></div></div><p>
 920    The simplest way to build a GiST index is just to insert all the entries,
 921    one by one.  This tends to be slow for large indexes, because if the
 922    index tuples are scattered across the index and the index is large enough
 923    to not fit in cache, a lot of random I/O will be
 924    needed.  <span class="productname">PostgreSQL</span> supports two alternative
 925    methods for initial build of a GiST index: <em class="firstterm">sorted</em>
 926    and <em class="firstterm">buffered</em> modes.
 927   </p><p>
 928    The sorted method is only available if each of the opclasses used by the
 929    index provides a <code class="function">sortsupport</code> function, as described
 930    in <a class="xref" href="gist.html#GIST-EXTENSIBILITY" title="65.2.3. Extensibility">Section 65.2.3</a>.  If they do, this method is
 931    usually the best, so it is used by default.
 932   </p><p>
 933    The buffered method works by not inserting tuples directly into the index
 934    right away.  It can dramatically reduce the amount of random I/O needed
 935    for non-ordered data sets.  For well-ordered data sets the benefit is
 936    smaller or non-existent, because only a small number of pages receive new
 937    tuples at a time, and those pages fit in cache even if the index as a
 938    whole does not.
 939   </p><p>
 940    The buffered method needs to call the <code class="function">penalty</code>
 941    function more often than the simple method does, which consumes some
 942    extra CPU resources. Also, the buffers need temporary disk space, up to
 943    the size of the resulting index. Buffering can also influence the quality
 944    of the resulting index, in both positive and negative directions. That
 945    influence depends on various factors, like the distribution of the input
 946    data and the operator class implementation.
 947   </p><p>
 948    If sorting is not possible, then by default a GiST index build switches
 949    to the buffering method when the index size reaches
 950    <a class="xref" href="runtime-config-query.html#GUC-EFFECTIVE-CACHE-SIZE">effective_cache_size</a>.  Buffering can be manually
 951    forced or prevented by the <code class="literal">buffering</code> parameter to the
 952    CREATE INDEX command.  The default behavior is good for most cases, but
 953    turning buffering off might speed up the build somewhat if the input data
 954    is ordered.
 955   </p></div></div><div class="sect2" id="GIST-EXAMPLES"><div class="titlepage"><div><div><h3 class="title">65.2.5. Examples <a href="#GIST-EXAMPLES" class="id_link">#</a></h3></div></div></div><p>
 956   The <span class="productname">PostgreSQL</span> source distribution includes
 957   several examples of index methods implemented using
 958   <acronym class="acronym">GiST</acronym>.  The core system currently provides text search
 959   support (indexing for <code class="type">tsvector</code> and <code class="type">tsquery</code>) as well as
 960   R-Tree equivalent functionality for some of the built-in geometric data types
 961   (see <code class="filename">src/backend/access/gist/gistproc.c</code>).  The following
 962   <code class="filename">contrib</code> modules also contain <acronym class="acronym">GiST</acronym>
 963   operator classes:
 964
 965  </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><code class="filename">btree_gist</code></span></dt><dd><p>B-tree equivalent functionality for several data types</p></dd><dt><span class="term"><code class="filename">cube</code></span></dt><dd><p>Indexing for multidimensional cubes</p></dd><dt><span class="term"><code class="filename">hstore</code></span></dt><dd><p>Module for storing (key, value) pairs</p></dd><dt><span class="term"><code class="filename">intarray</code></span></dt><dd><p>RD-Tree for one-dimensional array of int4 values</p></dd><dt><span class="term"><code class="filename">ltree</code></span></dt><dd><p>Indexing for tree-like structures</p></dd><dt><span class="term"><code class="filename">pg_trgm</code></span></dt><dd><p>Text similarity using trigram matching</p></dd><dt><span class="term"><code class="filename">seg</code></span></dt><dd><p>Indexing for <span class="quote">“<span class="quote">float ranges</span>”</span></p></dd></dl></div><p>
 966  </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="btree.html" title="65.1. B-Tree Indexes">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="indextypes.html" title="Chapter 65. Built-in Index Access Methods">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="spgist.html" title="65.3. SP-GiST Indexes">Next</a></td></tr><tr><td width="40%" align="left" valign="top">65.1. B-Tree Indexes </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 65.3. SP-GiST Indexes</td></tr></table></div></body></html>