begriffs open source - ai-pg/blob - full-docs/src/sgml/html/indexes-index-only-scans.html

   1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
   2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>11.9. Index-Only Scans and Covering Indexes</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="indexes-partial.html" title="11.8. Partial Indexes" /><link rel="next" href="indexes-opclass.html" title="11.10. Operator Classes and Operator Families" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">11.9. Index-Only Scans and Covering Indexes</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="indexes-partial.html" title="11.8. Partial Indexes">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="indexes.html" title="Chapter 11. Indexes">Up</a></td><th width="60%" align="center">Chapter 11. Indexes</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="indexes-opclass.html" title="11.10. Operator Classes and Operator Families">Next</a></td></tr></table><hr /></div><div class="sect1" id="INDEXES-INDEX-ONLY-SCANS"><div class="titlepage"><div><div><h2 class="title" style="clear: both">11.9. Index-Only Scans and Covering Indexes <a href="#INDEXES-INDEX-ONLY-SCANS" class="id_link">#</a></h2></div></div></div><a id="id-1.5.10.12.2" class="indexterm"></a><a id="id-1.5.10.12.3" class="indexterm"></a><a id="id-1.5.10.12.4" class="indexterm"></a><a id="id-1.5.10.12.5" class="indexterm"></a><p>
   3    All indexes in <span class="productname">PostgreSQL</span>
   4    are <em class="firstterm">secondary</em> indexes, meaning that each index is
   5    stored separately from the table's main data area (which is called the
   6    table's <em class="firstterm">heap</em>
   7    in <span class="productname">PostgreSQL</span> terminology).  This means that
   8    in an ordinary index scan, each row retrieval requires fetching data from
   9    both the index and the heap.  Furthermore, while the index entries that
  10    match a given indexable <code class="literal">WHERE</code> condition are usually
  11    close together in the index, the table rows they reference might be
  12    anywhere in the heap.  The heap-access portion of an index scan thus
  13    involves a lot of random access into the heap, which can be slow,
  14    particularly on traditional rotating media.  (As described in
  15    <a class="xref" href="indexes-bitmap-scans.html" title="11.5. Combining Multiple Indexes">Section 11.5</a>, bitmap scans try to alleviate
  16    this cost by doing the heap accesses in sorted order, but that only goes
  17    so far.)
  18   </p><p>
  19    To solve this performance problem, <span class="productname">PostgreSQL</span>
  20    supports <em class="firstterm">index-only scans</em>, which can answer
  21    queries from an index alone without any heap access.  The basic idea is
  22    to return values directly out of each index entry instead of consulting
  23    the associated heap entry.  There are two fundamental restrictions on
  24    when this method can be used:
  25
  26    </p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>
  27       The index type must support index-only scans.  B-tree indexes always
  28       do.  GiST and SP-GiST indexes support index-only scans for some
  29       operator classes but not others.  Other index types have no support.
  30       The underlying requirement is that the index must physically store, or
  31       else be able to reconstruct, the original data value for each index
  32       entry.  As a counterexample, GIN indexes cannot support index-only
  33       scans because each index entry typically holds only part of the
  34       original data value.
  35      </p></li><li class="listitem"><p>
  36       The query must reference only columns stored in the index.  For
  37       example, given an index on columns <code class="literal">x</code>
  38       and <code class="literal">y</code> of a table that also has a
  39       column <code class="literal">z</code>, these queries could use index-only scans:
  40 </p><pre class="programlisting">
  41 SELECT x, y FROM tab WHERE x = 'key';
  42 SELECT x FROM tab WHERE x = 'key' AND y &lt; 42;
  43 </pre><p>
  44       but these queries could not:
  45 </p><pre class="programlisting">
  46 SELECT x, z FROM tab WHERE x = 'key';
  47 SELECT x FROM tab WHERE x = 'key' AND z &lt; 42;
  48 </pre><p>
  49       (Expression indexes and partial indexes complicate this rule,
  50       as discussed below.)
  51      </p></li></ol></div><p>
  52   </p><p>
  53    If these two fundamental requirements are met, then all the data values
  54    required by the query are available from the index, so an index-only scan
  55    is physically possible.  But there is an additional requirement for any
  56    table scan in <span class="productname">PostgreSQL</span>: it must verify that
  57    each retrieved row be <span class="quote">“<span class="quote">visible</span>”</span> to the query's MVCC
  58    snapshot, as discussed in <a class="xref" href="mvcc.html" title="Chapter 13. Concurrency Control">Chapter 13</a>.  Visibility information
  59    is not stored in index entries, only in heap entries; so at first glance
  60    it would seem that every row retrieval would require a heap access
  61    anyway.  And this is indeed the case, if the table row has been modified
  62    recently.  However, for seldom-changing data there is a way around this
  63    problem.  <span class="productname">PostgreSQL</span> tracks, for each page in
  64    a table's heap, whether all rows stored in that page are old enough to be
  65    visible to all current and future transactions.  This information is
  66    stored in a bit in the table's <em class="firstterm">visibility map</em>.  An
  67    index-only scan, after finding a candidate index entry, checks the
  68    visibility map bit for the corresponding heap page.  If it's set, the row
  69    is known visible and so the data can be returned with no further work.
  70    If it's not set, the heap entry must be visited to find out whether it's
  71    visible, so no performance advantage is gained over a standard index
  72    scan.  Even in the successful case, this approach trades visibility map
  73    accesses for heap accesses; but since the visibility map is four orders
  74    of magnitude smaller than the heap it describes, far less physical I/O is
  75    needed to access it.  In most situations the visibility map remains
  76    cached in memory all the time.
  77   </p><p>
  78    In short, while an index-only scan is possible given the two fundamental
  79    requirements, it will be a win only if a significant fraction of the
  80    table's heap pages have their all-visible map bits set.  But tables in
  81    which a large fraction of the rows are unchanging are common enough to
  82    make this type of scan very useful in practice.
  83   </p><p>
  84    <a id="id-1.5.10.12.10.1" class="indexterm"></a>
  85    To make effective use of the index-only scan feature, you might choose to
  86    create a <em class="firstterm">covering index</em>, which is an index
  87    specifically designed to include the columns needed by a particular
  88    type of query that you run frequently.  Since queries typically need to
  89    retrieve more columns than just the ones they search
  90    on, <span class="productname">PostgreSQL</span> allows you to create an index
  91    in which some columns are just <span class="quote">“<span class="quote">payload</span>”</span> and are not part
  92    of the search key.  This is done by adding an <code class="literal">INCLUDE</code>
  93    clause listing the extra columns.  For example, if you commonly run
  94    queries like
  95 </p><pre class="programlisting">
  96 SELECT y FROM tab WHERE x = 'key';
  97 </pre><p>
  98    the traditional approach to speeding up such queries would be to create
  99    an index on <code class="literal">x</code> only.  However, an index defined as
 100 </p><pre class="programlisting">
 101 CREATE INDEX tab_x_y ON tab(x) INCLUDE (y);
 102 </pre><p>
 103    could handle these queries as index-only scans,
 104    because <code class="literal">y</code> can be obtained from the index without
 105    visiting the heap.
 106   </p><p>
 107    Because column <code class="literal">y</code> is not part of the index's search
 108    key, it does not have to be of a data type that the index can handle;
 109    it's merely stored in the index and is not interpreted by the index
 110    machinery.  Also, if the index is a unique index, that is
 111 </p><pre class="programlisting">
 112 CREATE UNIQUE INDEX tab_x_y ON tab(x) INCLUDE (y);
 113 </pre><p>
 114    the uniqueness condition applies to just column <code class="literal">x</code>,
 115    not to the combination of <code class="literal">x</code> and <code class="literal">y</code>.
 116    (An <code class="literal">INCLUDE</code> clause can also be written
 117    in <code class="literal">UNIQUE</code> and <code class="literal">PRIMARY KEY</code>
 118    constraints, providing alternative syntax for setting up an index like
 119    this.)
 120   </p><p>
 121    It's wise to be conservative about adding non-key payload columns to an
 122    index, especially wide columns.  If an index tuple exceeds the
 123    maximum size allowed for the index type, data insertion will fail.
 124    In any case, non-key columns duplicate data from the index's table
 125    and bloat the size of the index, thus potentially slowing searches.
 126    And remember that there is little point in including payload columns in an
 127    index unless the table changes slowly enough that an index-only scan is
 128    likely to not need to access the heap.  If the heap tuple must be visited
 129    anyway, it costs nothing more to get the column's value from there.
 130    Other restrictions are that expressions are not currently supported as
 131    included columns, and that only B-tree, GiST and SP-GiST indexes currently
 132    support included columns.
 133   </p><p>
 134    Before <span class="productname">PostgreSQL</span> had
 135    the <code class="literal">INCLUDE</code> feature, people sometimes made covering
 136    indexes by writing the payload columns as ordinary index columns,
 137    that is writing
 138 </p><pre class="programlisting">
 139 CREATE INDEX tab_x_y ON tab(x, y);
 140 </pre><p>
 141    even though they had no intention of ever using <code class="literal">y</code> as
 142    part of a <code class="literal">WHERE</code> clause.  This works fine as long as
 143    the extra columns are trailing columns; making them be leading columns is
 144    unwise for the reasons explained in <a class="xref" href="indexes-multicolumn.html" title="11.3. Multicolumn Indexes">Section 11.3</a>.
 145    However, this method doesn't support the case where you want the index to
 146    enforce uniqueness on the key column(s).
 147   </p><p>
 148    <em class="firstterm">Suffix truncation</em> always removes non-key
 149    columns from upper B-Tree levels.  As payload columns, they are
 150    never used to guide index scans.  The truncation process also
 151    removes one or more trailing key column(s) when the remaining
 152    prefix of key column(s) happens to be sufficient to describe tuples
 153    on the lowest B-Tree level.  In practice, covering indexes without
 154    an <code class="literal">INCLUDE</code> clause often avoid storing columns
 155    that are effectively payload in the upper levels.  However,
 156    explicitly defining payload columns as non-key columns
 157    <span class="emphasis"><em>reliably</em></span> keeps the tuples in upper levels
 158    small.
 159   </p><p>
 160    In principle, index-only scans can be used with expression indexes.
 161    For example, given an index on <code class="literal">f(x)</code>
 162    where <code class="literal">x</code> is a table column, it should be possible to
 163    execute
 164 </p><pre class="programlisting">
 165 SELECT f(x) FROM tab WHERE f(x) &lt; 1;
 166 </pre><p>
 167    as an index-only scan; and this is very attractive
 168    if <code class="literal">f()</code> is an expensive-to-compute function.
 169    However, <span class="productname">PostgreSQL</span>'s planner is currently not
 170    very smart about such cases.  It considers a query to be potentially
 171    executable by index-only scan only when all <span class="emphasis"><em>columns</em></span>
 172    needed by the query are available from the index.  In this
 173    example, <code class="literal">x</code> is not needed except in the
 174    context <code class="literal">f(x)</code>, but the planner does not notice that and
 175    concludes that an index-only scan is not possible.  If an index-only scan
 176    seems sufficiently worthwhile, this can be worked around by
 177    adding <code class="literal">x</code> as an included column, for example
 178 </p><pre class="programlisting">
 179 CREATE INDEX tab_f_x ON tab (f(x)) INCLUDE (x);
 180 </pre><p>
 181    An additional caveat, if the goal is to avoid
 182    recalculating <code class="literal">f(x)</code>, is that the planner won't
 183    necessarily match uses of <code class="literal">f(x)</code> that aren't in
 184    indexable <code class="literal">WHERE</code> clauses to the index column.  It will
 185    usually get this right in simple queries such as shown above, but not in
 186    queries that involve joins.  These deficiencies may be remedied in future
 187    versions of <span class="productname">PostgreSQL</span>.
 188   </p><p>
 189    Partial indexes also have interesting interactions with index-only scans.
 190    Consider the partial index shown in <a class="xref" href="indexes-partial.html#INDEXES-PARTIAL-EX3" title="Example 11.3. Setting up a Partial Unique Index">Example 11.3</a>:
 191 </p><pre class="programlisting">
 192 CREATE UNIQUE INDEX tests_success_constraint ON tests (subject, target)
 193     WHERE success;
 194 </pre><p>
 195    In principle, we could do an index-only scan on this index to satisfy a
 196    query like
 197 </p><pre class="programlisting">
 198 SELECT target FROM tests WHERE subject = 'some-subject' AND success;
 199 </pre><p>
 200    But there's a problem: the <code class="literal">WHERE</code> clause refers
 201    to <code class="literal">success</code> which is not available as a result column
 202    of the index.  Nonetheless, an index-only scan is possible because the
 203    plan does not need to recheck that part of the <code class="literal">WHERE</code>
 204    clause at run time: all entries found in the index necessarily
 205    have <code class="literal">success = true</code> so this need not be explicitly
 206    checked in the plan.  <span class="productname">PostgreSQL</span> versions 9.6
 207    and later will recognize such cases and allow index-only scans to be
 208    generated, but older versions will not.
 209   </p></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="indexes-partial.html" title="11.8. Partial Indexes">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="indexes.html" title="Chapter 11. Indexes">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="indexes-opclass.html" title="11.10. Operator Classes and Operator Families">Next</a></td></tr><tr><td width="40%" align="left" valign="top">11.8. Partial Indexes </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 11.10. Operator Classes and Operator Families</td></tr></table></div></body></html>