
   1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
   2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>63.6. Index Cost Estimation Functions</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="index-unique-checks.html" title="63.5. Index Uniqueness Checks" /><link rel="next" href="wal-for-extensions.html" title="Chapter 64. Write Ahead Logging for Extensions" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">63.6. Index Cost Estimation Functions</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="index-unique-checks.html" title="63.5. Index Uniqueness Checks">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="indexam.html" title="Chapter 63. Index Access Method Interface Definition">Up</a></td><th width="60%" align="center">Chapter 63. Index Access Method Interface Definition</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="wal-for-extensions.html" title="Chapter 64. Write Ahead Logging for Extensions">Next</a></td></tr></table><hr /></div><div class="sect1" id="INDEX-COST-ESTIMATION"><div class="titlepage"><div><div><h2 class="title" style="clear: both">63.6. Index Cost Estimation Functions <a href="#INDEX-COST-ESTIMATION" class="id_link">#</a></h2></div></div></div><p>
   3    The <code class="function">amcostestimate</code> function is given information describing
   4    a possible index scan, including lists of WHERE and ORDER BY clauses that
   5    have been determined to be usable with the index.  It must return estimates
   6    of the cost of accessing the index and the selectivity of the WHERE
   7    clauses (that is, the fraction of parent-table rows that will be
   8    retrieved during the index scan).  For simple cases, nearly all the
   9    work of the cost estimator can be done by calling standard routines
  10    in the optimizer; the point of having an <code class="function">amcostestimate</code> function is
  11    to allow index access methods to provide index-type-specific knowledge,
  12    in case it is possible to improve on the standard estimates.
  13   </p><p>
  14    Each <code class="function">amcostestimate</code> function must have the signature:
  15
  16 </p><pre class="programlisting">
  17 void
  18 amcostestimate (PlannerInfo *root,
  19                 IndexPath *path,
  20                 double loop_count,
  21                 Cost *indexStartupCost,
  22                 Cost *indexTotalCost,
  23                 Selectivity *indexSelectivity,
  24                 double *indexCorrelation,
  25                 double *indexPages);
  26 </pre><p>
  27
  28    The first three parameters are inputs:
  29
  30    </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><em class="parameter"><code>root</code></em></span></dt><dd><p>
  31        The planner's information about the query being processed.
  32       </p></dd><dt><span class="term"><em class="parameter"><code>path</code></em></span></dt><dd><p>
  33        The index access path being considered.  All fields except cost and
  34        selectivity values are valid.
  35       </p></dd><dt><span class="term"><em class="parameter"><code>loop_count</code></em></span></dt><dd><p>
  36        The number of repetitions of the index scan that should be factored
  37        into the cost estimates.  This will typically be greater than one when
   38        considering a parameterized scan for use on the inner side of a nestloop
   39        join.  Note that the cost estimates should still be for just one scan;
  40        a larger <em class="parameter"><code>loop_count</code></em> means that it may be appropriate
  41        to allow for some caching effects across multiple scans.
  42       </p></dd></dl></div><p>
  43   </p><p>
  44    The last five parameters are pass-by-reference outputs:
  45
  46    </p><div class="variablelist"><dl class="variablelist"><dt><span class="term"><em class="parameter"><code>*indexStartupCost</code></em></span></dt><dd><p>
  47        Set to cost of index start-up processing
  48       </p></dd><dt><span class="term"><em class="parameter"><code>*indexTotalCost</code></em></span></dt><dd><p>
  49        Set to total cost of index processing
  50       </p></dd><dt><span class="term"><em class="parameter"><code>*indexSelectivity</code></em></span></dt><dd><p>
  51        Set to index selectivity
  52       </p></dd><dt><span class="term"><em class="parameter"><code>*indexCorrelation</code></em></span></dt><dd><p>
  53        Set to correlation coefficient between index scan order and
  54        underlying table's order
  55       </p></dd><dt><span class="term"><em class="parameter"><code>*indexPages</code></em></span></dt><dd><p>
  56        Set to number of index leaf pages
  57       </p></dd></dl></div><p>
  58   </p><p>
  59    Note that cost estimate functions must be written in C, not in SQL or
  60    any available procedural language, because they must access internal
  61    data structures of the planner/optimizer.
  62   </p><p>
  63    The index access costs should be computed using the parameters used by
  64    <code class="filename">src/backend/optimizer/path/costsize.c</code>: a sequential
  65    disk block fetch has cost <code class="varname">seq_page_cost</code>, a nonsequential fetch
  66    has cost <code class="varname">random_page_cost</code>, and the cost of processing one index
  67    row should usually be taken as <code class="varname">cpu_index_tuple_cost</code>.  In
  68    addition, an appropriate multiple of <code class="varname">cpu_operator_cost</code> should
  69    be charged for any comparison operators invoked during index processing
  70    (especially evaluation of the indexquals themselves).
  71   </p><p>
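   To make the arithmetic concrete, the following standalone sketch charges a
   hypothetical scan using these parameters.  The four constants are local
   stand-ins set to the default settings of <code class="varname">seq_page_cost</code>,
   <code class="varname">random_page_cost</code>, <code class="varname">cpu_index_tuple_cost</code>,
   and <code class="varname">cpu_operator_cost</code>; the function name and its
   arguments are assumptions made for illustration and are not part of the
   PostgreSQL API.
  </p><pre class="programlisting">
```c
/*
 * Local stand-ins for the planner cost parameters, set to their default
 * values.  In the backend these are global variables, not constants.
 */
static const double seq_page_cost = 1.0;
static const double random_page_cost = 4.0;
static const double cpu_index_tuple_cost = 0.005;
static const double cpu_operator_cost = 0.0025;

/*
 * Hypothetical helper: cost of a scan that fetches seq_pages pages
 * sequentially and random_pages pages nonsequentially, processes
 * index_rows index rows, and evaluates quals_per_row comparison
 * operators on each of those rows.
 */
static double
sketch_index_access_cost(double seq_pages, double random_pages,
                         double index_rows, int quals_per_row)
{
    double io_cost = seq_pages * seq_page_cost +
                     random_pages * random_page_cost;
    double cpu_cost = index_rows *
        (cpu_index_tuple_cost + quals_per_row * cpu_operator_cost);

    return io_cost + cpu_cost;
}
```
</pre><p>
   For instance, with nine sequential page fetches, one nonsequential page
   fetch, 1000 index rows, and two operators per row, the sketch yields
   9 * 1.0 + 1 * 4.0 + 1000 * (0.005 + 2 * 0.0025) = 23.0.
  </p><p>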
  72    The access costs should include all disk and CPU costs associated with
  73    scanning the index itself, but <span class="emphasis"><em>not</em></span> the costs of retrieving or
  74    processing the parent-table rows that are identified by the index.
  75   </p><p>
  76    The <span class="quote">“<span class="quote">start-up cost</span>”</span> is the part of the total scan cost that
  77    must be expended before we can begin to fetch the first row.  For most
  78    indexes this can be taken as zero, but an index type with a high start-up
  79    cost might want to set it nonzero.
  80   </p><p>
  81    The <em class="parameter"><code>indexSelectivity</code></em> should be set to the estimated fraction of the parent
  82    table rows that will be retrieved during the index scan.  In the case
  83    of a lossy query, this will typically be higher than the fraction of
  84    rows that actually pass the given qual conditions.
  85   </p><p>
  86    The <em class="parameter"><code>indexCorrelation</code></em> should be set to the correlation (ranging between
  87    -1.0 and 1.0) between the index order and the table order.  This is used
  88    to adjust the estimate for the cost of fetching rows from the parent
  89    table.
  90   </p><p>
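   One way such an adjustment can work is to interpolate between a worst-case
   and a best-case table I/O cost using the square of the correlation, which
   is the approach taken by <code class="function">cost_index()</code> in
   <code class="filename">costsize.c</code>.  The sketch below shows only that
   interpolation; the function name is hypothetical, and the two endpoint
   costs are passed in here rather than derived as the planner would.
  </p><pre class="programlisting">
```c
/*
 * Hypothetical helper: blend the table I/O cost expected for a completely
 * uncorrelated index (max_io_cost) with the cost expected for a perfectly
 * correlated one (min_io_cost), weighting by the correlation squared.
 */
static double
sketch_table_io_cost(double index_correlation,
                     double min_io_cost, double max_io_cost)
{
    double csquared = index_correlation * index_correlation;

    return max_io_cost + csquared * (min_io_cost - max_io_cost);
}
```
</pre><p>
   A correlation of zero returns the worst-case cost, a correlation of 1.0 or
   -1.0 returns the best-case cost, and with endpoint costs of 100 and 500 a
   correlation of 0.5 gives 500 + 0.25 * (100 - 500) = 400.
  </p><p>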
  91    The <em class="parameter"><code>indexPages</code></em> should be set to the number of leaf pages.
   92    This is used to estimate the number of workers for a parallel index scan.
  93   </p><p>
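   As a rough illustration of how a leaf page count can translate into a
   worker count, the sketch below adds one worker each time the index size
   triples beyond a minimum threshold.  It is loosely modeled on the planner's
   worker computation and is an assumption-laden simplification: the function
   name is hypothetical, the threshold is passed in explicitly, and the real
   planner also takes the table size and per-relation settings into account.
  </p><pre class="programlisting">
```c
/*
 * Hypothetical helper: no workers below min_pages, one worker at
 * min_pages, and one more each time the leaf page count triples,
 * capped at max_workers.
 */
static int
sketch_index_workers(double index_pages, double min_pages, int max_workers)
{
    double threshold = (min_pages > 1.0) ? min_pages : 1.0;
    int    workers = 1;

    if (min_pages > index_pages)
        return 0;               /* index too small for a parallel scan */

    while (index_pages >= threshold * 3.0)
    {
        workers++;
        threshold *= 3.0;
    }

    return (workers > max_workers) ? max_workers : workers;
}
```
</pre><p>
   With an assumed threshold of 64 pages (512kB at an 8kB block size), a
   600-page index yields three workers in this sketch, provided
   <code class="literal">max_workers</code> is at least three.
  </p><p>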
  94    When <em class="parameter"><code>loop_count</code></em> is greater than one, the returned numbers
  95    should be averages expected for any one scan of the index.
  96   </p><div class="procedure" id="id-1.10.15.12.13"><p class="title"><strong>Cost Estimation</strong></p><p>
  97     A typical cost estimator will proceed as follows:
  98    </p><ol class="procedure" type="1"><li class="step"><p>
  99      Estimate and return the fraction of parent-table rows that will be visited
 100      based on the given qual conditions.  In the absence of any index-type-specific
 101      knowledge, use the standard optimizer function <code class="function">clauselist_selectivity()</code>:
 102
 103 </p><pre class="programlisting">
 104 *indexSelectivity = clauselist_selectivity(root, path-&gt;indexquals,
 105                                            path-&gt;indexinfo-&gt;rel-&gt;relid,
 106                                            JOIN_INNER, NULL);
 107 </pre><p>
 108     </p></li><li class="step"><p>
 109      Estimate the number of index rows that will be visited during the
 110      scan.  For many index types this is the same as <em class="parameter"><code>indexSelectivity</code></em> times
 111      the number of rows in the index, but it might be more.  (Note that the
 112      index's size in pages and rows is available from the
 113      <code class="literal">path-&gt;indexinfo</code> struct.)
 114     </p></li><li class="step"><p>
 115      Estimate the number of index pages that will be retrieved during the scan.
 116      This might be just <em class="parameter"><code>indexSelectivity</code></em> times the index's size in pages.
 117     </p></li><li class="step"><p>
 118      Compute the index access cost.  A generic estimator might do this:
 119
 120 </p><pre class="programlisting">
 121 /*
 122  * Our generic assumption is that the index pages will be read
 123  * sequentially, so they cost seq_page_cost each, not random_page_cost.
 124  * Also, we charge for evaluation of the indexquals at each index row.
 125  * All the costs are assumed to be paid incrementally during the scan.
 126  */
 127 cost_qual_eval(&amp;index_qual_cost, path-&gt;indexquals, root);
 128 *indexStartupCost = index_qual_cost.startup;
 129 *indexTotalCost = seq_page_cost * numIndexPages +
 130     (cpu_index_tuple_cost + index_qual_cost.per_tuple) * numIndexTuples;
 131 </pre><p>
 132
 133      However, the above does not account for amortization of index reads
 134      across repeated index scans.
 135     </p></li><li class="step"><p>
 136      Estimate the index correlation.  For a simple ordered index on a single
 137      field, this can be retrieved from pg_statistic.  If the correlation
 138      is not known, the conservative estimate is zero (no correlation).
 139     </p></li></ol></div><p>
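   The steps above can be sketched end to end in standalone C.  This is not
   backend code: <code class="literal">SketchIndex</code> and
   <code class="function">sketch_costestimate</code> are hypothetical stand-ins
   for the planner structures, plain <code class="literal">double</code> is used
   for all cost and selectivity values, the selectivity is supplied by the
   caller rather than computed by <code class="function">clauselist_selectivity()</code>,
   and the qual evaluation costs are passed in instead of being obtained from
   <code class="function">cost_qual_eval()</code>.  The two cost parameters are
   local constants set to their default values.
  </p><pre class="programlisting">
```c
/* Hypothetical stand-in for the index size fields of path->indexinfo. */
typedef struct SketchIndex
{
    double pages;               /* index size in pages */
    double tuples;              /* number of rows in the index */
} SketchIndex;

/* Local stand-ins for planner parameters, set to their defaults. */
static const double seq_page_cost = 1.0;
static const double cpu_index_tuple_cost = 0.005;

static void
sketch_costestimate(const SketchIndex *index,
                    double selectivity,
                    double qual_startup_cost,
                    double qual_per_tuple_cost,
                    double *indexStartupCost,
                    double *indexTotalCost,
                    double *indexSelectivity,
                    double *indexCorrelation,
                    double *indexPages)
{
    double numIndexTuples;
    double numIndexPages;

    /* Step 1: fraction of parent-table rows that will be visited. */
    *indexSelectivity = selectivity;

    /* Step 2: index rows visited, assumed proportional to selectivity. */
    numIndexTuples = selectivity * index->tuples;

    /* Step 3: index pages retrieved, using the same proportion. */
    numIndexPages = selectivity * index->pages;

    /* Step 4: sequential page reads plus per-row CPU and qual costs. */
    *indexStartupCost = qual_startup_cost;
    *indexTotalCost = seq_page_cost * numIndexPages +
        (cpu_index_tuple_cost + qual_per_tuple_cost) * numIndexTuples;

    /* Step 5: correlation unknown, so report the conservative zero. */
    *indexCorrelation = 0.0;

    /* Assumption: the whole index size is reported as the leaf page count. */
    *indexPages = index->pages;
}
```
</pre><p>
   With a 1000-page index holding 100000 rows, a selectivity of 0.01, no qual
   start-up cost, and a per-row qual cost of 0.0025, this sketch reports a
   total cost of 10 * 1.0 + 1000 * (0.005 + 0.0025) = 17.5.  Like the generic
   listing in the procedure, it makes no allowance for caching across
   repeated scans.
  </p><p>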
 140    Examples of cost estimator functions can be found in
 141    <code class="filename">src/backend/utils/adt/selfuncs.c</code>.
  142   </p></div></body></html>