63.6. Index Cost Estimation Functions

   The amcostestimate function is given information describing a possible
   index scan, including lists of WHERE and ORDER BY clauses that have
   been determined to be usable with the index. It must return estimates
   of the cost of accessing the index and the selectivity of the WHERE
   clauses (that is, the fraction of parent-table rows that will be
   retrieved during the index scan). For simple cases, nearly all the work
   of the cost estimator can be done by calling standard routines in the
   optimizer; the point of having an amcostestimate function is to allow
   index access methods to provide index-type-specific knowledge, in case
   it is possible to improve on the standard estimates.

   Each amcostestimate function must have the signature:
void
amcostestimate (PlannerInfo *root,
                IndexPath *path,
                double loop_count,
                Cost *indexStartupCost,
                Cost *indexTotalCost,
                Selectivity *indexSelectivity,
                double *indexCorrelation,
                double *indexPages);

   The first three parameters are inputs:

   root
          The planner's information about the query being processed.

   path
          The index access path being considered. All fields except cost
          and selectivity values are valid.

   loop_count
          The number of repetitions of the index scan that should be
          factored into the cost estimates. This will typically be greater
          than one when considering a parameterized scan for use in the
          inside of a nestloop join. Note that the cost estimates should
          still be for just one scan; a larger loop_count means that it
          may be appropriate to allow for some caching effects across
          multiple scans.

   The last five parameters are pass-by-reference outputs:

   *indexStartupCost
          Set to the cost of index start-up processing.

   *indexTotalCost
          Set to the total cost of index processing.

   *indexSelectivity
          Set to the index selectivity.

   *indexCorrelation
          Set to the correlation coefficient between the index scan order
          and the underlying table's order.

   *indexPages
          Set to the number of index leaf pages.

   Note that cost estimate functions must be written in C, not in SQL or
   any available procedural language, because they must access internal
   data structures of the planner/optimizer.

   The index access costs should be computed using the parameters used by
   src/backend/optimizer/path/costsize.c: a sequential disk block fetch
   has cost seq_page_cost, a nonsequential fetch has cost
   random_page_cost, and the cost of processing one index row should
   usually be taken as cpu_index_tuple_cost. In addition, an appropriate
   multiple of cpu_operator_cost should be charged for any comparison
   operators invoked during index processing (especially evaluation of the
   indexquals themselves).
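
   As a rough illustration of how these parameters combine, the following
   standalone C sketch adds up page and CPU charges. It does not use the
   planner headers: the constants are set to PostgreSQL's default values
   for the settings of the same names, and the function
   sketch_index_access_cost and its arguments are hypothetical names
   introduced only for this example.

```c
#include <assert.h>

/* Stand-ins for the planner's cost parameters, set to PostgreSQL's
 * default values.  Real code reads the planner's own variables. */
static const double seq_page_cost = 1.0;
static const double random_page_cost = 4.0;
static const double cpu_index_tuple_cost = 0.005;
static const double cpu_operator_cost = 0.0025;

/*
 * Hypothetical helper: cost of reading seq_pages sequentially and
 * random_pages nonsequentially, then processing index_tuples rows with
 * ops_per_tuple comparison operators evaluated per row.
 */
static double
sketch_index_access_cost(double seq_pages, double random_pages,
                         double index_tuples, double ops_per_tuple)
{
    assert(seq_pages >= 0 && random_pages >= 0);
    assert(index_tuples >= 0 && ops_per_tuple >= 0);

    double io_cost = seq_pages * seq_page_cost +
        random_pages * random_page_cost;
    double cpu_cost = index_tuples *
        (cpu_index_tuple_cost + ops_per_tuple * cpu_operator_cost);

    return io_cost + cpu_cost;
}
```

   Under these defaults, 10 sequential pages, 2 nonsequential pages, and
   1000 index rows with one operator each come to roughly
   10 + 8 + 7.5 = 25.5 cost units.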

   The access costs should include all disk and CPU costs associated with
   scanning the index itself, but not the costs of retrieving or
   processing the parent-table rows that are identified by the index.

   The “start-up cost” is the part of the total scan cost that must be
   expended before we can begin to fetch the first row. For most indexes
   this can be taken as zero, but an index type with a high start-up cost
   might want to set it nonzero.

   The indexSelectivity should be set to the estimated fraction of the
   parent table rows that will be retrieved during the index scan. In the
   case of a lossy query, this will typically be higher than the fraction
   of rows that actually pass the given qual conditions.

   The indexCorrelation should be set to the correlation (ranging between
   -1.0 and 1.0) between the index order and the table order. This is used
   to adjust the estimate for the cost of fetching rows from the parent
   table.
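
   To illustrate how such an adjustment can work, the sketch below blends
   a worst-case and a best-case table I/O cost using the square of the
   correlation. This mirrors, in simplified form, the interpolation done
   in costsize.c; the function name and the sample costs are assumptions
   made for this example, and the real code derives both bounds from its
   own page-fetch estimates.

```c
#include <assert.h>

/*
 * Hypothetical helper: blend the table I/O cost between max_io_cost
 * (uncorrelated, correlation 0) and min_io_cost (perfectly correlated,
 * correlation +1 or -1), weighting by the squared correlation.
 */
static double
sketch_table_io_cost(double index_correlation,
                     double min_io_cost, double max_io_cost)
{
    assert(index_correlation >= -1.0 && index_correlation <= 1.0);
    assert(min_io_cost <= max_io_cost);

    double csquared = index_correlation * index_correlation;

    return max_io_cost + csquared * (min_io_cost - max_io_cost);
}
```

   Because the correlation is squared, a value of zero leaves the estimate
   at the worst case, while +1.0 and -1.0 both yield the best case.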

   The indexPages should be set to the number of leaf pages. This is used
   to estimate the number of workers for a parallel index scan.

   When loop_count is greater than one, the returned numbers should be
   averages expected for any one scan of the index.
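
   One way to picture this averaging is sketched below. It assumes, as a
   deliberate simplification, that repeated scans never read more distinct
   pages in total than the index contains, and it spreads that total
   evenly over the scans. The function name and the capping rule are
   assumptions for illustration only; PostgreSQL's own estimators use a
   more detailed caching model.

```c
#include <assert.h>

/*
 * Hypothetical helper: average number of index pages read per scan when
 * the scan is repeated loop_count times.  Total reads are capped at the
 * index size, on the assumption that pages stay cached between scans.
 */
static double
sketch_avg_pages_per_scan(double pages_per_scan, double loop_count,
                          double index_pages)
{
    assert(loop_count >= 1.0);
    assert(pages_per_scan >= 0.0 && index_pages >= 0.0);

    double total_pages = pages_per_scan * loop_count;

    if (total_pages > index_pages)
        total_pages = index_pages;      /* later scans hit cached pages */

    return total_pages / loop_count;    /* reported per single scan */
}
```

   For a 100-page index where each scan touches 10 pages, a single scan
   is charged for 10 pages, whereas 50 repetitions average out to 2 pages
   per scan under this assumption.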

   Cost Estimation

   A typical cost estimator will proceed as follows:
    1. Estimate and return the fraction of parent-table rows that will be
       visited based on the given qual conditions. In the absence of any
       index-type-specific knowledge, use the standard optimizer function
       clauselist_selectivity():
*indexSelectivity = clauselist_selectivity(root, path->indexquals,
                                           path->indexinfo->rel->relid,
                                           JOIN_INNER, NULL);

    2. Estimate the number of index rows that will be visited during the
       scan. For many index types this is the same as indexSelectivity
       times the number of rows in the index, but it might be more. (Note
       that the index's size in pages and rows is available from the
       path->indexinfo struct.)
    3. Estimate the number of index pages that will be retrieved during
       the scan. This might be just indexSelectivity times the index's
       size in pages.
    4. Compute the index access cost. A generic estimator might do this:
/*
 * Our generic assumption is that the index pages will be read
 * sequentially, so they cost seq_page_cost each, not random_page_cost.
 * Also, we charge for evaluation of the indexquals at each index row.
 * All the costs are assumed to be paid incrementally during the scan.
 */
cost_qual_eval(&index_qual_cost, path->indexquals, root);
*indexStartupCost = index_qual_cost.startup;
*indexTotalCost = seq_page_cost * numIndexPages +
    (cpu_index_tuple_cost + index_qual_cost.per_tuple) * numIndexTuples;

       However, the above does not account for amortization of index reads
       across repeated index scans.
    5. Estimate the index correlation. For a simple ordered index on a
       single field, this can be retrieved from pg_statistic. If the
       correlation is not known, the conservative estimate is zero (no
       correlation).
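
   The steps above can be sketched as a standalone C function. Because the
   planner headers are not available outside the server, the struct
   SketchIndexStats, the function sketch_generic_estimate, and the
   qual-cost arguments are hypothetical stand-ins for path->indexinfo and
   for the result of cost_qual_eval(). The selectivity from step 1 is
   passed in rather than computed, the constants are PostgreSQL's default
   settings, and the correlation is left at the conservative zero.

```c
#include <assert.h>

/* Stand-ins for planner cost parameters (PostgreSQL defaults). */
static const double seq_page_cost = 1.0;
static const double cpu_index_tuple_cost = 0.005;

/* Hypothetical stand-in for the size fields of path->indexinfo. */
typedef struct SketchIndexStats
{
    double      pages;          /* index size in pages */
    double      tuples;         /* number of rows in the index */
} SketchIndexStats;

/*
 * Hypothetical generic estimator following steps 2 through 5.
 * selectivity is the step 1 result; qual_startup and qual_per_tuple
 * play the role of the cost_qual_eval() output.
 */
static void
sketch_generic_estimate(const SketchIndexStats *index,
                        double selectivity,
                        double qual_startup, double qual_per_tuple,
                        double *indexStartupCost,
                        double *indexTotalCost,
                        double *indexCorrelation)
{
    assert(selectivity >= 0.0 && selectivity <= 1.0);

    /* Step 2: index rows visited. */
    double numIndexTuples = selectivity * index->tuples;

    /* Step 3: index pages retrieved. */
    double numIndexPages = selectivity * index->pages;

    /* Step 4: sequential page reads plus per-row CPU charges. */
    *indexStartupCost = qual_startup;
    *indexTotalCost = seq_page_cost * numIndexPages +
        (cpu_index_tuple_cost + qual_per_tuple) * numIndexTuples;

    /* Step 5: correlation unknown, so use the conservative zero. */
    *indexCorrelation = 0.0;
}
```

   For an index of 100 pages and 10000 rows, a selectivity of 0.1, and a
   per-row qual cost of 0.0025, this visits about 10 pages and 1000 rows,
   for a total cost of roughly 17.5.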

   Examples of cost estimator functions can be found in
   src/backend/utils/adt/selfuncs.c.