begriffs open source - ai-pg/blob - full-docs/txt/planner-optimizer.txt

   1
   2 51.5. Planner/Optimizer #
   3
   4    51.5.1. Generating Possible Plans
   5
   6    The task of the planner/optimizer is to create an optimal execution
   7    plan. A given SQL query (and hence, a query tree) can be actually
   8    executed in a wide variety of different ways, each of which will
   9    produce the same set of results. If it is computationally feasible, the
  10    query optimizer will examine each of these possible execution plans,
  11    ultimately selecting the execution plan that is expected to run the
  12    fastest.
  13
  14 Note
  15
  16    In some situations, examining each possible way in which a query can be
  17    executed would take an excessive amount of time and memory. In
  18    particular, this occurs when executing queries involving large numbers
  19    of join operations. In order to determine a reasonable (not necessarily
  20    optimal) query plan in a reasonable amount of time, PostgreSQL uses a
  21    Genetic Query Optimizer (see Chapter 61) when the number of joins
  22    exceeds a threshold (see geqo_threshold).
  23
  24    The planner's search procedure actually works with data structures
  25    called paths, which are simply cut-down representations of plans
  26    containing only as much information as the planner needs to make its
  27    decisions. After the cheapest path is determined, a full-fledged plan
  28    tree is built to pass to the executor. This represents the desired
  29    execution plan in sufficient detail for the executor to run it. In the
  30    rest of this section we'll ignore the distinction between paths and
  31    plans.
  32
  33 51.5.1. Generating Possible Plans #
  34
  35    The planner/optimizer starts by generating plans for scanning each
  36    individual relation (table) used in the query. The possible plans are
  37    determined by the available indexes on each relation. There is always
  38    the possibility of performing a sequential scan on a relation, so a
  39    sequential scan plan is always created. Assume an index is defined on a
  40    relation (for example a B-tree index) and a query contains the
  41    restriction relation.attribute OPR constant. If relation.attribute
  42    happens to match the key of the B-tree index and OPR is one of the
  43    operators listed in the index's operator class, another plan is created
  44    using the B-tree index to scan the relation. If there are further
  45    indexes present and the restrictions in the query happen to match a key
  46    of an index, further plans will be considered. Index scan plans are
  47    also generated for indexes that have a sort ordering that can match the
  48    query's ORDER BY clause (if any), or a sort ordering that might be
  49    useful for merge joining (see below).
  50
  51    If the query requires joining two or more relations, plans for joining
  52    relations are considered after all feasible plans have been found for
  53    scanning single relations. The three available join strategies are:
  54      * nested loop join: The right relation is scanned once for every row
  55        found in the left relation. This strategy is easy to implement but
  56        can be very time consuming. (However, if the right relation can be
  57        scanned with an index scan, this can be a good strategy. It is
  58        possible to use values from the current row of the left relation as
  59        keys for the index scan of the right.)
  60      * merge join: Each relation is sorted on the join attributes before
  61        the join starts. Then the two relations are scanned in parallel,
  62        and matching rows are combined to form join rows. This kind of join
  63        is attractive because each relation has to be scanned only once.
  64        The required sorting might be achieved either by an explicit sort
  65        step, or by scanning the relation in the proper order using an
  66        index on the join key.
  67      * hash join: the right relation is first scanned and loaded into a
  68        hash table, using its join attributes as hash keys. Next the left
  69        relation is scanned and the appropriate values of every row found
  70        are used as hash keys to locate the matching rows in the table.
  71
  72    When the query involves more than two relations, the final result must
  73    be built up by a tree of join steps, each with two inputs. The planner
  74    examines different possible join sequences to find the cheapest one.
  75
  76    If the query uses fewer than geqo_threshold relations, a
  77    near-exhaustive search is conducted to find the best join sequence. The
  78    planner preferentially considers joins between any two relations for
  79    which there exists a corresponding join clause in the WHERE
  80    qualification (i.e., for which a restriction like where
  81    rel1.attr1=rel2.attr2 exists). Join pairs with no join clause are
  82    considered only when there is no other choice, that is, a particular
  83    relation has no available join clauses to any other relation. All
  84    possible plans are generated for every join pair considered by the
  85    planner, and the one that is (estimated to be) the cheapest is chosen.
  86
  87    When geqo_threshold is exceeded, the join sequences considered are
  88    determined by heuristics, as described in Chapter 61. Otherwise the
  89    process is the same.
  90
  91    The finished plan tree consists of sequential or index scans of the
  92    base relations, plus nested-loop, merge, or hash join nodes as needed,
  93    plus any auxiliary steps needed, such as sort nodes or
  94    aggregate-function calculation nodes. Most of these plan node types
  95    have the additional ability to do selection (discarding rows that do
  96    not meet a specified Boolean condition) and projection (computation of
  97    a derived column set based on given column values, that is, evaluation
  98    of scalar expressions where needed). One of the responsibilities of the
  99    planner is to attach selection conditions from the WHERE clause and
 100    computation of required output expressions to the most appropriate
 101    nodes of the plan tree.