2 51.5. Planner/Optimizer #
4 51.5.1. Generating Possible Plans
6 The task of the planner/optimizer is to create an optimal execution
7 plan. A given SQL query (and hence, a query tree) can be actually
8 executed in a wide variety of different ways, each of which will
9 produce the same set of results. If it is computationally feasible, the
10 query optimizer will examine each of these possible execution plans,
11 ultimately selecting the execution plan that is expected to run the
16 In some situations, examining each possible way in which a query can be
17 executed would take an excessive amount of time and memory. In
18 particular, this occurs when executing queries involving large numbers
19 of join operations. In order to determine a reasonable (not necessarily
20 optimal) query plan in a reasonable amount of time, PostgreSQL uses a
21 Genetic Query Optimizer (see Chapter 61) when the number of joins
22 exceeds a threshold (see geqo_threshold).
24 The planner's search procedure actually works with data structures
25 called paths, which are simply cut-down representations of plans
26 containing only as much information as the planner needs to make its
27 decisions. After the cheapest path is determined, a full-fledged plan
28 tree is built to pass to the executor. This represents the desired
29 execution plan in sufficient detail for the executor to run it. In the
30 rest of this section we'll ignore the distinction between paths and
33 51.5.1. Generating Possible Plans #
35 The planner/optimizer starts by generating plans for scanning each
36 individual relation (table) used in the query. The possible plans are
37 determined by the available indexes on each relation. There is always
38 the possibility of performing a sequential scan on a relation, so a
39 sequential scan plan is always created. Assume an index is defined on a
40 relation (for example a B-tree index) and a query contains the
41 restriction relation.attribute OPR constant. If relation.attribute
42 happens to match the key of the B-tree index and OPR is one of the
43 operators listed in the index's operator class, another plan is created
44 using the B-tree index to scan the relation. If there are further
45 indexes present and the restrictions in the query happen to match a key
46 of an index, further plans will be considered. Index scan plans are
47 also generated for indexes that have a sort ordering that can match the
48 query's ORDER BY clause (if any), or a sort ordering that might be
49 useful for merge joining (see below).
51 If the query requires joining two or more relations, plans for joining
52 relations are considered after all feasible plans have been found for
53 scanning single relations. The three available join strategies are:
54 * nested loop join: The right relation is scanned once for every row
55 found in the left relation. This strategy is easy to implement but
56 can be very time consuming. (However, if the right relation can be
57 scanned with an index scan, this can be a good strategy. It is
58 possible to use values from the current row of the left relation as
59 keys for the index scan of the right.)
60 * merge join: Each relation is sorted on the join attributes before
61 the join starts. Then the two relations are scanned in parallel,
62 and matching rows are combined to form join rows. This kind of join
63 is attractive because each relation has to be scanned only once.
64 The required sorting might be achieved either by an explicit sort
65 step, or by scanning the relation in the proper order using an
66 index on the join key.
67 * hash join: the right relation is first scanned and loaded into a
68 hash table, using its join attributes as hash keys. Next the left
69 relation is scanned and the appropriate values of every row found
70 are used as hash keys to locate the matching rows in the table.
72 When the query involves more than two relations, the final result must
73 be built up by a tree of join steps, each with two inputs. The planner
74 examines different possible join sequences to find the cheapest one.
76 If the query uses fewer than geqo_threshold relations, a
77 near-exhaustive search is conducted to find the best join sequence. The
78 planner preferentially considers joins between any two relations for
79 which there exists a corresponding join clause in the WHERE
80 qualification (i.e., for which a restriction like where
81 rel1.attr1=rel2.attr2 exists). Join pairs with no join clause are
82 considered only when there is no other choice, that is, a particular
83 relation has no available join clauses to any other relation. All
84 possible plans are generated for every join pair considered by the
85 planner, and the one that is (estimated to be) the cheapest is chosen.
87 When geqo_threshold is exceeded, the join sequences considered are
88 determined by heuristics, as described in Chapter 61. Otherwise the
91 The finished plan tree consists of sequential or index scans of the
92 base relations, plus nested-loop, merge, or hash join nodes as needed,
93 plus any auxiliary steps needed, such as sort nodes or
94 aggregate-function calculation nodes. Most of these plan node types
95 have the additional ability to do selection (discarding rows that do
96 not meet a specified Boolean condition) and projection (computation of
97 a derived column set based on given column values, that is, evaluation
98 of scalar expressions where needed). One of the responsibilities of the
99 planner is to attach selection conditions from the WHERE clause and
100 computation of required output expressions to the most appropriate
101 nodes of the plan tree.