begriffs open source - ai-pg/blob - full-docs/txt/how-parallel-query-works.txt

   1
   2 15.1. How Parallel Query Works #
   3
   4    When the optimizer determines that parallel query is the fastest
   5    execution strategy for a particular query, it will create a query plan
   6    that includes a Gather or Gather Merge node. Here is a simple example:
   7 EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
   8                                      QUERY PLAN
   9 -------------------------------------------------------------------------------
  10 ------
  11  Gather  (cost=1000.00..217018.43 rows=1 width=97)
  12    Workers Planned: 2
  13    ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..216018.33 rows=1 width
  14 =97)
  15          Filter: (filler ~~ '%x%'::text)
  16 (4 rows)
  17
  18    In all cases, the Gather or Gather Merge node will have exactly one
  19    child plan, which is the portion of the plan that will be executed in
  20    parallel. If the Gather or Gather Merge node is at the very top of the
  21    plan tree, then the entire query will execute in parallel. If it is
  22    somewhere else in the plan tree, then only the portion of the plan
  23    below it will run in parallel. In the example above, the query accesses
  24    only one table, so there is only one plan node other than the Gather
  25    node itself; since that plan node is a child of the Gather node, it
  26    will run in parallel.
  27
  28    Using EXPLAIN, you can see the number of workers chosen by the planner.
  29    When the Gather node is reached during query execution, the process
  30    that is implementing the user's session will request a number of
  31    background worker processes equal to the number of workers chosen by
  32    the planner. The number of background workers that the planner will
  33    consider using is limited to at most max_parallel_workers_per_gather.
  34    The total number of background workers that can exist at any one time
  35    is limited by both max_worker_processes and max_parallel_workers.
  36    Therefore, it is possible for a parallel query to run with fewer
  37    workers than planned, or even with no workers at all. The optimal plan
  38    may depend on the number of workers that are available, so this can
  39    result in poor query performance. If this occurrence is frequent,
  40    consider increasing max_worker_processes and max_parallel_workers so
  41    that more workers can be run simultaneously or alternatively reducing
  42    max_parallel_workers_per_gather so that the planner requests fewer
  43    workers.
  44
  45    Every background worker process that is successfully started for a
  46    given parallel query will execute the parallel portion of the plan. The
  47    leader will also execute that portion of the plan, but it has an
  48    additional responsibility: it must also read all of the tuples
  49    generated by the workers. When the parallel portion of the plan
  50    generates only a small number of tuples, the leader will often behave
  51    very much like an additional worker, speeding up query execution.
  52    Conversely, when the parallel portion of the plan generates a large
  53    number of tuples, the leader may be almost entirely occupied with
  54    reading the tuples generated by the workers and performing any further
  55    processing steps that are required by plan nodes above the level of the
  56    Gather node or Gather Merge node. In such cases, the leader will do
  57    very little of the work of executing the parallel portion of the plan.
  58
  59    When the node at the top of the parallel portion of the plan is Gather
  60    Merge rather than Gather, it indicates that each process executing the
  61    parallel portion of the plan is producing tuples in sorted order, and
  62    that the leader is performing an order-preserving merge. In contrast,
  63    Gather reads tuples from the workers in whatever order is convenient,
  64    destroying any sort order that may have existed.