begriffs open source - ai-pg/blob - full-docs/html/parallel-plans.html

   1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
   2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>15.3. Parallel Plans</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="when-can-parallel-query-be-used.html" title="15.2. When Can Parallel Query Be Used?" /><link rel="next" href="parallel-safety.html" title="15.4. Parallel Safety" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">15.3. Parallel Plans</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="when-can-parallel-query-be-used.html" title="15.2. When Can Parallel Query Be Used?">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="parallel-query.html" title="Chapter 15. Parallel Query">Up</a></td><th width="60%" align="center">Chapter 15. Parallel Query</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="parallel-safety.html" title="15.4. Parallel Safety">Next</a></td></tr></table><hr /></div><div class="sect1" id="PARALLEL-PLANS"><div class="titlepage"><div><div><h2 class="title" style="clear: both">15.3. Parallel Plans <a href="#PARALLEL-PLANS" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="parallel-plans.html#PARALLEL-SCANS">15.3.1. Parallel Scans</a></span></dt><dt><span class="sect2"><a href="parallel-plans.html#PARALLEL-JOINS">15.3.2. Parallel Joins</a></span></dt><dt><span class="sect2"><a href="parallel-plans.html#PARALLEL-AGGREGATION">15.3.3. 
Parallel Aggregation</a></span></dt><dt><span class="sect2"><a href="parallel-plans.html#PARALLEL-APPEND">15.3.4. Parallel Append</a></span></dt><dt><span class="sect2"><a href="parallel-plans.html#PARALLEL-PLAN-TIPS">15.3.5. Parallel Plan Tips</a></span></dt></dl></div><p>
   3     Because each worker executes the parallel portion of the plan to
   4     completion, it is not possible to simply take an ordinary query plan
   5     and run it using multiple workers.  Each worker would produce a full
   6     copy of the output result set, so the query would not run any faster
   7     than normal but would produce incorrect results.  Instead, the parallel
   8     portion of the plan must be what is known internally to the query
   9     optimizer as a <em class="firstterm">partial plan</em>; that is, it must be constructed
  10     so that each process that executes the plan will generate only a
  11     subset of the output rows in such a way that each required output row
  12     is guaranteed to be generated by exactly one of the cooperating processes.
  13     Generally, this means that the scan on the driving table of the query
  14     must be a parallel-aware scan.
  15   </p><div class="sect2" id="PARALLEL-SCANS"><div class="titlepage"><div><div><h3 class="title">15.3.1. Parallel Scans <a href="#PARALLEL-SCANS" class="id_link">#</a></h3></div></div></div><p>
  16     The following types of parallel-aware table scans are currently supported.
  17
  18   </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
  19         In a <span class="emphasis"><em>parallel sequential scan</em></span>, the table's blocks will
  20         be divided into ranges and shared among the cooperating processes.  Each
  21         worker process will complete the scanning of its given range of blocks before
  22         requesting an additional range of blocks.
  23       </p></li><li class="listitem"><p>
  24         In a <span class="emphasis"><em>parallel bitmap heap scan</em></span>, one process is chosen
  25         as the leader.  That process performs a scan of one or more indexes
  26         and builds a bitmap indicating which table blocks need to be visited.
  27         These blocks are then divided among the cooperating processes as in
  28         a parallel sequential scan.  In other words, the heap scan is performed
  29         in parallel, but the underlying index scan is not.
  30       </p></li><li class="listitem"><p>
  31         In a <span class="emphasis"><em>parallel index scan</em></span> or <span class="emphasis"><em>parallel index-only
  32         scan</em></span>, the cooperating processes take turns reading data from the
  33         index.  Currently, parallel index scans are supported only for
  34         btree indexes.  Each process will claim a single index block and will
  35         scan and return all tuples referenced by that block; other processes can
  36         at the same time be returning tuples from a different index block.
  37         The results of a parallel btree scan are returned in sorted order
  38         within each worker process.
  39       </p></li></ul></div><p>
  40
  41     Other scan types, such as scans of non-btree indexes, may support
  42     parallel scans in the future.
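  </p><p>
    As a rough illustration of the block-range handout described above for a
    parallel sequential scan, the following Python sketch models a shared
    cursor from which each process claims the next range of blocks.  It is a
    toy model, not PostgreSQL's implementation: the names
    <code class="literal">claim_ranges</code>, <code class="literal">n_blocks</code>,
    <code class="literal">n_procs</code>, and <code class="literal">chunk</code>, the fixed
    range size, and the round-robin claiming order are all assumptions made
    for the example.
  </p><pre class="programlisting">
```python
from collections import defaultdict


def claim_ranges(n_blocks, n_procs, chunk):
    """Hand out block ranges from one shared cursor (toy model).

    Round-robin claiming stands in for real concurrency; in practice
    whichever process finishes its range first would claim the next one.
    """
    claimed = defaultdict(list)   # process id -> list of block numbers
    next_block = 0                # shared cursor: first unclaimed block
    proc = 0
    while next_block != n_blocks:
        end = min(next_block + chunk, n_blocks)
        claimed[proc].extend(range(next_block, end))
        next_block = end
        proc = (proc + 1) % n_procs
    return dict(claimed)


ranges = claim_ranges(n_blocks=10, n_procs=3, chunk=4)
# Every block appears exactly once across all processes.
all_blocks = sorted(b for blocks in ranges.values() for b in blocks)
assert all_blocks == list(range(10))
```
</pre><p>
    Whatever order the processes claim ranges in, every block is handed out
    exactly once, which is the property a partial plan must guarantee.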
  43   </p></div><div class="sect2" id="PARALLEL-JOINS"><div class="titlepage"><div><div><h3 class="title">15.3.2. Parallel Joins <a href="#PARALLEL-JOINS" class="id_link">#</a></h3></div></div></div><p>
  44     Just as in a non-parallel plan, the driving table may be joined to one or
  45     more other tables using a nested loop, hash join, or merge join.  The
  46     inner side of the join may be any kind of non-parallel plan that is
  47     otherwise supported by the planner provided that it is safe to run within
  48     a parallel worker.  Depending on the join type, the inner side may also be
  49     a parallel plan.
  50   </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
  51         In a <span class="emphasis"><em>nested loop join</em></span>, the inner side is always
  52         non-parallel.  Although it is executed in full, this is efficient if
  53         the inner side is an index scan, because the outer tuples and thus
  54         the loops that look up values in the index are divided over the
  55         cooperating processes.
  56       </p></li><li class="listitem"><p>
  57         In a <span class="emphasis"><em>merge join</em></span>, the inner side is always
  58         a non-parallel plan and therefore executed in full.  This may be
  59         inefficient, especially if a sort must be performed, because the work
  60         and resulting data are duplicated in every cooperating process.
  61       </p></li><li class="listitem"><p>
  62         In a <span class="emphasis"><em>hash join</em></span> (without the "parallel" prefix),
  63         the inner side is executed in full by every cooperating process
  64         to build identical copies of the hash table.  This may be inefficient
  65         if the hash table is large or the plan is expensive.  In a
  66         <span class="emphasis"><em>parallel hash join</em></span>, the inner side is a
  67         <span class="emphasis"><em>parallel hash</em></span> that divides the work of building
  68         a shared hash table over the cooperating processes.
  69       </p></li></ul></div></div><div class="sect2" id="PARALLEL-AGGREGATION"><div class="titlepage"><div><div><h3 class="title">15.3.3. Parallel Aggregation <a href="#PARALLEL-AGGREGATION" class="id_link">#</a></h3></div></div></div><p>
  70     <span class="productname">PostgreSQL</span> supports parallel aggregation by aggregating in
  71     two stages.  First, each process participating in the parallel portion of
  72     the query performs an aggregation step, producing a partial result for
  73     each group of which that process is aware.  This is reflected in the plan
  74     as a <code class="literal">Partial Aggregate</code> node.  The partial results are then
  75     transferred to the leader via <code class="literal">Gather</code> or <code class="literal">Gather
  76     Merge</code>.  Second, the leader re-aggregates the results across all
  77     workers in order to produce the final result.  This is reflected in the
  78     plan as a <code class="literal">Finalize Aggregate</code> node.
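  </p><p>
    The two stages can be sketched in Python as follows, using an average as
    the aggregate.  This is only a conceptual model: the function names
    <code class="literal">partial_aggregate</code>, <code class="literal">combine</code>, and
    <code class="literal">finalize_aggregate</code> and the (sum, count) state are
    assumptions chosen for the example, not PostgreSQL internals.
  </p><pre class="programlisting">
```python
def partial_aggregate(rows):
    """Per-process step: build a (sum, count) state for each group seen."""
    states = {}
    for group, value in rows:
        total, count = states.get(group, (0, 0))
        states[group] = (total + value, count + 1)
    return states


def combine(a, b):
    """Merge two partial states for the same group."""
    return (a[0] + b[0], a[1] + b[1])


def finalize_aggregate(partials):
    """Leader step: combine the partial states, then compute each average."""
    merged = {}
    for states in partials:
        for group, state in states.items():
            merged[group] = combine(merged[group], state) if group in merged else state
    return {group: total / count for group, (total, count) in merged.items()}


# Rows as each of two cooperating processes might see them (made-up data).
worker_rows = [
    [("a", 1), ("b", 10)],
    [("a", 3), ("b", 20), ("b", 30)],
]
result = finalize_aggregate([partial_aggregate(rows) for rows in worker_rows])
assert result == {"a": 2.0, "b": 20.0}
```
</pre><p>
    In this model the leader never sees the input rows, only one partial
    state per group from each process, so the aggregate's state must be one
    that can be merged without losing information.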
  79   </p><p>
  80     Because the <code class="literal">Finalize Aggregate</code> node runs on the leader
  81     process, queries that produce a relatively large number of groups in
  82     comparison to the number of input rows will appear less favorable to the
  83     query planner. For example, in the worst-case scenario the number of
  84     groups seen by the <code class="literal">Finalize Aggregate</code> node could be as many as
  85     the number of input rows that were seen by all worker processes in the
  86     <code class="literal">Partial Aggregate</code> stage. For such cases, there is clearly
  87     going to be no performance benefit to using parallel aggregation. The
  88     query planner takes this into account during the planning process and is
  89     unlikely to choose parallel aggregation in this scenario.
  90   </p><p>
  91     Parallel aggregation is not supported in all situations.  Each aggregate
  92     must be <a class="link" href="parallel-safety.html" title="15.4. Parallel Safety">safe</a> for parallelism and must
  93     have a combine function.  If the aggregate has a transition state of type
  94     <code class="literal">internal</code>, it must have serialization and deserialization
  95     functions.  See <a class="xref" href="sql-createaggregate.html" title="CREATE AGGREGATE"><span class="refentrytitle">CREATE AGGREGATE</span></a> for more details.
  96     Parallel aggregation is not supported if any aggregate function call
  97     contains a <code class="literal">DISTINCT</code> or <code class="literal">ORDER BY</code> clause.  It is also
  98     not supported for ordered-set aggregates or when the query involves
  99     <code class="literal">GROUPING SETS</code>.  It can only be used when all joins involved in
 100     the query are also part of the parallel portion of the plan.
 101   </p></div><div class="sect2" id="PARALLEL-APPEND"><div class="titlepage"><div><div><h3 class="title">15.3.4. Parallel Append <a href="#PARALLEL-APPEND" class="id_link">#</a></h3></div></div></div><p>
 102     Whenever <span class="productname">PostgreSQL</span> needs to combine rows
 103     from multiple sources into a single result set, it uses an
 104     <code class="literal">Append</code> or <code class="literal">MergeAppend</code> plan node.
 105     This commonly happens when implementing <code class="literal">UNION ALL</code> or
 106     when scanning a partitioned table.  Such nodes can be used in parallel
 107     plans just as they can in any other plan.  However, in a parallel plan,
 108     the planner may instead use a <code class="literal">Parallel Append</code> node.
 109   </p><p>
 110     When an <code class="literal">Append</code> node is used in a parallel plan, each
 111     process will execute the child plans in the order in which they appear,
 112     so that all participating processes cooperate to execute the first child
 113     plan until it is complete and then move to the second plan at around the
 114     same time.  When a <code class="literal">Parallel Append</code> is used instead, the
 115     executor will spread out the participating processes as evenly as
 116     possible across its child plans, so that multiple child plans are executed
 117     simultaneously.  This avoids contention, and also avoids paying the startup
 118     cost of a child plan in those processes that never execute it.
 119   </p><p>
 120     Also, unlike a regular <code class="literal">Append</code> node, which can only have
 121     partial children when used within a parallel plan, a <code class="literal">Parallel
 122     Append</code> node can have both partial and non-partial child plans.
 123     Non-partial children will be scanned by only a single process, since
 124     scanning them more than once would produce duplicate results.  Plans that
 125     involve appending multiple result sets can therefore achieve
 126     coarse-grained parallelism even when efficient partial plans are not
 127     available.  For example, consider a query against a partitioned table
 128     that can only be implemented efficiently by using an index that does
 129     not support parallel scans.  The planner might choose a <code class="literal">Parallel
 130     Append</code> of regular <code class="literal">Index Scan</code> plans; each
 131     individual index scan would have to be executed to completion by a single
 132     process, but different scans could be performed at the same time by
 133     different processes.
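  </p><p>
    The constraint on non-partial children can be illustrated with a small
    Python model.  It is a deliberate simplification and not the executor's
    actual algorithm: the function <code class="literal">assign_processes</code>, the
    child plan names, and the rule of joining the least-loaded eligible
    child are all assumptions made for this sketch, which also ignores
    processes moving on to another child once theirs is finished.
  </p><pre class="programlisting">
```python
def assign_processes(children, n_procs):
    """Spread processes over child plans (toy model).

    children: list of (name, is_partial) pairs.
    A non-partial child accepts at most one process; a partial child
    accepts any number.  Each process joins the eligible child that
    currently has the fewest processes.
    """
    load = {name: 0 for name, _ in children}
    assignment = []
    for proc in range(n_procs):
        eligible = [name for name, is_partial in children
                    if is_partial or load[name] == 0]
        if not eligible:
            break  # every child is non-partial and already taken
        choice = min(eligible, key=lambda name: load[name])
        load[choice] += 1
        assignment.append((proc, choice))
    return assignment


# Hypothetical children: two plain index scans and one partial scan.
children = [("index_scan_p1", False), ("index_scan_p2", False),
            ("parallel_seq_scan_p3", True)]
plan = assign_processes(children, n_procs=4)
```
</pre><p>
    With two non-partial index scans and one partial scan, the model gives
    each index scan a single process and lets the remaining processes share
    the partial scan.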
 134   </p><p>
 135     <a class="xref" href="runtime-config-query.html#GUC-ENABLE-PARALLEL-APPEND">enable_parallel_append</a> can be used to disable
 136     this feature.
 137   </p></div><div class="sect2" id="PARALLEL-PLAN-TIPS"><div class="titlepage"><div><div><h3 class="title">15.3.5. Parallel Plan Tips <a href="#PARALLEL-PLAN-TIPS" class="id_link">#</a></h3></div></div></div><p>
 138     If a query that is expected to do so does not produce a parallel plan,
 139     you can try reducing <a class="xref" href="runtime-config-query.html#GUC-PARALLEL-SETUP-COST">parallel_setup_cost</a> or
 140     <a class="xref" href="runtime-config-query.html#GUC-PARALLEL-TUPLE-COST">parallel_tuple_cost</a>.  Of course, the resulting parallel plan may turn
 141     out to be slower than the serial plan that the planner preferred, but
 142     this will not always be the case.  If you don't get a parallel
 143     plan even with very small values of these settings (e.g., after setting
 144     them both to zero), there may be some reason why the query planner is
 145     unable to generate a parallel plan for your query.  See
 146     <a class="xref" href="when-can-parallel-query-be-used.html" title="15.2. When Can Parallel Query Be Used?">Section 15.2</a> and
 147     <a class="xref" href="parallel-safety.html" title="15.4. Parallel Safety">Section 15.4</a> for information on why this may be
 148     the case.
 149   </p><p>
 150     When executing a parallel plan, you can use <code class="literal">EXPLAIN (ANALYZE,
 151     VERBOSE)</code> to display per-worker statistics for each plan node.
 152     This may be useful in determining whether the work is being evenly
 153     distributed among the cooperating processes and, more generally, in understanding the
 154     performance characteristics of the plan.
 155   </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="when-can-parallel-query-be-used.html" title="15.2. When Can Parallel Query Be Used?">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="parallel-query.html" title="Chapter 15. Parallel Query">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="parallel-safety.html" title="15.4. Parallel Safety">Next</a></td></tr><tr><td width="40%" align="left" valign="top">15.2. When Can Parallel Query Be Used? </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 15.4. Parallel Safety</td></tr></table></div></body></html>