begriffs open source - ai-pg/blob - full-docs/txt/queries-with.txt

   1
   2 7.8. WITH Queries (Common Table Expressions) #
   3
   4    7.8.1. SELECT in WITH
   5    7.8.2. Recursive Queries
   6    7.8.3. Common Table Expression Materialization
   7    7.8.4. Data-Modifying Statements in WITH
   8
   9    WITH provides a way to write auxiliary statements for use in a larger
  10    query. These statements, which are often referred to as Common Table
  11    Expressions or CTEs, can be thought of as defining temporary tables
  12    that exist just for one query. Each auxiliary statement in a WITH
  13    clause can be a SELECT, INSERT, UPDATE, DELETE, or MERGE; and the WITH
  14    clause itself is attached to a primary statement that can also be a
  15    SELECT, INSERT, UPDATE, DELETE, or MERGE.
  16
  17 7.8.1. SELECT in WITH #
  18
  19    The basic value of SELECT in WITH is to break down complicated queries
  20    into simpler parts. An example is:
  21 WITH regional_sales AS (
  22     SELECT region, SUM(amount) AS total_sales
  23     FROM orders
  24     GROUP BY region
  25 ), top_regions AS (
  26     SELECT region
  27     FROM regional_sales
  28     WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
  29 )
  30 SELECT region,
  31        product,
  32        SUM(quantity) AS product_units,
  33        SUM(amount) AS product_sales
  34 FROM orders
  35 WHERE region IN (SELECT region FROM top_regions)
  36 GROUP BY region, product;
  37
  38    which displays per-product sales totals in only the top sales regions.
  39    The WITH clause defines two auxiliary statements named regional_sales
  40    and top_regions, where the output of regional_sales is used in
  41    top_regions and the output of top_regions is used in the primary SELECT
  42    query. This example could have been written without WITH, but we'd have
  43    needed two levels of nested sub-SELECTs. It's a bit easier to follow
  44    this way.
  45
  46 7.8.2. Recursive Queries #
  47
  48    The optional RECURSIVE modifier changes WITH from a mere syntactic
  49    convenience into a feature that accomplishes things not otherwise
  50    possible in standard SQL. Using RECURSIVE, a WITH query can refer to
  51    its own output. A very simple example is this query to sum the integers
  52    from 1 through 100:
  53 WITH RECURSIVE t(n) AS (
  54     VALUES (1)
  55   UNION ALL
  56     SELECT n+1 FROM t WHERE n < 100
  57 )
  58 SELECT sum(n) FROM t;
  59
  60    The general form of a recursive WITH query is always a non-recursive
  61    term, then UNION (or UNION ALL), then a recursive term, where only the
  62    recursive term can contain a reference to the query's own output. Such
  63    a query is executed as follows:
  64
  65    Recursive Query Evaluation
  66     1. Evaluate the non-recursive term. For UNION (but not UNION ALL),
  67        discard duplicate rows. Include all remaining rows in the result of
  68        the recursive query, and also place them in a temporary working
  69        table.
  70     2. So long as the working table is not empty, repeat these steps:
  71          a. Evaluate the recursive term, substituting the current contents
  72             of the working table for the recursive self-reference. For
  73             UNION (but not UNION ALL), discard duplicate rows and rows
  74             that duplicate any previous result row. Include all remaining
  75             rows in the result of the recursive query, and also place them
  76             in a temporary intermediate table.
  77          b. Replace the contents of the working table with the contents of
  78             the intermediate table, then empty the intermediate table.
  79
  80 Note
  81
  82    While RECURSIVE allows queries to be specified recursively, internally
  83    such queries are evaluated iteratively.
  84
  85    In the example above, the working table has just a single row in each
  86    step, and it takes on the values from 1 through 100 in successive
  87    steps. In the 100th step, there is no output because of the WHERE
  88    clause, and so the query terminates.
  89
  90    Recursive queries are typically used to deal with hierarchical or
  91    tree-structured data. A useful example is this query to find all the
  92    direct and indirect sub-parts of a product, given only a table that
  93    shows immediate inclusions:
  94 WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
  95     SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
  96   UNION ALL
  97     SELECT p.sub_part, p.part, p.quantity * pr.quantity
  98     FROM included_parts pr, parts p
  99     WHERE p.part = pr.sub_part
 100 )
 101 SELECT sub_part, SUM(quantity) as total_quantity
 102 FROM included_parts
 103 GROUP BY sub_part
 104
 105 7.8.2.1. Search Order #
 106
 107    When computing a tree traversal using a recursive query, you might want
 108    to order the results in either depth-first or breadth-first order. This
 109    can be done by computing an ordering column alongside the other data
 110    columns and using that to sort the results at the end. Note that this
 111    does not actually control in which order the query evaluation visits
 112    the rows; that is as always in SQL implementation-dependent. This
 113    approach merely provides a convenient way to order the results
 114    afterwards.
 115
 116    To create a depth-first order, we compute for each result row an array
 117    of rows that we have visited so far. For example, consider the
 118    following query that searches a table tree using a link field:
 119 WITH RECURSIVE search_tree(id, link, data) AS (
 120     SELECT t.id, t.link, t.data
 121     FROM tree t
 122   UNION ALL
 123     SELECT t.id, t.link, t.data
 124     FROM tree t, search_tree st
 125     WHERE t.id = st.link
 126 )
 127 SELECT * FROM search_tree;
 128
 129    To add depth-first ordering information, you can write this:
 130 WITH RECURSIVE search_tree(id, link, data, path) AS (
 131     SELECT t.id, t.link, t.data, ARRAY[t.id]
 132     FROM tree t
 133   UNION ALL
 134     SELECT t.id, t.link, t.data, path || t.id
 135     FROM tree t, search_tree st
 136     WHERE t.id = st.link
 137 )
 138 SELECT * FROM search_tree ORDER BY path;
 139
 140    In the general case where more than one field needs to be used to
 141    identify a row, use an array of rows. For example, if we needed to
 142    track fields f1 and f2:
 143 WITH RECURSIVE search_tree(id, link, data, path) AS (
 144     SELECT t.id, t.link, t.data, ARRAY[ROW(t.f1, t.f2)]
 145     FROM tree t
 146   UNION ALL
 147     SELECT t.id, t.link, t.data, path || ROW(t.f1, t.f2)
 148     FROM tree t, search_tree st
 149     WHERE t.id = st.link
 150 )
 151 SELECT * FROM search_tree ORDER BY path;
 152
 153 Tip
 154
 155    Omit the ROW() syntax in the common case where only one field needs to
 156    be tracked. This allows a simple array rather than a composite-type
 157    array to be used, gaining efficiency.
 158
 159    To create a breadth-first order, you can add a column that tracks the
 160    depth of the search, for example:
 161 WITH RECURSIVE search_tree(id, link, data, depth) AS (
 162     SELECT t.id, t.link, t.data, 0
 163     FROM tree t
 164   UNION ALL
 165     SELECT t.id, t.link, t.data, depth + 1
 166     FROM tree t, search_tree st
 167     WHERE t.id = st.link
 168 )
 169 SELECT * FROM search_tree ORDER BY depth;
 170
 171    To get a stable sort, add data columns as secondary sorting columns.
 172
 173 Tip
 174
 175    The recursive query evaluation algorithm produces its output in
 176    breadth-first search order. However, this is an implementation detail
 177    and it is perhaps unsound to rely on it. The order of the rows within
 178    each level is certainly undefined, so some explicit ordering might be
 179    desired in any case.
 180
 181    There is built-in syntax to compute a depth- or breadth-first sort
 182    column. For example:
 183 WITH RECURSIVE search_tree(id, link, data) AS (
 184     SELECT t.id, t.link, t.data
 185     FROM tree t
 186   UNION ALL
 187     SELECT t.id, t.link, t.data
 188     FROM tree t, search_tree st
 189     WHERE t.id = st.link
 190 ) SEARCH DEPTH FIRST BY id SET ordercol
 191 SELECT * FROM search_tree ORDER BY ordercol;
 192
 193 WITH RECURSIVE search_tree(id, link, data) AS (
 194     SELECT t.id, t.link, t.data
 195     FROM tree t
 196   UNION ALL
 197     SELECT t.id, t.link, t.data
 198     FROM tree t, search_tree st
 199     WHERE t.id = st.link
 200 ) SEARCH BREADTH FIRST BY id SET ordercol
 201 SELECT * FROM search_tree ORDER BY ordercol;
 202
 203    This syntax is internally expanded to something similar to the above
 204    hand-written forms. The SEARCH clause specifies whether depth- or
 205    breadth first search is wanted, the list of columns to track for
 206    sorting, and a column name that will contain the result data that can
 207    be used for sorting. That column will implicitly be added to the output
 208    rows of the CTE.
 209
 210 7.8.2.2. Cycle Detection #
 211
 212    When working with recursive queries it is important to be sure that the
 213    recursive part of the query will eventually return no tuples, or else
 214    the query will loop indefinitely. Sometimes, using UNION instead of
 215    UNION ALL can accomplish this by discarding rows that duplicate
 216    previous output rows. However, often a cycle does not involve output
 217    rows that are completely duplicate: it may be necessary to check just
 218    one or a few fields to see if the same point has been reached before.
 219    The standard method for handling such situations is to compute an array
 220    of the already-visited values. For example, consider again the
 221    following query that searches a table graph using a link field:
 222 WITH RECURSIVE search_graph(id, link, data, depth) AS (
 223     SELECT g.id, g.link, g.data, 0
 224     FROM graph g
 225   UNION ALL
 226     SELECT g.id, g.link, g.data, sg.depth + 1
 227     FROM graph g, search_graph sg
 228     WHERE g.id = sg.link
 229 )
 230 SELECT * FROM search_graph;
 231
 232    This query will loop if the link relationships contain cycles. Because
 233    we require a “depth” output, just changing UNION ALL to UNION would not
 234    eliminate the looping. Instead we need to recognize whether we have
 235    reached the same row again while following a particular path of links.
 236    We add two columns is_cycle and path to the loop-prone query:
 237 WITH RECURSIVE search_graph(id, link, data, depth, is_cycle, path) AS (
 238     SELECT g.id, g.link, g.data, 0,
 239       false,
 240       ARRAY[g.id]
 241     FROM graph g
 242   UNION ALL
 243     SELECT g.id, g.link, g.data, sg.depth + 1,
 244       g.id = ANY(path),
 245       path || g.id
 246     FROM graph g, search_graph sg
 247     WHERE g.id = sg.link AND NOT is_cycle
 248 )
 249 SELECT * FROM search_graph;
 250
 251    Aside from preventing cycles, the array value is often useful in its
 252    own right as representing the “path” taken to reach any particular row.
 253
 254    In the general case where more than one field needs to be checked to
 255    recognize a cycle, use an array of rows. For example, if we needed to
 256    compare fields f1 and f2:
 257 WITH RECURSIVE search_graph(id, link, data, depth, is_cycle, path) AS (
 258     SELECT g.id, g.link, g.data, 0,
 259       false,
 260       ARRAY[ROW(g.f1, g.f2)]
 261     FROM graph g
 262   UNION ALL
 263     SELECT g.id, g.link, g.data, sg.depth + 1,
 264       ROW(g.f1, g.f2) = ANY(path),
 265       path || ROW(g.f1, g.f2)
 266     FROM graph g, search_graph sg
 267     WHERE g.id = sg.link AND NOT is_cycle
 268 )
 269 SELECT * FROM search_graph;
 270
 271 Tip
 272
 273    Omit the ROW() syntax in the common case where only one field needs to
 274    be checked to recognize a cycle. This allows a simple array rather than
 275    a composite-type array to be used, gaining efficiency.
 276
 277    There is built-in syntax to simplify cycle detection. The above query
 278    can also be written like this:
 279 WITH RECURSIVE search_graph(id, link, data, depth) AS (
 280     SELECT g.id, g.link, g.data, 1
 281     FROM graph g
 282   UNION ALL
 283     SELECT g.id, g.link, g.data, sg.depth + 1
 284     FROM graph g, search_graph sg
 285     WHERE g.id = sg.link
 286 ) CYCLE id SET is_cycle USING path
 287 SELECT * FROM search_graph;
 288
 289    and it will be internally rewritten to the above form. The CYCLE clause
 290    specifies first the list of columns to track for cycle detection, then
 291    a column name that will show whether a cycle has been detected, and
 292    finally the name of another column that will track the path. The cycle
 293    and path columns will implicitly be added to the output rows of the
 294    CTE.
 295
 296 Tip
 297
 298    The cycle path column is computed in the same way as the depth-first
 299    ordering column show in the previous section. A query can have both a
 300    SEARCH and a CYCLE clause, but a depth-first search specification and a
 301    cycle detection specification would create redundant computations, so
 302    it's more efficient to just use the CYCLE clause and order by the path
 303    column. If breadth-first ordering is wanted, then specifying both
 304    SEARCH and CYCLE can be useful.
 305
 306    A helpful trick for testing queries when you are not certain if they
 307    might loop is to place a LIMIT in the parent query. For example, this
 308    query would loop forever without the LIMIT:
 309 WITH RECURSIVE t(n) AS (
 310     SELECT 1
 311   UNION ALL
 312     SELECT n+1 FROM t
 313 )
 314 SELECT n FROM t LIMIT 100;
 315
 316    This works because PostgreSQL's implementation evaluates only as many
 317    rows of a WITH query as are actually fetched by the parent query. Using
 318    this trick in production is not recommended, because other systems
 319    might work differently. Also, it usually won't work if you make the
 320    outer query sort the recursive query's results or join them to some
 321    other table, because in such cases the outer query will usually try to
 322    fetch all of the WITH query's output anyway.
 323
 324 7.8.3. Common Table Expression Materialization #
 325
 326    A useful property of WITH queries is that they are normally evaluated
 327    only once per execution of the parent query, even if they are referred
 328    to more than once by the parent query or sibling WITH queries. Thus,
 329    expensive calculations that are needed in multiple places can be placed
 330    within a WITH query to avoid redundant work. Another possible
 331    application is to prevent unwanted multiple evaluations of functions
 332    with side-effects. However, the other side of this coin is that the
 333    optimizer is not able to push restrictions from the parent query down
 334    into a multiply-referenced WITH query, since that might affect all uses
 335    of the WITH query's output when it should affect only one. The
 336    multiply-referenced WITH query will be evaluated as written, without
 337    suppression of rows that the parent query might discard afterwards.
 338    (But, as mentioned above, evaluation might stop early if the
 339    reference(s) to the query demand only a limited number of rows.)
 340
 341    However, if a WITH query is non-recursive and side-effect-free (that
 342    is, it is a SELECT containing no volatile functions) then it can be
 343    folded into the parent query, allowing joint optimization of the two
 344    query levels. By default, this happens if the parent query references
 345    the WITH query just once, but not if it references the WITH query more
 346    than once. You can override that decision by specifying MATERIALIZED to
 347    force separate calculation of the WITH query, or by specifying NOT
 348    MATERIALIZED to force it to be merged into the parent query. The latter
 349    choice risks duplicate computation of the WITH query, but it can still
 350    give a net savings if each usage of the WITH query needs only a small
 351    part of the WITH query's full output.
 352
 353    A simple example of these rules is
 354 WITH w AS (
 355     SELECT * FROM big_table
 356 )
 357 SELECT * FROM w WHERE key = 123;
 358
 359    This WITH query will be folded, producing the same execution plan as
 360 SELECT * FROM big_table WHERE key = 123;
 361
 362    In particular, if there's an index on key, it will probably be used to
 363    fetch just the rows having key = 123. On the other hand, in
 364 WITH w AS (
 365     SELECT * FROM big_table
 366 )
 367 SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
 368 WHERE w2.key = 123;
 369
 370    the WITH query will be materialized, producing a temporary copy of
 371    big_table that is then joined with itself — without benefit of any
 372    index. This query will be executed much more efficiently if written as
 373 WITH w AS NOT MATERIALIZED (
 374     SELECT * FROM big_table
 375 )
 376 SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
 377 WHERE w2.key = 123;
 378
 379    so that the parent query's restrictions can be applied directly to
 380    scans of big_table.
 381
 382    An example where NOT MATERIALIZED could be undesirable is
 383 WITH w AS (
 384     SELECT key, very_expensive_function(val) as f FROM some_table
 385 )
 386 SELECT * FROM w AS w1 JOIN w AS w2 ON w1.f = w2.f;
 387
 388    Here, materialization of the WITH query ensures that
 389    very_expensive_function is evaluated only once per table row, not
 390    twice.
 391
 392    The examples above only show WITH being used with SELECT, but it can be
 393    attached in the same way to INSERT, UPDATE, DELETE, or MERGE. In each
 394    case it effectively provides temporary table(s) that can be referred to
 395    in the main command.
 396
 397 7.8.4. Data-Modifying Statements in WITH #
 398
 399    You can use data-modifying statements (INSERT, UPDATE, DELETE, or
 400    MERGE) in WITH. This allows you to perform several different operations
 401    in the same query. An example is:
 402 WITH moved_rows AS (
 403     DELETE FROM products
 404     WHERE
 405         "date" >= '2010-10-01' AND
 406         "date" < '2010-11-01'
 407     RETURNING *
 408 )
 409 INSERT INTO products_log
 410 SELECT * FROM moved_rows;
 411
 412    This query effectively moves rows from products to products_log. The
 413    DELETE in WITH deletes the specified rows from products, returning
 414    their contents by means of its RETURNING clause; and then the primary
 415    query reads that output and inserts it into products_log.
 416
 417    A fine point of the above example is that the WITH clause is attached
 418    to the INSERT, not the sub-SELECT within the INSERT. This is necessary
 419    because data-modifying statements are only allowed in WITH clauses that
 420    are attached to the top-level statement. However, normal WITH
 421    visibility rules apply, so it is possible to refer to the WITH
 422    statement's output from the sub-SELECT.
 423
 424    Data-modifying statements in WITH usually have RETURNING clauses (see
 425    Section 6.4), as shown in the example above. It is the output of the
 426    RETURNING clause, not the target table of the data-modifying statement,
 427    that forms the temporary table that can be referred to by the rest of
 428    the query. If a data-modifying statement in WITH lacks a RETURNING
 429    clause, then it forms no temporary table and cannot be referred to in
 430    the rest of the query. Such a statement will be executed nonetheless. A
 431    not-particularly-useful example is:
 432 WITH t AS (
 433     DELETE FROM foo
 434 )
 435 DELETE FROM bar;
 436
 437    This example would remove all rows from tables foo and bar. The number
 438    of affected rows reported to the client would only include rows removed
 439    from bar.
 440
 441    Recursive self-references in data-modifying statements are not allowed.
 442    In some cases it is possible to work around this limitation by
 443    referring to the output of a recursive WITH, for example:
 444 WITH RECURSIVE included_parts(sub_part, part) AS (
 445     SELECT sub_part, part FROM parts WHERE part = 'our_product'
 446   UNION ALL
 447     SELECT p.sub_part, p.part
 448     FROM included_parts pr, parts p
 449     WHERE p.part = pr.sub_part
 450 )
 451 DELETE FROM parts
 452   WHERE part IN (SELECT part FROM included_parts);
 453
 454    This query would remove all direct and indirect subparts of a product.
 455
 456    Data-modifying statements in WITH are executed exactly once, and always
 457    to completion, independently of whether the primary query reads all (or
 458    indeed any) of their output. Notice that this is different from the
 459    rule for SELECT in WITH: as stated in the previous section, execution
 460    of a SELECT is carried only as far as the primary query demands its
 461    output.
 462
 463    The sub-statements in WITH are executed concurrently with each other
 464    and with the main query. Therefore, when using data-modifying
 465    statements in WITH, the order in which the specified updates actually
 466    happen is unpredictable. All the statements are executed with the same
 467    snapshot (see Chapter 13), so they cannot “see” one another's effects
 468    on the target tables. This alleviates the effects of the
 469    unpredictability of the actual order of row updates, and means that
 470    RETURNING data is the only way to communicate changes between different
 471    WITH sub-statements and the main query. An example of this is that in
 472 WITH t AS (
 473     UPDATE products SET price = price * 1.05
 474     RETURNING *
 475 )
 476 SELECT * FROM products;
 477
 478    the outer SELECT would return the original prices before the action of
 479    the UPDATE, while in
 480 WITH t AS (
 481     UPDATE products SET price = price * 1.05
 482     RETURNING *
 483 )
 484 SELECT * FROM t;
 485
 486    the outer SELECT would return the updated data.
 487
 488    Trying to update the same row twice in a single statement is not
 489    supported. Only one of the modifications takes place, but it is not
 490    easy (and sometimes not possible) to reliably predict which one. This
 491    also applies to deleting a row that was already updated in the same
 492    statement: only the update is performed. Therefore you should generally
 493    avoid trying to modify a single row twice in a single statement. In
 494    particular avoid writing WITH sub-statements that could affect the same
 495    rows changed by the main statement or a sibling sub-statement. The
 496    effects of such a statement will not be predictable.
 497
 498    At present, any table used as the target of a data-modifying statement
 499    in WITH must not have a conditional rule, nor an ALSO rule, nor an
 500    INSTEAD rule that expands to multiple statements.