2 7.8. WITH Queries (Common Table Expressions) #
5 7.8.2. Recursive Queries
6 7.8.3. Common Table Expression Materialization
7 7.8.4. Data-Modifying Statements in WITH
9 WITH provides a way to write auxiliary statements for use in a larger
10 query. These statements, which are often referred to as Common Table
11 Expressions or CTEs, can be thought of as defining temporary tables
12 that exist just for one query. Each auxiliary statement in a WITH
13 clause can be a SELECT, INSERT, UPDATE, DELETE, or MERGE; and the WITH
14 clause itself is attached to a primary statement that can also be a
15 SELECT, INSERT, UPDATE, DELETE, or MERGE.
17 7.8.1. SELECT in WITH #
19 The basic value of SELECT in WITH is to break down complicated queries
20 into simpler parts. An example is:
21 WITH regional_sales AS (
22 SELECT region, SUM(amount) AS total_sales
28 WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
32 SUM(quantity) AS product_units,
33 SUM(amount) AS product_sales
35 WHERE region IN (SELECT region FROM top_regions)
36 GROUP BY region, product;
38 which displays per-product sales totals in only the top sales regions.
39 The WITH clause defines two auxiliary statements named regional_sales
40 and top_regions, where the output of regional_sales is used in
41 top_regions and the output of top_regions is used in the primary SELECT
42 query. This example could have been written without WITH, but we'd have
43 needed two levels of nested sub-SELECTs. It's a bit easier to follow
46 7.8.2. Recursive Queries #
48 The optional RECURSIVE modifier changes WITH from a mere syntactic
49 convenience into a feature that accomplishes things not otherwise
50 possible in standard SQL. Using RECURSIVE, a WITH query can refer to
51 its own output. A very simple example is this query to sum the integers
53 WITH RECURSIVE t(n) AS (
56 SELECT n+1 FROM t WHERE n < 100
60 The general form of a recursive WITH query is always a non-recursive
61 term, then UNION (or UNION ALL), then a recursive term, where only the
62 recursive term can contain a reference to the query's own output. Such
63 a query is executed as follows:
65 Recursive Query Evaluation
66 1. Evaluate the non-recursive term. For UNION (but not UNION ALL),
67 discard duplicate rows. Include all remaining rows in the result of
68 the recursive query, and also place them in a temporary working
70 2. So long as the working table is not empty, repeat these steps:
71 a. Evaluate the recursive term, substituting the current contents
72 of the working table for the recursive self-reference. For
73 UNION (but not UNION ALL), discard duplicate rows and rows
74 that duplicate any previous result row. Include all remaining
75 rows in the result of the recursive query, and also place them
76 in a temporary intermediate table.
77 b. Replace the contents of the working table with the contents of
78 the intermediate table, then empty the intermediate table.
82 While RECURSIVE allows queries to be specified recursively, internally
83 such queries are evaluated iteratively.
85 In the example above, the working table has just a single row in each
86 step, and it takes on the values from 1 through 100 in successive
87 steps. In the 100th step, there is no output because of the WHERE
88 clause, and so the query terminates.
90 Recursive queries are typically used to deal with hierarchical or
91 tree-structured data. A useful example is this query to find all the
92 direct and indirect sub-parts of a product, given only a table that
93 shows immediate inclusions:
94 WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
95 SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
97 SELECT p.sub_part, p.part, p.quantity * pr.quantity
98 FROM included_parts pr, parts p
99 WHERE p.part = pr.sub_part
101 SELECT sub_part, SUM(quantity) as total_quantity
105 7.8.2.1. Search Order #
107 When computing a tree traversal using a recursive query, you might want
108 to order the results in either depth-first or breadth-first order. This
109 can be done by computing an ordering column alongside the other data
110 columns and using that to sort the results at the end. Note that this
111 does not actually control in which order the query evaluation visits
112 the rows; that is as always in SQL implementation-dependent. This
113 approach merely provides a convenient way to order the results
116 To create a depth-first order, we compute for each result row an array
117 of rows that we have visited so far. For example, consider the
118 following query that searches a table tree using a link field:
119 WITH RECURSIVE search_tree(id, link, data) AS (
120 SELECT t.id, t.link, t.data
123 SELECT t.id, t.link, t.data
124 FROM tree t, search_tree st
127 SELECT * FROM search_tree;
129 To add depth-first ordering information, you can write this:
130 WITH RECURSIVE search_tree(id, link, data, path) AS (
131 SELECT t.id, t.link, t.data, ARRAY[t.id]
134 SELECT t.id, t.link, t.data, path || t.id
135 FROM tree t, search_tree st
138 SELECT * FROM search_tree ORDER BY path;
140 In the general case where more than one field needs to be used to
141 identify a row, use an array of rows. For example, if we needed to
142 track fields f1 and f2:
143 WITH RECURSIVE search_tree(id, link, data, path) AS (
144 SELECT t.id, t.link, t.data, ARRAY[ROW(t.f1, t.f2)]
147 SELECT t.id, t.link, t.data, path || ROW(t.f1, t.f2)
148 FROM tree t, search_tree st
151 SELECT * FROM search_tree ORDER BY path;
155 Omit the ROW() syntax in the common case where only one field needs to
156 be tracked. This allows a simple array rather than a composite-type
157 array to be used, gaining efficiency.
159 To create a breadth-first order, you can add a column that tracks the
160 depth of the search, for example:
161 WITH RECURSIVE search_tree(id, link, data, depth) AS (
162 SELECT t.id, t.link, t.data, 0
165 SELECT t.id, t.link, t.data, depth + 1
166 FROM tree t, search_tree st
169 SELECT * FROM search_tree ORDER BY depth;
171 To get a stable sort, add data columns as secondary sorting columns.
175 The recursive query evaluation algorithm produces its output in
176 breadth-first search order. However, this is an implementation detail
177 and it is perhaps unsound to rely on it. The order of the rows within
178 each level is certainly undefined, so some explicit ordering might be
181 There is built-in syntax to compute a depth- or breadth-first sort
183 WITH RECURSIVE search_tree(id, link, data) AS (
184 SELECT t.id, t.link, t.data
187 SELECT t.id, t.link, t.data
188 FROM tree t, search_tree st
190 ) SEARCH DEPTH FIRST BY id SET ordercol
191 SELECT * FROM search_tree ORDER BY ordercol;
193 WITH RECURSIVE search_tree(id, link, data) AS (
194 SELECT t.id, t.link, t.data
197 SELECT t.id, t.link, t.data
198 FROM tree t, search_tree st
200 ) SEARCH BREADTH FIRST BY id SET ordercol
201 SELECT * FROM search_tree ORDER BY ordercol;
203 This syntax is internally expanded to something similar to the above
204 hand-written forms. The SEARCH clause specifies whether depth- or
205 breadth first search is wanted, the list of columns to track for
206 sorting, and a column name that will contain the result data that can
207 be used for sorting. That column will implicitly be added to the output
210 7.8.2.2. Cycle Detection #
212 When working with recursive queries it is important to be sure that the
213 recursive part of the query will eventually return no tuples, or else
214 the query will loop indefinitely. Sometimes, using UNION instead of
215 UNION ALL can accomplish this by discarding rows that duplicate
216 previous output rows. However, often a cycle does not involve output
217 rows that are completely duplicate: it may be necessary to check just
218 one or a few fields to see if the same point has been reached before.
219 The standard method for handling such situations is to compute an array
220 of the already-visited values. For example, consider again the
221 following query that searches a table graph using a link field:
222 WITH RECURSIVE search_graph(id, link, data, depth) AS (
223 SELECT g.id, g.link, g.data, 0
226 SELECT g.id, g.link, g.data, sg.depth + 1
227 FROM graph g, search_graph sg
230 SELECT * FROM search_graph;
232 This query will loop if the link relationships contain cycles. Because
233 we require a “depth” output, just changing UNION ALL to UNION would not
234 eliminate the looping. Instead we need to recognize whether we have
235 reached the same row again while following a particular path of links.
236 We add two columns is_cycle and path to the loop-prone query:
237 WITH RECURSIVE search_graph(id, link, data, depth, is_cycle, path) AS (
238 SELECT g.id, g.link, g.data, 0,
243 SELECT g.id, g.link, g.data, sg.depth + 1,
246 FROM graph g, search_graph sg
247 WHERE g.id = sg.link AND NOT is_cycle
249 SELECT * FROM search_graph;
251 Aside from preventing cycles, the array value is often useful in its
252 own right as representing the “path” taken to reach any particular row.
254 In the general case where more than one field needs to be checked to
255 recognize a cycle, use an array of rows. For example, if we needed to
256 compare fields f1 and f2:
257 WITH RECURSIVE search_graph(id, link, data, depth, is_cycle, path) AS (
258 SELECT g.id, g.link, g.data, 0,
260 ARRAY[ROW(g.f1, g.f2)]
263 SELECT g.id, g.link, g.data, sg.depth + 1,
264 ROW(g.f1, g.f2) = ANY(path),
265 path || ROW(g.f1, g.f2)
266 FROM graph g, search_graph sg
267 WHERE g.id = sg.link AND NOT is_cycle
269 SELECT * FROM search_graph;
273 Omit the ROW() syntax in the common case where only one field needs to
274 be checked to recognize a cycle. This allows a simple array rather than
275 a composite-type array to be used, gaining efficiency.
277 There is built-in syntax to simplify cycle detection. The above query
278 can also be written like this:
279 WITH RECURSIVE search_graph(id, link, data, depth) AS (
280 SELECT g.id, g.link, g.data, 1
283 SELECT g.id, g.link, g.data, sg.depth + 1
284 FROM graph g, search_graph sg
286 ) CYCLE id SET is_cycle USING path
287 SELECT * FROM search_graph;
289 and it will be internally rewritten to the above form. The CYCLE clause
290 specifies first the list of columns to track for cycle detection, then
291 a column name that will show whether a cycle has been detected, and
292 finally the name of another column that will track the path. The cycle
293 and path columns will implicitly be added to the output rows of the
298 The cycle path column is computed in the same way as the depth-first
299 ordering column show in the previous section. A query can have both a
300 SEARCH and a CYCLE clause, but a depth-first search specification and a
301 cycle detection specification would create redundant computations, so
302 it's more efficient to just use the CYCLE clause and order by the path
303 column. If breadth-first ordering is wanted, then specifying both
304 SEARCH and CYCLE can be useful.
306 A helpful trick for testing queries when you are not certain if they
307 might loop is to place a LIMIT in the parent query. For example, this
308 query would loop forever without the LIMIT:
309 WITH RECURSIVE t(n) AS (
314 SELECT n FROM t LIMIT 100;
316 This works because PostgreSQL's implementation evaluates only as many
317 rows of a WITH query as are actually fetched by the parent query. Using
318 this trick in production is not recommended, because other systems
319 might work differently. Also, it usually won't work if you make the
320 outer query sort the recursive query's results or join them to some
321 other table, because in such cases the outer query will usually try to
322 fetch all of the WITH query's output anyway.
324 7.8.3. Common Table Expression Materialization #
326 A useful property of WITH queries is that they are normally evaluated
327 only once per execution of the parent query, even if they are referred
328 to more than once by the parent query or sibling WITH queries. Thus,
329 expensive calculations that are needed in multiple places can be placed
330 within a WITH query to avoid redundant work. Another possible
331 application is to prevent unwanted multiple evaluations of functions
332 with side-effects. However, the other side of this coin is that the
333 optimizer is not able to push restrictions from the parent query down
334 into a multiply-referenced WITH query, since that might affect all uses
335 of the WITH query's output when it should affect only one. The
336 multiply-referenced WITH query will be evaluated as written, without
337 suppression of rows that the parent query might discard afterwards.
338 (But, as mentioned above, evaluation might stop early if the
339 reference(s) to the query demand only a limited number of rows.)
341 However, if a WITH query is non-recursive and side-effect-free (that
342 is, it is a SELECT containing no volatile functions) then it can be
343 folded into the parent query, allowing joint optimization of the two
344 query levels. By default, this happens if the parent query references
345 the WITH query just once, but not if it references the WITH query more
346 than once. You can override that decision by specifying MATERIALIZED to
347 force separate calculation of the WITH query, or by specifying NOT
348 MATERIALIZED to force it to be merged into the parent query. The latter
349 choice risks duplicate computation of the WITH query, but it can still
350 give a net savings if each usage of the WITH query needs only a small
351 part of the WITH query's full output.
353 A simple example of these rules is
355 SELECT * FROM big_table
357 SELECT * FROM w WHERE key = 123;
359 This WITH query will be folded, producing the same execution plan as
360 SELECT * FROM big_table WHERE key = 123;
362 In particular, if there's an index on key, it will probably be used to
363 fetch just the rows having key = 123. On the other hand, in
365 SELECT * FROM big_table
367 SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
370 the WITH query will be materialized, producing a temporary copy of
371 big_table that is then joined with itself — without benefit of any
372 index. This query will be executed much more efficiently if written as
373 WITH w AS NOT MATERIALIZED (
374 SELECT * FROM big_table
376 SELECT * FROM w AS w1 JOIN w AS w2 ON w1.key = w2.ref
379 so that the parent query's restrictions can be applied directly to
382 An example where NOT MATERIALIZED could be undesirable is
384 SELECT key, very_expensive_function(val) as f FROM some_table
386 SELECT * FROM w AS w1 JOIN w AS w2 ON w1.f = w2.f;
388 Here, materialization of the WITH query ensures that
389 very_expensive_function is evaluated only once per table row, not
392 The examples above only show WITH being used with SELECT, but it can be
393 attached in the same way to INSERT, UPDATE, DELETE, or MERGE. In each
394 case it effectively provides temporary table(s) that can be referred to
397 7.8.4. Data-Modifying Statements in WITH #
399 You can use data-modifying statements (INSERT, UPDATE, DELETE, or
400 MERGE) in WITH. This allows you to perform several different operations
401 in the same query. An example is:
405 "date" >= '2010-10-01' AND
406 "date" < '2010-11-01'
409 INSERT INTO products_log
410 SELECT * FROM moved_rows;
412 This query effectively moves rows from products to products_log. The
413 DELETE in WITH deletes the specified rows from products, returning
414 their contents by means of its RETURNING clause; and then the primary
415 query reads that output and inserts it into products_log.
417 A fine point of the above example is that the WITH clause is attached
418 to the INSERT, not the sub-SELECT within the INSERT. This is necessary
419 because data-modifying statements are only allowed in WITH clauses that
420 are attached to the top-level statement. However, normal WITH
421 visibility rules apply, so it is possible to refer to the WITH
422 statement's output from the sub-SELECT.
424 Data-modifying statements in WITH usually have RETURNING clauses (see
425 Section 6.4), as shown in the example above. It is the output of the
426 RETURNING clause, not the target table of the data-modifying statement,
427 that forms the temporary table that can be referred to by the rest of
428 the query. If a data-modifying statement in WITH lacks a RETURNING
429 clause, then it forms no temporary table and cannot be referred to in
430 the rest of the query. Such a statement will be executed nonetheless. A
431 not-particularly-useful example is:
437 This example would remove all rows from tables foo and bar. The number
438 of affected rows reported to the client would only include rows removed
441 Recursive self-references in data-modifying statements are not allowed.
442 In some cases it is possible to work around this limitation by
443 referring to the output of a recursive WITH, for example:
444 WITH RECURSIVE included_parts(sub_part, part) AS (
445 SELECT sub_part, part FROM parts WHERE part = 'our_product'
447 SELECT p.sub_part, p.part
448 FROM included_parts pr, parts p
449 WHERE p.part = pr.sub_part
452 WHERE part IN (SELECT part FROM included_parts);
454 This query would remove all direct and indirect subparts of a product.
456 Data-modifying statements in WITH are executed exactly once, and always
457 to completion, independently of whether the primary query reads all (or
458 indeed any) of their output. Notice that this is different from the
459 rule for SELECT in WITH: as stated in the previous section, execution
460 of a SELECT is carried only as far as the primary query demands its
463 The sub-statements in WITH are executed concurrently with each other
464 and with the main query. Therefore, when using data-modifying
465 statements in WITH, the order in which the specified updates actually
466 happen is unpredictable. All the statements are executed with the same
467 snapshot (see Chapter 13), so they cannot “see” one another's effects
468 on the target tables. This alleviates the effects of the
469 unpredictability of the actual order of row updates, and means that
470 RETURNING data is the only way to communicate changes between different
471 WITH sub-statements and the main query. An example of this is that in
473 UPDATE products SET price = price * 1.05
476 SELECT * FROM products;
478 the outer SELECT would return the original prices before the action of
481 UPDATE products SET price = price * 1.05
486 the outer SELECT would return the updated data.
488 Trying to update the same row twice in a single statement is not
489 supported. Only one of the modifications takes place, but it is not
490 easy (and sometimes not possible) to reliably predict which one. This
491 also applies to deleting a row that was already updated in the same
492 statement: only the update is performed. Therefore you should generally
493 avoid trying to modify a single row twice in a single statement. In
494 particular avoid writing WITH sub-statements that could affect the same
495 rows changed by the main statement or a sibling sub-statement. The
496 effects of such a statement will not be predictable.
498 At present, any table used as the target of a data-modifying statement
499 in WITH must not have a conditional rule, nor an ALSO rule, nor an
500 INSTEAD rule that expands to multiple statements.