begriffs open source - ai-pg/blob - full-docs/src/sgml/html/storage-toast.html

   1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
   2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>66.2. TOAST</title><link rel="stylesheet" type="text/css" href="stylesheet.css" /><link rev="made" href="pgsql-docs@lists.postgresql.org" /><meta name="generator" content="DocBook XSL Stylesheets Vsnapshot" /><link rel="prev" href="storage-file-layout.html" title="66.1. Database File Layout" /><link rel="next" href="storage-fsm.html" title="66.3. Free Space Map" /></head><body id="docContent" class="container-fluid col-10"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="5" align="center">66.2. TOAST</th></tr><tr><td width="10%" align="left"><a accesskey="p" href="storage-file-layout.html" title="66.1. Database File Layout">Prev</a> </td><td width="10%" align="left"><a accesskey="u" href="storage.html" title="Chapter 66. Database Physical Storage">Up</a></td><th width="60%" align="center">Chapter 66. Database Physical Storage</th><td width="10%" align="right"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="10%" align="right"> <a accesskey="n" href="storage-fsm.html" title="66.3. Free Space Map">Next</a></td></tr></table><hr /></div><div class="sect1" id="STORAGE-TOAST"><div class="titlepage"><div><div><h2 class="title" style="clear: both">66.2. TOAST <a href="#STORAGE-TOAST" class="id_link">#</a></h2></div></div></div><div class="toc"><dl class="toc"><dt><span class="sect2"><a href="storage-toast.html#STORAGE-TOAST-ONDISK">66.2.1. Out-of-Line, On-Disk TOAST Storage</a></span></dt><dt><span class="sect2"><a href="storage-toast.html#STORAGE-TOAST-INMEMORY">66.2.2. Out-of-Line, In-Memory TOAST Storage</a></span></dt></dl></div><a id="id-1.10.18.4.2" class="indexterm"></a><a id="id-1.10.18.4.3" class="indexterm"></a><p>
   3 This section provides an overview of <acronym class="acronym">TOAST</acronym> (The
   4 Oversized-Attribute Storage Technique).
   5 </p><p>
   6 <span class="productname">PostgreSQL</span> uses a fixed page size (commonly
   7 8 kB), and does not allow tuples to span multiple pages.  Therefore, it is
   8 not possible to store very large field values directly.  To overcome
   9 this limitation, large field values are compressed and/or broken up into
  10 multiple physical rows.  This happens transparently to the user, with only
  11 small impact on most of the backend code.  The technique is affectionately
  12 known as <acronym class="acronym">TOAST</acronym> (or <span class="quote">“<span class="quote">the best thing since sliced bread</span>”</span>).
  13 The <acronym class="acronym">TOAST</acronym> infrastructure is also used to improve handling of
  14 large data values in-memory.
  15 </p><p>
  16 Only certain data types support <acronym class="acronym">TOAST</acronym> — there is no need to
  17 impose the overhead on data types that cannot produce large field values.
  18 To support <acronym class="acronym">TOAST</acronym>, a data type must have a variable-length
  19 (<em class="firstterm">varlena</em>) representation, in which, ordinarily, the first
  20 four-byte word of any stored value contains the total length of the value in
  21 bytes (including itself).  <acronym class="acronym">TOAST</acronym> does not constrain the rest
  22 of the data type's representation.  The special representations collectively
  23 called <em class="firstterm"><acronym class="acronym">TOAST</acronym>ed values</em> work by modifying or
  24 reinterpreting this initial length word.  Therefore, the C-level functions
  25 supporting a <acronym class="acronym">TOAST</acronym>-able data type must be careful about how they
  26 handle potentially <acronym class="acronym">TOAST</acronym>ed input values: an input might not
  27 actually consist of a four-byte length word and contents until after it's
  28 been <em class="firstterm">detoasted</em>.  (This is normally done by invoking
  29 <code class="function">PG_DETOAST_DATUM</code> before doing anything with an input value,
  30 but in some cases more efficient approaches are possible.
  31 See <a class="xref" href="xtypes.html#XTYPES-TOAST" title="36.13.1. TOAST Considerations">Section 36.13.1</a> for more detail.)
  32 </p><p>
  33 <acronym class="acronym">TOAST</acronym> usurps two bits of the varlena length word (the high-order
  34 bits on big-endian machines, the low-order bits on little-endian machines),
  35 thereby limiting the logical size of any value of a <acronym class="acronym">TOAST</acronym>-able
  36 data type to 1 GB (2<sup>30</sup> - 1 bytes).  When both bits are zero,
  37 the value is an ordinary un-<acronym class="acronym">TOAST</acronym>ed value of the data type, and
  38 the remaining bits of the length word give the total datum size (including
  39 length word) in bytes.  When the highest-order bit (on big-endian machines) or lowest-order bit (on little-endian machines) is set,
  40 the value has only a single-byte header instead of the normal four-byte
  41 header, and the remaining bits of that byte give the total datum size
  42 (including length byte) in bytes.  This alternative supports space-efficient
  43 storage of values shorter than 127 bytes, while still allowing the data type
  44 to grow to 1 GB at need.  Values with single-byte headers aren't aligned on
  45 any particular boundary, whereas values with four-byte headers are aligned on
  46 at least a four-byte boundary; this omission of alignment padding provides
  47 additional space savings that are significant relative to the size of short values.
  48 As a special case, if the remaining bits of a single-byte header are all
  49 zero (which would be impossible for a self-inclusive length), the value is
  50 a pointer to out-of-line data, with several possible alternatives as
  51 described below.  The type and size of such a <em class="firstterm">TOAST pointer</em>
  52 are determined by a code stored in the second byte of the datum.
  53 Lastly, when the highest-order or lowest-order bit is clear but the adjacent
  54 bit is set, the content of the datum has been compressed and must be
  55 decompressed before use.  In this case the remaining bits of the four-byte
  56 length word give the total size of the compressed datum, not the
  57 original data.  Note that compression is also possible for out-of-line data
  58 but the varlena header does not tell whether it has occurred —
  59 the content of the <acronym class="acronym">TOAST</acronym> pointer tells that, instead.
  60 </p><p>
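The header rules above can be sketched in C as follows.  This is only an
illustration for the little-endian layout, where the flag bits are the
low-order bits of the first byte; the names <code class="function">classify_header</code>
and <code class="function">header_size_field</code> and the <code class="type">HeaderKind</code>
enum are hypothetical and are not taken from the <span class="productname">PostgreSQL</span> sources.
</p><pre class="programlisting"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for this sketch; they are not PostgreSQL identifiers. */
typedef enum
{
    HDR_PLAIN_4B,       /* both flag bits zero: ordinary four-byte header */
    HDR_COMPRESSED_4B,  /* low bit clear, adjacent bit set: compressed in-line */
    HDR_SHORT_1B,       /* low bit set: single-byte header */
    HDR_TOAST_POINTER   /* low bit set, remaining bits zero: TOAST pointer */
} HeaderKind;

/* Classify a datum by the flag bits of its first byte (little-endian layout assumed). */
HeaderKind
classify_header(const unsigned char *datum)
{
    unsigned char first;

    assert(datum != NULL);
    first = datum[0];
    if (first & 0x01)
        return (first >> 1) == 0 ? HDR_TOAST_POINTER : HDR_SHORT_1B;
    return (first & 0x02) ? HDR_COMPRESSED_4B : HDR_PLAIN_4B;
}

/*
 * Return the size field of the header: the remaining 7 bits of a single-byte
 * header, or the remaining 30 bits of a four-byte header.  For a TOAST
 * pointer the remaining bits are zero, so the result is 0.
 */
uint32_t
header_size_field(const unsigned char *datum)
{
    uint32_t word;

    assert(datum != NULL);
    if (datum[0] & 0x01)
        return (uint32_t) (datum[0] >> 1);

    /* Assemble the little-endian length word byte by byte, then drop the flag bits. */
    word = (uint32_t) datum[0]
        | ((uint32_t) datum[1] << 8)
        | ((uint32_t) datum[2] << 16)
        | ((uint32_t) datum[3] << 24);
    return word >> 2;
}
```
]]></pre><p>
Under these assumptions, a first byte of 0x0B denotes a single-byte header
with a total size of 5 bytes, while a first byte of 0x01 marks a
<acronym class="acronym">TOAST</acronym> pointer.
</p><p>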
  61 The compression technique used for either in-line or out-of-line compressed
  62 data can be selected for each column by setting
  63 the <code class="literal">COMPRESSION</code> column option in <code class="command">CREATE
  64 TABLE</code> or <code class="command">ALTER TABLE</code>.  The default for columns
  65 with no explicit setting is to consult the
  66 <a class="xref" href="runtime-config-client.html#GUC-DEFAULT-TOAST-COMPRESSION">default_toast_compression</a> parameter at the time data is
  67 inserted.
  68 </p><p>
  69 As mentioned, there are multiple types of <acronym class="acronym">TOAST</acronym> pointer datums.
  70 The oldest and most common type is a pointer to out-of-line data stored in
  71 a <em class="firstterm"><acronym class="acronym">TOAST</acronym> table</em> that is separate from, but
  72 associated with, the table containing the <acronym class="acronym">TOAST</acronym> pointer datum
  73 itself.  These <em class="firstterm">on-disk</em> pointer datums are created by the
  74 <acronym class="acronym">TOAST</acronym> management code (in <code class="filename">access/common/toast_internals.c</code>)
  75 when a tuple to be stored on disk is too large to be stored as-is.
  76 Further details appear in <a class="xref" href="storage-toast.html#STORAGE-TOAST-ONDISK" title="66.2.1. Out-of-Line, On-Disk TOAST Storage">Section 66.2.1</a>.
  77 Alternatively, a <acronym class="acronym">TOAST</acronym> pointer datum can contain a pointer to
  78 out-of-line data that appears elsewhere in memory.  Such datums are
  79 necessarily short-lived, and will never appear on-disk, but they are very
  80 useful for avoiding copying and redundant processing of large data values.
  81 Further details appear in <a class="xref" href="storage-toast.html#STORAGE-TOAST-INMEMORY" title="66.2.2. Out-of-Line, In-Memory TOAST Storage">Section 66.2.2</a>.
  82 </p><div class="sect2" id="STORAGE-TOAST-ONDISK"><div class="titlepage"><div><div><h3 class="title">66.2.1. Out-of-Line, On-Disk TOAST Storage <a href="#STORAGE-TOAST-ONDISK" class="id_link">#</a></h3></div></div></div><p>
  83 If any of the columns of a table are <acronym class="acronym">TOAST</acronym>-able, the table will
  84 have an associated <acronym class="acronym">TOAST</acronym> table, whose OID is stored in the table's
  85 <code class="structname">pg_class</code>.<code class="structfield">reltoastrelid</code> entry.  On-disk
  86 <acronym class="acronym">TOAST</acronym>ed values are kept in the <acronym class="acronym">TOAST</acronym> table, as
  87 described in more detail below.
  88 </p><p>
  89 Out-of-line values are divided (after compression if used) into chunks of at
  90 most <code class="symbol">TOAST_MAX_CHUNK_SIZE</code> bytes (by default this value is chosen
  91 so that four chunk rows will fit on a page, making it about 2000 bytes).
  92 Each chunk is stored as a separate row in the <acronym class="acronym">TOAST</acronym> table
  93 belonging to the owning table.  Every
  94 <acronym class="acronym">TOAST</acronym> table has the columns <code class="structfield">chunk_id</code> (an OID
  95 identifying the particular <acronym class="acronym">TOAST</acronym>ed value),
  96 <code class="structfield">chunk_seq</code> (a sequence number for the chunk within its value),
  97 and <code class="structfield">chunk_data</code> (the actual data of the chunk).  A unique index
  98 on <code class="structfield">chunk_id</code> and <code class="structfield">chunk_seq</code> provides fast
  99 retrieval of the values.  A pointer datum representing an out-of-line on-disk
 100 <acronym class="acronym">TOAST</acronym>ed value therefore needs to store the OID of the
 101 <acronym class="acronym">TOAST</acronym> table in which to look and the OID of the specific value
 102 (its <code class="structfield">chunk_id</code>).  For convenience, pointer datums also store the
 103 logical datum size (original uncompressed data length), physical stored size
 104 (different if compression was applied), and the compression method used, if
 105 any.  Allowing for the varlena header bytes,
 106 the total size of an on-disk <acronym class="acronym">TOAST</acronym> pointer datum is therefore 18
 107 bytes regardless of the actual size of the represented value.
 108 </p><p>
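As a rough illustration of the arithmetic in the preceding paragraph, the
following C sketch computes the size of an on-disk <acronym class="acronym">TOAST</acronym>
pointer and the number of chunk rows needed for a value.  The struct layout,
all names, and the chunk size of 1996 bytes are assumptions made for this
example (the text above only says that the chunk size is about 2000 bytes);
in particular, packing the stored size and the compression method into one
32-bit field is an assumption of the sketch.
</p><pre class="programlisting"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed chunk size for illustration; the text only says "about 2000 bytes". */
#define SKETCH_MAX_CHUNK_SIZE 1996

/* Single-byte varlena header plus the byte holding the pointer type code. */
#define SKETCH_POINTER_HEADER_BYTES 2

/* Hypothetical payload of an on-disk TOAST pointer (four 4-byte fields). */
typedef struct
{
    int32_t  raw_size;      /* logical (uncompressed) datum size */
    uint32_t ext_info;      /* stored size and compression method (assumed packing) */
    uint32_t value_id;      /* OID of the value, i.e. its chunk_id */
    uint32_t toast_rel_id;  /* OID of the TOAST table to look in */
} SketchToastPointer;

/* Total pointer datum size: header bytes plus the fixed payload. */
size_t
toast_pointer_size(void)
{
    return SKETCH_POINTER_HEADER_BYTES + sizeof(SketchToastPointer);
}

/* Number of chunk rows needed for a stored value (ceiling division). */
size_t
toast_chunk_count(size_t stored_size)
{
    return (stored_size + SKETCH_MAX_CHUNK_SIZE - 1) / SKETCH_MAX_CHUNK_SIZE;
}

/* Size of the final chunk; every earlier chunk is full. */
size_t
toast_last_chunk_size(size_t stored_size)
{
    size_t rem;

    assert(stored_size > 0);
    rem = stored_size % SKETCH_MAX_CHUNK_SIZE;
    return rem == 0 ? SKETCH_MAX_CHUNK_SIZE : rem;
}
```
]]></pre><p>
With these assumed values, a 10000-byte stored value occupies six chunk rows,
the last of which holds 20 bytes, and the pointer datum that refers to it is
18 bytes long.
</p><p>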
 109 The <acronym class="acronym">TOAST</acronym> management code is triggered only
 110 when a row value to be stored in a table is wider than
 111 <code class="symbol">TOAST_TUPLE_THRESHOLD</code> bytes (normally 2 kB).
 112 The <acronym class="acronym">TOAST</acronym> code will compress and/or move
 113 field values out-of-line until the row value is shorter than
 114 <code class="symbol">TOAST_TUPLE_TARGET</code> bytes (also normally 2 kB, adjustable)
 115 or no more gains can be had.  During an UPDATE
 116 operation, values of unchanged fields are normally preserved as-is; so an
 117 UPDATE of a row with out-of-line values incurs no <acronym class="acronym">TOAST</acronym> costs if
 118 none of the out-of-line values change.
 119 </p><p>
 120 The <acronym class="acronym">TOAST</acronym> management code recognizes four different strategies
 121 for storing <acronym class="acronym">TOAST</acronym>-able columns on disk:
 122
 123    </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
 124       <code class="literal">PLAIN</code> prevents either compression or
 125       out-of-line storage.  This is the only possible strategy for
 126       columns of non-<acronym class="acronym">TOAST</acronym>-able data types.
 127      </p></li><li class="listitem"><p>
 128       <code class="literal">EXTENDED</code> allows both compression and out-of-line
 129       storage.  This is the default for most <acronym class="acronym">TOAST</acronym>-able data types.
 130       Compression will be attempted first, then out-of-line storage if
 131       the row is still too big.
 132      </p></li><li class="listitem"><p>
 133       <code class="literal">EXTERNAL</code> allows out-of-line storage but not
 134       compression.  Use of <code class="literal">EXTERNAL</code> will
 135       make substring operations on wide <code class="type">text</code> and
 136       <code class="type">bytea</code> columns faster (at the penalty of increased storage
 137       space) because these operations are optimized to fetch only the
 138       required parts of the out-of-line value when it is not compressed.
 139      </p></li><li class="listitem"><p>
 140       <code class="literal">MAIN</code> allows compression but not out-of-line
 141       storage.  (Actually, out-of-line storage will still be performed
 142       for such columns, but only as a last resort when there is no other
 143       way to make the row small enough to fit on a page.)
 144      </p></li></ul></div><p>
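The four strategies listed above can be summarized as a small lookup table.
The following C sketch is illustrative only; the enum, struct, and function
names are hypothetical and do not correspond to identifiers in the
<acronym class="acronym">TOAST</acronym> management code.
</p><pre class="programlisting"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the four strategies described in the list. */
typedef enum
{
    STRATEGY_PLAIN,
    STRATEGY_EXTENDED,
    STRATEGY_EXTERNAL,
    STRATEGY_MAIN
} StorageStrategy;

/* What a strategy permits, as stated in the list. */
typedef struct
{
    bool compress;                /* compression may be attempted */
    bool out_of_line;             /* out-of-line storage is normally allowed */
    bool out_of_line_last_resort; /* out-of-line only when nothing else helps */
} StrategyRules;

StrategyRules
strategy_rules(StorageStrategy strategy)
{
    StrategyRules rules = {false, false, false};

    switch (strategy)
    {
        case STRATEGY_PLAIN:
            /* neither compression nor out-of-line storage */
            break;
        case STRATEGY_EXTENDED:
            /* compression first, then out-of-line storage if still too big */
            rules.compress = true;
            rules.out_of_line = true;
            break;
        case STRATEGY_EXTERNAL:
            /* out-of-line storage without compression */
            rules.out_of_line = true;
            break;
        case STRATEGY_MAIN:
            /* compression; out-of-line storage only as a last resort */
            rules.compress = true;
            rules.out_of_line_last_resort = true;
            break;
        default:
            assert(!"unknown storage strategy");
            break;
    }
    return rules;
}
```
]]></pre><p>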
 145
 146 Each <acronym class="acronym">TOAST</acronym>-able data type specifies a default strategy for columns
 147 of that data type, but the strategy for a given table column can be altered
 148 with <a class="link" href="sql-altertable.html" title="ALTER TABLE"><code class="command">ALTER TABLE ... SET STORAGE</code></a>.
 149 </p><p>
 150 <code class="symbol">TOAST_TUPLE_TARGET</code> can be adjusted for each table using
 151 <a class="link" href="sql-altertable.html" title="ALTER TABLE"><code class="command">ALTER TABLE ... SET (toast_tuple_target = N)</code></a>.
 152 </p><p>
 153 This scheme has a number of advantages compared to a more straightforward
 154 approach such as allowing row values to span pages.  Assuming that queries are
 155 usually qualified by comparisons against relatively small key values, most of
 156 the work of the executor will be done using the main row entry. The big values
 157 of <acronym class="acronym">TOAST</acronym>ed attributes will only be pulled out (if selected at all)
 158 at the time the result set is sent to the client. Thus, the main table is much
 159 smaller and more of its rows fit in the shared buffer cache than would be the
 160 case without any out-of-line storage. Sort sets also shrink, and sorts will
 161 more often be done entirely in memory. A small test showed that a table
 162 containing typical HTML pages and their URLs was stored in about half of the
 163 raw data size including the <acronym class="acronym">TOAST</acronym> table, and that the main table
 164 contained only about 10% of the entire data (the URLs and some small HTML
 165 pages). There was no run-time difference compared to an un-<acronym class="acronym">TOAST</acronym>ed
 166 comparison table, in which all the HTML pages were cut down to 7 kB to fit.
 167 </p></div><div class="sect2" id="STORAGE-TOAST-INMEMORY"><div class="titlepage"><div><div><h3 class="title">66.2.2. Out-of-Line, In-Memory TOAST Storage <a href="#STORAGE-TOAST-INMEMORY" class="id_link">#</a></h3></div></div></div><p>
 168 <acronym class="acronym">TOAST</acronym> pointers can point to data that is not on disk, but is
 169 elsewhere in the memory of the current server process.  Such pointers
 170 obviously cannot be long-lived, but they are nonetheless useful.  There
 171 are currently two sub-cases:
 172 pointers to <em class="firstterm">indirect</em> data and
 173 pointers to <em class="firstterm">expanded</em> data.
 174 </p><p>
 175 Indirect <acronym class="acronym">TOAST</acronym> pointers simply point at a non-indirect varlena
 176 value stored somewhere in memory.  This case was originally created merely
 177 as a proof of concept, but it is currently used during logical decoding to
 178 avoid possibly having to create physical tuples exceeding 1 GB (as pulling
 179 all out-of-line field values into the tuple might do).  The case is of
 180 limited use since the creator of the pointer datum is entirely responsible
 181 for ensuring that the referenced data survives for as long as the pointer could exist,
 182 and there is no infrastructure to help with this.
 183 </p><p>
 184 Expanded <acronym class="acronym">TOAST</acronym> pointers are useful for complex data types
 185 whose on-disk representation is not especially suited for computational
 186 purposes.  As an example, the standard varlena representation of a
 187 <span class="productname">PostgreSQL</span> array includes dimensionality information, a
 188 nulls bitmap if there are any null elements, then the values of all the
 189 elements in order.  When the element type itself is variable-length, the
 190 only way to find the <em class="replaceable"><code>N</code></em>'th element is to scan through all the
 191 preceding elements.  This representation is appropriate for on-disk storage
 192 because of its compactness, but for computations with the array it's much
 193 nicer to have an <span class="quote">“<span class="quote">expanded</span>”</span> or <span class="quote">“<span class="quote">deconstructed</span>”</span>
 194 representation in which all the element starting locations have been
 195 identified.  The <acronym class="acronym">TOAST</acronym> pointer mechanism supports this need by
 196 allowing a pass-by-reference Datum to point to either a standard varlena
 197 value (the on-disk representation) or a <acronym class="acronym">TOAST</acronym> pointer that
 198 points to an expanded representation somewhere in memory.  The details of
 199 this expanded representation are up to the data type, though it must have
 200 a standard header and meet the other API requirements given
 201 in <code class="filename">src/include/utils/expandeddatum.h</code>.  C-level functions
 202 working with the data type can choose to handle either representation.
 203 Functions that do not know about the expanded representation, but simply
 204 apply <code class="function">PG_DETOAST_DATUM</code> to their inputs, will automatically
 205 receive the traditional varlena representation; so support for an expanded
 206 representation can be introduced incrementally, one function at a time.
 207 </p><p>
 208 <acronym class="acronym">TOAST</acronym> pointers to expanded values are further broken down
 209 into <em class="firstterm">read-write</em> and <em class="firstterm">read-only</em> pointers.
 210 The pointed-to representation is the same either way, but a function that
 211 receives a read-write pointer is allowed to modify the referenced value
 212 in-place, whereas one that receives a read-only pointer must not; it must
 213 first create a copy if it wants to make a modified version of the value.
 214 This distinction and some associated conventions make it possible to avoid
 215 unnecessary copying of expanded values during query execution.
 216 </p><p>
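The read-write versus read-only convention amounts to copy-on-write.  The
following C sketch illustrates the idea with hypothetical types
(<code class="type">ExpandedInts</code>, <code class="type">ExpandedRef</code>) and a
hypothetical function <code class="function">set_element</code>; it does not use
the API declared in <code class="filename">src/include/utils/expandeddatum.h</code>
and uses plain <code class="function">malloc</code> where real code would use
memory contexts.
</p><pre class="programlisting"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an expanded in-memory value. */
typedef struct
{
    size_t nelems;
    int   *elems;
} ExpandedInts;

/* Hypothetical stand-in for a pointer that is either read-write or read-only. */
typedef struct
{
    ExpandedInts *value;
    bool          read_write;
} ExpandedRef;

/* Make an independent copy of an expanded value; returns NULL on allocation failure. */
static ExpandedInts *
copy_expanded(const ExpandedInts *src)
{
    ExpandedInts *dst = malloc(sizeof(ExpandedInts));

    if (dst == NULL)
        return NULL;
    dst->nelems = src->nelems;
    dst->elems = malloc(src->nelems * sizeof(int));
    if (dst->elems == NULL)
    {
        free(dst);
        return NULL;
    }
    memcpy(dst->elems, src->elems, src->nelems * sizeof(int));
    return dst;
}

/*
 * Set one element and return the object holding the result.  A read-write
 * reference is modified in place; a read-only reference is copied first,
 * and the caller then owns the returned copy.
 */
ExpandedInts *
set_element(ExpandedRef ref, size_t idx, int new_value)
{
    ExpandedInts *target;

    assert(ref.value != NULL && idx < ref.value->nelems);
    target = ref.read_write ? ref.value : copy_expanded(ref.value);
    if (target == NULL)
        return NULL;
    target->elems[idx] = new_value;
    return target;
}
```
]]></pre><p>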
 217 For all types of in-memory <acronym class="acronym">TOAST</acronym> pointer, the <acronym class="acronym">TOAST</acronym>
 218 management code ensures that no such pointer datum can accidentally get
 219 stored on disk.  In-memory <acronym class="acronym">TOAST</acronym> pointers are automatically
 220 expanded to normal in-line varlena values before storage — and then
 221 possibly converted to on-disk <acronym class="acronym">TOAST</acronym> pointers, if the containing
 222 tuple would otherwise be too big.
 223 </p></div></div><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="storage-file-layout.html" title="66.1. Database File Layout">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="storage.html" title="Chapter 66. Database Physical Storage">Up</a></td><td width="40%" align="right"> <a accesskey="n" href="storage-fsm.html" title="66.3. Free Space Map">Next</a></td></tr><tr><td width="40%" align="left" valign="top">66.1. Database File Layout </td><td width="20%" align="center"><a accesskey="h" href="index.html" title="PostgreSQL 18.0 Documentation">Home</a></td><td width="40%" align="right" valign="top"> 66.3. Free Space Map</td></tr></table></div></body></html>