begriffs open source - ai-pg/blob - full-docs/txt/storage-page-layout.txt

   1
   2 66.6. Database Page Layout #
   3
   4    66.6.1. Table Row Layout
   5
   6    This section provides an overview of the page format used within
   7    PostgreSQL tables and indexes.^[19] Sequences and TOAST tables are
   8    formatted just like a regular table.
   9
  10    In the following explanation, a byte is assumed to contain 8 bits. In
  11    addition, the term item refers to an individual data value that is
  12    stored on a page. In a table, an item is a row; in an index, an item is
  13    an index entry.
  14
  15    Every table and index is stored as an array of pages of a fixed size
  16    (usually 8 kB, although a different page size can be selected when
  17    compiling the server). In a table, all the pages are logically
  18    equivalent, so a particular item (row) can be stored in any page. In
  19    indexes, the first page is generally reserved as a metapage holding
  20    control information, and there can be different types of pages within
  21    the index, depending on the index access method.
  22
  23    Table 66.2 shows the overall layout of a page. There are five parts to
  24    each page.
  25
  26    Table 66.2. Overall Page Layout
  27    Item Description
  28    PageHeaderData 24 bytes long. Contains general information about the
  29    page, including free space pointers.
  30    ItemIdData Array of item identifiers pointing to the actual items. Each
  31    entry is an (offset,length) pair. 4 bytes per item.
  32    Free space The unallocated space. New item identifiers are allocated
  33    from the start of this area, new items from the end.
  34    Items The actual items themselves.
  35    Special space Index access method specific data. Different methods
  36    store different data. Empty in ordinary tables.
  37
  38    The first 24 bytes of each page consists of a page header
  39    (PageHeaderData). Its format is detailed in Table 66.3. The first field
  40    tracks the most recent WAL entry related to this page. The second field
  41    contains the page checksum if -k are enabled. Next is a 2-byte field
  42    containing flag bits. This is followed by three 2-byte integer fields
  43    (pd_lower, pd_upper, and pd_special). These contain byte offsets from
  44    the page start to the start of unallocated space, to the end of
  45    unallocated space, and to the start of the special space. The next 2
  46    bytes of the page header, pd_pagesize_version, store both the page size
  47    and a version indicator. Beginning with PostgreSQL 8.3 the version
  48    number is 4; PostgreSQL 8.1 and 8.2 used version number 3; PostgreSQL
  49    8.0 used version number 2; PostgreSQL 7.3 and 7.4 used version number
  50    1; prior releases used version number 0. (The basic page layout and
  51    header format has not changed in most of these versions, but the layout
  52    of heap row headers has.) The page size is basically only present as a
  53    cross-check; there is no support for having more than one page size in
  54    an installation. The last field is a hint that shows whether pruning
  55    the page is likely to be profitable: it tracks the oldest un-pruned
  56    XMAX on the page.
  57
  58    Table 66.3. PageHeaderData Layout
  59    Field Type Length Description
  60    pd_lsn PageXLogRecPtr 8 bytes LSN: next byte after last byte of WAL
  61    record for last change to this page
  62    pd_checksum uint16 2 bytes Page checksum
  63    pd_flags uint16 2 bytes Flag bits
  64    pd_lower LocationIndex 2 bytes Offset to start of free space
  65    pd_upper LocationIndex 2 bytes Offset to end of free space
  66    pd_special LocationIndex 2 bytes Offset to start of special space
  67    pd_pagesize_version uint16 2 bytes Page size and layout version number
  68    information
  69    pd_prune_xid TransactionId 4 bytes Oldest unpruned XMAX on page, or
  70    zero if none
  71
  72    All the details can be found in src/include/storage/bufpage.h.
  73
  74    Following the page header are item identifiers (ItemIdData), each
  75    requiring four bytes. An item identifier contains a byte-offset to the
  76    start of an item, its length in bytes, and a few attribute bits which
  77    affect its interpretation. New item identifiers are allocated as needed
  78    from the beginning of the unallocated space. The number of item
  79    identifiers present can be determined by looking at pd_lower, which is
  80    increased to allocate a new identifier. Because an item identifier is
  81    never moved until it is freed, its index can be used on a long-term
  82    basis to reference an item, even when the item itself is moved around
  83    on the page to compact free space. In fact, every pointer to an item
  84    (ItemPointer, also known as CTID) created by PostgreSQL consists of a
  85    page number and the index of an item identifier.
  86
  87    The items themselves are stored in space allocated backwards from the
  88    end of unallocated space. The exact structure varies depending on what
  89    the table is to contain. Tables and sequences both use a structure
  90    named HeapTupleHeaderData, described below.
  91
  92    The final section is the “special section” which can contain anything
  93    the access method wishes to store. For example, b-tree indexes store
  94    links to the page's left and right siblings, as well as some other data
  95    relevant to the index structure. Ordinary tables do not use a special
  96    section at all (indicated by setting pd_special to equal the page
  97    size).
  98
  99    Figure 66.1 illustrates how these parts are laid out in a page.
 100
 101    Figure 66.1. Page Layout
 102
 103 66.6.1. Table Row Layout #
 104
 105    All table rows are structured in the same way. There is a fixed-size
 106    header (occupying 23 bytes on most machines), followed by an optional
 107    null bitmap, an optional object ID field, and the user data. The header
 108    is detailed in Table 66.4. The actual user data (columns of the row)
 109    begins at the offset indicated by t_hoff, which must always be a
 110    multiple of the MAXALIGN distance for the platform. The null bitmap is
 111    only present if the HEAP_HASNULL bit is set in t_infomask. If it is
 112    present it begins just after the fixed header and occupies enough bytes
 113    to have one bit per data column (that is, the number of bits that
 114    equals the attribute count in t_infomask2). In this list of bits, a 1
 115    bit indicates not-null, a 0 bit is a null. When the bitmap is not
 116    present, all columns are assumed not-null. The object ID is only
 117    present if the HEAP_HASOID_OLD bit is set in t_infomask. If present, it
 118    appears just before the t_hoff boundary. Any padding needed to make
 119    t_hoff a MAXALIGN multiple will appear between the null bitmap and the
 120    object ID. (This in turn ensures that the object ID is suitably
 121    aligned.)
 122
 123    Table 66.4. HeapTupleHeaderData Layout
 124    Field Type Length Description
 125    t_xmin TransactionId 4 bytes insert XID stamp
 126    t_xmax TransactionId 4 bytes delete XID stamp
 127    t_cid CommandId 4 bytes insert and/or delete CID stamp (overlays with
 128    t_xvac)
 129    t_xvac TransactionId 4 bytes XID for VACUUM operation moving a row
 130    version
 131    t_ctid ItemPointerData 6 bytes current TID of this or newer row version
 132    t_infomask2 uint16 2 bytes number of attributes, plus various flag bits
 133    t_infomask uint16 2 bytes various flag bits
 134    t_hoff uint8 1 byte offset to user data
 135
 136    All the details can be found in src/include/access/htup_details.h.
 137
 138    Interpreting the actual data can only be done with information obtained
 139    from other tables, mostly pg_attribute. The key values needed to
 140    identify field locations are attlen and attalign. There is no way to
 141    directly get a particular attribute, except when there are only fixed
 142    width fields and no null values. All this trickery is wrapped up in the
 143    functions heap_getattr, fastgetattr and heap_getsysattr.
 144
 145    To read the data you need to examine each attribute in turn. First
 146    check whether the field is NULL according to the null bitmap. If it is,
 147    go to the next. Then make sure you have the right alignment. If the
 148    field is a fixed width field, then all the bytes are simply placed. If
 149    it's a variable length field (attlen = -1) then it's a bit more
 150    complicated. All variable-length data types share the common header
 151    structure struct varlena, which includes the total length of the stored
 152    value and some flag bits. Depending on the flags, the data can be
 153    either inline or in a TOAST table; it might be compressed, too (see
 154    Section 66.2).
 155
 156    ^[19] Actually, use of this page format is not required for either
 157    table or index access methods. The heap table access method always uses
 158    this format. All the existing index methods also use the basic format,
 159    but the data kept on index metapages usually doesn't follow the item
 160    layout rules.