begriffs open source - ai-pg/blob - full-docs/txt/storage-file-layout.txt

   1
   2 66.1. Database File Layout #
   3
   4    This section describes the storage format at the level of files and
   5    directories.
   6
   7    Traditionally, the configuration and data files used by a database
   8    cluster are stored together within the cluster's data directory,
   9    commonly referred to as PGDATA (after the name of the environment
  10    variable that can be used to define it). A common location for PGDATA
  11    is /var/lib/pgsql/data. Multiple clusters, managed by different server
  12    instances, can exist on the same machine.
  13
  14    The PGDATA directory contains several subdirectories and control files,
  15    as shown in Table 66.1. In addition to these required items, the
  16    cluster configuration files postgresql.conf, pg_hba.conf, and
  17    pg_ident.conf are traditionally stored in PGDATA, although it is
  18    possible to place them elsewhere.
  19
  20    Table 66.1. Contents of PGDATA
  21    Item Description
  22    PG_VERSION A file containing the major version number of PostgreSQL
  23    base Subdirectory containing per-database subdirectories
  24    current_logfiles File recording the log file(s) currently written to by
  25    the logging collector
  26    global Subdirectory containing cluster-wide tables, such as pg_database
  27    pg_commit_ts Subdirectory containing transaction commit timestamp data
  28    pg_dynshmem Subdirectory containing files used by the dynamic shared
  29    memory subsystem
  30    pg_logical Subdirectory containing status data for logical decoding
  31    pg_multixact Subdirectory containing multitransaction status data (used
  32    for shared row locks)
  33    pg_notify Subdirectory containing LISTEN/NOTIFY status data
  34    pg_replslot Subdirectory containing replication slot data
  35    pg_serial Subdirectory containing information about committed
  36    serializable transactions
  37    pg_snapshots Subdirectory containing exported snapshots
  38    pg_stat Subdirectory containing permanent files for the statistics
  39    subsystem
  40    pg_stat_tmp Subdirectory containing temporary files for the statistics
  41    subsystem
  42    pg_subtrans Subdirectory containing subtransaction status data
  43    pg_tblspc Subdirectory containing symbolic links to tablespaces
  44    pg_twophase Subdirectory containing state files for prepared
  45    transactions
  46    pg_wal Subdirectory containing WAL (Write Ahead Log) files
  47    pg_xact Subdirectory containing transaction commit status data
  48    postgresql.auto.conf A file used for storing configuration parameters
  49    that are set by ALTER SYSTEM
  50    postmaster.opts A file recording the command-line options the server
  51    was last started with
  52    postmaster.pid A lock file recording the current postmaster process ID
  53    (PID), cluster data directory path, postmaster start timestamp, port
  54    number, Unix-domain socket directory path (could be empty), first valid
  55    listen_address (IP address or *, or empty if not listening on TCP), and
  56    shared memory segment ID (this file is not present after server
  57    shutdown)
  58
  59    For each database in the cluster there is a subdirectory within
  60    PGDATA/base, named after the database's OID in pg_database. This
  61    subdirectory is the default location for the database's files; in
  62    particular, its system catalogs are stored there.
  63
  64    Note that the following sections describe the behavior of the builtin
  65    heap table access method, and the builtin index access methods. Due to
  66    the extensible nature of PostgreSQL, other access methods might work
  67    differently.
  68
  69    Each table and index is stored in a separate file. For ordinary
  70    relations, these files are named after the table or index's filenode
  71    number, which can be found in pg_class.relfilenode. But for temporary
  72    relations, the file name is of the form tBBB_FFF, where BBB is the
  73    process number of the backend which created the file, and FFF is the
  74    filenode number. In either case, in addition to the main file (a/k/a
  75    main fork), each table and index has a free space map (see
  76    Section 66.3), which stores information about free space available in
  77    the relation. The free space map is stored in a file named with the
  78    filenode number plus the suffix _fsm. Tables also have a visibility
  79    map, stored in a fork with the suffix _vm, to track which pages are
  80    known to have no dead tuples. The visibility map is described further
  81    in Section 66.4. Unlogged tables and indexes have a third fork, known
  82    as the initialization fork, which is stored in a fork with the suffix
  83    _init (see Section 66.5).
  84
  85 Caution
  86
  87    Note that while a table's filenode often matches its OID, this is not
  88    necessarily the case; some operations, like TRUNCATE, REINDEX, CLUSTER
  89    and some forms of ALTER TABLE, can change the filenode while preserving
  90    the OID. Avoid assuming that filenode and table OID are the same. Also,
  91    for certain system catalogs including pg_class itself,
  92    pg_class.relfilenode contains zero. The actual filenode number of these
  93    catalogs is stored in a lower-level data structure, and can be obtained
  94    using the pg_relation_filenode() function.
  95
  96    When a table or index exceeds 1 GB, it is divided into gigabyte-sized
  97    segments. The first segment's file name is the same as the filenode;
  98    subsequent segments are named filenode.1, filenode.2, etc. This
  99    arrangement avoids problems on platforms that have file size
 100    limitations. (Actually, 1 GB is just the default segment size. The
 101    segment size can be adjusted using the configuration option
 102    --with-segsize when building PostgreSQL.) In principle, free space map
 103    and visibility map forks could require multiple segments as well,
 104    though this is unlikely to happen in practice.
 105
 106    A table that has columns with potentially large entries will have an
 107    associated TOAST table, which is used for out-of-line storage of field
 108    values that are too large to keep in the table rows proper.
 109    pg_class.reltoastrelid links from a table to its TOAST table, if any.
 110    See Section 66.2 for more information.
 111
 112    The contents of tables and indexes are discussed further in
 113    Section 66.6.
 114
 115    Tablespaces make the scenario more complicated. Each user-defined
 116    tablespace has a symbolic link inside the PGDATA/pg_tblspc directory,
 117    which points to the physical tablespace directory (i.e., the location
 118    specified in the tablespace's CREATE TABLESPACE command). This symbolic
 119    link is named after the tablespace's OID. Inside the physical
 120    tablespace directory there is a subdirectory with a name that depends
 121    on the PostgreSQL server version, such as PG_9.0_201008051. (The reason
 122    for using this subdirectory is so that successive versions of the
 123    database can use the same CREATE TABLESPACE location value without
 124    conflicts.) Within the version-specific subdirectory, there is a
 125    subdirectory for each database that has elements in the tablespace,
 126    named after the database's OID. Tables and indexes are stored within
 127    that directory, using the filenode naming scheme. The pg_default
 128    tablespace is not accessed through pg_tblspc, but corresponds to
 129    PGDATA/base. Similarly, the pg_global tablespace is not accessed
 130    through pg_tblspc, but corresponds to PGDATA/global.
 131
 132    The pg_relation_filepath() function shows the entire path (relative to
 133    PGDATA) of any relation. It is often useful as a substitute for
 134    remembering many of the above rules. But keep in mind that this
 135    function just gives the name of the first segment of the main fork of
 136    the relation — you may need to append a segment number and/or _fsm,
 137    _vm, or _init to find all the files associated with the relation.
 138
 139    Temporary files (for operations such as sorting more data than can fit
 140    in memory) are created within PGDATA/base/pgsql_tmp, or within a
 141    pgsql_tmp subdirectory of a tablespace directory if a tablespace other
 142    than pg_default is specified for them. The name of a temporary file has
 143    the form pgsql_tmpPPP.NNN, where PPP is the PID of the owning backend
 144    and NNN distinguishes different temporary files of that backend.