begriffs open source - ai-pg/blob - full-docs/txt/creating-cluster.txt

   1
   2 18.2. Creating a Database Cluster #
   3
   4    18.2.1. Use of Secondary File Systems
   5    18.2.2. File Systems
   6
   7    Before you can do anything, you must initialize a database storage area
   8    on disk. We call this a database cluster. (The SQL standard uses the
   9    term catalog cluster.) A database cluster is a collection of databases
  10    that is managed by a single instance of a running database server.
  11    After initialization, a database cluster will contain a database named
  12    postgres, which is meant as a default database for use by utilities,
  13    users and third party applications. The database server itself does not
  14    require the postgres database to exist, but many external utility
  15    programs assume it exists. There are two more databases created within
  16    each cluster during initialization, named template1 and template0. As
  17    the names suggest, these will be used as templates for
  18    subsequently-created databases; they should not be used for actual
  19    work. (See Chapter 22 for information about creating new databases
  20    within a cluster.)
  21
  22    In file system terms, a database cluster is a single directory under
  23    which all data will be stored. We call this the data directory or data
  24    area. It is completely up to you where you choose to store your data.
  25    There is no default, although locations such as /usr/local/pgsql/data
  26    or /var/lib/pgsql/data are popular. The data directory must be
  27    initialized before being used, using the program initdb which is
  28    installed with PostgreSQL.
  29
  30    If you are using a pre-packaged version of PostgreSQL, it may well have
  31    a specific convention for where to place the data directory, and it may
  32    also provide a script for creating the data directory. In that case you
  33    should use that script in preference to running initdb directly.
  34    Consult the package-level documentation for details.
  35
  36    To initialize a database cluster manually, run initdb and specify the
  37    desired file system location of the database cluster with the -D
  38    option, for example:
  39 $ initdb -D /usr/local/pgsql/data
  40
  41    Note that you must execute this command while logged into the
  42    PostgreSQL user account, which is described in the previous section.
  43
  44 Tip
  45
  46    As an alternative to the -D option, you can set the environment
  47    variable PGDATA.
  48
  49    Alternatively, you can run initdb via the pg_ctl program like so:
  50 $ pg_ctl -D /usr/local/pgsql/data initdb
  51
  52    This may be more intuitive if you are using pg_ctl for starting and
  53    stopping the server (see Section 18.3), so that pg_ctl would be the
  54    sole command you use for managing the database server instance.
  55
  56    initdb will attempt to create the directory you specify if it does not
  57    already exist. Of course, this will fail if initdb does not have
  58    permissions to write in the parent directory. It's generally
  59    recommendable that the PostgreSQL user own not just the data directory
  60    but its parent directory as well, so that this should not be a problem.
  61    If the desired parent directory doesn't exist either, you will need to
  62    create it first, using root privileges if the grandparent directory
  63    isn't writable. So the process might look like this:
  64 root# mkdir /usr/local/pgsql
  65 root# chown postgres /usr/local/pgsql
  66 root# su postgres
  67 postgres$ initdb -D /usr/local/pgsql/data
  68
  69    initdb will refuse to run if the data directory exists and already
  70    contains files; this is to prevent accidentally overwriting an existing
  71    installation.
  72
  73    Because the data directory contains all the data stored in the
  74    database, it is essential that it be secured from unauthorized access.
  75    initdb therefore revokes access permissions from everyone but the
  76    PostgreSQL user, and optionally, group. Group access, when enabled, is
  77    read-only. This allows an unprivileged user in the same group as the
  78    cluster owner to take a backup of the cluster data or perform other
  79    operations that only require read access.
  80
  81    Note that enabling or disabling group access on an existing cluster
  82    requires the cluster to be shut down and the appropriate mode to be set
  83    on all directories and files before restarting PostgreSQL. Otherwise, a
  84    mix of modes might exist in the data directory. For clusters that allow
  85    access only by the owner, the appropriate modes are 0700 for
  86    directories and 0600 for files. For clusters that also allow reads by
  87    the group, the appropriate modes are 0750 for directories and 0640 for
  88    files.
  89
  90    However, while the directory contents are secure, the default client
  91    authentication setup allows any local user to connect to the database
  92    and even become the database superuser. If you do not trust other local
  93    users, we recommend you use one of initdb's -W, --pwprompt or --pwfile
  94    options to assign a password to the database superuser. Also, specify
  95    -A scram-sha-256 so that the default trust authentication mode is not
  96    used; or modify the generated pg_hba.conf file after running initdb,
  97    but before you start the server for the first time. (Other reasonable
  98    approaches include using peer authentication or file system permissions
  99    to restrict connections. See Chapter 20 for more information.)
 100
 101    initdb also initializes the default locale for the database cluster.
 102    Normally, it will just take the locale settings in the environment and
 103    apply them to the initialized database. It is possible to specify a
 104    different locale for the database; more information about that can be
 105    found in Section 23.1. The default sort order used within the
 106    particular database cluster is set by initdb, and while you can create
 107    new databases using different sort order, the order used in the
 108    template databases that initdb creates cannot be changed without
 109    dropping and recreating them. There is also a performance impact for
 110    using locales other than C or POSIX. Therefore, it is important to make
 111    this choice correctly the first time.
 112
 113    initdb also sets the default character set encoding for the database
 114    cluster. Normally this should be chosen to match the locale setting.
 115    For details see Section 23.3.
 116
 117    Non-C and non-POSIX locales rely on the operating system's collation
 118    library for character set ordering. This controls the ordering of keys
 119    stored in indexes. For this reason, a cluster cannot switch to an
 120    incompatible collation library version, either through snapshot
 121    restore, binary streaming replication, a different operating system, or
 122    an operating system upgrade.
 123
 124 18.2.1. Use of Secondary File Systems #
 125
 126    Many installations create their database clusters on file systems
 127    (volumes) other than the machine's “root” volume. If you choose to do
 128    this, it is not advisable to try to use the secondary volume's topmost
 129    directory (mount point) as the data directory. Best practice is to
 130    create a directory within the mount-point directory that is owned by
 131    the PostgreSQL user, and then create the data directory within that.
 132    This avoids permissions problems, particularly for operations such as
 133    pg_upgrade, and it also ensures clean failures if the secondary volume
 134    is taken offline.
 135
 136 18.2.2. File Systems #
 137
 138    Generally, any file system with POSIX semantics can be used for
 139    PostgreSQL. Users prefer different file systems for a variety of
 140    reasons, including vendor support, performance, and familiarity.
 141    Experience suggests that, all other things being equal, one should not
 142    expect major performance or behavior changes merely from switching file
 143    systems or making minor file system configuration changes.
 144
 145 18.2.2.1. NFS #
 146
 147    It is possible to use an NFS file system for storing the PostgreSQL
 148    data directory. PostgreSQL does nothing special for NFS file systems,
 149    meaning it assumes NFS behaves exactly like locally-connected drives.
 150    PostgreSQL does not use any functionality that is known to have
 151    nonstandard behavior on NFS, such as file locking.
 152
 153    The only firm requirement for using NFS with PostgreSQL is that the
 154    file system is mounted using the hard option. With the hard option,
 155    processes can “hang” indefinitely if there are network problems, so
 156    this configuration will require a careful monitoring setup. The soft
 157    option will interrupt system calls in case of network problems, but
 158    PostgreSQL will not repeat system calls interrupted in this way, so any
 159    such interruption will result in an I/O error being reported.
 160
 161    It is not necessary to use the sync mount option. The behavior of the
 162    async option is sufficient, since PostgreSQL issues fsync calls at
 163    appropriate times to flush the write caches. (This is analogous to how
 164    it works on a local file system.) However, it is strongly recommended
 165    to use the sync export option on the NFS server on systems where it
 166    exists (mainly Linux). Otherwise, an fsync or equivalent on the NFS
 167    client is not actually guaranteed to reach permanent storage on the
 168    server, which could cause corruption similar to running with the
 169    parameter fsync off. The defaults of these mount and export options
 170    differ between vendors and versions, so it is recommended to check and
 171    perhaps specify them explicitly in any case to avoid any ambiguity.
 172
 173    In some cases, an external storage product can be accessed either via
 174    NFS or a lower-level protocol such as iSCSI. In the latter case, the
 175    storage appears as a block device and any available file system can be
 176    created on it. That approach might relieve the DBA from having to deal
 177    with some of the idiosyncrasies of NFS, but of course the complexity of
 178    managing remote storage then happens at other levels.