25.1. SQL Dump

   25.1.1. Restoring the Dump
   25.1.2. Using pg_dumpall
   25.1.3. Handling Large Databases

   The idea behind this dump method is to generate a file with SQL
   commands that, when fed back to the server, will recreate the database
   in the same state as it was at the time of the dump. PostgreSQL
   provides the utility program pg_dump for this purpose. The basic usage
   of this command is:
pg_dump dbname > dumpfile

   As you see, pg_dump writes its result to the standard output. We will
   see below how this can be useful. While the above command creates a
   text file, pg_dump can create files in other formats that allow for
   parallelism and more fine-grained control of object restoration.

   pg_dump is a regular PostgreSQL client application (albeit a
   particularly clever one). This means that you can perform this backup
   procedure from any remote host that has access to the database. But
   remember that pg_dump does not operate with special permissions. In
   particular, it must have read access to all tables that you want to
   back up, so in order to back up the entire database you almost always
   have to run it as a database superuser. (If you do not have sufficient
   privileges to back up the entire database, you can still back up
   portions of the database to which you do have access using options such
   as -n schema or -t table.)
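
   As a minimal sketch of the partial backups mentioned above, the
   following shell functions wrap pg_dump's -n and -t options. The
   function names, and the schema, table, and file names passed to them,
   are hypothetical.

```shell
# Dump a single schema (-n) of a database to a plain-text SQL file.
# Usage: dump_schema SCHEMA DBNAME OUTFILE
dump_schema() {
    pg_dump -n "$1" "$2" > "$3"
}

# Dump a single table (-t) of a database to a plain-text SQL file.
# Usage: dump_table TABLE DBNAME OUTFILE
dump_table() {
    pg_dump -t "$1" "$2" > "$3"
}
```

   For example, dump_schema sales mydb sales.sql would write only the
   objects in a schema named sales, provided the connecting role can read
   them.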

   To specify which database server pg_dump should contact, use the
   command line options -h host and -p port. The default host is the local
   host or whatever your PGHOST environment variable specifies. Similarly,
   the default port is indicated by the PGPORT environment variable or,
   failing that, by the compiled-in default. (Conveniently, the server
   will normally have the same compiled-in default.)

   Like any other PostgreSQL client application, pg_dump will by default
   connect with the database user name that is equal to the current
   operating system user name. To override this, either specify the -U
   option or set the environment variable PGUSER. Remember that pg_dump
   connections are subject to the normal client authentication mechanisms
   (which are described in Chapter 20).
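
   The two ways of supplying connection settings described above can be
   sketched as follows. The function names and their argument order are
   hypothetical; only the -h, -p, and -U options and the PGHOST, PGPORT,
   and PGUSER variables come from the text.

```shell
# Pass the connection settings as command-line options.
# Usage: dump_remote HOST PORT USER DBNAME OUTFILE
dump_remote() {
    pg_dump -h "$1" -p "$2" -U "$3" "$4" > "$5"
}

# Pass the same settings through the environment variables pg_dump reads.
# Usage: dump_remote_env HOST PORT USER DBNAME OUTFILE
dump_remote_env() {
    PGHOST="$1" PGPORT="$2" PGUSER="$3" pg_dump "$4" > "$5"
}
```

   Either form still goes through normal client authentication, so a
   password prompt or a matching authentication setup may be needed.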

   An important advantage of pg_dump over the other backup methods
   described later is that pg_dump's output can generally be re-loaded
   into newer versions of PostgreSQL, whereas file-level backups and
   continuous archiving are both extremely server-version-specific.
   pg_dump is also the only method that will work when transferring a
   database to a different machine architecture, such as going from a
   32-bit to a 64-bit server.

   Dumps created by pg_dump are internally consistent, meaning that the
   dump represents a snapshot of the database at the time pg_dump began
   running. pg_dump does not block other operations on the database while
   it is working. (Exceptions are those operations that need to operate
   with an exclusive lock, such as most forms of ALTER TABLE.)

25.1.1. Restoring the Dump

   Text files created by pg_dump are intended to be read by the psql
   program using its default settings. The general command form to restore
   a text dump is
psql -X dbname < dumpfile

   where dumpfile is the file output by the pg_dump command. The database
   dbname will not be created by this command, so you must create it
   yourself from template0 before executing psql (e.g., with createdb -T
   template0 dbname). To ensure psql runs with its default settings, use
   the -X (--no-psqlrc) option. psql supports options similar to pg_dump
   for specifying the database server to connect to and the user name to
   use. See the psql reference page for more information.
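
   The two restore steps described above (create the database from
   template0, then feed the dump to psql) can be combined in a small
   wrapper. This is only a sketch; the function name restore_text_dump is
   hypothetical.

```shell
# Create DBNAME from template0, then load a plain-text dump into it.
# psql is only run if createdb succeeds.
# Usage: restore_text_dump DBNAME DUMPFILE
restore_text_dump() {
    createdb -T template0 "$1" &&
        psql -X "$1" < "$2"
}
```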

   Non-text file dumps should be restored using the pg_restore utility.

   Before restoring an SQL dump, all the users who own objects or were
   granted permissions on objects in the dumped database must already
   exist. If they do not, the restore will fail to recreate the objects
   with the original ownership and/or permissions. (Sometimes this is what
   you want, but usually it is not.)

   By default, the psql script will continue to execute after an SQL error
   is encountered. You might wish to run psql with the ON_ERROR_STOP
   variable set to alter that behavior and have psql exit with an exit
   status of 3 if an SQL error occurs:
psql -X --set ON_ERROR_STOP=on dbname < dumpfile

   Either way, you will only have a partially restored database.
   Alternatively, you can specify that the whole dump should be restored
   as a single transaction, so the restore is either fully completed or
   fully rolled back. This mode can be specified by passing the -1 or
   --single-transaction command-line options to psql. When using this
   mode, be aware that even a minor error can roll back a restore that has
   already run for many hours. However, that might still be preferable to
   manually cleaning up a complex database after a partially restored
   dump.
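
   A sketch of an all-or-nothing restore that combines ON_ERROR_STOP with
   --single-transaction and reports the exit status of 3 mentioned above.
   The function name restore_atomic and the message text are
   hypothetical. The dump is passed with -f rather than redirected on
   standard input, since the psql reference page describes
   --single-transaction as applying to -c and -f options.

```shell
# Restore DUMPFILE into DBNAME inside one transaction, stopping at the
# first SQL error. Returns psql's exit status unchanged.
# Usage: restore_atomic DBNAME DUMPFILE
restore_atomic() {
    psql -X --set ON_ERROR_STOP=on --single-transaction -f "$2" "$1"
    rc=$?
    if [ "$rc" -eq 3 ]; then
        # Status 3: a script error occurred while ON_ERROR_STOP was set.
        echo "restore of $1 stopped on an SQL error and was rolled back" >&2
    fi
    return "$rc"
}
```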

   The ability of pg_dump and psql to write to or read from pipes makes it
   possible to dump a database directly from one server to another, for
   example:
pg_dump -h host1 dbname | psql -X -h host2 dbname

Important

   The dumps produced by pg_dump are relative to template0. This means
   that any languages, procedures, etc. added via template1 will also be
   dumped by pg_dump. As a result, when restoring, if you are using a
   customized template1, you must create the empty database from
   template0, as in the example above.

   After restoring a backup, it is wise to run ANALYZE on each database so
   the query optimizer has useful statistics; see Section 24.1.3 and
   Section 24.1.6 for more information. For more advice on how to load
   large amounts of data into PostgreSQL efficiently, refer to
   Section 14.4.
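
   Two possible ways to refresh optimizer statistics after a restore are
   sketched below. The function names are hypothetical, and using
   vacuumdb is one option among several rather than something this
   section prescribes.

```shell
# Run ANALYZE in a single restored database.
# Usage: analyze_db DBNAME
analyze_db() {
    psql -X -c 'ANALYZE' "$1"
}

# Analyze every database in the cluster without vacuuming.
# Usage: analyze_all
analyze_all() {
    vacuumdb --all --analyze-only
}
```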

25.1.2. Using pg_dumpall

   pg_dump dumps only a single database at a time, and it does not dump
   information about roles or tablespaces (because those are cluster-wide
   rather than per-database). To support convenient dumping of the entire
   contents of a database cluster, the pg_dumpall program is provided.
   pg_dumpall backs up each database in a given cluster, and also
   preserves cluster-wide data such as role and tablespace definitions.
   The basic usage of this command is:
pg_dumpall > dumpfile

   The resulting dump can be restored with psql:
psql -X -f dumpfile postgres

   (Actually, you can specify any existing database name to start from,
   but if you are loading into an empty cluster then postgres should
   usually be used.) It is always necessary to have database superuser
   access when restoring a pg_dumpall dump, as that is required to restore
   the role and tablespace information. If you use tablespaces, make sure
   that the tablespace paths in the dump are appropriate for the new
   installation.

   pg_dumpall works by emitting commands to re-create roles, tablespaces,
   and empty databases, then invoking pg_dump for each database. This
   means that while each database will be internally consistent, the
   snapshots of different databases are not synchronized.

   Cluster-wide data can be dumped alone using the pg_dumpall
   --globals-only option. This is necessary to fully back up the cluster
   if running the pg_dump command on individual databases.
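
   One way to combine --globals-only with per-database dumps is sketched
   below. The function name backup_cluster, the output file layout, and
   the query used to list databases are assumptions made for
   illustration. As with pg_dumpall, the per-database snapshots are not
   synchronized with each other.

```shell
# Back up a cluster as one globals file plus one plain-text dump per
# database. Usage: backup_cluster OUTDIR
backup_cluster() {
    outdir=$1
    mkdir -p "$outdir" || return 1

    # Roles and tablespaces are cluster-wide, so pg_dump omits them.
    pg_dumpall --globals-only > "$outdir/globals.sql" || return 1

    # One database name per line, skipping templates and databases
    # that do not accept connections.
    dbs=$(psql -X -At -c "SELECT datname FROM pg_database WHERE datallowconn AND NOT datistemplate" postgres) || return 1

    while IFS= read -r db; do
        [ -n "$db" ] || continue
        pg_dump "$db" > "$outdir/$db.sql" || return 1
    done <<EOF
$dbs
EOF
}
```

   Database names containing a slash would not map to valid file names
   here; a real script would need to handle that case.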

25.1.3. Handling Large Databases

   Some operating systems have maximum file size limits that cause
   problems when creating large pg_dump output files. Fortunately, pg_dump
   can write to the standard output, so you can use standard Unix tools to
   work around this potential problem. There are several possible methods:

   Use compressed dumps.  You can use your favorite compression program,
   for example gzip:
pg_dump dbname | gzip > filename.gz

   Reload with:
gunzip -c filename.gz | psql dbname

   or:
cat filename.gz | gunzip | psql dbname

   Use split.  The split command allows you to split the output into
   smaller files that are acceptable in size to the underlying file
   system. For example, to make 2 gigabyte chunks:
pg_dump dbname | split -b 2G - filename

   Reload with:
cat filename* | psql dbname

   If using GNU split, it is possible to use it and gzip together:
pg_dump dbname | split -b 2G --filter='gzip > $FILE.gz'

   The resulting compressed pieces can be restored by decompressing them
   in order with zcat and piping the output to psql.
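
   The round trip through GNU split and gzip can be sketched with two
   helper functions. The names split_gzip and join_gzip are
   hypothetical, and unlike the command above this sketch passes an
   explicit output prefix so the pieces are easy to find again.

```shell
# Split standard input into gzip-compressed pieces named PREFIXaa.gz,
# PREFIXab.gz, and so on. Requires GNU split.
# Usage: pg_dump dbname | split_gzip SIZE PREFIX
split_gzip() {
    # $FILE is left unexpanded here; split sets it for each piece when
    # it runs the filter command.
    split -b "$1" --filter='gzip > $FILE.gz' - "$2"
}

# Decompress the pieces in name order and write the rejoined stream to
# standard output.
# Usage: join_gzip PREFIX | psql dbname
join_gzip() {
    zcat "$1"*.gz
}
```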

   Use pg_dump's custom dump format.  If PostgreSQL was built on a system
   with the zlib compression library installed, the custom dump format
   will compress data as it writes it to the output file. This will
   produce dump file sizes similar to using gzip, but it has the added
   advantage that tables can be restored selectively. The following
   command dumps a database using the custom dump format:
pg_dump -Fc dbname > filename

   A custom-format dump is not a script for psql, but instead must be
   restored with pg_restore, for example:
pg_restore -d dbname filename

   See the pg_dump and pg_restore reference pages for details.
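
   To illustrate the selective restore that the custom format allows,
   the sketch below lists an archive's contents and restores a single
   table from it. The function names are hypothetical; -l and -t are
   pg_restore options, and the reference page describes their exact
   behavior, including what -t does not restore.

```shell
# Print the table of contents of a custom-format archive.
# Usage: list_archive ARCHIVE
list_archive() {
    pg_restore -l "$1"
}

# Restore only the named table from a custom-format archive.
# Usage: restore_table DBNAME TABLE ARCHIVE
restore_table() {
    pg_restore -d "$1" -t "$2" "$3"
}
```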

   For very large databases, you might need to combine split with one of
   the other two approaches.

   Use pg_dump's parallel dump feature.  To speed up the dump of a large
   database, you can use pg_dump's parallel mode. This will dump multiple
   tables at the same time. You can control the degree of parallelism with
   the -j parameter. Parallel dumps are only supported for the "directory"
   archive format.
pg_dump -j num -F d -f out.dir dbname

   You can use pg_restore -j to restore a dump in parallel. This will work
   for any archive of either the "custom" or the "directory" archive mode,
   whether or not it has been created with pg_dump -j.
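
   A matching dump and restore pair using -j might look like the
   following sketch. The function names and argument order are
   hypothetical, and a suitable job count depends on the hardware and
   on the load the server can tolerate.

```shell
# Dump DBNAME into directory OUTDIR using JOBS parallel workers.
# Usage: dump_parallel JOBS OUTDIR DBNAME
dump_parallel() {
    pg_dump -j "$1" -F d -f "$2" "$3"
}

# Restore a directory-format or custom-format archive into DBNAME using
# JOBS parallel workers.
# Usage: restore_parallel JOBS DBNAME ARCHIVE
restore_parallel() {
    pg_restore -j "$1" -d "$2" "$3"
}
```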