begriffs open source - ai-pg/blob - full-docs/txt/hot-standby.txt

   1
   2 26.4. Hot Standby #
   3
   4    26.4.1. User's Overview
   5    26.4.2. Handling Query Conflicts
   6    26.4.3. Administrator's Overview
   7    26.4.4. Hot Standby Parameter Reference
   8    26.4.5. Caveats
   9
  10    Hot standby is the term used to describe the ability to connect to the
  11    server and run read-only queries while the server is in archive
  12    recovery or standby mode. This is useful both for replication purposes
  13    and for restoring a backup to a desired state with great precision. The
  14    term hot standby also refers to the ability of the server to move from
  15    recovery through to normal operation while users continue running
  16    queries and/or keep their connections open.
  17
  18    Running queries in hot standby mode is similar to normal query
  19    operation, though there are several usage and administrative
  20    differences explained below.
  21
  22 26.4.1. User's Overview #
  23
  24    When the hot_standby parameter is set to true on a standby server, it
  25    will begin accepting connections once the recovery has brought the
  26    system to a consistent state and be ready for hot standby. All such
  27    connections are strictly read-only; not even temporary tables may be
  28    written.
  29
  30    The data on the standby takes some time to arrive from the primary
  31    server so there will be a measurable delay between primary and standby.
  32    Running the same query nearly simultaneously on both primary and
  33    standby might therefore return differing results. We say that data on
  34    the standby is eventually consistent with the primary. Once the commit
  35    record for a transaction is replayed on the standby, the changes made
  36    by that transaction will be visible to any new snapshots taken on the
  37    standby. Snapshots may be taken at the start of each query or at the
  38    start of each transaction, depending on the current transaction
  39    isolation level. For more details, see Section 13.2.
  40
  41    Transactions started during hot standby may issue the following
  42    commands:
  43      * Query access: SELECT, COPY TO
  44      * Cursor commands: DECLARE, FETCH, CLOSE
  45      * Settings: SHOW, SET, RESET
  46      * Transaction management commands:
  47           + BEGIN, END, ABORT, START TRANSACTION
  48           + SAVEPOINT, RELEASE, ROLLBACK TO SAVEPOINT
  49           + EXCEPTION blocks and other internal subtransactions
  50      * LOCK TABLE, though only when explicitly in one of these modes:
  51        ACCESS SHARE, ROW SHARE or ROW EXCLUSIVE.
  52      * Plans and resources: PREPARE, EXECUTE, DEALLOCATE, DISCARD
  53      * Plugins and extensions: LOAD
  54      * UNLISTEN
  55
  56    Transactions started during hot standby will never be assigned a
  57    transaction ID and cannot write to the system write-ahead log.
  58    Therefore, the following actions will produce error messages:
  59      * Data Manipulation Language (DML): INSERT, UPDATE, DELETE, MERGE,
  60        COPY FROM, TRUNCATE. Note that there are no allowed actions that
  61        result in a trigger being executed during recovery. This
  62        restriction applies even to temporary tables, because table rows
  63        cannot be read or written without assigning a transaction ID, which
  64        is currently not possible in a hot standby environment.
  65      * Data Definition Language (DDL): CREATE, DROP, ALTER, COMMENT. This
  66        restriction applies even to temporary tables, because carrying out
  67        these operations would require updating the system catalog tables.
  68      * SELECT ... FOR SHARE | UPDATE, because row locks cannot be taken
  69        without updating the underlying data files.
  70      * Rules on SELECT statements that generate DML commands.
  71      * LOCK that explicitly requests a mode higher than ROW EXCLUSIVE
  72        MODE.
  73      * LOCK in short default form, since it requests ACCESS EXCLUSIVE
  74        MODE.
  75      * Transaction management commands that explicitly set non-read-only
  76        state:
  77           + BEGIN READ WRITE, START TRANSACTION READ WRITE
  78           + SET TRANSACTION READ WRITE, SET SESSION CHARACTERISTICS AS
  79             TRANSACTION READ WRITE
  80           + SET transaction_read_only = off
  81      * Two-phase commit commands: PREPARE TRANSACTION, COMMIT PREPARED,
  82        ROLLBACK PREPARED because even read-only transactions need to write
  83        WAL in the prepare phase (the first phase of two phase commit).
  84      * Sequence updates: nextval(), setval()
  85      * LISTEN, NOTIFY
  86
  87    In normal operation, “read-only” transactions are allowed to use LISTEN
  88    and NOTIFY, so hot standby sessions operate under slightly tighter
  89    restrictions than ordinary read-only sessions. It is possible that some
  90    of these restrictions might be loosened in a future release.
  91
  92    During hot standby, the parameter transaction_read_only is always true
  93    and may not be changed. But as long as no attempt is made to modify the
  94    database, connections during hot standby will act much like any other
  95    database connection. If failover or switchover occurs, the database
  96    will switch to normal processing mode. Sessions will remain connected
  97    while the server changes mode. Once hot standby finishes, it will be
  98    possible to initiate read-write transactions (even from a session begun
  99    during hot standby).
 100
 101    Users can determine whether hot standby is currently active for their
 102    session by issuing SHOW in_hot_standby. (In server versions before 14,
 103    the in_hot_standby parameter did not exist; a workable substitute
 104    method for older servers is SHOW transaction_read_only.) In addition, a
 105    set of functions (Table 9.98) allow users to access information about
 106    the standby server. These allow you to write programs that are aware of
 107    the current state of the database. These can be used to monitor the
 108    progress of recovery, or to allow you to write complex programs that
 109    restore the database to particular states.
 110
 111 26.4.2. Handling Query Conflicts #
 112
 113    The primary and standby servers are in many ways loosely connected.
 114    Actions on the primary will have an effect on the standby. As a result,
 115    there is potential for negative interactions or conflicts between them.
 116    The easiest conflict to understand is performance: if a huge data load
 117    is taking place on the primary then this will generate a similar stream
 118    of WAL records on the standby, so standby queries may contend for
 119    system resources, such as I/O.
 120
 121    There are also additional types of conflict that can occur with hot
 122    standby. These conflicts are hard conflicts in the sense that queries
 123    might need to be canceled and, in some cases, sessions disconnected to
 124    resolve them. The user is provided with several ways to handle these
 125    conflicts. Conflict cases include:
 126      * Access Exclusive locks taken on the primary server, including both
 127        explicit LOCK commands and various DDL actions, conflict with table
 128        accesses in standby queries.
 129      * Dropping a tablespace on the primary conflicts with standby queries
 130        using that tablespace for temporary work files.
 131      * Dropping a database on the primary conflicts with sessions
 132        connected to that database on the standby.
 133      * Application of a vacuum cleanup record from WAL conflicts with
 134        standby transactions whose snapshots can still “see” any of the
 135        rows to be removed.
 136      * Application of a vacuum cleanup record from WAL conflicts with
 137        queries accessing the target page on the standby, whether or not
 138        the data to be removed is visible.
 139
 140    On the primary server, these cases simply result in waiting; and the
 141    user might choose to cancel either of the conflicting actions. However,
 142    on the standby there is no choice: the WAL-logged action already
 143    occurred on the primary so the standby must not fail to apply it.
 144    Furthermore, allowing WAL application to wait indefinitely may be very
 145    undesirable, because the standby's state will become increasingly far
 146    behind the primary's. Therefore, a mechanism is provided to forcibly
 147    cancel standby queries that conflict with to-be-applied WAL records.
 148
 149    An example of the problem situation is an administrator on the primary
 150    server running DROP TABLE on a table that is currently being queried on
 151    the standby server. Clearly the standby query cannot continue if the
 152    DROP TABLE is applied on the standby. If this situation occurred on the
 153    primary, the DROP TABLE would wait until the other query had finished.
 154    But when DROP TABLE is run on the primary, the primary doesn't have
 155    information about what queries are running on the standby, so it will
 156    not wait for any such standby queries. The WAL change records come
 157    through to the standby while the standby query is still running,
 158    causing a conflict. The standby server must either delay application of
 159    the WAL records (and everything after them, too) or else cancel the
 160    conflicting query so that the DROP TABLE can be applied.
 161
 162    When a conflicting query is short, it's typically desirable to allow it
 163    to complete by delaying WAL application for a little bit; but a long
 164    delay in WAL application is usually not desirable. So the cancel
 165    mechanism has parameters, max_standby_archive_delay and
 166    max_standby_streaming_delay, that define the maximum allowed delay in
 167    WAL application. Conflicting queries will be canceled once it has taken
 168    longer than the relevant delay setting to apply any newly-received WAL
 169    data. There are two parameters so that different delay values can be
 170    specified for the case of reading WAL data from an archive (i.e.,
 171    initial recovery from a base backup or “catching up” a standby server
 172    that has fallen far behind) versus reading WAL data via streaming
 173    replication.
 174
 175    In a standby server that exists primarily for high availability, it's
 176    best to set the delay parameters relatively short, so that the server
 177    cannot fall far behind the primary due to delays caused by standby
 178    queries. However, if the standby server is meant for executing
 179    long-running queries, then a high or even infinite delay value may be
 180    preferable. Keep in mind however that a long-running query could cause
 181    other sessions on the standby server to not see recent changes on the
 182    primary, if it delays application of WAL records.
 183
 184    Once the delay specified by max_standby_archive_delay or
 185    max_standby_streaming_delay has been exceeded, conflicting queries will
 186    be canceled. This usually results just in a cancellation error,
 187    although in the case of replaying a DROP DATABASE the entire
 188    conflicting session will be terminated. Also, if the conflict is over a
 189    lock held by an idle transaction, the conflicting session is terminated
 190    (this behavior might change in the future).
 191
 192    Canceled queries may be retried immediately (after beginning a new
 193    transaction, of course). Since query cancellation depends on the nature
 194    of the WAL records being replayed, a query that was canceled may well
 195    succeed if it is executed again.
 196
 197    Keep in mind that the delay parameters are compared to the elapsed time
 198    since the WAL data was received by the standby server. Thus, the grace
 199    period allowed to any one query on the standby is never more than the
 200    delay parameter, and could be considerably less if the standby has
 201    already fallen behind as a result of waiting for previous queries to
 202    complete, or as a result of being unable to keep up with a heavy update
 203    load.
 204
 205    The most common reason for conflict between standby queries and WAL
 206    replay is “early cleanup”. Normally, PostgreSQL allows cleanup of old
 207    row versions when there are no transactions that need to see them to
 208    ensure correct visibility of data according to MVCC rules. However,
 209    this rule can only be applied for transactions executing on the
 210    primary. So it is possible that cleanup on the primary will remove row
 211    versions that are still visible to a transaction on the standby.
 212
 213    Row version cleanup isn't the only potential cause of conflicts with
 214    standby queries. All index-only scans (including those that run on
 215    standbys) must use an MVCC snapshot that “agrees” with the visibility
 216    map. Conflicts are therefore required whenever VACUUM sets a page as
 217    all-visible in the visibility map containing one or more rows not
 218    visible to all standby queries. So even running VACUUM against a table
 219    with no updated or deleted rows requiring cleanup might lead to
 220    conflicts.
 221
 222    Users should be clear that tables that are regularly and heavily
 223    updated on the primary server will quickly cause cancellation of longer
 224    running queries on the standby. In such cases the setting of a finite
 225    value for max_standby_archive_delay or max_standby_streaming_delay can
 226    be considered similar to setting statement_timeout.
 227
 228    Remedial possibilities exist if the number of standby-query
 229    cancellations is found to be unacceptable. The first option is to set
 230    the parameter hot_standby_feedback, which prevents VACUUM from removing
 231    recently-dead rows and so cleanup conflicts do not occur. If you do
 232    this, you should note that this will delay cleanup of dead rows on the
 233    primary, which may result in undesirable table bloat. However, the
 234    cleanup situation will be no worse than if the standby queries were
 235    running directly on the primary server, and you are still getting the
 236    benefit of off-loading execution onto the standby. If standby servers
 237    connect and disconnect frequently, you might want to make adjustments
 238    to handle the period when hot_standby_feedback feedback is not being
 239    provided. For example, consider increasing max_standby_archive_delay so
 240    that queries are not rapidly canceled by conflicts in WAL archive files
 241    during disconnected periods. You should also consider increasing
 242    max_standby_streaming_delay to avoid rapid cancellations by
 243    newly-arrived streaming WAL entries after reconnection.
 244
 245    The number of query cancels and the reason for them can be viewed using
 246    the pg_stat_database_conflicts system view on the standby server. The
 247    pg_stat_database system view also contains summary information.
 248
 249    Users can control whether a log message is produced when WAL replay is
 250    waiting longer than deadlock_timeout for conflicts. This is controlled
 251    by the log_recovery_conflict_waits parameter.
 252
 253 26.4.3. Administrator's Overview #
 254
 255    If hot_standby is on in postgresql.conf (the default value) and there
 256    is a standby.signal file present, the server will run in hot standby
 257    mode. However, it may take some time for hot standby connections to be
 258    allowed, because the server will not accept connections until it has
 259    completed sufficient recovery to provide a consistent state against
 260    which queries can run. During this period, clients that attempt to
 261    connect will be refused with an error message. To confirm the server
 262    has come up, either loop trying to connect from the application, or
 263    look for these messages in the server logs:
 264 LOG:  entering standby mode
 265
 266 ... then some time later ...
 267
 268 LOG:  consistent recovery state reached
 269 LOG:  database system is ready to accept read-only connections
 270
 271    Consistency information is recorded once per checkpoint on the primary.
 272    It is not possible to enable hot standby when reading WAL written
 273    during a period when wal_level was not set to replica or logical on the
 274    primary. Even after reaching a consistent state, the recovery snapshot
 275    may not be ready for hot standby if both of the following conditions
 276    are met, delaying accepting read-only connections. To enable hot
 277    standby, long-lived write transactions with more than 64
 278    subtransactions need to be closed on the primary.
 279      * A write transaction has more than 64 subtransactions
 280      * Very long-lived write transactions
 281
 282    If you are running file-based log shipping ("warm standby"), you might
 283    need to wait until the next WAL file arrives, which could be as long as
 284    the archive_timeout setting on the primary.
 285
 286    The settings of some parameters determine the size of shared memory for
 287    tracking transaction IDs, locks, and prepared transactions. These
 288    shared memory structures must be no smaller on a standby than on the
 289    primary in order to ensure that the standby does not run out of shared
 290    memory during recovery. For example, if the primary had used a prepared
 291    transaction but the standby had not allocated any shared memory for
 292    tracking prepared transactions, then recovery could not continue until
 293    the standby's configuration is changed. The parameters affected are:
 294      * max_connections
 295      * max_prepared_transactions
 296      * max_locks_per_transaction
 297      * max_wal_senders
 298      * max_worker_processes
 299
 300    The easiest way to ensure this does not become a problem is to have
 301    these parameters set on the standbys to values equal to or greater than
 302    on the primary. Therefore, if you want to increase these values, you
 303    should do so on all standby servers first, before applying the changes
 304    to the primary server. Conversely, if you want to decrease these
 305    values, you should do so on the primary server first, before applying
 306    the changes to all standby servers. Keep in mind that when a standby is
 307    promoted, it becomes the new reference for the required parameter
 308    settings for the standbys that follow it. Therefore, to avoid this
 309    becoming a problem during a switchover or failover, it is recommended
 310    to keep these settings the same on all standby servers.
 311
 312    The WAL tracks changes to these parameters on the primary. If a hot
 313    standby processes WAL that indicates that the current value on the
 314    primary is higher than its own value, it will log a warning and pause
 315    recovery, for example:
 316 WARNING:  hot standby is not possible because of insufficient parameter settings
 317 DETAIL:  max_connections = 80 is a lower setting than on the primary server, whe
 318 re its value was 100.
 319 LOG:  recovery has paused
 320 DETAIL:  If recovery is unpaused, the server will shut down.
 321 HINT:  You can then restart the server after making the necessary configuration
 322 changes.
 323
 324    At that point, the settings on the standby need to be updated and the
 325    instance restarted before recovery can continue. If the standby is not
 326    a hot standby, then when it encounters the incompatible parameter
 327    change, it will shut down immediately without pausing, since there is
 328    then no value in keeping it up.
 329
 330    It is important that the administrator select appropriate settings for
 331    max_standby_archive_delay and max_standby_streaming_delay. The best
 332    choices vary depending on business priorities. For example if the
 333    server is primarily tasked as a High Availability server, then you will
 334    want low delay settings, perhaps even zero, though that is a very
 335    aggressive setting. If the standby server is tasked as an additional
 336    server for decision support queries then it might be acceptable to set
 337    the maximum delay values to many hours, or even -1 which means wait
 338    forever for queries to complete.
 339
 340    Transaction status "hint bits" written on the primary are not
 341    WAL-logged, so data on the standby will likely re-write the hints again
 342    on the standby. Thus, the standby server will still perform disk writes
 343    even though all users are read-only; no changes occur to the data
 344    values themselves. Users will still write large sort temporary files
 345    and re-generate relcache info files, so no part of the database is
 346    truly read-only during hot standby mode. Note also that writes to
 347    remote databases using dblink module, and other operations outside the
 348    database using PL functions will still be possible, even though the
 349    transaction is read-only locally.
 350
 351    The following types of administration commands are not accepted during
 352    recovery mode:
 353      * Data Definition Language (DDL): e.g., CREATE INDEX
 354      * Privilege and Ownership: GRANT, REVOKE, REASSIGN
 355      * Maintenance commands: ANALYZE, VACUUM, CLUSTER, REINDEX
 356
 357    Again, note that some of these commands are actually allowed during
 358    "read only" mode transactions on the primary.
 359
 360    As a result, you cannot create additional indexes that exist solely on
 361    the standby, nor statistics that exist solely on the standby. If these
 362    administration commands are needed, they should be executed on the
 363    primary, and eventually those changes will propagate to the standby.
 364
 365    pg_cancel_backend() and pg_terminate_backend() will work on user
 366    backends, but not the startup process, which performs recovery.
 367    pg_stat_activity does not show recovering transactions as active. As a
 368    result, pg_prepared_xacts is always empty during recovery. If you wish
 369    to resolve in-doubt prepared transactions, view pg_prepared_xacts on
 370    the primary and issue commands to resolve transactions there or resolve
 371    them after the end of recovery.
 372
 373    pg_locks will show locks held by backends, as normal. pg_locks also
 374    shows a virtual transaction managed by the startup process that owns
 375    all AccessExclusiveLocks held by transactions being replayed by
 376    recovery. Note that the startup process does not acquire locks to make
 377    database changes, and thus locks other than AccessExclusiveLocks do not
 378    show in pg_locks for the Startup process; they are just presumed to
 379    exist.
 380
 381    The Nagios plugin check_pgsql will work, because the simple information
 382    it checks for exists. The check_postgres monitoring script will also
 383    work, though some reported values could give different or confusing
 384    results. For example, last vacuum time will not be maintained, since no
 385    vacuum occurs on the standby. Vacuums running on the primary do still
 386    send their changes to the standby.
 387
 388    WAL file control commands will not work during recovery, e.g.,
 389    pg_backup_start, pg_switch_wal etc.
 390
 391    Dynamically loadable modules work, including pg_stat_statements.
 392
 393    Advisory locks work normally in recovery, including deadlock detection.
 394    Note that advisory locks are never WAL logged, so it is impossible for
 395    an advisory lock on either the primary or the standby to conflict with
 396    WAL replay. Nor is it possible to acquire an advisory lock on the
 397    primary and have it initiate a similar advisory lock on the standby.
 398    Advisory locks relate only to the server on which they are acquired.
 399
 400    Trigger-based replication systems such as Slony, Londiste and Bucardo
 401    won't run on the standby at all, though they will run happily on the
 402    primary server as long as the changes are not sent to standby servers
 403    to be applied. WAL replay is not trigger-based so you cannot relay from
 404    the standby to any system that requires additional database writes or
 405    relies on the use of triggers.
 406
 407    New OIDs cannot be assigned, though some UUID generators may still work
 408    as long as they do not rely on writing new status to the database.
 409
 410    Currently, temporary table creation is not allowed during read-only
 411    transactions, so in some cases existing scripts will not run correctly.
 412    This restriction might be relaxed in a later release. This is both an
 413    SQL standard compliance issue and a technical issue.
 414
 415    DROP TABLESPACE can only succeed if the tablespace is empty. Some
 416    standby users may be actively using the tablespace via their
 417    temp_tablespaces parameter. If there are temporary files in the
 418    tablespace, all active queries are canceled to ensure that temporary
 419    files are removed, so the tablespace can be removed and WAL replay can
 420    continue.
 421
 422    Running DROP DATABASE or ALTER DATABASE ... SET TABLESPACE on the
 423    primary will generate a WAL entry that will cause all users connected
 424    to that database on the standby to be forcibly disconnected. This
 425    action occurs immediately, whatever the setting of
 426    max_standby_streaming_delay. Note that ALTER DATABASE ... RENAME does
 427    not disconnect users, which in most cases will go unnoticed, though
 428    might in some cases cause a program confusion if it depends in some way
 429    upon database name.
 430
 431    In normal (non-recovery) mode, if you issue DROP USER or DROP ROLE for
 432    a role with login capability while that user is still connected then
 433    nothing happens to the connected user — they remain connected. The user
 434    cannot reconnect however. This behavior applies in recovery also, so a
 435    DROP USER on the primary does not disconnect that user on the standby.
 436
 437    The cumulative statistics system is active during recovery. All scans,
 438    reads, blocks, index usage, etc., will be recorded normally on the
 439    standby. However, WAL replay will not increment relation and database
 440    specific counters. I.e. replay will not increment pg_stat_all_tables
 441    columns (like n_tup_ins), nor will reads or writes performed by the
 442    startup process be tracked in the pg_statio_ views, nor will associated
 443    pg_stat_database columns be incremented.
 444
 445    Autovacuum is not active during recovery. It will start normally at the
 446    end of recovery.
 447
 448    The checkpointer process and the background writer process are active
 449    during recovery. The checkpointer process will perform restartpoints
 450    (similar to checkpoints on the primary) and the background writer
 451    process will perform normal block cleaning activities. This can include
 452    updates of the hint bit information stored on the standby server. The
 453    CHECKPOINT command is accepted during recovery, though it performs a
 454    restartpoint rather than a new checkpoint.
 455
 456 26.4.4. Hot Standby Parameter Reference #
 457
 458    Various parameters have been mentioned above in Section 26.4.2 and
 459    Section 26.4.3.
 460
 461    On the primary, the wal_level parameter can be used.
 462    max_standby_archive_delay and max_standby_streaming_delay have no
 463    effect if set on the primary.
 464
 465    On the standby, parameters hot_standby, max_standby_archive_delay and
 466    max_standby_streaming_delay can be used.
 467
 468 26.4.5. Caveats #
 469
 470    There are several limitations of hot standby. These can and probably
 471    will be fixed in future releases:
 472      * Full knowledge of running transactions is required before snapshots
 473        can be taken. Transactions that use large numbers of
 474        subtransactions (currently greater than 64) will delay the start of
 475        read-only connections until the completion of the longest running
 476        write transaction. If this situation occurs, explanatory messages
 477        will be sent to the server log.
 478      * Valid starting points for standby queries are generated at each
 479        checkpoint on the primary. If the standby is shut down while the
 480        primary is in a shutdown state, it might not be possible to
 481        re-enter hot standby until the primary is started up, so that it
 482        generates further starting points in the WAL logs. This situation
 483        isn't a problem in the most common situations where it might
 484        happen. Generally, if the primary is shut down and not available
 485        anymore, that's likely due to a serious failure that requires the
 486        standby being converted to operate as the new primary anyway. And
 487        in situations where the primary is being intentionally taken down,
 488        coordinating to make sure the standby becomes the new primary
 489        smoothly is also standard procedure.
 490      * At the end of recovery, AccessExclusiveLocks held by prepared
 491        transactions will require twice the normal number of lock table
 492        entries. If you plan on running either a large number of concurrent
 493        prepared transactions that normally take AccessExclusiveLocks, or
 494        you plan on having one large transaction that takes many
 495        AccessExclusiveLocks, you are advised to select a larger value of
 496        max_locks_per_transaction, perhaps as much as twice the value of
 497        the parameter on the primary server. You need not consider this at
 498        all if your setting of max_prepared_transactions is 0.
 499      * The Serializable transaction isolation level is not yet available
 500        in hot standby. (See Section 13.2.3 and Section 13.4.1 for
 501        details.) An attempt to set a transaction to the serializable
 502        isolation level in hot standby mode will generate an error.