   1
   2 26.2. Log-Shipping Standby Servers
   3
   4    26.2.1. Planning
   5    26.2.2. Standby Server Operation
   6    26.2.3. Preparing the Primary for Standby Servers
   7    26.2.4. Setting Up a Standby Server
   8    26.2.5. Streaming Replication
   9    26.2.6. Replication Slots
  10    26.2.7. Cascading Replication
  11    26.2.8. Synchronous Replication
  12    26.2.9. Continuous Archiving in Standby
  13
  14    Continuous archiving can be used to create a high availability (HA)
  15    cluster configuration with one or more standby servers ready to take
  16    over operations if the primary server fails. This capability is widely
  17    referred to as warm standby or log shipping.
  18
  19    The primary and standby server work together to provide this
  20    capability, though the servers are only loosely coupled. The primary
  21    server operates in continuous archiving mode, while each standby server
  22    operates in continuous recovery mode, reading the WAL files from the
  23    primary. No changes to the database tables are required to enable this
  24    capability, so it offers low administration overhead compared to some
  25    other replication solutions. This configuration also has relatively low
  26    performance impact on the primary server.
  27
  28    Directly moving WAL records from one database server to another is
  29    typically described as log shipping. PostgreSQL implements file-based
  30    log shipping by transferring WAL records one file (WAL segment) at a
  31    time. WAL files (16MB) can be shipped easily and cheaply over any
  32    distance, whether it be to an adjacent system, another system at the
  33    same site, or another system on the far side of the globe. The
  34    bandwidth required for this technique varies according to the
  35    transaction rate of the primary server. Record-based log shipping is
  36    more granular and streams WAL changes incrementally over a network
  37    connection (see Section 26.2.5).
  38
  39    It should be noted that log shipping is asynchronous, i.e., the WAL
  40    records are shipped after transaction commit. As a result, there is a
  41    window for data loss should the primary server suffer a catastrophic
  42    failure; transactions not yet shipped will be lost. The size of the
  43    data loss window in file-based log shipping can be limited by use of
  44    the archive_timeout parameter, which can be set as low as a few
   45    seconds. However, such a low setting will substantially increase the
  46    bandwidth required for file shipping. Streaming replication (see
  47    Section 26.2.5) allows a much smaller window of data loss.
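
         As a rough illustration of that trade-off, the Python sketch below
         estimates an upper bound on shipping bandwidth for a given
         archive_timeout. It assumes that a segment switch is forced at every
         timeout interval and that each archived file is shipped uncompressed
         at the full 16MB segment size. The function name and the sample
         timeouts are hypothetical.

```python
SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size (16MB)


def worst_case_archive_bandwidth(archive_timeout_s: float,
                                 segment_bytes: int = SEGMENT_BYTES) -> float:
    """Upper bound on bytes/second shipped when a segment switch is forced
    every archive_timeout_s seconds (assumption: files are not compressed)."""
    if archive_timeout_s <= 0:
        raise ValueError("archive_timeout_s must be positive")
    return segment_bytes / archive_timeout_s


if __name__ == "__main__":
    # Hypothetical timeouts, in seconds.
    for timeout in (60, 10, 5):
        mib_per_s = worst_case_archive_bandwidth(timeout) / (1024 * 1024)
        print(f"archive_timeout={timeout}s -> up to {mib_per_s:.2f} MiB/s")
```

         A busy primary that fills segments faster than the timeout will ship
         more than this figure; the estimate only covers the cost added by
         forced switches.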
  48
  49    Recovery performance is sufficiently good that the standby will
  50    typically be only moments away from full availability once it has been
  51    activated. As a result, this is called a warm standby configuration
  52    which offers high availability. Restoring a server from an archived
  53    base backup and rollforward will take considerably longer, so that
  54    technique only offers a solution for disaster recovery, not high
  55    availability. A standby server can also be used for read-only queries,
  56    in which case it is called a hot standby server. See Section 26.4 for
  57    more information.
  58
   59 26.2.1. Planning
  60
  61    It is usually wise to create the primary and standby servers so that
  62    they are as similar as possible, at least from the perspective of the
  63    database server. In particular, the path names associated with
  64    tablespaces will be passed across unmodified, so both primary and
  65    standby servers must have the same mount paths for tablespaces if that
  66    feature is used. Keep in mind that if CREATE TABLESPACE is executed on
  67    the primary, any new mount point needed for it must be created on the
  68    primary and all standby servers before the command is executed.
  69    Hardware need not be exactly the same, but experience shows that
  70    maintaining two identical systems is easier than maintaining two
  71    dissimilar ones over the lifetime of the application and system. In any
  72    case the hardware architecture must be the same — shipping from, say, a
  73    32-bit to a 64-bit system will not work.
  74
  75    In general, log shipping between servers running different major
  76    PostgreSQL release levels is not possible. It is the policy of the
  77    PostgreSQL Global Development Group not to make changes to disk formats
  78    during minor release upgrades, so it is likely that running different
  79    minor release levels on primary and standby servers will work
  80    successfully. However, no formal support for that is offered and you
  81    are advised to keep primary and standby servers at the same release
  82    level as much as possible. When updating to a new minor release, the
  83    safest policy is to update the standby servers first — a new minor
  84    release is more likely to be able to read WAL files from a previous
  85    minor release than vice versa.
  86
   87 26.2.2. Standby Server Operation
  88
  89    A server enters standby mode if a standby.signal file exists in the
  90    data directory when the server is started.
  91
  92    In standby mode, the server continuously applies WAL received from the
  93    primary server. The standby server can read WAL from a WAL archive (see
  94    restore_command) or directly from the primary over a TCP connection
  95    (streaming replication). The standby server will also attempt to
  96    restore any WAL found in the standby cluster's pg_wal directory. That
   97    typically happens after a server restart, when the standby again
   98    replays WAL that was streamed from the primary before the restart, but
  99    you can also manually copy files to pg_wal at any time to have them
 100    replayed.
 101
 102    At startup, the standby begins by restoring all WAL available in the
 103    archive location, calling restore_command. Once it reaches the end of
 104    WAL available there and restore_command fails, it tries to restore any
 105    WAL available in the pg_wal directory. If that fails, and streaming
 106    replication has been configured, the standby tries to connect to the
 107    primary server and start streaming WAL from the last valid record found
 108    in archive or pg_wal. If that fails or streaming replication is not
 109    configured, or if the connection is later disconnected, the standby
  110    goes back to the first step and tries to restore the file from the
  111    archive again. This loop of retries from the archive, pg_wal, and via
  112    streaming replication goes on until the server is stopped or is promoted.
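
         The retry cycle described above can be pictured with the following
         Python sketch. It is a simplified model, not the server's actual
         implementation: the four callables are hypothetical stand-ins for
         restore_command, replay from pg_wal, the streaming connection, and
         the stop or promote request.

```python
from typing import Callable, List


def standby_wal_loop(restore_from_archive: Callable[[], bool],
                     replay_from_pg_wal: Callable[[], bool],
                     stream_from_primary: Callable[[], bool],
                     should_stop: Callable[[], bool]) -> List[str]:
    """Cycle through the three WAL sources until should_stop() is true.

    Each source callable returns True while it still yields WAL and False
    once it is exhausted or fails. Returns the order in which the sources
    were tried.
    """
    attempts: List[str] = []
    while not should_stop():
        # First step: drain the archive until the restore attempt fails.
        attempts.append("archive")
        while restore_from_archive():
            pass
        # Next: replay whatever is already present in pg_wal.
        attempts.append("pg_wal")
        while replay_from_pg_wal():
            pass
        # Last: stream from the primary. A standby without streaming
        # configured is modelled by a callable that returns False at once.
        attempts.append("streaming")
        while stream_from_primary():
            pass
        # Streaming ended or was unavailable: go back to the archive.
    return attempts
```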
 113
 114    Standby mode is exited and the server switches to normal operation when
 115    pg_ctl promote is run, or pg_promote() is called. Before failover, any
 116    WAL immediately available in the archive or in pg_wal will be restored,
 117    but no attempt is made to connect to the primary.
 118
  119 26.2.3. Preparing the Primary for Standby Servers
 120
 121    Set up continuous archiving on the primary to an archive directory
 122    accessible from the standby, as described in Section 25.3. The archive
 123    location should be accessible from the standby even when the primary is
 124    down, i.e., it should reside on the standby server itself or another
 125    trusted server, not on the primary server.
 126
 127    If you want to use streaming replication, set up authentication on the
 128    primary server to allow replication connections from the standby
 129    server(s); that is, create a role and provide a suitable entry or
 130    entries in pg_hba.conf with the database field set to replication. Also
 131    ensure max_wal_senders is set to a sufficiently large value in the
 132    configuration file of the primary server. If replication slots will be
 133    used, ensure that max_replication_slots is set sufficiently high as
 134    well.
 135
 136    Take a base backup as described in Section 25.3.2 to bootstrap the
 137    standby server.
 138
  139 26.2.4. Setting Up a Standby Server
 140
 141    To set up the standby server, restore the base backup taken from
  142    the primary server (see Section 25.3.5). Create a file standby.signal in
 143    the standby's cluster data directory. Set restore_command to a simple
 144    command to copy files from the WAL archive. If you plan to have
 145    multiple standby servers for high availability purposes, make sure that
 146    recovery_target_timeline is set to latest (the default), to make the
 147    standby server follow the timeline change that occurs at failover to
 148    another standby.
 149
 150 Note
 151
 152    restore_command should return immediately if the file does not exist;
 153    the server will retry the command again if necessary.
 154
 155    If you want to use streaming replication, fill in primary_conninfo with
 156    a libpq connection string, including the host name (or IP address) and
 157    any additional details needed to connect to the primary server. If the
 158    primary needs a password for authentication, the password needs to be
 159    specified in primary_conninfo as well.
 160
 161    If you're setting up the standby server for high availability purposes,
 162    set up WAL archiving, connections and authentication like the primary
 163    server, because the standby server will work as a primary server after
 164    failover.
 165
 166    If you're using a WAL archive, its size can be minimized using the
 167    archive_cleanup_command parameter to remove files that are no longer
 168    required by the standby server. The pg_archivecleanup utility is
 169    designed specifically to be used with archive_cleanup_command in
  170    typical single-standby configurations; see pg_archivecleanup. Note,
  171    however, that if you're using the archive for backup purposes, you need
 172    to retain files needed to recover from at least the latest base backup,
 173    even if they're no longer needed by the standby.
 174
 175    A simple example of configuration is:
  176 primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass options=''-c wal_sender_timeout=5000'''
 178 restore_command = 'cp /path/to/archive/%f %p'
 179 archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'
 180
 181    You can have any number of standby servers, but if you use streaming
 182    replication, make sure you set max_wal_senders high enough in the
 183    primary to allow them to be connected simultaneously.
 184
  185 26.2.5. Streaming Replication
 186
 187    Streaming replication allows a standby server to stay more up-to-date
 188    than is possible with file-based log shipping. The standby connects to
 189    the primary, which streams WAL records to the standby as they're
 190    generated, without waiting for the WAL file to be filled.
 191
 192    Streaming replication is asynchronous by default (see Section 26.2.8),
 193    in which case there is a small delay between committing a transaction
 194    in the primary and the changes becoming visible in the standby. This
 195    delay is however much smaller than with file-based log shipping,
 196    typically under one second assuming the standby is powerful enough to
 197    keep up with the load. With streaming replication, archive_timeout is
 198    not required to reduce the data loss window.
 199
 200    If you use streaming replication without file-based continuous
 201    archiving, the server might recycle old WAL segments before the standby
 202    has received them. If this occurs, the standby will need to be
 203    reinitialized from a new base backup. You can avoid this by setting
 204    wal_keep_size to a value large enough to ensure that WAL segments are
 205    not recycled too early, or by configuring a replication slot for the
 206    standby. If you set up a WAL archive that's accessible from the
 207    standby, these solutions are not required, since the standby can always
 208    use the archive to catch up provided it retains enough segments.
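
         One way to reason about "large enough" is to multiply the WAL
         generation rate by the longest standby outage you want to tolerate.
         The Python sketch below does that arithmetic and rounds up to whole
         16MB segments. The function name, the rate, and the outage length
         are hypothetical inputs that you would have to measure or choose for
         your own system.

```python
import math

SEGMENT_MB = 16  # default WAL segment size in megabytes


def estimate_wal_keep_size_mb(wal_bytes_per_second: float,
                              max_standby_outage_s: float) -> int:
    """Megabytes of WAL generated during the outage, rounded up to a
    whole number of 16MB segments."""
    generated_mb = wal_bytes_per_second * max_standby_outage_s / (1024 * 1024)
    segments = math.ceil(generated_mb / SEGMENT_MB)
    return segments * SEGMENT_MB


if __name__ == "__main__":
    # Hypothetical workload: 1 MiB/s of WAL, standby down for up to an hour.
    print(estimate_wal_keep_size_mb(1024 * 1024, 3600))
```

         The result is only a starting point, since WAL generation is rarely
         constant; bursts such as bulk loads need extra headroom.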
 209
 210    To use streaming replication, set up a file-based log-shipping standby
 211    server as described in Section 26.2. The step that turns a file-based
  212    log-shipping standby into a streaming replication standby is setting the
 213    primary_conninfo setting to point to the primary server. Set
 214    listen_addresses and authentication options (see pg_hba.conf) on the
 215    primary so that the standby server can connect to the replication
 216    pseudo-database on the primary server (see Section 26.2.5.1).
 217
 218    On systems that support the keepalive socket option, setting
 219    tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count
 220    helps the primary promptly notice a broken connection.
 221
 222    Set the maximum number of concurrent connections from the standby
 223    servers (see max_wal_senders for details).
 224
 225    When the standby is started and primary_conninfo is set correctly, the
 226    standby will connect to the primary after replaying all WAL files
 227    available in the archive. If the connection is established
 228    successfully, you will see a walreceiver in the standby, and a
 229    corresponding walsender process in the primary.
 230
  231 26.2.5.1. Authentication
 232
 233    It is very important that the access privileges for replication be set
 234    up so that only trusted users can read the WAL stream, because it is
 235    easy to extract privileged information from it. Standby servers must
  236    authenticate to the primary as a superuser or as an account that has the
  237    REPLICATION privilege. It is recommended to create a dedicated user
 238    account with REPLICATION and LOGIN privileges for replication. While
 239    REPLICATION privilege gives very high permissions, it does not allow
 240    the user to modify any data on the primary system, which the SUPERUSER
 241    privilege does.
 242
 243    Client authentication for replication is controlled by a pg_hba.conf
 244    record specifying replication in the database field. For example, if
 245    the standby is running on host IP 192.168.1.100 and the account name
 246    for replication is foo, the administrator can add the following line to
 247    the pg_hba.conf file on the primary:
 248 # Allow the user "foo" from host 192.168.1.100 to connect to the primary
 249 # as a replication standby if the user's password is correctly supplied.
 250 #
 251 # TYPE  DATABASE        USER            ADDRESS                 METHOD
 252 host    replication     foo             192.168.1.100/32        md5
 253
 254    The host name and port number of the primary, connection user name, and
 255    password are specified in the primary_conninfo. The password can also
 256    be set in the ~/.pgpass file on the standby (specify replication in the
 257    database field). For example, if the primary is running on host IP
 258    192.168.1.50, port 5432, the account name for replication is foo, and
 259    the password is foopass, the administrator can add the following line
 260    to the postgresql.conf file on the standby:
 261 # The standby connects to the primary that is running on host 192.168.1.50
 262 # and port 5432 as the user "foo" whose password is "foopass".
 263 primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
 264
  265 26.2.5.2. Monitoring
 266
 267    An important health indicator of streaming replication is the amount of
 268    WAL records generated in the primary, but not yet applied in the
 269    standby. You can calculate this lag by comparing the current WAL write
 270    location on the primary with the last WAL location received by the
 271    standby. These locations can be retrieved using pg_current_wal_lsn on
 272    the primary and pg_last_wal_receive_lsn on the standby, respectively
 273    (see Table 9.97 and Table 9.98 for details). The last WAL receive
 274    location in the standby is also displayed in the process status of the
 275    WAL receiver process, displayed using the ps command (see Section 27.1
 276    for details).
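
         The comparison itself is a subtraction of two byte positions. An LSN
         is printed as two hexadecimal numbers separated by a slash, the high
         and low 32 bits of a 64-bit position. The Python sketch below
         converts that text form and computes the lag in bytes. The helper
         names and the sample LSN values are hypothetical; the two input
         strings are assumed to have been fetched separately from the primary
         and the standby.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN in 'hi/lo' hexadecimal text form to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_lag_bytes(primary_lsn: str, standby_lsn: str) -> int:
    """Bytes of WAL written on the primary but not yet received."""
    return lsn_to_int(primary_lsn) - lsn_to_int(standby_lsn)


if __name__ == "__main__":
    # Hypothetical sample values in the text form the functions return.
    print(wal_lag_bytes("0/3000148", "0/3000060"))
```

         When both values are available in one SQL session, the server-side
         function pg_wal_lsn_diff performs the same subtraction.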
 277
 278    You can retrieve a list of WAL sender processes via the
 279    pg_stat_replication view. Large differences between pg_current_wal_lsn
 280    and the view's sent_lsn field might indicate that the primary server is
 281    under heavy load, while differences between sent_lsn and
 282    pg_last_wal_receive_lsn on the standby might indicate network delay, or
 283    that the standby is under heavy load.
 284
 285    On a hot standby, the status of the WAL receiver process can be
 286    retrieved via the pg_stat_wal_receiver view. A large difference between
 287    pg_last_wal_replay_lsn and the view's flushed_lsn indicates that WAL is
 288    being received faster than it can be replayed.
 289
  290 26.2.6. Replication Slots
 291
 292    Replication slots provide an automated way to ensure that the primary
 293    server does not remove WAL segments until they have been received by
 294    all standbys, and that the primary does not remove rows which could
 295    cause a recovery conflict even when the standby is disconnected.
 296
 297    In lieu of using replication slots, it is possible to prevent the
 298    removal of old WAL segments using wal_keep_size, or by storing the
 299    segments in an archive using archive_command or archive_library. A
 300    disadvantage of these methods is that they often result in retaining
 301    more WAL segments than required, whereas replication slots retain only
 302    the number of segments known to be needed.
 303
 304    Similarly, hot_standby_feedback on its own, without also using a
 305    replication slot, provides protection against relevant rows being
 306    removed by vacuum, but provides no protection during any time period
 307    when the standby is not connected.
 308
 309 Caution
 310
 311    Beware that replication slots can cause the server to retain so many
 312    WAL segments that they fill up the space allocated for pg_wal.
 313    max_slot_wal_keep_size can be used to limit the size of WAL files
 314    retained by replication slots.
 315
  316 26.2.6.1. Querying and Manipulating Replication Slots
 317
 318    Each replication slot has a name, which can contain lower-case letters,
 319    numbers, and the underscore character.
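
         A provisioning script might check a proposed name against this
         character rule before asking the server to create the slot. The
         Python sketch below shows one such check. The helper name is
         hypothetical, and the check covers only the allowed characters, not
         any length limit.

```python
import re

# Characters the text above allows in a slot name.
_SLOT_NAME_RE = re.compile(r"[a-z0-9_]+")


def is_valid_slot_name(name: str) -> bool:
    """True if name uses only lower-case letters, digits and underscores."""
    return _SLOT_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("node_a_slot", "Node-A"):
        print(candidate, is_valid_slot_name(candidate))
```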
 320
 321    Existing replication slots and their state can be seen in the
 322    pg_replication_slots view.
 323
 324    Slots can be created and dropped either via the streaming replication
 325    protocol (see Section 54.4) or via SQL functions (see Section 9.28.6).
 326
  327 26.2.6.2. Configuration Example
 328
 329    You can create a replication slot like this:
 330 postgres=# SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
 331   slot_name  | lsn
 332 -------------+-----
 333  node_a_slot |
 334
 335 postgres=# SELECT slot_name, slot_type, active FROM pg_replication_slots;
 336   slot_name  | slot_type | active
 337 -------------+-----------+--------
 338  node_a_slot | physical  | f
 339 (1 row)
 340
 341    To configure the standby to use this slot, primary_slot_name should be
 342    configured on the standby. Here is a simple example:
 343 primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
 344 primary_slot_name = 'node_a_slot'
 345
  346 26.2.7. Cascading Replication
 347
 348    The cascading replication feature allows a standby server to accept
 349    replication connections and stream WAL records to other standbys,
 350    acting as a relay. This can be used to reduce the number of direct
 351    connections to the primary and also to minimize inter-site bandwidth
 352    overheads.
 353
 354    A standby acting as both a receiver and a sender is known as a
 355    cascading standby. Standbys that are more directly connected to the
 356    primary are known as upstream servers, while those standby servers
 357    further away are downstream servers. Cascading replication does not
 358    place limits on the number or arrangement of downstream servers, though
 359    each standby connects to only one upstream server which eventually
 360    links to a single primary server.
 361
 362    A cascading standby sends not only WAL records received from the
  363    primary but also those restored from the archive. So even if a
  364    replication connection somewhere upstream is terminated,
 365    streaming replication continues downstream for as long as new WAL
 366    records are available.
 367
 368    Cascading replication is currently asynchronous. Synchronous
 369    replication (see Section 26.2.8) settings have no effect on cascading
 370    replication at present.
 371
 372    Hot standby feedback propagates upstream, whatever the cascaded
 373    arrangement.
 374
 375    If an upstream standby server is promoted to become the new primary,
 376    downstream servers will continue to stream from the new primary if
 377    recovery_target_timeline is set to 'latest' (the default).
 378
 379    To use cascading replication, set up the cascading standby so that it
 380    can accept replication connections (that is, set max_wal_senders and
 381    hot_standby, and configure host-based authentication). You will also
 382    need to set primary_conninfo in the downstream standby to point to the
 383    cascading standby.
 384
  385 26.2.8. Synchronous Replication
 386
 387    PostgreSQL streaming replication is asynchronous by default. If the
 388    primary server crashes then some transactions that were committed may
 389    not have been replicated to the standby server, causing data loss. The
 390    amount of data loss is proportional to the replication delay at the
 391    time of failover.
 392
 393    Synchronous replication offers the ability to confirm that all changes
 394    made by a transaction have been transferred to one or more synchronous
  395    standby servers. This extends the standard level of durability offered
 396    by a transaction commit. This level of protection is referred to as
 397    2-safe replication in computer science theory, and group-1-safe
 398    (group-safe and 1-safe) when synchronous_commit is set to remote_write.
 399
 400    When requesting synchronous replication, each commit of a write
 401    transaction will wait until confirmation is received that the commit
 402    has been written to the write-ahead log on disk of both the primary and
 403    standby server. The only possibility that data can be lost is if both
 404    the primary and the standby suffer crashes at the same time. This can
 405    provide a much higher level of durability, though only if the sysadmin
 406    is cautious about the placement and management of the two servers.
 407    Waiting for confirmation increases the user's confidence that the
 408    changes will not be lost in the event of server crashes but it also
 409    necessarily increases the response time for the requesting transaction.
 410    The minimum wait time is the round-trip time between primary and
 411    standby.
 412
 413    Read-only transactions and transaction rollbacks need not wait for
 414    replies from standby servers. Subtransaction commits do not wait for
 415    responses from standby servers, only top-level commits. Long running
 416    actions such as data loading or index building do not wait until the
 417    very final commit message. All two-phase commit actions require commit
 418    waits, including both prepare and commit.
 419
 420    A synchronous standby can be a physical replication standby or a
 421    logical replication subscriber. It can also be any other physical or
 422    logical WAL replication stream consumer that knows how to send the
 423    appropriate feedback messages. Besides the built-in physical and
 424    logical replication systems, this includes special programs such as
 425    pg_receivewal and pg_recvlogical as well as some third-party
 426    replication systems and custom programs. Check the respective
 427    documentation for details on synchronous replication support.
 428
  429 26.2.8.1. Basic Configuration
 430
 431    Once streaming replication has been configured, configuring synchronous
 432    replication requires only one additional configuration step:
 433    synchronous_standby_names must be set to a non-empty value.
 434    synchronous_commit must also be set to on, but since this is the
 435    default value, typically no change is required. (See Section 19.5.1 and
 436    Section 19.6.2.) This configuration will cause each commit to wait for
 437    confirmation that the standby has written the commit record to durable
 438    storage. synchronous_commit can be set by individual users, so it can
 439    be configured in the configuration file, for particular users or
 440    databases, or dynamically by applications, in order to control the
 441    durability guarantee on a per-transaction basis.
 442
 443    After a commit record has been written to disk on the primary, the WAL
 444    record is then sent to the standby. The standby sends reply messages
 445    each time a new batch of WAL data is written to disk, unless
 446    wal_receiver_status_interval is set to zero on the standby. In the case
 447    that synchronous_commit is set to remote_apply, the standby sends reply
 448    messages when the commit record is replayed, making the transaction
 449    visible. If the standby is chosen as a synchronous standby, according
 450    to the setting of synchronous_standby_names on the primary, the reply
 451    messages from that standby will be considered along with those from
 452    other synchronous standbys to decide when to release transactions
 453    waiting for confirmation that the commit record has been received.
 454    These parameters allow the administrator to specify which standby
 455    servers should be synchronous standbys. Note that the configuration of
 456    synchronous replication is mainly on the primary. Named standbys must
 457    be directly connected to the primary; the primary knows nothing about
 458    downstream standby servers using cascaded replication.
 459
 460    Setting synchronous_commit to remote_write will cause each commit to
 461    wait for confirmation that the standby has received the commit record
 462    and written it out to its own operating system, but not for the data to
 463    be flushed to disk on the standby. This setting provides a weaker
 464    guarantee of durability than on does: the standby could lose the data
 465    in the event of an operating system crash, though not a PostgreSQL
 466    crash. However, it's a useful setting in practice because it can
 467    decrease the response time for the transaction. Data loss could only
 468    occur if both the primary and the standby crash and the database of the
 469    primary gets corrupted at the same time.
 470
 471    Setting synchronous_commit to remote_apply will cause each commit to
 472    wait until the current synchronous standbys report that they have
 473    replayed the transaction, making it visible to user queries. In simple
 474    cases, this allows for load balancing with causal consistency.
 475
 476    Users will stop waiting if a fast shutdown is requested. However, as
  477    when using asynchronous replication, the server will not fully shut down
 478    until all outstanding WAL records are transferred to the currently
 479    connected standby servers.
 480
  481 26.2.8.2. Multiple Synchronous Standbys
 482
 483    Synchronous replication supports one or more synchronous standby
 484    servers; transactions will wait until all the standby servers which are
 485    considered as synchronous confirm receipt of their data. The number of
 486    synchronous standbys that transactions must wait for replies from is
 487    specified in synchronous_standby_names. This parameter also specifies a
  488    list of standby names and the method (FIRST or ANY) to choose
 489    synchronous standbys from the listed ones.
 490
 491    The method FIRST specifies a priority-based synchronous replication and
 492    makes transaction commits wait until their WAL records are replicated
 493    to the requested number of synchronous standbys chosen based on their
 494    priorities. The standbys whose names appear earlier in the list are
 495    given higher priority and will be considered as synchronous. Other
 496    standby servers appearing later in this list represent potential
 497    synchronous standbys. If any of the current synchronous standbys
 498    disconnects for whatever reason, it will be replaced immediately with
 499    the next-highest-priority standby.
 500
  501    An example of synchronous_standby_names for priority-based multiple
 502    synchronous standbys is:
 503 synchronous_standby_names = 'FIRST 2 (s1, s2, s3)'
 504
 505    In this example, if four standby servers s1, s2, s3 and s4 are running,
 506    the two standbys s1 and s2 will be chosen as synchronous standbys
 507    because their names appear early in the list of standby names. s3 is a
 508    potential synchronous standby and will take over the role of
 509    synchronous standby when either of s1 or s2 fails. s4 is an
 510    asynchronous standby since its name is not in the list.
 511
 512    The method ANY specifies a quorum-based synchronous replication and
 513    makes transaction commits wait until their WAL records are replicated
 514    to at least the requested number of synchronous standbys in the list.
 515
  516    An example of synchronous_standby_names for quorum-based multiple
 517    synchronous standbys is:
 518 synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
 519
 520    In this example, if four standby servers s1, s2, s3 and s4 are running,
 521    transaction commits will wait for replies from at least any two
 522    standbys of s1, s2 and s3. s4 is an asynchronous standby since its name
 523    is not in the list.
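
         The two methods can be compared with a small Python model. This is a
         simplified sketch of the selection rules described above, not the
         server's implementation. The function names are hypothetical, the
         standby names come from the examples, and the role labels follow the
         sync_state values shown in pg_stat_replication.

```python
from typing import Dict, Sequence, Set


def classify_standbys(method: str, num_sync: int, names: Sequence[str],
                      connected: Sequence[str]) -> Dict[str, str]:
    """Role of each connected standby: 'sync', 'potential', 'quorum'
    or 'async'."""
    if method not in ("FIRST", "ANY"):
        raise ValueError(f"unknown method: {method}")
    # Connected standbys that are listed, kept in list (priority) order.
    listed = [n for n in names if n in connected]
    roles: Dict[str, str] = {}
    for standby in connected:
        if standby not in names:
            roles[standby] = "async"
        elif method == "ANY":
            roles[standby] = "quorum"
        elif listed.index(standby) < num_sync:
            roles[standby] = "sync"
        else:
            roles[standby] = "potential"
    return roles


def commit_released(method: str, num_sync: int, names: Sequence[str],
                    connected: Sequence[str], acked: Set[str]) -> bool:
    """True once enough standbys have confirmed the commit record."""
    roles = classify_standbys(method, num_sync, names, connected)
    if method == "ANY":
        quorum = {s for s, role in roles.items() if role == "quorum"}
        return len(acked & quorum) >= num_sync
    sync = [s for s, role in roles.items() if role == "sync"]
    return len(sync) >= num_sync and all(s in acked for s in sync)


if __name__ == "__main__":
    names = ["s1", "s2", "s3"]
    running = ["s1", "s2", "s3", "s4"]
    print(classify_standbys("FIRST", 2, names, running))
    print(classify_standbys("ANY", 2, names, running))
```

         With FIRST, a confirmation from s3 does not release a commit while
         s1 and s2 are both connected; with ANY, confirmations from any two
         of the three listed standbys do.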
 524
 525    The synchronous states of standby servers can be viewed using the
 526    pg_stat_replication view.
 527
  528 26.2.8.3. Planning for Performance
 529
 530    Synchronous replication usually requires carefully planned and placed
 531    standby servers to ensure applications perform acceptably. Waiting
 532    doesn't utilize system resources, but transaction locks continue to be
 533    held until the transfer is confirmed. As a result, incautious use of
 534    synchronous replication will reduce performance for database
 535    applications because of increased response times and higher contention.
 536
 537    PostgreSQL allows the application developer to specify the durability
 538    level required via replication. This can be specified for the system
 539    overall, though it can also be specified for specific users or
 540    connections, or even individual transactions.
 541
  542    For example, in an application workload 10% of changes might be
  543    important customer details, while 90% of changes are less important
  544    data whose loss the business can more easily survive, such as chat
  545    messages between users.
 546
 547    With synchronous replication options specified at the application level
 548    (on the primary) we can offer synchronous replication for the most
 549    important changes, without slowing down the bulk of the total workload.
 550    Application level options are an important and practical tool for
 551    allowing the benefits of synchronous replication for high performance
 552    applications.
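
         A back-of-the-envelope calculation shows the effect. The Python
         sketch below assumes that each commit that waits for the standby
         pays exactly one network round trip and that all other commits pay
         nothing extra; real waits also include the standby's write or flush
         time. The function name and the 2 ms round-trip time are
         hypothetical.

```python
def mean_added_commit_wait_ms(sync_fraction: float,
                              round_trip_ms: float) -> float:
    """Average extra wait per commit when only sync_fraction of commits
    wait for the standby (assumption: one round trip per waiting commit)."""
    if not 0.0 <= sync_fraction <= 1.0:
        raise ValueError("sync_fraction must be between 0 and 1")
    return sync_fraction * round_trip_ms


if __name__ == "__main__":
    rtt_ms = 2.0  # hypothetical round-trip time between primary and standby
    print("all commits synchronous:", mean_added_commit_wait_ms(1.0, rtt_ms))
    print("10% of commits synchronous:", mean_added_commit_wait_ms(0.1, rtt_ms))
```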
 553
 554    The network bandwidth between the primary and its standbys must be
 555    higher than the rate at which WAL data is generated.
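
    One rough way to estimate the WAL generation rate is to sample the
    current WAL position twice on the primary and divide the difference
    by the elapsed time. The sketch below assumes a 60-second sampling
    window and a temporary table named wal_sample, both of which are
    arbitrary choices; a representative figure would need samples taken
    at peak load.

```sql
-- First sample: current WAL write position and wall-clock time.
CREATE TEMP TABLE wal_sample AS
SELECT pg_current_wal_lsn() AS lsn, clock_timestamp() AS ts;

-- Sampling window (arbitrary).
SELECT pg_sleep(60);

-- Second sample: bytes of WAL generated per second since the first.
SELECT pg_size_pretty(
         (pg_wal_lsn_diff(pg_current_wal_lsn(), s.lsn)
          / extract(epoch FROM clock_timestamp() - s.ts))::bigint
       ) || '/s' AS wal_rate
FROM wal_sample s;
```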
 556
 557 26.2.8.4. Planning for High Availability #
 558
 559    synchronous_standby_names specifies the number and names of the
 560    synchronous standbys from which transaction commits wait for
 561    responses when synchronous_commit is set to on, remote_apply or
 562    remote_write. Such transaction commits may never complete if any one
 563    of the synchronous standbys should crash.
 564
 565    The best solution for high availability is to ensure you keep as many
 566    synchronous standbys as requested. This can be achieved by naming
 567    multiple potential synchronous standbys using
 568    synchronous_standby_names.
 569
 570    In priority-based synchronous replication, the standbys whose names
 571    appear earlier in the list will be used as synchronous standbys.
 572    Standbys listed after these will take over the role of synchronous
 573    standby if one of the current ones should fail.
 574
 575    In quorum-based synchronous replication, all the standbys appearing
 576    in the list will be used as candidates for synchronous standbys. Even
 577    if one of them should fail, the other standbys will continue to act
 578    as candidates for synchronous standby.
 579
 580    When a standby first attaches to the primary, it will not yet be
 581    properly synchronized. This is described as catchup mode. Once the lag
 582    between standby and primary reaches zero for the first time we move to
 583    real-time streaming state. The catch-up duration may be long
 584    immediately after the standby has been created. If the standby is shut
 585    down, then the catch-up period will increase according to the length of
 586    time the standby has been down. The standby is only able to become a
 587    synchronous standby once it has reached streaming state. This state can
 588    be viewed using the pg_stat_replication view.
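
    As an illustration, a query along the following lines could be used
    on the primary to see whether a standby is still catching up and how
    far behind it is. The choice of replay_lsn as the reference point is
    an assumption; flush_lsn or write_lsn may be more appropriate
    depending on the synchronous_commit level in use.

```sql
-- Run on the primary. state shows 'catchup' until the standby has
-- caught up, then 'streaming'.
SELECT application_name,
       state,
       sync_state,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)
       ) AS replay_lag_size,   -- WAL not yet replayed by the standby
       replay_lag              -- time-based lag, NULL when idle
FROM pg_stat_replication;
```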
 589
 590    If the primary restarts while commits are waiting for acknowledgment,
 591    those waiting transactions will be marked fully committed once the
 592    primary database recovers. There is no way to be certain that all
 593    standbys have received all outstanding WAL data at the time of the
 594    crash of the primary. Some transactions may not show as committed on
 595    the standby, even though they show as committed on the primary. The
 596    guarantee we offer is that the application will not receive explicit
 597    acknowledgment of the successful commit of a transaction until the WAL
 598    data is known to be safely received by all the synchronous standbys.
 599
 600    If you really cannot keep as many synchronous standbys as requested
 601    then you should decrease the number of synchronous standbys that
 602    transaction commits must wait for responses from in
 603    synchronous_standby_names (or disable it) and reload the configuration
 604    file on the primary server.
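
    A sketch of this adjustment, assuming the primary was previously
    configured with FIRST 2 (s1, s2, s3) and that the setting is managed
    through ALTER SYSTEM rather than by editing postgresql.conf directly:

```sql
-- Run on the primary as a superuser, outside a transaction block.
-- Wait for one synchronous standby instead of two.
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (s1, s2, s3)';

-- To disable synchronous replication entirely, use an empty value:
-- ALTER SYSTEM SET synchronous_standby_names = '';

-- Reload the configuration; no restart is needed for this parameter.
SELECT pg_reload_conf();

-- Confirm the value now in effect.
SHOW synchronous_standby_names;
```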
 605
 606    If the primary is isolated from the remaining standby servers you
 607    should fail over to the best candidate among those remaining standby
 608    servers.
 609
 610    If you need to re-create a standby server while transactions are
 611    waiting, make sure that the functions pg_backup_start() and
 612    pg_backup_stop() are run in a session with synchronous_commit = off,
 613    otherwise those requests will wait forever for the standby to appear.
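
    For instance, the backup session might look like the following
    sketch. The backup label is a hypothetical name, and the step that
    copies the data directory is performed by an external tool while
    this session stays connected.

```sql
-- Session-level setting: the backup calls in this session will not
-- wait for a synchronous standby.
SET synchronous_commit = off;

SELECT pg_backup_start('rebuild-standby');  -- hypothetical label

-- Copy the data directory with an external tool here, keeping this
-- session open.

-- Must be called in the same session that called pg_backup_start().
SELECT * FROM pg_backup_stop();
```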
 614
 615 26.2.9. Continuous Archiving in Standby #
 616
 617    When continuous WAL archiving is used in a standby, there are two
 618    different scenarios: the WAL archive can be shared between the primary
 619    and the standby, or the standby can have its own WAL archive. When the
 620    standby has its own WAL archive, set archive_mode to always, and the
 621    standby will call the archive command for every WAL segment it
 622    receives, whether it's by restoring from the archive or by streaming
 623    replication. The shared archive can be handled similarly, but the
 624    archive_command or archive_library must test if the file being archived
 625    exists already, and if the existing file has identical contents. This
 626    requires more care in the archive_command or archive_library, as it
 627    must be careful not to overwrite an existing file with different
 628    contents, but return success if exactly the same file is archived
 629    twice. All of that must be done free of race conditions, in case two
 630    servers attempt to archive the same file at the same time.
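
    The settings below are a rough sketch of the shared-archive case on
    a Unix-like system, using a hypothetical archive directory
    /mnt/server/archivedir. The command succeeds if an identical file is
    already archived, copies the file if none exists, and fails if a
    file with different contents is present. It does not by itself
    address the race condition mentioned above, so it should be treated
    as a starting point only.

```sql
-- On the standby. archive_mode only takes effect after a restart.
ALTER SYSTEM SET archive_mode = 'always';

-- cmp -s: exit status 0 only if the archived copy exists and is
-- identical. Otherwise copy the file, but only when no file with
-- that name exists yet.
ALTER SYSTEM SET archive_command =
  'cmp -s %p /mnt/server/archivedir/%f || (test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f)';
```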
 631
 632    If archive_mode is set to on, the archiver is not enabled during
 633    recovery or standby mode. If the standby server is promoted, it will
 634    start archiving after the promotion, but will not archive any WAL or
 635    timeline history files that it did not generate itself. To get a
 636    complete series of WAL files in the archive, you must ensure that all
 637    WAL is archived before it reaches the standby. This is inherently true
 638    with file-based log shipping, as the standby can only restore files
 639    that are found in the archive, but not if streaming replication is
 640    enabled. When a server is not in recovery mode, there is no difference
 641    between on and always modes.