2 54.4. Streaming Replication Protocol #
4 To initiate streaming replication, the frontend sends the replication
5 parameter in the startup message. A Boolean value of true (or on, yes,
6 1) tells the backend to go into physical replication walsender mode,
7 wherein a small set of replication commands, shown below, can be issued
8 instead of SQL statements.
10 Passing database as the value for the replication parameter instructs
11 the backend to go into logical replication walsender mode, connecting
12 to the database specified in the dbname parameter. In logical
13 replication walsender mode, the replication commands shown below as
14 well as normal SQL commands can be issued.
16 In either physical replication or logical replication walsender mode,
17 only the simple query protocol can be used.
19 For the purpose of testing replication commands, you can make a
20 replication connection via psql or any other libpq-using tool with a
21 connection string including the replication option, e.g.:
22 psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
24 However, it is often more useful to use pg_receivewal (for physical
25 replication) or pg_recvlogical (for logical replication).
27 Replication commands are logged in the server log when
28 log_replication_commands is enabled.
30 The commands accepted in replication mode are:
33 Requests the server to identify itself. Server replies with a
34 result set of a single row, containing four fields:
37 The unique system identifier identifying the cluster. This
38 can be used to check that the base backup used to
39 initialize the standby came from the same cluster.
42 Current timeline ID. Also useful to check that the standby
43 is consistent with the primary.
46 Current WAL flush location. Useful to get a known location
47 in the write-ahead log where streaming can start.
50 Database connected to or null.
53 Requests the server to send the current setting of a run-time
54 parameter. This is similar to the SQL command SHOW.
57 The name of a run-time parameter. Available parameters are
58 documented in Chapter 19.
60 TIMELINE_HISTORY tli #
61 Requests the server to send over the timeline history file for
62 timeline tli. Server replies with a result set of a single row,
63 containing two fields. While the fields are labeled as text,
64 they effectively return raw bytes, with no encoding conversion:
67 File name of the timeline history file, e.g.,
71 Contents of the timeline history file.
73 CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL | LOGICAL
74 output_plugin } [ ( option [, ...] ) ] #
75 Create a physical or logical replication slot. See
76 Section 26.2.6 for more about replication slots.
79 The name of the slot to create. Must be a valid
80 replication slot name (see Section 26.2.6.1).
83 The name of the output plugin used for logical decoding
87 Specify that this replication slot is a temporary one.
88 Temporary slots are not saved to disk and are
89 automatically dropped on error or when the session has
92 The following options are supported:
95 If true, this logical replication slot supports decoding
96 of two-phase commit. With this option, commands related to
97 two-phase commit such as PREPARE TRANSACTION, COMMIT
98 PREPARED and ROLLBACK PREPARED are decoded and
99 transmitted. The transaction will be decoded and
100 transmitted at PREPARE TRANSACTION time. The default is
103 RESERVE_WAL [ boolean ]
104 If true, this physical replication slot reserves WAL
105 immediately. Otherwise, WAL is only reserved upon
106 connection from a streaming replication client. The
109 SNAPSHOT { 'export' | 'use' | 'nothing' }
110 Decides what to do with the snapshot created during
111 logical slot initialization. 'export', which is the
112 default, will export the snapshot for use in other
113 sessions. This option can't be used inside a transaction.
114 'use' will use the snapshot for the current transaction
115 executing the command. This option must be used in a
116 transaction, and CREATE_REPLICATION_SLOT must be the first
117 command run in that transaction. Finally, 'nothing' will
118 just use the snapshot for logical decoding as normal but
119 won't do anything else with it.
122 If true, the slot is enabled to be synced to the standbys
123 so that logical replication can be resumed after failover.
124 The default is false.
126 In response to this command, the server will send a one-row
127 result set containing the following fields:
130 The name of the newly-created replication slot.
132 consistent_point (text)
133 The WAL location at which the slot became consistent. This
134 is the earliest location from which streaming can start on
135 this replication slot.
138 The identifier of the snapshot exported by the command.
139 The snapshot is valid until a new command is executed on
140 this connection or the replication connection is closed.
141 Null if the created slot is physical.
144 The name of the output plugin used by the newly-created
145 replication slot. Null if the created slot is physical.
147 CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL [
148 RESERVE_WAL ] | LOGICAL output_plugin [ EXPORT_SNAPSHOT |
149 NOEXPORT_SNAPSHOT | USE_SNAPSHOT | TWO_PHASE ] } #
150 For compatibility with older releases, this alternative syntax
151 for the CREATE_REPLICATION_SLOT command is still supported.
153 ALTER_REPLICATION_SLOT slot_name ( option [, ...] ) #
154 Change the definition of a replication slot. See Section 26.2.6
155 for more about replication slots. This command is currently only
156 supported for logical replication slots.
159 The name of the slot to alter. Must be a valid replication
160 slot name (see Section 26.2.6.1).
162 The following options are supported:
164 TWO_PHASE [ boolean ]
165 If true, this logical replication slot supports decoding
166 of two-phase commit. With this option, commands related to
167 two-phase commit such as PREPARE TRANSACTION, COMMIT
168 PREPARED and ROLLBACK PREPARED are decoded and
169 transmitted. The transaction will be decoded and
170 transmitted at PREPARE TRANSACTION time.
173 If true, the slot is enabled to be synced to the standbys
174 so that logical replication can be resumed after failover.
176 READ_REPLICATION_SLOT slot_name #
177 Read some information associated with a replication slot.
178 Returns a tuple with NULL values if the replication slot does
179 not exist. This command is currently only supported for physical
182 In response to this command, the server will return a one-row
183 result set, containing the following fields:
186 The replication slot's type, either physical or NULL.
189 The replication slot's restart_lsn.
192 The timeline ID associated with restart_lsn, following the
193 current timeline history.
195 START_REPLICATION [ SLOT slot_name ] [ PHYSICAL ] XXX/XXX [ TIMELINE
197 Instructs server to start streaming WAL, starting at WAL
198 location XXX/XXX. If TIMELINE option is specified, streaming
199 starts on timeline tli; otherwise, the server's current timeline
200 is selected. The server can reply with an error, for example if
201 the requested section of WAL has already been recycled. On
202 success, the server responds with a CopyBothResponse message,
203 and then starts to stream WAL to the frontend.
205 If a slot's name is provided via slot_name, it will be updated
206 as replication progresses so that the server knows which WAL
207 segments, and if hot_standby_feedback is on which transactions,
208 are still needed by the standby.
210 If the client requests a timeline that's not the latest but is
211 part of the history of the server, the server will stream all
212 the WAL on that timeline starting from the requested start point
213 up to the point where the server switched to another timeline.
214 If the client requests streaming at exactly the end of an old
215 timeline, the server skips COPY mode entirely.
217 After streaming all the WAL on a timeline that is not the latest
218 one, the server will end streaming by exiting the COPY mode.
219 When the client acknowledges this by also exiting COPY mode, the
220 server sends a result set with one row and two columns,
221 indicating the next timeline in this server's history. The first
222 column is the next timeline's ID (type int8), and the second
223 column is the WAL location where the switch happened (type
224 text). Usually, the switch position is the end of the WAL that
225 was streamed, but there are corner cases where the server can
226 send some WAL from the old timeline that it has not itself
227 replayed before promoting. Finally, the server sends two
228 CommandComplete messages (one that ends the CopyData and the
229 other ends the START_REPLICATION itself), and is ready to accept
232 WAL data is sent as a series of CopyData messages; see
233 Section 54.6 and Section 54.7 for details. (This allows other
234 information to be intermixed; in particular the server can send
235 an ErrorResponse message if it encounters a failure after
236 beginning to stream.) The payload of each CopyData message from
237 server to the client contains a message of one of the following
243 Identifies the message as WAL data.
246 The starting point of the WAL data in this message.
249 The current end of WAL on the server.
252 The server's system clock at the time of
253 transmission, as microseconds since midnight on
257 A section of the WAL data stream.
259 A single WAL record is never split across two
260 XLogData messages. When a WAL record crosses a WAL
261 page boundary, and is therefore already split using
262 continuation records, it can be split at the page
263 boundary. In other words, the first main WAL record
264 and its continuation records can be sent in
265 different XLogData messages.
267 Primary keepalive message (B) #
270 Identifies the message as a sender keepalive.
273 The current end of WAL on the server.
276 The server's system clock at the time of
277 transmission, as microseconds since midnight on
281 1 means that the client should reply to this message
282 as soon as possible, to avoid a timeout disconnect.
285 The receiving process can send replies back to the sender at any
286 time, using one of the following message formats (also in the
287 payload of a CopyData message):
289 Standby status update (F) #
292 Identifies the message as a receiver status update.
295 The location of the last WAL byte + 1 received and
296 written to disk in the standby.
299 The location of the last WAL byte + 1 flushed to
303 The location of the last WAL byte + 1 applied in the
307 The client's system clock at the time of
308 transmission, as microseconds since midnight on
312 If 1, the client requests the server to reply to
313 this message immediately. This can be used to ping
314 the server, to test if the connection is still
317 Hot standby feedback message (F) #
320 Identifies the message as a hot standby feedback
324 The client's system clock at the time of
325 transmission, as microseconds since midnight on
329 The standby's current global xmin, excluding the
330 catalog_xmin from any replication slots. If both
331 this value and the following catalog_xmin are 0,
332 this is treated as a notification that hot standby
333 feedback will no longer be sent on this connection.
334 Later non-zero messages may reinitiate the feedback
338 The epoch of the global xmin xid on the standby.
341 The lowest catalog_xmin of any replication slots on
342 the standby. Set to 0 if no catalog_xmin exists on
343 the standby or if hot standby feedback is being
347 The epoch of the catalog_xmin xid on the standby.
349 START_REPLICATION SLOT slot_name LOGICAL XXX/XXX [ ( option_name [
350 option_value ] [, ...] ) ] #
351 Instructs server to start streaming WAL for logical replication,
352 starting at either WAL location XXX/XXX or the slot's
353 confirmed_flush_lsn (see Section 53.20), whichever is greater.
354 This behavior makes it easier for clients to avoid updating
355 their local LSN status when there is no data to process.
356 However, starting at a different LSN than requested might not
357 catch certain kinds of client errors; so the client may wish to
358 check that confirmed_flush_lsn matches its expectations before
359 issuing START_REPLICATION.
361 The server can reply with an error, for example if the slot does
362 not exist. On success, the server responds with a
363 CopyBothResponse message, and then starts to stream WAL to the
366 The messages inside the CopyBothResponse messages are of the
367 same format documented for START_REPLICATION ... PHYSICAL,
368 including two CommandComplete messages.
370 The output plugin associated with the selected slot is used to
371 process the output for streaming.
374 The name of the slot to stream changes from. This
375 parameter is required, and must correspond to an existing
376 logical replication slot created with
377 CREATE_REPLICATION_SLOT in LOGICAL mode.
380 The WAL location to begin streaming at.
383 The name of an option passed to the slot's logical
384 decoding output plugin. See Section 54.5 for options that
385 are accepted by the standard (pgoutput) plugin.
388 Optional value, in the form of a string constant,
389 associated with the specified option.
391 DROP_REPLICATION_SLOT slot_name [ WAIT ] #
392 Drops a replication slot, freeing any reserved server-side
396 The name of the slot to drop.
399 This option causes the command to wait if the slot is
400 active until it becomes inactive, instead of the default
401 behavior of raising an error.
404 Uploads a backup manifest in preparation for taking an
407 BASE_BACKUP [ ( option [, ...] ) ] #
408 Instructs the server to start streaming a base backup. The
409 system will automatically be put in backup mode before the
410 backup is started, and taken out of it when the backup is
411 complete. The following options are accepted:
414 Sets the label of the backup. If none is specified, a
415 backup label of base backup will be used. The quoting
416 rules for the label are the same as a standard SQL string
417 with standard_conforming_strings turned on.
420 Tells the server where to send the backup. If the target
421 is client, which is the default, the backup data is sent
422 to the client. If it is server, the backup data is written
423 to the server at the pathname specified by the
424 TARGET_DETAIL option. If it is blackhole, the backup data
425 is not sent anywhere; it is simply discarded.
427 The server target requires superuser privilege or being
428 granted the pg_write_server_files role.
430 TARGET_DETAIL 'detail'
431 Provides additional information about the backup target.
433 Currently, this option can only be used when the backup
434 target is server. It specifies the server directory to
435 which the backup should be written.
438 If set to true, request information required to generate a
439 progress report. This will send back an approximate size
440 in the header of each tablespace, which can be used to
441 calculate how far along the stream is done. This is
442 calculated by enumerating all the file sizes once before
443 the transfer is even started, and might as such have a
444 negative impact on the performance. In particular, it
445 might take longer before the first data is streamed. Since
446 the database files can change during the backup, the size
447 is only approximate and might both grow and shrink between
448 the time of approximation and the sending of the actual
449 files. The default is false.
451 CHECKPOINT { 'fast' | 'spread' }
452 Sets the type of checkpoint to be performed at the
453 beginning of the base backup. The default is spread.
456 If set to true, include the necessary WAL segments in the
457 backup. This will include all the files between start and
458 stop backup in the pg_wal directory of the base directory
459 tar file. The default is false.
462 If set to true, the backup will wait until the last
463 required WAL segment has been archived, or emit a warning
464 if WAL archiving is not enabled. If false, the backup will
465 neither wait nor warn, leaving the client responsible for
466 ensuring the required log is available. The default is
470 Instructs the server to compress the backup using the
471 specified method. Currently, the supported methods are
474 COMPRESSION_DETAIL detail
475 Specifies details for the chosen compression method. This
476 should only be used in conjunction with the COMPRESSION
477 option. If the value is an integer, it specifies the
478 compression level. Otherwise, it should be a
479 comma-separated list of items, each of the form keyword or
480 keyword=value. Currently, the supported keywords are
481 level, long and workers.
483 The level keyword sets the compression level. For gzip the
484 compression level should be an integer between 1 and 9
485 (default Z_DEFAULT_COMPRESSION or -1), for lz4 an integer
486 between 1 and 12 (default 0 for fast compression mode),
487 and for zstd an integer between ZSTD_minCLevel() (usually
488 -131072) and ZSTD_maxCLevel() (usually 22), (default
489 ZSTD_CLEVEL_DEFAULT or 3).
491 The long keyword enables long-distance matching mode, for
492 improved compression ratio, at the expense of higher
493 memory use. Long-distance mode is supported only for zstd.
495 The workers keyword sets the number of threads that should
496 be used for parallel compression. Parallel compression is
497 supported only for zstd.
500 Limit (throttle) the maximum amount of data transferred
501 from server to client per unit of time. The expected unit
502 is kilobytes per second. If this option is specified, the
503 value must either be equal to zero or it must fall within
504 the range from 32 kB through 1 GB (inclusive). If zero is
505 passed or the option is not specified, no restriction is
506 imposed on the transfer.
508 TABLESPACE_MAP [ boolean ]
509 If true, include information about symbolic links present
510 in the directory pg_tblspc in a file named tablespace_map.
511 The tablespace map file includes each symbolic link name
512 as it exists in the directory pg_tblspc/ and the full path
513 of that symbolic link. The default is false.
515 VERIFY_CHECKSUMS [ boolean ]
516 If true, checksums are verified during a base backup if
517 they are enabled. If false, this is skipped. The default
520 MANIFEST manifest_option
521 When this option is specified with a value of yes or
522 force-encode, a backup manifest is created and sent along
523 with the backup. The manifest is a list of every file
524 present in the backup with the exception of any WAL files
525 that may be included. It also stores the size, last
526 modification time, and optionally a checksum for each
527 file. A value of force-encode forces all filenames to be
528 hex-encoded; otherwise, this type of encoding is performed
529 only for files whose names are non-UTF8 octet sequences.
530 force-encode is intended primarily for testing purposes,
531 to be sure that clients which read the backup manifest can
532 handle this case. For compatibility with previous
533 releases, the default is MANIFEST 'no'.
535 MANIFEST_CHECKSUMS checksum_algorithm
536 Specifies the checksum algorithm that should be applied to
537 each file included in the backup manifest. Currently, the
538 available algorithms are NONE, CRC32C, SHA224, SHA256,
539 SHA384, and SHA512. The default is CRC32C.
542 Requests an incremental backup. The UPLOAD_MANIFEST
543 command must be executed before running a base backup with
546 When the backup is started, the server will first send two
547 ordinary result sets, followed by one or more CopyOutResponse
550 The first ordinary result set contains the starting position of
551 the backup, in a single row with two columns. The first column
552 contains the start position given in XLogRecPtr format, and the
553 second column contains the corresponding timeline ID.
555 The second ordinary result set has one row for each tablespace.
556 The fields in this row are:
559 The OID of the tablespace, or null if it's the base
563 The full path of the tablespace directory, or null if it's
567 The approximate size of the tablespace, in kilobytes (1024
568 bytes), if progress report has been requested; otherwise
571 After the second regular result set, a CopyOutResponse will be
572 sent. The payload of each CopyData message will contain a
573 message in one of the following formats:
578 Identifies the message as indicating the start of a
579 new archive. There will be one archive for the main
580 data directory and one for each additional
581 tablespace; each will use tar format (following the
582 “ustar interchange format” specified in the POSIX
583 1003.1-2008 standard).
586 The file name for this archive.
589 For the main data directory, an empty string. For
590 other tablespaces, the full path to the directory
591 from which this archive was created.
596 Identifies the message as indicating the start of
599 archive or manifest data (B)
602 Identifies the message as containing archive or
611 Identifies the message as a progress report.
614 The number of bytes from the current tablespace for
615 which processing has been completed.
617 After the CopyOutResponse, or all such responses, have been
618 sent, a final ordinary result set will be sent, containing the
619 WAL end position of the backup, in the same format as the start
622 The tar archive for the data directory and each tablespace will
623 contain all files in the directories, regardless of whether they
624 are PostgreSQL files or other files added to the same directory.
625 The only excluded files are:
629 + pg_internal.init (found in multiple directories)
630 + Various temporary files and directories created during the
631 operation of the PostgreSQL server, such as any file or
632 directory beginning with pgsql_tmp and temporary relations.
633 + Unlogged relations, except for the init fork which is required
634 to recreate the (empty) unlogged relation on recovery.
635 + pg_wal, including subdirectories. If the backup is run with
636 WAL files included, a synthesized version of pg_wal will be
637 included, but it will only contain the files necessary for the
638 backup to work, not the rest of the contents.
639 + pg_dynshmem, pg_notify, pg_replslot, pg_serial, pg_snapshots,
640 pg_stat_tmp, and pg_subtrans are copied as empty directories
641 (even if they are symbolic links).
642 + Files other than regular files and directories, such as
643 symbolic links (other than for the directories listed above)
644 and special device and operating system files, are skipped.
645 (Symbolic links in pg_tblspc are maintained.)
647 Owner, group, and file mode are set if the underlying file
648 system on the server supports it.
650 In all the above commands, when specifying a parameter of type boolean
651 the value part can be omitted, which is equivalent to specifying TRUE.