Chapter 48. Replication Progress Tracking

   Replication origins are intended to make it easier to implement logical
   replication solutions on top of logical decoding. They provide a
   solution to two common problems:
     * How to safely keep track of replication progress
     * How to change replication behavior based on the origin of a row;
       for example, to prevent loops in bi-directional replication setups

   Replication origins have just two properties, a name and an ID. The
   name, which is what should be used to refer to the origin across
   systems, is free-form text. It should be chosen so that conflicts
   between replication origins created by different replication solutions
   are unlikely; e.g., by prefixing it with the replication solution's
   name. The ID is used only to avoid having to store the long name in
   situations where space efficiency is important. It should never be
   shared across systems.

   Replication origins can be created using the function
   pg_replication_origin_create(), dropped using
   pg_replication_origin_drop(), and seen in the pg_replication_origin
   system catalog.
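
   As a minimal sketch of that life cycle, the statements below create an
   origin, look it up, and drop it again. The origin name 'myrepl_node_a'
   is hypothetical; the 'myrepl_' prefix stands in for the name of the
   replication solution. The sketch assumes a role that is allowed to
   execute these functions (by default, a superuser).

   ```sql
   -- Create an origin; the function returns the locally assigned ID.
   -- 'myrepl_node_a' is a made-up name used only for illustration.
   SELECT pg_replication_origin_create('myrepl_node_a');

   -- Look up the local ID for a given name.
   SELECT pg_replication_origin_oid('myrepl_node_a');

   -- List all origins known to this cluster.
   -- roident is the local-only ID, roname the cross-system name.
   SELECT roident, roname FROM pg_replication_origin;

   -- Drop the origin, including any replay progress recorded for it.
   SELECT pg_replication_origin_drop('myrepl_node_a');
   ```

   Only the name is meaningful on other systems; the ID returned here
   can differ from cluster to cluster.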

   One nontrivial part of building a replication solution is keeping track
   of replay progress in a safe manner. When the applying process, or the
   whole cluster, dies, it must be possible to find out up to where data
   has successfully been replicated. Naive solutions, such as updating a
   row in a table for every replayed transaction, suffer from run-time
   overhead and database bloat.

   Using the replication origin infrastructure, a session can be marked as
   replaying from a remote node (using the
   pg_replication_origin_session_setup() function). Additionally, the LSN
   and commit time stamp of every source transaction can be configured on
   a per-transaction basis using pg_replication_origin_xact_setup(). If
   that is done, replication progress persists in a crash-safe manner.
   Replay progress for all replication origins can be seen in the
   pg_replication_origin_status view. An individual origin's progress,
   e.g., when resuming replication, can be acquired using
   pg_replication_origin_progress() for any origin or
   pg_replication_origin_session_progress() for the origin configured in
   the current session.
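
   The sequence an apply process might follow can be sketched roughly as
   below. The origin name, the table repl_demo, and the LSN and time stamp
   values are all made up for illustration; a real apply process would
   take the LSN and commit time stamp from the source transaction it is
   replaying.

   ```sql
   -- Hypothetical origin and target table, created here so the sketch
   -- is self-contained; skip these if they already exist.
   SELECT pg_replication_origin_create('myrepl_node_a');
   CREATE TABLE repl_demo (id int PRIMARY KEY, val text);

   -- Mark this session as replaying from the remote node.
   SELECT pg_replication_origin_session_setup('myrepl_node_a');

   BEGIN;
   -- Record where the source transaction committed (illustrative values).
   SELECT pg_replication_origin_xact_setup(
       '0/16B3748'::pg_lsn,
       '2024-01-01 12:00:00+00'::timestamptz);
   -- Apply the replicated change; progress is recorded with this commit.
   INSERT INTO repl_demo VALUES (1, 'replayed row');
   COMMIT;

   -- Progress of all origins.
   SELECT local_id, external_id, remote_lsn, local_lsn
   FROM pg_replication_origin_status;

   -- Progress of one named origin, and of the session's own origin.
   -- The boolean asks whether the local transaction must be flushed.
   SELECT pg_replication_origin_progress('myrepl_node_a', true);
   SELECT pg_replication_origin_session_progress(true);

   -- Detach the session from the origin when done.
   SELECT pg_replication_origin_session_reset();
   ```

   When resuming after a crash, the value returned by
   pg_replication_origin_progress() is the point from which the apply
   process would ask the source to continue streaming.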

   In replication topologies more complex than replication from exactly
   one system to one other system, another problem is that it is hard to
   avoid replicating replayed rows again. That can lead both to cycles in
   the replication and to inefficiencies. Replication origins provide an
   optional mechanism to recognize and prevent that. When configured using
   the functions referenced in the previous paragraph, every change and
   transaction passed to output plugin callbacks (see Section 47.6)
   generated by the session is tagged with the replication origin of the
   generating session. This allows treating them differently in the output
   plugin, e.g., ignoring all but locally-originating rows. Additionally,
   the filter_by_origin_cb callback can be used to filter the logical
   decoding change stream based on the source. While less flexible,
   filtering via that callback is considerably more efficient than doing
   it in the output plugin.
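
   One way to observe origin-based filtering from SQL is the contrib
   test_decoding plugin, whose 'only-local' option is implemented with
   filter_by_origin_cb. The sketch below assumes wal_level = logical, that
   test_decoding is installed, and that a replication slot is free; the
   slot, table, and origin names are hypothetical.

   ```sql
   -- Create a logical slot that decodes with test_decoding.
   SELECT pg_create_logical_replication_slot('origin_demo_slot',
                                             'test_decoding');

   CREATE TABLE origin_demo (id int PRIMARY KEY);

   -- A change made by an ordinary session: no origin is attached.
   INSERT INTO origin_demo VALUES (1);

   -- A change made while the session is marked as replaying from an
   -- origin: it is tagged with that origin.
   SELECT pg_replication_origin_create('myrepl_node_b');
   SELECT pg_replication_origin_session_setup('myrepl_node_b');
   INSERT INTO origin_demo VALUES (2);
   SELECT pg_replication_origin_session_reset();

   -- With 'only-local' enabled, changes that carry an origin are
   -- filtered out, so the second INSERT should not appear here.
   SELECT data FROM pg_logical_slot_peek_changes(
       'origin_demo_slot', NULL, NULL, 'only-local', '1');

   -- Clean up.
   SELECT pg_drop_replication_slot('origin_demo_slot');
   SELECT pg_replication_origin_drop('myrepl_node_b');
   DROP TABLE origin_demo;
   ```

   Running the peek query again without the 'only-local' option should
   show both inserts, which is the behavior that would let replayed rows
   loop back in a bi-directional setup.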