# pg_scribe Command Line Interface This document describes the command line interface for **pg_scribe**, an incremental SQL backup system for PostgreSQL. ## Overview pg_scribe uses action flags similar to other PostgreSQL tools like `pg_recvlogical`. The tool performs one primary action per invocation, specified by action flags. The system organizes backups into **chains** - each chain consists of one base backup plus zero or more differential backups. See `doc/file-handling.md` for detailed terminology and file organization. ## Synopsis ```bash # Initialize backup system (one-time) pg_scribe --init [OPTIONS] # Start streaming incremental backups pg_scribe --start [OPTIONS] # Stop active streaming process pg_scribe --stop [OPTIONS] # Rotate the differential file pg_scribe --rotate-diff [OPTIONS] # Create new chain with fresh base backup pg_scribe --new-chain [OPTIONS] # Restore from backups pg_scribe --restore [OPTIONS] # Check replication slot status pg_scribe --status [OPTIONS] ``` ## Common Options These options apply to all actions: ### Connection Options ``` -d, --dbname=DBNAME Database name (can be a connection string) -h, --host=HOSTNAME Database server host (default: localhost) -p, --port=PORT Database server port (default: 5432) -U, --username=USERNAME Database user (default: $PGUSER or $USER) -w, --no-password Never prompt for password -W, --password Force password prompt ``` ### General Options ``` -v, --verbose Enable verbose mode -V, --version Print version and exit -?, --help Show help and exit ``` ## Action Flags Exactly one of the following action flags must be specified: ### `--init` Initialize the backup system by creating a replication slot, setting up DDL capture via the wal2sql extension, and taking an initial base backup in the first chain. **This is a one-time initialization operation** - following PostgreSQL conventions (`initdb`, `pg_basebackup`), it requires an empty backup directory and will fail if already initialized. Use `--if-not-exists` to make initialization idempotent (succeeds silently if already initialized). If initialization fails, partial state is automatically cleaned up (replication slot dropped, partial backup files removed). **Additional options:** ``` --backup-dir=DIRECTORY Backup output directory (required) --slot=SLOTNAME Replication slot name (default: pg_scribe) --if-not-exists Do not error if already initialized (idempotent) --force Skip validation and force initialization (dangerous!) --compress=METHOD Compression method: gzip, lz4, zstd, or none Can include level (e.g., zstd:9) (default: none) ``` **What it does:** **Phase 1: Validation** (runs first, can fail) 1. **CRITICAL Checks** (must pass or initialization fails): - Verify `wal_level = logical` - Verify `max_replication_slots >= 1` - Verify `max_wal_senders >= 1` - Check all tables have adequate replica identity (PRIMARY KEY, USING INDEX, or FULL) 2. **Coverage Warnings** (non-blocking, informational): - List unlogged tables (will not be backed up) - Check for large objects (not incrementally backed up) **Phase 2: Setup** (only runs if validation passes or `--force` used) 1. Verifies backup directory is empty (or doesn't exist) 2. Creates the wal2sql extension if it doesn't exist (`CREATE EXTENSION IF NOT EXISTS wal2sql;`) - This automatically installs the DDL event trigger 3. Creates a logical replication slot (fails if slot already exists) 4. Generates chain ID from current timestamp (e.g., `20231215T120000Z`) 5. Creates first chain directory: `chain-{CHAIN_ID}/` 6. Takes synchronized base backup using the slot's snapshot → `chain-{CHAIN_ID}/base.sql` 7. Creates `pg_dumpall --globals-only` backup → `chain-{CHAIN_ID}/globals.sql` 8. Generates metadata file → `chain-{CHAIN_ID}/metadata.json` (PostgreSQL version, extensions, encoding) 9. Applies compression if `--compress` specified **Example:** ```bash pg_scribe --init -d mydb --backup-dir /backups/mydb --slot mydb_backup ``` **Output should convey:** - **Validation results** with clear pass/fail status for: - PostgreSQL configuration (wal_level, max_replication_slots, max_wal_senders) - Replica identity for all tables - Coverage warnings (unlogged tables, large objects) - **If validation fails**: List CRITICAL issues with specific fix commands, then exit with error code 5 (unless `--force` used) - **If validation passes** (or `--force`): Progress through setup steps (extension creation, slot creation with snapshot ID, chain creation, backup paths) - **Final status**: Success message with chain ID created **Example output:** ``` ✓ Validation passed ✓ Extension created ✓ Replication slot 'mydb_backup' created at LSN 0/1234567 ✓ Initial chain created: 20231215T120000Z Location: /backups/mydb/chain-20231215T120000Z/ Next step: Start streaming with: pg_scribe --start --backup-dir /backups/mydb ``` --- ### `--start` Start streaming incremental backups continuously to the latest chain in the backup directory. **Additional options:** ``` --backup-dir=DIRECTORY Backup directory (required) --slot=SLOTNAME Replication slot name (default: pg_scribe) -s, --status-interval=SECS Status update interval in seconds (default: 10) -F, --fsync-interval=SECS Fsync interval in seconds (default: 10, 0 to disable) ``` **What it does:** 1. Validates database connection and replication slot 2. Auto-detects latest chain (newest by timestamp) 3. Checks for existing streaming process (via `.pg_scribe.pid`) 4. Creates top-level pidfile: `{backup-dir}/.pg_scribe.pid` 5. Uses `exec` to replace itself with `pg_recvlogical`, which then: - Streams decoded changes using the wal2sql plugin - Writes SQL to `chain-{CHAIN_ID}/active.sql` - Periodically fsyncs output file for crash safety - Reports LSN position and lag to stderr - Responds to SIGHUP by closing and reopening output file (for log rotation) **Implementation:** `pg_scribe --start` is a thin wrapper that validates prerequisites and uses `exec` to become `pg_recvlogical`. This design has several advantages: - **No orphaned processes**: pg_scribe becomes pg_recvlogical (same PID), eliminating parent-child complexity - **Direct signal handling**: SIGHUP, SIGTERM, SIGINT go directly to pg_recvlogical without forwarding - **Simpler code**: No need for signal forwarding, child process tracking, or wait loops - **Reliable cleanup**: Tests and process managers interact with a single process **Log rotation support:** File rotation is provided automatically by `pg_recvlogical`: 1. Rename the current output file 2. Send SIGHUP to the pg_scribe/pg_recvlogical process 3. pg_recvlogical closes the old file and opens a new one with the same name **Example:** ```bash # Stream to latest chain (foreground) pg_scribe --start -d mydb --backup-dir /backups/mydb --slot mydb_backup # Run as background daemon (redirect stderr to log file) pg_scribe --start -d mydb --backup-dir /backups/mydb --slot mydb_backup \ 2>/var/log/pg_scribe.log & # Stream to stdout (for processing with other tools) - NOT SUPPORTED in chain mode # Use --backup-dir; active.sql will be created in latest chain ``` **Output (to stderr):** ``` ✓ Found latest chain: 20231215T120000Z ✓ Starting replication from slot 'mydb_backup' ✓ Writing to: /backups/mydb/chain-20231215T120000Z/active.sql [periodic LSN updates...] ``` **Note:** Only one `--start` process can run per backup directory at a time. --- ### `--stop` Stop the active streaming process gracefully. **Additional options:** ``` --backup-dir=DIRECTORY Backup directory containing active process (required) ``` **What it does:** 1. Locates the pidfile at `{backup-dir}/.pg_scribe.pid` 2. Validates the PID exists and is a `pg_recvlogical` process 3. Sends SIGTERM to the process for graceful shutdown 4. Waits up to 30 seconds for the process to stop 5. Sends SIGKILL if the process doesn't stop gracefully 6. Removes the pidfile **Example:** ```bash pg_scribe --stop --backup-dir /backups/mydb ``` **Output:** ``` ✓ Found pg_recvlogical process (PID 12345) ✓ Active chain: 20231215T120000Z ✓ Sending SIGTERM to process 12345... ✓ Waiting for process to stop... ✓ Streaming process stopped ``` **Use case:** Clean shutdown before maintenance, before transitioning to a new chain manually, or when stopping backups temporarily. **Edge cases handled:** - Stale pidfile (process doesn't exist) - removes pidfile and exits successfully - Wrong process (not pg_recvlogical) - refuses to stop and exits with error - Process doesn't respond to SIGTERM - sends SIGKILL after timeout --- ### `--rotate-diff` Rotate the differential file within the active chain, sealing the current differential and starting a new one (similar to log rotation). **Additional options:** ``` --backup-dir=DIRECTORY Backup directory (required) ``` **What it does:** 1. Finds the active streaming process via `.pg_scribe.pid` 2. Validates process exists and is pg_recvlogical 3. Locates the active chain (finds `active.sql`) 4. Generates new differential name: `diff-{TIMESTAMP}.sql` 5. **Atomic rotation:** - Rename `active.sql` → `diff-{TIMESTAMP}.sql` - Send SIGHUP to pg_recvlogical process - Wait for new `active.sql` to appear (timeout after 30s) - Validate new file is being written to **Example:** ```bash pg_scribe --rotate-diff --backup-dir /backups/mydb ``` **Output:** ``` ✓ Found active chain: 20231215T120000Z ✓ Rotated differential: diff-20231216T083000Z.sql (2.1 GB) ✓ New differential started ``` **Use case:** Daily or weekly differential rotation to keep individual files manageable. **Edge cases handled:** - Stale pidfile (process doesn't exist) - Wrong process (not pg_recvlogical) - SIGHUP timeout (new file doesn't appear) --- ### `--new-chain` Create a new chain with a fresh base backup. This operation takes a new base backup while streaming continues to the old chain. **Additional options:** ``` --backup-dir=DIRECTORY Backup directory (required) --start Automatically stop old streaming and start new (one-step transition) --compress=METHOD Compression method: gzip, lz4, zstd, or none Can include level (e.g., zstd:9) (default: none) ``` **What it does:** 1. Generates new chain ID from current timestamp 2. Creates new chain directory: `chain-{NEW_ID}/` 3. **Takes new base backup** (while streaming continues to old chain if active): - `pg_dump` → `chain-{NEW_ID}/base.sql` - `pg_dumpall --globals-only` → `chain-{NEW_ID}/globals.sql` - Metadata → `chain-{NEW_ID}/metadata.json` - Apply compression if specified 4. **If `--start` flag specified:** - Stops old streaming process (sends SIGTERM, waits up to 30s) - Starts streaming to new chain using `exec pg_recvlogical` - Process continues running in foreground (or background if run with `&`) 5. **If `--start` NOT specified:** - Reports new chain created - Displays manual transition instructions **Example:** ```bash # Create new chain only (manual transition) pg_scribe --new-chain -d mydb --backup-dir /backups/mydb # Create new chain with automatic transition (recommended) pg_scribe --new-chain --start -d mydb --backup-dir /backups/mydb --compress=zstd:9 ``` **Output (without --start):** ``` Taking new base backup... ✓ Base backup complete: chain-20231222T120000Z (12.3 GB) New chain created: 20231222T120000Z To transition to the new chain: 1. Stop the current streaming process: pg_scribe --stop -f /backups/mydb 2. Start streaming to the new chain: pg_scribe --start -d mydb -f /backups/mydb -S mydb_backup ``` **Output (with --start):** ``` Taking new base backup... ✓ Base backup complete: chain-20231222T120000Z (12.3 GB) Automatic Transition (--start specified) ✓ Stopping old streaming process (PID 12345)... ✓ Old streaming process stopped Starting streaming to new chain: 20231222T120000Z ✓ Output file: /backups/mydb/chain-20231222T120000Z/active.sql [continues streaming in foreground...] ``` **Use case:** Monthly or quarterly base backup refresh to keep chains manageable. Use `--start` for automated transition in production (recommended), or omit it for manual control. **Note on overlap:** Since the base backup is taken while streaming continues, there will be some overlap between the old chain's final differential and the new chain's base backup. This is acceptable and simplifies the design. Restore operates on one chain at a time, so overlap doesn't cause issues. **Note on --start:** When using `--start`, the command requires `-d/--dbname` and other connection options since it will exec into `pg_recvlogical` to begin streaming. --- ### `--restore` Restore database from a chain's base backup plus its differential backups. **Additional options:** ``` --backup-dir=DIRECTORY Backup input directory (required) -d, --dbname=DBNAME Target database name (required) -C, --create Create target database --chain-id=ID Specific chain to restore (default: latest) --include-active Also apply incomplete active.sql (risky!) --up-to=TIMESTAMP Only apply differentials up to this time --no-sync-sequences Skip sequence synchronization ``` **What it does:** 1. Determines target chain (latest or specified via `--chain-id`) 2. Locates base backup: `chain-{ID}/base.sql` 3. Finds all sealed differentials: `chain-{ID}/diff-*.sql` (sorted by timestamp) 4. Creates target database (if `--create` specified) 5. Restores globals: `chain-{ID}/globals.sql` 6. Restores base backup: `chain-{ID}/base.sql` 7. Applies sealed differentials in chronological order 8. Synchronizes sequences using `setval()` (unless `--no-sync-sequences`) 9. Reports statistics **Example:** ```bash # Restore latest chain to new database pg_scribe --restore --backup-dir /backups/mydb -d mydb_restored --create # Restore specific chain pg_scribe --restore --backup-dir /backups/mydb --chain-id 20231215T120000Z \ -d mydb_restored --create # Restore with incomplete active.sql (risky!) pg_scribe --restore --backup-dir /backups/mydb -d mydb_restored --create --include-active ``` **Output:** ``` Restoring from chain: 20231215T120000Z ✓ Restored globals ✓ Restored base backup (1,234,567 rows) ✓ Applied 15 differentials - diff-20231216T083000Z.sql - diff-20231217T083000Z.sql ... ✓ Synchronized 42 sequences Restore complete (took 3m 42s) ``` **Default behavior:** By default, `active.sql` is **ignored** because it's incomplete and may be inconsistent. Use `--include-active` only if you need the absolute latest changes and accept the risk. --- ### `--status` Check replication slot health and backup system status. **Additional options:** ``` --slot=SLOTNAME Replication slot name (default: pg_scribe) --backup-dir=DIRECTORY Backup directory to analyze (optional) ``` **What it does:** 1. Queries `pg_replication_slots` for slot health 2. Shows replication lag and WAL retention 3. Analyzes backup directory if provided: - Lists all chains with timestamps - Shows differential counts and sizes - Indicates which chain is actively streaming 4. Reports warnings about potential issues **Example:** ```bash pg_scribe --status -d mydb --slot mydb_backup --backup-dir /backups/mydb ``` **Output:** ``` Replication Slot: mydb_backup Status: active Current LSN: 0/9876543 Confirmed LSN: 0/9876540 Lag: 3 bytes Age: 7 days Backup Directory: /backups/mydb Chains: chain-20231215T120000Z Base: 2023-12-15 12:00:00 (10.2 GB) Differentials: 15 files (2.1 GB total) Status: sealed chain-20231222T120000Z (ACTIVE - streaming) Base: 2023-12-22 12:00:00 (10.5 GB) Differentials: 3 files (1.8 GB total) Last activity: 2 minutes ago PID: 12345 Total backup size: 24.1 GB ``` --- ## Chain-Based File Structure The backup directory contains multiple chain subdirectories, each representing a complete backup lineage: ``` /backups/mydb/ .pg_scribe.pid # Single pidfile for backup directory chain-20231215T120000Z/ # First chain (sealed) base.sql # Base backup (pg_dump) globals.sql # Roles, tablespaces (pg_dumpall) metadata.json # PG version, extensions, etc. diff-20231216T083000Z.sql # Sealed differential diff-20231217T083000Z.sql # Sealed differential ... chain-20231222T120000Z/ # Second chain (active) base.sql globals.sql metadata.json active.sql # Currently streaming (incomplete) ``` **Key concepts:** - **Chain ID**: Timestamp of base backup creation (e.g., `20231215T120000Z`) - **Base backup**: Complete database snapshot at chain creation time - **Active differential**: Currently streaming changes (incomplete, named `active.sql`) - **Sealed differential**: Completed differential (immutable, named `diff-{TIMESTAMP}.sql`) - **One active chain**: Only one chain receives streaming changes at a time - **One pidfile**: Single `.pg_scribe.pid` tracks the active streaming process See `doc/file-handling.md` for comprehensive details on file organization and terminology. --- ## Chain Management Chain management uses standard Unix tools: ```bash # List all chains ls -l /backups/mydb/ # View chain contents ls -lh /backups/mydb/chain-20231215T120000Z/ # Tree view tree /backups/mydb/ # Remove old chain (manual cleanup) rm -rf /backups/mydb/chain-20231215T120000Z/ # Check if streaming is active if [ -f /backups/mydb/.pg_scribe.pid ]; then cat /backups/mydb/.pg_scribe.pid ps -p $(cat /backups/mydb/.pg_scribe.pid) fi ``` **Cleanup strategy**: Old chains accumulate over time. Removal is manual and left to the user (or external scripts) to maintain full control over retention policies. --- ## Exit Status ``` 0 Success 1 General error 2 Database connection error 3 Replication slot error 4 Backup/restore error 5 Invalid arguments or validation failure (--init without --force) 10 Warning conditions (--status only) ``` --- ## Environment Variables Standard PostgreSQL environment variables are supported: ``` PGHOST Database host PGPORT Database port PGDATABASE Database name PGUSER Database user PGPASSWORD Database password (not recommended, use .pgpass instead) PG_COLOR Use color in diagnostics: always, auto, or never ``` --- ## Example Workflows ### Initial Setup and Daily Operation ```bash # 1. Initialize backup system (one-time setup, requires empty directory) pg_scribe --init -d production --backup-dir /backups/production --slot prod_backup # 2. Start streaming backups (run as daemon/service) pg_scribe --start -d production --backup-dir /backups/production --slot prod_backup \ 2>/var/log/pg_scribe.log & # 3. Daily differential rotation (cron: 0 2 * * *) pg_scribe --rotate-diff --backup-dir /backups/production # 4. Monthly new chain with automatic transition (cron: 0 3 1 * *) # This stops old streaming and starts new in one command pg_scribe --new-chain --start -d production --backup-dir /backups/production \ --slot prod_backup --compress=zstd:9 2>/var/log/pg_scribe.log & # 5. Monitor slot health (cron: */15 * * * *) pg_scribe --status -d production --slot prod_backup --backup-dir /backups/production # Alternative: Manual transition (if preferred) # 4a. Create new chain pg_scribe --new-chain -d production --backup-dir /backups/production --compress=zstd:9 # 4b. Stop old streaming pg_scribe --stop --backup-dir /backups/production # 4c. Start streaming to new chain pg_scribe --start -d production --backup-dir /backups/production --slot prod_backup \ 2>/var/log/pg_scribe.log & ``` ### Disaster Recovery ```bash # 1. Check available backups pg_scribe --status --backup-dir /backups/production # 2. Restore latest chain to new database pg_scribe --restore --backup-dir /backups/production -d production_restored --create # 3. Restore specific older chain pg_scribe --restore --backup-dir /backups/production --chain-id 20231215T120000Z \ -d production_december --create # 4. Test restored database psql -d production_restored -c "SELECT COUNT(*) FROM users;" # 5. Switch application to restored database (manual step) ``` --- ## Comparison with PostgreSQL Tools pg_scribe follows the same CLI conventions as core PostgreSQL tools: | Tool | Action Flags | Connection Options | File Options | |------|-------------|-------------------|--------------| | **pg_recvlogical** | `--create-slot`, `--drop-slot`, `--start` | `-d`, `-h`, `-p`, `-U` | `-f` (output file), `--slot` (slot) | | **pg_dump** | (positional dbname) | `-d`, `-h`, `-p`, `-U` | `-f` (output file) | | **pg_basebackup** | (none) | `-d`, `-h`, `-p`, `-U` | `-D` (data directory!) | | **pg_restore** | (none) | `-d`, `-h`, `-p`, `-U` | positional (archive file) | | **pg_scribe** | `--init`, `--start`, `--rotate-diff`, `--new-chain`, `--restore`, `--status` | `-d`, `-h`, `-p`, `-U` | `--backup-dir` (directory), `--slot` (slot) | **Note:** `pg_basebackup` uses `-D/--pgdata` because it creates an actual PostgreSQL data directory cluster. `pg_scribe` uses `--backup-dir` because it manages a collection of chains with SQL backup files. --- ## Implementation Notes ### Design Principles 1. **Consistent with PostgreSQL**: Follow exact same conventions as `pg_recvlogical`, `pg_dump`, `initdb`, `pg_basebackup` 2. **Fail-fast initialization**: `--init` requires empty directory, cleans up on failure (like `initdb` and `pg_basebackup`) 3. **Clear output**: Progress to stderr, data to stdout/files 4. **Scriptable**: Clear text output format, proper exit codes 5. **Opinionated file structure**: Chains provide consistent organization ### Technology Choices (POC) - **Language**: Bash or Python - Bash: Minimal dependencies, matches PostgreSQL tool style - Python: Better error handling, easier testing - **Dependencies**: Only PostgreSQL client tools (pg_recvlogical, pg_dump, pg_dumpall, psql) ### Key Implementation Components 1. **Connection management**: Use libpq environment variables, .pgpass 2. **Error handling**: Validate prerequisites before starting operations 3. **Chain management**: - Auto-detect latest chain for `--start` - Single pidfile at `{backup-dir}/.pg_scribe.pid` - Chain ID format: `YYYYMMDDTHHMMSSZ` (ISO 8601, UTC, sortable) 4. **Signal handling**: - SIGTERM/SIGINT for graceful shutdown (handled directly by pg_recvlogical after exec) - SIGHUP for file rotation (handled directly by pg_recvlogical after exec) 5. **Output conventions**: - Progress and status messages → stderr - SQL output → `active.sql` in chain directory 6. **Process management for --start**: - Validate prerequisites (connection, replication slot, no existing process) - Display configuration to stderr - Use `exec` to replace pg_scribe with `pg_recvlogical` - Benefits: No orphaned processes, direct signal handling, simpler code, same PID **Extensibility Note**: This POC design uses a modular, action-based CLI that can accommodate additional features and options in future versions without breaking compatibility.