1 # pg_scribe Command Line Interface
3 This document describes the command line interface for **pg_scribe**, an incremental SQL backup system for PostgreSQL.
7 pg_scribe uses action flags similar to other PostgreSQL tools like `pg_recvlogical`. The tool performs one primary action per invocation, specified by action flags.
9 The system organizes backups into **chains** - each chain consists of one base backup plus zero or more differential backups. See `doc/file-handling.md` for detailed terminology and file organization.
14 # Initialize backup system (one-time)
15 pg_scribe --init [OPTIONS]
17 # Start streaming incremental backups
18 pg_scribe --start [OPTIONS]
20 # Rotate the differential file
21 pg_scribe --rotate-diff [OPTIONS]
23 # Create new chain with fresh base backup
24 pg_scribe --new-chain [OPTIONS]
26 # Restore from backups
27 pg_scribe --restore [OPTIONS]
29 # Check replication slot status
30 pg_scribe --status [OPTIONS]
35 These options apply to all actions:
37 ### Connection Options
40 -d, --dbname=DBNAME Database name (can be a connection string)
41 -h, --host=HOSTNAME Database server host (default: localhost)
42 -p, --port=PORT Database server port (default: 5432)
43 -U, --username=USERNAME Database user (default: $PGUSER or $USER)
44 -w, --no-password Never prompt for password
45 -W, --password Force password prompt
51 -v, --verbose Enable verbose mode
52 -V, --version Print version and exit
53 -?, --help Show help and exit
58 Exactly one of the following action flags must be specified:
62 Initialize the backup system by creating a replication slot, setting up DDL capture via the wal2sql extension, and taking an initial base backup in the first chain.
64 **This is a one-time initialization operation** - following PostgreSQL conventions (`initdb`, `pg_basebackup`), it requires an empty backup directory and will fail if already initialized. If initialization fails, partial state is automatically cleaned up (replication slot dropped, partial backup files removed).
66 **Additional options:**
69 --backup-dir=DIRECTORY Backup output directory (required)
70 --slot=SLOTNAME Replication slot name (default: pg_scribe)
71 --force Skip validation and force initialization (dangerous!)
72 --compress=METHOD Compression method: gzip, lz4, zstd, or none
73 Can include level (e.g., zstd:9) (default: none)
78 **Phase 1: Validation** (runs first, can fail)
80 1. **CRITICAL Checks** (must pass or initialization fails):
81 - Verify `wal_level = logical`
82 - Verify `max_replication_slots >= 1`
83 - Verify `max_wal_senders >= 1`
84 - Check all tables have adequate replica identity (PRIMARY KEY, USING INDEX, or FULL)
86 2. **Coverage Warnings** (non-blocking, informational):
87 - List unlogged tables (will not be backed up)
88 - Check for large objects (not incrementally backed up)
90 **Phase 2: Setup** (only runs if validation passes or `--force` used)
92 1. Verifies backup directory is empty (or doesn't exist)
93 2. Creates the wal2sql extension if it doesn't exist (`CREATE EXTENSION IF NOT EXISTS wal2sql;`)
94 - This automatically installs the DDL event trigger
95 3. Creates a logical replication slot (fails if slot already exists)
96 4. Generates chain ID from current timestamp (e.g., `20231215T120000Z`)
97 5. Creates first chain directory: `chain-{CHAIN_ID}/`
98 6. Takes synchronized base backup using the slot's snapshot → `chain-{CHAIN_ID}/base.sql`
99 7. Creates `pg_dumpall --globals-only` backup → `chain-{CHAIN_ID}/globals.sql`
100 8. Generates metadata file → `chain-{CHAIN_ID}/metadata.json` (PostgreSQL version, extensions, encoding)
101 9. Applies compression if `--compress` specified
106 pg_scribe --init -d mydb --backup-dir /backups/mydb --slot mydb_backup
109 **Output should convey:**
111 - **Validation results** with clear pass/fail status for:
112 - PostgreSQL configuration (wal_level, max_replication_slots, max_wal_senders)
113 - Replica identity for all tables
114 - Coverage warnings (unlogged tables, large objects)
115 - **If validation fails**: List CRITICAL issues with specific fix commands, then exit with error code 5 (unless `--force` used)
116 - **If validation passes** (or `--force`): Progress through setup steps (extension creation, slot creation with snapshot ID, chain creation, backup paths)
117 - **Final status**: Success message with chain ID created
123 ✓ Replication slot 'mydb_backup' created at LSN 0/1234567
124 ✓ Initial chain created: 20231215T120000Z
125 Location: /backups/mydb/chain-20231215T120000Z/
127 Next step: Start streaming with:
128 pg_scribe --start --backup-dir /backups/mydb
135 Start streaming incremental backups continuously to the latest chain in the backup directory.
137 **Additional options:**
140 --backup-dir=DIRECTORY Backup directory (required)
141 --slot=SLOTNAME Replication slot name (default: pg_scribe)
142 -s, --status-interval=SECS Status update interval in seconds (default: 10)
143 -F, --fsync-interval=SECS Fsync interval in seconds (default: 10, 0 to disable)
148 1. Validates database connection and replication slot
149 2. Auto-detects latest chain (newest by timestamp)
150 3. Checks for existing streaming process (via `.pg_scribe.pid`)
151 4. Creates top-level pidfile: `{backup-dir}/.pg_scribe.pid`
152 5. Uses `exec` to replace itself with `pg_recvlogical`, which then:
153 - Streams decoded changes using the wal2sql plugin
154 - Writes SQL to `chain-{CHAIN_ID}/active.sql`
155 - Periodically fsyncs output file for crash safety
156 - Reports LSN position and lag to stderr
157 - Responds to SIGHUP by closing and reopening output file (for log rotation)
159 **Implementation:** `pg_scribe --start` is a thin wrapper that validates prerequisites and uses `exec` to become `pg_recvlogical`. This design has several advantages:
160 - **No orphaned processes**: pg_scribe becomes pg_recvlogical (same PID), eliminating parent-child complexity
161 - **Direct signal handling**: SIGHUP, SIGTERM, SIGINT go directly to pg_recvlogical without forwarding
162 - **Simpler code**: No need for signal forwarding, child process tracking, or wait loops
163 - **Reliable cleanup**: Tests and process managers interact with a single process
165 **Log rotation support:** File rotation is provided automatically by `pg_recvlogical`:
167 1. Rename the current output file
168 2. Send SIGHUP to the pg_scribe/pg_recvlogical process
169 3. pg_recvlogical closes the old file and opens a new one with the same name
174 # Stream to latest chain (foreground)
175 pg_scribe --start -d mydb --backup-dir /backups/mydb --slot mydb_backup
177 # Run as background daemon (redirect stderr to log file)
178 pg_scribe --start -d mydb --backup-dir /backups/mydb --slot mydb_backup \
179 2>/var/log/pg_scribe.log &
181 # Stream to stdout (for processing with other tools) - NOT SUPPORTED in chain mode
182 # Use --backup-dir; active.sql will be created in latest chain
185 **Output (to stderr):**
187 ✓ Found latest chain: 20231215T120000Z
188 ✓ Starting replication from slot 'mydb_backup'
189 ✓ Writing to: /backups/mydb/chain-20231215T120000Z/active.sql
190 [periodic LSN updates...]
193 **Note:** Only one `--start` process can run per backup directory at a time.
199 Rotate the differential file within the active chain, sealing the current differential and starting a new one (similar to log rotation).
201 **Additional options:**
204 --backup-dir=DIRECTORY Backup directory (required)
209 1. Finds the active streaming process via `.pg_scribe.pid`
210 2. Validates process exists and is pg_recvlogical
211 3. Locates the active chain (finds `active.sql`)
212 4. Generates new differential name: `diff-{TIMESTAMP}.sql`
213 5. **Atomic rotation:**
214 - Rename `active.sql` → `diff-{TIMESTAMP}.sql`
215 - Send SIGHUP to pg_recvlogical process
216 - Wait for new `active.sql` to appear (timeout after 30s)
217 - Validate new file is being written to
222 pg_scribe --rotate-diff --backup-dir /backups/mydb
227 ✓ Found active chain: 20231215T120000Z
228 ✓ Rotated differential: diff-20231216T083000Z.sql (2.1 GB)
229 ✓ New differential started
232 **Use case:** Daily or weekly differential rotation to keep individual files manageable.
234 **Edge cases handled:**
235 - Stale pidfile (process doesn't exist)
236 - Wrong process (not pg_recvlogical)
237 - SIGHUP timeout (new file doesn't appear)
243 Create a new chain with a fresh base backup. This operation takes a new base backup while streaming continues to the old chain, then transitions streaming to the new chain.
245 **Additional options:**
248 --backup-dir=DIRECTORY Backup directory (required)
249 --compress=METHOD Compression method: gzip, lz4, zstd, or none
250 Can include level (e.g., zstd:9) (default: none)
255 1. Generates new chain ID from current timestamp
256 2. Creates new chain directory: `chain-{NEW_ID}/`
257 3. **Takes new base backup** (while streaming continues to old chain):
258 - `pg_dump` → `chain-{NEW_ID}/base.sql`
259 - `pg_dumpall --globals-only` → `chain-{NEW_ID}/globals.sql`
260 - Metadata → `chain-{NEW_ID}/metadata.json`
261 - Apply compression if specified
262 4. **Transitions streaming** (if a process is actively streaming):
263 - Find currently active chain
264 - Seal old chain's `active.sql` → `diff-{TIMESTAMP}.sql`
265 - Send SIGHUP to confirm seal, then SIGTERM to stop
266 - Start streaming to new chain
271 pg_scribe --new-chain -d mydb --backup-dir /backups/mydb
272 pg_scribe --new-chain -d mydb --backup-dir /backups/mydb --compress=zstd:9
277 Taking new base backup...
278 ✓ Base backup complete: chain-20231222T120000Z (12.3 GB)
280 Transitioning from chain-20231215T120000Z...
281 ✓ Sealed old chain: diff-20231222T115900Z.sql
282 ✓ Stopped streaming to old chain
283 ✓ Started streaming to new chain: 20231222T120000Z
286 **Use case:** Monthly or quarterly base backup refresh to keep chains manageable.
288 **Note on overlap:** Since the base backup is taken while streaming continues, there will be some overlap between the old chain's final differential and the new chain's base backup. This is acceptable and simplifies the design. Restore operates on one chain at a time, so overlap doesn't cause issues.
294 Restore database from a chain's base backup plus its differential backups.
296 **Additional options:**
299 --backup-dir=DIRECTORY Backup input directory (required)
300 -d, --dbname=DBNAME Target database name (required)
301 -C, --create Create target database
302 --chain-id=ID Specific chain to restore (default: latest)
303 --include-active Also apply incomplete active.sql (risky!)
304 --up-to=TIMESTAMP Only apply differentials up to this time
305 --no-sync-sequences Skip sequence synchronization
310 1. Determines target chain (latest or specified via `--chain-id`)
311 2. Locates base backup: `chain-{ID}/base.sql`
312 3. Finds all sealed differentials: `chain-{ID}/diff-*.sql` (sorted by timestamp)
313 4. Creates target database (if `--create` specified)
314 5. Restores globals: `chain-{ID}/globals.sql`
315 6. Restores base backup: `chain-{ID}/base.sql`
316 7. Applies sealed differentials in chronological order
317 8. Synchronizes sequences using `setval()` (unless `--no-sync-sequences`)
318 9. Reports statistics
323 # Restore latest chain to new database
324 pg_scribe --restore --backup-dir /backups/mydb -d mydb_restored --create
326 # Restore specific chain
327 pg_scribe --restore --backup-dir /backups/mydb --chain-id 20231215T120000Z \
328 -d mydb_restored --create
330 # Restore with incomplete active.sql (risky!)
331 pg_scribe --restore --backup-dir /backups/mydb -d mydb_restored --create --include-active
336 Restoring from chain: 20231215T120000Z
338 ✓ Restored base backup (1,234,567 rows)
339 ✓ Applied 15 differentials
340 - diff-20231216T083000Z.sql
341 - diff-20231217T083000Z.sql
343 ✓ Synchronized 42 sequences
344 Restore complete (took 3m 42s)
347 **Default behavior:** By default, `active.sql` is **ignored** because it's incomplete and may be inconsistent. Use `--include-active` only if you need the absolute latest changes and accept the risk.
353 Check replication slot health and backup system status.
355 **Additional options:**
358 --slot=SLOTNAME Replication slot name (default: pg_scribe)
359 --backup-dir=DIRECTORY Backup directory to analyze (optional)
364 1. Queries `pg_replication_slots` for slot health
365 2. Shows replication lag and WAL retention
366 3. Analyzes backup directory if provided:
367 - Lists all chains with timestamps
368 - Shows differential counts and sizes
369 - Indicates which chain is actively streaming
370 4. Reports warnings about potential issues
375 pg_scribe --status -d mydb --slot mydb_backup --backup-dir /backups/mydb
380 Replication Slot: mydb_backup
382 Current LSN: 0/9876543
383 Confirmed LSN: 0/9876540
387 Backup Directory: /backups/mydb
390 chain-20231215T120000Z
391 Base: 2023-12-15 12:00:00 (10.2 GB)
392 Differentials: 15 files (2.1 GB total)
395 chain-20231222T120000Z (ACTIVE - streaming)
396 Base: 2023-12-22 12:00:00 (10.5 GB)
397 Differentials: 3 files (1.8 GB total)
398 Last activity: 2 minutes ago
401 Total backup size: 24.1 GB
406 ## Chain-Based File Structure
408 The backup directory contains multiple chain subdirectories, each representing a complete backup lineage:
412 .pg_scribe.pid # Single pidfile for backup directory
413 chain-20231215T120000Z/ # First chain (sealed)
414 base.sql # Base backup (pg_dump)
415 globals.sql # Roles, tablespaces (pg_dumpall)
416 metadata.json # PG version, extensions, etc.
417 diff-20231216T083000Z.sql # Sealed differential
418 diff-20231217T083000Z.sql # Sealed differential
420 chain-20231222T120000Z/ # Second chain (active)
424 active.sql # Currently streaming (incomplete)
428 - **Chain ID**: Timestamp of base backup creation (e.g., `20231215T120000Z`)
429 - **Base backup**: Complete database snapshot at chain creation time
430 - **Active differential**: Currently streaming changes (incomplete, named `active.sql`)
431 - **Sealed differential**: Completed differential (immutable, named `diff-{TIMESTAMP}.sql`)
432 - **One active chain**: Only one chain receives streaming changes at a time
433 - **One pidfile**: Single `.pg_scribe.pid` tracks the active streaming process
435 See `doc/file-handling.md` for comprehensive details on file organization and terminology.
441 Chain management uses standard Unix tools:
447 # View chain contents
448 ls -lh /backups/mydb/chain-20231215T120000Z/
453 # Remove old chain (manual cleanup)
454 rm -rf /backups/mydb/chain-20231215T120000Z/
456 # Check if streaming is active
457 if [ -f /backups/mydb/.pg_scribe.pid ]; then
458 cat /backups/mydb/.pg_scribe.pid
459 ps -p $(cat /backups/mydb/.pg_scribe.pid)
463 **Cleanup strategy**: Old chains accumulate over time. Removal is manual and left to the user (or external scripts) to maintain full control over retention policies.
472 2 Database connection error
473 3 Replication slot error
474 4 Backup/restore error
475 5 Invalid arguments or validation failure (--init without --force)
476 10 Warning conditions (--status only)
481 ## Environment Variables
483 Standard PostgreSQL environment variables are supported:
488 PGDATABASE Database name
490 PGPASSWORD Database password (not recommended, use .pgpass instead)
491 PG_COLOR Use color in diagnostics: always, auto, or never
498 ### Initial Setup and Daily Operation
501 # 1. Initialize backup system (one-time setup, requires empty directory)
502 pg_scribe --init -d production --backup-dir /backups/production --slot prod_backup
504 # 2. Start streaming backups (run as daemon/service)
505 pg_scribe --start -d production --backup-dir /backups/production --slot prod_backup \
506 2>/var/log/pg_scribe.log &
508 # 3. Daily differential rotation (cron: 0 2 * * *)
509 pg_scribe --rotate-diff --backup-dir /backups/production
511 # 4. Monthly new chain with fresh base backup (cron: 0 3 1 * *)
512 pg_scribe --new-chain -d production --backup-dir /backups/production --compress=zstd:9
514 # 5. Monitor slot health (cron: */15 * * * *)
515 pg_scribe --status -d production --slot prod_backup --backup-dir /backups/production
518 ### Disaster Recovery
521 # 1. Check available backups
522 pg_scribe --status --backup-dir /backups/production
524 # 2. Restore latest chain to new database
525 pg_scribe --restore --backup-dir /backups/production -d production_restored --create
527 # 3. Restore specific older chain
528 pg_scribe --restore --backup-dir /backups/production --chain-id 20231215T120000Z \
529 -d production_december --create
531 # 4. Test restored database
532 psql -d production_restored -c "SELECT COUNT(*) FROM users;"
534 # 5. Switch application to restored database (manual step)
539 ## Comparison with PostgreSQL Tools
541 pg_scribe follows the same CLI conventions as core PostgreSQL tools:
543 | Tool | Action Flags | Connection Options | File Options |
544 |------|-------------|-------------------|--------------|
545 | **pg_recvlogical** | `--create-slot`, `--drop-slot`, `--start` | `-d`, `-h`, `-p`, `-U` | `-f` (output file), `--slot` (slot) |
546 | **pg_dump** | (positional dbname) | `-d`, `-h`, `-p`, `-U` | `-f` (output file) |
547 | **pg_basebackup** | (none) | `-d`, `-h`, `-p`, `-U` | `-D` (data directory!) |
548 | **pg_restore** | (none) | `-d`, `-h`, `-p`, `-U` | positional (archive file) |
549 | **pg_scribe** | `--init`, `--start`, `--rotate-diff`, `--new-chain`, `--restore`, `--status` | `-d`, `-h`, `-p`, `-U` | `--backup-dir` (directory), `--slot` (slot) |
551 **Note:** `pg_basebackup` uses `-D/--pgdata` because it creates an actual PostgreSQL data directory cluster. `pg_scribe` uses `--backup-dir` because it manages a collection of chains with SQL backup files.
555 ## Implementation Notes
557 ### Design Principles
559 1. **Consistent with PostgreSQL**: Follow exact same conventions as `pg_recvlogical`, `pg_dump`, `initdb`, `pg_basebackup`
560 2. **Fail-fast initialization**: `--init` requires empty directory, cleans up on failure (like `initdb` and `pg_basebackup`)
561 3. **Clear output**: Progress to stderr, data to stdout/files
562 4. **Scriptable**: Clear text output format, proper exit codes
563 5. **Opinionated file structure**: Chains provide consistent organization
565 ### Technology Choices (POC)
567 - **Language**: Bash or Python
568 - Bash: Minimal dependencies, matches PostgreSQL tool style
569 - Python: Better error handling, easier testing
570 - **Dependencies**: Only PostgreSQL client tools (pg_recvlogical, pg_dump, pg_dumpall, psql)
572 ### Key Implementation Components
574 1. **Connection management**: Use libpq environment variables, .pgpass
575 2. **Error handling**: Validate prerequisites before starting operations
576 3. **Chain management**:
577 - Auto-detect latest chain for `--start`
578 - Single pidfile at `{backup-dir}/.pg_scribe.pid`
579 - Chain ID format: `YYYYMMDDTHHMMSSZ` (ISO 8601, UTC, sortable)
580 4. **Signal handling**:
581 - SIGTERM/SIGINT for graceful shutdown (handled directly by pg_recvlogical after exec)
582 - SIGHUP for file rotation (handled directly by pg_recvlogical after exec)
583 5. **Output conventions**:
584 - Progress and status messages → stderr
585 - SQL output → `active.sql` in chain directory
586 6. **Process management for --start**:
587 - Validate prerequisites (connection, replication slot, no existing process)
588 - Display configuration to stderr
589 - Use `exec` to replace pg_scribe with `pg_recvlogical`
590 - Benefits: No orphaned processes, direct signal handling, simpler code, same PID
592 **Extensibility Note**: This POC design uses a modular, action-based CLI that can accommodate additional features and options in future versions without breaking compatibility.