# pg_fossil CLI Design Specification

**Version**: 1.0
**Date**: 2025-10-13

## Overview

`pg_fossil` is a unified command-line interface that orchestrates the underlying PostgreSQL tools (`pg_recvlogical`, `pg_dump`, `pg_dumpall`, event triggers) into a cohesive backup system. It follows Unix principles: simple subcommands, composable operations, clear stdout/stderr separation, and scriptable workflows.

## Design Philosophy

**Unix Principles Applied:**

- **Composability**: Each subcommand can be used independently or in pipelines
- **Minimalism**: One subcommand per logical operation
- **Discoverability**: Clear help output, sensible defaults, interactive prompts when needed
- **Text-based**: All configuration and metadata in plain text files
- **Non-interactive by default**: Suitable for cron jobs and scripts
- **Exit codes**: 0 for success, non-zero for failure (scriptable)

## Command Structure

```
pg_fossil <subcommand> [options]

Subcommands:
  init       Initialize backup system for a database
  collect    Collect incremental changes (continuous or one-shot)
  snapshot   Take a full backup snapshot
  restore    Restore database from backups
  verify     Verify backup integrity
  status     Show replication slot and backup status
  prune      Remove old backups according to retention policy
  emergency  Emergency procedures (drop slot, etc.)
```

## Subcommand Specifications

### `pg_fossil init`

**Purpose**: Bootstrap the backup system for a database.

**Usage**:

```bash
pg_fossil init [options]

Options:
  -d, --dbname=NAME        Database name (required)
  -h, --host=HOST          Database host (default: localhost)
  -p, --port=PORT          Database port (default: 5432)
  -U, --username=USER      Database user (default: current user)
  -s, --slot=NAME          Replication slot name (default: pg_fossil_<dbname>)
  -D, --backup-dir=PATH    Backup directory (default: ./pg_fossil_backups)
  --plugin-path=PATH       Path to decoder_raw.so (auto-detected by default)
  --skip-full-backup       Skip initial full backup (for testing)
  --skip-event-triggers    Skip event trigger setup (not recommended)
  -v, --verbose            Verbose output
  -q, --quiet              Minimal output
```

**Example**:

```bash
pg_fossil init --dbname=mydb --backup-dir=/var/backups/mydb
```

**What it does**:

1. Checks PostgreSQL configuration (`wal_level = logical`, etc.)
2. Verifies decoder_raw plugin is installed
3. Creates logical replication slot with decoder_raw
4. Exports snapshot from slot
5. Takes initial base backup using exported snapshot
6. Creates `pg_dumpall --globals-only` backup
7. Sets up event triggers for DDL capture
8. Configures `REPLICA IDENTITY` on tables without primary keys (with warnings)
9. Creates backup directory structure
10. Writes configuration file: `$BACKUP_DIR/pg_fossil.conf`
11. Writes metadata file: `$BACKUP_DIR/metadata.json`

**Output** (stdout):

```
pg_fossil: Checking PostgreSQL configuration... OK
pg_fossil: Verifying decoder_raw plugin... OK
pg_fossil: Creating replication slot 'pg_fossil_mydb'... OK
pg_fossil: Exported snapshot: 00000003-00000001-1
pg_fossil: Taking initial base backup... OK (2.3 GB)
pg_fossil: Backing up global objects... OK
pg_fossil: Setting up event triggers... OK
pg_fossil: Checking replica identity...
pg_fossil:   WARNING: Table 'public.logs' has no replica identity
pg_fossil:   Set to REPLICA IDENTITY FULL
pg_fossil: Creating backup directory: /var/backups/mydb
pg_fossil: Writing configuration... OK
pg_fossil:
pg_fossil: Initialization complete!
pg_fossil:
pg_fossil: Next steps:
pg_fossil:   1. Start collection:  pg_fossil collect -d mydb
pg_fossil:   2. Monitor status:    pg_fossil status -d mydb
pg_fossil:   3. Schedule snapshots (cron): 0 2 * * * pg_fossil snapshot -d mydb
```
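The configuration and replica-identity checks in steps 1 and 8 reduce to ordinary catalog queries. A minimal sketch of what `init` might run, assuming `psql` access to the target database (the query text is illustrative, not the tool's actual implementation):

```bash
# Step 1: wal_level must be 'logical' for logical replication slots
psql -d mydb -Atc "SHOW wal_level"

# Step 8: ordinary tables with neither a primary key nor an explicit replica
# identity -- the candidates for the REPLICA IDENTITY FULL warning above
psql -d mydb -c "
SELECT c.oid::regclass AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND c.relreplident = 'd'
  AND NOT EXISTS (
      SELECT 1 FROM pg_constraint p
      WHERE p.conrelid = c.oid AND p.contype = 'p');"
```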
**Exit codes**:

- 0: Success
- 1: PostgreSQL configuration error (wal_level not logical, etc.)
- 2: decoder_raw plugin not found
- 3: Permission denied
- 4: Slot already exists (use --force to recreate)
- 5: Disk space insufficient

**Files created**:

```
/var/backups/mydb/
├── pg_fossil.conf        # Configuration (plain text)
├── metadata.json         # PostgreSQL version, extensions, encoding
├── base/
│   └── 2024-01-15_base.sql.gz
├── globals/
│   └── 2024-01-15_globals.sql
├── incremental/
│   └── (empty initially)
└── logs/
    └── pg_fossil.log
```

### `pg_fossil collect`

**Purpose**: Collect incremental changes from logical replication slot.

**Usage**:

```bash
pg_fossil collect [options]

Options:
  -d, --dbname=NAME             Database name (required, or read from config)
  -D, --backup-dir=PATH         Backup directory (default: read from config)
  -f, --foreground              Run in foreground (default: daemon mode)
  --stdout                      Write changes to stdout instead of files
  --rotate-interval=DURATION    Rotate files every DURATION (default: 1h)
  --rotate-size=SIZE            Rotate when file exceeds SIZE (default: 100MB)
  --fsync-interval=SEC          Fsync interval in seconds (default: 10)
  --start-lsn=LSN               Start from specific LSN (for recovery)
  -v, --verbose                 Verbose output
  --no-daemon                   Synonym for --foreground
```

**Example (daemon mode)**:

```bash
pg_fossil collect --dbname=mydb
# Runs in background, writes PID to backup-dir/pg_fossil_collect.pid
```

**Example (foreground for testing)**:

```bash
pg_fossil collect --dbname=mydb --foreground --verbose
```

**Example (stdout for piping)**:

```bash
pg_fossil collect --dbname=mydb --stdout | tee incremental.sql | grep "DELETE FROM users"
```

**What it does**:

1. Reads configuration from `pg_fossil.conf`
2. Starts `pg_recvlogical` with decoder_raw plugin
3. Manages file rotation (time-based or size-based)
4. Handles SIGHUP (rotate), SIGTERM (shutdown), SIGINT (shutdown)
5. Writes to timestamped files: `incremental/YYYY-MM-DD_HH-MM-SS.sql`
6. Updates position tracking metadata
7. Monitors replication lag and logs warnings

**Output** (stdout in foreground mode):

```
pg_fossil: Starting collection for database 'mydb'
pg_fossil: Replication slot: pg_fossil_mydb
pg_fossil: Output directory: /var/backups/mydb/incremental
pg_fossil: Rotation: every 1h or 100MB
pg_fossil: Connected to PostgreSQL, streaming changes...
pg_fossil: [2024-01-15 10:30:15] Wrote 1,234 transactions to 2024-01-15_10-00-00.sql
pg_fossil: [2024-01-15 11:00:00] Rotating file (time interval reached)
pg_fossil: [2024-01-15 11:00:01] New file: 2024-01-15_11-00-00.sql
...
```

**Output** (daemon mode):

```
pg_fossil: Collection started in background (PID: 12345)
pg_fossil: Logs: /var/backups/mydb/logs/pg_fossil_collect.log
pg_fossil: Stop with: kill -TERM 12345
```

**Signal handling**:

- **SIGHUP**: Rotate current incremental file
- **SIGTERM**: Graceful shutdown (flush current file, update position)
- **SIGINT**: Same as SIGTERM

**Exit codes**:

- 0: Success (foreground: clean shutdown; daemon: successfully started)
- 1: Configuration error
- 2: Cannot connect to database
- 3: Replication slot not found (run `pg_fossil init` first)
- 4: Already running (PID file exists)
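Under the hood, one rotation interval of `collect` is roughly a single `pg_recvlogical` invocation against the slot. A minimal sketch using real `pg_recvlogical` flags (the file path follows this spec's naming convention; rotation, PID files, and signal handling are added by pg_fossil itself):

```bash
# Stream decoded changes from the slot into the current incremental file.
# --fsync-interval matches the collect default of 10 seconds.
pg_recvlogical --dbname=mydb \
    --slot=pg_fossil_mydb \
    --start \
    --fsync-interval=10 \
    --file=/var/backups/mydb/incremental/2024-01-15_10-00-00.sql

# With --file=- the stream goes to stdout, which is what `collect --stdout` exposes.
```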
### `pg_fossil snapshot`

**Purpose**: Take a full backup snapshot (base + globals).

**Usage**:

```bash
pg_fossil snapshot [options]

Options:
  -d, --dbname=NAME           Database name (required, or read from config)
  -D, --backup-dir=PATH       Backup directory (default: read from config)
  --compress=METHOD           Compression method: gzip, zstd, none (default: gzip)
  --parallel=N                Parallel dump jobs (pg_dump -j, default: 1)
  --exclude-schema=PATTERN    Exclude schemas matching pattern
  --exclude-table=PATTERN     Exclude tables matching pattern
  -v, --verbose               Verbose output
```

**Example**:

```bash
pg_fossil snapshot --dbname=mydb --compress=zstd --parallel=4
```

**What it does**:

1. Runs `pg_dump` with specified options
2. Runs `pg_dumpall --globals-only`
3. Compresses output files
4. Updates metadata with snapshot timestamp and LSN
5. Calculates and stores SHA256 checksums

**Output** (stdout):

```
pg_fossil: Taking snapshot of database 'mydb'
pg_fossil: Starting pg_dump (parallel: 4, compression: zstd)...
pg_fossil: Progress: [=========>          ] 45% (2.1 GB / 4.7 GB)
pg_fossil: Snapshot complete: base/2024-01-15_base.sql.zst (1.8 GB compressed)
pg_fossil: Backing up global objects...
pg_fossil: Globals saved: globals/2024-01-15_globals.sql
pg_fossil: Calculating checksums...
pg_fossil:   SHA256: a3f2e9d8c4b1... (base)
pg_fossil:   SHA256: 7b4c1e9a2f3d... (globals)
pg_fossil: Snapshot complete!
pg_fossil:   LSN: 0/1A2B3C4D
pg_fossil:   Size: 1.8 GB (4.7 GB uncompressed)
pg_fossil:   Duration: 3m 42s
```

**Exit codes**:

- 0: Success
- 1: Configuration error
- 2: pg_dump failed
- 3: Disk space insufficient
- 4: Checksum calculation failed

### `pg_fossil restore`

**Purpose**: Restore database from backups.

**Usage**:

```bash
pg_fossil restore [options]

Options:
  -d, --dbname=NAME        Target database name (required)
  -D, --backup-dir=PATH    Backup directory (required)
  --snapshot=TIMESTAMP     Restore specific snapshot (default: latest)
  --until=TIMESTAMP        Point-in-time recovery: apply incrementals until this time
  --target-lsn=LSN         Stop at specific LSN (alternative to --until)
  --no-globals             Skip restoring global objects
  --no-ddl                 Skip applying DDL changes from event triggers
  --no-incremental         Restore only snapshot (no incremental changes)
  --target-db=NAME         Database name on target (if different from backup)
  --dry-run                Show restore plan without executing
  --verify                 Verify checksums before restore
  -v, --verbose            Verbose output
  -y, --yes                Assume yes to prompts (for scripts)
```

**Example (full restore)**:

```bash
pg_fossil restore --dbname=mydb_restored --backup-dir=/var/backups/mydb --verify
```

**Example (point-in-time recovery)**:

```bash
pg_fossil restore --dbname=mydb_restored --backup-dir=/var/backups/mydb \
    --until="2024-01-15 14:30:00"
```

**Example (dry run)**:

```bash
pg_fossil restore --dbname=mydb_restored --backup-dir=/var/backups/mydb --dry-run
```

**What it does**:

1. Finds latest snapshot (or specified snapshot)
2. Identifies all incremental files since snapshot
3. If `--until` specified, filters incrementals by timestamp
4. Verifies checksums if `--verify` enabled
5. Prompts for confirmation (unless `--yes`)
6. Creates target database if it doesn't exist
7. Restores global objects (unless `--no-globals`)
8. Restores base snapshot
9. Applies DDL changes from `ddl_history` in chronological order (unless `--no-ddl`)
10. Applies incremental SQL files in order (unless `--no-incremental`)
11. Synchronizes sequences using `setval()`
12. Reports success and statistics
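Step 11 matters because logical decoding does not replay `nextval()` calls, so sequences lag behind the restored rows. A hedged sketch of the synchronization step, assuming each serial/identity sequence is reset just past its column's current maximum (the catalog query is illustrative; the actual tool may track sequences differently):

```bash
# Generate one setval() per owned sequence, then execute the generated statements.
psql -d mydb_restored -Atc "
SELECT format('SELECT setval(%L, (SELECT COALESCE(MAX(%I), 0) + 1 FROM %s), false);',
              s.oid::regclass, a.attname, t.oid::regclass)
FROM pg_class s
JOIN pg_depend d ON d.objid = s.oid AND d.deptype IN ('a', 'i')
JOIN pg_class t ON t.oid = d.refobjid
JOIN pg_attribute a ON a.attrelid = t.oid AND a.attnum = d.refobjsubid
WHERE s.relkind = 'S';" | psql -d mydb_restored
```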
**Output** (stdout):

```
pg_fossil: Restore plan for database 'mydb'
pg_fossil:   Source: /var/backups/mydb
pg_fossil:   Target: mydb_restored
pg_fossil:
pg_fossil: Restore components:
pg_fossil:   1. Snapshot: 2024-01-15_base.sql.zst (4.7 GB uncompressed)
pg_fossil:   2. Globals: 2024-01-15_globals.sql
pg_fossil:   3. Incremental files: 24 files (2024-01-15 to 2024-01-16)
pg_fossil:      Total transactions: ~45,230
pg_fossil:   4. Point-in-time: 2024-01-15 14:30:00
pg_fossil:
pg_fossil: WARNING: This will create/overwrite database 'mydb_restored'
pg_fossil: Continue? [y/N]: y
pg_fossil:
pg_fossil: [1/6] Verifying checksums... OK
pg_fossil: [2/6] Creating database 'mydb_restored'... OK
pg_fossil: [3/6] Restoring global objects... OK
pg_fossil: [4/6] Restoring base snapshot...
pg_fossil:       Progress: [=================>  ] 85%
pg_fossil:       OK (3m 15s)
pg_fossil: [5/6] Applying DDL changes... 12 schema changes applied
pg_fossil: [6/6] Applying incremental changes...
pg_fossil:       Processing: 2024-01-15_10-00-00.sql (3,451 transactions)
pg_fossil:       Processing: 2024-01-15_11-00-00.sql (2,890 transactions)
pg_fossil:       ...
pg_fossil:       Stopped at: 2024-01-15 14:30:00 (--until reached)
pg_fossil:       Applied: 18 of 24 files, 38,442 transactions
pg_fossil: Synchronizing sequences... 15 sequences updated
pg_fossil:
pg_fossil: Restore complete!
pg_fossil:   Database: mydb_restored
pg_fossil:   Final LSN: 0/1A2B3C4D
pg_fossil:   Total duration: 8m 32s
pg_fossil:
pg_fossil: Next steps:
pg_fossil:   1. Verify data: psql mydb_restored -c "SELECT count(*) FROM users;"
pg_fossil:   2. Test application against restored database
pg_fossil:   3. When ready, switch application to mydb_restored
```

**Exit codes**:

- 0: Success
- 1: Configuration error
- 2: Snapshot not found
- 3: Restore failed (SQL error during apply)
- 4: Checksum verification failed
- 5: User cancelled (answered 'no' to prompt)

### `pg_fossil status`

**Purpose**: Show replication slot health and backup status.

**Usage**:

```bash
pg_fossil status [options]

Options:
  -d, --dbname=NAME        Database name (required, or read from config)
  -D, --backup-dir=PATH    Backup directory (default: read from config)
  --json                   Output in JSON format (for monitoring)
  --check                  Exit with error if slot lag is critical
  --warn-lag=SIZE          Warning threshold (default: 1GB)
  --critical-lag=SIZE      Critical threshold (default: 10GB)
  -v, --verbose            Show detailed statistics
```

**Example**:

```bash
pg_fossil status --dbname=mydb
```

**Example (monitoring script)**:

```bash
pg_fossil status --dbname=mydb --check --json | jq '.slot.lag_bytes'
```

**What it does**:

1. Queries `pg_replication_slots` for slot status
2. Calculates replication lag in bytes and estimated time
3. Lists recent snapshots and incremental files
4. Shows disk usage statistics
5. If `--check` enabled, exits with error code if lag exceeds critical threshold
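The slot query behind steps 1 and 2 can be expressed directly against `pg_replication_slots`. A sketch of the lag calculation using the standard functions `pg_current_wal_lsn()` and `pg_wal_lsn_diff()`, with this spec's default slot name:

```bash
psql -d mydb -c "
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS wal_retained
FROM pg_replication_slots
WHERE slot_name = 'pg_fossil_mydb';"
```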
**Output** (stdout, human-readable):

```
pg_fossil: Status for database 'mydb'

Replication Slot:
  Name:             pg_fossil_mydb
  Status:           active
  Plugin:           decoder_raw
  Current LSN:      0/1A2B3C4D
  Restart LSN:      0/1A2B3000
  Confirmed Flush:  0/1A2B3C40
  Lag:              13 MB (estimated 2 minutes behind)
  WAL Retained:     156 MB
  Last Activity:    2024-01-15 15:42:33 (2 seconds ago)

Backup Directory: /var/backups/mydb
  Latest Snapshot:  2024-01-15_base.sql.zst (1.8 GB)
    Age:            13h 42m
  Incremental Files: 24 files
    Range:          2024-01-15 02:00 to 2024-01-15 15:42
    Total size:     245 MB

Disk Usage:
  Total backups:    2.3 GB
  Available space:  847 GB (99% free)

Collection Process:
  Status:           running (PID: 12345)
  Uptime:           13h 42m
  Current file:     incremental/2024-01-15_15-00-00.sql (23 MB)

Health: OK
```

**Output** (JSON format):

```json
{
  "database": "mydb",
  "timestamp": "2024-01-15T15:42:35Z",
  "slot": {
    "name": "pg_fossil_mydb",
    "active": true,
    "plugin": "decoder_raw",
    "current_lsn": "0/1A2B3C4D",
    "restart_lsn": "0/1A2B3000",
    "confirmed_flush_lsn": "0/1A2B3C40",
    "lag_bytes": 13631488,
    "lag_estimated_seconds": 120,
    "wal_retained_bytes": 163577856,
    "last_activity": "2024-01-15T15:42:33Z"
  },
  "backups": {
    "directory": "/var/backups/mydb",
    "latest_snapshot": {
      "file": "2024-01-15_base.sql.zst",
      "size_bytes": 1932735283,
      "timestamp": "2024-01-15T02:00:00Z",
      "age_seconds": 49353
    },
    "incremental_count": 24,
    "incremental_size_bytes": 256901120,
    "total_size_bytes": 2402338816
  },
  "disk": {
    "available_bytes": 909543923712,
    "percent_free": 99
  },
  "collection": {
    "running": true,
    "pid": 12345,
    "uptime_seconds": 49320
  },
  "health": "ok"
}
```

**Exit codes**:

- 0: Health OK (or not using `--check`)
- 1: Warning threshold exceeded
- 2: Critical threshold exceeded
- 3: Slot inactive
- 4: Collection process not running

### `pg_fossil verify`

**Purpose**: Verify backup integrity and executability.

**Usage**:

```bash
pg_fossil verify [options]

Options:
  -D, --backup-dir=PATH    Backup directory (required)
  --snapshot=TIMESTAMP     Verify specific snapshot (default: all)
  --incremental            Verify incremental files
  --checksums              Verify SHA256 checksums
  --syntax                 Verify SQL syntax (parse only, no execution)
  --deep                   Deep verification (try restoring to temp database)
  --parallel=N             Parallel verification jobs (default: 1)
  -v, --verbose            Verbose output
```

**Example**:

```bash
pg_fossil verify --backup-dir=/var/backups/mydb --checksums --syntax
```

**Example (deep verification)**:

```bash
pg_fossil verify --backup-dir=/var/backups/mydb --deep
```

**What it does**:

1. Verifies files exist and are readable
2. If `--checksums`: Validates SHA256 checksums
3. If `--syntax`: Parses SQL files for syntax errors (using `psql --no-psqlrc --single-transaction < file`)
4. If `--deep`: Creates temporary database, performs full restore, verifies row counts
5. Reports any corruption or issues

**Output** (stdout):

```
pg_fossil: Verifying backups in /var/backups/mydb

Snapshots:
  [OK] 2024-01-15_base.sql.zst (checksum valid, syntax valid)
  [OK] 2024-01-14_base.sql.zst (checksum valid, syntax valid)
  [OK] 2024-01-13_base.sql.zst (checksum valid, syntax valid)

Globals:
  [OK] 2024-01-15_globals.sql (checksum valid, syntax valid)
  [OK] 2024-01-14_globals.sql (checksum valid, syntax valid)

Incremental files (24 files):
  [OK] 2024-01-15_10-00-00.sql (syntax valid)
  [OK] 2024-01-15_11-00-00.sql (syntax valid)
  ...
  [WARN] 2024-01-15_14-00-00.sql (file is empty, may be rotation artifact)

Summary:
  Total files checked: 29
  Passed:   28
  Warnings: 1
  Errors:   0

Verification: PASSED
```
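The `--checksums` pass amounts to comparing each file's SHA-256 against the value recorded in `metadata.json`. A minimal sketch using `jq` and `sha256sum` (the jq paths follow the example metadata shown later in this document; a real implementation would loop over every recorded snapshot rather than just the latest):

```bash
#!/bin/bash
backup_dir=/var/backups/mydb

# Expected hash of the most recent snapshot, as recorded at snapshot time
expected=$(jq -r '.snapshots[-1].sha256' "$backup_dir/metadata.json")
file=$(jq -r '.snapshots[-1].file' "$backup_dir/metadata.json")

# Recompute and compare
actual=$(sha256sum "$backup_dir/$file" | awk '{print $1}')
if [ "$expected" = "$actual" ]; then
    echo "[OK] $file (checksum valid)"
else
    echo "[FAIL] $file (checksum mismatch)" >&2
fi
```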
**Exit codes**:

- 0: All checks passed
- 1: Warnings found
- 2: Errors found (corruption detected)
- 3: Syntax errors in SQL files

### `pg_fossil prune`

**Purpose**: Remove old backups according to retention policy.

**Usage**:

```bash
pg_fossil prune [options]

Options:
  -D, --backup-dir=PATH    Backup directory (required)
  --keep-days=N            Keep incremental backups for N days (default: 7)
  --keep-snapshots=N       Keep N most recent snapshots (default: 30)
  --keep-monthly=N         Keep one snapshot per month for N months (default: 12)
  --dry-run                Show what would be deleted without deleting
  --force                  Skip confirmation prompt
  -v, --verbose            Verbose output
```

**Example**:

```bash
pg_fossil prune --backup-dir=/var/backups/mydb --keep-days=7 --keep-snapshots=30
```

**Example (dry run)**:

```bash
pg_fossil prune --backup-dir=/var/backups/mydb --dry-run
```

**What it does**:

1. Identifies backups outside retention policy
2. Shows deletion plan
3. Prompts for confirmation (unless `--force` or `--dry-run`)
4. Deletes old backups
5. Updates metadata

**Output** (stdout):

```
pg_fossil: Pruning backups in /var/backups/mydb

Retention policy:
  Incremental: 7 days
  Snapshots:   30 most recent
  Monthly:     12 months

Files to delete:
  Snapshots (3 files, 5.4 GB):
    2023-11-15_base.sql.zst (1.8 GB)
    2023-10-15_base.sql.zst (1.8 GB)
    2023-09-15_base.sql.zst (1.8 GB)
  Incremental (156 files, 2.1 GB):
    2024-01-01_*.sql through 2024-01-07_*.sql

Total: 159 files, 7.5 GB

Continue? [y/N]: y

Deleting files...
Progress: [==================>] 100% (159/159)

Pruning complete!
  Deleted:   159 files (7.5 GB)
  Remaining: 87 files (12.3 GB)
```

**Exit codes**:

- 0: Success
- 1: Configuration error
- 2: User cancelled
- 3: Deletion failed (partial deletion)

### `pg_fossil emergency`

**Purpose**: Emergency procedures for critical situations.

**Usage**:

```bash
pg_fossil emergency <operation> [options]

Operations:
  drop-slot       Drop replication slot (stops WAL retention)
  advance-slot    Advance slot to current LSN (loses backup coverage)
  reset           Full reset (drop slot, delete backups, start fresh)

Options:
  -d, --dbname=NAME        Database name (required)
  -s, --slot=NAME          Replication slot name (default: from config)
  -D, --backup-dir=PATH    Backup directory (for reset operation)
  --force                  Skip all confirmations (DANGEROUS)
  -v, --verbose            Verbose output
```

**Example (drop slot)**:

```bash
pg_fossil emergency drop-slot --dbname=mydb
```

**What it does**:

1. Shows impact assessment (WAL will be cleaned, backups may be incomplete)
2. Requires typing "YES" to confirm (unless `--force`)
3. Executes emergency operation
4. Logs action to audit log

**Output** (stdout):

```
pg_fossil: EMERGENCY OPERATION: drop-slot

WARNING: This will drop the replication slot 'pg_fossil_mydb'

Impact:
  - PostgreSQL will immediately start cleaning old WAL files
  - Disk space will be freed
  - Incremental backup collection will STOP
  - Existing backups will remain but may be incomplete
  - You will need to run 'pg_fossil init' to restart backups

Current slot status:
  WAL retained:  15.3 GB
  Lag:           12.5 GB (estimated 3 hours behind)
  Last activity: 3 hours ago

This operation CANNOT be undone.

Type 'YES' to confirm: YES

Dropping replication slot 'pg_fossil_mydb'... OK

Emergency operation complete.
Replication slot has been dropped.

Next steps:
  1. Verify disk space freed: df -h
  2. To resume backups: pg_fossil init --dbname=mydb
  3. Review incident and prevent recurrence
```
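For reference, the two slot operations ultimately reduce to standard PostgreSQL functions. A sketch of what `drop-slot` and `advance-slot` execute once confirmed (the confirmation prompt and audit logging described above are the CLI's responsibility):

```bash
# drop-slot: remove the slot entirely so WAL retention stops.
# The collector must be stopped first; an active slot cannot be dropped.
psql -d mydb -c "SELECT pg_drop_replication_slot('pg_fossil_mydb');"

# advance-slot: keep the slot but move it to the current write position,
# discarding unconsumed changes (which is why backup coverage is lost)
psql -d mydb -c "SELECT pg_replication_slot_advance('pg_fossil_mydb', pg_current_wal_lsn());"
```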
**Exit codes**:

- 0: Success
- 1: User cancelled
- 2: Operation failed
- 3: Slot not found

## Configuration File Format

**Location**: `$BACKUP_DIR/pg_fossil.conf`

**Format**: Plain text, INI-style (compatible with PostgreSQL config format)

**Example**:

```ini
# pg_fossil configuration
# Generated: 2024-01-15 02:00:00

[database]
name = mydb
host = localhost
port = 5432
user = postgres

[replication]
slot_name = pg_fossil_mydb
plugin = decoder_raw

[backup]
backup_dir = /var/backups/mydb
compress = gzip
parallel_jobs = 4

[incremental]
rotate_interval = 1h
rotate_size = 100MB
fsync_interval = 10

[retention]
incremental_days = 7
snapshot_count = 30
monthly_count = 12

[monitoring]
warn_lag = 1GB
critical_lag = 10GB
```

## Metadata File Format

**Location**: `$BACKUP_DIR/metadata.json`

**Format**: JSON (machine-readable)

**Example**:

```json
{
  "version": "1.0",
  "database": {
    "name": "mydb",
    "version": "PostgreSQL 18.0 (Debian 18.0-1.pgdg120+1)",
    "encoding": "UTF8",
    "collation": "en_US.UTF-8",
    "ctype": "en_US.UTF-8"
  },
  "extensions": [
    {"name": "plpgsql", "version": "1.0"},
    {"name": "pg_stat_statements", "version": "1.10"}
  ],
  "replication": {
    "slot_name": "pg_fossil_mydb",
    "plugin": "decoder_raw",
    "plugin_path": "/usr/lib/postgresql/18/lib/decoder_raw.so"
  },
  "initialization": {
    "timestamp": "2024-01-15T02:00:00Z",
    "snapshot": "00000003-00000001-1",
    "lsn": "0/1A000000"
  },
  "snapshots": [
    {
      "timestamp": "2024-01-15T02:00:00Z",
      "file": "base/2024-01-15_base.sql.zst",
      "size_bytes": 1932735283,
      "uncompressed_bytes": 4932735283,
      "lsn": "0/1A2B3C4D",
      "sha256": "a3f2e9d8c4b1..."
    }
  ],
  "last_updated": "2024-01-15T15:42:35Z"
}
```

## Exit Code Convention

All `pg_fossil` commands follow consistent exit code conventions:

```
0   = Success
1   = Configuration/usage error
2   = Database connection error
3   = Operation failed (specific to command)
4   = Resource error (disk space, permissions, etc.)
5   = User cancelled
10+ = Command-specific error codes
```

## Standard Output Conventions

**Stdout**: Progress, results, structured data (e.g., JSON with `--json`)

**Stderr**: Warnings, errors, diagnostics

**Prefix convention**:

- `pg_fossil:` for normal messages
- `pg_fossil: WARNING:` for warnings
- `pg_fossil: ERROR:` for errors

**Enables filtering**:

```bash
# Show only errors
pg_fossil status 2>&1 | grep ERROR

# Get JSON output without messages
pg_fossil status --json 2>/dev/null | jq '.slot.lag_bytes'
```

## Composability Examples

### Example 1: Monitor lag and alert

```bash
#!/bin/bash
lag=$(pg_fossil status --dbname=mydb --json | jq '.slot.lag_bytes')
if [ "$lag" -gt 1073741824 ]; then  # 1GB
    echo "CRITICAL: Replication lag is $(($lag / 1024 / 1024)) MB" | mail -s "Backup Alert" admin@example.com
fi
```

### Example 2: Stream changes to monitoring system

```bash
pg_fossil collect --dbname=mydb --stdout | \
    grep "DELETE FROM" | \
    while read line; do
        echo "$line" | send-to-monitoring-system
    done
```

### Example 3: Automated restore testing

```bash
#!/bin/bash
# Test restore every night to verify backups work
pg_fossil restore --dbname=test_restore --backup-dir=/var/backups/mydb \
    --yes --no-globals 2>&1 | tee /var/log/restore-test.log

# Check pg_fossil's exit status, not tee's
if [ "${PIPESTATUS[0]}" -eq 0 ]; then
    echo "Restore test PASSED" | mail -s "Backup Verification" admin@example.com
    dropdb test_restore
else
    echo "Restore test FAILED" | mail -s "URGENT: Backup Failure" admin@example.com
fi
```
### Example 4: Backup rotation in cron

```cron
# Daily full snapshot at 2 AM
0 2 * * * /usr/local/bin/pg_fossil snapshot --dbname=mydb --quiet

# Weekly pruning on Sundays at 3 AM
0 3 * * 0 /usr/local/bin/pg_fossil prune --backup-dir=/var/backups/mydb --force --quiet

# Hourly monitoring
0 * * * * /usr/local/bin/pg_fossil status --dbname=mydb --check || echo "Replication lag critical" | mail -s "Alert" admin@example.com
```

### Example 5: Point-in-time recovery workflow

```bash
# User reports: "I accidentally deleted customer records at 2:30 PM today"

# 1. Find exact timestamp from application logs

# 2. Restore to 5 minutes before the incident
pg_fossil restore --dbname=mydb_recovery \
    --backup-dir=/var/backups/mydb \
    --until="2024-01-15 14:25:00" \
    --verify

# 3. Extract the deleted records
psql mydb_recovery -c "COPY (SELECT * FROM customers WHERE deleted_at IS NULL) TO '/tmp/recovered_customers.csv' CSV HEADER;"

# 4. Re-insert into production (or compare and merge manually)
```

## Implementation Notes

### Language and Dependencies

**Language**: Python 3.8+ recommended

- Good PostgreSQL library support (`psycopg2` or `psycopg3`)
- Easy subprocess management for `pg_recvlogical`, `pg_dump`
- Rich CLI library ecosystem (`argparse` or `click`)
- JSON handling built-in
- Cross-platform compatibility

**Dependencies**:

- PostgreSQL client utilities (pg_recvlogical, pg_dump, pg_dumpall, psql)
- Python 3.8+
- psycopg2 or psycopg3
- Otherwise standard library only (no exotic dependencies)

### Installation

```bash
pip install pg-fossil
# or
apt install pg-fossil  # Future package
```

### Testing Strategy

- Unit tests for each subcommand
- Integration tests with PostgreSQL in Docker
- End-to-end restore tests
- Crash recovery tests
- Signal handling tests

## Quick Start Example

For users who want to try the system immediately:

```bash
# 1. Setup (one time)
pg_fossil init --dbname=mydb --backup-dir=/var/backups/mydb

# 2. Start collecting changes (run once, continues in background)
pg_fossil collect --dbname=mydb

# 3. Take daily snapshots (add to cron)
pg_fossil snapshot --dbname=mydb

# 4. Monitor health
pg_fossil status --dbname=mydb

# 5. Restore when needed
pg_fossil restore --dbname=mydb_restored --backup-dir=/var/backups/mydb
```

## Unix Philosophy Alignment

This CLI design adheres to the Unix philosophy:

1. **Do one thing well**: Each subcommand has a focused purpose
2. **Text-based**: All input/output in plain text (SQL, JSON, INI)
3. **Composable**: Commands can be piped and scripted
4. **Non-interactive by default**: Suitable for automation
5. **Clear separation**: stdout for data, stderr for diagnostics
6. **Orthogonal**: Subcommands don't overlap in functionality
7. **Discoverable**: Clear help, sensible defaults, progress messages
8. **Scriptable**: Exit codes, JSON output, quiet modes

## Design Rationale

### Why a unified CLI?

The underlying PostgreSQL tools (`pg_recvlogical`, `pg_dump`, etc.) are powerful but require careful orchestration. A unified CLI:

1. **Reduces complexity**: Users learn one tool instead of five
2. **Ensures correctness**: Orchestration logic is tested and reliable
3. **Improves discoverability**: `pg_fossil --help` shows all capabilities
4. **Enables best practices**: Built-in monitoring, verification, rotation
5. **Maintains composability**: Each subcommand still usable independently
### Why text-based config?

- Human-readable and editable
- Version control friendly
- Compatible with PostgreSQL conventions
- No binary parsing dependencies

### Why both INI and JSON?

- **INI** (`pg_fossil.conf`): Human-editable configuration
- **JSON** (`metadata.json`): Machine-readable metadata for tooling

## Future Extensions

Potential future enhancements (not in v1.0):

1. **Remote backups**: `pg_fossil collect --remote=s3://bucket/path`
2. **Encryption**: `pg_fossil snapshot --encrypt=gpg-key-id`
3. **Compression streaming**: `pg_fossil collect --compress=zstd`
4. **Web UI**: `pg_fossil serve --port=8080` (view status, trigger restores)
5. **Plugin system**: Allow custom hooks for monitoring, alerting
6. **Multi-database**: `pg_fossil init --all-databases`
7. **Differential snapshots**: Faster snapshots between full dumps

## References

- [The Art of Unix Programming](http://www.catb.org/~esr/writings/taoup/) by Eric S. Raymond
- [PostgreSQL Logical Replication Documentation](https://www.postgresql.org/docs/current/logical-replication.html)
- [decoder_raw plugin](https://github.com/michaelpq/pg_plugins/tree/main/decoder_raw)
- [12 Factor CLI Apps](https://medium.com/@jdxcode/12-factor-cli-apps-dd3c227a0e46)