# pg_scribe Tutorial

Watch a database under load become plain SQL backups in real time. Three terminals recommended.

## Prerequisites

```bash
# Install the extension
cd wal2sql && make && make install

# Configure PostgreSQL for logical replication
echo "wal_level = logical" >> ~/.pgenv/pgsql/data/postgresql.conf
echo "max_replication_slots = 10" >> ~/.pgenv/pgsql/data/postgresql.conf
echo "max_wal_senders = 10" >> ~/.pgenv/pgsql/data/postgresql.conf
pgenv restart
```

## Phase 1: Initialize with a pgbench workload

**Terminal 1** - Initialize the database and the backup system:

```bash
# Create a database and populate it with the pgbench tables
createdb -U postgres demo
pgbench -i -s 10 -U postgres demo   # Creates 1M rows in pgbench_accounts

# Try to initialize pg_scribe
pg_scribe --init -d demo -f /tmp/demo_backup -U postgres
```

This will fail with a validation error:

```
ERROR: CRITICAL: The following tables lack adequate replica identity:
ERROR:   - public.pgbench_history
ERROR: Fix: Add a primary key or set replica identity:
ERROR:   ALTER TABLE ADD PRIMARY KEY (id);
ERROR:   -- OR --
ERROR:   ALTER TABLE REPLICA IDENTITY FULL;
```

**Why?** The `pgbench_history` table is append-only (no primary key) and needs a replica identity for logical replication. pg_scribe validates this upfront to prevent silent data loss.

**Fix the issue:**

```bash
# Set replica identity for the append-only history table
psql -U postgres demo -c "ALTER TABLE pgbench_history REPLICA IDENTITY FULL;"

# Now initialize pg_scribe successfully
pg_scribe --init -d demo -f /tmp/demo_backup -U postgres

# Start streaming changes to active.sql
pg_scribe --start -d demo -f /tmp/demo_backup -U postgres
```

Leave Terminal 1 streaming. Every database change now becomes SQL.

**Terminal 2** - Generate realistic load:

```bash
# Run a pgbench workload in the background: 5 clients, 10 minutes.
# This will run continuously throughout the tutorial.
pgbench -c 5 -T 600 -U postgres demo &
```

Leave this running in the background. It will generate continuous load for the rest of the tutorial.

**Terminal 3** - Watch the backup grow:

```bash
# Check replication status (run this a few times while pgbench is running)
pg_scribe --status -d demo -f /tmp/demo_backup -U postgres
```

You'll see actual replication metrics updating in real time: LSN positions, lag bytes, transaction counts, and file sizes growing.

```bash
# Peek at the SQL being captured
tail -n 30 /tmp/demo_backup/chain-*/active.sql
```

You'll see thousands of INSERT/UPDATE transactions accumulating as plain SQL.
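When pointing pg_scribe at your own databases, you can find tables that would fail this validation ahead of time. The query below uses only standard PostgreSQL catalogs (it is not part of pg_scribe itself):

```bash
# List ordinary tables that have neither a primary key nor a non-default
# replica identity -- the ones logical replication can't decode UPDATE/DELETE for.
psql -U postgres demo -At -c "
  SELECT c.oid::regclass
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE c.relkind = 'r'
    AND n.nspname NOT IN ('pg_catalog', 'information_schema')
    AND c.relreplident IN ('d', 'n')   -- 'd' = default (PK if any), 'n' = nothing
    AND NOT EXISTS (SELECT 1 FROM pg_index i
                    WHERE i.indrelid = c.oid AND i.indisprimary);"
```

Before the fix above, this should list `pgbench_history`; after `REPLICA IDENTITY FULL` is set (`relreplident = 'f'`), the list should be empty.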
## Phase 2: Rotate differentials under load

(pgbench is still running in the background from Phase 1)

**Terminal 3** - Rotate the differential while load continues:

```bash
pg_scribe --rotate-diff -f /tmp/demo_backup
```

**Terminal 3** - Verify the rotation:

```bash
# Check status - you'll see the new active.sql starting fresh
pg_scribe --status -d demo -f /tmp/demo_backup -U postgres

# Or list the files to see the sealed differential with its timestamp
ls -lht /tmp/demo_backup/chain-*/
```

**What happened:** The old `active.sql` was sealed with a timestamp (e.g., `diff-YYYYMMDD-HHMMSS.sql`). New transactions now write to a fresh `active.sql`. The rotation happened instantly with zero transaction loss. Replication never stopped.

## Phase 3: Chain transfer under load (zero-downtime operation)

Chains let you periodically create fresh compressed base backups (e.g., weekly). This keeps old chains archive-ready while new data streams to the new chain.

(pgbench is still running in the background from Phase 1)

**Terminal 3** - Create a new chain and transfer streaming to it:

```bash
# Stops old streaming, creates a new compressed base backup,
# and starts streaming to the new chain
pg_scribe --new-chain --start -d demo -f /tmp/demo_backup -Z gzip:6 -U postgres
```

This automatically stops the old stream in Terminal 1. Terminal 3 now holds the new streaming process.

**Terminal 1** - Verify the new chain is active (Terminal 1 is now free):

```bash
pg_scribe --status -d demo -f /tmp/demo_backup -U postgres
```

The status output will show which chain is active and confirm streaming is working.

```bash
# You can also list the directory to see both chains
ls -lh /tmp/demo_backup/
# chain-YYYYMMDD-HHMMSS/  (old chain, sealed)
# chain-YYYYMMDD-HHMMSS/  (new chain, active)
```

The old chain's final differential was sealed. The new chain has a fresh compressed base backup. No transactions were lost during the transfer.
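Because everything is plain SQL, even the compressed base backup stays inspectable. The file name below (`base.sql.gz`) is a guess at pg_scribe's layout, not taken from its documentation, so substitute whatever `ls` actually showed in the new chain:

```bash
# Peek inside the new chain's compressed base backup
# (file name is an assumption -- adjust to the file ls showed)
zcat /tmp/demo_backup/chain-*/base.sql.gz | head -n 20

# gzip -t verifies archive integrity without extracting anything
gzip -t /tmp/demo_backup/chain-*/base.sql.gz && echo "base backup OK"
```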
## Phase 4: Catching up after a pause (shows resilience)

**Terminal 1** - Stop streaming (this stops Terminal 3's stream):

```bash
pg_scribe --stop -f /tmp/demo_backup
```

**Terminal 2** - Stop the background pgbench (if still running) and generate load while streaming is stopped:

```bash
# Stop the background pgbench if it's still running
pkill pgbench

# Generate new load while streaming is stopped
pgbench -c 5 -t 1000 -U postgres demo
```

**Terminal 1** - Check status (streaming stopped, WAL accumulating):

```bash
pg_scribe --status -d demo -f /tmp/demo_backup -U postgres
```

You'll see something like: "Replication slot is 45 MB behind current WAL position". The replication slot preserved the WAL even though streaming stopped.

**Terminal 1** - Restart streaming and watch it catch up:

```bash
pg_scribe --start -d demo -f /tmp/demo_backup -U postgres
```

Leave Terminal 1 streaming.

**Terminal 3** - Monitor catch-up progress:

```bash
watch -n 1 'pg_scribe --status -d demo -f /tmp/demo_backup -U postgres'
```

Watch the lag decrease as it replays accumulated WAL. Ctrl+C when caught up.
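The "MB behind" figure is derived from WAL positions (LSNs), which you can also query straight from PostgreSQL. A sketch of the arithmetic, using made-up example LSNs rather than live server output:

```bash
# An LSN such as '0/5D00000' is a 64-bit WAL position written as two hex words.
# Live values come from the server, e.g.:
#   psql -U postgres demo -c \
#     "SELECT slot_name, restart_lsn, pg_current_wal_lsn() FROM pg_replication_slots;"
lsn_to_bytes() {
  local hi=${1%%/*} lo=${1##*/}
  echo $(( 16#$hi * 4294967296 + 16#$lo ))
}

current='0/5D00000'   # example pg_current_wal_lsn()
restart='0/3000000'   # example restart_lsn still held by the slot
lag=$(( $(lsn_to_bytes "$current") - $(lsn_to_bytes "$restart") ))
echo "slot is $(( lag / 1024 / 1024 )) MB behind"   # prints: slot is 45 MB behind
```

This is the same subtraction that PostgreSQL's `pg_wal_lsn_diff()` performs server-side.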
## Phase 5: Restore and verify

**Terminal 1** - Seal the final differential, then stop streaming:

```bash
pg_scribe --rotate-diff -f /tmp/demo_backup
pg_scribe --stop -f /tmp/demo_backup
```

**Terminal 1** - Restore to a new database:

```bash
pg_scribe --restore -d demo_restored -f /tmp/demo_backup -C -U postgres
```

**Terminal 1** - Verify the data matches:

```bash
# Compare row counts
psql -U postgres demo -c "SELECT count(*) FROM pgbench_accounts;"
psql -U postgres demo_restored -c "SELECT count(*) FROM pgbench_accounts;"

# Compare transaction history (shows all transactions were captured)
psql -U postgres demo -c "SELECT count(*) FROM pgbench_history;"
psql -U postgres demo_restored -c "SELECT count(*) FROM pgbench_history;"

# Compare balances (proves data integrity)
psql -U postgres demo -c "SELECT sum(abalance) FROM pgbench_accounts;"
psql -U postgres demo_restored -c "SELECT sum(abalance) FROM pgbench_accounts;"
```

Perfect match!

## Cleanup

```bash
# Stop pgbench if still running
pkill pgbench

# Clean up databases and backup files
dropdb -U postgres demo_restored
dropdb -U postgres demo
rm -rf /tmp/demo_backup
```

## What just happened

- `pgbench -i` created realistic TPC-B-like tables with 1M rows
- `pg_scribe --init` created the wal2sql extension, the replication slot, and the base backup
- `pg_scribe --start` streamed thousands of transactions as plain SQL
- Differential rotation worked safely during active writes
- Chain transfer happened with zero transaction loss
- The replication slot preserved WAL during the streaming pause
- The system caught up gracefully from lag
- `--restore` perfectly reconstructed the database
- All backups are plain SQL, readable with `less`

## Production usage

```bash
# Initialize once
pg_scribe --init -d mydb -f /backups/mydb -U postgres

# Run continuously (use systemd or supervisor)
pg_scribe --start -d mydb -f /backups/mydb -U postgres

# Rotate differentials daily (cron job)
0 0 * * * pg_scribe --rotate-diff -f /backups/mydb

# Create a new chain weekly with a compressed base backup.
# This automatically stops old streaming and starts streaming to the new chain.
0 0 * * 0 pg_scribe --new-chain --start -d mydb -f /backups/mydb -Z gzip:6 -U postgres

# Monitor replication lag (nagios/prometheus)
pg_scribe --status -d mydb -f /backups/mydb -U postgres
```
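For the "run continuously" step, a minimal systemd unit sketch. The unit name, binary path, and the assumption that `pg_scribe --start` stays in the foreground are mine, not from pg_scribe's documentation:

```ini
# /etc/systemd/system/pg_scribe-mydb.service  (illustrative sketch)
[Unit]
Description=pg_scribe streaming backup for mydb
After=postgresql.service
Requires=postgresql.service

[Service]
Type=simple
User=postgres
ExecStart=/usr/local/bin/pg_scribe --start -d mydb -f /backups/mydb -U postgres
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl enable --now pg_scribe-mydb`; `Restart=on-failure` gives you the automatic catch-up behavior demonstrated in Phase 4.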