# pg_scribe Tutorial
Watch a database under load become plain-SQL backups in real time. Three terminals are recommended.
## Prerequisites
```bash
# Install the extension
cd wal2sql && make && make install
# Configure PostgreSQL for logical replication
echo "wal_level = logical" >> ~/.pgenv/pgsql/data/postgresql.conf
echo "max_replication_slots = 10" >> ~/.pgenv/pgsql/data/postgresql.conf
echo "max_wal_senders = 10" >> ~/.pgenv/pgsql/data/postgresql.conf
pgenv restart
```
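After the restart, it's worth confirming the settings took effect before going further. This uses only standard PostgreSQL `SHOW` commands, nothing pg_scribe-specific:

```sql
-- Run with: psql -U postgres
SHOW wal_level;             -- expect: logical
SHOW max_replication_slots; -- expect: 10
SHOW max_wal_senders;       -- expect: 10
```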
## Phase 1: Initialize with pgbench workload
**Terminal 1** - Initialize database and backup system:
```bash
# Create database and populate with pgbench tables
createdb -U postgres demo
pgbench -i -s 10 -U postgres demo # Creates 1M rows in pgbench_accounts
# Try to initialize pg_scribe
pg_scribe --init -d demo -f /tmp/demo_backup -U postgres
```
This will fail with a validation error:
```
ERROR: CRITICAL: The following tables lack adequate replica identity:
ERROR: - public.pgbench_history
ERROR: Fix: Add a primary key or set replica identity:
ERROR: ALTER TABLE public.pgbench_history ADD PRIMARY KEY (id);
ERROR: -- OR --
ERROR: ALTER TABLE public.pgbench_history REPLICA IDENTITY FULL;
```
**Why?** The `pgbench_history` table is append-only (no primary key) and needs replica identity for logical replication. pg_scribe validates this upfront to prevent silent data loss.
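You can inspect replica identity yourself through the standard `pg_class` catalog: `relreplident` is `d` (default, i.e. use the primary key), `n` (nothing), `f` (full), or `i` (index):

```sql
-- Run with: psql -U postgres demo
SELECT relname, relreplident
FROM pg_class
WHERE relname LIKE 'pgbench_%' AND relkind = 'r';
-- pgbench_history reports 'd' but has no primary key, so its changes
-- have no usable identity until you set REPLICA IDENTITY FULL.
```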
**Fix the issue:**
```bash
# Set replica identity for the append-only history table
psql -U postgres demo -c "ALTER TABLE pgbench_history REPLICA IDENTITY FULL;"
# Now initialize pg_scribe successfully
pg_scribe --init -d demo -f /tmp/demo_backup -U postgres
# Start streaming changes to active.sql
pg_scribe --start -d demo -f /tmp/demo_backup -U postgres
```
Leave Terminal 1 streaming. Every database change now becomes SQL.
**Terminal 2** - Generate realistic load:
```bash
# Run pgbench workload in background: 5 clients, 10 minutes
# This will run continuously throughout the tutorial
pgbench -c 5 -T 600 -U postgres demo &
```
Leave this running in the background. It will generate continuous load for the rest of the tutorial.
**Terminal 3** - Watch the backup grow:
```bash
# Check replication status (run this a few times while pgbench is running)
pg_scribe --status -d demo -f /tmp/demo_backup -U postgres
```
You'll see live replication metrics updating in real time: LSN positions, lag in bytes, transaction counts, and growing file sizes.
```bash
# Peek at the SQL being captured
tail -n 30 /tmp/demo_backup/chain-*/active.sql
```
You'll see thousands of INSERT/UPDATE transactions accumulating as plain SQL!
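The exact formatting of the captured stream is up to the `wal2sql` plugin, but for pgbench's TPC-B-like workload expect transactions roughly like this hypothetical excerpt:

```sql
-- Hypothetical excerpt; exact output format depends on wal2sql
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + 4215 WHERE aid = 371942;
UPDATE pgbench_tellers SET tbalance = tbalance + 4215 WHERE tid = 7;
UPDATE pgbench_branches SET bbalance = bbalance + 4215 WHERE bid = 1;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
  VALUES (7, 1, 371942, 4215, now());
COMMIT;
```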
## Phase 2: Rotate differentials under load
(pgbench is still running in the background from Phase 1)
**Terminal 3** - Rotate differential while load continues:
```bash
pg_scribe --rotate-diff -f /tmp/demo_backup
```
**Terminal 3** - Verify the rotation:
```bash
# Check status - you'll see the new active.sql starting fresh
pg_scribe --status -d demo -f /tmp/demo_backup -U postgres
# Or list the files to see the sealed differential with timestamp
ls -lht /tmp/demo_backup/chain-*/
```
**What happened:** The old `active.sql` was sealed with a timestamp (e.g., `diff-YYYYMMDD-HHMMSS.sql`). New transactions now write to a fresh `active.sql`. The rotation happened instantly with zero transaction loss. Replication never stopped.
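Conceptually the rotation is just an atomic rename plus a fresh file. Here is a minimal sketch in a scratch directory (the file names match the layout above; that pg_scribe works exactly this way internally is an assumption):

```bash
# Sketch of the rotation's file semantics, in a throwaway directory
cd "$(mktemp -d)"
echo "INSERT INTO t VALUES (1);" > active.sql   # stand-in for the live stream
stamp=$(date +%Y%m%d-%H%M%S)
mv active.sql "diff-$stamp.sql"   # seal: rename is atomic on POSIX filesystems
: > active.sql                    # fresh file for subsequent transactions
ls -1
```

Because the writer's file descriptor survives the rename, no in-flight write is lost; only the name under which the bytes land changes.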
## Phase 3: Chain transfer under load (zero-downtime operation)
Chains let you periodically create fresh compressed base backups (e.g., weekly). This keeps old chains archive-ready while new data streams to the new chain.
(pgbench is still running in the background from Phase 1)
**Terminal 3** - Create new chain and transfer streaming to it:
```bash
# Stops old streaming, creates new compressed base backup, starts streaming to new chain
pg_scribe --new-chain --start -d demo -f /tmp/demo_backup -Z gzip:6 -U postgres
```
This automatically stops the old stream in Terminal 1. Terminal 3 now holds the new streaming process.
**Terminal 1** - Verify new chain is active (Terminal 1 is now free):
```bash
pg_scribe --status -d demo -f /tmp/demo_backup -U postgres
```
The status output will show which chain is active and confirm streaming is working.
```bash
# You can also list directory to see both chains
ls -lh /tmp/demo_backup/
# chain-YYYYMMDD-HHMMSS/ (old chain, sealed)
# chain-YYYYMMDD-HHMMSS/ (new chain, active)
```
The old chain's final differential was sealed. The new chain has a fresh compressed base backup. No transactions were lost during the transfer.
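A chain restores as the base backup, then each sealed differential in timestamp order, then `active.sql`. Because the timestamps sort lexicographically, the replay order falls out of a plain glob; a minimal sketch with stand-in files (the concatenation model is an assumption about how `--restore` works):

```bash
cd "$(mktemp -d)"
printf 'CREATE TABLE t(i int);\n'    > base.sql
printf 'INSERT INTO t VALUES (1);\n' > diff-20240101-000000.sql
printf 'INSERT INTO t VALUES (2);\n' > diff-20240108-000000.sql
printf 'INSERT INTO t VALUES (3);\n' > active.sql
# Timestamped names sort lexicographically, so the glob yields replay order
cat base.sql diff-*.sql active.sql > replay.sql
cat replay.sql
```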
## Phase 4: Catching up after pause (shows resilience)
**Terminal 1** - Stop streaming (this stops Terminal 3's stream):
```bash
pg_scribe --stop -f /tmp/demo_backup
```
**Terminal 2** - Stop background pgbench (if still running) and generate load while streaming is stopped:
```bash
# Stop the background pgbench if it's still running
pkill pgbench
# Generate new load while streaming is stopped
pgbench -c 5 -t 1000 -U postgres demo
```
**Terminal 1** - Check status (streaming stopped, WAL accumulating):
```bash
pg_scribe --status -d demo -f /tmp/demo_backup -U postgres
```
You'll see something like: "Replication slot is 45 MB behind current WAL position"
The replication slot preserved the WAL even though streaming stopped!
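You can confirm this from PostgreSQL itself: the WAL retained behind each slot is visible in the standard `pg_replication_slots` view (which slot name pg_scribe creates is not documented here, so don't assume a particular one):

```sql
-- Run with: psql -U postgres demo
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
         AS retained_wal
FROM pg_replication_slots;
```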
**Terminal 1** - Restart streaming and watch it catch up:
```bash
pg_scribe --start -d demo -f /tmp/demo_backup -U postgres
```
Leave Terminal 1 streaming.
**Terminal 3** - Monitor catch-up progress:
```bash
watch -n 1 'pg_scribe --status -d demo -f /tmp/demo_backup -U postgres'
```
Watch the lag decrease as it replays accumulated WAL. Ctrl+C when caught up.
## Phase 5: Restore and verify
**Terminal 1** - Seal final differential, then stop streaming:
```bash
pg_scribe --rotate-diff -f /tmp/demo_backup
pg_scribe --stop -f /tmp/demo_backup
```
**Terminal 1** - Restore to new database:
```bash
pg_scribe --restore -d demo_restored -f /tmp/demo_backup -C -U postgres
```
**Terminal 1** - Verify data matches:
```bash
# Compare row counts
psql -U postgres demo -c "SELECT count(*) FROM pgbench_accounts;"
psql -U postgres demo_restored -c "SELECT count(*) FROM pgbench_accounts;"
# Compare transaction history (shows all transactions were captured)
psql -U postgres demo -c "SELECT count(*) FROM pgbench_history;"
psql -U postgres demo_restored -c "SELECT count(*) FROM pgbench_history;"
# Compare balances (proves data integrity)
psql -U postgres demo -c "SELECT sum(abalance) FROM pgbench_accounts;"
psql -U postgres demo_restored -c "SELECT sum(abalance) FROM pgbench_accounts;"
```
Perfect match!
## Cleanup
```bash
# Stop pgbench if still running
pkill pgbench
# Clean up databases and backup files
dropdb -U postgres demo_restored
dropdb -U postgres demo
rm -rf /tmp/demo_backup
```
## What just happened
- `pgbench -i` created realistic TPC-B-like tables with 1M rows
- `pg_scribe --init` created the wal2sql extension, replication slot, and base backup
- `pg_scribe --start` streamed thousands of transactions as plain SQL
- Differential rotation worked safely during active writes
- Chain transfer happened with zero transaction loss
- Replication slot preserved WAL during streaming pause
- System caught up gracefully from lag
- `--restore` perfectly reconstructed the database
- All backups are plain SQL, readable with `less`
## Production usage
```bash
# Initialize once
pg_scribe --init -d mydb -f /backups/mydb -U postgres
# Run continuously (use systemd or supervisor)
pg_scribe --start -d mydb -f /backups/mydb -U postgres
# Rotate differentials daily (cron job)
0 0 * * * pg_scribe --rotate-diff -f /backups/mydb
# Create new chain weekly with compressed base backup
# This automatically stops old streaming and starts streaming to new chain
0 0 * * 0 pg_scribe --new-chain --start -d mydb -f /backups/mydb -Z gzip:6 -U postgres
# Monitor replication lag (nagios/prometheus)
pg_scribe --status -d mydb -f /backups/mydb -U postgres
```
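The "use systemd or supervisor" note above could look like this minimal unit; the binary path, unit name, and dependency on `postgresql.service` are assumptions to adapt to your system:

```ini
# /etc/systemd/system/pg_scribe-mydb.service
[Unit]
Description=pg_scribe streaming backup for mydb
After=postgresql.service
Requires=postgresql.service

[Service]
User=postgres
ExecStart=/usr/local/bin/pg_scribe --start -d mydb -f /backups/mydb -U postgres
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now pg_scribe-mydb`.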