# pg_scribe Tutorial
Watch a database under load become plain SQL backups in real time. Three terminals are recommended.
## Prerequisites
```bash
# Install the extension
cd wal2sql && make && make install
# Configure PostgreSQL for logical replication
echo "wal_level = logical" >> ~/.pgenv/pgsql/data/postgresql.conf
echo "max_replication_slots = 10" >> ~/.pgenv/pgsql/data/postgresql.conf
echo "max_wal_senders = 10" >> ~/.pgenv/pgsql/data/postgresql.conf
pgenv restart
```
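`wal_level` only changes after a full restart, so before continuing it's worth confirming the new settings are live (expected values shown as comments):
```sql
-- In psql: confirm the logical-replication settings took effect
SHOW wal_level;               -- logical
SHOW max_replication_slots;   -- 10
SHOW max_wal_senders;         -- 10
```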
## Phase 1: Initialize with pgbench workload
**Terminal 1** - Initialize database and backup system:
```bash
# Create database and populate with pgbench tables
createdb -U postgres demo
pgbench -i -s 10 -U postgres demo # Creates 1M rows in pgbench_accounts
# Try to initialize pg_scribe
pg_scribe --init -d demo -f /tmp/demo_backup -U postgres
```
This will fail with a validation error:
```
ERROR: CRITICAL: The following tables lack adequate replica identity:
ERROR:   - public.pgbench_history
ERROR: Fix: Add a primary key or set replica identity:
ERROR:   ALTER TABLE public.pgbench_history ADD PRIMARY KEY (id);
ERROR:   -- OR --
ERROR:   ALTER TABLE public.pgbench_history REPLICA IDENTITY FULL;
```
**Why?** The `pgbench_history` table is append-only and has no primary key, so its replica identity defaults to a primary key that doesn't exist. Without a usable replica identity, logical replication cannot identify rows for UPDATE or DELETE statements, so pg_scribe validates this up front to prevent silent data loss.
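If you'd like to spot such tables yourself before running `--init`, a catalog query along these lines works (it flags ordinary tables whose replica identity is NOTHING, or DEFAULT with no primary key to fall back on):
```sql
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND (c.relreplident = 'n'
       OR (c.relreplident = 'd' AND NOT EXISTS (
             SELECT 1 FROM pg_index i
             WHERE i.indrelid = c.oid AND i.indisprimary)));
```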
**Fix the issue:**
```bash
# Set replica identity for the append-only history table
psql -U postgres demo -c "ALTER TABLE pgbench_history REPLICA IDENTITY FULL;"
# Now initialize pg_scribe successfully
pg_scribe --init -d demo -f /tmp/demo_backup -U postgres
# Start streaming changes to active.sql
pg_scribe --start -d demo -f /tmp/demo_backup -U postgres
```
Leave Terminal 1 streaming. Every database change now becomes SQL.
**Terminal 2** - Generate realistic load:
```bash
# Run pgbench workload: 5 clients, 30 seconds
pgbench -c 5 -T 30 -U postgres demo
```
**Terminal 3** - Watch the backup file grow:
```bash
# Check file size (run this a few times while pgbench is running)
ls -lh /tmp/demo_backup/chain-*/active.sql
# Peek at the SQL being captured
tail -n 30 /tmp/demo_backup/chain-*/active.sql
```
You'll see thousands of INSERT/UPDATE transactions accumulating!
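The exact statement format depends on the wal2sql output plugin, but each captured transaction is plain SQL. A pgbench TPC-B transaction looks roughly like this (illustrative values):
```sql
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + 2742 WHERE aid = 517413;
UPDATE pgbench_tellers SET tbalance = tbalance + 2742 WHERE tid = 64;
UPDATE pgbench_branches SET bbalance = bbalance + 2742 WHERE bid = 7;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (64, 7, 517413, 2742, '2024-01-15 10:23:41');
COMMIT;
```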
**Terminal 3** - Check replication status:
```bash
pg_scribe --status -d demo -f /tmp/demo_backup -U postgres
```
You'll see live replication metrics: LSN positions, lag in bytes, and transaction counts.
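An LSN like `16/B374D848` is a 64-bit WAL position printed as two hex halves, and "lag bytes" is just the distance between two such positions. A small sketch of the arithmetic (the LSN values below are made up):
```bash
# Convert an LSN ("high/low", two hex halves of a 64-bit WAL
# position) to an absolute byte offset.
lsn_to_bytes() {
  local hi=${1%%/*} lo=${1##*/}
  echo $(( 16#$hi * 4294967296 + 16#$lo ))
}

current=16/B374D848   # e.g. the server's current WAL position
flushed=16/B0B60000   # e.g. the position pg_scribe last flushed
lag=$(( $(lsn_to_bytes "$current") - $(lsn_to_bytes "$flushed") ))
echo "lag: $lag bytes"   # about 44 MB for these two positions
```
PostgreSQL exposes the same arithmetic server-side as `pg_wal_lsn_diff()`.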
## Phase 2: Rotate differentials under load
**Terminal 2** - Start longer background workload:
```bash
# Run for 2 minutes in background
pgbench -c 5 -T 120 -U postgres demo &
```
**Terminal 3** - Watch status update continuously:
```bash
# Refresh status every 2 seconds
watch -n 2 'pg_scribe --status -d demo -f /tmp/demo_backup -U postgres'
```
You'll see transactions flowing through and the backup file growing.
**Terminal 3** - Rotate differential while load continues (Ctrl+C to exit watch, then):
```bash
pg_scribe --rotate-diff -f /tmp/demo_backup
```
**Terminal 3** - See the rotation happen:
```bash
# Old active.sql sealed with timestamp, new active.sql starts fresh
ls -lht /tmp/demo_backup/chain-*/
```
The sealed differential is an immutable point-in-time snapshot. New writes go to the new active.sql.
## Phase 3: Chain transfer under load (zero-downtime operation)
**Terminal 2** - Keep load running:
```bash
pgbench -c 5 -T 60 -U postgres demo &
```
**Terminal 1** - Create new chain and transfer streaming to it:
```bash
# Stops old streaming, creates new compressed base backup, starts streaming to new chain
pg_scribe --new-chain --start -d demo -f /tmp/demo_backup -Z gzip:6 -U postgres
```
Leave Terminal 1 streaming on the new chain.
**Terminal 3** - Verify new chain is active:
```bash
pg_scribe --status -d demo -f /tmp/demo_backup -U postgres
```
**Terminal 3** - See two chains:
```bash
ls -lh /tmp/demo_backup/
# chain-YYYYMMDD-HHMMSS/  (earlier timestamp: old chain, sealed)
# chain-YYYYMMDD-HHMMSS/  (later timestamp: new chain, active)
```
No transactions were lost during the transfer! The old chain's final differential was sealed, and the new chain starts from a fresh compressed base backup.
## Phase 4: Catching up after a pause (shows resilience)
**Terminal 1** - Stop streaming:
```bash
pg_scribe --stop -f /tmp/demo_backup
```
**Terminal 2** - Generate load while streaming is stopped:
```bash
pgbench -c 5 -t 1000 -U postgres demo
```
**Terminal 1** - Check status (streaming stopped, WAL accumulating):
```bash
pg_scribe --status -d demo -f /tmp/demo_backup -U postgres
```
You'll see something like: "Replication slot is 45 MB behind current WAL position"
The replication slot preserved the WAL even though streaming stopped!
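You can cross-check the retained WAL directly in psql; `pg_wal_lsn_diff()` gives the distance in bytes between the server's current position and the slot's restart point:
```sql
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),
                                      restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```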
**Terminal 1** - Restart streaming and watch it catch up:
```bash
pg_scribe --start -d demo -f /tmp/demo_backup -U postgres
```
Leave Terminal 1 streaming.
**Terminal 3** - Monitor catch-up progress:
```bash
watch -n 1 'pg_scribe --status -d demo -f /tmp/demo_backup -U postgres'
```
Watch the lag decrease as it replays accumulated WAL. Ctrl+C when caught up.
## Phase 5: Restore and verify
**Terminal 1** - Stop streaming and seal final differential:
```bash
pg_scribe --stop -f /tmp/demo_backup
pg_scribe --rotate-diff -f /tmp/demo_backup
```
**Terminal 1** - Restore to new database:
```bash
pg_scribe --restore -d demo_restored -f /tmp/demo_backup -C -U postgres
```
**Terminal 1** - Verify data matches:
```bash
# Compare row counts
psql -U postgres demo -c "SELECT count(*) FROM pgbench_accounts;"
psql -U postgres demo_restored -c "SELECT count(*) FROM pgbench_accounts;"
# Compare transaction history (shows all transactions were captured)
psql -U postgres demo -c "SELECT count(*) FROM pgbench_history;"
psql -U postgres demo_restored -c "SELECT count(*) FROM pgbench_history;"
# Compare balances (proves data integrity)
psql -U postgres demo -c "SELECT sum(abalance) FROM pgbench_accounts;"
psql -U postgres demo_restored -c "SELECT sum(abalance) FROM pgbench_accounts;"
```
Perfect match!
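Row counts and balance sums are good smoke tests; for a stronger check, compare a content digest of the whole table. One way to do it (run the same query against both databases and compare the digests):
```sql
SELECT md5(string_agg(aid || ':' || abalance, ',' ORDER BY aid))
FROM pgbench_accounts;
```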
## Cleanup
```bash
dropdb -U postgres demo_restored
dropdb -U postgres demo
rm -rf /tmp/demo_backup
```
## What just happened
- `pgbench -i` created realistic TPC-B-like tables with 1M rows
- `pg_scribe --init` created the wal2sql extension, replication slot, and base backup
- `pg_scribe --start` streamed thousands of transactions as plain SQL
- Differential rotation worked safely during active writes
- Chain transfer happened with zero transaction loss
- Replication slot preserved WAL during streaming pause
- System caught up gracefully from lag
- `--restore` perfectly reconstructed the database
- All backups are plain SQL readable with `less`
## Production usage
```bash
# Initialize once
pg_scribe --init -d mydb -f /backups/mydb -U postgres
# Run continuously (use systemd or supervisor)
pg_scribe --start -d mydb -f /backups/mydb -U postgres
# Rotate differentials daily (cron job)
0 0 * * * pg_scribe --rotate-diff -f /backups/mydb
# Create new chain weekly with compressed base backup
# This automatically stops old streaming and starts streaming to new chain
0 0 * * 0 pg_scribe --new-chain --start -d mydb -f /backups/mydb -Z gzip:6 -U postgres
# Monitor replication lag (nagios/prometheus)
pg_scribe --status -d mydb -f /backups/mydb -U postgres
```
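For the "run continuously" step, a minimal systemd unit might look like this sketch (the unit name, binary path, and service dependency are assumptions; adjust to your installation):
```ini
# /etc/systemd/system/pg_scribe.service (sketch; paths are assumptions)
[Unit]
Description=pg_scribe streaming backup for mydb
After=postgresql.service
Requires=postgresql.service

[Service]
User=postgres
ExecStart=/usr/local/bin/pg_scribe --start -d mydb -f /backups/mydb -U postgres
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```
`Restart=on-failure` lets pg_scribe resume from the replication slot after a crash, which is the same catch-up behavior demonstrated in Phase 4.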