# Migration from Old CLI to New Chain-Based CLI

This document describes the changes needed to migrate from the preliminary CLI implementation to the new chain-based design.

## Overview of Changes

The new CLI design introduces **chains** (base backup + differentials) as the fundamental organizational unit, with these key changes:

1. **Chain-based file structure**: Backup directory contains multiple `chain-{TIMESTAMP}/` subdirectories
2. **Single shared replication slot**: All chains share one slot per backup directory
3. **Auto-detect latest chain**: Commands operate on newest chain automatically
4. **Accept overlap**: Chain transitions prioritize zero-downtime over perfect boundaries
5. **Single top-level pidfile**: One `.pg_scribe.pid` per backup directory
6. **New commands**: `--rotate-diff` and `--new-chain` replace some old workflows

## High-Level Command Mapping

| Old CLI | New CLI | Notes |
|---------|---------|-------|
| `--init -f /backups/mydb` | `--init --backup-dir /backups/mydb` | Creates slot + first chain |
| `--start -f /backups/mydb/inc.sql` | `--start --backup-dir /backups/mydb` | Auto-streams to latest chain |
| *(no equivalent)* | `--rotate-diff --backup-dir /backups/mydb` | **NEW**: Rotate differential file |
| `--full-backup -f /backups/mydb` | `--new-chain --backup-dir /backups/mydb` | Creates new chain with base |
| `--restore -f /backups/mydb` | `--restore --backup-dir /backups/mydb` | Chain-aware restore |
| `--status -f /backups/mydb` | `--status --backup-dir /backups/mydb` | Shows chain inventory |

## Detailed Changes by Command

### `--init` Changes

**Old implementation:**

```bash
pg_scribe --init -d mydb -f /backups/mydb -S myslot
```

**Old behavior:**

- `-f` specified a **directory**
- Created: `base.sql`, `globals.sql`, `metadata.json` directly in `/backups/mydb/`
- Required empty directory

**New implementation:**

```bash
pg_scribe --init -d mydb --backup-dir /backups/mydb --slot myslot
```

**New behavior:**

- `--backup-dir` replaces `-f` (more explicit)
- `--slot` replaces `-S` (more readable, though `-S` can remain as alias)
- Creates **first chain** in subdirectory: `/backups/mydb/chain-{TIMESTAMP}/`
- Chain contains: `base.sql`, `globals.sql`, `metadata.json`
- Creates top-level `.pg_scribe.pid` placeholder (empty until `--start` runs)

**Key implementation changes:**

1. Generate chain ID: `CHAIN_ID=$(date -u +%Y%m%dT%H%M%SZ)`
2. Create chain directory: `mkdir -p "$BACKUP_DIR/chain-$CHAIN_ID"`
3. Write backups to chain directory: `pg_dump -f "$BACKUP_DIR/chain-$CHAIN_ID/base.sql"`
4. Output chain ID to user so they know which chain was created

**Output changes:**

```
Old:
✓ Base backup created: /backups/mydb/base.sql

New:
✓ Initial chain created: 20231215T120000Z
  Location: /backups/mydb/chain-20231215T120000Z/
```

---

### `--start` Changes

**Old implementation:**

```bash
pg_scribe --start -d mydb -f /backups/mydb/incremental.sql -S myslot
```

**Old behavior:**

- `-f` specified exact **file path** for output
- User controlled filename explicitly
- Wrote to single ongoing file

**New implementation:**

```bash
pg_scribe --start -d mydb --backup-dir /backups/mydb --slot myslot
```

**New behavior:**

- No explicit filename - tool auto-detects latest chain
- Always writes to `active.sql` in the latest chain directory
- Only one `--start` process per backup directory

**Key implementation changes:**

1. **Find latest chain:**

   ```bash
   LATEST_CHAIN=$(ls -1d "$BACKUP_DIR"/chain-* 2>/dev/null | sort | tail -1)
   if [ -z "$LATEST_CHAIN" ]; then
       echo "Error: No chains found. Run --init first."
       exit 1
   fi
   CHAIN_ID=$(basename "$LATEST_CHAIN" | sed 's/^chain-//')
   ```

2. **Check for existing process:**

   ```bash
   PIDFILE="$BACKUP_DIR/.pg_scribe.pid"
   if [ -f "$PIDFILE" ]; then
       PID=$(cat "$PIDFILE")
       # kill -0 sends no signal; it only tests whether the process exists
       if kill -0 "$PID" 2>/dev/null; then
           echo "Error: Already streaming to $BACKUP_DIR (PID $PID)"
           exit 1
       fi
   fi
   ```

3. **Write pidfile:**

   ```bash
   echo $$ > "$PIDFILE"
   ```

4. **Exec to pg_recvlogical:**

   ```bash
   OUTPUT_FILE="$BACKUP_DIR/chain-$CHAIN_ID/active.sql"
   exec pg_recvlogical -d "$DBNAME" -U "$USER" --slot="$SLOT" \
       --start -f "$OUTPUT_FILE" --option include_transaction=on
   ```

**Output changes:**

```
Old:
✓ Writing to: /backups/mydb/incremental.sql

New:
✓ Found latest chain: 20231215T120000Z
✓ Writing to: /backups/mydb/chain-20231215T120000Z/active.sql
```

---

### `--rotate-diff` (NEW Command)

**No old equivalent** - this is entirely new functionality.

**Purpose**: Rotate the differential file within the active chain (like log rotation).

**Usage:**

```bash
pg_scribe --rotate-diff --backup-dir /backups/mydb
```

**Implementation steps:**

1. **Find pidfile and validate:**

   ```bash
   PIDFILE="$BACKUP_DIR/.pg_scribe.pid"
   if [ ! -f "$PIDFILE" ]; then
       echo "Error: No active streaming process found"
       exit 1
   fi

   PID=$(cat "$PIDFILE")
   if ! kill -0 "$PID" 2>/dev/null; then
       echo "Error: Stale pidfile (process $PID not running)"
       exit 1
   fi

   # Verify it's actually pg_recvlogical
   PROC_NAME=$(ps -p "$PID" -o comm=)
   if [ "$PROC_NAME" != "pg_recvlogical" ]; then
       echo "Error: PID $PID is not pg_recvlogical"
       exit 1
   fi
   ```

2. **Find active chain:**

   ```bash
   # Newest active.sql wins if a stale one was left behind in an older chain
   ACTIVE_FILE=$(find "$BACKUP_DIR"/chain-*/active.sql 2>/dev/null | sort | tail -1)
   if [ -z "$ACTIVE_FILE" ]; then
       echo "Error: No active.sql found"
       exit 1
   fi

   CHAIN_DIR=$(dirname "$ACTIVE_FILE")
   CHAIN_ID=$(basename "$CHAIN_DIR" | sed 's/^chain-//')
   ```

3. **Generate new differential name:**

   ```bash
   DIFF_TIMESTAMP=$(date -u +%Y%m%dT%H%M%SZ)
   SEALED_FILE="$CHAIN_DIR/diff-$DIFF_TIMESTAMP.sql"
   ```

4. **Atomic rotation:**

   ```bash
   # Rename active → diff (pg_recvlogical still has file open)
   mv "$CHAIN_DIR/active.sql" "$SEALED_FILE"

   # Send SIGHUP to trigger file rotation
   kill -HUP "$PID"

   # Wait for new active.sql to appear (timeout after 30 seconds)
   TIMEOUT=30
   while [ $TIMEOUT -gt 0 ]; do
       if [ -f "$CHAIN_DIR/active.sql" ]; then
           # Verify it's actually being written.
           # Note: on an idle database the new file can stay empty,
           # so this check may time out even though rotation succeeded.
           sleep 1
           if [ -s "$CHAIN_DIR/active.sql" ]; then
               echo "✓ Rotated differential: diff-$DIFF_TIMESTAMP.sql"
               echo "✓ New differential started"
               exit 0
           fi
       fi
       sleep 1
       TIMEOUT=$((TIMEOUT - 1))
   done

   echo "Error: Timeout waiting for new active.sql"
   exit 1
   ```

**Output:**

```
✓ Found active chain: 20231215T120000Z
✓ Rotated differential: diff-20231216T083000Z.sql (2.1 GB)
✓ New differential started
```

---

### `--new-chain` Changes

**Old equivalent:** `--full-backup`

**Old implementation:**

```bash
pg_scribe --full-backup -d mydb -f /backups/mydb --compress=gzip
```

**Old behavior:**

- Took standalone full backup
- Wrote to same directory as base backups
- No awareness of ongoing streaming

**New implementation:**

```bash
pg_scribe --new-chain -d mydb --backup-dir /backups/mydb --compress=gzip
```

**New behavior:**

- Creates new chain with fresh base backup
- **Automatically transitions** streaming from old chain to new chain
- Handles overlap (base backup taken while streaming continues)

**Key implementation changes:**

1. **Generate new chain ID and create directory:**

   ```bash
   NEW_CHAIN_ID=$(date -u +%Y%m%dT%H%M%SZ)
   NEW_CHAIN_DIR="$BACKUP_DIR/chain-$NEW_CHAIN_ID"
   mkdir -p "$NEW_CHAIN_DIR"
   ```

2. **Take new base backup (while streaming continues to old chain):**

   ```bash
   echo "Taking new base backup..."
   pg_dump -d "$DBNAME" -f "$NEW_CHAIN_DIR/base.sql"
   pg_dumpall --globals-only -f "$NEW_CHAIN_DIR/globals.sql"

   # Generate metadata.json

   # Apply compression if requested
   if [ -n "$COMPRESS" ]; then
       # Compress base.sql, globals.sql
       :  # placeholder so the empty branch is valid shell
   fi

   echo "✓ Base backup complete: chain-$NEW_CHAIN_ID"
   ```

3. **Transition streaming process:**

   ```bash
   PIDFILE="$BACKUP_DIR/.pg_scribe.pid"

   # Check if streaming is active
   if [ ! -f "$PIDFILE" ]; then
       echo "Note: No active streaming. Start with --start"
       exit 0
   fi

   OLD_PID=$(cat "$PIDFILE")
   if ! kill -0 "$OLD_PID" 2>/dev/null; then
       echo "Note: Stale pidfile. Start streaming with --start"
       rm "$PIDFILE"
       exit 0
   fi

   # Find old chain (the new chain has no active.sql yet)
   OLD_ACTIVE=$(find "$BACKUP_DIR"/chain-*/active.sql 2>/dev/null | sort | tail -1)
   OLD_CHAIN_DIR=$(dirname "$OLD_ACTIVE")
   OLD_CHAIN_ID=$(basename "$OLD_CHAIN_DIR" | sed 's/^chain-//')

   echo "Transitioning from chain-$OLD_CHAIN_ID..."

   # Seal the old chain
   DIFF_TIMESTAMP=$(date -u +%Y%m%dT%H%M%SZ)
   mv "$OLD_ACTIVE" "$OLD_CHAIN_DIR/diff-$DIFF_TIMESTAMP.sql"

   # Send SIGHUP and wait for confirmation
   kill -HUP "$OLD_PID"
   sleep 2  # Brief wait for new active.sql in old chain

   # Now terminate the old process
   kill -TERM "$OLD_PID"
   # `wait` only works for children of this shell, so poll until it exits
   while kill -0 "$OLD_PID" 2>/dev/null; do
       sleep 1
   done

   echo "✓ Sealed old chain: diff-$DIFF_TIMESTAMP.sql"
   ```

4. **Start streaming to new chain:**

   ```bash
   # This would typically be done by --start, but we can do it inline:
   echo "Starting streaming to new chain..."
   echo $$ > "$PIDFILE"
   exec pg_recvlogical -d "$DBNAME" --slot="$SLOT" --start \
       -f "$NEW_CHAIN_DIR/active.sql" --option include_transaction=on
   ```

**Output:**

```
Taking new base backup...
✓ Base backup complete: chain-20231222T120000Z (12.3 GB)
Transitioning from chain-20231215T120000Z...
✓ Sealed old chain: diff-20231222T115900Z.sql
✓ Started streaming to new chain: 20231222T120000Z
```

**Note**: The transition logic is complex. Consider whether `--new-chain` should:

- **Option A**: Just create the chain, user manually runs `--start` after stopping old process
- **Option B**: Fully automate transition (as described above)

For POC/initial implementation, **Option A is simpler**. Option B can be added later.

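
The pidfile check recurs in `--start`, `--rotate-diff`, and the `--new-chain` transition, each time with slightly different handling of missing and stale files. One way to keep them consistent is a shared helper. The sketch below is only an illustration: `streaming_pid` and its return codes are hypothetical names chosen here, not part of the existing CLI.

```shell
# Hypothetical helper (not in pg_scribe): print the PID recorded in the
# backup directory's pidfile if that process is alive.
# Returns 1 if there is no pidfile, 2 if the pidfile is empty or stale.
streaming_pid() {
    local backup_dir=$1
    local pidfile="$backup_dir/.pg_scribe.pid"
    local pid

    [ -f "$pidfile" ] || return 1

    pid=$(cat "$pidfile")
    # Reject empty or non-numeric content (e.g. the placeholder from --init)
    case "$pid" in
        ''|*[!0-9]*) return 2 ;;
    esac

    # kill -0 sends no signal; it only tests whether the process can be
    # signalled, so it also fails for processes owned by another user
    kill -0 "$pid" 2>/dev/null || return 2

    printf '%s\n' "$pid"
}
```

With this in place, `--start` could refuse to run when `PID=$(streaming_pid "$BACKUP_DIR")` succeeds, while `--rotate-diff` could treat any non-zero status as "no active streaming process".
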

---

### `--restore` Changes

**Old implementation:**

```bash
pg_scribe --restore -f /backups/mydb -d targetdb --create
```

**Old behavior:**

- Found `base.sql` and `incremental.sql` (or similar) in flat directory
- Applied them in order

**New implementation:**

```bash
pg_scribe --restore --backup-dir /backups/mydb --chain-id 20231215T120000Z -d targetdb --create
```

**New behavior:**

- Operates on specific chain (or latest if `--chain-id` not specified)
- Restores from chain's `base.sql` + all `diff-*.sql` files
- Ignores `active.sql` by default (incomplete)

**Key implementation changes:**

1. **Determine target chain:**

   ```bash
   if [ -n "$CHAIN_ID" ]; then
       CHAIN_DIR="$BACKUP_DIR/chain-$CHAIN_ID"
       if [ ! -d "$CHAIN_DIR" ]; then
           echo "Error: Chain not found: $CHAIN_ID"
           exit 1
       fi
   else
       # Use latest chain
       CHAIN_DIR=$(ls -1d "$BACKUP_DIR"/chain-* 2>/dev/null | sort | tail -1)
       if [ -z "$CHAIN_DIR" ]; then
           echo "Error: No chains found in $BACKUP_DIR"
           exit 1
       fi
       CHAIN_ID=$(basename "$CHAIN_DIR" | sed 's/^chain-//')
   fi

   echo "Restoring from chain: $CHAIN_ID"
   ```

2. **Find all differential files:**

   ```bash
   BASE_BACKUP="$CHAIN_DIR/base.sql"
   GLOBALS_BACKUP="$CHAIN_DIR/globals.sql"

   # Find all sealed differentials (sorted by timestamp)
   DIFFERENTIALS=$(ls -1 "$CHAIN_DIR"/diff-*.sql 2>/dev/null | sort)

   if [ ! -f "$BASE_BACKUP" ]; then
       echo "Error: Base backup not found in chain $CHAIN_ID"
       exit 1
   fi
   ```

3. **Apply backups in order:**

   ```bash
   # Create database if requested
   if [ "$CREATE_DB" = "true" ]; then
       createdb "$TARGET_DB"
   fi

   # Restore globals
   echo "✓ Restoring globals..."
   psql -d postgres -f "$GLOBALS_BACKUP"

   # Restore base
   echo "✓ Restoring base backup..."
   psql -d "$TARGET_DB" -f "$BASE_BACKUP"

   # Apply differentials
   # ($DIFFERENTIALS is split on whitespace, so paths must not contain spaces)
   DIFF_COUNT=0
   for DIFF in $DIFFERENTIALS; do
       echo "  - Applying $(basename "$DIFF")..."
       psql -d "$TARGET_DB" -f "$DIFF"
       DIFF_COUNT=$((DIFF_COUNT + 1))
   done

   echo "✓ Applied $DIFF_COUNT differentials"

   # Sync sequences
   if [ "$SYNC_SEQUENCES" = "true" ]; then
       # Implementation of sequence sync
       echo "✓ Synchronized sequences"
   fi
   ```

**New option: `--include-active`**

```bash
--include-active    # Also apply active.sql (incomplete, risky)
```

If specified, include `active.sql` in the restore:

```bash
if [ "$INCLUDE_ACTIVE" = "true" ] && [ -f "$CHAIN_DIR/active.sql" ]; then
    echo "⚠ Applying incomplete active.sql..."
    psql -d "$TARGET_DB" -f "$CHAIN_DIR/active.sql"
fi
```

**Output changes:**

```
Old:
Restoring from /backups/mydb/base.sql

New:
Restoring from chain: 20231215T120000Z
✓ Restored base backup (1,234,567 rows)
✓ Applied 15 differentials
  - diff-20231216T083000Z.sql
  - diff-20231217T083000Z.sql
  ...
```

---

### `--status` Changes

**Old implementation:**

```bash
pg_scribe --status -d mydb -S myslot -f /backups/mydb
```

**Old behavior:**

- Showed replication slot status
- Basic backup directory info

**New implementation:**

```bash
pg_scribe --status -d mydb --slot myslot --backup-dir /backups/mydb
```

**New behavior:**

- Shows replication slot status (unchanged)
- **Chain inventory**: Lists all chains with details
- Shows which chain is actively streaming

**Key implementation changes:**

1. **Query replication slot (unchanged):**

   ```bash
   psql -d "$DBNAME" -t -A -c "
       SELECT active, confirmed_flush_lsn, restart_lsn
       FROM pg_replication_slots
       WHERE slot_name = '$SLOT'
   "
   ```

2. **Analyze backup directory:**

   ```bash
   # Find all chains
   CHAINS=$(ls -1d "$BACKUP_DIR"/chain-* 2>/dev/null | sort)

   echo "Backup Directory: $BACKUP_DIR"
   echo ""
   echo "Chains:"

   # Check which chain is active
   PIDFILE="$BACKUP_DIR/.pg_scribe.pid"
   ACTIVE_CHAIN=""
   if [ -f "$PIDFILE" ]; then
       PID=$(cat "$PIDFILE")
       if kill -0 "$PID" 2>/dev/null; then
           # Find which chain has active.sql (newest wins)
           ACTIVE_FILE=$(find "$BACKUP_DIR"/chain-*/active.sql 2>/dev/null | sort | tail -1)
           if [ -n "$ACTIVE_FILE" ]; then
               ACTIVE_CHAIN=$(basename "$(dirname "$ACTIVE_FILE")" | sed 's/^chain-//')
           fi
       fi
   fi

   for CHAIN_DIR in $CHAINS; do
       CHAIN_ID=$(basename "$CHAIN_DIR" | sed 's/^chain-//')

       # Gather chain info
       BASE_SIZE=$(du -h "$CHAIN_DIR/base.sql" 2>/dev/null | cut -f1)
       DIFF_COUNT=$(ls -1 "$CHAIN_DIR"/diff-*.sql 2>/dev/null | wc -l)
       TOTAL_SIZE=$(du -sh "$CHAIN_DIR" | cut -f1)

       # Check if active
       if [ "$CHAIN_ID" = "$ACTIVE_CHAIN" ]; then
           echo "  chain-$CHAIN_ID (ACTIVE - streaming)"
           echo "    PID: $PID"
       else
           echo "  chain-$CHAIN_ID"
       fi

       echo "    Base: $CHAIN_ID (${BASE_SIZE})"
       echo "    Differentials: $DIFF_COUNT files"
       echo "    Total size: $TOTAL_SIZE"

       # Last activity (stat -c is GNU coreutils syntax)
       if [ -f "$CHAIN_DIR/active.sql" ]; then
           LAST_MOD=$(stat -c %y "$CHAIN_DIR/active.sql" 2>/dev/null | cut -d. -f1)
           echo "    Last activity: $LAST_MOD"
       fi

       echo ""
   done
   ```

**Output changes:**

```
Old:
Replication Slot: myslot
  Status: active
  LSN: 0/1234567

Backup Directory: /backups/mydb
  Base backup: base.sql (10.2 GB)
  Incremental: incremental.sql (2.3 GB)

New:
Replication Slot: myslot
  Status: active
  Current LSN: 0/9876543
  Confirmed LSN: 0/9876540
  Lag: 3 bytes

Backup Directory: /backups/mydb

Chains:
  chain-20231215T120000Z
    Base: 2023-12-15 12:00:00 (10.2 GB)
    Differentials: 15 files
    Total size: 12.3 GB
    Status: sealed

  chain-20231222T120000Z (ACTIVE - streaming)
    Base: 2023-12-22 12:00:00 (10.5 GB)
    Differentials: 3 files
    Total size: 11.8 GB
    Last activity: 2 minutes ago
    PID: 12345

Total backup size: 24.1 GB
```

---

## File Structure Migration

### Old Structure

```
/backups/mydb/
  base.sql              # Latest base backup
  base-20231201.sql     # Older base backup (maybe)
  incremental.sql       # Ongoing stream
  globals.sql           # Latest globals
  metadata.json         # Latest metadata
```

### New Structure

```
/backups/mydb/
  .pg_scribe.pid        # Single pidfile
  chain-20231215T120000Z/
    base.sql
    globals.sql
    metadata.json
    diff-20231216T083000Z.sql
    diff-20231217T083000Z.sql
    ...
  chain-20231222T120000Z/
    base.sql
    globals.sql
    metadata.json
    active.sql          # Currently streaming
```

### Migration Script Needed?

If users have existing backups in old format, we may need a migration tool:

```bash
pg_scribe --migrate /backups/mydb
```

This would:

1. Detect old-style flat structure
2. Create first chain from existing `base.sql` and `globals.sql`
3. Move/copy files to `chain-{TIMESTAMP}/`
4. Leave note about manual handling of `incremental.sql`

**Recommendation**: Document migration in release notes but don't auto-migrate. Users can manually structure chains if needed.

---

## Implementation Priorities

### Phase 1: Core Chain Structure

1. Update `--init` to create `chain-{TIMESTAMP}/` subdirectory
2. Update `--start` to auto-detect latest chain and write to `active.sql`
3. Create `.pg_scribe.pid` management
4. Update `--restore` to work with chain directories

### Phase 2: Rotation Commands

5. Implement `--rotate-diff` (new command)
6. Implement `--new-chain` (replaces `--full-backup`)
7. Update `--status` to show chain inventory

### Phase 3: Polish

8. Add `--include-active` flag to `--restore`
9. Add `--up-to` timestamp filtering to `--restore`
10. Comprehensive error handling for edge cases
11. Documentation updates

---

## Testing Checklist

### Basic Chain Operations

- [ ] `--init` creates first chain correctly
- [ ] `--start` finds and streams to latest chain
- [ ] `active.sql` is created and grows
- [ ] `.pg_scribe.pid` contains correct PID

### Differential Rotation

- [ ] `--rotate-diff` renames `active.sql` correctly
- [ ] SIGHUP triggers new `active.sql` creation
- [ ] New `active.sql` is being written
- [ ] Sealed differential has correct timestamp

### Chain Rotation

- [ ] `--new-chain` creates new chain directory
- [ ] Base backup completes successfully
- [ ] (If automated) Transition stops old streaming and starts new
- [ ] Old chain's final differential is sealed

### Restore

- [ ] `--restore` finds latest chain
- [ ] `--restore --chain-id` uses specific chain
- [ ] All differentials applied in order
- [ ] `active.sql` ignored by default
- [ ] `--include-active` applies `active.sql` if present

### Edge Cases

- [ ] Multiple chains exist, latest is selected
- [ ] Stale pidfile is detected and handled
- [ ] No `active.sql` exists (error handling)
- [ ] SIGHUP timeout during rotation
- [ ] Concurrent `--start` attempts fail properly

---

## Breaking Changes for Users

### Parameter Changes

- `-f/--file` → `--backup-dir` (required for all commands)
- `-S` → `--slot` (though `-S` can remain as alias)

### Behavioral Changes

- **`--start` no longer accepts explicit filename**: Writes to `active.sql` in latest chain
- **`--full-backup` removed**: Use `--new-chain` instead
- **Directory structure**: Chains are in subdirectories, not flat files

### Migration Path

1. Document new structure in release notes
2. Recommend users start fresh with `--init` for new installations
3. For existing installations: manual chain creation or continue with old flat structure until next major version

---

## Notes for Implementation

### Pidfile Handling

- Write PID with `echo $$ > pidfile` before exec
- Since we use `exec`, the PID won't change when becoming pg_recvlogical
- Clean up stale pidfiles (process doesn't exist)
- Never overwrite active pidfile (fail instead)

### Chain ID Format

```bash
CHAIN_ID=$(date -u +%Y%m%dT%H%M%SZ)
# Example: 20231215T120000Z
```

- ISO 8601 format
- UTC timezone (Z suffix)
- Second precision is sufficient
- Sortable lexicographically

### Finding Latest Chain

```bash
LATEST=$(ls -1d "$BACKUP_DIR"/chain-* 2>/dev/null | sort | tail -1)
```

- Relies on ISO timestamp being sortable
- Handle case when no chains exist
- Validate chain directory structure

### SIGHUP Handling in pg_recvlogical

- pg_recvlogical handles SIGHUP automatically
- Closes current file and reopens with same path
- We rename before SIGHUP so it creates new file
- Need to wait and verify new file creation

### Error Exit Codes

Maintain the same exit codes from old CLI:

```
0   Success
1   General error
2   Database connection error
3   Replication slot error
4   Backup/restore error
5   Invalid arguments or validation failure
10  Warning conditions (--status only)
```

---

## Documentation Updates Needed

1. **cli.md**: Complete rewrite to reflect new commands and chain structure
2. **file-handling.md**: Already written, may need minor updates
3. **design.md**: Update implementation section to reference chains
4. **README.md**: Update examples and quick start guide
5. **Migration guide**: Create doc/migration-v2.md for users upgrading

---

## Open Questions

1. **Should `--new-chain` automate the transition or just create the chain?**
   - Simple: Just create chain, user manually stops/starts streaming
   - Advanced: Fully automated seal + stop + start
   - **Recommendation for POC**: Simple version first

2. **Should we support multiple backup directories with different slots?**
   - Current design: Yes, each backup dir has its own pidfile and slot
   - This should work naturally with current design

3. **Compression handling in chains?**
   - Old: `--compress` flag on `--full-backup`
   - New: `--compress` on both `--init` and `--new-chain`
   - Should differentials be compressed? (Probably not during streaming, but could be rotated to compressed format)

4. **What happens to `active.sql` when `--start` is killed ungracefully?**
   - File remains as incomplete differential
   - Next `--start` should create new `active.sql` (or append? probably create new)
   - Document this behavior

5. **Chain cleanup strategy?**
   - Old chains accumulate - when to delete?
   - Leave to user (e.g. `rm -rf chain-{TIMESTAMP}/` for a specific old chain; a bare `rm -rf chain-*/` would also delete the active chain)
   - Or add `--cleanup-chains --keep=N` command later?
   - **Recommendation**: Leave to user for now, document with examples
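
Since the last open question recommends documenting manual cleanup with examples, here is a minimal sketch of what such an example might look like. It assumes chain directories follow the `chain-{TIMESTAMP}/` naming used throughout this document; `prune_chains` is a hypothetical helper name, not an existing or planned pg_scribe command.

```shell
# Hypothetical helper (not part of pg_scribe): delete all but the newest
# KEEP chains, never touching a chain that still contains active.sql.
prune_chains() {
    local backup_dir=$1
    local keep=$2
    local chain excess

    # Chain IDs sort lexicographically, so the glob expands oldest first
    set -- "$backup_dir"/chain-*/
    [ -d "$1" ] || return 0    # glob did not match: no chains to prune

    excess=$(( $# - keep ))
    for chain in "$@"; do
        [ "$excess" -gt 0 ] || break
        excess=$((excess - 1))
        # $chain ends with a slash because of the glob pattern
        if [ -f "${chain}active.sql" ]; then
            echo "Skipping ${chain} (has active.sql)"
            continue
        fi
        echo "Removing ${chain}"
        rm -rf -- "$chain"
    done
}
```

For example, `prune_chains /backups/mydb 2` would keep the two newest chains. Skipping any chain with an `active.sql` is a deliberately cautious choice here, since such a chain may be the current streaming target or hold an unsealed differential left by an ungraceful stop.
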