# pg_scribe Command Line Interface

This document describes the command line interface for **pg_scribe**, an incremental SQL backup system for PostgreSQL.

## Overview

pg_scribe uses action flags similar to other PostgreSQL tools like `pg_recvlogical`. The tool performs one primary action per invocation, specified by action flags.

The system organizes backups into **chains** - each chain consists of one base backup plus zero or more differential backups. See `doc/file-handling.md` for detailed terminology and file organization.

## Synopsis

```bash
# Initialize backup system (one-time)
pg_scribe --init [OPTIONS]

# Start streaming incremental backups
pg_scribe --start [OPTIONS]

# Stop active streaming process
pg_scribe --stop [OPTIONS]

# Rotate the differential file
pg_scribe --rotate-diff [OPTIONS]

# Create new chain with fresh base backup
pg_scribe --new-chain [OPTIONS]

# Restore from backups
pg_scribe --restore [OPTIONS]

# Check replication slot status
pg_scribe --status [OPTIONS]
```

## Common Options

These options apply to all actions:

### Connection Options

```
-d, --dbname=DBNAME          Database name (can be a connection string)
-h, --host=HOSTNAME          Database server host (default: localhost)
-p, --port=PORT              Database server port (default: 5432)
-U, --username=USERNAME      Database user (default: $PGUSER or $USER)
-w, --no-password            Never prompt for password
-W, --password               Force password prompt
```

### General Options

```
-v, --verbose                Enable verbose mode
-V, --version                Print version and exit
-?, --help                   Show help and exit
```

## Action Flags

Exactly one of the following action flags must be specified:

### `--init`

Initialize the backup system by creating a replication slot, setting up DDL capture via the wal2sql extension, and taking an initial base backup in the first chain.

**This is a one-time initialization operation** - following PostgreSQL conventions (`initdb`, `pg_basebackup`), it requires an empty backup directory and will fail if already initialized. Use `--if-not-exists` to make initialization idempotent (succeeds silently if already initialized). If initialization fails, partial state is automatically cleaned up (replication slot dropped, partial backup files removed).

**Additional options:**

```
--backup-dir=DIRECTORY       Backup output directory (required)
--slot=SLOTNAME              Replication slot name (default: pg_scribe)
--if-not-exists              Do not error if already initialized (idempotent)
--force                      Skip validation and force initialization (dangerous!)
--compress=METHOD            Compression method: gzip, lz4, zstd, or none
                             Can include level (e.g., zstd:9) (default: none)
```

**What it does:**

**Phase 1: Validation** (runs first, can fail)

1. **CRITICAL Checks** (must pass or initialization fails):
   - Verify `wal_level = logical`
   - Verify `max_replication_slots >= 1`
   - Verify `max_wal_senders >= 1`
   - Check all tables have adequate replica identity (PRIMARY KEY, USING INDEX, or FULL)

2. **Coverage Warnings** (non-blocking, informational):
   - List unlogged tables (will not be backed up)
   - Check for large objects (not incrementally backed up)

**Phase 2: Setup** (only runs if validation passes or `--force` used)

1. Verifies backup directory is empty (or doesn't exist)
2. Creates the wal2sql extension if it doesn't exist (`CREATE EXTENSION IF NOT EXISTS wal2sql;`)
   - This automatically installs the DDL event trigger
3. Creates a logical replication slot (fails if slot already exists)
4. Generates chain ID from current timestamp (e.g., `20231215T120000Z`)
5. Creates first chain directory: `chain-{CHAIN_ID}/`
6. Takes synchronized base backup using the slot's snapshot → `chain-{CHAIN_ID}/base.sql`
7. Creates `pg_dumpall --globals-only` backup → `chain-{CHAIN_ID}/globals.sql`
8. Generates metadata file → `chain-{CHAIN_ID}/metadata.json` (PostgreSQL version, extensions, encoding)
9. Applies compression if `--compress` specified

**Example:**

```bash
pg_scribe --init -d mydb --backup-dir /backups/mydb --slot mydb_backup
```

**Output should convey:**

- **Validation results** with clear pass/fail status for:
  - PostgreSQL configuration (wal_level, max_replication_slots, max_wal_senders)
  - Replica identity for all tables
  - Coverage warnings (unlogged tables, large objects)
- **If validation fails**: List CRITICAL issues with specific fix commands, then exit with error code 5 (unless `--force` used)
- **If validation passes** (or `--force`): Progress through setup steps (extension creation, slot creation with snapshot ID, chain creation, backup paths)
- **Final status**: Success message with chain ID created

**Example output:**
```
✓ Validation passed
✓ Extension created
✓ Replication slot 'mydb_backup' created at LSN 0/1234567
✓ Initial chain created: 20231215T120000Z
  Location: /backups/mydb/chain-20231215T120000Z/

Next step: Start streaming with:
  pg_scribe --start --backup-dir /backups/mydb
```

---

### `--start`

Start streaming incremental backups continuously to the latest chain in the backup directory.

**Additional options:**

```
--backup-dir=DIRECTORY       Backup directory (required)
--slot=SLOTNAME              Replication slot name (default: pg_scribe)
-s, --status-interval=SECS   Status update interval in seconds (default: 10)
-F, --fsync-interval=SECS    Fsync interval in seconds (default: 10, 0 to disable)
```

**What it does:**

1. Validates database connection and replication slot
2. Auto-detects latest chain (newest by timestamp)
3. Checks for existing streaming process (via `.pg_scribe.pid`)
4. Creates top-level pidfile: `{backup-dir}/.pg_scribe.pid`
5. Uses `exec` to replace itself with `pg_recvlogical`, which then:
   - Streams decoded changes using the wal2sql plugin
   - Writes SQL to `chain-{CHAIN_ID}/active.sql`
   - Periodically fsyncs output file for crash safety
   - Reports LSN position and lag to stderr
   - Responds to SIGHUP by closing and reopening output file (for log rotation)

**Implementation:** `pg_scribe --start` is a thin wrapper that validates prerequisites and uses `exec` to become `pg_recvlogical`. This design has several advantages:
- **No orphaned processes**: pg_scribe becomes pg_recvlogical (same PID), eliminating parent-child complexity
- **Direct signal handling**: SIGHUP, SIGTERM, SIGINT go directly to pg_recvlogical without forwarding
- **Simpler code**: No need for signal forwarding, child process tracking, or wait loops
- **Reliable cleanup**: Tests and process managers interact with a single process

**Log rotation support:** File rotation is provided automatically by `pg_recvlogical`:

1. Rename the current output file
2. Send SIGHUP to the pg_scribe/pg_recvlogical process
3. pg_recvlogical closes the old file and opens a new one with the same name

**Example:**

```bash
# Stream to latest chain (foreground)
pg_scribe --start -d mydb --backup-dir /backups/mydb --slot mydb_backup

# Run as background daemon (redirect stderr to log file)
pg_scribe --start -d mydb --backup-dir /backups/mydb --slot mydb_backup \
  2>/var/log/pg_scribe.log &

# Stream to stdout (for processing with other tools) - NOT SUPPORTED in chain mode
# Use --backup-dir; active.sql will be created in latest chain
```

**Output (to stderr):**
```
✓ Found latest chain: 20231215T120000Z
✓ Starting replication from slot 'mydb_backup'
✓ Writing to: /backups/mydb/chain-20231215T120000Z/active.sql
[periodic LSN updates...]
```

**Note:** Only one `--start` process can run per backup directory at a time.

---

### `--stop`

Stop the active streaming process gracefully.

**Additional options:**

```
--backup-dir=DIRECTORY       Backup directory containing active process (required)
```

**What it does:**

1. Locates the pidfile at `{backup-dir}/.pg_scribe.pid`
2. Validates the PID exists and is a `pg_recvlogical` process
3. Sends SIGTERM to the process for graceful shutdown
4. Waits up to 30 seconds for the process to stop
5. Sends SIGKILL if the process doesn't stop gracefully
6. Removes the pidfile

**Example:**

```bash
pg_scribe --stop --backup-dir /backups/mydb
```

**Output:**
```
✓ Found pg_recvlogical process (PID 12345)
✓ Active chain: 20231215T120000Z
✓ Sending SIGTERM to process 12345...
✓ Waiting for process to stop...
✓ Streaming process stopped
```

**Use case:** Clean shutdown before maintenance, before transitioning to a new chain manually, or when stopping backups temporarily.

**Edge cases handled:**
- Stale pidfile (process doesn't exist) - removes pidfile and exits successfully
- Wrong process (not pg_recvlogical) - refuses to stop and exits with error
- Process doesn't respond to SIGTERM - sends SIGKILL after timeout

---

### `--rotate-diff`

Rotate the differential file within the active chain, sealing the current differential and starting a new one (similar to log rotation).

**Additional options:**

```
--backup-dir=DIRECTORY       Backup directory (required)
```

**What it does:**

1. Finds the active streaming process via `.pg_scribe.pid`
2. Validates process exists and is pg_recvlogical
3. Locates the active chain (finds `active.sql`)
4. Generates new differential name: `diff-{TIMESTAMP}.sql`
5. **Atomic rotation:**
   - Rename `active.sql` → `diff-{TIMESTAMP}.sql`
   - Send SIGHUP to pg_recvlogical process
   - Wait for new `active.sql` to appear (timeout after 30s)
   - Validate new file is being written to

**Example:**

```bash
pg_scribe --rotate-diff --backup-dir /backups/mydb
```

**Output:**
```
✓ Found active chain: 20231215T120000Z
✓ Rotated differential: diff-20231216T083000Z.sql (2.1 GB)
✓ New differential started
```

**Use case:** Daily or weekly differential rotation to keep individual files manageable.

**Edge cases handled:**
- Stale pidfile (process doesn't exist)
- Wrong process (not pg_recvlogical)
- SIGHUP timeout (new file doesn't appear)

---

### `--new-chain`

Create a new chain with a fresh base backup. This operation takes a new base backup while streaming continues to the old chain.

**Additional options:**

```
--backup-dir=DIRECTORY       Backup directory (required)
--start                      Automatically stop old streaming and start new (one-step transition)
--compress=METHOD            Compression method: gzip, lz4, zstd, or none
                             Can include level (e.g., zstd:9) (default: none)
```

**What it does:**

1. Generates new chain ID from current timestamp
2. Creates new chain directory: `chain-{NEW_ID}/`
3. **Takes new base backup** (while streaming continues to old chain if active):
   - `pg_dump` → `chain-{NEW_ID}/base.sql`
   - `pg_dumpall --globals-only` → `chain-{NEW_ID}/globals.sql`
   - Metadata → `chain-{NEW_ID}/metadata.json`
   - Apply compression if specified
4. **If `--start` flag specified:**
   - Stops old streaming process (sends SIGTERM, waits up to 30s)
   - Starts streaming to new chain using `exec pg_recvlogical`
   - Process continues running in foreground (or background if run with `&`)
5. **If `--start` NOT specified:**
   - Reports new chain created
   - Displays manual transition instructions

**Example:**

```bash
# Create new chain only (manual transition)
pg_scribe --new-chain -d mydb --backup-dir /backups/mydb

# Create new chain with automatic transition (recommended)
pg_scribe --new-chain --start -d mydb --backup-dir /backups/mydb --compress=zstd:9
```

**Output (without --start):**
```
Taking new base backup...
✓ Base backup complete: chain-20231222T120000Z (12.3 GB)

New chain created: 20231222T120000Z
To transition to the new chain:
  1. Stop the current streaming process:
     pg_scribe --stop -f /backups/mydb
  2. Start streaming to the new chain:
     pg_scribe --start -d mydb -f /backups/mydb -S mydb_backup
```

**Output (with --start):**
```
Taking new base backup...
✓ Base backup complete: chain-20231222T120000Z (12.3 GB)

Automatic Transition (--start specified)
✓ Stopping old streaming process (PID 12345)...
✓ Old streaming process stopped

Starting streaming to new chain: 20231222T120000Z
✓ Output file: /backups/mydb/chain-20231222T120000Z/active.sql
[continues streaming in foreground...]
```

**Use case:** Monthly or quarterly base backup refresh to keep chains manageable. Use `--start` for automated transition in production (recommended), or omit it for manual control.

**Note on overlap:** Since the base backup is taken while streaming continues, there will be some overlap between the old chain's final differential and the new chain's base backup. This is acceptable and simplifies the design. Restore operates on one chain at a time, so overlap doesn't cause issues.

**Note on --start:** When using `--start`, the command requires `-d/--dbname` and other connection options since it will exec into `pg_recvlogical` to begin streaming.

---

### `--restore`

Restore database from a chain's base backup plus its differential backups.

**Additional options:**

```
--backup-dir=DIRECTORY       Backup input directory (required)
-d, --dbname=DBNAME          Target database name (required)
-C, --create                 Create target database
--chain-id=ID                Specific chain to restore (default: latest)
--include-active             Also apply incomplete active.sql (risky!)
--up-to=TIMESTAMP            Only apply differentials up to this time
--no-sync-sequences          Skip sequence synchronization
```

**What it does:**

1. Determines target chain (latest or specified via `--chain-id`)
2. Locates base backup: `chain-{ID}/base.sql`
3. Finds all sealed differentials: `chain-{ID}/diff-*.sql` (sorted by timestamp)
4. Creates target database (if `--create` specified)
5. Restores globals: `chain-{ID}/globals.sql`
6. Restores base backup: `chain-{ID}/base.sql`
7. Applies sealed differentials in chronological order
8. Synchronizes sequences using `setval()` (unless `--no-sync-sequences`)
9. Reports statistics

**Example:**

```bash
# Restore latest chain to new database
pg_scribe --restore --backup-dir /backups/mydb -d mydb_restored --create

# Restore specific chain
pg_scribe --restore --backup-dir /backups/mydb --chain-id 20231215T120000Z \
  -d mydb_restored --create

# Restore with incomplete active.sql (risky!)
pg_scribe --restore --backup-dir /backups/mydb -d mydb_restored --create --include-active
```

**Output:**
```
Restoring from chain: 20231215T120000Z
✓ Restored globals
✓ Restored base backup (1,234,567 rows)
✓ Applied 15 differentials
  - diff-20231216T083000Z.sql
  - diff-20231217T083000Z.sql
  ...
✓ Synchronized 42 sequences
Restore complete (took 3m 42s)
```

**Default behavior:** By default, `active.sql` is **ignored** because it's incomplete and may be inconsistent. Use `--include-active` only if you need the absolute latest changes and accept the risk.

---

### `--status`

Check replication slot health and backup system status.

**Additional options:**

```
--slot=SLOTNAME              Replication slot name (default: pg_scribe)
--backup-dir=DIRECTORY       Backup directory to analyze (optional)
```

**What it does:**

1. Queries `pg_replication_slots` for slot health
2. Shows replication lag and WAL retention
3. Analyzes backup directory if provided:
   - Lists all chains with timestamps
   - Shows differential counts and sizes
   - Indicates which chain is actively streaming
4. Reports warnings about potential issues

**Example:**

```bash
pg_scribe --status -d mydb --slot mydb_backup --backup-dir /backups/mydb
```

**Output:**
```
Replication Slot: mydb_backup
  Status: active
  Current LSN: 0/9876543
  Confirmed LSN: 0/9876540
  Lag: 3 bytes
  Age: 7 days

Backup Directory: /backups/mydb

Chains:
  chain-20231215T120000Z
    Base: 2023-12-15 12:00:00 (10.2 GB)
    Differentials: 15 files (2.1 GB total)
    Status: sealed

  chain-20231222T120000Z (ACTIVE - streaming)
    Base: 2023-12-22 12:00:00 (10.5 GB)
    Differentials: 3 files (1.8 GB total)
    Last activity: 2 minutes ago
    PID: 12345

Total backup size: 24.1 GB
```

---

## Chain-Based File Structure

The backup directory contains multiple chain subdirectories, each representing a complete backup lineage:

```
/backups/mydb/
  .pg_scribe.pid                   # Single pidfile for backup directory
  chain-20231215T120000Z/          # First chain (sealed)
    base.sql                       # Base backup (pg_dump)
    globals.sql                    # Roles, tablespaces (pg_dumpall)
    metadata.json                  # PG version, extensions, etc.
    diff-20231216T083000Z.sql      # Sealed differential
    diff-20231217T083000Z.sql      # Sealed differential
    ...
  chain-20231222T120000Z/          # Second chain (active)
    base.sql
    globals.sql
    metadata.json
    active.sql                     # Currently streaming (incomplete)
```

**Key concepts:**
- **Chain ID**: Timestamp of base backup creation (e.g., `20231215T120000Z`)
- **Base backup**: Complete database snapshot at chain creation time
- **Active differential**: Currently streaming changes (incomplete, named `active.sql`)
- **Sealed differential**: Completed differential (immutable, named `diff-{TIMESTAMP}.sql`)
- **One active chain**: Only one chain receives streaming changes at a time
- **One pidfile**: Single `.pg_scribe.pid` tracks the active streaming process

See `doc/file-handling.md` for comprehensive details on file organization and terminology.

---

## Chain Management

Chain management uses standard Unix tools:

```bash
# List all chains
ls -l /backups/mydb/

# View chain contents
ls -lh /backups/mydb/chain-20231215T120000Z/

# Tree view
tree /backups/mydb/

# Remove old chain (manual cleanup)
rm -rf /backups/mydb/chain-20231215T120000Z/

# Check if streaming is active
if [ -f /backups/mydb/.pg_scribe.pid ]; then
  cat /backups/mydb/.pg_scribe.pid
  ps -p $(cat /backups/mydb/.pg_scribe.pid)
fi
```

**Cleanup strategy**: Old chains accumulate over time. Removal is manual and left to the user (or external scripts) to maintain full control over retention policies.

---

## Exit Status

```
0   Success
1   General error
2   Database connection error
3   Replication slot error
4   Backup/restore error
5   Invalid arguments or validation failure (--init without --force)
10  Warning conditions (--status only)
```

---

## Environment Variables

Standard PostgreSQL environment variables are supported:

```
PGHOST              Database host
PGPORT              Database port
PGDATABASE          Database name
PGUSER              Database user
PGPASSWORD          Database password (not recommended, use .pgpass instead)
PG_COLOR            Use color in diagnostics: always, auto, or never
```

---

## Example Workflows

### Initial Setup and Daily Operation

```bash
# 1. Initialize backup system (one-time setup, requires empty directory)
pg_scribe --init -d production --backup-dir /backups/production --slot prod_backup

# 2. Start streaming backups (run as daemon/service)
pg_scribe --start -d production --backup-dir /backups/production --slot prod_backup \
  2>/var/log/pg_scribe.log &

# 3. Daily differential rotation (cron: 0 2 * * *)
pg_scribe --rotate-diff --backup-dir /backups/production

# 4. Monthly new chain with automatic transition (cron: 0 3 1 * *)
# This stops old streaming and starts new in one command
pg_scribe --new-chain --start -d production --backup-dir /backups/production \
  --slot prod_backup --compress=zstd:9 2>/var/log/pg_scribe.log &

# 5. Monitor slot health (cron: */15 * * * *)
pg_scribe --status -d production --slot prod_backup --backup-dir /backups/production

# Alternative: Manual transition (if preferred)
# 4a. Create new chain
pg_scribe --new-chain -d production --backup-dir /backups/production --compress=zstd:9
# 4b. Stop old streaming
pg_scribe --stop --backup-dir /backups/production
# 4c. Start streaming to new chain
pg_scribe --start -d production --backup-dir /backups/production --slot prod_backup \
  2>/var/log/pg_scribe.log &
```

### Disaster Recovery

```bash
# 1. Check available backups
pg_scribe --status --backup-dir /backups/production

# 2. Restore latest chain to new database
pg_scribe --restore --backup-dir /backups/production -d production_restored --create

# 3. Restore specific older chain
pg_scribe --restore --backup-dir /backups/production --chain-id 20231215T120000Z \
  -d production_december --create

# 4. Test restored database
psql -d production_restored -c "SELECT COUNT(*) FROM users;"

# 5. Switch application to restored database (manual step)
```

---

## Comparison with PostgreSQL Tools

pg_scribe follows the same CLI conventions as core PostgreSQL tools:

| Tool | Action Flags | Connection Options | File Options |
|------|-------------|-------------------|--------------|
| **pg_recvlogical** | `--create-slot`, `--drop-slot`, `--start` | `-d`, `-h`, `-p`, `-U` | `-f` (output file), `--slot` (slot) |
| **pg_dump** | (positional dbname) | `-d`, `-h`, `-p`, `-U` | `-f` (output file) |
| **pg_basebackup** | (none) | `-d`, `-h`, `-p`, `-U` | `-D` (data directory!) |
| **pg_restore** | (none) | `-d`, `-h`, `-p`, `-U` | positional (archive file) |
| **pg_scribe** | `--init`, `--start`, `--rotate-diff`, `--new-chain`, `--restore`, `--status` | `-d`, `-h`, `-p`, `-U` | `--backup-dir` (directory), `--slot` (slot) |

**Note:** `pg_basebackup` uses `-D/--pgdata` because it creates an actual PostgreSQL data directory cluster. `pg_scribe` uses `--backup-dir` because it manages a collection of chains with SQL backup files.

---

## Implementation Notes

### Design Principles

1. **Consistent with PostgreSQL**: Follow exact same conventions as `pg_recvlogical`, `pg_dump`, `initdb`, `pg_basebackup`
2. **Fail-fast initialization**: `--init` requires empty directory, cleans up on failure (like `initdb` and `pg_basebackup`)
3. **Clear output**: Progress to stderr, data to stdout/files
4. **Scriptable**: Clear text output format, proper exit codes
5. **Opinionated file structure**: Chains provide consistent organization

### Technology Choices (POC)

- **Language**: Bash or Python
  - Bash: Minimal dependencies, matches PostgreSQL tool style
  - Python: Better error handling, easier testing
- **Dependencies**: Only PostgreSQL client tools (pg_recvlogical, pg_dump, pg_dumpall, psql)

### Key Implementation Components

1. **Connection management**: Use libpq environment variables, .pgpass
2. **Error handling**: Validate prerequisites before starting operations
3. **Chain management**:
   - Auto-detect latest chain for `--start`
   - Single pidfile at `{backup-dir}/.pg_scribe.pid`
   - Chain ID format: `YYYYMMDDTHHMMSSZ` (ISO 8601, UTC, sortable)
4. **Signal handling**:
   - SIGTERM/SIGINT for graceful shutdown (handled directly by pg_recvlogical after exec)
   - SIGHUP for file rotation (handled directly by pg_recvlogical after exec)
5. **Output conventions**:
   - Progress and status messages → stderr
   - SQL output → `active.sql` in chain directory
6. **Process management for --start**:
   - Validate prerequisites (connection, replication slot, no existing process)
   - Display configuration to stderr
   - Use `exec` to replace pg_scribe with `pg_recvlogical`
   - Benefits: No orphaned processes, direct signal handling, simpler code, same PID

**Extensibility Note**: This POC design uses a modular, action-based CLI that can accommodate additional features and options in future versions without breaking compatibility.