# AI-Unix Tools Test Suite

This test suite validates the functionality of the AI-powered Unix tools through multiple testing strategies that account for the non-deterministic nature of LLM outputs.

## Test Structure

```
tests/
├── run-tests.sh           # Main test runner
├── data/                  # Test data files
│   ├── sample-log.txt     # Sample log entries
│   ├── contacts.txt       # Contact information
│   ├── feedback.txt       # User feedback samples
│   └── empty.txt          # Empty file for edge cases
├── unit/                  # Deterministic unit tests
│   ├── test-basic.sh      # Exit codes, option parsing
│   └── test-formats.sh    # Output format validation
└── integration/           # Semantic integration tests
    └── test-semantic.sh   # AI-powered semantic validation
```

## Testing Strategy

### 1. Deterministic Tests (unit/test-basic.sh)

Tests properties that should always be consistent:

- **Exit codes**: Correct error codes for various scenarios
- **Option parsing**: Help flags, invalid options, missing arguments
- **File handling**: Non-existent files, empty files
- **Input/output**: Stdin vs. file input behavior

### 2. Format Validation Tests (unit/test-formats.sh)

Tests output format correctness:

- **JSON validation**: `ai-cut -j` produces valid JSON
- **TSV format**: Tab-separated values in default `ai-cut` output
- **Line numbers**: `ai-grep -n` prefixes matches with line numbers
- **Count format**: `ai-grep -c` outputs a single number
- **Category prefixes**: `ai-class` outputs "category:" format

### 3. Semantic Tests (integration/test-semantic.sh)

Uses `ai-test` itself to validate semantic properties:

- **Semantic matching**: `ai-grep` finds semantically related content
- **Extraction quality**: `ai-cut` extracts appropriate fields
- **Categorization accuracy**: `ai-class` assigns reasonable categories
- **Transformation preservation**: `ai-tr` maintains semantic meaning
- **Pipeline behavior**: Tools work together semantically

## Running Tests

### Run All Tests

```bash
cd tests
./run-tests.sh
```

### Run Specific Test Suites

```bash
./run-tests.sh --basic        # Only deterministic tests
./run-tests.sh --format       # Only format validation tests
./run-tests.sh --semantic     # Only semantic tests (requires claude)
./run-tests.sh --no-semantic  # Skip semantic tests
```

## Prerequisites

- All ai-unix tools must be built and executable in the parent directory
- Python 3 (for JSON validation in format tests)
- Claude CLI tool (for semantic tests)

## Understanding Test Results

### Deterministic Tests

These should always pass if the tools are correctly implemented. Failures indicate:

- Incorrect exit codes
- Broken option parsing
- File I/O issues
- Basic functionality problems

### Format Tests

These validate that output formats match specifications. Failures indicate:

- Invalid JSON output from `ai-cut -j`
- Missing tab separators in TSV output
- Incorrect line number formatting
- Wrong count output format

### Semantic Tests

These use AI to validate AI output quality. Failures may indicate:

- Poor semantic matching by `ai-grep`
- Incorrect field extraction by `ai-cut`
- Bad categorization by `ai-class`
- Meaning loss in `ai-tr` transformations
- Claude API issues

## Interpreting Non-Deterministic Results

Since AI outputs vary, semantic tests may occasionally fail even when the tools are correct. Consider:

- **Single failures**: May be normal variation; re-run to confirm
- **Consistent failures**: Likely indicate real issues
- **Pattern analysis**: Look for patterns across multiple runs

## Extending the Test Suite

### Adding New Test Data

Place new test files in `tests/data/` and reference them in test scripts.
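As a minimal sketch (not part of the existing suite), a new data file could be wired into a deterministic check like the one below. The file name `new-sample.txt`, the `AI_GREP` path variable, and the assumption that `ai-grep` follows grep's convention of exiting 0 on a match are all illustrative.

```bash
#!/bin/bash
# Hypothetical sketch: a deterministic check against a new data file.
# Assumes ai-grep exits 0 when a match is found (grep convention).

AI_GREP="${AI_GREP:-../ai-grep}"       # adjust to wherever the built tools live
DATA_FILE="data/new-sample.txt"        # new file placed under tests/data/

test_grep_exit_code() {
    if "$AI_GREP" "error" "$DATA_FILE" > /dev/null 2>&1; then
        echo "PASS: ai-grep exited 0 on matching input"
    else
        echo "FAIL: ai-grep did not exit 0 on matching input"
        return 1
    fi
}

test_grep_exit_code
```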
### Adding Deterministic Tests

Add new test functions to `unit/test-basic.sh` or `unit/test-formats.sh`.

### Adding Semantic Tests

Add new semantic validation tests to `integration/test-semantic.sh` using `ai-test`.

### Self-Improving Tests

Use `ai-test` creatively to validate its own assumptions:

```bash
# Example: Test if test data is appropriate
ai-test "contains realistic log entries" tests/data/sample-log.txt
```

## Best Practices

1. **Test exit codes first** - They're the most reliable indicators
2. **Validate formats before semantics** - Structural problems are easier to debug (see the sketch below)
3. **Use ai-test liberally** - Let AI validate AI output quality
4. **Create focused test data** - Each file should test specific scenarios
5. **Document expected behaviors** - Especially for semantic edge cases
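A hedged sketch of practices 2 and 3 combined: check the structure of `ai-cut -j` output mechanically, then let `ai-test` judge the semantics. It assumes the tools are on PATH and are run from the repository root; the `ai-cut` field arguments (`"name,email"`) and the temporary output path are illustrative, not documented behavior.

```bash
# Hypothetical sketch: validate structure before semantics.
# ai-cut arguments and /tmp path are assumptions for illustration.

# Structure first: does ai-cut -j produce valid JSON?
ai-cut -j "name,email" tests/data/contacts.txt > /tmp/contacts.json
python3 -m json.tool /tmp/contacts.json > /dev/null || echo "FAIL: invalid JSON from ai-cut -j"

# Then semantics: let ai-test judge the extracted content.
ai-test "each record contains a plausible name and email address" /tmp/contacts.json
```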