# AI-Unix Tools Test Suite

This test suite validates the functionality of the AI-powered Unix tools through multiple testing strategies that account for the non-deterministic nature of LLM outputs.

## Test Structure

```
tests/
├── run-tests.sh           # Main test runner
├── data/                  # Test data files
│   ├── sample-log.txt     # Sample log entries
│   ├── contacts.txt       # Contact information
│   ├── feedback.txt       # User feedback samples
│   └── empty.txt          # Empty file for edge cases
├── unit/                  # Deterministic unit tests
│   ├── test-basic.sh      # Exit codes, option parsing
│   └── test-formats.sh    # Output format validation
└── integration/           # Semantic integration tests
    └── test-semantic.sh   # AI-powered semantic validation
```

## Testing Strategy

### 1. Deterministic Tests (unit/test-basic.sh)

Tests properties that should always be consistent:

- **Exit codes**: Correct error codes for various scenarios
- **Option parsing**: Help flags, invalid options, missing arguments
- **File handling**: Non-existent files, empty files
- **Input/output**: Stdin vs. file input behavior

### 2. Format Validation Tests (unit/test-formats.sh)

Tests output format correctness:

- **JSON validation**: `ai-cut -j` produces valid JSON
- **TSV format**: Tab-separated values in default `ai-cut` output
- **Line numbers**: `ai-grep -n` prefixes matches with line numbers
- **Count format**: `ai-grep -c` outputs a single number
- **Category prefixes**: `ai-class` outputs "category:" format

### 3. Semantic Tests (integration/test-semantic.sh)

Uses `ai-test` itself to validate semantic properties:

- **Semantic matching**: `ai-grep` finds semantically related content
- **Extraction quality**: `ai-cut` extracts appropriate fields
- **Categorization accuracy**: `ai-class` assigns reasonable categories
- **Transformation preservation**: `ai-tr` maintains semantic meaning
- **Pipeline behavior**: Tools work together semantically

## Running Tests

### Run All Tests

```bash
cd tests
./run-tests.sh
```

### Run Specific Test Suites

```bash
./run-tests.sh --basic        # Only deterministic tests
./run-tests.sh --format       # Only format validation tests
./run-tests.sh --semantic     # Only semantic tests (requires claude)
./run-tests.sh --no-semantic  # Skip semantic tests
```

## Prerequisites

- All ai-unix tools must be built and executable in the parent directory
- Python 3 (for JSON validation in format tests)
- Claude CLI tool (for semantic tests)

## Understanding Test Results

### Deterministic Tests

These should always pass if the tools are correctly implemented. Failures indicate:

- Incorrect exit codes
- Broken option parsing
- File I/O issues
- Basic functionality problems

### Format Tests

These validate that output formats match specifications. Failures indicate:

- Invalid JSON output from `ai-cut -j`
- Missing tab separators in TSV output
- Incorrect line number formatting
- Wrong count output format

### Semantic Tests

These use AI to validate AI output quality. Failures may indicate:

- Poor semantic matching by `ai-grep`
- Incorrect field extraction by `ai-cut`
- Bad categorization by `ai-class`
- Meaning loss in `ai-tr` transformations
- Claude API issues

## Interpreting Non-Deterministic Results

Since AI outputs vary, semantic tests may occasionally fail even when the tools are correct. Consider:

- **Single failures**: May be normal variation; re-run to confirm
- **Consistent failures**: Likely indicate real issues
- **Pattern analysis**: Look for patterns across multiple runs

## Extending the Test Suite

### Adding New Test Data

Place new test files in `tests/data/` and reference them in test scripts.
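As a minimal sketch (not part of the existing suite), a new data file could be wired into a deterministic check like the one below. The file name `new-sample.txt`, the `AI_GREP` path variable, and the assumption that `ai-grep` follows grep's convention of exiting 0 on a match are all illustrative.

```bash
#!/bin/bash
# Hypothetical sketch: a deterministic check against a new data file.
# Assumes ai-grep exits 0 when a match is found (grep convention).

AI_GREP="${AI_GREP:-../ai-grep}"       # adjust to wherever the built tools live
DATA_FILE="data/new-sample.txt"        # new file placed under tests/data/

test_grep_exit_code() {
    if "$AI_GREP" "error" "$DATA_FILE" > /dev/null 2>&1; then
        echo "PASS: ai-grep exited 0 on matching input"
    else
        echo "FAIL: ai-grep did not exit 0 on matching input"
        return 1
    fi
}

test_grep_exit_code
```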
### Adding Deterministic Tests

Add new test functions to `unit/test-basic.sh` or `unit/test-formats.sh`.

### Adding Semantic Tests

Add new semantic validation tests to `integration/test-semantic.sh` using `ai-test`.

### Self-Improving Tests

Use `ai-test` creatively to validate its own assumptions:

```bash
# Example: Test if test data is appropriate
ai-test "contains realistic log entries" tests/data/sample-log.txt
```

## Best Practices

1. **Test exit codes first** - They're the most reliable indicators
2. **Validate formats before semantics** - Structural problems are easier to debug (see the sketch below)
3. **Use ai-test liberally** - Let AI validate AI output quality
4. **Create focused test data** - Each file should test specific scenarios
5. **Document expected behaviors** - Especially for semantic edge cases
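A hedged sketch of practices 2 and 3 combined: check the structure of `ai-cut -j` output mechanically, then let `ai-test` judge the semantics. It assumes the tools are on PATH and are run from the repository root; the `ai-cut` field arguments (`"name,email"`) and the temporary output path are illustrative, not documented behavior.

```bash
# Hypothetical sketch: validate structure before semantics.
# ai-cut arguments and /tmp path are assumptions for illustration.

# Structure first: does ai-cut -j produce valid JSON?
ai-cut -j "name,email" tests/data/contacts.txt > /tmp/contacts.json
python3 -m json.tool /tmp/contacts.json > /dev/null || echo "FAIL: invalid JSON from ai-cut -j"

# Then semantics: let ai-test judge the extracted content.
ai-test "each record contains a plausible name and email address" /tmp/contacts.json
```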