# AI-Unix Tools Test Suite
This test suite validates the functionality of the AI-powered Unix tools through multiple testing strategies that account for the non-deterministic nature of LLM outputs.
## Test Structure

```
tests/
├── run-tests.sh              # Main test runner
├── data/                     # Test data files
│   ├── sample-log.txt        # Sample log entries
│   ├── contacts.txt          # Contact information
│   ├── feedback.txt          # User feedback samples
│   └── empty.txt             # Empty file for edge cases
├── unit/                     # Deterministic unit tests
│   ├── test-basic.sh         # Exit codes, option parsing
│   └── test-formats.sh       # Output format validation
└── integration/              # Semantic integration tests
    └── test-semantic.sh      # AI-powered semantic validation
```
## Test Categories

### 1. Deterministic Tests (`unit/test-basic.sh`)
Tests properties that should always be consistent:
- **Exit codes**: Correct error codes for various scenarios
- **Option parsing**: Help flags, invalid options, missing arguments
- **File handling**: Non-existent files, empty files
- **Input/output**: Stdin vs. file input behavior
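Because these properties do not depend on what the model generates, they can be asserted directly on exit statuses. The sketch below uses a hypothetical `assert_exit` helper; the commented line shows where a real tool invocation would go (the expected exit code there is an assumption, not documented behavior), and standard commands stand in for demonstration.

```shell
#!/bin/sh
# Sketch of a deterministic exit-code check (hypothetical helper; the real
# suite lives in unit/test-basic.sh). assert_exit runs a command and compares
# its exit status against the expected value.
fail=0
assert_exit() {
  expected="$1"; shift
  "$@" >/dev/null 2>&1
  actual=$?
  if [ "$actual" -ne "$expected" ]; then
    echo "FAIL: '$*' exited $actual, expected $expected"
    fail=1
  fi
}

# A real test would target the AI tools, e.g.:
#   assert_exit 2 ../ai-grep "errors" /no/such/file
# Demonstrated here with standard commands:
assert_exit 0 true
assert_exit 1 false
[ "$fail" -eq 0 ] && echo "exit-code checks passed"
```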
### 2. Format Validation Tests (`unit/test-formats.sh`)
Tests output format correctness:
- **JSON validation**: `ai-cut -j` produces valid JSON
- **TSV format**: Tab-separated values in default `ai-cut` output
- **Line numbers**: `ai-grep -n` prefixes matches with line numbers
- **Count format**: `ai-grep -c` outputs a single number
- **Category prefixes**: `ai-class` outputs in "category:" format
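Format checks are also deterministic even when the content varies: parse the output rather than eyeballing it. In the sketch below a canned string stands in for captured tool output (in the real test you would capture it with something like `json_output=$(./ai-cut -j ... < input)`; that invocation is an assumption about your layout).

```shell
#!/bin/sh
# Validate JSON structurally with python3, as the format tests do.
# The canned string below is a stand-in for real ai-cut -j output.
json_output='[{"name": "Alice", "email": "alice@example.com"}]'

if printf '%s' "$json_output" | python3 -c 'import json, sys; json.load(sys.stdin)' 2>/dev/null; then
  echo "valid JSON"
else
  echo "invalid JSON"
fi
```

The same pattern works for TSV: assert that each output line contains a literal tab (`grep -q "$(printf '\t')"`).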
### 3. Semantic Tests (`integration/test-semantic.sh`)
Uses `ai-test` itself to validate semantic properties:
- **Semantic matching**: `ai-grep` finds semantically related content
- **Extraction quality**: `ai-cut` extracts appropriate fields
- **Categorization accuracy**: `ai-class` assigns reasonable categories
- **Transformation preservation**: `ai-tr` maintains semantic meaning
- **Pipeline behavior**: The tools work together semantically
## Running Tests

### Run Specific Test Suites

```bash
./run-tests.sh --basic        # Only deterministic tests
./run-tests.sh --format       # Only format validation tests
./run-tests.sh --semantic     # Only semantic tests (requires claude)
./run-tests.sh --no-semantic  # Skip semantic tests
```
## Requirements

- All ai-unix tools must be built and executable in the parent directory
- `python3` (for JSON validation in format tests)
- `claude` CLI tool (for semantic tests)
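A small preflight sketch can verify these prerequisites before any test runs. The tool list below matches the requirements above; extending it to check that each built ai-unix binary is executable in the parent directory is left as an adjustment for your checkout.

```shell
#!/bin/sh
# Preflight sketch: report which prerequisite commands are on PATH.
missing=0
for cmd in python3 claude; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "found: $cmd"
  else
    echo "missing: $cmd"
    missing=1
  fi
done
[ "$missing" -eq 0 ] || echo "note: semantic tests will be skipped without claude"
```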
## Understanding Test Results
### Deterministic Tests
These should always pass if the tools are correctly implemented. Failures indicate:
- Incorrect exit codes
- Broken option parsing
- Basic functionality problems
### Format Validation Tests
These validate that output formats match the specifications. Failures indicate:
- Invalid JSON output from `ai-cut -j`
- Missing tab separators in TSV output
- Incorrect line-number formatting
- Wrong count output format
### Semantic Tests
These use AI to validate AI output quality. Failures may indicate:
- Poor semantic matching by `ai-grep`
- Incorrect field extraction by `ai-cut`
- Bad categorization by `ai-class`
- Loss of meaning in `ai-tr` transformations
## Interpreting Non-Deterministic Results
Since AI outputs vary, semantic tests may occasionally fail even when the tools are correct. Consider:
- **Single failures**: May be normal variation; re-run to confirm
- **Consistent failures**: Likely indicate real issues
- **Pattern analysis**: Look for patterns across multiple runs
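One way to turn "re-run to confirm" into data is to repeat a flaky semantic test and report a pass rate. The loop below is a sketch: `run_semantic_test` is a stub standing in for a real invocation (e.g. an `ai-test` call against a data file), and the threshold for "likely a real issue" is yours to choose.

```shell
#!/bin/sh
# Estimate a pass rate for a non-deterministic test by repeating it.
run_semantic_test() {
  # Stub: replace with the real call, e.g.
  #   ai-test "contains realistic log entries" tests/data/sample-log.txt
  true
}

runs=5
passes=0
i=1
while [ "$i" -le "$runs" ]; do
  run_semantic_test && passes=$((passes + 1))
  i=$((i + 1))
done
echo "passed $passes/$runs"
```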
## Extending the Test Suite

### Adding New Test Data
Place new test files in `tests/data/` and reference them in the test scripts.

### Adding Deterministic Tests
Add new test functions to `unit/test-basic.sh` or `unit/test-formats.sh`.
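New deterministic tests can follow a simple ok/not-ok pattern. The helper names below (`check`, `test_empty_file_handling`) are hypothetical, not the harness's actual API; mirror whatever conventions `unit/test-basic.sh` already uses.

```shell
#!/bin/sh
# Sketch of a test-function pattern: check() runs an assertion command and
# tallies results; each test_* function groups related assertions.
tests_run=0
tests_failed=0

check() {
  desc="$1"; shift
  tests_run=$((tests_run + 1))
  if "$@"; then
    echo "ok - $desc"
  else
    echo "not ok - $desc"
    tests_failed=$((tests_failed + 1))
  fi
}

test_empty_file_handling() {
  tmp=$(mktemp)
  check "empty input file is empty" test ! -s "$tmp"
  rm -f "$tmp"
}

test_empty_file_handling
echo "$tests_run run, $tests_failed failed"
```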
### Adding Semantic Tests
Add new semantic validation tests to `integration/test-semantic.sh` using `ai-test`.
### Self-Improving Tests
Use `ai-test` creatively to validate its own assumptions:

```bash
# Example: Test if the test data is appropriate
ai-test "contains realistic log entries" tests/data/sample-log.txt
```
## Best Practices

1. **Test exit codes first** - They're the most reliable indicators
2. **Validate formats before semantics** - Structural problems are easier to debug
3. **Use ai-test liberally** - Let AI validate AI output quality
4. **Create focused test data** - Each file should test specific scenarios
5. **Document expected behaviors** - Especially for semantic edge cases