# CSV Parser

A Flex/Bison-based CSV parser that can handle various CSV formats including quoted fields, empty fields, and escaped characters.

## Project Structure

```
sa-parse/
├── .gitignore              # Ignore build artifacts and temporary files
├── build.sh                # Build script for easy compilation
├── meson.build             # Meson build configuration
├── README.md               # This file
├── src/                    # Source code directory
│   ├── csv.l               # Flex lexer specification
│   └── csv.y               # Bison parser specification
└── tests/                  # Test files and documentation
    ├── README.md           # Test documentation
    ├── test.csv            # Basic test case
    ├── complex.csv         # Complex field content test
    ├── quotes.csv          # Quote handling test
    ├── empty_fields.csv    # Empty field handling test
    ├── header_only.csv     # Header-only file test
    └── empty.csv           # Empty file test
```

## Features

- **RFC 4180 CSV Parsing**: Handles standard CSV format
- **Quoted Field Support**: Properly processes quoted fields with commas and quotes
- **Empty Field Handling**: Correctly parses empty fields in any position
- **Memory Safe**: Conservative memory management with proper error handling
- **Robust Error Reporting**: Line number and context information for parse errors

## Building

### Quick Build (Recommended)

```bash
./build.sh
```

### Manual Build

```bash
cd src
yacc -d csv.y    # Generate parser
flex csv.l       # Generate lexer
gcc -o csv_parser y.tab.c lex.yy.c -ll
```

### Meson Build

```bash
meson setup builddir
meson compile -C builddir
```

## Usage

```bash
# Parse from file
./src/csv_parser < tests/test.csv

# Parse from stdin
echo "name,age,city
John,25,Boston" | ./src/csv_parser

# Test various formats
./src/csv_parser < tests/complex.csv
./src/csv_parser < tests/empty_fields.csv
```

## Output Format

The parser outputs structured information about the CSV content:

```
Header: [field1], [field2], [field3]
Record 1: [value1], [value2], [value3]
Record 2: [value4], [value5], [value6]
```

## Testing

See `tests/README.md` for detailed information about test cases and expected behavior.

## Implementation Notes

- Built with Flex 2.6+ and Bison 3.0+
- Uses conservative memory management approach
- Handles both CRLF and LF line endings
- Supports files with or without headers
- Gracefully handles malformed input with informative error messages

## Recent Improvements

- Fixed buffer overflow vulnerabilities in lexer
- Added proper memory allocation failure handling
- Enhanced error checking in semantic actions
- Improved memory leak prevention
- Added comprehensive test suite
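
As an illustration of the quoted-field rule in RFC 4180 (a literal quote inside a quoted field is written as two quotes) combined with the allocation checks mentioned above, here is a minimal sketch in C. The function name `csv_unquote` and its interface are assumptions made for this example; it is not taken from `csv.l` or `csv.y`, and the actual lexer may handle quoting differently.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of csv.l): converts the raw text of a
 * quoted field, including its surrounding quotes, into the field value.
 * Returns a heap-allocated string that the caller must free(), or NULL
 * if the input is malformed or the allocation fails. */
char *csv_unquote(const char *raw)
{
    size_t len = strlen(raw);

    /* A quoted field needs at least an opening and a closing quote. */
    if (len < 2 || raw[0] != '"' || raw[len - 1] != '"')
        return NULL;

    /* The value is never longer than the text between the quotes:
     * len - 2 characters plus the terminating NUL. */
    char *out = malloc(len - 1);
    if (out == NULL)
        return NULL;

    size_t n = 0;
    for (size_t i = 1; i < len - 1; i++) {
        if (raw[i] == '"') {
            /* Inside the field, a quote is only valid as the pair "". */
            if (i + 1 >= len - 1 || raw[i + 1] != '"') {
                free(out);
                return NULL;
            }
            i++; /* skip the second quote of the pair */
        }
        out[n++] = raw[i];
    }
    out[n] = '\0';
    return out;
}
```

Under these assumptions, calling `csv_unquote` on the raw text `"a,""b"""` yields the value `a,"b"`, while unquoted input such as `abc` or a stray single quote inside the field yields `NULL`. Sizing the buffer from the input length up front means the copy loop cannot write past the allocation.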