11. Improve File Handling
Summary
Make file ingestion boundaries clearer by using Path, explicit UTF-8 decoding, and validation before reading.
Implementation Steps
- Change internal file ingestion APIs to accept Path instead of raw str.
- Convert CLI string paths to Path in the command adapter or handler.
- Validate that the path exists and is a regular file before reading.
- Read text with encoding="utf-8".
- Raise a clear app-level file error for missing paths, directories, and decoding failures.
- Leave PDF and future file loaders out of scope for now.
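The steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names FileIngestionError, read_text_file, and handle_add_data are hypothetical, and the error wording is only a suggestion.

```python
from pathlib import Path


class FileIngestionError(Exception):
    """Hypothetical app-level error for file ingestion failures."""


def read_text_file(path: Path) -> str:
    """Validate the path, then read it as UTF-8 text."""
    # Check existence first so a missing path gets its own message.
    if not path.exists():
        raise FileIngestionError(f"File not found: {path}")
    # Directories and other non-regular files are rejected before reading.
    if not path.is_file():
        raise FileIngestionError(f"Not a regular file: {path}")
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        # Wrap the low-level decoding failure in the app-level error.
        raise FileIngestionError(f"File is not valid UTF-8: {path}") from exc


def handle_add_data(raw_path: str) -> str:
    """Hypothetical command handler: the CLI boundary converts str to Path."""
    return read_text_file(Path(raw_path))
```

Keeping the str-to-Path conversion in the handler means everything below that boundary can rely on receiving a Path. Other OS-level failures, such as permission errors, are not mapped in this sketch and would propagate as OSError.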
Public Interface Changes
- CLI argument remains a file path string.
- Error messages for missing or invalid files become clearer.
Test Plan
- Test successful text-file loading.
- Test missing file, directory path, and invalid UTF-8 handling.
- Smoke test add-data with a valid UTF-8 file.
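The unit-level cases in this test plan might look something like the sketch below, written with the standard library's unittest. The loader and error class here are compact hypothetical stand-ins so the example runs on its own; a real test module would import the project's own loader instead. The add-data smoke test is left out because the CLI entry point is not shown in this plan.

```python
import tempfile
import unittest
from pathlib import Path


class FileIngestionError(Exception):
    """Hypothetical app-level error; stands in for the project's real one."""


def read_text_file(path: Path) -> str:
    """Compact stand-in for the loader under test."""
    if not path.is_file():
        raise FileIngestionError(f"Missing or not a regular file: {path}")
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        raise FileIngestionError(f"File is not valid UTF-8: {path}") from exc


class ReadTextFileTests(unittest.TestCase):
    def setUp(self) -> None:
        # Each test gets a fresh temporary directory that is cleaned up afterwards.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        self.root = Path(self._tmp.name)

    def test_loads_utf8_text(self) -> None:
        target = self.root / "notes.txt"
        target.write_text("café", encoding="utf-8")
        self.assertEqual(read_text_file(target), "café")

    def test_missing_file(self) -> None:
        with self.assertRaises(FileIngestionError):
            read_text_file(self.root / "missing.txt")

    def test_directory_path(self) -> None:
        with self.assertRaises(FileIngestionError):
            read_text_file(self.root)

    def test_invalid_utf8(self) -> None:
        target = self.root / "binary.txt"
        # 0xFF can never start a valid UTF-8 sequence.
        target.write_bytes(b"\xff\xfe\x00")
        with self.assertRaises(FileIngestionError):
            read_text_file(target)
```

Saved as a test module, these cases can be run with python -m unittest.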
Assumptions
- Only plain text ingestion is supported in this plan.
- Existing metadata can continue storing the original path string as file_name unless a later plan changes metadata shape.