diff --git a/README.md b/README.md
index e69de29..0f7561c 100644
--- a/README.md
+++ b/README.md
@@ -0,0 +1,106 @@
+# chroma
+
+A small command-line utility for working with a local Chroma database. It lets you create collections, ingest file contents as chunked embeddings, and run similarity queries against stored documents.
+
+## What it does
+
+- manages local Chroma collections
+- chunks files with `semchunk`
+- generates embeddings with Chroma's default embedding function
+- stores chunk text plus source file metadata
+- queries collections and prints readable results
+
+## Requirements
+
+- Python 3.12+
+- a local environment able to install the project dependencies in `pyproject.toml`
+
+## Installation
+
+Using `uv`:
+
+```bash
+uv sync
+```
+
+Or with pip:
+
+```bash
+python -m venv .venv
+source .venv/bin/activate
+pip install -e .
+```
+
+## Running the CLI
+
+The project entrypoint is `main.py`.
+
+```bash
+uv run python main.py --help
+```
+
+## Commands
+
+```text
+list-collections | lc
+create-collection | cc
+delete-collection | dc
+count | co
+add-data | ad
+query | q
+```
+
+### Examples
+
+Create a collection:
+
+```bash
+uv run python main.py create-collection notes
+```
+
+Add a file:
+
+```bash
+uv run python main.py add-data notes ./docs/example.txt
+```
+
+Count stored records:
+
+```bash
+uv run python main.py count notes
+```
+
+Search the collection:
+
+```bash
+uv run python main.py query notes "How do I configure this project?"
+```
+
+List collections:
+
+```bash
+uv run python main.py list-collections
+```
+
+Delete a collection:
+
+```bash
+uv run python main.py delete-collection notes
+```
+
+## How ingestion works
+
+When you run `add-data`, the file is:
+
+1. read from disk
+2. split into chunks
+3. embedded
+4. inserted into the target collection with the original file path stored as metadata
+
+Query results include the stored document chunk, its id, distance, and file name when available.
+
+## Notes
+
+- collections are stored in a local persistent Chroma database
+- `add-data` requires the target collection to already exist
+- the CLI prints friendly messages for common errors such as missing collections or missing files
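
Note on the "How ingestion works" section: the four steps listed there could be sketched roughly as below. This is not the code in `main.py`, which this diff does not show. The helper names `chunk_text` and `build_records`, the `"file"` metadata key, and the id scheme are all hypothetical, and the word-packing chunker is only a stand-in for `semchunk` so that the sketch runs with the standard library alone.

```python
import hashlib
from pathlib import Path


def chunk_text(text: str, chunk_size: int = 200) -> list[str]:
    """Naive stand-in for semchunk (assumption): greedily pack words into
    chunks of roughly chunk_size characters; a single longer word is kept whole."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > chunk_size and current:
            chunks.append(current)  # current chunk is full, start a new one
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_records(path: str, chunk_size: int = 200) -> dict:
    """Read a file, split it into chunks, and build parallel lists of ids,
    documents, and metadatas (hypothetical helper, not the project's API)."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"No such file: {path}")

    text = file_path.read_text(encoding="utf-8")  # step 1: read from disk
    documents = chunk_text(text, chunk_size)      # step 2: split into chunks

    # Hypothetical id scheme: short hash of the path plus the chunk index.
    digest = hashlib.sha256(str(file_path).encode()).hexdigest()[:12]
    ids = [f"{digest}-{i}" for i in range(len(documents))]

    # The original file path travels with every chunk as metadata.
    metadatas = [{"file": str(file_path)} for _ in documents]
    return {"ids": ids, "documents": documents, "metadatas": metadatas}
```

With `chromadb` installed, steps 3 and 4 would then come down to something like `chromadb.PersistentClient(path=...).get_collection("notes").add(**build_records("./docs/example.txt"))`. When `add` receives documents without embeddings, Chroma embeds them with the collection's embedding function, and `get_collection` fails if the collection is missing, which fits the note that `add-data` requires an existing collection.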