107 lines
2.0 KiB
Markdown
107 lines
2.0 KiB
Markdown
# chroma
|
|
|
|
A small command-line utility for working with a local Chroma database. It lets you create collections, ingest file contents as chunked embeddings, and run similarity queries against stored documents.
|
|
|
|
## What it does
|
|
|
|
- manages local Chroma collections
|
|
- chunks files with `semchunk`
|
|
- generates embeddings with Chroma's default embedding function
|
|
- stores chunk text plus source file metadata
|
|
- queries collections and prints readable results
|
|
|
|
## Requirements
|
|
|
|
- Python 3.12+
|
|
- a local environment able to install the project dependencies in `pyproject.toml`
|
|
|
|
## Installation
|
|
|
|
Using `uv`:
|
|
|
|
```bash
|
|
uv sync
|
|
```
|
|
|
|
Or with pip:
|
|
|
|
```bash
|
|
python -m venv .venv
|
|
source .venv/bin/activate
|
|
pip install -e .
|
|
```
|
|
|
|
## Running the CLI
|
|
|
|
The project entrypoint is `main.py`.
|
|
|
|
```bash
|
|
uv run python main.py --help
|
|
```
|
|
|
|
## Commands
|
|
|
|
```text
|
|
list-collections | lc
|
|
create-collection | cc <collection>
|
|
delete-collection | dc <collection>
|
|
count | co <collection>
|
|
add-data | ad <collection> <file>
|
|
query | q <collection> <query_text>
|
|
```
|
|
|
|
### Examples
|
|
|
|
Create a collection:
|
|
|
|
```bash
|
|
uv run python main.py create-collection notes
|
|
```
|
|
|
|
Add a file:
|
|
|
|
```bash
|
|
uv run python main.py add-data notes ./docs/example.txt
|
|
```
|
|
|
|
Count stored records:
|
|
|
|
```bash
|
|
uv run python main.py count notes
|
|
```
|
|
|
|
Search the collection:
|
|
|
|
```bash
|
|
uv run python main.py query notes "How do I configure this project?"
|
|
```
|
|
|
|
List collections:
|
|
|
|
```bash
|
|
uv run python main.py list-collections
|
|
```
|
|
|
|
Delete a collection:
|
|
|
|
```bash
|
|
uv run python main.py delete-collection notes
|
|
```
|
|
|
|
## How ingestion works
|
|
|
|
When you run `add-data`, the file is:
|
|
|
|
1. read from disk
|
|
2. split into chunks
|
|
3. embedded
|
|
4. inserted into the target collection with the original file path stored as metadata
|
|
|
|
Query results include the stored document chunk, its id, distance, and file name when available.
|
|
|
|
## Notes
|
|
|
|
- collections are stored in a local persistent Chroma database
|
|
- `add-data` requires the target collection to already exist
|
|
- the CLI prints friendly messages for common errors such as missing collections or missing files
|