# Chromy
<div align="center">
<img src="logo.png" width=300 />
</div>

Chromy is a small, easy-to-use command-line utility for working with a local Chroma database. It lets you create collections, ingest files as chunked embeddings, and run similarity queries against stored documents. It integrates with agentic coding tools through simple skills (see the [example](./skills/chromy/SKILL.md) in the `skills` directory).
## What it does
- manages local Chroma collections
- chunks files with `semchunk`
- generates embeddings with Chroma's default embedding function
- stores chunk text plus source file metadata
- queries collections and prints readable results
## Requirements
- Python 3.12+
- a local environment able to install the project dependencies in `pyproject.toml`
## Libraries
### Runtime libraries
- `chromadb` — persistent vector database used to store collections, embeddings, documents, and metadata.
- `openai` — dependency used by the embedding stack for model and API integrations.
- `pymupdf4llm` — extracts text from PDF documents for ingestion.
- `python-dotenv` — loads environment variables from local `.env` files.
- `rich` — provides styled terminal output and progress bars.
- `semchunk` — splits source documents into chunks before embedding.
- `tiktoken` — tokenization support used during chunking and embedding preparation.
- `transformers` — model and tokenizer support used by the embedding pipeline.
- `typer` — powers the CLI commands and argument parsing.
### Development libraries
- `mypy` — static type checking.
- `nuitka[onefile]` — builds standalone one-file executables.
- `pytest` — test runner for the project.
- `ruff` — linting and formatting.
## Installation
For local development, install the project dependencies with `uv`:
```bash
uv sync
```
Or with pip:
```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```
## Build
Build the source distribution and wheel with `uv`:
```bash
uv build
```
The build artifacts are written to `dist/`.
## Install as a tool with uv
The project exposes a `chromy` command through the Python packaging entrypoint.
Install it as a standalone `uv` tool from the project directory:
```bash
uv tool install .
```
After installation, run the CLI directly:
```bash
chromy --help
```
To install from a built wheel instead (adjust the version in the file name to match the wheel in `dist/`):
```bash
uv build
uv tool install dist/chromy-1.0.0-py3-none-any.whl
```
During development, install the tool in editable mode so changes in the working
tree are picked up without reinstalling:
```bash
uv tool install --editable .
```
## Running the CLI
The project entrypoint is available as the `chromy` command after installing the
tool:
```bash
chromy --help
```
You can also run it from the source tree without installing the tool:
2026-04-21 20:13:28 +02:00
```bash
2026-04-22 15:47:46 +02:00
uv run python -m chromy.main --help
2026-04-21 20:13:28 +02:00
```
## Chroma storage location
By default, Chromy follows Chroma's default persistence behavior and stores data in a local `chroma/` directory under the current working directory at the time you run the command. You can override this location with the `CHROMA_FOLDER` environment variable:
- `CHROMA_FOLDER` must point to a **parent directory**.
- Chromy will store data in `<CHROMA_FOLDER>/chroma`.
- Relative paths are supported and are resolved from the current working directory.
- If `CHROMA_FOLDER` is set, it takes precedence over the default behavior.
- If the configured location is invalid or not writable, the command fails with an explicit error (no fallback to the default location).

Set the variable once in your shell profile (for example `~/.zprofile` or `~/.profile`) so that every session uses the same storage location.

Examples:
```bash
# absolute parent path
CHROMA_FOLDER=/tmp/chromy-data chromy list-collections
# relative parent path (resolved from current directory)
CHROMA_FOLDER=.local-data chromy create-collection notes
```
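The resolution rules above can be summarized in a few lines of shell. The function below, `chromy_storage_dir`, is hypothetical: it is a sketch written from the rules in this section, not Chromy's actual code, and it only computes the path. It does not normalize `.` or `..` segments and does not check that the location exists or is writable, which Chromy does before failing with an explicit error.

```bash
# Hypothetical helper (not part of Chromy): prints the directory Chromy would
# use, following the documented rules. It performs no filesystem checks.
chromy_storage_dir() {
  local parent
  if [ -n "${CHROMA_FOLDER:-}" ]; then
    # CHROMA_FOLDER names the *parent* directory; relative paths are
    # resolved against the current working directory.
    case "$CHROMA_FOLDER" in
      /*) parent="$CHROMA_FOLDER" ;;
      *) parent="$PWD/$CHROMA_FOLDER" ;;
    esac
  else
    # Default: chroma/ directly under the current working directory.
    parent="$PWD"
  fi
  # Drop one trailing slash so "/data/" and "/data" give the same result.
  printf '%s/chroma\n' "${parent%/}"
}
```

For example, running `CHROMA_FOLDER=.local-data chromy_storage_dir` from `/home/me/project` prints `/home/me/project/.local-data/chroma`, the directory the relative-path example above would use. Because the sketch reads `CHROMA_FOLDER` from the environment, an `export` line in your profile (the path is up to you) changes its result in every new shell, which is the consistency the profile tip above is aiming for.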
## Running Tests
Run the test suite with pytest:
```bash
uv run pytest -q
```
## Development Checks
Run Ruff linting:
```bash
uv run ruff check .
```
Check Ruff formatting:
```bash
uv run ruff format --check .
```
Run static type checking with mypy:
```bash
uv run mypy .
```
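If you run these checks often, a small shell function can chain them with the test suite and stop at the first failure. `run_checks` is a hypothetical convenience, not something the project ships; it only strings together the `uv run` commands shown in this README.

```bash
# Hypothetical convenience function (not part of the project): runs the lint,
# format, and type checks above plus the test suite, stopping at the first failure.
run_checks() {
  uv run ruff check . &&
    uv run ruff format --check . &&
    uv run mypy . &&
    uv run pytest -q
}

# Usage, from the project root:
#   run_checks && echo "all checks passed"
```

Using `&&` rather than running the commands unconditionally keeps the output short when an early check fails; drop the chaining if you prefer to see every failure in one run.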
## Commands
```text
list-collections | lc
create-collection <collection> | cc <collection>
delete-collection <collection> | dc <collection>
count <collection> | c <collection>
import <collection> <file> [<file> ...] | i <collection> <file> [<file> ...]
query <collection> <query_text> | q <collection> <query_text>
delete <collection> --where <condition>=<value> | del <collection> --where <condition>=<value>
```
### Aliases
- `lc` → `list-collections`
- `cc` → `create-collection`
- `dc` → `delete-collection`
- `c` → `count`
- `i` → `import`
- `q` → `query`
- `del` → `delete`
### Examples
Create a collection:
```bash
chromy create-collection notes
# alias
chromy cc notes
```
Add one or more files:
```bash
chromy import notes ./docs/example.txt
chromy import notes ./docs/intro.md ./docs/setup.md
chromy import notes *.md
# alias
chromy i notes ./docs/example.txt
```
Import a large batch of files with `find`:
```bash
find ./docs -type f \( -name '*.md' -o -name '*.txt' \) -exec chromy import notes {} +
```
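Because `-exec ... {} +` hands many paths to each `chromy import` invocation, it can be worth checking the file list first. The sketch below wraps the same `find` expression in a hypothetical helper, `preview_import_files`, that prints the matching paths instead of importing them; `./docs` is only the default, mirroring the example above. Note that `find` may still split a very large list across several `chromy import` runs to stay under the system's argument-length limit.

```bash
# Hypothetical helper (not part of Chromy): list the files that the
# find/-exec command above would import, without touching any collection.
# The directory defaults to ./docs, as in the example.
preview_import_files() {
  find "${1:-./docs}" -type f \( -name '*.md' -o -name '*.txt' \) -print
}

# Usage, from the directory that contains docs/:
#   preview_import_files ./docs          # list the matching paths
#   preview_import_files ./docs | wc -l  # count them
```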
Count stored records:
```bash
chromy count notes
# alias
chromy c notes
```
Search the collection:
```bash
chromy query notes "How do I configure this project?"
# alias
chromy q notes "How do I configure this project?"
```
List collections:
```bash
chromy list-collections
# alias
chromy lc
```
Delete a collection:
```bash
chromy delete-collection notes
# alias
chromy dc notes
```
Delete records by metadata:
```bash
chromy delete notes --where file_name=example.txt
# alias
chromy del notes --where file_name=example.txt
```
## How ingestion works
When you run `import`, each file is:

1. read from disk (for PDF documents, text is extracted with `pymupdf4llm`)
2. split into chunks with `semchunk`
3. embedded with Chroma's default embedding function
4. inserted into the target collection, with the original file path stored as metadata

Query results include each stored document chunk together with its id, its distance from the query, and the file name when available.
## Notes
- by default, collections are stored in a persistent Chroma database in a `chroma/` directory under the current working directory
- set `CHROMA_FOLDER` to override the parent location; Chromy will use `<CHROMA_FOLDER>/chroma`
- `import` requires the target collection to already exist
- `import` accepts one or more file paths
- unquoted glob patterns such as `*.md` are expanded by the shell before `chromy` starts
- quoted glob patterns such as `"*.md"` are treated as literal paths and are not expanded by `chromy`
- unmatched unquoted globs behave differently depending on the shell: by default, `zsh` fails with a "no matches found" error before `chromy` starts, while `bash` passes the literal pattern through unless the `nullglob` or `failglob` option is enabled
- the CLI reports file-specific import failures and continues with the remaining files
- when importing multiple files in an interactive terminal, the CLI shows a Rich progress bar
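The glob rules above can be checked without a Chroma database. The sketch below assumes `bash` or a POSIX `sh` with default options and uses a hypothetical function, `count_args`, as a stand-in for `chromy`, so you can see how many arguments the shell actually passes. In bash, enabling `nullglob` (unmatched patterns expand to nothing) or `failglob` (unmatched patterns are an error) changes the last case.

```bash
# Hypothetical stand-in for chromy: reports how many arguments it received.
count_args() { printf '%s\n' "$#"; }

demo_dir=$(mktemp -d)
touch "$demo_dir/intro.md" "$demo_dir/setup.md"

# Unquoted: the shell expands the pattern before the command runs (2 args).
matched=$(count_args "$demo_dir"/*.md)
# Quoted: the pattern reaches the command as one literal argument (1 arg).
quoted=$(count_args "$demo_dir/*.md")
# Unmatched and unquoted: bash and sh pass the literal pattern by default (1 arg);
# zsh would stop here with "no matches found".
unmatched=$(count_args "$demo_dir"/*.rst)

# Prints: matched=2 quoted=1 unmatched=1
printf 'matched=%s quoted=%s unmatched=%s\n' "$matched" "$quoted" "$unmatched"
rm -rf "$demo_dir"
```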