Compare commits

4 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| mrosati | b45a92e2e2 | validate AGENTS.md against current implementation (build / build (push): Successful in 9s; pytest / pytest (push): Failing after 24s) | 2026-05-10 17:28:31 +02:00 |
| mrosati | 5e59e3dcdc | update README | 2026-05-10 17:26:43 +02:00 |
| mrosati | 87ecaec3f3 | add command aliases (build / build (push): Successful in 9s; pytest / pytest (push): Failing after 23s) | 2026-05-10 16:56:50 +02:00 |
| mrosati | 150cc57932 | running without arguments prints the help | 2026-05-10 16:52:31 +02:00 |
4 changed files with 224 additions and 23 deletions
+6 -7
@@ -2,9 +2,9 @@
## Project Structure & Module Organization
`chromy/` contains the Python package and CLI implementation. The entrypoint is `chromy/main.py`, which loads environment variables and invokes the Typer app defined in `chromy/cli.py`. The active CLI commands are `list-collections`, `create-collection`, `delete-collection`, `count`, `import`, `query`, and `delete`. Command-specific behavior belongs in `chromy/handlers/`. Shared Chroma, embedding, chunking, querying, and output helpers live in package modules such as `chroma_functions.py`, `embed.py`, `chunk_functions.py`, and `utilities.py`.
`chromy/` contains the Python package and CLI implementation. The entrypoint is `chromy/main.py`, which loads environment variables and invokes the Typer app defined in `chromy/cli.py`. The active CLI commands are `list-collections`, `create-collection`, `delete-collection`, `count`, `import`, `query`, and `delete`, with short aliases defined in `chromy/cli.py`. Command-specific behavior lives in `chromy/handlers/`. Shared Chroma access and collection operations live in `chromy/chroma_functions.py`; chunking and embedding are implemented in `chromy/chunking/service.py` and `chromy/embedding/service.py`; query/import helpers are in `chromy/utilities.py`; formatted terminal output helpers are in `chromy/output.py`; shared exceptions are in `chromy/errors.py`.
`tests/` contains the test suite for the CLI, handlers, and embedding helpers. `README.md` documents user-facing behavior, `pyproject.toml` defines packaging and tool configuration, and `romeo_and_juliet.txt` is a checked-in sample input used by tests and manual CLI runs. Treat generated or local-state directories such as `chroma/`, `dist/`, `chromy.egg-info/`, `.pytest_cache/`, `.mypy_cache/`, `.ruff_cache/`, `.venv/`, `__pycache__/`, and `main.onefile-build/` as non-source. The top-level `handlers/` directory currently contains only legacy bytecode artifacts and should not be treated as source.
`tests/` contains the test suite for the CLI, handlers, Chroma helpers, embedding, and utilities. `README.md` documents user-facing behavior, `pyproject.toml` defines packaging and tool configuration, and `romeo_and_juliet.txt` is a checked-in sample input used by tests and manual CLI runs. Treat generated or local-state directories such as `build/`, `chroma/`, `dist/`, `chromy.egg-info/`, `.pytest_cache/`, `.mypy_cache/`, `.ruff_cache/`, `.venv/`, `__pycache__/`, and `main.onefile-build/` as non-source. The top-level `handlers/` directory currently contains only legacy bytecode artifacts and should not be treated as source.
## Build, Test, and Development Commands
@@ -14,19 +14,18 @@
- `uv run pytest -q`: run the test suite.
- `uv run ruff check .`: run lint checks.
- `uv run ruff format --check .`: verify formatting.
- `uv run mypy .`: run static type checks.
- `uv run mypy chromy tests`: run static type checks for source and tests.
- `uv build`: build the source distribution and wheel into `dist/`.
- `uv tool install --editable .`: install the `chromy` command in editable mode for local CLI testing.
## Coding Style & Naming Conventions
Use Python 3.12+ syntax, type hints, and `from __future__ import annotations`. Follow the current style: 4-space indentation, snake_case functions and modules, PascalCase classes, and Typer command functions in `chromy/cli.py` that delegate to small handler functions. Keep handlers focused on CLI orchestration and user-facing output; place reusable database, chunking, embedding, query, and formatting logic in shared modules. Prefer `rich` output for user-facing CLI messages to stay consistent with the existing commands.
Docstrings follow Google coding style conventions and are always added to any function.
Use Python 3.12+ syntax, type hints, and `from __future__ import annotations`. Follow the current style: 4-space indentation, snake_case functions and modules, PascalCase classes, and Typer command functions in `chromy/cli.py` that delegate to small handler functions. Keep handlers focused on CLI orchestration and user-facing output; place reusable database, chunking, embedding, query, and formatting logic in shared modules. Prefer `rich` output for user-facing CLI messages to stay consistent with the existing commands. Add concise Google-style docstrings when introducing new public functions or when a function's behavior is not obvious; match the surrounding file style for small private helpers.
## Testing Guidelines
Tests run with pytest and are currently written in `unittest.TestCase` style. Name test files `test_*.py` and test methods `test_*`. Prefer mocking Chroma-facing and filesystem-facing functions in CLI and handler tests so unit tests stay deterministic. Run `uv run pytest -q` before submitting changes, and use `uv run ruff check .` plus `uv run mypy .` when touching typed code or shared modules. Add tests for new commands, Typer wiring, handlers, and error paths.
Tests run with pytest and are currently written in `unittest.TestCase` style. Name test files `test_*.py` and test methods `test_*`. Prefer mocking Chroma-facing and filesystem-facing functions in CLI and handler tests so unit tests stay deterministic. Run `uv run pytest -q` before submitting changes, and use `uv run ruff check .` plus `uv run mypy chromy tests` when touching typed code or shared modules. Add tests for new commands, Typer wiring, handlers, and error paths.
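The mocking guideline above can be illustrated with a minimal, self-contained sketch. `Store`, `count_collection`, and `handle_count` are hypothetical stand-ins defined inside the snippet, not Chromy's real modules; only the pattern (patch the Chroma-facing call, then assert on the call and on the user-facing message) is the point.

```python
from __future__ import annotations

import unittest
from unittest.mock import patch


class Store:
    """Hypothetical stand-in for a Chroma-facing helper module."""

    @staticmethod
    def count_collection(name: str) -> int:
        # A real helper would open the persistent database here.
        raise RuntimeError("database access is not available in unit tests")


def handle_count(name: str) -> str:
    """Hypothetical handler: builds the message shown to the user."""
    return f"The '{name}' collection contains {Store.count_collection(name)} records."


class HandleCountTests(unittest.TestCase):
    def test_handle_count_uses_mocked_backend(self) -> None:
        # Replace the database call so the test stays deterministic.
        with patch.object(Store, "count_collection", return_value=7) as mocked:
            message = handle_count("notes")

        mocked.assert_called_once_with("notes")
        self.assertEqual(message, "The 'notes' collection contains 7 records.")
```

Saved as a `test_*.py` file, a test like this would be collected by `uv run pytest -q` in the same way as the existing `unittest.TestCase` tests.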
## Security & Configuration Tips
The CLI loads environment variables via `python-dotenv`; keep secrets in local `.env` files and do not commit them. Treat `chroma/` as local persistent database state created by `chromadb.PersistentClient()`. Avoid committing generated build artifacts, cache directories, onefile build outputs, or large ad hoc input files unless they are intentional fixtures. If you change command names or examples, update both `README.md` and the tests so the documented CLI stays aligned with the implementation.
The CLI loads environment variables via `python-dotenv`; keep secrets in local `.env` files and do not commit them. Treat `chroma/` as local persistent database state created by `chromadb.PersistentClient()`. The optional `CHROMA_FOLDER` environment variable must point to a parent directory; Chromy stores data in `<CHROMA_FOLDER>/chroma` and fails explicitly if that location is invalid or not writable. Avoid committing generated build artifacts, cache directories, onefile build outputs, or large ad hoc input files unless they are intentional fixtures. If you change command names, aliases, or examples, update both `README.md` and the tests so the documented CLI stays aligned with the implementation.
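The `CHROMA_FOLDER` rule described above can be sketched as follows. This is an illustration under assumptions, not Chromy's actual code: `resolve_chroma_path` and `StoragePathError` are hypothetical names, treating an empty value as unset is an assumption, and so is requiring the parent directory to already exist.

```python
from __future__ import annotations

import os
from pathlib import Path


class StoragePathError(Exception):
    """Hypothetical error type for an unusable storage location."""


def resolve_chroma_path(env: dict[str, str]) -> Path | None:
    """Return <CHROMA_FOLDER>/chroma, or None when the variable is unset."""
    parent = env.get("CHROMA_FOLDER")
    if not parent:
        # Assumption: unset or empty means "use the default location".
        return None

    parent_path = Path(parent).expanduser()
    if not parent_path.is_dir():
        # Fail explicitly instead of silently falling back to the default.
        raise StoragePathError(f"CHROMA_FOLDER is not a directory: {parent_path}")
    if not os.access(parent_path, os.W_OK):
        raise StoragePathError(f"CHROMA_FOLDER is not writable: {parent_path}")

    # The variable names the parent; the data lives in its "chroma" child.
    return parent_path / "chroma"
```

Passing the environment in as a plain mapping (for example `dict(os.environ)`) keeps the function easy to unit test without touching the real process environment.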
+60 -7
@@ -19,6 +19,27 @@ Chromy is small and simple to use command-line utility for working with a local
- Python 3.12+
- a local environment able to install the project dependencies in `pyproject.toml`
## Libraries
### Runtime libraries
- `chromadb` — persistent vector database used to store collections, embeddings, documents, and metadata.
- `openai` — dependency used by the embedding stack for model and API integrations.
- `pymupdf4llm` — extracts text from PDF documents for ingestion.
- `python-dotenv` — loads environment variables from local `.env` files.
- `rich` — provides styled terminal output and progress bars.
- `semchunk` — splits source documents into chunks before embedding.
- `tiktoken` — tokenization support used during chunking and embedding preparation.
- `transformers` — model and tokenizer support used by the embedding pipeline.
- `typer` — powers the CLI commands and argument parsing.
### Development libraries
- `mypy` — static type checking.
- `nuitka[onefile]` — builds standalone one-file executables.
- `pytest` — test runner for the project.
- `ruff` — linting and formatting.
## Installation
For local development, install the project dependencies with `uv`:
@@ -103,6 +124,8 @@ You can override this with `CHROMA_FOLDER`.
- If `CHROMA_FOLDER` is set, it takes precedence over the default behavior.
- If the configured location is invalid or not writable, the command fails with an explicit error (no fallback to the default location).
Setting the variable once in `.zprofile` or `.profile` keeps the storage location consistent across shell sessions.
Examples:
```bash
@@ -144,21 +167,33 @@ uv run mypy .
## Commands
```text
list-collections
create-collection <collection>
delete-collection <collection>
count <collection>
import <collection> <file> [<file> ...]
query <collection> <query_text>
delete <collection> --where <condition>=<value>
list-collections | lc
create-collection <collection> | cc <collection>
delete-collection <collection> | dc <collection>
count <collection> | c <collection>
import <collection> <file> [<file> ...] | i <collection> <file> [<file> ...]
query <collection> <query_text> | q <collection> <query_text>
delete <collection> --where <condition>=<value> | del <collection> --where <condition>=<value>
```
### Aliases
- `lc` → `list-collections`
- `cc` → `create-collection`
- `dc` → `delete-collection`
- `c` → `count`
- `i` → `import`
- `q` → `query`
- `del` → `delete`
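The alias list can also be read as a simple lookup table. The sketch below is only an illustration of that mapping: `ALIASES` and `canonical_command` are hypothetical names, and Chromy itself defines its aliases in `chromy/cli.py` rather than through a table like this.

```python
from __future__ import annotations

# Alias -> full command name, copied from the list above.
ALIASES: dict[str, str] = {
    "lc": "list-collections",
    "cc": "create-collection",
    "dc": "delete-collection",
    "c": "count",
    "i": "import",
    "q": "query",
    "del": "delete",
}


def canonical_command(name: str) -> str:
    """Return the full command name for an alias; pass other names through."""
    return ALIASES.get(name, name)
```

A table of this kind is handy in tests or documentation checks, for example to confirm that every alias maps to a distinct command.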
### Examples
Create a collection:
```bash
chromy create-collection notes
# alias
chromy cc notes
```
Add one or more files:
@@ -167,36 +202,54 @@ Add one or more files:
chromy import notes ./docs/example.txt
chromy import notes ./docs/intro.md ./docs/setup.md
chromy import notes *.md
# alias
chromy i notes ./docs/example.txt
```
Import a large batch of files with `find`:
```bash
find ./docs -type f \( -name '*.md' -o -name '*.txt' \) -exec chromy import notes {} +
```
Count stored records:
```bash
chromy count notes
# alias
chromy c notes
```
Search the collection:
```bash
chromy query notes "How do I configure this project?"
# alias
chromy q notes "How do I configure this project?"
```
List collections:
```bash
chromy list-collections
# alias
chromy lc
```
Delete a collection:
```bash
chromy delete-collection notes
# alias
chromy dc notes
```
Delete records by metadata:
```bash
chromy delete notes --where file_name=example.txt
# alias
chromy del notes --where file_name=example.txt
```
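The `--where <condition>=<value>` argument corresponds to a one-entry metadata filter such as `{"file_name": "example.txt"}`. A minimal sketch of that conversion is shown here; `parse_where` is a hypothetical helper, and splitting on the first `=` is an assumption about how values containing `=` should be handled.

```python
from __future__ import annotations


def parse_where(expression: str) -> dict[str, str]:
    """Turn 'key=value' into {'key': 'value'}, splitting on the first '='."""
    key, separator, value = expression.partition("=")
    if not separator or not key or not value:
        # Reject input with no '=', an empty key, or an empty value.
        raise ValueError(f"expected <condition>=<value>, got {expression!r}")
    return {key: value}
```

For example, `parse_where("file_name=example.txt")` yields the filter dictionary for the command above, while `parse_where("file_name")` raises `ValueError`.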
## How ingestion works
+37 -9
@@ -24,8 +24,17 @@ app = typer.Typer(
"Storage location:\n"
"- By default, Chromy uses Chroma's default persistent location behavior.\n"
f"- Set {CHROMA_FOLDER_ENV_VAR} to a parent directory to override it.\n"
f"- Chromy stores data in <{CHROMA_FOLDER_ENV_VAR}>/chroma."
)
f"- Chromy stores data in <{CHROMA_FOLDER_ENV_VAR}>/chroma.\n\n"
"Command aliases:\n"
"- list-collections: lc\n"
"- create-collection: cc\n"
"- delete-collection: dc\n"
"- count: c\n"
"- import: i\n"
"- query: q\n"
"- delete: del"
),
invoke_without_command=True,
)
ExitCodeHandler = Callable[[], int]
@@ -46,12 +55,22 @@ def _fail(message: str) -> None:
    raise typer.Exit(1)


@app.callback()
def main(ctx: typer.Context) -> None:
    """Run the CLI and show help when no command is provided."""
    if ctx.invoked_subcommand is None:
        print("[bold red]Error[/]: Missing command.")
        typer.echo(ctx.get_help())
        raise typer.Exit(1)
# ------------------------------------------------------------------------------
# LIST COLLECTIONS
# ------------------------------------------------------------------------------
@app.command("lc", help="Alias for list-collections.")
@app.command(
    "list-collections",
    help="List all collections stored in the local Chroma database.",
    help="List all collections stored in the local Chroma database. Alias: lc.",
)
def list_collections() -> None:
    _run(handle_list_collections)
@@ -60,9 +79,10 @@ def list_collections() -> None:
# ------------------------------------------------------------------------------
# CREATE A COLLECTION
# ------------------------------------------------------------------------------
@app.command("cc", help="Alias for create-collection.")
@app.command(
    "create-collection",
    help="Create a collection in the local Chroma database.",
    help="Create a collection in the local Chroma database. Alias: cc.",
)
def create_collection(
    collection: Annotated[
@@ -79,9 +99,10 @@ def create_collection(
# ------------------------------------------------------------------------------
# DELETE A COLLECTION
# ------------------------------------------------------------------------------
@app.command("dc", help="Alias for delete-collection.")
@app.command(
    "delete-collection",
    help="Delete a collection from the local Chroma database.",
    help="Delete a collection from the local Chroma database. Alias: dc.",
)
def delete_collection(
    collection: Annotated[
@@ -98,9 +119,10 @@ def delete_collection(
# ------------------------------------------------------------------------------
# COUNT RECORDS
# ------------------------------------------------------------------------------
@app.command("c", help="Alias for count.")
@app.command(
    "count",
    help="Count records in a collection from the local Chroma database.",
    help="Count records in a collection from the local Chroma database. Alias: c.",
)
def count(
    collection: Annotated[
@@ -117,11 +139,12 @@ def count(
# ------------------------------------------------------------------------------
# IMPORT DATA
# ------------------------------------------------------------------------------
@app.command("i", help="Alias for import.")
@app.command(
    "import",
    help=(
        "Chunk, embed, and add one or more files to a collection in the "
        "local Chroma database."
        "local Chroma database. Alias: i."
    ),
)
def import_data(
@@ -145,7 +168,8 @@ def import_data(
# ------------------------------------------------------------------------------
# QUERY
# ------------------------------------------------------------------------------
@app.command("query", help="Query a collection with the provided text.")
@app.command("q", help="Alias for query.")
@app.command("query", help="Query a collection with the provided text. Alias: q.")
def query(
    collection: Annotated[
        str,
@@ -165,7 +189,11 @@ def query(
# ------------------------------------------------------------------------------
# DELETE DATA
# ------------------------------------------------------------------------------
@app.command("delete", help="Delete records from a collection using a metadata filter.")
@app.command("del", help="Alias for delete.")
@app.command(
    "delete",
    help="Delete records from a collection using a metadata filter. Alias: del.",
)
def delete_records(
    collection: Annotated[
        str,
+121
@@ -95,6 +95,106 @@ class CliTests(unittest.TestCase):
"The 'notes' collection contains 7 records.\n",
)
    def test_command_aliases(self) -> None:
        with self.subTest(alias="lc"):
            with patch(
                "chromy.handlers.list_collections.list_collections",
                return_value=[],
            ) as mocked:
                result = _invoke(["lc"])
            mocked.assert_called_once_with()
            self.assertEqual(result.exit_code, 0)
            self.assertEqual(result.stdout, "No collections found.\n")

        with self.subTest(alias="cc"):
            with patch(
                "chromy.handlers.create_collection.create_collection",
                return_value="notes",
            ) as mocked:
                result = _invoke(["cc", "notes"])
            mocked.assert_called_once_with("notes")
            self.assertEqual(result.exit_code, 0)
            self.assertEqual(result.stdout, "Created: collection 'notes'.\n")

        with self.subTest(alias="dc"):
            with patch(
                "chromy.handlers.delete_collection.delete_collection",
            ) as mocked:
                result = _invoke(["dc", "notes"])
            mocked.assert_called_once_with("notes")
            self.assertEqual(result.exit_code, 0)
            self.assertEqual(result.stdout, "Deleted collection 'notes'.\n")

        with self.subTest(alias="c"):
            with patch(
                "chromy.handlers.count_collection.count_collection",
                return_value=7,
            ) as mocked:
                result = _invoke(["c", "notes"])
            mocked.assert_called_once_with("notes")
            self.assertEqual(result.exit_code, 0)
            self.assertEqual(
                result.stdout,
                "The 'notes' collection contains 7 records.\n",
            )

        with self.subTest(alias="i"):
            with patch(
                "chromy.handlers.import_data.ingest_file",
                return_value=3,
            ) as mocked:
                result = _invoke(["i", "notes", "romeo_and_juliet.txt"])
            mocked.assert_called_once_with(
                "notes",
                self._fixture_path("romeo_and_juliet.txt"),
            )
            self.assertEqual(result.exit_code, 0)
            self.assertEqual(
                result.stdout,
                "Added 3 records from 'romeo_and_juliet.txt' to collection 'notes'.\n"
                "Imported 1 file(s) successfully; 0 failed.\n",
            )

        with self.subTest(alias="q"):
            query_result = {"ids": [["1"]], "documents": [["hello"]]}
            with (
                patch(
                    "chromy.handlers.query.run_query",
                    return_value=query_result,
                ) as mocked,
                patch(
                    "chromy.handlers.query.format_query_result",
                    return_value=["1"],
                ),
            ):
                result = _invoke(["q", "notes", "Where is Romeo?"])
            mocked.assert_called_once_with("notes", "Where is Romeo?")
            self.assertEqual(result.exit_code, 0)
            self.assertEqual(result.stdout, "1\n")

        with self.subTest(alias="del"):
            with patch(
                "chromy.handlers.delete_collection.delete_data",
                return_value=2,
            ) as mocked:
                result = _invoke(
                    ["del", "notes", "--where", "file_name=play.txt"],
                )
            mocked.assert_called_once_with("notes", {"file_name": "play.txt"})
            self.assertEqual(result.exit_code, 0)
            self.assertEqual(
                result.stdout,
                "Deleted 2 record(s) from collection 'notes' where "
                "file_name=play.txt.\n",
            )

    def test_import_data(self) -> None:
        with patch(
            "chromy.handlers.import_data.ingest_file",
@@ -252,6 +352,14 @@ class CliTests(unittest.TestCase):
        self.assertNotEqual(result.exit_code, 0)
        self.assertIn("Missing option", result.output)

    def test_cli_without_arguments_prints_error_and_help(self) -> None:
        result = _invoke([])
        self.assertEqual(result.exit_code, 1)
        self.assertIn("Error: Missing command.", result.stdout)
        self.assertIn("Usage:", result.stdout)
        self.assertIn("Commands", result.stdout)

    def test_cli_help_documents_chroma_folder_env_var(self) -> None:
        result = _invoke(["--help"])
@@ -260,6 +368,19 @@ class CliTests(unittest.TestCase):
        self.assertIn("parent directory", result.stdout)
        self.assertIn("<CHROMA_FOLDER>/chroma", result.stdout)

    def test_cli_help_documents_command_aliases(self) -> None:
        result = _invoke(["--help"])
        self.assertEqual(result.exit_code, 0)
        self.assertIn("Command aliases", result.stdout)
        self.assertIn("list-collections: lc", result.stdout)
        self.assertIn("create-collection: cc", result.stdout)
        self.assertIn("delete-collection: dc", result.stdout)
        self.assertIn("count: c", result.stdout)
        self.assertIn("import: i", result.stdout)
        self.assertIn("query: q", result.stdout)
        self.assertIn("delete: del", result.stdout)

    def test_cli_surfaces_chroma_path_errors(self) -> None:
        with patch(
            "chromy.handlers.list_collections.list_collections",