Compare commits
2 Commits
87ecaec3f3
...
b45a92e2e2
| Author | SHA1 | Date |
|---|---|---|
| | b45a92e2e2 | |
| | 5e59e3dcdc | |
@@ -2,9 +2,9 @@
|
|||||||
|
|
||||||
## Project Structure & Module Organization
|
## Project Structure & Module Organization
|
||||||
|
|
||||||
`chromy/` contains the Python package and CLI implementation. The entrypoint is `chromy/main.py`, which loads environment variables and invokes the Typer app defined in `chromy/cli.py`. The active CLI commands are `list-collections`, `create-collection`, `delete-collection`, `count`, `import`, `query`, and `delete`. Command-specific behavior belongs in `chromy/handlers/`. Shared Chroma, embedding, chunking, querying, and output helpers live in package modules such as `chroma_functions.py`, `embed.py`, `chunk_functions.py`, and `utilities.py`.
|
`chromy/` contains the Python package and CLI implementation. The entrypoint is `chromy/main.py`, which loads environment variables and invokes the Typer app defined in `chromy/cli.py`. The active CLI commands are `list-collections`, `create-collection`, `delete-collection`, `count`, `import`, `query`, and `delete`, with short aliases defined in `chromy/cli.py`. Command-specific behavior lives in `chromy/handlers/`. Shared Chroma access and collection operations live in `chromy/chroma_functions.py`; chunking and embedding are implemented in `chromy/chunking/service.py` and `chromy/embedding/service.py`; query/import helpers are in `chromy/utilities.py`; formatted terminal output helpers are in `chromy/output.py`; shared exceptions are in `chromy/errors.py`.
|
||||||
|
|
||||||
`tests/` contains the test suite for the CLI, handlers, and embedding helpers. `README.md` documents user-facing behavior, `pyproject.toml` defines packaging and tool configuration, and `romeo_and_juliet.txt` is a checked-in sample input used by tests and manual CLI runs. Treat generated or local-state directories such as `chroma/`, `dist/`, `chromy.egg-info/`, `.pytest_cache/`, `.mypy_cache/`, `.ruff_cache/`, `.venv/`, `__pycache__/`, and `main.onefile-build/` as non-source. The top-level `handlers/` directory currently contains only legacy bytecode artifacts and should not be treated as source.
|
`tests/` contains the test suite for the CLI, handlers, Chroma helpers, embedding, and utilities. `README.md` documents user-facing behavior, `pyproject.toml` defines packaging and tool configuration, and `romeo_and_juliet.txt` is a checked-in sample input used by tests and manual CLI runs. Treat generated or local-state directories such as `build/`, `chroma/`, `dist/`, `chromy.egg-info/`, `.pytest_cache/`, `.mypy_cache/`, `.ruff_cache/`, `.venv/`, `__pycache__/`, and `main.onefile-build/` as non-source. The top-level `handlers/` directory currently contains only legacy bytecode artifacts and should not be treated as source.
|
||||||
|
|
||||||
## Build, Test, and Development Commands
|
## Build, Test, and Development Commands
|
||||||
|
|
||||||
@@ -14,19 +14,18 @@
|
|||||||
- `uv run pytest -q`: run the test suite.
|
- `uv run pytest -q`: run the test suite.
|
||||||
- `uv run ruff check .`: run lint checks.
|
- `uv run ruff check .`: run lint checks.
|
||||||
- `uv run ruff format --check .`: verify formatting.
|
- `uv run ruff format --check .`: verify formatting.
|
||||||
- `uv run mypy .`: run static type checks.
|
- `uv run mypy chromy tests`: run static type checks for source and tests.
|
||||||
- `uv build`: build the source distribution and wheel into `dist/`.
|
- `uv build`: build the source distribution and wheel into `dist/`.
|
||||||
- `uv tool install --editable .`: install the `chromy` command in editable mode for local CLI testing.
|
- `uv tool install --editable .`: install the `chromy` command in editable mode for local CLI testing.
|
||||||
|
|
||||||
## Coding Style & Naming Conventions
|
## Coding Style & Naming Conventions
|
||||||
|
|
||||||
Use Python 3.12+ syntax, type hints, and `from __future__ import annotations`. Follow the current style: 4-space indentation, snake_case functions and modules, PascalCase classes, and Typer command functions in `chromy/cli.py` that delegate to small handler functions. Keep handlers focused on CLI orchestration and user-facing output; place reusable database, chunking, embedding, query, and formatting logic in shared modules. Prefer `rich` output for user-facing CLI messages to stay consistent with the existing commands.
|
Use Python 3.12+ syntax, type hints, and `from __future__ import annotations`. Follow the current style: 4-space indentation, snake_case functions and modules, PascalCase classes, and Typer command functions in `chromy/cli.py` that delegate to small handler functions. Keep handlers focused on CLI orchestration and user-facing output; place reusable database, chunking, embedding, query, and formatting logic in shared modules. Prefer `rich` output for user-facing CLI messages to stay consistent with the existing commands. Add concise Google-style docstrings when introducing new public functions or when a function’s behavior is not obvious; match the surrounding file style for small private helpers.
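
As an illustration of the docstring guidance above, here is a minimal sketch of a concise Google-style docstring. The function `chunk_label` and its parameters are hypothetical and do not come from the Chromy code base; only the docstring layout (`Args`, `Returns`, `Raises`) is the point.

```python
from __future__ import annotations


def chunk_label(file_name: str, index: int, total: int) -> str:
    """Build a human-readable label for one chunk of an imported file.

    Hypothetical helper used only to show the docstring layout.

    Args:
        file_name: Name of the source file the chunk came from.
        index: Zero-based position of the chunk within the file.
        total: Total number of chunks produced for the file.

    Returns:
        A label such as ``"example.txt [1/3]"``.

    Raises:
        ValueError: If ``index`` is outside ``range(total)``.
    """
    if not 0 <= index < total:
        raise ValueError(f"index {index} is outside range(0, {total})")
    # Labels are one-based for display, while the index stays zero-based.
    return f"{file_name} [{index + 1}/{total}]"
```

A small private helper with an obvious name would, per the guideline, simply match the style of the file around it.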
|
||||||
Docstrings follow Google coding style conventions and are always added to any function.
|
|
||||||
|
|
||||||
## Testing Guidelines
|
## Testing Guidelines
|
||||||
|
|
||||||
Tests run with pytest and are currently written in `unittest.TestCase` style. Name test files `test_*.py` and test methods `test_*`. Prefer mocking Chroma-facing and filesystem-facing functions in CLI and handler tests so unit tests stay deterministic. Run `uv run pytest -q` before submitting changes, and use `uv run ruff check .` plus `uv run mypy .` when touching typed code or shared modules. Add tests for new commands, Typer wiring, handlers, and error paths.
|
Tests run with pytest and are currently written in `unittest.TestCase` style. Name test files `test_*.py` and test methods `test_*`. Prefer mocking Chroma-facing and filesystem-facing functions in CLI and handler tests so unit tests stay deterministic. Run `uv run pytest -q` before submitting changes, and use `uv run ruff check .` plus `uv run mypy chromy tests` when touching typed code or shared modules. Add tests for new commands, Typer wiring, handlers, and error paths.
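
The testing guidance above can be sketched roughly as follows. `handle_count` is a hypothetical stand-in for a handler, not a function from `chromy/handlers/`, and the injected `count_records` callable stands in for whatever Chroma-facing function a real handler would call. The sketch only shows the `unittest.TestCase` layout, the `test_*` naming, and mocking of the database-facing dependency, including an error path.

```python
from __future__ import annotations

import unittest
from collections.abc import Callable
from unittest import mock


def handle_count(collection: str, count_records: Callable[[str], int]) -> str:
    """Hypothetical handler stand-in: format the record count for display."""
    return f"{collection}: {count_records(collection)} records"


class TestHandleCount(unittest.TestCase):
    def test_reports_count_from_backend(self) -> None:
        # The Chroma-facing call is replaced by a mock, so no database is touched.
        fake_count = mock.Mock(return_value=42)
        self.assertEqual(handle_count("notes", fake_count), "notes: 42 records")
        fake_count.assert_called_once_with("notes")

    def test_propagates_backend_errors(self) -> None:
        # Error paths are covered by making the mocked dependency raise.
        fake_count = mock.Mock(side_effect=ValueError("no such collection"))
        with self.assertRaises(ValueError):
            handle_count("missing", fake_count)
```

Saved as a `test_*.py` file under `tests/`, a class like this is collected by `uv run pytest -q` without extra configuration.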
|
||||||
|
|
||||||
## Security & Configuration Tips
|
## Security & Configuration Tips
|
||||||
|
|
||||||
The CLI loads environment variables via `python-dotenv`; keep secrets in local `.env` files and do not commit them. Treat `chroma/` as local persistent database state created by `chromadb.PersistentClient()`. Avoid committing generated build artifacts, cache directories, onefile build outputs, or large ad hoc input files unless they are intentional fixtures. If you change command names or examples, update both `README.md` and the tests so the documented CLI stays aligned with the implementation.
|
The CLI loads environment variables via `python-dotenv`; keep secrets in local `.env` files and do not commit them. Treat `chroma/` as local persistent database state created by `chromadb.PersistentClient()`. The optional `CHROMA_FOLDER` environment variable must point to a parent directory; Chromy stores data in `<CHROMA_FOLDER>/chroma` and fails explicitly if that location is invalid or not writable. Avoid committing generated build artifacts, cache directories, onefile build outputs, or large ad hoc input files unless they are intentional fixtures. If you change command names, aliases, or examples, update both `README.md` and the tests so the documented CLI stays aligned with the implementation.
|
||||||
|
|||||||
@@ -19,6 +19,27 @@ Chromy is small and simple to use command-line utility for working with a local
|
|||||||
- Python 3.12+
|
- Python 3.12+
|
||||||
- a local environment able to install the project dependencies in `pyproject.toml`
|
- a local environment able to install the project dependencies in `pyproject.toml`
|
||||||
|
|
||||||
|
## Libraries
|
||||||
|
|
||||||
|
### Runtime libraries
|
||||||
|
|
||||||
|
- `chromadb` — persistent vector database used to store collections, embeddings, documents, and metadata.
|
||||||
|
- `openai` — dependency used by the embedding stack for model and API integrations.
|
||||||
|
- `pymupdf4llm` — extracts text from PDF documents for ingestion.
|
||||||
|
- `python-dotenv` — loads environment variables from local `.env` files.
|
||||||
|
- `rich` — provides styled terminal output and progress bars.
|
||||||
|
- `semchunk` — splits source documents into chunks before embedding.
|
||||||
|
- `tiktoken` — tokenization support used during chunking and embedding preparation.
|
||||||
|
- `transformers` — model and tokenizer support used by the embedding pipeline.
|
||||||
|
- `typer` — powers the CLI commands and argument parsing.
|
||||||
|
|
||||||
|
### Development libraries
|
||||||
|
|
||||||
|
- `mypy` — static type checking.
|
||||||
|
- `nuitka[onefile]` — builds standalone one-file executables.
|
||||||
|
- `pytest` — test runner for the project.
|
||||||
|
- `ruff` — linting and formatting.
|
||||||
|
|
||||||
## Installation
|
## Installation
|
||||||
|
|
||||||
For local development, install the project dependencies with `uv`:
|
For local development, install the project dependencies with `uv`:
|
||||||
@@ -103,6 +124,8 @@ You can override this with `CHROMA_FOLDER`.
|
|||||||
- If `CHROMA_FOLDER` is set, it takes precedence over the default behavior.
|
- If `CHROMA_FOLDER` is set, it takes precedence over the default behavior.
|
||||||
- If the configured location is invalid or not writable, the command fails with an explicit error (no fallback to the default location).
|
- If the configured location is invalid or not writable, the command fails with an explicit error (no fallback to the default location).
|
||||||
|
|
||||||
|
Setting the variable once in `.zprofile` or `.profile` ensures that every shell session uses the same location.
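
The rules above might be implemented along these lines. This is a sketch under stated assumptions, not Chromy's actual code: the function name `resolve_chroma_dir`, the `default_parent` parameter, and the built-in exception types are all hypothetical (the project keeps its own exceptions in `chromy/errors.py`). What it illustrates is that `CHROMA_FOLDER` names a parent directory, that data goes to its `chroma` subdirectory, and that an unusable location raises instead of falling back.

```python
from __future__ import annotations

import os
from pathlib import Path


def resolve_chroma_dir(env: dict[str, str], default_parent: Path) -> Path:
    """Return the storage directory, failing loudly instead of falling back.

    Hypothetical helper; names and exception types are assumptions.
    """
    configured = env.get("CHROMA_FOLDER")
    # A configured value takes precedence over the default parent directory.
    parent = Path(configured).expanduser() if configured else default_parent
    if not parent.is_dir():
        raise NotADirectoryError(f"storage parent is not a directory: {parent}")
    if not os.access(parent, os.W_OK):
        raise PermissionError(f"storage parent is not writable: {parent}")
    # Data lives in the "chroma" subdirectory of the parent.
    return parent / "chroma"
```

Passing the environment in as a plain mapping, rather than reading `os.environ` inside the function, keeps the logic easy to exercise in unit tests.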
|
||||||
|
|
||||||
Examples:
|
Examples:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
@@ -144,21 +167,33 @@ uv run mypy .
|
|||||||
## Commands
|
## Commands
|
||||||
|
|
||||||
```text
|
```text
|
||||||
list-collections
|
list-collections | lc
|
||||||
create-collection <collection>
|
create-collection <collection> | cc <collection>
|
||||||
delete-collection <collection>
|
delete-collection <collection> | dc <collection>
|
||||||
count <collection>
|
count <collection> | c <collection>
|
||||||
import <collection> <file> [<file> ...]
|
import <collection> <file> [<file> ...] | i <collection> <file> [<file> ...]
|
||||||
query <collection> <query_text>
|
query <collection> <query_text> | q <collection> <query_text>
|
||||||
delete <collection> --where <condition>=<value>
|
delete <collection> --where <condition>=<value> | del <collection> --where <condition>=<value>
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### Aliases
|
||||||
|
|
||||||
|
- `lc` → `list-collections`
|
||||||
|
- `cc` → `create-collection`
|
||||||
|
- `dc` → `delete-collection`
|
||||||
|
- `c` → `count`
|
||||||
|
- `i` → `import`
|
||||||
|
- `q` → `query`
|
||||||
|
- `del` → `delete`
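
For keeping the alias list aligned with tests, the mapping above can be written down as a plain lookup table. This is only an illustrative sketch: the real aliases are defined in `chromy/cli.py`, and the names `ALIASES` and `canonical_command` are hypothetical.

```python
from __future__ import annotations

# Alias -> full command name, copied from the list above.
ALIASES: dict[str, str] = {
    "lc": "list-collections",
    "cc": "create-collection",
    "dc": "delete-collection",
    "c": "count",
    "i": "import",
    "q": "query",
    "del": "delete",
}


def canonical_command(name: str) -> str:
    """Map an alias to its full command name; other names pass through unchanged."""
    return ALIASES.get(name, name)
```

A test could iterate over such a table and assert that each alias and its full command name behave identically when invoked through the CLI.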
|
||||||
|
|
||||||
### Examples
|
### Examples
|
||||||
|
|
||||||
Create a collection:
|
Create a collection:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
chromy create-collection notes
|
chromy create-collection notes
|
||||||
|
# alias
|
||||||
|
chromy cc notes
|
||||||
```
|
```
|
||||||
|
|
||||||
Add one or more files:
|
Add one or more files:
|
||||||
@@ -167,36 +202,54 @@ Add one or more files:
|
|||||||
chromy import notes ./docs/example.txt
|
chromy import notes ./docs/example.txt
|
||||||
chromy import notes ./docs/intro.md ./docs/setup.md
|
chromy import notes ./docs/intro.md ./docs/setup.md
|
||||||
chromy import notes *.md
|
chromy import notes *.md
|
||||||
|
# alias
|
||||||
|
chromy i notes ./docs/example.txt
|
||||||
|
```
|
||||||
|
|
||||||
|
Import a large batch of files with `find`:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
find ./docs -type f \( -name '*.md' -o -name '*.txt' \) -exec chromy import notes {} +
|
||||||
```
|
```
|
||||||
|
|
||||||
Count stored records:
|
Count stored records:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
chromy count notes
|
chromy count notes
|
||||||
|
# alias
|
||||||
|
chromy c notes
|
||||||
```
|
```
|
||||||
|
|
||||||
Search the collection:
|
Search the collection:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
chromy query notes "How do I configure this project?"
|
chromy query notes "How do I configure this project?"
|
||||||
|
# alias
|
||||||
|
chromy q notes "How do I configure this project?"
|
||||||
```
|
```
|
||||||
|
|
||||||
List collections:
|
List collections:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
chromy list-collections
|
chromy list-collections
|
||||||
|
# alias
|
||||||
|
chromy lc
|
||||||
```
|
```
|
||||||
|
|
||||||
Delete a collection:
|
Delete a collection:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
chromy delete-collection notes
|
chromy delete-collection notes
|
||||||
|
# alias
|
||||||
|
chromy dc notes
|
||||||
```
|
```
|
||||||
|
|
||||||
Delete records by metadata:
|
Delete records by metadata:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
chromy delete notes --where file_name=example.txt
|
chromy delete notes --where file_name=example.txt
|
||||||
|
# alias
|
||||||
|
chromy del notes --where file_name=example.txt
|
||||||
```
|
```
|
||||||
|
|
||||||
## How ingestion works
|
## How ingestion works
|
||||||
|
|||||||