add progress bar when importing multiple files

refactor chunking and embedding into their own modules
add multi-file import support
2026-05-01 15:45:41 +02:00 · 2026-05-01 11:01:30 +02:00 · 2026-04-29 15:39:42 +02:00 · 2026-04-29 14:46:41 +02:00 · 2026-04-29 12:44:28 +02:00 · 2026-04-24 22:49:36 +02:00
24 changed files with 659 additions and 126 deletions
@@ -2,32 +2,30 @@
 ## Project Structure & Module Organization
-`chromy/` contains the Python package and CLI implementation. The entrypoint is `chromy/main.py`, which loads environment variables and invokes the Typer app defined in `chromy/cli.py`. Command-specific behavior belongs in `chromy/handlers/`. Shared Chroma, embedding, chunking, and formatting helpers live in package modules such as `chroma_functions.py`, `embed.py`, `chunk_functions.py`, and `utilities.py`.
+`chromy/` contains the Python package and CLI implementation. The entrypoint is `chromy/main.py`, which loads environment variables and invokes the Typer app defined in `chromy/cli.py`. The active CLI commands are `list-collections`, `create-collection`, `delete-collection`, `count`, `import`, `query`, and `delete`. Command-specific behavior belongs in `chromy/handlers/`. Shared Chroma, embedding, chunking, querying, and output helpers live in package modules such as `chroma_functions.py`, `embed.py`, `chunk_functions.py`, and `utilities.py`.
-`tests/` contains the test suite for the CLI, handlers, and embedding helpers. Generated data and build outputs such as `chroma/`, `dist/`, `chromy.egg-info/`, `.pytest_cache/`, `.mypy_cache/`, `.ruff_cache/`, and `.venv/` are not source.
+`tests/` contains the test suite for the CLI, handlers, and embedding helpers. `README.md` documents user-facing behavior, `pyproject.toml` defines packaging and tool configuration, and `romeo_and_juliet.txt` is a checked-in sample input used by tests and manual CLI runs. Treat generated or local-state directories such as `chroma/`, `dist/`, `chromy.egg-info/`, `.pytest_cache/`, `.mypy_cache/`, `.ruff_cache/`, `.venv/`, `__pycache__/`, and `main.onefile-build/` as non-source. The top-level `handlers/` directory currently contains only legacy bytecode artifacts and should not be treated as source.
 ## Build, Test, and Development Commands
 - `uv sync`: install runtime and development dependencies from `pyproject.toml` and `uv.lock`.
 - `uv run python -m chromy.main --help`: run the CLI from the source tree.
 - `uv run chromy --help`: run the packaged console script inside the project environment.
 - `uv run pytest -q`: run the test suite.
 - `uv run ruff check .`: run lint checks.
 - `uv run ruff format --check .`: verify formatting.
 - `uv run mypy .`: run static type checks.
 - `uv build`: build the source distribution and wheel into `dist/`.
 - `uv tool install --editable .`: install the `chromy` command in editable mode for local CLI testing.
 ## Coding Style & Naming Conventions
-Use Python 3.12+ syntax, type hints, and `from __future__ import annotations`. Follow the current style: 4-space indentation, snake_case functions and modules, PascalCase classes, and Typer command functions in `chromy/cli.py` that delegate to small handler functions. Keep handlers focused on CLI orchestration and user-facing output; place reusable database, chunking, embedding, and formatting logic in shared modules.
+Use Python 3.12+ syntax, type hints, and `from __future__ import annotations`. Follow the current style: 4-space indentation, snake_case functions and modules, PascalCase classes, and Typer command functions in `chromy/cli.py` that delegate to small handler functions. Keep handlers focused on CLI orchestration and user-facing output; place reusable database, chunking, embedding, query, and formatting logic in shared modules. Prefer `rich` output for user-facing CLI messages to stay consistent with the existing commands.
 ## Testing Guidelines
-Tests run with pytest and are currently written in `unittest.TestCase` style. Name test files `test_*.py` and test methods `test_*`. Prefer mocking Chroma-facing and filesystem-facing functions in CLI and handler tests so unit tests stay deterministic. Run `uv run pytest -q` before submitting changes, and add tests for new commands, Typer wiring, handlers, and error paths.
+Tests run with pytest and are currently written in `unittest.TestCase` style. Name test files `test_*.py` and test methods `test_*`. Prefer mocking Chroma-facing and filesystem-facing functions in CLI and handler tests so unit tests stay deterministic. Run `uv run pytest -q` before submitting changes, and use `uv run ruff check .` plus `uv run mypy .` when touching typed code or shared modules. Add tests for new commands, Typer wiring, handlers, and error paths.
 ## Commit & Pull Request Guidelines
 Git history uses short, imperative, lowercase commit subjects, for example `move top-level modules into a real package` and `add typer-based delete command`. Keep commits scoped to one logical change.
 Pull requests should include a concise description, test results, and notes for any CLI behavior changes. Link related issues or plan files when applicable. Include terminal output examples or screenshots only when user-facing command output changes.
 ## Security & Configuration Tips
-The CLI loads environment variables via `python-dotenv`; keep secrets in local `.env` files and do not commit them. Treat `chroma/` as local persistent database state. Avoid committing generated build artifacts, cache directories, or large ad hoc input files unless they are intentional fixtures.
+The CLI loads environment variables via `python-dotenv`; keep secrets in local `.env` files and do not commit them. Treat `chroma/` as local persistent database state created by `chromadb.PersistentClient()`. Avoid committing generated build artifacts, cache directories, onefile build outputs, or large ad hoc input files unless they are intentional fixtures. If you change command names or examples, update both `README.md` and the tests so the documented CLI stays aligned with the implementation.
@@ -1,6 +1,10 @@
 # Chromy
-A small command-line utility for working with a local Chroma database. It lets you create collections, ingest file contents as chunked embeddings, and run similarity queries against stored documents.
+<div align="center">
+    <img src="logo.png" width=300 />
+</div>
+Chromy is a small, simple-to-use command-line utility for working with a local Chroma database. It lets you create collections, ingest files as chunked embeddings, and run similarity queries against stored documents. It integrates with agentic coding tools via simple skills (see an [example](./skills/chromy/SKILL.md) in the `skills` directory).
 ## What it does
@@ -120,7 +124,7 @@ list-collections
 create-collection <collection>
 delete-collection <collection>
 count <collection>
-add-data <collection> <file>
+import <collection> <file> [<file> ...]
 query <collection> <query_text>
 delete <collection> --where <condition>=<value>
 ```
@@ -133,10 +137,12 @@ Create a collection:
 chromy create-collection notes
 ```
-Add a file:
+Add one or more files:
 ```bash
-chromy add-data notes ./docs/example.txt
+chromy import notes ./docs/example.txt
+chromy import notes ./docs/intro.md ./docs/setup.md
+chromy import notes *.md
 ```
 Count stored records:
@@ -171,7 +177,7 @@ chromy delete notes --where file_name=example.txt
 ## How ingestion works
-When you run `add-data`, the file is:
+When you run `import`, each file is:
 1. read from disk
 2. split into chunks
@@ -182,6 +188,11 @@ Query results include the stored document chunk, its id, distance, and file name
 ## Notes
- collections are stored in a local persistent Chroma database
+- collections are stored in a local persistent Chroma database in the current directory
- `add-data` requires the target collection to already exist
+- `import` requires the target collection to already exist
- the CLI prints friendly messages for common errors such as missing collections or missing files
+- `import` accepts one or more file paths
 - unquoted glob patterns such as `*.md` are expanded by the shell before `chromy` starts
 - quoted glob patterns such as `"*.md"` are treated as literal paths and are not expanded by `chromy`
 - unmatched unquoted globs may behave differently by shell: `zsh` commonly fails before `chromy` starts, while `bash` may pass the literal pattern through depending on shell settings
 - the CLI reports file-specific import failures and continues with the remaining files
 - when importing multiple files in an interactive terminal, the CLI shows a Rich progress bar
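The glob notes above come down to who expands the pattern: the shell, or nobody. The following is a minimal Python sketch of that difference, independent of chromy's own code. `resolve_literally` and `expand_like_a_shell` are hypothetical helpers written for illustration; chromy itself only does the equivalent of the first one.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_literally(arg: str, cwd: Path) -> list[Path]:
    """Treat the argument as a plain path, as a CLI does with a quoted glob."""
    candidate = cwd / arg
    return [candidate] if candidate.is_file() else []


def expand_like_a_shell(pattern: str, cwd: Path) -> list[Path]:
    """Expand the pattern against the directory, roughly as an unquoted glob would."""
    return sorted(cwd.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "intro.md").write_text("# Intro\n")
    (root / "setup.md").write_text("# Setup\n")

    # A quoted "*.md" reaches the program unchanged and matches no real file.
    literal = resolve_literally("*.md", root)
    # An unquoted *.md is replaced by the matching file names before the program starts.
    expanded_names = [path.name for path in expand_like_a_shell("*.md", root)]

print(literal)         # []
print(expanded_names)  # ['intro.md', 'setup.md']
```

This is why a quoted pattern ends up reported as a missing file rather than being imported.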
@@ -9,7 +9,7 @@ from chromadb.api import ClientAPI
 from chromadb.api.types import QueryResult, Where
 from chromadb.errors import NotFoundError
-from chromy.embed import EmbeddingRecord
+from chromy.embedding import EmbeddingRecord
 def _get_client_and_collection(
@@ -54,11 +54,17 @@ def delete_data(collection_name: str, where: dict[str, str]) -> int:
    return int(result.get("deleted", 0))
-def count_collection(collection_name: str) -> str:
+def has_data_for_file(collection_name: str, file_name: str) -> bool:
     _, collection = _get_client_and_collection(collection_name)
-    count = collection.count()
-    return f"The '{collection_name}' collection contains [bold green]{count}[/] records."
+    result = collection.get(where=cast(Where, {"file_name": file_name}))
+    ids = result.get("ids", [])
+    return len(ids) > 0
+def count_collection(collection_name: str) -> int:
+    _, collection = _get_client_and_collection(collection_name)
+    return collection.count()
 def add_data(
@@ -71,8 +77,7 @@ def add_data(
    _, collection = _get_client_and_collection(collection_name)
-    embeddings: list[Sequence[float]] = [record["embedding"]
-                                         for record in data]
+    embeddings: list[Sequence[float]] = [record["embedding"] for record in data]
    collection.add(
        ids=[str(uuid4()) for _ in data],
@@ -0,0 +1,5 @@
 from __future__ import annotations
 from chromy.chunking.service import chunk_file, chunk_text
 __all__ = ["chunk_file", "chunk_text"]
@@ -3,7 +3,7 @@ from __future__ import annotations
 from pathlib import Path
 from typing import cast
-import semchunk
+from semchunk import semchunk
 def chunk_text(text: str, chunk_size: int = 800) -> list[str]:
@@ -3,16 +3,16 @@ from __future__ import annotations
 from typing import Annotated, Callable
 import typer
-from rich import print
 from chromadb.errors import InternalError, NotFoundError
+from rich import print
-from chromy.handlers.import_data import handle_import
 from chromy.handlers.count_collection import handle_count_collection
 from chromy.handlers.create_collection import handle_create_collection
 from chromy.handlers.delete_collection import (
     handle_delete_collection,
     handle_delete_records,
 )
+from chromy.handlers.import_data import handle_import
 from chromy.handlers.list_collections import handle_list_collections
 from chromy.handlers.query import handle_query
@@ -105,25 +105,27 @@ def count(
 # ------------------------------------------------------------------------------
@app.command(
    "import",
-    help="Chunk, embed, and add a file to a collection in the local Chroma database.",
+    help=(
+        "Chunk, embed, and add one or more files to a collection in the "
+        "local Chroma database."
+    ),
 )
 def import_data(
    collection: Annotated[
        str,
        typer.Argument(help="Name of the target collection."),
    ],
-    file: Annotated[
+    files: Annotated[
-        str,
+        list[str],
        typer.Argument(
-            help="Path to the file to chunk and add to the collection."),
+            help="Path(s) to the file(s) to chunk and add to the collection."
+        ),
    ],
 ) -> None:
    try:
-        _run(lambda: handle_import(collection, file))
+        _run(lambda: handle_import(collection, files))
    except NotFoundError:
        _fail(f"Collection '{collection}' does not exist.")
-    except FileNotFoundError:
-        _fail(f"The file {file} was not found.")
 # ------------------------------------------------------------------------------
@@ -0,0 +1,5 @@
 from __future__ import annotations
 from chromy.embedding.service import EmbeddingRecord, embed
 __all__ = ["EmbeddingRecord", "embed"]
@@ -0,0 +1,5 @@
 from __future__ import annotations
 class UnsupportedTextFileError(Exception):
    """Raised when a file does not appear to contain supported text content."""
@@ -1,9 +1,11 @@
 from __future__ import annotations
 from rich import print
 from chromy.chroma_functions import count_collection
 from chromy.output import format_count_message
 def handle_count_collection(collection: str) -> int:
-    print(count_collection(collection))
+    print(format_count_message(collection, count_collection(collection)))
    return 0
@@ -1,6 +1,7 @@
 from __future__ import annotations
 from rich import print
 from chromy.chroma_functions import create_collection
@@ -1,6 +1,7 @@
 from __future__ import annotations
 from rich import print
 from chromy.chroma_functions import delete_collection, delete_data
@@ -8,15 +9,13 @@ def _parse_where_clause(where_clause: str) -> dict[str, str]:
    condition, separator, value = where_clause.partition("=")
    if separator == "":
-        raise ValueError(
-            "Invalid --where value. Expected <condition>=<value>.")
+        raise ValueError("Invalid --where value. Expected <condition>=<value>.")
    condition = condition.strip()
    value = value.strip()
    if not condition or not value:
-        raise ValueError(
-            "Invalid --where value. Expected <condition>=<value>.")
+        raise ValueError("Invalid --where value. Expected <condition>=<value>.")
    return {condition: value}
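For reviewers, the edge cases of the `--where` parsing may be easier to see in isolation. The sketch below mirrors the logic of `_parse_where_clause` as a standalone function; the name `parse_where` is hypothetical and the snippet is not part of the change.

```python
from __future__ import annotations


def parse_where(where_clause: str) -> dict[str, str]:
    """Split '<condition>=<value>' at the first '=' and validate both sides."""
    # str.partition splits only at the first separator, so the value may contain '='.
    condition, separator, value = where_clause.partition("=")
    condition = condition.strip()
    value = value.strip()
    if not separator or not condition or not value:
        raise ValueError("Invalid --where value. Expected <condition>=<value>.")
    return {condition: value}


print(parse_where("file_name=example.txt"))  # {'file_name': 'example.txt'}
print(parse_where("title = a=b"))            # {'title': 'a=b'}
```

Inputs such as `file_name`, `=value`, or `file_name=` all raise `ValueError`, which the CLI turns into a user-facing error.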
@@ -1,10 +1,27 @@
 from __future__ import annotations
 import os
 import sys
 from pathlib import Path
 from typing import Final
 from rich import print
 from rich.progress import (
    BarColumn,
    MofNCompleteColumn,
    Progress,
    SpinnerColumn,
    TextColumn,
 )
 from chromy.errors import UnsupportedTextFileError
 from chromy.utilities import ingest_file
 from ..utilities import is_probably_text_file
 SUCCESS_EXIT_CODE: Final = 0
 FAILURE_EXIT_CODE: Final = 1
 def _get_absolute_path(file: str) -> str:
    """
@@ -21,11 +38,94 @@ def _get_absolute_path(file: str) -> str:
        raise FileNotFoundError()
    file_path = Path(file)
-    return str(file_path.resolve(file_path))
+    return str(file_path.resolve())
-def handle_import(collection: str, file: str) -> int:
-    records_added = ingest_file(collection, _get_absolute_path(file))
+def _import_one(collection: str, file: str) -> int:
+    absolute_path = _get_absolute_path(file)
+    if not Path(absolute_path).is_file():
+        raise FileNotFoundError()
+    if not is_probably_text_file(absolute_path):
+        raise UnsupportedTextFileError()
+    return ingest_file(collection, absolute_path)
+def _should_show_progress(file_count: int) -> bool:
+    return file_count > 1 and sys.stdout.isatty()
+def _truncate_file_name(file_name: str, max_length: int = 20) -> str:
+    if len(file_name) <= max_length:
+        return file_name
+    return f"{file_name[: max_length - 3]}..."
+def handle_import(collection: str, files: list[str]) -> int:
+    successful_imports = 0
+    failed_imports = 0
+    seen_paths: set[str] = set()
+    unique_files: list[str] = []
+    for file in files:
+        try:
+            absolute_path = _get_absolute_path(file)
+        except FileNotFoundError:
+            unique_files.append(file)
+            continue
+        if absolute_path in seen_paths:
+            continue
+        seen_paths.add(absolute_path)
+        unique_files.append(file)
+    show_progress = _should_show_progress(len(unique_files))
+    with Progress(
+        SpinnerColumn(),
+        TextColumn("[progress.description]{task.description}"),
+        BarColumn(),
+        MofNCompleteColumn(),
+        transient=True,
+        disable=not show_progress,
+    ) as progress:
+        task_id = progress.add_task("Importing files...", total=len(unique_files))
+        for file in unique_files:
+            file_name = _truncate_file_name(Path(file).name)
+            description = f"Importing [bold]{file_name}[/]..."
+            progress.update(task_id, description=description)
+            try:
+                records_added = _import_one(collection, file)
+                successful_imports += 1
+                if not show_progress:
+                    progress.console.print(
+                        "[bold green]Added[/] "
+                        f"{records_added} records from '{file}' to "
+                        f"collection '{collection}'."
+                    )
+            except FileNotFoundError:
+                failed_imports += 1
+                progress.console.print(
+                    f"[bold red]Error[/]: The file '{file}' was not found."
+                )
+            except UnsupportedTextFileError:
+                failed_imports += 1
+                progress.console.print(
+                    f"[bold red]Error[/]: The file '{file}' is not a text file."
+                )
+            finally:
+                progress.advance(task_id)
     print(
-        f"[bold green]Added[/] {records_added} records to collection '{collection}'.")
-    return 0
+        f"Imported {successful_imports} file(s) successfully; {failed_imports} failed."
+    )
+    if failed_imports:
+        return FAILURE_EXIT_CODE
+    return SUCCESS_EXIT_CODE
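The de-duplication step in `handle_import` can be sketched on its own, using only the standard library. This is an illustration rather than the handler's code: `dedupe_by_resolved_path` and `exit_code_for` are hypothetical names, and unlike the handler this sketch also collapses duplicate paths that do not exist on disk.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def dedupe_by_resolved_path(files: list[str], base: Path) -> list[str]:
    """Keep the first spelling of each path, comparing by resolved absolute path."""
    seen: set[Path] = set()
    unique: list[str] = []
    for file in files:
        resolved = (base / file).resolve()
        if resolved in seen:
            continue  # same file given twice, e.g. README.md and ./README.md
        seen.add(resolved)
        unique.append(file)
    return unique


def exit_code_for(failed_imports: int) -> int:
    """Return a non-zero exit code as soon as any file failed to import."""
    return 1 if failed_imports else 0


with tempfile.TemporaryDirectory() as tmp:
    unique_files = dedupe_by_resolved_path(
        ["README.md", "./README.md", "notes.txt"], Path(tmp)
    )

print(unique_files)      # ['README.md', 'notes.txt']
print(exit_code_for(0))  # 0
print(exit_code_for(2))  # 1
```

Comparing resolved paths rather than raw strings is what lets `README.md` and `./README.md` count as one import in a single invocation.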
@@ -1,14 +1,16 @@
 from __future__ import annotations
 from chromy.chroma_functions import list_collections
-from chromy.utilities import print_lines
+from chromy.output import format_collection_names, print_lines
 def handle_list_collections() -> int:
    collections = list_collections()
    if not collections:
        print("No collections found.")
        return 0
-    print_lines(collections)
+    print_lines(format_collection_names(collections))
    return 0
@@ -1,6 +1,7 @@
 from __future__ import annotations
-from chromy.utilities import format_query_result, print_lines, run_query
+from chromy.output import format_query_result, print_lines
 from chromy.utilities import run_query
 def handle_query(collection: str, query_text: str) -> int:
@@ -0,0 +1,70 @@
 from __future__ import annotations
 from collections.abc import Mapping, Sequence
 from chromadb import QueryResult
 from rich.console import Console
 from rich.rule import Rule
 from rich.text import Text
 CONSOLE = Console()
 def print_lines(lines: Sequence[Rule | Text | str]) -> None:
    for line in lines:
        CONSOLE.print(line)
 def format_collection_names(collections: Sequence[str]) -> list[Text]:
    return [Text(f"· {collection}") for collection in collections]
 def format_count_message(collection_name: str, count: int) -> str:
    return (
        f"The '{collection_name}' collection contains [bold green]{count}[/] records."
    )
 def format_query_result(result: QueryResult) -> list[Rule | Text]:
    ids = result.get("ids", [[]])
    documents = result.get("documents", [[]])
    distances = result.get("distances", [[]])
    metadatas = result.get("metadatas", [[]])
    first_ids = ids[0] if ids else []
    first_documents = documents[0] if documents else []
    first_distances = distances[0] if distances else []
    first_metadatas = metadatas[0] if metadatas else []
    if not first_ids:
        return [Text.from_markup("[yellow]No results found.[/]")]
    lines: list[Rule | Text] = [Rule(title="Query results")]
    for index, document_id in enumerate(first_ids, start=1):
        lines.append(
            Text.from_markup(f"[bold]{index}[/].\t[green]id[/]\t\t{document_id}")
        )
        i = index - 1
        if i < len(first_distances):
            lines.append(
                Text.from_markup(f"\t[green]distance[/]\t{first_distances[i]}")
            )
        if i < len(first_metadatas):
            metadata = first_metadatas[i]
            if isinstance(metadata, Mapping):
                file_name = metadata.get("file_name")
                if file_name:
                    lines.append(Text.from_markup(f"\t[green]file_name[/]\t{file_name}"))
        if i < len(first_documents):
            lines.append(Text.from_markup("\n[bold green]Retrieved contents[/]\n"))
            lines.append(Text(first_documents[i]))
        lines.append(Rule())
    return lines
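`format_query_result` relies on the nested shape of a Chroma query result: each field holds one inner list per query text, and the formatter only reads the first. The sketch below shows that indexing on a plain dictionary, without Rich or chromadb. `first_query_rows` and the sample values are hypothetical and only meant to illustrate the shape.

```python
from __future__ import annotations

from typing import Any


def first_query_rows(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten the first query's ids, documents, distances, and file names into rows."""
    ids = (result.get("ids") or [[]])[0]
    documents = (result.get("documents") or [[]])[0]
    distances = (result.get("distances") or [[]])[0]
    metadatas = (result.get("metadatas") or [[]])[0]

    rows: list[dict[str, Any]] = []
    for i, document_id in enumerate(ids):
        # Any field may be shorter than ids (or missing), so guard every lookup.
        metadata = metadatas[i] if i < len(metadatas) else None
        rows.append(
            {
                "id": document_id,
                "document": documents[i] if i < len(documents) else None,
                "distance": distances[i] if i < len(distances) else None,
                "file_name": (metadata or {}).get("file_name"),
            }
        )
    return rows


sample = {
    "ids": [["a1", "b2"]],
    "documents": [["first chunk", "second chunk"]],
    "distances": [[0.12, 0.48]],
    "metadatas": [[{"file_name": "example.txt"}, None]],
}
rows = first_query_rows(sample)
print(rows[0]["file_name"])  # example.txt
print(rows[1]["file_name"])  # None
```

The same guards explain why the test fixture with only `ids` and `documents` still formats without errors.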
@@ -1,26 +1,18 @@
 from __future__ import annotations
-from rich.text import Text
+from pathlib import Path
 from rich.rule import Rule
 from rich.console import Console
 from collections.abc import Mapping, Sequence
 from chromadb import QueryResult
-from chromy.chroma_functions import add_data, query_data
+from chromy.chroma_functions import add_data, delete_data, has_data_for_file, query_data
-from chromy.chunk_functions import chunk_file
+from chromy.chunking import chunk_file
-from chromy.embed import embed
+from chromy.embedding import embed
 CONSOLE = Console()
 def print_lines(lines: Sequence[str]) -> None:
    for line in lines:
        CONSOLE.print(line)
 def ingest_file(collection_name: str, file_path: str) -> int:
    if has_data_for_file(collection_name, file_path):
        delete_data(collection_name, {"file_name": file_path})
    chunks = chunk_file(file_path)
    embeddings = embed(chunks)
    add_data(collection_name, embeddings, file_path)
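The delete-then-add step makes re-importing a file replace its earlier records instead of duplicating them. A minimal sketch of that behaviour, with an in-memory stand-in for the collection, is shown below. `InMemoryCollection` and `ingest` are hypothetical and do not use chromadb.

```python
from __future__ import annotations


class InMemoryCollection:
    """Tiny stand-in for a collection: one record per chunk, tagged with file_name."""

    def __init__(self) -> None:
        self.records: list[dict[str, str]] = []

    def has_file(self, file_name: str) -> bool:
        return any(record["file_name"] == file_name for record in self.records)

    def delete_file(self, file_name: str) -> int:
        before = len(self.records)
        self.records = [r for r in self.records if r["file_name"] != file_name]
        return before - len(self.records)

    def add(self, chunks: list[str], file_name: str) -> None:
        self.records.extend({"text": c, "file_name": file_name} for c in chunks)


def ingest(collection: InMemoryCollection, chunks: list[str], file_name: str) -> int:
    """Replace any existing records for file_name, then add the new chunks."""
    if collection.has_file(file_name):
        collection.delete_file(file_name)
    collection.add(chunks, file_name)
    return len(chunks)


store = InMemoryCollection()
ingest(store, ["a", "b", "c"], "/tmp/example.txt")
ingest(store, ["a", "b"], "/tmp/example.txt")  # re-import with fewer chunks
print(len(store.records))  # 2
```

The delete and the add are separate calls, so a failure between them would leave the file without records; whether that matters for a local CLI is a judgment call.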
@@ -31,50 +23,39 @@ def run_query(collection_name: str, query_text: str) -> QueryResult:
    return query_data(collection_name, [query_text])
-def format_query_result(result: QueryResult) -> list[str]:
-    ids = result.get("ids", [[]])
-    documents = result.get("documents", [[]])
-    distances = result.get("distances", [[]])
-    metadatas = result.get("metadatas", [[]])
-    first_ids = ids[0] if ids else []
-    first_documents = documents[0] if documents else []
-    first_distances = distances[0] if distances else []
-    first_metadatas = metadatas[0] if metadatas else []
-    if not first_ids:
-        return ["No results found."]
-    lines = [Rule(title="Query results")]
-    for index, document_id in enumerate(first_ids, start=1):
-        # lines.append(f"{index}.\tid: {document_id}")
-        lines.append(
-            Text.from_markup(f"[bold]{index}[/].\t[green]id[/]\t\t{document_id}")
-        )
-        i = index - 1
-        if i < len(first_distances):
-            lines.append(
-                Text.from_markup(f"\t[green]distance[/]\t{first_distances[i]}")
-            )
-        if i < len(first_metadatas):
-            metadata = first_metadatas[i]
-            if isinstance(metadata, Mapping):
-                file_name = metadata.get("file_name")
-                if file_name:
-                    lines.append(
-                        Text.from_markup(f"\t[green]file_name[/]\t{file_name}")
-                    )
-        if i < len(first_documents):
-            lines.append(Text.from_markup("\n[bold green]Retrieved contents[/]\n"))
-            lines.append(first_documents[i])
-        # Print a separator between documents
-        lines.append(Rule())
-    return lines
+def is_probably_text_file(path: str | Path, sample_size: int = 8192) -> bool:
+    """
+    Return whether a file appears to contain text.
+    Args:
+        path (str | Path): The path to the file to inspect.
+        sample_size (int): The maximum number of bytes to read from the file.
+    Returns:
+        bool: ``True`` if the sampled bytes decode as UTF-8, UTF-8 with BOM,
+        UTF-16, or UTF-32, or if the file is empty. Otherwise, ``False``.
+    """
+    path = Path(path)
+    with path.open("rb") as f:
+        sample = f.read(sample_size)
+    if not sample:
+        return True
+    encodings = (
+        "utf-8",
+        "utf-8-sig",
+        "utf-16",
+        "utf-32",
+    )
+    for encoding in encodings:
+        try:
+            sample.decode(encoding)
+            return True
+        except UnicodeDecodeError:
+            pass
+    return False
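One point that may be worth checking in review: without a byte order mark, Python's `utf-16` codec accepts many even-length byte strings, so a short binary sample can pass the decode loop in `is_probably_text_file`. The sketch below is a possible stricter variant, not the change's implementation. `looks_like_text` is a hypothetical name, and the choice to accept UTF-16/UTF-32 only when a BOM is present is an assumption made for illustration.

```python
from __future__ import annotations

import codecs

# Longest BOMs first: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_BOM_ENCODINGS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def _decodes(sample: bytes, encoding: str) -> bool:
    """Decode incrementally so a character cut off at the sample boundary is tolerated."""
    decoder = codecs.getincrementaldecoder(encoding)()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


def looks_like_text(sample: bytes) -> bool:
    """Heuristic: empty, valid UTF-8, or BOM-prefixed UTF-16/UTF-32."""
    if not sample:
        return True
    for bom, encoding in _BOM_ENCODINGS:
        if sample.startswith(bom):
            return _decodes(sample, encoding)
    return _decodes(sample, "utf-8")


png_header = b"\x89PNG\r\n\x1a\n"
png_header.decode("utf-16")  # succeeds: a plain decode attempt accepts these bytes

print(looks_like_text(png_header))                   # False
print(looks_like_text("héllo".encode("utf-8")))      # True
print(looks_like_text("héllo".encode("utf-8")[:2]))  # True, truncated character tolerated
print(looks_like_text("hi".encode("utf-16")))        # True, BOM present
```

The trade-off is that BOM-less UTF-16 or UTF-32 text files would be rejected by this variant, which may or may not be acceptable for the files chromy is expected to import.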
@@ -24,7 +24,7 @@ dependencies = [
 chromy = "chromy.main:main"
 [tool.setuptools]
-packages = ["chromy", "chromy.handlers"]
+packages = ["chromy", "chromy.chunking", "chromy.embedding", "chromy.handlers"]
 [dependency-groups]
 dev = [
@@ -72,7 +72,7 @@ module = [
 ignore_missing_imports = true
 [[tool.mypy.overrides]]
-module = "chromy.chunk_functions"
+module = "chromy.chunking.service"
 disable_error_code = [
    "attr-defined",
 ]
@@ -0,0 +1,43 @@
 ---
 name: chromy
 description: This skill provides access to a RAG-like context enhancer that uses Chromadb locally.
 ---
 # Chromy
 Whenever the user asks to "use chromy", you should invoke `chromy`, a CLI tool for RAG search.
 The tool should be available in the `$PATH` as `chromy`.
 You have access to these commands:
 - `$ chromy lc` -> Lists the existing collections.
 - `$ chromy q <collection> <query>` -> Performs a query. Be sure to quote the `<query>` if it contains multiple words.
 Then use the response from Chromy to enhance the context and give the user a refined response.
 ## A note on file sources
 The Chromy response returns metadata for the chunks it finds, including `file_name`, which refers to the original file that was chunked and imported. **DO NOT ATTEMPT** to find or fetch these files. They most likely do not exist in the filesystem. You **SHOULD ALWAYS**, however, cite which files the information comes from, using **ONLY** the names in Chromy's metadata.
 ## Example use case
 **START**
 User query:
 > Search Chromy for information about Lovecraft's Dunwich Horror.
 Step 1: Get the available collections with `chromy lc`. The output is:
 ```
 lovecraft
 documents
 ```
 Most likely our information is in the `lovecraft` collection. We will use that for the query.
 Step 2: Query using `chromy q lovecraft <query>`. The query _is up to you_; write one bearing in mind that it is a raw query against a vector DB. Be concise, extract keywords, avoid noise.
 Step 3: Get the results, enhance the context, and respond to the user.
 **END**
@@ -31,11 +31,11 @@ class CliTests(unittest.TestCase):
        with patch(
            "chromy.handlers.list_collections.list_collections",
            return_value=["books", "code"],
-        ):
+        ): 
            result = _invoke(["list-collections"])
        self.assertEqual(result.exit_code, 0)
-        self.assertEqual(result.stdout, "books\ncode\n")
+        self.assertEqual(result.stdout, "· books\n· code\n")
    def test_create_collection(self) -> None:
        with patch(
@@ -51,15 +51,13 @@ class CliTests(unittest.TestCase):
    def test_create_collection_with_same_name(self) -> None:
        with patch(
            "chromy.handlers.create_collection.create_collection",
-            side_effect=InternalError()
+            side_effect=InternalError(),
        ) as create_collection:
            result = _invoke(["create-collection", "notes"])
        create_collection.assert_called_once_with("notes")
        self.assertEqual(result.exit_code, 1)
-        self.assertEqual(
-            result.stdout, "Error: Collection 'notes' already exists.\n")
+        self.assertEqual(result.stdout, "Error: Collection 'notes' already exists.\n")
    def test_delete_collection(self) -> None:
        with patch(
@@ -74,14 +72,13 @@ class CliTests(unittest.TestCase):
    def test_delete_non_existent_collection(self) -> None:
        with patch(
            "chromy.handlers.delete_collection.delete_collection",
-            side_effect=NotFoundError()
+            side_effect=NotFoundError(),
        ) as delete_collection:
            result = _invoke(["delete-collection", "notes"])
        delete_collection.assert_called_once_with("notes")
        self.assertEqual(result.exit_code, 1)
-        self.assertEqual(
-            result.stdout, "Error: Collection 'notes' does not exist.\n")
+        self.assertEqual(result.stdout, "Error: Collection 'notes' does not exist.\n")
    def test_count(self) -> None:
        with patch(
@@ -92,7 +89,10 @@ class CliTests(unittest.TestCase):
        count_collection.assert_called_once_with("notes")
        self.assertEqual(result.exit_code, 0)
-        self.assertEqual(result.stdout, "7\n")
+        self.assertEqual(
+            result.stdout,
+            "The 'notes' collection contains 7 records.\n",
+        )
    def test_import_data(self) -> None:
        with patch(
@@ -107,7 +107,101 @@ class CliTests(unittest.TestCase):
        )
        self.assertEqual(result.exit_code, 0)
        self.assertEqual(
-            result.stdout, "Added 3 records to collection 'notes'.\n")
-            result.stdout, "Added 3 records to collection 'notes'.\n")
+            result.stdout,
+            "Added 3 records from 'romeo_and_juliet.txt' to collection 'notes'.\n"
+            "Imported 1 file(s) successfully; 0 failed.\n",
+        )
    def test_import_data_accepts_multiple_files(self) -> None:
        with patch(
            "chromy.handlers.import_data.ingest_file",
            side_effect=[3, 2],
        ) as ingest_file:
            result = _invoke(
                ["import", "notes", "romeo_and_juliet.txt", "README.md"],
            )
        self.assertEqual(ingest_file.call_count, 2)
        ingest_file.assert_any_call(
            "notes",
            self._fixture_path("romeo_and_juliet.txt"),
        )
        ingest_file.assert_any_call(
            "notes",
            self._fixture_path("README.md"),
        )
        self.assertEqual(result.exit_code, 0)
        self.assertEqual(
            result.stdout,
            "Added 3 records from 'romeo_and_juliet.txt' to collection 'notes'.\n"
            "Added 2 records from 'README.md' to collection 'notes'.\n"
            "Imported 2 file(s) successfully; 0 failed.\n",
        )
    def test_import_data_continues_after_missing_file(self) -> None:
        with patch(
            "chromy.handlers.import_data.ingest_file",
            return_value=3,
        ) as ingest_file:
            result = _invoke(
                ["import", "notes", "missing.txt", "romeo_and_juliet.txt"],
            )
        ingest_file.assert_called_once_with(
            "notes",
            self._fixture_path("romeo_and_juliet.txt"),
        )
        self.assertEqual(result.exit_code, 1)
        self.assertEqual(
            result.stdout,
            "Error: The file 'missing.txt' was not found.\n"
            "Added 3 records from 'romeo_and_juliet.txt' to collection 'notes'.\n"
            "Imported 1 file(s) successfully; 1 failed.\n",
        )
    def test_import_data_rejects_non_text_files(self) -> None:
        with patch(
            "chromy.handlers.import_data.is_probably_text_file",
            return_value=False,
        ):
            result = _invoke(["import", "notes", "romeo_and_juliet.txt"])
        self.assertEqual(result.exit_code, 1)
        self.assertEqual(
            result.stdout,
            "Error: The file 'romeo_and_juliet.txt' is not a text file.\n"
            "Imported 0 file(s) successfully; 1 failed.\n",
        )
    def test_import_data_treats_literal_glob_as_missing_file(self) -> None:
        result = _invoke(["import", "notes", "*.md"])
        self.assertEqual(result.exit_code, 1)
        self.assertEqual(
            result.stdout,
            "Error: The file '*.md' was not found.\n"
            "Imported 0 file(s) successfully; 1 failed.\n",
        )
    def test_import_data_deduplicates_paths_within_single_invocation(self) -> None:
        with patch(
            "chromy.handlers.import_data.ingest_file",
            return_value=3,
        ) as ingest_file:
            result = _invoke(
                ["import", "notes", "README.md", "./README.md"],
            )
        ingest_file.assert_called_once_with(
            "notes",
            self._fixture_path("README.md"),
        )
        self.assertEqual(result.exit_code, 0)
        self.assertEqual(
            result.stdout,
            "Added 3 records from 'README.md' to collection 'notes'.\n"
            "Imported 1 file(s) successfully; 0 failed.\n",
        )
    def test_query(self) -> None:
        query_result = {"ids": [["1"]], "documents": [["hello"]]}
@@ -139,8 +233,7 @@ class CliTests(unittest.TestCase):
        self.assertEqual(result.exit_code, 0)
        self.assertEqual(
            result.stdout,
-            "Deleted 2 record(s) from collection 'notes' "
-            "where file_name=play.txt.\n",
+            "Deleted 2 record(s) from collection 'notes' where file_name=play.txt.\n",
        )
    def test_invalid_delete_filter_keeps_user_facing_error(self) -> None:
@@ -1,11 +1,29 @@
 from __future__ import annotations
 import unittest
 from unittest.mock import patch
 from chromy.embedding import embed
 class EmbedTest(unittest.TestCase):
-    def test_embed_function(self) -> None:
+    def test_embed_returns_empty_list_for_empty_chunks(self) -> None:
-        self.assertEqual(0, 0)
+        self.assertEqual(embed([]), [])
    def test_embed_pairs_text_with_list_embeddings(self) -> None:
        with patch(
            "chromy.embedding.service.DefaultEmbeddingFunction",
            return_value=lambda chunks: ((1.0, 2.0), (3.0, 4.0)),
        ):
            result = embed(["first", "second"])
        self.assertEqual(
            result,
            [
                {"text": "first", "embedding": [1.0, 2.0]},
                {"text": "second", "embedding": [3.0, 4.0]},
            ],
        )
 if __name__ == "__main__":
@@ -6,15 +6,15 @@ from collections.abc import Callable
 from contextlib import redirect_stdout
 from pathlib import Path
 from typing import TypeVar
-from unittest.mock import patch
+from unittest.mock import MagicMock, patch
-from chromy.handlers.import_data import handle_import
 from chromy.handlers.count_collection import handle_count_collection
 from chromy.handlers.create_collection import handle_create_collection
 from chromy.handlers.delete_collection import (
    handle_delete_collection,
    handle_delete_records,
 )
+from chromy.handlers.import_data import handle_import
 from chromy.handlers.list_collections import handle_list_collections
 from chromy.handlers.query import handle_query
@@ -47,7 +47,7 @@ class HandlerTests(unittest.TestCase):
            )
        self.assertEqual(exit_code, 0)
-        self.assertEqual(output, "notes\nplays\n")
+        self.assertEqual(output, "· notes\n· plays\n")
    def test_create_collection_uses_typed_input(self) -> None:
        with patch(
@@ -86,7 +86,7 @@ class HandlerTests(unittest.TestCase):
        count.assert_called_once_with("notes")
        self.assertEqual(exit_code, 0)
-        self.assertEqual(output, "7\n")
+        self.assertEqual(output, "The 'notes' collection contains 7 records.\n")
    def test_import_data_uses_typed_input(self) -> None:
        with patch(
@@ -96,7 +96,7 @@ class HandlerTests(unittest.TestCase):
            exit_code, output = _capture_output(
                handle_import,
                "notes",
-                "romeo_and_juliet.txt",
+                ["romeo_and_juliet.txt"],
            )
        ingest_file.assert_called_once_with(
@@ -104,7 +104,132 @@ class HandlerTests(unittest.TestCase):
            self._fixture_path("romeo_and_juliet.txt"),
        )
        self.assertEqual(exit_code, 0)
-        self.assertEqual(output, "Added 3 records to collection 'notes'.\n")
+        self.assertEqual(
            output,
            "Added 3 records from 'romeo_and_juliet.txt' to collection 'notes'.\n"
            "Imported 1 file(s) successfully; 0 failed.\n",
        )
    def test_import_data_continues_after_missing_file(self) -> None:
        with patch(
            "chromy.handlers.import_data.ingest_file",
            return_value=3,
        ) as ingest_file:
            exit_code, output = _capture_output(
                handle_import,
                "notes",
                ["missing.txt", "romeo_and_juliet.txt"],
            )
        ingest_file.assert_called_once_with(
            "notes",
            self._fixture_path("romeo_and_juliet.txt"),
        )
        self.assertEqual(exit_code, 1)
        self.assertEqual(
            output,
            "Error: The file 'missing.txt' was not found.\n"
            "Added 3 records from 'romeo_and_juliet.txt' to collection 'notes'.\n"
            "Imported 1 file(s) successfully; 1 failed.\n",
        )
    def test_import_data_rejects_non_text_files(self) -> None:
        with patch(
            "chromy.handlers.import_data.is_probably_text_file",
            return_value=False,
        ):
            exit_code, output = _capture_output(
                handle_import,
                "notes",
                ["romeo_and_juliet.txt"],
            )
        self.assertEqual(exit_code, 1)
        self.assertEqual(
            output,
            "Error: The file 'romeo_and_juliet.txt' is not a text file.\n"
            "Imported 0 file(s) successfully; 1 failed.\n",
        )
    def test_import_data_deduplicates_files(self) -> None:
        with patch(
            "chromy.handlers.import_data.ingest_file",
            return_value=3,
        ) as ingest_file:
            exit_code, output = _capture_output(
                handle_import,
                "notes",
                ["README.md", "./README.md"],
            )
        ingest_file.assert_called_once_with(
            "notes",
            self._fixture_path("README.md"),
        )
        self.assertEqual(exit_code, 0)
        self.assertEqual(
            output,
            "Added 3 records from 'README.md' to collection 'notes'.\n"
            "Imported 1 file(s) successfully; 0 failed.\n",
        )
    def test_import_data_suppresses_per_file_output_with_progress(self) -> None:
        progress = MagicMock()
        progress.__enter__.return_value = progress
        progress.__exit__.return_value = None
        progress.console.print = print
        progress.add_task.return_value = 1
        with (
            patch("chromy.handlers.import_data.ingest_file", side_effect=[3, 2]),
            patch(
                "chromy.handlers.import_data._should_show_progress",
                return_value=True,
            ),
            patch("chromy.handlers.import_data.Progress", return_value=progress),
        ):
            exit_code, output = _capture_output(
                handle_import,
                "notes",
                ["romeo_and_juliet.txt", "README.md"],
            )
        self.assertEqual(exit_code, 0)
        self.assertEqual(output, "Imported 2 file(s) successfully; 0 failed.\n")
    def test_import_data_truncates_long_file_names_in_progress(self) -> None:
        progress = MagicMock()
        progress.__enter__.return_value = progress
        progress.__exit__.return_value = None
        progress.console.print = print
        progress.add_task.return_value = 1
        with (
            patch(
                "chromy.handlers.import_data._get_absolute_path",
                side_effect=[
                    "/tmp/this_is_a_very_long_file_name.txt",
                    self._fixture_path("README.md"),
                    "/tmp/this_is_a_very_long_file_name.txt",
                    self._fixture_path("README.md"),
                ],
            ),
            patch("chromy.handlers.import_data._import_one", return_value=3),
            patch(
                "chromy.handlers.import_data._should_show_progress",
                return_value=True,
            ),
            patch("chromy.handlers.import_data.Progress", return_value=progress),
        ):
            handle_import(
                "notes",
                ["this_is_a_very_long_file_name.txt", "README.md"],
            )
        progress.update.assert_any_call(
            1,
            description="Importing [bold]this_is_a_very_lo...[/]...",
        )
    def test_query_uses_typed_input(self) -> None:
        query_result = {"ids": [["1"]], "documents": [["hello"]]}
@@ -154,7 +279,7 @@ class HandlerTests(unittest.TestCase):
 def _capture_output(
    handler: Callable[..., int],
-    *arguments: CommandT,
+    *arguments: object,
 ) -> tuple[int, str]:
    output = io.StringIO()
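Note on the handler tests above: two behaviours they pin down are path deduplication (`README.md` and `./README.md` count as one file) and truncation of long file names in the progress description (`this_is_a_very_lo...`). The sketch below is one way such helpers could look. The names `dedupe_paths` and `truncate_name` and the 20-character limit are assumptions inferred from the expected test output, not the actual code in `chromy/handlers/import_data.py`.

```python
import os


def dedupe_paths(paths: list[str]) -> list[str]:
    """Return absolute paths in first-seen order, dropping duplicates.

    Hypothetical helper: os.path.abspath normalizes './README.md' and
    'README.md' to the same string, so both collapse into one entry.
    """
    seen: set[str] = set()
    unique: list[str] = []
    for path in paths:
        absolute = os.path.abspath(path)
        if absolute not in seen:
            seen.add(absolute)
            unique.append(absolute)
    return unique


def truncate_name(path: str, max_length: int = 20) -> str:
    """Shorten the base name of a path for display.

    Hypothetical helper: the limit of 20 characters (17 kept plus '...')
    is an assumption read off the expected progress description.
    """
    name = os.path.basename(path)
    if len(name) <= max_length:
        return name
    return name[: max_length - 3] + "..."


print(dedupe_paths(["README.md", "./README.md"]))
print(truncate_name("/tmp/this_is_a_very_long_file_name.txt"))
```

Deduplicating on the absolute path rather than the raw argument is what would let the handler call `ingest_file` only once for the two spellings of the same file.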
@@ -0,0 +1,67 @@
 from __future__ import annotations
 import unittest
 from unittest.mock import MagicMock, call, patch
 from chromy.utilities import ingest_file
 class UtilityTests(unittest.TestCase):
    def test_ingest_file_adds_new_file_without_deleting(self) -> None:
        chunks = ["chunk 1", "chunk 2"]
        embeddings = [
            {"text": "chunk 1", "embedding": [0.1, 0.2]},
            {"text": "chunk 2", "embedding": [0.3, 0.4]},
        ]
        with (
            patch("chromy.utilities.has_data_for_file", return_value=False) as has_data,
            patch("chromy.utilities.delete_data") as delete_data,
            patch("chromy.utilities.chunk_file", return_value=chunks) as chunk_file,
            patch("chromy.utilities.embed", return_value=embeddings) as embed,
            patch("chromy.utilities.add_data") as add_data,
        ):
            records_added = ingest_file("notes", "/tmp/play.txt")
        has_data.assert_called_once_with("notes", "/tmp/play.txt")
        delete_data.assert_not_called()
        chunk_file.assert_called_once_with("/tmp/play.txt")
        embed.assert_called_once_with(chunks)
        add_data.assert_called_once_with("notes", embeddings, "/tmp/play.txt")
        self.assertEqual(records_added, 2)
    def test_ingest_file_replaces_existing_file_records_before_adding(self) -> None:
        chunks = ["chunk 1"]
        embeddings = [{"text": "chunk 1", "embedding": [0.1, 0.2]}]
        manager = MagicMock()
        with (
            patch("chromy.utilities.has_data_for_file", return_value=True) as has_data,
            patch("chromy.utilities.delete_data") as delete_data,
            patch("chromy.utilities.chunk_file", return_value=chunks) as chunk_file,
            patch("chromy.utilities.embed", return_value=embeddings) as embed,
            patch("chromy.utilities.add_data") as add_data,
        ):
            manager.attach_mock(has_data, "has_data")
            manager.attach_mock(delete_data, "delete_data")
            manager.attach_mock(chunk_file, "chunk_file")
            manager.attach_mock(embed, "embed")
            manager.attach_mock(add_data, "add_data")
            records_added = ingest_file("notes", "/tmp/play.txt")
        self.assertEqual(
            manager.mock_calls,
            [
                call.has_data("notes", "/tmp/play.txt"),
                call.delete_data("notes", {"file_name": "/tmp/play.txt"}),
                call.chunk_file("/tmp/play.txt"),
                call.embed(chunks),
                call.add_data("notes", embeddings, "/tmp/play.txt"),
            ],
        )
        self.assertEqual(records_added, 1)
 if __name__ == "__main__":
    unittest.main()
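Note on the utility tests above: they fix the order of operations for a re-import, namely check for existing records, delete them if present, then chunk, embed, and add. A minimal sketch of that flow follows, with the collaborators passed in as parameters so it runs on its own. The function name `ingest_file_sketch`, the keyword parameters, and the choice to return `len(embeddings)` are assumptions for illustration. The real `chromy.utilities.ingest_file` takes only the collection name and file path and may compute its return value differently.

```python
from collections.abc import Callable
from typing import Any


def ingest_file_sketch(
    collection: str,
    file_path: str,
    *,
    has_data_for_file: Callable[[str, str], bool],
    delete_data: Callable[[str, dict[str, str]], Any],
    chunk_file: Callable[[str], list[str]],
    embed: Callable[[list[str]], list[dict[str, Any]]],
    add_data: Callable[[str, list[dict[str, Any]], str], Any],
) -> int:
    """Replace any existing records for file_path, then add fresh ones."""
    # Delete first so a re-import never leaves stale chunks behind.
    if has_data_for_file(collection, file_path):
        delete_data(collection, {"file_name": file_path})
    chunks = chunk_file(file_path)
    embeddings = embed(chunks)
    add_data(collection, embeddings, file_path)
    # Assumption: the number of records added equals the number of embeddings.
    return len(embeddings)


# Usage with recording fakes that mirror the mocks in the tests above.
calls: list[str] = []
records_added = ingest_file_sketch(
    "notes",
    "/tmp/play.txt",
    has_data_for_file=lambda c, f: calls.append("has_data") or True,
    delete_data=lambda c, where: calls.append("delete_data"),
    chunk_file=lambda f: calls.append("chunk_file") or ["chunk 1"],
    embed=lambda chunks: calls.append("embed")
    or [{"text": "chunk 1", "embedding": [0.1, 0.2]}],
    add_data=lambda c, e, f: calls.append("add_data"),
)
print(records_added, calls)
```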
Author	SHA1	Message	CI status	Date
mrosati	28ec29f8af	add progress bar when importing multiple files	build: successful in 11s; pytest: failing after 28s	2026-05-01 15:45:41 +02:00
mrosati	fb62d1b539	refactor chunking and embedding into their own modules	build: successful in 45s; pytest: successful in 26s	2026-05-01 11:01:30 +02:00
Matteo Rosati	26df98c08e	add multi-file import support	build: successful in 9s; pytest: successful in 26s	2026-04-29 15:39:42 +02:00
Matteo Rosati	74e48fbcd5	replace existing file records on re-import	build: successful in 9s; pytest: successful in 25s	2026-04-29 14:46:41 +02:00
Matteo Rosati	d1b1238897	decouple core data from CLI formatting	build: successful in 49s; pytest: successful in 30s	2026-04-29 12:44:28 +02:00
mrosati	615ab14a1a	add skill	build: successful in 9s; pytest: successful in 24s	2026-04-24 22:49:36 +02:00
mrosati	508d036815	add logo, update README	build: successful in 35s; pytest: successful in 27s	2026-04-24 22:46:09 +02:00
mrosati	292d0eb139	update agents file	build: successful in 10s; pytest: successful in 24s	2026-04-24 18:48:37 +02:00
mrosati	d71fce7a6a	cannot import non-text files!	build: successful in 39s; pytest: successful in 35s	2026-04-24 18:40:51 +02:00
mrosati	c6ad060e85	fix types and print middle dot in collections list		2026-04-24 18:28:03 +02:00
mrosati	c5b6b196b5	fix syntax and types		2026-04-24 18:23:02 +02:00
mrosati	948f8500be	types cleanup		2026-04-24 18:20:22 +02:00