Memory Architecture

DuckLM currently has two memory layers:

  • SQLite memory in duck_core.memory.store.MemoryStore for durable structured records.
  • Vector memory in duck_core.memory.vector_memory.VectorMemory for semantic search through Qdrant.

SQLite Memory

SQLite is the primary durable store. The runtime writes a memory record once memory_policy decides that a completed task contains reusable information. Records can also be added manually through /v1/memory and the WebChat memory drawer.

SQLite memory remains available even when Qdrant is down.
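The independence of the two layers can be illustrated with a small sketch. It is not the MemoryStore implementation: the table layout, save_memory, and VectorStoreDown are hypothetical names chosen for illustration. It only shows the ordering that the text implies, where the durable SQLite write happens first and a failing vector write does not undo it.

```python
import sqlite3
from typing import Callable, Optional, Tuple


class VectorStoreDown(Exception):
    """Hypothetical error for an unreachable vector backend."""


def save_memory(
    conn: sqlite3.Connection,
    summary: str,
    vector_write: Optional[Callable[[int, str], None]] = None,
) -> Tuple[int, bool]:
    """Write to SQLite first, then try the vector write on a best-effort basis.

    Returns (row_id, vector_written).
    """
    cur = conn.execute("INSERT INTO memories (summary) VALUES (?)", (summary,))
    conn.commit()  # the durable record exists before any vector work starts
    row_id = cur.lastrowid
    if vector_write is None:
        return row_id, False
    try:
        vector_write(row_id, summary)
    except VectorStoreDown:
        # The SQLite row is kept; only the vector copy is missing.
        return row_id, False
    return row_id, True


def failing_vector_write(row_id: int, summary: str) -> None:
    """Simulates Qdrant being down."""
    raise VectorStoreDown("Qdrant is not reachable")


# Hypothetical one-table schema, in memory so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, summary TEXT NOT NULL)")

row_id, vector_written = save_memory(
    conn, "User prefers short answers", failing_vector_write
)
stored = conn.execute(
    "SELECT summary FROM memories WHERE id = ?", (row_id,)
).fetchone()[0]
```

Here the record is still readable from SQLite even though the simulated vector write failed.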

Vector Memory

Vector memory writes the same memory summaries to Qdrant whenever vector storage is configured and reachable. The local service scripts manage Qdrant:

bash scripts/duck.sh start
bash scripts/duck.sh status --probe
bash scripts/duck.sh stop

The MTP stack uses the same memory lifecycle through scripts/duck-mtp.sh.

Embeddings

The default embedding source is a local sentence-transformers model:

./models/all-MiniLM-L6-v2

VectorMemory lazy-loads that model only when it needs to write or search vectors. Health checks do not load the embedding model; they only probe Qdrant.
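The lazy-loading behavior can be sketched roughly as follows. LazyEmbedder and load_stub_model are hypothetical names, and the stub encoder stands in for the real model so the example runs without sentence-transformers installed. It is a sketch of the pattern, not the VectorMemory code.

```python
from typing import Callable, List, Optional, Sequence

Encoder = Callable[[Sequence[str]], List[List[float]]]


class LazyEmbedder:
    """Defers model loading until the first embed() call."""

    def __init__(self, load_model: Callable[[], Encoder]) -> None:
        self._load_model = load_model
        self._encode: Optional[Encoder] = None
        self.load_count = 0

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        if self._encode is None:
            # First write or search: pay the load cost once.
            self._encode = self._load_model()
            self.load_count += 1
        return self._encode(texts)

    def health(self) -> dict:
        # Reports state only; never triggers a model load.
        return {"model_loaded": self._encode is not None}


def load_stub_model() -> Encoder:
    # Stand-in loader. With sentence-transformers installed, a real loader
    # could return SentenceTransformer("./models/all-MiniLM-L6-v2").encode.
    def encode(texts: Sequence[str]) -> List[List[float]]:
        # Toy two-dimensional "embedding": text length and space count.
        return [[float(len(t)), float(t.count(" "))] for t in texts]

    return encode


embedder = LazyEmbedder(load_stub_model)
before = embedder.health()            # model not loaded yet
vectors = embedder.embed(["hello world"])
embedder.embed(["again"])             # reuses the already loaded model
after = embedder.health()
```

The health call before the first embed reports that no model is loaded, which mirrors the statement that health checks do not load the embedding model.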

A remote OpenAI-compatible embeddings endpoint can be used by setting embeddings_base_url, but the normal local stack does not rely on llama-server embeddings.

If no embedding source is configured, VectorMemory raises EmbeddingsUnavailableError. It does not silently invent fallback embeddings.
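A minimal sketch of the source selection, under two assumptions that the text does not confirm: that a configured embeddings_base_url takes precedence over the local model path, and that the choice can be expressed as a single function. resolve_embedding_source is a hypothetical helper, and the exception class here is a local stand-in for the one named above.

```python
from typing import Optional, Tuple


class EmbeddingsUnavailableError(RuntimeError):
    """Local stand-in for the error named in the text."""


def resolve_embedding_source(
    local_model_path: Optional[str],
    embeddings_base_url: Optional[str],
) -> Tuple[str, str]:
    """Pick an embedding source or fail loudly.

    Assumption: a configured remote endpoint wins over the local model.
    """
    if embeddings_base_url:
        return ("remote", embeddings_base_url)
    if local_model_path:
        return ("local", local_model_path)
    # No fallback vectors are invented; the caller has to handle the error.
    raise EmbeddingsUnavailableError("no embedding source is configured")


default_source = resolve_embedding_source("./models/all-MiniLM-L6-v2", None)

try:
    resolve_embedding_source(None, None)
    raised = False
except EmbeddingsUnavailableError:
    raised = True
```

Failing with an explicit error keeps vectors of unknown origin out of the Qdrant collection, where they would otherwise degrade search results without any visible sign.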

Status And Verification

Runtime status is available through:

curl --noproxy '*' 'http://127.0.0.1:8000/v1/status?probe=true'
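The same request can be made from Python with the standard library. This is a sketch that assumes the endpoint returns JSON; the response fields are not documented here, so the code does not inspect them. build_status_url and fetch_status are hypothetical helper names. The empty ProxyHandler plays the role of curl's --noproxy '*'.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional


def build_status_url(base_url: str = "http://127.0.0.1:8000", probe: bool = True) -> str:
    """Build the status URL; the probe flag is only added when requested."""
    url = f"{base_url.rstrip('/')}/v1/status"
    if probe:
        url += "?" + urllib.parse.urlencode({"probe": "true"})
    return url


def fetch_status(url: str, timeout: float = 5.0) -> Optional[Any]:
    """Return the decoded JSON body, or None if the backend is unreachable."""
    # An empty ProxyHandler ignores proxy settings from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers refused connections, HTTP errors and timeouts;
        # ValueError covers a body that is not valid JSON.
        return None
```

Calling fetch_status(build_status_url()) while the local stack is running should return the decoded status payload; with the stack stopped it returns None.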

scripts/duck.sh status --probe prints the same backend result plus the Docker Compose state for Qdrant. WebChat also shows the model and vector memory state in the Runtime panel.

The live smoke test for Qdrant write/search is:

.venv/bin/python -m pytest tests/smoke/test_vector_memory_live.py -q

The test is skipped when Qdrant is not reachable and runs a real add/search cycle when the local stack is up.
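The skip-when-unreachable pattern can be sketched as below. This is not the content of test_vector_memory_live.py: it swaps in a plain TCP probe, assumes Qdrant listens on its default HTTP port 6333 on localhost, and uses unittest-style skipping, which pytest also collects. qdrant_reachable and the test class are hypothetical names.

```python
import socket
import unittest


def qdrant_reachable(host: str = "127.0.0.1", port: int = 6333, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unroutable: treat all as "not reachable".
        return False


@unittest.skipUnless(qdrant_reachable(), "Qdrant is not reachable")
class VectorMemoryLiveSketch(unittest.TestCase):
    def test_backend_accepts_connections(self) -> None:
        # A real live test would add a record and search for it here.
        self.assertTrue(qdrant_reachable())
```

Because the probe runs at import time, a stopped stack produces a skipped test rather than a failure, so the smoke test can stay in the regular test run.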