# Memory Architecture
DuckLM currently has two memory layers:
- SQLite memory in `duck_core.memory.store.MemoryStore` for durable structured records.
- Vector memory in `duck_core.memory.vector_memory.VectorMemory` for semantic search through Qdrant.
## SQLite Memory
SQLite is the primary durable store. The runtime writes memory records after
`memory_policy` decides that a completed task contains reusable information.
Manual memory records can also be added through `/v1/memory` and the WebChat
memory drawer.

SQLite memory remains available even when Qdrant is down.
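
The relationship between the two layers can be sketched as a write path that commits to SQLite first and treats the vector write as best-effort. This is a minimal illustration, not the `MemoryStore` implementation: the `memory` table, the `remember` function, and the `VectorStoreDown` error are hypothetical names chosen for the example.

```python
import sqlite3


class VectorStoreDown(Exception):
    """Hypothetical error raised when the vector backend is unreachable."""


def remember(conn, summary, vector_add=None):
    """Write to SQLite first; treat the vector write as best-effort."""
    cur = conn.execute("INSERT INTO memory (summary) VALUES (?)", (summary,))
    conn.commit()  # the durable record exists from this point on
    indexed = False
    if vector_add is not None:
        try:
            vector_add(cur.lastrowid, summary)
            indexed = True
        except VectorStoreDown:
            pass  # SQLite stays usable even though indexing failed
    return cur.lastrowid, indexed


def failing_vector_add(record_id, summary):
    """Simulates an unreachable vector backend."""
    raise VectorStoreDown("qdrant unreachable")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory (id INTEGER PRIMARY KEY, summary TEXT NOT NULL)")
record_id, indexed = remember(conn, "user prefers metric units", failing_vector_add)
rows = conn.execute("SELECT id, summary FROM memory").fetchall()
print(record_id, indexed, rows)
```

In this sketch the record is still readable from SQLite after the simulated vector failure, which is the property the section describes.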
## Vector Memory
Vector memory stores the same useful memory summaries in Qdrant when vector
storage is configured and reachable. Qdrant is managed by the local service
scripts:
```bash
bash scripts/duck.sh start
bash scripts/duck.sh status --probe
bash scripts/duck.sh stop
```
The MTP stack uses the same memory lifecycle through `scripts/duck-mtp.sh`.
## Embeddings
The default embedding source is a local `sentence-transformers` model:
```text
./models/all-MiniLM-L6-v2
```
`VectorMemory` lazy-loads that model only when it needs to write or search
vectors. Health checks do not load the embedding model; they only probe Qdrant.

A remote OpenAI-compatible embeddings endpoint can be used by setting
`embeddings_base_url`, but the normal local stack does not rely on
`llama-server` embeddings.

If no embedding source is configured, `VectorMemory` raises
`EmbeddingsUnavailableError`. It does not silently invent fallback embeddings.
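
The lazy-loading and fail-loudly behaviour described above can be illustrated with a small stand-alone sketch. It is not the `VectorMemory` code: `LazyEmbedder`, `FakeModel`, and the `loader` parameter are hypothetical, and `EmbeddingsUnavailableError` is redefined here only so the example runs on its own.

```python
class EmbeddingsUnavailableError(RuntimeError):
    """Stand-in for the error named above; the real class may differ."""


class LazyEmbedder:
    """Hypothetical helper: defers model loading until the first embed call."""

    def __init__(self, loader=None):
        # loader: zero-argument callable returning an object with .encode(texts).
        # With sentence-transformers installed it could be, for example:
        #   lambda: SentenceTransformer("./models/all-MiniLM-L6-v2")
        self._loader = loader
        self._model = None
        self.load_count = 0

    def _ensure_model(self):
        if self._model is None:
            if self._loader is None:
                # No silent fallback: fail loudly instead of inventing vectors.
                raise EmbeddingsUnavailableError("no embedding source configured")
            self._model = self._loader()
            self.load_count += 1
        return self._model

    def embed(self, texts):
        return self._ensure_model().encode(texts)

    def health(self):
        # Reports state without triggering a model load.
        return {"model_loaded": self._model is not None}


class FakeModel:
    """Toy model for the demo: a one-dimensional 'embedding' of the text length."""

    def encode(self, texts):
        return [[float(len(text))] for text in texts]


embedder = LazyEmbedder(loader=FakeModel)
before = embedder.health()          # model not loaded yet
vectors = embedder.embed(["duck", "lm"])
after = embedder.health()           # model loaded by the first embed call
print(before, vectors, after, embedder.load_count)
```

Calling `health()` before `embed()` leaves the model unloaded, and constructing `LazyEmbedder()` without a loader makes the first `embed()` call raise instead of returning made-up vectors.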
## Status And Verification
Runtime status is available through:
```bash
curl --noproxy '*' 'http://127.0.0.1:8000/v1/status?probe=true'
```
`scripts/duck.sh status --probe` prints the same backend result plus Docker
Compose state for Qdrant. WebChat also shows model and vector memory state in
the Runtime panel.
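
For scripted checks, the same request can be made from Python with the standard library. This is a sketch under assumptions: the helper names `status_url` and `fetch_status` are made up for the example, the endpoint is assumed to return JSON, and omitting the `probe` parameter when `probe=False` is a guess about the API. No response fields are assumed.

```python
import json
import urllib.parse
import urllib.request


def status_url(base_url="http://127.0.0.1:8000", probe=True):
    """Build the status URL; the probe flag mirrors the curl example."""
    url = f"{base_url.rstrip('/')}/v1/status"
    if probe:
        url += "?" + urllib.parse.urlencode({"probe": "true"})
    return url


def fetch_status(base_url="http://127.0.0.1:8000", probe=True, timeout=5.0):
    """Fetch and decode the status document (assumed to be JSON)."""
    # An empty ProxyHandler ignores proxy settings, similar to curl --noproxy '*'.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(status_url(base_url, probe), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the local stack running, `fetch_status()` returns the decoded payload; when the backend is down, `urllib` raises `urllib.error.URLError`, which a caller can catch and report.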

Run the live smoke test for Qdrant write and search with:
```bash
.venv/bin/python -m pytest tests/smoke/test_vector_memory_live.py -q
```
The test is skipped when Qdrant is not reachable and runs a real add/search
cycle when the local stack is up.
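
A skip-when-unreachable guard of this kind can be sketched with a plain TCP check. This is not the code in `tests/smoke/test_vector_memory_live.py`: `qdrant_reachable` and `require_qdrant` are hypothetical helpers, and the host and port are assumptions (6333 is Qdrant's default HTTP port; the project may configure something else). Inside a pytest test one would more commonly call `pytest.skip(...)`; the sketch raises the standard-library `unittest.SkipTest` so that it has no third-party imports.

```python
import socket
import unittest


def qdrant_reachable(host="127.0.0.1", port=6333, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def require_qdrant(host="127.0.0.1", port=6333):
    """Skip the calling test when Qdrant cannot be reached."""
    if not qdrant_reachable(host, port):
        raise unittest.SkipTest(f"Qdrant not reachable at {host}:{port}")
```

A live test would call `require_qdrant()` as its first statement and then run the real add/search cycle.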