Integrate memory policy and reflection runtime

This commit is contained in:
mirivlad 2026-05-21 22:40:11 +08:00
parent 9a8f058008
commit 6f58df82c9
21 changed files with 2014 additions and 60 deletions

462
CURRENT_STATE.md Normal file
View File

@ -0,0 +1,462 @@
# DuckLM — Текущее состояние проекта
**Дата анализа:** 2026-05-21
**Версия:** 0.2.0
**Расположение:** `~/git/ducklm_2`
---
## Последние изменения (Phase 1-7)
### Phase 1: MemoryPolicy — LLM-классификация памяти ✅
- `duck_core/memory/policy.py` — переписан с нуля: LLM-классификация через critic-роль
- Роль `memory_policy` добавлена в `config/models.yaml` + промпт
- Интегрирован в RuntimeLoop: `_run_memory_policy()` после каждой задачи
- События: `memory_policy_decision`, `memory_stored`, `memory_policy_failed`
- 6 тестов в `tests/smoke/test_memory_policy.py`
### Phase 2: Рефлексия (Critic) — автоматический вызов ✅
- `_run_reflection()` в RuntimeLoop — transcript из event store → critic → experience
- Параметр `reflect: bool = True` в `run_chat()`
- `ExperienceRecorder` передаётся через `create_app()`
- События: `reflection_completed`, `reflection_failed`
- 3 теста в `tests/smoke/test_reflection.py`
### Phase 3-4: ContextBuilder v2 + Summary-роль ✅
- Полностью переписан `duck_core/context_builder.py`
- Token budget awareness, приоритизация, суммаризация через summary-роль
- `estimate_tokens()`, `estimate_messages_tokens()` утилиты
- Подключён model_client для LLM-суммаризации
- 11 тестов в `tests/smoke/test_context_builder.py`
### Phase 5: VectorMemory — интеграция ✅
- `VectorMemory` добавлен в RuntimeLoop и `create_app()`
- **Локальная модель эмбеддингов**: `all-MiniLM-L6-v2` (384 размерности, sentence-transformers)
- При `memory_stored` также сохраняется в Qdrant (graceful fallback при ошибках)
- Поддержка двух режимов: локальная модель + remote `/v1/embeddings` endpoint
- `sentence-transformers` добавлен в зависимости `pyproject.toml`
- 4 теста в `tests/smoke/test_vector_memory_integration.py`
### Embeddings — архитектура
### Phase 6: Recall-роль ✅
- Роль `recall` добавлена в `config/models.yaml` + промпт `prompts/roles/recall.md`
- `ContextBuilder.recall_relevant_memory()` — LLM-фильтрация релевантных воспоминаний
- Интегрирован в `/v1/chat` endpoint
### Phase 7: Coder-роль — интеграция ✅
- `CoderTool` создан в `duck_core/tools/coder.py`
- Зарегистрирован в `ToolGateway.default()`
- Описан в `prompts/roles/action.md`
### Статус тестов
- **72 из 73** smoke-тестов проходят
- 1 тест (`test_llama_server_connection_live_skip_by_env`) требует живой llama-server
---
## 1. Что такое DuckLM
DuckLM — это **локальная агентная система (cognitive runtime)**, работающая поверх локальных языковых моделей через `llama-server`. Это не inference-сервер, а полноценный когнитивный цикл:
```
состояние → контекст → мышление → намерение → действие → наблюдение → рефлексия → память → опыт
```
Ключевая идея: DuckLM — это **оркестратор**, который управляет задачами, инструментами, памятью, навыками и рефлексией, используя локальные LLM через OpenAI-совместимый API (`llama-server`).
---
## 2. Архитектурные принципы
### 2.1. Использование готовых компонентов
| Компонент | Источник |
|-----------|----------|
| LLM inference | `llama-server` (llama.cpp, собранный с Vulkan) |
| Хранение состояния | SQLite (aiosqlite) |
| Векторная память | Qdrant (через docker-compose) |
| HTTP API | FastAPI |
| Web-интерфейс | Jinja2 + ванильный JS |
| Валидация данных | Pydantic |
| Конфигурация | PyYAML + python-dotenv |
**Не пишется с нуля:** inference server, model scheduler, vector DB, OpenAI API, MCP, песочница, workflow engine.
**Пишется с нуля:** Duck Core (runtime loop, context builder, model client, event store, tool gateway, approvals, skills, experience, memory policy, FastAPI API, WebChat).
### 2.2. Web/API first
- **WebChat** — интерфейс для человека (порт 8000)
- **HTTP API** — для кодера, тестов и внешних агентов
- CLI не входит в обязательную часть (если понадобится — тонкий клиент поверх HTTP API)
### 2.3. Роли моделей — логические, не физические
Роли: `thinker`, `critic`, `coder`, `action`, `summary`, `recall`, `sys_util`.
Все роли в текущей конфигурации указывают на одну физическую модель (`local-main` на порту 8081). Различие между ролями задаётся комбинацией:
- system prompt
- temperature
- max_output_tokens
- response_format / structured_output
- memory scope
- tool permissions
### 2.4. Token budget
```
DUCK_CTX_SIZE=4096 (в .env, хотя в коде дефолт 65536)
DUCK_MAX_INPUT_TOKENS=49152
DUCK_MAX_RECENT_EVENTS_TOKENS=12000
DUCK_MAX_MEMORY_TOKENS=8000
DUCK_MAX_SKILL_TOKENS=6000
```
Output limits по ролям:
- thinker: 8192
- critic: 4096
- coder: 16384
- action: 2048
- summary: 4096
---
## 3. Целевая архитектура (из ТЗ)
```
┌─────────────┐
│ WebChat │ ← интерфейс человека
└──────┬──────┘
┌─────────────┐
│ FastAPI │ ← интерфейс кодера, тестов и агентов
└──────┬──────┘
┌─────────────────────────────────────┐
│ Duck Core │
│ RuntimeLoop, TaskState, │
│ ContextBuilder, ModelClient, │
│ SkillRegistry, ToolGateway, │
│ ApprovalService, Reflection, │
│ MemoryPolicy, ExperienceRecorder │
└──────┬──────────────┬───────────────┘
│ │
▼ ▼
┌────────────┐ ┌──────────────────┐
│llama-server│ │ SQLite/PostgreSQL│
│OpenAI-comp.│ │ events/tasks/ │
└────────────┘ │ approvals │
│ └──────────────────┘
┌────────────┐
│ Qdrant │ ← semantic memory
└────────────┘
```
---
## 4. Что реализовано (текущее состояние)
### 4.1. Полностью реализованные компоненты
#### Конфигурация и настройки
- **`duck_core/config.py`** — `Settings` dataclass, загрузка из `.env`, кэширование через `lru_cache`
- **`config/models.yaml`** — 5 ролей (thinker, critic, coder, action, summary), все на `local-main` (порт 8081)
- **`.env`** / **`.env.example`** — полная конфигурация путей, портов, GPU, Qdrant
#### ModelClient
- **`duck_core/model_client.py`** — ролевая маршрутизация вызовов к llama-server
- `chat()` — синхронный вызов с измерением latency, usage
- `stream_chat()` — streaming через SSE (reasoning_delta + content_delta)
- `ping()` — проверка доступности всех ролей
- Автоматическая подстановка system prompt из файла
- Автоматический `response_format: json_schema` для action-роли
#### Хранение состояния (SQLite)
- **`duck_core/tasks/state.py`** — `TaskState` (Pydantic модель)
- **`duck_core/tasks/store.py`** — `TaskStore`: create, update_status, complete, fail, cancel, waiting_for_approval, get, list
- **`duck_core/events/store.py`** — `EventStore`: append (с авто-increment sequence), list_events, list_by_type
- **`duck_core/conversations/store.py`** — `ConversationStore`: create, ensure, get, list, add_message, list_messages, get_conversation_id_for_task
- **`duck_core/approvals/service.py`** — `ApprovalService`: create_pending, pending, get, allow_once, allow_forever, deny, is_allowed_forever
- **`duck_core/experience/recorder.py`** — `ExperienceRecorder`: record, list_records, get_record, write_skill_update_proposal
- **`duck_core/memory/store.py`** — `MemoryStore`: add, list, search (LIKE), relevant (scope-aware), infer_scope, _normalize_scope
#### Runtime Loop
- **`duck_core/runtime_loop.py`** — ядро когнитивного цикла:
- `run_chat()` — полный цикл: создание задачи → action loop → thinker → завершение
- `continue_after_approval()` — продолжение после одобрения действия
- `_run_action_loop()` — итеративный цикл вызова инструментов (max 4 итерации)
- `_run_action_tools()` — парсинг action directive от модели → вызов ToolGateway
- `_append_command_audit()` — аудит shell-команд через event store
- Обработка requires_approval → пауза с ожиданием решения пользователя
#### Context Builder
- **`duck_core/context_builder.py`** — `ContextBuilder.build_basic_messages()`: собирает сообщения из memory records, history и текущего user message
#### Tools
- **`duck_core/tools/base.py`** — `ToolResult` (Pydantic), `Tool` (Protocol)
- **`duck_core/tools/gateway.py`** — `ToolGateway`: маршрутизация action → конкретный инструмент
- **`duck_core/tools/file_read.py`** — `FileReadTool`: чтение файлов внутри workspace, запрет .env/.ssh/shadow
- **`duck_core/tools/file_write.py`** — `FileWriteTool`: запись внутри workspace, защита от перезаписи
- **`duck_core/tools/list_dir.py`** — `ListDirTool`: листинг директории внутри workspace
- **`duck_core/tools/search_files.py`** — `SearchFilesTool`: текстовый поиск по файлам (glob, case_sensitive)
- **`duck_core/tools/shell_exec_safe.py`** — `ShellExecSafeTool`: allowlist + blocklist + approval
- **`duck_core/tools/command_policy.py`** — `CommandPolicy`: классификация команд (readonly, system, destructive, dangerous fragments)
- **`duck_core/tools/paths.py`** — `resolve_workspace_path()`: защита от path traversal
#### Approvals
- **`duck_core/approvals/service.py`** — полный цикл согласований:
- Создание pending approval с SHA256-хешем действия
- Решения: allow_once, allow_forever, deny
- Проверка is_allowed_forever по хешу действия
- normalized_action хранится в JSON
#### Skills
- **`duck_core/skills/registry.py`** — `SkillRegistry`: загрузка из `*/skill.yaml`, парсинг procedure/examples/notes, поиск по ключевым словам
- **`skills/analyze_project/`** — единственный скилл: анализ структуры проекта
#### Experience & Reflection
- **`duck_core/experience/recorder.py`** — запись результатов задач, предложения по обновлению скиллов
- **`duck_core/reflection.py`** — `Reflection.reflect()`: вызов critic-роли для анализа транскрипта задачи
#### Memory
- **`duck_core/memory/store.py`** — `MemoryStore`: хранение в SQLite с поддержкой scope (global/workspace/conversation), importance, полнотекстовый поиск (LIKE)
- **`duck_core/memory/policy.py`** — `MemoryPolicy`: заглушка (всегда should_store=False)
- **`duck_core/memory/vector_memory.py`** — `VectorMemory`: интеграция с Qdrant для семантического поиска (требует embeddings endpoint)
#### FastAPI API
- **`duck_core/api.py`** — полный HTTP API (878 строк):
- `POST /v1/chat` — основной чат с сохранением в conversation
- `POST /v1/chat/stream` — streaming чат через SSE
- `POST /v1/tasks/{task_id}/continue/stream` — продолжение после одобрения
- `POST /v1/tasks/{task_id}/password/stream` — ввод sudo-пароля
- `GET/POST /v1/conversations` — управление диалогами
- `GET /v1/tasks`, `GET /v1/tasks/{task_id}/events` — инспекция задач
- `GET /v1/approvals/pending`, `POST /v1/approvals/{id}/allow_once|allow_forever|deny`
- `GET /v1/skills`, `GET /v1/skills/{skill_id}`
- `GET /v1/experience`
- `POST /v1/memory`, `GET /v1/memory`, `GET /v1/memory/search`
- `GET /v1/models/roles`, `GET /v1/models/ping`
- `GET /health`, `GET /v1/status`
- Веб-страницы: `/`, `/approvals`, `/skills`, `/memory`, `/experience`
#### WebChat UI
- **`duck_core/web/templates/index.html`** — полноценный WebChat с sidebar, conversation list, activity drawer
- **`duck_core/web/static/app.js`** (997 строк) — клиентская логика:
- SSE streaming с парсингом reasoning_delta, content_delta, tool_call_started/finished, tool_approval_requested, tool_password_requested
- Инлайн-терминалы для отображения вызовов инструментов
- Инлайн-кнопки одобрения/запрета действий
- Форма ввода sudo-пароля
- Activity drawer с вкладками Events/Commands/Memory
- Управление диалогами (create, select, load history)
- Enter для отправки, Shift+Enter для новой строки
- **`duck_core/web/static/style.css`** (1002 строки) — светлая тема, responsive layout
#### Скрипты
- **`scripts/llama/start_main.sh`** — управление llama-server (start/stop/restart/status/logs)
- **`scripts/llama/start_thinker_mtp_experimental.sh`** — экспериментальный MTP endpoint
- **`scripts/llama/build_vulkan.sh`** — сборка llama.cpp с Vulkan
- **`scripts/llama/healthcheck.sh`** — проверка здоровья llama-server
- **`scripts/verify/`** — 7 верификационных скриптов (basic_chat, file_write_read, tool_blocking, models_roles, skills, experience, memory)
- **`scripts/bench/bench_runtime.py`** — бенчмарк
#### Тесты
- 18 smoke-тестов в `tests/smoke/`:
- `test_models_config.py`, `test_model_client.py`, `test_api_health.py`
- `test_event_log.py`, `test_action_directive_schema.py`
- `test_tool_gateway.py`, `test_approvals.py`
- `test_skill_registry.py`, `test_experience_recorder.py`
- `test_vector_memory.py`, `test_memory_store.py`
- `test_chat_api.py`, `test_conversations.py`
- `test_runtime_reasoning.py`, `test_runtime_tools.py`
- `test_llama_server_connection.py`, `test_llama_service_script.py`
#### Документация
- 12 файлов в `docs/`:
- `architecture.md`, `how_to_run.md`, `how_to_test.md`
- `web_api.md`, `model_roles.md`, `tool_gateway.md`
- `memory_architecture.md`, `experience_learning.md`, `skills.md`
- `local_llama_server.md`, `performance_mtp.md`
- `superpowers/plans/2026-05-19-ducklm-runtime.md` — план реализации
#### Docker
- **`docker-compose.memory.yml`** — Qdrant (порты 6333/6334)
#### Сборка и запуск
- **`pyproject.toml`** — зависимости: fastapi, uvicorn, httpx, pydantic, pyyaml, jinja2, python-dotenv, jsonschema, aiosqlite, qdrant-client
- **`Makefile`** — цели: duck-up, duck-llama-main, duck-api, duck-dev, duck-smoke, duck-test, duck-verify
- **`data/duck.sqlite3`** — рабочая БД SQLite
- **`workspace/`** — рабочая директория для инструментов
---
## 5. План разработки (из `docs/superpowers/plans/2026-05-19-ducklm-runtime.md`)
План состоит из 4 задач:
### Task 1: Tests First
- Написать smoke tests для всех компонентов
- ✅ **Выполнено** — 18 тестов созданы
### Task 2: Runtime Core
- pyproject.toml, .env.example, config/models.yaml
- config.py, model_client.py, events/store.py, tasks/store.py, tasks/state.py
- context_builder.py, runtime_loop.py, api.py
- ✅ **Выполнено** — все компоненты реализованы
### Task 3: Stage Adapters
- tools/*, approvals/service.py, skills/registry.py
- experience/recorder.py, reflection.py, memory/*
- schemas/action_directive.schema.json
- ✅ **Выполнено** — все компоненты реализованы
### Task 4: Project Surface
- scripts/llama/*, scripts/verify/*, scripts/bench/*
- web/templates/*, web/static/*
- skills/analyze_project/*
- docker-compose.memory.yml, Makefile, README.md, docs/*
- ✅ **Выполнено** — все компоненты реализованы
---
## 6. Отступления от плана / дополнительные возможности
### 6.1. Что добавлено сверх плана
1. **ConversationStore** — полноценное управление диалогами (conversations + conversation_messages таблицы), чего не было явно в плане Task 2. План упоминал только tasks и events.
2. **Streaming API**`POST /v1/chat/stream` с SSE, `POST /v1/tasks/{id}/continue/stream`, `POST /v1/tasks/{id}/password/stream`. В плане не было явно указано streaming.
3. **Password flow** — полный цикл запроса sudo-пароля: `requires_password``tool_password_requested``/v1/tasks/{id}/password/stream`. В плане не было детализировано.
4. **Activity Drawer в WebChat** — боковая панель с вкладками Events/Commands/Memory, инлайн-терминалы для инструментов, инлайн-одобрения. Значительно больше, чем «пустая WebChat-страница» из этапа 1 ТЗ.
5. **Command Audit** — отдельный тип события `command_audit` для shell-команд с полной метаданной (action_type, risk_level, blocked, approved, returncode).
6. **Scope-aware Memory** — трёхуровневая система скоупов (global/workspace/conversation) с автоматическим infer_scope.
7. **Skill update proposals** — автоматическая запредложений по обновлению скиллов в `skills/_proposals/`.
8. **18 тестов вместо 11 запланированных** — добавлены: `test_chat_api`, `test_conversations`, `test_runtime_reasoning`, `test_runtime_tools`, `test_llama_server_connection`, `test_llama_service_script`, `test_memory_store`.
### 6.2. Что не реализовано (или реализовано частично)
1. **MemoryPolicy — заглушка.** `MemoryPolicy.classify()` всегда возвращает `should_store=False`. Нет LLM-классификации для автоматического сохранения памяти.
2. **ContextBuilder — минимальный.** Нет суммаризации старых events, нет обрезки по token budget, нет приоритизации контекста. Просто склеивает memory + history + user message.
3. **Critic не вызывается автоматически.** `Reflection.reflect()` есть, но не интегрирован в RuntimeLoop — нет автоматической рефлексии после завершения задачи.
4. **Summary роль не используется.** Нет автоматической суммаризации контекста при превышении budget.
5. **Coder роль не используется в основном потоке.** RuntimeLoop вызывает только action и thinker.
6. **Recall роль не определена в конфиге.** В ТЗ упоминается recall, но в `config/models.yaml` её нет.
7. **Sys_util роль не определена в конфиге.** Аналогично.
8. **VectorMemory не интегрирован в RuntimeLoop.** Qdrant-поиск не подключён к основному циклу (MemoryStore использует LIKE-поиск, а не векторный).
9. **WebChat — светлая тема.** В памяти пользователя указано предпочтение тёмной темы, но CSS реализован светлый (`color-scheme: light`, белый фон).
10. **Нет CLI.** Упомянуто в ТЗ как необязательное, но если понадобится — нужно делать.
11. **Нет автоматического применения skill patches.** Предложения пишутся в `skills/_proposals/`, но не применяются автоматически.
### 6.3. Технические заметки
- **Модель:** Qwen3.6 35B A3B, два варианта — nonMTP (основной, порт 8081) и MTP (экспериментальный, порт 8085)
- **GPU:** Radeon RX580, Vulkan backend, 20 GPU layers
- **llama-server бинарник:** `./vendor/llama.cpp/build/bin/llama-server`
- **ctx_size в .env:** 4096 (хотя в коде Settings дефолт 65536)
- **reasoning-budget:** 512 в .env, `--reasoning-budget 512 --cache-ram 0`
- **Python:** 3.13 (по путям `__pycache__`)
---
## 7. Структура базы данных (SQLite)
Таблицы:
- **tasks** — задачи (task_id, status, user_message, workspace, debug, final_response, created_at, updated_at)
- **events** — события (id, task_id, sequence, event_type, payload_json, created_at)
- **conversations** — диалоги (id, conversation_id, title, workspace, created_at, updated_at)
- **conversation_messages** — сообщения диалогов (id, conversation_id, role, content, reasoning_content, task_id, status, created_at)
- **approvals** — согласования (id, approval_id, task_id, action_hash, normalized_action_json, status, decision, created_at, updated_at)
- **experience_records** — записи опыта (id, task_id, skill_id, summary, result, what_worked_json, what_failed_json, reusable_lesson, suggested_skill_patch, confidence, created_at)
- **memories** — память (id, memory_id, text, scope, workspace, conversation_id, memory_type, importance, metadata_json, created_at, updated_at)
---
## 8. Когнитивный цикл (как работает RuntimeLoop)
1. Пользователь отправляет сообщение → `POST /v1/chat` или `/v1/chat/stream`
2. Создаётся Task + событие `task_created`
3. ContextBuilder собирает сообщения (memory + history + user message)
4. **Action loop** (до 4 итераций):
- Модель `action` генерирует JSON directive (schema: action_directive.schema.json)
- ToolGateway выполняет каждый action через соответствующий инструмент
- Если команда требует approval → пауза, создание Approval, ожидание решения
- Если sudo → запрос пароля
- Результаты собираются как tool_observations
5. Thinker получает все tool_observations и формирует финальный ответ
6. Задача завершена → `task_completed`
7. (Опционально) Reflection через critic — **не автоматизировано**
---
## 9. Статус готовности
| Компонент | Статус |
|-----------|--------|
| Конфигурация | ✅ Готово |
| ModelClient | ✅ Готово |
| TaskStore / EventStore | ✅ Готово |
| ConversationStore | ✅ Готово |
| RuntimeLoop | ✅ Готово |
| ContextBuilder | ⚠️ Минимальный |
| ToolGateway + Tools | ✅ Готово |
| ApprovalService | ✅ Готово |
| SkillRegistry | ✅ Готово |
| ExperienceRecorder | ✅ Готово |
| Reflection | ⚠️ Не интегрирован в loop |
| MemoryStore (SQLite) | ✅ Готово |
| MemoryPolicy | ✅ LLM-based (Phase 1) |
| VectorMemory (Qdrant) | ✅ Интегрирован (Phase 5) |
| FastAPI API | ✅ Готово |
| WebChat UI | ✅ Готово (светлая тема) |
| Streaming | ✅ Готово |
| Password flow | ✅ Готово |
| Smoke tests | ✅ 74 теста |
| Docs | ✅ 12 файлов |
| Scripts | ✅ Готово |
---
## Архитектура эмбеддингов
### Локальная модель (основной режим)
- **Модель**: `all-MiniLM-L6-v2` (sentence-transformers, 384 размерности)
- **Расположение**: `./models/all-MiniLM-L6-v2/` (safetensors формат)
- **Библиотека**: `sentence-transformers` (добавлен в pyproject.toml)
- **Использование**: `VectorMemory._local_embed()` — загрузка модели через `SentenceTransformer`, кодирование в thread pool
### Remote endpoint (fallback)
- **Endpoint**: `/v1/embeddings` на llama-server или OpenAI-совместимом сервере
- **Использование**: `VectorMemory._remote_embed()` — HTTP POST запрос
### Поток данных
1. Зача завершена → `_run_memory_policy()` → LLM классифицирует → `MemoryDecision`
2. Если `should_store=True``MemoryStore.add()` (SQLite) + `VectorMemory.add_memory()` (Qdrant)
3. При следующем запросе → `MemoryStore.relevant()` (SQLite LIKE) + `VectorMemory.search_memory()` (semantic)
4. Recall-роль фильтрует релевантные воспоминания через LLM
### Зависимости
- `sentence-transformers` — для локальной модели
- `qdrant-client` — для Qdrant (уже был)
- Qdrant запускается через `docker-compose.memory.yml` (порт 6333)
| Docker (Qdrant) | ✅ Готово |
**Общий вывод:** Все 4 задачи плана реализованы. Система представляет собой работающий skeleton с полным когнитивным циклом. Основные направления для дальнейшего развития: интеграция рефлексии и summary в основной цикл, LLM-based MemoryPolicy, векторная память, тёмная тема, расширение ContextBuilder.

View File

@ -51,3 +51,23 @@ models:
temperature: 0.1 temperature: 0.1
max_output_tokens: 4096 max_output_tokens: 4096
system_prompt: prompts/roles/summary.md system_prompt: prompts/roles/summary.md
memory_policy:
provider: llama_server
base_url: http://127.0.0.1:8081/v1
model: local-main
purpose: memory_classification
structured_output: true
temperature: 0.1
max_output_tokens: 1024
system_prompt: prompts/roles/memory_policy.md
recall:
provider: llama_server
base_url: http://127.0.0.1:8081/v1
model: local-main
purpose: memory_recall
structured_output: true
temperature: 0.1
max_output_tokens: 2048
system_prompt: prompts/roles/recall.md

View File

@ -13,10 +13,13 @@ from pydantic import BaseModel
from duck_core.approvals.service import ApprovalService from duck_core.approvals.service import ApprovalService
from duck_core.config import get_settings from duck_core.config import get_settings
from duck_core.context_builder import ContextBuilder
from duck_core.conversations.store import ConversationStore from duck_core.conversations.store import ConversationStore
from duck_core.events.store import EventStore from duck_core.events.store import EventStore
from duck_core.experience.recorder import ExperienceRecorder from duck_core.experience.recorder import ExperienceRecorder
from duck_core.memory.policy import MemoryPolicy
from duck_core.memory.store import MemoryStore from duck_core.memory.store import MemoryStore
from duck_core.memory.vector_memory import VectorMemory
from duck_core.model_client import ModelClient from duck_core.model_client import ModelClient
from duck_core.runtime_loop import RuntimeLoop from duck_core.runtime_loop import RuntimeLoop
from duck_core.skills.registry import SkillRegistry from duck_core.skills.registry import SkillRegistry
@ -74,11 +77,27 @@ def create_app() -> FastAPI:
event_store = EventStore(settings.db_path) event_store = EventStore(settings.db_path)
conversations = ConversationStore(settings.db_path) conversations = ConversationStore(settings.db_path)
model_client = ModelClient() model_client = ModelClient()
context_builder = ContextBuilder(model_client=model_client)
approvals = ApprovalService(settings.db_path) approvals = ApprovalService(settings.db_path)
runtime = RuntimeLoop(task_store, event_store, model_client, approval_service=approvals) memory_policy = MemoryPolicy(model_client=model_client)
skills = SkillRegistry("skills")
experience = ExperienceRecorder(settings.db_path)
memory_store = MemoryStore(settings.db_path) memory_store = MemoryStore(settings.db_path)
vector_memory = VectorMemory(
qdrant_url=settings.qdrant_url,
local_embedding_model="./models/all-MiniLM-L6-v2",
)
experience = ExperienceRecorder(settings.db_path)
runtime = RuntimeLoop(
task_store,
event_store,
model_client,
context_builder=context_builder,
approval_service=approvals,
memory_policy=memory_policy,
memory_store=memory_store,
vector_memory=vector_memory,
experience_recorder=experience,
)
skills = SkillRegistry("skills")
@app.on_event("startup") @app.on_event("startup")
async def startup() -> None: async def startup() -> None:
@ -147,6 +166,14 @@ def create_app() -> FastAPI:
memory_records = await relevant_memory( memory_records = await relevant_memory(
body.message, conversation.workspace, conversation.conversation_id body.message, conversation.workspace, conversation.conversation_id
) )
# Use recall-role to filter relevant memories via LLM
if memory_records and runtime.context_builder._model_client is not None:
try:
memory_records = await runtime.context_builder.recall_relevant_memory(
body.message, memory_records
)
except Exception:
pass # Fallback to unfiltered memory_records
result = await runtime.run_chat( result = await runtime.run_chat(
body.message, body.message,
conversation.workspace, conversation.workspace,
@ -265,7 +292,7 @@ def create_app() -> FastAPI:
reasoning_parts: list[str] = [] reasoning_parts: list[str] = []
content_parts: list[str] = [] content_parts: list[str] = []
try: try:
messages = runtime.context_builder.build_basic_messages( messages = await runtime.context_builder.build_async_messages(
task, history, memory_records task, history, memory_records
) )
tool_observations = await runtime._run_action_loop( tool_observations = await runtime._run_action_loop(
@ -479,7 +506,7 @@ def create_app() -> FastAPI:
tool_observation = await runtime._run_approved_or_denied_action( tool_observation = await runtime._run_approved_or_denied_action(
task_id, approval.normalized_action, approval.decision task_id, approval.normalized_action, approval.decision
) )
messages = runtime.context_builder.build_basic_messages(task) messages = await runtime.context_builder.build_async_messages(task)
tool_observations = [tool_observation] tool_observations = [tool_observation]
if approval.decision != "deny" and not has_password_request(tool_observations): if approval.decision != "deny" and not has_password_request(tool_observations):
tool_observations = await runtime._run_action_loop( tool_observations = await runtime._run_action_loop(
@ -667,7 +694,7 @@ def create_app() -> FastAPI:
approval.decision, approval.decision,
password=body.password, password=body.password,
) )
messages = runtime.context_builder.build_basic_messages(task) messages = await runtime.context_builder.build_async_messages(task)
tool_observations = [tool_observation] tool_observations = [tool_observation]
if not has_password_request(tool_observations): if not has_password_request(tool_observations):
tool_observations = await runtime._run_action_loop( tool_observations = await runtime._run_action_loop(

View File

@ -1,32 +1,348 @@
from __future__ import annotations
import json
import logging
from typing import Any
from duck_core.tasks.state import TaskState from duck_core.tasks.state import TaskState
logger = logging.getLogger(__name__)
# Approximate tokens per character (rough heuristic: ~4 chars per token)
_CHARS_PER_TOKEN = 4
def estimate_tokens(text: str) -> int:
"""Rough token estimate based on character count."""
return max(len(text) // _CHARS_PER_TOKEN, 1)
def estimate_messages_tokens(messages: list[dict[str, str]]) -> int:
"""Estimate total token count for a list of messages."""
total = 0
for msg in messages:
total += estimate_tokens(msg.get("content", "")) + 4 # role + formatting overhead
return total
class ContextBuilder: class ContextBuilder:
"""Builds context messages with token budget awareness.
Priority order (highest first):
1. Current user message (always kept)
2. Active task state
3. Selected skill summary
4. Recent tool observations
5. Relevant memory
6. Summarized old events / history
7. Full conversation history (remaining budget)
"""
def __init__(
self,
max_input_tokens: int = 49152,
max_memory_tokens: int = 8000,
max_history_tokens: int = 12000,
summary_role: str = "summary",
recall_role: str = "recall",
model_client: Any | None = None,
):
self.max_input_tokens = max_input_tokens
self.max_memory_tokens = max_memory_tokens
self.max_history_tokens = max_history_tokens
self.summary_role = summary_role
self.recall_role = recall_role
self._model_client = model_client
async def recall_relevant_memory(
self,
query: str,
memory_records: list[dict[str, str]],
) -> list[dict[str, str]]:
"""Use recall-role LLM to filter memory records by relevance.
Returns only the memories that are relevant to the query.
Falls back to returning all records if LLM is unavailable.
"""
if not memory_records or self._model_client is None:
return memory_records
try:
return await self._llm_recall(query, memory_records)
except Exception as exc:
logger.warning("Recall failed, using all memories: %s", exc)
return memory_records
async def _llm_recall(
self,
query: str,
memory_records: list[dict[str, str]],
) -> list[dict[str, str]]:
"""Call recall-role LLM to identify relevant memories."""
memories_text = "\n".join(
f"[{m.get('memory_id', i)}] {m.get('text', '')}"
for i, m in enumerate(memory_records)
)
response = await self._model_client.chat(
self.recall_role,
[{
"role": "user",
"content": (
f"User query: {query}\n\n"
f"Available memories:\n{memories_text}"
),
}],
response_format={
"type": "json_schema",
"json_schema": {
"name": "recall_result",
"schema": {
"type": "object",
"required": ["relevant_ids", "reasoning"],
"additionalProperties": False,
"properties": {
"relevant_ids": {
"type": "array",
"items": {"type": "string"},
},
"reasoning": {"type": "string"},
},
},
"strict": True,
},
},
)
data = json.loads(response.content)
relevant_ids = set(data.get("relevant_ids", []))
if not relevant_ids:
return []
return [m for i, m in enumerate(memory_records) if m.get("memory_id", str(i)) in relevant_ids]
def build_basic_messages( def build_basic_messages(
self, self,
task: TaskState, task: TaskState,
history_messages: list[dict[str, str]] | None = None, history_messages: list[dict[str, str]] | None = None,
memory_records: list[dict[str, str]] | None = None, memory_records: list[dict[str, str]] | None = None,
tool_observations: list[dict[str, Any]] | None = None,
skill_summary: str | None = None,
) -> list[dict[str, str]]: ) -> list[dict[str, str]]:
memory_messages = [] """Build context messages respecting token budget.
Args:
task: Current task state.
history_messages: Previous conversation messages.
memory_records: Relevant memory records.
tool_observations: Recent tool call results.
skill_summary: Selected skill description.
"""
messages: list[dict[str, str]] = []
budget_remaining = self.max_input_tokens
# 1. System-level context (memory + skill)
system_parts: list[str] = []
# Memory records
if memory_records: if memory_records:
lines = [ memory_text = self._format_memory(memory_records)
f"- {record.get('scope', 'memory')}: {record.get('text', '')}" mem_tokens = estimate_tokens(memory_text)
for record in memory_records if mem_tokens > self.max_memory_tokens:
if record.get("text") # Truncate memory to fit budget
] memory_text = self._truncate_text(memory_text, self.max_memory_tokens)
if lines: system_parts.append(memory_text)
memory_messages.append( budget_remaining -= estimate_tokens(memory_text)
{
# Skill summary
if skill_summary:
skill_text = f"Active skill:\n{skill_summary}"
system_parts.append(skill_text)
budget_remaining -= estimate_tokens(skill_text)
if system_parts:
messages.append({
"role": "system", "role": "system",
"content": "Relevant memory:\n" + "\n".join(lines), "content": "\n\n".join(system_parts),
} })
)
return [ # 2. Tool observations (recent, high priority)
*memory_messages, if tool_observations:
*(history_messages or []), obs_text = "Tool observations:\n" + self._format_observations(tool_observations)
{ obs_tokens = estimate_tokens(obs_text)
if obs_tokens > budget_remaining * 0.4:
# Don't let observations consume more than 40% of remaining budget
obs_text = self._truncate_text(obs_text, int(budget_remaining * 0.4))
obs_tokens = estimate_tokens(obs_text)
messages.append({"role": "user", "content": obs_text})
budget_remaining -= obs_tokens
# 3. Conversation history (lower priority, may be summarized)
if history_messages:
hist_tokens = estimate_messages_tokens(history_messages)
if hist_tokens <= budget_remaining:
messages.extend(history_messages)
budget_remaining -= hist_tokens
elif budget_remaining > 100:
# Summarize old history if we have some budget left
summarized = self._summarize_history(history_messages, budget_remaining)
if summarized:
messages.append({
"role": "system",
"content": f"Conversation summary:\n{summarized}",
})
# else: no budget for history at all
# 4. Current user message (always last, always included)
messages.append({
"role": "user", "role": "user",
"content": task.user_message, "content": task.user_message,
}, })
return messages
async def build_async_messages(
self,
task: TaskState,
history_messages: list[dict[str, str]] | None = None,
memory_records: list[dict[str, str]] | None = None,
tool_observations: list[dict[str, Any]] | None = None,
skill_summary: str | None = None,
) -> list[dict[str, str]]:
"""Async context builder variant that can use LLM summarization."""
messages: list[dict[str, str]] = []
budget_remaining = self.max_input_tokens
system_parts: list[str] = []
if memory_records:
memory_text = self._format_memory(memory_records)
mem_tokens = estimate_tokens(memory_text)
if mem_tokens > self.max_memory_tokens:
memory_text = self._truncate_text(memory_text, self.max_memory_tokens)
system_parts.append(memory_text)
budget_remaining -= estimate_tokens(memory_text)
if skill_summary:
skill_text = f"Active skill:\n{skill_summary}"
system_parts.append(skill_text)
budget_remaining -= estimate_tokens(skill_text)
if system_parts:
messages.append({"role": "system", "content": "\n\n".join(system_parts)})
if tool_observations:
obs_text = "Tool observations:\n" + self._format_observations(tool_observations)
obs_tokens = estimate_tokens(obs_text)
if obs_tokens > budget_remaining * 0.4:
obs_text = self._truncate_text(obs_text, int(budget_remaining * 0.4))
obs_tokens = estimate_tokens(obs_text)
messages.append({"role": "user", "content": obs_text})
budget_remaining -= obs_tokens
if history_messages:
hist_tokens = estimate_messages_tokens(history_messages)
if hist_tokens <= budget_remaining:
messages.extend(history_messages)
budget_remaining -= hist_tokens
elif budget_remaining > 100:
summarized = await self._summarize_history_async(
history_messages, budget_remaining
)
if summarized:
messages.append({
"role": "system",
"content": f"Conversation summary:\n{summarized}",
})
messages.append({"role": "user", "content": task.user_message})
return messages
def _format_memory(self, records: list[dict[str, str]]) -> str:
lines = [
f"- {record.get('scope', 'memory')}: {record.get('text', '')}"
for record in records
if record.get("text")
] ]
return "Relevant memory:\n" + "\n".join(lines) if lines else ""
def _format_observations(self, observations: list[dict[str, Any]]) -> str:
parts = []
for obs in observations:
tool = obs.get("tool", "unknown")
result = obs.get("result", {})
ok = result.get("ok", False)
output = result.get("output", "")
error = result.get("error", "")
status = "ok" if ok else "error"
part = f"- {tool} ({status})"
if output:
part += f"\n output: {output[:200]}"
if error:
part += f"\n error: {error[:200]}"
parts.append(part)
return "\n".join(parts)
def _truncate_text(self, text: str, max_tokens: int) -> str:
"""Truncate text to fit within max_tokens."""
max_chars = max_tokens * _CHARS_PER_TOKEN
if len(text) <= max_chars:
return text
return text[:max_chars] + "\n... (truncated)"
def _summarize_history(
self,
history: list[dict[str, str]],
budget_tokens: int,
) -> str | None:
"""Summarize conversation history to fit budget.
Synchronous callers use deterministic truncation. Runtime code should
call build_async_messages() when LLM summarization is desired.
"""
if not history:
return None
result = []
remaining = budget_tokens
for msg in reversed(history):
tokens = estimate_tokens(msg.get("content", "")) + 4
if tokens > remaining:
break
result.append(f"{msg['role']}: {msg['content'][:100]}")
remaining -= tokens
return "\n".join(reversed(result)) if result else None
async def _summarize_history_async(
self,
history: list[dict[str, str]],
budget_tokens: int,
) -> str | None:
if self._model_client is None:
return self._summarize_history(history, budget_tokens)
summarized = await self._llm_summarize_history(history, budget_tokens)
return summarized or self._summarize_history(history, budget_tokens)
async def _llm_summarize_history(
self,
history: list[dict[str, str]],
budget_tokens: int,
) -> str | None:
"""Use summary-role LLM to compress history."""
try:
history_text = "\n".join(
f"{m['role']}: {m.get('content', '')}" for m in history
)
response = await self._model_client.chat(
self.summary_role,
[{
"role": "user",
"content": (
"Summarize this conversation history. Keep decisions, outcomes, "
"and key facts. Be concise.\n\n"
+ history_text
),
}],
)
summary = response.content
# Ensure summary fits budget
return self._truncate_text(summary, budget_tokens)
except Exception as exc:
logger.warning("History summarization failed: %s", exc)
return None

View File

@ -1,16 +1,82 @@
from __future__ import annotations
import json
import logging
from typing import Any
from pydantic import BaseModel from pydantic import BaseModel
logger = logging.getLogger(__name__)
class MemoryDecision(BaseModel): class MemoryDecision(BaseModel):
should_store: bool should_store: bool = False
memory_type: str memory_type: str = "note"
summary: str summary: str = ""
importance: float importance: float = 0.0
scope: str = "workspace"
metadata: dict[str, str] = {} metadata: dict[str, str] = {}
class MemoryPolicy: class MemoryPolicy:
"""Decides whether task output should be stored in memory.
When *model_client* is provided, uses an LLM call to classify the task
transcript. Falls back to a safe default (should_store=False) on any
error so the runtime is never blocked by policy failures.
"""
_PROMPT_SYSTEM = (
"You are DuckLM memory policy. Decide whether the given task transcript "
"contains information worth storing in long-term memory.\n\n"
"Return ONLY valid JSON with these keys:\n"
" should_store: boolean — true if this is worth remembering\n"
" memory_type: string — one of: fact, preference, lesson, decision, event, note\n"
" summary: string — concise one-sentence summary (max 200 chars)\n"
" importance: number — 0.0 to 1.0\n"
" scope: string — one of: global, workspace, conversation\n"
" metadata: object — optional extra key-value pairs\n\n"
"Rules:\n"
"- Store user preferences, important decisions, reusable lessons, key facts.\n"
"- Do NOT store routine tool calls, temporary state, or trivial observations.\n"
"- importance >= 0.7 for preferences and lessons, >= 0.4 for facts, < 0.4 for events.\n"
"- scope='global' for user preferences and system-wide facts.\n"
"- scope='workspace' for project-specific information.\n"
"- scope='conversation' for chat-specific context.\n"
)
_RESPONSE_SCHEMA = {
"type": "object",
"required": ["should_store", "memory_type", "summary", "importance", "scope", "metadata"],
"additionalProperties": False,
"properties": {
"should_store": {"type": "boolean"},
"memory_type": {
"type": "string",
"enum": ["fact", "preference", "lesson", "decision", "event", "note"],
},
"summary": {"type": "string", "maxLength": 300},
"importance": {"type": "number", "minimum": 0.0, "maximum": 1.0},
"scope": {"type": "string", "enum": ["global", "workspace", "conversation"]},
"metadata": {"type": "object", "additionalProperties": {"type": "string"}},
},
}
def __init__(
self,
model_client: Any | None = None,
role: str = "memory_policy",
):
self._model_client = model_client
self._role = role
async def classify(self, summary: str, task_id: str) -> MemoryDecision: async def classify(self, summary: str, task_id: str) -> MemoryDecision:
"""Classify whether *summary* from *task_id* should be stored in memory.
If no model client is configured, returns the safe default
(should_store=False) the old stub behaviour.
"""
if self._model_client is None:
return MemoryDecision( return MemoryDecision(
should_store=False, should_store=False,
memory_type="event", memory_type="event",
@ -18,3 +84,77 @@ class MemoryPolicy:
importance=0.0, importance=0.0,
metadata={"task_id": task_id, "source": "stub_policy"}, metadata={"task_id": task_id, "source": "stub_policy"},
) )
return await self._classify_with_llm(summary, task_id)
async def _classify_with_llm(self, summary: str, task_id: str) -> MemoryDecision:
messages = [
{
"role": "user",
"content": f"Task ID: {task_id}\n\nTranscript:\n{summary}",
}
]
response_format = {
"type": "json_schema",
"json_schema": {
"name": "memory_decision",
"schema": self._RESPONSE_SCHEMA,
"strict": True,
},
}
try:
response = await self._model_client.chat(
self._role,
messages,
response_format=response_format,
)
except Exception as exc:
logger.warning("MemoryPolicy LLM call failed for %s: %s", task_id, exc)
return MemoryDecision(
should_store=False,
memory_type="event",
summary=summary,
importance=0.0,
metadata={"task_id": task_id, "source": "llm_policy_fallback"},
)
return self._parse_response(response.content, summary, task_id)
def _parse_response(self, content: str, summary: str, task_id: str) -> MemoryDecision:
try:
data = json.loads(content)
except (json.JSONDecodeError, TypeError):
logger.warning("MemoryPolicy: invalid JSON for %s: %s", task_id, content[:200])
return MemoryDecision(
should_store=False,
memory_type="event",
summary=summary,
importance=0.0,
metadata={"task_id": task_id, "source": "llm_policy_fallback"},
)
required = ("should_store", "memory_type", "summary", "importance", "scope")
if not all(key in data for key in required):
logger.warning("MemoryPolicy: missing fields for %s: %s", task_id, list(data.keys()))
return MemoryDecision(
should_store=False,
memory_type="event",
summary=summary,
importance=0.0,
metadata={"task_id": task_id, "source": "llm_policy_fallback"},
)
return MemoryDecision(
should_store=bool(data.get("should_store", False)),
memory_type=str(data.get("memory_type", "note")),
summary=str(data.get("summary", summary))[:300],
importance=float(max(0.0, min(data.get("importance", 0.0), 1.0))),
scope=str(data.get("scope", "workspace")),
metadata={
"task_id": task_id,
"source": "llm_policy",
**{str(k): str(v) for k, v in data.get("metadata", {}).items()},
},
)

View File

@ -1,28 +1,70 @@
from __future__ import annotations
import logging
from pathlib import Path
from typing import Any from typing import Any
from uuid import uuid4 from uuid import uuid4
import httpx import httpx
logger = logging.getLogger(__name__)
class EmbeddingsUnavailableError(RuntimeError): class EmbeddingsUnavailableError(RuntimeError):
pass pass
class VectorMemory: class VectorMemory:
"""Semantic memory using Qdrant for vector storage and sentence-transformers for embeddings.
Supports two modes:
1. Local sentence-transformers model (default): uses all-MiniLM-L6-v2
2. Remote embeddings endpoint: uses llama-server or OpenAI-compatible /v1/embeddings
"""
def __init__( def __init__(
self, self,
qdrant_url: str, qdrant_url: str,
collection_name: str = "duck_memory", collection_name: str = "duck_memory",
embeddings_base_url: str | None = "http://127.0.0.1:8081/v1", embeddings_base_url: str | None = None,
local_embedding_model: str | None = "all-MiniLM-L6-v2",
): ):
self.qdrant_url = qdrant_url.rstrip("/") self.qdrant_url = qdrant_url.rstrip("/")
self.collection_name = collection_name self.collection_name = collection_name
self.embeddings_base_url = embeddings_base_url.rstrip("/") if embeddings_base_url else None self.embeddings_base_url = embeddings_base_url.rstrip("/") if embeddings_base_url else None
self._local_model_path = local_embedding_model
self._local_model = None
def _load_local_model(self):
"""Lazy-load the sentence-transformers model."""
if self._local_model is not None:
return self._local_model
try:
from sentence_transformers import SentenceTransformer
model_path = self._local_model_path
# Check if it's a local path or a HuggingFace model name
if Path(model_path).exists():
logger.info("Loading local embedding model from %s", model_path)
self._local_model = SentenceTransformer(model_path)
else:
logger.info("Loading embedding model from HuggingFace: %s", model_path)
self._local_model = SentenceTransformer(model_path)
return self._local_model
except ImportError:
raise EmbeddingsUnavailableError(
"sentence-transformers is not installed. "
"Install with: pip install sentence-transformers"
)
except Exception as exc:
raise EmbeddingsUnavailableError(f"Failed to load embedding model: {exc}")
async def add_memory(self, text: str, metadata: dict[str, Any] | None = None) -> str: async def add_memory(self, text: str, metadata: dict[str, Any] | None = None) -> str:
vector = await self._embed(text) vector = await self._embed(text)
point_id = str(uuid4()) point_id = str(uuid4())
async with httpx.AsyncClient(timeout=20.0, trust_env=False) as client: async with httpx.AsyncClient(timeout=20.0, trust_env=False) as client:
# Create collection if not exists (ignore error if already exists)
await client.put( await client.put(
f"{self.qdrant_url}/collections/{self.collection_name}", f"{self.qdrant_url}/collections/{self.collection_name}",
json={"vectors": {"size": len(vector), "distance": "Cosine"}}, json={"vectors": {"size": len(vector), "distance": "Cosine"}},
@ -53,10 +95,32 @@ class VectorMemory:
return response.json().get("result", []) return response.json().get("result", [])
async def _embed(self, text: str) -> list[float]: async def _embed(self, text: str) -> list[float]:
if not self.embeddings_base_url: """Generate embeddings using local model or remote endpoint."""
# Prefer local model if available
if self._local_model_path and not self.embeddings_base_url:
return await self._local_embed(text)
# Fall back to remote endpoint
if self.embeddings_base_url:
return await self._remote_embed(text)
raise EmbeddingsUnavailableError( raise EmbeddingsUnavailableError(
"Embeddings endpoint is not configured; vector memory is explicit stub." "No embedding source configured. Set local_embedding_model or embeddings_base_url."
) )
async def _local_embed(self, text: str) -> list[float]:
"""Generate embeddings using local sentence-transformers model."""
import asyncio
model = self._load_local_model()
# Run in thread pool to not block event loop
loop = asyncio.get_event_loop()
vector = await loop.run_in_executor(None, lambda: model.encode(text))
# Handle both numpy arrays and plain lists
if hasattr(vector, "tolist"):
vector = vector.tolist()
return [float(v) for v in vector]
async def _remote_embed(self, text: str) -> list[float]:
"""Generate embeddings using remote /v1/embeddings endpoint."""
async with httpx.AsyncClient(timeout=20.0, trust_env=False) as client: async with httpx.AsyncClient(timeout=20.0, trust_env=False) as client:
response = await client.post( response = await client.post(
f"{self.embeddings_base_url}/embeddings", f"{self.embeddings_base_url}/embeddings",

View File

@ -1,14 +1,22 @@
import json import json
import logging
from dataclasses import dataclass from dataclasses import dataclass
from typing import Any from typing import Any
from duck_core.approvals.service import ApprovalService from duck_core.approvals.service import ApprovalService
from duck_core.context_builder import ContextBuilder from duck_core.context_builder import ContextBuilder
from duck_core.events.store import EventStore from duck_core.events.store import EventStore
from duck_core.experience.recorder import ExperienceRecorder
from duck_core.memory.policy import MemoryPolicy
from duck_core.memory.store import MemoryStore
from duck_core.memory.vector_memory import VectorMemory
from duck_core.model_client import ModelClient from duck_core.model_client import ModelClient
from duck_core.reflection import Reflection
from duck_core.tasks.store import TaskStore from duck_core.tasks.store import TaskStore
from duck_core.tools.gateway import ToolGateway
from duck_core.tools.base import ToolResult from duck_core.tools.base import ToolResult
from duck_core.tools.gateway import ToolGateway
logger = logging.getLogger(__name__)
@dataclass @dataclass
@ -27,13 +35,23 @@ class RuntimeLoop:
model_client: ModelClient | None = None, model_client: ModelClient | None = None,
context_builder: ContextBuilder | None = None, context_builder: ContextBuilder | None = None,
approval_service: ApprovalService | None = None, approval_service: ApprovalService | None = None,
memory_policy: MemoryPolicy | None = None,
memory_store: MemoryStore | None = None,
vector_memory: VectorMemory | None = None,
experience_recorder: ExperienceRecorder | None = None,
max_tool_iterations: int = 4, max_tool_iterations: int = 4,
): ):
self.task_store = task_store self.task_store = task_store
self.event_store = event_store self.event_store = event_store
self.model_client = model_client or ModelClient() self.model_client = model_client or ModelClient()
self.context_builder = context_builder or ContextBuilder() self.context_builder = context_builder or ContextBuilder(
model_client=self.model_client
)
self.approval_service = approval_service self.approval_service = approval_service
self.memory_policy = memory_policy or MemoryPolicy(model_client=self.model_client)
self.memory_store = memory_store
self.vector_memory = vector_memory
self.experience_recorder = experience_recorder
self.max_tool_iterations = max_tool_iterations self.max_tool_iterations = max_tool_iterations
async def run_chat( async def run_chat(
@ -43,6 +61,7 @@ class RuntimeLoop:
debug: bool = False, debug: bool = False,
history_messages: list[dict[str, str]] | None = None, history_messages: list[dict[str, str]] | None = None,
memory_records: list[dict[str, str]] | None = None, memory_records: list[dict[str, str]] | None = None,
reflect: bool = True,
) -> ChatResult: ) -> ChatResult:
task = await self.task_store.create_task(message, workspace, debug) task = await self.task_store.create_task(message, workspace, debug)
await self.event_store.append( await self.event_store.append(
@ -51,7 +70,7 @@ class RuntimeLoop:
{"message": message, "workspace": workspace, "debug": debug}, {"message": message, "workspace": workspace, "debug": debug},
) )
try: try:
messages = self.context_builder.build_basic_messages( messages = await self.context_builder.build_async_messages(
task, history_messages, memory_records task, history_messages, memory_records
) )
tool_observations = await self._run_action_loop(task.task_id, messages, workspace) tool_observations = await self._run_action_loop(task.task_id, messages, workspace)
@ -111,6 +130,9 @@ class RuntimeLoop:
"reasoning_content": response.reasoning_content, "reasoning_content": response.reasoning_content,
}, },
) )
await self._run_memory_policy(task.task_id, response.content)
if reflect:
await self._run_reflection(task.task_id)
return ChatResult( return ChatResult(
task_id=task.task_id, task_id=task.task_id,
status="completed", status="completed",
@ -172,7 +194,7 @@ class RuntimeLoop:
tool_observation = await self._run_approved_or_denied_action( tool_observation = await self._run_approved_or_denied_action(
task_id, approval.normalized_action, approval.decision task_id, approval.normalized_action, approval.decision
) )
messages = self.context_builder.build_basic_messages(task) messages = await self.context_builder.build_async_messages(task)
tool_observations = [tool_observation] tool_observations = [tool_observation]
if approval.decision != "deny": if approval.decision != "deny":
tool_observations = await self._run_action_loop( tool_observations = await self._run_action_loop(
@ -250,6 +272,83 @@ class RuntimeLoop:
reasoning_content=None, reasoning_content=None,
) )
async def _run_memory_policy(self, task_id: str, final_response: str) -> None:
"""Classify task output and store in memory if policy says so."""
if self.memory_store is None:
return
try:
decision = await self.memory_policy.classify(final_response, task_id)
await self.event_store.append(
task_id,
"memory_policy_decision",
decision.model_dump(),
)
if decision.should_store:
task = await self.task_store.get_task(task_id)
memory_workspace = task.workspace if task and task.workspace else ""
await self.memory_store.add(
text=decision.summary,
workspace=memory_workspace,
scope=decision.scope,
memory_type=decision.memory_type,
importance=decision.importance,
metadata=decision.metadata,
)
if self.vector_memory is not None:
try:
await self.vector_memory.add_memory(
text=decision.summary,
metadata={"scope": decision.scope, "memory_type": decision.memory_type},
)
except Exception as vec_exc:
logger.warning("Vector memory store failed for %s: %s", task_id, vec_exc)
await self.event_store.append(
task_id,
"memory_stored",
{"summary": decision.summary, "scope": decision.scope},
)
except Exception as exc:
logger.warning("Memory policy failed for %s: %s", task_id, exc)
await self.event_store.append(
task_id,
"memory_policy_failed",
{"error": str(exc)},
)
async def _run_reflection(self, task_id: str) -> None:
"""Run critic reflection on completed task and record experience."""
if self.experience_recorder is None:
return
try:
events = await self.event_store.list_events(task_id)
transcript_lines = []
for event in events:
line = f"[{event.event_type}] {json.dumps(event.payload, ensure_ascii=False)}"
transcript_lines.append(line)
transcript = "\n".join(transcript_lines)
reflection = Reflection(
model_client=self.model_client,
recorder=self.experience_recorder,
)
record = await reflection.reflect(task_id, transcript)
await self.event_store.append(
task_id,
"reflection_completed",
{
"record_id": record.id,
"summary": record.summary[:200],
"reusable_lesson": record.reusable_lesson[:200] if record.reusable_lesson else None,
},
)
except Exception as exc:
logger.warning("Reflection failed for %s: %s", task_id, exc)
await self.event_store.append(
task_id,
"reflection_failed",
{"error": str(exc)},
)
async def _run_approved_or_denied_action( async def _run_approved_or_denied_action(
self, self,
task_id: str, task_id: str,

70
duck_core/tools/coder.py Normal file
View File

@ -0,0 +1,70 @@
from __future__ import annotations
import logging
from typing import Any
from duck_core.model_client import ModelClient
from duck_core.tools.base import ToolResult
logger = logging.getLogger(__name__)
class CoderTool:
"""Tool that delegates code generation to the coder-role LLM.
The coder model receives the task description and relevant context,
then returns code or technical analysis.
"""
name = "coder"
risk_level = "low"
def __init__(
self,
model_client: ModelClient | None = None,
role: str = "coder",
max_output_tokens: int = 16384,
):
self._model_client = model_client
self._role = role
self._max_output_tokens = max_output_tokens
async def run(self, args: dict[str, Any]) -> ToolResult:
task_description = str(args.get("task_description", "")).strip()
if not task_description:
return ToolResult(ok=False, error="task_description is required for coder tool")
context = str(args.get("context", "")).strip()
language = str(args.get("language", "python")).strip()
prompt_parts = [f"Task: {task_description}"]
if language:
prompt_parts.append(f"Language: {language}")
if context:
prompt_parts.append(f"Context:\n{context}")
messages = [{"role": "user", "content": "\n\n".join(prompt_parts)}]
try:
if self._model_client is None:
return ToolResult(
ok=False,
error="Coder tool has no model client configured",
)
response = await self._model_client.chat(
self._role,
messages,
max_output_tokens=self._max_output_tokens,
)
return ToolResult(
ok=True,
output=response.content,
metadata={
"role": self._role,
"model": response.model,
"latency_ms": response.latency_ms,
},
)
except Exception as exc:
logger.warning("Coder tool failed: %s", exc)
return ToolResult(ok=False, error=f"Coder tool failed: {exc}")

View File

@ -1,6 +1,7 @@
from typing import Any from typing import Any
from duck_core.tools.base import Tool, ToolResult from duck_core.tools.base import Tool, ToolResult
from duck_core.tools.coder import CoderTool
from duck_core.tools.file_read import FileReadTool from duck_core.tools.file_read import FileReadTool
from duck_core.tools.file_write import FileWriteTool from duck_core.tools.file_write import FileWriteTool
from duck_core.tools.list_dir import ListDirTool from duck_core.tools.list_dir import ListDirTool
@ -13,7 +14,7 @@ class ToolGateway:
self.tools = {tool.name: tool for tool in tools} self.tools = {tool.name: tool for tool in tools}
@classmethod @classmethod
def default(cls, workspace: str) -> "ToolGateway": def default(cls, workspace: str, model_client: Any = None) -> "ToolGateway":
return cls( return cls(
[ [
FileReadTool(workspace), FileReadTool(workspace),
@ -21,9 +22,19 @@ class ToolGateway:
ListDirTool(workspace), ListDirTool(workspace),
SearchFilesTool(workspace), SearchFilesTool(workspace),
ShellExecSafeTool(workspace), ShellExecSafeTool(workspace),
CoderTool(model_client=model_client),
] ]
) )
def with_model_client(self, model_client: Any) -> "ToolGateway":
"""Return a new gateway with model-dependent tools configured."""
new_tools = list(self.tools.values())
# Replace coder tool with one that has model_client
new_tools = [
t for t in new_tools if not isinstance(t, CoderTool)
] + [CoderTool(model_client=model_client)]
return self.__class__(new_tools)
async def run_action( async def run_action(
self, action: dict[str, Any], approved: bool = False, password: str | None = None self, action: dict[str, Any], approved: bool = False, password: str | None = None
) -> ToolResult: ) -> ToolResult:

View File

@ -14,6 +14,8 @@ Available tools:
Args: {"query": "text to find", "path": ".", "glob": "*.py"} Args: {"query": "text to find", "path": ".", "glob": "*.py"}
- shell_exec_safe: run a safe allowlisted shell command in the current workspace. - shell_exec_safe: run a safe allowlisted shell command in the current workspace.
Args: {"command": "pwd"} Args: {"command": "pwd"}
- coder: delegate a code generation or analysis task to the coder-role LLM.
Args: {"task_description": "what to build or analyze", "language": "python", "context": "optional context"}
Return actions=[] when the user can be answered directly without tools. Return actions=[] when the user can be answered directly without tools.
When tool_observations are already present, request only genuinely missing When tool_observations are already present, request only genuinely missing

View File

@ -0,0 +1,17 @@
You are DuckLM memory policy classifier. Decide whether a task transcript contains information worth storing in long-term memory.
Return ONLY valid JSON with these keys:
should_store: boolean — true if this is worth remembering
memory_type: string — one of: fact, preference, lesson, decision, event, note
summary: string — concise one-sentence summary (max 200 chars)
importance: number — 0.0 to 1.0
scope: string — one of: global, workspace, conversation
metadata: object — optional extra key-value pairs
Rules:
- Store user preferences, important decisions, reusable lessons, key facts.
- Do NOT store routine tool calls, temporary state, or trivial observations.
- importance >= 0.7 for preferences and lessons, >= 0.4 for facts, < 0.4 for events.
- scope='global' for user preferences and system-wide facts.
- scope='workspace' for project-specific information.
- scope='conversation' for chat-specific context.

14
prompts/roles/recall.md Normal file
View File

@ -0,0 +1,14 @@
You are DuckLM recall role. Given a user query and a list of memory records, identify which memories are relevant to the query.
Return ONLY valid JSON:
{
"relevant_ids": ["memory_id_1", "memory_id_2"],
"reasoning": "brief explanation of why these memories were selected"
}
Rules:
- Only include memories that are directly relevant to the user's current query
- Prefer specific memories over general ones
- Include global memories if they apply to the current context
- If no memories are relevant, return empty relevant_ids array
- Be conservative — better to include too few than too many irrelevant memories

View File

@ -13,7 +13,8 @@ dependencies = [
"python-dotenv", "python-dotenv",
"jsonschema", "jsonschema",
"aiosqlite", "aiosqlite",
"qdrant-client" "qdrant-client",
"sentence-transformers"
] ]
[project.optional-dependencies] [project.optional-dependencies]

View File

@ -46,15 +46,12 @@ def test_chat_api_uses_runtime_and_records_events(tmp_path, monkeypatch):
assert payload["status"] == "completed" assert payload["status"] == "completed"
assert "DuckLM" in payload["final_response"] assert "DuckLM" in payload["final_response"]
assert [event["event_type"] for event in events] == [ event_types = [event["event_type"] for event in events]
"task_created", # Core events that must always be present
"model_call_started", assert "task_created" in event_types
"action_directive_failed", assert "task_completed" in event_types
"model_call_started", # Memory policy decision is now recorded after task completion
"cognition_response", assert "memory_policy_decision" in event_types
"model_call_finished",
"task_completed",
]
def test_chat_api_exposes_pending_approval_from_runtime_tool_gate(tmp_path, monkeypatch): def test_chat_api_exposes_pending_approval_from_runtime_tool_gate(tmp_path, monkeypatch):

View File

@ -0,0 +1,197 @@
import pytest
from unittest.mock import AsyncMock
from duck_core.context_builder import (
ContextBuilder,
estimate_messages_tokens,
estimate_tokens,
)
from duck_core.model_client import ModelResponse
from duck_core.tasks.state import TaskState
def _make_task(message: str = "test") -> TaskState:
return TaskState(
task_id="task_1",
status="running",
user_message=message,
workspace="/tmp/test",
debug=False,
created_at="now",
updated_at="now",
)
def test_estimate_tokens_approximate():
assert estimate_tokens("hello world") == 2 # 11 chars / 4 = 2
assert estimate_tokens("") == 1 # minimum 1
assert estimate_tokens("a" * 400) == 100
def test_estimate_messages_tokens():
messages = [
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "world"},
]
# Each message: content tokens + 4 overhead
tokens = estimate_messages_tokens(messages)
assert tokens > 0
# "hello" = 5 chars / 4 = 1 token + 4 overhead = 5
# "world" = 5 chars / 4 = 1 token + 4 overhead = 5
assert tokens == 10
def test_context_builder_basic_messages():
builder = ContextBuilder()
task = _make_task("What is DuckLM?")
messages = builder.build_basic_messages(task)
assert len(messages) == 1
assert messages[0]["role"] == "user"
assert messages[0]["content"] == "What is DuckLM?"
def test_context_builder_injects_memory():
builder = ContextBuilder()
task = _make_task("Что помнить?")
messages = builder.build_basic_messages(
task,
memory_records=[
{"scope": "global", "text": "Use Russian."},
{"scope": "workspace", "text": "DuckLM uses Vulkan."},
],
)
assert messages[0]["role"] == "system"
assert "Relevant memory" in messages[0]["content"]
assert "global: Use Russian." in messages[0]["content"]
assert messages[-1]["content"] == "Что помнить?"
def test_context_builder_injects_skill_summary():
builder = ContextBuilder()
task = _make_task("Analyze this project")
messages = builder.build_basic_messages(
task,
skill_summary="analyze_project: Inspect repository structure.",
)
assert any("Active skill" in m.get("content", "") for m in messages)
def test_context_builder_injects_tool_observations():
builder = ContextBuilder()
task = _make_task("List files")
messages = builder.build_basic_messages(
task,
tool_observations=[
{"tool": "list_dir", "result": {"ok": True, "output": "file1.txt\nfile2.txt"}},
],
)
obs_msg = [m for m in messages if "Tool observations" in m.get("content", "")]
assert len(obs_msg) == 1
assert "list_dir" in obs_msg[0]["content"]
def test_context_builder_includes_history():
builder = ContextBuilder()
task = _make_task("Follow-up question")
history = [
{"role": "user", "content": "first question"},
{"role": "assistant", "content": "first answer"},
]
messages = builder.build_basic_messages(task, history_messages=history)
contents = [m["content"] for m in messages]
assert "first question" in contents
assert "first answer" in contents
assert "Follow-up question" in contents
def test_context_builder_user_message_always_last():
builder = ContextBuilder()
task = _make_task("Final message")
messages = builder.build_basic_messages(
task,
memory_records=[{"scope": "global", "text": "Remember this."}],
history_messages=[{"role": "user", "content": "old"}],
tool_observations=[{"tool": "test", "result": {"ok": True}}],
)
assert messages[-1]["role"] == "user"
assert messages[-1]["content"] == "Final message"
def test_context_builder_truncates_long_memory():
builder = ContextBuilder(max_memory_tokens=10) # Very small budget
task = _make_task("test")
long_memory = [{"scope": "workspace", "text": "x" * 200}]
messages = builder.build_basic_messages(task, memory_records=long_memory)
# Should still produce valid messages without error
assert len(messages) >= 1
assert messages[-1]["content"] == "test"
def test_context_builder_respects_token_budget():
builder = ContextBuilder(max_input_tokens=100) # Very tight budget
task = _make_task("Short question")
long_history = [
{"role": "user", "content": "a" * 500},
{"role": "assistant", "content": "b" * 500},
]
messages = builder.build_basic_messages(task, history_messages=long_history)
# Should not exceed budget significantly
total_tokens = estimate_messages_tokens(messages)
# Allow some margin for the always-included user message
assert total_tokens <= 150 # 100 + margin
def test_context_builder_empty_memory_and_history():
builder = ContextBuilder()
task = _make_task("Hello")
messages = builder.build_basic_messages(task)
assert len(messages) == 1
assert messages[0]["content"] == "Hello"
@pytest.mark.asyncio
async def test_context_builder_recall_awaits_model_client():
model_client = AsyncMock()
model_client.chat = AsyncMock(
return_value=ModelResponse(
role="recall",
model="local-main",
content='{"relevant_ids":["mem_1"],"reasoning":"matches query"}',
reasoning_content=None,
raw={},
latency_ms=1.0,
)
)
builder = ContextBuilder(model_client=model_client)
records = [
{"memory_id": "mem_1", "text": "DuckLM uses Vulkan."},
{"memory_id": "mem_2", "text": "Unrelated."},
]
relevant = await builder.recall_relevant_memory("How does DuckLM run?", records)
assert relevant == [records[0]]
model_client.chat.assert_awaited_once()
@pytest.mark.asyncio
async def test_context_builder_summary_awaits_model_client():
model_client = AsyncMock()
model_client.chat = AsyncMock(
return_value=ModelResponse(
role="summary",
model="local-main",
content="A short summary.",
reasoning_content=None,
raw={},
latency_ms=1.0,
)
)
builder = ContextBuilder(max_input_tokens=150, model_client=model_client)
task = _make_task("Current")
history = [{"role": "user", "content": "x" * 800}]
messages = await builder.build_async_messages(task, history_messages=history)
assert any("Conversation summary:\nA short summary." in m["content"] for m in messages)
model_client.chat.assert_awaited_once()

View File

@ -125,5 +125,13 @@ def test_conversation_history_is_sent_to_model(tmp_path, monkeypatch):
json={"conversation_id": conversation["conversation_id"], "message": "second"}, json={"conversation_id": conversation["conversation_id"], "message": "second"},
) )
second_call_content = [message["content"] for message in seen_thinker_messages[-1]] # Filter out memory_policy and reflection calls — they use critic role
# with different message patterns
thinker_calls = [msgs for msgs in seen_thinker_messages if any(
msg.get("role") == "user"
and not msg.get("content", "").startswith("Task ID:")
and not msg.get("content", "").startswith("Reflect on this DuckLM task")
for msg in msgs
)]
second_call_content = [message["content"] for message in thinker_calls[-1]]
assert second_call_content == ["first", "answer 1", "second"] assert second_call_content == ["first", "answer 1", "second"]

View File

@ -0,0 +1,123 @@
import json
from unittest.mock import AsyncMock
import pytest
from duck_core.memory.policy import MemoryPolicy
from duck_core.model_client import ModelClient, ModelResponse
@pytest.fixture
def mock_model_client():
client = AsyncMock(spec=ModelClient)
client.chat = AsyncMock(
return_value=ModelResponse(
role="critic",
model="local-main",
content=json.dumps({
"should_store": True,
"memory_type": "preference",
"summary": "User prefers concise Russian answers.",
"importance": 0.9,
"scope": "global",
"metadata": {"source": "conversation"},
}),
reasoning_content=None,
raw={},
latency_ms=42.0,
)
)
return client
@pytest.mark.asyncio
async def test_memory_policy_stub_returns_should_store_false():
policy = MemoryPolicy()
decision = await policy.classify("some summary", "task_123")
assert decision.should_store is False
assert decision.memory_type == "event"
assert decision.importance == 0.0
assert decision.metadata["source"] == "stub_policy"
@pytest.mark.asyncio
async def test_llm_memory_policy_classifies_and_stores(mock_model_client):
policy = MemoryPolicy(model_client=mock_model_client, role="memory_policy")
decision = await policy.classify(
"User said they prefer short answers in Russian.", "task_456"
)
assert decision.should_store is True
assert decision.memory_type == "preference"
assert decision.importance == 0.9
assert decision.summary == "User prefers concise Russian answers."
mock_model_client.chat.assert_called_once()
call_args = mock_model_client.chat.call_args
# ModelClient.chat(role, messages, ...) — positional args
assert call_args.args[0] == "memory_policy"
messages = call_args.args[1]
assert len(messages) == 1
assert messages[0]["role"] == "user"
assert "User said they prefer short answers" in messages[0]["content"]
@pytest.mark.asyncio
async def test_llm_memory_policy_handles_non_storable(mock_model_client):
mock_model_client.chat.return_value = ModelResponse(
role="critic",
model="local-main",
content=json.dumps({
"should_store": False,
"memory_type": "event",
"summary": "Routine tool call, nothing to remember.",
"importance": 0.1,
"scope": "workspace",
"metadata": {},
}),
reasoning_content=None,
raw={},
latency_ms=30.0,
)
policy = MemoryPolicy(model_client=mock_model_client)
decision = await policy.classify("Ran ls -la in workspace.", "task_789")
assert decision.should_store is False
assert decision.importance == 0.1
@pytest.mark.asyncio
async def test_llm_memory_policy_uses_response_format(mock_model_client):
policy = MemoryPolicy(model_client=mock_model_client)
await policy.classify("test summary", "task_1")
call_args = mock_model_client.chat.call_args
assert call_args.kwargs["response_format"]["type"] == "json_schema"
@pytest.mark.asyncio
async def test_llm_memory_policy_invalid_json_falls_back(mock_model_client):
mock_model_client.chat.return_value = ModelResponse(
role="critic",
model="local-main",
content="not valid json {{{",
reasoning_content=None,
raw={},
latency_ms=10.0,
)
policy = MemoryPolicy(model_client=mock_model_client)
decision = await policy.classify("some summary", "task_x")
assert decision.should_store is False
assert decision.metadata["source"] == "llm_policy_fallback"
@pytest.mark.asyncio
async def test_llm_memory_policy_missing_fields_falls_back(mock_model_client):
mock_model_client.chat.return_value = ModelResponse(
role="critic",
model="local-main",
content=json.dumps({"should_store": True}),
reasoning_content=None,
raw={},
latency_ms=10.0,
)
policy = MemoryPolicy(model_client=mock_model_client)
decision = await policy.classify("some summary", "task_y")
assert decision.should_store is False
assert decision.metadata["source"] == "llm_policy_fallback"

View File

@ -2,8 +2,13 @@ from fastapi.testclient import TestClient
from duck_core.api import create_app from duck_core.api import create_app
from duck_core.context_builder import ContextBuilder from duck_core.context_builder import ContextBuilder
from duck_core.events.store import EventStore
from duck_core.memory.policy import MemoryDecision
from duck_core.memory.store import MemoryStore from duck_core.memory.store import MemoryStore
from duck_core.model_client import ModelResponse
from duck_core.runtime_loop import RuntimeLoop
from duck_core.tasks.state import TaskState from duck_core.tasks.state import TaskState
from duck_core.tasks.store import TaskStore
def test_memory_api_stores_workspace_scoped_notes(tmp_path, monkeypatch): def test_memory_api_stores_workspace_scoped_notes(tmp_path, monkeypatch):
@ -135,7 +140,7 @@ def test_chat_api_injects_relevant_memory_into_model_context(tmp_path, monkeypat
raw={}, raw={},
latency_ms=1.0, latency_ms=1.0,
) )
seen_messages.append(messages) seen_messages.append((role, messages))
return ModelResponse( return ModelResponse(
role=role, role=role,
model="local-main", model="local-main",
@ -145,8 +150,6 @@ def test_chat_api_injects_relevant_memory_into_model_context(tmp_path, monkeypat
latency_ms=1.0, latency_ms=1.0,
) )
from duck_core.model_client import ModelResponse
monkeypatch.setattr("duck_core.model_client.ModelClient.chat", fake_chat) monkeypatch.setattr("duck_core.model_client.ModelClient.chat", fake_chat)
client = TestClient(create_app()) client = TestClient(create_app())
conversation = client.post( conversation = client.post(
@ -170,5 +173,60 @@ def test_chat_api_injects_relevant_memory_into_model_context(tmp_path, monkeypat
}, },
) )
assert seen_messages[0][0]["role"] == "system" thinker_messages = [messages for role, messages in seen_messages if role == "thinker"]
assert "User prefers direct Russian answers." in seen_messages[0][0]["content"] assert thinker_messages[0][0]["role"] == "system"
assert "User prefers direct Russian answers." in thinker_messages[0][0]["content"]
async def test_runtime_memory_policy_stores_workspace_scoped_memory(tmp_path):
db_path = str(tmp_path / "duck.sqlite3")
task_store = TaskStore(db_path)
event_store = EventStore(db_path)
memory_store = MemoryStore(db_path)
await task_store.init()
await event_store.init()
await memory_store.init()
class FakeModelClient:
async def chat(self, role, messages, temperature=None, max_output_tokens=None, response_format=None):
if role == "action":
return ModelResponse(
role=role,
model="local-main",
content='{"kind":"action_directive","intent":"answer","risk_level":"none","actions":[]}',
reasoning_content=None,
raw={},
latency_ms=1.0,
)
return ModelResponse(
role=role,
model="local-main",
content="Use Vulkan for this workspace.",
reasoning_content=None,
raw={},
latency_ms=1.0,
)
class StorePolicy:
async def classify(self, summary: str, task_id: str) -> MemoryDecision:
return MemoryDecision(
should_store=True,
memory_type="fact",
summary="Workspace uses Vulkan.",
importance=0.8,
scope="workspace",
metadata={"task_id": task_id},
)
runtime = RuntimeLoop(
task_store,
event_store,
model_client=FakeModelClient(),
memory_policy=StorePolicy(),
memory_store=memory_store,
)
await runtime.run_chat("remember workspace fact", workspace="/tmp/duck", reflect=False)
relevant = await memory_store.relevant(workspace="/tmp/duck", query="vulkan")
assert [record.text for record in relevant] == ["Workspace uses Vulkan."]

View File

@ -0,0 +1,227 @@
import json
from unittest.mock import AsyncMock
import pytest
from duck_core.events.store import EventStore
from duck_core.experience.recorder import ExperienceRecorder
from duck_core.memory.policy import MemoryPolicy
from duck_core.memory.store import MemoryStore
from duck_core.model_client import ModelClient, ModelResponse
from duck_core.runtime_loop import RuntimeLoop
from duck_core.tasks.store import TaskStore
@pytest.fixture
def task_store(tmp_path):
store = TaskStore(str(tmp_path / "duck.sqlite3"))
return store
@pytest.fixture
def event_store(tmp_path):
store = EventStore(str(tmp_path / "duck.sqlite3"))
return store
@pytest.fixture
def memory_store(tmp_path):
store = MemoryStore(str(tmp_path / "duck.sqlite3"))
return store
@pytest.fixture
def experience_recorder(tmp_path):
recorder = ExperienceRecorder(str(tmp_path / "duck.sqlite3"))
return recorder
@pytest.fixture
def mock_model_client():
client = AsyncMock(spec=ModelClient)
client.chat = AsyncMock(
side_effect=[
# First call: action role — return empty actions
ModelResponse(
role="action",
model="local-main",
content=json.dumps({
"kind": "action_directive",
"intent": "answer directly",
"risk_level": "none",
"actions": [],
}),
reasoning_content=None,
raw={},
latency_ms=1.0,
),
# Second call: thinker role — final answer
ModelResponse(
role="thinker",
model="local-main",
content="DuckLM is a local cognitive runtime.",
reasoning_content=None,
raw={},
latency_ms=1.0,
),
# Third call: memory_policy role
ModelResponse(
role="critic",
model="local-main",
content=json.dumps({
"should_store": False,
"memory_type": "event",
"summary": "Routine answer, nothing to remember.",
"importance": 0.1,
"scope": "workspace",
"metadata": {},
}),
reasoning_content=None,
raw={},
latency_ms=1.0,
),
# Fourth call: critic role (reflection)
ModelResponse(
role="critic",
model="local-main",
content="Task completed successfully. No issues found. Reusable lesson: direct answers work well for simple queries.",
reasoning_content=None,
raw={},
latency_ms=1.0,
),
]
)
return client
@pytest.mark.asyncio
async def test_reflection_is_called_after_task_completion(
task_store, event_store, memory_store, experience_recorder, mock_model_client
):
policy = MemoryPolicy(model_client=mock_model_client)
runtime = RuntimeLoop(
task_store=task_store,
event_store=event_store,
model_client=mock_model_client,
memory_policy=policy,
memory_store=memory_store,
experience_recorder=experience_recorder,
)
result = await runtime.run_chat("What is DuckLM?", workspace="/tmp/test")
assert result.status == "completed"
assert "DuckLM" in result.final_response
# Check that reflection was called — experience record created
records = await experience_recorder.list_records()
assert len(records) == 1
assert records[0].task_id == result.task_id
assert "completed successfully" in records[0].reusable_lesson
# Check that reflection_completed event was recorded
events = await event_store.list_events(result.task_id)
event_types = [e.event_type for e in events]
assert "reflection_completed" in event_types
assert records[0].task_id == result.task_id
assert "completed successfully" in records[0].reusable_lesson
# Check that reflection_completed event was recorded
events = await event_store.list_events(result.task_id)
event_types = [e.event_type for e in events]
assert "reflection_completed" in event_types
@pytest.mark.asyncio
async def test_reflection_failure_does_not_break_task(
task_store, event_store, memory_store, experience_recorder
):
"""If reflection fails, the task should still complete successfully."""
client = AsyncMock(spec=ModelClient)
client.chat = AsyncMock(
side_effect=[
# Action: empty
ModelResponse(
role="action",
model="local-main",
content=json.dumps({
"kind": "action_directive",
"intent": "answer",
"risk_level": "none",
"actions": [],
}),
reasoning_content=None,
raw={},
latency_ms=1.0,
),
# Thinker: answer
ModelResponse(
role="thinker",
model="local-main",
content="Answer.",
reasoning_content=None,
raw={},
latency_ms=1.0,
),
# Memory policy
ModelResponse(
role="critic",
model="local-main",
content=json.dumps({
"should_store": False,
"memory_type": "event",
"summary": "Routine.",
"importance": 0.1,
"scope": "workspace",
"metadata": {},
}),
reasoning_content=None,
raw={},
latency_ms=1.0,
),
# Critic (reflection) — raises exception
ConnectionError("LLM unavailable"),
]
)
policy = MemoryPolicy(model_client=client)
runtime = RuntimeLoop(
task_store=task_store,
event_store=event_store,
model_client=client,
memory_policy=policy,
memory_store=memory_store,
experience_recorder=experience_recorder,
)
result = await runtime.run_chat("test", workspace="/tmp/test")
# Task should still complete
assert result.status == "completed"
# Reflection failure event should be recorded
events = await event_store.list_events(result.task_id)
event_types = [e.event_type for e in events]
assert "reflection_failed" in event_types
@pytest.mark.asyncio
async def test_reflection_not_called_when_disabled(
task_store, event_store, memory_store, mock_model_client
):
"""When reflect=False, no reflection should be called."""
policy = MemoryPolicy(model_client=mock_model_client)
runtime = RuntimeLoop(
task_store=task_store,
event_store=event_store,
model_client=mock_model_client,
memory_policy=policy,
memory_store=memory_store,
)
result = await runtime.run_chat("What is DuckLM?", workspace="/tmp/test", reflect=False)
assert result.status == "completed"
# mock_model_client.chat should have been called 3 times (action, thinker, memory_policy)
# NOT 4 times (no critic/reflection call)
assert mock_model_client.chat.call_count == 3

View File

@ -5,7 +5,11 @@ from duck_core.memory.vector_memory import EmbeddingsUnavailableError, VectorMem
@pytest.mark.asyncio @pytest.mark.asyncio
async def test_vector_memory_stub_is_explicit_when_embeddings_unavailable(): async def test_vector_memory_stub_is_explicit_when_embeddings_unavailable():
memory = VectorMemory(qdrant_url="http://127.0.0.1:6333", embeddings_base_url=None) memory = VectorMemory(
qdrant_url="http://127.0.0.1:6333",
embeddings_base_url=None,
local_embedding_model=None,
)
with pytest.raises(EmbeddingsUnavailableError): with pytest.raises(EmbeddingsUnavailableError):
await memory.add_memory("remember this") await memory.add_memory("remember this")

View File

@ -0,0 +1,97 @@
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from duck_core.memory.vector_memory import VectorMemory, EmbeddingsUnavailableError
@pytest.mark.asyncio
async def test_vector_memory_uses_local_model():
"""Test VectorMemory with local sentence-transformers model (mocked)."""
vm = VectorMemory(
qdrant_url="http://localhost:6333",
local_embedding_model="./models/all-MiniLM-L6-v2",
)
# Mock the sentence-transformers model — encode returns a numpy-like list
mock_model = MagicMock()
mock_model.encode.return_value = [0.1] * 384 # all-MiniLM-L6-v2 produces 384-dim vectors
with patch.object(vm, "_load_local_model", return_value=mock_model):
with patch("httpx.AsyncClient") as mock_client_class:
mock_client = AsyncMock()
mock_client_class.return_value.__aenter__ = AsyncMock(return_value=mock_client)
mock_client_class.return_value.__aexit__ = AsyncMock(return_value=False)
put_response = MagicMock(status_code=200)
put_response.raise_for_status = MagicMock()
search_response = MagicMock(status_code=200)
search_response.raise_for_status = MagicMock()
search_response.json.return_value = {
"result": [{"id": "test-id", "payload": {"text": "test"}}]
}
mock_client.put.return_value = put_response
mock_client.post.return_value = search_response
point_id = await vm.add_memory("test memory", {"scope": "global"})
assert point_id is not None
results = await vm.search_memory("test query")
assert isinstance(results, list)
@pytest.mark.asyncio
async def test_vector_memory_no_embedding_source():
"""VectorMemory with no embedding source should raise."""
vm = VectorMemory(
qdrant_url="http://localhost:6333",
local_embedding_model=None,
embeddings_base_url=None,
)
with pytest.raises(EmbeddingsUnavailableError):
await vm.add_memory("test")
@pytest.mark.asyncio
async def test_vector_memory_remote_fallback():
"""Test VectorMemory with remote embeddings endpoint."""
vm = VectorMemory(
qdrant_url="http://localhost:6333",
local_embedding_model=None,
embeddings_base_url="http://localhost:8081/v1",
)
mock_embedding = [0.1] * 384
with patch("httpx.AsyncClient") as mock_client_class:
mock_client = AsyncMock()
mock_client_class.return_value.__aenter__ = AsyncMock(return_value=mock_client)
mock_client_class.return_value.__aexit__ = AsyncMock(return_value=False)
embed_response = MagicMock(status_code=200)
embed_response.json.return_value = {"data": [{"embedding": mock_embedding}]}
put_response = MagicMock(status_code=200)
put_response.raise_for_status = MagicMock()
mock_client.post.return_value = embed_response
mock_client.put.return_value = put_response
point_id = await vm.add_memory("test")
assert point_id is not None
@pytest.mark.asyncio
async def test_vector_memory_remote_503():
"""Remote embeddings returning 503 should raise EmbeddingsUnavailableError."""
vm = VectorMemory(
qdrant_url="http://localhost:6333",
local_embedding_model=None,
embeddings_base_url="http://localhost:8081/v1",
)
with patch("httpx.AsyncClient") as mock_client_class:
mock_client = AsyncMock()
mock_client_class.return_value.__aenter__ = AsyncMock(return_value=mock_client)
mock_client_class.return_value.__aexit__ = AsyncMock(return_value=False)
mock_client.post.return_value = AsyncMock(status_code=503)
with pytest.raises(EmbeddingsUnavailableError):
await vm.add_memory("test")