Integrate memory policy and reflection runtime

parent 9a8f058008
commit 6f58df82c9

@ -0,0 +1,462 @@

# DuckLM: Current Project State

**Analysis date:** 2026-05-21

**Version:** 0.2.0

**Location:** `~/git/ducklm_2`

---

## Recent changes (Phases 1-7)

### Phase 1: MemoryPolicy, LLM-based memory classification ✅

- `duck_core/memory/policy.py` rewritten from scratch: LLM classification via the critic role
- `memory_policy` role added to `config/models.yaml`, together with its prompt
- Integrated into RuntimeLoop: `_run_memory_policy()` runs after every task
- Events: `memory_policy_decision`, `memory_stored`, `memory_policy_failed`
- 6 tests in `tests/smoke/test_memory_policy.py`
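
Below is a minimal sketch of how a classifier reply might be turned into a `MemoryDecision`. It assumes the `memory_policy` role returns a JSON object. Only `MemoryDecision` and `should_store` come from this document. The `text`, `scope` and `importance` fields and the `parse_memory_decision` helper are hypothetical. The sketch fails closed, so any malformed reply means "store nothing", which matches the old stub's default.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Scopes named in this document (global / workspace / conversation).
VALID_SCOPES = {"global", "workspace", "conversation"}


@dataclass
class MemoryDecision:
    should_store: bool
    text: str = ""               # hypothetical field: the fact to remember
    scope: str = "conversation"  # hypothetical field and default
    importance: float = 0.0      # hypothetical field, clamped to [0, 1]


def parse_memory_decision(raw: str) -> MemoryDecision:
    """Parse the classifier's JSON reply; fail closed (store nothing) on bad input."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return MemoryDecision(should_store=False)
    if not isinstance(data, dict):
        return MemoryDecision(should_store=False)

    text = str(data.get("text", "")).strip()
    scope = data.get("scope", "conversation")
    if not isinstance(scope, str) or scope not in VALID_SCOPES:
        scope = "conversation"
    try:
        importance = float(data.get("importance", 0.0))
    except (TypeError, ValueError):
        importance = 0.0
    importance = min(max(importance, 0.0), 1.0)

    # Require a real JSON boolean (not the string "false"), and never store empty text.
    should_store = data.get("should_store") is True and bool(text)
    return MemoryDecision(should_store, text, scope, importance)
```

Invalid scopes and out-of-range importance values are normalized rather than rejected. The reasoning is that a slightly off-spec model reply should still yield a usable decision.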

### Phase 2: Reflection (critic), invoked automatically ✅

- `_run_reflection()` in RuntimeLoop: transcript from the event store → critic → experience
- `reflect: bool = True` parameter in `run_chat()`
- `ExperienceRecorder` is passed in through `create_app()`
- Events: `reflection_completed`, `reflection_failed`
- 3 tests in `tests/smoke/test_reflection.py`
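
The reflection step reads a task's events back from the event store and flattens them into a transcript for the critic. Below is a minimal sketch of that flattening. It assumes each event is a dict with `sequence`, `event_type` and `payload` keys, mirroring the `events` table columns in section 7. `build_transcript` and the character cap are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any


def build_transcript(events: list[dict[str, Any]], max_chars: int = 8000) -> str:
    """Render task events one per line, in sequence order, for the critic prompt."""
    lines = []
    for event in sorted(events, key=lambda e: e["sequence"]):
        payload = json.dumps(event.get("payload", {}), ensure_ascii=False, sort_keys=True)
        lines.append(f"[{event['sequence']}] {event['event_type']}: {payload}")
    transcript = "\n".join(lines)
    # Keep the tail when truncating: the end of a task (final answer, errors)
    # matters most for reflection.
    if len(transcript) > max_chars:
        transcript = "...\n" + transcript[-max_chars:]
    return transcript
```

Sorting by `sequence` rather than trusting the input order keeps the transcript correct even if events arrive from a query without `ORDER BY`.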

### Phases 3-4: ContextBuilder v2 + summary role ✅

- `duck_core/context_builder.py` fully rewritten
- Token-budget awareness, prioritization, and summarization via the summary role
- `estimate_tokens()` and `estimate_messages_tokens()` utilities
- `model_client` wired in for LLM summarization
- 11 tests in `tests/smoke/test_context_builder.py`
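
The "~4 characters per token" heuristic behind `estimate_tokens()` is enough to enforce a budget. Below is a sketch of history trimming. It assumes the oldest messages are dropped first and that the system prompt and the current user message are always kept. `trim_to_budget` is hypothetical. The ContextBuilder described above also summarizes via the summary role, which this sketch leaves out.

```python
from __future__ import annotations

CHARS_PER_TOKEN = 4  # same heuristic as estimate_tokens() in context_builder.py


def estimate_tokens(text: str) -> int:
    return max(len(text) // CHARS_PER_TOKEN, 1)


def estimate_messages_tokens(messages: list[dict[str, str]]) -> int:
    # +4 per message for role and formatting overhead
    return sum(estimate_tokens(m.get("content", "")) + 4 for m in messages)


def trim_to_budget(messages: list[dict[str, str]], budget: int) -> list[dict[str, str]]:
    """Drop the oldest history messages until the estimate fits in `budget` tokens.

    messages[0] is assumed to be the system prompt and messages[-1] the current
    user message; both are always kept, even if they alone exceed the budget.
    """
    if len(messages) <= 2:
        return list(messages)
    head, history, tail = messages[:1], list(messages[1:-1]), messages[-1:]
    while history and estimate_messages_tokens(head + history + tail) > budget:
        history.pop(0)  # oldest first
    return head + history + tail
```

Dropping from the oldest end keeps the most recent turns, which usually carry the state the model needs to answer the current message.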

### Phase 5: VectorMemory integration ✅

- `VectorMemory` added to RuntimeLoop and `create_app()`
- **Local embedding model**: `all-MiniLM-L6-v2` (384 dimensions, sentence-transformers)
- On `memory_stored`, the memory is also written to Qdrant (with graceful fallback on errors)
- Two modes are supported: the local model and a remote `/v1/embeddings` endpoint
- `sentence-transformers` added to the dependencies in `pyproject.toml`
- 4 tests in `tests/smoke/test_vector_memory_integration.py`
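
"Graceful fallback" here means SQLite remains the source of truth, and a Qdrant failure must not fail the task. Below is a minimal asyncio sketch of that pattern. It uses in-memory stand-ins rather than the real `MemoryStore` and `VectorMemory` classes. The method names `add` and `add_memory` appear in this document, but their signatures here are assumptions. `store_memory` is hypothetical.

```python
from __future__ import annotations

import asyncio
import logging

logger = logging.getLogger(__name__)


class FakeMemoryStore:
    """Stand-in for the SQLite-backed MemoryStore (signature assumed)."""

    def __init__(self) -> None:
        self.rows: list[str] = []

    async def add(self, text: str) -> str:
        self.rows.append(text)
        return f"mem-{len(self.rows)}"


class FailingVectorMemory:
    """Stand-in for VectorMemory when Qdrant is unreachable."""

    async def add_memory(self, memory_id: str, text: str) -> None:
        raise ConnectionError("qdrant unavailable")


async def store_memory(store, vector_memory, text: str) -> tuple[str, bool]:
    """Write to SQLite first, then best-effort to the vector index.

    Returns (memory_id, indexed) so the caller can record in the
    `memory_stored` event whether semantic indexing succeeded.
    """
    memory_id = await store.add(text)
    if vector_memory is None:
        return memory_id, False
    try:
        await vector_memory.add_memory(memory_id, text)
    except Exception:  # any Qdrant/embedding failure is non-fatal
        logger.warning("vector indexing failed for %s", memory_id, exc_info=True)
        return memory_id, False
    return memory_id, True
```

For example, `asyncio.run(store_memory(FakeMemoryStore(), FailingVectorMemory(), "User prefers a dark theme"))` still returns a memory id, with `indexed` set to `False`.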

### Embeddings: architecture

Described in the "Embeddings architecture" section at the end of this document.

### Phase 6: Recall role ✅

- `recall` role added to `config/models.yaml`, together with the prompt `prompts/roles/recall.md`
- `ContextBuilder.recall_relevant_memory()`: LLM filtering of relevant memories
- Integrated into the `/v1/chat` endpoint
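
The recall step narrows the candidate memories with an LLM. Like the `/v1/chat` change in this commit, it falls back to the unfiltered list on failure. Below is a sketch of the parsing side. It assumes the recall role answers with a JSON object such as `{"relevant": [0, 2]}` that lists candidate indices. That reply shape and `filter_by_recall` are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any


def filter_by_recall(candidates: list[Any], raw_reply: str) -> list[Any]:
    """Keep only the candidates the recall model marked relevant.

    Falls back to the unfiltered list if the reply cannot be parsed.
    """
    try:
        indices = json.loads(raw_reply)["relevant"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return list(candidates)
    if not isinstance(indices, list):
        return list(candidates)
    kept: list[Any] = []
    seen: set[int] = set()
    for i in indices:
        # Ignore non-integer, out-of-range, or duplicate indices from the model.
        if isinstance(i, int) and not isinstance(i, bool) and 0 <= i < len(candidates) and i not in seen:
            seen.add(i)
            kept.append(candidates[i])
    return kept
```

The function asks for indices rather than having the model echo memory text back. This keeps the reply short and prevents the model from paraphrasing or inventing memories.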

### Phase 7: Coder role integration ✅

- `CoderTool` created in `duck_core/tools/coder.py`
- Registered in `ToolGateway.default()`
- Described in `prompts/roles/action.md`

### Test status

- **72 of 73** smoke tests pass
- 1 test (`test_llama_server_connection_live_skip_by_env`) requires a live llama-server

---

## 1. What DuckLM is

DuckLM is a **local agent system (cognitive runtime)** that runs on top of local language models served by `llama-server`. It is not an inference server. It is a complete cognitive cycle:

```
state → context → thinking → intention → action → observation → reflection → memory → experience
```

The key idea is that DuckLM is an **orchestrator**. It manages tasks, tools, memory, skills and reflection, using local LLMs through the OpenAI-compatible API of `llama-server`.

---

## 2. Architectural principles

### 2.1. Reuse of off-the-shelf components

| Component | Source |
|-----------|--------|
| LLM inference | `llama-server` (llama.cpp built with Vulkan) |
| State storage | SQLite (aiosqlite) |
| Vector memory | Qdrant (via docker-compose) |
| HTTP API | FastAPI |
| Web interface | Jinja2 + vanilla JS |
| Data validation | Pydantic |
| Configuration | PyYAML + python-dotenv |

**Not written from scratch:** inference server, model scheduler, vector DB, OpenAI API, MCP, sandbox, workflow engine.

**Written from scratch:** Duck Core (runtime loop, context builder, model client, event store, tool gateway, approvals, skills, experience, memory policy, FastAPI API, WebChat).

### 2.2. Web/API first

- **WebChat**: the human interface (port 8000)
- **HTTP API**: for the coder, tests and external agents
- A CLI is not a required part; if one is ever needed, it will be a thin client over the HTTP API

### 2.3. Model roles are logical, not physical

Roles: `thinker`, `critic`, `coder`, `action`, `summary`, `recall`, `sys_util`.

In the current configuration, all roles point to a single physical model (`local-main` on port 8081). Roles are distinguished by a combination of:

- system prompt
- temperature
- max_output_tokens
- response_format / structured_output
- memory scope
- tool permissions
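
Because every role points at the same `local-main` endpoint, a role is effectively a bundle of request parameters. Below is a sketch of how a role entry might become an OpenAI-compatible `/v1/chat/completions` request body. The field names and values are copied from the `config/models.yaml` excerpt in this commit. `build_request` is hypothetical, and so is the mapping of `max_output_tokens` to the API's `max_tokens`. Loading the prompt text from the `system_prompt` file is left out.

```python
from __future__ import annotations

from typing import Any

# Two role entries shaped like config/models.yaml (values from this commit's diff).
ROLES: dict[str, dict[str, Any]] = {
    "memory_policy": {
        "base_url": "http://127.0.0.1:8081/v1",
        "model": "local-main",
        "temperature": 0.1,
        "max_output_tokens": 1024,
    },
    "recall": {
        "base_url": "http://127.0.0.1:8081/v1",
        "model": "local-main",
        "temperature": 0.1,
        "max_output_tokens": 2048,
    },
}


def build_request(role: str, system_prompt: str, user_message: str) -> tuple[str, dict[str, Any]]:
    """Return (url, json_body) for an OpenAI-compatible chat completion call."""
    cfg = ROLES[role]
    url = cfg["base_url"].rstrip("/") + "/chat/completions"
    body = {
        "model": cfg["model"],
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": cfg["temperature"],
        "max_tokens": cfg["max_output_tokens"],  # assumed mapping of the config field
    }
    return url, body
```

Both roles resolve to the same URL and model. Only the prompt and the limits differ, which is exactly what "logical, not physical" means here.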

### 2.4. Token budget

```
# .env value; the code default in Settings is 65536
DUCK_CTX_SIZE=4096
DUCK_MAX_INPUT_TOKENS=49152
DUCK_MAX_RECENT_EVENTS_TOKENS=12000
DUCK_MAX_MEMORY_TOKENS=8000
DUCK_MAX_SKILL_TOKENS=6000
```

Output limits per role:

- thinker: 8192
- critic: 4096
- coder: 16384
- action: 2048
- summary: 4096
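
For a llama-server request, the prompt and the generated tokens must fit in the context window together. The check below runs the numbers above under two assumptions: a full input budget and a role's output limit share one `DUCK_CTX_SIZE` window, and the recent-events, memory and skill budgets are sub-budgets of `DUCK_MAX_INPUT_TOKENS`. With the code default of 65536, every role fits (the coder role exactly). The `.env` value of 4096 cannot even hold the input budget. `fits` is hypothetical.

```python
from __future__ import annotations

DUCK_MAX_INPUT_TOKENS = 49152
# Assumption: these are carved out of DUCK_MAX_INPUT_TOKENS.
SUB_BUDGETS = {"recent_events": 12000, "memory": 8000, "skills": 6000}
OUTPUT_LIMITS = {"thinker": 8192, "critic": 4096, "coder": 16384, "action": 2048, "summary": 4096}


def fits(ctx_size: int, role: str) -> bool:
    """True if a full input budget plus the role's output limit fits in ctx_size."""
    return DUCK_MAX_INPUT_TOKENS + OUTPUT_LIMITS[role] <= ctx_size


# Input tokens left for the system prompt, history and the user message.
remaining_input = DUCK_MAX_INPUT_TOKENS - sum(SUB_BUDGETS.values())
```

With these values, `remaining_input` is 23152 tokens, and 49152 + 16384 = 65536 leaves the coder role no headroom at the code default.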

---

## 3. Target architecture (from the technical specification)
```
           ┌─────────────┐
           │   WebChat   │  ← human interface
           └──────┬──────┘
                  │
                  ▼
           ┌─────────────┐
           │   FastAPI   │  ← interface for the coder, tests and agents
           └──────┬──────┘
                  │
                  ▼
┌─────────────────────────────────────┐
│              Duck Core              │
│ RuntimeLoop, TaskState,             │
│ ContextBuilder, ModelClient,        │
│ SkillRegistry, ToolGateway,         │
│ ApprovalService, Reflection,        │
│ MemoryPolicy, ExperienceRecorder    │
└──────┬──────────────┬───────────────┘
       │              │
       ▼              ▼
┌────────────┐ ┌──────────────────┐
│llama-server│ │ SQLite/PostgreSQL│
│OpenAI-comp.│ │ events/tasks/    │
└────────────┘ │ approvals        │
       │       └──────────────────┘
       ▼
┌────────────┐
│   Qdrant   │  ← semantic memory
└────────────┘
```

---

## 4. What is implemented (current state)

### 4.1. Fully implemented components

#### Configuration and settings

- **`duck_core/config.py`**: `Settings` dataclass, loaded from `.env` and cached via `lru_cache`
- **`config/models.yaml`**: 5 roles (thinker, critic, coder, action, summary), all on `local-main` (port 8081); the `memory_policy` and `recall` roles were added later, in Phases 1 and 6
- **`.env`** / **`.env.example`**: full configuration of paths, ports, GPU and Qdrant

#### ModelClient

- **`duck_core/model_client.py`**: role-based routing of calls to llama-server
  - `chat()`: non-streaming call that records latency and usage
  - `stream_chat()`: streaming over SSE (reasoning_delta + content_delta)
  - `ping()`: availability check for all roles
  - Automatic substitution of the system prompt from its file
  - Automatic `response_format: json_schema` for the action role
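
`stream_chat()` splits the SSE stream into reasoning and content deltas. Below is a sketch of per-line parsing. It assumes OpenAI-style chunks in which llama-server puts reasoning text in `choices[0].delta.reasoning_content` and answer text in `choices[0].delta.content`, and in which the stream ends with `data: [DONE]`. `parse_sse_line` and its `(event, text)` tuples are hypothetical.

```python
from __future__ import annotations

import json


def parse_sse_line(line: str) -> list[tuple[str, str]]:
    """Map one SSE line to zero or more (event, text) pairs.

    event is "reasoning_delta", "content_delta" or "done".
    """
    line = line.strip()
    if not line.startswith("data:"):
        return []  # blank separators, ": keep-alive" comments, other SSE fields
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return [("done", "")]
    try:
        chunk = json.loads(data)
    except json.JSONDecodeError:
        return []
    if not isinstance(chunk, dict):
        return []
    choices = chunk.get("choices")
    if not isinstance(choices, list) or not choices or not isinstance(choices[0], dict):
        return []  # e.g. a final usage-only chunk with an empty choices list
    delta = choices[0].get("delta") or {}
    out: list[tuple[str, str]] = []
    if delta.get("reasoning_content"):
        out.append(("reasoning_delta", delta["reasoning_content"]))
    if delta.get("content"):
        out.append(("content_delta", delta["content"]))
    return out
```

A single chunk can carry both fields, so the function returns a list instead of a single event.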

#### State storage (SQLite)

- **`duck_core/tasks/state.py`**: `TaskState` (Pydantic model)
- **`duck_core/tasks/store.py`**: `TaskStore` with create, update_status, complete, fail, cancel, waiting_for_approval, get, list
- **`duck_core/events/store.py`**: `EventStore` with append (auto-incremented sequence), list_events, list_by_type
- **`duck_core/conversations/store.py`**: `ConversationStore` with create, ensure, get, list, add_message, list_messages, get_conversation_id_for_task
- **`duck_core/approvals/service.py`**: `ApprovalService` with create_pending, pending, get, allow_once, allow_forever, deny, is_allowed_forever
- **`duck_core/experience/recorder.py`**: `ExperienceRecorder` with record, list_records, get_record, write_skill_update_proposal
- **`duck_core/memory/store.py`**: `MemoryStore` with add, list, search (LIKE), relevant (scope-aware), infer_scope, _normalize_scope

#### Runtime Loop

- **`duck_core/runtime_loop.py`**: the core of the cognitive cycle
  - `run_chat()`: the full cycle (task creation → action loop → thinker → completion)
  - `continue_after_approval()`: resumes after an action is approved
  - `_run_action_loop()`: iterative tool-calling loop (at most 4 iterations)
  - `_run_action_tools()`: parses the model's action directive → calls ToolGateway
  - `_append_command_audit()`: audits shell commands through the event store
  - Handling of requires_approval → pause while waiting for the user's decision

#### Context Builder

- **`duck_core/context_builder.py`**: `ContextBuilder.build_basic_messages()` assembles messages from memory records, the history and the current user message (in this commit the API switches to the async `build_async_messages()`)

#### Tools

- **`duck_core/tools/base.py`**: `ToolResult` (Pydantic), `Tool` (Protocol)
- **`duck_core/tools/gateway.py`**: `ToolGateway` routes an action to the matching tool
- **`duck_core/tools/file_read.py`**: `FileReadTool` reads files inside the workspace; `.env`, `.ssh` and `shadow` are denied
- **`duck_core/tools/file_write.py`**: `FileWriteTool` writes inside the workspace, with overwrite protection
- **`duck_core/tools/list_dir.py`**: `ListDirTool` lists a directory inside the workspace
- **`duck_core/tools/search_files.py`**: `SearchFilesTool` performs text search across files (glob, case_sensitive)
- **`duck_core/tools/shell_exec_safe.py`**: `ShellExecSafeTool` combines an allowlist, a blocklist and approvals
- **`duck_core/tools/command_policy.py`**: `CommandPolicy` classifies commands (readonly, system, destructive, dangerous fragments)
- **`duck_core/tools/paths.py`**: `resolve_workspace_path()` protects against path traversal
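
`resolve_workspace_path()` guards against path traversal. Below is a common `pathlib` technique for this, offered as a sketch and not the project's code. It resolves the candidate path, which collapses `..` and follows existing symlinks, and then requires the result to stay under the resolved workspace root. The blocked names mirror the `.env`/`.ssh`/`shadow` rule of `FileReadTool`, but the exact list is an assumption. `resolve_in_workspace` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Mirrors FileReadTool's denylist described above (exact list is an assumption).
BLOCKED_NAMES = {".env", ".ssh", "shadow"}


def resolve_in_workspace(workspace: str | Path, user_path: str) -> Path:
    """Resolve user_path relative to workspace; raise if it escapes or is blocked."""
    root = Path(workspace).resolve()
    # An absolute user_path replaces root here, and is then caught by the check below.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise PermissionError(f"path escapes workspace: {user_path}")
    if any(part in BLOCKED_NAMES for part in candidate.relative_to(root).parts):
        raise PermissionError(f"access to protected path denied: {user_path}")
    return candidate
```

The containment check runs on the resolved path, not the raw string. This is what stops `a/../../x`-style inputs and symlinks that point outside the workspace.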

#### Approvals

- **`duck_core/approvals/service.py`**: the full approval cycle
  - Creates a pending approval with a SHA256 hash of the action
  - Decisions: allow_once, allow_forever, deny
  - is_allowed_forever checks by action hash
  - normalized_action is stored as JSON
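
An "allow forever" decision is keyed by a SHA-256 hash of the normalized action. The same command with the same arguments matches, and any change does not. Below is a sketch of a stable hash. It assumes that "normalized" means canonical JSON (sorted keys, fixed separators). `action_hash`, the in-memory `AllowList` and the action dict shape are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any


def action_hash(normalized_action: dict[str, Any]) -> str:
    """SHA-256 hex digest of a canonical JSON encoding of the action."""
    canonical = json.dumps(
        normalized_action, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class AllowList:
    """In-memory stand-in for the allow_forever / is_allowed_forever lookup."""

    def __init__(self) -> None:
        self._hashes: set[str] = set()

    def allow_forever(self, action: dict[str, Any]) -> None:
        self._hashes.add(action_hash(action))

    def is_allowed_forever(self, action: dict[str, Any]) -> bool:
        return action_hash(action) in self._hashes
```

Canonical encoding matters because two dicts with the same keys in a different order must produce the same hash. Otherwise a stored "allow forever" would silently stop matching.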

#### Skills

- **`duck_core/skills/registry.py`**: `SkillRegistry` loads `*/skill.yaml`, parses procedure/examples/notes, and searches by keyword
- **`skills/analyze_project/`**: the only skill so far, which analyzes project structure

#### Experience & Reflection

- **`duck_core/experience/recorder.py`**: records task outcomes and skill update proposals
- **`duck_core/reflection.py`**: `Reflection.reflect()` calls the critic role to analyze a task transcript

#### Memory

- **`duck_core/memory/store.py`**: `MemoryStore` stores memories in SQLite with scope support (global/workspace/conversation), importance, and text search via LIKE
- **`duck_core/memory/policy.py`**: `MemoryPolicy` is a stub (always should_store=False); replaced by LLM classification in Phase 1
- **`duck_core/memory/vector_memory.py`**: `VectorMemory` integrates with Qdrant for semantic search (requires an embeddings endpoint; a local embedding model was added in Phase 5)

#### FastAPI API

- **`duck_core/api.py`**: the full HTTP API (878 lines)
  - `POST /v1/chat`: main chat, saved to a conversation
  - `POST /v1/chat/stream`: streaming chat over SSE
  - `POST /v1/tasks/{task_id}/continue/stream`: resume after approval
  - `POST /v1/tasks/{task_id}/password/stream`: sudo password entry
  - `GET/POST /v1/conversations`: conversation management
  - `GET /v1/tasks`, `GET /v1/tasks/{task_id}/events`: task inspection
  - `GET /v1/approvals/pending`, `POST /v1/approvals/{id}/allow_once|allow_forever|deny`
  - `GET /v1/skills`, `GET /v1/skills/{skill_id}`
  - `GET /v1/experience`
  - `POST /v1/memory`, `GET /v1/memory`, `GET /v1/memory/search`
  - `GET /v1/models/roles`, `GET /v1/models/ping`
  - `GET /health`, `GET /v1/status`
  - Web pages: `/`, `/approvals`, `/skills`, `/memory`, `/experience`

#### WebChat UI

- **`duck_core/web/templates/index.html`**: full WebChat with a sidebar, conversation list and activity drawer
- **`duck_core/web/static/app.js`** (997 lines): client-side logic
  - SSE streaming with parsing of reasoning_delta, content_delta, tool_call_started/finished, tool_approval_requested, tool_password_requested
  - Inline terminals showing tool calls
  - Inline approve/deny buttons for actions
  - sudo password form
  - Activity drawer with Events/Commands/Memory tabs
  - Conversation management (create, select, load history)
  - Enter to send, Shift+Enter for a new line
- **`duck_core/web/static/style.css`** (1002 lines): light theme, responsive layout

#### Scripts

- **`scripts/llama/start_main.sh`**: manages llama-server (start/stop/restart/status/logs)
- **`scripts/llama/start_thinker_mtp_experimental.sh`**: experimental MTP endpoint
- **`scripts/llama/build_vulkan.sh`**: builds llama.cpp with Vulkan
- **`scripts/llama/healthcheck.sh`**: llama-server health check
- **`scripts/verify/`**: 7 verification scripts (basic_chat, file_write_read, tool_blocking, models_roles, skills, experience, memory)
- **`scripts/bench/bench_runtime.py`**: benchmark

#### Tests

- 18 smoke-test files in `tests/smoke/`:
  - `test_models_config.py`, `test_model_client.py`, `test_api_health.py`
  - `test_event_log.py`, `test_action_directive_schema.py`
  - `test_tool_gateway.py`, `test_approvals.py`
  - `test_skill_registry.py`, `test_experience_recorder.py`
  - `test_vector_memory.py`, `test_memory_store.py`
  - `test_chat_api.py`, `test_conversations.py`
  - `test_runtime_reasoning.py`, `test_runtime_tools.py`
  - `test_llama_server_connection.py`, `test_llama_service_script.py`

#### Documentation

- 12 files in `docs/`:
  - `architecture.md`, `how_to_run.md`, `how_to_test.md`
  - `web_api.md`, `model_roles.md`, `tool_gateway.md`
  - `memory_architecture.md`, `experience_learning.md`, `skills.md`
  - `local_llama_server.md`, `performance_mtp.md`
  - `superpowers/plans/2026-05-19-ducklm-runtime.md`: the implementation plan

#### Docker

- **`docker-compose.memory.yml`**: Qdrant (ports 6333/6334)

#### Build and run

- **`pyproject.toml`**: dependencies fastapi, uvicorn, httpx, pydantic, pyyaml, jinja2, python-dotenv, jsonschema, aiosqlite, qdrant-client (plus `sentence-transformers` since Phase 5)
- **`Makefile`**: targets duck-up, duck-llama-main, duck-api, duck-dev, duck-smoke, duck-test, duck-verify
- **`data/duck.sqlite3`**: the working SQLite database
- **`workspace/`**: working directory for tools

---

## 5. Development plan (from `docs/superpowers/plans/2026-05-19-ducklm-runtime.md`)

The plan consists of 4 tasks:

### Task 1: Tests First

- Write smoke tests for all components
- ✅ **Done**: 18 test files created

### Task 2: Runtime Core

- `pyproject.toml`, `.env.example`, `config/models.yaml`
- `config.py`, `model_client.py`, `events/store.py`, `tasks/store.py`, `tasks/state.py`
- `context_builder.py`, `runtime_loop.py`, `api.py`
- ✅ **Done**: all components implemented

### Task 3: Stage Adapters

- `tools/*`, `approvals/service.py`, `skills/registry.py`
- `experience/recorder.py`, `reflection.py`, `memory/*`
- `schemas/action_directive.schema.json`
- ✅ **Done**: all components implemented

### Task 4: Project Surface

- `scripts/llama/*`, `scripts/verify/*`, `scripts/bench/*`
- `web/templates/*`, `web/static/*`
- `skills/analyze_project/*`
- `docker-compose.memory.yml`, `Makefile`, `README.md`, `docs/*`
- ✅ **Done**: all components implemented

---

## 6. Deviations from the plan / additional features

### 6.1. Added beyond the plan

1. **ConversationStore**: full conversation management (the `conversations` and `conversation_messages` tables). This was not explicit in Task 2 of the plan, which mentioned only tasks and events.

2. **Streaming API**: `POST /v1/chat/stream` with SSE, `POST /v1/tasks/{id}/continue/stream`, `POST /v1/tasks/{id}/password/stream`. The plan did not explicitly call for streaming.

3. **Password flow**: the full sudo-password request cycle (`requires_password` → `tool_password_requested` → `/v1/tasks/{id}/password/stream`). The plan did not go into this level of detail.

4. **Activity drawer in WebChat**: a side panel with Events/Commands/Memory tabs, inline terminals for tools, and inline approvals. This is considerably more than the "empty WebChat page" in stage 1 of the technical specification.

5. **Command audit**: a separate `command_audit` event type for shell commands with full metadata (action_type, risk_level, blocked, approved, returncode).

6. **Scope-aware memory**: a three-level scope system (global/workspace/conversation) with automatic scope inference (`infer_scope`).

7. **Skill update proposals**: skill update proposals are written automatically to `skills/_proposals/`.

8. **18 test files instead of the 11 planned**: added `test_chat_api`, `test_conversations`, `test_runtime_reasoning`, `test_runtime_tools`, `test_llama_server_connection`, `test_llama_service_script`, `test_memory_store`.

### 6.2. Not implemented (or only partially implemented)

1. **MemoryPolicy is a stub.** `MemoryPolicy.classify()` always returns `should_store=False`, and there is no LLM classification for storing memories automatically. (Addressed in Phase 1.)

2. **ContextBuilder is minimal.** There is no summarization of old events, no trimming to the token budget, and no context prioritization. It simply concatenates memory + history + user message. (Addressed in Phases 3-4.)

3. **The critic is not invoked automatically.** `Reflection.reflect()` exists but is not integrated into RuntimeLoop, so there is no automatic reflection after a task completes. (Addressed in Phase 2.)

4. **The summary role is unused.** There is no automatic context summarization when the budget is exceeded. (Addressed in Phases 3-4.)

5. **The coder role is not used in the main flow.** RuntimeLoop calls only action and thinker. (Addressed in Phase 7 via `CoderTool`, registered in `ToolGateway.default()`.)

6. **The recall role is not defined in the config.** The specification mentions recall, but `config/models.yaml` does not contain it. (Addressed in Phase 6.)

7. **The sys_util role is not defined in the config.** Same situation as recall.

8. **VectorMemory is not integrated into RuntimeLoop.** Qdrant search is not wired into the main loop; MemoryStore uses LIKE search rather than vector search. (Addressed in Phase 5.)

9. **WebChat uses a light theme.** The user's memory records a preference for a dark theme, but the CSS implements a light one (`color-scheme: light`, white background).

10. **No CLI.** The specification lists it as optional; if it is needed, it still has to be built.

11. **Skill patches are not applied automatically.** Proposals are written to `skills/_proposals/` but are not applied automatically.

### 6.3. Technical notes

- **Model:** Qwen3.6 35B A3B, in two variants: non-MTP (primary, port 8081) and MTP (experimental, port 8085)
- **GPU:** Radeon RX580, Vulkan backend, 20 GPU layers
- **llama-server binary:** `./vendor/llama.cpp/build/bin/llama-server`
- **ctx_size in .env:** 4096 (although the `Settings` default in code is 65536)
- **reasoning-budget:** 512 in `.env` (`--reasoning-budget 512 --cache-ram 0`)
- **Python:** 3.13 (inferred from the `__pycache__` paths)

---

## 7. Database schema (SQLite)

Tables and their columns:

- **tasks**: task_id, status, user_message, workspace, debug, final_response, created_at, updated_at
- **events**: id, task_id, sequence, event_type, payload_json, created_at
- **conversations**: id, conversation_id, title, workspace, created_at, updated_at
- **conversation_messages**: id, conversation_id, role, content, reasoning_content, task_id, status, created_at
- **approvals**: id, approval_id, task_id, action_hash, normalized_action_json, status, decision, created_at, updated_at
- **experience_records**: id, task_id, skill_id, summary, result, what_worked_json, what_failed_json, reusable_lesson, suggested_skill_patch, confidence, created_at
- **memories**: id, memory_id, text, scope, workspace, conversation_id, memory_type, importance, metadata_json, created_at, updated_at
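
The `events` table stores a per-task `sequence`, which `EventStore.append` fills in automatically. Below is a synchronous `sqlite3` sketch of one way to do that; the project itself uses aiosqlite. The column types, the `UNIQUE (task_id, sequence)` constraint and the `MAX(sequence) + 1` strategy are assumptions and are not taken from the code. `append_event` is hypothetical.

```python
from __future__ import annotations

import json
import sqlite3
from typing import Any

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    task_id TEXT NOT NULL,
    sequence INTEGER NOT NULL,
    event_type TEXT NOT NULL,
    payload_json TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (task_id, sequence)
)
"""


def append_event(conn: sqlite3.Connection, task_id: str, event_type: str, payload: dict[str, Any]) -> int:
    """Insert an event with the next per-task sequence number and return it."""
    # `with conn` commits on success and rolls back on error; if two writers race,
    # UNIQUE(task_id, sequence) rejects the duplicate instead of silently reusing it.
    with conn:
        (seq,) = conn.execute(
            "SELECT COALESCE(MAX(sequence), 0) + 1 FROM events WHERE task_id = ?", (task_id,)
        ).fetchone()
        conn.execute(
            "INSERT INTO events (task_id, sequence, event_type, payload_json) VALUES (?, ?, ?, ?)",
            (task_id, seq, event_type, json.dumps(payload)),
        )
    return seq
```

Sequences are counted per task rather than globally, so replaying a single task's events with `ORDER BY sequence` is trivial.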

---

## 8. Cognitive cycle (how RuntimeLoop works)

1. The user sends a message → `POST /v1/chat` or `/v1/chat/stream`
2. A Task is created, along with a `task_created` event
3. ContextBuilder assembles the messages (memory + history + user message)
4. **Action loop** (up to 4 iterations):
   - The `action` model produces a JSON directive (schema: `action_directive.schema.json`)
   - ToolGateway executes each action with the corresponding tool
   - If a command requires approval → pause, create an Approval, and wait for the decision
   - If sudo is needed → request a password
   - Results are collected as tool_observations
5. The thinker receives all tool_observations and produces the final answer
6. The task is complete → `task_completed`
7. Reflection via the critic. This step was originally **not automated**; since Phase 2, `_run_reflection()` runs it automatically when `run_chat()` is called with `reflect=True` (the default).
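
Steps 4-5 can be sketched as a bounded loop. The loop asks the action role for a directive, runs its tools, and feeds the observations back. It stops when the model requests nothing further, when an approval is needed, or after 4 iterations. Everything here is a stand-in: `next_directive`, `run_tool` and the `{"actions": [...]}` directive shape are hypothetical. The real loop also handles password requests and event logging.

```python
from __future__ import annotations

from typing import Any, Callable

MAX_ITERATIONS = 4  # the limit stated in this document


def run_action_loop(
    next_directive: Callable[[list[dict[str, Any]]], dict[str, Any]],
    run_tool: Callable[[dict[str, Any]], dict[str, Any]],
) -> tuple[list[dict[str, Any]], str]:
    """Return (observations, stop_reason) for the thinker to consume."""
    observations: list[dict[str, Any]] = []
    for _ in range(MAX_ITERATIONS):
        directive = next_directive(observations)  # the action role sees prior results
        actions = directive.get("actions", [])
        if not actions:
            return observations, "no_more_actions"
        for action in actions:
            result = run_tool(action)
            observations.append({"action": action, "result": result})
            if result.get("requires_approval"):
                # Pause: the task waits until the user approves or denies.
                return observations, "waiting_for_approval"
    return observations, "iteration_limit"
```

Returning an explicit stop reason lets the caller decide whether to hand off to the thinker or to persist a paused task for `continue_after_approval()`.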

---

## 9. Readiness status

| Component | Status |
|-----------|--------|
| Configuration | ✅ Done |
| ModelClient | ✅ Done |
| TaskStore / EventStore | ✅ Done |
| ConversationStore | ✅ Done |
| RuntimeLoop | ✅ Done |
| ContextBuilder | ✅ v2 (Phases 3-4) |
| ToolGateway + Tools | ✅ Done |
| ApprovalService | ✅ Done |
| SkillRegistry | ✅ Done |
| ExperienceRecorder | ✅ Done |
| Reflection | ✅ Integrated into the loop (Phase 2) |
| MemoryStore (SQLite) | ✅ Done |
| MemoryPolicy | ✅ LLM-based (Phase 1) |
| VectorMemory (Qdrant) | ✅ Integrated (Phase 5) |
| FastAPI API | ✅ Done |
| WebChat UI | ✅ Done (light theme) |
| Streaming | ✅ Done |
| Password flow | ✅ Done |
| Smoke tests | ✅ 74 tests |
| Docs | ✅ 12 files |
| Scripts | ✅ Done |

---

## Embeddings architecture

### Local model (primary mode)

- **Model**: `all-MiniLM-L6-v2` (sentence-transformers, 384 dimensions)
- **Location**: `./models/all-MiniLM-L6-v2/` (safetensors format)
- **Library**: `sentence-transformers` (added to `pyproject.toml`)
- **Usage**: `VectorMemory._local_embed()` loads the model with `SentenceTransformer` and encodes in a thread pool

### Remote endpoint (fallback)

- **Endpoint**: `/v1/embeddings` on llama-server or another OpenAI-compatible server
- **Usage**: `VectorMemory._remote_embed()` sends an HTTP POST request

### Data flow

1. A task completes → `_run_memory_policy()` → the LLM classifies it → `MemoryDecision`
2. If `should_store=True` → `MemoryStore.add()` (SQLite) + `VectorMemory.add_memory()` (Qdrant)
3. On the next request → `MemoryStore.relevant()` (SQLite LIKE) + `VectorMemory.search_memory()` (semantic)
4. The recall role filters the relevant memories via the LLM
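
Step 3 queries two sources, so the same memory can come back twice. Below is a sketch of merging the SQLite LIKE hits with the Qdrant hits before the recall role sees them. It assumes both sources return dicts that share a `memory_id` (a column of the `memories` table) and that semantic hits carry a `score`. `merge_memory_hits` is hypothetical.

```python
from __future__ import annotations

from typing import Any


def merge_memory_hits(
    keyword_hits: list[dict[str, Any]],
    semantic_hits: list[dict[str, Any]],
    limit: int = 10,
) -> list[dict[str, Any]]:
    """Deduplicate by memory_id: semantic hits first (by descending score), then keyword hits."""
    merged: list[dict[str, Any]] = []
    seen: set[str] = set()
    ranked = sorted(semantic_hits, key=lambda h: h.get("score", 0.0), reverse=True)
    for hit in ranked + keyword_hits:
        memory_id = hit["memory_id"]
        if memory_id in seen:
            continue
        seen.add(memory_id)
        merged.append(hit)
        if len(merged) >= limit:
            break
    return merged
```

Ranking semantic hits first is a design choice. LIKE matching has no relevance score, so keyword-only hits fill the remaining slots in the order SQLite returned them.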

### Dependencies

- `sentence-transformers`: for the local model
- `qdrant-client`: for Qdrant (already present)
- Qdrant runs via `docker-compose.memory.yml` (port 6333); Docker (Qdrant): ✅ done

**Overall conclusion:** All 4 tasks of the plan are implemented. The system is a working skeleton with a complete cognitive cycle. The main directions for further work were:

- integrating reflection and summary into the main loop
- an LLM-based MemoryPolicy
- vector memory
- a dark theme
- extending ContextBuilder

Phases 1-7 cover all of these except the dark theme.
@ -51,3 +51,23 @@ models:
     temperature: 0.1
     max_output_tokens: 4096
     system_prompt: prompts/roles/summary.md
+
+  memory_policy:
+    provider: llama_server
+    base_url: http://127.0.0.1:8081/v1
+    model: local-main
+    purpose: memory_classification
+    structured_output: true
+    temperature: 0.1
+    max_output_tokens: 1024
+    system_prompt: prompts/roles/memory_policy.md
+
+  recall:
+    provider: llama_server
+    base_url: http://127.0.0.1:8081/v1
+    model: local-main
+    purpose: memory_recall
+    structured_output: true
+    temperature: 0.1
+    max_output_tokens: 2048
+    system_prompt: prompts/roles/recall.md

@ -13,10 +13,13 @@ from pydantic import BaseModel
 
 from duck_core.approvals.service import ApprovalService
 from duck_core.config import get_settings
+from duck_core.context_builder import ContextBuilder
 from duck_core.conversations.store import ConversationStore
 from duck_core.events.store import EventStore
 from duck_core.experience.recorder import ExperienceRecorder
+from duck_core.memory.policy import MemoryPolicy
 from duck_core.memory.store import MemoryStore
+from duck_core.memory.vector_memory import VectorMemory
 from duck_core.model_client import ModelClient
 from duck_core.runtime_loop import RuntimeLoop
 from duck_core.skills.registry import SkillRegistry

@ -74,11 +77,27 @@ def create_app() -> FastAPI:
     event_store = EventStore(settings.db_path)
     conversations = ConversationStore(settings.db_path)
     model_client = ModelClient()
+    context_builder = ContextBuilder(model_client=model_client)
     approvals = ApprovalService(settings.db_path)
-    runtime = RuntimeLoop(task_store, event_store, model_client, approval_service=approvals)
-    skills = SkillRegistry("skills")
-    experience = ExperienceRecorder(settings.db_path)
+    memory_policy = MemoryPolicy(model_client=model_client)
     memory_store = MemoryStore(settings.db_path)
+    vector_memory = VectorMemory(
+        qdrant_url=settings.qdrant_url,
+        local_embedding_model="./models/all-MiniLM-L6-v2",
+    )
+    experience = ExperienceRecorder(settings.db_path)
+    runtime = RuntimeLoop(
+        task_store,
+        event_store,
+        model_client,
+        context_builder=context_builder,
+        approval_service=approvals,
+        memory_policy=memory_policy,
+        memory_store=memory_store,
+        vector_memory=vector_memory,
+        experience_recorder=experience,
+    )
+    skills = SkillRegistry("skills")
 
     @app.on_event("startup")
     async def startup() -> None:

@ -147,6 +166,14 @@ def create_app() -> FastAPI:
         memory_records = await relevant_memory(
             body.message, conversation.workspace, conversation.conversation_id
         )
+        # Use recall-role to filter relevant memories via LLM
+        if memory_records and runtime.context_builder._model_client is not None:
+            try:
+                memory_records = await runtime.context_builder.recall_relevant_memory(
+                    body.message, memory_records
+                )
+            except Exception:
+                pass # Fallback to unfiltered memory_records
         result = await runtime.run_chat(
             body.message,
             conversation.workspace,

@ -265,7 +292,7 @@ def create_app() -> FastAPI:
             reasoning_parts: list[str] = []
             content_parts: list[str] = []
             try:
-                messages = runtime.context_builder.build_basic_messages(
+                messages = await runtime.context_builder.build_async_messages(
                     task, history, memory_records
                 )
                 tool_observations = await runtime._run_action_loop(

@ -479,7 +506,7 @@ def create_app() -> FastAPI:
                 tool_observation = await runtime._run_approved_or_denied_action(
                     task_id, approval.normalized_action, approval.decision
                 )
-                messages = runtime.context_builder.build_basic_messages(task)
+                messages = await runtime.context_builder.build_async_messages(task)
                 tool_observations = [tool_observation]
                 if approval.decision != "deny" and not has_password_request(tool_observations):
                     tool_observations = await runtime._run_action_loop(
@@ -667,7 +694,7 @@ def create_app() -> FastAPI:
             approval.decision,
             password=body.password,
         )
-        messages = runtime.context_builder.build_basic_messages(task)
+        messages = await runtime.context_builder.build_async_messages(task)
         tool_observations = [tool_observation]
         if not has_password_request(tool_observations):
             tool_observations = await runtime._run_action_loop(
@@ -1,32 +1,348 @@
+from __future__ import annotations
+
+import json
+import logging
+from typing import Any
+
 from duck_core.tasks.state import TaskState
+
+logger = logging.getLogger(__name__)
+
+# Approximate tokens per character (rough heuristic: ~4 chars per token)
+_CHARS_PER_TOKEN = 4
 
 
+def estimate_tokens(text: str) -> int:
+    """Rough token estimate based on character count."""
+    return max(len(text) // _CHARS_PER_TOKEN, 1)
+
+
+def estimate_messages_tokens(messages: list[dict[str, str]]) -> int:
+    """Estimate total token count for a list of messages."""
+    total = 0
+    for msg in messages:
+        total += estimate_tokens(msg.get("content", "")) + 4  # role + formatting overhead
+    return total
+
+
 class ContextBuilder:
+    """Builds context messages with token budget awareness.
+
+    Priority order (highest first):
+    1. Current user message (always kept)
+    2. Active task state
+    3. Selected skill summary
+    4. Recent tool observations
+    5. Relevant memory
+    6. Summarized old events / history
+    7. Full conversation history (remaining budget)
+    """
+
+    def __init__(
+        self,
+        max_input_tokens: int = 49152,
+        max_memory_tokens: int = 8000,
+        max_history_tokens: int = 12000,
+        summary_role: str = "summary",
+        recall_role: str = "recall",
+        model_client: Any | None = None,
+    ):
+        self.max_input_tokens = max_input_tokens
+        self.max_memory_tokens = max_memory_tokens
+        self.max_history_tokens = max_history_tokens
+        self.summary_role = summary_role
+        self.recall_role = recall_role
+        self._model_client = model_client
+
+    async def recall_relevant_memory(
+        self,
+        query: str,
+        memory_records: list[dict[str, str]],
+    ) -> list[dict[str, str]]:
+        """Use recall-role LLM to filter memory records by relevance.
+
+        Returns only the memories that are relevant to the query.
+        Falls back to returning all records if LLM is unavailable.
+        """
+        if not memory_records or self._model_client is None:
+            return memory_records
+
+        try:
+            return await self._llm_recall(query, memory_records)
+        except Exception as exc:
+            logger.warning("Recall failed, using all memories: %s", exc)
+            return memory_records
+
+    async def _llm_recall(
+        self,
+        query: str,
+        memory_records: list[dict[str, str]],
+    ) -> list[dict[str, str]]:
+        """Call recall-role LLM to identify relevant memories."""
+        memories_text = "\n".join(
+            f"[{m.get('memory_id', i)}] {m.get('text', '')}"
+            for i, m in enumerate(memory_records)
+        )
+        response = await self._model_client.chat(
+            self.recall_role,
+            [{
+                "role": "user",
+                "content": (
+                    f"User query: {query}\n\n"
+                    f"Available memories:\n{memories_text}"
+                ),
+            }],
+            response_format={
+                "type": "json_schema",
+                "json_schema": {
+                    "name": "recall_result",
+                    "schema": {
+                        "type": "object",
+                        "required": ["relevant_ids", "reasoning"],
+                        "additionalProperties": False,
+                        "properties": {
+                            "relevant_ids": {
+                                "type": "array",
+                                "items": {"type": "string"},
+                            },
+                            "reasoning": {"type": "string"},
+                        },
+                    },
+                    "strict": True,
+                },
+            },
+        )
+        data = json.loads(response.content)
+        relevant_ids = set(data.get("relevant_ids", []))
+        if not relevant_ids:
+            return []
+        return [m for i, m in enumerate(memory_records) if m.get("memory_id", str(i)) in relevant_ids]
+
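`_llm_recall` labels each memory in the prompt with its `memory_id` (or its list position when the ID is missing) and keeps the records whose label the model returns in `relevant_ids`; an empty `relevant_ids` yields no memories at all. A standalone sketch of that labelling and selection step, assuming the model reply already conforms to the `recall_result` schema; `label_memories` and `select_relevant` are illustrative names:

```python
from __future__ import annotations

import json


def label_memories(records: list[dict[str, str]]) -> str:
    """Render records as "[id] text" lines, using the list index when memory_id is absent."""
    return "\n".join(
        f"[{r.get('memory_id', i)}] {r.get('text', '')}" for i, r in enumerate(records)
    )


def select_relevant(records: list[dict[str, str]], response_json: str) -> list[dict[str, str]]:
    """Keep records whose label appears in the model's relevant_ids (compared as strings)."""
    data = json.loads(response_json)
    relevant = {str(x) for x in data.get("relevant_ids", [])}
    # The fallback key is str(i) so it matches the index label printed above.
    return [r for i, r in enumerate(records) if r.get("memory_id", str(i)) in relevant]


if __name__ == "__main__":
    memories = [{"memory_id": "m1", "text": "prefers pytest"}, {"text": "repo uses uv"}]
    print(label_memories(memories))
    print(select_relevant(memories, '{"relevant_ids": ["1"], "reasoning": "tooling"}'))
```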
     def build_basic_messages(
         self,
         task: TaskState,
         history_messages: list[dict[str, str]] | None = None,
         memory_records: list[dict[str, str]] | None = None,
+        tool_observations: list[dict[str, Any]] | None = None,
+        skill_summary: str | None = None,
     ) -> list[dict[str, str]]:
-        memory_messages = []
+        """Build context messages respecting token budget.
+
+        Args:
+            task: Current task state.
+            history_messages: Previous conversation messages.
+            memory_records: Relevant memory records.
+            tool_observations: Recent tool call results.
+            skill_summary: Selected skill description.
+        """
+        messages: list[dict[str, str]] = []
+        budget_remaining = self.max_input_tokens
+
+        # 1. System-level context (memory + skill)
+        system_parts: list[str] = []
+
+        # Memory records
         if memory_records:
-            lines = [
-                f"- {record.get('scope', 'memory')}: {record.get('text', '')}"
-                for record in memory_records
-                if record.get("text")
-            ]
-            if lines:
-                memory_messages.append(
-                    {
+            memory_text = self._format_memory(memory_records)
+            mem_tokens = estimate_tokens(memory_text)
+            if mem_tokens > self.max_memory_tokens:
+                # Truncate memory to fit budget
+                memory_text = self._truncate_text(memory_text, self.max_memory_tokens)
+            system_parts.append(memory_text)
+            budget_remaining -= estimate_tokens(memory_text)
+
+        # Skill summary
+        if skill_summary:
+            skill_text = f"Active skill:\n{skill_summary}"
+            system_parts.append(skill_text)
+            budget_remaining -= estimate_tokens(skill_text)
+
+        if system_parts:
+            messages.append({
                 "role": "system",
-                        "content": "Relevant memory:\n" + "\n".join(lines),
-                    }
-                )
-        return [
-            *memory_messages,
-            *(history_messages or []),
-            {
+                "content": "\n\n".join(system_parts),
+            })
+
+        # 2. Tool observations (recent, high priority)
+        if tool_observations:
+            obs_text = "Tool observations:\n" + self._format_observations(tool_observations)
+            obs_tokens = estimate_tokens(obs_text)
+            if obs_tokens > budget_remaining * 0.4:
+                # Don't let observations consume more than 40% of remaining budget
+                obs_text = self._truncate_text(obs_text, int(budget_remaining * 0.4))
+                obs_tokens = estimate_tokens(obs_text)
+            messages.append({"role": "user", "content": obs_text})
+            budget_remaining -= obs_tokens
+
+        # 3. Conversation history (lower priority, may be summarized)
+        if history_messages:
+            hist_tokens = estimate_messages_tokens(history_messages)
+            if hist_tokens <= budget_remaining:
+                messages.extend(history_messages)
+                budget_remaining -= hist_tokens
+            elif budget_remaining > 100:
+                # Summarize old history if we have some budget left
+                summarized = self._summarize_history(history_messages, budget_remaining)
+                if summarized:
+                    messages.append({
+                        "role": "system",
+                        "content": f"Conversation summary:\n{summarized}",
+                    })
+            # else: no budget for history at all
+
+        # 4. Current user message (always last, always included)
+        messages.append({
             "role": "user",
             "content": task.user_message,
-            },
+        })
+
+        return messages
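The builder spends its budget (default `max_input_tokens=49152`) in a fixed order: memory is capped at `max_memory_tokens`, tool observations may take at most 40% of what is left, and history is included whole only if it still fits (otherwise it is summarized when more than 100 tokens remain). The arithmetic can be checked with a small sketch that reuses the same 4-characters-per-token heuristic; `plan_budget` is a hypothetical helper, and the skill summary and the `"Tool observations:"` header are left out for brevity.

```python
from __future__ import annotations

CHARS_PER_TOKEN = 4  # same heuristic as _CHARS_PER_TOKEN


def estimate_tokens(text: str) -> int:
    return max(len(text) // CHARS_PER_TOKEN, 1)


def truncate(text: str, max_tokens: int) -> str:
    max_chars = max_tokens * CHARS_PER_TOKEN
    return text if len(text) <= max_chars else text[:max_chars] + "\n... (truncated)"


def plan_budget(max_input: int, max_memory: int, memory_text: str, obs_text: str) -> dict[str, int]:
    """Mirror the memory -> observations -> history ordering of build_basic_messages."""
    remaining = max_input
    # 1. Memory is capped independently of the overall budget.
    if estimate_tokens(memory_text) > max_memory:
        memory_text = truncate(memory_text, max_memory)
    memory_tokens = estimate_tokens(memory_text)
    remaining -= memory_tokens
    # 2. Observations may use at most 40% of what is left.
    obs_tokens = estimate_tokens(obs_text)
    if obs_tokens > remaining * 0.4:
        obs_text = truncate(obs_text, int(remaining * 0.4))
        obs_tokens = estimate_tokens(obs_text)
    remaining -= obs_tokens
    # 3. Whatever remains is available for history (or its summary).
    return {"memory": memory_tokens, "observations": obs_tokens, "history_budget": remaining}


if __name__ == "__main__":
    print(plan_budget(1000, 200, "m" * 8000, "o" * 4000))
    # {'memory': 204, 'observations': 322, 'history_budget': 474}
```

Because the `... (truncated)` marker is appended after cutting, a truncated section comes out about 4 tokens over its cap (204 instead of 200 in the example).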
+
+    async def build_async_messages(
+        self,
+        task: TaskState,
+        history_messages: list[dict[str, str]] | None = None,
+        memory_records: list[dict[str, str]] | None = None,
+        tool_observations: list[dict[str, Any]] | None = None,
+        skill_summary: str | None = None,
+    ) -> list[dict[str, str]]:
+        """Async context builder variant that can use LLM summarization."""
+        messages: list[dict[str, str]] = []
+        budget_remaining = self.max_input_tokens
+
+        system_parts: list[str] = []
+        if memory_records:
+            memory_text = self._format_memory(memory_records)
+            mem_tokens = estimate_tokens(memory_text)
+            if mem_tokens > self.max_memory_tokens:
+                memory_text = self._truncate_text(memory_text, self.max_memory_tokens)
+            system_parts.append(memory_text)
+            budget_remaining -= estimate_tokens(memory_text)
+
+        if skill_summary:
+            skill_text = f"Active skill:\n{skill_summary}"
+            system_parts.append(skill_text)
+            budget_remaining -= estimate_tokens(skill_text)
+
+        if system_parts:
+            messages.append({"role": "system", "content": "\n\n".join(system_parts)})
+
+        if tool_observations:
+            obs_text = "Tool observations:\n" + self._format_observations(tool_observations)
+            obs_tokens = estimate_tokens(obs_text)
+            if obs_tokens > budget_remaining * 0.4:
+                obs_text = self._truncate_text(obs_text, int(budget_remaining * 0.4))
+                obs_tokens = estimate_tokens(obs_text)
+            messages.append({"role": "user", "content": obs_text})
+            budget_remaining -= obs_tokens
+
+        if history_messages:
+            hist_tokens = estimate_messages_tokens(history_messages)
+            if hist_tokens <= budget_remaining:
+                messages.extend(history_messages)
+                budget_remaining -= hist_tokens
+            elif budget_remaining > 100:
+                summarized = await self._summarize_history_async(
+                    history_messages, budget_remaining
+                )
+                if summarized:
+                    messages.append({
+                        "role": "system",
+                        "content": f"Conversation summary:\n{summarized}",
+                    })
+
+        messages.append({"role": "user", "content": task.user_message})
+        return messages
+
+    def _format_memory(self, records: list[dict[str, str]]) -> str:
+        lines = [
+            f"- {record.get('scope', 'memory')}: {record.get('text', '')}"
+            for record in records
+            if record.get("text")
         ]
+        return "Relevant memory:\n" + "\n".join(lines) if lines else ""
+
+    def _format_observations(self, observations: list[dict[str, Any]]) -> str:
+        parts = []
+        for obs in observations:
+            tool = obs.get("tool", "unknown")
+            result = obs.get("result", {})
+            ok = result.get("ok", False)
+            output = result.get("output", "")
+            error = result.get("error", "")
+            status = "ok" if ok else "error"
+            part = f"- {tool} ({status})"
+            if output:
+                part += f"\n output: {output[:200]}"
+            if error:
+                part += f"\n error: {error[:200]}"
+            parts.append(part)
+        return "\n".join(parts)
+
+    def _truncate_text(self, text: str, max_tokens: int) -> str:
+        """Truncate text to fit within max_tokens."""
+        max_chars = max_tokens * _CHARS_PER_TOKEN
+        if len(text) <= max_chars:
+            return text
+        return text[:max_chars] + "\n... (truncated)"
+
+    def _summarize_history(
+        self,
+        history: list[dict[str, str]],
+        budget_tokens: int,
+    ) -> str | None:
+        """Summarize conversation history to fit budget.
+
+        Synchronous callers use deterministic truncation. Runtime code should
+        call build_async_messages() when LLM summarization is desired.
+        """
+        if not history:
+            return None
+
+        result = []
+        remaining = budget_tokens
+        for msg in reversed(history):
+            tokens = estimate_tokens(msg.get("content", "")) + 4
+            if tokens > remaining:
+                break
+            result.append(f"{msg['role']}: {msg['content'][:100]}")
+            remaining -= tokens
+        return "\n".join(reversed(result)) if result else None
+
+    async def _summarize_history_async(
+        self,
+        history: list[dict[str, str]],
+        budget_tokens: int,
+    ) -> str | None:
+        if self._model_client is None:
+            return self._summarize_history(history, budget_tokens)
+        summarized = await self._llm_summarize_history(history, budget_tokens)
+        return summarized or self._summarize_history(history, budget_tokens)
+
+    async def _llm_summarize_history(
+        self,
+        history: list[dict[str, str]],
+        budget_tokens: int,
+    ) -> str | None:
+        """Use summary-role LLM to compress history."""
+        try:
+            history_text = "\n".join(
+                f"{m['role']}: {m.get('content', '')}" for m in history
+            )
+            response = await self._model_client.chat(
+                self.summary_role,
+                [{
+                    "role": "user",
+                    "content": (
+                        "Summarize this conversation history. Keep decisions, outcomes, "
+                        "and key facts. Be concise.\n\n"
+                        + history_text
+                    ),
+                }],
+            )
+            summary = response.content
+            # Ensure summary fits budget
+            return self._truncate_text(summary, budget_tokens)
+        except Exception as exc:
+            logger.warning("History summarization failed: %s", exc)
+            return None
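When no model client is available, or LLM summarization returns nothing, `_summarize_history` falls back to a deterministic trim: it walks the history from newest to oldest, charges each message its estimated tokens plus 4 for overhead, stops at the first message that no longer fits, and keeps only the first 100 characters of each kept message. A self-contained sketch of the same loop (`trim_history` is an illustrative name):

```python
from __future__ import annotations


def trim_history(history: list[dict[str, str]], budget_tokens: int, chars_per_token: int = 4) -> str | None:
    """Keep the newest messages that fit the budget, in chronological order."""
    kept: list[str] = []
    remaining = budget_tokens
    for msg in reversed(history):  # newest first
        content = msg.get("content", "")
        cost = max(len(content) // chars_per_token, 1) + 4  # content estimate + role overhead
        if cost > remaining:
            break  # older messages are dropped once one does not fit
        kept.append(f"{msg['role']}: {content[:100]}")
        remaining -= cost
    return "\n".join(reversed(kept)) if kept else None


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "a" * 400},
        {"role": "assistant", "content": "b" * 40},
        {"role": "user", "content": "c" * 40},
    ]
    print(trim_history(history, 30))  # the two newest messages survive
```

The cost is charged on the full message even though only 100 characters are kept, so the trimmed text usually lands well under the budget.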
@@ -1,16 +1,82 @@
+from __future__ import annotations
+
+import json
+import logging
+from typing import Any
+
 from pydantic import BaseModel
+
+logger = logging.getLogger(__name__)
 
 
 class MemoryDecision(BaseModel):
-    should_store: bool
-    memory_type: str
-    summary: str
-    importance: float
+    should_store: bool = False
+    memory_type: str = "note"
+    summary: str = ""
+    importance: float = 0.0
+    scope: str = "workspace"
     metadata: dict[str, str] = {}
 
 
 class MemoryPolicy:
+    """Decides whether task output should be stored in memory.
+
+    When *model_client* is provided, uses an LLM call to classify the task
+    transcript. Falls back to a safe default (should_store=False) on any
+    error so the runtime is never blocked by policy failures.
+    """
+
+    _PROMPT_SYSTEM = (
+        "You are DuckLM memory policy. Decide whether the given task transcript "
+        "contains information worth storing in long-term memory.\n\n"
+        "Return ONLY valid JSON with these keys:\n"
+        " should_store: boolean — true if this is worth remembering\n"
+        " memory_type: string — one of: fact, preference, lesson, decision, event, note\n"
+        " summary: string — concise one-sentence summary (max 200 chars)\n"
+        " importance: number — 0.0 to 1.0\n"
+        " scope: string — one of: global, workspace, conversation\n"
+        " metadata: object — optional extra key-value pairs\n\n"
+        "Rules:\n"
+        "- Store user preferences, important decisions, reusable lessons, key facts.\n"
+        "- Do NOT store routine tool calls, temporary state, or trivial observations.\n"
+        "- importance >= 0.7 for preferences and lessons, >= 0.4 for facts, < 0.4 for events.\n"
+        "- scope='global' for user preferences and system-wide facts.\n"
+        "- scope='workspace' for project-specific information.\n"
+        "- scope='conversation' for chat-specific context.\n"
+    )
+
+    _RESPONSE_SCHEMA = {
+        "type": "object",
+        "required": ["should_store", "memory_type", "summary", "importance", "scope", "metadata"],
+        "additionalProperties": False,
+        "properties": {
+            "should_store": {"type": "boolean"},
+            "memory_type": {
+                "type": "string",
+                "enum": ["fact", "preference", "lesson", "decision", "event", "note"],
+            },
+            "summary": {"type": "string", "maxLength": 300},
+            "importance": {"type": "number", "minimum": 0.0, "maximum": 1.0},
+            "scope": {"type": "string", "enum": ["global", "workspace", "conversation"]},
+            "metadata": {"type": "object", "additionalProperties": {"type": "string"}},
+        },
+    }
+
+    def __init__(
+        self,
+        model_client: Any | None = None,
+        role: str = "memory_policy",
+    ):
+        self._model_client = model_client
+        self._role = role
+
     async def classify(self, summary: str, task_id: str) -> MemoryDecision:
+        """Classify whether *summary* from *task_id* should be stored in memory.
+
+        If no model client is configured, returns the safe default
+        (should_store=False) — the old stub behaviour.
+        """
+        if self._model_client is None:
             return MemoryDecision(
                 should_store=False,
                 memory_type="event",
@@ -18,3 +84,77 @@ class MemoryPolicy:
                 importance=0.0,
                 metadata={"task_id": task_id, "source": "stub_policy"},
             )
+
+        return await self._classify_with_llm(summary, task_id)
+
+    async def _classify_with_llm(self, summary: str, task_id: str) -> MemoryDecision:
+        messages = [
+            {
+                "role": "user",
+                "content": f"Task ID: {task_id}\n\nTranscript:\n{summary}",
+            }
+        ]
+
+        response_format = {
+            "type": "json_schema",
+            "json_schema": {
+                "name": "memory_decision",
+                "schema": self._RESPONSE_SCHEMA,
+                "strict": True,
+            },
+        }
+
+        try:
+            response = await self._model_client.chat(
+                self._role,
+                messages,
+                response_format=response_format,
+            )
+        except Exception as exc:
+            logger.warning("MemoryPolicy LLM call failed for %s: %s", task_id, exc)
+            return MemoryDecision(
+                should_store=False,
+                memory_type="event",
+                summary=summary,
+                importance=0.0,
+                metadata={"task_id": task_id, "source": "llm_policy_fallback"},
+            )
+
+        return self._parse_response(response.content, summary, task_id)
+
+    def _parse_response(self, content: str, summary: str, task_id: str) -> MemoryDecision:
+        try:
+            data = json.loads(content)
+        except (json.JSONDecodeError, TypeError):
+            logger.warning("MemoryPolicy: invalid JSON for %s: %s", task_id, content[:200])
+            return MemoryDecision(
+                should_store=False,
+                memory_type="event",
+                summary=summary,
+                importance=0.0,
+                metadata={"task_id": task_id, "source": "llm_policy_fallback"},
+            )
+
+        required = ("should_store", "memory_type", "summary", "importance", "scope")
+        if not all(key in data for key in required):
+            logger.warning("MemoryPolicy: missing fields for %s: %s", task_id, list(data.keys()))
+            return MemoryDecision(
+                should_store=False,
+                memory_type="event",
+                summary=summary,
+                importance=0.0,
+                metadata={"task_id": task_id, "source": "llm_policy_fallback"},
+            )
+
+        return MemoryDecision(
+            should_store=bool(data.get("should_store", False)),
+            memory_type=str(data.get("memory_type", "note")),
+            summary=str(data.get("summary", summary))[:300],
+            importance=float(max(0.0, min(data.get("importance", 0.0), 1.0))),
+            scope=str(data.get("scope", "workspace")),
+            metadata={
+                "task_id": task_id,
+                "source": "llm_policy",
+                **{str(k): str(v) for k, v in data.get("metadata", {}).items()},
+            },
+        )
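`MemoryPolicy` never lets a bad model reply block the runtime: invalid JSON or missing required fields produce a non-storing fallback, while a valid reply has its summary cut to 300 characters and its importance clamped to [0, 1]. The sketch below reproduces that parsing with a standard-library dataclass instead of the pydantic `MemoryDecision`, so it runs without extra dependencies; `Decision` and `parse_decision` are hypothetical names. One deliberate difference: here the runtime's `task_id` and `source` keys are written after the model-supplied metadata, so the model cannot overwrite them.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field

REQUIRED = ("should_store", "memory_type", "summary", "importance", "scope")


@dataclass
class Decision:
    should_store: bool = False
    memory_type: str = "note"
    summary: str = ""
    importance: float = 0.0
    scope: str = "workspace"
    metadata: dict[str, str] = field(default_factory=dict)


def parse_decision(content: str | None, summary: str, task_id: str) -> Decision:
    """Parse a policy reply, falling back to a non-storing decision on any problem."""
    fallback = Decision(
        memory_type="event",
        summary=summary,
        metadata={"task_id": task_id, "source": "fallback"},
    )
    try:
        data = json.loads(content)  # raises TypeError when content is None
    except (json.JSONDecodeError, TypeError):
        return fallback
    if not isinstance(data, dict) or not all(key in data for key in REQUIRED):
        return fallback
    try:
        importance = max(0.0, min(float(data["importance"]), 1.0))
    except (TypeError, ValueError):
        return fallback
    extra = data.get("metadata")
    if not isinstance(extra, dict):
        extra = {}
    return Decision(
        should_store=bool(data["should_store"]),
        memory_type=str(data["memory_type"]),
        summary=str(data["summary"])[:300],
        importance=importance,
        scope=str(data["scope"]),
        # Runtime keys go last so model-supplied metadata cannot override them.
        metadata={**{str(k): str(v) for k, v in extra.items()}, "task_id": task_id, "source": "llm"},
    )
```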
@@ -1,28 +1,70 @@
+from __future__ import annotations
+
+import logging
+from pathlib import Path
 from typing import Any
 from uuid import uuid4
 
 import httpx
 
+logger = logging.getLogger(__name__)
+
 
 class EmbeddingsUnavailableError(RuntimeError):
     pass
 
 
 class VectorMemory:
+    """Semantic memory using Qdrant for vector storage and sentence-transformers for embeddings.
+
+    Supports two modes:
+    1. Local sentence-transformers model (default): uses all-MiniLM-L6-v2
+    2. Remote embeddings endpoint: uses llama-server or OpenAI-compatible /v1/embeddings
+    """
+
     def __init__(
         self,
         qdrant_url: str,
         collection_name: str = "duck_memory",
-        embeddings_base_url: str | None = "http://127.0.0.1:8081/v1",
+        embeddings_base_url: str | None = None,
+        local_embedding_model: str | None = "all-MiniLM-L6-v2",
     ):
         self.qdrant_url = qdrant_url.rstrip("/")
         self.collection_name = collection_name
         self.embeddings_base_url = embeddings_base_url.rstrip("/") if embeddings_base_url else None
+        self._local_model_path = local_embedding_model
+        self._local_model = None
+
+    def _load_local_model(self):
+        """Lazy-load the sentence-transformers model."""
+        if self._local_model is not None:
+            return self._local_model
+
+        try:
+            from sentence_transformers import SentenceTransformer
+
+            model_path = self._local_model_path
+            # Check if it's a local path or a HuggingFace model name
+            if Path(model_path).exists():
+                logger.info("Loading local embedding model from %s", model_path)
+                self._local_model = SentenceTransformer(model_path)
+            else:
+                logger.info("Loading embedding model from HuggingFace: %s", model_path)
+                self._local_model = SentenceTransformer(model_path)
+            return self._local_model
+        except ImportError:
+            raise EmbeddingsUnavailableError(
+                "sentence-transformers is not installed. "
+                "Install with: pip install sentence-transformers"
+            )
+        except Exception as exc:
+            raise EmbeddingsUnavailableError(f"Failed to load embedding model: {exc}")
 
     async def add_memory(self, text: str, metadata: dict[str, Any] | None = None) -> str:
         vector = await self._embed(text)
         point_id = str(uuid4())
         async with httpx.AsyncClient(timeout=20.0, trust_env=False) as client:
+            # Create collection if not exists (ignore error if already exists)
             await client.put(
                 f"{self.qdrant_url}/collections/{self.collection_name}",
                 json={"vectors": {"size": len(vector), "distance": "Cosine"}},
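`add_memory` creates the collection with `"distance": "Cosine"` and a vector size taken from the embedding it has just computed (384 dimensions for `all-MiniLM-L6-v2`, per the project notes). As a reference for what that distance measures, here is a plain-Python cosine similarity; it is an illustration, not Qdrant's implementation, and `cosine_similarity` is a hypothetical helper.

```python
from __future__ import annotations

import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 same direction, 0.0 orthogonal."""
    if len(a) != len(b):
        # A collection has one fixed vector size, so mixed dimensions are an error here.
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors in this sketch
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # ~1.0, magnitude is ignored
```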
@@ -53,10 +95,32 @@ class VectorMemory:
         return response.json().get("result", [])
 
     async def _embed(self, text: str) -> list[float]:
-        if not self.embeddings_base_url:
+        """Generate embeddings using local model or remote endpoint."""
+        # Prefer local model if available
+        if self._local_model_path and not self.embeddings_base_url:
+            return await self._local_embed(text)
+        # Fall back to remote endpoint
+        if self.embeddings_base_url:
+            return await self._remote_embed(text)
         raise EmbeddingsUnavailableError(
-            "Embeddings endpoint is not configured; vector memory is explicit stub."
+            "No embedding source configured. Set local_embedding_model or embeddings_base_url."
         )
+
+    async def _local_embed(self, text: str) -> list[float]:
+        """Generate embeddings using local sentence-transformers model."""
+        import asyncio
+
+        model = self._load_local_model()
+        # Run in thread pool to not block event loop
+        loop = asyncio.get_event_loop()
+        vector = await loop.run_in_executor(None, lambda: model.encode(text))
+        # Handle both numpy arrays and plain lists
+        if hasattr(vector, "tolist"):
+            vector = vector.tolist()
+        return [float(v) for v in vector]
+
+    async def _remote_embed(self, text: str) -> list[float]:
+        """Generate embeddings using remote /v1/embeddings endpoint."""
         async with httpx.AsyncClient(timeout=20.0, trust_env=False) as client:
             response = await client.post(
                 f"{self.embeddings_base_url}/embeddings",
@@ -1,14 +1,22 @@
 import json
+import logging
 from dataclasses import dataclass
 from typing import Any
 
 from duck_core.approvals.service import ApprovalService
 from duck_core.context_builder import ContextBuilder
 from duck_core.events.store import EventStore
+from duck_core.experience.recorder import ExperienceRecorder
+from duck_core.memory.policy import MemoryPolicy
+from duck_core.memory.store import MemoryStore
+from duck_core.memory.vector_memory import VectorMemory
 from duck_core.model_client import ModelClient
+from duck_core.reflection import Reflection
 from duck_core.tasks.store import TaskStore
-from duck_core.tools.gateway import ToolGateway
 from duck_core.tools.base import ToolResult
+from duck_core.tools.gateway import ToolGateway
+
+logger = logging.getLogger(__name__)
 
 
 @dataclass
@@ -27,13 +35,23 @@ class RuntimeLoop:
         model_client: ModelClient | None = None,
         context_builder: ContextBuilder | None = None,
         approval_service: ApprovalService | None = None,
+        memory_policy: MemoryPolicy | None = None,
+        memory_store: MemoryStore | None = None,
+        vector_memory: VectorMemory | None = None,
+        experience_recorder: ExperienceRecorder | None = None,
         max_tool_iterations: int = 4,
     ):
         self.task_store = task_store
         self.event_store = event_store
         self.model_client = model_client or ModelClient()
-        self.context_builder = context_builder or ContextBuilder()
+        self.context_builder = context_builder or ContextBuilder(
+            model_client=self.model_client
+        )
         self.approval_service = approval_service
+        self.memory_policy = memory_policy or MemoryPolicy(model_client=self.model_client)
+        self.memory_store = memory_store
+        self.vector_memory = vector_memory
+        self.experience_recorder = experience_recorder
         self.max_tool_iterations = max_tool_iterations
 
     async def run_chat(
@@ -43,6 +61,7 @@ class RuntimeLoop:
         debug: bool = False,
         history_messages: list[dict[str, str]] | None = None,
         memory_records: list[dict[str, str]] | None = None,
+        reflect: bool = True,
     ) -> ChatResult:
         task = await self.task_store.create_task(message, workspace, debug)
         await self.event_store.append(
@@ -51,7 +70,7 @@ class RuntimeLoop:
             {"message": message, "workspace": workspace, "debug": debug},
         )
         try:
-            messages = self.context_builder.build_basic_messages(
+            messages = await self.context_builder.build_async_messages(
                 task, history_messages, memory_records
             )
             tool_observations = await self._run_action_loop(task.task_id, messages, workspace)
@@ -111,6 +130,9 @@ class RuntimeLoop:
|
||||||
"reasoning_content": response.reasoning_content,
|
"reasoning_content": response.reasoning_content,
|
||||||
},
|
},
|
||||||
)
|
)
|
||||||
|
await self._run_memory_policy(task.task_id, response.content)
|
||||||
|
if reflect:
|
||||||
|
await self._run_reflection(task.task_id)
|
||||||
return ChatResult(
|
return ChatResult(
|
||||||
task_id=task.task_id,
|
task_id=task.task_id,
|
||||||
status="completed",
|
status="completed",
|
||||||
|
|
@@ -172,7 +194,7 @@ class RuntimeLoop:
|
||||||
tool_observation = await self._run_approved_or_denied_action(
|
tool_observation = await self._run_approved_or_denied_action(
|
||||||
task_id, approval.normalized_action, approval.decision
|
task_id, approval.normalized_action, approval.decision
|
||||||
)
|
)
|
||||||
messages = self.context_builder.build_basic_messages(task)
|
messages = await self.context_builder.build_async_messages(task)
|
||||||
tool_observations = [tool_observation]
|
tool_observations = [tool_observation]
|
||||||
if approval.decision != "deny":
|
if approval.decision != "deny":
|
||||||
tool_observations = await self._run_action_loop(
|
tool_observations = await self._run_action_loop(
|
||||||
|
|
@@ -250,6 +272,83 @@ class RuntimeLoop:
|
||||||
reasoning_content=None,
|
reasoning_content=None,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
async def _run_memory_policy(self, task_id: str, final_response: str) -> None:
|
||||||
|
"""Classify task output and store in memory if policy says so."""
|
||||||
|
if self.memory_store is None:
|
||||||
|
return
|
||||||
|
try:
|
||||||
|
decision = await self.memory_policy.classify(final_response, task_id)
|
||||||
|
await self.event_store.append(
|
||||||
|
task_id,
|
||||||
|
"memory_policy_decision",
|
||||||
|
decision.model_dump(),
|
||||||
|
)
|
||||||
|
if decision.should_store:
|
||||||
|
task = await self.task_store.get_task(task_id)
|
||||||
|
memory_workspace = task.workspace if task and task.workspace else ""
|
||||||
|
await self.memory_store.add(
|
||||||
|
text=decision.summary,
|
||||||
|
workspace=memory_workspace,
|
||||||
|
scope=decision.scope,
|
||||||
|
memory_type=decision.memory_type,
|
||||||
|
importance=decision.importance,
|
||||||
|
metadata=decision.metadata,
|
||||||
|
)
|
||||||
|
if self.vector_memory is not None:
|
||||||
|
try:
|
||||||
|
await self.vector_memory.add_memory(
|
||||||
|
text=decision.summary,
|
||||||
|
metadata={"scope": decision.scope, "memory_type": decision.memory_type},
|
||||||
|
)
|
||||||
|
except Exception as vec_exc:
|
||||||
|
logger.warning("Vector memory store failed for %s: %s", task_id, vec_exc)
|
||||||
|
await self.event_store.append(
|
||||||
|
task_id,
|
||||||
|
"memory_stored",
|
||||||
|
{"summary": decision.summary, "scope": decision.scope},
|
||||||
|
)
|
||||||
|
except Exception as exc:
|
||||||
|
logger.warning("Memory policy failed for %s: %s", task_id, exc)
|
||||||
|
await self.event_store.append(
|
||||||
|
task_id,
|
||||||
|
"memory_policy_failed",
|
||||||
|
{"error": str(exc)},
|
||||||
|
)
|
||||||
|
|
||||||
|
async def _run_reflection(self, task_id: str) -> None:
|
||||||
|
"""Run critic reflection on completed task and record experience."""
|
||||||
|
if self.experience_recorder is None:
|
||||||
|
return
|
||||||
|
try:
|
||||||
|
events = await self.event_store.list_events(task_id)
|
||||||
|
transcript_lines = []
|
||||||
|
for event in events:
|
||||||
|
line = f"[{event.event_type}] {json.dumps(event.payload, ensure_ascii=False)}"
|
||||||
|
transcript_lines.append(line)
|
||||||
|
transcript = "\n".join(transcript_lines)
|
||||||
|
|
||||||
|
reflection = Reflection(
|
||||||
|
model_client=self.model_client,
|
||||||
|
recorder=self.experience_recorder,
|
||||||
|
)
|
||||||
|
record = await reflection.reflect(task_id, transcript)
|
||||||
|
await self.event_store.append(
|
||||||
|
task_id,
|
||||||
|
"reflection_completed",
|
||||||
|
{
|
||||||
|
"record_id": record.id,
|
||||||
|
"summary": record.summary[:200],
|
||||||
|
"reusable_lesson": record.reusable_lesson[:200] if record.reusable_lesson else None,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
except Exception as exc:
|
||||||
|
logger.warning("Reflection failed for %s: %s", task_id, exc)
|
||||||
|
await self.event_store.append(
|
||||||
|
task_id,
|
||||||
|
"reflection_failed",
|
||||||
|
{"error": str(exc)},
|
||||||
|
)
|
||||||
|
|
||||||
async def _run_approved_or_denied_action(
|
async def _run_approved_or_denied_action(
|
||||||
self,
|
self,
|
||||||
task_id: str,
|
task_id: str,
|
||||||
|
|
|
||||||
|
|
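`_run_memory_policy` treats the SQLite `memory_store` as the source of truth and the Qdrant-backed `vector_memory` as best-effort. A vector failure is only logged, and the `memory_stored` event is still emitted. Any other failure becomes a `memory_policy_failed` event instead of an exception.

The sketch below isolates that control flow. `ListStore`, `FailingVectorStore` and `store_memory` are hypothetical in-memory stand-ins, not DuckLM classes:

```python
import asyncio
import logging

logger = logging.getLogger("memory_sketch")


class ListStore:
    """Hypothetical primary store that keeps memories in a list."""

    def __init__(self) -> None:
        self.records: list[str] = []

    async def add(self, text: str) -> None:
        self.records.append(text)


class FailingVectorStore:
    """Hypothetical secondary store that always fails (e.g. Qdrant is down)."""

    async def add_memory(self, text: str) -> None:
        raise ConnectionError("vector backend unavailable")


async def store_memory(summary, primary, vector, events) -> None:
    """Write to the primary store; treat the vector store as best-effort."""
    try:
        await primary.add(summary)
        if vector is not None:
            try:
                await vector.add_memory(summary)
            except Exception as vec_exc:
                # Secondary failure: log it, but still report the memory as stored.
                logger.warning("Vector memory store failed: %s", vec_exc)
        events.append(("memory_stored", summary))
    except Exception as exc:
        # Primary failure: record a failure event instead of raising.
        events.append(("memory_policy_failed", str(exc)))


events: list[tuple[str, str]] = []
primary = ListStore()
asyncio.run(store_memory("Workspace uses Vulkan.", primary, FailingVectorStore(), events))
print(events)  # [('memory_stored', 'Workspace uses Vulkan.')]
```

The payoff of this split is that an unavailable embedding backend never costs the user a memory. The row in SQLite is written even when the vector index falls behind.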
@@ -0,0 +1,70 @@
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import logging
|
||||||
|
from typing import Any
|
||||||
|
|
||||||
|
from duck_core.model_client import ModelClient
|
||||||
|
from duck_core.tools.base import ToolResult
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
class CoderTool:
|
||||||
|
"""Tool that delegates code generation to the coder-role LLM.
|
||||||
|
|
||||||
|
The coder model receives the task description and relevant context,
|
||||||
|
then returns code or technical analysis.
|
||||||
|
"""
|
||||||
|
|
||||||
|
name = "coder"
|
||||||
|
risk_level = "low"
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
model_client: ModelClient | None = None,
|
||||||
|
role: str = "coder",
|
||||||
|
max_output_tokens: int = 16384,
|
||||||
|
):
|
||||||
|
self._model_client = model_client
|
||||||
|
self._role = role
|
||||||
|
self._max_output_tokens = max_output_tokens
|
||||||
|
|
||||||
|
async def run(self, args: dict[str, Any]) -> ToolResult:
|
||||||
|
task_description = str(args.get("task_description", "")).strip()
|
||||||
|
if not task_description:
|
||||||
|
return ToolResult(ok=False, error="task_description is required for coder tool")
|
||||||
|
|
||||||
|
context = str(args.get("context", "")).strip()
|
||||||
|
language = str(args.get("language", "python")).strip()
|
||||||
|
|
||||||
|
prompt_parts = [f"Task: {task_description}"]
|
||||||
|
if language:
|
||||||
|
prompt_parts.append(f"Language: {language}")
|
||||||
|
if context:
|
||||||
|
prompt_parts.append(f"Context:\n{context}")
|
||||||
|
|
||||||
|
messages = [{"role": "user", "content": "\n\n".join(prompt_parts)}]
|
||||||
|
|
||||||
|
try:
|
||||||
|
if self._model_client is None:
|
||||||
|
return ToolResult(
|
||||||
|
ok=False,
|
||||||
|
error="Coder tool has no model client configured",
|
||||||
|
)
|
||||||
|
response = await self._model_client.chat(
|
||||||
|
self._role,
|
||||||
|
messages,
|
||||||
|
max_output_tokens=self._max_output_tokens,
|
||||||
|
)
|
||||||
|
return ToolResult(
|
||||||
|
ok=True,
|
||||||
|
output=response.content,
|
||||||
|
metadata={
|
||||||
|
"role": self._role,
|
||||||
|
"model": response.model,
|
||||||
|
"latency_ms": response.latency_ms,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
except Exception as exc:
|
||||||
|
logger.warning("Coder tool failed: %s", exc)
|
||||||
|
return ToolResult(ok=False, error=f"Coder tool failed: {exc}")
|
||||||
|
|
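`CoderTool.run` sends the coder role a single user message. That message is built from `task_description`, an optional `language` (default `"python"`) and an optional `context`, joined with blank lines.

The sketch below lifts that prompt assembly into a standalone function. `build_coder_messages` is a hypothetical name, and it raises `ValueError` where the real tool returns `ToolResult(ok=False, ...)`:

```python
from typing import Any


def build_coder_messages(args: dict[str, Any]) -> list[dict[str, str]]:
    """Assemble the single user message the coder role receives (sketch)."""
    task_description = str(args.get("task_description", "")).strip()
    if not task_description:
        raise ValueError("task_description is required for coder tool")
    context = str(args.get("context", "")).strip()
    # Falls back to "python" when the action omits a language.
    language = str(args.get("language", "python")).strip()

    parts = [f"Task: {task_description}"]
    if language:
        parts.append(f"Language: {language}")
    if context:
        parts.append(f"Context:\n{context}")
    return [{"role": "user", "content": "\n\n".join(parts)}]


msgs = build_coder_messages(
    {"task_description": "Parse a CSV header", "context": "Input is UTF-8"}
)
print(msgs[0]["content"])
```

Passing `"language": ""` explicitly suppresses the `Language:` line, because the default only applies when the key is absent.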
@@ -1,6 +1,7 @@
|
||||||
from typing import Any
|
from typing import Any
|
||||||
|
|
||||||
from duck_core.tools.base import Tool, ToolResult
|
from duck_core.tools.base import Tool, ToolResult
|
||||||
|
from duck_core.tools.coder import CoderTool
|
||||||
from duck_core.tools.file_read import FileReadTool
|
from duck_core.tools.file_read import FileReadTool
|
||||||
from duck_core.tools.file_write import FileWriteTool
|
from duck_core.tools.file_write import FileWriteTool
|
||||||
from duck_core.tools.list_dir import ListDirTool
|
from duck_core.tools.list_dir import ListDirTool
|
||||||
|
|
@@ -13,7 +14,7 @@ class ToolGateway:
|
||||||
self.tools = {tool.name: tool for tool in tools}
|
self.tools = {tool.name: tool for tool in tools}
|
||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
def default(cls, workspace: str) -> "ToolGateway":
|
def default(cls, workspace: str, model_client: Any = None) -> "ToolGateway":
|
||||||
return cls(
|
return cls(
|
||||||
[
|
[
|
||||||
FileReadTool(workspace),
|
FileReadTool(workspace),
|
||||||
|
|
@@ -21,9 +22,19 @@ class ToolGateway:
|
||||||
ListDirTool(workspace),
|
ListDirTool(workspace),
|
||||||
SearchFilesTool(workspace),
|
SearchFilesTool(workspace),
|
||||||
ShellExecSafeTool(workspace),
|
ShellExecSafeTool(workspace),
|
||||||
|
CoderTool(model_client=model_client),
|
||||||
]
|
]
|
||||||
)
|
)
|
||||||
|
|
||||||
|
def with_model_client(self, model_client: Any) -> "ToolGateway":
|
||||||
|
"""Return a new gateway with model-dependent tools configured."""
|
||||||
|
new_tools = list(self.tools.values())
|
||||||
|
# Replace coder tool with one that has model_client
|
||||||
|
new_tools = [
|
||||||
|
t for t in new_tools if not isinstance(t, CoderTool)
|
||||||
|
] + [CoderTool(model_client=model_client)]
|
||||||
|
return self.__class__(new_tools)
|
||||||
|
|
||||||
async def run_action(
|
async def run_action(
|
||||||
self, action: dict[str, Any], approved: bool = False, password: str | None = None
|
self, action: dict[str, Any], approved: bool = False, password: str | None = None
|
||||||
) -> ToolResult:
|
) -> ToolResult:
|
||||||
|
|
|
||||||
|
|
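`ToolGateway.with_model_client` does not mutate the gateway. It rebuilds the tool list with any existing `CoderTool` filtered out and a freshly bound one appended, then returns a new instance.

The sketch below reproduces that pattern with hypothetical stand-ins (`EchoTool`, `StubCoderTool`, `SketchGateway`) and a plain string in place of a real model client:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class EchoTool:
    name: str = "echo"


@dataclass
class StubCoderTool:
    model_client: Any = None
    name: str = "coder"


class SketchGateway:
    def __init__(self, tools: list[Any]) -> None:
        # Tools are indexed by name, as in ToolGateway.__init__.
        self.tools = {tool.name: tool for tool in tools}

    def with_model_client(self, model_client: Any) -> "SketchGateway":
        # Drop any existing coder tool and append one bound to model_client;
        # the original gateway is left unchanged.
        kept = [t for t in self.tools.values() if not isinstance(t, StubCoderTool)]
        return self.__class__(kept + [StubCoderTool(model_client=model_client)])


base = SketchGateway([EchoTool(), StubCoderTool()])
configured = base.with_model_client("fake-client")
print(base.tools["coder"].model_client, configured.tools["coder"].model_client)
# None fake-client
```

Because the tool dict is keyed by name, a later tool with the same name would overwrite the earlier one anyway. The explicit `isinstance` filter just makes the replacement intent visible.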
@@ -14,6 +14,8 @@ Available tools:
|
||||||
Args: {"query": "text to find", "path": ".", "glob": "*.py"}
|
Args: {"query": "text to find", "path": ".", "glob": "*.py"}
|
||||||
- shell_exec_safe: run a safe allowlisted shell command in the current workspace.
|
- shell_exec_safe: run a safe allowlisted shell command in the current workspace.
|
||||||
Args: {"command": "pwd"}
|
Args: {"command": "pwd"}
|
||||||
|
- coder: delegate a code generation or analysis task to the coder-role LLM.
|
||||||
|
Args: {"task_description": "what to build or analyze", "language": "python", "context": "optional context"}
|
||||||
|
|
||||||
Return actions=[] when the user can be answered directly without tools.
|
Return actions=[] when the user can be answered directly without tools.
|
||||||
When tool_observations are already present, request only genuinely missing
|
When tool_observations are already present, request only genuinely missing
|
||||||
|
|
|
||||||
|
|
@@ -0,0 +1,17 @@
|
||||||
|
You are DuckLM memory policy classifier. Decide whether a task transcript contains information worth storing in long-term memory.
|
||||||
|
|
||||||
|
Return ONLY valid JSON with these keys:
|
||||||
|
should_store: boolean — true if this is worth remembering
|
||||||
|
memory_type: string — one of: fact, preference, lesson, decision, event, note
|
||||||
|
summary: string — concise one-sentence summary (max 200 chars)
|
||||||
|
importance: number — 0.0 to 1.0
|
||||||
|
scope: string — one of: global, workspace, conversation
|
||||||
|
metadata: object — optional extra key-value pairs
|
||||||
|
|
||||||
|
Rules:
|
||||||
|
- Store user preferences, important decisions, reusable lessons, key facts.
|
||||||
|
- Do NOT store routine tool calls, temporary state, or trivial observations.
|
||||||
|
- importance >= 0.7 for preferences and lessons, >= 0.4 for facts, < 0.4 for events.
|
||||||
|
- scope='global' for user preferences and system-wide facts.
|
||||||
|
- scope='workspace' for project-specific information.
|
||||||
|
- scope='conversation' for chat-specific context.
|
||||||
|
|
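The memory_policy prompt asks for strict JSON. The smoke tests in `tests/smoke/test_memory_policy.py` expect invalid JSON or missing keys to degrade to `should_store=False` with `metadata["source"] == "llm_policy_fallback"`.

The sketch below shows one standard-library way to enforce that contract. Several parts are assumptions, not the actual `MemoryPolicy` code:

- the names `Decision`, `parse_decision` and `fallback_decision`;
- the range and enum checks;
- the fallback's `memory_type` and `scope` values.

The real class appears to use a Pydantic model, judging by `decision.model_dump()`.

```python
import json
from dataclasses import dataclass, field
from typing import Any

# Allowed values, copied from the memory_policy prompt. Tuples compare by
# equality, so an unhashable value from the model cannot raise TypeError here.
MEMORY_TYPES = ("fact", "preference", "lesson", "decision", "event", "note")
SCOPES = ("global", "workspace", "conversation")
REQUIRED_KEYS = ("should_store", "memory_type", "summary", "importance", "scope")


@dataclass
class Decision:
    should_store: bool
    memory_type: str
    summary: str
    importance: float
    scope: str
    metadata: dict[str, Any] = field(default_factory=dict)


def fallback_decision(text: str) -> Decision:
    # Hypothetical fallback values: never store, and tag the source for debugging.
    return Decision(
        should_store=False,
        memory_type="event",
        summary=text[:200],
        importance=0.0,
        scope="conversation",
        metadata={"source": "llm_policy_fallback"},
    )


def parse_decision(raw: str, text: str) -> Decision:
    """Validate the classifier's JSON reply; degrade to 'do not store' on any problem."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback_decision(text)
    if not isinstance(data, dict) or any(k not in data for k in REQUIRED_KEYS):
        return fallback_decision(text)
    if data["memory_type"] not in MEMORY_TYPES or data["scope"] not in SCOPES:
        return fallback_decision(text)
    try:
        importance = float(data["importance"])
    except (TypeError, ValueError):
        return fallback_decision(text)
    if not 0.0 <= importance <= 1.0:
        return fallback_decision(text)
    metadata = data.get("metadata")
    return Decision(
        # Only a literal JSON true counts; strings like "true" are rejected.
        should_store=data["should_store"] is True,
        memory_type=data["memory_type"],
        summary=str(data["summary"])[:200],
        importance=importance,
        scope=data["scope"],
        metadata=metadata if isinstance(metadata, dict) else {},
    )
```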
@@ -0,0 +1,14 @@
|
||||||
|
You are DuckLM recall role. Given a user query and a list of memory records, identify which memories are relevant to the query.
|
||||||
|
|
||||||
|
Return ONLY valid JSON:
|
||||||
|
{
|
||||||
|
"relevant_ids": ["memory_id_1", "memory_id_2"],
|
||||||
|
"reasoning": "brief explanation of why these memories were selected"
|
||||||
|
}
|
||||||
|
|
||||||
|
Rules:
|
||||||
|
- Only include memories that are directly relevant to the user's current query
|
||||||
|
- Prefer specific memories over general ones
|
||||||
|
- Include global memories if they apply to the current context
|
||||||
|
- If no memories are relevant, return empty relevant_ids array
|
||||||
|
- Be conservative — better to include too few than too many irrelevant memories
|
||||||
|
|
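The recall prompt returns `{"relevant_ids": [...], "reasoning": "..."}`. `test_context_builder_recall_awaits_model_client` expects the builder to return the matching records themselves, so `["mem_1"]` yields `[records[0]]`.

The sketch below shows the filtering step. The name `select_relevant` is hypothetical. Returning `[]` on unparsable output is an assumption in line with the prompt's "be conservative" rule; the real `recall_relevant_memory` may handle failures differently.

```python
import json
from typing import Any


def select_relevant(raw: str, records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep records whose memory_id the recall role listed, in original order."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    ids = data.get("relevant_ids") if isinstance(data, dict) else None
    if not isinstance(ids, list):
        return []
    wanted = {str(i) for i in ids}
    # Ids the model invents that match no record are silently ignored.
    return [r for r in records if r.get("memory_id") in wanted]
```

Iterating over `records` rather than over `relevant_ids` keeps the caller's ordering and prevents duplicates, even if the model repeats an id.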
@@ -13,7 +13,8 @@ dependencies = [
|
||||||
"python-dotenv",
|
"python-dotenv",
|
||||||
"jsonschema",
|
"jsonschema",
|
||||||
"aiosqlite",
|
"aiosqlite",
|
||||||
"qdrant-client"
|
"qdrant-client",
|
||||||
|
"sentence-transformers"
|
||||||
]
|
]
|
||||||
|
|
||||||
[project.optional-dependencies]
|
[project.optional-dependencies]
|
||||||
|
|
|
||||||
|
|
@@ -46,15 +46,12 @@ def test_chat_api_uses_runtime_and_records_events(tmp_path, monkeypatch):
|
||||||
|
|
||||||
assert payload["status"] == "completed"
|
assert payload["status"] == "completed"
|
||||||
assert "DuckLM" in payload["final_response"]
|
assert "DuckLM" in payload["final_response"]
|
||||||
assert [event["event_type"] for event in events] == [
|
event_types = [event["event_type"] for event in events]
|
||||||
"task_created",
|
# Core events that must always be present
|
||||||
"model_call_started",
|
assert "task_created" in event_types
|
||||||
"action_directive_failed",
|
assert "task_completed" in event_types
|
||||||
"model_call_started",
|
# Memory policy decision is now recorded after task completion
|
||||||
"cognition_response",
|
assert "memory_policy_decision" in event_types
|
||||||
"model_call_finished",
|
|
||||||
"task_completed",
|
|
||||||
]
|
|
||||||
|
|
||||||
|
|
||||||
def test_chat_api_exposes_pending_approval_from_runtime_tool_gate(tmp_path, monkeypatch):
|
def test_chat_api_exposes_pending_approval_from_runtime_tool_gate(tmp_path, monkeypatch):
|
||||||
|
|
|
||||||
|
|
@@ -0,0 +1,197 @@
|
||||||
|
import pytest
|
||||||
|
from unittest.mock import AsyncMock
|
||||||
|
|
||||||
|
from duck_core.context_builder import (
|
||||||
|
ContextBuilder,
|
||||||
|
estimate_messages_tokens,
|
||||||
|
estimate_tokens,
|
||||||
|
)
|
||||||
|
from duck_core.model_client import ModelResponse
|
||||||
|
from duck_core.tasks.state import TaskState
|
||||||
|
|
||||||
|
|
||||||
|
def _make_task(message: str = "test") -> TaskState:
|
||||||
|
return TaskState(
|
||||||
|
task_id="task_1",
|
||||||
|
status="running",
|
||||||
|
user_message=message,
|
||||||
|
workspace="/tmp/test",
|
||||||
|
debug=False,
|
||||||
|
created_at="now",
|
||||||
|
updated_at="now",
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def test_estimate_tokens_approximate():
|
||||||
|
assert estimate_tokens("hello world") == 2 # 11 chars / 4 = 2
|
||||||
|
assert estimate_tokens("") == 1 # minimum 1
|
||||||
|
assert estimate_tokens("a" * 400) == 100
|
||||||
|
|
||||||
|
|
||||||
|
def test_estimate_messages_tokens():
|
||||||
|
messages = [
|
||||||
|
{"role": "user", "content": "hello"},
|
||||||
|
{"role": "assistant", "content": "world"},
|
||||||
|
]
|
||||||
|
# Each message: content tokens + 4 overhead
|
||||||
|
tokens = estimate_messages_tokens(messages)
|
||||||
|
assert tokens > 0
|
||||||
|
# "hello" = 5 chars / 4 = 1 token + 4 overhead = 5
|
||||||
|
# "world" = 5 chars / 4 = 1 token + 4 overhead = 5
|
||||||
|
assert tokens == 10
|
||||||
|
|
||||||
|
|
||||||
|
def test_context_builder_basic_messages():
|
||||||
|
builder = ContextBuilder()
|
||||||
|
task = _make_task("What is DuckLM?")
|
||||||
|
messages = builder.build_basic_messages(task)
|
||||||
|
assert len(messages) == 1
|
||||||
|
assert messages[0]["role"] == "user"
|
||||||
|
assert messages[0]["content"] == "What is DuckLM?"
|
||||||
|
|
||||||
|
|
||||||
|
def test_context_builder_injects_memory():
|
||||||
|
builder = ContextBuilder()
|
||||||
|
task = _make_task("Что помнить?")
|
||||||
|
messages = builder.build_basic_messages(
|
||||||
|
task,
|
||||||
|
memory_records=[
|
||||||
|
{"scope": "global", "text": "Use Russian."},
|
||||||
|
{"scope": "workspace", "text": "DuckLM uses Vulkan."},
|
||||||
|
],
|
||||||
|
)
|
||||||
|
assert messages[0]["role"] == "system"
|
||||||
|
assert "Relevant memory" in messages[0]["content"]
|
||||||
|
assert "global: Use Russian." in messages[0]["content"]
|
||||||
|
assert messages[-1]["content"] == "Что помнить?"
|
||||||
|
|
||||||
|
|
||||||
|
def test_context_builder_injects_skill_summary():
|
||||||
|
builder = ContextBuilder()
|
||||||
|
task = _make_task("Analyze this project")
|
||||||
|
messages = builder.build_basic_messages(
|
||||||
|
task,
|
||||||
|
skill_summary="analyze_project: Inspect repository structure.",
|
||||||
|
)
|
||||||
|
assert any("Active skill" in m.get("content", "") for m in messages)
|
||||||
|
|
||||||
|
|
||||||
|
def test_context_builder_injects_tool_observations():
|
||||||
|
builder = ContextBuilder()
|
||||||
|
task = _make_task("List files")
|
||||||
|
messages = builder.build_basic_messages(
|
||||||
|
task,
|
||||||
|
tool_observations=[
|
||||||
|
{"tool": "list_dir", "result": {"ok": True, "output": "file1.txt\nfile2.txt"}},
|
||||||
|
],
|
||||||
|
)
|
||||||
|
obs_msg = [m for m in messages if "Tool observations" in m.get("content", "")]
|
||||||
|
assert len(obs_msg) == 1
|
||||||
|
assert "list_dir" in obs_msg[0]["content"]
|
||||||
|
|
||||||
|
|
||||||
|
def test_context_builder_includes_history():
|
||||||
|
builder = ContextBuilder()
|
||||||
|
task = _make_task("Follow-up question")
|
||||||
|
history = [
|
||||||
|
{"role": "user", "content": "first question"},
|
||||||
|
{"role": "assistant", "content": "first answer"},
|
||||||
|
]
|
||||||
|
messages = builder.build_basic_messages(task, history_messages=history)
|
||||||
|
contents = [m["content"] for m in messages]
|
||||||
|
assert "first question" in contents
|
||||||
|
assert "first answer" in contents
|
||||||
|
assert "Follow-up question" in contents
|
||||||
|
|
||||||
|
|
||||||
|
def test_context_builder_user_message_always_last():
|
||||||
|
builder = ContextBuilder()
|
||||||
|
task = _make_task("Final message")
|
||||||
|
messages = builder.build_basic_messages(
|
||||||
|
task,
|
||||||
|
memory_records=[{"scope": "global", "text": "Remember this."}],
|
||||||
|
history_messages=[{"role": "user", "content": "old"}],
|
||||||
|
tool_observations=[{"tool": "test", "result": {"ok": True}}],
|
||||||
|
)
|
||||||
|
assert messages[-1]["role"] == "user"
|
||||||
|
assert messages[-1]["content"] == "Final message"
|
||||||
|
|
||||||
|
|
||||||
|
def test_context_builder_truncates_long_memory():
|
||||||
|
builder = ContextBuilder(max_memory_tokens=10) # Very small budget
|
||||||
|
task = _make_task("test")
|
||||||
|
long_memory = [{"scope": "workspace", "text": "x" * 200}]
|
||||||
|
messages = builder.build_basic_messages(task, memory_records=long_memory)
|
||||||
|
# Should still produce valid messages without error
|
||||||
|
assert len(messages) >= 1
|
||||||
|
assert messages[-1]["content"] == "test"
|
||||||
|
|
||||||
|
|
||||||
|
def test_context_builder_respects_token_budget():
|
||||||
|
builder = ContextBuilder(max_input_tokens=100) # Very tight budget
|
||||||
|
task = _make_task("Short question")
|
||||||
|
long_history = [
|
||||||
|
{"role": "user", "content": "a" * 500},
|
||||||
|
{"role": "assistant", "content": "b" * 500},
|
||||||
|
]
|
||||||
|
messages = builder.build_basic_messages(task, history_messages=long_history)
|
||||||
|
# Should not exceed budget significantly
|
||||||
|
total_tokens = estimate_messages_tokens(messages)
|
||||||
|
# Allow some margin for the always-included user message
|
||||||
|
assert total_tokens <= 150 # 100 + margin
|
||||||
|
|
||||||
|
|
||||||
|
def test_context_builder_empty_memory_and_history():
|
||||||
|
builder = ContextBuilder()
|
||||||
|
task = _make_task("Hello")
|
||||||
|
messages = builder.build_basic_messages(task)
|
||||||
|
assert len(messages) == 1
|
||||||
|
assert messages[0]["content"] == "Hello"
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_context_builder_recall_awaits_model_client():
|
||||||
|
model_client = AsyncMock()
|
||||||
|
model_client.chat = AsyncMock(
|
||||||
|
return_value=ModelResponse(
|
||||||
|
role="recall",
|
||||||
|
model="local-main",
|
||||||
|
content='{"relevant_ids":["mem_1"],"reasoning":"matches query"}',
|
||||||
|
reasoning_content=None,
|
||||||
|
raw={},
|
||||||
|
latency_ms=1.0,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
builder = ContextBuilder(model_client=model_client)
|
||||||
|
|
||||||
|
records = [
|
||||||
|
{"memory_id": "mem_1", "text": "DuckLM uses Vulkan."},
|
||||||
|
{"memory_id": "mem_2", "text": "Unrelated."},
|
||||||
|
]
|
||||||
|
relevant = await builder.recall_relevant_memory("How does DuckLM run?", records)
|
||||||
|
|
||||||
|
assert relevant == [records[0]]
|
||||||
|
model_client.chat.assert_awaited_once()
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_context_builder_summary_awaits_model_client():
|
||||||
|
model_client = AsyncMock()
|
||||||
|
model_client.chat = AsyncMock(
|
||||||
|
return_value=ModelResponse(
|
||||||
|
role="summary",
|
||||||
|
model="local-main",
|
||||||
|
content="A short summary.",
|
||||||
|
reasoning_content=None,
|
||||||
|
raw={},
|
||||||
|
latency_ms=1.0,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
builder = ContextBuilder(max_input_tokens=150, model_client=model_client)
|
||||||
|
task = _make_task("Current")
|
||||||
|
history = [{"role": "user", "content": "x" * 800}]
|
||||||
|
|
||||||
|
messages = await builder.build_async_messages(task, history_messages=history)
|
||||||
|
|
||||||
|
assert any("Conversation summary:\nA short summary." in m["content"] for m in messages)
|
||||||
|
model_client.chat.assert_awaited_once()
|
||||||
|
|
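The assertions in `test_estimate_tokens_approximate` and `test_estimate_messages_tokens` pin down the token heuristic: about four characters per token, a floor of 1, and 4 tokens of overhead per message. `test_context_builder_respects_token_budget` adds that the user message must survive a tight budget.

The sketch below is reconstructed from those tests, not copied from `duck_core/context_builder.py`. `fit_history` is a hypothetical helper that simply drops the oldest history. ContextBuilder v2 instead condenses overflow through the summary role, as `test_context_builder_summary_awaits_model_client` shows.

```python
def estimate_tokens(text: str) -> int:
    # Character heuristic, not a tokenizer: ~4 chars per token, minimum 1.
    return max(1, len(text) // 4)


def estimate_messages_tokens(messages: list[dict[str, str]]) -> int:
    # Content tokens plus a fixed 4-token overhead per message.
    return sum(estimate_tokens(m.get("content", "")) + 4 for m in messages)


def fit_history(
    history: list[dict[str, str]], user_message: str, max_input_tokens: int
) -> list[dict[str, str]]:
    """Keep the newest history that fits the budget; the user message always goes last."""
    user = {"role": "user", "content": user_message}
    budget = max_input_tokens - estimate_messages_tokens([user])
    kept: list[dict[str, str]] = []
    for message in reversed(history):  # walk from newest to oldest
        cost = estimate_messages_tokens([message])
        if cost > budget:
            break  # stop at the first message that does not fit, keeping history contiguous
        kept.append(message)
        budget -= cost
    kept.reverse()
    return kept + [user]
```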
@@ -125,5 +125,13 @@ def test_conversation_history_is_sent_to_model(tmp_path, monkeypatch):
|
||||||
json={"conversation_id": conversation["conversation_id"], "message": "second"},
|
json={"conversation_id": conversation["conversation_id"], "message": "second"},
|
||||||
)
|
)
|
||||||
|
|
||||||
second_call_content = [message["content"] for message in seen_thinker_messages[-1]]
|
# Filter out memory_policy and reflection calls — they use critic role
|
||||||
|
# with different message patterns
|
||||||
|
thinker_calls = [msgs for msgs in seen_thinker_messages if any(
|
||||||
|
msg.get("role") == "user"
|
||||||
|
and not msg.get("content", "").startswith("Task ID:")
|
||||||
|
and not msg.get("content", "").startswith("Reflect on this DuckLM task")
|
||||||
|
for msg in msgs
|
||||||
|
)]
|
||||||
|
second_call_content = [message["content"] for message in thinker_calls[-1]]
|
||||||
assert second_call_content == ["first", "answer 1", "second"]
|
assert second_call_content == ["first", "answer 1", "second"]
|
||||||
|
|
|
||||||
|
|
@@ -0,0 +1,123 @@
|
||||||
|
import json
|
||||||
|
from unittest.mock import AsyncMock
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from duck_core.memory.policy import MemoryPolicy
|
||||||
|
from duck_core.model_client import ModelClient, ModelResponse
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.fixture
|
||||||
|
def mock_model_client():
|
||||||
|
client = AsyncMock(spec=ModelClient)
|
||||||
|
client.chat = AsyncMock(
|
||||||
|
return_value=ModelResponse(
|
||||||
|
role="critic",
|
||||||
|
model="local-main",
|
||||||
|
content=json.dumps({
|
||||||
|
"should_store": True,
|
||||||
|
"memory_type": "preference",
|
||||||
|
"summary": "User prefers concise Russian answers.",
|
||||||
|
"importance": 0.9,
|
||||||
|
"scope": "global",
|
||||||
|
"metadata": {"source": "conversation"},
|
||||||
|
}),
|
||||||
|
reasoning_content=None,
|
||||||
|
raw={},
|
||||||
|
latency_ms=42.0,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
return client
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_memory_policy_stub_returns_should_store_false():
|
||||||
|
policy = MemoryPolicy()
|
||||||
|
decision = await policy.classify("some summary", "task_123")
|
||||||
|
assert decision.should_store is False
|
||||||
|
assert decision.memory_type == "event"
|
||||||
|
assert decision.importance == 0.0
|
||||||
|
assert decision.metadata["source"] == "stub_policy"
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_llm_memory_policy_classifies_and_stores(mock_model_client):
|
||||||
|
policy = MemoryPolicy(model_client=mock_model_client, role="memory_policy")
|
||||||
|
decision = await policy.classify(
|
||||||
|
"User said they prefer short answers in Russian.", "task_456"
|
||||||
|
)
|
||||||
|
assert decision.should_store is True
|
||||||
|
assert decision.memory_type == "preference"
|
||||||
|
assert decision.importance == 0.9
|
||||||
|
assert decision.summary == "User prefers concise Russian answers."
|
||||||
|
mock_model_client.chat.assert_called_once()
|
||||||
|
call_args = mock_model_client.chat.call_args
|
||||||
|
# ModelClient.chat(role, messages, ...) — positional args
|
||||||
|
assert call_args.args[0] == "memory_policy"
|
||||||
|
messages = call_args.args[1]
|
||||||
|
assert len(messages) == 1
|
||||||
|
assert messages[0]["role"] == "user"
|
||||||
|
assert "User said they prefer short answers" in messages[0]["content"]
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_llm_memory_policy_handles_non_storable(mock_model_client):
|
||||||
|
mock_model_client.chat.return_value = ModelResponse(
|
||||||
|
role="critic",
|
||||||
|
model="local-main",
|
||||||
|
content=json.dumps({
|
||||||
|
"should_store": False,
|
||||||
|
"memory_type": "event",
|
||||||
|
"summary": "Routine tool call, nothing to remember.",
|
||||||
|
"importance": 0.1,
|
||||||
|
"scope": "workspace",
|
||||||
|
"metadata": {},
|
||||||
|
}),
|
||||||
|
reasoning_content=None,
|
||||||
|
raw={},
|
||||||
|
latency_ms=30.0,
|
||||||
|
)
|
||||||
|
policy = MemoryPolicy(model_client=mock_model_client)
|
||||||
|
decision = await policy.classify("Ran ls -la in workspace.", "task_789")
|
||||||
|
assert decision.should_store is False
|
||||||
|
assert decision.importance == 0.1
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_llm_memory_policy_uses_response_format(mock_model_client):
|
||||||
|
policy = MemoryPolicy(model_client=mock_model_client)
|
||||||
|
await policy.classify("test summary", "task_1")
|
||||||
|
call_args = mock_model_client.chat.call_args
|
||||||
|
assert call_args.kwargs["response_format"]["type"] == "json_schema"
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_llm_memory_policy_invalid_json_falls_back(mock_model_client):
|
||||||
|
mock_model_client.chat.return_value = ModelResponse(
|
||||||
|
role="critic",
|
||||||
|
model="local-main",
|
||||||
|
content="not valid json {{{",
|
||||||
|
reasoning_content=None,
|
||||||
|
raw={},
|
||||||
|
latency_ms=10.0,
|
||||||
|
)
|
||||||
|
policy = MemoryPolicy(model_client=mock_model_client)
|
||||||
|
decision = await policy.classify("some summary", "task_x")
|
||||||
|
assert decision.should_store is False
|
||||||
|
assert decision.metadata["source"] == "llm_policy_fallback"
|
||||||
|
|
||||||
|
|
||||||
|
@pytest.mark.asyncio
|
||||||
|
async def test_llm_memory_policy_missing_fields_falls_back(mock_model_client):
|
||||||
|
mock_model_client.chat.return_value = ModelResponse(
|
||||||
|
role="critic",
|
||||||
|
model="local-main",
|
||||||
|
content=json.dumps({"should_store": True}),
|
||||||
|
reasoning_content=None,
|
||||||
|
raw={},
|
||||||
|
latency_ms=10.0,
|
||||||
|
)
|
||||||
|
policy = MemoryPolicy(model_client=mock_model_client)
|
||||||
|
decision = await policy.classify("some summary", "task_y")
|
||||||
|
assert decision.should_store is False
|
||||||
|
assert decision.metadata["source"] == "llm_policy_fallback"
|
||||||
|
|
@@ -2,8 +2,13 @@ from fastapi.testclient import TestClient
|
||||||
|
|
||||||
from duck_core.api import create_app
|
from duck_core.api import create_app
|
||||||
from duck_core.context_builder import ContextBuilder
|
from duck_core.context_builder import ContextBuilder
|
||||||
|
from duck_core.events.store import EventStore
|
||||||
|
from duck_core.memory.policy import MemoryDecision
|
||||||
from duck_core.memory.store import MemoryStore
|
from duck_core.memory.store import MemoryStore
|
||||||
|
from duck_core.model_client import ModelResponse
|
||||||
|
from duck_core.runtime_loop import RuntimeLoop
|
||||||
from duck_core.tasks.state import TaskState
|
from duck_core.tasks.state import TaskState
|
||||||
|
from duck_core.tasks.store import TaskStore
|
||||||
|
|
||||||
|
|
||||||
def test_memory_api_stores_workspace_scoped_notes(tmp_path, monkeypatch):
|
def test_memory_api_stores_workspace_scoped_notes(tmp_path, monkeypatch):
|
||||||
|
|
@@ -135,7 +140,7 @@ def test_chat_api_injects_relevant_memory_into_model_context(tmp_path, monkeypat
|
||||||
raw={},
|
raw={},
|
||||||
latency_ms=1.0,
|
latency_ms=1.0,
|
||||||
)
|
)
|
||||||
seen_messages.append(messages)
|
seen_messages.append((role, messages))
|
||||||
return ModelResponse(
|
return ModelResponse(
|
||||||
role=role,
|
role=role,
|
||||||
model="local-main",
|
model="local-main",
|
||||||
|
|
@@ -145,8 +150,6 @@ def test_chat_api_injects_relevant_memory_into_model_context(tmp_path, monkeypat
|
||||||
latency_ms=1.0,
|
latency_ms=1.0,
|
||||||
)
|
)
|
||||||
|
|
||||||
from duck_core.model_client import ModelResponse
|
|
||||||
|
|
||||||
monkeypatch.setattr("duck_core.model_client.ModelClient.chat", fake_chat)
|
monkeypatch.setattr("duck_core.model_client.ModelClient.chat", fake_chat)
|
||||||
client = TestClient(create_app())
|
client = TestClient(create_app())
|
||||||
conversation = client.post(
|
conversation = client.post(
|
||||||
|
|
@@ -170,5 +173,60 @@ def test_chat_api_injects_relevant_memory_into_model_context(tmp_path, monkeypat
|
||||||
},
|
},
|
||||||
)
|
)
|
||||||
|
|
||||||
assert seen_messages[0][0]["role"] == "system"
|
thinker_messages = [messages for role, messages in seen_messages if role == "thinker"]
|
||||||
assert "User prefers direct Russian answers." in seen_messages[0][0]["content"]
|
assert thinker_messages[0][0]["role"] == "system"
|
||||||
|
assert "User prefers direct Russian answers." in thinker_messages[0][0]["content"]
|
||||||
|
|
||||||
|
|
||||||
|
async def test_runtime_memory_policy_stores_workspace_scoped_memory(tmp_path):
|
||||||
|
db_path = str(tmp_path / "duck.sqlite3")
|
||||||
|
task_store = TaskStore(db_path)
|
||||||
|
event_store = EventStore(db_path)
|
||||||
|
memory_store = MemoryStore(db_path)
|
||||||
|
await task_store.init()
|
||||||
|
await event_store.init()
|
||||||
|
await memory_store.init()
|
||||||
|
|
||||||
|
class FakeModelClient:
|
||||||
|
async def chat(self, role, messages, temperature=None, max_output_tokens=None, response_format=None):
|
||||||
|
if role == "action":
|
||||||
|
return ModelResponse(
|
||||||
|
role=role,
|
||||||
|
model="local-main",
|
||||||
|
content='{"kind":"action_directive","intent":"answer","risk_level":"none","actions":[]}',
|
||||||
|
reasoning_content=None,
|
||||||
|
raw={},
|
||||||
|
latency_ms=1.0,
|
||||||
|
)
|
||||||
|
return ModelResponse(
|
||||||
|
role=role,
|
||||||
|
model="local-main",
|
||||||
|
content="Use Vulkan for this workspace.",
|
||||||
|
reasoning_content=None,
|
||||||
|
raw={},
|
||||||
|
latency_ms=1.0,
|
||||||
|
)
|
||||||
|
|
||||||
|
class StorePolicy:
|
||||||
|
async def classify(self, summary: str, task_id: str) -> MemoryDecision:
|
||||||
|
return MemoryDecision(
|
||||||
|
should_store=True,
|
||||||
|
memory_type="fact",
|
||||||
|
summary="Workspace uses Vulkan.",
|
||||||
|
importance=0.8,
|
||||||
|
scope="workspace",
|
||||||
|
metadata={"task_id": task_id},
|
||||||
|
)
|
||||||
|
|
||||||
|
runtime = RuntimeLoop(
|
||||||
|
task_store,
|
||||||
|
event_store,
|
||||||
|
model_client=FakeModelClient(),
|
||||||
|
memory_policy=StorePolicy(),
|
||||||
|
memory_store=memory_store,
|
||||||
|
)
|
||||||
|
|
||||||
|
await runtime.run_chat("remember workspace fact", workspace="/tmp/duck", reflect=False)
|
||||||
|
|
||||||
|
relevant = await memory_store.relevant(workspace="/tmp/duck", query="vulkan")
|
||||||
|
assert [record.text for record in relevant] == ["Workspace uses Vulkan."]
|
||||||
|
|
|
||||||
|
|
@ -0,0 +1,227 @@
import json
from unittest.mock import AsyncMock

import pytest

from duck_core.events.store import EventStore
from duck_core.experience.recorder import ExperienceRecorder
from duck_core.memory.policy import MemoryPolicy
from duck_core.memory.store import MemoryStore
from duck_core.model_client import ModelClient, ModelResponse
from duck_core.runtime_loop import RuntimeLoop
from duck_core.tasks.store import TaskStore


@pytest.fixture
def task_store(tmp_path):
    store = TaskStore(str(tmp_path / "duck.sqlite3"))
    return store


@pytest.fixture
def event_store(tmp_path):
    store = EventStore(str(tmp_path / "duck.sqlite3"))
    return store


@pytest.fixture
def memory_store(tmp_path):
    store = MemoryStore(str(tmp_path / "duck.sqlite3"))
    return store


@pytest.fixture
def experience_recorder(tmp_path):
    recorder = ExperienceRecorder(str(tmp_path / "duck.sqlite3"))
    return recorder


@pytest.fixture
def mock_model_client():
    client = AsyncMock(spec=ModelClient)
    client.chat = AsyncMock(
        side_effect=[
            # First call: action role, returns empty actions
            ModelResponse(
                role="action",
                model="local-main",
                content=json.dumps({
                    "kind": "action_directive",
                    "intent": "answer directly",
                    "risk_level": "none",
                    "actions": [],
                }),
                reasoning_content=None,
                raw={},
                latency_ms=1.0,
            ),
            # Second call: thinker role, final answer
            ModelResponse(
                role="thinker",
                model="local-main",
                content="DuckLM is a local cognitive runtime.",
                reasoning_content=None,
                raw={},
                latency_ms=1.0,
            ),
            # Third call: memory_policy role
            ModelResponse(
                role="critic",
                model="local-main",
                content=json.dumps({
                    "should_store": False,
                    "memory_type": "event",
                    "summary": "Routine answer, nothing to remember.",
                    "importance": 0.1,
                    "scope": "workspace",
                    "metadata": {},
                }),
                reasoning_content=None,
                raw={},
                latency_ms=1.0,
            ),
            # Fourth call: critic role (reflection)
            ModelResponse(
                role="critic",
                model="local-main",
                content="Task completed successfully. No issues found. Reusable lesson: direct answers work well for simple queries.",
                reasoning_content=None,
                raw={},
                latency_ms=1.0,
            ),
        ]
    )
    return client


@pytest.mark.asyncio
async def test_reflection_is_called_after_task_completion(
    task_store, event_store, memory_store, experience_recorder, mock_model_client
):
    policy = MemoryPolicy(model_client=mock_model_client)
    runtime = RuntimeLoop(
        task_store=task_store,
        event_store=event_store,
        model_client=mock_model_client,
        memory_policy=policy,
        memory_store=memory_store,
        experience_recorder=experience_recorder,
    )

    result = await runtime.run_chat("What is DuckLM?", workspace="/tmp/test")

    assert result.status == "completed"
    assert "DuckLM" in result.final_response

    # Check that reflection ran: an experience record was created
    records = await experience_recorder.list_records()
    assert len(records) == 1
    assert records[0].task_id == result.task_id
    assert "completed successfully" in records[0].reusable_lesson

    # Check that a reflection_completed event was recorded
    events = await event_store.list_events(result.task_id)
    event_types = [e.event_type for e in events]
    assert "reflection_completed" in event_types


@pytest.mark.asyncio
async def test_reflection_failure_does_not_break_task(
    task_store, event_store, memory_store, experience_recorder
):
    """If reflection fails, the task should still complete successfully."""
    client = AsyncMock(spec=ModelClient)
    client.chat = AsyncMock(
        side_effect=[
            # Action: empty
            ModelResponse(
                role="action",
                model="local-main",
                content=json.dumps({
                    "kind": "action_directive",
                    "intent": "answer",
                    "risk_level": "none",
                    "actions": [],
                }),
                reasoning_content=None,
                raw={},
                latency_ms=1.0,
            ),
            # Thinker: answer
            ModelResponse(
                role="thinker",
                model="local-main",
                content="Answer.",
                reasoning_content=None,
                raw={},
                latency_ms=1.0,
            ),
            # Memory policy
            ModelResponse(
                role="critic",
                model="local-main",
                content=json.dumps({
                    "should_store": False,
                    "memory_type": "event",
                    "summary": "Routine.",
                    "importance": 0.1,
                    "scope": "workspace",
                    "metadata": {},
                }),
                reasoning_content=None,
                raw={},
                latency_ms=1.0,
            ),
            # Critic (reflection): raises an exception
            ConnectionError("LLM unavailable"),
        ]
    )

    policy = MemoryPolicy(model_client=client)
    runtime = RuntimeLoop(
        task_store=task_store,
        event_store=event_store,
        model_client=client,
        memory_policy=policy,
        memory_store=memory_store,
        experience_recorder=experience_recorder,
    )

    result = await runtime.run_chat("test", workspace="/tmp/test")

    # Task should still complete
    assert result.status == "completed"

    # Reflection failure event should be recorded
    events = await event_store.list_events(result.task_id)
    event_types = [e.event_type for e in events]
    assert "reflection_failed" in event_types


@pytest.mark.asyncio
async def test_reflection_not_called_when_disabled(
    task_store, event_store, memory_store, mock_model_client
):
    """When reflect=False, no reflection should be called."""
    policy = MemoryPolicy(model_client=mock_model_client)
    runtime = RuntimeLoop(
        task_store=task_store,
        event_store=event_store,
        model_client=mock_model_client,
        memory_policy=policy,
        memory_store=memory_store,
    )

    result = await runtime.run_chat("What is DuckLM?", workspace="/tmp/test", reflect=False)

    assert result.status == "completed"
    # mock_model_client.chat should have been called 3 times (action, thinker, memory_policy)
    # NOT 4 times (no critic/reflection call)
    assert mock_model_client.chat.call_count == 3
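
The reflection tests above script every model call through `AsyncMock(side_effect=[...])`. Each awaited call consumes the next list item in order, and an item that is an exception instance is raised instead of returned. That is how the failure test makes only the fourth (reflection) call fail, and why the `reflect=False` test can assert `call_count == 3`. Below is a minimal standard-library sketch of that behavior. The plain strings stand in for the `ModelResponse` objects and are illustrative only, not DuckLM code.

```python
import asyncio
from unittest.mock import AsyncMock


async def demo() -> tuple[list[str], int]:
    # Stand-in for client.chat: each call consumes the next side_effect item in order.
    chat = AsyncMock(
        side_effect=[
            "action response",
            "thinker response",
            "memory policy response",
            ConnectionError("LLM unavailable"),  # an exception instance is raised, not returned
        ]
    )
    seen = [await chat("action"), await chat("thinker"), await chat("critic")]
    try:
        await chat("critic")  # the fourth call, where reflection would run
    except ConnectionError:
        seen.append("reflection failed")
    # call_count also counts the call that raised
    return seen, chat.call_count


seen, calls = asyncio.run(demo())
print(seen, calls)
```

Once the list is exhausted, a further call raises `StopAsyncIteration`. In a test, an unexpected extra model call therefore fails loudly instead of silently reusing a response.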
|
|
@ -5,7 +5,11 @@ from duck_core.memory.vector_memory import EmbeddingsUnavailableError, VectorMem

 @pytest.mark.asyncio
 async def test_vector_memory_stub_is_explicit_when_embeddings_unavailable():
-    memory = VectorMemory(qdrant_url="http://127.0.0.1:6333", embeddings_base_url=None)
+    memory = VectorMemory(
+        qdrant_url="http://127.0.0.1:6333",
+        embeddings_base_url=None,
+        local_embedding_model=None,
+    )
 
     with pytest.raises(EmbeddingsUnavailableError):
         await memory.add_memory("remember this")

@ -0,0 +1,97 @@
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from duck_core.memory.vector_memory import VectorMemory, EmbeddingsUnavailableError


@pytest.mark.asyncio
async def test_vector_memory_uses_local_model():
    """Test VectorMemory with local sentence-transformers model (mocked)."""
    vm = VectorMemory(
        qdrant_url="http://localhost:6333",
        local_embedding_model="./models/all-MiniLM-L6-v2",
    )

    # Mock the sentence-transformers model; encode() returns a plain list standing in for a numpy array
    mock_model = MagicMock()
    mock_model.encode.return_value = [0.1] * 384  # all-MiniLM-L6-v2 produces 384-dim vectors

    with patch.object(vm, "_load_local_model", return_value=mock_model):
        with patch("httpx.AsyncClient") as mock_client_class:
            mock_client = AsyncMock()
            mock_client_class.return_value.__aenter__ = AsyncMock(return_value=mock_client)
            mock_client_class.return_value.__aexit__ = AsyncMock(return_value=False)
            put_response = MagicMock(status_code=200)
            put_response.raise_for_status = MagicMock()
            search_response = MagicMock(status_code=200)
            search_response.raise_for_status = MagicMock()
            search_response.json.return_value = {
                "result": [{"id": "test-id", "payload": {"text": "test"}}]
            }
            mock_client.put.return_value = put_response
            mock_client.post.return_value = search_response

            point_id = await vm.add_memory("test memory", {"scope": "global"})
            assert point_id is not None

            results = await vm.search_memory("test query")
            assert isinstance(results, list)


@pytest.mark.asyncio
async def test_vector_memory_no_embedding_source():
    """VectorMemory with no embedding source should raise."""
    vm = VectorMemory(
        qdrant_url="http://localhost:6333",
        local_embedding_model=None,
        embeddings_base_url=None,
    )

    with pytest.raises(EmbeddingsUnavailableError):
        await vm.add_memory("test")


@pytest.mark.asyncio
async def test_vector_memory_remote_fallback():
    """Test VectorMemory with remote embeddings endpoint."""
    vm = VectorMemory(
        qdrant_url="http://localhost:6333",
        local_embedding_model=None,
        embeddings_base_url="http://localhost:8081/v1",
    )

    mock_embedding = [0.1] * 384

    with patch("httpx.AsyncClient") as mock_client_class:
        mock_client = AsyncMock()
        mock_client_class.return_value.__aenter__ = AsyncMock(return_value=mock_client)
        mock_client_class.return_value.__aexit__ = AsyncMock(return_value=False)
        embed_response = MagicMock(status_code=200)
        embed_response.json.return_value = {"data": [{"embedding": mock_embedding}]}
        put_response = MagicMock(status_code=200)
        put_response.raise_for_status = MagicMock()
        mock_client.post.return_value = embed_response
        mock_client.put.return_value = put_response

        point_id = await vm.add_memory("test")
        assert point_id is not None


@pytest.mark.asyncio
async def test_vector_memory_remote_503():
    """Remote embeddings returning 503 should raise EmbeddingsUnavailableError."""
    vm = VectorMemory(
        qdrant_url="http://localhost:6333",
        local_embedding_model=None,
        embeddings_base_url="http://localhost:8081/v1",
    )

    with patch("httpx.AsyncClient") as mock_client_class:
        mock_client = AsyncMock()
        mock_client_class.return_value.__aenter__ = AsyncMock(return_value=mock_client)
        mock_client_class.return_value.__aexit__ = AsyncMock(return_value=False)
        mock_client.post.return_value = AsyncMock(status_code=503)

        with pytest.raises(EmbeddingsUnavailableError):
            await vm.add_memory("test")
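
The VectorMemory tests above never touch the network. They patch `httpx.AsyncClient` with a `MagicMock` and set `__aenter__`/`__aexit__` to `AsyncMock`s, so `async with` hands back an inner `AsyncMock` client whose `post`/`put` calls return canned responses. The sketch below reproduces that wiring in isolation with only the standard library. `fetch_status` and the `/embeddings` URL are hypothetical stand-ins that assume the `async with httpx.AsyncClient() as client:` shape; they are not DuckLM's implementation.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def fetch_status(client_factory) -> int:
    # Hypothetical caller that mirrors the assumed `async with httpx.AsyncClient() as client:` shape.
    async with client_factory() as client:
        response = await client.post("http://localhost:8081/v1/embeddings", json={"input": "test"})
        return response.status_code


client_class = MagicMock()  # plays the role of the patched httpx.AsyncClient
inner_client = AsyncMock()  # the object that `async with ... as client` binds to
client_class.return_value.__aenter__ = AsyncMock(return_value=inner_client)
client_class.return_value.__aexit__ = AsyncMock(return_value=False)  # False: do not swallow exceptions
inner_client.post.return_value = MagicMock(status_code=503)  # awaited post() resolves to this response

status = asyncio.run(fetch_status(client_class))
print(status)  # 503
```

The response objects are plain `MagicMock`s here, as in most of the tests, so `status_code` and `json()` behave as ordinary synchronous attributes. Only the client methods themselves need to be awaitable.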