Integrate memory policy and reflection runtime
This commit is contained in:
parent
9a8f058008
commit
6f58df82c9
|
|
@ -0,0 +1,462 @@
|
|||
# DuckLM — Текущее состояние проекта
|
||||
|
||||
**Дата анализа:** 2026-05-21
|
||||
**Версия:** 0.2.0
|
||||
**Расположение:** `~/git/ducklm_2`
|
||||
|
||||
---
|
||||
|
||||
## Последние изменения (Phase 1-7)
|
||||
|
||||
### Phase 1: MemoryPolicy — LLM-классификация памяти ✅
|
||||
- `duck_core/memory/policy.py` — переписан с нуля: LLM-классификация через critic-роль
|
||||
- Роль `memory_policy` добавлена в `config/models.yaml` + промпт
|
||||
- Интегрирован в RuntimeLoop: `_run_memory_policy()` после каждой задачи
|
||||
- События: `memory_policy_decision`, `memory_stored`, `memory_policy_failed`
|
||||
- 6 тестов в `tests/smoke/test_memory_policy.py`
|
||||
|
||||
### Phase 2: Рефлексия (Critic) — автоматический вызов ✅
|
||||
- `_run_reflection()` в RuntimeLoop — transcript из event store → critic → experience
|
||||
- Параметр `reflect: bool = True` в `run_chat()`
|
||||
- `ExperienceRecorder` передаётся через `create_app()`
|
||||
- События: `reflection_completed`, `reflection_failed`
|
||||
- 3 теста в `tests/smoke/test_reflection.py`
|
||||
|
||||
### Phase 3-4: ContextBuilder v2 + Summary-роль ✅
|
||||
- Полностью переписан `duck_core/context_builder.py`
|
||||
- Token budget awareness, приоритизация, суммаризация через summary-роль
|
||||
- `estimate_tokens()`, `estimate_messages_tokens()` утилиты
|
||||
- Подключён model_client для LLM-суммаризации
|
||||
- 11 тестов в `tests/smoke/test_context_builder.py`
|
||||
|
||||
### Phase 5: VectorMemory — интеграция ✅
|
||||
- `VectorMemory` добавлен в RuntimeLoop и `create_app()`
|
||||
- **Локальная модель эмбеддингов**: `all-MiniLM-L6-v2` (384 размерности, sentence-transformers)
|
||||
- При `memory_stored` также сохраняется в Qdrant (graceful fallback при ошибках)
|
||||
- Поддержка двух режимов: локальная модель + remote `/v1/embeddings` endpoint
|
||||
- `sentence-transformers` добавлен в зависимости `pyproject.toml`
|
||||
- 4 теста в `tests/smoke/test_vector_memory_integration.py`
|
||||
|
||||
### Embeddings — архитектура
|
||||
|
||||
### Phase 6: Recall-роль ✅
|
||||
- Роль `recall` добавлена в `config/models.yaml` + промпт `prompts/roles/recall.md`
|
||||
- `ContextBuilder.recall_relevant_memory()` — LLM-фильтрация релевантных воспоминаний
|
||||
- Интегрирован в `/v1/chat` endpoint
|
||||
|
||||
### Phase 7: Coder-роль — интеграция ✅
|
||||
- `CoderTool` создан в `duck_core/tools/coder.py`
|
||||
- Зарегистрирован в `ToolGateway.default()`
|
||||
- Описан в `prompts/roles/action.md`
|
||||
|
||||
### Статус тестов
|
||||
- **72 из 73** smoke-тестов проходят
|
||||
- 1 тест (`test_llama_server_connection_live_skip_by_env`) требует живой llama-server
|
||||
|
||||
---
|
||||
|
||||
## 1. Что такое DuckLM
|
||||
|
||||
DuckLM — это **локальная агентная система (cognitive runtime)**, работающая поверх локальных языковых моделей через `llama-server`. Это не inference-сервер, а полноценный когнитивный цикл:
|
||||
|
||||
```
|
||||
состояние → контекст → мышление → намерение → действие → наблюдение → рефлексия → память → опыт
|
||||
```
|
||||
|
||||
Ключевая идея: DuckLM — это **оркестратор**, который управляет задачами, инструментами, памятью, навыками и рефлексией, используя локальные LLM через OpenAI-совместимый API (`llama-server`).
|
||||
|
||||
---
|
||||
|
||||
## 2. Архитектурные принципы
|
||||
|
||||
### 2.1. Использование готовых компонентов
|
||||
|
||||
| Компонент | Источник |
|
||||
|-----------|----------|
|
||||
| LLM inference | `llama-server` (llama.cpp, собранный с Vulkan) |
|
||||
| Хранение состояния | SQLite (aiosqlite) |
|
||||
| Векторная память | Qdrant (через docker-compose) |
|
||||
| HTTP API | FastAPI |
|
||||
| Web-интерфейс | Jinja2 + ванильный JS |
|
||||
| Валидация данных | Pydantic |
|
||||
| Конфигурация | PyYAML + python-dotenv |
|
||||
|
||||
**Не пишется с нуля:** inference server, model scheduler, vector DB, OpenAI API, MCP, песочница, workflow engine.
|
||||
|
||||
**Пишется с нуля:** Duck Core (runtime loop, context builder, model client, event store, tool gateway, approvals, skills, experience, memory policy, FastAPI API, WebChat).
|
||||
|
||||
### 2.2. Web/API first
|
||||
|
||||
- **WebChat** — интерфейс для человека (порт 8000)
|
||||
- **HTTP API** — для кодера, тестов и внешних агентов
|
||||
- CLI не входит в обязательную часть (если понадобится — тонкий клиент поверх HTTP API)
|
||||
|
||||
### 2.3. Роли моделей — логические, не физические
|
||||
|
||||
Роли: `thinker`, `critic`, `coder`, `action`, `summary`, `recall`, `sys_util`.
|
||||
|
||||
Все роли в текущей конфигурации указывают на одну физическую модель (`local-main` на порту 8081). Различие между ролями задаётся комбинацией:
|
||||
- system prompt
|
||||
- temperature
|
||||
- max_output_tokens
|
||||
- response_format / structured_output
|
||||
- memory scope
|
||||
- tool permissions
|
||||
|
||||
### 2.4. Token budget
|
||||
|
||||
```
|
||||
DUCK_CTX_SIZE=4096 (в .env, хотя в коде дефолт 65536)
|
||||
DUCK_MAX_INPUT_TOKENS=49152
|
||||
DUCK_MAX_RECENT_EVENTS_TOKENS=12000
|
||||
DUCK_MAX_MEMORY_TOKENS=8000
|
||||
DUCK_MAX_SKILL_TOKENS=6000
|
||||
```
|
||||
|
||||
Output limits по ролям:
|
||||
- thinker: 8192
|
||||
- critic: 4096
|
||||
- coder: 16384
|
||||
- action: 2048
|
||||
- summary: 4096
|
||||
|
||||
---
|
||||
|
||||
## 3. Целевая архитектура (из ТЗ)
|
||||
|
||||
```
|
||||
┌─────────────┐
|
||||
│ WebChat │ ← интерфейс человека
|
||||
└──────┬──────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────┐
|
||||
│ FastAPI │ ← интерфейс кодера, тестов и агентов
|
||||
└──────┬──────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────┐
|
||||
│ Duck Core │
|
||||
│ RuntimeLoop, TaskState, │
|
||||
│ ContextBuilder, ModelClient, │
|
||||
│ SkillRegistry, ToolGateway, │
|
||||
│ ApprovalService, Reflection, │
|
||||
│ MemoryPolicy, ExperienceRecorder │
|
||||
└──────┬──────────────┬───────────────┘
|
||||
│ │
|
||||
▼ ▼
|
||||
┌────────────┐ ┌──────────────────┐
|
||||
│llama-server│ │ SQLite/PostgreSQL│
|
||||
│OpenAI-comp.│ │ events/tasks/ │
|
||||
└────────────┘ │ approvals │
|
||||
│ └──────────────────┘
|
||||
▼
|
||||
┌────────────┐
|
||||
│ Qdrant │ ← semantic memory
|
||||
└────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Что реализовано (текущее состояние)
|
||||
|
||||
### 4.1. Полностью реализованные компоненты
|
||||
|
||||
#### Конфигурация и настройки
|
||||
- **`duck_core/config.py`** — `Settings` dataclass, загрузка из `.env`, кэширование через `lru_cache`
|
||||
- **`config/models.yaml`** — 5 ролей (thinker, critic, coder, action, summary), все на `local-main` (порт 8081)
|
||||
- **`.env`** / **`.env.example`** — полная конфигурация путей, портов, GPU, Qdrant
|
||||
|
||||
#### ModelClient
|
||||
- **`duck_core/model_client.py`** — ролевая маршрутизация вызовов к llama-server
|
||||
- `chat()` — синхронный вызов с измерением latency, usage
|
||||
- `stream_chat()` — streaming через SSE (reasoning_delta + content_delta)
|
||||
- `ping()` — проверка доступности всех ролей
|
||||
- Автоматическая подстановка system prompt из файла
|
||||
- Автоматический `response_format: json_schema` для action-роли
|
||||
|
||||
#### Хранение состояния (SQLite)
|
||||
- **`duck_core/tasks/state.py`** — `TaskState` (Pydantic модель)
|
||||
- **`duck_core/tasks/store.py`** — `TaskStore`: create, update_status, complete, fail, cancel, waiting_for_approval, get, list
|
||||
- **`duck_core/events/store.py`** — `EventStore`: append (с авто-increment sequence), list_events, list_by_type
|
||||
- **`duck_core/conversations/store.py`** — `ConversationStore`: create, ensure, get, list, add_message, list_messages, get_conversation_id_for_task
|
||||
- **`duck_core/approvals/service.py`** — `ApprovalService`: create_pending, pending, get, allow_once, allow_forever, deny, is_allowed_forever
|
||||
- **`duck_core/experience/recorder.py`** — `ExperienceRecorder`: record, list_records, get_record, write_skill_update_proposal
|
||||
- **`duck_core/memory/store.py`** — `MemoryStore`: add, list, search (LIKE), relevant (scope-aware), infer_scope, _normalize_scope
|
||||
|
||||
#### Runtime Loop
|
||||
- **`duck_core/runtime_loop.py`** — ядро когнитивного цикла:
|
||||
- `run_chat()` — полный цикл: создание задачи → action loop → thinker → завершение
|
||||
- `continue_after_approval()` — продолжение после одобрения действия
|
||||
- `_run_action_loop()` — итеративный цикл вызова инструментов (max 4 итерации)
|
||||
- `_run_action_tools()` — парсинг action directive от модели → вызов ToolGateway
|
||||
- `_append_command_audit()` — аудит shell-команд через event store
|
||||
- Обработка requires_approval → пауза с ожиданием решения пользователя
|
||||
|
||||
#### Context Builder
|
||||
- **`duck_core/context_builder.py`** — `ContextBuilder.build_basic_messages()`: собирает сообщения из memory records, history и текущего user message
|
||||
|
||||
#### Tools
|
||||
- **`duck_core/tools/base.py`** — `ToolResult` (Pydantic), `Tool` (Protocol)
|
||||
- **`duck_core/tools/gateway.py`** — `ToolGateway`: маршрутизация action → конкретный инструмент
|
||||
- **`duck_core/tools/file_read.py`** — `FileReadTool`: чтение файлов внутри workspace, запрет .env/.ssh/shadow
|
||||
- **`duck_core/tools/file_write.py`** — `FileWriteTool`: запись внутри workspace, защита от перезаписи
|
||||
- **`duck_core/tools/list_dir.py`** — `ListDirTool`: листинг директории внутри workspace
|
||||
- **`duck_core/tools/search_files.py`** — `SearchFilesTool`: текстовый поиск по файлам (glob, case_sensitive)
|
||||
- **`duck_core/tools/shell_exec_safe.py`** — `ShellExecSafeTool`: allowlist + blocklist + approval
|
||||
- **`duck_core/tools/command_policy.py`** — `CommandPolicy`: классификация команд (readonly, system, destructive, dangerous fragments)
|
||||
- **`duck_core/tools/paths.py`** — `resolve_workspace_path()`: защита от path traversal
|
||||
|
||||
#### Approvals
|
||||
- **`duck_core/approvals/service.py`** — полный цикл согласований:
|
||||
- Создание pending approval с SHA256-хешем действия
|
||||
- Решения: allow_once, allow_forever, deny
|
||||
- Проверка is_allowed_forever по хешу действия
|
||||
- normalized_action хранится в JSON
|
||||
|
||||
#### Skills
|
||||
- **`duck_core/skills/registry.py`** — `SkillRegistry`: загрузка из `*/skill.yaml`, парсинг procedure/examples/notes, поиск по ключевым словам
|
||||
- **`skills/analyze_project/`** — единственный скилл: анализ структуры проекта
|
||||
|
||||
#### Experience & Reflection
|
||||
- **`duck_core/experience/recorder.py`** — запись результатов задач, предложения по обновлению скиллов
|
||||
- **`duck_core/reflection.py`** — `Reflection.reflect()`: вызов critic-роли для анализа транскрипта задачи
|
||||
|
||||
#### Memory
|
||||
- **`duck_core/memory/store.py`** — `MemoryStore`: хранение в SQLite с поддержкой scope (global/workspace/conversation), importance, полнотекстовый поиск (LIKE)
|
||||
- **`duck_core/memory/policy.py`** — `MemoryPolicy`: заглушка (всегда should_store=False)
|
||||
- **`duck_core/memory/vector_memory.py`** — `VectorMemory`: интеграция с Qdrant для семантического поиска (требует embeddings endpoint)
|
||||
|
||||
#### FastAPI API
|
||||
- **`duck_core/api.py`** — полный HTTP API (878 строк):
|
||||
- `POST /v1/chat` — основной чат с сохранением в conversation
|
||||
- `POST /v1/chat/stream` — streaming чат через SSE
|
||||
- `POST /v1/tasks/{task_id}/continue/stream` — продолжение после одобрения
|
||||
- `POST /v1/tasks/{task_id}/password/stream` — ввод sudo-пароля
|
||||
- `GET/POST /v1/conversations` — управление диалогами
|
||||
- `GET /v1/tasks`, `GET /v1/tasks/{task_id}/events` — инспекция задач
|
||||
- `GET /v1/approvals/pending`, `POST /v1/approvals/{id}/allow_once|allow_forever|deny`
|
||||
- `GET /v1/skills`, `GET /v1/skills/{skill_id}`
|
||||
- `GET /v1/experience`
|
||||
- `POST /v1/memory`, `GET /v1/memory`, `GET /v1/memory/search`
|
||||
- `GET /v1/models/roles`, `GET /v1/models/ping`
|
||||
- `GET /health`, `GET /v1/status`
|
||||
- Веб-страницы: `/`, `/approvals`, `/skills`, `/memory`, `/experience`
|
||||
|
||||
#### WebChat UI
|
||||
- **`duck_core/web/templates/index.html`** — полноценный WebChat с sidebar, conversation list, activity drawer
|
||||
- **`duck_core/web/static/app.js`** (997 строк) — клиентская логика:
|
||||
- SSE streaming с парсингом reasoning_delta, content_delta, tool_call_started/finished, tool_approval_requested, tool_password_requested
|
||||
- Инлайн-терминалы для отображения вызовов инструментов
|
||||
- Инлайн-кнопки одобрения/запрета действий
|
||||
- Форма ввода sudo-пароля
|
||||
- Activity drawer с вкладками Events/Commands/Memory
|
||||
- Управление диалогами (create, select, load history)
|
||||
- Enter для отправки, Shift+Enter для новой строки
|
||||
- **`duck_core/web/static/style.css`** (1002 строки) — светлая тема, responsive layout
|
||||
|
||||
#### Скрипты
|
||||
- **`scripts/llama/start_main.sh`** — управление llama-server (start/stop/restart/status/logs)
|
||||
- **`scripts/llama/start_thinker_mtp_experimental.sh`** — экспериментальный MTP endpoint
|
||||
- **`scripts/llama/build_vulkan.sh`** — сборка llama.cpp с Vulkan
|
||||
- **`scripts/llama/healthcheck.sh`** — проверка здоровья llama-server
|
||||
- **`scripts/verify/`** — 7 верификационных скриптов (basic_chat, file_write_read, tool_blocking, models_roles, skills, experience, memory)
|
||||
- **`scripts/bench/bench_runtime.py`** — бенчмарк
|
||||
|
||||
#### Тесты
|
||||
- 18 smoke-тестов в `tests/smoke/`:
|
||||
- `test_models_config.py`, `test_model_client.py`, `test_api_health.py`
|
||||
- `test_event_log.py`, `test_action_directive_schema.py`
|
||||
- `test_tool_gateway.py`, `test_approvals.py`
|
||||
- `test_skill_registry.py`, `test_experience_recorder.py`
|
||||
- `test_vector_memory.py`, `test_memory_store.py`
|
||||
- `test_chat_api.py`, `test_conversations.py`
|
||||
- `test_runtime_reasoning.py`, `test_runtime_tools.py`
|
||||
- `test_llama_server_connection.py`, `test_llama_service_script.py`
|
||||
|
||||
#### Документация
|
||||
- 12 файлов в `docs/`:
|
||||
- `architecture.md`, `how_to_run.md`, `how_to_test.md`
|
||||
- `web_api.md`, `model_roles.md`, `tool_gateway.md`
|
||||
- `memory_architecture.md`, `experience_learning.md`, `skills.md`
|
||||
- `local_llama_server.md`, `performance_mtp.md`
|
||||
- `superpowers/plans/2026-05-19-ducklm-runtime.md` — план реализации
|
||||
|
||||
#### Docker
|
||||
- **`docker-compose.memory.yml`** — Qdrant (порты 6333/6334)
|
||||
|
||||
#### Сборка и запуск
|
||||
- **`pyproject.toml`** — зависимости: fastapi, uvicorn, httpx, pydantic, pyyaml, jinja2, python-dotenv, jsonschema, aiosqlite, qdrant-client
|
||||
- **`Makefile`** — цели: duck-up, duck-llama-main, duck-api, duck-dev, duck-smoke, duck-test, duck-verify
|
||||
- **`data/duck.sqlite3`** — рабочая БД SQLite
|
||||
- **`workspace/`** — рабочая директория для инструментов
|
||||
|
||||
---
|
||||
|
||||
## 5. План разработки (из `docs/superpowers/plans/2026-05-19-ducklm-runtime.md`)
|
||||
|
||||
План состоит из 4 задач:
|
||||
|
||||
### Task 1: Tests First
|
||||
- Написать smoke tests для всех компонентов
|
||||
- ✅ **Выполнено** — 18 тестов созданы
|
||||
|
||||
### Task 2: Runtime Core
|
||||
- pyproject.toml, .env.example, config/models.yaml
|
||||
- config.py, model_client.py, events/store.py, tasks/store.py, tasks/state.py
|
||||
- context_builder.py, runtime_loop.py, api.py
|
||||
- ✅ **Выполнено** — все компоненты реализованы
|
||||
|
||||
### Task 3: Stage Adapters
|
||||
- tools/*, approvals/service.py, skills/registry.py
|
||||
- experience/recorder.py, reflection.py, memory/*
|
||||
- schemas/action_directive.schema.json
|
||||
- ✅ **Выполнено** — все компоненты реализованы
|
||||
|
||||
### Task 4: Project Surface
|
||||
- scripts/llama/*, scripts/verify/*, scripts/bench/*
|
||||
- web/templates/*, web/static/*
|
||||
- skills/analyze_project/*
|
||||
- docker-compose.memory.yml, Makefile, README.md, docs/*
|
||||
- ✅ **Выполнено** — все компоненты реализованы
|
||||
|
||||
---
|
||||
|
||||
## 6. Отступления от плана / дополнительные возможности
|
||||
|
||||
### 6.1. Что добавлено сверх плана
|
||||
|
||||
1. **ConversationStore** — полноценное управление диалогами (conversations + conversation_messages таблицы), чего не было явно в плане Task 2. План упоминал только tasks и events.
|
||||
|
||||
2. **Streaming API** — `POST /v1/chat/stream` с SSE, `POST /v1/tasks/{id}/continue/stream`, `POST /v1/tasks/{id}/password/stream`. В плане не было явно указано streaming.
|
||||
|
||||
3. **Password flow** — полный цикл запроса sudo-пароля: `requires_password` → `tool_password_requested` → `/v1/tasks/{id}/password/stream`. В плане не было детализировано.
|
||||
|
||||
4. **Activity Drawer в WebChat** — боковая панель с вкладками Events/Commands/Memory, инлайн-терминалы для инструментов, инлайн-одобрения. Значительно больше, чем «пустая WebChat-страница» из этапа 1 ТЗ.
|
||||
|
||||
5. **Command Audit** — отдельный тип события `command_audit` для shell-команд с полной метаданной (action_type, risk_level, blocked, approved, returncode).
|
||||
|
||||
6. **Scope-aware Memory** — трёхуровневая система скоупов (global/workspace/conversation) с автоматическим infer_scope.
|
||||
|
||||
7. **Skill update proposals** — автоматическая запредложений по обновлению скиллов в `skills/_proposals/`.
|
||||
|
||||
8. **18 тестов вместо 11 запланированных** — добавлены: `test_chat_api`, `test_conversations`, `test_runtime_reasoning`, `test_runtime_tools`, `test_llama_server_connection`, `test_llama_service_script`, `test_memory_store`.
|
||||
|
||||
### 6.2. Что не реализовано (или реализовано частично)
|
||||
|
||||
1. **MemoryPolicy — заглушка.** `MemoryPolicy.classify()` всегда возвращает `should_store=False`. Нет LLM-классификации для автоматического сохранения памяти.
|
||||
|
||||
2. **ContextBuilder — минимальный.** Нет суммаризации старых events, нет обрезки по token budget, нет приоритизации контекста. Просто склеивает memory + history + user message.
|
||||
|
||||
3. **Critic не вызывается автоматически.** `Reflection.reflect()` есть, но не интегрирован в RuntimeLoop — нет автоматической рефлексии после завершения задачи.
|
||||
|
||||
4. **Summary роль не используется.** Нет автоматической суммаризации контекста при превышении budget.
|
||||
|
||||
5. **Coder роль не используется в основном потоке.** RuntimeLoop вызывает только action и thinker.
|
||||
|
||||
6. **Recall роль не определена в конфиге.** В ТЗ упоминается recall, но в `config/models.yaml` её нет.
|
||||
|
||||
7. **Sys_util роль не определена в конфиге.** Аналогично.
|
||||
|
||||
8. **VectorMemory не интегрирован в RuntimeLoop.** Qdrant-поиск не подключён к основному циклу (MemoryStore использует LIKE-поиск, а не векторный).
|
||||
|
||||
9. **WebChat — светлая тема.** В памяти пользователя указано предпочтение тёмной темы, но CSS реализован светлый (`color-scheme: light`, белый фон).
|
||||
|
||||
10. **Нет CLI.** Упомянуто в ТЗ как необязательное, но если понадобится — нужно делать.
|
||||
|
||||
11. **Нет автоматического применения skill patches.** Предложения пишутся в `skills/_proposals/`, но не применяются автоматически.
|
||||
|
||||
### 6.3. Технические заметки
|
||||
|
||||
- **Модель:** Qwen3.6 35B A3B, два варианта — nonMTP (основной, порт 8081) и MTP (экспериментальный, порт 8085)
|
||||
- **GPU:** Radeon RX580, Vulkan backend, 20 GPU layers
|
||||
- **llama-server бинарник:** `./vendor/llama.cpp/build/bin/llama-server`
|
||||
- **ctx_size в .env:** 4096 (хотя в коде Settings дефолт 65536)
|
||||
- **reasoning-budget:** 512 в .env, `--reasoning-budget 512 --cache-ram 0`
|
||||
- **Python:** 3.13 (по путям `__pycache__`)
|
||||
|
||||
---
|
||||
|
||||
## 7. Структура базы данных (SQLite)
|
||||
|
||||
Таблицы:
|
||||
- **tasks** — задачи (task_id, status, user_message, workspace, debug, final_response, created_at, updated_at)
|
||||
- **events** — события (id, task_id, sequence, event_type, payload_json, created_at)
|
||||
- **conversations** — диалоги (id, conversation_id, title, workspace, created_at, updated_at)
|
||||
- **conversation_messages** — сообщения диалогов (id, conversation_id, role, content, reasoning_content, task_id, status, created_at)
|
||||
- **approvals** — согласования (id, approval_id, task_id, action_hash, normalized_action_json, status, decision, created_at, updated_at)
|
||||
- **experience_records** — записи опыта (id, task_id, skill_id, summary, result, what_worked_json, what_failed_json, reusable_lesson, suggested_skill_patch, confidence, created_at)
|
||||
- **memories** — память (id, memory_id, text, scope, workspace, conversation_id, memory_type, importance, metadata_json, created_at, updated_at)
|
||||
|
||||
---
|
||||
|
||||
## 8. Когнитивный цикл (как работает RuntimeLoop)
|
||||
|
||||
1. Пользователь отправляет сообщение → `POST /v1/chat` или `/v1/chat/stream`
|
||||
2. Создаётся Task + событие `task_created`
|
||||
3. ContextBuilder собирает сообщения (memory + history + user message)
|
||||
4. **Action loop** (до 4 итераций):
|
||||
- Модель `action` генерирует JSON directive (schema: action_directive.schema.json)
|
||||
- ToolGateway выполняет каждый action через соответствующий инструмент
|
||||
- Если команда требует approval → пауза, создание Approval, ожидание решения
|
||||
- Если sudo → запрос пароля
|
||||
- Результаты собираются как tool_observations
|
||||
5. Thinker получает все tool_observations и формирует финальный ответ
|
||||
6. Задача завершена → `task_completed`
|
||||
7. (Опционально) Reflection через critic — **не автоматизировано**
|
||||
|
||||
---
|
||||
|
||||
## 9. Статус готовности
|
||||
|
||||
| Компонент | Статус |
|
||||
|-----------|--------|
|
||||
| Конфигурация | ✅ Готово |
|
||||
| ModelClient | ✅ Готово |
|
||||
| TaskStore / EventStore | ✅ Готово |
|
||||
| ConversationStore | ✅ Готово |
|
||||
| RuntimeLoop | ✅ Готово |
|
||||
| ContextBuilder | ⚠️ Минимальный |
|
||||
| ToolGateway + Tools | ✅ Готово |
|
||||
| ApprovalService | ✅ Готово |
|
||||
| SkillRegistry | ✅ Готово |
|
||||
| ExperienceRecorder | ✅ Готово |
|
||||
| Reflection | ⚠️ Не интегрирован в loop |
|
||||
| MemoryStore (SQLite) | ✅ Готово |
|
||||
| MemoryPolicy | ✅ LLM-based (Phase 1) |
|
||||
| VectorMemory (Qdrant) | ✅ Интегрирован (Phase 5) |
|
||||
| FastAPI API | ✅ Готово |
|
||||
| WebChat UI | ✅ Готово (светлая тема) |
|
||||
| Streaming | ✅ Готово |
|
||||
| Password flow | ✅ Готово |
|
||||
| Smoke tests | ✅ 74 теста |
|
||||
| Docs | ✅ 12 файлов |
|
||||
| Scripts | ✅ Готово |
|
||||
|
||||
---
|
||||
|
||||
## Архитектура эмбеддингов
|
||||
|
||||
### Локальная модель (основной режим)
|
||||
- **Модель**: `all-MiniLM-L6-v2` (sentence-transformers, 384 размерности)
|
||||
- **Расположение**: `./models/all-MiniLM-L6-v2/` (safetensors формат)
|
||||
- **Библиотека**: `sentence-transformers` (добавлен в pyproject.toml)
|
||||
- **Использование**: `VectorMemory._local_embed()` — загрузка модели через `SentenceTransformer`, кодирование в thread pool
|
||||
|
||||
### Remote endpoint (fallback)
|
||||
- **Endpoint**: `/v1/embeddings` на llama-server или OpenAI-совместимом сервере
|
||||
- **Использование**: `VectorMemory._remote_embed()` — HTTP POST запрос
|
||||
|
||||
### Поток данных
|
||||
1. Зача завершена → `_run_memory_policy()` → LLM классифицирует → `MemoryDecision`
|
||||
2. Если `should_store=True` → `MemoryStore.add()` (SQLite) + `VectorMemory.add_memory()` (Qdrant)
|
||||
3. При следующем запросе → `MemoryStore.relevant()` (SQLite LIKE) + `VectorMemory.search_memory()` (semantic)
|
||||
4. Recall-роль фильтрует релевантные воспоминания через LLM
|
||||
|
||||
### Зависимости
|
||||
- `sentence-transformers` — для локальной модели
|
||||
- `qdrant-client` — для Qdrant (уже был)
|
||||
- Qdrant запускается через `docker-compose.memory.yml` (порт 6333)
|
||||
| Docker (Qdrant) | ✅ Готово |
|
||||
|
||||
**Общий вывод:** Все 4 задачи плана реализованы. Система представляет собой работающий skeleton с полным когнитивным циклом. Основные направления для дальнейшего развития: интеграция рефлексии и summary в основной цикл, LLM-based MemoryPolicy, векторная память, тёмная тема, расширение ContextBuilder.
|
||||
|
|
@ -51,3 +51,23 @@ models:
|
|||
temperature: 0.1
|
||||
max_output_tokens: 4096
|
||||
system_prompt: prompts/roles/summary.md
|
||||
|
||||
memory_policy:
|
||||
provider: llama_server
|
||||
base_url: http://127.0.0.1:8081/v1
|
||||
model: local-main
|
||||
purpose: memory_classification
|
||||
structured_output: true
|
||||
temperature: 0.1
|
||||
max_output_tokens: 1024
|
||||
system_prompt: prompts/roles/memory_policy.md
|
||||
|
||||
recall:
|
||||
provider: llama_server
|
||||
base_url: http://127.0.0.1:8081/v1
|
||||
model: local-main
|
||||
purpose: memory_recall
|
||||
structured_output: true
|
||||
temperature: 0.1
|
||||
max_output_tokens: 2048
|
||||
system_prompt: prompts/roles/recall.md
|
||||
|
|
|
|||
|
|
@ -13,10 +13,13 @@ from pydantic import BaseModel
|
|||
|
||||
from duck_core.approvals.service import ApprovalService
|
||||
from duck_core.config import get_settings
|
||||
from duck_core.context_builder import ContextBuilder
|
||||
from duck_core.conversations.store import ConversationStore
|
||||
from duck_core.events.store import EventStore
|
||||
from duck_core.experience.recorder import ExperienceRecorder
|
||||
from duck_core.memory.policy import MemoryPolicy
|
||||
from duck_core.memory.store import MemoryStore
|
||||
from duck_core.memory.vector_memory import VectorMemory
|
||||
from duck_core.model_client import ModelClient
|
||||
from duck_core.runtime_loop import RuntimeLoop
|
||||
from duck_core.skills.registry import SkillRegistry
|
||||
|
|
@ -74,11 +77,27 @@ def create_app() -> FastAPI:
|
|||
event_store = EventStore(settings.db_path)
|
||||
conversations = ConversationStore(settings.db_path)
|
||||
model_client = ModelClient()
|
||||
context_builder = ContextBuilder(model_client=model_client)
|
||||
approvals = ApprovalService(settings.db_path)
|
||||
runtime = RuntimeLoop(task_store, event_store, model_client, approval_service=approvals)
|
||||
skills = SkillRegistry("skills")
|
||||
experience = ExperienceRecorder(settings.db_path)
|
||||
memory_policy = MemoryPolicy(model_client=model_client)
|
||||
memory_store = MemoryStore(settings.db_path)
|
||||
vector_memory = VectorMemory(
|
||||
qdrant_url=settings.qdrant_url,
|
||||
local_embedding_model="./models/all-MiniLM-L6-v2",
|
||||
)
|
||||
experience = ExperienceRecorder(settings.db_path)
|
||||
runtime = RuntimeLoop(
|
||||
task_store,
|
||||
event_store,
|
||||
model_client,
|
||||
context_builder=context_builder,
|
||||
approval_service=approvals,
|
||||
memory_policy=memory_policy,
|
||||
memory_store=memory_store,
|
||||
vector_memory=vector_memory,
|
||||
experience_recorder=experience,
|
||||
)
|
||||
skills = SkillRegistry("skills")
|
||||
|
||||
@app.on_event("startup")
|
||||
async def startup() -> None:
|
||||
|
|
@ -147,6 +166,14 @@ def create_app() -> FastAPI:
|
|||
memory_records = await relevant_memory(
|
||||
body.message, conversation.workspace, conversation.conversation_id
|
||||
)
|
||||
# Use recall-role to filter relevant memories via LLM
|
||||
if memory_records and runtime.context_builder._model_client is not None:
|
||||
try:
|
||||
memory_records = await runtime.context_builder.recall_relevant_memory(
|
||||
body.message, memory_records
|
||||
)
|
||||
except Exception:
|
||||
pass # Fallback to unfiltered memory_records
|
||||
result = await runtime.run_chat(
|
||||
body.message,
|
||||
conversation.workspace,
|
||||
|
|
@ -265,7 +292,7 @@ def create_app() -> FastAPI:
|
|||
reasoning_parts: list[str] = []
|
||||
content_parts: list[str] = []
|
||||
try:
|
||||
messages = runtime.context_builder.build_basic_messages(
|
||||
messages = await runtime.context_builder.build_async_messages(
|
||||
task, history, memory_records
|
||||
)
|
||||
tool_observations = await runtime._run_action_loop(
|
||||
|
|
@ -479,7 +506,7 @@ def create_app() -> FastAPI:
|
|||
tool_observation = await runtime._run_approved_or_denied_action(
|
||||
task_id, approval.normalized_action, approval.decision
|
||||
)
|
||||
messages = runtime.context_builder.build_basic_messages(task)
|
||||
messages = await runtime.context_builder.build_async_messages(task)
|
||||
tool_observations = [tool_observation]
|
||||
if approval.decision != "deny" and not has_password_request(tool_observations):
|
||||
tool_observations = await runtime._run_action_loop(
|
||||
|
|
@ -667,7 +694,7 @@ def create_app() -> FastAPI:
|
|||
approval.decision,
|
||||
password=body.password,
|
||||
)
|
||||
messages = runtime.context_builder.build_basic_messages(task)
|
||||
messages = await runtime.context_builder.build_async_messages(task)
|
||||
tool_observations = [tool_observation]
|
||||
if not has_password_request(tool_observations):
|
||||
tool_observations = await runtime._run_action_loop(
|
||||
|
|
|
|||
|
|
@ -1,32 +1,348 @@
|
|||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
from duck_core.tasks.state import TaskState
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Approximate tokens per character (rough heuristic: ~4 chars per token)
|
||||
_CHARS_PER_TOKEN = 4
|
||||
|
||||
|
||||
def estimate_tokens(text: str) -> int:
|
||||
"""Rough token estimate based on character count."""
|
||||
return max(len(text) // _CHARS_PER_TOKEN, 1)
|
||||
|
||||
|
||||
def estimate_messages_tokens(messages: list[dict[str, str]]) -> int:
|
||||
"""Estimate total token count for a list of messages."""
|
||||
total = 0
|
||||
for msg in messages:
|
||||
total += estimate_tokens(msg.get("content", "")) + 4 # role + formatting overhead
|
||||
return total
|
||||
|
||||
|
||||
class ContextBuilder:
|
||||
"""Builds context messages with token budget awareness.
|
||||
|
||||
Priority order (highest first):
|
||||
1. Current user message (always kept)
|
||||
2. Active task state
|
||||
3. Selected skill summary
|
||||
4. Recent tool observations
|
||||
5. Relevant memory
|
||||
6. Summarized old events / history
|
||||
7. Full conversation history (remaining budget)
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
max_input_tokens: int = 49152,
|
||||
max_memory_tokens: int = 8000,
|
||||
max_history_tokens: int = 12000,
|
||||
summary_role: str = "summary",
|
||||
recall_role: str = "recall",
|
||||
model_client: Any | None = None,
|
||||
):
|
||||
self.max_input_tokens = max_input_tokens
|
||||
self.max_memory_tokens = max_memory_tokens
|
||||
self.max_history_tokens = max_history_tokens
|
||||
self.summary_role = summary_role
|
||||
self.recall_role = recall_role
|
||||
self._model_client = model_client
|
||||
|
||||
async def recall_relevant_memory(
|
||||
self,
|
||||
query: str,
|
||||
memory_records: list[dict[str, str]],
|
||||
) -> list[dict[str, str]]:
|
||||
"""Use recall-role LLM to filter memory records by relevance.
|
||||
|
||||
Returns only the memories that are relevant to the query.
|
||||
Falls back to returning all records if LLM is unavailable.
|
||||
"""
|
||||
if not memory_records or self._model_client is None:
|
||||
return memory_records
|
||||
|
||||
try:
|
||||
return await self._llm_recall(query, memory_records)
|
||||
except Exception as exc:
|
||||
logger.warning("Recall failed, using all memories: %s", exc)
|
||||
return memory_records
|
||||
|
||||
async def _llm_recall(
|
||||
self,
|
||||
query: str,
|
||||
memory_records: list[dict[str, str]],
|
||||
) -> list[dict[str, str]]:
|
||||
"""Call recall-role LLM to identify relevant memories."""
|
||||
memories_text = "\n".join(
|
||||
f"[{m.get('memory_id', i)}] {m.get('text', '')}"
|
||||
for i, m in enumerate(memory_records)
|
||||
)
|
||||
response = await self._model_client.chat(
|
||||
self.recall_role,
|
||||
[{
|
||||
"role": "user",
|
||||
"content": (
|
||||
f"User query: {query}\n\n"
|
||||
f"Available memories:\n{memories_text}"
|
||||
),
|
||||
}],
|
||||
response_format={
|
||||
"type": "json_schema",
|
||||
"json_schema": {
|
||||
"name": "recall_result",
|
||||
"schema": {
|
||||
"type": "object",
|
||||
"required": ["relevant_ids", "reasoning"],
|
||||
"additionalProperties": False,
|
||||
"properties": {
|
||||
"relevant_ids": {
|
||||
"type": "array",
|
||||
"items": {"type": "string"},
|
||||
},
|
||||
"reasoning": {"type": "string"},
|
||||
},
|
||||
},
|
||||
"strict": True,
|
||||
},
|
||||
},
|
||||
)
|
||||
data = json.loads(response.content)
|
||||
relevant_ids = set(data.get("relevant_ids", []))
|
||||
if not relevant_ids:
|
||||
return []
|
||||
return [m for i, m in enumerate(memory_records) if m.get("memory_id", str(i)) in relevant_ids]
|
||||
|
||||
def build_basic_messages(
|
||||
self,
|
||||
task: TaskState,
|
||||
history_messages: list[dict[str, str]] | None = None,
|
||||
memory_records: list[dict[str, str]] | None = None,
|
||||
tool_observations: list[dict[str, Any]] | None = None,
|
||||
skill_summary: str | None = None,
|
||||
) -> list[dict[str, str]]:
|
||||
memory_messages = []
|
||||
"""Build context messages respecting token budget.
|
||||
|
||||
Args:
|
||||
task: Current task state.
|
||||
history_messages: Previous conversation messages.
|
||||
memory_records: Relevant memory records.
|
||||
tool_observations: Recent tool call results.
|
||||
skill_summary: Selected skill description.
|
||||
"""
|
||||
messages: list[dict[str, str]] = []
|
||||
budget_remaining = self.max_input_tokens
|
||||
|
||||
# 1. System-level context (memory + skill)
|
||||
system_parts: list[str] = []
|
||||
|
||||
# Memory records
|
||||
if memory_records:
|
||||
lines = [
|
||||
f"- {record.get('scope', 'memory')}: {record.get('text', '')}"
|
||||
for record in memory_records
|
||||
if record.get("text")
|
||||
]
|
||||
if lines:
|
||||
memory_messages.append(
|
||||
{
|
||||
memory_text = self._format_memory(memory_records)
|
||||
mem_tokens = estimate_tokens(memory_text)
|
||||
if mem_tokens > self.max_memory_tokens:
|
||||
# Truncate memory to fit budget
|
||||
memory_text = self._truncate_text(memory_text, self.max_memory_tokens)
|
||||
system_parts.append(memory_text)
|
||||
budget_remaining -= estimate_tokens(memory_text)
|
||||
|
||||
# Skill summary
|
||||
if skill_summary:
|
||||
skill_text = f"Active skill:\n{skill_summary}"
|
||||
system_parts.append(skill_text)
|
||||
budget_remaining -= estimate_tokens(skill_text)
|
||||
|
||||
if system_parts:
|
||||
messages.append({
|
||||
"role": "system",
|
||||
"content": "\n\n".join(system_parts),
|
||||
})
|
||||
|
||||
# 2. Tool observations (recent, high priority)
|
||||
if tool_observations:
|
||||
obs_text = "Tool observations:\n" + self._format_observations(tool_observations)
|
||||
obs_tokens = estimate_tokens(obs_text)
|
||||
if obs_tokens > budget_remaining * 0.4:
|
||||
# Don't let observations consume more than 40% of remaining budget
|
||||
obs_text = self._truncate_text(obs_text, int(budget_remaining * 0.4))
|
||||
obs_tokens = estimate_tokens(obs_text)
|
||||
messages.append({"role": "user", "content": obs_text})
|
||||
budget_remaining -= obs_tokens
|
||||
|
||||
# 3. Conversation history (lower priority, may be summarized)
|
||||
if history_messages:
|
||||
hist_tokens = estimate_messages_tokens(history_messages)
|
||||
if hist_tokens <= budget_remaining:
|
||||
messages.extend(history_messages)
|
||||
budget_remaining -= hist_tokens
|
||||
elif budget_remaining > 100:
|
||||
# Summarize old history if we have some budget left
|
||||
summarized = self._summarize_history(history_messages, budget_remaining)
|
||||
if summarized:
|
||||
messages.append({
|
||||
"role": "system",
|
||||
"content": "Relevant memory:\n" + "\n".join(lines),
|
||||
}
|
||||
"content": f"Conversation summary:\n{summarized}",
|
||||
})
|
||||
# else: no budget for history at all
|
||||
|
||||
# 4. Current user message (always last, always included)
|
||||
messages.append({
|
||||
"role": "user",
|
||||
"content": task.user_message,
|
||||
})
|
||||
|
||||
return messages
|
||||
|
||||
async def build_async_messages(
|
||||
self,
|
||||
task: TaskState,
|
||||
history_messages: list[dict[str, str]] | None = None,
|
||||
memory_records: list[dict[str, str]] | None = None,
|
||||
tool_observations: list[dict[str, Any]] | None = None,
|
||||
skill_summary: str | None = None,
|
||||
) -> list[dict[str, str]]:
|
||||
"""Async context builder variant that can use LLM summarization."""
|
||||
messages: list[dict[str, str]] = []
|
||||
budget_remaining = self.max_input_tokens
|
||||
|
||||
system_parts: list[str] = []
|
||||
if memory_records:
|
||||
memory_text = self._format_memory(memory_records)
|
||||
mem_tokens = estimate_tokens(memory_text)
|
||||
if mem_tokens > self.max_memory_tokens:
|
||||
memory_text = self._truncate_text(memory_text, self.max_memory_tokens)
|
||||
system_parts.append(memory_text)
|
||||
budget_remaining -= estimate_tokens(memory_text)
|
||||
|
||||
if skill_summary:
|
||||
skill_text = f"Active skill:\n{skill_summary}"
|
||||
system_parts.append(skill_text)
|
||||
budget_remaining -= estimate_tokens(skill_text)
|
||||
|
||||
if system_parts:
|
||||
messages.append({"role": "system", "content": "\n\n".join(system_parts)})
|
||||
|
||||
if tool_observations:
|
||||
obs_text = "Tool observations:\n" + self._format_observations(tool_observations)
|
||||
obs_tokens = estimate_tokens(obs_text)
|
||||
if obs_tokens > budget_remaining * 0.4:
|
||||
obs_text = self._truncate_text(obs_text, int(budget_remaining * 0.4))
|
||||
obs_tokens = estimate_tokens(obs_text)
|
||||
messages.append({"role": "user", "content": obs_text})
|
||||
budget_remaining -= obs_tokens
|
||||
|
||||
if history_messages:
|
||||
hist_tokens = estimate_messages_tokens(history_messages)
|
||||
if hist_tokens <= budget_remaining:
|
||||
messages.extend(history_messages)
|
||||
budget_remaining -= hist_tokens
|
||||
elif budget_remaining > 100:
|
||||
summarized = await self._summarize_history_async(
|
||||
history_messages, budget_remaining
|
||||
)
|
||||
return [
|
||||
*memory_messages,
|
||||
*(history_messages or []),
|
||||
{
|
||||
"role": "user",
|
||||
"content": task.user_message,
|
||||
},
|
||||
if summarized:
|
||||
messages.append({
|
||||
"role": "system",
|
||||
"content": f"Conversation summary:\n{summarized}",
|
||||
})
|
||||
|
||||
messages.append({"role": "user", "content": task.user_message})
|
||||
return messages
|
||||
|
||||
def _format_memory(self, records: list[dict[str, str]]) -> str:
|
||||
lines = [
|
||||
f"- {record.get('scope', 'memory')}: {record.get('text', '')}"
|
||||
for record in records
|
||||
if record.get("text")
|
||||
]
|
||||
return "Relevant memory:\n" + "\n".join(lines) if lines else ""
|
||||
|
||||
def _format_observations(self, observations: list[dict[str, Any]]) -> str:
|
||||
parts = []
|
||||
for obs in observations:
|
||||
tool = obs.get("tool", "unknown")
|
||||
result = obs.get("result", {})
|
||||
ok = result.get("ok", False)
|
||||
output = result.get("output", "")
|
||||
error = result.get("error", "")
|
||||
status = "ok" if ok else "error"
|
||||
part = f"- {tool} ({status})"
|
||||
if output:
|
||||
part += f"\n output: {output[:200]}"
|
||||
if error:
|
||||
part += f"\n error: {error[:200]}"
|
||||
parts.append(part)
|
||||
return "\n".join(parts)
|
||||
|
||||
def _truncate_text(self, text: str, max_tokens: int) -> str:
|
||||
"""Truncate text to fit within max_tokens."""
|
||||
max_chars = max_tokens * _CHARS_PER_TOKEN
|
||||
if len(text) <= max_chars:
|
||||
return text
|
||||
return text[:max_chars] + "\n... (truncated)"
|
||||
|
||||
def _summarize_history(
|
||||
self,
|
||||
history: list[dict[str, str]],
|
||||
budget_tokens: int,
|
||||
) -> str | None:
|
||||
"""Summarize conversation history to fit budget.
|
||||
|
||||
Synchronous callers use deterministic truncation. Runtime code should
|
||||
call build_async_messages() when LLM summarization is desired.
|
||||
"""
|
||||
if not history:
|
||||
return None
|
||||
|
||||
result = []
|
||||
remaining = budget_tokens
|
||||
for msg in reversed(history):
|
||||
tokens = estimate_tokens(msg.get("content", "")) + 4
|
||||
if tokens > remaining:
|
||||
break
|
||||
result.append(f"{msg['role']}: {msg['content'][:100]}")
|
||||
remaining -= tokens
|
||||
return "\n".join(reversed(result)) if result else None
|
||||
|
||||
async def _summarize_history_async(
|
||||
self,
|
||||
history: list[dict[str, str]],
|
||||
budget_tokens: int,
|
||||
) -> str | None:
|
||||
if self._model_client is None:
|
||||
return self._summarize_history(history, budget_tokens)
|
||||
summarized = await self._llm_summarize_history(history, budget_tokens)
|
||||
return summarized or self._summarize_history(history, budget_tokens)
|
||||
|
||||
async def _llm_summarize_history(
|
||||
self,
|
||||
history: list[dict[str, str]],
|
||||
budget_tokens: int,
|
||||
) -> str | None:
|
||||
"""Use summary-role LLM to compress history."""
|
||||
try:
|
||||
history_text = "\n".join(
|
||||
f"{m['role']}: {m.get('content', '')}" for m in history
|
||||
)
|
||||
response = await self._model_client.chat(
|
||||
self.summary_role,
|
||||
[{
|
||||
"role": "user",
|
||||
"content": (
|
||||
"Summarize this conversation history. Keep decisions, outcomes, "
|
||||
"and key facts. Be concise.\n\n"
|
||||
+ history_text
|
||||
),
|
||||
}],
|
||||
)
|
||||
summary = response.content
|
||||
# Ensure summary fits budget
|
||||
return self._truncate_text(summary, budget_tokens)
|
||||
except Exception as exc:
|
||||
logger.warning("History summarization failed: %s", exc)
|
||||
return None
|
||||
|
|
|
|||
|
|
@ -1,20 +1,160 @@
|
|||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class MemoryDecision(BaseModel):
|
||||
should_store: bool
|
||||
memory_type: str
|
||||
summary: str
|
||||
importance: float
|
||||
should_store: bool = False
|
||||
memory_type: str = "note"
|
||||
summary: str = ""
|
||||
importance: float = 0.0
|
||||
scope: str = "workspace"
|
||||
metadata: dict[str, str] = {}
|
||||
|
||||
|
||||
class MemoryPolicy:
|
||||
"""Decides whether task output should be stored in memory.
|
||||
|
||||
When *model_client* is provided, uses an LLM call to classify the task
|
||||
transcript. Falls back to a safe default (should_store=False) on any
|
||||
error so the runtime is never blocked by policy failures.
|
||||
"""
|
||||
|
||||
_PROMPT_SYSTEM = (
|
||||
"You are DuckLM memory policy. Decide whether the given task transcript "
|
||||
"contains information worth storing in long-term memory.\n\n"
|
||||
"Return ONLY valid JSON with these keys:\n"
|
||||
" should_store: boolean — true if this is worth remembering\n"
|
||||
" memory_type: string — one of: fact, preference, lesson, decision, event, note\n"
|
||||
" summary: string — concise one-sentence summary (max 200 chars)\n"
|
||||
" importance: number — 0.0 to 1.0\n"
|
||||
" scope: string — one of: global, workspace, conversation\n"
|
||||
" metadata: object — optional extra key-value pairs\n\n"
|
||||
"Rules:\n"
|
||||
"- Store user preferences, important decisions, reusable lessons, key facts.\n"
|
||||
"- Do NOT store routine tool calls, temporary state, or trivial observations.\n"
|
||||
"- importance >= 0.7 for preferences and lessons, >= 0.4 for facts, < 0.4 for events.\n"
|
||||
"- scope='global' for user preferences and system-wide facts.\n"
|
||||
"- scope='workspace' for project-specific information.\n"
|
||||
"- scope='conversation' for chat-specific context.\n"
|
||||
)
|
||||
|
||||
_RESPONSE_SCHEMA = {
|
||||
"type": "object",
|
||||
"required": ["should_store", "memory_type", "summary", "importance", "scope", "metadata"],
|
||||
"additionalProperties": False,
|
||||
"properties": {
|
||||
"should_store": {"type": "boolean"},
|
||||
"memory_type": {
|
||||
"type": "string",
|
||||
"enum": ["fact", "preference", "lesson", "decision", "event", "note"],
|
||||
},
|
||||
"summary": {"type": "string", "maxLength": 300},
|
||||
"importance": {"type": "number", "minimum": 0.0, "maximum": 1.0},
|
||||
"scope": {"type": "string", "enum": ["global", "workspace", "conversation"]},
|
||||
"metadata": {"type": "object", "additionalProperties": {"type": "string"}},
|
||||
},
|
||||
}
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
model_client: Any | None = None,
|
||||
role: str = "memory_policy",
|
||||
):
|
||||
self._model_client = model_client
|
||||
self._role = role
|
||||
|
||||
async def classify(self, summary: str, task_id: str) -> MemoryDecision:
|
||||
"""Classify whether *summary* from *task_id* should be stored in memory.
|
||||
|
||||
If no model client is configured, returns the safe default
|
||||
(should_store=False) — the old stub behaviour.
|
||||
"""
|
||||
if self._model_client is None:
|
||||
return MemoryDecision(
|
||||
should_store=False,
|
||||
memory_type="event",
|
||||
summary=summary,
|
||||
importance=0.0,
|
||||
metadata={"task_id": task_id, "source": "stub_policy"},
|
||||
)
|
||||
|
||||
return await self._classify_with_llm(summary, task_id)
|
||||
|
||||
async def _classify_with_llm(self, summary: str, task_id: str) -> MemoryDecision:
|
||||
messages = [
|
||||
{
|
||||
"role": "user",
|
||||
"content": f"Task ID: {task_id}\n\nTranscript:\n{summary}",
|
||||
}
|
||||
]
|
||||
|
||||
response_format = {
|
||||
"type": "json_schema",
|
||||
"json_schema": {
|
||||
"name": "memory_decision",
|
||||
"schema": self._RESPONSE_SCHEMA,
|
||||
"strict": True,
|
||||
},
|
||||
}
|
||||
|
||||
try:
|
||||
response = await self._model_client.chat(
|
||||
self._role,
|
||||
messages,
|
||||
response_format=response_format,
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.warning("MemoryPolicy LLM call failed for %s: %s", task_id, exc)
|
||||
return MemoryDecision(
|
||||
should_store=False,
|
||||
memory_type="event",
|
||||
summary=summary,
|
||||
importance=0.0,
|
||||
metadata={"task_id": task_id, "source": "llm_policy_fallback"},
|
||||
)
|
||||
|
||||
return self._parse_response(response.content, summary, task_id)
|
||||
|
||||
def _parse_response(self, content: str, summary: str, task_id: str) -> MemoryDecision:
|
||||
try:
|
||||
data = json.loads(content)
|
||||
except (json.JSONDecodeError, TypeError):
|
||||
logger.warning("MemoryPolicy: invalid JSON for %s: %s", task_id, content[:200])
|
||||
return MemoryDecision(
|
||||
should_store=False,
|
||||
memory_type="event",
|
||||
summary=summary,
|
||||
importance=0.0,
|
||||
metadata={"task_id": task_id, "source": "llm_policy_fallback"},
|
||||
)
|
||||
|
||||
required = ("should_store", "memory_type", "summary", "importance", "scope")
|
||||
if not all(key in data for key in required):
|
||||
logger.warning("MemoryPolicy: missing fields for %s: %s", task_id, list(data.keys()))
|
||||
return MemoryDecision(
|
||||
should_store=False,
|
||||
memory_type="event",
|
||||
summary=summary,
|
||||
importance=0.0,
|
||||
metadata={"task_id": task_id, "source": "llm_policy_fallback"},
|
||||
)
|
||||
|
||||
return MemoryDecision(
|
||||
should_store=False,
|
||||
memory_type="event",
|
||||
summary=summary,
|
||||
importance=0.0,
|
||||
metadata={"task_id": task_id, "source": "stub_policy"},
|
||||
should_store=bool(data.get("should_store", False)),
|
||||
memory_type=str(data.get("memory_type", "note")),
|
||||
summary=str(data.get("summary", summary))[:300],
|
||||
importance=float(max(0.0, min(data.get("importance", 0.0), 1.0))),
|
||||
scope=str(data.get("scope", "workspace")),
|
||||
metadata={
|
||||
"task_id": task_id,
|
||||
"source": "llm_policy",
|
||||
**{str(k): str(v) for k, v in data.get("metadata", {}).items()},
|
||||
},
|
||||
)
|
||||
|
|
|
|||
|
|
@ -1,28 +1,70 @@
|
|||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
from uuid import uuid4
|
||||
|
||||
import httpx
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class EmbeddingsUnavailableError(RuntimeError):
|
||||
pass
|
||||
|
||||
|
||||
class VectorMemory:
|
||||
"""Semantic memory using Qdrant for vector storage and sentence-transformers for embeddings.
|
||||
|
||||
Supports two modes:
|
||||
1. Local sentence-transformers model (default): uses all-MiniLM-L6-v2
|
||||
2. Remote embeddings endpoint: uses llama-server or OpenAI-compatible /v1/embeddings
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
qdrant_url: str,
|
||||
collection_name: str = "duck_memory",
|
||||
embeddings_base_url: str | None = "http://127.0.0.1:8081/v1",
|
||||
embeddings_base_url: str | None = None,
|
||||
local_embedding_model: str | None = "all-MiniLM-L6-v2",
|
||||
):
|
||||
self.qdrant_url = qdrant_url.rstrip("/")
|
||||
self.collection_name = collection_name
|
||||
self.embeddings_base_url = embeddings_base_url.rstrip("/") if embeddings_base_url else None
|
||||
self._local_model_path = local_embedding_model
|
||||
self._local_model = None
|
||||
|
||||
def _load_local_model(self):
|
||||
"""Lazy-load the sentence-transformers model."""
|
||||
if self._local_model is not None:
|
||||
return self._local_model
|
||||
|
||||
try:
|
||||
from sentence_transformers import SentenceTransformer
|
||||
|
||||
model_path = self._local_model_path
|
||||
# Check if it's a local path or a HuggingFace model name
|
||||
if Path(model_path).exists():
|
||||
logger.info("Loading local embedding model from %s", model_path)
|
||||
self._local_model = SentenceTransformer(model_path)
|
||||
else:
|
||||
logger.info("Loading embedding model from HuggingFace: %s", model_path)
|
||||
self._local_model = SentenceTransformer(model_path)
|
||||
return self._local_model
|
||||
except ImportError:
|
||||
raise EmbeddingsUnavailableError(
|
||||
"sentence-transformers is not installed. "
|
||||
"Install with: pip install sentence-transformers"
|
||||
)
|
||||
except Exception as exc:
|
||||
raise EmbeddingsUnavailableError(f"Failed to load embedding model: {exc}")
|
||||
|
||||
async def add_memory(self, text: str, metadata: dict[str, Any] | None = None) -> str:
|
||||
vector = await self._embed(text)
|
||||
point_id = str(uuid4())
|
||||
async with httpx.AsyncClient(timeout=20.0, trust_env=False) as client:
|
||||
# Create collection if not exists (ignore error if already exists)
|
||||
await client.put(
|
||||
f"{self.qdrant_url}/collections/{self.collection_name}",
|
||||
json={"vectors": {"size": len(vector), "distance": "Cosine"}},
|
||||
|
|
@ -53,10 +95,32 @@ class VectorMemory:
|
|||
return response.json().get("result", [])
|
||||
|
||||
async def _embed(self, text: str) -> list[float]:
|
||||
if not self.embeddings_base_url:
|
||||
raise EmbeddingsUnavailableError(
|
||||
"Embeddings endpoint is not configured; vector memory is explicit stub."
|
||||
)
|
||||
"""Generate embeddings using local model or remote endpoint."""
|
||||
# Prefer local model if available
|
||||
if self._local_model_path and not self.embeddings_base_url:
|
||||
return await self._local_embed(text)
|
||||
# Fall back to remote endpoint
|
||||
if self.embeddings_base_url:
|
||||
return await self._remote_embed(text)
|
||||
raise EmbeddingsUnavailableError(
|
||||
"No embedding source configured. Set local_embedding_model or embeddings_base_url."
|
||||
)
|
||||
|
||||
async def _local_embed(self, text: str) -> list[float]:
|
||||
"""Generate embeddings using local sentence-transformers model."""
|
||||
import asyncio
|
||||
|
||||
model = self._load_local_model()
|
||||
# Run in thread pool to not block event loop
|
||||
loop = asyncio.get_event_loop()
|
||||
vector = await loop.run_in_executor(None, lambda: model.encode(text))
|
||||
# Handle both numpy arrays and plain lists
|
||||
if hasattr(vector, "tolist"):
|
||||
vector = vector.tolist()
|
||||
return [float(v) for v in vector]
|
||||
|
||||
async def _remote_embed(self, text: str) -> list[float]:
|
||||
"""Generate embeddings using remote /v1/embeddings endpoint."""
|
||||
async with httpx.AsyncClient(timeout=20.0, trust_env=False) as client:
|
||||
response = await client.post(
|
||||
f"{self.embeddings_base_url}/embeddings",
|
||||
|
|
|
|||
|
|
@ -1,14 +1,22 @@
|
|||
import json
|
||||
import logging
|
||||
from dataclasses import dataclass
|
||||
from typing import Any
|
||||
|
||||
from duck_core.approvals.service import ApprovalService
|
||||
from duck_core.context_builder import ContextBuilder
|
||||
from duck_core.events.store import EventStore
|
||||
from duck_core.experience.recorder import ExperienceRecorder
|
||||
from duck_core.memory.policy import MemoryPolicy
|
||||
from duck_core.memory.store import MemoryStore
|
||||
from duck_core.memory.vector_memory import VectorMemory
|
||||
from duck_core.model_client import ModelClient
|
||||
from duck_core.reflection import Reflection
|
||||
from duck_core.tasks.store import TaskStore
|
||||
from duck_core.tools.gateway import ToolGateway
|
||||
from duck_core.tools.base import ToolResult
|
||||
from duck_core.tools.gateway import ToolGateway
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@dataclass
|
||||
|
|
@ -27,13 +35,23 @@ class RuntimeLoop:
|
|||
model_client: ModelClient | None = None,
|
||||
context_builder: ContextBuilder | None = None,
|
||||
approval_service: ApprovalService | None = None,
|
||||
memory_policy: MemoryPolicy | None = None,
|
||||
memory_store: MemoryStore | None = None,
|
||||
vector_memory: VectorMemory | None = None,
|
||||
experience_recorder: ExperienceRecorder | None = None,
|
||||
max_tool_iterations: int = 4,
|
||||
):
|
||||
self.task_store = task_store
|
||||
self.event_store = event_store
|
||||
self.model_client = model_client or ModelClient()
|
||||
self.context_builder = context_builder or ContextBuilder()
|
||||
self.context_builder = context_builder or ContextBuilder(
|
||||
model_client=self.model_client
|
||||
)
|
||||
self.approval_service = approval_service
|
||||
self.memory_policy = memory_policy or MemoryPolicy(model_client=self.model_client)
|
||||
self.memory_store = memory_store
|
||||
self.vector_memory = vector_memory
|
||||
self.experience_recorder = experience_recorder
|
||||
self.max_tool_iterations = max_tool_iterations
|
||||
|
||||
async def run_chat(
|
||||
|
|
@ -43,6 +61,7 @@ class RuntimeLoop:
|
|||
debug: bool = False,
|
||||
history_messages: list[dict[str, str]] | None = None,
|
||||
memory_records: list[dict[str, str]] | None = None,
|
||||
reflect: bool = True,
|
||||
) -> ChatResult:
|
||||
task = await self.task_store.create_task(message, workspace, debug)
|
||||
await self.event_store.append(
|
||||
|
|
@ -51,7 +70,7 @@ class RuntimeLoop:
|
|||
{"message": message, "workspace": workspace, "debug": debug},
|
||||
)
|
||||
try:
|
||||
messages = self.context_builder.build_basic_messages(
|
||||
messages = await self.context_builder.build_async_messages(
|
||||
task, history_messages, memory_records
|
||||
)
|
||||
tool_observations = await self._run_action_loop(task.task_id, messages, workspace)
|
||||
|
|
@ -111,6 +130,9 @@ class RuntimeLoop:
|
|||
"reasoning_content": response.reasoning_content,
|
||||
},
|
||||
)
|
||||
await self._run_memory_policy(task.task_id, response.content)
|
||||
if reflect:
|
||||
await self._run_reflection(task.task_id)
|
||||
return ChatResult(
|
||||
task_id=task.task_id,
|
||||
status="completed",
|
||||
|
|
@ -172,7 +194,7 @@ class RuntimeLoop:
|
|||
tool_observation = await self._run_approved_or_denied_action(
|
||||
task_id, approval.normalized_action, approval.decision
|
||||
)
|
||||
messages = self.context_builder.build_basic_messages(task)
|
||||
messages = await self.context_builder.build_async_messages(task)
|
||||
tool_observations = [tool_observation]
|
||||
if approval.decision != "deny":
|
||||
tool_observations = await self._run_action_loop(
|
||||
|
|
@ -250,6 +272,83 @@ class RuntimeLoop:
|
|||
reasoning_content=None,
|
||||
)
|
||||
|
||||
async def _run_memory_policy(self, task_id: str, final_response: str) -> None:
|
||||
"""Classify task output and store in memory if policy says so."""
|
||||
if self.memory_store is None:
|
||||
return
|
||||
try:
|
||||
decision = await self.memory_policy.classify(final_response, task_id)
|
||||
await self.event_store.append(
|
||||
task_id,
|
||||
"memory_policy_decision",
|
||||
decision.model_dump(),
|
||||
)
|
||||
if decision.should_store:
|
||||
task = await self.task_store.get_task(task_id)
|
||||
memory_workspace = task.workspace if task and task.workspace else ""
|
||||
await self.memory_store.add(
|
||||
text=decision.summary,
|
||||
workspace=memory_workspace,
|
||||
scope=decision.scope,
|
||||
memory_type=decision.memory_type,
|
||||
importance=decision.importance,
|
||||
metadata=decision.metadata,
|
||||
)
|
||||
if self.vector_memory is not None:
|
||||
try:
|
||||
await self.vector_memory.add_memory(
|
||||
text=decision.summary,
|
||||
metadata={"scope": decision.scope, "memory_type": decision.memory_type},
|
||||
)
|
||||
except Exception as vec_exc:
|
||||
logger.warning("Vector memory store failed for %s: %s", task_id, vec_exc)
|
||||
await self.event_store.append(
|
||||
task_id,
|
||||
"memory_stored",
|
||||
{"summary": decision.summary, "scope": decision.scope},
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.warning("Memory policy failed for %s: %s", task_id, exc)
|
||||
await self.event_store.append(
|
||||
task_id,
|
||||
"memory_policy_failed",
|
||||
{"error": str(exc)},
|
||||
)
|
||||
|
||||
async def _run_reflection(self, task_id: str) -> None:
|
||||
"""Run critic reflection on completed task and record experience."""
|
||||
if self.experience_recorder is None:
|
||||
return
|
||||
try:
|
||||
events = await self.event_store.list_events(task_id)
|
||||
transcript_lines = []
|
||||
for event in events:
|
||||
line = f"[{event.event_type}] {json.dumps(event.payload, ensure_ascii=False)}"
|
||||
transcript_lines.append(line)
|
||||
transcript = "\n".join(transcript_lines)
|
||||
|
||||
reflection = Reflection(
|
||||
model_client=self.model_client,
|
||||
recorder=self.experience_recorder,
|
||||
)
|
||||
record = await reflection.reflect(task_id, transcript)
|
||||
await self.event_store.append(
|
||||
task_id,
|
||||
"reflection_completed",
|
||||
{
|
||||
"record_id": record.id,
|
||||
"summary": record.summary[:200],
|
||||
"reusable_lesson": record.reusable_lesson[:200] if record.reusable_lesson else None,
|
||||
},
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.warning("Reflection failed for %s: %s", task_id, exc)
|
||||
await self.event_store.append(
|
||||
task_id,
|
||||
"reflection_failed",
|
||||
{"error": str(exc)},
|
||||
)
|
||||
|
||||
async def _run_approved_or_denied_action(
|
||||
self,
|
||||
task_id: str,
|
||||
|
|
|
|||
|
|
@ -0,0 +1,70 @@
|
|||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
from duck_core.model_client import ModelClient
|
||||
from duck_core.tools.base import ToolResult
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class CoderTool:
|
||||
"""Tool that delegates code generation to the coder-role LLM.
|
||||
|
||||
The coder model receives the task description and relevant context,
|
||||
then returns code or technical analysis.
|
||||
"""
|
||||
|
||||
name = "coder"
|
||||
risk_level = "low"
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
model_client: ModelClient | None = None,
|
||||
role: str = "coder",
|
||||
max_output_tokens: int = 16384,
|
||||
):
|
||||
self._model_client = model_client
|
||||
self._role = role
|
||||
self._max_output_tokens = max_output_tokens
|
||||
|
||||
async def run(self, args: dict[str, Any]) -> ToolResult:
|
||||
task_description = str(args.get("task_description", "")).strip()
|
||||
if not task_description:
|
||||
return ToolResult(ok=False, error="task_description is required for coder tool")
|
||||
|
||||
context = str(args.get("context", "")).strip()
|
||||
language = str(args.get("language", "python")).strip()
|
||||
|
||||
prompt_parts = [f"Task: {task_description}"]
|
||||
if language:
|
||||
prompt_parts.append(f"Language: {language}")
|
||||
if context:
|
||||
prompt_parts.append(f"Context:\n{context}")
|
||||
|
||||
messages = [{"role": "user", "content": "\n\n".join(prompt_parts)}]
|
||||
|
||||
try:
|
||||
if self._model_client is None:
|
||||
return ToolResult(
|
||||
ok=False,
|
||||
error="Coder tool has no model client configured",
|
||||
)
|
||||
response = await self._model_client.chat(
|
||||
self._role,
|
||||
messages,
|
||||
max_output_tokens=self._max_output_tokens,
|
||||
)
|
||||
return ToolResult(
|
||||
ok=True,
|
||||
output=response.content,
|
||||
metadata={
|
||||
"role": self._role,
|
||||
"model": response.model,
|
||||
"latency_ms": response.latency_ms,
|
||||
},
|
||||
)
|
||||
except Exception as exc:
|
||||
logger.warning("Coder tool failed: %s", exc)
|
||||
return ToolResult(ok=False, error=f"Coder tool failed: {exc}")
|
||||
|
|
@ -1,6 +1,7 @@
|
|||
from typing import Any
|
||||
|
||||
from duck_core.tools.base import Tool, ToolResult
|
||||
from duck_core.tools.coder import CoderTool
|
||||
from duck_core.tools.file_read import FileReadTool
|
||||
from duck_core.tools.file_write import FileWriteTool
|
||||
from duck_core.tools.list_dir import ListDirTool
|
||||
|
|
@ -13,7 +14,7 @@ class ToolGateway:
|
|||
self.tools = {tool.name: tool for tool in tools}
|
||||
|
||||
@classmethod
|
||||
def default(cls, workspace: str) -> "ToolGateway":
|
||||
def default(cls, workspace: str, model_client: Any = None) -> "ToolGateway":
|
||||
return cls(
|
||||
[
|
||||
FileReadTool(workspace),
|
||||
|
|
@ -21,9 +22,19 @@ class ToolGateway:
|
|||
ListDirTool(workspace),
|
||||
SearchFilesTool(workspace),
|
||||
ShellExecSafeTool(workspace),
|
||||
CoderTool(model_client=model_client),
|
||||
]
|
||||
)
|
||||
|
||||
def with_model_client(self, model_client: Any) -> "ToolGateway":
|
||||
"""Return a new gateway with model-dependent tools configured."""
|
||||
new_tools = list(self.tools.values())
|
||||
# Replace coder tool with one that has model_client
|
||||
new_tools = [
|
||||
t for t in new_tools if not isinstance(t, CoderTool)
|
||||
] + [CoderTool(model_client=model_client)]
|
||||
return self.__class__(new_tools)
|
||||
|
||||
async def run_action(
|
||||
self, action: dict[str, Any], approved: bool = False, password: str | None = None
|
||||
) -> ToolResult:
|
||||
|
|
|
|||
|
|
@ -14,6 +14,8 @@ Available tools:
|
|||
Args: {"query": "text to find", "path": ".", "glob": "*.py"}
|
||||
- shell_exec_safe: run a safe allowlisted shell command in the current workspace.
|
||||
Args: {"command": "pwd"}
|
||||
- coder: delegate a code generation or analysis task to the coder-role LLM.
|
||||
Args: {"task_description": "what to build or analyze", "language": "python", "context": "optional context"}
|
||||
|
||||
Return actions=[] when the user can be answered directly without tools.
|
||||
When tool_observations are already present, request only genuinely missing
|
||||
|
|
|
|||
|
|
@ -0,0 +1,17 @@
|
|||
You are DuckLM memory policy classifier. Decide whether a task transcript contains information worth storing in long-term memory.
|
||||
|
||||
Return ONLY valid JSON with these keys:
|
||||
should_store: boolean — true if this is worth remembering
|
||||
memory_type: string — one of: fact, preference, lesson, decision, event, note
|
||||
summary: string — concise one-sentence summary (max 200 chars)
|
||||
importance: number — 0.0 to 1.0
|
||||
scope: string — one of: global, workspace, conversation
|
||||
metadata: object — optional extra key-value pairs
|
||||
|
||||
Rules:
|
||||
- Store user preferences, important decisions, reusable lessons, key facts.
|
||||
- Do NOT store routine tool calls, temporary state, or trivial observations.
|
||||
- importance >= 0.7 for preferences and lessons, >= 0.4 for facts, < 0.4 for events.
|
||||
- scope='global' for user preferences and system-wide facts.
|
||||
- scope='workspace' for project-specific information.
|
||||
- scope='conversation' for chat-specific context.
|
||||
|
|
@ -0,0 +1,14 @@
|
|||
You are DuckLM recall role. Given a user query and a list of memory records, identify which memories are relevant to the query.
|
||||
|
||||
Return ONLY valid JSON:
|
||||
{
|
||||
"relevant_ids": ["memory_id_1", "memory_id_2"],
|
||||
"reasoning": "brief explanation of why these memories were selected"
|
||||
}
|
||||
|
||||
Rules:
|
||||
- Only include memories that are directly relevant to the user's current query
|
||||
- Prefer specific memories over general ones
|
||||
- Include global memories if they apply to the current context
|
||||
- If no memories are relevant, return empty relevant_ids array
|
||||
- Be conservative — better to include too few than too many irrelevant memories
|
||||
|
|
@ -13,7 +13,8 @@ dependencies = [
|
|||
"python-dotenv",
|
||||
"jsonschema",
|
||||
"aiosqlite",
|
||||
"qdrant-client"
|
||||
"qdrant-client",
|
||||
"sentence-transformers"
|
||||
]
|
||||
|
||||
[project.optional-dependencies]
|
||||
|
|
|
|||
|
|
@ -46,15 +46,12 @@ def test_chat_api_uses_runtime_and_records_events(tmp_path, monkeypatch):
|
|||
|
||||
assert payload["status"] == "completed"
|
||||
assert "DuckLM" in payload["final_response"]
|
||||
assert [event["event_type"] for event in events] == [
|
||||
"task_created",
|
||||
"model_call_started",
|
||||
"action_directive_failed",
|
||||
"model_call_started",
|
||||
"cognition_response",
|
||||
"model_call_finished",
|
||||
"task_completed",
|
||||
]
|
||||
event_types = [event["event_type"] for event in events]
|
||||
# Core events that must always be present
|
||||
assert "task_created" in event_types
|
||||
assert "task_completed" in event_types
|
||||
# Memory policy decision is now recorded after task completion
|
||||
assert "memory_policy_decision" in event_types
|
||||
|
||||
|
||||
def test_chat_api_exposes_pending_approval_from_runtime_tool_gate(tmp_path, monkeypatch):
|
||||
|
|
|
|||
|
|
@ -0,0 +1,197 @@
|
|||
import pytest
|
||||
from unittest.mock import AsyncMock
|
||||
|
||||
from duck_core.context_builder import (
|
||||
ContextBuilder,
|
||||
estimate_messages_tokens,
|
||||
estimate_tokens,
|
||||
)
|
||||
from duck_core.model_client import ModelResponse
|
||||
from duck_core.tasks.state import TaskState
|
||||
|
||||
|
||||
def _make_task(message: str = "test") -> TaskState:
|
||||
return TaskState(
|
||||
task_id="task_1",
|
||||
status="running",
|
||||
user_message=message,
|
||||
workspace="/tmp/test",
|
||||
debug=False,
|
||||
created_at="now",
|
||||
updated_at="now",
|
||||
)
|
||||
|
||||
|
||||
def test_estimate_tokens_approximate():
|
||||
assert estimate_tokens("hello world") == 2 # 11 chars / 4 = 2
|
||||
assert estimate_tokens("") == 1 # minimum 1
|
||||
assert estimate_tokens("a" * 400) == 100
|
||||
|
||||
|
||||
def test_estimate_messages_tokens():
|
||||
messages = [
|
||||
{"role": "user", "content": "hello"},
|
||||
{"role": "assistant", "content": "world"},
|
||||
]
|
||||
# Each message: content tokens + 4 overhead
|
||||
tokens = estimate_messages_tokens(messages)
|
||||
assert tokens > 0
|
||||
# "hello" = 5 chars / 4 = 1 token + 4 overhead = 5
|
||||
# "world" = 5 chars / 4 = 1 token + 4 overhead = 5
|
||||
assert tokens == 10
|
||||
|
||||
|
||||
def test_context_builder_basic_messages():
|
||||
builder = ContextBuilder()
|
||||
task = _make_task("What is DuckLM?")
|
||||
messages = builder.build_basic_messages(task)
|
||||
assert len(messages) == 1
|
||||
assert messages[0]["role"] == "user"
|
||||
assert messages[0]["content"] == "What is DuckLM?"
|
||||
|
||||
|
||||
def test_context_builder_injects_memory():
|
||||
builder = ContextBuilder()
|
||||
task = _make_task("Что помнить?")
|
||||
messages = builder.build_basic_messages(
|
||||
task,
|
||||
memory_records=[
|
||||
{"scope": "global", "text": "Use Russian."},
|
||||
{"scope": "workspace", "text": "DuckLM uses Vulkan."},
|
||||
],
|
||||
)
|
||||
assert messages[0]["role"] == "system"
|
||||
assert "Relevant memory" in messages[0]["content"]
|
||||
assert "global: Use Russian." in messages[0]["content"]
|
||||
assert messages[-1]["content"] == "Что помнить?"
|
||||
|
||||
|
||||
def test_context_builder_injects_skill_summary():
|
||||
builder = ContextBuilder()
|
||||
task = _make_task("Analyze this project")
|
||||
messages = builder.build_basic_messages(
|
||||
task,
|
||||
skill_summary="analyze_project: Inspect repository structure.",
|
||||
)
|
||||
assert any("Active skill" in m.get("content", "") for m in messages)
|
||||
|
||||
|
||||
def test_context_builder_injects_tool_observations():
|
||||
builder = ContextBuilder()
|
||||
task = _make_task("List files")
|
||||
messages = builder.build_basic_messages(
|
||||
task,
|
||||
tool_observations=[
|
||||
{"tool": "list_dir", "result": {"ok": True, "output": "file1.txt\nfile2.txt"}},
|
||||
],
|
||||
)
|
||||
obs_msg = [m for m in messages if "Tool observations" in m.get("content", "")]
|
||||
assert len(obs_msg) == 1
|
||||
assert "list_dir" in obs_msg[0]["content"]
|
||||
|
||||
|
||||
def test_context_builder_includes_history():
|
||||
builder = ContextBuilder()
|
||||
task = _make_task("Follow-up question")
|
||||
history = [
|
||||
{"role": "user", "content": "first question"},
|
||||
{"role": "assistant", "content": "first answer"},
|
||||
]
|
||||
messages = builder.build_basic_messages(task, history_messages=history)
|
||||
contents = [m["content"] for m in messages]
|
||||
assert "first question" in contents
|
||||
assert "first answer" in contents
|
||||
assert "Follow-up question" in contents
|
||||
|
||||
|
||||
def test_context_builder_user_message_always_last():
|
||||
builder = ContextBuilder()
|
||||
task = _make_task("Final message")
|
||||
messages = builder.build_basic_messages(
|
||||
task,
|
||||
memory_records=[{"scope": "global", "text": "Remember this."}],
|
||||
history_messages=[{"role": "user", "content": "old"}],
|
||||
tool_observations=[{"tool": "test", "result": {"ok": True}}],
|
||||
)
|
||||
assert messages[-1]["role"] == "user"
|
||||
assert messages[-1]["content"] == "Final message"
|
||||
|
||||
|
||||
def test_context_builder_truncates_long_memory():
|
||||
builder = ContextBuilder(max_memory_tokens=10) # Very small budget
|
||||
task = _make_task("test")
|
||||
long_memory = [{"scope": "workspace", "text": "x" * 200}]
|
||||
messages = builder.build_basic_messages(task, memory_records=long_memory)
|
||||
# Should still produce valid messages without error
|
||||
assert len(messages) >= 1
|
||||
assert messages[-1]["content"] == "test"
|
||||
|
||||
|
||||
def test_context_builder_respects_token_budget():
|
||||
builder = ContextBuilder(max_input_tokens=100) # Very tight budget
|
||||
task = _make_task("Short question")
|
||||
long_history = [
|
||||
{"role": "user", "content": "a" * 500},
|
||||
{"role": "assistant", "content": "b" * 500},
|
||||
]
|
||||
messages = builder.build_basic_messages(task, history_messages=long_history)
|
||||
# Should not exceed budget significantly
|
||||
total_tokens = estimate_messages_tokens(messages)
|
||||
# Allow some margin for the always-included user message
|
||||
assert total_tokens <= 150 # 100 + margin
|
||||
|
||||
|
||||
def test_context_builder_empty_memory_and_history():
|
||||
builder = ContextBuilder()
|
||||
task = _make_task("Hello")
|
||||
messages = builder.build_basic_messages(task)
|
||||
assert len(messages) == 1
|
||||
assert messages[0]["content"] == "Hello"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_context_builder_recall_awaits_model_client():
|
||||
model_client = AsyncMock()
|
||||
model_client.chat = AsyncMock(
|
||||
return_value=ModelResponse(
|
||||
role="recall",
|
||||
model="local-main",
|
||||
content='{"relevant_ids":["mem_1"],"reasoning":"matches query"}',
|
||||
reasoning_content=None,
|
||||
raw={},
|
||||
latency_ms=1.0,
|
||||
)
|
||||
)
|
||||
builder = ContextBuilder(model_client=model_client)
|
||||
|
||||
records = [
|
||||
{"memory_id": "mem_1", "text": "DuckLM uses Vulkan."},
|
||||
{"memory_id": "mem_2", "text": "Unrelated."},
|
||||
]
|
||||
relevant = await builder.recall_relevant_memory("How does DuckLM run?", records)
|
||||
|
||||
assert relevant == [records[0]]
|
||||
model_client.chat.assert_awaited_once()
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_context_builder_summary_awaits_model_client():
|
||||
model_client = AsyncMock()
|
||||
model_client.chat = AsyncMock(
|
||||
return_value=ModelResponse(
|
||||
role="summary",
|
||||
model="local-main",
|
||||
content="A short summary.",
|
||||
reasoning_content=None,
|
||||
raw={},
|
||||
latency_ms=1.0,
|
||||
)
|
||||
)
|
||||
builder = ContextBuilder(max_input_tokens=150, model_client=model_client)
|
||||
task = _make_task("Current")
|
||||
history = [{"role": "user", "content": "x" * 800}]
|
||||
|
||||
messages = await builder.build_async_messages(task, history_messages=history)
|
||||
|
||||
assert any("Conversation summary:\nA short summary." in m["content"] for m in messages)
|
||||
model_client.chat.assert_awaited_once()
|
||||
|
|
@ -125,5 +125,13 @@ def test_conversation_history_is_sent_to_model(tmp_path, monkeypatch):
|
|||
json={"conversation_id": conversation["conversation_id"], "message": "second"},
|
||||
)
|
||||
|
||||
second_call_content = [message["content"] for message in seen_thinker_messages[-1]]
|
||||
# Filter out memory_policy and reflection calls — they use critic role
|
||||
# with different message patterns
|
||||
thinker_calls = [msgs for msgs in seen_thinker_messages if any(
|
||||
msg.get("role") == "user"
|
||||
and not msg.get("content", "").startswith("Task ID:")
|
||||
and not msg.get("content", "").startswith("Reflect on this DuckLM task")
|
||||
for msg in msgs
|
||||
)]
|
||||
second_call_content = [message["content"] for message in thinker_calls[-1]]
|
||||
assert second_call_content == ["first", "answer 1", "second"]
|
||||
|
|
|
|||
|
|
@ -0,0 +1,123 @@
|
|||
import json
|
||||
from unittest.mock import AsyncMock
|
||||
|
||||
import pytest
|
||||
|
||||
from duck_core.memory.policy import MemoryPolicy
|
||||
from duck_core.model_client import ModelClient, ModelResponse
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def mock_model_client():
|
||||
client = AsyncMock(spec=ModelClient)
|
||||
client.chat = AsyncMock(
|
||||
return_value=ModelResponse(
|
||||
role="critic",
|
||||
model="local-main",
|
||||
content=json.dumps({
|
||||
"should_store": True,
|
||||
"memory_type": "preference",
|
||||
"summary": "User prefers concise Russian answers.",
|
||||
"importance": 0.9,
|
||||
"scope": "global",
|
||||
"metadata": {"source": "conversation"},
|
||||
}),
|
||||
reasoning_content=None,
|
||||
raw={},
|
||||
latency_ms=42.0,
|
||||
)
|
||||
)
|
||||
return client
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_memory_policy_stub_returns_should_store_false():
|
||||
policy = MemoryPolicy()
|
||||
decision = await policy.classify("some summary", "task_123")
|
||||
assert decision.should_store is False
|
||||
assert decision.memory_type == "event"
|
||||
assert decision.importance == 0.0
|
||||
assert decision.metadata["source"] == "stub_policy"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_llm_memory_policy_classifies_and_stores(mock_model_client):
|
||||
policy = MemoryPolicy(model_client=mock_model_client, role="memory_policy")
|
||||
decision = await policy.classify(
|
||||
"User said they prefer short answers in Russian.", "task_456"
|
||||
)
|
||||
assert decision.should_store is True
|
||||
assert decision.memory_type == "preference"
|
||||
assert decision.importance == 0.9
|
||||
assert decision.summary == "User prefers concise Russian answers."
|
||||
mock_model_client.chat.assert_called_once()
|
||||
call_args = mock_model_client.chat.call_args
|
||||
# ModelClient.chat(role, messages, ...) — positional args
|
||||
assert call_args.args[0] == "memory_policy"
|
||||
messages = call_args.args[1]
|
||||
assert len(messages) == 1
|
||||
assert messages[0]["role"] == "user"
|
||||
assert "User said they prefer short answers" in messages[0]["content"]
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_llm_memory_policy_handles_non_storable(mock_model_client):
|
||||
mock_model_client.chat.return_value = ModelResponse(
|
||||
role="critic",
|
||||
model="local-main",
|
||||
content=json.dumps({
|
||||
"should_store": False,
|
||||
"memory_type": "event",
|
||||
"summary": "Routine tool call, nothing to remember.",
|
||||
"importance": 0.1,
|
||||
"scope": "workspace",
|
||||
"metadata": {},
|
||||
}),
|
||||
reasoning_content=None,
|
||||
raw={},
|
||||
latency_ms=30.0,
|
||||
)
|
||||
policy = MemoryPolicy(model_client=mock_model_client)
|
||||
decision = await policy.classify("Ran ls -la in workspace.", "task_789")
|
||||
assert decision.should_store is False
|
||||
assert decision.importance == 0.1
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_llm_memory_policy_uses_response_format(mock_model_client):
|
||||
policy = MemoryPolicy(model_client=mock_model_client)
|
||||
await policy.classify("test summary", "task_1")
|
||||
call_args = mock_model_client.chat.call_args
|
||||
assert call_args.kwargs["response_format"]["type"] == "json_schema"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_llm_memory_policy_invalid_json_falls_back(mock_model_client):
|
||||
mock_model_client.chat.return_value = ModelResponse(
|
||||
role="critic",
|
||||
model="local-main",
|
||||
content="not valid json {{{",
|
||||
reasoning_content=None,
|
||||
raw={},
|
||||
latency_ms=10.0,
|
||||
)
|
||||
policy = MemoryPolicy(model_client=mock_model_client)
|
||||
decision = await policy.classify("some summary", "task_x")
|
||||
assert decision.should_store is False
|
||||
assert decision.metadata["source"] == "llm_policy_fallback"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_llm_memory_policy_missing_fields_falls_back(mock_model_client):
|
||||
mock_model_client.chat.return_value = ModelResponse(
|
||||
role="critic",
|
||||
model="local-main",
|
||||
content=json.dumps({"should_store": True}),
|
||||
reasoning_content=None,
|
||||
raw={},
|
||||
latency_ms=10.0,
|
||||
)
|
||||
policy = MemoryPolicy(model_client=mock_model_client)
|
||||
decision = await policy.classify("some summary", "task_y")
|
||||
assert decision.should_store is False
|
||||
assert decision.metadata["source"] == "llm_policy_fallback"
|
||||
|
|
@ -2,8 +2,13 @@ from fastapi.testclient import TestClient
|
|||
|
||||
from duck_core.api import create_app
|
||||
from duck_core.context_builder import ContextBuilder
|
||||
from duck_core.events.store import EventStore
|
||||
from duck_core.memory.policy import MemoryDecision
|
||||
from duck_core.memory.store import MemoryStore
|
||||
from duck_core.model_client import ModelResponse
|
||||
from duck_core.runtime_loop import RuntimeLoop
|
||||
from duck_core.tasks.state import TaskState
|
||||
from duck_core.tasks.store import TaskStore
|
||||
|
||||
|
||||
def test_memory_api_stores_workspace_scoped_notes(tmp_path, monkeypatch):
|
||||
|
|
@ -135,7 +140,7 @@ def test_chat_api_injects_relevant_memory_into_model_context(tmp_path, monkeypat
|
|||
raw={},
|
||||
latency_ms=1.0,
|
||||
)
|
||||
seen_messages.append(messages)
|
||||
seen_messages.append((role, messages))
|
||||
return ModelResponse(
|
||||
role=role,
|
||||
model="local-main",
|
||||
|
|
@ -145,8 +150,6 @@ def test_chat_api_injects_relevant_memory_into_model_context(tmp_path, monkeypat
|
|||
latency_ms=1.0,
|
||||
)
|
||||
|
||||
from duck_core.model_client import ModelResponse
|
||||
|
||||
monkeypatch.setattr("duck_core.model_client.ModelClient.chat", fake_chat)
|
||||
client = TestClient(create_app())
|
||||
conversation = client.post(
|
||||
|
|
@ -170,5 +173,60 @@ def test_chat_api_injects_relevant_memory_into_model_context(tmp_path, monkeypat
|
|||
},
|
||||
)
|
||||
|
||||
assert seen_messages[0][0]["role"] == "system"
|
||||
assert "User prefers direct Russian answers." in seen_messages[0][0]["content"]
|
||||
thinker_messages = [messages for role, messages in seen_messages if role == "thinker"]
|
||||
assert thinker_messages[0][0]["role"] == "system"
|
||||
assert "User prefers direct Russian answers." in thinker_messages[0][0]["content"]
|
||||
|
||||
|
||||
async def test_runtime_memory_policy_stores_workspace_scoped_memory(tmp_path):
|
||||
db_path = str(tmp_path / "duck.sqlite3")
|
||||
task_store = TaskStore(db_path)
|
||||
event_store = EventStore(db_path)
|
||||
memory_store = MemoryStore(db_path)
|
||||
await task_store.init()
|
||||
await event_store.init()
|
||||
await memory_store.init()
|
||||
|
||||
class FakeModelClient:
|
||||
async def chat(self, role, messages, temperature=None, max_output_tokens=None, response_format=None):
|
||||
if role == "action":
|
||||
return ModelResponse(
|
||||
role=role,
|
||||
model="local-main",
|
||||
content='{"kind":"action_directive","intent":"answer","risk_level":"none","actions":[]}',
|
||||
reasoning_content=None,
|
||||
raw={},
|
||||
latency_ms=1.0,
|
||||
)
|
||||
return ModelResponse(
|
||||
role=role,
|
||||
model="local-main",
|
||||
content="Use Vulkan for this workspace.",
|
||||
reasoning_content=None,
|
||||
raw={},
|
||||
latency_ms=1.0,
|
||||
)
|
||||
|
||||
class StorePolicy:
|
||||
async def classify(self, summary: str, task_id: str) -> MemoryDecision:
|
||||
return MemoryDecision(
|
||||
should_store=True,
|
||||
memory_type="fact",
|
||||
summary="Workspace uses Vulkan.",
|
||||
importance=0.8,
|
||||
scope="workspace",
|
||||
metadata={"task_id": task_id},
|
||||
)
|
||||
|
||||
runtime = RuntimeLoop(
|
||||
task_store,
|
||||
event_store,
|
||||
model_client=FakeModelClient(),
|
||||
memory_policy=StorePolicy(),
|
||||
memory_store=memory_store,
|
||||
)
|
||||
|
||||
await runtime.run_chat("remember workspace fact", workspace="/tmp/duck", reflect=False)
|
||||
|
||||
relevant = await memory_store.relevant(workspace="/tmp/duck", query="vulkan")
|
||||
assert [record.text for record in relevant] == ["Workspace uses Vulkan."]
|
||||
|
|
|
|||
|
|
@ -0,0 +1,227 @@
|
|||
import json
|
||||
from unittest.mock import AsyncMock
|
||||
|
||||
import pytest
|
||||
|
||||
from duck_core.events.store import EventStore
|
||||
from duck_core.experience.recorder import ExperienceRecorder
|
||||
from duck_core.memory.policy import MemoryPolicy
|
||||
from duck_core.memory.store import MemoryStore
|
||||
from duck_core.model_client import ModelClient, ModelResponse
|
||||
from duck_core.runtime_loop import RuntimeLoop
|
||||
from duck_core.tasks.store import TaskStore
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def task_store(tmp_path):
|
||||
store = TaskStore(str(tmp_path / "duck.sqlite3"))
|
||||
return store
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def event_store(tmp_path):
|
||||
store = EventStore(str(tmp_path / "duck.sqlite3"))
|
||||
return store
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def memory_store(tmp_path):
|
||||
store = MemoryStore(str(tmp_path / "duck.sqlite3"))
|
||||
return store
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def experience_recorder(tmp_path):
|
||||
recorder = ExperienceRecorder(str(tmp_path / "duck.sqlite3"))
|
||||
return recorder
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def mock_model_client():
|
||||
client = AsyncMock(spec=ModelClient)
|
||||
client.chat = AsyncMock(
|
||||
side_effect=[
|
||||
# First call: action role — return empty actions
|
||||
ModelResponse(
|
||||
role="action",
|
||||
model="local-main",
|
||||
content=json.dumps({
|
||||
"kind": "action_directive",
|
||||
"intent": "answer directly",
|
||||
"risk_level": "none",
|
||||
"actions": [],
|
||||
}),
|
||||
reasoning_content=None,
|
||||
raw={},
|
||||
latency_ms=1.0,
|
||||
),
|
||||
# Second call: thinker role — final answer
|
||||
ModelResponse(
|
||||
role="thinker",
|
||||
model="local-main",
|
||||
content="DuckLM is a local cognitive runtime.",
|
||||
reasoning_content=None,
|
||||
raw={},
|
||||
latency_ms=1.0,
|
||||
),
|
||||
# Third call: memory_policy role
|
||||
ModelResponse(
|
||||
role="critic",
|
||||
model="local-main",
|
||||
content=json.dumps({
|
||||
"should_store": False,
|
||||
"memory_type": "event",
|
||||
"summary": "Routine answer, nothing to remember.",
|
||||
"importance": 0.1,
|
||||
"scope": "workspace",
|
||||
"metadata": {},
|
||||
}),
|
||||
reasoning_content=None,
|
||||
raw={},
|
||||
latency_ms=1.0,
|
||||
),
|
||||
# Fourth call: critic role (reflection)
|
||||
ModelResponse(
|
||||
role="critic",
|
||||
model="local-main",
|
||||
content="Task completed successfully. No issues found. Reusable lesson: direct answers work well for simple queries.",
|
||||
reasoning_content=None,
|
||||
raw={},
|
||||
latency_ms=1.0,
|
||||
),
|
||||
]
|
||||
)
|
||||
return client
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_reflection_is_called_after_task_completion(
|
||||
task_store, event_store, memory_store, experience_recorder, mock_model_client
|
||||
):
|
||||
policy = MemoryPolicy(model_client=mock_model_client)
|
||||
runtime = RuntimeLoop(
|
||||
task_store=task_store,
|
||||
event_store=event_store,
|
||||
model_client=mock_model_client,
|
||||
memory_policy=policy,
|
||||
memory_store=memory_store,
|
||||
experience_recorder=experience_recorder,
|
||||
)
|
||||
|
||||
result = await runtime.run_chat("What is DuckLM?", workspace="/tmp/test")
|
||||
|
||||
assert result.status == "completed"
|
||||
assert "DuckLM" in result.final_response
|
||||
|
||||
# Check that reflection was called — experience record created
|
||||
records = await experience_recorder.list_records()
|
||||
assert len(records) == 1
|
||||
assert records[0].task_id == result.task_id
|
||||
assert "completed successfully" in records[0].reusable_lesson
|
||||
|
||||
# Check that reflection_completed event was recorded
|
||||
events = await event_store.list_events(result.task_id)
|
||||
event_types = [e.event_type for e in events]
|
||||
assert "reflection_completed" in event_types
|
||||
assert records[0].task_id == result.task_id
|
||||
assert "completed successfully" in records[0].reusable_lesson
|
||||
|
||||
# Check that reflection_completed event was recorded
|
||||
events = await event_store.list_events(result.task_id)
|
||||
event_types = [e.event_type for e in events]
|
||||
assert "reflection_completed" in event_types
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_reflection_failure_does_not_break_task(
|
||||
task_store, event_store, memory_store, experience_recorder
|
||||
):
|
||||
"""If reflection fails, the task should still complete successfully."""
|
||||
client = AsyncMock(spec=ModelClient)
|
||||
client.chat = AsyncMock(
|
||||
side_effect=[
|
||||
# Action: empty
|
||||
ModelResponse(
|
||||
role="action",
|
||||
model="local-main",
|
||||
content=json.dumps({
|
||||
"kind": "action_directive",
|
||||
"intent": "answer",
|
||||
"risk_level": "none",
|
||||
"actions": [],
|
||||
}),
|
||||
reasoning_content=None,
|
||||
raw={},
|
||||
latency_ms=1.0,
|
||||
),
|
||||
# Thinker: answer
|
||||
ModelResponse(
|
||||
role="thinker",
|
||||
model="local-main",
|
||||
content="Answer.",
|
||||
reasoning_content=None,
|
||||
raw={},
|
||||
latency_ms=1.0,
|
||||
),
|
||||
# Memory policy
|
||||
ModelResponse(
|
||||
role="critic",
|
||||
model="local-main",
|
||||
content=json.dumps({
|
||||
"should_store": False,
|
||||
"memory_type": "event",
|
||||
"summary": "Routine.",
|
||||
"importance": 0.1,
|
||||
"scope": "workspace",
|
||||
"metadata": {},
|
||||
}),
|
||||
reasoning_content=None,
|
||||
raw={},
|
||||
latency_ms=1.0,
|
||||
),
|
||||
# Critic (reflection) — raises exception
|
||||
ConnectionError("LLM unavailable"),
|
||||
]
|
||||
)
|
||||
|
||||
policy = MemoryPolicy(model_client=client)
|
||||
runtime = RuntimeLoop(
|
||||
task_store=task_store,
|
||||
event_store=event_store,
|
||||
model_client=client,
|
||||
memory_policy=policy,
|
||||
memory_store=memory_store,
|
||||
experience_recorder=experience_recorder,
|
||||
)
|
||||
|
||||
result = await runtime.run_chat("test", workspace="/tmp/test")
|
||||
|
||||
# Task should still complete
|
||||
assert result.status == "completed"
|
||||
|
||||
# Reflection failure event should be recorded
|
||||
events = await event_store.list_events(result.task_id)
|
||||
event_types = [e.event_type for e in events]
|
||||
assert "reflection_failed" in event_types
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_reflection_not_called_when_disabled(
|
||||
task_store, event_store, memory_store, mock_model_client
|
||||
):
|
||||
"""When reflect=False, no reflection should be called."""
|
||||
policy = MemoryPolicy(model_client=mock_model_client)
|
||||
runtime = RuntimeLoop(
|
||||
task_store=task_store,
|
||||
event_store=event_store,
|
||||
model_client=mock_model_client,
|
||||
memory_policy=policy,
|
||||
memory_store=memory_store,
|
||||
)
|
||||
|
||||
result = await runtime.run_chat("What is DuckLM?", workspace="/tmp/test", reflect=False)
|
||||
|
||||
assert result.status == "completed"
|
||||
# mock_model_client.chat should have been called 3 times (action, thinker, memory_policy)
|
||||
# NOT 4 times (no critic/reflection call)
|
||||
assert mock_model_client.chat.call_count == 3
|
||||
|
|
@ -5,7 +5,11 @@ from duck_core.memory.vector_memory import EmbeddingsUnavailableError, VectorMem
|
|||
|
||||
@pytest.mark.asyncio
|
||||
async def test_vector_memory_stub_is_explicit_when_embeddings_unavailable():
|
||||
memory = VectorMemory(qdrant_url="http://127.0.0.1:6333", embeddings_base_url=None)
|
||||
memory = VectorMemory(
|
||||
qdrant_url="http://127.0.0.1:6333",
|
||||
embeddings_base_url=None,
|
||||
local_embedding_model=None,
|
||||
)
|
||||
|
||||
with pytest.raises(EmbeddingsUnavailableError):
|
||||
await memory.add_memory("remember this")
|
||||
|
|
|
|||
|
|
@ -0,0 +1,97 @@
|
|||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
from duck_core.memory.vector_memory import VectorMemory, EmbeddingsUnavailableError
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_vector_memory_uses_local_model():
|
||||
"""Test VectorMemory with local sentence-transformers model (mocked)."""
|
||||
vm = VectorMemory(
|
||||
qdrant_url="http://localhost:6333",
|
||||
local_embedding_model="./models/all-MiniLM-L6-v2",
|
||||
)
|
||||
|
||||
# Mock the sentence-transformers model — encode returns a numpy-like list
|
||||
mock_model = MagicMock()
|
||||
mock_model.encode.return_value = [0.1] * 384 # all-MiniLM-L6-v2 produces 384-dim vectors
|
||||
|
||||
with patch.object(vm, "_load_local_model", return_value=mock_model):
|
||||
with patch("httpx.AsyncClient") as mock_client_class:
|
||||
mock_client = AsyncMock()
|
||||
mock_client_class.return_value.__aenter__ = AsyncMock(return_value=mock_client)
|
||||
mock_client_class.return_value.__aexit__ = AsyncMock(return_value=False)
|
||||
put_response = MagicMock(status_code=200)
|
||||
put_response.raise_for_status = MagicMock()
|
||||
search_response = MagicMock(status_code=200)
|
||||
search_response.raise_for_status = MagicMock()
|
||||
search_response.json.return_value = {
|
||||
"result": [{"id": "test-id", "payload": {"text": "test"}}]
|
||||
}
|
||||
mock_client.put.return_value = put_response
|
||||
mock_client.post.return_value = search_response
|
||||
|
||||
point_id = await vm.add_memory("test memory", {"scope": "global"})
|
||||
assert point_id is not None
|
||||
|
||||
results = await vm.search_memory("test query")
|
||||
assert isinstance(results, list)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_vector_memory_no_embedding_source():
|
||||
"""VectorMemory with no embedding source should raise."""
|
||||
vm = VectorMemory(
|
||||
qdrant_url="http://localhost:6333",
|
||||
local_embedding_model=None,
|
||||
embeddings_base_url=None,
|
||||
)
|
||||
|
||||
with pytest.raises(EmbeddingsUnavailableError):
|
||||
await vm.add_memory("test")
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_vector_memory_remote_fallback():
|
||||
"""Test VectorMemory with remote embeddings endpoint."""
|
||||
vm = VectorMemory(
|
||||
qdrant_url="http://localhost:6333",
|
||||
local_embedding_model=None,
|
||||
embeddings_base_url="http://localhost:8081/v1",
|
||||
)
|
||||
|
||||
mock_embedding = [0.1] * 384
|
||||
|
||||
with patch("httpx.AsyncClient") as mock_client_class:
|
||||
mock_client = AsyncMock()
|
||||
mock_client_class.return_value.__aenter__ = AsyncMock(return_value=mock_client)
|
||||
mock_client_class.return_value.__aexit__ = AsyncMock(return_value=False)
|
||||
embed_response = MagicMock(status_code=200)
|
||||
embed_response.json.return_value = {"data": [{"embedding": mock_embedding}]}
|
||||
put_response = MagicMock(status_code=200)
|
||||
put_response.raise_for_status = MagicMock()
|
||||
mock_client.post.return_value = embed_response
|
||||
mock_client.put.return_value = put_response
|
||||
|
||||
point_id = await vm.add_memory("test")
|
||||
assert point_id is not None
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_vector_memory_remote_503():
|
||||
"""Remote embeddings returning 503 should raise EmbeddingsUnavailableError."""
|
||||
vm = VectorMemory(
|
||||
qdrant_url="http://localhost:6333",
|
||||
local_embedding_model=None,
|
||||
embeddings_base_url="http://localhost:8081/v1",
|
||||
)
|
||||
|
||||
with patch("httpx.AsyncClient") as mock_client_class:
|
||||
mock_client = AsyncMock()
|
||||
mock_client_class.return_value.__aenter__ = AsyncMock(return_value=mock_client)
|
||||
mock_client_class.return_value.__aexit__ = AsyncMock(return_value=False)
|
||||
mock_client.post.return_value = AsyncMock(status_code=503)
|
||||
|
||||
with pytest.raises(EmbeddingsUnavailableError):
|
||||
await vm.add_memory("test")
|
||||
Loading…
Reference in New Issue