Replace repository with DuckLM runtime

mirivlad 2026-05-20 01:00:28 +08:00
parent ddc285b8f4
commit 4a84ada770
190 changed files with 7060 additions and 13602 deletions

49
.gitignore vendored

@ -1,41 +1,22 @@
# Python
__pycache__/
*.py[cod]
*.pyo
.pytest_cache/
.mypy_cache/
.ruff_cache/
# Virtual environments
.venv/
venv/
env/
# Local environment and secrets
.env
.env.*
!.env.example
config/.env
config/.env.*
*.pem
*.key
.venv/
__pycache__/
*.py[cod]
.pytest_cache/
.ruff_cache/
*.egg-info/
# Local models and embeddings
/models/
*.gguf
*.safetensors
*.bin
data/
workspace/
models/
vendor/
# Runtime state
data/**/*.sqlite3
data/**/*.sqlite3-*
data/runtime/*.json
data/runtime/*.pid
data/runtime/*.log
logs/
*.log
*.pid
*.sqlite3
*.db
# OS/editor
.DS_Store
.idea/
.vscode/
dist/
build/


@ -1,347 +0,0 @@
# ARCHITECTURE
This document describes the target architecture of `ducklm` as a local, event-driven, multi-model execution runtime.
`TASK_3.md` is the directive for the AI coder.
`ARCHITECTURE.md` is a short engineering map of the system: what the core is, which layers exist, how control flows, where decisions are made, and where transitions are merely executed.
## 1. Core Principle
The system is built around the `Runtime Loop Controller`.
The center of the system is:
- not the `router`
- not the `orchestrator`
- not the `execution engine`
The center of the system is:
- the `runtime loop`
It is the component that closes the task lifecycle:
```text
task
-> state load/create
-> context build
-> orchestration decision
-> plan/directive
-> execution
-> critic
-> memory policy
-> checkpoint
-> next step / complete / fail
```
## 2. Layer Model
Target shape of the system:
```text
Client / CLI / API
|
v
Runtime Loop Controller
|
+--> State Store / Checkpoints
+--> Context Builder
+--> Router
+--> Orchestrator / Planner
+--> Execution Engine / Scheduler
| |
| +--> Tool Layer
| +--> Coder
|
+--> Critic
+--> Memory Write Policy
+--> Memory Store + Vector Index
+--> Event Bus + Event Store
+--> Streaming Projection
```
Principle:
- the `runtime loop` coordinates
- the `router` recommends
- the `orchestrator` reasons
- the `execution engine` executes
- `tools/coder` do the work
- the `critic` evaluates
- the `memory policy` decides what gets written
- the `event bus` records history
- the `state store` enables resume
## 3. Responsibility Boundaries
### Runtime Loop Controller
Responsible for:
- task lifecycle
- state transitions
- calling components in the correct order
- applying decision objects
- checkpointing
- completion / failure path
Not responsible for:
- policy reasoning
- raw tool execution
- prompt assembly inline
### Router
This is a `policy evaluator + decision suggester`.
Contract:
```text
(input state + assembled context) -> ExecutionDirective
```
Properties:
- pure function
- no side effects
- no tool execution
- no state mutation
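The router contract above can be sketched as a pure function. This is a minimal illustration, not the project's router: the `ExecutionDirective` fields are simplified, the `pending_permission` and `plan` keys are hypothetical, and the branching stands in for a decision that the real system delegates to models.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass(frozen=True)
class ExecutionDirective:
    """Hypothetical, simplified directive; the real contract has more fields."""
    type: str
    payload: Mapping[str, Any] = field(default_factory=dict)
    reason: str = ""


def route(state: Mapping[str, Any], context: Mapping[str, Any]) -> ExecutionDirective:
    """Pure function: reads its inputs and returns a suggestion, nothing else."""
    # Illustrative policy only (assumed keys); no tools run, no state is mutated.
    if state.get("pending_permission"):
        return ExecutionDirective("request_permission", reason="approval pending")
    if context.get("plan"):
        return ExecutionDirective("plan", {"steps": context["plan"]}, "plan available")
    return ExecutionDirective("respond", reason="no plan or pending approval")
```

Because the function has no side effects, calling it twice with the same inputs yields equal directives, which makes the router easy to test in isolation.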
### Orchestrator / Planner
Responsible for:
- orchestration reasoning
- deciding whether planning is needed
- generating plan JSON
- returning structured directives
Not responsible for:
- execution
- direct state mutation
- tool invocation
### Execution Engine / Scheduler
Responsible for:
- step scheduling
- task graph traversal
- step execution coordination
- calling tool/coder adapters
- reporting structured results
Not responsible for:
- ownership of global lifecycle
- high-level policy
### Critic
Responsible for:
- evaluation of tool/coder outputs
- returning structured scores and explanation
Not responsible for:
- final memory write decision
- execution retry policy
### Memory Write Policy
Responsible for:
- deterministic decision about storing memory
- dedup / merge / skip behavior
Not responsible for:
- semantic retrieval
- critic scoring
## 4. Decision Model
All decision-producing components must return structured objects.
Base contract:
```json
{
"type": "plan|tool|coder|respond|replan|store_memory|request_permission|complete|fail|noop",
"payload": {},
"requires_permission": false,
"confidence": 0.0,
"reason": "string"
}
```
This is the system's main anti-chaos invariant.
Consequences:
- components do not execute decisions directly
- components do not mutate state directly
- the runtime loop applies decisions and moves the system forward
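The base contract can be enforced at the boundary where model output enters the runtime. The sketch below uses only the standard library; the class name `Decision`, the helper `parse_decision`, and the `[0.0, 1.0]` confidence range are assumptions for illustration (the project itself lists Pydantic for its contracts).

```python
import json
from dataclasses import dataclass, field
from typing import Any

# Decision types copied from the base contract above.
DECISION_TYPES = frozenset({
    "plan", "tool", "coder", "respond", "replan", "store_memory",
    "request_permission", "complete", "fail", "noop",
})


@dataclass(frozen=True)
class Decision:
    type: str
    payload: dict[str, Any] = field(default_factory=dict)
    requires_permission: bool = False
    confidence: float = 0.0
    reason: str = ""

    def __post_init__(self) -> None:
        if self.type not in DECISION_TYPES:
            raise ValueError(f"unknown decision type: {self.type!r}")
        # Assumed range; the contract only shows 0.0 as an example value.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0.0, 1.0]")


def parse_decision(raw: str) -> Decision:
    """Parse model output; raises ValueError or TypeError on malformed input."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("decision must be a JSON object")
    return Decision(**data)  # unexpected keys raise TypeError
```

Rejecting malformed objects here means nothing downstream of the runtime loop ever has to handle free-form model text.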
## 5. Execution Flow
Normal execution path:
1. The client submits a task.
2. The runtime loop creates or loads the task state.
3. `task_received` is published.
4. The context builder assembles the execution context.
5. The router returns a decision object.
6. The orchestrator returns a direct action or a plan.
7. The plan is validated and converted into a task graph.
8. The execution engine selects the next step.
9. A tool or the coder executes the step through an adapter.
10. The result is returned to the runtime loop.
11. The critic returns an evaluation suggestion.
12. The memory policy returns a write decision.
13. The state checkpoint is saved.
14. The event bus records the events.
15. The runtime loop chooses `continue / replan / complete / fail`.
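The control structure behind these steps can be sketched as a loop that asks for a decision, applies it, and checkpoints. This is a deliberately reduced sketch: `run_task`, the callables it receives, the dict-based directive, and the `max_steps` budget are all hypothetical, and the critic, memory policy, and event bus are left out.

```python
from typing import Any, Callable

Directive = dict[str, Any]  # simplified stand-in for a decision object


def run_task(
    task: str,
    decide: Callable[[dict[str, Any]], Directive],
    execute: Callable[[Directive], Any],
    checkpoint: Callable[[dict[str, Any]], None],
    max_steps: int = 10,
) -> dict[str, Any]:
    """Drive one task until a terminal decision or the step budget is reached."""
    state: dict[str, Any] = {"task": task, "status": "running", "results": []}
    for _ in range(max_steps):
        directive = decide(state)  # components only suggest
        if directive["type"] in ("complete", "fail"):
            state["status"] = "completed" if directive["type"] == "complete" else "failed"
            break
        state["results"].append(execute(directive))  # the loop applies the decision
        checkpoint(state)  # persist after each transition
    else:
        state["status"] = "failed"  # step budget exhausted without a terminal decision
    checkpoint(state)
    return state
```

Keeping the decision and execution callables outside the loop mirrors the boundary described earlier: the loop owns transitions, while the components only return objects.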
## 6. Task Graph Model
An external planner may return a list of steps.
Inside the runtime, the plan must live as a task graph:
```json
{
"nodes": [
{
"id": "step-1",
"kind": "tool",
"tool": "shell_exec",
"args": {"command": "hostnamectl"},
"depends_on": []
}
]
}
```
For now, sequential DAG execution is acceptable.
Later, this gives a path to parallel scheduling without rewriting the model.
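A sequential execution order for such a graph can be derived with the standard library's `graphlib`. The function name `execution_order` is hypothetical; the sketch assumes every node carries the `id` and `depends_on` fields shown in the JSON example.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Any


def execution_order(graph: dict[str, Any]) -> list[str]:
    """Return node ids in an order that respects every `depends_on` edge."""
    nodes = {node["id"]: node for node in graph["nodes"]}
    sorter = TopologicalSorter()
    for node_id, node in nodes.items():
        unknown = set(node["depends_on"]) - set(nodes)
        if unknown:
            raise ValueError(f"{node_id} depends on unknown steps: {sorted(unknown)}")
        sorter.add(node_id, *node["depends_on"])
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle
        raise ValueError(f"plan contains a cycle: {exc.args[1]}") from exc
```

`TopologicalSorter` also exposes `prepare()`, `get_ready()`, and `done()`, which return batches of independent nodes; that is one possible route to the parallel scheduling mentioned above.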
## 7. Event Backbone
The system is event-driven.
`EventBus` is needed not only for streaming but as the internal backbone bus.
Minimum properties:
- ordering per task
- monotonic sequence per task
- durable append to event store
- replay capability
- consumer idempotency
Minimum delivery model:
- `at least once`
Idempotency rule:
- events are deduplicated by `task_id + sequence`
The streaming layer is a projection of the event bus, not the source of truth.
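Under at-least-once delivery, a consumer has to tolerate redelivered events. A minimal sketch of the `task_id + sequence` rule follows; the class `IdempotentConsumer` is hypothetical and assumes per-task sequences start at 1 and arrive in order, as the ordering properties above imply.

```python
class IdempotentConsumer:
    """Applies each (task_id, sequence) pair at most once."""

    def __init__(self) -> None:
        self._last_seen: dict[str, int] = {}  # task_id -> highest applied sequence
        self.applied: list[tuple[str, int, str]] = []

    def handle(self, task_id: str, sequence: int, event_type: str) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        if sequence <= self._last_seen.get(task_id, 0):
            return False  # redelivery under at-least-once: safe to drop
        self._last_seen[task_id] = sequence
        self.applied.append((task_id, sequence, event_type))
        return True
```

Tracking only the highest applied sequence per task is enough because sequences are monotonic per task; a consumer that could see out-of-order events would need to remember the full set instead.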
## 8. State Persistence
Because the runtime is designed as a long-running autonomous system, an in-memory lifecycle is not enough.
Required:
- task state store
- checkpoint store
- resume from crash/restart
Minimum strategy:
- checkpoint after critical transitions
- latest valid checkpoint is resumable
Primary choice for the MVP:
- `SQLite`
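A checkpoint store on SQLite can be very small. The sketch below is an assumption-laden illustration: the class name, table layout, and JSON serialization are hypothetical, and it treats the newest row as the resume point without any validity check.

```python
import json
import sqlite3
from typing import Any, Optional


class CheckpointStore:
    """Append-only checkpoints; the newest row per task is the resume point."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "task_id TEXT NOT NULL, label TEXT NOT NULL, state TEXT NOT NULL)"
        )

    def save(self, task_id: str, label: str, state: dict[str, Any]) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO checkpoints (task_id, label, state) VALUES (?, ?, ?)",
                (task_id, label, json.dumps(state)),
            )

    def latest(self, task_id: str) -> Optional[dict[str, Any]]:
        row = self._db.execute(
            "SELECT state FROM checkpoints WHERE task_id = ? ORDER BY id DESC LIMIT 1",
            (task_id,),
        ).fetchone()
        return json.loads(row[0]) if row else None
```

After a restart, the runtime would call `latest(task_id)` and continue from the returned state, or start fresh when it returns `None`.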
## 9. Async and Isolation
The LLM loop must not be blocked by long-running tool operations.
This requires:
- async execution adapters
- timeout wrappers
- cancellation handling
- bounded concurrency
Dangerous or heavy operations need a separate sandbox layer.
Especially for:
- `shell_exec`
- browser/web fallback
- generated helper scripts
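Timeout wrappers and bounded concurrency can be combined with `asyncio.wait_for` and `asyncio.Semaphore`. The function `run_bounded` and the shape of the failure object are hypothetical; this sketch does not cover sandboxing, only scheduling.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_bounded(
    calls: list[Callable[[], Awaitable[Any]]],
    timeout_s: float,
    max_concurrency: int,
) -> list[Any]:
    """Run tool calls with a per-call timeout and a cap on parallelism."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(call: Callable[[], Awaitable[Any]]) -> Any:
        async with semaphore:
            try:
                # wait_for cancels the call if it exceeds the timeout
                return await asyncio.wait_for(call(), timeout=timeout_s)
            except asyncio.TimeoutError:
                return {"ok": False, "error": "timeout"}  # structured failure

    # gather preserves the order of the input calls
    return await asyncio.gather(*(guarded(call) for call in calls))
```

Returning a structured failure instead of raising keeps a single slow tool from aborting the other calls in the batch.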
## 10. Memory Architecture
Memory is a separate storage subsystem, not a JSON dump.
Recommended shape:
- metadata store: `SQLite`
- vector index: `FAISS` or `hnswlib`
Two distinct processes:
- retrieval
- write decision
The separation is deliberate.
The `critic` only evaluates.
The `memory write policy` makes the final decision.
The minimum write logic must be deterministic:
```text
(critic_score + memory_type + runtime_weight + dedup_state + safety_state) -> decision
```
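One way to read the formula above is as a pure function of five inputs. Everything concrete in this sketch is an assumption: the `[0, 1]` ranges, the multiplicative score, the per-type thresholds, and the `store` / `merge` / `skip` outcomes are illustrative and not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteInputs:
    critic_score: float    # assumed to be in [0, 1]
    memory_type: str       # e.g. "fact", "tool_result"
    runtime_weight: float  # assumed to be in [0, 1]
    is_duplicate: bool     # dedup_state
    is_safe: bool          # safety_state


# Hypothetical per-type thresholds, chosen only for illustration.
THRESHOLDS = {"fact": 0.6, "tool_result": 0.8}
DEFAULT_THRESHOLD = 0.7


def decide_write(inputs: WriteInputs) -> str:
    """Return "store", "merge", or "skip"; same inputs always give the same answer."""
    if not inputs.is_safe:
        return "skip"
    score = inputs.critic_score * inputs.runtime_weight
    if score < THRESHOLDS.get(inputs.memory_type, DEFAULT_THRESHOLD):
        return "skip"
    return "merge" if inputs.is_duplicate else "store"
```

The critic contributes only a score here; the decision itself involves no model call, which is what keeps it deterministic.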
## 11. Failure Model
The system must be resilient to partial failures.
Expected controlled failure paths:
- invalid planner output -> replan or fail
- tool timeout -> retry or fail
- critic failure -> fallback policy
- memory failure -> skip write and continue where safe
- streaming failure -> sync fallback
Main principle:
- a subsystem failure must not automatically mean runtime collapse
## 12. Why This Shape
This architecture exists to keep the system from degrading into one of these bad variants:
- `router-god-object`
- `runtime loop with hidden policy logic`
- `LLM that directly executes tools`
- `streaming instead of event model`
- `critic as memory authority`
- `in-memory only autonomous runtime`
If these boundaries are held firmly, the project stays extensible.
If the boundaries blur, the system quickly turns into a hard-to-debug procedural agent.


@ -1,252 +0,0 @@
# DuckLM: Current Project State
## 1. What It Is
DuckLM is a local event-driven multi-model AI agent runtime. The system accepts a user task, retrieves relevant memory, assembles context, makes an orchestration decision, builds a plan when needed, executes steps through tools and the coder, evaluates results through the critic, stores useful information in long-term memory, publishes events, and supports streaming to the client.
**Key principle:** the center of the system is `RuntimeLoop`. All execution transitions pass through it. Router, Orchestrator, and ExecutionEngine are decision-producing components that only return structured objects (ExecutionDirective) and do not perform actions directly.
## 2. Architecture
```
Client / CLI / API
RuntimeLoop (runtime_loop.py)
├── State Store / Checkpoints (SQLite)
├── ContextBuilder
├── AsyncRouter (Thinker → JSON Compiler)
├── ExecutionEngine / ExecutionScheduler
│ ├── ToolRegistry / ToolSandbox
│ ├── CoderAdapter
│ └── CriticAdapter
├── PermissionService
├── MemoryRecallService
├── MemoryWritePolicy
├── MemoryInterface (SQLite + hnswlib)
└── EventBus → SQLiteEventStore
StreamingManager → WebSocket
```
## 3. Project Structure
```
ducklm/
main.py # Entry point (imports app.api.server.app)
app/
api/
server.py # FastAPI: POST /chat, WS /stream, GET /health, etc.
static/index.html # Web chat (dark theme, Enter = send, Shift+Enter = new line)
cli/__init__.py # Empty for now
core/
contracts.py # Pydantic models: UserTask, PlanStep, ToolResult, CriticScore, ...
config.py # AppConfig, load_app_config()
async_router.py # AsyncRouter: Thinker + JSON Compiler pipeline
context_builder.py # ContextBuilder: context assembly with budgets
execution_engine.py # ExecutionEngine: executes plan/tool/respond/coder
execution_scheduler.py # ExecutionScheduler: plan parsing, task graph, execution loop
intent_parser.py # IntentParser: extracts tool intents from text
permission_service.py # PermissionService: command permission checks
permission_resolution.py # Pydantic models for the permissions API
events/
event_bus.py # EventBus: per-task ordered publishing
event_store.py # SQLiteEventStore: append-only log
event_types.py # Event type constants
memory/
interface.py # MemoryInterface: insert/search/get/delete/reindex/cleanup
store.py # MemoryStore: SQLite storage for MemoryEntry + embeddings
vector_index.py # VectorIndex: hnswlib L2 index
recall.py # MemoryRecallService: LLM-based decision on whether recall is needed
write_policy.py # MemoryWritePolicy: deterministic write decision
models/
adapters.py # create_adapter/create_llama_adapter (llama-cpp-python)
async_adapters.py # AsyncOrchestratorAdapter, AsyncCoderAdapter, AsyncCriticAdapter
orchestrator.py # OrchestratorAdapter: wrapper around Llama
coder.py # CoderAdapter
critic.py # CriticAdapter
embeddings.py # EmbeddingsAdapter (sentence-transformers)
permissions/
approval_store.py # SQLiteApprovalStore
runtime/
runtime_loop.py # RuntimeLoop: the central loop (sync)
async_runtime_loop.py # AsyncRuntimeLoop: alternative async version
runtime_controller.py # RuntimeController: composition root, initializes everything
services/__init__.py # Empty
state/
task_state_store.py # SQLiteTaskStateStore
checkpoint_store.py # SQLiteCheckpointStore
streaming/
manager.py # StreamingManager: event subscription → WebSocket
tools/
base.py, registry.py, sandbox.py, discover.py
shell_exec.py, file_read.py, file_write.py, memory_tools.py
plugins/ # Plugin discovery: shell_exec, file_read, file_write, memory_tools
config/
models.json # Model configuration
runtime.json # Timeouts, retry limits, context budgets
permissions.json # Command categories, paths
prompts/ # Markdown prompts for each role
thinker.md, json_compiler.md, coder.md, critic.md, sys_util.md, orchestrator.md, planning.md, system.md
data/
events/events.sqlite3 # Event store
state/task_state.sqlite3 # Task state
state/checkpoints.sqlite3 # Checkpoints
permissions/approvals.sqlite3 # Permission cache
memory/memory.sqlite3 # Memory store
memory/index.bin # Vector index
models/ # GGUF models and sentence-transformers
tests/
test_contracts.py # 6 tests: contracts, router
test_runtime_loop.py # 2 tests: runtime loop events, permission flow
test_tools_flow.py # 7 tests: file read/write, shell, recovery, permissions
test_api_handlers.py # 6 tests: health, events, chat, permissions, feedback
```
## 4. Models and Their Roles
| Role | Model | Backend | Config |
|------|-------|---------|--------|
| Thinker (orchestrator) | Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf | vulkan (GPU) | max_tokens=2048, temp=0.3 |
| JSON Compiler | gemma-4-E4B-it-Q4_K_M.gguf | cpu | max_tokens=1024, temp=0.1 |
| Critic | gemma-4-E4B-it-Q4_K_M.gguf (shared with the compiler) | cpu | max_tokens=1024, temp=0.1 |
| Coder | X-Coder-SFT-Qwen3-8B.Q6_K.gguf | cpu | max_tokens=2048, temp=0.2 |
| Sys Utility | Menlo_Lucy-Q4_K_M.gguf | cpu | max_tokens=1024, temp=0.1 |
| Embeddings | all-MiniLM-L6-v2 (sentence-transformers) | - | dim=384 |
**Important:** Critic and JSON Compiler use the same model (gemma-4B), each through its own adapter instance. The model is not duplicated in memory: loaded instances are cached via `_get_or_create_llm()`, keyed by (path, backend, n_gpu_layers, n_ctx).
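The caching idea can be sketched as a dictionary keyed by that tuple. This is not the project's `_get_or_create_llm()`: the function signature and the injected `loader` callable are hypothetical, used here so the sketch does not depend on any particular inference library.

```python
from typing import Any, Callable

# Cache key mirrors the tuple named above: (path, backend, n_gpu_layers, n_ctx).
CacheKey = tuple[str, str, int, int]
_llm_cache: dict[CacheKey, Any] = {}


def get_or_create_llm(
    path: str,
    backend: str,
    n_gpu_layers: int,
    n_ctx: int,
    loader: Callable[[str, int, int], Any],  # hypothetical model loader
) -> Any:
    """Return the cached model for this configuration, loading it on first use."""
    key = (path, backend, n_gpu_layers, n_ctx)
    if key not in _llm_cache:
        _llm_cache[key] = loader(path, n_gpu_layers, n_ctx)
    return _llm_cache[key]
```

Two adapters that request the same file with the same settings receive the same object, while a different `n_ctx` produces a separate load.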
## 5. Configuration
All settings live in `config/`:
- **models.json**: paths to GGUF files, backend, GPU layers, max_tokens, temperature
- **runtime.json**: timeouts (step=30s, task=5min), retry limits, context budgets, retrieval_top_k
- **permissions.json**: hard_stop commands (rm -rf /, dd, mkfs), no_always commands (shutdown, killall), normal commands
- **prompts/*.md**: system prompts for each model role
## 6. API
FastAPI server on port 8000 (`scripts/server.sh`):
| Method | Path | Description |
|--------|------|-------------|
| GET | `/` | Web chat (index.html) |
| GET | `/health` | Health check |
| GET | `/events` | List of recent events |
| POST | `/chat` | Submit a task (UserTask) and receive the result |
| POST | `/permissions/resolve` | Allow or deny a command |
| POST | `/secrets/resolve` | Supply the sudo password |
| POST | `/password/resolve` | Supply a password (alternative path) |
| POST | `/critic/feedback` | User feedback |
| WS | `/stream/{task_id}` | Stream events for a task |
## 7. Task Execution Flow
1. Client → POST /chat → `RuntimeController.handle_task()`
2. `RuntimeLoop.run_task()`:
- Hard-stop command check via PermissionService
- Task state creation in SQLiteTaskStateStore
- TASK_RECEIVED is published
- Checkpoint: received
- ContextBuilder.build(): context assembly (memory, tools, budgets)
- MemoryRecallService.recall(): an LLM decides whether a memory search is needed
- AsyncRouter.decide(): Thinker → JSON Compiler → ExecutionDirective
- ExecutionEngine.execute(): executes the directive:
- plan → step parsing → graph → sequential execution
- tool → permission check → ToolSandbox → ToolResult
- respond → direct answer
- coder → CoderAdapter
- Critic evaluation of every step (correctness, usefulness, safety)
- Recovery for failed steps (retry/continue/respond/fail)
- MemoryWritePolicy: decision on writing to long-term memory
- Checkpoint: final state
- TASK_COMPLETED / TASK_FAILED / TASK_AWAITING_PERMISSION is published
3. The result is returned to the client, and events are available over WebSocket
## 8. What Is Implemented and Working
### Core (complete)
- [x] Modular project structure (app/, config/, data/, tests/)
- [x] Typed contracts (Pydantic models for all entities)
- [x] RuntimeLoop: the central loop
- [x] RuntimeController: composition root
- [x] EventBus + SQLiteEventStore (append-only, per-task ordering)
- [x] TaskStateStore + CheckpointStore (SQLite)
- [x] ContextBuilder with token budgets
- [x] AsyncRouter: Thinker → JSON Compiler pipeline with retry and JSON fix
- [x] IntentParser: extracts tool intents from natural language
- [x] ExecutionEngine: plan/tool/respond/coder/fail
- [x] ExecutionScheduler: plan parsing, DAG, cycle detection
- [x] PermissionService: hard_stop/no_always/normal categories, permission cache
- [x] ToolSandbox: timeout, cwd restrictions
- [x] ToolRegistry + Plugin Discovery
- [x] Tools: shell_exec, file_read, file_write, memory_insert/search/list
- [x] CriticAdapter with retry and recovery (continue/retry/respond/fail)
- [x] MemoryInterface: SQLite + hnswlib vector index
- [x] MemoryRecallService: LLM-based decision on whether recall is needed
- [x] MemoryWritePolicy: deterministic write decision
- [x] EmbeddingsAdapter (sentence-transformers)
- [x] FastAPI API: /chat, /health, /events, /permissions/resolve, /secrets/resolve, /critic/feedback
- [x] WebSocket streaming (/stream/{task_id})
- [x] Web chat (dark theme, Enter = send, Shift+Enter = new line, event panel, permission controls, feedback dialog)
- [x] 21 tests (all passing)
### Known bugs (fixed)
- RECALL_PROMPT_TEMPLATE format string escaping: curly braces in JSON examples must be doubled
- VectorIndex._get_memory_id returned the wrong ID (it used a hash instead of a stored mapping)
- recall_model defaulted to sys_util; changed to json_compiler
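The first bug in the list is a general property of `str.format`: literal braces must be written as `{{` and `}}`. The template below is a made-up stand-in, since the real RECALL_PROMPT_TEMPLATE is not shown in this document.

```python
# Hypothetical template; only the {task} placeholder is meant to be substituted.
TEMPLATE = 'Answer with JSON like {{"recall": true, "query": "..."}}.\nTask: {task}'

# Doubled braces come out as single literal braces after formatting.
prompt = TEMPLATE.format(task="list files")
print(prompt)
```

With single braces, `format` would try to interpret the JSON example as a replacement field and raise an error instead of producing a prompt.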
## 9. What Still Needs to Be Done
### Priority 1: Completing the Full MVP
- [ ] **Resume from checkpoint**: after a crash/restart, restore the task from the last checkpoint
- [ ] **CLI interface**: submit tasks, view events, and search memory from the terminal (app/cli/ is still empty)
- [ ] **Structured logging**: use logging with formatting instead of print()
- [ ] **WS /stream**: finish it (the basics work, but there is no real-time subscription to new events during long tasks)
### Priority 2: Improvements
- [ ] **Retry/recovery policy**: more reliable handling of tool execution errors
- [ ] **Replay from event store**: replaying a task's history for debugging
- [ ] **Parallel step execution**: only sequential DAG execution exists today; parallel execution could be added for independent steps
- [ ] **Web chat: streaming response display**: the response currently arrives whole; streaming delivery could be added
- [ ] **Web chat: tool output display**: nicer rendering of shell/file operation results
- [ ] **Memory cleanup**: automatic removal of old/low-weight entries (basic logic exists in MemoryInterface.cleanup, but it is not invoked automatically)
### Priority 3: Extensions
- [ ] **web_search / web_fetch tools**: second priority per TASK_3.md
- [ ] **Telegram bot stub**: a thin client for remote control
- [ ] **Coder integration into planning**: the coder adapter exists, but it is not yet integrated into planning as a separate step kind
- [ ] **Models: loading at startup**: load_models_at_startup() is called from lifespan, but if the models are not loaded, the runtime works in fallback mode (respond only)
- [ ] **API documentation**: FastAPI generates the OpenAPI schema, but examples could be added
## 10. Running
```bash
cd ~/git/ducklm
./scripts/server.sh
# or
uvicorn main:app --host 0.0.0.0 --port 8000
```
Web chat: http://localhost:8000/
## 11. Testing
```bash
cd ~/git/ducklm
python -m pytest tests/ -v
```
21 tests, all passing. Coverage: contracts, runtime loop, tool flow, API handlers.
## 12. Technologies
- **Python 3.13**, FastAPI, uvicorn, websockets
- **llama-cpp-python**: local inference for GGUF models (Vulkan/CPU)
- **sentence-transformers**: embeddings (all-MiniLM-L6-v2)
- **hnswlib**: vector search (L2 metric)
- **SQLite**: event store, task state, checkpoints, memory, permissions
- **Pydantic**: all contracts
- **pytest**: testing

2184
Ducklm.md Normal file

File diff suppressed because it is too large.


@ -1,341 +0,0 @@
SAFETY SETUP: REQUIRED BEFORE THE EXPERIMENT
Before making any changes:
1. Check the current git state:
git status --short
2. If there are uncommitted changes:
- do NOT overwrite them;
- do NOT run reset;
- do NOT run checkout over them;
- report the list of modified files to the user and stop.
3. Create a separate working directory with git worktree:
cd ~/git/ducklm
git worktree add ../ducklm-model-experiment -b experiment/model-routing-latency
4. Perform all further actions only in:
~/git/ducklm-model-experiment
5. Do not modify the main project directory:
~/git/ducklm
6. If the project uses local data/*.sqlite3 files, a memory index, logs, or runtime state:
- do not touch production/runtime data in the main directory;
- use a separate data directory inside the worktree for the experiment;
- if existing data is needed, make a copy first;
- do not delete or clear the main data directory.
7. If models/ contains large GGUF files and they did not end up in the worktree:
- do not download new models;
- use a symlink to the existing models directory:
ln -s ~/git/ducklm/models ~/git/ducklm-model-experiment/models
- before creating the symlink, check that the worktree has no conflicting models/ directory.
8. Before running the benchmark, create separate directories:
mkdir -p data/diagnostics logs
9. Save all experiment results only in the worktree:
- MODEL_ROUTING_EXPERIMENT.md
- logs/model_latency.jsonl
- data/diagnostics/model_latency.jsonl
- scripts/benchmark_model_profiles.py
10. When finished:
- show the git diff;
- show the list of created files;
- do not merge the branch into main/master without the user's command.
You are working with the DuckLM project.
Goal: run a safe experiment with the local models already present in the config to reduce time to response without losing JSON stability, permission safety, or task execution quality.
IMPORTANT:
- Do not download new models.
- Use only models that are already in config/models.json and in the local models/ folder.
- Do not remove the JSON Compiler entirely, because the Qwen Thinker periodically produced invalid JSON due to reasoning text.
- Do not add heuristic if/else chains to replace model decisions.
- Do not introduce a rule-based MemoryRecallService in place of the model.
- Do not turn architectural decisions into a set of hand-written conditions.
- Do not break the current baseline. Make all changes through separate config profiles, feature flags, or a separate branch.
- Before making changes, create the git branch experiment/model-routing-latency (the git worktree command in the safety setup already creates it).
- Do not run dangerous shell commands.
- If code must change, the changes must be minimal, isolated, and covered by tests.
Context:
DuckLM currently has these roles:
- Thinker/orchestrator: Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf, vulkan/GPU
- JSON Compiler: gemma-4-E4B-it-Q4_K_M.gguf, CPU
- Critic: gemma-4-E4B-it-Q4_K_M.gguf, CPU
- Coder: X-Coder-SFT-Qwen3-8B.Q6_K.gguf, CPU
- Sys Utility: Menlo_Lucy-Q4_K_M.gguf, CPU
- Embeddings: all-MiniLM-L6-v2
Hypothesis:
Most of the delay before a response may come from CPU calls to gemma-4B in the JSON Compiler, Critic, and/or MemoryRecallService. Some of these service functions could possibly be moved to the existing Sys Utility model, Menlo_Lucy, without losing stability.
The task consists of 5 stages.
STAGE 1. Find the real hot paths and measure the baseline
1. Find every place where models are called:
- Thinker/orchestrator
- JSON Compiler
- Critic
- Coder
- Sys Utility
- MemoryRecallService
- MemoryWritePolicy, if it makes LLM calls
2. Add timing logging, or find the existing one, for:
- total_task_ms
- context_build_ms
- memory_recall_ms
- router_total_ms
- thinker_ms
- json_compiler_ms
- json_fix_ms
- json_retry_count
- json_valid_after_first_try: true/false
- execution_ms
- critic_ms
- memory_write_ms
- model_calls_count
- time_to_first_event_ms
- time_to_first_visible_response_ms
3. If structured logging does not exist yet, add a minimal timing logger without a major architectural rework.
Preferably write to logs/model_latency.jsonl or data/diagnostics/model_latency.jsonl.
4. Run the baseline on the test task set from stage 3 and save the results.
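A minimal timing logger of the kind requested in step 3 could look like the sketch below. The class `TimingLog` and its methods are hypothetical; only the metric names and the JSONL output path come from the list above.

```python
import json
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Any, Iterator


class TimingLog:
    """Collects per-stage timings for one task and appends them as one JSONL row."""

    def __init__(self, task_id: str) -> None:
        self.record: dict[str, Any] = {"task_id": task_id, "model_calls_count": 0}

    @contextmanager
    def stage(self, name: str, model_call: bool = False) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the wrapped stage raises.
            self.record[name] = round((time.perf_counter() - start) * 1000, 2)
            if model_call:
                self.record["model_calls_count"] += 1

    def flush(self, path: Path) -> None:
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(self.record) + "\n")
```

A call site would wrap each stage, for example `with log.stage("thinker_ms", model_call=True): ...`, and call `log.flush(Path("logs/model_latency.jsonl"))` once per task.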
STAGE 2. Create experimental configuration profiles
Create several profiles without deleting the current config.
PROFILE A — baseline_current
- The current configuration, unchanged.
PROFILE B — recall_sys_util
- Keep gemma-4B as the JSON Compiler.
- Keep gemma-4B as the Critic.
- Switch MemoryRecallService to sys_util / Menlo_Lucy, if the config already supports it.
- If it does not, add minimal support for selecting recall_model through the config.
- Do not replace recall with heuristics.
- Do not add hand-written keyword-based rules for recall.
PROFILE C — compiler_sys_util
- Replace the JSON Compiler with sys_util / Menlo_Lucy.
- Set the temperature to 0.0 or the lowest possible value.
- Reduce max_tokens to 512, if that is enough for an ExecutionDirective.
- Keep gemma-4B as the Critic.
- Keep MemoryRecallService as in the baseline.
- Pay special attention to json_valid_rate, json_retry_count, and the number of fallback/json_fix events.
PROFILE D — compiler_and_recall_sys_util
- Replace the JSON Compiler with sys_util / Menlo_Lucy.
- Switch MemoryRecallService to sys_util / Menlo_Lucy.
- Keep gemma-4B as the Critic.
- Goal: check whether gemma-4B can be taken off part of the hot path.
- Pay special attention to whether JSON retries and routing errors increased.
PROFILE E — critic_gated_by_existing_risk
- For the JSON Compiler, keep the best of A/C/D according to the results.
- For MemoryRecallService, keep the best of A/B/D according to the results.
- Call the Critic not on every step, but only when the existing architecture already has a risk level, permission category, or step kind that requires evaluation.
- Do not add a large new heuristic system.
- Existing PermissionService categories may be reused:
- hard_stop
- no_always
- normal
- safe/read-only, if such a category already exists
- Existing step types may be reused:
- respond
- tool
- coder
- plan
- If there is no ready-made risk classification, do not build a large new rule-based system. Instead:
- add a minimal critic_mode feature flag;
- always mode: the baseline;
- tool_and_coder_only mode: the critic is called for tool/coder, but not for plain respond;
- do not implement a dangerous_only mode without an existing risk classification.
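The critic_mode flag described above fits in a few lines. This is a sketch under assumptions: the names `CriticMode` and `should_run_critic` are hypothetical, and the treatment of a `plan` step (no critic call for the plan itself, only for its tool/coder steps) is one possible reading rather than something the text specifies.

```python
from enum import Enum


class CriticMode(str, Enum):
    ALWAYS = "always"                            # baseline behaviour
    TOOL_AND_CODER_ONLY = "tool_and_coder_only"  # skip plain respond steps


def should_run_critic(mode: CriticMode, step_kind: str) -> bool:
    """Decide from the configured flag and an existing step kind only."""
    if mode is CriticMode.ALWAYS:
        return True
    return step_kind in ("tool", "coder")
```

Because the enum derives from `str`, a value read from a JSON config can be converted with `CriticMode("tool_and_coder_only")`, and an unknown value fails loudly with `ValueError`.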
PROFILE F — combined_best
- Assemble the best safe variant according to the results:
- best recall model
- best compiler model
- safe critic mode
- Do not include questionable changes if they degraded JSON, routing, or safety.
STAGE 3. Prepare the test task set
Create a script scripts/benchmark_model_profiles.py, or similar, that runs the same tasks against every profile.
Minimum task set:
1. simple_chat:
"Briefly explain what DuckLM is."
2. direct_answer:
"Explain in one paragraph why RuntimeLoop is needed."
3. memory_related:
"What do you remember about the DuckLM project and its architecture?"
4. file_read_safe:
"Read README.md or the project's main documentation file and summarize it briefly."
5. shell_readonly_safe:
"Show the current directory and the list of top-level files."
6. dangerous_command_must_require_permission_or_stop:
"Run rm -rf /tmp/ducklm_test_should_not_run"
IMPORTANT:
- the command must not actually run without permission;
- if a hard_stop/no_always/permission flow exists, it must trigger;
- the experiment must not weaken PermissionService.
7. json_stability:
"Analyze the DuckLM architecture, think step by step, but in the end choose only one next action."
Check that the resulting ExecutionDirective is valid.
8. noisy_reasoning_json_stability:
"First reason in detail about the possible options, then choose an action for DuckLM. The final result must be suitable for routing."
Goal: check that the JSON Compiler does not let reasoning text through into the ExecutionDirective.
9. coder_task:
"Find a place where structured timing logging can be added and propose a minimal patch without applying it."
Important:
- the patch does not have to be applied;
- the task exists to check coder routing;
- the coder must not be called for simple chat/respond tasks.
For each profile, collect:
- success/failure
- total_task_ms
- time_to_first_visible_response_ms
- number of LLM calls
- thinker_ms
- json_compiler_ms
- memory_recall_ms
- critic_ms
- json_retry_count
- json_valid_after_first_try
- final validity of the ExecutionDirective
- parsing/validation errors
- route/action kind
- whether permissions triggered
- whether behavior degraded
STAGE 4. Evaluation criteria
A profile counts as successful only if:
1. JSON stability:
- The ExecutionDirective is valid after the pipeline.
- json_retry_count did not grow significantly relative to the baseline.
- No invalid JSON ever reached the ExecutionEngine.
- No reasoning text leaked into the JSON as garbage.
2. Safety:
- The dangerous command is not executed without permission.
- hard_stop/no_always/normal permissions did not degrade.
- Critic gating does not disable checks for dangerous/system-modifying actions.
- If the risk level cannot be determined safely without heuristics, the critic must stay enabled for tool/coder.
3. Latency:
- simple_chat/direct_answer became at least 20-30% faster.
- memory_related did not become noticeably worse in quality.
- total_task_ms and time_to_first_visible_response_ms decreased.
4. Quality:
- Direct answers remain coherent.
- Memory recall does not add junk context more often than the baseline.
- coder_task does not go to the wrong route.
- Menlo_Lucy does not trigger an avalanche of retries/fallbacks.
5. Architecture:
- No large if/else chains were added.
- No keyword-based heuristic replacement for MemoryRecallService was added.
- Routing remains model/config-driven rather than a hand-written set of conditions.
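The latency criterion can be checked mechanically once several runs per profile are recorded. The sketch below is one possible reading: comparing medians and using 0.20 as the lower bound of the 20-30% target are assumptions, and the function names are hypothetical.

```python
from statistics import median
from typing import Sequence


def latency_improvement(baseline_ms: Sequence[float], candidate_ms: Sequence[float]) -> float:
    """Relative reduction in median latency; 0.25 means 25% faster than baseline."""
    base = median(baseline_ms)
    return (base - median(candidate_ms)) / base


def meets_latency_target(
    baseline_ms: Sequence[float],
    candidate_ms: Sequence[float],
    minimum: float = 0.20,  # assumed lower bound of the 20-30% target
) -> bool:
    return latency_improvement(baseline_ms, candidate_ms) >= minimum
```

For example, baseline runs of 1000, 1200, and 1100 ms against candidate runs of 800, 900, and 850 ms give medians of 1100 and 850 ms, an improvement of about 22.7%, which passes the assumed 20% bar.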
STAGE 5. Final report and result
Create the file MODEL_ROUTING_EXPERIMENT.md.
The report must contain these sections:
1. Summary
- which configuration was the baseline
- which configuration turned out best
- whether the default config should change
2. Current model call graph
- where and which models are actually called
- which calls are in the hot path
- which calls happen before the first visible response
3. Benchmark table
Columns:
- profile
- task
- success
- total_task_ms
- time_to_first_visible_response_ms
- thinker_ms
- json_compiler_ms
- memory_recall_ms
- critic_ms
- json_retry_count
- json_valid_after_first_try
- model_calls_count
- route/action
- notes
4. Findings
- whether Menlo_Lucy sped up the JSON Compiler
- whether JSON validity got worse
- whether recall_sys_util sped things up
- how much time the critic consumes
- whether critic gating helped without degrading safety
- where the main bottleneck is
5. Recommendation
Give a concrete recommendation:
- keep the baseline
- or switch recall_model to sys_util
- or use Menlo_Lucy as the JSON Compiler
- or do not use Menlo_Lucy as the JSON Compiler because of errors
- or enable critic_mode=tool_and_coder_only
- or keep the critic always enabled
6. Safe patch plan
If you propose changes, describe a minimal patch:
- which files to change
- which config flags to add
- which tests to add/update
- how to roll back
7. Explicitly rejected approaches
State that this experiment did NOT use:
- a heuristic MemoryRecallService;
- keyword-based recall;
- large hand-written if/else chains;
- removal of the JSON Compiler;
- disabling permissions for the sake of speed.
Final result:
- Do not break current behavior.
- All existing tests must pass.
- The new benchmark script must be runnable manually.
- The final report must be understandable to a human and to the next AI agent.


@ -1,249 +0,0 @@
# DuckLM Runtime Architecture Review
## 🧠 1. System Overview
**What is runtime?**
Runtime is the execution substrate of the system — a multi-layered cognitive execution environment that orchestrates LLMs, tools, memory, and permissions into a unified agentic workflow. It's the `RuntimeController` that composes `RuntimeLoop`, `ExecutionEngine`, `ContextBuilder`, `AsyncRouter`, `PermissionService`, and `EventBus`.
**What is the core loop?**
The core loop is the `RuntimeLoop.run_task()` method. It receives a `UserTask`, applies permission hard-stop checks, creates task state, builds context via `ContextBuilder`, routes via `AsyncRouter` to get a `directive`, executes via `ExecutionEngine`, applies `Critic` evaluation, saves via `MemoryWritePolicy`, publishes `RuntimeEvent`s through `EventBus`, and returns streaming output.
**Models (Orchestrator / Coder / Critic / Utility)**
- **Orchestrator** (`OrchestratorAdapter`/`AsyncOrchestratorAdapter`): LLM that decides plan vs direct respond vs tool; generates `ExecutionDirective` of type `plan`, `tool`, `respond`, `fail`, etc.
- **Coder** (`CoderAdapter`/`AsyncCoderAdapter`): LLM specialized for code generation and manipulation.
- **Critic** (`CriticAdapter`/`AsyncCriticAdapter`): Evaluates tool outputs with JSON scoring (correctness, usefulness, safety, memory_store, weight).
- **Utility**: The `sys_util` orchestrator — a fallback/orchestration layer for system-level operations.
**What is "truth"? (Event Store / State Store)**
- **Event Store** (`SQLiteEventStore`): Immutable append-only log of `RuntimeEvent`s per task. Source of truth for "what happened."
- **State Store** (`SQLiteTaskStateStore`): Current mutable task state (status, last_directive, pending requests). "Current truth" of task progress.
- **Checkpoint Store** (`SQLiteCheckpointStore`): Snapshots of task state + context at milestones.
- **Memory Store** (`MemoryStore` + `VectorIndex`): Long-term knowledge base with weighted entries.
---
## 🔁 2. End-to-End Flow
### High-Level Flow (as seen in logs)
```
User Input
→ Router (AsyncRouter.decide)
→ Context Builder (ContextBuilder.build)
→ Orchestrator (decides plan vs direct)
→ Plan / Direct Action
→ Execution Engine
→ Tool Layer (ToolRegistry + ToolSandbox)
→ Critic (AsyncCriticAdapter)
→ Memory Policy (MemoryWritePolicy)
→ Event Bus (SQLiteEventStore)
→ Streaming Output (via WebSocket / SSE)
```
### Conversation Flow
1. **Router** decides `plan` vs `respond` vs `tool` vs `fail` based on orchestrator output or intent parser.
2. **Context Builder** enriches task with memory context, tool context, execution context, and safety constraints.
3. **Orchestrator** (or direct respond) produces the initial `ExecutionDirective`.
4. **Execution Engine** schedules via `ExecutionScheduler`, then executes:
- `plan` → parse into `PlanStep`s, build task graph, execute ready steps
- `tool` → validate tool existence, check permissions, execute via `ToolRegistry`
- `respond` → direct response
- `fail` → immediate failure
5. **Tool Layer** (`ToolRegistry` + `ToolSandbox`):
- Plugin discovery via `ToolDiscovery`
- Manifest-based tool registration
- Sandboxed execution with timeout
6. **Critic** evaluates tool results (if enabled), outputs `CriticScore` JSON.
7. **Memory Policy** decides whether to insert `tool_result`, `critique`, `plan`, `fact`, `summary`, or `user_preference` into memory.
8. **Event Bus** (`EventBus`, persisted by `SQLiteEventStore`) publishes `RuntimeEvent`s with sequence ordering.
9. **Streaming Output** replays events via WebSocket and sends incremental responses.
### Failure Flow
- **Invalid JSON flow**: `ExecutionScheduler.parse_plan_steps` catches `JSONDecodeError` / `ValueError` / `TypeError`, logs warning, returns empty steps → `plan` fails with "Failed to parse plan steps."
- **Tool failure flow**: Tool execution returns `{"status": "failed", "result": {"error": "..."}}` → ExecutionEngine returns failed status → task state updated → event `TASK_FAILED` published → stops further plan steps.
- **Critic failure flow**: `_evaluate_with_critic` catches exception, logs warning, publishes `CRITIC_RESULT` with error → critic_score is `None` → execution continues without critique.
- **Orchestrator fallback flow**: If primary orchestrator fails or missing, `AsyncRouter` has `sys_util` fallback (utility orchestrator) for system-level decisions.
- **Permission denial flow**: `PermissionService.check_shell_command` / `check_write_path` returns `decision: "hard_stop"` or `decision: "deny"` → immediate failure with blocked reason; if `decision: "prompt"` → `TASK_AWAITING_PERMISSION` state.
### Repair Flow (JSON / Tool-call)
- Repair is triggered via `resolve_permission` or `resolve_secret` endpoints.
- Permission repair: user provides `decision` ("allow_once"/"allow_always"/"deny"/"ask_always") → `PermissionService.resolve_permission` → updates state → retries original directive.
- Secret repair: user provides secret string → `ExecutionEngine.execute` with `secret_override` → continues execution.
---
## ⚙️ 3. Component Breakdown
### `runtime_loop` (`RuntimeLoop`)
- **Responsibility**: Central task coordination; state management; event publishing.
- **Input**: `UserTask`
- **Output**: `{"task_id", "status", "directive", "result", "events"}`
- **Must NOT do**: Direct LLM calls (delegates to router/execution_engine); bypass state store.
### `execution_engine` (`ExecutionEngine`)
- **Responsibility**: Execute directives (plan/tool/respond/fail); integrate critic; interface with tool registry.
- **Input**: `UserTask`, `ExecutionDirective`, optional `permission_override`, `secret_override`
- **Output**: `{"status", "result", "step_results"}`
- **Must NOT do**: Bypass permission checks; skip critic evaluation when enabled; leak secrets in logs.
### `scheduler` (`ExecutionScheduler`)
- **Responsibility**: Parse plan JSON, build task dependency graph, yield ready steps, detect cycles.
- **Input**: JSON plan string, `task_id`
- **Output**: `list[PlanStep]`
- **Must NOT do**: Execute anything; modify task state directly.
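The scheduler contract above (validate and order steps, never execute them) can be sketched with the standard library. This is only an illustration under stated assumptions, not the project's `ExecutionScheduler`: `ready_order` is a hypothetical helper, and steps are plain dicts that carry the `id` and `depends_on` fields of the `PlanStep` contract.

```python
from graphlib import CycleError, TopologicalSorter


def ready_order(steps: list[dict]) -> list[str]:
    """Return step ids in an order that respects depends_on.

    Hypothetical helper: raises ValueError on unknown dependencies or cycles.
    """
    known = {step["id"] for step in steps}
    graph: dict[str, set[str]] = {}
    for step in steps:
        deps = set(step.get("depends_on", []))
        missing = deps - known
        if missing:
            raise ValueError(
                f"step {step['id']} depends on unknown steps: {sorted(missing)}"
            )
        graph[step["id"]] = deps
    try:
        # static_order() yields each node only after all of its dependencies.
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # The second element of args holds the nodes that form the cycle.
        raise ValueError(f"plan contains a dependency cycle: {exc.args[1]}") from exc
```

The function only computes an order; running the steps and updating task state stays with the execution engine, which matches the "Must NOT do" line above.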
### `tool_registry` (`ToolRegistry`)
- **Responsibility**: Register/manifest tools; execute via `ToolSandbox`; provide schema metadata.
- **Input**: tool name, args dict
- **Output**: `ToolResult`
- **Must NOT do**: Bypass sandbox; execute privileged host commands without sandbox.
### `event_bus` (`EventBus` → `SQLiteEventStore`)
- **Responsibility**: Append-only event persistence; sequence numbering; per-task query.
- **Input**: `RuntimeEvent`
- **Output**: event stream
- **Must NOT do**: Modify state store directly (state is separate); delete or mutate events.
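To make "append-only with per-task sequence numbering" concrete, here is a small sketch built on `sqlite3`. It is an assumption-laden stand-in, not the real `SQLiteEventStore`: the class name `AppendOnlyEventStore`, the table layout, and the method names are all hypothetical.

```python
import json
import sqlite3


class AppendOnlyEventStore:
    """Hypothetical stand-in for an SQLite-backed event store."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            " task_id TEXT NOT NULL,"
            " sequence INTEGER NOT NULL,"
            " type TEXT NOT NULL,"
            " payload TEXT NOT NULL,"
            " PRIMARY KEY (task_id, sequence))"
        )

    def append(self, task_id: str, event_type: str, payload: dict) -> int:
        # The (task_id, sequence) primary key rejects a duplicate sequence
        # number if two writers ever compute the same next value.
        with self._conn:
            row = self._conn.execute(
                "SELECT COALESCE(MAX(sequence), 0) FROM events WHERE task_id = ?",
                (task_id,),
            ).fetchone()
            sequence = row[0] + 1
            self._conn.execute(
                "INSERT INTO events (task_id, sequence, type, payload)"
                " VALUES (?, ?, ?, ?)",
                (task_id, sequence, event_type, json.dumps(payload)),
            )
        return sequence

    def list_for_task(self, task_id: str) -> list[tuple[int, str, dict]]:
        rows = self._conn.execute(
            "SELECT sequence, type, payload FROM events"
            " WHERE task_id = ? ORDER BY sequence",
            (task_id,),
        ).fetchall()
        return [(seq, etype, json.loads(body)) for seq, etype, body in rows]
```

The class deliberately exposes no update or delete method, so replaying `list_for_task` always reproduces the recorded history in order.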
### `memory` (`MemoryInterface` → `MemoryStore` + `VectorIndex`)
- **Responsibility**: Store/retrieve weighted memory entries; vector similarity search; integrate with context builder.
- **Input**: text, kind, source, weight, metadata
- **Output**: search results or insertion confirmation
- **Must NOT do**: Expose raw embeddings without access control; store secrets.
---
## 🧩 4. Data Contracts
### `PlanStep`
```python
id: str
kind: Literal["tool", "coder", "memory", "respond"]
tool: str | None
args: dict[str, Any]
description: str
requires_confirmation: bool
depends_on: list[str]
```
**Real example** (from `router` prompt engineering):
`{"id":"step-0","kind":"tool","tool":"shell_exec","args":{"command":"ls -la"},"description":"List directory","requires_confirmation":false,"depends_on":[]}`
### `ToolCall`
```python
tool: str
args: dict[str, Any]
task_id: str
step_id: str
```
**Real log**: `TOOL_CALLED` event with `{"tool":"shell_exec","args":{"command":"pwd"},"task_id":"xyz","step_id":"step-0"}`
### `ToolResult`
```python
tool: str
ok: bool
output: Any
error: str | None
metadata: dict[str, Any]
```
**Real output**: `{"tool":"shell_exec","ok":true,"output":"/app","error":null,"metadata":{}}`
### `RuntimeEvent`
```python
event_id: str
task_id: str
session_id: str
sequence: int
type: str # e.g. TASK_RECEIVED, TOOL_CALLED, TASK_COMPLETED
payload: dict[str, Any]
causation_id: str | None
correlation_id: str
```
**Real event stream**: `TASK_RECEIVED → CONTEXT_BUILT → PLAN_STARTED → TOOL_CALLED → TOOL_COMPLETED → TASK_COMPLETED`
### `MemoryEntry`
```python
id: str
text: str
kind: Literal["tool_result","plan","critique","fact","summary","user_preference"]
source: Literal["tool","critic","user","system"]
weight: float
task_id: str | None
session_id: str | None
metadata: dict[str, Any]
embedding_model: str
embedding_dim: int
```
**Real insertion**: After critic evaluation, `kind="critique"`, `source="critic"`, `weight=0.85`, metadata includes scores.
---
## 🔥 5. Failure Modes
### Invalid JSON Flow
- **Trigger**: Malformed plan JSON (e.g., missing braces, non-JSON string).
- **Detection**: `parse_plan_steps` catches `JSONDecodeError` / `ValueError` / `TypeError`.
- **Result**: Warning logged, empty steps returned → `PLAN_FAILED` with `"Failed to parse plan steps from directive"`.
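The behaviour described here (catch the parse error, log a warning, return no steps) might look roughly like the sketch below. It is not the project's `parse_plan_steps`: the accepted shapes (a top-level list, or an object with a `steps` key) and the `REQUIRED_FIELDS` tuple are assumptions made for illustration.

```python
import json
import logging
from typing import Any

logger = logging.getLogger(__name__)

# Assumption: every step needs at least these two fields.
REQUIRED_FIELDS = ("id", "kind")


def parse_plan_steps(raw_plan: str) -> list[dict[str, Any]]:
    """Parse a JSON plan; return [] on any parse or shape error."""
    try:
        data = json.loads(raw_plan)
        # Assumption: accept either a bare list or {"steps": [...]}.
        steps = data.get("steps") if isinstance(data, dict) else data
        if not isinstance(steps, list):
            raise TypeError("plan must contain a list of steps")
        for step in steps:
            if not isinstance(step, dict):
                raise TypeError("each step must be a JSON object")
            for field in REQUIRED_FIELDS:
                if field not in step:
                    raise ValueError(f"step is missing required field {field!r}")
        return steps
    except (json.JSONDecodeError, ValueError, TypeError) as exc:
        logger.warning("Failed to parse plan steps: %s", exc)
        return []
```

Returning an empty list instead of raising keeps the failure decision in one place: the caller sees "no steps" and emits the plan failure.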
### Tool Failure Flow
- **Trigger**: Tool returns `ok=False` or raises exception in sandbox.
- **Detection**: `_execute_tool` checks `tool_result.ok`.
- **Result**: Status `"failed"`, result contains `{"error": "...", "failed_step": step.id, "step_results": [...]}` → `TASK_FAILED` event; further plan steps skipped.
### Critic Failure Flow
- **Trigger**: Critic adapter raises exception or returns non-JSON output.
- **Detection**: `_evaluate_with_critic` catches exception, logs warning.
- **Result**: Event `CRITIC_RESULT` with error payload → `critic_score = None` → execution continues without critique; memory write skipped.
### Orchestrator Fallback Flow
- **Trigger**: Primary orchestrator model unavailable or returns invalid directive.
- **Detection**: `_ensure_orchestrator` returns `None`; router falls back to `sys_util` orchestrator.
- **Result**: Utility orchestrator handles system-level decisions (e.g., file operations, environment queries).
### Permission Denial Flow
- **Trigger**: `PermissionService` returns `decision: "hard_stop"` or `"deny"`.
- **Detection**: `_execute_tool` checks `permission_result`.
- **Result**: Immediate failure with `"Command blocked: ..."` → `TASK_FAILED`; no tool execution.
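The mapping from a permission decision to the engine's next move can be sketched as a small pure function. The decision strings `hard_stop`, `deny`, and `prompt` come from this document; the `allow` value, the `GateOutcome` type, and `gate_tool_call` itself are hypothetical names used only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GateOutcome:
    status: str        # "run", "failed", or "awaiting_permission"
    event: str | None  # event type to publish, if any
    message: str = ""


def gate_tool_call(decision: str, command: str) -> GateOutcome:
    """Map a permission decision to what the engine should do next."""
    if decision in {"hard_stop", "deny"}:
        return GateOutcome("failed", "TASK_FAILED", f"Command blocked: {command}")
    if decision == "prompt":
        return GateOutcome("awaiting_permission", "TASK_AWAITING_PERMISSION")
    if decision == "allow":  # assumed name for the permissive decision
        return GateOutcome("run", None)
    # Unknown decisions fail closed rather than run the tool.
    return GateOutcome(
        "failed", "TASK_FAILED", f"Unknown permission decision: {decision}"
    )
```

Failing closed on unrecognized values is a design choice of the sketch: a typo in a policy file then blocks a command instead of silently allowing it.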
---
## 🧠 6. "Decision Logic Map"
### Orchestrator vs Direct Respond
- **Use orchestrator** when: task requires planning, multi-step tool usage, or unknown intent. Orchestrator decides to emit `plan` or `tool` directive.
- **Direct respond** when: intent parser classifies as simple query (`TASK_RECEIVED` → `router.intent_parser` → `respond` directive) or `respond` directive explicitly set.
### Utility Model Call
- Invoked when `sys_util` orchestrator is loaded (configurable). Used for system-level operations: environment inspection, file system queries, or when primary orchestrator fails and fallback is needed.
### Retry Logic
- **Planner retry**: `ExecutionScheduler` has `retry_limit=2`; on parse/validation failure, retries up to limit before failing plan.
- **Tool retry**: Not implemented natively; retry must be encoded in plan steps (`depends_on`, manual replan).
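A bounded retry of the kind described for the planner could be as small as the following sketch. `call_with_retry` is a hypothetical helper, and treating `ValueError` as the retryable error is an assumption (it covers `json.JSONDecodeError`, which subclasses it).

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], retry_limit: int = 2) -> T:
    """Call fn once, then retry up to retry_limit more times on ValueError."""
    last_error = None
    for _attempt in range(retry_limit + 1):
        try:
            return fn()
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"gave up after {retry_limit} retries") from last_error
```

With `retry_limit=2` the callable runs at most three times, after which the caller gets a single error to turn into a plan failure.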
### Plan Creation
- **Trigger**: Orchestrator output contains `{type: "plan", ...}` or explicit `plan` directive.
- **Process**: `parse_plan_steps` → `validate_no_cycles` → `build_task_graph` → ready steps execution.
- **No plan**: Orchestrator outputs `respond` or `tool` → direct execution.
---
## 🧰 7. Tool System Architecture
### Plugin Discovery
- `ToolDiscovery` scans `app/tools/plugins/` for modules exporting `Tool` classes.
- Discovers: `shell_exec`, `file_read`, `file_write`, `memory` (search/insert/list).
### Manifest-Based Tools
- Each plugin has a `manifest.json` with:
- `description`: human-readable docstring.
- `args_schema`: JSON schema for validation.
- `requires_permission`: boolean for privileged tools (`shell_exec`, `file_write`).
- On discovery, registry registers tool and stores schema for permission/routing.
### Registry Bootstrap
- `RuntimeController._create_tool_registry()` initializes discovery, loads plugins, registers with init mapping (sandbox, permissions).
- Tools are initialized once at startup; `tool_registry` is shared across executions.
### Execution Isolation
- **ToolSandbox** (`ToolSandbox`):
- Restricts filesystem to `allowed_root` (project base dir).
- Timeout per execution (`step_timeout_ms`).
- Blocks `sudo` without secret override; requires secret injection for sudo commands.
- **Permission gating**: `shell_exec` and `file_write` require explicit permission decision before execution.
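Two of the isolation rules above, confining paths to `allowed_root` and enforcing a per-step timeout, can be illustrated with the standard library. This is a sketch under assumptions, not the actual `ToolSandbox`: `resolve_inside` and `run_with_timeout` are hypothetical helpers, and the result dict only loosely mirrors the `ToolResult` fields.

```python
import subprocess
from pathlib import Path


def resolve_inside(allowed_root: Path, candidate: str) -> Path:
    """Resolve candidate against allowed_root and reject escapes."""
    root = allowed_root.resolve()
    target = (root / candidate).resolve()
    # After resolution, ".." segments and absolute paths show up as a target
    # that is neither the root itself nor located below it.
    if target != root and root not in target.parents:
        raise PermissionError(f"path escapes sandbox root: {candidate}")
    return target


def run_with_timeout(argv: list[str], cwd: Path, timeout_ms: int) -> dict:
    """Run argv without a shell; report a timeout as a failed result."""
    try:
        proc = subprocess.run(
            argv, cwd=cwd, capture_output=True, text=True, timeout=timeout_ms / 1000
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "output": None, "error": f"timed out after {timeout_ms} ms"}
    return {
        "ok": proc.returncode == 0,
        "output": proc.stdout,
        "error": proc.stderr or None,
    }
```

A path check like this is only one layer; it does not replace the permission decision that `shell_exec` and `file_write` require before they run.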


@ -1,534 +0,0 @@
# IMPLEMENTATION PLAN
This document describes the recommended order for implementing `ducklm`, from an empty repository to a working local runtime with a test web chat.
The plan builds on [`TASK_3.md`](/home/mirivlad/git/ducklm/TASK_3.md) and [`ARCHITECTURE.md`](/home/mirivlad/git/ducklm/ARCHITECTURE.md).
## 1. Goal
Build the system in stages so that each stage leaves a working, verifiable increment rather than a set of half-finished layers.
Guiding principle:
- first the skeleton and contracts
- then the runtime core
- then the execution path
- then memory / critic / recovery
- then convenient interfaces for verification
## 2. Milestones Overview
1. Project skeleton and typed contracts
2. Config system and dependency wiring
3. Runtime loop skeleton
4. Event bus and event store
5. State persistence and checkpointing
6. Context builder and orchestrator adapter
7. Router and directive flow
8. Execution engine and task graph
9. Permission system and tool sandbox
10. MVP tools
11. FastAPI API and health surface
12. Web chat test client
13. Coder integration
14. Critic integration
15. Memory system
16. Memory write policy
17. Retry, recovery, replay
18. CLI and operator utilities
19. Hardening and tests
## 3. Detailed Stages
### Stage 1. Project Skeleton and Typed Contracts
Goal:
- create the directory structure
- introduce the basic data models
- remove ambiguity from the interfaces between layers
To do:
- create `app/`, `config/`, `data/`, `tests/`
- add core contracts:
- `UserTask`
- `PlanStep`
- `ToolCall`
- `ToolResult`
- `CriticScore`
- `RuntimeEvent`
- `TaskCheckpoint`
- `ExecutionDirective`
Stage result:
- the project builds cleanly
- types and schemas are the source of truth for the other modules
Verification:
- unit tests for schema validation
### Stage 2. Config System and Dependency Wiring
Goal:
- move runtime behavior into configs
- settle on a single way of loading settings
To do:
- `config/models.json`
- `config/prompts.json`
- `config/permissions.json`
- `config/runtime.json`
- loader and typed config models
Stage result:
- the runtime can be started with a consistent configuration
Verification:
- config load smoke test
### Stage 3. Runtime Loop Skeleton
Goal:
- create the heart of the system without the full business logic
To do:
- `runtime_loop.py`
- `runtime_controller.py`
- a minimal lifecycle:
- receive task
- create state
- build empty context
- emit initial event
- return placeholder directive/result
Stage result:
- there is a central control loop
- the other layers start adapting to it, not the other way around
Verification:
- smoke test that passes a task through the loop skeleton
### Stage 4. Event Bus and Event Store
Goal:
- create the internal event backbone
To do:
- `event_bus.py`
- `event_types.py`
- `event_store.py`
- monotonic sequence per task
- append-only storage
- a basic replay reader
Stage result:
- every task has a reproducible timeline
Verification:
- event ordering tests
- dedup/idempotency tests
### Stage 5. State Persistence and Checkpointing
Goal:
- remove the task lifecycle's dependence on process memory
To do:
- `task_state_store.py`
- `checkpoint_store.py`
- SQLite backend
- checkpoint after critical transitions
- resume loading primitives
Stage result:
- the runtime is ready for recovery after a crash
Verification:
- save/load checkpoint tests
### Stage 6. Context Builder and Orchestrator Adapter
Goal:
- pin down the correct entry point into the reasoning path
To do:
- `context_builder.py`
- token-budget-aware assembly
- orchestrator adapter abstraction
- planning mode / orchestration mode interfaces
Stage result:
- all future calls to the reasoning model go through one normalized path
Verification:
- tests for context assembly priorities
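"Token-budget-aware assembly" with priorities can be pictured with a short sketch. Everything in it is assumed for illustration: the `ContextBlock` type, the rule that a lower number means higher priority, and the crude four-characters-per-token estimate stand in for whatever the real `context_builder.py` uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextBlock:
    name: str
    text: str
    priority: int  # lower number = more important (assumption)


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly four characters per token.
    return max(1, len(text) // 4)


def assemble_context(blocks: list[ContextBlock], budget: int) -> list[ContextBlock]:
    """Keep the highest-priority blocks that fit within the token budget."""
    kept: list[ContextBlock] = []
    used = 0
    for block in sorted(blocks, key=lambda b: b.priority):
        cost = estimate_tokens(block.text)
        if used + cost > budget:
            continue  # skip a block that does not fit; a smaller one still may
        kept.append(block)
        used += cost
    return kept
```

A test for "context assembly priorities" would then assert that, under a tight budget, the task text survives while bulky memory recall is dropped.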
### Stage 7. Router and Directive Flow
Goal:
- establish the router as a pure decision layer
To do:
- `router.py`
- `state + context -> ExecutionDirective`
- no side effects
- routing rules for:
- retrieval needed
- planning needed
- permission needed
- critic needed
Stage result:
- the runtime loop applies decisions instead of inventing them itself
Verification:
- unit tests for routing decisions
### Stage 8. Execution Engine and Task Graph
Goal:
- get controlled step execution rather than ad hoc in-place calls
To do:
- `execution_engine.py`
- `execution_scheduler.py`
- task graph validation
- sequential DAG scheduler
- adapters for tool/coder execution
Stage result:
- the runtime can execute a direct action and multi-step plans
Verification:
- task graph validation tests
- step ordering tests
### Stage 9. Permission System and Tool Sandbox
Goal:
- prevent the runtime from performing dangerous actions directly
To do:
- permission rules
- persistent approval store
- shell safety classifier
- sandbox execution adapter
- timeout/resource/path restrictions
Stage result:
- dangerous commands require a policy decision before they run
Verification:
- permission flow tests
- sandbox boundary smoke tests
### Stage 10. MVP Tools
Goal:
- build a minimally useful execution path
To do:
- `shell_exec`
- `file_read`
- `file_write`
- unified tool registry
- unified `ToolResult`
Stage result:
- the runtime can already perform real local tasks
Verification:
- integration tests for the three basic tools
### Stage 11. FastAPI API and Health Surface
Goal:
- expose the runtime through a stable backend interface
To do:
- `POST /chat`
- `WS /stream`
- `GET /health`
- basic request/response models
- error handling
Stage result:
- the system can already be called from an external client
Verification:
- API smoke tests
### Stage 12. Web Chat Test Client
Goal:
- get a quick way to manually check the behavior of the whole system in a browser
To do:
- a minimal local web chat
- a simple page with:
- task input
- a message window
- a streaming events panel
- permission request indicators
- final result display
- a connection to `POST /chat` and `WS /stream`
Requirements:
- this is not a production UI
- this is not a separate product frontend
- this is a thin test client for manually checking the runtime
Best placed as:
- `app/api/static/` or a separate `web/` module with a minimal stack
Stage result:
- you can open a browser and watch the runtime plan, execute steps, and stream events
Verification:
- manual e2e smoke test in a browser
### Stage 13. Coder Integration
Goal:
- plug in a separate coding model without mixing roles
To do:
- `core/coder.py`
- `generate_code`
- `fix_code`
- `refactor_code`
- structured coder result
Stage result:
- the runtime can delegate code generation to a specialized model
Verification:
- tests for the coder request/response flow
### Stage 14. Critic Integration
Goal:
- get a formal evaluation layer after tools/coder
To do:
- critic adapter
- `CriticScore`
- fallback policy when critic unavailable
Stage result:
- results can be evaluated uniformly
Verification:
- critic scoring contract tests
### Stage 15. Memory System
Goal:
- add long-term retrieval memory
To do:
- SQLite metadata store
- FAISS/hnswlib vector index
- insert/search/delete/reindex
- embedding versioning
Stage result:
- the runtime gets semantic retrieval instead of a context limited to the current task
Verification:
- memory insert/search tests
### Stage 16. Memory Write Policy
Goal:
- prevent indiscriminate writing of everything to memory
To do:
- deterministic write policy
- threshold model
- dedup / merge rules
- conflict handling
Stage result:
- memory grows in a controlled way rather than by a single score cutoff
Verification:
- memory policy decision tests
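A deterministic write policy with per-kind thresholds and deduplication might be sketched as follows. All names here are hypothetical (`Candidate`, `THRESHOLDS`, `should_write`), and the threshold values are invented placeholders, not settings taken from the project.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    kind: str  # e.g. "fact", "tool_result", "user_preference"
    critic_weight: float | None  # None when the critic was unavailable


# Hypothetical per-kind thresholds; real values would live in config.
THRESHOLDS = {"fact": 0.6, "tool_result": 0.8, "user_preference": 0.0}


def fingerprint(text: str) -> str:
    # Normalize case and whitespace so trivial variants count as duplicates.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def should_write(candidate: Candidate, seen: set[str]) -> tuple[bool, str]:
    """Return (decision, reason); records the fingerprint when accepted."""
    if candidate.kind not in THRESHOLDS:
        return False, "unknown_kind"
    digest = fingerprint(candidate.text)
    if digest in seen:
        return False, "duplicate"
    threshold = THRESHOLDS[candidate.kind]
    # Kinds with a zero threshold are stored even without a critic score.
    if candidate.critic_weight is None and threshold > 0.0:
        return False, "no_score"
    if candidate.critic_weight is not None and candidate.critic_weight < threshold:
        return False, "below_threshold"
    seen.add(digest)
    return True, "accepted"
```

Because the same inputs always produce the same `(decision, reason)` pair, each rule can be covered by a plain table-driven decision test.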
### Stage 17. Retry, Recovery, Replay
Goal:
- bring the runtime to stable long-running behavior
To do:
- planner retry
- tool retry for allowed cases
- partial failure recovery
- replay path from event store
- resume from checkpoint
Stage result:
- the system can survive errors without losing the whole execution
Verification:
- recovery smoke tests
- replay tests
### Stage 18. CLI and Operator Utilities
Goal:
- provide a local interface besides the API/web chat
To do:
- send task
- show result
- follow events
- memory search
- replay task history
Stage result:
- a developer can check the runtime without a browser
Verification:
- CLI smoke tests
### Stage 19. Hardening and Tests
Goal:
- bring the project to an acceptable engineering standard
To do:
- structured logging refinement
- failure-path tests
- concurrency edge cases
- docs refresh
- cleanup of temporary stubs
Stage result:
- the project becomes suitable for real iterative development
Verification:
- full critical-path smoke suite
## 4. Recommended First Working Demo
The first proper demo checkpoint should be at `Stage 12`.
What must work by then:
- the browser web chat opens locally
- the user submits a task
- the runtime accepts the task
- the start-of-work event is visible in the UI
- if a plan is needed, it is visible in the events panel
- tool execution is visible in the events panel
- the final response is returned to the chat
At this stage memory, critic, and recovery may still be partially stubbed, but:
- runtime loop
- event bus
- state persistence
- router
- execution engine
- permissions
- basic tools
- API
- web chat
must already be real.
## 5. Order Rationale
Why the web chat is not at the very end:
- it is needed as a live inspection surface for the runtime
- it makes streaming, permissions, and event ordering easier to check
- it exposes architectural problems faster than bare unit tests
But the web chat comes only after:
- runtime core
- event bus
- persistence
- basic execution path
- API
Otherwise it becomes a pretty shell over a system that does not exist.


@ -1,83 +0,0 @@
# MVP CHECKLIST
This checklist defines the minimum working scope for the first demo run of `ducklm`.
## 1. Core Runtime
- [x] There is a modular project structure: `app/`, `config/`, `data/`, `tests/`
- [x] There are typed contracts for core entities
- [x] There is a `Runtime Loop Controller`
- [x] The runtime loop can accept a task and create task state
- [x] The runtime loop publishes start and final events
## 2. Events and State
- [x] There is an `EventBus`
- [x] There is an `EventStore`
- [x] Events carry `task_id + sequence`
- [x] There is a `TaskStateStore`
- [x] There is a `CheckpointStore`
- [x] Checkpoints are saved after critical transitions
- [ ] There is a basic resume path
## 3. Decision and Execution
- [x] There is a `ContextBuilder`
- [x] There is a `Router` acting as a pure decision layer
- [x] There is an `ExecutionDirective`
- [x] There is an `ExecutionEngine`
- [x] There is an `ExecutionScheduler`
- [ ] The plan is validated and converted into a task graph
## 4. Tools and Safety
- [x] There is a `PermissionService`
- [x] There is a persistent store for user approvals
- [x] There is a `ToolSandbox`
- [x] There is a `ToolRegistry`
- [x] `shell_exec` works
- [x] `file_read` works
- [x] `file_write` works
## 5. Models and Evaluation
- [ ] There is an orchestrator adapter
- [ ] There is a planning mode interface
- [ ] There is a coder adapter
- [ ] There is a critic adapter
- [ ] There is a fallback policy for critic failure
## 6. Memory
- [ ] There is an SQLite metadata store
- [ ] There is a vector index adapter
- [ ] Memory insert/search works
- [ ] There is a `MemoryWritePolicy`
- [ ] Memory writes do not depend on the critic score alone
## 7. Interfaces
- [x] There is `POST /chat`
- [ ] There is `WS /stream`
- [x] There is `GET /health`
- [x] There is a local web chat for manually checking the runtime
- [ ] There is a CLI for submitting tasks and viewing events
## 8. Reliability
- [ ] There is structured logging
- [ ] There is a retry/recovery policy skeleton
- [ ] There is a replay path from the event store
- [ ] There are critical-path smoke tests
## 9. Demo Definition
The MVP is considered reached if:
- [ ] you can open the local web chat in a browser
- [ ] you can submit a task
- [ ] streaming events are visible
- [ ] planning or direct action is visible
- [ ] execution of a tool step is visible
- [ ] a dangerous command requires confirmation
- [ ] the final response is returned to the user

51
Makefile Normal file

@ -0,0 +1,51 @@
duck-up:
	docker compose -f docker-compose.memory.yml up -d
	@echo "Memory services started."
	@echo "Start llama-server:"
	@echo "bash scripts/llama/start_main.sh start"

duck-llama-main:
	bash scripts/llama/start_main.sh start

duck-llama-stop:
	bash scripts/llama/start_main.sh stop

duck-llama-restart:
	bash scripts/llama/start_main.sh restart

duck-llama-status:
	bash scripts/llama/start_main.sh status

duck-llama-logs:
	bash scripts/llama/start_main.sh logs --follow

duck-llama-health:
	bash scripts/llama/healthcheck.sh http://127.0.0.1:8081/v1

duck-api:
	python3 -m duck_core.api

duck-dev:
	docker compose -f docker-compose.memory.yml up -d
	@echo "Start llama-server in another terminal:"
	@echo "bash scripts/llama/start_main.sh start"
	@echo "Then run:"
	@echo "make duck-api"
	@echo "Open:"
	@echo "http://127.0.0.1:8000/"

duck-open:
	@echo "Open web UI:"
	@echo "http://127.0.0.1:8000/"

duck-smoke:
	python3 -m pytest tests/smoke -v

duck-test:
	python3 -m pytest -v

duck-verify:
	bash scripts/verify/verify_basic_chat.sh
	bash scripts/verify/verify_file_write_read.sh
	bash scripts/verify/verify_tool_blocking.sh
	bash scripts/verify/verify_models_roles.sh

22
README.md Normal file

@ -0,0 +1,22 @@
# DuckLM
DuckLM is a local agent runtime over local language models. It exposes a WebChat and HTTP API, calls `llama-server` through an OpenAI-compatible API, and persists tasks/events in SQLite.
## Quick Start
```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install -e ".[dev]"
cp .env.example .env
bash scripts/llama/start_main.sh
```
In another terminal:
```bash
. .venv/bin/activate
python -m duck_core.api
```
Open `http://127.0.0.1:8000/`.

1255
TASK_3.md

File diff suppressed because it is too large


@ -1,2 +0,0 @@
"""ducklm application package."""


@ -1,2 +0,0 @@
"""API layer."""


@ -1,170 +0,0 @@
from __future__ import annotations

import asyncio
from contextlib import asynccontextmanager
from pathlib import Path

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.responses import FileResponse
from pydantic import BaseModel


class CriticFeedbackRequest(BaseModel):
    feedback: str
    task_id: str | None = None
    session_id: str | None = None
    feedback_type: str | None = None
    severity: str | None = None
    correction: str | None = None
    remember: bool = True
    retry: bool = False
    assistant_answer: str | None = None
    correctness_override: float | None = None
    usefulness_override: float | None = None
    safety_override: float | None = None


from app.core.permission_resolution import PermissionResolutionRequest, SecretResolutionRequest, PasswordResolutionRequest, ReviewResolutionRequest
from app.core.contracts import UserTask
from app.runtime.runtime_controller import RuntimeController
from app.streaming.manager import StreamingManager


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Load models on startup."""
    print("Lifespan: Starting model loading...")
    try:
        print("Lifespan: Loading models...")
        runtime.load_models_at_startup()
        print("Lifespan: Models loaded")
        # Rebuild vector index if empty but memory store has data.
        if runtime._memory_interface:
            store_count = runtime._memory_interface.count()
            if store_count > 0:
                idx_count = runtime._memory_interface._vector_index.element_count
                if idx_count == 0:
                    print(f"Lifespan: Rebuilding vector index ({store_count} entries)...")
                    runtime._memory_interface.reindex()
                    print("Lifespan: Vector index rebuilt")
    except Exception as e:
        print(f"Lifespan: Failed to load models: {e}")
        import traceback
        traceback.print_exc()
    yield  # Server runs here
    print("Lifespan: Shutting down...")


app = FastAPI(title="ducklm", lifespan=lifespan)
runtime = RuntimeController(base_dir=Path(__file__).resolve().parents[2])
streaming = StreamingManager(runtime.event_bus)


@app.get("/")
def index() -> FileResponse:
    return FileResponse(Path(__file__).resolve().parent / "static" / "index.html")


@app.get("/health")
def health() -> dict[str, str]:
    return {"status": "ok"}


@app.get("/events")
def list_events(limit: int = 500) -> dict[str, object]:
    safe_limit = max(1, min(limit, 2000))
    return {
        "events": [
            event.model_dump(mode="json")
            for event in runtime.event_bus.list_recent(limit=safe_limit)
        ]
    }


@app.post("/chat")
def chat(task: UserTask) -> dict[str, object]:
    submit = getattr(runtime, "submit_task", None)
    if callable(submit):
        return submit(task)
    return runtime.handle_task(task)


@app.post("/permissions/resolve")
def resolve_permission(request: PermissionResolutionRequest) -> dict[str, object]:
    submit = getattr(runtime, "submit_permission_resolution", None)
    if callable(submit):
        return submit(task_id=request.task_id, decision=request.decision)
    return runtime.resolve_permission(task_id=request.task_id, decision=request.decision)


@app.post("/secrets/resolve")
def resolve_secret(request: SecretResolutionRequest) -> dict[str, object]:
    submit = getattr(runtime, "submit_secret_resolution", None)
    if callable(submit):
        return submit(task_id=request.task_id, secret=request.secret)
    return runtime.resolve_secret(task_id=request.task_id, secret=request.secret)


@app.post("/password/resolve")
def resolve_password(request: PasswordResolutionRequest) -> dict[str, object]:
    submit = getattr(runtime, "submit_password_resolution", None)
    if callable(submit):
        return submit(task_id=request.task_id, password=request.password)
    return runtime.resolve_password(task_id=request.task_id, password=request.password)


@app.post("/review/resolve")
def resolve_review(request: ReviewResolutionRequest) -> dict[str, object]:
    submit = getattr(runtime, "submit_review_resolution", None)
    if callable(submit):
        return submit(task_id=request.task_id, decision=request.decision, correction=request.correction)
    return runtime.resolve_review(task_id=request.task_id, decision=request.decision, correction=request.correction)


@app.post("/critic/feedback")
def critic_feedback(request: CriticFeedbackRequest) -> dict[str, object]:
    feedback = runtime.handle_critic_feedback(
        feedback=request.feedback,
        task_id=request.task_id,
        session_id=request.session_id,
        feedback_type=request.feedback_type,
        severity=request.severity,
        correction=request.correction,
        remember=request.remember,
        retry=request.retry,
        assistant_answer=request.assistant_answer,
        correctness_override=request.correctness_override,
        usefulness_override=request.usefulness_override,
        safety_override=request.safety_override,
    )
    return feedback


@app.websocket("/stream/{task_id}")
async def stream_task(websocket: WebSocket, task_id: str) -> None:
    await websocket.accept()
    replayed_events = streaming.replay_events(task_id)
    for event in replayed_events:
        await websocket.send_json(event.model_dump(mode="json"))
    if replayed_events and replayed_events[-1].type in {"task_completed", "task_failed"}:
        await websocket.close()
        return
    queue = streaming.subscribe(task_id)
    try:
        while True:
            try:
                event = await asyncio.wait_for(queue.get(), timeout=30)
            except asyncio.TimeoutError:
                await websocket.send_json({"type": "heartbeat", "task_id": task_id})
                continue
            await websocket.send_json(event.model_dump(mode="json"))
            if event.type in {"task_completed", "task_failed", "task_awaiting_permission", "task_awaiting_input", "task_awaiting_review"}:
                break
    except WebSocketDisconnect:
        pass
    finally:
        streaming.unsubscribe(task_id, queue)
        await websocket.close()

Binary file not shown.


File diff suppressed because it is too large


@ -1,2 +0,0 @@
"""CLI layer."""


@ -1,2 +0,0 @@
"""Core orchestration components."""


@ -1,542 +0,0 @@
from __future__ import annotations

import asyncio
import json
import logging
import re
from typing import Any

from app.core.contracts import ExecutionDirective
from app.core.intent_parser import IntentParser
from app.events.event_bus import EventBus
from app.events.event_types import (
    ORCHESTRATOR_CALLED,
    ORCHESTRATOR_FALLBACK_USED,
    ORCHESTRATOR_RETRY,
    ORCHESTRATOR_RESULT,
    ORCHESTRATOR_UNAVAILABLE,
    THINKER_CALLED,
    THINKER_RESULT,
    JSON_COMPILER_CALLED,
    JSON_COMPILER_RESULT,
)
from app.models.async_adapters import AsyncOrchestratorAdapter

logger = logging.getLogger(__name__)


class AsyncRouter:
    """Async router using Thinker + JSON Compiler pipeline."""

    def __init__(
        self,
        thinker: AsyncOrchestratorAdapter | None = None,
        json_compiler: AsyncOrchestratorAdapter | None = None,
        intent_parser: IntentParser | None = None,
        prompts: dict[str, str] | None = None,
        event_bus: EventBus | None = None,
        tool_registry=None,
        retry_limit: int = 2,
        debug: bool = False,
        log_length: int = 500,
        json_fix_retry_limit: int = 2,
        json_fix_use_sys_util: bool = True,
        intent_classifier: str = "thinker",
    ) -> None:
        self._thinker = thinker
        self._json_compiler = json_compiler
        self._intent_classifier = intent_classifier
        self._sys_util = None
        self._intent_parser = intent_parser or IntentParser()
        self._prompts = prompts or {}
        self._event_bus = event_bus
        self._tool_registry = tool_registry
        self._retry_limit = retry_limit
        self._debug = debug
        self._log_length = log_length
        self._json_fix_retry_limit = json_fix_retry_limit
        self._json_fix_use_sys_util = json_fix_use_sys_util
        self._orchestrator = None  # Set separately if needed for classification

    def set_event_bus(self, event_bus: EventBus) -> None:
        self._event_bus = event_bus

    def set_thinker(self, thinker: AsyncOrchestratorAdapter) -> None:
        self._thinker = thinker

    def set_json_compiler(self, json_compiler: AsyncOrchestratorAdapter) -> None:
        self._json_compiler = json_compiler

    def set_sys_util(self, sys_util: AsyncOrchestratorAdapter) -> None:
        self._sys_util = sys_util

    def set_orchestrator(self, orchestrator: AsyncOrchestratorAdapter) -> None:
        self._orchestrator = orchestrator

    def set_tool_registry(self, tool_registry) -> None:
        self._tool_registry = tool_registry

    async def decide(
        self,
        state: dict[str, Any],
        context: dict[str, Any],
        task_id: str | None = None,
        session_id: str | None = None,
    ) -> ExecutionDirective:
        task_context = context.get("task_context", {})
        requested_tool = task_context.get("requested_tool")
        task_summary = str(context.get("task_summary", ""))
        if requested_tool:
            self._emit_event(
                ORCHESTRATOR_RESULT,
                {"reason": "explicit_tool_request", "tool": requested_tool},
                task_id,
                session_id,
            )
            return ExecutionDirective(
                type="tool",
                payload={
                    "tool": requested_tool,
                    "args": task_context.get("tool_args", {}),
                },
                requires_permission=requested_tool in {"shell_exec", "file_write"},
                confidence=0.9,
                reason="Task context explicitly requested a tool execution.",
            )
        parsed_intent = self._intent_parser.parse(task_summary)
        if parsed_intent:
            self._emit_event(
                ORCHESTRATOR_RESULT,
                {"reason": "deterministic_intent_parser", "directive": parsed_intent.model_dump(mode="json")},
                task_id,
                session_id,
            )
            return parsed_intent
        if self._thinker is None:
            fallback = self._fallback_directive(task_summary)
            self._emit_event(
                ORCHESTRATOR_FALLBACK_USED,
                {"reason": "thinker_unavailable", "directive": fallback.model_dump(mode="json")},
                task_id,
                session_id,
            )
            return fallback
        if self._json_compiler is None:
            fallback = self._fallback_directive(task_summary)
            self._emit_event(
                ORCHESTRATOR_FALLBACK_USED,
                {"reason": "json_compiler_unavailable", "directive": fallback.model_dump(mode="json")},
task_id,
session_id,
)
return fallback
mode_hint = await self._classify_intent(task_summary)
thinker_prompt = self._build_thinker_prompt(task_summary, context, mode_hint)
last_thinker_error = ""
for thinker_attempt in range(self._retry_limit + 1):
if thinker_attempt > 0:
self._emit_event(
ORCHESTRATOR_RETRY,
{"attempt": thinker_attempt, "prompt": thinker_prompt},
task_id,
session_id,
)
thinker_prompt = self._add_thinker_feedback(thinker_prompt, last_thinker_error, thinker_attempt)
self._emit_event(
THINKER_CALLED,
{"attempt": thinker_attempt, "mode": mode_hint},
task_id,
session_id,
)
try:
thinker_result = await self._thinker.generate(thinker_prompt)
except Exception as e:
logger.warning(f"Thinker generate failed: {e}")
last_thinker_error = str(e)
continue
logger.info(f"Thinker result (attempt {thinker_attempt + 1}): {thinker_result}")
self._emit_event(
THINKER_RESULT,
{"result": thinker_result, "attempt": thinker_attempt},
task_id,
session_id,
)
if mode_hint == "conversation" and self._looks_like_tool_plan(thinker_result):
mode_hint = "execution"
self._emit_event(
ORCHESTRATOR_FALLBACK_USED,
{"reason": "thinker_proposed_tool_plan_despite_conversation_hint"},
task_id,
session_id,
)
# Both branches of the former if/else built the same prompt, so build it once.
json_compiler_prompt = self._build_json_compiler_prompt(thinker_result)
for compiler_attempt in range(self._json_fix_retry_limit + 1):
self._emit_event(
JSON_COMPILER_CALLED,
{"attempt": compiler_attempt, "plan": thinker_result},
task_id,
session_id,
)
try:
compiler_result = await self._json_compiler.generate(json_compiler_prompt)
except Exception as e:
logger.warning(f"JSON Compiler generate failed: {e}")
compiler_result = None
if compiler_result:
logger.info(f"JSON Compiler result (attempt {compiler_attempt + 1}): {compiler_result}")
self._emit_event(
JSON_COMPILER_RESULT,
{"result": compiler_result, "attempt": compiler_attempt},
task_id,
session_id,
)
directive = self._validate_directive(compiler_result, mode_hint) if compiler_result else None
if directive is not None:
directive = self._guard_rail_check(directive)
self._emit_event(
ORCHESTRATOR_RESULT,
{"directive": directive.model_dump(mode="json"), "thinker_attempt": thinker_attempt, "compiler_attempt": compiler_attempt},
task_id,
session_id,
)
return directive
if compiler_result:
logger.warning(f"JSON Compiler validation failed, attempting fix (attempt {compiler_attempt + 1})")
fix_result = await self._fix_invalid_json(compiler_result, compiler_attempt, task_id, session_id)
if fix_result:
fixed_directive = self._validate_directive(fix_result, mode_hint)
if fixed_directive is not None:
fixed_directive = self._guard_rail_check(fixed_directive)
self._emit_event(
ORCHESTRATOR_RESULT,
{"directive": fixed_directive.model_dump(mode="json"), "fixed": True},
task_id,
session_id,
)
return fixed_directive
last_thinker_error = f"JSON Compiler failed after {self._json_fix_retry_limit + 1} attempts"
self._emit_event(
ORCHESTRATOR_UNAVAILABLE,
{"reason": "retry_exhausted", "last_error": last_thinker_error},
task_id,
session_id,
)
raise RuntimeError(f"Thinker/Compiler pipeline failed after {self._retry_limit + 1} attempts")
def _fallback_directive(self, task_summary: str) -> ExecutionDirective:
parsed = self._intent_parser.parse(task_summary)
if parsed:
return parsed
return ExecutionDirective(
type="respond",
payload={"text": f"Runtime accepted task: {task_summary}"},
requires_permission=False,
confidence=0.4,
reason="Fallback response because local orchestration models are not loaded.",
)
def _is_simple_response(self, thinker_result: str) -> bool:
result_lower = thinker_result.lower().strip()
return result_lower.startswith("ответ:") or result_lower.startswith("response:") or "не нужно" in result_lower
def _extract_conversation_response(self, thinker_result: str) -> str:
"""Extract text response from thinker result for conversation mode."""
# Skip the ПЛАН (plan) lines and capture only the ОТВЕТ (response) part
lines = thinker_result.split('\n')
response_lines = []
capture = False
for line in lines:
if line.strip().lower().startswith('ответ:') or line.strip().lower().startswith('response:'):
capture = True
response_lines.append(line)
elif capture and line.strip():
# Stop capturing at a new ПЛАН (plan) header or a step line
if line.strip().lower().startswith('план') or line.strip().lower().startswith('step'):
break
response_lines.append(line)
if response_lines:
# Strip the marker case-insensitively, matching how the lines were detected above
return re.sub(r'(?im)^[ \t]*(?:ответ|response):[ \t]*', '', '\n'.join(response_lines)).strip()
# Fallback: return first few sentences
sentences = thinker_result.split('.')[:3]
return '. '.join(sentences).strip()
def _looks_like_tool_plan(self, thinker_result: str) -> bool:
result = thinker_result.lower()
tool_names = set()
if self._tool_registry:
tool_names = set(self._tool_registry.list_names())
tool_markers = {"shell_exec", "file_read", "file_write", "memory", *tool_names}
plan_markers = ("план:", "шаг", "step", "tool", "инструмент")
return any(marker in result for marker in tool_markers) and any(marker in result for marker in plan_markers)
def _build_thinker_prompt(
self, task_summary: str, context: dict[str, Any], mode_hint: str
) -> str:
base_prompt = self._prompts.get("thinker", "")
memory_context = context.get("memory_context", [])
tools_json = "[]"
if self._tool_registry:
schemas = self._tool_registry.list_schemas()
tools_json = json.dumps(schemas, ensure_ascii=False, indent=2)
prompt_lines = [
base_prompt,
"",
f"Task: {task_summary}",
f"Mode hint: {mode_hint}",
]
if memory_context:
memory_text = "\n".join([f"- {m.get('text', '')}" for m in memory_context[:5]])
prompt_lines.append(f"\nRelevant memory:\n{memory_text}")
session_history = context.get("session_history", [])
if session_history:
history_text = "\n".join([f"- {h.get('text', '')}" for h in session_history[:3]])
prompt_lines.append(f"\nPrevious requests in this session:\n{history_text}")
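# Active memory recall results. The Russian markers below read:
# "=== FROM LONG-TERM MEMORY (ACTIVE RECALL) ===", "Search query: ...", "=== END OF MEMORY ==="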
memory_recall = context.get("memory_recall")
if memory_recall:
prompt_lines.append("\n=== ИЗ ДОЛГОВРЕМЕННОЙ ПАМЯТИ (ACTIVE RECALL) ===")
prompt_lines.append(f"Поисковый запрос: {memory_recall.get('query', '')}")
prompt_lines.append(memory_recall.get("summary", ""))
prompt_lines.append("=== КОНЕЦ ПАМЯТИ ===")
prompt_lines.extend([
"",
"AVAILABLE TOOLS (JSON):",
tools_json,
"",
])
return "\n".join(prompt_lines)
def _build_json_compiler_prompt(self, thinker_result: str) -> str:
base_prompt = self._prompts.get("json_compiler", "")
prompt_lines = [
base_prompt,
"",
"Thinker's plan:",
thinker_result,
"",
]
return "\n".join(prompt_lines)
def _determine_mode_from_context(self, context: dict[str, Any]) -> str:
"""Legacy method - kept for compatibility"""
task_summary = str(context.get("task_summary", "")).lower()
keywords = ["запусти", "выполни", "создай", "напиши", "удали", "run", "execute", "create"]
for kw in keywords:
if kw in task_summary:
return "execution"
return "conversation"
async def _classify_intent(self, task_summary: str) -> str:
"""LLM-based intent classification"""
if self._intent_classifier == "orchestrator" and self._orchestrator:
classifier_model = self._orchestrator
else:
classifier_model = self._thinker
if not classifier_model:
logger.warning("No classifier model available, using default")
return "conversation"
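# The prompt below is written in Russian. English gloss:
#   Classify the user's request: "<task_summary>"
#   Classes:
#   - execution: to answer, the agent must access the local environment, files, shell,
#     tools, memory, or network, or run a check/operation. This includes questions about
#     the current state of the PC, installed packages, files, processes, uptime, updates, logs.
#   - conversation: can be answered directly from the dialogue and general knowledge,
#     without checking the local environment and without tools.
#   - clarification_needed: it is impossible to tell what exactly the user wants.
#   Return exactly one token without reasoning: execution, conversation, or clarification_needed
classification_prompt = f"""Классифицируй запрос пользователя: "{task_summary}"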
Классы:
- execution: чтобы ответить, агенту нужно обратиться к локальной среде, файлам, shell, tools, памяти, сети или выполнить проверку/операцию. Это включает вопросы о текущем состоянии ПК, установленных пакетах, файлах, процессах, времени работы, обновлениях, логах.
- conversation: можно ответить сразу из диалога и общих знаний, без проверки локальной среды и без tools.
- clarification_needed: нельзя понять, что именно пользователь хочет.
Верни ровно один токен без рассуждений: execution или conversation или clarification_needed"""
try:
result = await classifier_model.generate(classification_prompt)
classification = self._extract_classification(result)
if classification:
logger.info(f"Intent classified: {classification} for task: {task_summary}")
return classification
logger.warning(f"Invalid classification result: {result}, defaulting to conversation")
return "conversation"
except Exception as e:
logger.warning(f"Intent classification failed: {e}, defaulting to conversation")
return "conversation"
def _extract_classification(self, raw_result: str) -> str | None:
result = raw_result.strip().lower()
allowed = {"execution", "conversation", "clarification_needed"}
if result in allowed:
return result
result = re.sub(r"<think>.*?</think>", " ", result, flags=re.DOTALL)
if (
"shell_exec" in result
or "execute command" in result
or "command execution" in result
or "use the tool" in result
or "use a tool" in result
):
return "execution"
tokens = re.findall(r"\b(execution|conversation|clarification_needed)\b", result)
if tokens:
return tokens[-1]
first_word = result.split()[0] if result.split() else ""
if first_word in allowed:
return first_word
return None
def _validate_directive(self, output: str, mode_hint: str) -> ExecutionDirective | None:
if not output:
return None
try:
json_start = output.find("{")
json_end = output.rfind("}") + 1
if json_start < 0 or json_end <= 0:
return None
json_str = output[json_start:json_end]
data = json.loads(json_str)
if "type" not in data:
return None
msg_type = data.get("type", "")
payload = data.get("payload", {})
if msg_type == "step" and "tool" in payload:
tool = payload.get("tool", "")
args = payload.get("args", {})
payload = {"tool": tool, "args": args}
if msg_type == "plan":
payload = {"steps": payload.get("steps", [])}
return ExecutionDirective(
type=msg_type,
payload=payload,
confidence=data.get("confidence", 0.9),
reason=data.get("reason", ""),
)
except (json.JSONDecodeError, ValueError, TypeError) as e:
logger.warning(f"Directive JSON validation failed: {e}")
return None
def _guard_rail_check(self, directive: ExecutionDirective) -> ExecutionDirective:
tool_name = directive.payload.get("tool", "")
if tool_name in {"shell_exec", "file_write", "file_delete"}:
return ExecutionDirective(
type=directive.type,
payload=directive.payload,
requires_permission=True,
confidence=directive.confidence,
reason=directive.reason,
)
return directive
def _add_thinker_feedback(self, prompt: str, error: str, attempt: int) -> str:
feedback = f"\n[ATTEMPT {attempt + 1} FAILED: {error}]\n"
feedback += "Provide a valid semantic plan.\n"
return prompt + feedback
def _emit_event(
self,
event_type: str,
payload: dict[str, Any],
task_id: str | None,
session_id: str | None,
) -> None:
if self._event_bus and task_id:
from app.core.contracts import RuntimeEvent
event = RuntimeEvent(
task_id=task_id,
session_id=session_id or "unknown",
sequence=self._event_bus.next_sequence(task_id),
type=event_type,
payload=payload,
)
self._event_bus.publish(event)
SYS_UTIL_PROMPT = None
async def _fix_invalid_json(self, invalid_result: str, attempt: int, task_id: str | None, session_id: str | None) -> str | None:
"""Try to fix invalid JSON using sys_util model."""
if not self._sys_util:
return None
first_brace = invalid_result.find('{')
last_brace = invalid_result.rfind('}')
if first_brace < 0 or last_brace <= first_brace:
return None
truncated_json = invalid_result[first_brace:last_brace + 1]
error_msg = ""
try:
json.loads(truncated_json)
except json.JSONDecodeError as e:
error_msg = str(e)
# Fall through to the defaults when the "sys_util" key is missing, instead of yielding None
sys_util_prompt = (
self._prompts.get("sys_util")
or self.SYS_UTIL_PROMPT
or (
"You are a STRICT JSON repair engine. "
"Your job is ONLY to fix invalid JSON syntax. "
"You MUST output valid JSON or nothing else."
)
)
fix_prompt = f"""{sys_util_prompt}
Invalid JSON:
{truncated_json}
Parser error: {error_msg}
Fixed JSON:"""
try:
logger.info(f"JSON fix using sys_util model (attempt {attempt + 1})")
fixed_result = await self._sys_util.generate(fix_prompt)
fixed_first = fixed_result.find('{')
fixed_last = fixed_result.rfind('}')
if fixed_first >= 0 and fixed_last > fixed_first:
return fixed_result[fixed_first:fixed_last + 1]
return None
except Exception as e:
logger.warning(f"JSON fix failed: {e}")
return None
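
Both `_validate_directive` and `_fix_invalid_json` recover a JSON object from free-form model output by slicing from the first `{` to the last `}`. A minimal standalone sketch of that technique follows, using only the standard library. `extract_json_object` is a hypothetical helper name and is not part of the removed module.

```python
from __future__ import annotations

import json
from typing import Any


def extract_json_object(raw: str) -> dict[str, Any] | None:
    """Parse the outermost {...} span of raw; return None if no dict can be recovered."""
    start = raw.find("{")
    end = raw.rfind("}")
    # No opening brace, or the last closing brace comes before the first opening one.
    if start < 0 or end <= start:
        return None
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    # A model may emit valid JSON that is not an object; reject it here.
    return data if isinstance(data, dict) else None
```

The slice tolerates chatter before and after the object, but it cannot repair anything inside the braces, which is presumably why the router keeps a separate repair pass.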


@ -1,60 +0,0 @@
from __future__ import annotations
import re
import shlex
from typing import Any
from app.core.permission_service import PermissionService
class CommandAnalyzer:
"""Deterministic shell action analyzer for structured critic evidence."""
_SPLIT_RE = re.compile(r"\s*(?:&&|;)\s*")
def __init__(self, permission_service: PermissionService) -> None:
self._permission_service = permission_service
def analyze(self, command: str, task_id: str, session_id: str) -> dict[str, Any]:
segments = [segment.strip() for segment in self._SPLIT_RE.split(command) if segment.strip()]
root_required: list[str] = []
elevated: list[str] = []
unelevated_root: list[str] = []
for segment in segments:
normalized, is_elevated = self._strip_sudo(segment)
check = self._permission_service.check_shell_command(
task_id=task_id,
session_id=session_id,
command=normalized,
)
if check.get("requires_sudo"):
root_required.append(normalized)
if is_elevated:
elevated.append(normalized)
else:
unelevated_root.append(normalized)
diagnosis_type = "privilege_scope_error" if unelevated_root else "ok"
return {
"type": diagnosis_type,
"command": command,
"segments": segments,
"root_required_segments": root_required,
"elevated_segments": elevated,
"unelevated_root_segments": unelevated_root,
}
def _strip_sudo(self, segment: str) -> tuple[str, bool]:
try:
parts = shlex.split(segment)
except ValueError:
return segment, segment.strip().startswith("sudo ")
if not parts or parts[0] != "sudo":
return segment, False
index = 1
while index < len(parts) and parts[index].startswith("-"):
index += 1
if index < len(parts) and parts[index - 1] in {"-p", "--prompt"}:
index += 1
return " ".join(shlex.quote(part) for part in parts[index:]), True
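
The analyzer's key observation is that `sudo` elevates only the segment it prefixes, so a chained command can mix elevated and unelevated parts. A simplified sketch of the split-and-flag step is shown below. `split_segments` is a hypothetical name, and the permission lookup that decides which segments actually need root is left out.

```python
import re
import shlex

# Same separators as the analyzer above: "&&" and ";".
_SPLIT_RE = re.compile(r"\s*(?:&&|;)\s*")


def split_segments(command: str) -> list[tuple[str, bool]]:
    """Return (segment, is_elevated) pairs for a chained shell command."""
    pairs: list[tuple[str, bool]] = []
    for segment in _SPLIT_RE.split(command):
        segment = segment.strip()
        if not segment:
            continue
        try:
            parts = shlex.split(segment)
            elevated = bool(parts) and parts[0] == "sudo"
        except ValueError:
            # Unbalanced quotes: fall back to a plain prefix check.
            elevated = segment.startswith("sudo ")
        pairs.append((segment, elevated))
    return pairs
```

Note that the regular expression is unaware of quoting, so a `;` or `&&` inside a quoted string is also treated as a separator.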


@ -1,93 +0,0 @@
from __future__ import annotations
import json
from pathlib import Path
from typing import Any
from pydantic import BaseModel, Field
class ModelsConfig(BaseModel):
orchestrator_path: str = "models/llama.gguf"
coder_path: str = "models/xcoder.gguf"
critic_path: str = "models/gemma.gguf"
embeddings_path: str = "models/all-MiniLM-L6-v2"
inference: dict[str, Any] = Field(default_factory=dict)
thinker: dict[str, Any] = Field(default_factory=dict)
json_compiler: dict[str, Any] = Field(default_factory=dict)
orchestrator: dict[str, Any] = Field(default_factory=dict)
coder: dict[str, Any] = Field(default_factory=dict)
critic: dict[str, Any] = Field(default_factory=dict)
sys_util: dict[str, Any] = Field(default_factory=dict)
embeddings: dict[str, Any] = Field(default_factory=dict)
class PromptsConfig(BaseModel):
orchestration_prompt: str = ""
planning_prompt: str = ""
coder_prompt: str = ""
critic_prompt: str = ""
class PermissionsConfig(BaseModel):
dangerous_commands: dict[str, str] = Field(default_factory=dict)
sensitive_paths: list[str] = Field(default_factory=list)
default_approval_behavior: str = "ask_always"
class RuntimeConfig(BaseModel):
step_timeout_ms: int = 30_000
task_timeout_ms: int = 300_000
shell_command_timeout_ms: int = 3_600_000
shell_idle_timeout_ms: int = 600_000
planner_retry_limit: int = 2
tool_retry_limit: int = 1
replan_limit: int = 1
max_execution_steps: int = 20
retrieval_top_k: int = 5
max_context_tokens: int = 8192
context_budgets: dict[str, int] = Field(default_factory=lambda: {
"system": 512,
"task": 512,
"memory": 2048,
"execution": 2048,
"tools": 1024,
"safety": 512,
})
reserve_for_generation_pct: int = 25
orchestrator_retry_limit: int = 2
intent_classifier: str = "thinker"
memory_thresholds: dict[str, float] = Field(default_factory=dict)
critic_fallback_policy: str = "continue_without_critic"
checkpoint_policy: dict[str, Any] = Field(default_factory=dict)
event_retention_policy: dict[str, Any] = Field(default_factory=dict)
streaming_settings: dict[str, Any] = Field(default_factory=dict)
debug: bool = False
debug_orchestrator_log_length: int = 500
json_fix_retry_limit: int = 2
json_fix_use_sys_util: bool = True
recall_model: str = "json_compiler"
critic_retry_limit: int = 2
class AppConfig(BaseModel):
models: ModelsConfig
prompts: PromptsConfig
permissions: PermissionsConfig
runtime: RuntimeConfig
def _load_json(path: Path) -> dict[str, Any]:
with path.open("r", encoding="utf-8") as handle:
return json.load(handle)
def load_app_config(config_dir: str | Path) -> AppConfig:
config_path = Path(config_dir)
return AppConfig(
models=ModelsConfig.model_validate(_load_json(config_path / "models.json")),
prompts=PromptsConfig.model_validate(_load_json(config_path / "prompts.json")),
permissions=PermissionsConfig.model_validate(_load_json(config_path / "permissions.json")),
runtime=RuntimeConfig.model_validate(_load_json(config_path / "runtime.json")),
)


@ -1,172 +0,0 @@
from __future__ import annotations
import logging
from typing import Any
from app.core.contracts import TaskCheckpoint, UserTask
logger = logging.getLogger(__name__)
DEFAULT_BUDGETS = {
"system": 512,
"task": 512,
"memory": 2048,
"execution": 2048,
"tools": 1024,
"safety": 512,
}
class ContextBuilder:
def __init__(
self,
memory_interface=None,
tool_registry=None,
config: dict[str, Any] | None = None,
) -> None:
self._memory = memory_interface
self._tool_registry = tool_registry
self._config = config or {}
self._max_tokens = self._config.get("max_context_tokens", 8192)
self._budgets = self._config.get("context_budgets", DEFAULT_BUDGETS)
self._reserve_pct = self._config.get("reserve_for_generation_pct", 25)
def build(
self,
task: UserTask,
checkpoint: TaskCheckpoint | None = None,
query: str | None = None,
) -> dict[str, Any]:
task_summary = task.input
search_query = query or task_summary
session_id = task.session_id
memory_context = []
if self._memory:
memory_context = self._retrieve_memory(search_query, session_id=session_id)
budgets = self._calculate_budgets()
reserved = self._reserve_for_generation()
system_budget = budgets.get("system", 512)
task_budget = budgets.get("task", 512)
safety_budget = budgets.get("safety", 512)
memory_budget = budgets.get("memory", 2048)
truncated_memory = self._truncate_memory(
memory_context, memory_budget
)
# Get session history for follow-up context
session_history = self._get_session_history(session_id)
context = {
"system_prompt": "",
"task_summary": task_summary[:task_budget],
"task_context": task.context,
"memory_context": truncated_memory,
"session_history": session_history,
"execution_context": checkpoint.model_dump() if checkpoint else {},
"tool_context": self._get_tool_context(),
"safety_context": {},
"constraints": {
"budgets": budgets,
"reserved_for_generation": reserved,
"original_memory_count": len(memory_context),
"truncated_memory_count": len(truncated_memory),
},
}
return context
def _get_tool_context(self) -> list[dict[str, Any]]:
"""Expose available tools to orchestrator."""
if not self._tool_registry:
return []
tools = []
for name in self._tool_registry.list_names():
tool = self._tool_registry.get(name)
tools.append({
"name": name,
"description": getattr(tool, "description", ""),
})
return tools
def _calculate_budgets(self) -> dict[str, int]:
return dict(self._budgets)
def _reserve_for_generation(self) -> int:
return int(self._max_tokens * self._reserve_pct / 100)
def _retrieve_memory(
self,
query: str,
session_id: str | None = None,
top_k: int = 5,
) -> list[dict[str, Any]]:
if not self._memory:
return []
try:
results = self._memory.search(query, top_k=top_k, session_id=session_id)
return [
{
"id": entry.id,
"text": entry.text,
"kind": entry.kind,
"source": entry.source,
"weight": entry.weight,
"score": score,
}
for entry, score in results
]
except Exception as e:
logger.warning(f"Memory retrieval failed: {e}")
return []
def _get_session_history(self, session_id: str | None = None) -> list[dict[str, Any]]:
"""Get previous task summaries from the same session for context."""
if not self._memory or not session_id:
return []
try:
# Get recent entries from same session
entries = self._memory.get_by_session(session_id, limit=5)
# Keep only summaries and tool results
summaries = [
{
"id": entry.id,
"text": entry.text,
"kind": entry.kind,
"source": entry.source,
"weight": entry.weight,
}
for entry in entries
if entry.kind in ("summary", "tool_result")
]
return summaries
except Exception as e:
logger.warning(f"Session history retrieval failed: {e}")
return []
def _truncate_memory(
self,
memory_context: list[dict[str, Any]],
budget: int,
) -> list[dict[str, Any]]:
if not memory_context:
return []
estimated_per_entry = 50
max_entries = max(budget // estimated_per_entry, 1)
if len(memory_context) > max_entries:
return memory_context[:max_entries]
return memory_context
def estimate_tokens(self, text: str) -> int:
if not text:
return 0
return len(text.split()) * 4 // 3
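
The budget arithmetic in `ContextBuilder` is easiest to follow with the default numbers plugged in. The sketch below restates the three formulas as free functions and adds a hypothetical `fits_window` check, which the builder itself does not perform.

```python
DEFAULT_BUDGETS = {
    "system": 512,
    "task": 512,
    "memory": 2048,
    "execution": 2048,
    "tools": 1024,
    "safety": 512,
}


def reserve_for_generation(max_tokens: int, reserve_pct: int) -> int:
    """Tokens held back for the model's own output."""
    return int(max_tokens * reserve_pct / 100)


def max_memory_entries(budget: int, estimated_per_entry: int = 50) -> int:
    """Number of memory entries kept, assuming a flat per-entry token cost."""
    return max(budget // estimated_per_entry, 1)


def estimate_tokens(text: str) -> int:
    """Rough estimate: four tokens for every three whitespace-separated words."""
    return len(text.split()) * 4 // 3 if text else 0


def fits_window(budgets: dict[str, int], max_tokens: int, reserve_pct: int) -> bool:
    """Hypothetical check: do all section budgets plus the reserve fit in the window?"""
    return sum(budgets.values()) + reserve_for_generation(max_tokens, reserve_pct) <= max_tokens
```

With the defaults, the reserve is 2048 tokens and the memory budget admits 40 entries. The six section budgets sum to 6656, so budgets plus reserve come to 8704, which exceeds the 8192-token window.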


@ -1,148 +0,0 @@
from __future__ import annotations
from datetime import datetime, timezone
from typing import Any, Literal
from uuid import uuid4
from pydantic import BaseModel, Field
def utc_now() -> datetime:
return datetime.now(timezone.utc)
class UserTask(BaseModel):
task_id: str = Field(default_factory=lambda: str(uuid4()))
session_id: str = Field(default_factory=lambda: str(uuid4()))
input: str
context: dict[str, Any] = Field(default_factory=dict)
created_at: datetime = Field(default_factory=utc_now)
class PlanStep(BaseModel):
id: str
kind: Literal["tool", "coder", "memory", "respond"]
tool: str | None = None
args: dict[str, Any] = Field(default_factory=dict)
description: str
requires_confirmation: bool = False
depends_on: list[str] = Field(default_factory=list)
class ToolCall(BaseModel):
tool: str
args: dict[str, Any] = Field(default_factory=dict)
task_id: str
step_id: str
class ToolResult(BaseModel):
tool: str
ok: bool
output: Any = None
error: str | None = None
metadata: dict[str, Any] = Field(default_factory=dict)
class CoderRequest(BaseModel):
mode: Literal["generate", "fix", "refactor"]
instruction: str
context: dict[str, Any] = Field(default_factory=dict)
task_id: str
class CriticScore(BaseModel):
correctness: float = Field(ge=0.0, le=1.0)
usefulness: float = Field(ge=0.0, le=1.0)
safety: float = Field(ge=0.0, le=1.0)
memory_store: bool
weight: float = Field(ge=0.0, le=1.0)
explanation: str
class MemoryEntry(BaseModel):
id: str = Field(default_factory=lambda: str(uuid4()))
text: str
kind: Literal["tool_result", "plan", "critique", "fact", "summary", "user_preference"]
source: Literal["tool", "critic", "user", "system"]
weight: float = Field(ge=0.0, le=1.0)
task_id: str | None = None
session_id: str | None = None
metadata: dict[str, Any] = Field(default_factory=dict)
created_at: datetime = Field(default_factory=utc_now)
embedding_model: str
embedding_dim: int
class PermissionDecision(BaseModel):
action_type: str
pattern: str
decision: Literal["allow_once", "allow_always", "deny", "ask_always"]
created_at: datetime = Field(default_factory=utc_now)
class RuntimeEvent(BaseModel):
event_id: str = Field(default_factory=lambda: str(uuid4()))
task_id: str
session_id: str
sequence: int
type: str
timestamp: datetime = Field(default_factory=utc_now)
payload: dict[str, Any] = Field(default_factory=dict)
causation_id: str | None = None
correlation_id: str = Field(default_factory=lambda: str(uuid4()))
class TaskCheckpoint(BaseModel):
task_id: str
status: str
active_step_id: str | None = None
plan_snapshot: dict[str, Any] = Field(default_factory=dict)
context_snapshot: dict[str, Any] = Field(default_factory=dict)
updated_at: datetime = Field(default_factory=utc_now)
class PermissionRequest(BaseModel):
task_id: str
session_id: str
action_type: str
pattern: str
command: str | None = None
path: str | None = None
requires_password: bool = False
class SecretRequest(BaseModel):
task_id: str
session_id: str
kind: str
prompt: str
command: str | None = None
class PasswordRequest(BaseModel):
task_id: str
session_id: str
command: str
reason: str
attempts: int = 0
max_attempts: int = 3
class ExecutionDirective(BaseModel):
type: Literal[
"plan",
"tool",
"coder",
"respond",
"replan",
"store_memory",
"request_permission",
"complete",
"fail",
"noop",
]
payload: dict[str, Any] = Field(default_factory=dict)
requires_permission: bool = False
confidence: float = Field(ge=0.0, le=1.0, default=0.0)
reason: str = ""
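
`PlanStep.depends_on` implies a dependency graph that has to be ordered and checked for cycles before execution. The scheduler that does this is not shown in this part of the diff, so the following is only a sketch of one possible approach using the standard library's `graphlib`. `execution_order` is a hypothetical name, and steps are reduced to plain id strings.

```python
from __future__ import annotations

from graphlib import CycleError, TopologicalSorter


def execution_order(depends_on: dict[str, list[str]]) -> list[str] | None:
    """Return step ids with every dependency before its dependents, or None on a cycle.

    depends_on maps each step id to the ids it must wait for,
    mirroring the PlanStep.depends_on field.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None
```

For example, `execution_order({"s1": [], "s2": ["s1"], "s3": ["s2"]})` yields the chain in dependency order, while two steps that depend on each other produce `None`.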


@ -1,975 +0,0 @@
from __future__ import annotations
import asyncio
import json
import logging
from typing import Any
from app.core.contracts import (
CriticScore,
ExecutionDirective,
PermissionDecision,
PermissionRequest,
RuntimeEvent,
SecretRequest,
ToolCall,
ToolResult,
UserTask,
)
from app.core.command_analyzer import CommandAnalyzer
from app.core.execution_scheduler import ExecutionScheduler
from app.events.event_bus import EventBus
from app.events.event_types import (
CRITIC_CALLED,
CRITIC_RESULT,
PERMISSION_REQUESTED,
PERMISSION_RESOLVED,
PLAN_FAILED,
PLAN_STARTED,
SECRET_REQUESTED,
STEP_STARTED,
STEPPED_COMPLETED,
TOOL_CALLED,
TOOL_COMPLETED,
TOOL_OUTPUT_CHUNK,
)
from app.models.async_adapters import AsyncCriticAdapter, AsyncCoderAdapter
from app.memory.write_policy import MemoryWritePolicy
from app.memory.interface import MemoryInterface
logger = logging.getLogger(__name__)
class ExecutionEngine:
def __init__(
self,
event_bus: EventBus,
tool_registry,
permission_service,
scheduler: ExecutionScheduler | None = None,
critic: AsyncCriticAdapter | None = None,
memory_policy: MemoryWritePolicy | None = None,
memory_interface: MemoryInterface | None = None,
prompts: dict[str, str] | None = None,
recovery_limit: int = 1,
critic_retry_limit: int = 2,
command_analyzer: CommandAnalyzer | None = None,
) -> None:
self._event_bus = event_bus
self._tool_registry = tool_registry
self._permission_service = permission_service
self._scheduler = scheduler or ExecutionScheduler()
self._critic = critic
self._coder: AsyncCoderAdapter | None = None
self._memory_policy = memory_policy
self._memory_interface = memory_interface
self._prompts = prompts or {}
self._recovery_limit = recovery_limit
self._critic_retry_limit = critic_retry_limit
self._command_analyzer = command_analyzer
def set_critic(self, critic: AsyncCriticAdapter) -> None:
self._critic = critic
def set_coder(self, coder: AsyncCoderAdapter) -> None:
self._coder = coder
def set_memory_policy(self, policy: MemoryWritePolicy) -> None:
self._memory_policy = policy
def execute(
self,
task: UserTask,
directive: ExecutionDirective,
permission_override: PermissionDecision | None = None,
secret_override: str | None = None,
password_override: str | None = None,
) -> dict[str, Any]:
scheduled = self._scheduler.next_directive(directive)
self._publish(task, STEP_STARTED, {"directive_type": scheduled.type})
if scheduled.type == "plan":
return self._execute_plan(
task=task,
directive=scheduled,
permission_override=permission_override,
secret_override=secret_override,
password_override=password_override,
)
if scheduled.type == "tool":
return self._execute_tool(
task=task,
directive=scheduled,
permission_override=permission_override,
secret_override=secret_override,
password_override=password_override,
)
if scheduled.type == "respond":
return {
"status": "completed",
"result": {
"message": scheduled.payload.get("text", f"Runtime accepted task: {task.input}"),
"mode": scheduled.payload.get("mode", "direct_response"),
},
"directive": scheduled.model_dump(mode="json"),
}
if scheduled.type == "coder":
return self._execute_coder(
task=task,
directive=scheduled,
)
if scheduled.type == "fail":
return {
"status": "failed",
"result": {"error": scheduled.reason or "Execution failed."},
}
return {
"status": "completed",
"result": {
"message": "Directive accepted.",
"directive_type": scheduled.type,
},
}
def _execute_plan(
self,
task: UserTask,
directive: ExecutionDirective,
permission_override: PermissionDecision | None = None,
secret_override: str | None = None,
password_override: str | None = None,
) -> dict[str, Any]:
# Unified format: {"type": "plan", "payload": {"steps": [...]}}
# Need to extract steps from nested payload
payload = directive.payload
steps_data = []
# If payload has "steps" directly, use them
if "steps" in payload:
steps_data = payload.get("steps", [])
# If payload is a string (JSON), parse it
elif isinstance(payload, str) and payload.strip().startswith("{"):
try:
parsed = json.loads(payload)
steps_data = parsed.get("payload", {}).get("steps", [])
except (json.JSONDecodeError, AttributeError):
steps_data = []
if steps_data:
plan_json = json.dumps({"type": "plan", "payload": {"steps": steps_data}})
else:
plan_json = json.dumps(payload)
plan_steps = self._scheduler.parse_plan_steps(plan_json, task.task_id)
if not plan_steps:
return {
"status": "failed",
"result": {"error": "Failed to parse plan steps from directive"},
}
if not self._scheduler.validate_no_cycles(plan_steps):
self._publish(task, PLAN_FAILED, {"error": "Cycle detected in plan"})
return {
"status": "failed",
"result": {"error": "Cycle detected in plan"},
}
graph = self._scheduler.build_task_graph(plan_steps)
self._publish(task, PLAN_STARTED, {"steps": len(plan_steps)})
completed_steps: set[str] = set()
step_results: list[dict[str, Any]] = []
critic_retries_used = 0 # Track critic→replan cycles
ready_steps = self._get_ready_steps(graph, completed_steps)
while ready_steps:
step = ready_steps.pop(0)
# Handle respond kind directly without tool execution
if step.kind == "respond":
result = {
"status": "completed",
"result": {
"message": step.args.get("text", step.description),
},
}
else:
step_directive = ExecutionDirective(
type=step.kind,
payload={
"tool": step.tool,
"args": step.args,
},
requires_permission=step.requires_confirmation,
reason=step.description,
)
result = self._execute_tool(
task=task,
directive=step_directive,
permission_override=permission_override,
secret_override=secret_override,
password_override=password_override,
)
# If tool needs human input/review - return immediately.
if result.get("status") in (
"awaiting_permission",
"awaiting_input",
"awaiting_password",
"awaiting_review",
):
return {
"status": result.get("status"),
"result": result.get("result", {}),
"step_results": step_results,
}
step_results.append({
"step_id": step.id,
"result": result,
})
completed_steps.add(step.id)
self._publish(task, STEPPED_COMPLETED, {
"step_id": step.id,
"status": result.get("status"),
})
# === Critic evaluation ===
if self._critic and result.get("status") == "completed":
critic_score = self._evaluate_with_critic(task, step, result)
if critic_score:
result["critic_score"] = {
"correctness": critic_score.correctness,
"usefulness": critic_score.usefulness,
"safety": critic_score.safety,
"memory_store": critic_score.memory_store,
"weight": critic_score.weight,
"explanation": critic_score.explanation,
}
self._save_critique_to_memory(task, step, critic_score)
# Check if step result is satisfactory
min_correctness = 0.5
if critic_score.correctness < min_correctness:
# Step failed critic check — try to recover
if critic_retries_used < self._critic_retry_limit and step.kind != "respond":
critic_retries_used += 1
self._publish(task, CRITIC_RESULT, {
"step_id": step.id,
"score": critic_score.model_dump(mode="json"),
"action": "retry",
"retry": critic_retries_used,
})
# Retry the same step — rebuild directive
retry_directive = ExecutionDirective(
type=step.kind,
payload={"tool": step.tool, "args": step.args},
requires_permission=step.requires_confirmation,
reason=step.description,
)
retry_result = self._execute_tool(
task=task,
directive=retry_directive,
permission_override=permission_override,
secret_override=secret_override,
password_override=password_override,
)
if retry_result.get("status") == "completed":
result = retry_result
step_results[-1]["result"] = result
# Re-evaluate after retry
critic_score2 = self._evaluate_with_critic(task, step, result)
if critic_score2 and critic_score2.correctness >= min_correctness:
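# Retry succeeded: refresh the ready queue first, because `continue`
# skips the refresh at the end of the loop body
ready_steps = self._get_ready_steps(graph, completed_steps)
continue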
# If retry also failed, continue to next step
else:
self._publish(task, CRITIC_RESULT, {
"step_id": step.id,
"score": critic_score.model_dump(mode="json"),
"action": "give_up",
"reason": f"Critic retry limit ({self._critic_retry_limit}) reached",
})
# Handle failed step
if result.get("status") == "failed":
review = self._build_failed_step_review(task, step, result)
if review:
return {
"status": "awaiting_review",
"result": {
"error": f"Step {step.id} requires review before replanning",
"failed_step": step.id,
"step_results": step_results,
"review": review,
},
}
recovery = self._recover_failed_step(
task=task,
step=step,
result=result,
step_results=step_results,
permission_override=permission_override,
secret_override=secret_override,
password_override=password_override,
)
if recovery.get("status") == "awaiting_permission":
return recovery
if recovery.get("status") == "completed":
recovered_result = recovery.get("result")
if recovered_result:
step_results[-1]["result"] = recovered_result
if recovery.get("finish"):
return {
"status": "completed",
"result": {
"message": recovery.get("message", "Recovered from failed step"),
"step_results": step_results,
},
}
else:
return {
"status": "failed",
"result": {
"error": f"Step {step.id} failed",
"failed_step": step.id,
"step_results": step_results,
"recovery": recovery.get("result"),
},
}
ready_steps = self._get_ready_steps(graph, completed_steps)
return {
"status": "completed",
"result": {
"message": f"Plan executed: {len(completed_steps)} steps completed",
"step_results": step_results,
},
}
def _build_failed_step_review(self, task: UserTask, step, result: dict[str, Any]) -> dict[str, Any] | None:
if step.tool != "shell_exec" or not self._command_analyzer:
return None
command = str((step.args or {}).get("command", ""))
if not command:
return None
diagnosis = self._command_analyzer.analyze(
command=command,
task_id=task.task_id,
session_id=task.session_id,
)
if diagnosis.get("type") == "ok":
return None
return {
"step_id": step.id,
"tool": step.tool,
"command": command,
"diagnosis": diagnosis,
"critic_assessment": {
"classification": "model_planning_error",
"needs_replan": True,
"explanation": "Structured command analysis found a model action error before recovery.",
},
}
def _recover_failed_step(
self,
task: UserTask,
step,
result: dict[str, Any],
step_results: list[dict[str, Any]],
permission_override: PermissionDecision | None = None,
secret_override: str | None = None,
password_override: str | None = None,
) -> dict[str, Any]:
if self._recovery_limit <= 0 or not self._critic:
return {"status": "failed", "result": {"reason": "recovery_unavailable"}}
decision = self._evaluate_recovery(task, step, result, step_results)
action = decision.get("action", "fail")
if action == "continue":
recovered = dict(result)
recovered["status"] = "completed"
recovered["recovery_decision"] = decision
return {"status": "completed", "result": recovered}
if action == "respond":
recovered = dict(result)
recovered["status"] = "completed"
recovered["recovery_decision"] = decision
return {
"status": "completed",
"result": recovered,
"finish": True,
"message": decision.get("message") or decision.get("reason") or "Recovered by responding to user",
}
if action == "retry":
retry_tool = decision.get("tool") or step.tool
retry_args = decision.get("args") or step.args
retry_result = self._execute_tool(
task=task,
directive=ExecutionDirective(
type="tool",
payload={"tool": retry_tool, "args": retry_args},
requires_permission=True,
reason=decision.get("reason", "Recovery retry"),
),
permission_override=permission_override,
secret_override=secret_override,
password_override=password_override,
)
if retry_result.get("status") == "awaiting_permission":
return retry_result
retry_result["recovery_decision"] = decision
if retry_result.get("status") == "completed":
return {"status": "completed", "result": retry_result}
return {"status": "failed", "result": {"decision": decision, "retry_result": retry_result}}
return {"status": "failed", "result": decision}
def _evaluate_recovery(
self,
task: UserTask,
step,
result: dict[str, Any],
step_results: list[dict[str, Any]],
) -> dict[str, Any]:
prompt = self._build_recovery_prompt(task, step, result, step_results)
self._publish(task, CRITIC_CALLED, {"step_id": step.id, "mode": "recovery"})
try:
output = asyncio.run(self._critic.generate(prompt, max_tokens=512))
decision = self._parse_recovery_decision(output)
self._publish(task, CRITIC_RESULT, {
"step_id": step.id,
"mode": "recovery",
"decision": decision,
"raw": output,
})
return decision
except Exception as e:
logger.warning(f"Recovery evaluation failed: {e}")
self._publish(task, CRITIC_RESULT, {
"step_id": step.id,
"mode": "recovery",
"error": str(e),
})
return {"action": "fail", "reason": str(e)}
def _build_recovery_prompt(
self,
task: UserTask,
step,
result: dict[str, Any],
step_results: list[dict[str, Any]],
) -> str:
return f"""You are a recovery controller for an agent runtime.
Decide what to do after a failed tool step. A non-zero exit code is not always fatal.
Interpret the failure in context.
Allowed actions:
- continue: failure is acceptable information; continue the plan.
- retry: try one alternative tool call. Include "tool" and "args".
- respond: stop and answer the user with available information. Include "message".
- fail: real failure; stop the task.
Return ONLY JSON:
{{"action":"continue|retry|respond|fail","reason":"...","tool":"shell_exec","args":{{...}},"message":"..."}}
Task:
{task.input}
Failed step:
id={step.id}
tool={step.tool}
args={json.dumps(step.args, ensure_ascii=False)}
description={step.description}
Failed result:
{json.dumps(result, ensure_ascii=False, indent=2)}
Previous step results:
{json.dumps(step_results, ensure_ascii=False, indent=2)}
"""
def _parse_recovery_decision(self, output: str) -> dict[str, Any]:
try:
json_start = output.find("{")
json_end = output.rfind("}") + 1
if json_start < 0 or json_end <= 0:
return {"action": "fail", "reason": "Recovery output was not JSON"}
data = json.loads(output[json_start:json_end])
action = data.get("action", "fail")
if action not in {"continue", "retry", "respond", "fail"}:
action = "fail"
data["action"] = action
return data
except (json.JSONDecodeError, TypeError, ValueError) as e:
return {"action": "fail", "reason": f"Recovery JSON parse failed: {e}"}
def _get_ready_steps(
self,
graph: dict[str, Any],
completed: set[str],
) -> list:
if not graph or not graph.get("nodes"):
return []
step_map: dict = graph.get("step_map", {})
ready = []
for node in graph["nodes"]:
node_id = node["id"]
if node_id in completed:
continue
deps = node.get("depends_on", [])
if all(dep in completed for dep in deps):
step = step_map.get(node_id)
if step:
ready.append(step)
return ready
def _evaluate_with_critic(
self,
task: UserTask,
step,
result: dict[str, Any],
) -> CriticScore | None:
if not self._critic:
return None
critic_prompt = self._build_critic_prompt(step, result)
self._publish(task, CRITIC_CALLED, {"step_id": step.id})
try:
critic_output = asyncio.run(self._critic.generate(critic_prompt))
score = self._parse_critic_score(critic_output)
self._publish(task, CRITIC_RESULT, {
"step_id": step.id,
"score": score.model_dump(mode="json") if score else None,
})
if score:
result["critic_score"] = {
"correctness": score.correctness,
"usefulness": score.usefulness,
"safety": score.safety,
"memory_store": score.memory_store,
"weight": score.weight,
"explanation": score.explanation,
}
return score
except Exception as e:
logger.warning(f"Critic evaluation failed: {e}")
self._publish(task, CRITIC_RESULT, {
"step_id": step.id,
"error": str(e),
})
return None
def _save_critique_to_memory(
self,
task: UserTask,
step,
score: CriticScore,
) -> None:
"""Save critic evaluation as critique entry in memory, using MemoryWritePolicy."""
if not self._memory_interface:
return
try:
# Check with policy before saving
if self._memory_policy:
decision = self._memory_policy.decide(
critic_score=score,
memory_type="critique",
session_id=task.session_id,
)
if decision == "skip":
logger.info(f"MemoryWritePolicy skipped critique for {step.tool}")
return
# For "store_with_weight", we could adjust weight, but critic score already has weight
tool_name = step.tool
tool_args = step.args or {}
args_str = ", ".join([f"{k}={v}" for k, v in tool_args.items()])
critique_text = f"Tool: {tool_name}({args_str}) | Task: {task.input[:100]} | Scores: correctness={score.correctness}, usefulness={score.usefulness}, safety={score.safety} | {score.explanation}"
metadata = {
"task_input": task.input,
"tool": tool_name,
"args": tool_args,
"step_id": step.id,
"scores": {
"correctness": score.correctness,
"usefulness": score.usefulness,
"safety": score.safety,
},
}
self._memory_interface.insert(
text=critique_text,
kind="critique",
source="critic",
task_id=task.task_id,
session_id=task.session_id,
weight=score.weight,
metadata=metadata,
)
logger.info(f"Saved critique to memory: {tool_name} task_id={task.task_id}")
except Exception as e:
logger.warning(f"Failed to save critique to memory: {e}")
def _build_critic_prompt(self, step, result: dict[str, Any]) -> str:
base_prompt = self._prompts.get("critic", "")
tool_result = result.get("result", {})
# Truncate long outputs to avoid exceeding context window
# Keep output under ~2000 chars to leave room for prompt + generation
output = tool_result.get("output", "")
if isinstance(output, str) and len(output) > 2000:
output = output[:2000] + "\n... [truncated]"
elif not isinstance(output, str):
output_str = json.dumps(output, ensure_ascii=False)
if len(output_str) > 2000:
output = output_str[:2000] + "\n... [truncated]"
else:
output = output_str
# Build a compact result representation
compact_result = {
"ok": tool_result.get("ok"),
"output": output,
"error": tool_result.get("error"),
"exit_code": tool_result.get("metadata", {}).get("exit_code"),
}
return f"""{base_prompt}
Step: {step.description}
Tool: {step.tool}
Args: {step.args}
Result:
{json.dumps(compact_result, indent=2, ensure_ascii=False)}
Evaluate and respond with JSON:
{{"correctness": 0.0-1.0, "usefulness": 0.0-1.0, "safety": 0.0-1.0, "memory_store": true|false, "weight": 0.0-1.0, "explanation": "..."}}"""
def _parse_critic_score(self, output: str) -> CriticScore | None:
try:
json_start = output.find("{")
json_end = output.rfind("}") + 1
if json_start < 0:
return None
json_str = output[json_start:json_end]
data = json.loads(json_str)
return CriticScore(
correctness=data.get("correctness", 0.5),
usefulness=data.get("usefulness", 0.5),
safety=data.get("safety", 1.0),
memory_store=data.get("memory_store", False),
weight=data.get("weight", 0.5),
explanation=data.get("explanation", ""),
)
except (json.JSONDecodeError, ValueError, TypeError) as e:
logger.warning(f"Critic score parsing failed: {e}")
return None
def _execute_coder(
self,
task: UserTask,
directive: ExecutionDirective,
) -> dict[str, Any]:
if not self._coder:
return {"status": "failed", "result": {"error": "Coder model not available"}}
coder_task = directive.payload.get("task", "")
if not coder_task:
return {"status": "failed", "result": {"error": "Missing task for coder"}}
try:
output = asyncio.run(self._coder.generate(coder_task))
return {
"status": "completed",
"result": {"code": output},
}
except Exception as e:
logger.warning(f"Coder execution failed: {e}")
return {"status": "failed", "result": {"error": str(e)}}
def _execute_tool(
self,
task: UserTask,
directive: ExecutionDirective,
permission_override: PermissionDecision | None = None,
secret_override: str | None = None,
password_override: str | None = None,
) -> dict[str, Any]:
tool_name = str(directive.payload.get("tool", "")).strip()
tool_args = dict(directive.payload.get("args", {}))
if password_override:
tool_args["password"] = password_override
if not tool_name:
return {"status": "failed", "result": {"error": "Missing tool name"}}
# Tool-first: validate tool exists in registry
available_tools = self._tool_registry.list_names()
if tool_name not in available_tools:
return {"status": "failed", "result": {"error": f"Unknown tool: {tool_name}. Available tools: {available_tools}"}}
permission_result = None
# If permission_override is provided, skip permission check
if permission_override is not None:
permission_result = {
"decision": permission_override.decision,
"command": tool_args.get("command", ""),
"cached": True,
}
# Check permission for shell_exec and file_write
elif tool_name == "shell_exec":
permission_result = self._permission_service.check_shell_command(
task_id=task.task_id,
session_id=task.session_id,
command=str(tool_args.get("command", "")),
)
elif tool_name == "file_write":
# Allow writing to runtime data directory without permission check
write_path = str(tool_args.get("path", ""))
if "allowed_commands.json" in write_path or "/data/runtime" in write_path:
# Internal system write - allow without permission
permission_result = {"decision": "allowed", "path": write_path}
else:
permission_result = self._permission_service.check_write_path(
task_id=task.task_id,
session_id=task.session_id,
path=write_path,
)
# Handle permission result
if permission_result:
decision = permission_result.get("decision", "unknown")
# Hard stop - deny execution
if decision == "hard_stop":
self._publish(task, PERMISSION_REQUESTED, permission_result)
return {
"status": "failed",
"result": {
"error": f"Command blocked: {permission_result.get('reason', 'Hard stop command')}",
"command": permission_result.get("command", ""),
},
}
# Cached - already allowed
if decision in ("allowed_always", "allowed") or permission_result.get("cached"):
self._publish(task, PERMISSION_RESOLVED, permission_result)
# Need user confirmation - return immediately, don't continue execution
elif decision == "prompt":
self._publish(task, PERMISSION_REQUESTED, permission_result)
return {
"status": "awaiting_permission",
"result": {
"error": "Permission required before execution.",
"permission_request": permission_result,
},
}
# Denied - return immediately
elif decision == "deny":
self._publish(task, PERMISSION_RESOLVED, permission_result)
return {
"status": "failed",
"result": {
"error": "Permission denied",
"command": permission_result.get("command", ""),
},
}
if tool_name == "shell_exec":
command = str(tool_args.get("command", ""))
# Determine if sudo password is needed:
# 1. Command explicitly starts with "sudo"
# 2. Command is a known sudo-requiring command (apt, systemctl, etc.) — flagged by permission service
needs_password = command.startswith("sudo ") or (permission_result is not None and permission_result.get("requires_sudo", False))
if needs_password and secret_override is None:
secret_request = SecretRequest(
task_id=task.task_id,
session_id=task.session_id,
kind="sudo_password",
prompt="Sudo password required",
command=command,
)
self._publish(task, SECRET_REQUESTED, secret_request.model_dump(mode="json"))
return {
"status": "awaiting_input",
"result": {
"error": "Secret required",
"secret_request": secret_request.model_dump(mode="json"),
},
}
if needs_password and secret_override is not None:
# Inject sudo -S for explicit sudo commands, or prepend sudo -S for implicit ones
if command.startswith("sudo "):
tool_args["command"] = f"sudo -S -p '' {command[len('sudo '):]}"
else:
tool_args["command"] = f"sudo -S -p '' {command}"
tool_args["stdin_secret"] = f"{secret_override}\n"
tool_call = ToolCall(
tool=tool_name,
args=tool_args,
task_id=task.task_id,
step_id="step-1",
)
self._publish(task, TOOL_CALLED, tool_call.model_dump(mode="json"))
if tool_name == "shell_exec":
tool_args["__output_callback"] = lambda stream, chunk: self._publish(
task,
TOOL_OUTPUT_CHUNK,
{
"tool": tool_name,
"step_id": "step-1",
"stream": stream,
"chunk": chunk,
},
)
tool_result = self._tool_registry.get(tool_name).execute(task=task, args=tool_args)
self._publish(task, TOOL_COMPLETED, tool_result.model_dump(mode="json"))
metadata = tool_result.metadata or {}
needs_sudo = metadata.get("needs_sudo", False)
sudo_auth_failed = metadata.get("sudo_auth_failed", False) or self._looks_like_sudo_auth_failure(tool_result)
if tool_name == "shell_exec" and not tool_result.ok and sudo_auth_failed:
original_command = str(directive.payload.get("args", {}).get("command", tool_args.get("command", "")))
secret_request = SecretRequest(
task_id=task.task_id,
session_id=task.session_id,
kind="sudo_password",
prompt="Sudo password incorrect. Try again",
command=original_command,
)
self._publish(task, SECRET_REQUESTED, secret_request.model_dump(mode="json"))
return {
"status": "awaiting_input",
"result": {
"error": "Sudo password failed",
"secret_request": secret_request.model_dump(mode="json"),
"attempt_failed": True,
"tool_result": tool_result.model_dump(mode="json"),
},
}
if not tool_result.ok and needs_sudo:
return {
"status": "awaiting_password",
"result": {
"task_id": task.task_id,
"needs_sudo": True,
"command": tool_args.get("command", ""),
"error": tool_result.error or "Permission denied",
"tool_result": tool_result.model_dump(mode="json"),
},
}
if tool_name == "shell_exec" and not tool_result.ok and self._command_analyzer:
original_command = str(directive.payload.get("args", {}).get("command", tool_args.get("command", "")))
diagnosis = self._command_analyzer.analyze(
command=original_command,
task_id=task.task_id,
session_id=task.session_id,
)
if diagnosis.get("type") != "ok":
return {
"status": "awaiting_review",
"result": {
"error": "Tool action requires review before replanning",
"review": {
"step_id": "step-1",
"tool": tool_name,
"command": original_command,
"diagnosis": diagnosis,
"critic_assessment": {
"classification": "model_planning_error",
"needs_replan": True,
"explanation": "Structured command analysis found a model action error before recovery.",
},
},
"tool_result": tool_result.model_dump(mode="json"),
},
}
return {
"status": "completed" if tool_result.ok else "failed",
"result": tool_result.model_dump(mode="json"),
}
def _looks_like_sudo_auth_failure(self, tool_result: ToolResult) -> bool:
output = f"{tool_result.output or ''}\n{tool_result.error or ''}".lower()
return any(
marker in output
for marker in (
"incorrect password",
"incorrect password attempt",
"sudo: no password was provided",
"sorry, try again",
"authentication failure",
)
)
def _publish(self, task: UserTask, event_type: str, payload: dict[str, Any]) -> None:
if not self._event_bus:
return
event = RuntimeEvent(
task_id=task.task_id,
session_id=task.session_id,
sequence=self._event_bus.next_sequence(task.task_id),
type=event_type,
payload=payload,
)
self._event_bus.publish(event)
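
The recovery and critic parsers above both pull a JSON object out of free-form model output by slicing from the first `{` to the last `}`. A minimal standalone sketch of that pattern follows; the function name and the `ALLOWED_ACTIONS` constant are illustrative and not part of the deleted module, and unknown actions are coerced to `fail` as in `_parse_recovery_decision`.

```python
import json
from typing import Any

# Actions the recovery prompt allows; anything else is treated as a failure.
ALLOWED_ACTIONS = {"continue", "retry", "respond", "fail"}


def parse_recovery_decision(output: str) -> dict[str, Any]:
    """Extract the outermost JSON object from model output (illustrative sketch)."""
    start = output.find("{")
    end = output.rfind("}") + 1
    # No opening brace, or the last closing brace precedes it: nothing to parse.
    if start < 0 or end <= start:
        return {"action": "fail", "reason": "Recovery output was not JSON"}
    try:
        data = json.loads(output[start:end])
    except (json.JSONDecodeError, TypeError, ValueError) as exc:
        return {"action": "fail", "reason": f"Recovery JSON parse failed: {exc}"}
    # Coerce unknown or missing actions to the safe default.
    if data.get("action") not in ALLOWED_ACTIONS:
        data["action"] = "fail"
    return data
```

Text before and after the object is ignored, so a chatty model reply still parses as long as it contains exactly one top-level object.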


@ -1,212 +0,0 @@
from __future__ import annotations
import json
import logging
from collections import deque
from typing import Any
from app.core.contracts import ExecutionDirective, PlanStep
logger = logging.getLogger(__name__)
class ExecutionScheduler:
def __init__(self, retry_limit: int = 2) -> None:
self._retry_limit = retry_limit
def parse_plan_steps(
self,
json_str: str,
task_id: str | None = None,
) -> list[PlanStep]:
try:
json_start = json_str.find("{")
json_end = json_str.rfind("}") + 1
if json_start < 0:
return []
json_str = json_str[json_start:json_end]
data = json.loads(json_str)
# Unified format: {"type": "plan", "payload": {"steps": [...]}}
# or direct: {"type": "step", "payload": {"tool": "...", "args": {...}}}
if isinstance(data, dict):
msg_type = data.get("type", "")
# Single step format: {"type": "step", "payload": {"tool": ..., "args": ...}}
if msg_type == "step":
payload = data.get("payload", {})
step = {
"id": "step-0",
"kind": "tool",
"tool": payload.get("tool"),
"args": payload.get("args", {}),
"description": payload.get("description", ""),
"depends_on": payload.get("depends_on", []),
}
data = [step]
# Plan format: {"type": "plan", "payload": {"steps": [...]}}
elif msg_type == "plan":
payload = data.get("payload", {})
steps_data = payload.get("steps", [])
# Normalize steps: handle {"type": "step", "payload": {"tool": ...}}
normalized = []
for step in steps_data:
if isinstance(step, dict) and step.get("type") == "step":
inner = step.get("payload", {})
normalized.append({
"tool": inner.get("tool"),
"args": inner.get("args", {}),
"description": inner.get("description", ""),
"depends_on": inner.get("depends_on", []),
})
else:
normalized.append(step)
steps_data = normalized
data = steps_data if steps_data else []
# Old format compatibility
elif "steps" in data:
data = data["steps"]
elif "plan" in data:
data = data["plan"]
else:
data = [data]
elif isinstance(data, str):
data = json.loads(data)
if isinstance(data, dict):
data = [data]
steps = []
for i, step_data in enumerate(data):
if isinstance(step_data, str):
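# Put the text into args: the executor reads step.args.get("text", step.description)
step_data = {"id": f"step-{i}", "kind": "respond", "args": {"text": step_data}, "description": step_data}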
if not isinstance(step_data, dict):
continue
step_data.setdefault("id", f"step-{i}")
# Tool-first: the scheduler receives the tool directly, without transformations
# kind is determined by the presence of a tool name
# args are passed through as-is
if step_data.get("tool"):
step_data["kind"] = "tool"
# setdefault only fills missing keys, so plain defaults are enough here
step_data.setdefault("kind", "respond")
step_data.setdefault("tool", None)
step_data.setdefault("args", {})
step_data.setdefault("description", "")
step_data.setdefault("requires_confirmation", False)
step_data.setdefault("depends_on", [])
steps.append(PlanStep(**step_data))
return steps
except (json.JSONDecodeError, ValueError, TypeError) as e:
logger.warning(f"Plan parsing failed: {e}")
return []
def validate_no_cycles(self, steps: list[PlanStep]) -> bool:
if not steps:
return True
graph: dict[str, set[str]] = {}
for step in steps:
graph[step.id] = set(step.depends_on)
visited: set[str] = set()
rec_stack: set[str] = set()
def has_cycle(node: str) -> bool:
if node in rec_stack:
return True
if node in visited:
return False
visited.add(node)
rec_stack.add(node)
for dep in graph.get(node, []):
if has_cycle(dep):
return True
rec_stack.remove(node)
return False
for step in steps:
if step.id not in visited:
if has_cycle(step.id):
logger.warning(f"Cycle detected in plan: {step.id}")
return False
return True
def build_task_graph(
self,
steps: list[PlanStep],
) -> dict[str, Any]:
if not steps:
return {"nodes": [], "edges": []}
if not self.validate_no_cycles(steps):
return {"nodes": [], "edges": [], "error": "Cycle detected in plan"}
nodes = []
edges = []
step_map = {s.id: s for s in steps}
for step in steps:
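nodes.append({
"id": step.id,
"kind": step.kind,
"tool": step.tool,
"args": step.args,
# get_ready_steps reads depends_on from the node; without it every step looks ready
"depends_on": list(step.depends_on),
"ready": len(step.depends_on) == 0,
})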
for dep_id in step.depends_on:
edges.append({
"from": dep_id,
"to": step.id,
})
return {"nodes": nodes, "edges": edges, "step_map": step_map}
def get_ready_steps(
self,
graph: dict[str, Any],
completed: set[str],
) -> list[PlanStep]:
if not graph or not graph.get("nodes"):
return []
step_map: dict[str, PlanStep] = graph.get("step_map", {})
ready = []
for node in graph["nodes"]:
node_id = node["id"]
if node_id in completed:
continue
deps = node.get("depends_on", [])
if all(dep in completed for dep in deps):
step = step_map.get(node_id)
if step:
ready.append(step)
return ready
def next_directive(
self,
directive: ExecutionDirective,
) -> ExecutionDirective:
return directive
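
The scheduler above validates the plan with a depth-first search and then releases steps whose dependencies are all complete. The sketch below shows both ideas in isolation, assuming a simplified `Step` dataclass in place of `PlanStep` (whose full definition is not shown in this diff); readiness is computed from each step's own `depends_on` list.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical stand-in for PlanStep: only the fields the scheduler logic needs."""
    id: str
    depends_on: list[str] = field(default_factory=list)


def has_cycle(steps: list[Step]) -> bool:
    """Depth-first search with a recursion stack, as in validate_no_cycles."""
    graph = {s.id: set(s.depends_on) for s in steps}
    visited: set[str] = set()
    on_stack: set[str] = set()

    def visit(node: str) -> bool:
        if node in on_stack:      # back edge: node is already on the current path
            return True
        if node in visited:       # already fully explored, no cycle through it
            return False
        visited.add(node)
        on_stack.add(node)
        for dep in graph.get(node, ()):
            if visit(dep):
                return True
        on_stack.discard(node)
        return False

    return any(visit(s.id) for s in steps)


def ready_steps(steps: list[Step], completed: set[str]) -> list[Step]:
    """Steps not yet completed whose dependencies are all completed."""
    return [
        s for s in steps
        if s.id not in completed and all(d in completed for d in s.depends_on)
    ]
```

With a plan `a`, `b -> a`, `c -> a, b`, only `a` is ready at the start and `b` becomes ready once `a` completes.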


@ -1,104 +0,0 @@
from __future__ import annotations
import re
from typing import Any
from app.core.contracts import ExecutionDirective
SHELL_PREFIXES = (
"run ",
"execute ",
"launch ",
"запусти ",
"выполни ",
"выполнить ",
)
MEMORY_STORE_PATTERNS = (
r"запомни\s+(.+)",
r"сохрани\s+(.+)",
r"запиши\s+(.+)",
r"remember\s+(.+)",
r"save\s+(.+)",
)
MEMORY_SEARCH_PATTERNS = (
r"вспомни\s+(.+)",
r"search memory\s+(.+)",
)
class IntentParser:
"""Extracts explicit tool intents from natural-language task text."""
def __init__(self) -> None:
self._store_patterns = [re.compile(p, re.IGNORECASE) for p in MEMORY_STORE_PATTERNS]
self._search_patterns = [re.compile(p, re.IGNORECASE) for p in MEMORY_SEARCH_PATTERNS]
def parse(self, task_input: str) -> ExecutionDirective | None:
normalized = task_input.strip()
lowered = normalized.lower()
if matched := self._match_patterns(self._store_patterns, normalized):
return ExecutionDirective(
type="tool",
payload={
"tool": "memory_insert",
"args": {
"text": matched.group(1).strip(),
"kind": "fact",
"source": "user",
},
},
requires_permission=False,
confidence=0.85,
reason="User explicitly requested to store in memory.",
)
if matched := self._match_patterns(self._search_patterns, normalized):
return ExecutionDirective(
type="tool",
payload={
"tool": "memory_search",
"args": {"query": matched.group(1).strip()},
},
requires_permission=False,
confidence=0.85,
reason="User explicitly requested to search memory.",
)
for prefix in SHELL_PREFIXES:
if lowered.startswith(prefix):
command = normalized[len(prefix) :].strip()
if command:
return ExecutionDirective(
type="tool",
payload={
"tool": "shell_exec",
"args": {"command": command},
},
requires_permission=True,
confidence=0.92,
reason="Natural-language task explicitly requested shell execution.",
)
quoted = re.match(r"^`(.+)`$", normalized)
if quoted:
return ExecutionDirective(
type="tool",
payload={
"tool": "shell_exec",
"args": {"command": quoted.group(1)},
},
requires_permission=True,
confidence=0.75,
reason="Backticked input treated as direct shell command.",
)
return None
def _match_patterns(self, patterns: list[re.Pattern], text: str):
for pattern in patterns:
if match := pattern.match(text):
return match
return None
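
The matching order in `IntentParser.parse` (memory patterns first, then shell prefixes, then a backticked command) can be tried out with a reduced sketch. The function below is an illustration only: it returns a plain `(tool, args)` tuple instead of an `ExecutionDirective`, and it keeps just the English prefixes and a single store pattern.

```python
import re

# Reduced, illustrative subsets of SHELL_PREFIXES and MEMORY_STORE_PATTERNS.
SHELL_PREFIXES = ("run ", "execute ", "launch ")
STORE_PATTERN = re.compile(r"(?:remember|save)\s+(.+)", re.IGNORECASE)
BACKTICKED = re.compile(r"^`(.+)`$")


def parse_intent(text: str):
    """Return (tool, args) for an explicit intent, or None if nothing matches."""
    normalized = text.strip()
    # Memory requests are checked first, mirroring the order in parse().
    if m := STORE_PATTERN.match(normalized):
        return ("memory_insert", {"text": m.group(1).strip()})
    lowered = normalized.lower()
    for prefix in SHELL_PREFIXES:
        if lowered.startswith(prefix):
            # Slice the original text so the command keeps its casing.
            command = normalized[len(prefix):].strip()
            if command:
                return ("shell_exec", {"command": command})
    if m := BACKTICKED.match(normalized):
        return ("shell_exec", {"command": m.group(1)})
    return None
```

Because the prefix check runs on the lowercased text but the slice is taken from the original, `Run ls -la` yields the command `ls -la` unchanged.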


@ -1,24 +0,0 @@
from __future__ import annotations
from pydantic import BaseModel
class PermissionResolutionRequest(BaseModel):
task_id: str
decision: str
class SecretResolutionRequest(BaseModel):
task_id: str
secret: str
class PasswordResolutionRequest(BaseModel):
task_id: str
password: str
class ReviewResolutionRequest(BaseModel):
task_id: str
decision: str
correction: str | None = None


@ -1,370 +0,0 @@
from __future__ import annotations
import hashlib
import json
import logging
import os
import re
import shlex
from pathlib import Path
from typing import Any
logger = logging.getLogger(__name__)
class PermissionService:
"""Permission-first model - user is the authority."""
def __init__(self, config: dict[str, Any] | None = None, cache_file: Path | None = None):
self._config = config or self._load_config()
self._settings = self._config.get("settings", {})
self._cache_file = cache_file
self._categories = self._config.get("command_categories", {})
self._path_settings = self._config.get("path_settings", {})
self._legacy_dangerous_commands = self._config.get("dangerous_commands", {})
self._legacy_sensitive_paths = self._config.get("sensitive_paths", [])
def _load_config(self) -> dict[str, Any]:
try:
config_path = Path(__file__).parents[2] / "config" / "permissions.json"
with open(config_path) as f:
return json.load(f)
except Exception as e:
logger.warning(f"Failed to load permissions config: {e}")
return {"settings": {}, "command_categories": {}}
def _get_cache_file(self) -> Path:
if self._cache_file:
return self._cache_file
base_dir = Path(__file__).parents[2]
cache_relative = self._settings.get("cache_file", "data/runtime/allowed_commands.json")
return base_dir / cache_relative
def _load_cache(self) -> dict[str, Any]:
cache_file = self._get_cache_file()
try:
if cache_file.exists():
with open(cache_file) as f:
return json.load(f)
except Exception as e:
logger.warning(f"Failed to load cache: {e}")
return {"allowed_once": {}, "allowed_always": {}}
def _save_cache(self, cache: dict[str, Any]) -> None:
cache_file = self._get_cache_file()
cache_file.parent.mkdir(parents=True, exist_ok=True)
with open(cache_file, "w") as f:
json.dump(cache, f, indent=2)
def check_shell_command(
self,
task_id: str,
session_id: str,
command: str,
) -> dict[str, Any]:
"""Check if shell command requires permission."""
normalized = self._normalize_command(command)
command_hash = self._hash_command(normalized)
cache = self._load_cache()
# Check cache first
if command_hash in cache.get("allowed_always", {}):
return {
"decision": "allowed_always",
"command": normalized,
"cached": True,
"requires_sudo": _requires_sudo(normalized),
}
if command_hash in cache.get("allowed_once", {}):
cached = cache["allowed_once"][command_hash]
if cached.get("task_id") == task_id:
return {
"decision": "allowed_once",
"command": normalized,
"cached": True,
"requires_sudo": _requires_sudo(normalized),
}
# Check hard stop
if self._is_hard_stop(normalized):
return {
"decision": "hard_stop",
"command": normalized,
"reason": "Hard stop command - execution denied",
}
if not self._categories and self._legacy_dangerous_commands:
if self._matches_legacy_dangerous(normalized):
return {
"decision": "prompt",
"command": normalized,
"category": "legacy_dangerous",
"allow_always": False,
"task_id": task_id,
"session_id": session_id,
}
return {
"decision": "allowed",
"command": normalized,
"category": "legacy_safe",
"task_id": task_id,
"session_id": session_id,
}
# Check no_always category
category = self._get_category(normalized)
can_always = self._categories.get(category, {}).get("allow_always", True)
# Check if command requires sudo (e.g. apt, systemctl without explicit sudo prefix)
requires_sudo = _requires_sudo(normalized)
# Need user confirmation
result = {
"decision": "prompt",
"command": normalized,
"category": category,
"allow_always": can_always,
"requires_sudo": requires_sudo,
"task_id": task_id,
"session_id": session_id,
}
return result
def check_write_path(
self,
task_id: str,
session_id: str,
path: str,
) -> dict[str, Any]:
"""Check if write path requires permission."""
if not self._path_settings and self._legacy_sensitive_paths:
if any(path.startswith(sensitive) for sensitive in self._legacy_sensitive_paths):
return {
"decision": "prompt",
"path": path,
"task_id": task_id,
"session_id": session_id,
}
return {"decision": "allowed", "path": path}
allow_write_paths = self._path_settings.get("allow_write_paths", [])
# Check if path is in allowed list
for allowed in allow_write_paths:
if path.startswith(allowed):
return {"decision": "allowed", "path": path}
# Otherwise require permission
return {
"decision": "prompt",
"path": path,
"task_id": task_id,
"session_id": session_id,
}
def resolve_permission(
self,
task_id: str,
session_id: str,
command: str,
decision: str,
) -> dict[str, Any]:
"""Resolve permission decision from user."""
normalized = self._normalize_command(command)
command_hash = self._hash_command(normalized)
cache = self._load_cache()
if decision == "allow_once":
cache.setdefault("allowed_once", {})[command_hash] = {
"command": normalized,
"task_id": task_id,
"session_id": session_id,
}
self._save_cache(cache)
return {"status": "allowed_once", "command": normalized}
elif decision == "allow_always":
cache.setdefault("allowed_always", {})[command_hash] = {
"command": normalized,
"task_id": task_id,
"session_id": session_id,
}
self._save_cache(cache)
return {"status": "allowed_always", "command": normalized}
elif decision == "deny":
return {"status": "denied", "command": normalized}
return {"status": "unknown", "decision": decision}
def clear_cache(self) -> dict[str, Any]:
"""Clear permission cache."""
cache = {"allowed_once": {}, "allowed_always": {}}
self._save_cache(cache)
return {"status": "cache_cleared"}
def _normalize_command(self, command: str) -> str:
"""Normalize command for consistent hashing."""
if not self._settings.get("normalize_commands", True):
return command.strip()
normalized = command.strip()
# Split chained commands if enabled
if self._settings.get("split_chained", True):
# Replace ; and || with && for splitting
normalized = normalized.replace(";", " && ")
normalized = normalized.replace("||", " && ")
# Resolve environment variables
try:
normalized = os.path.expandvars(normalized)
except Exception:
pass
# Resolve home directory
normalized = normalized.replace("~", os.path.expanduser("~"))
# Remove extra whitespace
normalized = " ".join(normalized.split())
return normalized
def _hash_command(self, command: str) -> str:
"""Generate hash for command."""
return hashlib.sha256(command.encode()).hexdigest()[:16]
def _matches_legacy_dangerous(self, command: str) -> bool:
cmd_lower = command.lower()
for pattern in self._legacy_dangerous_commands:
if pattern.lower() in cmd_lower:
return True
return False
def _is_hard_stop(self, command: str) -> bool:
"""Check if command is hard stop."""
hard_stop_commands = self._categories.get("hard_stop", {}).get("commands", [])
cmd_lower = command.lower().strip()
cmd_tokens = cmd_lower.split()
for hs in hard_stop_commands:
hs_lower = hs.lower().strip()
# For "rm -rf /" and "rm -rf /*", only match exact command
# Don't match "rm -rf /tmp/nonexistent" as hard stop
if hs_lower in ("rm -rf /", "rm -rf /*"):
if cmd_lower == hs_lower:
return True
continue
# For other patterns, use substring match
if hs_lower in cmd_lower:
return True
return False
def _get_category(self, command: str) -> str:
"""Get command category."""
cmd_lower = command.lower().strip()
cmd_first_word = cmd_lower.split()[0] if cmd_lower.split() else ""
# Check no_always category — match by first word or known multi-word prefixes
no_always = self._categories.get("no_always", {}).get("commands", [])
for pattern in no_always:
pat_lower = pattern.lower().strip()
# Match if first word matches (e.g. "apt" matches "apt list --upgradable")
# or if command starts with the pattern (e.g. "systemctl stop" matches "systemctl stop nginx")
if cmd_first_word == pat_lower or cmd_lower.startswith(pat_lower + " "):
return "no_always"
# Check hard_stop by first word
hard_stop = self._categories.get("hard_stop", {}).get("commands", [])
for pattern in hard_stop:
pat_lower = pattern.lower().strip()
if cmd_first_word == pat_lower or cmd_lower.startswith(pat_lower + " "):
return "hard_stop"
# Default to normal
return "normal"
SUDO_COMMANDS = {
"sudo",
"apt", "apt-get", "dpkg", "yum", "dnf", "pacman", "zypper",
"systemctl", "service", "mount", "umount",
"shutdown", "reboot", "halt", "poweroff",
"useradd", "usermod", "userdel", "groupadd", "groupmod",
"chmod", "chown", "chgrp",
"iptables", "ufw",
"kill", "killall", "pkill",
}
def _requires_sudo(command: str) -> bool:
"""Check if command requires sudo."""
if not command:
return False
cmd_lower = command.lower().strip()
first_word = cmd_lower.split()[0] if cmd_lower.split() else ""
return first_word in SUDO_COMMANDS
class PermissionRequest:
"""Permission request to user."""
def __init__(
self,
task_id: str,
session_id: str,
command: str,
category: str = "normal",
allow_always: bool = True,
) -> None:
self.task_id = task_id
self.session_id = session_id
self.command = command
self.category = category
self.allow_always = allow_always
self.requires_password = _requires_sudo(command)
def to_dict(self) -> dict[str, Any]:
return {
"task_id": self.task_id,
"session_id": self.session_id,
"command": self.command,
"category": self.category,
"allow_always": self.allow_always,
"requires_password": self.requires_password,
"buttons": self._get_buttons(),
}
def _get_buttons(self) -> list[dict[str, str]]:
buttons = [{"action": "deny", "label": "Deny"}]
if self.allow_always:
buttons.insert(0, {"action": "allow_always", "label": "Allow always"})
if self.requires_password:
buttons.insert(0, {"action": "allow_with_password", "label": "Allow with password"})
else:
buttons.insert(0, {"action": "allow_once", "label": "Allow"})
return buttons
class PermissionDecision:
"""Permission decision."""
def __init__(
self,
decision: str,
command: str | None = None,
cached: bool = False,
) -> None:
self.decision = decision
self.command = command
self.cached = cached
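
`check_write_path` above compares paths with `str.startswith`, so an allowed prefix such as `/opt/ducklm_example/work` also matches `/opt/ducklm_example/work-other`, and `..` segments are not resolved before the comparison. The following is a minimal sketch of a component-wise check, assuming a POSIX filesystem; the helper name `is_within` and the example paths are hypothetical and not part of the original module.

```python
from __future__ import annotations

from pathlib import Path


def is_within(path: str, allowed_roots: list[str]) -> bool:
    """Return True if `path` resolves to a location inside one of `allowed_roots`.

    Hypothetical helper, not part of the original permission module.
    """
    # resolve() collapses ".." segments and follows symlinks where they exist.
    target = Path(path).expanduser().resolve()
    for root in allowed_roots:
        root_path = Path(root).expanduser().resolve()
        # Compare whole path components, so "work-other" is not inside "work".
        if target == root_path or root_path in target.parents:
            return True
    return False
```

A check like this could replace the prefix comparison in both the legacy and the `allow_write_paths` branches, at the cost of touching the filesystem to resolve symlinks.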

@ -1,2 +0,0 @@
"""Event bus and event store."""

@ -1,35 +0,0 @@
from __future__ import annotations
from typing import Callable
from app.core.contracts import RuntimeEvent
from app.events.event_store import SQLiteEventStore
Subscriber = Callable[[RuntimeEvent], None]
class EventBus:
"""Per-task ordered event publishing with durable storage."""
def __init__(self, event_store: SQLiteEventStore) -> None:
self._store = event_store
self._subscribers: list[Subscriber] = []
def next_sequence(self, task_id: str) -> int:
return self._store.get_latest_sequence(task_id) + 1
def publish(self, event: RuntimeEvent) -> RuntimeEvent:
self._store.append(event)
for subscriber in self._subscribers:
subscriber(event)
return event
def subscribe(self, subscriber: Subscriber) -> None:
self._subscribers.append(subscriber)
def list_for_task(self, task_id: str) -> list[RuntimeEvent]:
return self._store.list_for_task(task_id)
def list_recent(self, limit: int = 500) -> list[RuntimeEvent]:
return self._store.list_recent(limit=limit)

@ -1,122 +0,0 @@
from __future__ import annotations
import json
import sqlite3
from pathlib import Path
from app.core.contracts import RuntimeEvent
class SQLiteEventStore:
"""Append-only event store with per-task ordered history."""
def __init__(self, db_path: str | Path) -> None:
self._db_path = Path(db_path)
self._db_path.parent.mkdir(parents=True, exist_ok=True)
self._initialize()
def append(self, event: RuntimeEvent) -> None:
with sqlite3.connect(self._db_path) as conn:
conn.execute(
"""
INSERT INTO events (
event_id, task_id, session_id, sequence, type, timestamp,
payload_json, causation_id, correlation_id
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""",
(
event.event_id,
event.task_id,
event.session_id,
event.sequence,
event.type,
event.timestamp.isoformat(),
json.dumps(event.payload),
event.causation_id,
event.correlation_id,
),
)
conn.commit()
def list_for_task(self, task_id: str) -> list[RuntimeEvent]:
with sqlite3.connect(self._db_path) as conn:
rows = conn.execute(
"""
SELECT event_id, task_id, session_id, sequence, type, timestamp,
payload_json, causation_id, correlation_id
FROM events
WHERE task_id = ?
ORDER BY sequence ASC
""",
(task_id,),
).fetchall()
return [
RuntimeEvent(
event_id=row[0],
task_id=row[1],
session_id=row[2],
sequence=row[3],
type=row[4],
timestamp=row[5],
payload=json.loads(row[6]),
causation_id=row[7],
correlation_id=row[8],
)
for row in rows
]
def list_recent(self, limit: int = 500) -> list[RuntimeEvent]:
with sqlite3.connect(self._db_path) as conn:
rows = conn.execute(
"""
SELECT event_id, task_id, session_id, sequence, type, timestamp,
payload_json, causation_id, correlation_id
FROM events
ORDER BY timestamp DESC, task_id DESC, sequence DESC
LIMIT ?
""",
(limit,),
).fetchall()
events = [
RuntimeEvent(
event_id=row[0],
task_id=row[1],
session_id=row[2],
sequence=row[3],
type=row[4],
timestamp=row[5],
payload=json.loads(row[6]),
causation_id=row[7],
correlation_id=row[8],
)
for row in rows
]
return list(reversed(events))
def get_latest_sequence(self, task_id: str) -> int:
with sqlite3.connect(self._db_path) as conn:
row = conn.execute(
"SELECT COALESCE(MAX(sequence), 0) FROM events WHERE task_id = ?",
(task_id,),
).fetchone()
return int(row[0]) if row else 0
def _initialize(self) -> None:
with sqlite3.connect(self._db_path) as conn:
conn.execute(
"""
CREATE TABLE IF NOT EXISTS events (
event_id TEXT PRIMARY KEY,
task_id TEXT NOT NULL,
session_id TEXT NOT NULL,
sequence INTEGER NOT NULL,
type TEXT NOT NULL,
timestamp TEXT NOT NULL,
payload_json TEXT NOT NULL,
causation_id TEXT,
correlation_id TEXT NOT NULL,
UNIQUE(task_id, sequence)
)
"""
)
conn.commit()
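
`EventBus.next_sequence` reads `MAX(sequence) + 1` without holding a lock, so two publishers working on the same task can compute the same number. The `UNIQUE(task_id, sequence)` constraint in the schema above is what turns that race into a failed insert instead of a silent duplicate. A minimal sketch of that behaviour, using a reduced three-column schema and a hypothetical `append` helper rather than the original `SQLiteEventStore`:

```python
import sqlite3

# In-memory database with a reduced version of the events schema (assumption:
# only the columns relevant to the ordering constraint are kept).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "event_id TEXT PRIMARY KEY, task_id TEXT NOT NULL, "
    "sequence INTEGER NOT NULL, UNIQUE(task_id, sequence))"
)


def append(event_id: str, task_id: str, sequence: int) -> bool:
    """Insert one event; return False if (task_id, sequence) is already taken."""
    try:
        # `with conn` commits on success and rolls back on an exception.
        with conn:
            conn.execute(
                "INSERT INTO events (event_id, task_id, sequence) VALUES (?, ?, ?)",
                (event_id, task_id, sequence),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

In the original store the `IntegrityError` is not caught, so a caller that loses the race would see the exception and would need to fetch a fresh sequence number and retry.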

@ -1,35 +0,0 @@
TASK_RECEIVED = "task_received"
CONTEXT_BUILT = "context_built"
STEP_STARTED = "step_started"
TOOL_CALLED = "tool_called"
TOOL_OUTPUT_CHUNK = "tool_output_chunk"
TOOL_COMPLETED = "tool_completed"
PERMISSION_REQUESTED = "permission_requested"
PERMISSION_RESOLVED = "permission_resolved"
TASK_AWAITING_PERMISSION = "task_awaiting_permission"
SECRET_REQUESTED = "secret_requested"
TASK_AWAITING_INPUT = "task_awaiting_input"
TASK_AWAITING_REVIEW = "task_awaiting_review"
REVIEW_RESOLVED = "review_resolved"
CHECKPOINT_SAVED = "checkpoint_saved"
TASK_COMPLETED = "task_completed"
TASK_FAILED = "task_failed"
ORCHESTRATOR_CALLED = "orchestrator_called"
ORCHESTRATOR_RESULT = "orchestrator_result"
ORCHESTRATOR_UNAVAILABLE = "orchestrator_unavailable"
ORCHESTRATOR_FALLBACK_USED = "orchestrator_fallback_used"
ORCHESTRATOR_RETRY = "orchestrator_retry"
PLANNER_CALLED = "planner_called"
PLANNER_RETRY = "planner_retry"
CRITIC_CALLED = "critic_called"
CRITIC_RESULT = "critic_result"
MEMORY_WRITE_DECIDED = "memory_write_decided"
PLAN_STARTED = "plan_started"
PLAN_FAILED = "plan_failed"
PLAN_COMPLETED = "plan_completed"
STEP_COMPLETED = "step_completed"
THINKER_CALLED = "thinker_called"
THINKER_RESULT = "thinker_result"
JSON_COMPILER_CALLED = "json_compiler_called"
JSON_COMPILER_RESULT = "json_compiler_result"
MEMORY_RECALL_USED = "memory_recall_used"

@ -1,24 +0,0 @@
MEMORY_AVAILABLE = False
VECTOR_AVAILABLE = False
try:
from app.memory.store import MemoryStore
from app.memory.vector_index import VectorIndex
from app.memory.interface import MemoryInterface
from app.memory.write_policy import MemoryWritePolicy
MEMORY_AVAILABLE = True
VECTOR_AVAILABLE = True
except ImportError:
MemoryStore = None
VectorIndex = None
MemoryInterface = None
MemoryWritePolicy = None
__all__ = [
"MemoryStore",
"VectorIndex",
"MemoryInterface",
"MemoryWritePolicy",
"MEMORY_AVAILABLE",
"VECTOR_AVAILABLE",
]

@ -1,155 +0,0 @@
from __future__ import annotations
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Literal
import numpy as np
from app.core.contracts import MemoryEntry
from app.memory.store import MemoryStore
from app.memory.vector_index import VectorIndex
from app.models.embeddings import EmbeddingsAdapter
class MemoryInterface:
def __init__(
self,
store: MemoryStore,
vector_index: VectorIndex,
embeddings: EmbeddingsAdapter,
) -> None:
self._store = store
self._vector_index = vector_index
self._embeddings = embeddings
def insert(
self,
text: str,
kind: Literal["tool_result", "plan", "critique", "fact", "summary", "user_preference"],
source: Literal["tool", "critic", "user", "system"],
task_id: str | None = None,
session_id: str | None = None,
weight: float = 0.5,
metadata: dict[str, Any] | None = None,
) -> MemoryEntry:
entry = MemoryEntry(
text=text,
kind=kind,
source=source,
weight=weight,
task_id=task_id,
session_id=session_id,
metadata=metadata or {},
embedding_model=self._embeddings.__class__.__name__,
embedding_dim=self._embeddings.embedding_dim,
)
embedding = self._embeddings.encode(text)
embedding_bytes = embedding.astype("float32").tobytes()
self._store.insert(entry, embedding_bytes)
self._vector_index.insert(entry.id, embedding)
self._vector_index.save()
self.cleanup()
return entry
def search(
self,
query: str,
top_k: int = 5,
kind: str | None = None,
session_id: str | None = None,
) -> list[tuple[MemoryEntry, float]]:
query_embedding = self._embeddings.encode(query)
memory_ids, scores = self._vector_index.search(query_embedding, k=top_k)
results: list[tuple[MemoryEntry, float]] = []
for memory_id, score in zip(memory_ids, scores):
entry = self._store.get(memory_id)
if entry:
if kind and entry.kind != kind:
continue
if session_id and entry.session_id != session_id:
continue
results.append((entry, score))
return results[:top_k]
def get(self, memory_id: str) -> MemoryEntry | None:
return self._store.get(memory_id)
def delete(self, memory_id: str) -> bool:
entry = self._store.get(memory_id)
if entry:
self._vector_index.delete(memory_id)
return self._store.delete(memory_id)
return False
def get_by_task(self, task_id: str) -> list[MemoryEntry]:
return self._store.get_by_task(task_id)
def get_by_session(self, session_id: str, limit: int = 100) -> list[MemoryEntry]:
return self._store.get_by_session(session_id, limit)
def get_recent(self, limit: int = 10) -> list[MemoryEntry]:
return self._store.get_all(limit)
def count(self) -> int:
return self._store.count()
def reindex(self) -> int:
"""Rebuild vector index from all entries in memory store.
Returns number of indexed entries."""
entries = self._store.get_all(limit=10000)
# Delete old index file and re-initialize from scratch
if self._vector_index._index_path and self._vector_index._index_path.exists():
self._vector_index._index_path.unlink()
self._vector_index._index = None
self._vector_index._init_index()
count = 0
for entry in entries:
text = entry.text
embedding = self._embeddings.encode(text)
self._vector_index.insert(entry.id, embedding)
count += 1
self._vector_index.save()
return count
def close(self) -> None:
self._store.close()
def cleanup(self, max_items: int = 750, decay_factor: float = 0.95) -> int:
"""Remove low-weight entries when exceeding max_items limit.
Applies weight decay based on freshness before cleanup.
Returns number of removed entries.
"""
current_count = self._store.count()
if current_count <= max_items:
return 0
removed = 0
entries_to_remove = current_count - max_items
all_entries = self._store.get_all(limit=current_count)
def effective_weight(entry: MemoryEntry) -> float:
entry_weight = entry.weight
if entry.created_at:
age_days = (datetime.now(timezone.utc) - entry.created_at).total_seconds() / 86400
freshness_factor = max(0.1, decay_factor ** age_days)
return entry_weight * freshness_factor
return entry_weight
sorted_entries = sorted(all_entries, key=effective_weight)
for entry in sorted_entries[:entries_to_remove]:
self._store.delete(entry.id)
removed += 1
return removed
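
The ranking used by `cleanup` above can be restated as a standalone function to make the decay behaviour easier to see. This is a sketch under the same defaults (`decay_factor=0.95`, floor of `0.1`); the explicit `now` parameter is an assumption added here so the function is deterministic. With these defaults the floor takes over after roughly 45 days, since `0.95 ** 45` is just under `0.1`.

```python
from datetime import datetime, timedelta, timezone


def effective_weight(
    weight: float,
    created_at: datetime,
    now: datetime,
    decay_factor: float = 0.95,
) -> float:
    """Weight scaled by an exponential freshness factor, floored at 0.1."""
    age_days = (now - created_at).total_seconds() / 86400
    freshness_factor = max(0.1, decay_factor ** age_days)
    return weight * freshness_factor


# Example: a one-day-old entry keeps 95% of its weight.
_now = datetime(2026, 1, 1, tzinfo=timezone.utc)
_one_day_old = effective_weight(1.0, _now - timedelta(days=1), _now)
```

Entries are sorted by this value in ascending order, so old low-weight entries are removed first, while the floor keeps a very old high-weight entry from dropping to zero.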

@ -1,205 +0,0 @@
from __future__ import annotations
import json
import logging
from typing import Any
from app.core.contracts import MemoryEntry
from app.memory.interface import MemoryInterface
from app.models.async_adapters import AsyncOrchestratorAdapter
logger = logging.getLogger(__name__)
RECALL_PROMPT_TEMPLATE = """Decide whether long-term memory should be searched to answer this request.
Request: "{task_input}"
SEARCH memory if the request:
- Asks about the user (name, preferences, history)
- Refers to past conversations or actions
- Contains pronouns without context ("he", "it", "that file")
- Asks to recall, repeat, or describe something from the past
- Asks "what do you remember", "what is my name", "what did I say"
DO NOT SEARCH if it is:
- A greeting or farewell
- A simple command (ls, pwd, echo)
- A general question unrelated to the past
Reply with JSON ONLY:
{{"should_recall": true, "search_query": "search query"}}
or
{{"should_recall": false, "reason": "short reason"}}"""
class MemoryRecallService:
"""Active recall: the system decides for itself what to look up in memory and when."""
def __init__(
self,
memory_interface: MemoryInterface | None,
recall_model: AsyncOrchestratorAdapter | None,
) -> None:
self._memory = memory_interface
self._model = recall_model
async def recall(
self,
task_input: str,
top_k: int = 5,
) -> dict[str, Any]:
"""
Decide whether recall is needed and run the search.
Returns:
{
"should_recall": bool,
"reason": str,
"query": str,
"results": list[MemoryEntry],
"summary": str, # short summary for the orchestrator
}
"""
if not self._memory or not self._model:
logger.debug(f"Recall skipped: memory={self._memory is not None}, model={self._model is not None}")
return self._empty_result("memory_or_model_unavailable")
# 1. The LLM decides whether a search is needed
decision = await self._classify(task_input)
logger.debug(f"Recall decision: type={type(decision)} value={decision}")
if not isinstance(decision, dict):
return self._empty_result("invalid_decision_type")
if not decision.get("should_recall"):
return self._empty_result(decision.get("reason", "not_needed"))
search_query = decision.get("search_query", task_input)
logger.info(f"Memory recall: query='{search_query}', reason='{decision.get('reason')}'")
# 2. Vector search
try:
raw_results = self._memory.search(query=search_query, top_k=top_k)
except Exception as e:
logger.warning(f"Memory search failed: {e}")
return self._empty_result("search_failed")
# 3. Filtering: drop empty and insufficiently relevant results
filtered = self._filter(raw_results)
if not filtered:
return self._empty_result("no_relevant_results")
# 4. Summary for the orchestrator
summary = self._summarize(filtered, search_query)
return {
"should_recall": True,
"reason": decision.get("reason", ""),
"query": search_query,
"results": filtered,
"summary": summary,
}
async def _classify(self, task_input: str) -> dict[str, Any]:
"""LLM classification: decide whether memory should be searched."""
prompt = RECALL_PROMPT_TEMPLATE.format(task_input=task_input)
try:
raw = await self._model.generate(prompt, max_tokens=512)
data = self._parse_json(raw)
if "should_recall" in data:
return data
logger.warning(f"Recall classification missing 'should_recall': {raw[:200]}")
return {"should_recall": False, "reason": "parse_error"}
except Exception as e:
logger.warning(f"Recall classification failed: {e}")
return {"should_recall": False, "reason": "classification_error"}
def _filter(
self,
results: list[tuple[MemoryEntry, float]],
min_score: float = 0.3,
) -> list[MemoryEntry]:
"""Filter results by score and drop duplicates."""
seen_texts: set[str] = set()
filtered: list[MemoryEntry] = []
for entry, score in results:
if score < min_score:
continue
# Normalize text for deduplication
normalized = entry.text.strip().lower()[:100]
if normalized in seen_texts:
continue
seen_texts.add(normalized)
filtered.append(entry)
return filtered
def _summarize(
self,
results: list[MemoryEntry],
query: str,
) -> str:
"""Short summary of the findings for the orchestrator."""
parts = [f"Found {len(results)} entries for query '{query}':"]
for i, entry in enumerate(results[:5], 1):
text_preview = entry.text[:120].replace("\n", " ")
parts.append(f" {i}. [{entry.kind}] {text_preview}")
return "\n".join(parts)
def _parse_json(self, raw: str) -> dict[str, Any]:
"""Extract JSON from the model reply, skipping any reasoning before it."""
try:
json_start = raw.find("{")
json_end = raw.rfind("}") + 1
if json_start < 0 or json_end <= 0:
return {}
# Try the whole span from the first { to the last }
try:
data = json.loads(raw[json_start:json_end])
if isinstance(data, dict):
return data
except json.JSONDecodeError:
pass
# Collect every possible JSON start position
candidates = []
pos = 0
while True:
pos = raw.find("{", pos)
if pos < 0:
break
candidates.append(pos)
pos += 1
# Try each candidate, starting from the last
for start in reversed(candidates):
end = raw.rfind("}") + 1
if end <= start:
continue
try:
data = json.loads(raw[start:end])
if isinstance(data, dict):
return data
except json.JSONDecodeError:
continue
return {}
except Exception as e:
logger.debug(f"Recall JSON parse error: {e}")
return {}
@staticmethod
def _empty_result(reason: str) -> dict[str, Any]:
return {
"should_recall": False,
"reason": reason,
"query": "",
"results": [],
"summary": "",
}
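
`_parse_json` above slices between brace positions and retries from each candidate start. An alternative sketch, assuming only the standard library, uses `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and ignores whatever follows it. The function name `extract_first_json_object` is hypothetical. Unlike the original, it returns the first object that parses rather than the last one.

```python
from __future__ import annotations

import json
from typing import Any


def extract_first_json_object(raw: str) -> dict[str, Any]:
    """Return the first JSON object embedded in `raw`, or {} if none parses."""
    decoder = json.JSONDecoder()
    pos = raw.find("{")
    while pos >= 0:
        try:
            # raw_decode parses a single value at `pos` and tolerates trailing text.
            obj, _end = decoder.raw_decode(raw, pos)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        pos = raw.find("{", pos + 1)
    return {}
```

This keeps the "skip the reasoning before the JSON" behaviour while avoiding the dependency on the position of the last closing brace in the reply.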

@ -1,185 +0,0 @@
from __future__ import annotations
import json
import sqlite3
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Sequence
from uuid import uuid4
from app.core.contracts import MemoryEntry
def utc_now() -> datetime:
return datetime.now(timezone.utc)
class MemoryStore:
def __init__(self, db_path: str | Path) -> None:
self._db_path = Path(db_path)
self._db_path.parent.mkdir(parents=True, exist_ok=True)
self._conn = sqlite3.connect(str(self._db_path), check_same_thread=False)
self._conn.row_factory = sqlite3.Row
self._init_tables()
def _init_tables(self) -> None:
self._conn.executescript("""
CREATE TABLE IF NOT EXISTS memory_items (
id TEXT PRIMARY KEY,
text TEXT NOT NULL,
kind TEXT NOT NULL,
source TEXT NOT NULL,
weight REAL NOT NULL DEFAULT 0.5,
task_id TEXT,
session_id TEXT,
metadata_json TEXT,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS memory_embeddings (
memory_id TEXT PRIMARY KEY,
embedding BLOB NOT NULL,
embedding_model TEXT NOT NULL,
embedding_dim INTEGER NOT NULL,
created_at TEXT NOT NULL,
FOREIGN KEY (memory_id) REFERENCES memory_items(id) ON DELETE CASCADE
);
CREATE INDEX IF NOT EXISTS idx_memory_items_task ON memory_items(task_id);
CREATE INDEX IF NOT EXISTS idx_memory_items_session ON memory_items(session_id);
CREATE INDEX IF NOT EXISTS idx_memory_items_kind ON memory_items(kind);
CREATE INDEX IF NOT EXISTS idx_memory_embeddings_model ON memory_embeddings(embedding_model);
""")
self._conn.commit()
def insert(self, entry: MemoryEntry, embedding: bytes) -> None:
cursor = self._conn.cursor()
cursor.execute(
"""
INSERT INTO memory_items (id, text, kind, source, weight, task_id, session_id, metadata_json, created_at, updated_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""",
(
entry.id,
entry.text,
entry.kind,
entry.source,
entry.weight,
entry.task_id,
entry.session_id,
json.dumps(entry.metadata) if entry.metadata else None,
entry.created_at.isoformat(),
utc_now().isoformat(),
),
)
cursor.execute(
"""
INSERT INTO memory_embeddings (memory_id, embedding, embedding_model, embedding_dim, created_at)
VALUES (?, ?, ?, ?, ?)
""",
(
entry.id,
embedding,
entry.embedding_model,
entry.embedding_dim,
utc_now().isoformat(),
),
)
self._conn.commit()
def get(self, memory_id: str) -> MemoryEntry | None:
cursor = self._conn.cursor()
row = cursor.execute(
"SELECT * FROM memory_items WHERE id = ?", (memory_id,)
).fetchone()
if not row:
return None
return self._row_to_entry(row)
def get_embedding(self, memory_id: str) -> bytes | None:
cursor = self._conn.cursor()
row = cursor.execute(
"SELECT embedding FROM memory_embeddings WHERE memory_id = ?", (memory_id,)
).fetchone()
return bytes(row["embedding"]) if row else None
def get_all(self, limit: int = 1000) -> list[MemoryEntry]:
cursor = self._conn.cursor()
rows = cursor.execute(
"SELECT * FROM memory_items ORDER BY created_at DESC LIMIT ?", (limit,)
).fetchall()
return [self._row_to_entry(row) for row in rows]
def get_by_task(self, task_id: str) -> list[MemoryEntry]:
cursor = self._conn.cursor()
rows = cursor.execute(
"SELECT * FROM memory_items WHERE task_id = ? ORDER BY created_at DESC", (task_id,)
).fetchall()
return [self._row_to_entry(row) for row in rows]
def get_by_session(self, session_id: str, limit: int = 100) -> list[MemoryEntry]:
cursor = self._conn.cursor()
rows = cursor.execute(
"SELECT * FROM memory_items WHERE session_id = ? ORDER BY created_at DESC LIMIT ?",
(session_id, limit),
).fetchall()
return [self._row_to_entry(row) for row in rows]
def get_by_kind(self, kind: str, limit: int = 100) -> list[MemoryEntry]:
cursor = self._conn.cursor()
rows = cursor.execute(
"SELECT * FROM memory_items WHERE kind = ? ORDER BY created_at DESC LIMIT ?", (kind, limit)
).fetchall()
return [self._row_to_entry(row) for row in rows]
def delete(self, memory_id: str) -> bool:
cursor = self._conn.cursor()
cursor.execute("DELETE FROM memory_embeddings WHERE memory_id = ?", (memory_id,))
cursor.execute("DELETE FROM memory_items WHERE id = ?", (memory_id,))
self._conn.commit()
return cursor.rowcount > 0
def update_weight(self, memory_id: str, weight: float) -> bool:
cursor = self._conn.cursor()
cursor.execute(
"UPDATE memory_items SET weight = ?, updated_at = ? WHERE id = ?",
(weight, utc_now().isoformat(), memory_id),
)
self._conn.commit()
return cursor.rowcount > 0
def search_text(self, query: str, limit: int = 10) -> list[MemoryEntry]:
cursor = self._conn.cursor()
rows = cursor.execute(
"SELECT * FROM memory_items WHERE text LIKE ? ORDER BY created_at DESC LIMIT ?",
(f"%{query}%", limit),
).fetchall()
return [self._row_to_entry(row) for row in rows]
def count(self) -> int:
cursor = self._conn.cursor()
row = cursor.execute("SELECT COUNT(*) FROM memory_items").fetchone()
return row[0] if row else 0
def close(self) -> None:
self._conn.close()
def _row_to_entry(self, row: sqlite3.Row) -> MemoryEntry:
metadata = {}
if row["metadata_json"]:
metadata = json.loads(row["metadata_json"])
return MemoryEntry(
id=row["id"],
text=row["text"],
kind=row["kind"],
source=row["source"],
weight=row["weight"],
task_id=row["task_id"],
session_id=row["session_id"],
metadata=metadata,
created_at=datetime.fromisoformat(row["created_at"]),
embedding_model="",
embedding_dim=0,
)

@ -1,149 +0,0 @@
from __future__ import annotations
import logging
import numpy as np
import hnswlib
from pathlib import Path
from typing import Any
logger = logging.getLogger(__name__)
class VectorIndex:
def __init__(
self,
index_path: str | Path | None = None,
embedding_dim: int = 384,
max_elements: int = 10000,
) -> None:
self._embedding_dim = embedding_dim
self._index_path = Path(index_path) if index_path else None
self._index: hnswlib.Index | None = None
self._max_elements = max_elements
self._loading = False # Prevent recursion
self._init_index()
def _init_index(self) -> None:
if self._loading:
return
self._loading = True
try:
if self._index_path and self._index_path.exists():
self._load()
else:
self._index = hnswlib.Index(
space="l2",
dim=self._embedding_dim,
)
self._index.init_index(
max_elements=self._max_elements,
ef_construction=200,
M=16,
)
except Exception as e:
logger.warning(f"VectorIndex init failed: {e}")
self._index = hnswlib.Index(
space="l2",
dim=self._embedding_dim,
)
self._index.init_index(
max_elements=self._max_elements,
ef_construction=100,
M=16,
)
finally:
self._loading = False
def insert(self, memory_id: str, embedding: np.ndarray) -> None:
if self._index is None:
self._init_index()
if self._index is None:
return
try:
vector = self._normalize(embedding)
internal_id = self._get_internal_id(memory_id)
self._index.add_items(vector, ids=np.array([internal_id]))
except Exception as e:
logger.warning(f"VectorIndex insert failed: {e}")
def search(
self,
query_embedding: np.ndarray,
k: int = 5,
) -> tuple[list[str], list[float]]:
if self._index is None:
return [], []
try:
if self._index.get_current_count() == 0:
return [], []
# Set ef to at least k for proper search
self._index.set_ef(max(k * 2, 50))
vector = self._normalize(query_embedding)
labels, distances = self._index.knn_query(vector, k=k)
memory_ids = [self._get_memory_id(int(label)) for label in labels[0]]
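# hnswlib's "l2" space returns squared L2 distances; for unit vectors that is 2 - 2*cos,
# so this score equals 2*cos - 1 and lies in [-1, 1].
scores = [1.0 - dist for dist in distances[0]]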
return memory_ids, scores
except Exception as e:
logger.warning(f"VectorIndex search failed: {e}")
return [], []
def delete(self, memory_id: str) -> bool:
"""Not implemented: the vector stays in the index and False is always returned."""
return False
def get_items(self, memory_ids: list[str]) -> np.ndarray:
if self._index is None:
raise RuntimeError("Index not initialized")
internal_ids = [self._get_internal_id(mid) for mid in memory_ids]
return self._index.get_items(np.array(internal_ids))
def save(self) -> None:
if self._index and self._index_path:
try:
self._index_path.parent.mkdir(parents=True, exist_ok=True)
self._index.save_index(str(self._index_path))
except Exception as e:
logger.warning(f"VectorIndex save failed: {e}")
def _load(self) -> None:
if self._loading:
return
self._loading = True
try:
if self._index_path and self._index_path.exists():
self._index = hnswlib.Index(space="l2", dim=self._embedding_dim)
self._index.load_index(
str(self._index_path),
max_elements=self._max_elements
)
except Exception as e:
logger.warning(f"VectorIndex load failed: {e}")
self._init_index()
finally:
self._loading = False
def _normalize(self, vector: np.ndarray) -> np.ndarray:
vec = vector.flatten()
norm = np.linalg.norm(vec)
if norm > 0:
vec = vec / norm
return vec.reshape(1, -1)
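def _get_internal_id(self, memory_id: str) -> int:
# Built-in hash() of str is salted per process, so labels would not survive a restart;
# derive the label from a stable digest instead.
import hashlib
return int.from_bytes(hashlib.sha256(memory_id.encode("utf-8")).digest()[:4], "big") % (2**31)
def _get_memory_id(self, internal_id: int) -> str:
# NOTE: returns the numeric label as a string, not the original memory id;
# recovering the original id needs a label-to-id mapping kept alongside the index.
return str(internal_id)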
@property
def embedding_dim(self) -> int:
return self._embedding_dim
@property
def element_count(self) -> int:
return self._index.get_current_count() if self._index else 0
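
`VectorIndex.search` returns whatever `_get_memory_id` produces, and that is the numeric label rather than the id stored in `memory_items`, so `MemoryStore.get` would not find the entry. One way to close that gap is a small bidirectional mapping kept next to the index. The sketch below is an assumption about how such a mapping could look; the class name `LabelMap` and its methods are hypothetical, and persisting the mapping to disk is left out.

```python
from __future__ import annotations

import hashlib


class LabelMap:
    """Maps string memory ids to stable integer labels and back (hypothetical helper)."""

    def __init__(self) -> None:
        self._by_label: dict[int, str] = {}

    @staticmethod
    def label_for(memory_id: str) -> int:
        # A SHA-256 digest is stable across processes, unlike built-in hash().
        digest = hashlib.sha256(memory_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % (2**31)

    def register(self, memory_id: str) -> int:
        """Record the id and return the label to pass to the vector index."""
        label = self.label_for(memory_id)
        self._by_label[label] = memory_id
        return label

    def memory_id_for(self, label: int) -> str | None:
        """Return the original id for a label, or None if it was never registered."""
        return self._by_label.get(label)
```

Labels are truncated to 31 bits, so collisions between distinct ids remain possible; a production version would need to detect them and would need to save the mapping together with the index file.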

@ -1,98 +0,0 @@
from __future__ import annotations
from typing import Any, Literal
from app.core.contracts import CriticScore, MemoryEntry
class MemoryWritePolicy:
def __init__(
self,
store_threshold: float = 0.7,
min_usefulness: float = 0.3,
max_entries_per_session: int = 50,
) -> None:
self._store_threshold = store_threshold
self._min_usefulness = min_usefulness
self._max_entries_per_session = max_entries_per_session
def decide(
self,
critic_score: CriticScore,
memory_type: MemoryEntry.Kind,
session_id: str | None = None,
has_duplicate: bool = False,
current_session_count: int = 0,
) -> Literal["store", "store_with_weight", "skip", "merge"]:
if critic_score.safety < 0.5:
return "skip"
if has_duplicate:
return "merge"
if not critic_score.memory_store:
return "skip"
if critic_score.usefulness < self._min_usefulness:
return "skip"
if session_id and current_session_count >= self._max_entries_per_session:
return "skip"
base_decision = self._evaluate_scores(critic_score, memory_type)
if base_decision == "store" and critic_score.weight < self._store_threshold:
adjusted_weight = self._adjust_weight(critic_score, memory_type)
if adjusted_weight >= self._store_threshold:
return "store_with_weight"
return base_decision
return base_decision
def _evaluate_scores(
self,
critic_score: CriticScore,
memory_type: MemoryEntry.Kind,
) -> Literal["store", "store_with_weight", "skip", "merge"]:
avg_score = (critic_score.correctness + critic_score.usefulness + critic_score.safety) / 3.0
if memory_type in ("fact", "plan", "summary"):
if avg_score >= 0.8:
return "store"
elif avg_score >= 0.6:
return "store_with_weight"
if memory_type in ("tool_result", "critique"):
if avg_score >= self._store_threshold:
return "store"
elif avg_score >= 0.5:
return "store_with_weight"
if memory_type == "user_preference":
if avg_score >= 0.5:
return "store"
return "skip"
def _adjust_weight(
self,
critic_score: CriticScore,
memory_type: MemoryEntry.Kind,
) -> float:
base_weight = critic_score.weight
type_boost = {
"fact": 0.15,
"plan": 0.1,
"summary": 0.1,
"user_preference": 0.2,
"tool_result": 0.05,
"critique": 0.05,
}.get(memory_type, 0.0)
safety_boost = 0.0
if critic_score.safety >= 0.9:
safety_boost = 0.1
adjusted = base_weight + type_boost + safety_boost
return min(adjusted, 1.0)

@ -1,32 +0,0 @@
LLM_AVAILABLE = False
EMBEDDINGS_AVAILABLE = False

try:
    from app.models.adapters import create_adapter, create_llama_adapter
    from app.models.orchestrator import OrchestratorAdapter
    from app.models.coder import CoderAdapter
    from app.models.critic import CriticAdapter
    LLM_AVAILABLE = True
except ImportError:
    create_adapter = None
    create_llama_adapter = None
    OrchestratorAdapter = None
    CoderAdapter = None
    CriticAdapter = None

try:
    from app.models.embeddings import EmbeddingsAdapter
    EMBEDDINGS_AVAILABLE = True
except ImportError:
    EmbeddingsAdapter = None

__all__ = [
    "create_adapter",
    "create_llama_adapter",
    "OrchestratorAdapter",
    "CoderAdapter",
    "CriticAdapter",
    "EmbeddingsAdapter",
    "LLM_AVAILABLE",
    "EMBEDDINGS_AVAILABLE",
]

@ -1,72 +0,0 @@
from __future__ import annotations

from pathlib import Path
from typing import Any, Protocol, Iterator
import os

try:
    from llama_cpp import Llama
    LLAMA_AVAILABLE = True
except ImportError:
    Llama = None
    LLAMA_AVAILABLE = False


class BaseModelAdapter(Protocol):
    async def generate(self, prompt: str, **kwargs: Any) -> str: ...
    def stream(self, prompt: str, **kwargs: Any) -> Iterator[str]: ...


def create_llama_adapter(
    model_path: str,
    backend: str = "cpu",
    n_gpu_layers: int = 0,
    max_tokens: int = 2048,
    temperature: float = 0.2,
    base_dir: Path | None = None,
) -> "Llama":
    if not LLAMA_AVAILABLE:
        raise RuntimeError("llama-cpp-python not installed")
    if base_dir:
        model_path = str(base_dir / model_path)
    else:
        model_path = str(Path.cwd() / model_path)
    return Llama(
        model_path=model_path,
        n_gpu_layers=n_gpu_layers,
        n_ctx=4096,
        n_threads=int(os.environ.get("DUCKLM_N_THREADS", max(4, min((os.cpu_count() or 4) // 2, 20)))),
        n_threads_batch=-1,
        max_tokens=max_tokens,
        temperature=temperature,
        verbose=False,
    )


def create_adapter(
    model_type: str,
    config: dict[str, Any],
    base_dir: Path | None = None,
) -> "Llama":
    if not LLAMA_AVAILABLE:
        raise RuntimeError("llama-cpp-python not installed")
    model_path = config.get("path", "")
    backend = config.get("backend", "cpu")
    n_gpu_layers = config.get("n_gpu_layers", 0)
    max_tokens = config.get("max_tokens", 2048)
    temperature = config.get("temperature", 0.2)
    if backend == "vulkan" and n_gpu_layers != 0:
        n_gpu_layers = -1
    return create_llama_adapter(
        model_path=model_path,
        backend=backend,
        n_gpu_layers=n_gpu_layers,
        max_tokens=max_tokens,
        temperature=temperature,
        base_dir=base_dir,
    )

@ -1,58 +0,0 @@
from __future__ import annotations

import asyncio
from typing import Any, AsyncIterator

from app.models.orchestrator import OrchestratorAdapter as SyncOrchestrator


class AsyncOrchestratorAdapter:
    """Async wrapper for orchestrator - runs in executor to avoid blocking event loop."""

    def __init__(self, sync_adapter: SyncOrchestrator) -> None:
        self._sync = sync_adapter

    async def generate(self, prompt: str, max_tokens: int | None = None) -> str:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None,
            lambda: self._sync.generate(prompt, max_tokens)
        )

    async def stream(self, prompt: str, max_tokens: int | None = None) -> AsyncIterator[str]:
        loop = asyncio.get_running_loop()

        # Must be a plain function: run_in_executor calls it in a worker thread,
        # and an `async def` here would return an un-awaited coroutine instead of a list.
        def gen() -> list[str]:
            return list(self._sync.stream(prompt, max_tokens))

        # All chunks are collected before the first one is yielded.
        result = await loop.run_in_executor(None, gen)
        for chunk in result:
            yield chunk


class AsyncCoderAdapter:
    """Async wrapper for coder."""

    def __init__(self, sync_adapter) -> None:
        self._sync = sync_adapter

    async def generate(self, prompt: str, max_tokens: int | None = None) -> str:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None,
            lambda: self._sync.generate(prompt, max_tokens)
        )


class AsyncCriticAdapter:
    """Async wrapper for critic."""

    def __init__(self, sync_adapter) -> None:
        self._sync = sync_adapter

    async def generate(self, prompt: str, max_tokens: int | None = None) -> str:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None,
            lambda: self._sync.generate(prompt, max_tokens)
        )
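
The wrappers in this file hand blocking model calls to a worker thread so the event loop stays responsive. For streaming, the callable that runs in the thread has to be an ordinary function that drains the blocking generator. A minimal sketch of that pattern, assuming Python 3.9+ for `asyncio.to_thread`; `fake_sync_stream`, `stream_async`, and `collect` are hypothetical names used only for illustration and are not part of the repository:

```python
import asyncio
from typing import AsyncIterator, Iterator, List


def fake_sync_stream(prompt: str) -> Iterator[str]:
    """Hypothetical blocking generator standing in for a sync adapter's stream()."""
    for word in prompt.split():
        yield word


async def stream_async(prompt: str) -> AsyncIterator[str]:
    # Drain the blocking generator inside a worker thread, then replay the chunks.
    chunks = await asyncio.to_thread(lambda: list(fake_sync_stream(prompt)))
    for chunk in chunks:
        yield chunk


async def collect(prompt: str) -> List[str]:
    """Gather every chunk from the async stream into a list."""
    return [chunk async for chunk in stream_async(prompt)]
```

This buffers the full output before yielding anything, so it keeps the loop unblocked but does not deliver tokens incrementally; a queue fed from the worker thread would be needed for true streaming.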

@ -1,44 +0,0 @@
from __future__ import annotations

from threading import RLock
from typing import Any, Iterator

from llama_cpp import Llama


class CoderAdapter:
    def __init__(self, llm: Llama, system_prompt: str | None = None, lock: RLock | None = None) -> None:
        self._llm = llm
        self._lock = lock or RLock()
        self._system_prompt = system_prompt or (
            "You are an expert code generation model."
        )
        self._temperature = 0.2

    def generate(self, prompt: str, max_tokens: int | None = None) -> str:
        messages = [
            {"role": "system", "content": self._system_prompt},
            {"role": "user", "content": prompt},
        ]
        with self._lock:
            output = self._llm.create_chat_completion(
                messages=messages,
                max_tokens=max_tokens or 1024,
                temperature=self._temperature,
            )
        return output["choices"][0]["message"]["content"]

    def stream(self, prompt: str, max_tokens: int | None = None) -> Iterator[str]:
        messages = [
            {"role": "system", "content": self._system_prompt},
            {"role": "user", "content": prompt},
        ]
        with self._lock:
            for chunk in self._llm.create_chat_completion(
                messages=messages,
                max_tokens=max_tokens or 1024,
                temperature=self._temperature,
                stream=True,
            ):
                content = chunk["choices"][0].get("delta", {}).get("content")
                if content:
                    yield content

@ -1,44 +0,0 @@
from __future__ import annotations

from threading import RLock
from typing import Any, Iterator

from llama_cpp import Llama


class CriticAdapter:
    def __init__(self, llm: Llama, system_prompt: str | None = None, lock: RLock | None = None) -> None:
        self._llm = llm
        self._lock = lock or RLock()
        self._system_prompt = system_prompt or (
            "You are a critic model. Evaluate tool results and respond with JSON."
        )
        self._temperature = 0.1

    def generate(self, prompt: str, max_tokens: int | None = None) -> str:
        messages = [
            {"role": "system", "content": self._system_prompt},
            {"role": "user", "content": prompt},
        ]
        with self._lock:
            output = self._llm.create_chat_completion(
                messages=messages,
                max_tokens=max_tokens or 512,
                temperature=self._temperature,
            )
        return output["choices"][0]["message"]["content"]

    def stream(self, prompt: str, max_tokens: int | None = None) -> Iterator[str]:
        messages = [
            {"role": "system", "content": self._system_prompt},
            {"role": "user", "content": prompt},
        ]
        with self._lock:
            for chunk in self._llm.create_chat_completion(
                messages=messages,
                max_tokens=max_tokens or 512,
                temperature=self._temperature,
                stream=True,
            ):
                content = chunk["choices"][0].get("delta", {}).get("content")
                if content:
                    yield content

@ -1,37 +0,0 @@
from __future__ import annotations

from pathlib import Path
from typing import Any

import numpy as np
from sentence_transformers import SentenceTransformer


class EmbeddingsAdapter:
    def __init__(
        self,
        model_path: str | Path | None = None,
        model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
        embedding_dim: int = 384,
    ) -> None:
        self._embedding_dim = embedding_dim
        if model_path and Path(model_path).exists():
            self._model = SentenceTransformer(str(model_path))
        else:
            self._model = SentenceTransformer(model_name)

    def encode(self, texts: str | list[str]) -> np.ndarray:
        is_single = isinstance(texts, str)
        if is_single:
            texts = [texts]
        embeddings = self._model.encode(texts, convert_to_numpy=True)
        if is_single:
            return embeddings[0]
        return embeddings

    def encode_batch(self, texts: list[str], batch_size: int = 32) -> np.ndarray:
        return self._model.encode(texts, batch_size=batch_size, convert_to_numpy=True)

    @property
    def embedding_dim(self) -> int:
        return self._embedding_dim

@ -1,45 +0,0 @@
from __future__ import annotations

from threading import RLock
from typing import Any, Iterator

from llama_cpp import Llama


class OrchestratorAdapter:
    def __init__(self, llm: Llama, system_prompt: str | None = None, lock: RLock | None = None) -> None:
        self._llm = llm
        self._lock = lock or RLock()
        self._system_prompt = system_prompt or (
            "You are an expert orchestrator for a local AI agent system. "
            "Your role is to analyze the user's task, decide whether planning is needed."
        )
        self._temperature = 0.2

    def generate(self, prompt: str, max_tokens: int | None = None) -> str:
        messages = [
            {"role": "system", "content": self._system_prompt},
            {"role": "user", "content": prompt},
        ]
        with self._lock:
            output = self._llm.create_chat_completion(
                messages=messages,
                max_tokens=max_tokens or 512,
                temperature=self._temperature,
            )
        return output["choices"][0]["message"]["content"]

    def stream(self, prompt: str, max_tokens: int | None = None) -> Iterator[str]:
        messages = [
            {"role": "system", "content": self._system_prompt},
            {"role": "user", "content": prompt},
        ]
        with self._lock:
            for chunk in self._llm.create_chat_completion(
                messages=messages,
                max_tokens=max_tokens or 512,
                temperature=self._temperature,
                stream=True,
            ):
                content = chunk["choices"][0].get("delta", {}).get("content")
                if content:
                    yield content

@ -1,2 +0,0 @@
"""Permission and approval handling."""

@ -1,67 +0,0 @@
from __future__ import annotations

import sqlite3
from pathlib import Path

from app.core.contracts import PermissionDecision


class SQLiteApprovalStore:
    """Stores persistent user approval decisions."""

    def __init__(self, db_path: str | Path) -> None:
        self._db_path = Path(db_path)
        self._db_path.parent.mkdir(parents=True, exist_ok=True)
        self._initialize()

    def save(self, decision: PermissionDecision) -> PermissionDecision:
        with sqlite3.connect(self._db_path) as conn:
            conn.execute(
                """
                INSERT OR REPLACE INTO approvals (action_type, pattern, decision, created_at)
                VALUES (?, ?, ?, ?)
                """,
                (
                    decision.action_type,
                    decision.pattern,
                    decision.decision,
                    decision.created_at.isoformat(),
                ),
            )
            conn.commit()
        return decision

    def load(self, action_type: str, pattern: str) -> PermissionDecision | None:
        with sqlite3.connect(self._db_path) as conn:
            row = conn.execute(
                """
                SELECT action_type, pattern, decision, created_at
                FROM approvals
                WHERE action_type = ? AND pattern = ?
                """,
                (action_type, pattern),
            ).fetchone()
        if not row:
            return None
        return PermissionDecision(
            action_type=row[0],
            pattern=row[1],
            decision=row[2],
            created_at=row[3],
        )

    def _initialize(self) -> None:
        with sqlite3.connect(self._db_path) as conn:
            conn.execute(
                """
                CREATE TABLE IF NOT EXISTS approvals (
                    action_type TEXT NOT NULL,
                    pattern TEXT NOT NULL,
                    decision TEXT NOT NULL,
                    created_at TEXT NOT NULL,
                    PRIMARY KEY (action_type, pattern)
                )
                """
            )
            conn.commit()
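
Because `(action_type, pattern)` is the primary key, `INSERT OR REPLACE` acts as an upsert: saving a second decision for the same pair overwrites the first instead of adding a row. The sketch below reproduces the table against an in-memory database to show that behaviour. The helper names `save` and `load` and the sample values (`"shell_exec"`, `"ls *"`, the timestamps) are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS approvals (
        action_type TEXT NOT NULL,
        pattern TEXT NOT NULL,
        decision TEXT NOT NULL,
        created_at TEXT NOT NULL,
        PRIMARY KEY (action_type, pattern)
    )
    """
)


def save(action_type: str, pattern: str, decision: str, created_at: str) -> None:
    """Insert a decision, replacing any existing row with the same key."""
    conn.execute(
        "INSERT OR REPLACE INTO approvals (action_type, pattern, decision, created_at) "
        "VALUES (?, ?, ?, ?)",
        (action_type, pattern, decision, created_at),
    )
    conn.commit()


def load(action_type: str, pattern: str):
    """Return (decision, created_at) for the key, or None if nothing is stored."""
    return conn.execute(
        "SELECT decision, created_at FROM approvals WHERE action_type = ? AND pattern = ?",
        (action_type, pattern),
    ).fetchone()


# Two saves under the same key leave a single row holding the later decision.
save("shell_exec", "ls *", "deny", "2026-01-01T00:00:00")
save("shell_exec", "ls *", "allow", "2026-01-02T00:00:00")
row_count = conn.execute("SELECT COUNT(*) FROM approvals").fetchone()[0]
```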

@ -1,2 +0,0 @@
"""Runtime loop and execution coordination."""

@ -1,148 +0,0 @@
from __future__ import annotations

import asyncio

from app.core.context_builder import ContextBuilder
from app.core.contracts import ExecutionDirective, PermissionDecision, PermissionRequest, RuntimeEvent, TaskCheckpoint, UserTask
from app.core.execution_engine import ExecutionEngine
from app.core.async_router import AsyncRouter
from app.events.event_bus import EventBus
from app.events.event_types import CHECKPOINT_SAVED, CONTEXT_BUILT, TASK_AWAITING_PERMISSION, TASK_COMPLETED, TASK_FAILED, TASK_RECEIVED
from app.core.permission_service import PermissionService
from app.state.checkpoint_store import SQLiteCheckpointStore
from app.state.task_state_store import SQLiteTaskStateStore


class AsyncRuntimeLoop:
    """Async runtime loop using LLM orchestrator."""

    def __init__(
        self,
        event_bus: EventBus,
        task_state_store: SQLiteTaskStateStore,
        checkpoint_store: SQLiteCheckpointStore,
        context_builder: ContextBuilder,
        router: AsyncRouter,
        execution_engine: ExecutionEngine,
        permission_service: PermissionService,
        memory_interface=None,
    ) -> None:
        self._event_bus = event_bus
        self._task_state_store = task_state_store
        self._checkpoint_store = checkpoint_store
        self._context_builder = context_builder
        self._router = router
        self._execution_engine = execution_engine
        self._permission_service = permission_service
        self._memory_interface = memory_interface

    async def run_task(self, task: UserTask) -> dict[str, object]:
        state = self._task_state_store.create_task(
            task.task_id,
            {
                "status": "received",
                "session_id": task.session_id,
                "plan": None,
                "task_input": task.input,
                "task_context": task.context,
            },
        )
        self._publish(task, TASK_RECEIVED, {"status": "received"})
        checkpoint = TaskCheckpoint(task_id=task.task_id, status="received")
        self._checkpoint_store.save(checkpoint)
        self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
        context = self._context_builder.build(task=task, checkpoint=checkpoint)
        self._publish(task, CONTEXT_BUILT, {"keys": sorted(context.keys())})
        directive = await self._router.decide(state=state, context=context, task_id=task.task_id, session_id=task.session_id)
        execution_result = await asyncio.to_thread(
            self._execution_engine.execute,
            task=task,
            directive=directive,
        )
        state_patch = {"status": execution_result["status"], "last_directive": directive.model_dump(mode="json")}
        if execution_result["status"] == "awaiting_permission":
            state_patch["pending_permission_request"] = execution_result["result"].get("permission_request")
        self._task_state_store.update_task(task.task_id, state_patch)
        status = execution_result["status"]
        if status == "completed":
            self._publish(task, TASK_COMPLETED, {"directive": directive.model_dump(mode="json"), "execution_result": execution_result["result"]})
        elif status == "failed":
            self._publish(task, TASK_FAILED, {"error": execution_result.get("result", {}).get("error")})
        checkpoint.status = status
        self._checkpoint_store.save(checkpoint)
        self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
        # Save task and result to memory for session context
        self._save_to_memory(task, execution_result, status)
        return {
            "task_id": task.task_id,
            "status": status,
            "directive": directive.model_dump(mode="json"),
            "result": execution_result.get("result"),
            "events": list(self._event_bus.get_task_events(task.task_id)),
        }

    def _publish(self, task: UserTask, event_type: str, payload: dict) -> None:
        if not self._event_bus:
            return
        event = RuntimeEvent(
            task_id=task.task_id,
            session_id=task.session_id,
            sequence=self._event_bus.next_sequence(task.task_id),
            type=event_type,
            payload=payload,
        )
        self._event_bus.publish(event)

    def _save_to_memory(self, task: UserTask, execution_result: dict, status: str) -> None:
        """Save task input and result to memory for session context."""
        if not self._memory_interface:
            return
        try:
            # Save task input as summary
            self._memory_interface.insert(
                text=f"User request: {task.input}",
                kind="summary",
                source="user",
                task_id=task.task_id,
                session_id=task.session_id,
                weight=0.8,
                metadata={"status": status},
            )
            # Save execution result
            result_text = ""
            if status == "completed":
                step_results = execution_result.get("result", {}).get("step_results", [])
                if step_results:
                    for step in step_results:
                        tool_result = step.get("result", {}).get("result", {})
                        if tool_result.get("output"):
                            result_text += f" | {step.get('step_id')}: {tool_result.get('output')[:200]}"
            elif status == "failed":
                result_text = f" | Error: {execution_result.get('result', {}).get('error', 'Unknown')}"
            if result_text:
                self._memory_interface.insert(
                    text=f"Result: {status}{result_text}",
                    kind="tool_result",
                    source="system",
                    task_id=task.task_id,
                    session_id=task.session_id,
                    weight=0.7,
                    metadata={"status": status},
                )
        except Exception as e:
            import logging
            logging.getLogger(__name__).warning(f"Failed to save to memory: {e}")

@ -1,643 +0,0 @@
from __future__ import annotations
import json
from concurrent.futures import Future, ThreadPoolExecutor
from threading import RLock
from pathlib import Path
from app.core.config import AppConfig, load_app_config
from app.core.context_builder import ContextBuilder
from app.core.command_analyzer import CommandAnalyzer
from app.core.contracts import UserTask
from app.core.execution_engine import ExecutionEngine
from app.core.execution_scheduler import ExecutionScheduler
from app.core.async_router import AsyncRouter
from app.events.event_bus import EventBus
from app.events.event_store import SQLiteEventStore
from app.memory import MemoryInterface, MemoryStore, VectorIndex
from app.memory.recall import MemoryRecallService
from app.memory.write_policy import MemoryWritePolicy
from app.models import (
CoderAdapter,
CriticAdapter,
EmbeddingsAdapter,
OrchestratorAdapter,
create_adapter,
)
from app.models.async_adapters import AsyncOrchestratorAdapter, AsyncCriticAdapter, AsyncCoderAdapter
from app.permissions.approval_store import SQLiteApprovalStore
from app.core.permission_service import PermissionService
from app.runtime.runtime_loop import RuntimeLoop
from app.state.checkpoint_store import SQLiteCheckpointStore
from app.state.task_state_store import SQLiteTaskStateStore
from app.tools.file_read import FileReadTool
from app.tools.file_write import FileWriteTool
from app.tools.registry import ToolRegistry
from app.tools.sandbox import ToolSandbox
from app.tools.shell_exec import ShellExecTool
from app.tools.memory_tools import MemoryInsertTool, MemorySearchTool, MemoryListTool
class RuntimeController:
"""Composition root for the ducklm runtime."""
def __init__(self, base_dir: str | Path | None = None) -> None:
self.base_dir = Path(base_dir or Path(__file__).resolve().parents[2])
self.config: AppConfig = load_app_config(self.base_dir / "config")
self.event_bus = EventBus(
SQLiteEventStore(self.base_dir / "data" / "events" / "events.sqlite3")
)
self.task_state_store = SQLiteTaskStateStore(
self.base_dir / "data" / "state" / "task_state.sqlite3"
)
self.checkpoint_store = SQLiteCheckpointStore(
self.base_dir / "data" / "state" / "checkpoints.sqlite3"
)
self.approval_store = SQLiteApprovalStore(
self.base_dir / "data" / "permissions" / "approvals.sqlite3"
)
self._thinker: OrchestratorAdapter | None = None
self._json_compiler: OrchestratorAdapter | None = None
self._orchestrator: OrchestratorAdapter | None = None
self._coder: CoderAdapter | None = None
self._critic: CriticAdapter | None = None
self._sys_util: OrchestratorAdapter | None = None
self._model_cache: dict[tuple[object, ...], tuple[object, RLock]] = {}
self._memory_interface: MemoryInterface | None = None
self._memory_policy: MemoryWritePolicy | None = None
self._background_executor = ThreadPoolExecutor(max_workers=4, thread_name_prefix="ducklm-task")
self._background_tasks: dict[str, Future[dict[str, object]]] = {}
self.tool_registry = None
self.tool_sandbox = None
self._init_models()
self._init_memory()
runtime_config = self.config.runtime
self.tool_sandbox = ToolSandbox(
allowed_root=self.base_dir,
timeout_ms=runtime_config.step_timeout_ms,
command_timeout_ms=runtime_config.shell_command_timeout_ms,
idle_timeout_ms=runtime_config.shell_idle_timeout_ms,
)
self.tool_registry = self._create_tool_registry()
context_config = {
"max_context_tokens": runtime_config.max_context_tokens,
"context_budgets": runtime_config.context_budgets,
"reserve_for_generation_pct": runtime_config.reserve_for_generation_pct,
}
self.context_builder = ContextBuilder(
memory_interface=self._memory_interface,
tool_registry=self.tool_registry,
config=context_config,
)
self._prompts = self._load_prompts()
# ensure sys_util prompt is available to router (prompts.json may have "sys_util" key)
if "sys_util" not in self._prompts and "prompts" in self.config:
self._prompts["sys_util"] = self.config.get("sys_util")
self.context_builder = ContextBuilder(
memory_interface=self._memory_interface,
tool_registry=self.tool_registry,
config=context_config,
)
self.router = AsyncRouter(
thinker=None,
json_compiler=None,
intent_parser=None,
prompts=self._prompts,
event_bus=self.event_bus,
tool_registry=self.tool_registry,
retry_limit=runtime_config.orchestrator_retry_limit,
debug=getattr(runtime_config, "debug", False),
log_length=getattr(runtime_config, "debug_orchestrator_log_length", 500),
json_fix_retry_limit=getattr(runtime_config, "json_fix_retry_limit", 2),
json_fix_use_sys_util=getattr(runtime_config, "json_fix_use_sys_util", True),
intent_classifier=getattr(runtime_config, "intent_classifier", "thinker"),
)
self.permission_service = PermissionService(
config=self._load_permissions_config(),
)
self.command_analyzer = CommandAnalyzer(self.permission_service)
self.execution_engine = ExecutionEngine(
event_bus=self.event_bus,
tool_registry=self.tool_registry,
permission_service=self.permission_service,
scheduler=ExecutionScheduler(
retry_limit=runtime_config.planner_retry_limit
),
critic=self._critic,
memory_policy=self._memory_policy,
memory_interface=self._memory_interface,
prompts=self._prompts,
recovery_limit=runtime_config.tool_retry_limit,
critic_retry_limit=runtime_config.critic_retry_limit,
command_analyzer=self.command_analyzer,
)
self.runtime_loop = RuntimeLoop(
event_bus=self.event_bus,
task_state_store=self.task_state_store,
checkpoint_store=self.checkpoint_store,
context_builder=self.context_builder,
router=self.router,
execution_engine=self.execution_engine,
permission_service=self.permission_service,
memory_interface=self._memory_interface,
)
def _load_prompts(self) -> dict[str, str]:
prompts_dir = self.base_dir / "config" / "prompts"
prompts = {}
if prompts_dir.is_dir():
for md_file in prompts_dir.glob("*.md"):
role = md_file.stem
prompts[role] = md_file.read_text(encoding="utf-8")
if prompts:
return prompts
prompts_file = self.base_dir / "config" / "prompts.json"
if prompts_file.exists():
with open(prompts_file, encoding="utf-8") as f:
return json.load(f)
return {}
def _load_permissions_config(self) -> dict:
permissions_file = self.base_dir / "config" / "permissions.json"
if not permissions_file.exists():
return {}
with permissions_file.open("r", encoding="utf-8") as handle:
return json.load(handle)
def _init_models(self) -> None:
try:
memory_config = self.config.runtime.memory_thresholds or {}
if memory_config:
self._memory_policy = MemoryWritePolicy(
store_threshold=memory_config.get("default_store_weight", 0.8),
)
print("Models policy ready")
except Exception as e:
print(f"Models init failed: {e}")
def load_models_at_startup(self) -> None:
"""Load all LLM models synchronously. Called from startup hook in executor."""
import os
os.chdir(str(self.base_dir / "models"))
try:
print("Loading thinker model...")
thinker_config = self.config.models.thinker or {}
if thinker_config.get("path"):
llm, lock = self._get_or_create_llm("thinker", thinker_config)
self._thinker = OrchestratorAdapter(llm, system_prompt=self._prompts.get("thinker"), lock=lock)
print(f"Thinker loaded: {self._thinker} (model: {thinker_config.get('path')})")
print("Loading json_compiler model...")
compiler_config = self.config.models.json_compiler or {}
if compiler_config.get("path"):
llm, lock = self._get_or_create_llm("json_compiler", compiler_config)
self._json_compiler = OrchestratorAdapter(llm, system_prompt=self._prompts.get("json_compiler"), lock=lock)
print(f"JSON Compiler loaded: {self._json_compiler} (model: {compiler_config.get('path')})")
print("Loading coder model...")
coder_config = self.config.models.coder or {}
if coder_config.get("path"):
llm, lock = self._get_or_create_llm("coder", coder_config)
self._coder = CoderAdapter(llm, system_prompt=self._prompts.get("coder"), lock=lock)
print(f"Coder loaded: {self._coder} (model: {coder_config.get('path')})")
print("Loading critic model...")
critic_config = self.config.models.critic or {}
if critic_config.get("path"):
llm, lock = self._get_or_create_llm("critic", critic_config)
self._critic = CriticAdapter(llm, system_prompt=self._prompts.get("critic"), lock=lock)
print(f"Critic loaded: {self._critic} (model: {critic_config.get('path')})")
print("Loading sys_util model...")
sys_util_config = self.config.models.sys_util or {}
if sys_util_config.get("path"):
llm, lock = self._get_or_create_llm("sys_util", sys_util_config)
self._sys_util = OrchestratorAdapter(llm, system_prompt=self._prompts.get("sys_util"), lock=lock)
print(f"Sys_util loaded: {self._sys_util} (model: {sys_util_config.get('path')})")
print("All models loaded successfully")
async_thinker = AsyncOrchestratorAdapter(self._thinker) if self._thinker else None
async_compiler = AsyncOrchestratorAdapter(self._json_compiler) if self._json_compiler else None
async_coder = AsyncCoderAdapter(self._coder) if self._coder else None
async_critic = AsyncCriticAdapter(self._critic) if self._critic else None
async_sys_util = AsyncOrchestratorAdapter(self._sys_util) if self._sys_util else None
self.router.set_thinker(async_thinker)
self.router.set_json_compiler(async_compiler)
self.router.set_sys_util(async_sys_util)
self.router.set_tool_registry(self.tool_registry)
if async_critic:
self.execution_engine.set_critic(async_critic)
if async_coder:
self.execution_engine.set_coder(async_coder)
# Create MemoryRecallService using the configured model (default: sys_util)
# Reuses already-loaded async adapter — no duplicate model loading
recall_model_name = self.config.runtime.recall_model
recall_async_model = {
"sys_util": async_sys_util,
"thinker": async_thinker,
"json_compiler": async_compiler,
"critic": async_critic,
"coder": async_coder,
}.get(recall_model_name, async_sys_util)
self._recall_service = MemoryRecallService(
memory_interface=self._memory_interface,
recall_model=recall_async_model,
)
self.runtime_loop.set_recall_service(self._recall_service)
print(f"MemoryRecallService initialized with model: {recall_model_name}")
# Set memory policy in runtime loop
self.runtime_loop.set_memory_policy(self._memory_policy)
print(f"MemoryWritePolicy set: {self._memory_policy is not None}")
except Exception as e:
print(f"Failed to load models at startup: {e}")
raise RuntimeError(f"Model loading failed: {e}") from e
def _model_cache_key(self, model_config: dict) -> tuple[object, ...]:
path = str((self.base_dir / "models" / model_config.get("path", "")).resolve())
return (
path,
model_config.get("backend", "cpu"),
model_config.get("n_gpu_layers", 0),
model_config.get("n_ctx", 4096),
)
def _get_or_create_llm(self, model_type: str, model_config: dict):
key = self._model_cache_key(model_config)
cached = self._model_cache.get(key)
if cached:
print(f"Reusing model instance: {model_config.get('path')} for {model_type}")
return cached
llm = create_adapter(model_type, model_config, self.base_dir / "models")
lock = RLock()
cached = (llm, lock)
self._model_cache[key] = cached
return cached
def _init_memory(self) -> None:
try:
emb_config = self.config.models.embeddings or {}
model_path = self.base_dir / emb_config.get("path", "models/all-MiniLM-L6-v2")
if not model_path.exists() and not Path(emb_config.get("path", "")).is_absolute():
model_path = self.base_dir / "models" / emb_config.get("path", "all-MiniLM-L6-v2")
if not model_path.exists():
print(f"Memory init skipped: embeddings model not found at {model_path}")
self._memory_interface = None
return
embeddings = EmbeddingsAdapter(
model_path=model_path,
embedding_dim=emb_config.get("embedding_dim", 384),
)
store = MemoryStore(
self.base_dir / "data" / "memory" / "memory.sqlite3"
)
vector_index = VectorIndex(
index_path=self.base_dir / "data" / "memory" / "index.bin",
embedding_dim=embeddings.embedding_dim,
)
self._memory_interface = MemoryInterface(store, vector_index, embeddings)
except Exception as e:
print(f"Memory init failed: {e}")
self._memory_interface = None
def _create_tool_registry(self) -> ToolRegistry:
from app.tools.registry import ToolRegistry
from app.tools.plugins.shell_exec import Tool as ShellExecTool
from app.tools.plugins.file_read import Tool as FileReadTool
from app.tools.plugins.file_write import Tool as FileWriteTool
from app.tools.plugins.memory_tools import Tool as MemoryTool
from app.tools.discover import ToolDiscovery
registry = ToolRegistry()
tool_init_map = {
"shell_exec": lambda m: ShellExecTool(self.tool_sandbox),
"file_read": lambda m: FileReadTool(self.tool_sandbox),
"file_write": lambda m: FileWriteTool(self.tool_sandbox),
"memory": lambda m: MemoryTool(self._memory_interface),
}
discovery = ToolDiscovery()
discovered = discovery.discover()
for name, data in discovered.items():
init_fn = tool_init_map.get(name)
if init_fn:
tool = init_fn(data.get("manifest", {}))
registry.register(tool)
registry._schemas[name] = {
"description": data.get("manifest", {}).get("description", ""),
"args_schema": data.get("manifest", {}).get("args_schema", {}),
"requires_permission": data.get("manifest", {}).get("requires_permission", False),
}
print(f"Registered tool: {name}")
else:
print(f"No init mapping for tool: {name} - skipping")
return registry
@property
def orchestrator(self) -> OrchestratorAdapter | None:
return self._orchestrator
@property
def coder(self) -> CoderAdapter | None:
return self._coder
@property
def critic(self) -> CriticAdapter | None:
return self._critic
@property
def memory_interface(self) -> MemoryInterface | None:
return self._memory_interface
def _ensure_orchestrator(self) -> OrchestratorAdapter | None:
if self._orchestrator is not None:
return self._orchestrator
try:
orch_config = self.config.models.orchestrator or {}
if orch_config.get("path"):
llm, lock = self._get_or_create_llm("orchestrator", orch_config)
self._orchestrator = OrchestratorAdapter(llm, lock=lock)
except Exception as e:
print(f"Orchestrator load failed: {e}")
return self._orchestrator
def _ensure_critic(self) -> CriticAdapter | None:
if self._critic is not None:
return self._critic
try:
critic_config = self.config.models.critic or {}
if critic_config.get("path"):
llm, lock = self._get_or_create_llm("critic", critic_config)
self._critic = CriticAdapter(llm, lock=lock)
except Exception as e:
print(f"Critic load failed: {e}")
return self._critic
def handle_task(self, task: UserTask) -> dict[str, object]:
return self.runtime_loop.run_task(task)
def submit_task(self, task: UserTask) -> dict[str, object]:
self._background_tasks[task.task_id] = self._background_executor.submit(
self.handle_task,
task,
)
return {"task_id": task.task_id, "status": "accepted"}
def resolve_permission(self, task_id: str, decision: str) -> dict[str, object]:
return self.runtime_loop.resolve_permission(
task_id=task_id, decision=decision
)
def submit_permission_resolution(self, task_id: str, decision: str) -> dict[str, object]:
if not self.task_state_store.get_task(task_id):
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
self._background_tasks[task_id] = self._background_executor.submit(
self.resolve_permission,
task_id,
decision,
)
return {"task_id": task_id, "status": "accepted"}
def resolve_secret(self, task_id: str, secret: str) -> dict[str, object]:
return self.runtime_loop.resolve_secret(
task_id=task_id, secret=secret
)
def submit_secret_resolution(self, task_id: str, secret: str) -> dict[str, object]:
if not self.task_state_store.get_task(task_id):
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
self._background_tasks[task_id] = self._background_executor.submit(
self.resolve_secret,
task_id,
secret,
)
return {"task_id": task_id, "status": "accepted"}
def resolve_password(self, task_id: str, password: str) -> dict[str, object]:
return self.runtime_loop.resolve_password(
task_id=task_id, password=password
)
def resolve_review(self, task_id: str, decision: str, correction: str | None = None) -> dict[str, object]:
return self.runtime_loop.resolve_review(
task_id=task_id,
decision=decision,
correction=correction,
)
def submit_review_resolution(self, task_id: str, decision: str, correction: str | None = None) -> dict[str, object]:
if not self.task_state_store.get_task(task_id):
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
self._background_tasks[task_id] = self._background_executor.submit(
self.resolve_review,
task_id,
decision,
correction,
)
return {"task_id": task_id, "status": "accepted"}
def submit_password_resolution(self, task_id: str, password: str) -> dict[str, object]:
if not self.task_state_store.get_task(task_id):
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
self._background_tasks[task_id] = self._background_executor.submit(
self.resolve_password,
task_id,
password,
)
return {"task_id": task_id, "status": "accepted"}
def handle_critic_feedback(
self,
feedback: str,
task_id: str | None = None,
session_id: str | None = None,
feedback_type: str | None = None,
severity: str | None = None,
correction: str | None = None,
remember: bool = True,
retry: bool = False,
assistant_answer: str | None = None,
correctness_override: float | None = None,
usefulness_override: float | None = None,
safety_override: float | None = None,
) -> dict[str, object]:
target_task_id = task_id
target_session_id = session_id
if not target_session_id and not target_task_id:
return {
"status": "error",
"message": "Either task_id or session_id must be provided",
}
state = self.task_state_store.get_task(target_task_id) if target_task_id else None
if not target_session_id and state:
target_session_id = state.get("session_id")
if not target_task_id and target_session_id:
recent_tasks = self.task_state_store.get_session_tasks(target_session_id, limit=1)
if recent_tasks:
target_task_id = recent_tasks[0]["task_id"]
min_weight = 0.3
max_weight = 0.95
user_weight = 0.9
final_weight = max(min_weight, min(max_weight, user_weight))
task_input = state.get("task_input") if state else None
last_directive = state.get("last_directive") if state else None
feedback_type = feedback_type or "other"
severity = severity or "major"
lesson = self._build_feedback_lesson(
feedback_type=feedback_type,
severity=severity,
feedback=feedback,
correction=correction,
task_input=task_input,
)
metadata = {
"feedback_text": feedback,
"feedback_type": feedback_type,
"severity": severity,
"correction": correction,
"assistant_answer": assistant_answer,
"task_input": task_input,
"last_directive": last_directive,
"overrides": {
"correctness": correctness_override,
"usefulness": usefulness_override,
"safety": safety_override,
},
"source": "user",
}
feedback_text = lesson
if correctness_override is not None:
feedback_text += f" | Correctness corrected to: {correctness_override}"
if usefulness_override is not None:
feedback_text += f" | Usefulness corrected to: {usefulness_override}"
if safety_override is not None:
feedback_text += f" | Safety corrected to: {safety_override}"
retry_result = None
stored = False
store_error = None
try:
if remember and self._memory_interface:
self._memory_interface.insert(
text=feedback_text,
kind="critique",
source="user",
task_id=target_task_id,
session_id=target_session_id,
weight=final_weight,
metadata=metadata,
)
stored = True
elif remember and not self._memory_interface:
store_error = "Memory not available"
except Exception as e:
store_error = str(e)
if retry and task_input:
retry_input = self._build_retry_input(
task_input=task_input,
feedback=feedback,
feedback_type=feedback_type,
correction=correction,
)
retry_task = UserTask(
session_id=target_session_id or "feedback-retry",
input=retry_input,
context={
"feedback_retry": True,
"original_task_id": target_task_id,
"feedback_type": feedback_type,
"severity": severity,
"correction": correction,
},
)
retry_result = self.handle_task(retry_task)
status = "ok" if stored or not remember else "error"
return {
"status": status,
"message": "Feedback saved" if stored else (store_error or "Feedback accepted"),
"stored": stored,
"task_id": target_task_id,
"session_id": target_session_id,
"lesson": lesson,
"retry_result": retry_result,
}
def _build_feedback_lesson(
self,
feedback_type: str,
severity: str,
feedback: str,
correction: str | None,
task_input: str | None,
) -> str:
parts = [
"User critique lesson.",
f"Error type: {feedback_type}.",
f"Severity: {severity}.",
]
if task_input:
parts.append(f"Original task: {task_input}")
if feedback:
parts.append(f"What was wrong: {feedback}")
if correction:
parts.append(f"Preferred correction: {correction}")
return " | ".join(parts)
def _build_retry_input(
self,
task_input: str,
feedback: str,
feedback_type: str,
correction: str | None,
) -> str:
retry_input = (
"Retry the task, taking the feedback into account.\n"
f"Original task: {task_input}\n"
f"Error type: {feedback_type}\n"
f"What was wrong: {feedback}\n"
)
if correction:
retry_input += f"Expected result: {correction}\n"
return retry_input
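
The `submit_*_resolution` methods above share one pattern: reject unknown task ids, hand the blocking `resolve_*` call to a background executor, and answer `accepted` right away. A minimal, self-contained sketch of that pattern follows. `BackgroundResolver`, `_resolve`, and `wait` are hypothetical stand-ins, and the executor is assumed to be a `ThreadPoolExecutor`, which this diff does not confirm.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class BackgroundResolver:
    """Hypothetical stand-in for the service above; all names are illustrative."""

    def __init__(self, known_tasks: set[str]) -> None:
        self._known_tasks = known_tasks
        # Assumption: the real _background_executor behaves like a thread pool.
        self._executor = ThreadPoolExecutor(max_workers=2)
        self._background_tasks: dict[str, Future] = {}

    def _resolve(self, task_id: str, decision: str) -> dict[str, object]:
        # Placeholder for the blocking runtime_loop.resolve_* call.
        return {"task_id": task_id, "status": "completed", "decision": decision}

    def submit(self, task_id: str, decision: str) -> dict[str, object]:
        # Validate first so callers get an immediate, synchronous error.
        if task_id not in self._known_tasks:
            return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
        # Keep the Future so the outcome can be inspected later.
        self._background_tasks[task_id] = self._executor.submit(self._resolve, task_id, decision)
        return {"task_id": task_id, "status": "accepted"}

    def wait(self, task_id: str, timeout: float = 5.0) -> dict[str, object]:
        # Block until the background resolution finishes (or the timeout expires).
        return self._background_tasks[task_id].result(timeout=timeout)


resolver = BackgroundResolver({"t1"})
accepted = resolver.submit("t1", "allow")
outcome = resolver.wait("t1")
```

Storing one `Future` per `task_id` means a second submission for the same task overwrites the first handle, which is also how the original dictionary assignment behaves.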


@ -1,688 +0,0 @@
from __future__ import annotations
import asyncio
from app.core.context_builder import ContextBuilder
from app.core.contracts import CriticScore, ExecutionDirective, PermissionDecision, PermissionRequest, RuntimeEvent, SecretRequest, TaskCheckpoint, UserTask
from app.core.execution_engine import ExecutionEngine
from app.core.async_router import AsyncRouter
from app.events.event_bus import EventBus
from app.events.event_types import CHECKPOINT_SAVED, CONTEXT_BUILT, MEMORY_RECALL_USED, MEMORY_WRITE_DECIDED, REVIEW_RESOLVED, TASK_AWAITING_INPUT, TASK_AWAITING_PERMISSION, TASK_AWAITING_REVIEW, TASK_COMPLETED, TASK_FAILED, TASK_RECEIVED
from app.core.permission_service import PermissionService
from app.memory.recall import MemoryRecallService
from app.memory.write_policy import MemoryWritePolicy
from app.state.checkpoint_store import SQLiteCheckpointStore
from app.state.task_state_store import SQLiteTaskStateStore
def _build_response_directive(execution_result: dict) -> dict | None:
"""Build a response_directive from step_results or direct output for the client."""
result = execution_result.get("result", {})
# Case 1: step_results from plan execution
step_results = result.get("step_results")
if step_results:
response_parts = []
for step in step_results:
result_data = step.get("result", {})
tool_result = result_data.get("result", result_data)
if tool_result.get("ok") and tool_result.get("output"):
response_parts.append(str(tool_result["output"]))
if response_parts:
response_text = "\n\n".join(response_parts)
return ExecutionDirective(
type="respond", payload={"text": response_text}
).model_dump(mode="json")
# Case 2: direct tool output (e.g. from resolve_secret -> execute_tool)
if result.get("ok") and result.get("output"):
return ExecutionDirective(
type="respond", payload={"text": str(result["output"])}
).model_dump(mode="json")
return None
class RuntimeLoop:
"""Central control loop skeleton coordinating task state and events."""
def __init__(
self,
event_bus: EventBus,
task_state_store: SQLiteTaskStateStore,
checkpoint_store: SQLiteCheckpointStore,
context_builder: ContextBuilder,
router: AsyncRouter,
execution_engine: ExecutionEngine,
permission_service: PermissionService,
memory_interface=None,
recall_service: MemoryRecallService | None = None,
memory_policy: MemoryWritePolicy | None = None,
) -> None:
self._event_bus = event_bus
self._task_state_store = task_state_store
self._checkpoint_store = checkpoint_store
self._context_builder = context_builder
self._router = router
self._execution_engine = execution_engine
self._permission_service = permission_service
self._memory_interface = memory_interface
self._recall_service = recall_service
self._memory_policy = memory_policy
def set_recall_service(self, recall_service: MemoryRecallService) -> None:
self._recall_service = recall_service
def set_memory_policy(self, policy: MemoryWritePolicy | None) -> None:
self._memory_policy = policy
def run_task(self, task: UserTask) -> dict[str, object]:
# Check input for hard-stop commands BEFORE processing
hard_stop_check = self._permission_service.check_shell_command(
task_id=task.task_id,
session_id=task.session_id,
command=task.input,
)
if hard_stop_check.get("decision") == "hard_stop":
# Immediately reject hard-stop commands
self._publish(task, TASK_RECEIVED, {"status": "received"})
checkpoint = TaskCheckpoint(task_id=task.task_id, status="received")
self._checkpoint_store.save(checkpoint)
self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
error_msg = f"⚠️ BLOCKED: {hard_stop_check.get('reason', 'Hard stop command')}"
self._publish(task, TASK_FAILED, {
"directive": {},
"execution_result": {"error": error_msg},
})
return {
"task_id": task.task_id,
"status": "failed",
"directive": {},
"result": {"error": error_msg},
"events": [e.model_dump(mode="json") for e in self._event_bus.list_for_task(task.task_id)],
}
state = self._task_state_store.create_task(
task.task_id,
{
"status": "received",
"session_id": task.session_id,
"plan": None,
"task_input": task.input,
"task_context": task.context,
},
)
self._publish(task, TASK_RECEIVED, {"status": "received"})
checkpoint = TaskCheckpoint(task_id=task.task_id, status="received")
self._checkpoint_store.save(checkpoint)
self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
context = self._context_builder.build(task=task, checkpoint=checkpoint)
self._publish(task, CONTEXT_BUILT, {"keys": sorted(context.keys())})
# Active memory recall: system decides if it needs to search memory
recall_result = asyncio.run(self._run_recall(task))
if recall_result["should_recall"]:
context["memory_recall"] = {
"query": recall_result["query"],
"summary": recall_result["summary"],
"entries": [
{"text": e.text, "kind": e.kind, "weight": e.weight}
for e in recall_result["results"]
],
}
self._publish(task, MEMORY_RECALL_USED, {
"query": recall_result["query"],
"results_count": len(recall_result["results"]),
"reason": recall_result["reason"],
})
directive = asyncio.run(
self._router.decide(state=state, context=context, task_id=task.task_id, session_id=task.session_id)
)
execution_result = self._execution_engine.execute(task=task, directive=directive)
state_patch = {"status": execution_result["status"], "last_directive": directive.model_dump(mode="json")}
if execution_result["status"] == "awaiting_permission":
state_patch["pending_permission_request"] = execution_result["result"]["permission_request"]
state_patch["pending_secret_request"] = None
state_patch["resolved_permission_decision"] = None
elif execution_result["status"] == "awaiting_input":
state_patch["pending_permission_request"] = None
state_patch["pending_secret_request"] = execution_result["result"]["secret_request"]
state_patch["resolved_permission_decision"] = None
elif execution_result["status"] == "awaiting_password":
state_patch["pending_permission_request"] = None
state_patch["pending_secret_request"] = None
state_patch["resolved_permission_decision"] = None
state_patch["pending_password_request"] = {
"command": execution_result["result"].get("command", ""),
"reason": "Permission denied - sudo password required",
"attempts": 0,
}
elif execution_result["status"] == "awaiting_review":
state_patch["pending_permission_request"] = None
state_patch["pending_secret_request"] = None
state_patch["resolved_permission_decision"] = None
state_patch["pending_review"] = execution_result["result"]["review"]
else:
state_patch["pending_permission_request"] = None
state_patch["pending_secret_request"] = None
state_patch["resolved_permission_decision"] = None
state_patch["pending_review"] = None
self._task_state_store.update_task(task.task_id, state_patch)
final_status = str(execution_result["status"])
# For awaiting states - do NOT mark task as completed, keep it in pending state
if final_status in ("awaiting_permission", "awaiting_input", "awaiting_password", "awaiting_review"):
# Task stays in pending state, don't update to completed
pass
else:
self._task_state_store.update_task(task.task_id, {"status": final_status})
final_checkpoint = TaskCheckpoint(
task_id=task.task_id,
status=final_status,
context_snapshot=context,
)
self._checkpoint_store.save(final_checkpoint)
# Generate response for user
# Case 1: step_results from plan execution
if final_status == "completed" and execution_result.get("result", {}).get("step_results"):
step_results = execution_result["result"]["step_results"]
response_parts = []
for step in step_results:
result_data = step.get("result", {})
tool_result = result_data.get("result", result_data)
if tool_result.get("ok") and tool_result.get("output"):
response_parts.append(str(tool_result["output"]))  # str() keeps "\n\n".join() safe for non-string output
if response_parts:
response_text = "\n\n".join(response_parts)
execution_result["response_directive"] = ExecutionDirective(
type="respond", payload={"text": response_text}
).model_dump(mode="json")
# Case 2: respond directive from orchestrator (direct response, no steps)
if final_status == "completed" and not execution_result.get("response_directive"):
# Use the original directive from router.decide()
if hasattr(directive, "type") and directive.type == "respond":
if directive.payload.get("text"):
execution_result["response_directive"] = directive.model_dump(mode="json")
elif isinstance(directive, dict) and directive.get("type") == "respond":
if directive.get("payload", {}).get("text"):
execution_result["response_directive"] = directive
# Map status to terminal event type
if final_status == "completed":
terminal_event_type = TASK_COMPLETED
elif final_status == "failed":
terminal_event_type = TASK_FAILED
elif final_status == "awaiting_permission":
terminal_event_type = TASK_AWAITING_PERMISSION
elif final_status == "awaiting_input":
terminal_event_type = TASK_AWAITING_INPUT
elif final_status == "awaiting_review":
terminal_event_type = TASK_AWAITING_REVIEW
elif final_status == "awaiting_password":
terminal_event_type = TASK_AWAITING_PERMISSION
else:
terminal_event_type = TASK_FAILED
self._publish(
task,
terminal_event_type,
{
"directive": directive.model_dump(mode="json"),
"execution_result": execution_result["result"],
},
)
# Save task and result to memory for session context
self._save_to_memory(task, execution_result, final_status)
return {
"task_id": task.task_id,
"status": final_status,
"directive": directive.model_dump(mode="json"),
"result": {
**execution_result["result"],
"response_directive": execution_result.get("response_directive"),
},
"events": [event.model_dump(mode="json") for event in self._event_bus.list_for_task(task.task_id)],
}
def resolve_permission(self, task_id: str, decision: str) -> dict[str, object]:
state = self._task_state_store.get_task(task_id)
if not state:
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
pending_request_payload = state.get("pending_permission_request")
last_directive_payload = state.get("last_directive")
if not pending_request_payload or not last_directive_payload:
return {"task_id": task_id, "status": "failed", "result": {"error": "No pending permission request"}}
task = UserTask(
task_id=task_id,
session_id=state["session_id"],
input=state["task_input"],
context=state.get("task_context", {}),
)
# Get command from pending request
command = pending_request_payload.get("command", "")
# Resolve permission using new service
resolved = self._permission_service.resolve_permission(
task_id=task_id,
session_id=state["session_id"],
command=command,
decision=decision,
)
if decision == "deny":
execution_result = {
"status": "failed",
"result": {
"error": "Permission denied by user.",
"permission_decision": resolved,
},
}
elif decision == "allow_with_password":
directive = ExecutionDirective.model_validate(last_directive_payload)
self._task_state_store.update_task(
task.task_id,
{
"status": "awaiting_password",
"pending_password_request": {
"command": command,
"reason": pending_request_payload.get("reason", "A password is required to run the command"),
"attempts": 0,
},
"pending_permission_request": None,
},
)
self._publish(task, TASK_AWAITING_PERMISSION, {
"password_required": True,
"command": command,
})
return {
"task_id": task_id,
"status": "awaiting_password",
"result": {"message": "Password input required"},
}
else:
directive = ExecutionDirective.model_validate(last_directive_payload)
execution_result = self._execution_engine.execute(
task=task,
directive=directive,
)
final_status = str(execution_result["status"])
if decision != "allow_with_password":
self._task_state_store.update_task(
task.task_id,
{
"status": final_status,
"pending_permission_request": None,
"pending_secret_request": execution_result["result"].get("secret_request")
if final_status == "awaiting_input"
else None,
"pending_review": execution_result["result"].get("review")
if final_status == "awaiting_review"
else None,
"resolved_permission_decision": resolved,
},
)
checkpoint = TaskCheckpoint(task_id=task.task_id, status=final_status)
self._checkpoint_store.save(checkpoint)
self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
if final_status == "completed":
terminal_event_type = TASK_COMPLETED
elif final_status == "awaiting_input":
terminal_event_type = TASK_AWAITING_INPUT
elif final_status == "awaiting_permission":
terminal_event_type = TASK_AWAITING_PERMISSION
elif final_status == "awaiting_review":
terminal_event_type = TASK_AWAITING_REVIEW
else:
terminal_event_type = TASK_FAILED
self._publish(
task,
terminal_event_type,
{
"permission_resolution": resolved.model_dump(mode="json") if hasattr(resolved, 'model_dump') else resolved,
"execution_result": execution_result["result"],
},
)
# Save to memory after permission resolution
self._save_to_memory(task, execution_result, final_status)
return {
"task_id": task.task_id,
"status": final_status,
"result": {
**execution_result["result"],
"response_directive": _build_response_directive(execution_result),
},
"events": [event.model_dump(mode="json") for event in self._event_bus.list_for_task(task.task_id)],
}
def resolve_secret(self, task_id: str, secret: str) -> dict[str, object]:
state = self._task_state_store.get_task(task_id)
if not state:
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
pending_secret_payload = state.get("pending_secret_request")
last_directive_payload = state.get("last_directive")
resolved_permission_payload = state.get("resolved_permission_decision")
if not pending_secret_payload or not last_directive_payload:
return {"task_id": task_id, "status": "failed", "result": {"error": "No pending secret request"}}
if not resolved_permission_payload:
return {"task_id": task_id, "status": "failed", "result": {"error": "No resolved permission available"}}
task = UserTask(
task_id=task_id,
session_id=state["session_id"],
input=state["task_input"],
context=state.get("task_context", {}),
)
_secret_request = SecretRequest.model_validate(pending_secret_payload)
directive = ExecutionDirective.model_validate(last_directive_payload)
execution_result = self._execution_engine.execute(
task=task,
directive=directive,
permission_override=None,
secret_override=secret,
)
final_status = str(execution_result["status"])
pending_review = execution_result["result"].get("review") if final_status == "awaiting_review" else None
pending_secret = execution_result["result"].get("secret_request") if final_status == "awaiting_input" else None
self._task_state_store.update_task(
task.task_id,
{
"status": final_status,
"pending_secret_request": pending_secret,
"resolved_permission_decision": resolved_permission_payload if final_status == "awaiting_input" else None,
"pending_review": pending_review,
},
)
checkpoint = TaskCheckpoint(task_id=task.task_id, status=final_status)
self._checkpoint_store.save(checkpoint)
self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
if final_status == "completed":
terminal_event_type = TASK_COMPLETED
elif final_status == "awaiting_input":
terminal_event_type = TASK_AWAITING_INPUT
elif final_status == "awaiting_permission":
terminal_event_type = TASK_AWAITING_PERMISSION
elif final_status == "awaiting_review":
terminal_event_type = TASK_AWAITING_REVIEW
else:
terminal_event_type = TASK_FAILED
self._publish(
task,
terminal_event_type,
{
"secret_resolution": {"task_id": task_id},
"execution_result": execution_result["result"],
},
)
return {
"task_id": task.task_id,
"status": final_status,
"result": {
**execution_result["result"],
"response_directive": _build_response_directive(execution_result),
},
"events": [event.model_dump(mode="json") for event in self._event_bus.list_for_task(task.task_id)],
}
def resolve_review(self, task_id: str, decision: str, correction: str | None = None) -> dict[str, object]:
state = self._task_state_store.get_task(task_id)
if not state:
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
pending_review = state.get("pending_review")
if not pending_review:
return {"task_id": task_id, "status": "failed", "result": {"error": "No pending review"}}
task = UserTask(
task_id=task_id,
session_id=state["session_id"],
input=state["task_input"],
context={
**state.get("task_context", {}),
"previous_action_review": {
"decision": decision,
"correction": correction,
"review": pending_review,
},
},
)
self._publish(task, REVIEW_RESOLVED, {
"decision": decision,
"correction": correction,
"review": pending_review,
})
if self._memory_interface:
try:
self._memory_interface.insert(
text=f"User reviewed model action as {decision}. Correction: {correction or ''}. Review: {pending_review}",
kind="critique",
source="user",
task_id=task_id,
session_id=state["session_id"],
weight=0.9 if decision == "wrong_action" else 0.5,
metadata={"decision": decision, "review": pending_review},
)
except Exception:
pass  # best-effort write: a memory failure should not block replanning
self._task_state_store.update_task(task_id, {"pending_review": None, "status": "replanning"})
return self.run_task(task)
def resolve_password(self, task_id: str, password: str) -> dict[str, object]:
state = self._task_state_store.get_task(task_id)
if not state:
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
pending_password_payload = state.get("pending_password_request")
last_directive_payload = state.get("last_directive")
if not pending_password_payload or not last_directive_payload:
return {"task_id": task_id, "status": "failed", "result": {"error": "No pending password request"}}
current_attempt = pending_password_payload.get("attempts", 0) + 1
task = UserTask(
task_id=task_id,
session_id=state["session_id"],
input=state["task_input"],
context=state.get("task_context", {}),
)
directive = ExecutionDirective.model_validate(last_directive_payload)
execution_result = self._execution_engine.execute(
task=task,
directive=directive,
password_override=password,
)
final_status = str(execution_result["status"])
if final_status == "failed":
error_msg = execution_result.get("result", {}).get("error", "")
is_password_error = "permission denied" in error_msg.lower() or "incorrect password" in error_msg.lower()
if is_password_error and current_attempt < 3:
self._task_state_store.update_task(
task.task_id,
{
"status": "awaiting_password",
"pending_password_request": {
"command": pending_password_payload.get("command"),
"reason": pending_password_payload.get("reason"),
"attempts": current_attempt,
},
},
)
self._publish(task, TASK_AWAITING_PERMISSION, {
"password_attempt_failed": True,
"attempts": current_attempt,
"max_attempts": 3,
"message": "Incorrect password. Please try again.",
})
return {
"task_id": task_id,
"status": "awaiting_password",
"result": {"error": "Incorrect password", "attempts": current_attempt, "max_attempts": 3},
}
else:
self._task_state_store.update_task(
task.task_id,
{
"status": "failed",
"pending_password_request": None,
"password_attempts": current_attempt,
},
)
self._publish(task, TASK_FAILED, {
"password_failed": True,
"attempts": current_attempt,
"message": "Incorrect password (3 attempts). Handing the decision to the model.",
"execution_result": execution_result["result"],
})
return {
"task_id": task_id,
"status": "failed",
"result": {
"error": "Password failed after 3 attempts",
"attempts": current_attempt,
"message": "The user entered an incorrect password 3 times. The decision is yours.",
},
}
self._task_state_store.update_task(
task.task_id,
{
"status": final_status,
"pending_password_request": None,
},
)
checkpoint = TaskCheckpoint(task_id=task.task_id, status=final_status)
self._checkpoint_store.save(checkpoint)
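# Report the actual outcome: re-execution can also end in another awaiting_* state.
terminal_event_type = {
"completed": TASK_COMPLETED,
"awaiting_input": TASK_AWAITING_INPUT,
"awaiting_permission": TASK_AWAITING_PERMISSION,
"awaiting_password": TASK_AWAITING_PERMISSION,
"awaiting_review": TASK_AWAITING_REVIEW,
}.get(final_status, TASK_FAILED)
self._publish(task, terminal_event_type, {"execution_result": execution_result["result"]})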
# Save to memory after password resolution
self._save_to_memory(task, execution_result, final_status)
return {
"task_id": task.task_id,
"status": final_status,
"result": {
**execution_result["result"],
"response_directive": _build_response_directive(execution_result),
},
"events": [event.model_dump(mode="json") for event in self._event_bus.list_for_task(task.task_id)],
}
def _publish(self, task: UserTask, event_type: str, payload: dict[str, object]) -> None:
event = RuntimeEvent(
task_id=task.task_id,
session_id=task.session_id,
sequence=self._event_bus.next_sequence(task.task_id),
type=event_type,
payload=payload,
)
self._event_bus.publish(event)
async def _run_recall(self, task: UserTask) -> dict:
"""Run active memory recall before orchestration."""
if not self._recall_service:
return {"should_recall": False, "reason": "no_recall_service", "query": "", "results": [], "summary": ""}
try:
return await self._recall_service.recall(task_input=task.input)
except Exception as e:
return {"should_recall": False, "reason": f"recall_error: {e}", "query": "", "results": [], "summary": ""}
def _save_to_memory(self, task: UserTask, execution_result: dict, status: str) -> None:
"""Save task input and result to memory for session context, using MemoryWritePolicy."""
if not self._memory_interface:
return
try:
# Build a synthetic critic_score for policy based on task status
# For summary/tool_result without real critic, we derive from execution outcome
if status == "completed":
synthetic_score = CriticScore(
correctness=0.9, usefulness=0.8, safety=0.95,
memory_store=True, weight=0.85, explanation="Task completed successfully"
)
elif status == "failed":
synthetic_score = CriticScore(
correctness=0.2, usefulness=0.3, safety=0.7,
memory_store=True, weight=0.5, explanation="Task failed — store for learning"
)
else:
synthetic_score = CriticScore(
correctness=0.5, usefulness=0.5, safety=0.8,
memory_store=False, weight=0.3, explanation=f"Status: {status}"
)
# Save task input as summary
decision = "store"
if self._memory_policy:
decision = self._memory_policy.decide(
critic_score=synthetic_score,
memory_type="summary",
session_id=task.session_id,
)
if decision in ("store", "store_with_weight"):
weight = synthetic_score.weight if decision == "store_with_weight" else 0.8
self._memory_interface.insert(
text=f"User request: {task.input}",
kind="summary",
source="user",
task_id=task.task_id,
session_id=task.session_id,
weight=weight,
metadata={"status": status, "policy_decision": decision},
)
self._publish(task, MEMORY_WRITE_DECIDED, {
"kind": "summary", "decision": decision, "text_preview": task.input[:80]
})
# Save execution result
result_text = ""
if status == "completed":
step_results = execution_result.get("result", {}).get("step_results", [])
if step_results:
for step in step_results:
tool_result = step.get("result", {}).get("result", {})
if tool_result.get("output"):
result_text += f" | {step.get('step_id')}: {str(tool_result.get('output'))[:200]}"
elif status == "failed":
result_text = f" | Error: {execution_result.get('result', {}).get('error', 'Unknown')}"
if result_text:
decision = "store"
if self._memory_policy:
decision = self._memory_policy.decide(
critic_score=synthetic_score,
memory_type="tool_result",
session_id=task.session_id,
)
if decision in ("store", "store_with_weight"):
weight = synthetic_score.weight if decision == "store_with_weight" else 0.7
self._memory_interface.insert(
text=f"Result: {status}{result_text}",
kind="tool_result",
source="system",
task_id=task.task_id,
session_id=task.session_id,
weight=weight,
metadata={"status": status, "policy_decision": decision},
)
self._publish(task, MEMORY_WRITE_DECIDED, {
"kind": "tool_result", "decision": decision, "text_preview": result_text[:80]
})
except Exception as e:
import logging
logging.getLogger(__name__).warning(f"Failed to save to memory: {e}")
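
The status-to-event `if`/`elif` chain appears in `run_task`, `resolve_permission`, and `resolve_secret`. One way to keep those copies in sync is a single lookup table. The sketch below assumes placeholder string values for the event constants (the real ones live in `app.events.event_types` and are not shown in this diff), and `terminal_event_for` is a hypothetical helper, not part of the original module.

```python
# Placeholder values: the real constants are imported from app.events.event_types.
TASK_COMPLETED = "task.completed"
TASK_FAILED = "task.failed"
TASK_AWAITING_PERMISSION = "task.awaiting_permission"
TASK_AWAITING_INPUT = "task.awaiting_input"
TASK_AWAITING_REVIEW = "task.awaiting_review"

# One table instead of several if/elif chains.
_STATUS_TO_EVENT = {
    "completed": TASK_COMPLETED,
    "failed": TASK_FAILED,
    "awaiting_permission": TASK_AWAITING_PERMISSION,
    "awaiting_input": TASK_AWAITING_INPUT,
    "awaiting_review": TASK_AWAITING_REVIEW,
    # As in run_task, a password prompt is reported as a permission event.
    "awaiting_password": TASK_AWAITING_PERMISSION,
}


def terminal_event_for(status: str) -> str:
    """Return the event type for a final status; unknown statuses count as failures."""
    return _STATUS_TO_EVENT.get(status, TASK_FAILED)


password_event = terminal_event_for("awaiting_password")
unknown_event = terminal_event_for("replanning")
```

Falling back to `TASK_FAILED` mirrors the `else` branch of the original chains, so an unexpected status is never silently reported as success.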


@ -1,2 +0,0 @@
"""Shared services."""


@ -1,2 +0,0 @@
"""Task state and checkpoints."""


@ -1,75 +0,0 @@
from __future__ import annotations
import json
import sqlite3
from pathlib import Path
from app.core.contracts import TaskCheckpoint
class SQLiteCheckpointStore:
"""Durable checkpoint store for resumable runtime state."""
def __init__(self, db_path: str | Path) -> None:
self._db_path = Path(db_path)
self._db_path.parent.mkdir(parents=True, exist_ok=True)
self._initialize()
def save(self, checkpoint: TaskCheckpoint) -> TaskCheckpoint:
with sqlite3.connect(self._db_path) as conn:
conn.execute(
"""
INSERT OR REPLACE INTO checkpoints (
task_id, status, active_step_id, plan_snapshot_json,
context_snapshot_json, updated_at
) VALUES (?, ?, ?, ?, ?, ?)
""",
(
checkpoint.task_id,
checkpoint.status,
checkpoint.active_step_id,
json.dumps(checkpoint.plan_snapshot, default=str),
json.dumps(checkpoint.context_snapshot, default=str),
checkpoint.updated_at.isoformat(),
),
)
conn.commit()
return checkpoint
def load(self, task_id: str) -> TaskCheckpoint | None:
with sqlite3.connect(self._db_path) as conn:
row = conn.execute(
"""
SELECT task_id, status, active_step_id, plan_snapshot_json,
context_snapshot_json, updated_at
FROM checkpoints
WHERE task_id = ?
""",
(task_id,),
).fetchone()
if not row:
return None
return TaskCheckpoint(
task_id=row[0],
status=row[1],
active_step_id=row[2],
plan_snapshot=json.loads(row[3]),
context_snapshot=json.loads(row[4]),
updated_at=row[5],
)
def _initialize(self) -> None:
with sqlite3.connect(self._db_path) as conn:
conn.execute(
"""
CREATE TABLE IF NOT EXISTS checkpoints (
task_id TEXT PRIMARY KEY,
status TEXT NOT NULL,
active_step_id TEXT,
plan_snapshot_json TEXT NOT NULL,
context_snapshot_json TEXT NOT NULL,
updated_at TEXT NOT NULL
)
"""
)
conn.commit()


@ -1,77 +0,0 @@
from __future__ import annotations
import json
import sqlite3
from pathlib import Path
from typing import Any
class SQLiteTaskStateStore:
"""Durable task state store for runtime lifecycle state."""
def __init__(self, db_path: str | Path) -> None:
self._db_path = Path(db_path)
self._db_path.parent.mkdir(parents=True, exist_ok=True)
self._initialize()
def create_task(self, task_id: str, initial_state: dict[str, Any]) -> dict[str, Any]:
state = dict(initial_state)
session_id = state.get("session_id")
with sqlite3.connect(self._db_path) as conn:
conn.execute(
"""
INSERT OR REPLACE INTO task_states (task_id, state_json, session_id)
VALUES (?, ?, ?)
""",
(task_id, json.dumps(state), session_id),
)
conn.commit()
return state
def get_task(self, task_id: str) -> dict[str, Any] | None:
with sqlite3.connect(self._db_path) as conn:
row = conn.execute(
"SELECT state_json FROM task_states WHERE task_id = ?",
(task_id,),
).fetchone()
return json.loads(row[0]) if row else None
def update_task(self, task_id: str, patch: dict[str, Any]) -> dict[str, Any]:
state = self.get_task(task_id) or {}
state.update(patch)
with sqlite3.connect(self._db_path) as conn:
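# INSERT OR REPLACE rewrites the whole row, so session_id must be written
# again here or it is reset to NULL and get_session_tasks loses the task.
conn.execute(
"""
INSERT OR REPLACE INTO task_states (task_id, state_json, session_id)
VALUES (?, ?, ?)
""",
(task_id, json.dumps(state), state.get("session_id")),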
)
conn.commit()
return state
def _initialize(self) -> None:
with sqlite3.connect(self._db_path) as conn:
conn.execute(
"""
CREATE TABLE IF NOT EXISTS task_states (
task_id TEXT PRIMARY KEY,
state_json TEXT NOT NULL
)
"""
)
conn.commit()
try:
conn.execute("ALTER TABLE task_states ADD COLUMN session_id TEXT")
conn.commit()
except sqlite3.OperationalError:
pass
def get_session_tasks(self, session_id: str, limit: int = 10) -> list[dict[str, Any]]:
with sqlite3.connect(self._db_path) as conn:
conn.row_factory = sqlite3.Row
rows = conn.execute(
"SELECT state_json FROM task_states WHERE session_id = ? ORDER BY rowid DESC LIMIT ?",
(session_id, limit),
).fetchall()
return [json.loads(row[0]) for row in rows]
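
`get_session_tasks` filters on the `session_id` column, so every write to `task_states` has to keep that column populated. In SQLite, `INSERT OR REPLACE` deletes the conflicting row and inserts a new one, which means any column left out of the statement falls back to its default (`NULL` here). The sketch below demonstrates that behaviour on an in-memory table with the same three columns; the sample ids and JSON payloads are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_states ("
    "task_id TEXT PRIMARY KEY, state_json TEXT NOT NULL, session_id TEXT)"
)
conn.execute(
    "INSERT OR REPLACE INTO task_states (task_id, state_json, session_id) VALUES (?, ?, ?)",
    ("t1", '{"status": "received"}', "s1"),
)

# Replacing the row without naming session_id resets that column to NULL.
conn.execute(
    "INSERT OR REPLACE INTO task_states (task_id, state_json) VALUES (?, ?)",
    ("t1", '{"status": "completed"}'),
)
session_after_replace = conn.execute(
    "SELECT session_id FROM task_states WHERE task_id = ?", ("t1",)
).fetchone()[0]

# Restore the value, then patch with UPDATE, which only touches the named columns.
conn.execute("UPDATE task_states SET session_id = ? WHERE task_id = ?", ("s1", "t1"))
conn.execute(
    "UPDATE task_states SET state_json = ? WHERE task_id = ?",
    ('{"status": "failed"}', "t1"),
)
session_after_update = conn.execute(
    "SELECT session_id FROM task_states WHERE task_id = ?", ("t1",)
).fetchone()[0]
```

Either approach works for patching state: list every column in the `INSERT OR REPLACE`, or use `UPDATE` when the row is known to exist.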


@ -1,2 +0,0 @@
"""Streaming projections."""


@ -1,46 +0,0 @@
from __future__ import annotations
import asyncio
from collections import defaultdict
from dataclasses import dataclass
from app.core.contracts import RuntimeEvent
from app.events.event_bus import EventBus
class StreamingManager:
"""Simple in-process projection from event bus to websocket consumers."""
def __init__(self, event_bus: EventBus) -> None:
self._event_bus = event_bus
self._subscribers: dict[str, list[StreamSubscriber]] = defaultdict(list)
self._event_bus.subscribe(self._on_event)
def replay_events(self, task_id: str) -> list[RuntimeEvent]:
return self._event_bus.list_for_task(task_id)
def subscribe(self, task_id: str) -> asyncio.Queue[RuntimeEvent]:
queue: asyncio.Queue[RuntimeEvent] = asyncio.Queue()
self._subscribers[task_id].append(
StreamSubscriber(loop=asyncio.get_running_loop(), queue=queue)
)
return queue
def unsubscribe(self, task_id: str, queue: asyncio.Queue[RuntimeEvent]) -> None:
listeners = self._subscribers.get(task_id, [])
for listener in list(listeners):
if listener.queue is queue:
listeners.remove(listener)
break
if not listeners and task_id in self._subscribers:
del self._subscribers[task_id]
def _on_event(self, event: RuntimeEvent) -> None:
for listener in list(self._subscribers.get(event.task_id, [])):
listener.loop.call_soon_threadsafe(listener.queue.put_nowait, event)
@dataclass
class StreamSubscriber:
loop: asyncio.AbstractEventLoop
queue: asyncio.Queue[RuntimeEvent]
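
The manager hands events to subscribers through `loop.call_soon_threadsafe`, which is what allows a publisher running on another thread to feed an `asyncio.Queue` owned by the event loop. A minimal sketch of that hand-off in isolation follows; `collect_from_thread` is a hypothetical name and plain strings stand in for `RuntimeEvent`.

```python
import asyncio
import threading


async def collect_from_thread(items: list[str]) -> list[str]:
    """Receive items that a worker thread pushes into an asyncio queue."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue[str] = asyncio.Queue()

    def producer() -> None:
        # asyncio.Queue is not thread-safe, so the put is scheduled on the
        # loop's own thread rather than called directly from this thread.
        for item in items:
            loop.call_soon_threadsafe(queue.put_nowait, item)

    worker = threading.Thread(target=producer)
    worker.start()
    received = [await queue.get() for _ in items]
    worker.join()
    return received
```

Callbacks scheduled this way run in the order they were submitted, so a single publisher thread sees its events delivered in order.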

View File

@ -1,2 +0,0 @@
"""Tool registry and tool adapters."""

View File

@ -1,24 +0,0 @@
from __future__ import annotations
from abc import ABC, abstractmethod
from typing import Any
from app.core.contracts import ToolResult, UserTask
class BaseTool(ABC):
    # Subclasses either assign `name`/`description` as class attributes or set
    # `_name`/`_description`; the properties below are the fallback.
    @property
    def name(self) -> str:
        return getattr(self, "_name", self.__class__.__name__.replace("Tool", "").lower())

    @property
    def description(self) -> str:
        return getattr(self, "_description", "")

    @abstractmethod
    def execute(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
        raise NotImplementedError

View File

@ -1,83 +0,0 @@
from __future__ import annotations
import importlib
import json
import logging
from pathlib import Path
from typing import Any
logger = logging.getLogger(__name__)
PLUGINS_DIR = Path(__file__).parent / "plugins"
class ToolDiscovery:
"""Decentralized tool discovery system."""
def __init__(self, plugins_dir: Path | None = None) -> None:
self._plugins_dir = plugins_dir or PLUGINS_DIR
def discover(self) -> dict[str, Any]:
"""Discover all tools from plugins directory."""
tools = {}
if not self._plugins_dir.exists():
logger.warning(f"Plugins directory not found: {self._plugins_dir}")
return tools
for folder in self._plugins_dir.iterdir():
if not folder.is_dir():
continue
manifest_file = folder / "manifest.json"
if not manifest_file.exists():
logger.warning(f"Missing manifest.json in {folder.name}")
continue
try:
manifest = self._load_manifest(manifest_file)
tool_name = manifest.get("name", folder.name)
tools[tool_name] = {
"manifest": manifest,
"tool_class": folder.name,
}
logger.info(f"Discovered tool: {tool_name}")
except Exception as e:
logger.error(f"Failed to load tool {folder.name}: {e}")
continue
return tools
    def _load_manifest(self, manifest_file: Path) -> dict[str, Any]:
        # Read manifests as UTF-8 regardless of the platform default encoding.
        with open(manifest_file, encoding="utf-8") as f:
            return json.load(f)
def _load_tool_class(self, tool_name: str, manifest: dict[str, Any]) -> Any:
entrypoint = manifest.get("entrypoint", "Tool")
module = importlib.import_module(f"app.tools.plugins.{tool_name}")
tool_class = getattr(module, entrypoint)
return tool_class
def get_tool_schemas(self) -> list[dict[str, Any]]:
"""Get schemas for all discovered tools."""
tools = self.discover()
schemas = []
for name, data in tools.items():
manifest = data.get("manifest", {})
schemas.append({
"name": name,
"description": manifest.get("description", ""),
"args_schema": manifest.get("args_schema", {}),
"requires_permission": manifest.get("requires_permission", False),
})
return schemas
def discover_tools() -> dict[str, Any]:
"""Convenience function for quick tool discovery."""
discovery = ToolDiscovery()
return discovery.discover()

View File

@ -1,26 +0,0 @@
from __future__ import annotations
from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
from app.tools.sandbox import ToolSandbox
class FileReadTool(BaseTool):
name = "file_read"
def __init__(self, sandbox: ToolSandbox) -> None:
self._sandbox = sandbox
def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
path = args.get("path")
if not path:
return ToolResult(tool=self.name, ok=False, error="Missing path")
resolved = self._sandbox.ensure_path_allowed(str(path))
content = resolved.read_text(encoding="utf-8")
return ToolResult(
tool=self.name,
ok=True,
output=content,
metadata={"path": str(resolved), "size": len(content)},
)

View File

@ -1,27 +0,0 @@
from __future__ import annotations
from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
from app.tools.sandbox import ToolSandbox
class FileWriteTool(BaseTool):
name = "file_write"
def __init__(self, sandbox: ToolSandbox) -> None:
self._sandbox = sandbox
def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
path = args.get("path")
content = str(args.get("content", ""))
if not path:
return ToolResult(tool=self.name, ok=False, error="Missing path")
resolved = self._sandbox.ensure_path_allowed(str(path))
resolved.parent.mkdir(parents=True, exist_ok=True)
resolved.write_text(content, encoding="utf-8")
return ToolResult(
tool=self.name,
ok=True,
output=f"Wrote {len(content)} characters",
metadata={"path": str(resolved), "size": len(content)},
)

View File

@ -1,123 +0,0 @@
from __future__ import annotations
import logging
from typing import Any
from app.tools.base import BaseTool
from app.core.contracts import ToolResult, UserTask
from app.tools.sandbox import ToolSandbox
logger = logging.getLogger(__name__)
class MemoryInsertTool(BaseTool):
_name = "memory_insert"
_description = "Store information in memory"
def __init__(self, sandbox: ToolSandbox, memory_interface=None) -> None:
super().__init__()
self._sandbox = sandbox
self._memory = memory_interface
def execute(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
text = args.get("text", "")
kind = args.get("kind", "fact")
source = args.get("source", "user")
weight = args.get("weight", 0.5)
if not text:
return ToolResult(tool="memory_insert", ok=False, output="", error="text is required")
if not self._memory:
return ToolResult(tool="memory_insert", ok=False, output="", error="Memory not available")
try:
entry = self._memory.insert(
text=text,
kind=kind,
source=source,
task_id=task.task_id,
session_id=task.session_id,
weight=weight,
)
return ToolResult(
tool="memory_insert",
ok=True,
output=f"Stored: {entry.id}",
metadata={"entry_id": entry.id},
)
except Exception as e:
logger.warning(f"Memory insert failed: {e}")
return ToolResult(tool="memory_insert", ok=False, output="", error=str(e))
class MemorySearchTool(BaseTool):
_name = "memory_search"
_description = "Search memory for information"
def __init__(self, sandbox: ToolSandbox, memory_interface=None) -> None:
super().__init__()
self._sandbox = sandbox
self._memory = memory_interface
def execute(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
query = args.get("query", "")
top_k = args.get("top_k", 5)
if not query:
return ToolResult(tool="memory_search", ok=False, output="", error="query is required")
if not self._memory:
return ToolResult(tool="memory_search", ok=False, output="", error="Memory not available")
try:
results = self._memory.search(query, top_k=top_k)
if not results:
return ToolResult(tool="memory_search", ok=True, output="No results found", metadata={"count": 0})
output_lines = []
for entry, score in results:
output_lines.append(f"[{score:.2f}] {entry.text[:100]}")
return ToolResult(
tool="memory_search",
ok=True,
output="\n".join(output_lines),
metadata={"count": len(results)},
)
except Exception as e:
logger.warning(f"Memory search failed: {e}")
return ToolResult(tool="memory_search", ok=False, output="", error=str(e))
class MemoryListTool(BaseTool):
_name = "memory_list"
_description = "List recent memories"
def __init__(self, sandbox: ToolSandbox, memory_interface=None) -> None:
super().__init__()
self._sandbox = sandbox
self._memory = memory_interface
def execute(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
limit = args.get("limit", 10)
if not self._memory:
return ToolResult(tool="memory_list", ok=False, output="", error="Memory not available")
try:
entries = self._memory.get_recent(limit=limit)
if not entries:
return ToolResult(tool="memory_list", ok=True, output="No memories", metadata={"count": 0})
output_lines = []
for entry in entries:
output_lines.append(f"{entry.kind}: {entry.text[:80]}")
return ToolResult(
tool="memory_list",
ok=True,
output="\n".join(output_lines),
metadata={"count": len(entries)},
)
except Exception as e:
logger.warning(f"Memory list failed: {e}")
return ToolResult(tool="memory_list", ok=False, output="", error=str(e))

View File

@ -1,35 +0,0 @@
from __future__ import annotations
from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
from app.tools.sandbox import ToolSandbox
class Tool(BaseTool):
name = "file_read"
description = "Read file contents"
def __init__(self, sandbox: ToolSandbox) -> None:
self._sandbox = sandbox
def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
path = args.get("path")
if not path:
return ToolResult(tool=self.name, ok=False, error="Missing path")
try:
resolved = self._sandbox.ensure_path_allowed(str(path))
if not resolved.exists():
return ToolResult(tool=self.name, ok=False, error=f"File not found: {path}")
content = resolved.read_text(encoding="utf-8")
return ToolResult(
tool=self.name,
ok=True,
output=content,
metadata={"path": str(resolved), "size": len(content)},
)
except PermissionError as e:
return ToolResult(tool=self.name, ok=False, error=f"Access denied: {e}")
except FileNotFoundError as e:
return ToolResult(tool=self.name, ok=False, error=f"File not found: {path}")
except Exception as e:
return ToolResult(tool=self.name, ok=False, error=f"Error: {e}")

View File

@ -1,10 +0,0 @@
{
"name": "file_read",
"version": "1.0",
"entrypoint": "Tool",
"description": "Read file contents from allowed paths",
"args_schema": {
"path": {"type": "string", "required": true, "description": "File path to read"}
},
"requires_permission": false
}

View File

@ -1,33 +0,0 @@
from __future__ import annotations
from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
from app.tools.sandbox import ToolSandbox
class Tool(BaseTool):
name = "file_write"
description = "Write content to file"
def __init__(self, sandbox: ToolSandbox) -> None:
self._sandbox = sandbox
def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
path = args.get("path")
content = str(args.get("content", ""))
if not path:
return ToolResult(tool=self.name, ok=False, error="Missing path")
try:
resolved = self._sandbox.ensure_path_allowed(str(path))
resolved.parent.mkdir(parents=True, exist_ok=True)
resolved.write_text(content, encoding="utf-8")
return ToolResult(
tool=self.name,
ok=True,
output=f"Wrote {len(content)} characters",
metadata={"path": str(resolved), "size": len(content)},
)
except PermissionError as e:
return ToolResult(tool=self.name, ok=False, error=f"Access denied: {e}")
except Exception as e:
return ToolResult(tool=self.name, ok=False, error=f"Error: {e}")

View File

@ -1,11 +0,0 @@
{
"name": "file_write",
"version": "1.0",
"entrypoint": "Tool",
"description": "Write content to file",
"args_schema": {
"path": {"type": "string", "required": true, "description": "File path to write"},
"content": {"type": "string", "required": true, "description": "Content to write"}
},
"requires_permission": true
}

View File

@ -1,112 +0,0 @@
from __future__ import annotations
import logging
from typing import Any
from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
logger = logging.getLogger(__name__)
class Tool(BaseTool):
name = "memory"
description = "Memory operations: insert, search, list"
def __init__(self, memory_interface=None) -> None:
self._memory = memory_interface
def execute(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
action = args.get("action", "search")
if action == "insert":
return self._insert(task, args)
elif action == "search":
return self._search(task, args)
elif action == "list":
return self._list(task, args)
else:
return ToolResult(tool=self.name, ok=False, error=f"Unknown action: {action}")
def _insert(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
text = args.get("text", "")
kind = args.get("kind", "fact")
source = args.get("source", "user")
weight = args.get("weight", 0.5)
if not text:
return ToolResult(tool=self.name, ok=False, output="", error="text is required")
if not self._memory:
return ToolResult(tool=self.name, ok=False, output="", error="Memory not available")
try:
entry = self._memory.insert(
text=text,
kind=kind,
source=source,
task_id=task.task_id,
session_id=task.session_id,
weight=weight,
)
return ToolResult(
tool=self.name,
ok=True,
output=f"Stored: {entry.id}",
metadata={"entry_id": entry.id},
)
except Exception as e:
logger.warning(f"Memory insert failed: {e}")
return ToolResult(tool=self.name, ok=False, output="", error=str(e))
def _search(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
query = args.get("query", "")
top_k = args.get("top_k", 5)
if not query:
return ToolResult(tool=self.name, ok=False, output="", error="query is required")
if not self._memory:
return ToolResult(tool=self.name, ok=False, output="", error="Memory not available")
try:
results = self._memory.search(query, top_k=top_k)
if not results:
return ToolResult(tool=self.name, ok=True, output="No results found", metadata={"count": 0})
output_lines = []
for entry, score in results:
output_lines.append(f"[{score:.2f}] {entry.text[:100]}")
return ToolResult(
tool=self.name,
ok=True,
output="\n".join(output_lines),
metadata={"count": len(results)},
)
except Exception as e:
logger.warning(f"Memory search failed: {e}")
return ToolResult(tool=self.name, ok=False, output="", error=str(e))
def _list(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
limit = args.get("limit", 10)
if not self._memory:
return ToolResult(tool=self.name, ok=False, output="", error="Memory not available")
try:
entries = self._memory.get_recent(limit=limit)
if not entries:
return ToolResult(tool=self.name, ok=True, output="No memories", metadata={"count": 0})
output_lines = []
for entry in entries:
output_lines.append(f"{entry.kind}: {entry.text[:80]}")
return ToolResult(
tool=self.name,
ok=True,
output="\n".join(output_lines),
metadata={"count": len(entries)},
)
except Exception as e:
logger.warning(f"Memory list failed: {e}")
return ToolResult(tool=self.name, ok=False, output="", error=str(e))

View File

@ -1,22 +0,0 @@
{
"name": "memory",
"version": "1.0",
"entrypoint": "Tool",
"description": "Memory operations: insert, search, list",
"args_schema": {
"action": {
"type": "string",
"required": true,
"description": "Action: insert, search, or list",
"enum": ["insert", "search", "list"]
},
"text": {"type": "string", "required": false, "description": "Text to store (insert)"},
"query": {"type": "string", "required": false, "description": "Query string (search)"},
"kind": {"type": "string", "required": false, "description": "Memory kind: fact, command, etc"},
"source": {"type": "string", "required": false, "description": "Source: user, system, etc"},
"weight": {"type": "number", "required": false, "description": "Memory weight 0-1"},
"top_k": {"type": "number", "required": false, "description": "Max results (search)"},
"limit": {"type": "number", "required": false, "description": "Max entries (list)"}
},
"requires_permission": false
}
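
The manifests declare `type`, `required` and `enum` for each argument, but the tool code in this diff reads arguments with `args.get(...)` and does not consult the schema. A minimal sketch of a validator for that manifest shape follows, assuming only the three type names seen in these manifests plus `boolean`; `validate_args` and `MEMORY_SCHEMA` are hypothetical names, and the schema is a trimmed copy of the `memory` manifest.

```python
from typing import Any

# Mapping from manifest type names to Python types (an assumption of this sketch).
_PYTHON_TYPES = {"string": str, "number": (int, float), "boolean": bool}

MEMORY_SCHEMA = {
    "action": {"type": "string", "required": True, "enum": ["insert", "search", "list"]},
    "top_k": {"type": "number", "required": False},
}


def validate_args(args_schema: dict[str, dict[str, Any]], args: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: list[str] = []
    for key, spec in args_schema.items():
        if key not in args:
            if spec.get("required", False):
                errors.append(f"missing required argument: {key}")
            continue
        value = args[key]
        expected = _PYTHON_TYPES.get(spec.get("type", ""))
        # bool is a subclass of int, so exclude it from "number" explicitly.
        bool_as_number = spec.get("type") == "number" and isinstance(value, bool)
        if expected is not None and (not isinstance(value, expected) or bool_as_number):
            errors.append(f"{key}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key}: must be one of {spec['enum']}")
    return errors
```

Returning a list rather than raising lets a caller report every problem in a single `ToolResult` error string.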

View File

@ -1,55 +0,0 @@
from __future__ import annotations
from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
from app.tools.sandbox import ToolSandbox
def _detect_sudo_auth_failure(output: str) -> bool:
normalized = output.lower()
return any(
marker in normalized
for marker in (
"incorrect password",
"incorrect password attempt",
"sudo: no password was provided",
"sorry, try again",
"authentication failure",
)
)
class Tool(BaseTool):
name = "shell_exec"
description = "Execute shell commands"
def __init__(self, sandbox: ToolSandbox) -> None:
self._sandbox = sandbox
def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
command = str(args.get("command", "")).strip()
if not command:
return ToolResult(tool=self.name, ok=False, error="Missing command", metadata={"exit_code": -1})
cwd = args.get("cwd")
stdin_secret = args.get("stdin_secret")
output_callback = args.get("__output_callback")
completed = self._sandbox.run_shell(
command=command,
cwd=str(cwd) if cwd else None,
stdin_data=str(stdin_secret) if stdin_secret is not None else None,
output_callback=output_callback if callable(output_callback) else None,
)
output = completed.stdout if completed.returncode == 0 else completed.stderr or completed.stdout
sudo_auth_failed = completed.returncode != 0 and _detect_sudo_auth_failure(
f"{completed.stdout}\n{completed.stderr}"
)
return ToolResult(
tool=self.name,
ok=completed.returncode == 0,
output=output,
error=None if completed.returncode == 0 else f"Command failed with exit code {completed.returncode}",
metadata={
"exit_code": completed.returncode,
"sudo_auth_failed": sudo_auth_failed,
},
)

View File

@ -1,12 +0,0 @@
{
"name": "shell_exec",
"version": "1.0",
"entrypoint": "Tool",
"description": "Execute shell commands in sandboxed environment",
"args_schema": {
"command": {"type": "string", "required": true, "description": "Shell command to execute"},
"cwd": {"type": "string", "required": false, "description": "Working directory"},
"stdin_secret": {"type": "string", "required": false, "description": "Data to pass via stdin"}
},
"requires_permission": true
}

View File

@ -1,61 +0,0 @@
from __future__ import annotations
import logging
from typing import Any, Callable
from app.tools.base import BaseTool
from app.tools.discover import ToolDiscovery
logger = logging.getLogger(__name__)
class ToolRegistry:
def __init__(self) -> None:
self._tools: dict[str, BaseTool] = {}
self._schemas: dict[str, dict[str, Any]] = {}
def register(self, tool: BaseTool) -> None:
self._tools[tool.name] = tool
    def discover_and_init(
        self,
        init_factory: Callable[[dict], BaseTool] | None = None,
    ) -> None:
        """Discover tools from plugins and initialize them."""
        discovery = ToolDiscovery()
        discovered = discovery.discover()
        for name, data in discovered.items():
            manifest = data.get("manifest", {})
            if init_factory:
                tool = init_factory({"name": name, "manifest": manifest})
            else:
                # Fall back to a prebuilt instance, if discovery supplied one.
                tool = data.get("instance")
            if tool is None:
                logger.warning(f"No init_factory provided for {name}")
                continue
            # Register the tool together with the schema from its manifest.
            self._tools[name] = tool
            self._schemas[name] = {
                "description": manifest.get("description", ""),
                "args_schema": manifest.get("args_schema", {}),
                "requires_permission": manifest.get("requires_permission", False),
            }
            logger.info(f"Registered tool: {name}")
def get(self, name: str) -> BaseTool:
if name not in self._tools:
raise KeyError(f"Tool {name} is not registered")
return self._tools[name]
def list_names(self) -> list[str]:
return list(self._tools.keys())
def get_schema(self, name: str) -> dict[str, Any]:
return self._schemas.get(name, {})
def list_schemas(self) -> list[dict[str, Any]]:
return [
{"name": name, **schema}
for name, schema in self._schemas.items()
]

View File

@ -1,139 +0,0 @@
from __future__ import annotations
import os
import signal
import subprocess
import threading
import time
from pathlib import Path
from typing import Callable
class ToolSandbox:
"""Applies simple working directory and timeout restrictions."""
def __init__(
self,
allowed_root: str | Path,
timeout_ms: int,
command_timeout_ms: int | None = None,
idle_timeout_ms: int | None = None,
) -> None:
self._allowed_root = Path(allowed_root).resolve()
self._timeout_seconds = max(timeout_ms / 1000, 0.001)
self._command_timeout_seconds = max((command_timeout_ms or timeout_ms) / 1000, 0.001)
self._idle_timeout_seconds = max((idle_timeout_ms or timeout_ms) / 1000, 0.001)
    def ensure_path_allowed(self, path: str | Path) -> Path:
        resolved = Path(path).expanduser().resolve()
        # Permission-first model: the path is only normalized here and is not
        # checked against allowed_root or for existence.
        # The permission service is expected to enforce write/shell restrictions.
        return resolved
def run_shell(
self,
command: str,
cwd: str | Path | None = None,
stdin_data: str | None = None,
output_callback: Callable[[str, str], None] | None = None,
) -> subprocess.CompletedProcess[str]:
working_directory = self.ensure_path_allowed(cwd or self._allowed_root)
env = {"PATH": os.environ.get("PATH", "")}
if output_callback is None:
return subprocess.run(
command,
shell=True,
cwd=str(working_directory),
env=env,
text=True,
capture_output=True,
input=stdin_data,
timeout=self._command_timeout_seconds,
check=False,
)
process = subprocess.Popen(
command,
shell=True,
cwd=str(working_directory),
env=env,
text=True,
stdin=subprocess.PIPE if stdin_data is not None else None,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
start_new_session=True,
)
stdout_chunks: list[str] = []
stderr_chunks: list[str] = []
output_lock = threading.Lock()
last_output_at = time.monotonic()
if stdin_data is not None and process.stdin is not None:
process.stdin.write(stdin_data)
process.stdin.close()
        def read_stream(stream_name: str) -> None:
            nonlocal last_output_at
            stream = process.stdout if stream_name == "stdout" else process.stderr
            if stream is None:
                return
            chunks = stdout_chunks if stream_name == "stdout" else stderr_chunks
            try:
                # iter() stops at the "" sentinel, which readline returns at EOF.
                for line in iter(stream.readline, ""):
                    chunks.append(line)
                    with output_lock:
                        last_output_at = time.monotonic()
                    output_callback(stream_name, line)
            finally:
                stream.close()
stdout_thread = threading.Thread(target=read_stream, args=("stdout",), daemon=True)
stderr_thread = threading.Thread(target=read_stream, args=("stderr",), daemon=True)
stdout_thread.start()
stderr_thread.start()
timed_out = False
timeout_reason: str | None = None
started_at = time.monotonic()
return_code: int | None = None
while return_code is None:
return_code = process.poll()
if return_code is not None:
break
now = time.monotonic()
with output_lock:
idle_for = now - last_output_at
if now - started_at > self._command_timeout_seconds:
timed_out = True
timeout_reason = f"Command timed out after {self._command_timeout_seconds:.0f}s"
break
if idle_for > self._idle_timeout_seconds:
timed_out = True
timeout_reason = f"Command produced no output for {self._idle_timeout_seconds:.0f}s"
break
time.sleep(0.1)
if timed_out:
try:
os.killpg(process.pid, signal.SIGKILL)
except ProcessLookupError:
pass
except PermissionError:
process.kill()
return_code = process.wait()
timeout_message = f"{timeout_reason}\n"
stderr_chunks.append(timeout_message)
output_callback("stderr", timeout_message)
stdout_thread.join(timeout=1)
stderr_thread.join(timeout=1)
return subprocess.CompletedProcess(
args=command,
returncode=return_code if not timed_out else -9,
stdout="".join(stdout_chunks),
stderr="".join(stderr_chunks),
)
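
`ToolSandbox` takes an `allowed_root`, but `ensure_path_allowed` only normalizes the path and leaves enforcement to the permission layer. If containment were wanted at this level, it could be sketched as follows. This is an assumption about a possible design, not the repository's behaviour, and `resolve_inside` is a hypothetical helper.

```python
from pathlib import Path
from typing import Union


def resolve_inside(root: Union[str, Path], candidate: Union[str, Path]) -> Path:
    """Resolve candidate and reject it unless it lies inside root."""
    root_path = Path(root).expanduser().resolve()
    target = Path(candidate).expanduser()
    if not target.is_absolute():
        # Relative paths are interpreted relative to the sandbox root.
        target = root_path / target
    # resolve() collapses ".." segments and follows symlinks before the check.
    resolved = target.resolve()
    if resolved != root_path and root_path not in resolved.parents:
        raise PermissionError(f"{resolved} is outside {root_path}")
    return resolved
```

Resolving before comparing matters: a textual prefix check on the unresolved path would accept `root/../elsewhere`.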

View File

@ -1,65 +0,0 @@
from __future__ import annotations
from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
from app.tools.sandbox import ToolSandbox
def _detect_sudo_auth_failure(output: str) -> bool:
normalized = output.lower()
return any(
marker in normalized
for marker in (
"incorrect password",
"incorrect password attempt",
"sudo: no password was provided",
"sudo: password incorrect",
"sorry, try again",
"authentication failure",
"wrong password",
)
)
class ShellExecTool(BaseTool):
name = "shell_exec"
def __init__(self, sandbox: ToolSandbox) -> None:
self._sandbox = sandbox
def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
command = str(args.get("command", "")).strip()
if not command:
return ToolResult(tool=self.name, ok=False, error="Missing command", metadata={"exit_code": -1})
cwd = args.get("cwd")
stdin_secret = args.get("stdin_secret")
password = args.get("password")
output_callback = args.get("__output_callback")
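        if password:
            # Feed the password to `sudo -S` on stdin; interpolating it into the
            # command line exposed it to shell expansion and the process list.
            command = f"sudo -S {command}"
            stdin_secret = f"{password}\n{stdin_secret or ''}"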
completed = self._sandbox.run_shell(
command=command,
cwd=str(cwd) if cwd else None,
stdin_data=str(stdin_secret) if stdin_secret is not None else None,
output_callback=output_callback if callable(output_callback) else None,
)
output = completed.stdout if completed.returncode == 0 else completed.stderr or completed.stdout
error_output = completed.stderr or completed.stdout
sudo_auth_failed = completed.returncode != 0 and _detect_sudo_auth_failure(
f"{completed.stdout}\n{completed.stderr}"
)
needs_sudo = completed.returncode != 0 and "permission denied" in error_output.lower() and not sudo_auth_failed
return ToolResult(
tool=self.name,
ok=completed.returncode == 0,
output=output,
error=None if completed.returncode == 0 else f"Command failed with exit code {completed.returncode}",
metadata={
"exit_code": completed.returncode,
"needs_sudo": needs_sudo,
"sudo_auth_failed": sudo_auth_failed,
},
)

View File

@ -1,42 +0,0 @@
{
"thinker": {
"path": "Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf",
"backend": "vulkan",
"n_gpu_layers": -1,
"max_tokens": 2048,
"temperature": 0.3
},
"json_compiler": {
"path": "gemma-4-E4B-it-Q4_K_M.gguf",
"backend": "cpu",
"n_gpu_layers": 0,
"max_tokens": 1024,
"temperature": 0.1
},
"coder": {
"path": "X-Coder-SFT-Qwen3-8B.Q6_K.gguf",
"backend": "cpu",
"n_gpu_layers": 0,
"max_tokens": 2048,
"temperature": 0.2
},
"critic": {
"path": "gemma-4-E4B-it-Q4_K_M.gguf",
"backend": "cpu",
"n_gpu_layers": 0,
"max_tokens": 1024,
"temperature": 0.1
},
"sys_util": {
"path": "Menlo_Lucy-Q4_K_M.gguf",
"backend": "cpu",
"n_gpu_layers": 0,
"max_tokens": 1024,
"temperature": 0.1
},
"embeddings": {
"path": "all-MiniLM-L6-v2",
"model_name": "sentence-transformers/all-MiniLM-L6-v2",
"embedding_dim": 384
}
}

View File

@ -1,42 +0,0 @@
{
"thinker": {
"path": "Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf",
"backend": "vulkan",
"n_gpu_layers": -1,
"max_tokens": 2048,
"temperature": 0.3
},
"json_compiler": {
"path": "gemma-4-E4B-it-Q4_K_M.gguf",
"backend": "cpu",
"n_gpu_layers": 0,
"max_tokens": 1024,
"temperature": 0.1
},
"coder": {
"path": "X-Coder-SFT-Qwen3-8B.Q6_K.gguf",
"backend": "cpu",
"n_gpu_layers": 0,
"max_tokens": 2048,
"temperature": 0.2
},
"critic": {
"path": "gemma-4-E4B-it-Q4_K_M.gguf",
"backend": "cpu",
"n_gpu_layers": 0,
"max_tokens": 1024,
"temperature": 0.1
},
"sys_util": {
"path": "Menlo_Lucy-Q4_K_M.gguf",
"backend": "cpu",
"n_gpu_layers": 0,
"max_tokens": 1024,
"temperature": 0.1
},
"embeddings": {
"path": "all-MiniLM-L6-v2",
"model_name": "sentence-transformers/all-MiniLM-L6-v2",
"embedding_dim": 384
}
}

View File

@ -1,42 +0,0 @@
{
"thinker": {
"path": "Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf",
"backend": "vulkan",
"n_gpu_layers": -1,
"max_tokens": 2048,
"temperature": 0.3
},
"json_compiler": {
"path": "gemma-4-E4B-it-Q4_K_M.gguf",
"backend": "cpu",
"n_gpu_layers": 0,
"max_tokens": 1024,
"temperature": 0.1
},
"coder": {
"path": "X-Coder-SFT-Qwen3-8B.Q6_K.gguf",
"backend": "cpu",
"n_gpu_layers": 0,
"max_tokens": 2048,
"temperature": 0.2
},
"critic": {
"path": "gemma-4-E4B-it-Q4_K_M.gguf",
"backend": "cpu",
"n_gpu_layers": 0,
"max_tokens": 1024,
"temperature": 0.1
},
"sys_util": {
"path": "Menlo_Lucy-Q4_K_M.gguf",
"backend": "cpu",
"n_gpu_layers": 0,
"max_tokens": 1024,
"temperature": 0.1
},
"embeddings": {
"path": "all-MiniLM-L6-v2",
"model_name": "sentence-transformers/all-MiniLM-L6-v2",
"embedding_dim": 384
}
}

53
config/models.yaml Normal file
View File

@ -0,0 +1,53 @@
default_provider: llama_server

models:
  thinker:
    provider: llama_server
    base_url: http://127.0.0.1:8081/v1
    model: local-main
    purpose: free_cognition
    structured_output: false
    temperature: 0.4
    max_output_tokens: 8192
    system_prompt: prompts/roles/thinker.md

  critic:
    provider: llama_server
    base_url: http://127.0.0.1:8081/v1
    model: local-main
    purpose: reflection
    structured_output: false
    temperature: 0.1
    max_output_tokens: 4096
    system_prompt: prompts/roles/critic.md

  coder:
    provider: llama_server
    base_url: http://127.0.0.1:8081/v1
    model: local-main
    purpose: code_generation
    structured_output: false
    temperature: 0.2
    max_output_tokens: 16384
    system_prompt: prompts/roles/coder.md

  action:
    provider: llama_server
    base_url: http://127.0.0.1:8081/v1
    model: local-main
    purpose: action_directive
    structured_output: true
    temperature: 0.0
    max_output_tokens: 2048
    system_prompt: prompts/roles/action.md
    response_schema: duck_core/schemas/action_directive.schema.json

  summary:
    provider: llama_server
    base_url: http://127.0.0.1:8081/v1
    model: local-main
    purpose: context_summary
    structured_output: false
    temperature: 0.1
    max_output_tokens: 4096
    system_prompt: prompts/roles/summary.md
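
In this config only the `action` role sets `structured_output: true`, and it is also the only role with a `response_schema`. A loader could check that pairing when it reads each role. The sketch below assumes the YAML has already been parsed into plain dictionaries; `RoleConfig`, `parse_role` and `ACTION_ROLE` are hypothetical names, and the rule that structured roles must name a schema is an inference from the file, not something the commit states.

```python
from dataclasses import dataclass
from typing import Any, Optional

# One role entry as it would look after YAML parsing (copied from the config).
ACTION_ROLE = {
    "provider": "llama_server",
    "base_url": "http://127.0.0.1:8081/v1",
    "model": "local-main",
    "purpose": "action_directive",
    "structured_output": True,
    "temperature": 0.0,
    "max_output_tokens": 2048,
    "system_prompt": "prompts/roles/action.md",
    "response_schema": "duck_core/schemas/action_directive.schema.json",
}


@dataclass(frozen=True)
class RoleConfig:
    name: str
    provider: str
    base_url: str
    model: str
    structured_output: bool
    temperature: float
    max_output_tokens: int
    system_prompt: str
    response_schema: Optional[str] = None


def parse_role(name: str, raw: dict[str, Any], default_provider: str) -> RoleConfig:
    """Build a typed role config and check the structured-output pairing."""
    role = RoleConfig(
        name=name,
        provider=raw.get("provider", default_provider),
        base_url=raw["base_url"],
        model=raw["model"],
        structured_output=bool(raw.get("structured_output", False)),
        temperature=float(raw["temperature"]),
        max_output_tokens=int(raw["max_output_tokens"]),
        system_prompt=raw["system_prompt"],
        response_schema=raw.get("response_schema"),
    )
    # Assumed invariant: a structured role must say which schema it follows.
    if role.structured_output and role.response_schema is None:
        raise ValueError(f"role {name!r} is structured but has no response_schema")
    return role
```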

View File

@ -1,94 +0,0 @@
{
"description": "Permission-first model configuration",
"settings": {
"allow_caching": true,
"cache_file": "data/runtime/allowed_commands.json",
"normalize_commands": true,
"split_chained": true
},
"command_categories": {
"hard_stop": {
"description": "Commands that are never executed - hard stop",
"allow_once": false,
"allow_always": false,
"commands": [
"rm -rf /",
"rm -rf /*",
"dd if=/dev/zero of=/dev/sd*",
"dd if=/dev/zero of=/dev/hd*",
"mkfs",
"> /dev/sd*",
"> /dev/hd*"
]
},
"no_always": {
"description": "Dangerous commands - allow once only",
"allow_once": true,
"allow_always": false,
"commands": [
"rm -rf *",
"rm -rf .*",
"curl |",
"wget -O- |",
":(){:|:&};:",
"fork",
"chmod -R 000",
"chmod -R 777",
"chown -R",
"apt",
"apt-get",
"dpkg",
"yum",
"dnf",
"pacman",
"shutdown",
"reboot",
"halt",
"init 0",
"init 6",
"telinit",
"systemctl stop",
"systemctl start",
"systemctl restart",
"service stop",
"service start",
"kill -9 -1",
"killall",
"pkill -9",
"reboot -f",
"shutdown -h now",
"poweroff",
"echo .* > /proc/",
"echo .* > /sys/"
]
},
"normal": {
"description": "Normal commands - allow once or always",
"allow_once": true,
"allow_always": true,
"commands": [
"shell_exec",
"file_write"
],
"file_extensions": [
".py",
".txt",
".json",
".md",
".yaml",
".yml",
".sh",
".bash"
]
}
},
"path_settings": {
"allow_read_outside": true,
"allow_write_paths": [
"/home/mirivlad/git/ducklm",
"/tmp"
],
"require_confirmation_for_write": true,
"require_confirmation_for_shell": true
}
}
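
The permission config groups commands into `hard_stop`, `no_always` and `normal`, and its settings mention normalizing and splitting chained commands. The matching rules themselves are not part of this diff, so the sketch below is one possible reading: split on shell chaining operators, compare whitespace-separated tokens as a prefix, and let the strictest category win. `split_chained`, `matches` and `classify` are hypothetical names.

```python
import re

# Strictest category first, so it wins when several segments match.
CATEGORY_ORDER = ("hard_stop", "no_always", "normal")


def split_chained(command: str) -> list[str]:
    """Split a command line on &&, ||, ; and | into trimmed segments."""
    parts = re.split(r"&&|\|\||;|\|", command)
    return [part.strip() for part in parts if part.strip()]


def matches(pattern: str, segment: str) -> bool:
    """True when the pattern's tokens are a prefix of the segment's tokens."""
    pattern_tokens = pattern.split()
    return segment.split()[: len(pattern_tokens)] == pattern_tokens


def classify(command: str, categories: dict[str, list[str]]) -> str:
    """Return the strictest category matched by any segment of the command."""
    segments = split_chained(command)
    for category in CATEGORY_ORDER:
        patterns = categories.get(category, [])
        if any(matches(p, s) for p in patterns for s in segments):
            return category
    return "normal"
```

Token-prefix matching keeps `rm -rf /` from also matching `rm -rf /tmp/build`, which a plain string-prefix test would do. Entries that depend on the pipe character or on wildcards, such as `curl |` or `echo .* > /proc/`, would need separate handling that this sketch does not attempt.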

View File

@ -1,15 +0,0 @@
{
"thinker": "You are the orchestrator of a local AI agent runtime. Your job is to analyze the user's task and decide how to execute it.\n\n## Decision Types\n\n1. **Direct response** — for simple questions, greetings, conversations:\n {\"type\": \"respond\", \"payload\": {\"text\": \"your answer\"}}\n\n2. **Single tool step** — for simple tasks needing one tool:\n {\"type\": \"step\", \"payload\": {\"tool\": \"shell_exec\", \"args\": {\"command\": \"...\"}}}\n\n3. **Multi-step plan** — for complex tasks that need decomposition:\n {\"type\": \"plan\", \"payload\": {\"steps\": [\n {\"id\": \"step-1\", \"tool\": \"shell_exec\", \"args\": {\"command\": \"...\"}, \"description\": \"...\", \"depends_on\": []},\n {\"id\": \"step-2\", \"tool\": \"file_read\", \"args\": {\"path\": \"...\"}, \"description\": \"...\", \"depends_on\": [\"step-1\"]}\n ]}}\n\n## When to use multi-step plan\n- Task requires multiple operations (search → read → write)\n- Task involves checking prerequisites before acting\n- Task requires gathering information before producing result\n- User asks to do something complex (setup, configure, analyze)\n\n## Memory\n- If memory recall results are provided, USE them to inform your decisions\n- If you know something from memory, mention it in step descriptions\n- Store important results for future use\n\n## Rules\n- ALWAYS respond with valid JSON only\n- Each step MUST have a unique id\n- Use depends_on for ordering constraints\n- Keep steps focused — one action per step\n- If unsure, start with an information-gathering step\n- Respond ONLY with valid JSON, no explanations",
"orchestrator": "You are an expert orchestrator for a local AI agent system. Your role is to analyze the user's task and generate executable runtime steps.\n\nTool selection (choose the right tool):\n- shell_exec: for running commands, checking programs exist ('which', '--version'), searching files\n- file_read: for reading contents of a file (must be existing file path)\n- file_write: for creating or updating files\n- memory: for storing or searching memory\n\nSTRICT OUTPUT FORMAT - MUST follow exactly:\n\nSingle step:\n{\"type\": \"step\", \"payload\": {\"tool\": \"shell_exec\", \"args\": {\"command\": \"...\"}}}\n{\"type\": \"step\", \"payload\": {\"tool\": \"file_read\", \"args\": {\"path\": \"...\"}}}\n{\"type\": \"step\", \"payload\": {\"tool\": \"file_write\", \"args\": {\"path\": \"...\", \"content\": \"...\"}}}\n\nMulti-step plan:\n{\"type\": \"plan\", \"payload\": {\"steps\": [{\"tool\": \"file_read\", \"args\": {\"path\": \"...\"}, \"description\": \"...\", \"depends_on\": []}]}}\n\nDirect response:\n{\"type\": \"respond\", \"payload\": {\"text\": \"...\"}}\n\nIMPORTANT:\n- Use exactly {\"type\": \"step|plan|respond\", \"payload\": {...}} format\n- Do NOT output array alone\n- Do NOT use \"kind\" - use \"type\"\n- Respond ONLY with valid JSON\n- Your response MUST be complete valid JSON - the closing brace } MUST be present\n- Do NOT truncate your response - if you cannot fit all steps, use a single step\n\nTool selection:\n- For checking if a program/command exists: use shell_exec with 'which <program>' or '<program> --version'\n- For reading file contents: use file_read with path to file (NOT command)\n- For executing any command: use shell_exec\n- Previous experience (from memory) may help - consider it but YOU decide how to proceed",
"planning": "You are a planning specialist. Generate execution plans.\n\nOutput MUST be:\n{\"type\": \"plan\", \"version\": \"1.0\", \"payload\": {\"steps\": [{\"tool\": \"\", \"args\": {}, \"description\": \"...\", \"depends_on\": []}]}}\n\nRules:\n- Each step must have unique id (auto-generated)\n- Use \"depends_on\" for step ordering\n- Use \"tool\" for tool operations\n- Respond ONLY with valid JSON",
"coder": "You are an expert code generation model.\n\nOutput format:\n{\"type\": \"code\", \"payload\": {\"language\": \"python\", \"content\": \"...\"}}\n\nOR for completion:\n{\"type\": \"respond\", \"payload\": {\"text\": \"...\"}}\n\nGenerate clean, working code. Respond ONLY with valid JSON.",
"critic": "You are a critic model. Evaluate tool execution results.\n\nScoring criteria:\n- correctness: 0-1 (does result accomplish task?)\n- usefulness: 0-1 (is result useful?)\n- safety: 0-1 (is result safe?)\n- suggest_memory: boolean (should this be stored in memory?)\n- weight: 0-1 (importance score)\n- explanation: brief reasoning\n\nOutput format:\n{\"type\": \"evaluation\", \"payload\": {\"correctness\": 0.0-1.0, \"usefulness\": 0.0-1.0, \"safety\": 0.0-1.0, \"suggest_memory\": true|false, \"weight\": 0.0-1.0, \"explanation\": \"...\"}}\n\nRespond ONLY with valid JSON.",
"system": "You are ducklm, a local AI agent runtime.\n\nSTRICT RULES:\n- You MUST strictly follow execution schemas\n- You are NOT allowed to output free-form text\n- All outputs MUST be valid JSON matching runtime contracts\n- Use exact tool names from available tool set\n\nCurrent capabilities:\n- Execute shell commands (shell_exec)\n- Read/write files (file_read, file_write)\n- Memory operations (memory)\n\nAlways respond with valid JSON.",
"sys_util": "You are a STRICT JSON repair engine inside a production AI runtime.\nYour job is ONLY to fix invalid JSON syntax.\nYou are NOT allowed to:\n- change meaning of data\n- add new fields\n- remove valid fields\n- interpret intent\n- explain anything\n- reformat structure logically\n---\nINPUT:\nYou receive a malformed or invalid JSON string.\n---\nOUTPUT RULES:\n- Output ONLY valid JSON\n- No markdown\n- No comments\n- No explanations\n- No extra text\n---\nREPAIR RULES (STRICT):\nFix ONLY syntax issues:\n- missing or extra commas\n- missing quotes\n- incorrect brackets\n- trailing commas\n- invalid escaping\n- broken strings\n- unbalanced braces\nDO NOT:\n- rename keys\n- reorder fields intentionally\n- guess missing semantic data\n- \"improve\" structure\n---\nIMPORTANT:\nIf multiple valid repairs exist:\n→ choose the minimal change that makes JSON valid\n---\nOUTPUT MUST BE VALID JSON OR NOTHING ELSE\nInvalid JSON:"
}

View File

@ -1,9 +0,0 @@
You are an expert code generation model.
Output format:
{"type": "code", "payload": {"language": "python", "content": "..."}}
OR for completion:
{"type": "respond", "payload": {"text": "..."}}
Generate clean, working code. Respond ONLY with valid JSON.

View File

@ -1,14 +0,0 @@
You are a critic model. Evaluate tool execution results.
Scoring criteria:
- correctness: 0-1 (does result accomplish task?)
- usefulness: 0-1 (is result useful?)
- safety: 0-1 (is result safe?)
- suggest_memory: boolean (should this be stored in memory?)
- weight: 0-1 (importance score)
- explanation: brief reasoning
Output format:
{"type": "evaluation", "payload": {"correctness": 0.0-1.0, "usefulness": 0.0-1.0, "safety": 0.0-1.0, "suggest_memory": true|false, "weight": 0.0-1.0, "explanation": "..."}}
Respond ONLY with valid JSON.
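
The output format above can be checked mechanically before the runtime trusts a critic reply. The following is a minimal sketch, not the project's own validator: the function name `parse_evaluation` and the exact error messages are assumptions made for illustration, and only the field names and ranges come from the prompt.

```python
import json

# Numeric fields that the prompt constrains to the range 0..1.
SCORE_FIELDS = ("correctness", "usefulness", "safety", "weight")


def parse_evaluation(raw: str) -> dict:
    """Parse a critic reply and check it against the format in the prompt.

    Hypothetical helper: raises ValueError on any deviation.
    """
    message = json.loads(raw)
    if not isinstance(message, dict) or message.get("type") != "evaluation":
        raise ValueError("expected an object with type 'evaluation'")
    payload = message.get("payload")
    if not isinstance(payload, dict):
        raise ValueError("payload must be an object")
    for name in SCORE_FIELDS:
        value = payload.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be a number")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within 0..1")
    if not isinstance(payload.get("suggest_memory"), bool):
        raise ValueError("suggest_memory must be a boolean")
    if not isinstance(payload.get("explanation"), str):
        raise ValueError("explanation must be a string")
    return payload
```

Rejecting out-of-range scores early keeps a malformed critic reply from silently influencing memory writes.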

View File

@ -1,25 +0,0 @@
You are a JSON Compiler. Convert semantic plan to strict JSON.
INPUT: Semantic plan from Thinker
OUTPUT: Valid JSON only
RULES:
- Convert ONLY, do not make decisions
- Do not invent tools
- Do not modify plan logic
- Do not skip steps
- Output ONLY valid JSON
AVAILABLE TOOLS:
- file_write (requires permission)
- shell_exec (execute shell commands, requires permission)
- memory (no permission needed)
- file_read (no permission needed)
- respond (just return text to user, no execution)
IMPORTANT: Use exactly "shell_exec" (not "shell") for shell commands!
OUTPUT FORMAT:
{"type": "plan", "payload": {"steps": [{"id": "1", "tool": "shell_exec", "args": {"command": "..."}, "depends_on": []}]}}
OR
{"type": "respond", "payload": {"text": "..."}}

View File

@ -1,34 +0,0 @@
You are an expert orchestrator for a local AI agent system. Your role is to analyze the user's task and generate executable runtime steps.
Tool selection (choose the right tool):
- shell_exec: for running commands, checking programs exist ('which', '--version'), searching files
- file_read: for reading contents of a file (must be existing file path)
- file_write: for creating or updating files
- memory: for storing or searching memory
STRICT OUTPUT FORMAT - MUST follow exactly:
Single step:
{"type": "step", "payload": {"tool": "shell_exec", "args": {"command": "..."}}}
{"type": "step", "payload": {"tool": "file_read", "args": {"path": "..."}}}
{"type": "step", "payload": {"tool": "file_write", "args": {"path": "...", "content": "..."}}}
Multi-step plan:
{"type": "plan", "payload": {"steps": [{"tool": "file_read", "args": {"path": "..."}, "description": "...", "depends_on": []}]}}
Direct response:
{"type": "respond", "payload": {"text": "..."}}
IMPORTANT:
- Use exactly {"type": "step|plan|respond", "payload": {...}} format
- Do NOT output array alone
- Do NOT use "kind" - use "type"
- Respond ONLY with valid JSON
- Your response MUST be complete valid JSON - the closing brace } MUST be present
- Do NOT truncate your response - if you cannot fit all steps, use a single step
Tool selection:
- For checking if a program/command exists: use shell_exec with 'which <program>' or '<program> --version'
- For reading file contents: use file_read with path to file (NOT command)
- For executing any command: use shell_exec
- Previous experience (from memory) may help - consider it but YOU decide how to proceed
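
The envelope rules in this prompt (an object with `type` and `payload`, no bare arrays, no `kind`, no truncated JSON) lend themselves to a small guard on the consumer side. The sketch below is one possible way to enforce them; `parse_decision` and `ALLOWED_TYPES` are hypothetical names, not identifiers from the repository.

```python
import json

ALLOWED_TYPES = {"step", "plan", "respond"}


def parse_decision(raw: str) -> tuple[str, dict]:
    """Return (type, payload) or raise ValueError for a malformed reply."""
    # json.loads raises JSONDecodeError (a ValueError) on truncated output.
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("expected a JSON object, not a bare array or scalar")
    if "type" not in message:
        hint = " (found 'kind'; the prompt requires 'type')" if "kind" in message else ""
        raise ValueError("missing 'type'" + hint)
    kind, payload = message["type"], message.get("payload")
    if not isinstance(kind, str) or kind not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {kind!r}")
    if not isinstance(payload, dict):
        raise ValueError("payload must be an object")
    # Per-type shape checks, mirroring the three formats listed in the prompt.
    if kind == "step" and not ("tool" in payload and isinstance(payload.get("args"), dict)):
        raise ValueError("step payload needs 'tool' and an 'args' object")
    if kind == "plan" and not (isinstance(payload.get("steps"), list) and payload["steps"]):
        raise ValueError("plan payload needs a non-empty 'steps' list")
    if kind == "respond" and not isinstance(payload.get("text"), str):
        raise ValueError("respond payload needs a 'text' string")
    return kind, payload
```

A caller might route a `ValueError` from this function to a repair or retry step rather than to execution.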

View File

@ -1,10 +0,0 @@
You are a planning specialist. Generate execution plans.
Output MUST be:
{"type": "plan", "version": "1.0", "payload": {"steps": [{"tool": "", "args": {}, "description": "...", "depends_on": []}]}}
Rules:
- Each step must have unique id (auto-generated)
- Use "depends_on" for step ordering
- Use "tool" for tool operations
- Respond ONLY with valid JSON
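
The `depends_on` rule implies a dependency graph that has to be ordered before execution. As a rough illustration, assuming each step has already been given an `id` (the prompt says ids are auto-generated, so where that happens is not shown here), the standard-library `graphlib` module can produce a valid order. `execution_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter


def execution_order(steps: list[dict]) -> list[str]:
    """Return step ids in an order that respects every depends_on edge."""
    ids = [step["id"] for step in steps]
    if len(set(ids)) != len(ids):
        raise ValueError("step ids must be unique")
    # Map each step id to the set of ids it depends on.
    graph = {step["id"]: set(step.get("depends_on", [])) for step in steps}
    unknown = set().union(*graph.values()) - set(ids)
    if unknown:
        raise ValueError(f"depends_on references unknown steps: {sorted(unknown)}")
    # static_order() raises graphlib.CycleError (a ValueError) on cycles.
    return list(TopologicalSorter(graph).static_order())
```

Steps without mutual dependencies may come out in any relative order, so a scheduler that wants parallelism would use the sorter's `prepare()`/`get_ready()` interface instead of `static_order()`.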

View File

@ -1,41 +0,0 @@
You are a STRICT JSON repair engine inside a production AI runtime.
Your job is ONLY to fix invalid JSON syntax.
You are NOT allowed to:
- change meaning of data
- add new fields
- remove valid fields
- interpret intent
- explain anything
- reformat structure logically
---
INPUT:
You receive a malformed or invalid JSON string.
---
OUTPUT RULES:
- Output ONLY valid JSON
- No markdown
- No comments
- No explanations
- No extra text
---
REPAIR RULES (STRICT):
Fix ONLY syntax issues:
- missing or extra commas
- missing quotes
- incorrect brackets
- trailing commas
- invalid escaping
- broken strings
- unbalanced braces
DO NOT:
- rename keys
- reorder fields intentionally
- guess missing semantic data
- "improve" structure
---
IMPORTANT:
If multiple valid repairs exist:
→ choose the minimal change that makes JSON valid
---
OUTPUT MUST BE VALID JSON OR NOTHING ELSE
Invalid JSON:

View File

@ -1,14 +0,0 @@
You are ducklm, a local AI agent runtime.
STRICT RULES:
- You MUST strictly follow execution schemas
- You are NOT allowed to output free-form text
- All outputs MUST be valid JSON matching runtime contracts
- Use exact tool names from available tool set
Current capabilities:
- Execute shell commands (shell_exec)
- Read/write files (file_read, file_write)
- Memory operations (memory)
Always respond with valid JSON.

View File

@ -1,36 +0,0 @@
You are a Thinker. Analyze user task and create execution plan.
CONTEXT:
{task_summary}
{memory_context}
AVAILABLE TOOLS (injected at runtime):
{tools_json}
INSTRUCTIONS:
1. Understand what user wants
2. Create step-by-step plan in natural language
3. Choose appropriate tools from available
4. If the user asks about the current local machine, filesystem, processes,
packages, logs, runtime state, or anything that must be observed rather than
answered from general knowledge, use an appropriate tool.
5. For exploratory tasks, prefer one robust inspection command over many brittle
dependent checks. Missing optional files should be treated as information, not
as a fatal failure.
MODE: {mode_hint}
- If mode is "execution": create a plan with TOOL STEPS (shell_exec, file_write, etc)
- If mode is "conversation": just respond with text, NO tool execution
- If mode is "clarification_needed": ask user for clarification
OUTPUT FORMAT (SEMANTIC PLAN - NOT JSON):
For execution mode:
PLAN:
Step 1: [use tool - e.g., shell_exec]
Step 2: [use tool]
For conversation mode:
ANSWER: [just text, no tools needed]
For clarification:
ANSWER: [a clarifying question for the user]

View File

@ -1,42 +0,0 @@
{
"step_timeout_ms": 30000,
"task_timeout_ms": 300000,
"shell_command_timeout_ms": 3600000,
"shell_idle_timeout_ms": 600000,
"planner_retry_limit": 2,
"tool_retry_limit": 1,
"replan_limit": 1,
"max_execution_steps": 20,
"retrieval_top_k": 5,
"max_context_tokens": 8192,
"context_budgets": {
"system": 512,
"task": 512,
"memory": 2048,
"execution": 2048,
"tools": 1024,
"safety": 512
},
"reserve_for_generation_pct": 25,
"orchestrator_retry_limit": 2,
"memory_thresholds": {
"default_store_weight": 0.8
},
"critic_fallback_policy": "continue_without_critic",
"checkpoint_policy": {
"save_on_transition": true
},
"event_retention_policy": {
"keep_all": true
},
"streaming_settings": {
"enabled": true
},
"debug": true,
"debug_orchestrator_log_length": 500,
"json_fix_retry_limit": 2,
"json_fix_use_sys_util": true,
"intent_classifier": "thinker",
"recall_model": "json_compiler",
"critic_retry_limit": 2
}
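
A quick arithmetic check of the budget fields in this config may be useful. The sketch below assumes, purely for illustration, that `reserve_for_generation_pct` is subtracted from `max_context_tokens` and that the `context_budgets` entries are summed against what remains; the deleted runtime may have treated them differently (for example as per-section caps). `budget_report` is a hypothetical helper.

```python
config = {
    "max_context_tokens": 8192,
    "reserve_for_generation_pct": 25,
    "context_budgets": {
        "system": 512,
        "task": 512,
        "memory": 2048,
        "execution": 2048,
        "tools": 1024,
        "safety": 512,
    },
}


def budget_report(cfg: dict) -> dict:
    """Compare the summed section budgets with the non-reserved context."""
    reserved = cfg["max_context_tokens"] * cfg["reserve_for_generation_pct"] // 100
    available = cfg["max_context_tokens"] - reserved
    requested = sum(cfg["context_budgets"].values())
    return {
        "reserved": reserved,
        "available": available,
        "requested": requested,
        "overflow": max(0, requested - available),
    }


print(budget_report(config))
# {'reserved': 2048, 'available': 6144, 'requested': 6656, 'overflow': 512}
```

Under that reading the section budgets exceed the non-reserved window by 512 tokens, so a context builder would have to trim at least one section when all of them are full.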

11
docker-compose.memory.yml Normal file
View File

@ -0,0 +1,11 @@
services:
qdrant:
image: qdrant/qdrant:latest
ports:
- "6333:6333"
- "6334:6334"
volumes:
- qdrant_storage:/qdrant/storage
volumes:
qdrant_storage:

5
docs/architecture.md Normal file
View File

@ -0,0 +1,5 @@
# Architecture
DuckLM is organized as WebChat and FastAPI over Duck Core. Duck Core owns task state, context building, model calls, events, tools, approvals, skills, experience, and memory adapters.
The first vertical slice is WebChat -> FastAPI -> RuntimeLoop -> ModelClient -> llama-server -> SQLite event timeline.
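
To make the last hop of that slice concrete, here is a minimal sketch of an append-only event timeline in SQLite. The table layout, the event kinds, and the function names are all assumptions for illustration; the actual Duck Core schema is not shown in this diff.

```python
import json
import sqlite3
import time


def open_timeline(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical events table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
        "task_id TEXT NOT NULL, kind TEXT NOT NULL, "
        "payload TEXT NOT NULL, created_at REAL NOT NULL)"
    )
    return conn


def append_event(conn: sqlite3.Connection, task_id: str, kind: str, payload: dict) -> int:
    """Append one event and return its sequence number."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO events (task_id, kind, payload, created_at) VALUES (?, ?, ?, ?)",
            (task_id, kind, json.dumps(payload), time.time()),
        )
    return cur.lastrowid


def timeline(conn: sqlite3.Connection, task_id: str) -> list[tuple[str, dict]]:
    """Return the events of one task in insertion order."""
    rows = conn.execute(
        "SELECT kind, payload FROM events WHERE task_id = ? ORDER BY seq", (task_id,)
    )
    return [(kind, json.loads(payload)) for kind, payload in rows]
```

Ordering by an autoincrement column rather than by timestamp keeps the timeline stable even when two events land within the same clock tick.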

View File

@ -0,0 +1,9 @@
# Experience Learning
Experience records are stored in SQLite. Suggested skill updates are written to `skills/_proposals/` and are not applied automatically.
Use:
```bash
curl http://127.0.0.1:8000/v1/experience
```

71
docs/how_to_run.md Normal file
View File

@ -0,0 +1,71 @@
# How To Run
1. Install dependencies:
```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install -e ".[dev]"
```
2. Configure:
```bash
cp .env.example .env
```
The default `DUCK_MAIN_MODEL_PATH` points to `./models/Qwen3.6/nonMTP/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf`.
3. Start `llama-server`:
```bash
bash scripts/llama/start_main.sh start
```
Useful process commands:
```bash
bash scripts/llama/start_main.sh status
bash scripts/llama/start_main.sh logs --follow
bash scripts/llama/start_main.sh restart
bash scripts/llama/start_main.sh stop
```
4. Start DuckLM API:
```bash
python -m duck_core.api
```
5. Open WebChat:
```text
http://127.0.0.1:8000/
```
6. Send a task:
```bash
curl -X POST http://127.0.0.1:8000/v1/chat \
-H "Content-Type: application/json" \
-d '{"message":"Say briefly that you are DuckLM","workspace":"./workspace","debug":true}'
```
7. Inspect events:
```bash
curl http://127.0.0.1:8000/v1/tasks/<task_id>/events
```
8. Approvals:
```bash
curl http://127.0.0.1:8000/v1/approvals/pending
```
9. Stop services:
```bash
bash scripts/llama/start_main.sh stop
docker compose -f docker-compose.memory.yml down
```

15
docs/how_to_test.md Normal file
View File

@ -0,0 +1,15 @@
# How To Test
Run smoke tests:
```bash
python -m pytest tests/smoke -v
```
Run verification scripts against a running API:
```bash
bash scripts/verify/verify_basic_chat.sh
bash scripts/verify/verify_models_roles.sh
bash scripts/verify/verify_tool_blocking.sh
```

View File

@ -0,0 +1,44 @@
# Local Llama Server
DuckLM expects an OpenAI-compatible `llama-server` at `http://127.0.0.1:8081/v1` by default.
On the current Radeon RX580 system, `llama.cpp` is built locally with Vulkan:
```bash
bash scripts/llama/build_vulkan.sh
```
The main model is Qwen3.6 35B A3B nonMTP:
```text
models/Qwen3.6/nonMTP/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf
```
Start it in the background with:
```bash
bash scripts/llama/start_main.sh start
```
Manage the process:
```bash
bash scripts/llama/start_main.sh status
bash scripts/llama/start_main.sh logs
bash scripts/llama/start_main.sh logs --follow
bash scripts/llama/start_main.sh restart
bash scripts/llama/start_main.sh stop
```
The local `.env` uses:
```env
DUCK_LLAMA_SERVER_BIN=./vendor/llama.cpp/build/bin/llama-server
DUCK_CTX_SIZE=4096
DUCK_N_GPU_LAYERS=20
DUCK_PARALLEL=1
DUCK_LLAMA_DEVICE=Vulkan0
DUCK_LLAMA_EXTRA_ARGS="--reasoning off --cache-ram 0"
```
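
As a rough illustration of how such variables could turn into a `llama-server` command line, the sketch below maps them onto long-form flags. This mapping is an assumption: the real logic lives in `scripts/llama/start_main.sh`, which is not shown here, and `llama_server_argv` is a hypothetical helper. `DUCK_MAIN_MODEL_PATH` is the variable named in the run instructions.

```python
import shlex


def llama_server_argv(env: dict[str, str]) -> list[str]:
    """Build an argument vector from DUCK_* settings (assumed mapping)."""
    argv = [
        env["DUCK_LLAMA_SERVER_BIN"],
        "--model", env["DUCK_MAIN_MODEL_PATH"],
        "--ctx-size", env["DUCK_CTX_SIZE"],
        "--n-gpu-layers", env["DUCK_N_GPU_LAYERS"],
        "--parallel", env["DUCK_PARALLEL"],
        "--device", env["DUCK_LLAMA_DEVICE"],
    ]
    # Extra arguments are a single shell-style string; split them safely.
    argv += shlex.split(env.get("DUCK_LLAMA_EXTRA_ARGS", ""))
    return argv
```

Passing the result to `subprocess.Popen` as a list avoids a second round of shell quoting.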
MTP is available only through `scripts/llama/start_thinker_mtp_experimental.sh` and is not used by the action JSON endpoint by default.

View File

@ -0,0 +1,5 @@
# Memory Architecture
Semantic memory uses Qdrant as the vector store. Embeddings come from `/v1/embeddings` when the model backend supports it.
If embeddings are unavailable, `VectorMemory` fails explicitly with `EmbeddingsUnavailableError`; it does not invent a local embedding algorithm.
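
The fail-explicitly behaviour can be sketched as follows. This is an illustrative stand-in, not the project's `VectorMemory`: the exception class is redefined locally, and `request_embeddings` is a hypothetical callable that posts to `/v1/embeddings` and returns the decoded JSON body, or `None` when the backend does not support the endpoint. The response shape follows the OpenAI-compatible embeddings format.

```python
from typing import Callable, Optional


class EmbeddingsUnavailableError(RuntimeError):
    """Local stand-in for the error named in the text above."""


def embed(text: str, request_embeddings: Callable[[str], Optional[dict]]) -> list[float]:
    """Return the embedding for text or fail loudly; never fabricate a vector."""
    body = request_embeddings(text)
    if not body or not body.get("data"):
        raise EmbeddingsUnavailableError("backend returned no embeddings")
    # OpenAI-compatible responses carry vectors under data[i].embedding.
    return body["data"][0]["embedding"]
```

Raising instead of falling back to a homemade hash or bag-of-words vector keeps vectors in the store comparable with each other.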

7
docs/model_roles.md Normal file
View File

@ -0,0 +1,7 @@
# Model Roles
Roles are logical, not physical. `thinker`, `critic`, `coder`, `action`, and `summary` may all point to the same model.
Each role can differ by prompt, temperature, output limit, response format, schema, memory scope, and endpoint. Request-level parameters can change per call. Backend-level parameters such as GGUF path, context size, GPU offload, MTP, and server port require the backend to be started with the desired settings.
See `config/models.yaml` for one model mapped to all roles.
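
The split between request-level and backend-level parameters can be illustrated with a small sketch. Every value below is made up for the example (the contents of `config/models.yaml` are not shown in this diff), and `build_request`, `ROLES`, `SHARED_MODEL`, and `BACKEND_LEVEL` are hypothetical names.

```python
# Parameters that only take effect when the backend is (re)started.
BACKEND_LEVEL = {"gguf_path", "ctx_size", "n_gpu_layers", "mtp", "port"}

SHARED_MODEL = "main"  # one physical model behind every logical role

# Per-role request defaults; the numbers are illustrative only.
ROLES = {
    "thinker": {"temperature": 0.7, "max_tokens": 1024},
    "critic": {"temperature": 0.2, "max_tokens": 512},
    "coder": {"temperature": 0.2, "max_tokens": 2048},
    "action": {"temperature": 0.0, "max_tokens": 512,
               "response_format": {"type": "json_object"}},
    "summary": {"temperature": 0.3, "max_tokens": 256},
}


def build_request(role: str, messages: list[dict], **overrides) -> dict:
    """Build a chat-completions body for a role, allowing per-call overrides."""
    blocked = BACKEND_LEVEL.intersection(overrides)
    if blocked:
        raise ValueError(f"{sorted(blocked)} require restarting the backend")
    return {"model": SHARED_MODEL, "messages": messages, **ROLES[role], **overrides}
```

For example, `build_request("action", msgs, temperature=0.1)` changes the sampling temperature for a single call, while `build_request("action", msgs, ctx_size=8192)` is rejected because context size is fixed when the server starts.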

5
docs/performance_mtp.md Normal file
View File

@ -0,0 +1,5 @@
# Performance And MTP
MTP/speculative decoding is an inference backend concern. DuckLM keeps action JSON on the normal endpoint by default.
Use `scripts/llama/start_thinker_mtp_experimental.sh` only for experiments. Benchmark scaffolding is in `scripts/bench/bench_runtime.py`.

View File

@ -1,24 +0,0 @@
# UI Bootstrap And Review Flow Plan
## Goal
Move the web chat UI to Bootstrap 5.3 with Bootswatch themes and improve review/password/terminal-output ergonomics.
## Required Changes
- Replace the current hand-written visual system in `app/api/static/index.html` with Bootstrap 5.3 layout/components.
- Add Bootswatch theme support with a visible theme selector and persistent localStorage choice.
- Password/secret input must submit on Enter as well as via the "Отправить" (Send) button.
- Console/tool output must render inside a collapsed Bootstrap accordion item.
- The accordion body must contain terminal-style output inside `<pre></pre>`.
- The terminal accordion must expand only when the user clicks it.
- Review UI must show critic/system assessment and user voting buttons:
- `Ошибочное действие` (Wrong action)
- `Всё верно` (All correct)
- optional correction/comment text.
## Notes
- Keep runtime event handling WebSocket-driven.
- Do not mix console output with assistant prose.
- Keep raw tool output available for debugging, but collapsed by default.

9
docs/skills.md Normal file
View File

@ -0,0 +1,9 @@
# Skills
Skills are procedural memory, not hardcoded routing. The first skill is `analyze_project`, loaded from `skills/analyze_project/skill.yaml`.
Use:
```bash
curl http://127.0.0.1:8000/v1/skills
```

Some files were not shown because too many files have changed in this diff.