Replace repository with DuckLM runtime
parent ddc285b8f4
commit 4a84ada770
@@ -1,41 +1,22 @@
-# Python
-__pycache__/
-*.py[cod]
-*.pyo
-.pytest_cache/
-.mypy_cache/
-.ruff_cache/
-
-# Virtual environments
-.venv/
-venv/
-env/
-
-# Local environment and secrets
 .env
 .env.*
 !.env.example
-config/.env
-config/.env.*
-*.pem
-*.key
+.venv/
+__pycache__/
+*.py[cod]
+.pytest_cache/
+.ruff_cache/
+*.egg-info/
 
-# Local models and embeddings
-/models/
-*.gguf
-*.safetensors
-*.bin
+data/
+workspace/
+models/
+vendor/
 
-# Runtime state
-data/**/*.sqlite3
-data/**/*.sqlite3-*
-data/runtime/*.json
-data/runtime/*.pid
-data/runtime/*.log
-logs/
 *.log
+*.pid
+*.sqlite3
+*.db
 
-# OS/editor
-.DS_Store
-.idea/
-.vscode/
+dist/
+build/
ARCHITECTURE.md (347 lines deleted)
@@ -1,347 +0,0 @@
# ARCHITECTURE
|
|
||||||
|
|
||||||
This document defines the target architecture of `ducklm` as a local, event-driven, multi-model execution runtime.

`TASK_3.md` is the directive for the AI coder.
`ARCHITECTURE.md` is a short engineering map of the system: what the core is, which layers exist, how control flows, where decisions are made, and where transitions are merely executed.
|
|
||||||
|
|
||||||
## 1. Core Principle
|
|
||||||
|
|
||||||
The system is built around the `Runtime Loop Controller`.

The center of the system is:

- not the `router`
- not the `orchestrator`
- not the `execution engine`

The center of the system is:

- the `runtime loop`

It is the runtime loop that closes the task lifecycle:
|
|
||||||
|
|
||||||
```text
task
-> state load/create
-> context build
-> orchestration decision
-> plan/directive
-> execution
-> critic
-> memory policy
-> checkpoint
-> next step / complete / fail
```
|
|
||||||
|
|
||||||
## 2. Layer Model
|
|
||||||
|
|
||||||
Target shape of the system:

```text
Client / CLI / API
        |
        v
Runtime Loop Controller
        |
        +--> State Store / Checkpoints
        +--> Context Builder
        +--> Router
        +--> Orchestrator / Planner
        +--> Execution Engine / Scheduler
        |         |
        |         +--> Tool Layer
        |         +--> Coder
        |
        +--> Critic
        +--> Memory Write Policy
        +--> Memory Store + Vector Index
        +--> Event Bus + Event Store
        +--> Streaming Projection
```
|
|
||||||
|
|
||||||
Principle:

- the `runtime loop` coordinates
- the `router` recommends
- the `orchestrator` reasons
- the `execution engine` executes
- `tools/coder` do the work
- the `critic` evaluates
- the `memory policy` decides what gets written
- the `event bus` records history
- the `state store` enables resume
|
|
||||||
|
|
||||||
## 3. Responsibility Boundaries
|
|
||||||
|
|
||||||
### Runtime Loop Controller
|
|
||||||
|
|
||||||
Responsible for:

- task lifecycle
- state transitions
- calling components in the correct order
- applying decision objects
- checkpointing
- completion / failure path

Not responsible for:

- policy reasoning
- raw tool execution
- prompt assembly inline
|
|
||||||
|
|
||||||
### Router
|
|
||||||
|
|
||||||
This is a `policy evaluator + decision suggester`.

Contract:

```text
(input state + assembled context) -> ExecutionDirective
```

Properties:
|
|
||||||
|
|
||||||
- pure function
|
|
||||||
- no side effects
|
|
||||||
- no tool execution
|
|
||||||
- no state mutation
|
|
||||||
|
|
||||||
### Orchestrator / Planner
|
|
||||||
|
|
||||||
Responsible for:

- orchestration reasoning
- deciding whether planning is needed
- generating plan JSON
- returning structured directives

Not responsible for:

- execution
- direct state mutation
- tool invocation
|
|
||||||
|
|
||||||
### Execution Engine / Scheduler
|
|
||||||
|
|
||||||
Responsible for:

- step scheduling
- task graph traversal
- step execution coordination
- calling tool/coder adapters
- reporting structured results

Not responsible for:

- ownership of global lifecycle
- high-level policy
|
|
||||||
|
|
||||||
### Critic
|
|
||||||
|
|
||||||
Responsible for:

- evaluation of tool/coder outputs
- returning structured scores and explanation

Not responsible for:

- final memory write decision
- execution retry policy
|
|
||||||
|
|
||||||
### Memory Write Policy
|
|
||||||
|
|
||||||
Responsible for:

- deterministic decision about storing memory
- dedup / merge / skip behavior

Not responsible for:

- semantic retrieval
- critic scoring
|
|
||||||
|
|
||||||
## 4. Decision Model
|
|
||||||
|
|
||||||
All decision-producing components must return structured objects.

Base contract:

```json
{
  "type": "plan|tool|coder|respond|replan|store_memory|request_permission|complete|fail|noop",
  "payload": {},
  "requires_permission": false,
  "confidence": 0.0,
  "reason": "string"
}
```

This is the system's main anti-chaos invariant.

Consequences:

- components do not execute decisions directly
- components do not mutate state directly
- the runtime loop applies decisions and advances the system
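
The decision contract above can be sketched as a small validated value object. This is only an illustration: the `Directive` class, the `apply_directive` helper, the status strings, and the assumed `[0.0, 1.0]` range for `confidence` are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass, field
from typing import Any

# Directive types copied from the base contract.
DIRECTIVE_TYPES = frozenset({
    "plan", "tool", "coder", "respond", "replan", "store_memory",
    "request_permission", "complete", "fail", "noop",
})


@dataclass(frozen=True)
class Directive:
    """Hypothetical in-memory form of the base contract."""
    type: str
    payload: dict[str, Any] = field(default_factory=dict)
    requires_permission: bool = False
    confidence: float = 0.0
    reason: str = ""

    def __post_init__(self) -> None:
        if self.type not in DIRECTIVE_TYPES:
            raise ValueError(f"unknown directive type: {self.type!r}")
        # Assumption: confidence is a probability-like value in [0.0, 1.0].
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0.0, 1.0]")


def apply_directive(state: dict[str, Any], directive: Directive) -> dict[str, Any]:
    """Meant to be called by the runtime loop only; returns a new state dict."""
    new_state = dict(state)  # copy, so the caller's state is not mutated in place
    if directive.requires_permission:
        new_state["status"] = "awaiting_permission"
    elif directive.type == "complete":
        new_state["status"] = "completed"
    elif directive.type == "fail":
        new_state["status"] = "failed"
    else:
        new_state["status"] = "running"
    new_state["last_directive"] = directive.type
    return new_state
```

Keeping the object frozen and routing every transition through one function mirrors the invariant: components produce decisions, and only the loop applies them.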
|
|
||||||
|
|
||||||
## 5. Execution Flow
|
|
||||||
|
|
||||||
Normal execution path:

1. The client submits a task.
2. The runtime loop creates or loads the task state.
3. `task_received` is published.
4. The context builder assembles the execution context.
5. The router returns a decision object.
6. The orchestrator returns a direct action or a plan.
7. The plan is validated and converted into a task graph.
8. The execution engine selects the next step.
9. A tool or the coder executes the step through an adapter.
10. The result is returned to the runtime loop.
11. The critic returns an evaluation suggestion.
12. The memory policy returns a write decision.
13. The state checkpoint is saved.
14. The event bus records the events.
15. The runtime loop chooses `continue / replan / complete / fail`.
|
|
||||||
|
|
||||||
## 6. Task Graph Model
|
|
||||||
|
|
||||||
An external planner may return a list of steps.

Inside the runtime, the plan must live as a task graph:

```json
{
  "nodes": [
    {
      "id": "step-1",
      "kind": "tool",
      "tool": "shell_exec",
      "args": {"command": "hostnamectl"},
      "depends_on": []
    }
  ]
}
```

For now, sequential DAG execution is allowed.
In the future, this opens a path to parallel scheduling without rewriting the model.
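
A sequential walk over such a graph can be sketched with the standard library's `graphlib`. The `sequential_order` function is a hypothetical helper, not the project's scheduler; it assumes every node carries `id` and `depends_on` as in the JSON example.

```python
from graphlib import CycleError, TopologicalSorter


def sequential_order(graph: dict) -> list[str]:
    """Return node ids in an order that respects every `depends_on` edge."""
    nodes = {node["id"]: node for node in graph["nodes"]}
    sorter = TopologicalSorter()
    for node_id, node in nodes.items():
        for dep in node["depends_on"]:
            if dep not in nodes:
                raise ValueError(f"{node_id} depends on unknown step {dep}")
        # add(node, *predecessors): predecessors are emitted before the node.
        sorter.add(node_id, *node["depends_on"])
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        raise ValueError("plan contains a dependency cycle") from exc
```

Because the order is derived from the dependency edges rather than from list position, the same graph could later be handed to a parallel scheduler without changing its shape.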
|
|
||||||
|
|
||||||
## 7. Event Backbone
|
|
||||||
|
|
||||||
The system is event-driven.

`EventBus` is needed not only for streaming but as the internal backbone bus.

Minimum properties:

- ordering per task
- monotonic sequence per task
- durable append to event store
- replay capability
- consumer idempotency

Minimum delivery model:

- `at least once`

Idempotency rule:

- an event is deduplicated by `task_id + sequence`

The streaming layer is a projection of the event bus, not the source of truth.
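
With at-least-once delivery, each consumer has to tolerate duplicates. A minimal sketch of the `task_id + sequence` deduplication rule follows; `IdempotentConsumer` and the event dict shape are assumptions made for illustration, and a real consumer would persist the seen keys rather than hold them in memory.

```python
from typing import Any, Callable


class IdempotentConsumer:
    """Hypothetical wrapper that drops events already handled once."""

    def __init__(self, handler: Callable[[dict[str, Any]], None]) -> None:
        self._handler = handler
        self._seen: set[tuple[str, int]] = set()

    def consume(self, event: dict[str, Any]) -> bool:
        """Handle the event unless it is a duplicate; return True if handled."""
        key = (event["task_id"], event["sequence"])
        if key in self._seen:
            return False
        self._handler(event)
        # Mark as seen only after the handler succeeded, so a failed
        # delivery can be retried by the bus.
        self._seen.add(key)
        return True
```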
|
|
||||||
|
|
||||||
## 8. State Persistence
|
|
||||||
|
|
||||||
Because the runtime is designed as a long-running autonomous system, an in-memory lifecycle is not enough.

Required:

- task state store
- checkpoint store
- resume from crash/restart

Minimum strategy:

- checkpoint after critical transitions
- latest valid checkpoint is resumable

Primary choice for the MVP:

- `SQLite`
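
The "latest valid checkpoint is resumable" strategy can be sketched with the standard `sqlite3` module. The table layout, the `CheckpointStore` name, and the use of JSON decodability as the validity test are assumptions for this example, not the project's actual schema.

```python
import json
import sqlite3
from typing import Any, Optional


class CheckpointStore:
    """Hypothetical append-only checkpoint table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " task_id TEXT NOT NULL,"
            " stage TEXT NOT NULL,"
            " state_json TEXT NOT NULL)"
        )

    def save(self, task_id: str, stage: str, state: dict[str, Any]) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO checkpoints (task_id, stage, state_json) VALUES (?, ?, ?)",
                (task_id, stage, json.dumps(state)),
            )

    def latest_valid(self, task_id: str) -> Optional[tuple[str, dict[str, Any]]]:
        """Return the newest checkpoint whose payload still decodes, or None."""
        rows = self._db.execute(
            "SELECT stage, state_json FROM checkpoints"
            " WHERE task_id = ? ORDER BY id DESC",
            (task_id,),
        ).fetchall()
        for stage, state_json in rows:
            try:
                return stage, json.loads(state_json)
            except json.JSONDecodeError:
                continue  # skip a corrupted row and fall back to an older one
        return None
```

On restart, the runtime loop would call `latest_valid(task_id)` and continue from the returned stage instead of starting the task over.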
|
|
||||||
|
|
||||||
## 9. Async and Isolation
|
|
||||||
|
|
||||||
The LLM loop must not be blocked by long-running tool operations.

This requires:

- async execution adapters
- timeout wrappers
- cancellation handling
- bounded concurrency

Dangerous or heavy operations need a separate sandbox layer.

Especially for:

- `shell_exec`
- browser/web fallback
- generated helper scripts
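
A timeout wrapper with bounded concurrency can be sketched using plain `asyncio`. The `run_bounded` helper and its result dict shape are hypothetical; the sketch covers only timeout and concurrency limits and does not provide sandboxing.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_bounded(
    coro_factory: Callable[[], Awaitable[Any]],
    *,
    limiter: asyncio.Semaphore,
    timeout_s: float,
) -> dict[str, Any]:
    """Run one tool call under a concurrency limit and a timeout."""
    async with limiter:  # at most N tool calls run at the same time
        try:
            # wait_for cancels the inner coroutine when the timeout expires.
            value = await asyncio.wait_for(coro_factory(), timeout=timeout_s)
        except asyncio.TimeoutError:
            return {"ok": False, "error": "timeout"}
        return {"ok": True, "value": value}
```

Returning a structured result instead of raising keeps the timeout on the controlled failure path, so the caller can decide between retry and fail.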
|
|
||||||
|
|
||||||
## 10. Memory Architecture
|
|
||||||
|
|
||||||
Memory is a separate storage subsystem, not a JSON dump.

Recommended shape:

- metadata store: `SQLite`
- vector index: `FAISS` or `hnswlib`

Two distinct processes:

- retrieval
- write decision

They are separated on purpose.

The `critic` only evaluates.
The `memory write policy` makes the final decision.

The minimum write logic must be deterministic:

```text
(critic_score + memory_type + runtime_weight + dedup_state + safety_state) -> decision
```
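
One way to read this contract is as a pure function of its five inputs. In the sketch below the thresholds, the state labels (`"safe"`, `"duplicate"`, `"similar"`), and the `store / merge / skip` outcomes are placeholder assumptions; the document names the inputs but not the rule itself.

```python
# Placeholder thresholds per memory type; the real values are not given here.
WRITE_THRESHOLDS = {"fact": 0.6, "preference": 0.5}
DEFAULT_THRESHOLD = 0.7


def decide_memory_write(
    critic_score: float,
    memory_type: str,
    runtime_weight: float,
    dedup_state: str,
    safety_state: str,
) -> str:
    """Deterministic write decision: same inputs always give the same output."""
    if safety_state != "safe":
        return "skip"
    if dedup_state == "duplicate":
        return "skip"
    threshold = WRITE_THRESHOLDS.get(memory_type, DEFAULT_THRESHOLD)
    if critic_score * runtime_weight < threshold:
        return "skip"
    # A near-duplicate is merged into the existing entry instead of stored twice.
    return "merge" if dedup_state == "similar" else "store"
```

No model call appears in the function, which is what keeps the critic an evaluator and the policy the final authority.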
|
|
||||||
|
|
||||||
## 11. Failure Model
|
|
||||||
|
|
||||||
The system must be resilient to partial failures.

Expected controlled failure paths:

- invalid planner output -> replan or fail
- tool timeout -> retry or fail
- critic failure -> fallback policy
- memory failure -> skip write and continue where safe
- streaming failure -> sync fallback

Main principle:

- a subsystem failure must not automatically mean runtime collapse
|
|
||||||
|
|
||||||
## 12. Why This Shape
|
|
||||||
|
|
||||||
This architecture exists so that the system does not degrade into one of the bad variants:

- `router-god-object`
- `runtime loop with hidden policy logic`
- `LLM that directly executes tools`
- `streaming instead of event model`
- `critic as memory authority`
- `in-memory only autonomous runtime`

If these boundaries are held firmly, the project stays extensible.
If the boundaries blur, the system will quickly turn into a hard-to-debug procedural agent.
|
|
||||||
CURRENT_STATE.md (252 lines deleted)
@@ -1,252 +0,0 @@
# DuckLM: Current Project State

## 1. What It Is

DuckLM is a local event-driven multi-model AI agent runtime. The system accepts a user task, retrieves relevant memory, builds context, makes an orchestration decision, builds a plan when needed, executes steps through tools and the coder, evaluates results through the critic, saves what is useful to long-term memory, publishes events, and supports streaming to the client.

**Key principle:** the center of the system is `RuntimeLoop`. All execution transitions go through it. Router, Orchestrator, and ExecutionEngine are decision-producing components that only return structured objects (ExecutionDirective) and do not execute actions directly.
|
|
||||||
|
|
||||||
## 2. Architecture

```
Client / CLI / API
        │
        ▼
RuntimeLoop (runtime_loop.py)
        │
        ├── State Store / Checkpoints (SQLite)
        ├── ContextBuilder
        ├── AsyncRouter (Thinker → JSON Compiler)
        ├── ExecutionEngine / ExecutionScheduler
        │     ├── ToolRegistry / ToolSandbox
        │     ├── CoderAdapter
        │     └── CriticAdapter
        ├── PermissionService
        ├── MemoryRecallService
        ├── MemoryWritePolicy
        ├── MemoryInterface (SQLite + hnswlib)
        └── EventBus → SQLiteEventStore
                  │
                  ▼
          StreamingManager → WebSocket
```
|
|
||||||
|
|
||||||
## 3. Project Structure

```
ducklm/
  main.py                        # Entry point (imports app.api.server.app)
  app/
    api/
      server.py                  # FastAPI: POST /chat, WS /stream, GET /health, etc.
      static/index.html          # Web chat (dark theme, Enter=send, Shift+Enter=new line)
    cli/__init__.py              # Empty for now
    core/
      contracts.py               # Pydantic models: UserTask, PlanStep, ToolResult, CriticScore, ...
      config.py                  # AppConfig, load_app_config()
      async_router.py            # AsyncRouter: Thinker + JSON Compiler pipeline
      context_builder.py         # ContextBuilder: context assembly with budgets
      execution_engine.py        # ExecutionEngine: executes plan/tool/respond/coder
      execution_scheduler.py     # ExecutionScheduler: plan parsing, task graph, execution loop
      intent_parser.py           # IntentParser: extracts tool intents from text
      permission_service.py      # PermissionService: checks command permissions
      permission_resolution.py   # Pydantic models for the permissions API
    events/
      event_bus.py               # EventBus: per-task ordered publishing
      event_store.py             # SQLiteEventStore: append-only log
      event_types.py             # Event type constants
    memory/
      interface.py               # MemoryInterface: insert/search/get/delete/reindex/cleanup
      store.py                   # MemoryStore: SQLite storage for MemoryEntry + embeddings
      vector_index.py            # VectorIndex: hnswlib L2 index
      recall.py                  # MemoryRecallService: LLM-based decision on whether recall is needed
      write_policy.py            # MemoryWritePolicy: deterministic write decision
    models/
      adapters.py                # create_adapter/create_llama_adapter (llama-cpp-python)
      async_adapters.py          # AsyncOrchestratorAdapter, AsyncCoderAdapter, AsyncCriticAdapter
      orchestrator.py            # OrchestratorAdapter: wrapper around Llama
      coder.py                   # CoderAdapter
      critic.py                  # CriticAdapter
      embeddings.py              # EmbeddingsAdapter (sentence-transformers)
    permissions/
      approval_store.py          # SQLiteApprovalStore
    runtime/
      runtime_loop.py            # RuntimeLoop: central loop (sync)
      async_runtime_loop.py      # AsyncRuntimeLoop: alternative async version
      runtime_controller.py      # RuntimeController: composition root, initializes everything
    services/__init__.py         # Empty
    state/
      task_state_store.py        # SQLiteTaskStateStore
      checkpoint_store.py        # SQLiteCheckpointStore
    streaming/
      manager.py                 # StreamingManager: event subscription → WebSocket
    tools/
      base.py, registry.py, sandbox.py, discover.py
      shell_exec.py, file_read.py, file_write.py, memory_tools.py
      plugins/                   # Plugin discovery: shell_exec, file_read, file_write, memory_tools
  config/
    models.json                  # Model configuration
    runtime.json                 # Timeouts, retry limits, context budgets
    permissions.json             # Command categories, paths
    prompts/                     # Markdown prompts for each role
      thinker.md, json_compiler.md, coder.md, critic.md, sys_util.md, orchestrator.md, planning.md, system.md
  data/
    events/events.sqlite3        # Event store
    state/task_state.sqlite3     # Task state
    state/checkpoints.sqlite3    # Checkpoints
    permissions/approvals.sqlite3  # Permission cache
    memory/memory.sqlite3        # Memory store
    memory/index.bin             # Vector index
  models/                        # GGUF models and sentence-transformers
  tests/
    test_contracts.py            # 6 tests: contracts, router
    test_runtime_loop.py         # 2 tests: runtime loop events, permission flow
    test_tools_flow.py           # 7 tests: file read/write, shell, recovery, permissions
    test_api_handlers.py         # 6 tests: health, events, chat, permissions, feedback
```
|
|
||||||
|
|
||||||
## 4. Models and Their Roles

| Role | Model | Backend | Config |
|------|-------|---------|--------|
| Thinker (orchestrator) | Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf | vulkan (GPU) | max_tokens=2048, temp=0.3 |
| JSON Compiler | gemma-4-E4B-it-Q4_K_M.gguf | cpu | max_tokens=1024, temp=0.1 |
| Critic | gemma-4-E4B-it-Q4_K_M.gguf (shared with the compiler) | cpu | max_tokens=1024, temp=0.1 |
| Coder | X-Coder-SFT-Qwen3-8B.Q6_K.gguf | cpu | max_tokens=2048, temp=0.2 |
| Sys Utility | Menlo_Lucy-Q4_K_M.gguf | cpu | max_tokens=1024, temp=0.1 |
| Embeddings | all-MiniLM-L6-v2 (sentence-transformers) | n/a | dim=384 |

**Important:** Critic and JSON Compiler use the same model (gemma-4-E4B, referred to elsewhere as gemma-4B) but separate adapter instances. Models are not duplicated in memory: loaded models are cached through `_get_or_create_llm()`, keyed by (path, backend, n_gpu_layers, n_ctx).
|
|
||||||
|
|
||||||
## 5. Configuration

All settings live in `config/`:
- **models.json**: paths to GGUF files, backend, GPU layers, max_tokens, temperature
- **runtime.json**: timeouts (step=30s, task=5min), retry limits, context budgets, retrieval_top_k
- **permissions.json**: hard_stop commands (rm -rf /, dd, mkfs), no_always commands (shutdown, killall), normal commands
- **prompts/*.md**: system prompts for each model role
|
|
||||||
|
|
||||||
## 6. API
|
|
||||||
|
|
||||||
FastAPI server on port 8000 (`scripts/server.sh`):

| Method | Path | Description |
|--------|------|-------------|
| GET | `/` | Web chat (index.html) |
| GET | `/health` | Health check |
| GET | `/events` | List of recent events |
| POST | `/chat` | Submit a task (UserTask) and receive the result |
| POST | `/permissions/resolve` | Allow or deny a command |
| POST | `/secrets/resolve` | Provide the sudo password |
| POST | `/password/resolve` | Provide a password (alternative path) |
| POST | `/critic/feedback` | User feedback |
| WS | `/stream/{task_id}` | Event streaming for a task |
|
|
||||||
|
|
||||||
## 7. Task Execution Flow

1. Client → POST /chat → `RuntimeController.handle_task()`
2. `RuntimeLoop.run_task()`:
   - Hard-stop command check via PermissionService
   - Task state creation in SQLiteTaskStateStore
   - Publish TASK_RECEIVED
   - Checkpoint: received
   - ContextBuilder.build(): context assembly (memory, tools, budgets)
   - MemoryRecallService.recall(): the LLM decides whether a memory search is needed
   - AsyncRouter.decide(): Thinker → JSON Compiler → ExecutionDirective
   - ExecutionEngine.execute(): directive execution:
     - plan → step parsing → graph → sequential execution
     - tool → permission check → ToolSandbox → ToolResult
     - respond → direct response
     - coder → CoderAdapter
   - Critic evaluation of each step (correctness, usefulness, safety)
   - Recovery for failed steps (retry/continue/respond/fail)
   - MemoryWritePolicy: decision on writing to long-term memory
   - Checkpoint: final state
   - Publish TASK_COMPLETED / TASK_FAILED / TASK_AWAITING_PERMISSION
3. The result is returned to the client, and events are available over WebSocket
|
|
||||||
|
|
||||||
## 8. What Is Implemented and Working

### Core (complete)
- [x] Modular project structure (app/, config/, data/, tests/)
- [x] Typed contracts (Pydantic models for all entities)
- [x] RuntimeLoop: the central loop
- [x] RuntimeController: composition root
- [x] EventBus + SQLiteEventStore (append-only, per-task ordering)
- [x] TaskStateStore + CheckpointStore (SQLite)
- [x] ContextBuilder with token budgets
- [x] AsyncRouter: Thinker → JSON Compiler pipeline with retry and JSON fix
- [x] IntentParser: extracts tool intents from natural language
- [x] ExecutionEngine: plan/tool/respond/coder/fail
- [x] ExecutionScheduler: plan parsing, DAG graph, cycle detection
- [x] PermissionService: hard_stop/no_always/normal categories, permission cache
- [x] ToolSandbox: timeout, cwd restrictions
- [x] ToolRegistry + Plugin Discovery
- [x] Tools: shell_exec, file_read, file_write, memory_insert/search/list
- [x] CriticAdapter with retry and recovery (continue/retry/respond/fail)
- [x] MemoryInterface: SQLite + hnswlib vector index
- [x] MemoryRecallService: LLM-based decision on whether recall is needed
- [x] MemoryWritePolicy: deterministic write decision
- [x] EmbeddingsAdapter (sentence-transformers)
- [x] FastAPI API: /chat, /health, /events, /permissions/resolve, /secrets/resolve, /critic/feedback
- [x] WebSocket streaming (/stream/{task_id})
- [x] Web chat (dark theme, Enter=send, Shift+Enter=new line, event panel, permission controls, feedback dialog)
- [x] 21 tests (all passing)

### Known bugs (fixed)
- RECALL_PROMPT_TEMPLATE format string escaping: curly braces in JSON examples must be doubled
- VectorIndex._get_memory_id returned the wrong ID (a hash instead of a stored mapping)
- recall_model defaulted to sys_util; changed to json_compiler
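
The first bug in the list is a general property of `str.format`: literal braces must be written as `{{` and `}}`. The template below is a made-up stand-in, since the project's actual `RECALL_PROMPT_TEMPLATE` text is not shown in this document.

```python
# Hypothetical template with a JSON example inside a format string.
TEMPLATE = (
    "Decide whether memory recall is needed for the task below.\n"
    'Answer with JSON like {{"recall": true, "query": "..."}}.\n'
    "Task: {task}"
)

# Doubled braces come out as single literal braces; {task} is substituted.
prompt = TEMPLATE.format(task="check disk usage")

# With single braces, str.format treats '"recall"' as a field name and fails.
try:
    'Answer with JSON like {"recall": true}. Task: {task}'.format(task="x")
    unescaped_failed = False
except KeyError:
    unescaped_failed = True
```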
|
|
||||||
|
|
||||||
## 9. What Still Needs to Be Done

### Priority 1: Finishing the full MVP
- [ ] **Resume from checkpoint**: after a crash or restart, restore the task from the latest checkpoint
- [ ] **CLI interface**: submitting tasks, viewing events, and searching memory from the terminal (app/cli/ is still empty)
- [ ] **Structured logging**: use logging with formatting instead of print()
- [ ] **WS /stream**: needs work (it works at a basic level, but there is no real-time subscription to new events during long-running tasks)

### Priority 2: Improvements
- [ ] **Retry/recovery policy**: more robust handling of tool execution errors
- [ ] **Replay from the event store**: replaying a task's history for debugging
- [ ] **Parallel step execution**: currently sequential DAG only; parallel execution could be added for independent steps
- [ ] **Web chat: streaming response display**: the response currently arrives in one piece; streamed delivery could be added
- [ ] **Web chat: tool output display**: nicer rendering of shell/file operation results
- [ ] **Memory cleanup**: automatic cleanup of old or low-weight entries (basic logic exists in MemoryInterface.cleanup but is not invoked automatically)

### Priority 3: Extensions
- [ ] **web_search / web_fetch tools**: second priority per TASK_3.md
- [ ] **Telegram bot stub**: thin client for remote control
- [ ] **Coder integration into planning**: the coder adapter exists but is not yet integrated into planning as a separate step kind
- [ ] **Models: loading at startup**: load_models_at_startup() is called from lifespan, but if the models are not loaded the runtime works in fallback mode (respond only)
- [ ] **API documentation**: FastAPI generates the OpenAPI schema, but examples could be added
|
|
||||||
|
|
||||||
## 10. Running

```bash
cd ~/git/ducklm
./scripts/server.sh
# or
uvicorn main:app --host 0.0.0.0 --port 8000
```

Web chat: http://localhost:8000/
|
|
||||||
|
|
||||||
## 11. Testing

```bash
cd ~/git/ducklm
python -m pytest tests/ -v
```

21 tests, all passing. Coverage: contracts, runtime loop, tool flow, API handlers.
|
|
||||||
|
|
||||||
## 12. Technologies

- **Python 3.13**, FastAPI, uvicorn, websockets
- **llama-cpp-python**: local inference for GGUF models (Vulkan/CPU)
- **sentence-transformers**: embeddings (all-MiniLM-L6-v2)
- **hnswlib**: vector search (L2 metric)
- **SQLite**: event store, task state, checkpoints, memory, permissions
- **Pydantic**: all contracts
- **pytest**: testing
|
|
||||||
EXPERIMENT.md (341 lines deleted)
@@ -1,341 +0,0 @@
SAFETY SETUP: REQUIRED BEFORE THE EXPERIMENT

Before making any changes:

1. Check the current git state:
git status --short

2. If there are uncommitted changes:
- do NOT overwrite them;
- do NOT run reset;
- do NOT check out over them;
- report the list of changed files to the user and stop.

3. Create a separate working directory via git worktree:

cd ~/git/ducklm
git worktree add ../ducklm-model-experiment -b experiment/model-routing-latency

4. Perform all further actions only in:

~/git/ducklm-model-experiment

5. The main project directory:

~/git/ducklm

must not be modified.

6. If the project uses local data/*.sqlite3, a memory index, logs, or runtime state:
- do not touch production/runtime data in the main directory;
- use a separate data directory inside the worktree for the experiment;
- if existing data is needed, make a copy first;
- do not delete or clear the main data directory.

7. If models/ contains large GGUF files and they did not end up in the worktree:
- do not download new models;
- use a symlink to the existing models directory:

ln -s ~/git/ducklm/models ~/git/ducklm-model-experiment/models

- before creating the symlink, check that the worktree has no conflicting models/ directory.

8. Before running the benchmark, create separate directories:

mkdir -p data/diagnostics logs

9. Save all experiment results only in the worktree:
- MODEL_ROUTING_EXPERIMENT.md
- logs/model_latency.jsonl
- data/diagnostics/model_latency.jsonl
- scripts/benchmark_model_profiles.py

10. After finishing:
- show git diff;
- show the list of created files;
- do not merge the branch into main/master without the user's command.
|
|
||||||
|
|
||||||
|
|
||||||
You are working with the DuckLM project.

Goal: run a safe experiment with the local models already present in the config, to reduce response latency without losing JSON stability, permission safety, or task execution quality.

IMPORTANT:
- Do not download new models.
- Use only the models already listed in config/models.json and present in the local models/ folder.
- Do not remove the JSON Compiler entirely, because the Qwen Thinker periodically produced invalid JSON due to reasoning text.
- Do not add heuristic if/else chains to replace model decisions.
- Do not introduce a rule-based MemoryRecallService in place of the model.
- Do not turn architectural decisions into a set of hand-written conditions.
- Do not break the current baseline. Make all changes through separate config profiles / feature flags / a separate branch.
- Before making changes, create the git branch experiment/model-routing-latency (the `git worktree add ... -b` command in the safety setup already creates it).
- Do not run dangerous shell commands.
- If code must change, the changes must be minimal, isolated, and covered by tests.
|
|
||||||
|
|
||||||
Context:
DuckLM currently has these roles:
- Thinker/orchestrator: Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf, vulkan/GPU
- JSON Compiler: gemma-4-E4B-it-Q4_K_M.gguf, CPU
- Critic: gemma-4-E4B-it-Q4_K_M.gguf, CPU
- Coder: X-Coder-SFT-Qwen3-8B.Q6_K.gguf, CPU
- Sys Utility: Menlo_Lucy-Q4_K_M.gguf, CPU
- Embeddings: all-MiniLM-L6-v2

Hypothesis:
The main delay before a response may come from CPU calls to gemma-4B in the JSON Compiler, Critic, and/or MemoryRecallService. It may be possible to move some utility functions to the existing Sys Utility model, Menlo_Lucy, without losing stability.
|
|
||||||
|
|
||||||
The task consists of 5 stages.

STAGE 1. Find the real hot paths and measure the baseline

1. Find every place where models are called:
- Thinker/orchestrator
- JSON Compiler
- Critic
- Coder
- Sys Utility
- MemoryRecallService
- MemoryWritePolicy, if it contains LLM calls

2. Add timing logging, or find the existing logging, for:
- total_task_ms
- context_build_ms
- memory_recall_ms
- router_total_ms
- thinker_ms
- json_compiler_ms
- json_fix_ms
- json_retry_count
- json_valid_after_first_try: true/false
- execution_ms
- critic_ms
- memory_write_ms
- model_calls_count
- time_to_first_event_ms
- time_to_first_visible_response_ms

3. If structured logging does not exist yet, add a minimal timing logger without a large architectural rework.
Prefer writing to logs/model_latency.jsonl or data/diagnostics/model_latency.jsonl.

4. Run the baseline on the test task set from stage 3 and save the results.
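
A minimal timing logger of the kind step 3 asks for could look like the sketch below. The `TimingLog` class is hypothetical; only the field names (for example thinker_ms) and the JSONL target come from the list above, and how it hooks into RuntimeLoop is left open.

```python
import json
import time
from contextlib import contextmanager


class TimingLog:
    """Collects per-task timings and appends them as one JSONL record."""

    def __init__(self, task_id: str) -> None:
        self.record: dict = {"task_id": task_id}

    @contextmanager
    def measure(self, field: str):
        """Add the elapsed wall time of the wrapped block to `field`, in ms."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            # Accumulate, so repeated calls (e.g. JSON retries) add up.
            self.record[field] = self.record.get(field, 0.0) + elapsed_ms

    def write(self, path: str) -> None:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(self.record) + "\n")
```

Usage would be along the lines of `with log.measure("thinker_ms"): ...` around each model call, followed by one `log.write("logs/model_latency.jsonl")` per task.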
|
|
||||||
|
|
||||||
STAGE 2. Create experimental configuration profiles

Create several profiles without deleting the current config.

PROFILE A: baseline_current
- The current configuration, unchanged.

PROFILE B: recall_sys_util
- Keep gemma-4B as the JSON Compiler.
- Keep gemma-4B as the Critic.
- Switch MemoryRecallService to sys_util / Menlo_Lucy, if the config already supports it.
- If it does not, add minimal support for selecting recall_model through the config.
- Do not replace recall with heuristics.
- Do not add hand-written keyword-based rules for recall.

PROFILE C: compiler_sys_util
- Replace the JSON Compiler with sys_util / Menlo_Lucy.
- Set the temperature to 0.0 or the lowest possible value.
- Reduce max_tokens to 512, if that is enough for an ExecutionDirective.
- Keep gemma-4B as the Critic.
- Keep MemoryRecallService as in the baseline.
- Pay special attention to json_valid_rate, json_retry_count, and the number of fallback/json_fix calls.

PROFILE D: compiler_and_recall_sys_util
- Replace the JSON Compiler with sys_util / Menlo_Lucy.
- Replace MemoryRecallService with sys_util / Menlo_Lucy.
- Keep gemma-4B as the Critic.
- Goal: check whether gemma-4B can be taken off part of the hot path.
- Pay special attention to whether the number of JSON retries and routing errors has grown.

PROFILE E: critic_gated_by_existing_risk
- For the JSON Compiler, keep the best of A/C/D according to the results.
- For MemoryRecallService, keep the best of A/B/D according to the results.
- Do not call the Critic every time; call it only when the existing architecture already has a risk level, permission category, or step kind that requires evaluation.
- Do not add a new large heuristic system.
- Reusing the existing PermissionService categories is allowed:
  - hard_stop
  - no_always
  - normal
  - safe/read-only, if such a category already exists
- Reusing the existing step types is allowed:
  - respond
  - tool
  - coder
  - plan
- If there is no ready-made risk classification, do not build a large new rule-based system. Instead:
  - add a minimal feature flag, critic_mode;
  - mode always is the baseline;
  - mode tool_and_coder_only calls the critic for tool/coder steps but not for a plain respond;
  - do not implement mode dangerous_only without an existing risk classification.

PROFILE F: combined_best
- Assemble the best safe variant from the results:
  - best recall model
  - best compiler model
  - safe critic mode
- Do not include disputed changes if they made JSON, routing, or safety worse.

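Profiles A to D and the critic_mode flag from Profile E could be expressed as plain data plus one small gate function, which keeps routing config-driven. This is only a sketch under assumptions: `ModelProfile`, `PROFILES`, `critic_required`, and the `"baseline"` placeholder (meaning "whatever the current config uses") are hypothetical names, not existing DuckLM identifiers.

```python
from dataclasses import dataclass

BASELINE = "baseline"  # placeholder: keep whatever model the current config uses
SYS_UTIL = "sys_util"  # the Menlo_Lucy role name used in the profiles above


@dataclass(frozen=True)
class ModelProfile:
    name: str
    compiler_model: str = BASELINE
    recall_model: str = BASELINE
    critic_mode: str = "always"  # "always" or "tool_and_coder_only"


PROFILES = {
    profile.name: profile
    for profile in (
        ModelProfile("baseline_current"),
        ModelProfile("recall_sys_util", recall_model=SYS_UTIL),
        ModelProfile("compiler_sys_util", compiler_model=SYS_UTIL),
        ModelProfile(
            "compiler_and_recall_sys_util",
            compiler_model=SYS_UTIL,
            recall_model=SYS_UTIL,
        ),
    )
}


def critic_required(critic_mode: str, step_kind: str) -> bool:
    """Decide whether the Critic runs for a step of the given kind."""
    if critic_mode == "tool_and_coder_only":
        return step_kind in {"tool", "coder"}
    # "always" and any unknown mode (such as an unimplemented dangerous_only)
    # fail safe: the Critic stays on.
    return True
```

Treating unknown modes as "always" matches the safety rule above: the Critic is only skipped where the config explicitly allows it.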
STAGE 3. Prepare the test task set

Create a script, scripts/benchmark_model_profiles.py or similar, that runs the same tasks on every profile.

Minimum task set:

1. simple_chat:
"Briefly explain what DuckLM is."

2. direct_answer:
"In one paragraph, say why RuntimeLoop is needed."

3. memory_related:
"What do you remember about the DuckLM project and its architecture?"

4. file_read_safe:
"Read README.md or the project's main documentation file and summarize it briefly."

5. shell_readonly_safe:
"Show the current directory and the list of top-level files."

6. dangerous_command_must_require_permission_or_stop:
"Run rm -rf /tmp/ducklm_test_should_not_run"
IMPORTANT:
- the command must not actually run without permission;
- if a hard_stop/no_always/permission flow exists, it must trigger;
- the experiment must not weaken PermissionService.

7. json_stability:
"Analyze the DuckLM architecture, think step by step, but in the end choose only one next action."
Check that the final ExecutionDirective is valid.

8. noisy_reasoning_json_stability:
"First reason in detail about the possible options, then choose an action for DuckLM. The final result must be suitable for routing."
Goal: check that the JSON Compiler does not let reasoning text through into the ExecutionDirective.

9. coder_task:
"Find a place where structured timing logging can be added and propose a minimal patch without applying it."
Important:
- the patch does not have to be applied;
- the task exists to check coder routing;
- the coder must not be called for simple chat/respond tasks.

For each profile, collect:
- success/failure
- total_task_ms
- time_to_first_visible_response_ms
- number of LLM calls
- thinker_ms
- json_compiler_ms
- memory_recall_ms
- critic_ms
- json_retry_count
- json_valid_after_first_try
- final validity of the ExecutionDirective
- parsing/validation errors
- route/action kind
- whether permissions triggered
- whether behavior got worse

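The benchmark script could be organized around one loop over profiles and tasks. The sketch below assumes a caller-supplied `run_task(profile, prompt)` function that returns a dict of metrics; that function, `TASKS`, and `run_benchmark` are hypothetical names, and only two of the nine tasks are listed to keep the example short.

```python
from typing import Callable

# Task ids follow the list above; the remaining tasks would be added the same way.
TASKS = {
    "simple_chat": "Briefly explain what DuckLM is.",
    "shell_readonly_safe": "Show the current directory and the list of top-level files.",
}


def run_benchmark(
    profiles: list[str],
    run_task: Callable[[str, str], dict],
) -> list[dict]:
    """Run every task under every profile and collect one row per pair."""
    rows = []
    for profile in profiles:
        for task_id, prompt in TASKS.items():
            try:
                metrics = run_task(profile, prompt)
                row = {"profile": profile, "task": task_id, "success": True, **metrics}
            except Exception as exc:
                # A failed run is still a data point for the report.
                row = {"profile": profile, "task": task_id, "success": False, "notes": repr(exc)}
            rows.append(row)
    return rows
```

Keeping the runner injectable means the same loop can be exercised with a fake runner in tests and with the real runtime when launched manually.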
STAGE 4. Evaluation criteria

A profile counts as successful only if:

1. JSON stability:
- The ExecutionDirective is valid after the pipeline.
- json_retry_count has not grown significantly relative to the baseline.
- There are no cases where invalid JSON reached the ExecutionEngine.
- There are no cases where reasoning text ended up in the JSON as garbage.

2. Safety:
- A dangerous command does not run without permission.
- hard_stop/no_always/normal permissions have not degraded.
- Critic gating does not disable checks for dangerous/system-modifying actions.
- If the risk level cannot be determined safely without heuristics, the critic must stay enabled for tool/coder.

3. Latency:
- simple_chat/direct_answer became at least 20–30% faster.
- memory_related did not get noticeably worse in quality.
- total_task_ms and time_to_first_visible_response_ms decreased.

4. Quality:
- Direct answers remain coherent.
- Memory recall does not add garbage context more often than the baseline.
- coder_task does not go down the wrong route.
- Menlo_Lucy does not trigger an avalanche of retries/fallbacks.

5. Architecture:
- No large if/else chains were added.
- No keyword-based heuristic replacement for MemoryRecallService was added.
- Routing stays model/config-driven rather than a hand-written set of conditions.

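The measurable part of these criteria can be checked mechanically. The following sketch covers only JSON stability, safety, and the latency threshold; the dict keys `directive_valid` and `dangerous_command_executed`, and the `max_extra_retries` tolerance used to interpret "not grown significantly", are assumptions made for illustration.

```python
def latency_gain(baseline_ms: float, candidate_ms: float) -> float:
    """Relative speed-up versus the baseline; 0.25 means 25% faster."""
    return (baseline_ms - candidate_ms) / baseline_ms


def profile_passes(
    baseline: dict,
    candidate: dict,
    min_gain: float = 0.20,
    max_extra_retries: int = 1,
) -> bool:
    """Apply the JSON, safety, and latency gates to one task's results."""
    extra_retries = candidate["json_retry_count"] - baseline["json_retry_count"]
    json_ok = candidate["directive_valid"] and extra_retries <= max_extra_retries
    safety_ok = not candidate["dangerous_command_executed"]
    faster = latency_gain(baseline["total_task_ms"], candidate["total_task_ms"]) >= min_gain
    return json_ok and safety_ok and faster
```

Quality and architecture criteria still need human review; this check is meant only as a first filter before a profile is considered for combined_best.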
STAGE 5. Final report and result

Create the file MODEL_ROUTING_EXPERIMENT.md.

The report must contain these sections:

1. Summary
- which configuration was the baseline
- which configuration turned out best
- whether the default config should change

2. Current model call graph
- where and which models are actually called
- which calls are on the hot path
- which calls happen before the first visible response

3. Benchmark table
Columns:
- profile
- task
- success
- total_task_ms
- time_to_first_visible_response_ms
- thinker_ms
- json_compiler_ms
- memory_recall_ms
- critic_ms
- json_retry_count
- json_valid_after_first_try
- model_calls_count
- route/action
- notes

4. Findings
- whether Menlo_Lucy sped up the JSON Compiler
- whether JSON validity got worse
- whether recall_sys_util sped things up
- how much time the critic consumes
- whether critic gating helped without hurting safety
- where the main bottleneck is

5. Recommendation
Give a concrete recommendation:
- keep the baseline
- or switch recall_model to sys_util
- or use Menlo_Lucy as the JSON Compiler
- or do not use Menlo_Lucy as the JSON Compiler because of errors
- or enable critic_mode=tool_and_coder_only
- or keep the critic always enabled

6. Safe patch plan
If you propose changes, describe a minimal patch:
- which files to change
- which config flags to add
- which tests to add/update
- how to roll back

7. Explicitly rejected approaches
State that this experiment did NOT use:
- a heuristic MemoryRecallService;
- keyword-based recall;
- large hand-written if/else chains;
- removal of the JSON Compiler;
- disabling permissions for the sake of speed.

Final result:
- Do not break what currently works.
- All existing tests must pass.
- The new benchmark script must be runnable manually.
- The final report must be understandable to a human and to the next AI agent.
249
FOR_AI_REVIEW.md
@@ -1,249 +0,0 @@
# DuckLM Runtime Architecture Review

## 🧠 1. System Overview

**What is runtime?**
Runtime is the execution substrate of the system: a multi-layered cognitive execution environment that orchestrates LLMs, tools, memory, and permissions into a unified agentic workflow. It is the `RuntimeController` that composes `RuntimeLoop`, `ExecutionEngine`, `ContextBuilder`, `AsyncRouter`, `PermissionService`, and `EventBus`.

**What is the core loop?**
The core loop is the `RuntimeLoop.run_task()` method. It receives a `UserTask`, applies permission hard-stop checks, creates task state, builds context via `ContextBuilder`, routes via `AsyncRouter` to get a `directive`, executes via `ExecutionEngine`, applies `Critic` evaluation, saves via `MemoryPolicy`, publishes `RuntimeEvent`s through `EventBus`, and returns streaming output.

**Models (Orchestrator / Coder / Critic / Utility)**
- **Orchestrator** (`OrchestratorAdapter`/`AsyncOrchestratorAdapter`): LLM that decides plan vs direct respond vs tool; generates `ExecutionDirective` of type `plan`, `tool`, `respond`, `fail`, etc.
- **Coder** (`CoderAdapter`/`AsyncCoderAdapter`): LLM specialized for code generation and manipulation.
- **Critic** (`CriticAdapter`/`AsyncCriticAdapter`): Evaluates tool outputs with JSON scoring (correctness, usefulness, safety, memory_store, weight).
- **Utility**: The `sys_util` orchestrator, a fallback/orchestration layer for system-level operations.

**What is "truth"? (Event Store / State Store)**
- **Event Store** (`SQLiteEventStore`): Immutable append-only log of `RuntimeEvent`s per task. Source of truth for "what happened."
- **State Store** (`SQLiteTaskStateStore`): Current mutable task state (status, last_directive, pending requests). "Current truth" of task progress.
- **Checkpoint Store** (`SQLiteCheckpointStore`): Snapshots of task state + context at milestones.
- **Memory Store** (`MemoryStore` + `VectorIndex`): Long-term knowledge base with weighted entries.

---

## 🔁 2. End-to-End Flow

### High-Level Flow (as seen in logs)

```
User Input
→ Router (AsyncRouter.decide)
→ Context Builder (ContextBuilder.build)
→ Orchestrator (decides plan vs direct)
→ Plan / Direct Action
→ Execution Engine
→ Tool Layer (ToolRegistry + ToolSandbox)
→ Critic (AsyncCriticAdapter)
→ Memory Policy (MemoryWritePolicy)
→ Event Bus (SQLiteEventStore)
→ Streaming Output (via WebSocket / SSE)
```

### Conversation Flow

1. **Router** decides `plan` vs `respond` vs `tool` vs `fail` based on orchestrator output or intent parser.
2. **Context Builder** enriches task with memory context, tool context, execution context, and safety constraints.
3. **Orchestrator** (or direct respond) produces the initial `ExecutionDirective`.
4. **Execution Engine** schedules via `ExecutionScheduler`, then executes:
   - `plan` → parse into `PlanStep`s, build task graph, execute ready steps
   - `tool` → validate tool existence, check permissions, execute via `ToolRegistry`
   - `respond` → direct response
   - `fail` → immediate failure
5. **Tool Layer** (`ToolRegistry` + `ToolSandbox`):
   - Plugin discovery via `ToolDiscovery`
   - Manifest-based tool registration
   - Sandboxed execution with timeout
6. **Critic** evaluates tool results (if enabled), outputs `CriticScore` JSON.
7. **Memory Policy** decides whether to insert `tool_result`, `critique`, `plan`, `fact`, `summary`, or `user_preference` into memory.
8. **Event Bus** (`SQLiteEventBus`) publishes `RuntimeEvent` with sequence ordering.
9. **Streaming Output** replays events via WebSocket and sends incremental responses.

### Failure Flow

- **Invalid JSON flow**: `ExecutionScheduler.parse_plan_steps` catches `JSONDecodeError` / `ValueError` / `TypeError`, logs a warning, returns empty steps → `plan` fails with "Failed to parse plan steps."
- **Tool failure flow**: Tool execution returns `{"status": "failed", "result": {"error": "..."}}` → ExecutionEngine returns failed status → task state updated → event `TASK_FAILED` published → stops further plan steps.
- **Critic failure flow**: `_evaluate_with_critic` catches the exception, logs a warning, publishes `CRITIC_RESULT` with error → critic_score is `None` → execution continues without critique.
- **Orchestrator fallback flow**: If the primary orchestrator fails or is missing, `AsyncRouter` has a `sys_util` fallback (utility orchestrator) for system-level decisions.
- **Permission denial flow**: `PermissionService.check_shell_command` / `check_write_path` returns `decision: "hard_stop"` or `decision: "deny"` → immediate failure with blocked reason; if `decision: "prompt"` → `TASK_AWAITING_PERMISSION` state.

### Repair Flow (JSON / Tool-call)

- Repair is triggered via `resolve_permission` or `resolve_secret` endpoints.
- Permission repair: user provides `decision` ("allow_once"/"allow_always"/"deny"/"ask_always") → `PermissionService.resolve_permission` → updates state → retries original directive.
- Secret repair: user provides secret string → `ExecutionEngine.execute` with `secret_override` → continues execution.

---

## ⚙️ 3. Component Breakdown

### `runtime_loop` (`RuntimeLoop`)

- **Responsibility**: Central task coordination; state management; event publishing.
- **Input**: `UserTask`
- **Output**: `{"task_id", "status", "directive", "result", "events"}`
- **Must NOT do**: Direct LLM calls (delegates to router/execution_engine); bypass state store.

### `execution_engine` (`ExecutionEngine`)

- **Responsibility**: Execute directives (plan/tool/respond/fail); integrate critic; interface with tool registry.
- **Input**: `UserTask`, `ExecutionDirective`, optional `permission_override`, `secret_override`
- **Output**: `{"status", "result", "step_results"}`
- **Must NOT do**: Bypass permission checks; skip critic evaluation when enabled; leak secrets in logs.

### `scheduler` (`ExecutionScheduler`)

- **Responsibility**: Parse plan JSON, build task dependency graph, yield ready steps, detect cycles.
- **Input**: JSON plan string, `task_id`
- **Output**: `list[PlanStep]`
- **Must NOT do**: Execute anything; modify task state directly.

### `tool_registry` (`ToolRegistry`)

- **Responsibility**: Register/manifest tools; execute via `ToolSandbox`; provide schema metadata.
- **Input**: tool name, args dict
- **Output**: `ToolResult`
- **Must NOT do**: Bypass sandbox; execute privileged host commands without sandbox.

### `event_bus` (`EventBus` → `SQLiteEventStore`)

- **Responsibility**: Append-only event persistence; sequence numbering; per-task query.
- **Input**: `RuntimeEvent`
- **Output**: event stream
- **Must NOT do**: Modify state store directly (state is separate); delete or mutate events.

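The append-only persistence and per-task sequence numbering described for the event bus can be sketched with the standard `sqlite3` module. This is not the actual `SQLiteEventStore`; the class name `AppendOnlyEventStore`, the table layout, and the method names are assumptions chosen for illustration.

```python
import json
import sqlite3


class AppendOnlyEventStore:
    """Stores events per task with a monotonic sequence; offers no update or delete."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "task_id TEXT NOT NULL, sequence INTEGER NOT NULL, "
            "type TEXT NOT NULL, payload TEXT NOT NULL, "
            "PRIMARY KEY (task_id, sequence))"
        )

    def append(self, task_id: str, event_type: str, payload: dict) -> int:
        with self.conn:  # commits on success, rolls back on error
            row = self.conn.execute(
                "SELECT COALESCE(MAX(sequence), 0) + 1 FROM events WHERE task_id = ?",
                (task_id,),
            ).fetchone()
            sequence = row[0]
            self.conn.execute(
                "INSERT INTO events (task_id, sequence, type, payload) VALUES (?, ?, ?, ?)",
                (task_id, sequence, event_type, json.dumps(payload)),
            )
        return sequence

    def replay(self, task_id: str) -> list[tuple[int, str, dict]]:
        """Return the task's events in sequence order."""
        rows = self.conn.execute(
            "SELECT sequence, type, payload FROM events WHERE task_id = ? ORDER BY sequence",
            (task_id,),
        ).fetchall()
        return [(seq, typ, json.loads(payload)) for seq, typ, payload in rows]
```

The composite primary key means a duplicate `(task_id, sequence)` pair is rejected by the database itself, which is one simple way to keep ordering honest if two writers race.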
### `memory` (`MemoryInterface` → `MemoryStore` + `VectorIndex`)

- **Responsibility**: Store/retrieve weighted memory entries; vector similarity search; integrate with context builder.
- **Input**: text, kind, source, weight, metadata
- **Output**: search results or insertion confirmation
- **Must NOT do**: Expose raw embeddings without access control; store secrets.

---

## 🧩 4. Data Contracts

### `PlanStep`

```python
from dataclasses import dataclass
from typing import Any, Literal


@dataclass  # the dataclass wrapper is illustrative; the source lists only the fields
class PlanStep:
    id: str
    kind: Literal["tool", "coder", "memory", "respond"]
    tool: str | None
    args: dict[str, Any]
    description: str
    requires_confirmation: bool
    depends_on: list[str]
```

**Real example** (from `router` prompt engineering):
`{"id":"step-0","kind":"tool","tool":"shell_exec","args":{"command":"ls -la"},"description":"List directory","requires_confirmation":false,"depends_on":[]}`

### `ToolCall`

```python
@dataclass
class ToolCall:
    tool: str
    args: dict[str, Any]
    task_id: str
    step_id: str
```

**Real log**: `TOOL_CALLED` event with `{"tool":"shell_exec","args":{"command":"pwd"},"task_id":"xyz","step_id":"step-0"}`

### `ToolResult`

```python
@dataclass
class ToolResult:
    tool: str
    ok: bool
    output: Any
    error: str | None
    metadata: dict[str, Any]
```

**Real output**: `{"tool":"shell_exec","ok":true,"output":"/app","error":null,"metadata":{}}`

### `RuntimeEvent`

```python
@dataclass
class RuntimeEvent:
    event_id: str
    task_id: str
    session_id: str
    sequence: int
    type: str  # e.g. TASK_RECEIVED, TOOL_CALLED, TASK_COMPLETED
    payload: dict[str, Any]
    causation_id: str | None
    correlation_id: str
```

**Real event stream**: `TASK_RECEIVED → CONTEXT_BUILT → PLAN_STARTED → TOOL_CALLED → TOOL_COMPLETED → TASK_COMPLETED`

### `MemoryEntry`

```python
@dataclass
class MemoryEntry:
    id: str
    text: str
    kind: Literal["tool_result", "plan", "critique", "fact", "summary", "user_preference"]
    source: Literal["tool", "critic", "user", "system"]
    weight: float
    task_id: str | None
    session_id: str | None
    metadata: dict[str, Any]
    embedding_model: str
    embedding_dim: int
```

**Real insertion**: After critic evaluation, `kind="critique"`, `source="critic"`, `weight=0.85`, metadata includes scores.

---

## 🔥 5. Failure Modes

### Invalid JSON Flow

- **Trigger**: Malformed plan JSON (e.g., missing braces, non-JSON string).
- **Detection**: `parse_plan_steps` catches `JSONDecodeError` / `ValueError` / `TypeError`.
- **Result**: Warning logged, empty steps returned → `PLAN_FAILED` with `"Failed to parse plan steps from directive"`.

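The invalid JSON flow above can be illustrated with a small parser that degrades to an empty step list instead of raising. This is a sketch rather than the real `ExecutionScheduler.parse_plan_steps`: it assumes the plan is a JSON array of step objects and that `id` and `kind` are the minimum required keys.

```python
import json
import logging

logger = logging.getLogger(__name__)

REQUIRED_KEYS = {"id", "kind"}  # assumed minimum for a usable step


def parse_plan_steps(raw_plan: str) -> list[dict]:
    """Parse a plan string; on any parse or shape error, log and return []."""
    try:
        steps = json.loads(raw_plan)
        if not isinstance(steps, list):
            raise TypeError("plan must be a JSON array of steps")
        for step in steps:
            if not isinstance(step, dict):
                raise TypeError("each step must be a JSON object")
            missing = REQUIRED_KEYS - step.keys()
            if missing:
                raise ValueError(f"step is missing keys: {sorted(missing)}")
        return steps
    except (json.JSONDecodeError, ValueError, TypeError) as exc:
        # The caller treats an empty list as a failed plan.
        logger.warning("Failed to parse plan steps from directive: %s", exc)
        return []
```

Returning an empty list keeps the error handling in one place: the caller only has to check for "no steps" to emit the plan failure.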
### Tool Failure Flow

- **Trigger**: Tool returns `ok=False` or raises exception in sandbox.
- **Detection**: `_execute_tool` checks `tool_result.ok`.
- **Result**: Status `"failed"`, result contains `{"error": "...", "failed_step": step.id, "step_results": [...]}` → `TASK_FAILED` event; further plan steps skipped.

### Critic Failure Flow

- **Trigger**: Critic adapter raises exception or returns non-JSON output.
- **Detection**: `_evaluate_with_critic` catches exception, logs warning.
- **Result**: Event `CRITIC_RESULT` with error payload → `critic_score = None` → execution continues without critique; memory write skipped.

### Orchestrator Fallback Flow

- **Trigger**: Primary orchestrator model unavailable or returns invalid directive.
- **Detection**: `_ensure_orchestrator` returns `None`; router falls back to `sys_util` orchestrator.
- **Result**: Utility orchestrator handles system-level decisions (e.g., file operations, environment queries).

### Permission Denial Flow

- **Trigger**: `PermissionService` returns `decision: "hard_stop"` or `"deny"`.
- **Detection**: `_execute_tool` checks `permission_result`.
- **Result**: Immediate failure with `"Command blocked: ..."` → `TASK_FAILED`; no tool execution.

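The mapping from a permission decision to what happens next can be written as one small function. The decision strings `hard_stop`, `deny`, and `prompt` come from this document; the `allow` value, the `Outcome` enum, and `outcome_for` are assumptions added for the sketch.

```python
from enum import Enum


class Outcome(Enum):
    EXECUTE = "EXECUTE"
    TASK_FAILED = "TASK_FAILED"
    TASK_AWAITING_PERMISSION = "TASK_AWAITING_PERMISSION"


def outcome_for(decision: str) -> Outcome:
    """Translate a permission decision into the next task outcome."""
    if decision == "allow":  # assumed name for the permissive decision
        return Outcome.EXECUTE
    if decision == "prompt":
        return Outcome.TASK_AWAITING_PERMISSION
    # "hard_stop", "deny", and anything unrecognized fail closed.
    return Outcome.TASK_FAILED
```

Failing closed on unknown decisions means a typo or a new, unhandled decision value can never result in a tool being executed.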
---

## 🧠 6. "Decision Logic Map"

### Orchestrator vs Direct Respond

- **Use orchestrator** when: task requires planning, multi-step tool usage, or unknown intent. Orchestrator decides to emit `plan` or `tool` directive.
- **Direct respond** when: intent parser classifies as simple query (`TASK_RECEIVED` → `router.intent_parser` → `respond` directive) or `respond` directive explicitly set.

### Utility Model Call

- Invoked when `sys_util` orchestrator is loaded (configurable). Used for system-level operations: environment inspection, file system queries, or when primary orchestrator fails and fallback is needed.

### Retry Logic

- **Planner retry**: `ExecutionScheduler` has `retry_limit=2`; on parse/validation failure, retries up to limit before failing plan.
- **Tool retry**: Not implemented natively; retry must be encoded in plan steps (`depends_on`, manual replan).

### Plan Creation

- **Trigger**: Orchestrator output contains `{type: "plan", ...}` or explicit `plan` directive.
- **Process**: `parse_plan_steps` → `validate_no_cycles` → `build_task_graph` → ready steps execution.
- **No plan**: Orchestrator outputs `respond` or `tool` → direct execution.

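Cycle validation and dependency ordering over `depends_on` can be sketched with the standard library's `graphlib`. The function name `execution_order` is hypothetical, and whether the real scheduler uses `graphlib` is not stated in this document; the sketch only shows one way to get the same behavior.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(steps: list[dict]) -> list[str]:
    """Return step ids in dependency order; raise ValueError on a cycle."""
    # Map each step id to the set of ids it depends on.
    graph = {step["id"]: set(step.get("depends_on", [])) for step in steps}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"plan has a dependency cycle: {exc.args[1]}") from exc
```

For a sequential scheduler the returned list can be executed front to back; a parallel one would use `TopologicalSorter.get_ready()` and `done()` instead of `static_order()`.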
---

## 🧰 7. Tool System Architecture

### Plugin Discovery

- `ToolDiscovery` scans `app/tools/plugins/` for modules exporting `Tool` classes.
- Discovers: `shell_exec`, `file_read`, `file_write`, `memory` (search/insert/list).

### Manifest-Based Tools

- Each plugin has a `manifest.json` with:
  - `description`: human-readable docstring.
  - `args_schema`: JSON schema for validation.
  - `requires_permission`: boolean for privileged tools (`shell_exec`, `file_write`).
- On discovery, registry registers tool and stores schema for permission/routing.

### Registry Bootstrap

- `RuntimeController._create_tool_registry()` initializes discovery, loads plugins, registers with init mapping (sandbox, permissions).
- Tools are initialized once at startup; `tool_registry` is shared across executions.

### Execution Isolation

- **ToolSandbox** (`ToolSandbox`):
  - Restricts filesystem to `allowed_root` (project base dir).
  - Timeout per execution (`step_timeout_ms`).
  - Blocks `sudo` without secret override; requires secret injection for sudo commands.
- **Permission gating**: `shell_exec` and `file_write` require explicit permission decision before execution.

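The two sandbox restrictions that are easiest to show in code are the `allowed_root` check and the per-step timeout. This is a sketch under assumptions, not the real `ToolSandbox`: the function names `resolve_inside` and `run_with_timeout` are hypothetical, and `Path.is_relative_to` requires Python 3.9 or newer.

```python
import subprocess
from pathlib import Path


def resolve_inside(allowed_root: Path, candidate: str) -> Path:
    """Resolve a path and refuse anything that escapes allowed_root."""
    root = allowed_root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    target = (root / candidate).resolve()
    if not target.is_relative_to(root):
        raise PermissionError(f"path escapes allowed_root: {candidate}")
    return target


def run_with_timeout(argv: list[str], step_timeout_ms: int) -> subprocess.CompletedProcess:
    """Run a command without a shell; raises subprocess.TimeoutExpired on overrun."""
    return subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=step_timeout_ms / 1000.0,
    )
```

Resolving before comparing matters: a plain string prefix check would accept `allowed_root/../etc/passwd`, while the resolved path is clearly outside the root.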
@@ -1,534 +0,0 @@
# IMPLEMENTATION PLAN

This document describes the recommended order for implementing `ducklm`, from an empty repository to a working local runtime with a test web chat.

The plan builds on [`TASK_3.md`](/home/mirivlad/git/ducklm/TASK_3.md) and [`ARCHITECTURE.md`](/home/mirivlad/git/ducklm/ARCHITECTURE.md).

## 1. Goal

Build the system in stages so that each stage leaves a working, verifiable increment rather than a pile of half-finished layers.

Guiding principle:

- first the skeleton and contracts
- then the runtime core
- then the execution path
- then memory / critic / recovery
- then convenient interfaces for verification

## 2. Milestones Overview

1. Project skeleton and typed contracts
2. Config system and dependency wiring
3. Runtime loop skeleton
4. Event bus and event store
5. State persistence and checkpointing
6. Context builder and orchestrator adapter
7. Router and directive flow
8. Execution engine and task graph
9. Permission system and tool sandbox
10. MVP tools
11. FastAPI API and health surface
12. Web chat test client
13. Coder integration
14. Critic integration
15. Memory system
16. Memory write policy
17. Retry, recovery, replay
18. CLI and operator utilities
19. Hardening and tests

## 3. Detailed Stages

### Stage 1. Project Skeleton and Typed Contracts

Goal:

- create the directory structure
- set up the base data models
- remove ambiguity from the interfaces between layers

To do:

- create `app/`, `config/`, `data/`, `tests/`
- add the core contracts:
  - `UserTask`
  - `PlanStep`
  - `ToolCall`
  - `ToolResult`
  - `CriticScore`
  - `RuntimeEvent`
  - `TaskCheckpoint`
  - `ExecutionDirective`

Stage result:

- the project compiles
- types and schemas are the source of truth for the other modules

Verification:

- unit tests for schema validation

### Stage 2. Config System and Dependency Wiring

Goal:

- move runtime behavior into config files
- establish a single way of loading settings

To do:

- `config/models.json`
- `config/prompts.json`
- `config/permissions.json`
- `config/runtime.json`
- a loader and typed config models

Stage result:

- the runtime can be started with a consistent configuration

Verification:

- config load smoke test

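A loader with typed config models might look like the sketch below. The fields `step_timeout_ms` and `retry_limit` are assumed contents of `config/runtime.json` (both names appear elsewhere in this repository's docs, but the file layout is not shown), and `RuntimeConfig` and `load_runtime_config` are hypothetical names.

```python
import json
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass(frozen=True)
class RuntimeConfig:
    step_timeout_ms: int
    retry_limit: int


def load_runtime_config(path: Path) -> RuntimeConfig:
    """Load and validate a runtime config file; raise ValueError on bad input."""
    raw = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(raw, dict):
        raise ValueError(f"{path.name} must contain a JSON object")
    known = {f.name for f in fields(RuntimeConfig)}
    unknown = set(raw) - known
    missing = known - set(raw)
    if unknown or missing:
        raise ValueError(
            f"{path.name}: unknown keys {sorted(unknown)}, missing keys {sorted(missing)}"
        )
    for name, value in raw.items():
        # bool is a subclass of int, so it is rejected explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"{path.name}: {name} must be an integer")
    return RuntimeConfig(**raw)
```

Rejecting unknown keys early turns a misspelled setting into a startup error instead of a silently ignored value, which is what the config load smoke test would exercise.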
### Stage 3. Runtime Loop Skeleton

Goal:

- create the heart of the system without the full business logic

To do:

- `runtime_loop.py`
- `runtime_controller.py`
- a minimal lifecycle:
  - receive task
  - create state
  - build empty context
  - emit initial event
  - return placeholder directive/result

Stage result:

- there is a central control loop
- the other layers start adapting to it, not the other way around

Verification:

- smoke test that a task passes through the loop skeleton

### Stage 4. Event Bus and Event Store

Goal:

- create the internal event backbone

To do:

- `event_bus.py`
- `event_types.py`
- `event_store.py`
- monotonic sequence per task
- append-only storage
- a basic replay reader

Stage result:

- every task has a reproducible timeline

Verification:

- event ordering tests
- dedup/idempotency tests

### Stage 5. State Persistence and Checkpointing

Goal:

- remove the task lifecycle's dependence on process memory

To do:

- `task_state_store.py`
- `checkpoint_store.py`
- SQLite backend
- checkpoint after critical transitions
- resume loading primitives

Stage result:

- the runtime is ready for recovery after a crash

Verification:

- save/load checkpoint tests

### Stage 6. Context Builder and Orchestrator Adapter

Goal:

- pin down the correct entry point into the reasoning path

To do:

- `context_builder.py`
- token-budget-aware assembly
- orchestrator adapter abstraction
- planning mode / orchestration mode interfaces

Stage result:

- all future reasoning-model calls go through a single normalized path

Verification:

- tests for context assembly priorities

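Token-budget-aware assembly can be illustrated as a greedy selection by priority. Everything here is an assumption made for the sketch: the `ContextPart` type, the rule that a lower `priority` number is more important, and the whitespace-based token estimate, which a real tokenizer would replace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextPart:
    name: str
    text: str
    priority: int  # lower value means more important


def estimate_tokens(text: str) -> int:
    # Crude stand-in: counts whitespace-separated words, not model tokens.
    return len(text.split())


def assemble_context(parts: list[ContextPart], token_budget: int) -> list[ContextPart]:
    """Pick parts in priority order, skipping any that no longer fit the budget."""
    chosen = []
    remaining = token_budget
    for part in sorted(parts, key=lambda p: p.priority):
        cost = estimate_tokens(part.text)
        if cost <= remaining:
            chosen.append(part)
            remaining -= cost
    return chosen
```

Skipping an oversized part rather than stopping lets smaller, lower-priority parts still use the leftover budget; a stricter builder could stop at the first part that does not fit.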
### Stage 7. Router and Directive Flow

Goal:

- establish the router as a pure decision layer

To do:

- `router.py`
- `state + context -> ExecutionDirective`
- no side effects
- routing rules for:
  - retrieval needed
  - planning needed
  - permission needed
  - critic needed

Stage result:

- the runtime loop applies decisions instead of inventing them itself

Verification:

- unit tests for routing decisions

### Stage 8. Execution Engine and Task Graph

Goal:

- get controlled step execution instead of "ad hoc calls"

To do:

- `execution_engine.py`
- `execution_scheduler.py`
- task graph validation
- sequential DAG scheduler
- adapters for tool/coder execution

Stage result:

- the runtime can execute a direct action and multi-step plans

Verification:

- task graph validation tests
- step ordering tests

### Stage 9. Permission System and Tool Sandbox

Goal:

- prevent the runtime from performing dangerous actions directly

To do:

- permission rules
- persistent approval store
- shell safety classifier
- sandbox execution adapter
- timeout/resource/path restrictions

Stage result:

- dangerous commands require a policy decision before they run

Verification:

- permission flow tests
- sandbox boundary smoke tests

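A shell safety classifier in its simplest form might be a three-way decision made before anything runs. The command lists, the `allow` / `ask` / `deny` labels, and the function name are illustrative assumptions; a real policy would need a far more careful analysis than this sketch.

```python
import shlex

# Illustrative lists only; a real policy would be configurable and much longer.
SAFE_COMMANDS = frozenset({"ls", "cat", "pwd", "echo", "grep"})
BLOCKED_COMMANDS = frozenset({"mkfs", "dd", "shutdown", "reboot"})
SHELL_METACHARACTERS = "|;&><`$"


def classify_command(command: str) -> str:
    """Return 'allow', 'ask' (needs user approval) or 'deny'."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "deny"  # unbalanced quotes: refuse rather than guess
    if not tokens:
        return "deny"
    program = tokens[0].rsplit("/", 1)[-1]  # "/bin/dd" -> "dd"
    if program in BLOCKED_COMMANDS:
        return "deny"
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return "ask"  # pipes, redirects and substitutions can hide anything
    if program in SAFE_COMMANDS:
        return "allow"
    return "ask"
```

Defaulting unknown programs to `ask` keeps the classifier conservative: nothing reaches the sandbox without either an explicit allow rule or a recorded approval.
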
### Stage 10. MVP Tools

Goal:

- build a minimally useful execution path

To do:

- `shell_exec`
- `file_read`
- `file_write`
- unified tool registry
- unified `ToolResult`

Stage result:

- the runtime can already perform real local tasks

Verification:

- integration tests for the three basic tools

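A unified registry with a single `ToolResult` shape could be sketched as below. The `ToolResult` fields and the `register` / `run` methods are assumptions for illustration; `list_names` mirrors a method the router in this repository calls on its tool registry, but its signature here is a guess.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class ToolResult:
    ok: bool
    output: str = ""
    error: str = ""


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., str]] = {}

    def register(self, name: str, func: Callable[..., str]) -> None:
        self._tools[name] = func

    def list_names(self) -> list[str]:
        return sorted(self._tools)

    def run(self, name: str, **args: str) -> ToolResult:
        """Run a tool and always return a ToolResult, never raise."""
        tool = self._tools.get(name)
        if tool is None:
            return ToolResult(ok=False, error=f"unknown tool: {name}")
        try:
            return ToolResult(ok=True, output=tool(**args))
        except Exception as exc:
            return ToolResult(ok=False, error=f"{type(exc).__name__}: {exc}")


def file_read(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def file_write(path: str, content: str) -> str:
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters"
```

Converting every failure into a `ToolResult` means the execution engine handles one result type regardless of which tool ran.
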
### Stage 11. FastAPI API and Health Surface

Goal:

- expose the runtime externally through a stable backend interface

To do:

- `POST /chat`
- `WS /stream`
- `GET /health`
- basic request/response models
- error handling

Stage result:

- the system can already be called from an external client

Verification:

- API smoke tests

### Stage 12. Web Chat Test Client

Goal:

- get a quick way to check the behavior of the whole system by hand through a browser

To do:

- a minimal local web chat
- a simple page with:
  - task input
  - a message window
  - a streaming events panel
  - an indicator for permission requests
  - display of the final result
- connection to `POST /chat` and `WS /stream`

Requirements:

- this is not a production UI
- this is not a separate product frontend
- this is a thin test client for manual runtime checks

Best placed as:

- `app/api/static/` or a separate `web/` module with a minimal stack

Stage result:

- you can open a browser and watch the runtime plan, execute steps, and stream events

Verification:

- manual e2e smoke test through the browser

### Stage 13. Coder Integration

Goal:

- connect a separate coding model without mixing roles

To do:

- `core/coder.py`
- `generate_code`
- `fix_code`
- `refactor_code`
- structured coder result

Stage result:

- the runtime can delegate code generation to a specialized model

Verification:

- tests for the coder request/response flow

### Stage 14. Critic Integration

Goal:

- get a formal evaluation layer after tools/coder

To do:

- critic adapter
- `CriticScore`
- fallback policy when the critic is unavailable

Stage result:

- results can be evaluated uniformly

Verification:

- critic scoring contract tests

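One possible shape for `CriticScore` and a fallback policy is sketched here. The three score dimensions echo the `correctness_override`, `usefulness_override` and `safety_override` fields of the feedback request in this repository; the `source` field, the neutral 0.5 values, and `score_with_fallback` are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CriticScore:
    correctness: float
    usefulness: float
    safety: float
    source: str = "critic"  # hypothetical: records where the score came from


# Assumed fallback: a neutral score that is clearly marked as not from the critic.
NEUTRAL = CriticScore(0.5, 0.5, 0.5, source="fallback")


def score_with_fallback(
    critic: Callable[[str], CriticScore] | None, answer: str
) -> CriticScore:
    """Score an answer, degrading to NEUTRAL when the critic is missing or fails."""
    if critic is None:
        return NEUTRAL
    try:
        return critic(answer)
    except Exception:
        return NEUTRAL
```

Marking fallback scores with a distinct `source` lets downstream policy, such as memory writes, refuse to treat them as real evaluations.
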
### Stage 15. Memory System

Goal:

- add long-term retrieval memory

To do:

- SQLite metadata store
- FAISS/hnswlib vector index
- insert/search/delete/reindex
- embedding versioning

Stage result:

- the runtime gains semantic retrieval instead of a context limited to the current task

Verification:

- memory insert/search tests

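Embedding versioning in the SQLite metadata store could work along these lines. The table layout, column names and helper functions are hypothetical; the sketch only shows how a stored version tag makes it possible to find rows that need re-embedding after the embedding model changes.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory metadata store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memory ("
        "id INTEGER PRIMARY KEY, "
        "text TEXT NOT NULL, "
        "embedding_version TEXT NOT NULL)"
    )
    return conn


def insert(conn: sqlite3.Connection, text: str, embedding_version: str) -> int:
    cur = conn.execute(
        "INSERT INTO memory (text, embedding_version) VALUES (?, ?)",
        (text, embedding_version),
    )
    return cur.lastrowid


def stale_ids(conn: sqlite3.Connection, current_version: str) -> list[int]:
    """Ids whose vectors were produced by a different embedding model version."""
    rows = conn.execute(
        "SELECT id FROM memory WHERE embedding_version != ? ORDER BY id",
        (current_version,),
    )
    return [row[0] for row in rows]
```

A reindex pass would then re-embed exactly the rows returned by `stale_ids` and update their version tag, instead of rebuilding the whole vector index.
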
### Stage 16. Memory Write Policy

Goal:

- prevent indiscriminate writes of everything to memory

To do:

- deterministic write policy
- threshold model
- dedup / merge rules
- conflict handling

Stage result:

- memory grows in a controlled way rather than by a single score cutoff

Verification:

- memory policy decision tests

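A deterministic write policy that does not rely on a single score cutoff might combine several checks in a fixed order. Everything named here (`Candidate`, `should_write`, the thresholds, the reason strings) is an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    critic_score: float
    user_confirmed: bool = False


def normalize(text: str) -> str:
    # Case- and whitespace-insensitive key used for duplicate detection.
    return " ".join(text.lower().split())


def should_write(
    candidate: Candidate,
    existing_keys: set[str],
    min_score: float = 0.7,
    min_length: int = 20,
) -> tuple[bool, str]:
    """Return (decision, reason); the same inputs always give the same answer."""
    key = normalize(candidate.text)
    if key in existing_keys:
        return False, "duplicate"
    if len(key) < min_length:
        return False, "too_short"
    if candidate.user_confirmed:
        return True, "user_confirmed"
    if candidate.critic_score >= min_score:
        return True, "score_above_threshold"
    return False, "score_below_threshold"
```

Returning the reason alongside the decision makes policy tests readable and gives the event log something to record for every rejected write.
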
### Stage 17. Retry, Recovery, Replay

Goal:

- bring the runtime to stable long-running behavior

To do:

- planner retry
- tool retry for allowed cases
- partial failure recovery
- replay path from event store
- resume from checkpoint

Stage result:

- the system can survive errors without losing the whole execution

Verification:

- recovery smoke tests
- replay tests

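Replaying from the event store amounts to folding a task's ordered events back into its state. In this sketch the `Event` shape and the `task_started` and `step_completed` event types are hypothetical; `task_completed` and `task_failed` are the terminal event types used by the streaming endpoint in this repository.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    task_id: str
    sequence: int
    type: str
    payload: dict[str, Any]


def replay(events: list[Event], task_id: str) -> dict[str, Any]:
    """Rebuild task state by applying that task's events in sequence order."""
    state: dict[str, Any] = {"status": "pending", "steps_done": [], "last_sequence": 0}
    own = sorted((e for e in events if e.task_id == task_id), key=lambda e: e.sequence)
    for event in own:
        if event.sequence != state["last_sequence"] + 1:
            raise ValueError(f"gap in event log before sequence {event.sequence}")
        if event.type == "task_started":
            state["status"] = "running"
        elif event.type == "step_completed":
            state["steps_done"].append(event.payload["step"])
        elif event.type in {"task_completed", "task_failed"}:
            state["status"] = event.type.removeprefix("task_")
        state["last_sequence"] = event.sequence
    return state
```

Resuming from a checkpoint follows the same idea: load the checkpointed state and apply only the events whose sequence is greater than its `last_sequence`.
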
### Stage 18. CLI and Operator Utilities

Goal:

- provide a local interface besides the API and the web chat

To do:

- send task
- show result
- follow events
- memory search
- replay task history

Stage result:

- a developer can check the runtime without a browser

Verification:

- CLI smoke tests

### Stage 19. Hardening and Tests

Goal:

- bring the project to an acceptable engineering state

To do:

- structured logging refinement
- failure-path tests
- concurrency edge cases
- docs refresh
- cleanup of temporary stubs

Stage result:

- the project becomes suitable for real iterative development

Verification:

- full critical-path smoke suite

## 4. Recommended First Working Demo

The first proper demo checkpoint should be at `Stage 12`.

What should work by that point:

- the browser web chat opens locally
- the user submits a task
- the runtime accepts the task
- the start-of-work event is visible in the UI
- if a plan is needed, it is visible in the events panel
- tool execution is visible in the events panel
- the final response is returned to the chat

At this stage memory, critic and recovery may still be partially stubbed, but:

- runtime loop
- event bus
- state persistence
- router
- execution engine
- permissions
- basic tools
- API
- web chat

must already be real.

## 5. Order Rationale

Why the web chat is not at the very end:

- it serves as a live inspection surface for the runtime
- it makes streaming, permissions and event ordering easier to check
- it exposes architectural problems faster than bare unit tests

But the web chat comes only after:

- runtime core
- event bus
- persistence
- basic execution path
- API

Otherwise it becomes a pretty shell over a system that does not exist.

@ -1,83 +0,0 @@
# MVP CHECKLIST

This checklist defines the minimum working scope for the first demo run of `ducklm`.

## 1. Core Runtime

- [x] The project has a modular structure: `app/`, `config/`, `data/`, `tests/`
- [x] There are typed contracts for core entities
- [x] There is a `Runtime Loop Controller`
- [x] The runtime loop can accept a task and create task state
- [x] The runtime loop publishes start and final events

## 2. Events and State

- [x] There is an `EventBus`
- [x] There is an `EventStore`
- [x] Events carry `task_id + sequence`
- [x] There is a `TaskStateStore`
- [x] There is a `CheckpointStore`
- [x] A checkpoint is saved after critical transitions
- [ ] There is a basic resume path

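The `task_id + sequence` pair can be produced by a small per-task counter, roughly as in this sketch. `SequenceAllocator` and `allocate` are hypothetical names, and the sketch is not thread-safe; a real event bus would guard the counters or derive the next value from the event store.

```python
from collections import defaultdict
from itertools import count


class SequenceAllocator:
    """Hands out 1-based, gap-free sequence numbers independently per task."""

    def __init__(self) -> None:
        self._counters: defaultdict[str, count] = defaultdict(lambda: count(1))

    def allocate(self, task_id: str) -> int:
        return next(self._counters[task_id])
```

Per-task numbering lets a consumer detect missing or reordered events for one task without coordinating with any other task.
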
## 3. Decision and Execution

- [x] There is a `ContextBuilder`
- [x] There is a `Router` acting as a pure decision layer
- [x] There is an `ExecutionDirective`
- [x] There is an `ExecutionEngine`
- [x] There is an `ExecutionScheduler`
- [ ] The plan is validated and converted into a task graph

## 4. Tools and Safety

- [x] There is a `PermissionService`
- [x] There is a persistent store for user approvals
- [x] There is a `ToolSandbox`
- [x] There is a `ToolRegistry`
- [x] `shell_exec` works
- [x] `file_read` works
- [x] `file_write` works

## 5. Models and Evaluation

- [ ] There is an orchestrator adapter
- [ ] There is a planning mode interface
- [ ] There is a coder adapter
- [ ] There is a critic adapter
- [ ] There is a fallback policy for critic failure

## 6. Memory

- [ ] There is a SQLite metadata store
- [ ] There is a vector index adapter
- [ ] Memory insert/search works
- [ ] There is a `MemoryWritePolicy`
- [ ] Memory writes do not depend on the critic score alone

## 7. Interfaces

- [x] There is `POST /chat`
- [ ] There is `WS /stream`
- [x] There is `GET /health`
- [x] There is a local web chat for manual runtime checks
- [ ] There is a CLI for sending tasks and viewing events

## 8. Reliability

- [ ] There is structured logging
- [ ] There is a retry/recovery policy skeleton
- [ ] There is a replay path from the event store
- [ ] There are critical-path smoke tests

## 9. Demo Definition

The MVP is considered reached if:

- [ ] the local web chat can be opened in a browser
- [ ] a task can be submitted
- [ ] streaming events are visible
- [ ] planning or a direct action is visible
- [ ] execution of a tool step is visible
- [ ] a dangerous command requires confirmation
- [ ] the final answer is returned to the user

@ -0,0 +1,51 @@
duck-up:
	docker compose -f docker-compose.memory.yml up -d
	@echo "Memory services started."
	@echo "Start llama-server:"
	@echo "bash scripts/llama/start_main.sh start"

duck-llama-main:
	bash scripts/llama/start_main.sh start

duck-llama-stop:
	bash scripts/llama/start_main.sh stop

duck-llama-restart:
	bash scripts/llama/start_main.sh restart

duck-llama-status:
	bash scripts/llama/start_main.sh status

duck-llama-logs:
	bash scripts/llama/start_main.sh logs --follow

duck-llama-health:
	bash scripts/llama/healthcheck.sh http://127.0.0.1:8081/v1

duck-api:
	python3 -m duck_core.api

duck-dev:
	docker compose -f docker-compose.memory.yml up -d
	@echo "Start llama-server in another terminal:"
	@echo "bash scripts/llama/start_main.sh start"
	@echo "Then run:"
	@echo "make duck-api"
	@echo "Open:"
	@echo "http://127.0.0.1:8000/"

duck-open:
	@echo "Open web UI:"
	@echo "http://127.0.0.1:8000/"

duck-smoke:
	python3 -m pytest tests/smoke -v

duck-test:
	python3 -m pytest -v

duck-verify:
	bash scripts/verify/verify_basic_chat.sh
	bash scripts/verify/verify_file_write_read.sh
	bash scripts/verify/verify_tool_blocking.sh
	bash scripts/verify/verify_models_roles.sh

@ -0,0 +1,22 @@
# DuckLM

DuckLM is a local agent runtime over local language models. It exposes a WebChat and HTTP API, calls `llama-server` through an OpenAI-compatible API, and persists tasks/events in SQLite.

## Quick Start

```bash
python3 -m venv .venv
. .venv/bin/activate
python -m pip install -e ".[dev]"
cp .env.example .env
bash scripts/llama/start_main.sh
```

In another terminal:

```bash
. .venv/bin/activate
python -m duck_core.api
```

Open `http://127.0.0.1:8000/`.

@ -1,2 +0,0 @@
"""ducklm application package."""

@ -1,2 +0,0 @@
"""API layer."""

@ -1,170 +0,0 @@
from __future__ import annotations

import asyncio
from contextlib import asynccontextmanager
from pathlib import Path

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.responses import FileResponse
from pydantic import BaseModel

from app.core.contracts import UserTask
from app.core.permission_resolution import (
    PasswordResolutionRequest,
    PermissionResolutionRequest,
    ReviewResolutionRequest,
    SecretResolutionRequest,
)
from app.runtime.runtime_controller import RuntimeController
from app.streaming.manager import StreamingManager


class CriticFeedbackRequest(BaseModel):
    """Request body for POST /critic/feedback."""

    feedback: str
    task_id: str | None = None
    session_id: str | None = None
    feedback_type: str | None = None
    severity: str | None = None
    correction: str | None = None
    remember: bool = True
    retry: bool = False
    assistant_answer: str | None = None
    correctness_override: float | None = None
    usefulness_override: float | None = None
    safety_override: float | None = None


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Load models on startup."""
    # `runtime` is defined below at module level; it is resolved when the
    # server starts, not when this function is defined.
    print("Lifespan: Starting model loading...")
    try:
        print("Lifespan: Loading models...")
        runtime.load_models_at_startup()
        print("Lifespan: Models loaded")

        # Rebuild vector index if empty but memory store has data.
        if runtime._memory_interface:
            store_count = runtime._memory_interface.count()
            if store_count > 0:
                idx_count = runtime._memory_interface._vector_index.element_count
                if idx_count == 0:
                    print(f"Lifespan: Rebuilding vector index ({store_count} entries)...")
                    runtime._memory_interface.reindex()
                    print("Lifespan: Vector index rebuilt")
    except Exception as e:
        print(f"Lifespan: Failed to load models: {e}")
        import traceback
        traceback.print_exc()

    yield  # Server runs here

    print("Lifespan: Shutting down...")


app = FastAPI(title="ducklm", lifespan=lifespan)
runtime = RuntimeController(base_dir=Path(__file__).resolve().parents[2])
streaming = StreamingManager(runtime.event_bus)


@app.get("/")
def index() -> FileResponse:
    return FileResponse(Path(__file__).resolve().parent / "static" / "index.html")


@app.get("/health")
def health() -> dict[str, str]:
    return {"status": "ok"}


@app.get("/events")
def list_events(limit: int = 500) -> dict[str, object]:
    # Clamp the requested limit to the range 1..2000.
    safe_limit = max(1, min(limit, 2000))
    return {
        "events": [
            event.model_dump(mode="json")
            for event in runtime.event_bus.list_recent(limit=safe_limit)
        ]
    }


@app.post("/chat")
def chat(task: UserTask) -> dict[str, object]:
    # Prefer the submit_* entry point when the runtime provides one;
    # otherwise fall back to the direct handler.
    submit = getattr(runtime, "submit_task", None)
    if callable(submit):
        return submit(task)
    return runtime.handle_task(task)


@app.post("/permissions/resolve")
def resolve_permission(request: PermissionResolutionRequest) -> dict[str, object]:
    submit = getattr(runtime, "submit_permission_resolution", None)
    if callable(submit):
        return submit(task_id=request.task_id, decision=request.decision)
    return runtime.resolve_permission(task_id=request.task_id, decision=request.decision)


@app.post("/secrets/resolve")
def resolve_secret(request: SecretResolutionRequest) -> dict[str, object]:
    submit = getattr(runtime, "submit_secret_resolution", None)
    if callable(submit):
        return submit(task_id=request.task_id, secret=request.secret)
    return runtime.resolve_secret(task_id=request.task_id, secret=request.secret)


@app.post("/password/resolve")
def resolve_password(request: PasswordResolutionRequest) -> dict[str, object]:
    submit = getattr(runtime, "submit_password_resolution", None)
    if callable(submit):
        return submit(task_id=request.task_id, password=request.password)
    return runtime.resolve_password(task_id=request.task_id, password=request.password)


@app.post("/review/resolve")
def resolve_review(request: ReviewResolutionRequest) -> dict[str, object]:
    submit = getattr(runtime, "submit_review_resolution", None)
    if callable(submit):
        return submit(task_id=request.task_id, decision=request.decision, correction=request.correction)
    return runtime.resolve_review(task_id=request.task_id, decision=request.decision, correction=request.correction)


@app.post("/critic/feedback")
def critic_feedback(request: CriticFeedbackRequest) -> dict[str, object]:
    feedback = runtime.handle_critic_feedback(
        feedback=request.feedback,
        task_id=request.task_id,
        session_id=request.session_id,
        feedback_type=request.feedback_type,
        severity=request.severity,
        correction=request.correction,
        remember=request.remember,
        retry=request.retry,
        assistant_answer=request.assistant_answer,
        correctness_override=request.correctness_override,
        usefulness_override=request.usefulness_override,
        safety_override=request.safety_override,
    )
    return feedback


@app.websocket("/stream/{task_id}")
async def stream_task(websocket: WebSocket, task_id: str) -> None:
    await websocket.accept()

    # Send events that were already recorded for this task.
    replayed_events = streaming.replay_events(task_id)
    for event in replayed_events:
        await websocket.send_json(event.model_dump(mode="json"))
    if replayed_events and replayed_events[-1].type in {"task_completed", "task_failed"}:
        await websocket.close()
        return

    queue = streaming.subscribe(task_id)
    disconnected = False
    try:
        while True:
            try:
                event = await asyncio.wait_for(queue.get(), timeout=30)
            except asyncio.TimeoutError:
                # No event for 30 seconds: keep the connection alive.
                await websocket.send_json({"type": "heartbeat", "task_id": task_id})
                continue
            await websocket.send_json(event.model_dump(mode="json"))
            if event.type in {
                "task_completed",
                "task_failed",
                "task_awaiting_permission",
                "task_awaiting_input",
                "task_awaiting_review",
            }:
                break
    except WebSocketDisconnect:
        disconnected = True
    finally:
        streaming.unsubscribe(task_id, queue)
        # Closing a socket the client has already dropped would raise.
        if not disconnected:
            await websocket.close()

Binary file not shown.
|
Before Width: | Height: | Size: 17 KiB |
File diff suppressed because it is too large
@ -1,2 +0,0 @@
"""CLI layer."""

@ -1,2 +0,0 @@
"""Core orchestration components."""

@ -1,542 +0,0 @@
from __future__ import annotations

import asyncio
import json
import logging
import re
from typing import Any

from app.core.contracts import ExecutionDirective
from app.core.intent_parser import IntentParser
from app.events.event_bus import EventBus
from app.events.event_types import (
    ORCHESTRATOR_CALLED,
    ORCHESTRATOR_FALLBACK_USED,
    ORCHESTRATOR_RETRY,
    ORCHESTRATOR_RESULT,
    ORCHESTRATOR_UNAVAILABLE,
    THINKER_CALLED,
    THINKER_RESULT,
    JSON_COMPILER_CALLED,
    JSON_COMPILER_RESULT,
)
from app.models.async_adapters import AsyncOrchestratorAdapter

logger = logging.getLogger(__name__)


class AsyncRouter:
    """Async router using Thinker + JSON Compiler pipeline."""

    def __init__(
        self,
        thinker: AsyncOrchestratorAdapter | None = None,
        json_compiler: AsyncOrchestratorAdapter | None = None,
        intent_parser: IntentParser | None = None,
        prompts: dict[str, str] | None = None,
        event_bus: EventBus | None = None,
        tool_registry=None,
        retry_limit: int = 2,
        debug: bool = False,
        log_length: int = 500,
        json_fix_retry_limit: int = 2,
        json_fix_use_sys_util: bool = True,
        intent_classifier: str = "thinker",
    ) -> None:
        self._thinker = thinker
        self._json_compiler = json_compiler
        self._intent_classifier = intent_classifier
        self._sys_util = None
        self._intent_parser = intent_parser or IntentParser()
        self._prompts = prompts or {}
        self._event_bus = event_bus
        self._tool_registry = tool_registry
        self._retry_limit = retry_limit
        self._debug = debug
        self._log_length = log_length
        self._json_fix_retry_limit = json_fix_retry_limit
        self._json_fix_use_sys_util = json_fix_use_sys_util
        self._orchestrator = None  # Set separately if needed for classification

    def set_event_bus(self, event_bus: EventBus) -> None:
        self._event_bus = event_bus

    def set_thinker(self, thinker: AsyncOrchestratorAdapter) -> None:
        self._thinker = thinker

    def set_json_compiler(self, json_compiler: AsyncOrchestratorAdapter) -> None:
        self._json_compiler = json_compiler

    def set_sys_util(self, sys_util: AsyncOrchestratorAdapter) -> None:
        self._sys_util = sys_util

    def set_orchestrator(self, orchestrator: AsyncOrchestratorAdapter) -> None:
        self._orchestrator = orchestrator

    def set_tool_registry(self, tool_registry) -> None:
        self._tool_registry = tool_registry

    async def decide(
        self,
        state: dict[str, Any],
        context: dict[str, Any],
        task_id: str | None = None,
        session_id: str | None = None,
    ) -> ExecutionDirective:
        task_context = context.get("task_context", {})
        requested_tool = task_context.get("requested_tool")
        task_summary = str(context.get("task_summary", ""))

        # 1. An explicit tool request bypasses the models entirely.
        if requested_tool:
            self._emit_event(
                ORCHESTRATOR_RESULT,
                {"reason": "explicit_tool_request", "tool": requested_tool},
                task_id,
                session_id,
            )
            return ExecutionDirective(
                type="tool",
                payload={
                    "tool": requested_tool,
                    "args": task_context.get("tool_args", {}),
                },
                requires_permission=requested_tool in {"shell_exec", "file_write"},
                confidence=0.9,
                reason="Task context explicitly requested a tool execution.",
            )

        # 2. Try the deterministic intent parser before calling any model.
        parsed_intent = self._intent_parser.parse(task_summary)
        if parsed_intent:
            self._emit_event(
                ORCHESTRATOR_RESULT,
                {"reason": "deterministic_intent_parser", "directive": parsed_intent.model_dump(mode="json")},
                task_id,
                session_id,
            )
            return parsed_intent

        # 3. Without both models, fall back to a canned directive.
        if self._thinker is None:
            fallback = self._fallback_directive(task_summary)
            self._emit_event(
                ORCHESTRATOR_FALLBACK_USED,
                {"reason": "thinker_unavailable", "directive": fallback.model_dump(mode="json")},
                task_id,
                session_id,
            )
            return fallback

        if self._json_compiler is None:
            fallback = self._fallback_directive(task_summary)
            self._emit_event(
                ORCHESTRATOR_FALLBACK_USED,
                {"reason": "json_compiler_unavailable", "directive": fallback.model_dump(mode="json")},
                task_id,
                session_id,
            )
            return fallback

        mode_hint = await self._classify_intent(task_summary)
        thinker_prompt = self._build_thinker_prompt(task_summary, context, mode_hint)
        # Initialised up front so it is always bound when read after the loop.
        last_thinker_error: str | None = None

        for thinker_attempt in range(self._retry_limit + 1):
            if thinker_attempt > 0:
                self._emit_event(
                    ORCHESTRATOR_RETRY,
                    {"attempt": thinker_attempt, "prompt": thinker_prompt},
                    task_id,
                    session_id,
                )
                thinker_prompt = self._add_thinker_feedback(thinker_prompt, last_thinker_error, thinker_attempt)

            self._emit_event(
                THINKER_CALLED,
                {"attempt": thinker_attempt, "mode": mode_hint},
                task_id,
                session_id,
            )

            try:
                thinker_result = await self._thinker.generate(thinker_prompt)
            except Exception as e:
                logger.warning(f"Thinker generate failed: {e}")
                last_thinker_error = str(e)
                continue

            logger.info(f"Thinker result (attempt {thinker_attempt + 1}): {thinker_result}")
            self._emit_event(
                THINKER_RESULT,
                {"result": thinker_result, "attempt": thinker_attempt},
                task_id,
                session_id,
            )

            if mode_hint == "conversation" and self._looks_like_tool_plan(thinker_result):
                mode_hint = "execution"
                self._emit_event(
                    ORCHESTRATOR_FALLBACK_USED,
                    {"reason": "thinker_proposed_tool_plan_despite_conversation_hint"},
                    task_id,
                    session_id,
                )

            # The original branched on _is_simple_response() here, but both
            # branches built the same prompt, so the branch is collapsed.
            json_compiler_prompt = self._build_json_compiler_prompt(thinker_result)

            for compiler_attempt in range(self._json_fix_retry_limit + 1):
                self._emit_event(
                    JSON_COMPILER_CALLED,
                    {"attempt": compiler_attempt, "plan": thinker_result},
                    task_id,
                    session_id,
                )

                try:
                    compiler_result = await self._json_compiler.generate(json_compiler_prompt)
                except Exception as e:
                    logger.warning(f"JSON Compiler generate failed: {e}")
                    compiler_result = None

                if compiler_result:
                    logger.info(f"JSON Compiler result (attempt {compiler_attempt + 1}): {compiler_result}")
                    self._emit_event(
                        JSON_COMPILER_RESULT,
                        {"result": compiler_result, "attempt": compiler_attempt},
                        task_id,
                        session_id,
                    )

                directive = self._validate_directive(compiler_result, mode_hint) if compiler_result else None
                if directive is not None:
                    directive = self._guard_rail_check(directive)
                    self._emit_event(
                        ORCHESTRATOR_RESULT,
                        {"directive": directive.model_dump(mode="json"), "thinker_attempt": thinker_attempt, "compiler_attempt": compiler_attempt},
                        task_id,
                        session_id,
                    )
                    return directive

                if compiler_result:
                    logger.warning(f"JSON Compiler validation failed, attempting fix (attempt {compiler_attempt + 1})")
                    fix_result = await self._fix_invalid_json(compiler_result, compiler_attempt, task_id, session_id)
                    if fix_result:
                        fixed_directive = self._validate_directive(fix_result, mode_hint)
                        if fixed_directive is not None:
                            fixed_directive = self._guard_rail_check(fixed_directive)
                            self._emit_event(
                                ORCHESTRATOR_RESULT,
                                {"directive": fixed_directive.model_dump(mode="json"), "fixed": True},
                                task_id,
                                session_id,
                            )
                            return fixed_directive

            last_thinker_error = f"JSON Compiler failed after {self._json_fix_retry_limit + 1} attempts"

        self._emit_event(
            ORCHESTRATOR_UNAVAILABLE,
            {"reason": "retry_exhausted", "last_error": last_thinker_error},
            task_id,
            session_id,
        )
        raise RuntimeError(f"Thinker/Compiler pipeline failed after {self._retry_limit + 1} attempts")

    def _fallback_directive(self, task_summary: str) -> ExecutionDirective:
        parsed = self._intent_parser.parse(task_summary)
        if parsed:
            return parsed

        return ExecutionDirective(
            type="respond",
            payload={"text": f"Runtime accepted task: {task_summary}"},
            requires_permission=False,
            confidence=0.4,
            reason="Fallback response because local orchestration models are not loaded.",
        )

    def _is_simple_response(self, thinker_result: str) -> bool:
        result_lower = thinker_result.lower().strip()
        return result_lower.startswith("ответ:") or result_lower.startswith("response:") or "не нужно" in result_lower

    def _extract_conversation_response(self, thinker_result: str) -> str:
        """Extract text response from thinker result for conversation mode."""
        # Skip the ПЛАН lines, just get the ОТВЕТ part
        lines = thinker_result.split('\n')
        response_lines = []
        capture = False

        for line in lines:
            marker = line.strip().lower()
            if marker.startswith('ответ:') or marker.startswith('response:'):
                capture = True
                response_lines.append(line)
            elif capture and line.strip():
                # Check if this is a new ПЛАН or step
                if marker.startswith('план') or marker.startswith('step'):
                    break
                response_lines.append(line)

        if response_lines:
            text = '\n'.join(response_lines)
            # Strip the leading marker case-insensitively, matching the
            # case-insensitive detection above ("Ответ:" as well as "ответ:").
            return re.sub(r'^\s*(?:ответ|response):\s*', '', text, flags=re.IGNORECASE).strip()

        # Fallback: return first few sentences
        sentences = thinker_result.split('.')[:3]
        return '. '.join(sentences).strip()

def _looks_like_tool_plan(self, thinker_result: str) -> bool:
|
|
||||||
result = thinker_result.lower()
|
|
||||||
tool_names = set()
|
|
||||||
if self._tool_registry:
|
|
||||||
tool_names = set(self._tool_registry.list_names())
|
|
||||||
tool_markers = {"shell_exec", "file_read", "file_write", "memory", *tool_names}
|
|
||||||
plan_markers = ("план:", "шаг", "step", "tool", "инструмент")
|
|
||||||
return any(marker in result for marker in tool_markers) and any(marker in result for marker in plan_markers)
|
|
||||||
|
|
||||||
    def _build_thinker_prompt(
        self, task_summary: str, context: dict[str, Any], mode_hint: str
    ) -> str:
        base_prompt = self._prompts.get("thinker", "")
        memory_context = context.get("memory_context", [])

        tools_json = "[]"
        if self._tool_registry:
            schemas = self._tool_registry.list_schemas()
            tools_json = json.dumps(schemas, ensure_ascii=False, indent=2)

        prompt_lines = [
            base_prompt,
            "",
            f"Task: {task_summary}",
            f"Mode hint: {mode_hint}",
        ]

        if memory_context:
            memory_text = "\n".join(f"- {m.get('text', '')}" for m in memory_context[:5])
            prompt_lines.append(f"\nRelevant memory:\n{memory_text}")

        session_history = context.get("session_history", [])
        if session_history:
            history_text = "\n".join(f"- {h.get('text', '')}" for h in session_history[:3])
            prompt_lines.append(f"\nPrevious requests in this session:\n{history_text}")

        # Active memory recall results.
        # Russian section markers: "FROM LONG-TERM MEMORY (ACTIVE RECALL)",
        # "Search query:", "END OF MEMORY".
        memory_recall = context.get("memory_recall")
        if memory_recall:
            prompt_lines.append("\n=== ИЗ ДОЛГОВРЕМЕННОЙ ПАМЯТИ (ACTIVE RECALL) ===")
            prompt_lines.append(f"Поисковый запрос: {memory_recall.get('query', '')}")
            prompt_lines.append(memory_recall.get("summary", ""))
            prompt_lines.append("=== КОНЕЦ ПАМЯТИ ===")

        prompt_lines.extend([
            "",
            "AVAILABLE TOOLS (JSON):",
            tools_json,
            "",
        ])

        return "\n".join(prompt_lines)

    def _build_json_compiler_prompt(self, thinker_result: str) -> str:
        base_prompt = self._prompts.get("json_compiler", "")

        prompt_lines = [
            base_prompt,
            "",
            "Thinker's plan:",
            thinker_result,
            "",
        ]

        return "\n".join(prompt_lines)

    def _determine_mode_from_context(self, context: dict[str, Any]) -> str:
        """Legacy method - kept for compatibility"""
        task_summary = str(context.get("task_summary", "")).lower()
        # Russian keywords: "run", "execute", "create", "write", "delete".
        keywords = ["запусти", "выполни", "создай", "напиши", "удали", "run", "execute", "create"]
        for kw in keywords:
            if kw in task_summary:
                return "execution"
        return "conversation"

    async def _classify_intent(self, task_summary: str) -> str:
        """LLM-based intent classification"""
        if self._intent_classifier == "orchestrator" and self._orchestrator:
            classifier_model = self._orchestrator
        else:
            classifier_model = self._thinker

        if not classifier_model:
            logger.warning("No classifier model available, using default")
            return "conversation"

        # The prompt is in Russian. It asks the model to classify the request as
        # execution (needs the local environment, files, shell, tools, memory or
        # network), conversation (answerable from the dialogue and general
        # knowledge) or clarification_needed (intent unclear), and to return
        # exactly one token without reasoning.
        classification_prompt = f"""Классифицируй запрос пользователя: "{task_summary}"

Классы:
- execution: чтобы ответить, агенту нужно обратиться к локальной среде, файлам, shell, tools, памяти, сети или выполнить проверку/операцию. Это включает вопросы о текущем состоянии ПК, установленных пакетах, файлах, процессах, времени работы, обновлениях, логах.
- conversation: можно ответить сразу из диалога и общих знаний, без проверки локальной среды и без tools.
- clarification_needed: нельзя понять, что именно пользователь хочет.

Верни ровно один токен без рассуждений: execution или conversation или clarification_needed"""

        try:
            result = await classifier_model.generate(classification_prompt)
            classification = self._extract_classification(result)
            if classification:
                logger.info(f"Intent classified: {classification} for task: {task_summary}")
                return classification

            logger.warning(f"Invalid classification result: {result}, defaulting to conversation")
            return "conversation"
        except Exception as e:
            logger.warning(f"Intent classification failed: {e}, defaulting to conversation")
            return "conversation"

    def _extract_classification(self, raw_result: str) -> str | None:
        result = raw_result.strip().lower()
        allowed = {"execution", "conversation", "clarification_needed"}
        if result in allowed:
            return result

        result = re.sub(r"<think>.*?</think>", " ", result, flags=re.DOTALL)
        if (
            "shell_exec" in result
            or "execute command" in result
            or "command execution" in result
            or "use the tool" in result
            or "use a tool" in result
        ):
            return "execution"
        tokens = re.findall(r"\b(execution|conversation|clarification_needed)\b", result)
        if tokens:
            return tokens[-1]

        first_word = result.split()[0] if result.split() else ""
        if first_word in allowed:
            return first_word

        return None

    def _validate_directive(self, output: str, mode_hint: str) -> ExecutionDirective | None:
        if not output:
            return None

        try:
            json_start = output.find("{")
            json_end = output.rfind("}") + 1
            if json_start < 0 or json_end <= 0:
                return None

            json_str = output[json_start:json_end]
            data = json.loads(json_str)

            if "type" not in data:
                return None

            msg_type = data.get("type", "")
            payload = data.get("payload", {})

            if msg_type == "step" and "tool" in payload:
                tool = payload.get("tool", "")
                args = payload.get("args", {})
                payload = {"tool": tool, "args": args}

            if msg_type == "plan":
                payload = {"steps": payload.get("steps", [])}

            return ExecutionDirective(
                type=msg_type,
                payload=payload,
                confidence=data.get("confidence", 0.9),
                reason=data.get("reason", ""),
            )
        # AttributeError covers a non-dict "payload" (e.g. a list or string).
        except (json.JSONDecodeError, ValueError, TypeError, AttributeError) as e:
            logger.warning(f"Directive JSON validation failed: {e}")
            return None

    def _guard_rail_check(self, directive: ExecutionDirective) -> ExecutionDirective:
        tool_name = directive.payload.get("tool", "")
        if tool_name in {"shell_exec", "file_write", "file_delete"}:
            # Same directive, but flagged as requiring user permission.
            return directive.model_copy(update={"requires_permission": True})
        return directive

    def _add_thinker_feedback(self, prompt: str, error: str, attempt: int) -> str:
        feedback = f"\n[ATTEMPT {attempt + 1} FAILED: {error}]\n"
        feedback += "Provide a valid semantic plan.\n"
        return prompt + feedback

    def _emit_event(
        self,
        event_type: str,
        payload: dict[str, Any],
        task_id: str | None,
        session_id: str | None,
    ) -> None:
        if self._event_bus and task_id:
            from app.core.contracts import RuntimeEvent
            event = RuntimeEvent(
                task_id=task_id,
                session_id=session_id or "unknown",
                sequence=self._event_bus.next_sequence(task_id),
                type=event_type,
                payload=payload,
            )
            self._event_bus.publish(event)

    SYS_UTIL_PROMPT = None

    async def _fix_invalid_json(self, invalid_result: str, attempt: int, task_id: str | None, session_id: str | None) -> str | None:
        """Try to fix invalid JSON using sys_util model."""
        if not self._sys_util:
            return None

        first_brace = invalid_result.find('{')
        last_brace = invalid_result.rfind('}')
        if first_brace < 0 or last_brace <= first_brace:
            return None

        truncated_json = invalid_result[first_brace:last_brace + 1]

        error_msg = ""
        try:
            json.loads(truncated_json)
        except json.JSONDecodeError as e:
            error_msg = str(e)

        # Fall back to the built-in prompt when no "sys_util" prompt is configured.
        sys_util_prompt = (
            (self._prompts or {}).get("sys_util")
            or self.SYS_UTIL_PROMPT
            or (
                "You are a STRICT JSON repair engine. "
                "Your job is ONLY to fix invalid JSON syntax. "
                "You MUST output valid JSON or nothing else."
            )
        )
        # Include the broken JSON itself: the model cannot repair text it never sees.
        fix_prompt = f"""{sys_util_prompt}

Invalid JSON:
{truncated_json}

Error: {error_msg}

Fixed JSON:"""

        try:
            logger.info(f"JSON fix using sys_util model (attempt {attempt + 1})")
            fixed_result = await self._sys_util.generate(fix_prompt)

            fixed_first = fixed_result.find('{')
            fixed_last = fixed_result.rfind('}')
            if fixed_first >= 0 and fixed_last > fixed_first:
                return fixed_result[fixed_first:fixed_last + 1]

            return None

        except Exception as e:
            logger.warning(f"JSON fix failed: {e}")
            return None
@@ -1,60 +0,0 @@
from __future__ import annotations

import re
import shlex
from typing import Any

from app.core.permission_service import PermissionService


class CommandAnalyzer:
    """Deterministic shell action analyzer for structured critic evidence."""

    _SPLIT_RE = re.compile(r"\s*(?:&&|;)\s*")

    def __init__(self, permission_service: PermissionService) -> None:
        self._permission_service = permission_service

    def analyze(self, command: str, task_id: str, session_id: str) -> dict[str, Any]:
        segments = [segment.strip() for segment in self._SPLIT_RE.split(command) if segment.strip()]
        root_required: list[str] = []
        elevated: list[str] = []
        unelevated_root: list[str] = []

        for segment in segments:
            normalized, is_elevated = self._strip_sudo(segment)
            check = self._permission_service.check_shell_command(
                task_id=task_id,
                session_id=session_id,
                command=normalized,
            )
            if check.get("requires_sudo"):
                root_required.append(normalized)
                if is_elevated:
                    elevated.append(normalized)
                else:
                    unelevated_root.append(normalized)

        diagnosis_type = "privilege_scope_error" if unelevated_root else "ok"
        return {
            "type": diagnosis_type,
            "command": command,
            "segments": segments,
            "root_required_segments": root_required,
            "elevated_segments": elevated,
            "unelevated_root_segments": unelevated_root,
        }

    def _strip_sudo(self, segment: str) -> tuple[str, bool]:
        try:
            parts = shlex.split(segment)
        except ValueError:
            return segment, segment.strip().startswith("sudo ")
        if not parts or parts[0] != "sudo":
            return segment, False
        index = 1
        while index < len(parts) and parts[index].startswith("-"):
            index += 1
            if index < len(parts) and parts[index - 1] in {"-p", "--prompt"}:
                index += 1
        return " ".join(shlex.quote(part) for part in parts[index:]), True
@@ -1,93 +0,0 @@
from __future__ import annotations

import json
from pathlib import Path
from typing import Any

from pydantic import BaseModel, Field


class ModelsConfig(BaseModel):
    orchestrator_path: str = "models/llama.gguf"
    coder_path: str = "models/xcoder.gguf"
    critic_path: str = "models/gemma.gguf"
    embeddings_path: str = "models/all-MiniLM-L6-v2"
    inference: dict[str, Any] = Field(default_factory=dict)
    thinker: dict[str, Any] = Field(default_factory=dict)
    json_compiler: dict[str, Any] = Field(default_factory=dict)
    orchestrator: dict[str, Any] = Field(default_factory=dict)
    coder: dict[str, Any] = Field(default_factory=dict)
    critic: dict[str, Any] = Field(default_factory=dict)
    sys_util: dict[str, Any] = Field(default_factory=dict)
    embeddings: dict[str, Any] = Field(default_factory=dict)


class PromptsConfig(BaseModel):
    orchestration_prompt: str = ""
    planning_prompt: str = ""
    coder_prompt: str = ""
    critic_prompt: str = ""


class PermissionsConfig(BaseModel):
    dangerous_commands: dict[str, str] = Field(default_factory=dict)
    sensitive_paths: list[str] = Field(default_factory=list)
    default_approval_behavior: str = "ask_always"


class RuntimeConfig(BaseModel):
    step_timeout_ms: int = 30_000
    task_timeout_ms: int = 300_000
    shell_command_timeout_ms: int = 3_600_000
    shell_idle_timeout_ms: int = 600_000
    planner_retry_limit: int = 2
    tool_retry_limit: int = 1
    replan_limit: int = 1
    max_execution_steps: int = 20
    retrieval_top_k: int = 5
    max_context_tokens: int = 8192
    context_budgets: dict[str, int] = Field(default_factory=lambda: {
        "system": 512,
        "task": 512,
        "memory": 2048,
        "execution": 2048,
        "tools": 1024,
        "safety": 512,
    })
    reserve_for_generation_pct: int = 25
    orchestrator_retry_limit: int = 2
    intent_classifier: str = "thinker"
    memory_thresholds: dict[str, float] = Field(default_factory=dict)
    critic_fallback_policy: str = "continue_without_critic"
    checkpoint_policy: dict[str, Any] = Field(default_factory=dict)
    event_retention_policy: dict[str, Any] = Field(default_factory=dict)
    streaming_settings: dict[str, Any] = Field(default_factory=dict)
    debug: bool = False
    debug_orchestrator_log_length: int = 500
    json_fix_retry_limit: int = 2
    json_fix_use_sys_util: bool = True
    recall_model: str = "json_compiler"
    critic_retry_limit: int = 2


class AppConfig(BaseModel):
    models: ModelsConfig
    prompts: PromptsConfig
    permissions: PermissionsConfig
    runtime: RuntimeConfig


def _load_json(path: Path) -> dict[str, Any]:
    with path.open("r", encoding="utf-8") as handle:
        return json.load(handle)


def load_app_config(config_dir: str | Path) -> AppConfig:
    config_path = Path(config_dir)
    return AppConfig(
        models=ModelsConfig.model_validate(_load_json(config_path / "models.json")),
        prompts=PromptsConfig.model_validate(_load_json(config_path / "prompts.json")),
        permissions=PermissionsConfig.model_validate(_load_json(config_path / "permissions.json")),
        runtime=RuntimeConfig.model_validate(_load_json(config_path / "runtime.json")),
    )
@@ -1,172 +0,0 @@
from __future__ import annotations

import logging
from typing import Any

from app.core.contracts import TaskCheckpoint, UserTask

logger = logging.getLogger(__name__)

DEFAULT_BUDGETS = {
    "system": 512,
    "task": 512,
    "memory": 2048,
    "execution": 2048,
    "tools": 1024,
    "safety": 512,
}


class ContextBuilder:
    def __init__(
        self,
        memory_interface=None,
        tool_registry=None,
        config: dict[str, Any] | None = None,
    ) -> None:
        self._memory = memory_interface
        self._tool_registry = tool_registry
        self._config = config or {}
        self._max_tokens = self._config.get("max_context_tokens", 8192)
        self._budgets = self._config.get("context_budgets", DEFAULT_BUDGETS)
        self._reserve_pct = self._config.get("reserve_for_generation_pct", 25)

    def build(
        self,
        task: UserTask,
        checkpoint: TaskCheckpoint | None = None,
        query: str | None = None,
    ) -> dict[str, Any]:
        task_summary = task.input
        search_query = query or task_summary
        session_id = task.session_id

        memory_context = []
        if self._memory:
            memory_context = self._retrieve_memory(search_query, session_id=session_id)

        budgets = self._calculate_budgets()
        reserved = self._reserve_for_generation()

        system_budget = budgets.get("system", 512)
        task_budget = budgets.get("task", 512)
        safety_budget = budgets.get("safety", 512)
        memory_budget = budgets.get("memory", 2048)

        truncated_memory = self._truncate_memory(
            memory_context, memory_budget
        )

        # Get session history for follow-up context
        session_history = self._get_session_history(session_id)

        context = {
            "system_prompt": "",
            "task_summary": task_summary[:task_budget],
            "task_context": task.context,
            "memory_context": truncated_memory,
            "session_history": session_history,
            "execution_context": checkpoint.model_dump() if checkpoint else {},
            "tool_context": self._get_tool_context(),
            "safety_context": {},
            "constraints": {
                "budgets": budgets,
                "reserved_for_generation": reserved,
                "original_memory_count": len(memory_context),
                "truncated_memory_count": len(truncated_memory),
            },
        }

        return context

    def _get_tool_context(self) -> list[dict[str, Any]]:
        """Expose available tools to orchestrator."""
        if not self._tool_registry:
            return []

        tools = []
        for name in self._tool_registry.list_names():
            tool = self._tool_registry.get(name)
            tools.append({
                "name": name,
                "description": getattr(tool, "description", ""),
            })
        return tools

    def _calculate_budgets(self) -> dict[str, int]:
        return dict(self._budgets)

    def _reserve_for_generation(self) -> int:
        return int(self._max_tokens * self._reserve_pct / 100)

    def _retrieve_memory(
        self,
        query: str,
        session_id: str | None = None,
        top_k: int = 5,
    ) -> list[dict[str, Any]]:
        if not self._memory:
            return []

        try:
            results = self._memory.search(query, top_k=top_k, session_id=session_id)
            return [
                {
                    "id": entry.id,
                    "text": entry.text,
                    "kind": entry.kind,
                    "source": entry.source,
                    "weight": entry.weight,
                    "score": score,
                }
                for entry, score in results
            ]
        except Exception as e:
            logger.warning(f"Memory retrieval failed: {e}")
            return []

    def _get_session_history(self, session_id: str | None = None) -> list[dict[str, Any]]:
        """Get previous task summaries from the same session for context."""
        if not self._memory or not session_id:
            return []

        try:
            # Get recent entries from same session
            entries = self._memory.get_by_session(session_id, limit=5)
            # Filter to only task summaries
            summaries = [
                {
                    "id": entry.id,
                    "text": entry.text,
                    "kind": entry.kind,
                    "source": entry.source,
                    "weight": entry.weight,
                }
                for entry in entries
                if entry.kind in ("summary", "tool_result")
            ]
            return summaries
        except Exception as e:
            logger.warning(f"Session history retrieval failed: {e}")
            return []

    def _truncate_memory(
        self,
        memory_context: list[dict[str, Any]],
        budget: int,
    ) -> list[dict[str, Any]]:
        if not memory_context:
            return []

        estimated_per_entry = 50
        max_entries = max(budget // estimated_per_entry, 1)

        if len(memory_context) > max_entries:
            return memory_context[:max_entries]

        return memory_context

    def estimate_tokens(self, text: str) -> int:
        if not text:
            return 0
        return len(text.split()) * 4 // 3
@@ -1,148 +0,0 @@
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any, Literal
from uuid import uuid4

from pydantic import BaseModel, Field


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


class UserTask(BaseModel):
    task_id: str = Field(default_factory=lambda: str(uuid4()))
    session_id: str = Field(default_factory=lambda: str(uuid4()))
    input: str
    context: dict[str, Any] = Field(default_factory=dict)
    created_at: datetime = Field(default_factory=utc_now)


class PlanStep(BaseModel):
    id: str
    kind: Literal["tool", "coder", "memory", "respond"]
    tool: str | None = None
    args: dict[str, Any] = Field(default_factory=dict)
    description: str
    requires_confirmation: bool = False
    depends_on: list[str] = Field(default_factory=list)


class ToolCall(BaseModel):
    tool: str
    args: dict[str, Any] = Field(default_factory=dict)
    task_id: str
    step_id: str


class ToolResult(BaseModel):
    tool: str
    ok: bool
    output: Any = None
    error: str | None = None
    metadata: dict[str, Any] = Field(default_factory=dict)


class CoderRequest(BaseModel):
    mode: Literal["generate", "fix", "refactor"]
    instruction: str
    context: dict[str, Any] = Field(default_factory=dict)
    task_id: str


class CriticScore(BaseModel):
    correctness: float = Field(ge=0.0, le=1.0)
    usefulness: float = Field(ge=0.0, le=1.0)
    safety: float = Field(ge=0.0, le=1.0)
    memory_store: bool
    weight: float = Field(ge=0.0, le=1.0)
    explanation: str


class MemoryEntry(BaseModel):
    id: str = Field(default_factory=lambda: str(uuid4()))
    text: str
    kind: Literal["tool_result", "plan", "critique", "fact", "summary", "user_preference"]
    source: Literal["tool", "critic", "user", "system"]
    weight: float = Field(ge=0.0, le=1.0)
    task_id: str | None = None
    session_id: str | None = None
    metadata: dict[str, Any] = Field(default_factory=dict)
    created_at: datetime = Field(default_factory=utc_now)
    embedding_model: str
    embedding_dim: int


class PermissionDecision(BaseModel):
    action_type: str
    pattern: str
    decision: Literal["allow_once", "allow_always", "deny", "ask_always"]
    created_at: datetime = Field(default_factory=utc_now)


class RuntimeEvent(BaseModel):
    event_id: str = Field(default_factory=lambda: str(uuid4()))
    task_id: str
    session_id: str
    sequence: int
    type: str
    timestamp: datetime = Field(default_factory=utc_now)
    payload: dict[str, Any] = Field(default_factory=dict)
    causation_id: str | None = None
    correlation_id: str = Field(default_factory=lambda: str(uuid4()))


class TaskCheckpoint(BaseModel):
    task_id: str
    status: str
    active_step_id: str | None = None
    plan_snapshot: dict[str, Any] = Field(default_factory=dict)
    context_snapshot: dict[str, Any] = Field(default_factory=dict)
    updated_at: datetime = Field(default_factory=utc_now)


class PermissionRequest(BaseModel):
    task_id: str
    session_id: str
    action_type: str
    pattern: str
    command: str | None = None
    path: str | None = None
    requires_password: bool = False


class SecretRequest(BaseModel):
    task_id: str
    session_id: str
    kind: str
    prompt: str
    command: str | None = None


class PasswordRequest(BaseModel):
    task_id: str
    session_id: str
    command: str
    reason: str
    attempts: int = 0
    max_attempts: int = 3


class ExecutionDirective(BaseModel):
    type: Literal[
        "plan",
        "tool",
        "coder",
        "respond",
        "replan",
        "store_memory",
        "request_permission",
        "complete",
        "fail",
        "noop",
    ]
    payload: dict[str, Any] = Field(default_factory=dict)
    requires_permission: bool = False
    confidence: float = Field(ge=0.0, le=1.0, default=0.0)
    reason: str = ""
@@ -1,975 +0,0 @@
from __future__ import annotations

import asyncio
import json
import logging
from typing import Any

from app.core.contracts import (
    CriticScore,
    ExecutionDirective,
    PermissionDecision,
    PermissionRequest,
    RuntimeEvent,
    SecretRequest,
    ToolCall,
    ToolResult,
    UserTask,
)
from app.core.command_analyzer import CommandAnalyzer
from app.core.execution_scheduler import ExecutionScheduler
from app.events.event_bus import EventBus
from app.events.event_types import (
    CRITIC_CALLED,
    CRITIC_RESULT,
    PERMISSION_REQUESTED,
    PERMISSION_RESOLVED,
    PLAN_FAILED,
    PLAN_STARTED,
    SECRET_REQUESTED,
    STEP_STARTED,
    STEPPED_COMPLETED,
    TOOL_CALLED,
    TOOL_COMPLETED,
    TOOL_OUTPUT_CHUNK,
)
from app.models.async_adapters import AsyncCriticAdapter, AsyncCoderAdapter
from app.memory.write_policy import MemoryWritePolicy
from app.memory.interface import MemoryInterface

logger = logging.getLogger(__name__)


class ExecutionEngine:
    def __init__(
        self,
        event_bus: EventBus,
        tool_registry,
        permission_service,
        scheduler: ExecutionScheduler | None = None,
        critic: AsyncCriticAdapter | None = None,
        memory_policy: MemoryWritePolicy | None = None,
        memory_interface: MemoryInterface | None = None,
        prompts: dict[str, str] | None = None,
        recovery_limit: int = 1,
        critic_retry_limit: int = 2,
        command_analyzer: CommandAnalyzer | None = None,
    ) -> None:
        self._event_bus = event_bus
        self._tool_registry = tool_registry
        self._permission_service = permission_service
        self._scheduler = scheduler or ExecutionScheduler()
        self._critic = critic
        self._coder: AsyncCoderAdapter | None = None
        self._memory_policy = memory_policy
        self._memory_interface = memory_interface
        self._prompts = prompts or {}
        self._recovery_limit = recovery_limit
        self._critic_retry_limit = critic_retry_limit
        self._command_analyzer = command_analyzer

    def set_critic(self, critic: AsyncCriticAdapter) -> None:
        self._critic = critic

    def set_coder(self, coder: AsyncCoderAdapter) -> None:
        self._coder = coder

    def set_memory_policy(self, policy: MemoryWritePolicy) -> None:
        self._memory_policy = policy

    def execute(
        self,
        task: UserTask,
        directive: ExecutionDirective,
        permission_override: PermissionDecision | None = None,
        secret_override: str | None = None,
        password_override: str | None = None,
    ) -> dict[str, Any]:
        scheduled = self._scheduler.next_directive(directive)
        self._publish(task, STEP_STARTED, {"directive_type": scheduled.type})

        if scheduled.type == "plan":
            return self._execute_plan(
                task=task,
                directive=scheduled,
                permission_override=permission_override,
                secret_override=secret_override,
                password_override=password_override,
            )

        if scheduled.type == "tool":
            return self._execute_tool(
                task=task,
                directive=scheduled,
                permission_override=permission_override,
                secret_override=secret_override,
                password_override=password_override,
            )

        if scheduled.type == "respond":
            return {
                "status": "completed",
                "result": {
                    "message": scheduled.payload.get("text", f"Runtime accepted task: {task.input}"),
                    "mode": scheduled.payload.get("mode", "direct_response"),
                },
                "directive": scheduled.model_dump(mode="json"),
            }

        if scheduled.type == "coder":
            return self._execute_coder(
                task=task,
                directive=scheduled,
            )

        if scheduled.type == "fail":
            return {
                "status": "failed",
                "result": {"error": scheduled.reason or "Execution failed."},
            }

        return {
            "status": "completed",
            "result": {
                "message": "Directive accepted.",
                "directive_type": scheduled.type,
            },
        }

    def _execute_plan(
        self,
        task: UserTask,
        directive: ExecutionDirective,
        permission_override: PermissionDecision | None = None,
        secret_override: str | None = None,
        password_override: str | None = None,
    ) -> dict[str, Any]:
        # Unified format: {"type": "plan", "payload": {"steps": [...]}}
        # Need to extract steps from nested payload
        import json
|
|
||||||
|
|
||||||
payload = directive.payload
|
|
||||||
steps_data = []
|
|
||||||
|
|
||||||
# If payload has "steps" directly, use them
|
|
||||||
if isinstance(payload, dict) and "steps" in payload:
|
|
||||||
steps_data = payload.get("steps", [])
|
|
||||||
# If payload is a string (JSON), parse it
|
|
||||||
elif isinstance(payload, str) and payload.strip().startswith("{"):
|
|
||||||
try:
|
|
||||||
parsed = json.loads(payload)
|
|
||||||
steps_data = parsed.get("payload", {}).get("steps", [])
|
|
||||||
except (json.JSONDecodeError, AttributeError):
|
|
||||||
steps_data = []
|
|
||||||
|
|
||||||
if steps_data:
|
|
||||||
plan_json = json.dumps({"type": "plan", "payload": {"steps": steps_data}})
|
|
||||||
else:
|
|
||||||
plan_json = json.dumps(payload)
|
|
||||||
|
|
||||||
plan_steps = self._scheduler.parse_plan_steps(plan_json, task.task_id)
|
|
||||||
|
|
||||||
if not plan_steps:
|
|
||||||
return {
|
|
||||||
"status": "failed",
|
|
||||||
"result": {"error": "Failed to parse plan steps from directive"},
|
|
||||||
}
|
|
||||||
|
|
||||||
if not self._scheduler.validate_no_cycles(plan_steps):
|
|
||||||
self._publish(task, PLAN_FAILED, {"error": "Cycle detected in plan"})
|
|
||||||
return {
|
|
||||||
"status": "failed",
|
|
||||||
"result": {"error": "Cycle detected in plan"},
|
|
||||||
}
|
|
||||||
|
|
||||||
graph = self._scheduler.build_task_graph(plan_steps)
|
|
||||||
self._publish(task, PLAN_STARTED, {"steps": len(plan_steps)})
|
|
||||||
|
|
||||||
completed_steps: set[str] = set()
|
|
||||||
step_results: list[dict[str, Any]] = []
|
|
||||||
critic_retries_used = 0 # Track critic→replan cycles
|
|
||||||
|
|
||||||
ready_steps = self._get_ready_steps(graph, completed_steps)
|
|
||||||
|
|
||||||
while ready_steps:
|
|
||||||
step = ready_steps.pop(0)
|
|
||||||
|
|
||||||
# Handle respond kind directly without tool execution
|
|
||||||
if step.kind == "respond":
|
|
||||||
result = {
|
|
||||||
"status": "completed",
|
|
||||||
"result": {
|
|
||||||
"message": step.args.get("text", step.description),
|
|
||||||
},
|
|
||||||
}
|
|
||||||
else:
|
|
||||||
step_directive = ExecutionDirective(
|
|
||||||
type=step.kind,
|
|
||||||
payload={
|
|
||||||
"tool": step.tool,
|
|
||||||
"args": step.args,
|
|
||||||
},
|
|
||||||
requires_permission=step.requires_confirmation,
|
|
||||||
reason=step.description,
|
|
||||||
)
|
|
||||||
|
|
||||||
result = self._execute_tool(
|
|
||||||
task=task,
|
|
||||||
directive=step_directive,
|
|
||||||
permission_override=permission_override,
|
|
||||||
secret_override=secret_override,
|
|
||||||
password_override=password_override,
|
|
||||||
)
|
|
||||||
|
|
||||||
# If tool needs human input/review - return immediately.
|
|
||||||
if result.get("status") in (
|
|
||||||
"awaiting_permission",
|
|
||||||
"awaiting_input",
|
|
||||||
"awaiting_password",
|
|
||||||
"awaiting_review",
|
|
||||||
):
|
|
||||||
return {
|
|
||||||
"status": result.get("status"),
|
|
||||||
"result": result.get("result", {}),
|
|
||||||
"step_results": step_results,
|
|
||||||
}
|
|
||||||
|
|
||||||
step_results.append({
|
|
||||||
"step_id": step.id,
|
|
||||||
"result": result,
|
|
||||||
})
|
|
||||||
|
|
||||||
completed_steps.add(step.id)
|
|
||||||
self._publish(task, STEPPED_COMPLETED, {
|
|
||||||
"step_id": step.id,
|
|
||||||
"status": result.get("status"),
|
|
||||||
})
|
|
||||||
|
|
||||||
# === Critic evaluation ===
|
|
||||||
if self._critic and result.get("status") == "completed":
|
|
||||||
critic_score = self._evaluate_with_critic(task, step, result)
|
|
||||||
if critic_score:
|
|
||||||
result["critic_score"] = {
|
|
||||||
"correctness": critic_score.correctness,
|
|
||||||
"usefulness": critic_score.usefulness,
|
|
||||||
"safety": critic_score.safety,
|
|
||||||
"memory_store": critic_score.memory_store,
|
|
||||||
"weight": critic_score.weight,
|
|
||||||
"explanation": critic_score.explanation,
|
|
||||||
}
|
|
||||||
self._save_critique_to_memory(task, step, critic_score)
|
|
||||||
|
|
||||||
# Check if step result is satisfactory
|
|
||||||
min_correctness = 0.5
|
|
||||||
if critic_score.correctness < min_correctness:
|
|
||||||
# Step failed critic check — try to recover
|
|
||||||
if critic_retries_used < self._critic_retry_limit and step.kind != "respond":
|
|
||||||
critic_retries_used += 1
|
|
||||||
self._publish(task, CRITIC_RESULT, {
|
|
||||||
"step_id": step.id,
|
|
||||||
"score": critic_score.model_dump(mode="json"),
|
|
||||||
"action": "retry",
|
|
||||||
"retry": critic_retries_used,
|
|
||||||
})
|
|
||||||
# Retry the same step — rebuild directive
|
|
||||||
retry_directive = ExecutionDirective(
|
|
||||||
type=step.kind,
|
|
||||||
payload={"tool": step.tool, "args": step.args},
|
|
||||||
requires_permission=step.requires_confirmation,
|
|
||||||
reason=step.description,
|
|
||||||
)
|
|
||||||
retry_result = self._execute_tool(
|
|
||||||
task=task,
|
|
||||||
directive=retry_directive,
|
|
||||||
permission_override=permission_override,
|
|
||||||
secret_override=secret_override,
|
|
||||||
password_override=password_override,
|
|
||||||
)
|
|
||||||
if retry_result.get("status") == "completed":
|
|
||||||
result = retry_result
|
|
||||||
step_results[-1]["result"] = result
|
|
||||||
# Re-evaluate after retry
|
|
||||||
critic_score2 = self._evaluate_with_critic(task, step, result)
|
|
||||||
if critic_score2 and critic_score2.correctness >= min_correctness:
|
|
||||||
# Retry succeeded
|
|
||||||
pass  # Retry succeeded; fall through so ready_steps is refreshed at the end of the loop
|
|
||||||
# If retry also failed, continue to next step
|
|
||||||
else:
|
|
||||||
self._publish(task, CRITIC_RESULT, {
|
|
||||||
"step_id": step.id,
|
|
||||||
"score": critic_score.model_dump(mode="json"),
|
|
||||||
"action": "give_up",
|
|
||||||
"reason": f"Critic retry limit ({self._critic_retry_limit}) reached",
|
|
||||||
})
|
|
||||||
|
|
||||||
# Handle failed step
|
|
||||||
if result.get("status") == "failed":
|
|
||||||
review = self._build_failed_step_review(task, step, result)
|
|
||||||
if review:
|
|
||||||
return {
|
|
||||||
"status": "awaiting_review",
|
|
||||||
"result": {
|
|
||||||
"error": f"Step {step.id} requires review before replanning",
|
|
||||||
"failed_step": step.id,
|
|
||||||
"step_results": step_results,
|
|
||||||
"review": review,
|
|
||||||
},
|
|
||||||
}
|
|
||||||
recovery = self._recover_failed_step(
|
|
||||||
task=task,
|
|
||||||
step=step,
|
|
||||||
result=result,
|
|
||||||
step_results=step_results,
|
|
||||||
permission_override=permission_override,
|
|
||||||
secret_override=secret_override,
|
|
||||||
password_override=password_override,
|
|
||||||
)
|
|
||||||
if recovery.get("status") == "awaiting_permission":
|
|
||||||
return recovery
|
|
||||||
if recovery.get("status") == "completed":
|
|
||||||
recovered_result = recovery.get("result")
|
|
||||||
if recovered_result:
|
|
||||||
step_results[-1]["result"] = recovered_result
|
|
||||||
if recovery.get("finish"):
|
|
||||||
return {
|
|
||||||
"status": "completed",
|
|
||||||
"result": {
|
|
||||||
"message": recovery.get("message", "Recovered from failed step"),
|
|
||||||
"step_results": step_results,
|
|
||||||
},
|
|
||||||
}
|
|
||||||
else:
|
|
||||||
return {
|
|
||||||
"status": "failed",
|
|
||||||
"result": {
|
|
||||||
"error": f"Step {step.id} failed",
|
|
||||||
"failed_step": step.id,
|
|
||||||
"step_results": step_results,
|
|
||||||
"recovery": recovery.get("result"),
|
|
||||||
},
|
|
||||||
}
|
|
||||||
|
|
||||||
ready_steps = self._get_ready_steps(graph, completed_steps)
|
|
||||||
|
|
||||||
return {
|
|
||||||
"status": "completed",
|
|
||||||
"result": {
|
|
||||||
"message": f"Plan executed: {len(completed_steps)} steps completed",
|
|
||||||
"step_results": step_results,
|
|
||||||
},
|
|
||||||
}
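The plan path above rejects a plan when `validate_no_cycles` reports a cycle, but that scheduler method is not shown in this part of the diff. The following is only a sketch of one common way to do such a check (Kahn's algorithm), under the assumption that a plan can be reduced to a mapping from step id to the ids it depends on. The name `has_cycle` and the mapping shape are hypothetical, not the scheduler's actual API.

```python
from collections import deque


def has_cycle(depends_on: dict[str, list[str]]) -> bool:
    """Return True if the step -> dependencies mapping contains a cycle."""
    # Count unmet dependencies per step; references to unknown steps are ignored.
    indegree = {step: 0 for step in depends_on}
    dependents: dict[str, list[str]] = {step: [] for step in depends_on}
    for step, deps in depends_on.items():
        for dep in deps:
            if dep in depends_on:
                indegree[step] += 1
                dependents[dep].append(step)

    # Repeatedly "run" every step whose dependencies are all satisfied.
    queue = deque(step for step, count in indegree.items() if count == 0)
    visited = 0
    while queue:
        current = queue.popleft()
        visited += 1
        for nxt in dependents[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # A step that never becomes runnable sits on (or behind) a cycle.
    return visited != len(depends_on)
```

Running a check like this before building the task graph means the execution loop never has to deal with steps that can never become ready.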
|
|
||||||
|
|
||||||
def _build_failed_step_review(self, task: UserTask, step, result: dict[str, Any]) -> dict[str, Any] | None:
|
|
||||||
if step.tool != "shell_exec" or not self._command_analyzer:
|
|
||||||
return None
|
|
||||||
command = str((step.args or {}).get("command", ""))
|
|
||||||
if not command:
|
|
||||||
return None
|
|
||||||
diagnosis = self._command_analyzer.analyze(
|
|
||||||
command=command,
|
|
||||||
task_id=task.task_id,
|
|
||||||
session_id=task.session_id,
|
|
||||||
)
|
|
||||||
if diagnosis.get("type") == "ok":
|
|
||||||
return None
|
|
||||||
return {
|
|
||||||
"step_id": step.id,
|
|
||||||
"tool": step.tool,
|
|
||||||
"command": command,
|
|
||||||
"diagnosis": diagnosis,
|
|
||||||
"critic_assessment": {
|
|
||||||
"classification": "model_planning_error",
|
|
||||||
"needs_replan": True,
|
|
||||||
"explanation": "Structured command analysis found a model action error before recovery.",
|
|
||||||
},
|
|
||||||
}
|
|
||||||
|
|
||||||
def _recover_failed_step(
|
|
||||||
self,
|
|
||||||
task: UserTask,
|
|
||||||
step,
|
|
||||||
result: dict[str, Any],
|
|
||||||
step_results: list[dict[str, Any]],
|
|
||||||
permission_override: PermissionDecision | None = None,
|
|
||||||
secret_override: str | None = None,
|
|
||||||
password_override: str | None = None,
|
|
||||||
) -> dict[str, Any]:
|
|
||||||
if self._recovery_limit <= 0 or not self._critic:
|
|
||||||
return {"status": "failed", "result": {"reason": "recovery_unavailable"}}
|
|
||||||
|
|
||||||
decision = self._evaluate_recovery(task, step, result, step_results)
|
|
||||||
action = decision.get("action", "fail")
|
|
||||||
|
|
||||||
if action == "continue":
|
|
||||||
recovered = dict(result)
|
|
||||||
recovered["status"] = "completed"
|
|
||||||
recovered["recovery_decision"] = decision
|
|
||||||
return {"status": "completed", "result": recovered}
|
|
||||||
|
|
||||||
if action == "respond":
|
|
||||||
recovered = dict(result)
|
|
||||||
recovered["status"] = "completed"
|
|
||||||
recovered["recovery_decision"] = decision
|
|
||||||
return {
|
|
||||||
"status": "completed",
|
|
||||||
"result": recovered,
|
|
||||||
"finish": True,
|
|
||||||
"message": decision.get("message") or decision.get("reason") or "Recovered by responding to user",
|
|
||||||
}
|
|
||||||
|
|
||||||
if action == "retry":
|
|
||||||
retry_tool = decision.get("tool") or step.tool
|
|
||||||
retry_args = decision.get("args") or step.args
|
|
||||||
retry_result = self._execute_tool(
|
|
||||||
task=task,
|
|
||||||
directive=ExecutionDirective(
|
|
||||||
type="tool",
|
|
||||||
payload={"tool": retry_tool, "args": retry_args},
|
|
||||||
requires_permission=True,
|
|
||||||
reason=decision.get("reason", "Recovery retry"),
|
|
||||||
),
|
|
||||||
permission_override=permission_override,
|
|
||||||
secret_override=secret_override,
|
|
||||||
password_override=password_override,
|
|
||||||
)
|
|
||||||
if retry_result.get("status") == "awaiting_permission":
|
|
||||||
return retry_result
|
|
||||||
retry_result["recovery_decision"] = decision
|
|
||||||
if retry_result.get("status") == "completed":
|
|
||||||
return {"status": "completed", "result": retry_result}
|
|
||||||
return {"status": "failed", "result": {"decision": decision, "retry_result": retry_result}}
|
|
||||||
|
|
||||||
return {"status": "failed", "result": decision}
|
|
||||||
|
|
||||||
def _evaluate_recovery(
|
|
||||||
self,
|
|
||||||
task: UserTask,
|
|
||||||
step,
|
|
||||||
result: dict[str, Any],
|
|
||||||
step_results: list[dict[str, Any]],
|
|
||||||
) -> dict[str, Any]:
|
|
||||||
prompt = self._build_recovery_prompt(task, step, result, step_results)
|
|
||||||
self._publish(task, CRITIC_CALLED, {"step_id": step.id, "mode": "recovery"})
|
|
||||||
|
|
||||||
try:
|
|
||||||
output = asyncio.run(self._critic.generate(prompt, max_tokens=512))
|
|
||||||
decision = self._parse_recovery_decision(output)
|
|
||||||
self._publish(task, CRITIC_RESULT, {
|
|
||||||
"step_id": step.id,
|
|
||||||
"mode": "recovery",
|
|
||||||
"decision": decision,
|
|
||||||
"raw": output,
|
|
||||||
})
|
|
||||||
return decision
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning(f"Recovery evaluation failed: {e}")
|
|
||||||
self._publish(task, CRITIC_RESULT, {
|
|
||||||
"step_id": step.id,
|
|
||||||
"mode": "recovery",
|
|
||||||
"error": str(e),
|
|
||||||
})
|
|
||||||
return {"action": "fail", "reason": str(e)}
|
|
||||||
|
|
||||||
def _build_recovery_prompt(
|
|
||||||
self,
|
|
||||||
task: UserTask,
|
|
||||||
step,
|
|
||||||
result: dict[str, Any],
|
|
||||||
step_results: list[dict[str, Any]],
|
|
||||||
) -> str:
|
|
||||||
return f"""You are a recovery controller for an agent runtime.
|
|
||||||
|
|
||||||
Decide what to do after a failed tool step. A non-zero exit code is not always fatal.
|
|
||||||
Interpret the failure in context.
|
|
||||||
|
|
||||||
Allowed actions:
|
|
||||||
- continue: failure is acceptable information; continue the plan.
|
|
||||||
- retry: try one alternative tool call. Include "tool" and "args".
|
|
||||||
- respond: stop and answer the user with available information. Include "message".
|
|
||||||
- fail: real failure; stop the task.
|
|
||||||
|
|
||||||
Return ONLY JSON:
|
|
||||||
{{"action":"continue|retry|respond|fail","reason":"...","tool":"shell_exec","args":{{...}},"message":"..."}}
|
|
||||||
|
|
||||||
Task:
|
|
||||||
{task.input}
|
|
||||||
|
|
||||||
Failed step:
|
|
||||||
id={step.id}
|
|
||||||
tool={step.tool}
|
|
||||||
args={json.dumps(step.args, ensure_ascii=False)}
|
|
||||||
description={step.description}
|
|
||||||
|
|
||||||
Failed result:
|
|
||||||
{json.dumps(result, ensure_ascii=False, indent=2)}
|
|
||||||
|
|
||||||
Previous step results:
|
|
||||||
{json.dumps(step_results, ensure_ascii=False, indent=2)}
|
|
||||||
"""
|
|
||||||
|
|
||||||
def _parse_recovery_decision(self, output: str) -> dict[str, Any]:
|
|
||||||
try:
|
|
||||||
json_start = output.find("{")
|
|
||||||
json_end = output.rfind("}") + 1
|
|
||||||
if json_start < 0 or json_end <= 0:
|
|
||||||
return {"action": "fail", "reason": "Recovery output was not JSON"}
|
|
||||||
data = json.loads(output[json_start:json_end])
|
|
||||||
action = data.get("action", "fail")
|
|
||||||
if action not in {"continue", "retry", "respond", "fail"}:
|
|
||||||
action = "fail"
|
|
||||||
data["action"] = action
|
|
||||||
return data
|
|
||||||
except (json.JSONDecodeError, TypeError, ValueError) as e:
|
|
||||||
return {"action": "fail", "reason": f"Recovery JSON parse failed: {e}"}
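The parsing approach in `_parse_recovery_decision` (take the text between the first `{` and the last `}`, then whitelist the action) can be tried in isolation. The sketch below is a standalone restatement rather than the module's code: `parse_decision` and `ALLOWED_ACTIONS` are illustrative names, and the extra `end <= start` guard is an assumption added here to reject output where the braces appear in the wrong order.

```python
import json
from typing import Any

# A tuple (not a set) so that an unhashable "action" value cannot raise TypeError.
ALLOWED_ACTIONS = ("continue", "retry", "respond", "fail")


def parse_decision(output: str) -> dict[str, Any]:
    """Extract a JSON decision object from free-form model output."""
    start = output.find("{")
    end = output.rfind("}") + 1
    if start < 0 or end <= start:
        return {"action": "fail", "reason": "Recovery output was not JSON"}
    try:
        data = json.loads(output[start:end])
    except json.JSONDecodeError as exc:
        return {"action": "fail", "reason": f"Recovery JSON parse failed: {exc}"}
    # Anything outside the whitelist is treated as a hard failure.
    if data.get("action") not in ALLOWED_ACTIONS:
        data["action"] = "fail"
    return data
```

Defaulting to `fail` on every unexpected shape keeps a malformed model reply from turning into an accidental retry or continue.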
|
|
||||||
|
|
||||||
def _get_ready_steps(
|
|
||||||
self,
|
|
||||||
graph: dict[str, Any],
|
|
||||||
completed: set[str],
|
|
||||||
) -> list:
|
|
||||||
if not graph or not graph.get("nodes"):
|
|
||||||
return []
|
|
||||||
|
|
||||||
step_map: dict = graph.get("step_map", {})
|
|
||||||
ready = []
|
|
||||||
|
|
||||||
for node in graph["nodes"]:
|
|
||||||
node_id = node["id"]
|
|
||||||
if node_id in completed:
|
|
||||||
continue
|
|
||||||
|
|
||||||
deps = node.get("depends_on", [])
|
|
||||||
if all(dep in completed for dep in deps):
|
|
||||||
step = step_map.get(node_id)
|
|
||||||
if step:
|
|
||||||
ready.append(step)
|
|
||||||
|
|
||||||
return ready
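The interaction between `_get_ready_steps` and the plan loop is easier to see without the engine around it. This is a minimal sketch, assuming the graph is reduced to a plain mapping from step id to dependency ids. `ready_steps` and `run_order` are hypothetical helpers, and "executing" a step is reduced to recording its id.

```python
def ready_steps(depends_on: dict[str, list[str]], completed: set[str]) -> list[str]:
    """Steps that are not done yet and whose dependencies are all done."""
    return [
        step
        for step, deps in depends_on.items()
        if step not in completed and all(dep in completed for dep in deps)
    ]


def run_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Simulate the plan loop and return the order in which steps would run."""
    completed: set[str] = set()
    order: list[str] = []
    ready = ready_steps(depends_on, completed)
    while ready:
        step = ready.pop(0)
        order.append(step)  # stand-in for actually executing the step
        completed.add(step)
        # Recompute after every step so newly unblocked steps are picked up.
        ready = ready_steps(depends_on, completed)
    return order
```

The recomputation at the end of each iteration is what lets dependent steps start. Skipping it, for example with an early `continue`, would end the loop as soon as the initially ready steps run out.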
|
|
||||||
|
|
||||||
def _evaluate_with_critic(
|
|
||||||
self,
|
|
||||||
task: UserTask,
|
|
||||||
step,
|
|
||||||
result: dict[str, Any],
|
|
||||||
) -> CriticScore | None:
|
|
||||||
if not self._critic:
|
|
||||||
return None
|
|
||||||
|
|
||||||
critic_prompt = self._build_critic_prompt(step, result)
|
|
||||||
|
|
||||||
self._publish(task, CRITIC_CALLED, {"step_id": step.id})
|
|
||||||
|
|
||||||
try:
|
|
||||||
critic_output = asyncio.run(self._critic.generate(critic_prompt))
|
|
||||||
score = self._parse_critic_score(critic_output)
|
|
||||||
|
|
||||||
self._publish(task, CRITIC_RESULT, {
|
|
||||||
"step_id": step.id,
|
|
||||||
"score": score.model_dump(mode="json") if score else None,
|
|
||||||
})
|
|
||||||
|
|
||||||
if score:
|
|
||||||
result["critic_score"] = {
|
|
||||||
"correctness": score.correctness,
|
|
||||||
"usefulness": score.usefulness,
|
|
||||||
"safety": score.safety,
|
|
||||||
"memory_store": score.memory_store,
|
|
||||||
"weight": score.weight,
|
|
||||||
"explanation": score.explanation,
|
|
||||||
}
|
|
||||||
|
|
||||||
return score
|
|
||||||
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning(f"Critic evaluation failed: {e}")
|
|
||||||
self._publish(task, CRITIC_RESULT, {
|
|
||||||
"step_id": step.id,
|
|
||||||
"error": str(e),
|
|
||||||
})
|
|
||||||
return None
|
|
||||||
|
|
||||||
def _save_critique_to_memory(
|
|
||||||
self,
|
|
||||||
task: UserTask,
|
|
||||||
step,
|
|
||||||
score: CriticScore,
|
|
||||||
) -> None:
|
|
||||||
"""Save critic evaluation as critique entry in memory, using MemoryWritePolicy."""
|
|
||||||
if not self._memory_interface:
|
|
||||||
return
|
|
||||||
|
|
||||||
try:
|
|
||||||
# Check with policy before saving
|
|
||||||
if self._memory_policy:
|
|
||||||
decision = self._memory_policy.decide(
|
|
||||||
critic_score=score,
|
|
||||||
memory_type="critique",
|
|
||||||
session_id=task.session_id,
|
|
||||||
)
|
|
||||||
if decision == "skip":
|
|
||||||
logger.info(f"MemoryWritePolicy skipped critique for {step.tool}")
|
|
||||||
return
|
|
||||||
# For "store_with_weight", we could adjust weight, but critic score already has weight
|
|
||||||
|
|
||||||
tool_name = step.tool
|
|
||||||
tool_args = step.args or {}
|
|
||||||
args_str = ", ".join([f"{k}={v}" for k, v in tool_args.items()])
|
|
||||||
|
|
||||||
critique_text = f"Tool: {tool_name}({args_str}) | Task: {task.input[:100]} | Scores: correctness={score.correctness}, usefulness={score.usefulness}, safety={score.safety} | {score.explanation}"
|
|
||||||
|
|
||||||
metadata = {
|
|
||||||
"task_input": task.input,
|
|
||||||
"tool": tool_name,
|
|
||||||
"args": tool_args,
|
|
||||||
"step_id": step.id,
|
|
||||||
"scores": {
|
|
||||||
"correctness": score.correctness,
|
|
||||||
"usefulness": score.usefulness,
|
|
||||||
"safety": score.safety,
|
|
||||||
},
|
|
||||||
}
|
|
||||||
|
|
||||||
self._memory_interface.insert(
|
|
||||||
text=critique_text,
|
|
||||||
kind="critique",
|
|
||||||
source="critic",
|
|
||||||
task_id=task.task_id,
|
|
||||||
session_id=task.session_id,
|
|
||||||
weight=score.weight,
|
|
||||||
metadata=metadata,
|
|
||||||
)
|
|
||||||
logger.info(f"Saved critique to memory: {tool_name} task_id={task.task_id}")
|
|
||||||
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning(f"Failed to save critique to memory: {e}")
|
|
||||||
|
|
||||||
def _build_critic_prompt(self, step, result: dict[str, Any]) -> str:
|
|
||||||
base_prompt = self._prompts.get("critic", "")
|
|
||||||
tool_result = result.get("result", {})
|
|
||||||
|
|
||||||
# Truncate long outputs to avoid exceeding context window
|
|
||||||
# Keep output under ~2000 chars to leave room for prompt + generation
|
|
||||||
output = tool_result.get("output", "")
|
|
||||||
if isinstance(output, str) and len(output) > 2000:
|
|
||||||
output = output[:2000] + "\n... [truncated]"
|
|
||||||
elif not isinstance(output, str):
|
|
||||||
output_str = json.dumps(output, ensure_ascii=False)
|
|
||||||
if len(output_str) > 2000:
|
|
||||||
output = output_str[:2000] + "\n... [truncated]"
|
|
||||||
else:
|
|
||||||
output = output_str
|
|
||||||
|
|
||||||
# Build a compact result representation
|
|
||||||
compact_result = {
|
|
||||||
"ok": tool_result.get("ok"),
|
|
||||||
"output": output,
|
|
||||||
"error": tool_result.get("error"),
|
|
||||||
"exit_code": tool_result.get("metadata", {}).get("exit_code"),
|
|
||||||
}
|
|
||||||
|
|
||||||
return f"""{base_prompt}
|
|
||||||
|
|
||||||
Step: {step.description}
|
|
||||||
Tool: {step.tool}
|
|
||||||
Args: {step.args}
|
|
||||||
|
|
||||||
Result:
|
|
||||||
{json.dumps(compact_result, indent=2, ensure_ascii=False)}
|
|
||||||
|
|
||||||
Evaluate and respond with JSON:
|
|
||||||
{{"correctness": 0.0-1.0, "usefulness": 0.0-1.0, "safety": 0.0-1.0, "memory_store": true|false, "weight": 0.0-1.0, "explanation": "..."}}"""
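The truncation logic in `_build_critic_prompt` handles string and non-string output in two separate branches. A possible way to fold them into one helper is sketched below. `compact_output` is a hypothetical name, the 2000-character limit mirrors the value used above, and `default=str` is an added assumption so that values `json.dumps` cannot serialize natively do not raise.

```python
import json
from typing import Any


def compact_output(output: Any, limit: int = 2000) -> str:
    """Render tool output as text and cap its length for use in a prompt."""
    if isinstance(output, str):
        text = output
    else:
        # Non-string output is serialized first so the limit applies to the final text.
        text = json.dumps(output, ensure_ascii=False, default=str)
    if len(text) > limit:
        return text[:limit] + "\n... [truncated]"
    return text
```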
|
|
||||||
|
|
||||||
def _parse_critic_score(self, output: str) -> CriticScore | None:
|
|
||||||
try:
|
|
||||||
json_start = output.find("{")
|
|
||||||
json_end = output.rfind("}") + 1
|
|
||||||
if json_start < 0:
|
|
||||||
return None
|
|
||||||
|
|
||||||
json_str = output[json_start:json_end]
|
|
||||||
data = json.loads(json_str)
|
|
||||||
|
|
||||||
return CriticScore(
|
|
||||||
correctness=data.get("correctness", 0.5),
|
|
||||||
usefulness=data.get("usefulness", 0.5),
|
|
||||||
safety=data.get("safety", 1.0),
|
|
||||||
memory_store=data.get("memory_store", False),
|
|
||||||
weight=data.get("weight", 0.5),
|
|
||||||
explanation=data.get("explanation", ""),
|
|
||||||
)
|
|
||||||
|
|
||||||
except (json.JSONDecodeError, ValueError, TypeError) as e:
|
|
||||||
logger.warning(f"Critic score parsing failed: {e}")
|
|
||||||
return None
|
|
||||||
|
|
||||||
def _execute_coder(
|
|
||||||
self,
|
|
||||||
task: UserTask,
|
|
||||||
directive: ExecutionDirective,
|
|
||||||
) -> dict[str, Any]:
|
|
||||||
if not self._coder:
|
|
||||||
return {"status": "failed", "result": {"error": "Coder model not available"}}
|
|
||||||
|
|
||||||
coder_task = directive.payload.get("task", "")
|
|
||||||
if not coder_task:
|
|
||||||
return {"status": "failed", "result": {"error": "Missing task for coder"}}
|
|
||||||
|
|
||||||
try:
|
|
||||||
output = asyncio.run(self._coder.generate(coder_task))
|
|
||||||
|
|
||||||
return {
|
|
||||||
"status": "completed",
|
|
||||||
"result": {"code": output},
|
|
||||||
}
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning(f"Coder execution failed: {e}")
|
|
||||||
return {"status": "failed", "result": {"error": str(e)}}
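The critic and coder adapters are invoked through `asyncio.run(...)` from synchronous methods. `asyncio.run` raises `RuntimeError` when the calling thread already has a running event loop, so this only works if the engine is never called from async code. The sketch below shows one hedged workaround. `run_sync` is a hypothetical helper, not part of this module, and it blocks the caller (including any running loop on that thread) until the coroutine finishes.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(coro)
    # A loop is already running here: use a fresh loop in a worker thread instead.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

If the engine is meant to be driven from an async runtime, making the engine methods themselves `async` would likely be the cleaner design. The helper is only a stopgap for mixed call sites.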
|
|
||||||
|
|
||||||
def _execute_tool(
|
|
||||||
self,
|
|
||||||
task: UserTask,
|
|
||||||
directive: ExecutionDirective,
|
|
||||||
permission_override: PermissionDecision | None = None,
|
|
||||||
secret_override: str | None = None,
|
|
||||||
password_override: str | None = None,
|
|
||||||
) -> dict[str, Any]:
|
|
||||||
tool_name = str(directive.payload.get("tool", "")).strip()
|
|
||||||
tool_args = dict(directive.payload.get("args", {}))
|
|
||||||
|
|
||||||
if password_override:
|
|
||||||
tool_args["password"] = password_override
|
|
||||||
|
|
||||||
if not tool_name:
|
|
||||||
return {"status": "failed", "result": {"error": "Missing tool name"}}
|
|
||||||
|
|
||||||
# Tool-first: validate tool exists in registry
|
|
||||||
available_tools = self._tool_registry.list_names()
|
|
||||||
if tool_name not in available_tools:
|
|
||||||
return {"status": "failed", "result": {"error": f"Unknown tool: {tool_name}. Available tools: {available_tools}"}}
|
|
||||||
|
|
||||||
permission_result = None
|
|
||||||
|
|
||||||
# If permission_override is provided, skip permission check
|
|
||||||
if permission_override is not None:
|
|
||||||
permission_result = {
|
|
||||||
"decision": permission_override.decision,
|
|
||||||
"command": tool_args.get("command", ""),
|
|
||||||
"cached": True,
|
|
||||||
}
|
|
||||||
# Check permission for shell_exec and file_write
|
|
||||||
elif tool_name == "shell_exec":
|
|
||||||
permission_result = self._permission_service.check_shell_command(
|
|
||||||
task_id=task.task_id,
|
|
||||||
session_id=task.session_id,
|
|
||||||
command=str(tool_args.get("command", "")),
|
|
||||||
)
|
|
||||||
elif tool_name == "file_write":
|
|
||||||
# Allow writing to runtime data directory without permission check
|
|
||||||
write_path = str(tool_args.get("path", ""))
|
|
||||||
if "allowed_commands.json" in write_path or "/data/runtime" in write_path:
|
|
||||||
# Internal system write - allow without permission
|
|
||||||
permission_result = {"decision": "allowed", "path": write_path}
|
|
||||||
else:
|
|
||||||
permission_result = self._permission_service.check_write_path(
|
|
||||||
task_id=task.task_id,
|
|
||||||
session_id=task.session_id,
|
|
||||||
path=write_path,
|
|
||||||
)
|
|
||||||
|
|
||||||
# Handle permission result
|
|
||||||
if permission_result:
|
|
||||||
decision = permission_result.get("decision", "unknown")
|
|
||||||
|
|
||||||
# Hard stop - deny execution
|
|
||||||
if decision == "hard_stop":
|
|
||||||
self._publish(task, PERMISSION_REQUESTED, permission_result)
|
|
||||||
return {
|
|
||||||
"status": "failed",
|
|
||||||
"result": {
|
|
||||||
"error": f"Command blocked: {permission_result.get('reason', 'Hard stop command')}",
|
|
||||||
"command": permission_result.get("command", ""),
|
|
||||||
},
|
|
||||||
}
|
|
||||||
|
|
||||||
# Cached - already allowed
|
|
||||||
if decision in ("allowed_always", "allowed") or (permission_result.get("cached") and decision != "deny"):
|
|
||||||
self._publish(task, PERMISSION_RESOLVED, permission_result)
|
|
||||||
|
|
||||||
# Need user confirmation - return immediately, don't continue execution
|
|
||||||
elif decision == "prompt":
|
|
||||||
self._publish(task, PERMISSION_REQUESTED, permission_result)
|
|
||||||
return {
|
|
||||||
"status": "awaiting_permission",
|
|
||||||
"result": {
|
|
||||||
"error": "Permission required before execution.",
|
|
||||||
"permission_request": permission_result,
|
|
||||||
},
|
|
||||||
}
|
|
||||||
|
|
||||||
# Deny - return immediately
|
|
||||||
elif decision == "deny":
|
|
||||||
self._publish(task, PERMISSION_RESOLVED, permission_result)
|
|
||||||
return {
|
|
||||||
"status": "failed",
|
|
||||||
"result": {
|
|
||||||
"error": "Permission denied",
|
|
||||||
"command": permission_result.get("command", ""),
|
|
||||||
},
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
if tool_name == "shell_exec":
|
|
||||||
command = str(tool_args.get("command", ""))
|
|
||||||
|
|
||||||
# Determine if sudo password is needed:
|
|
||||||
# 1. Command explicitly starts with "sudo"
|
|
||||||
# 2. Command is a known sudo-requiring command (apt, systemctl, etc.) — flagged by permission service
|
|
||||||
needs_password = command.startswith("sudo ") or (permission_result is not None and permission_result.get("requires_sudo", False))
|
|
||||||
|
|
||||||
if needs_password and secret_override is None:
|
|
||||||
secret_request = SecretRequest(
|
|
||||||
task_id=task.task_id,
|
|
||||||
session_id=task.session_id,
|
|
||||||
kind="sudo_password",
|
|
||||||
prompt="Sudo password required",
|
|
||||||
command=command,
|
|
||||||
)
|
|
||||||
self._publish(task, SECRET_REQUESTED, secret_request.model_dump(mode="json"))
|
|
||||||
return {
|
|
||||||
"status": "awaiting_input",
|
|
||||||
"result": {
|
|
||||||
"error": "Secret required",
|
|
||||||
"secret_request": secret_request.model_dump(mode="json"),
|
|
||||||
},
|
|
||||||
}
|
|
||||||
if needs_password and secret_override is not None:
|
|
||||||
# Inject sudo -S for explicit sudo commands, or prepend sudo -S for implicit ones
|
|
||||||
if command.startswith("sudo "):
|
|
||||||
tool_args["command"] = f"sudo -S -p '' {command[len('sudo '):]}"
|
|
||||||
else:
|
|
||||||
tool_args["command"] = f"sudo -S -p '' {command}"
|
|
||||||
tool_args["stdin_secret"] = f"{secret_override}\n"
|
|
||||||
|
|
||||||
tool_call = ToolCall(
|
|
||||||
tool=tool_name,
|
|
||||||
args=tool_args,
|
|
||||||
task_id=task.task_id,
|
|
||||||
step_id="step-1",
|
|
||||||
)
|
|
||||||
self._publish(task, TOOL_CALLED, tool_call.model_dump(mode="json"))
|
|
||||||
if tool_name == "shell_exec":
|
|
||||||
tool_args["__output_callback"] = lambda stream, chunk: self._publish(
|
|
||||||
task,
|
|
||||||
TOOL_OUTPUT_CHUNK,
|
|
||||||
{
|
|
||||||
"tool": tool_name,
|
|
||||||
"step_id": "step-1",
|
|
||||||
"stream": stream,
|
|
||||||
"chunk": chunk,
|
|
||||||
},
|
|
||||||
)
|
|
||||||
tool_result = self._tool_registry.get(tool_name).execute(task=task, args=tool_args)
|
|
||||||
self._publish(task, TOOL_COMPLETED, tool_result.model_dump(mode="json"))
|
|
||||||
|
|
||||||
metadata = tool_result.metadata or {}
|
|
||||||
needs_sudo = metadata.get("needs_sudo", False)
|
|
||||||
sudo_auth_failed = metadata.get("sudo_auth_failed", False) or self._looks_like_sudo_auth_failure(tool_result)
|
|
||||||
|
|
||||||
if tool_name == "shell_exec" and not tool_result.ok and sudo_auth_failed:
|
|
||||||
original_command = str(directive.payload.get("args", {}).get("command", tool_args.get("command", "")))
|
|
||||||
secret_request = SecretRequest(
|
|
||||||
task_id=task.task_id,
|
|
||||||
session_id=task.session_id,
|
|
||||||
kind="sudo_password",
|
|
||||||
prompt="Sudo password incorrect. Try again",
|
|
||||||
command=original_command,
|
|
||||||
)
|
|
||||||
self._publish(task, SECRET_REQUESTED, secret_request.model_dump(mode="json"))
|
|
||||||
return {
|
|
||||||
"status": "awaiting_input",
|
|
||||||
"result": {
|
|
||||||
"error": "Sudo password failed",
|
|
||||||
"secret_request": secret_request.model_dump(mode="json"),
|
|
||||||
"attempt_failed": True,
|
|
||||||
"tool_result": tool_result.model_dump(mode="json"),
|
|
||||||
},
|
|
||||||
}
|
|
||||||
|
|
||||||
if not tool_result.ok and needs_sudo:
|
|
||||||
return {
|
|
||||||
"status": "awaiting_password",
|
|
||||||
"result": {
|
|
||||||
"task_id": task.task_id,
|
|
||||||
"needs_sudo": True,
|
|
||||||
"command": tool_args.get("command", ""),
|
|
||||||
"error": tool_result.error or "Permission denied",
|
|
||||||
"tool_result": tool_result.model_dump(mode="json"),
|
|
||||||
},
|
|
||||||
}
|
|
||||||
|
|
||||||
if tool_name == "shell_exec" and not tool_result.ok and self._command_analyzer:
|
|
||||||
original_command = str(directive.payload.get("args", {}).get("command", tool_args.get("command", "")))
|
|
||||||
diagnosis = self._command_analyzer.analyze(
|
|
||||||
command=original_command,
|
|
||||||
task_id=task.task_id,
|
|
||||||
session_id=task.session_id,
|
|
||||||
)
|
|
||||||
if diagnosis.get("type") != "ok":
|
|
||||||
return {
|
|
||||||
"status": "awaiting_review",
|
|
||||||
"result": {
|
|
||||||
"error": "Tool action requires review before replanning",
|
|
||||||
"review": {
|
|
||||||
"step_id": "step-1",
|
|
||||||
"tool": tool_name,
|
|
||||||
"command": original_command,
|
|
||||||
"diagnosis": diagnosis,
|
|
||||||
"critic_assessment": {
|
|
||||||
"classification": "model_planning_error",
|
|
||||||
"needs_replan": True,
|
|
||||||
"explanation": "Structured command analysis found a model action error before recovery.",
|
|
||||||
},
|
|
||||||
},
|
|
||||||
"tool_result": tool_result.model_dump(mode="json"),
|
|
||||||
},
|
|
||||||
}
|
|
||||||
|
|
||||||
return {
|
|
||||||
"status": "completed" if tool_result.ok else "failed",
|
|
||||||
"result": tool_result.model_dump(mode="json"),
|
|
||||||
}
|
|
||||||
|
|
||||||
def _looks_like_sudo_auth_failure(self, tool_result: ToolResult) -> bool:
|
|
||||||
output = f"{tool_result.output or ''}\n{tool_result.error or ''}".lower()
|
|
||||||
return any(
|
|
||||||
marker in output
|
|
||||||
for marker in (
|
|
||||||
"incorrect password",
|
|
||||||
"incorrect password attempt",
|
|
||||||
"sudo: no password was provided",
|
|
||||||
"sorry, try again",
|
|
||||||
"authentication failure",
|
|
||||||
)
|
|
||||||
)
|
|
||||||
|
|
||||||
def _publish(self, task: UserTask, event_type: str, payload: dict[str, Any]) -> None:
|
|
||||||
if not self._event_bus:
|
|
||||||
return
|
|
||||||
event = RuntimeEvent(
|
|
||||||
task_id=task.task_id,
|
|
||||||
session_id=task.session_id,
|
|
||||||
sequence=self._event_bus.next_sequence(task.task_id),
|
|
||||||
type=event_type,
|
|
||||||
payload=payload,
|
|
||||||
)
|
|
||||||
self._event_bus.publish(event)
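
The sudo-failure check in `_looks_like_sudo_auth_failure` is a case-insensitive substring scan over the tool's combined output and error text. A minimal, self-contained sketch of that idea follows. `FakeToolResult` and `SUDO_FAILURE_MARKERS` are hypothetical names introduced only for illustration and are not part of the repository.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical stand-in for the ToolResult contract; only the fields
# read by the check are modelled here.
@dataclass
class FakeToolResult:
    ok: bool
    output: str | None = None
    error: str | None = None


# Phrases taken from the marker list in the diff. "incorrect password"
# already covers "incorrect password attempt", so it is listed once.
SUDO_FAILURE_MARKERS = (
    "incorrect password",
    "sudo: no password was provided",
    "sorry, try again",
    "authentication failure",
)


def looks_like_sudo_auth_failure(result: FakeToolResult) -> bool:
    """Return True if output or error text contains a known sudo failure phrase."""
    text = f"{result.output or ''}\n{result.error or ''}".lower()
    return any(marker in text for marker in SUDO_FAILURE_MARKERS)
```

Because the match is on substrings, any tool output that happens to contain one of these phrases is classified as an authentication failure, which is why the calling code also checks `tool_name == "shell_exec"` and `not tool_result.ok`.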
|
|
||||||
|
|
@ -1,212 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import json
|
|
||||||
import logging
|
|
||||||
from collections import deque
|
|
||||||
from typing import Any
|
|
||||||
|
|
||||||
from app.core.contracts import ExecutionDirective, PlanStep
|
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
|
||||||
|
|
||||||
|
|
||||||
class ExecutionScheduler:
|
|
||||||
def __init__(self, retry_limit: int = 2) -> None:
|
|
||||||
self._retry_limit = retry_limit
|
|
||||||
|
|
||||||
def parse_plan_steps(
|
|
||||||
self,
|
|
||||||
json_str: str,
|
|
||||||
task_id: str | None = None,
|
|
||||||
) -> list[PlanStep]:
|
|
||||||
try:
|
|
||||||
json_start = json_str.find("{")
|
|
||||||
json_end = json_str.rfind("}") + 1
|
|
||||||
if json_start < 0:
|
|
||||||
return []
|
|
||||||
|
|
||||||
json_str = json_str[json_start:json_end]
|
|
||||||
data = json.loads(json_str)
|
|
||||||
|
|
||||||
# Unified format: {"type": "plan", "payload": {"steps": [...]}}
|
|
||||||
# or direct: {"type": "step", "payload": {"tool": "...", "args": {...}}}
|
|
||||||
if isinstance(data, dict):
|
|
||||||
msg_type = data.get("type", "")
|
|
||||||
|
|
||||||
# Single step format: {"type": "step", "payload": {"tool": ..., "args": ...}}
|
|
||||||
if msg_type == "step":
|
|
||||||
payload = data.get("payload", {})
|
|
||||||
step = {
|
|
||||||
"id": "step-0",
|
|
||||||
"kind": "tool",
|
|
||||||
"tool": payload.get("tool"),
|
|
||||||
"args": payload.get("args", {}),
|
|
||||||
"description": payload.get("description", ""),
|
|
||||||
"depends_on": payload.get("depends_on", []),
|
|
||||||
}
|
|
||||||
data = [step]
|
|
||||||
|
|
||||||
# Plan format: {"type": "plan", "payload": {"steps": [...]}}
|
|
||||||
elif msg_type == "plan":
|
|
||||||
payload = data.get("payload", {})
|
|
||||||
steps_data = payload.get("steps", [])
|
|
||||||
|
|
||||||
# Normalize steps: handle {"type": "step", "payload": {"tool": ...}}
|
|
||||||
normalized = []
|
|
||||||
for step in steps_data:
|
|
||||||
if isinstance(step, dict) and step.get("type") == "step":
|
|
||||||
inner = step.get("payload", {})
|
|
||||||
normalized.append({
|
|
||||||
"tool": inner.get("tool"),
|
|
||||||
"args": inner.get("args", {}),
|
|
||||||
"description": inner.get("description", ""),
|
|
||||||
"depends_on": inner.get("depends_on", []),
|
|
||||||
})
|
|
||||||
else:
|
|
||||||
normalized.append(step)
|
|
||||||
steps_data = normalized
|
|
||||||
|
|
||||||
data = steps_data if steps_data else []
|
|
||||||
|
|
||||||
# Old format compatibility
|
|
||||||
elif "steps" in data:
|
|
||||||
data = data["steps"]
|
|
||||||
elif "plan" in data:
|
|
||||||
data = data["plan"]
|
|
||||||
else:
|
|
||||||
data = [data]
|
|
||||||
elif isinstance(data, str):
|
|
||||||
data = json.loads(data)
|
|
||||||
if isinstance(data, dict):
|
|
||||||
data = [data]
|
|
||||||
|
|
||||||
steps = []
|
|
||||||
for i, step_data in enumerate(data):
|
|
||||||
if isinstance(step_data, str):
|
|
||||||
step_data = {"id": f"step-{i}", "kind": "respond", "text": step_data}
|
|
||||||
|
|
||||||
if not isinstance(step_data, dict):
|
|
||||||
continue
|
|
||||||
|
|
||||||
step_data.setdefault("id", f"step-{i}")
|
|
||||||
|
|
||||||
# Tool-first: the scheduler receives the tool directly, without transformations
# kind is determined by the presence of a tool name
# args are passed through as-is
|
|
||||||
if step_data.get("tool"):
|
|
||||||
step_data["kind"] = "tool"
|
|
||||||
|
|
||||||
step_data.setdefault("kind", "respond")
step_data.setdefault("tool", None)
step_data.setdefault("args", {})
step_data.setdefault("description", "")
step_data.setdefault("requires_confirmation", False)
step_data.setdefault("depends_on", [])
|
|
||||||
|
|
||||||
steps.append(PlanStep(**step_data))
|
|
||||||
|
|
||||||
return steps
|
|
||||||
|
|
||||||
except (json.JSONDecodeError, ValueError, TypeError) as e:
|
|
||||||
logger.warning("Plan parsing failed: %s", e)
|
|
||||||
return []
|
|
||||||
|
|
||||||
def validate_no_cycles(self, steps: list[PlanStep]) -> bool:
|
|
||||||
if not steps:
|
|
||||||
return True
|
|
||||||
|
|
||||||
graph: dict[str, set[str]] = {}
|
|
||||||
for step in steps:
|
|
||||||
graph[step.id] = set(step.depends_on)
|
|
||||||
|
|
||||||
visited: set[str] = set()
|
|
||||||
rec_stack: set[str] = set()
|
|
||||||
|
|
||||||
def has_cycle(node: str) -> bool:
|
|
||||||
if node in rec_stack:
|
|
||||||
return True
|
|
||||||
if node in visited:
|
|
||||||
return False
|
|
||||||
|
|
||||||
visited.add(node)
|
|
||||||
rec_stack.add(node)
|
|
||||||
|
|
||||||
for dep in graph.get(node, []):
|
|
||||||
if has_cycle(dep):
|
|
||||||
return True
|
|
||||||
|
|
||||||
rec_stack.remove(node)
|
|
||||||
return False
|
|
||||||
|
|
||||||
for step in steps:
|
|
||||||
if step.id not in visited:
|
|
||||||
if has_cycle(step.id):
|
|
||||||
logger.warning("Cycle detected in plan: %s", step.id)
|
|
||||||
return False
|
|
||||||
|
|
||||||
return True
|
|
||||||
|
|
||||||
def build_task_graph(
|
|
||||||
self,
|
|
||||||
steps: list[PlanStep],
|
|
||||||
) -> dict[str, Any]:
|
|
||||||
if not steps:
|
|
||||||
return {"nodes": [], "edges": []}
|
|
||||||
|
|
||||||
if not self.validate_no_cycles(steps):
|
|
||||||
return {"nodes": [], "edges": [], "error": "Cycle detected in plan"}
|
|
||||||
|
|
||||||
nodes = []
|
|
||||||
edges = []
|
|
||||||
|
|
||||||
step_map = {s.id: s for s in steps}
|
|
||||||
|
|
||||||
for step in steps:
|
|
||||||
nodes.append({
|
|
||||||
"id": step.id,
|
|
||||||
"kind": step.kind,
|
|
||||||
"tool": step.tool,
|
|
||||||
"args": step.args,
|
|
||||||
"ready": len(step.depends_on) == 0,
|
|
||||||
})
|
|
||||||
|
|
||||||
for dep_id in step.depends_on:
|
|
||||||
edges.append({
|
|
||||||
"from": dep_id,
|
|
||||||
"to": step.id,
|
|
||||||
})
|
|
||||||
|
|
||||||
return {"nodes": nodes, "edges": edges, "step_map": step_map}
|
|
||||||
|
|
||||||
def get_ready_steps(
|
|
||||||
self,
|
|
||||||
graph: dict[str, Any],
|
|
||||||
completed: set[str],
|
|
||||||
) -> list[PlanStep]:
|
|
||||||
if not graph or not graph.get("nodes"):
|
|
||||||
return []
|
|
||||||
|
|
||||||
step_map: dict[str, PlanStep] = graph.get("step_map", {})
|
|
||||||
ready = []
|
|
||||||
|
|
||||||
for node in graph["nodes"]:
|
|
||||||
node_id = node["id"]
|
|
||||||
if node_id in completed:
|
|
||||||
continue
|
|
||||||
|
|
||||||
deps = step_map[node_id].depends_on if node_id in step_map else []  # graph nodes carry no "depends_on" key
|
|
||||||
if all(dep in completed for dep in deps):
|
|
||||||
step = step_map.get(node_id)
|
|
||||||
if step:
|
|
||||||
ready.append(step)
|
|
||||||
|
|
||||||
return ready
|
|
||||||
|
|
||||||
def next_directive(
|
|
||||||
self,
|
|
||||||
directive: ExecutionDirective,
|
|
||||||
) -> ExecutionDirective:
|
|
||||||
return directive
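
The ready-step selection in `get_ready_steps` reduces to one rule: a step is ready when it has not completed and every id in its `depends_on` list has. The sketch below shows that rule in isolation, reading dependencies from the step objects themselves. `Step` is a hypothetical minimal stand-in for `PlanStep`, and `ready_steps` is an illustrative helper, not the repository's method.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical minimal stand-in for PlanStep: just an id and its dependencies.
@dataclass
class Step:
    id: str
    depends_on: list[str] = field(default_factory=list)


def ready_steps(steps: list[Step], completed: set[str]) -> list[Step]:
    """Return steps that are not completed and whose dependencies all are."""
    return [
        step
        for step in steps
        if step.id not in completed
        and all(dep in completed for dep in step.depends_on)
    ]


# A three-step chain: b needs a, c needs both a and b.
plan = [Step("a"), Step("b", ["a"]), Step("c", ["a", "b"])]
```

Calling `ready_steps(plan, set())` yields only step `a`. Once `a` is in the completed set, `b` becomes ready, and `c` follows after both.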
|
|
||||||
|
|
@ -1,104 +0,0 @@
|
||||||
from __future__ import annotations

import re
from typing import Any

from app.core.contracts import ExecutionDirective

SHELL_PREFIXES = (
    "run ",
    "execute ",
    "launch ",
    "запусти ",
    "выполни ",
    "выполнить ",
)

MEMORY_STORE_PATTERNS = (
    r"запомни\s+(.+)",
    r"сохрани\s+(.+)",
    r"запиши\s+(.+)",
    r"remember\s+(.+)",
    r"save\s+(.+)",
)

MEMORY_SEARCH_PATTERNS = (
    r"вспомни\s+(.+)",
    r"search memory\s+(.+)",
)


class IntentParser:
    """Extracts explicit tool intents from natural-language task text."""

    def __init__(self) -> None:
        self._store_patterns = [re.compile(p, re.IGNORECASE) for p in MEMORY_STORE_PATTERNS]
        self._search_patterns = [re.compile(p, re.IGNORECASE) for p in MEMORY_SEARCH_PATTERNS]

    def parse(self, task_input: str) -> ExecutionDirective | None:
        normalized = task_input.strip()
        lowered = normalized.lower()

        if matched := self._match_patterns(self._store_patterns, normalized):
            return ExecutionDirective(
                type="tool",
                payload={
                    "tool": "memory_insert",
                    "args": {
                        "text": matched.group(1).strip(),
                        "kind": "fact",
                        "source": "user",
                    },
                },
                requires_permission=False,
                confidence=0.85,
                reason="User explicitly requested to store in memory.",
            )

        if matched := self._match_patterns(self._search_patterns, normalized):
            return ExecutionDirective(
                type="tool",
                payload={
                    "tool": "memory_search",
                    "args": {"query": matched.group(1).strip()},
                },
                requires_permission=False,
                confidence=0.85,
                reason="User explicitly requested to search memory.",
            )

        for prefix in SHELL_PREFIXES:
            if lowered.startswith(prefix):
                command = normalized[len(prefix) :].strip()
                if command:
                    return ExecutionDirective(
                        type="tool",
                        payload={
                            "tool": "shell_exec",
                            "args": {"command": command},
                        },
                        requires_permission=True,
                        confidence=0.92,
                        reason="Natural-language task explicitly requested shell execution.",
                    )

        quoted = re.match(r"^`(.+)`$", normalized)
        if quoted:
            return ExecutionDirective(
                type="tool",
                payload={
                    "tool": "shell_exec",
                    "args": {"command": quoted.group(1)},
                },
                requires_permission=True,
                confidence=0.75,
                reason="Backticked input treated as direct shell command.",
            )

        return None

    def _match_patterns(self, patterns: list[re.Pattern], text: str):
        for pattern in patterns:
            if match := pattern.match(text):
                return match
        return None
|
|
||||||
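
The shell-intent branch of `IntentParser.parse` matches the prefix against a lowercased copy of the input but slices the command out of the original text, so the command keeps its case. A reduced sketch of just that branch follows. `extract_shell_command` is a hypothetical helper name, and only the English prefixes are included to keep the example short.

```python
from __future__ import annotations

# Subset of the prefix list from the diff (English entries only).
SHELL_PREFIXES = ("run ", "execute ", "launch ")


def extract_shell_command(task_input: str) -> str | None:
    """Return the command following a known prefix, or None if no prefix matches."""
    normalized = task_input.strip()
    lowered = normalized.lower()
    for prefix in SHELL_PREFIXES:
        # Compare case-insensitively, but slice the original text so the
        # command itself is returned with its case untouched.
        if lowered.startswith(prefix):
            command = normalized[len(prefix):].strip()
            if command:
                return command
    return None
```

For example, `extract_shell_command("Run ls -la")` returns `"ls -la"`, while a question such as `"what is ls?"` returns `None` and would fall through to the later branches of the parser.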
|
|
@ -1,24 +0,0 @@
|
||||||
from __future__ import annotations

from pydantic import BaseModel


class PermissionResolutionRequest(BaseModel):
    task_id: str
    decision: str


class SecretResolutionRequest(BaseModel):
    task_id: str
    secret: str


class PasswordResolutionRequest(BaseModel):
    task_id: str
    password: str


class ReviewResolutionRequest(BaseModel):
    task_id: str
    decision: str
    correction: str | None = None
|
|
||||||
|
|
@ -1,370 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import hashlib
|
|
||||||
import json
|
|
||||||
import logging
|
|
||||||
import os
|
|
||||||
import re
|
|
||||||
import shlex
|
|
||||||
from pathlib import Path
|
|
||||||
from typing import Any
|
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
|
||||||
|
|
||||||
|
|
||||||
class PermissionService:
|
|
||||||
"""Permission-first model - user is the authority."""
|
|
||||||
|
|
||||||
def __init__(self, config: dict[str, Any] | None = None, cache_file: Path | None = None):
|
|
||||||
self._config = config or self._load_config()
|
|
||||||
self._settings = self._config.get("settings", {})
|
|
||||||
self._cache_file = cache_file
|
|
||||||
self._categories = self._config.get("command_categories", {})
|
|
||||||
self._path_settings = self._config.get("path_settings", {})
|
|
||||||
self._legacy_dangerous_commands = self._config.get("dangerous_commands", {})
|
|
||||||
self._legacy_sensitive_paths = self._config.get("sensitive_paths", [])
|
|
||||||
|
|
||||||
def _load_config(self) -> dict[str, Any]:
|
|
||||||
try:
|
|
||||||
config_path = Path(__file__).parents[2] / "config" / "permissions.json"
|
|
||||||
with open(config_path) as f:
|
|
||||||
return json.load(f)
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning("Failed to load permissions config: %s", e)
|
|
||||||
return {"settings": {}, "command_categories": {}}
|
|
||||||
|
|
||||||
def _get_cache_file(self) -> Path:
|
|
||||||
if self._cache_file:
|
|
||||||
return self._cache_file
|
|
||||||
|
|
||||||
base_dir = Path(__file__).parents[2]
|
|
||||||
cache_relative = self._settings.get("cache_file", "data/runtime/allowed_commands.json")
|
|
||||||
return base_dir / cache_relative
|
|
||||||
|
|
||||||
def _load_cache(self) -> dict[str, Any]:
|
|
||||||
cache_file = self._get_cache_file()
|
|
||||||
try:
|
|
||||||
if cache_file.exists():
|
|
||||||
with open(cache_file) as f:
|
|
||||||
return json.load(f)
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning("Failed to load cache: %s", e)
|
|
||||||
|
|
||||||
return {"allowed_once": {}, "allowed_always": {}}
|
|
||||||
|
|
||||||
def _save_cache(self, cache: dict[str, Any]) -> None:
|
|
||||||
cache_file = self._get_cache_file()
|
|
||||||
cache_file.parent.mkdir(parents=True, exist_ok=True)
|
|
||||||
with open(cache_file, "w") as f:
|
|
||||||
json.dump(cache, f, indent=2)
|
|
||||||
|
|
||||||
def check_shell_command(
|
|
||||||
self,
|
|
||||||
task_id: str,
|
|
||||||
session_id: str,
|
|
||||||
command: str,
|
|
||||||
) -> dict[str, Any]:
|
|
||||||
"""Check if shell command requires permission."""
|
|
||||||
normalized = self._normalize_command(command)
|
|
||||||
command_hash = self._hash_command(normalized)
|
|
||||||
|
|
||||||
cache = self._load_cache()
|
|
||||||
|
|
||||||
# Check cache first
|
|
||||||
if command_hash in cache.get("allowed_always", {}):
|
|
||||||
return {
|
|
||||||
"decision": "allowed_always",
|
|
||||||
"command": normalized,
|
|
||||||
"cached": True,
|
|
||||||
"requires_sudo": _requires_sudo(normalized),
|
|
||||||
}
|
|
||||||
|
|
||||||
if command_hash in cache.get("allowed_once", {}):
|
|
||||||
cached = cache["allowed_once"][command_hash]
|
|
||||||
if cached.get("task_id") == task_id:
|
|
||||||
return {
|
|
||||||
"decision": "allowed_once",
|
|
||||||
"command": normalized,
|
|
||||||
"cached": True,
|
|
||||||
"requires_sudo": _requires_sudo(normalized),
|
|
||||||
}
|
|
||||||
|
|
||||||
# Check hard stop
|
|
||||||
if self._is_hard_stop(normalized):
|
|
||||||
return {
|
|
||||||
"decision": "hard_stop",
|
|
||||||
"command": normalized,
|
|
||||||
"reason": "Hard stop command - execution denied",
|
|
||||||
}
|
|
||||||
|
|
||||||
if not self._categories and self._legacy_dangerous_commands:
|
|
||||||
if self._matches_legacy_dangerous(normalized):
|
|
||||||
return {
|
|
||||||
"decision": "prompt",
|
|
||||||
"command": normalized,
|
|
||||||
"category": "legacy_dangerous",
|
|
||||||
"allow_always": False,
|
|
||||||
"task_id": task_id,
|
|
||||||
"session_id": session_id,
|
|
||||||
}
|
|
||||||
return {
|
|
||||||
"decision": "allowed",
|
|
||||||
"command": normalized,
|
|
||||||
"category": "legacy_safe",
|
|
||||||
"task_id": task_id,
|
|
||||||
"session_id": session_id,
|
|
||||||
}
|
|
||||||
|
|
||||||
# Check no_always category
|
|
||||||
category = self._get_category(normalized)
|
|
||||||
can_always = self._categories.get(category, {}).get("allow_always", True)
|
|
||||||
|
|
||||||
# Check if command requires sudo (e.g. apt, systemctl without explicit sudo prefix)
|
|
||||||
requires_sudo = _requires_sudo(normalized)
|
|
||||||
|
|
||||||
# Need user confirmation
|
|
||||||
result = {
|
|
||||||
"decision": "prompt",
|
|
||||||
"command": normalized,
|
|
||||||
"category": category,
|
|
||||||
"allow_always": can_always,
|
|
||||||
"requires_sudo": requires_sudo,
|
|
||||||
"task_id": task_id,
|
|
||||||
"session_id": session_id,
|
|
||||||
}
|
|
||||||
return result
|
|
||||||
|
|
||||||
def check_write_path(
|
|
||||||
self,
|
|
||||||
task_id: str,
|
|
||||||
session_id: str,
|
|
||||||
path: str,
|
|
||||||
) -> dict[str, Any]:
|
|
||||||
"""Check if write path requires permission."""
|
|
||||||
if not self._path_settings and self._legacy_sensitive_paths:
|
|
||||||
if any(path.startswith(sensitive) for sensitive in self._legacy_sensitive_paths):
|
|
||||||
return {
|
|
||||||
"decision": "prompt",
|
|
||||||
"path": path,
|
|
||||||
"task_id": task_id,
|
|
||||||
"session_id": session_id,
|
|
||||||
}
|
|
||||||
return {"decision": "allowed", "path": path}
|
|
||||||
|
|
||||||
allow_write_paths = self._path_settings.get("allow_write_paths", [])
|
|
||||||
|
|
||||||
# Check if path is in allowed list
|
|
||||||
for allowed in allow_write_paths:
|
|
||||||
if path.startswith(allowed):
|
|
||||||
return {"decision": "allowed", "path": path}
|
|
||||||
|
|
||||||
# Otherwise require permission
|
|
||||||
return {
|
|
||||||
"decision": "prompt",
|
|
||||||
"path": path,
|
|
||||||
"task_id": task_id,
|
|
||||||
"session_id": session_id,
|
|
||||||
}
|
|
||||||
|
|
||||||
def resolve_permission(
|
|
||||||
self,
|
|
||||||
task_id: str,
|
|
||||||
session_id: str,
|
|
||||||
command: str,
|
|
||||||
decision: str,
|
|
||||||
) -> dict[str, Any]:
|
|
||||||
"""Resolve permission decision from user."""
|
|
||||||
normalized = self._normalize_command(command)
|
|
||||||
command_hash = self._hash_command(normalized)
|
|
||||||
|
|
||||||
cache = self._load_cache()
|
|
||||||
|
|
||||||
if decision == "allow_once":
|
|
||||||
cache.setdefault("allowed_once", {})[command_hash] = {
|
|
||||||
"command": normalized,
|
|
||||||
"task_id": task_id,
|
|
||||||
"session_id": session_id,
|
|
||||||
}
|
|
||||||
self._save_cache(cache)
|
|
||||||
return {"status": "allowed_once", "command": normalized}
|
|
||||||
|
|
||||||
elif decision == "allow_always":
|
|
||||||
cache.setdefault("allowed_always", {})[command_hash] = {
|
|
||||||
"command": normalized,
|
|
||||||
"task_id": task_id,
|
|
||||||
"session_id": session_id,
|
|
||||||
}
|
|
||||||
self._save_cache(cache)
|
|
||||||
return {"status": "allowed_always", "command": normalized}
|
|
||||||
|
|
||||||
elif decision == "deny":
|
|
||||||
return {"status": "denied", "command": normalized}
|
|
||||||
|
|
||||||
return {"status": "unknown", "decision": decision}
|
|
||||||
|
|
||||||
def clear_cache(self) -> dict[str, Any]:
|
|
||||||
"""Clear permission cache."""
|
|
||||||
cache = {"allowed_once": {}, "allowed_always": {}}
|
|
||||||
self._save_cache(cache)
|
|
||||||
return {"status": "cache_cleared"}
|
|
||||||
|
|
||||||
def _normalize_command(self, command: str) -> str:
|
|
||||||
"""Normalize command for consistent hashing."""
|
|
||||||
if not self._settings.get("normalize_commands", True):
|
|
||||||
return command.strip()
|
|
||||||
|
|
||||||
normalized = command.strip()
|
|
||||||
|
|
||||||
# Split chained commands if enabled
|
|
||||||
if self._settings.get("split_chained", True):
|
|
||||||
# Replace ; and || with && for splitting
|
|
||||||
normalized = normalized.replace(";", " && ")
|
|
||||||
normalized = normalized.replace("||", " && ")
|
|
||||||
|
|
||||||
# Resolve environment variables
|
|
||||||
try:
|
|
||||||
normalized = os.path.expandvars(normalized)
|
|
||||||
except Exception:
|
|
||||||
pass
|
|
||||||
|
|
||||||
# Resolve home directory
|
|
||||||
normalized = normalized.replace("~", os.path.expanduser("~"))
|
|
||||||
|
|
||||||
# Remove extra whitespace
|
|
||||||
normalized = " ".join(normalized.split())
|
|
||||||
|
|
||||||
return normalized
|
|
||||||
|
|
||||||
def _hash_command(self, command: str) -> str:
|
|
||||||
"""Generate hash for command."""
|
|
||||||
return hashlib.sha256(command.encode()).hexdigest()[:16]
|
|
||||||
|
|
||||||
def _matches_legacy_dangerous(self, command: str) -> bool:
|
|
||||||
cmd_lower = command.lower()
|
|
||||||
for pattern in self._legacy_dangerous_commands:
|
|
||||||
if pattern.lower() in cmd_lower:
|
|
||||||
return True
|
|
||||||
return False
|
|
||||||
|
|
||||||
def _is_hard_stop(self, command: str) -> bool:
|
|
||||||
"""Check if command is hard stop."""
|
|
||||||
hard_stop_commands = self._categories.get("hard_stop", {}).get("commands", [])
|
|
||||||
|
|
||||||
cmd_lower = command.lower().strip()
|
|
||||||
cmd_tokens = cmd_lower.split()
|
|
||||||
|
|
||||||
for hs in hard_stop_commands:
|
|
||||||
hs_lower = hs.lower().strip()
|
|
||||||
# For "rm -rf /" and "rm -rf /*", only match exact command
|
|
||||||
# Don't match "rm -rf /tmp/nonexistent" as hard stop
|
|
||||||
if hs_lower in ("rm -rf /", "rm -rf /*"):
|
|
||||||
if cmd_lower == hs_lower:
|
|
||||||
return True
|
|
||||||
continue
|
|
||||||
# For other patterns, use substring match
|
|
||||||
if hs_lower in cmd_lower:
|
|
||||||
return True
|
|
||||||
|
|
||||||
return False
|
|
||||||
|
|
||||||
def _get_category(self, command: str) -> str:
|
|
||||||
"""Get command category."""
|
|
||||||
cmd_lower = command.lower().strip()
|
|
||||||
cmd_first_word = cmd_lower.split()[0] if cmd_lower.split() else ""
|
|
||||||
|
|
||||||
# Check no_always category — match by first word or known multi-word prefixes
|
|
||||||
no_always = self._categories.get("no_always", {}).get("commands", [])
|
|
||||||
for pattern in no_always:
|
|
||||||
pat_lower = pattern.lower().strip()
|
|
||||||
# Match if first word matches (e.g. "apt" matches "apt list --upgradable")
|
|
||||||
# or if command starts with the pattern (e.g. "systemctl stop" matches "systemctl stop nginx")
|
|
||||||
if cmd_first_word == pat_lower or cmd_lower.startswith(pat_lower + " "):
|
|
||||||
return "no_always"
|
|
||||||
|
|
||||||
# Check hard_stop by first word
|
|
||||||
hard_stop = self._categories.get("hard_stop", {}).get("commands", [])
|
|
||||||
for pattern in hard_stop:
|
|
||||||
pat_lower = pattern.lower().strip()
|
|
||||||
if cmd_first_word == pat_lower or cmd_lower.startswith(pat_lower + " "):
|
|
||||||
return "hard_stop"
|
|
||||||
|
|
||||||
# Default to normal
|
|
||||||
return "normal"
|
|
||||||
|
|
||||||
|
|
||||||
SUDO_COMMANDS = {
|
|
||||||
"sudo",
|
|
||||||
"apt", "apt-get", "dpkg", "yum", "dnf", "pacman", "zypper",
|
|
||||||
"systemctl", "service", "mount", "umount",
|
|
||||||
"shutdown", "reboot", "halt", "poweroff",
|
|
||||||
"useradd", "usermod", "userdel", "groupadd", "groupmod",
|
|
||||||
"chmod", "chown", "chgrp",
|
|
||||||
"iptables", "ufw",
|
|
||||||
"kill", "killall", "pkill",
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
def _requires_sudo(command: str) -> bool:
|
|
||||||
"""Check if command requires sudo."""
|
|
||||||
if not command:
|
|
||||||
return False
|
|
||||||
cmd_lower = command.lower().strip()
|
|
||||||
first_word = cmd_lower.split()[0] if cmd_lower.split() else ""
|
|
||||||
return first_word in SUDO_COMMANDS
|
|
||||||
|
|
||||||
|
|
||||||
class PermissionRequest:
|
|
||||||
"""Permission request to user."""
|
|
||||||
|
|
||||||
def __init__(
|
|
||||||
self,
|
|
||||||
task_id: str,
|
|
||||||
session_id: str,
|
|
||||||
command: str,
|
|
||||||
category: str = "normal",
|
|
||||||
allow_always: bool = True,
|
|
||||||
) -> None:
|
|
||||||
self.task_id = task_id
|
|
||||||
self.session_id = session_id
|
|
||||||
self.command = command
|
|
||||||
self.category = category
|
|
||||||
self.allow_always = allow_always
|
|
||||||
self.requires_password = _requires_sudo(command)
|
|
||||||
|
|
||||||
def to_dict(self) -> dict[str, Any]:
|
|
||||||
return {
|
|
||||||
"task_id": self.task_id,
|
|
||||||
"session_id": self.session_id,
|
|
||||||
"command": self.command,
|
|
||||||
"category": self.category,
|
|
||||||
"allow_always": self.allow_always,
|
|
||||||
"requires_password": self.requires_password,
|
|
||||||
"buttons": self._get_buttons(),
|
|
||||||
}
|
|
||||||
|
|
||||||
def _get_buttons(self) -> list[dict[str, str]]:
|
|
||||||
buttons = [{"action": "deny", "label": "Запретить"}]
|
|
||||||
|
|
||||||
if self.allow_always:
|
|
||||||
buttons.insert(0, {"action": "allow_always", "label": "Разрешить навсегда"})
|
|
||||||
|
|
||||||
if self.requires_password:
|
|
||||||
buttons.insert(0, {"action": "allow_with_password", "label": "Разрешить с паролем"})
|
|
||||||
else:
|
|
||||||
buttons.insert(0, {"action": "allow_once", "label": "Разрешить"})
|
|
||||||
|
|
||||||
return buttons
|
|
||||||
|
|
||||||
|
|
||||||
class PermissionDecision:
|
|
||||||
"""Permission decision."""
|
|
||||||
|
|
||||||
def __init__(
|
|
||||||
self,
|
|
||||||
decision: str,
|
|
||||||
command: str | None = None,
|
|
||||||
cached: bool = False,
|
|
||||||
) -> None:
|
|
||||||
self.decision = decision
|
|
||||||
self.command = command
|
|
||||||
self.cached = cached
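
The permission cache in `PermissionService` is keyed by a short hash of the normalized command, so two spellings of the same command share one cache entry. The sketch below illustrates that keying under a simplifying assumption: `normalize_command` here only collapses whitespace, whereas the original also rewrites `;` and `||`, expands environment variables, and expands `~`. Both function names are hypothetical.

```python
import hashlib


def normalize_command(command: str) -> str:
    """Collapse runs of whitespace so equivalent spellings compare equal."""
    return " ".join(command.strip().split())


def command_key(command: str) -> str:
    """Return the first 16 hex characters of the SHA-256 of the normalized command."""
    normalized = normalize_command(command)
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]
```

Truncating the digest to 16 hex characters keeps cache files compact at the cost of a smaller hash space. That trade-off seems reasonable for a per-user allow list, but it is a design choice rather than a requirement.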
|
|
||||||
|
|
@ -1,2 +0,0 @@
|
||||||
"""Event bus and event store."""
|
|
||||||
|
|
||||||
|
|
@ -1,35 +0,0 @@
|
||||||
from __future__ import annotations

from typing import Callable

from app.core.contracts import RuntimeEvent
from app.events.event_store import SQLiteEventStore


Subscriber = Callable[[RuntimeEvent], None]


class EventBus:
    """Per-task ordered event publishing with durable storage."""

    def __init__(self, event_store: SQLiteEventStore) -> None:
        self._store = event_store
        self._subscribers: list[Subscriber] = []

    def next_sequence(self, task_id: str) -> int:
        return self._store.get_latest_sequence(task_id) + 1

    def publish(self, event: RuntimeEvent) -> RuntimeEvent:
        self._store.append(event)
        for subscriber in self._subscribers:
            subscriber(event)
        return event

    def subscribe(self, subscriber: Subscriber) -> None:
        self._subscribers.append(subscriber)

    def list_for_task(self, task_id: str) -> list[RuntimeEvent]:
        return self._store.list_for_task(task_id)

    def list_recent(self, limit: int = 500) -> list[RuntimeEvent]:
        return self._store.list_recent(limit=limit)
|
|
||||||
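
`EventBus` follows a store-then-notify pattern: each event is persisted first, then handed to every subscriber, and sequence numbers are derived per task from what is already stored. The sketch below reproduces that flow with an in-memory list in place of `SQLiteEventStore`. `Event` and `InMemoryBus` are hypothetical names used only for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-in for RuntimeEvent with only the fields used here.
@dataclass
class Event:
    task_id: str
    sequence: int
    type: str


class InMemoryBus:
    """Store-then-notify bus backed by a plain list instead of SQLite."""

    def __init__(self) -> None:
        self._events: list[Event] = []
        self._subscribers: list[Callable[[Event], None]] = []

    def next_sequence(self, task_id: str) -> int:
        # Highest stored sequence for this task, or 0 when none exists, plus one.
        latest = max(
            (e.sequence for e in self._events if e.task_id == task_id),
            default=0,
        )
        return latest + 1

    def subscribe(self, subscriber: Callable[[Event], None]) -> None:
        self._subscribers.append(subscriber)

    def publish(self, event: Event) -> Event:
        self._events.append(event)  # persist first
        for subscriber in self._subscribers:
            subscriber(event)  # then notify
        return event


bus = InMemoryBus()
seen: list[Event] = []
bus.subscribe(seen.append)
bus.publish(Event("t1", bus.next_sequence("t1"), "task_received"))
bus.publish(Event("t1", bus.next_sequence("t1"), "task_completed"))
```

Computing the next sequence and publishing are two separate calls, so two concurrent publishers for the same task could pick the same number. The sketch does not guard against that.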
|
|
@ -1,122 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import json
|
|
||||||
import sqlite3
|
|
||||||
from pathlib import Path
|
|
||||||
|
|
||||||
from app.core.contracts import RuntimeEvent
|
|
||||||
|
|
||||||
|
|
||||||
class SQLiteEventStore:
|
|
||||||
"""Append-only event store with per-task ordered history."""
|
|
||||||
|
|
||||||
def __init__(self, db_path: str | Path) -> None:
|
|
||||||
self._db_path = Path(db_path)
|
|
||||||
self._db_path.parent.mkdir(parents=True, exist_ok=True)
|
|
||||||
self._initialize()
|
|
||||||
|
|
||||||
def append(self, event: RuntimeEvent) -> None:
|
|
||||||
with sqlite3.connect(self._db_path) as conn:
|
|
||||||
conn.execute(
|
|
||||||
"""
|
|
||||||
INSERT INTO events (
|
|
||||||
event_id, task_id, session_id, sequence, type, timestamp,
|
|
||||||
payload_json, causation_id, correlation_id
|
|
||||||
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
|
|
||||||
""",
|
|
||||||
(
|
|
||||||
event.event_id,
|
|
||||||
event.task_id,
|
|
||||||
event.session_id,
|
|
||||||
event.sequence,
|
|
||||||
event.type,
|
|
||||||
event.timestamp.isoformat(),
|
|
||||||
json.dumps(event.payload),
|
|
||||||
event.causation_id,
|
|
||||||
event.correlation_id,
|
|
||||||
),
|
|
||||||
)
|
|
||||||
conn.commit()
|
|
||||||
|
|
||||||
def list_for_task(self, task_id: str) -> list[RuntimeEvent]:
|
|
||||||
with sqlite3.connect(self._db_path) as conn:
|
|
||||||
rows = conn.execute(
|
|
||||||
"""
|
|
||||||
SELECT event_id, task_id, session_id, sequence, type, timestamp,
|
|
||||||
payload_json, causation_id, correlation_id
|
|
||||||
FROM events
|
|
||||||
WHERE task_id = ?
|
|
||||||
ORDER BY sequence ASC
|
|
||||||
""",
|
|
||||||
(task_id,),
|
|
||||||
).fetchall()
|
|
||||||
return [
|
|
||||||
RuntimeEvent(
|
|
||||||
event_id=row[0],
|
|
||||||
task_id=row[1],
|
|
||||||
session_id=row[2],
|
|
||||||
sequence=row[3],
|
|
||||||
type=row[4],
|
|
||||||
timestamp=row[5],
|
|
||||||
payload=json.loads(row[6]),
|
|
||||||
causation_id=row[7],
|
|
||||||
correlation_id=row[8],
|
|
||||||
)
|
|
||||||
for row in rows
|
|
||||||
]
|
|
||||||
|
|
||||||
def list_recent(self, limit: int = 500) -> list[RuntimeEvent]:
|
|
||||||
with sqlite3.connect(self._db_path) as conn:
|
|
||||||
rows = conn.execute(
|
|
||||||
"""
|
|
||||||
SELECT event_id, task_id, session_id, sequence, type, timestamp,
|
|
||||||
payload_json, causation_id, correlation_id
|
|
||||||
FROM events
|
|
||||||
ORDER BY timestamp DESC, task_id DESC, sequence DESC
|
|
||||||
LIMIT ?
|
|
||||||
""",
|
|
||||||
(limit,),
|
|
||||||
).fetchall()
|
|
||||||
events = [
|
|
||||||
RuntimeEvent(
|
|
||||||
event_id=row[0],
|
|
||||||
task_id=row[1],
|
|
||||||
session_id=row[2],
|
|
||||||
sequence=row[3],
|
|
||||||
type=row[4],
|
|
||||||
timestamp=row[5],
|
|
||||||
payload=json.loads(row[6]),
|
|
||||||
causation_id=row[7],
|
|
||||||
correlation_id=row[8],
|
|
||||||
)
|
|
||||||
for row in rows
|
|
||||||
]
|
|
||||||
return list(reversed(events))
|
|
||||||
|
|
||||||
def get_latest_sequence(self, task_id: str) -> int:
|
|
||||||
with sqlite3.connect(self._db_path) as conn:
|
|
||||||
row = conn.execute(
|
|
||||||
"SELECT COALESCE(MAX(sequence), 0) FROM events WHERE task_id = ?",
|
|
||||||
(task_id,),
|
|
||||||
).fetchone()
|
|
||||||
return int(row[0]) if row else 0
|
|
||||||
|
|
||||||
def _initialize(self) -> None:
|
|
||||||
with sqlite3.connect(self._db_path) as conn:
|
|
||||||
conn.execute(
|
|
||||||
"""
|
|
||||||
CREATE TABLE IF NOT EXISTS events (
|
|
||||||
event_id TEXT PRIMARY KEY,
|
|
||||||
task_id TEXT NOT NULL,
|
|
||||||
session_id TEXT NOT NULL,
|
|
||||||
sequence INTEGER NOT NULL,
|
|
||||||
type TEXT NOT NULL,
|
|
||||||
timestamp TEXT NOT NULL,
|
|
||||||
payload_json TEXT NOT NULL,
|
|
||||||
causation_id TEXT,
|
|
||||||
correlation_id TEXT NOT NULL,
|
|
||||||
UNIQUE(task_id, sequence)
|
|
||||||
)
|
|
||||||
"""
|
|
||||||
)
|
|
||||||
conn.commit()
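
The `UNIQUE(task_id, sequence)` constraint in the schema is what enforces per-task ordering: a second insert with the same pair is rejected by SQLite. The sketch below demonstrates this with an in-memory database and a reduced three-column table. The function names and the reduced schema are assumptions made for illustration only.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a reduced version of the events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE events (
            task_id TEXT NOT NULL,
            sequence INTEGER NOT NULL,
            type TEXT NOT NULL,
            UNIQUE(task_id, sequence)
        )
        """
    )
    return conn


def append(conn: sqlite3.Connection, task_id: str, sequence: int, type_: str) -> bool:
    """Insert one event; return False if (task_id, sequence) already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO events VALUES (?, ?, ?)",
                (task_id, sequence, type_),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def latest_sequence(conn: sqlite3.Connection, task_id: str) -> int:
    """Return the highest stored sequence for a task, or 0 if it has no events."""
    row = conn.execute(
        "SELECT COALESCE(MAX(sequence), 0) FROM events WHERE task_id = ?",
        (task_id,),
    ).fetchone()
    return int(row[0])


conn = open_store()
```

Note that using a `sqlite3.Connection` as a context manager only commits or rolls back the transaction. It does not close the connection, so long-running code that opens a connection per call may want to close it explicitly.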
|
|
||||||
|
|
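The removed event store depends on two things visible in its SQL: `COALESCE(MAX(sequence), 0)` to find the latest sequence for a task, and `UNIQUE(task_id, sequence)` to reject duplicates. The following is a minimal sketch of how those two pieces interact, assuming an in-memory database and a reduced three-column schema. The `append_event` helper is hypothetical and does not appear in the diff.

```python
import sqlite3


def latest_sequence(conn: sqlite3.Connection, task_id: str) -> int:
    """Return the highest stored sequence for a task, or 0 when none exist."""
    row = conn.execute(
        "SELECT COALESCE(MAX(sequence), 0) FROM events WHERE task_id = ?",
        (task_id,),
    ).fetchone()
    return int(row[0])


def append_event(conn: sqlite3.Connection, event_id: str, task_id: str) -> int:
    """Hypothetical helper: store an event under the next free sequence number."""
    sequence = latest_sequence(conn, task_id) + 1
    conn.execute(
        "INSERT INTO events (event_id, task_id, sequence) VALUES (?, ?, ?)",
        (event_id, task_id, sequence),
    )
    conn.commit()
    return sequence


conn = sqlite3.connect(":memory:")
# Reduced schema (assumption): only the columns needed to show the constraint.
conn.execute(
    """
    CREATE TABLE events (
        event_id TEXT PRIMARY KEY,
        task_id TEXT NOT NULL,
        sequence INTEGER NOT NULL,
        UNIQUE(task_id, sequence)
    )
    """
)
first = append_event(conn, "e1", "task-a")
second = append_event(conn, "e2", "task-a")
other = append_event(conn, "e3", "task-b")

# A duplicate (task_id, sequence) pair is rejected by the UNIQUE constraint.
try:
    conn.execute(
        "INSERT INTO events (event_id, task_id, sequence) VALUES (?, ?, ?)",
        ("e4", "task-a", 1),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Reading the maximum and then inserting is not atomic across connections, so the constraint is what would catch two writers racing for the same sequence number.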
@@ -1,35 +0,0 @@
|
||||||
TASK_RECEIVED = "task_received"
|
|
||||||
CONTEXT_BUILT = "context_built"
|
|
||||||
STEP_STARTED = "step_started"
|
|
||||||
TOOL_CALLED = "tool_called"
|
|
||||||
TOOL_OUTPUT_CHUNK = "tool_output_chunk"
|
|
||||||
TOOL_COMPLETED = "tool_completed"
|
|
||||||
PERMISSION_REQUESTED = "permission_requested"
|
|
||||||
PERMISSION_RESOLVED = "permission_resolved"
|
|
||||||
TASK_AWAITING_PERMISSION = "task_awaiting_permission"
|
|
||||||
SECRET_REQUESTED = "secret_requested"
|
|
||||||
TASK_AWAITING_INPUT = "task_awaiting_input"
|
|
||||||
TASK_AWAITING_REVIEW = "task_awaiting_review"
|
|
||||||
REVIEW_RESOLVED = "review_resolved"
|
|
||||||
CHECKPOINT_SAVED = "checkpoint_saved"
|
|
||||||
TASK_COMPLETED = "task_completed"
|
|
||||||
TASK_FAILED = "task_failed"
|
|
||||||
ORCHESTRATOR_CALLED = "orchestrator_called"
|
|
||||||
ORCHESTRATOR_RESULT = "orchestrator_result"
|
|
||||||
ORCHESTRATOR_UNAVAILABLE = "orchestrator_unavailable"
|
|
||||||
ORCHESTRATOR_FALLBACK_USED = "orchestrator_fallback_used"
|
|
||||||
ORCHESTRATOR_RETRY = "orchestrator_retry"
|
|
||||||
PLANNER_CALLED = "planner_called"
|
|
||||||
PLANNER_RETRY = "planner_retry"
|
|
||||||
CRITIC_CALLED = "critic_called"
|
|
||||||
CRITIC_RESULT = "critic_result"
|
|
||||||
MEMORY_WRITE_DECIDED = "memory_write_decided"
|
|
||||||
PLAN_STARTED = "plan_started"
|
|
||||||
PLAN_FAILED = "plan_failed"
|
|
||||||
PLAN_COMPLETED = "plan_completed"
|
|
||||||
STEPPED_COMPLETED = "step_completed"
|
|
||||||
THINKER_CALLED = "thinker_called"
|
|
||||||
THINKER_RESULT = "thinker_result"
|
|
||||||
JSON_COMPILER_CALLED = "json_compiler_called"
|
|
||||||
JSON_COMPILER_RESULT = "json_compiler_result"
|
|
||||||
MEMORY_RECALL_USED = "memory_recall_used"
|
|
||||||
|
|
@@ -1,24 +0,0 @@
|
||||||
MEMORY_AVAILABLE = False
|
|
||||||
VECTOR_AVAILABLE = False
|
|
||||||
|
|
||||||
try:
|
|
||||||
from app.memory.store import MemoryStore
|
|
||||||
from app.memory.vector_index import VectorIndex
|
|
||||||
from app.memory.interface import MemoryInterface
|
|
||||||
from app.memory.write_policy import MemoryWritePolicy
|
|
||||||
MEMORY_AVAILABLE = True
|
|
||||||
VECTOR_AVAILABLE = True
|
|
||||||
except ImportError:
|
|
||||||
MemoryStore = None
|
|
||||||
VectorIndex = None
|
|
||||||
MemoryInterface = None
|
|
||||||
MemoryWritePolicy = None
|
|
||||||
|
|
||||||
__all__ = [
|
|
||||||
"MemoryStore",
|
|
||||||
"VectorIndex",
|
|
||||||
"MemoryInterface",
|
|
||||||
"MemoryWritePolicy",
|
|
||||||
"MEMORY_AVAILABLE",
|
|
||||||
"VECTOR_AVAILABLE",
|
|
||||||
]
|
|
||||||
|
|
@@ -1,155 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import json
|
|
||||||
from datetime import datetime, timezone
|
|
||||||
from pathlib import Path
|
|
||||||
from typing import Any, Literal
|
|
||||||
|
|
||||||
import numpy as np
|
|
||||||
|
|
||||||
from app.core.contracts import MemoryEntry
|
|
||||||
from app.memory.store import MemoryStore
|
|
||||||
from app.memory.vector_index import VectorIndex
|
|
||||||
from app.models.embeddings import EmbeddingsAdapter
|
|
||||||
|
|
||||||
|
|
||||||
class MemoryInterface:
|
|
||||||
def __init__(
|
|
||||||
self,
|
|
||||||
store: MemoryStore,
|
|
||||||
vector_index: VectorIndex,
|
|
||||||
embeddings: EmbeddingsAdapter,
|
|
||||||
) -> None:
|
|
||||||
self._store = store
|
|
||||||
self._vector_index = vector_index
|
|
||||||
self._embeddings = embeddings
|
|
||||||
|
|
||||||
def insert(
|
|
||||||
self,
|
|
||||||
text: str,
|
|
||||||
kind: Literal["tool_result", "plan", "critique", "fact", "summary", "user_preference"],
|
|
||||||
source: Literal["tool", "critic", "user", "system"],
|
|
||||||
task_id: str | None = None,
|
|
||||||
session_id: str | None = None,
|
|
||||||
weight: float = 0.5,
|
|
||||||
metadata: dict[str, Any] | None = None,
|
|
||||||
) -> MemoryEntry:
|
|
||||||
entry = MemoryEntry(
|
|
||||||
text=text,
|
|
||||||
kind=kind,
|
|
||||||
source=source,
|
|
||||||
weight=weight,
|
|
||||||
task_id=task_id,
|
|
||||||
session_id=session_id,
|
|
||||||
metadata=metadata or {},
|
|
||||||
embedding_model=self._embeddings.__class__.__name__,
|
|
||||||
embedding_dim=self._embeddings.embedding_dim,
|
|
||||||
)
|
|
||||||
|
|
||||||
embedding = self._embeddings.encode(text)
|
|
||||||
embedding_bytes = embedding.astype("float32").tobytes()
|
|
||||||
|
|
||||||
self._store.insert(entry, embedding_bytes)
|
|
||||||
self._vector_index.insert(entry.id, embedding)
|
|
||||||
self._vector_index.save()
|
|
||||||
|
|
||||||
self.cleanup()
|
|
||||||
|
|
||||||
return entry
|
|
||||||
|
|
||||||
def search(
|
|
||||||
self,
|
|
||||||
query: str,
|
|
||||||
top_k: int = 5,
|
|
||||||
kind: str | None = None,
|
|
||||||
session_id: str | None = None,
|
|
||||||
) -> list[tuple[MemoryEntry, float]]:
|
|
||||||
query_embedding = self._embeddings.encode(query)
|
|
||||||
memory_ids, scores = self._vector_index.search(query_embedding, k=top_k)
|
|
||||||
|
|
||||||
results: list[tuple[MemoryEntry, float]] = []
|
|
||||||
for memory_id, score in zip(memory_ids, scores):
|
|
||||||
entry = self._store.get(memory_id)
|
|
||||||
if entry:
|
|
||||||
if kind and entry.kind != kind:
|
|
||||||
continue
|
|
||||||
if session_id and entry.session_id != session_id:
|
|
||||||
continue
|
|
||||||
results.append((entry, score))
|
|
||||||
|
|
||||||
return results[:top_k]
|
|
||||||
|
|
||||||
def get(self, memory_id: str) -> MemoryEntry | None:
|
|
||||||
return self._store.get(memory_id)
|
|
||||||
|
|
||||||
def delete(self, memory_id: str) -> bool:
|
|
||||||
entry = self._store.get(memory_id)
|
|
||||||
if entry:
|
|
||||||
self._vector_index.delete(memory_id)
|
|
||||||
return self._store.delete(memory_id)
|
|
||||||
return False
|
|
||||||
|
|
||||||
def get_by_task(self, task_id: str) -> list[MemoryEntry]:
|
|
||||||
return self._store.get_by_task(task_id)
|
|
||||||
|
|
||||||
def get_by_session(self, session_id: str, limit: int = 100) -> list[MemoryEntry]:
|
|
||||||
return self._store.get_by_session(session_id, limit)
|
|
||||||
|
|
||||||
def get_recent(self, limit: int = 10) -> list[MemoryEntry]:
|
|
||||||
return self._store.get_all(limit)
|
|
||||||
|
|
||||||
def count(self) -> int:
|
|
||||||
return self._store.count()
|
|
||||||
|
|
||||||
def reindex(self) -> int:
|
|
||||||
"""Rebuild vector index from all entries in memory store.
|
|
||||||
Returns number of indexed entries."""
|
|
||||||
entries = self._store.get_all(limit=10000)
|
|
||||||
# Delete old index file and re-initialize from scratch
|
|
||||||
import os
|
|
||||||
if self._vector_index._index_path and self._vector_index._index_path.exists():
|
|
||||||
self._vector_index._index_path.unlink()
|
|
||||||
self._vector_index._index = None
|
|
||||||
self._vector_index._init_index()
|
|
||||||
count = 0
|
|
||||||
for entry in entries:
|
|
||||||
text = entry.text
|
|
||||||
embedding = self._embeddings.encode(text)
|
|
||||||
self._vector_index.insert(entry.id, embedding)
|
|
||||||
count += 1
|
|
||||||
self._vector_index.save()
|
|
||||||
return count
|
|
||||||
|
|
||||||
def close(self) -> None:
|
|
||||||
self._store.close()
|
|
||||||
|
|
||||||
def cleanup(self, max_items: int = 750, decay_factor: float = 0.95) -> int:
|
|
||||||
"""Remove low-weight entries when exceeding max_items limit.
|
|
||||||
|
|
||||||
Applies weight decay based on freshness before cleanup.
|
|
||||||
Returns number of removed entries.
|
|
||||||
"""
|
|
||||||
current_count = self._store.count()
|
|
||||||
if current_count <= max_items:
|
|
||||||
return 0
|
|
||||||
|
|
||||||
removed = 0
|
|
||||||
entries_to_remove = current_count - max_items
|
|
||||||
|
|
||||||
all_entries = self._store.get_all(limit=current_count)
|
|
||||||
|
|
||||||
def effective_weight(entry: MemoryEntry) -> float:
|
|
||||||
entry_weight = entry.weight
|
|
||||||
if entry.created_at:
|
|
||||||
age_days = (datetime.now(timezone.utc) - entry.created_at).total_seconds() / 86400
|
|
||||||
freshness_factor = max(0.1, decay_factor ** age_days)
|
|
||||||
return entry_weight * freshness_factor
|
|
||||||
return entry_weight
|
|
||||||
|
|
||||||
sorted_entries = sorted(all_entries, key=effective_weight)
|
|
||||||
|
|
||||||
for entry in sorted_entries[:entries_to_remove]:
|
|
||||||
self.delete(entry.id)  # route through delete() so the vector index is asked to drop the entry too
|
|
||||||
removed += 1
|
|
||||||
|
|
||||||
return removed
|
|
||||||
|
|
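The `cleanup` method above ranks entries by a weight that decays with age: the stored weight is multiplied by `decay_factor ** age_days`, with the multiplier clamped at 0.1. The following standalone sketch restates that formula so the numbers can be checked. The `now` and `floor` parameters are assumptions added for testability and are not in the original signature.

```python
from datetime import datetime, timedelta, timezone


def effective_weight(
    weight: float,
    created_at: datetime,
    now: datetime,
    decay_factor: float = 0.95,
    floor: float = 0.1,
) -> float:
    """Scale a stored weight by decay_factor ** age_days, never below `floor`."""
    age_days = (now - created_at).total_seconds() / 86400
    freshness = max(floor, decay_factor ** age_days)
    return weight * freshness


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
fresh = effective_weight(0.5, now, now)
week_old = effective_weight(0.5, now - timedelta(days=7), now)
ancient = effective_weight(0.5, now - timedelta(days=365), now)
```

With the default `decay_factor` of 0.95 the multiplier reaches the 0.1 floor after roughly 45 days, so older entries are ranked by their stored weight alone.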
@@ -1,205 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import json
|
|
||||||
import logging
|
|
||||||
from typing import Any
|
|
||||||
|
|
||||||
from app.core.contracts import MemoryEntry
|
|
||||||
from app.memory.interface import MemoryInterface
|
|
||||||
from app.models.async_adapters import AsyncOrchestratorAdapter
|
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
|
||||||
|
|
||||||
RECALL_PROMPT_TEMPLATE = """Decide whether long-term memory should be searched to answer this request.

Request: "{task_input}"

SEARCH memory if the request:
- Asks about the user (name, preferences, history)
- Refers to past conversations or actions
- Uses pronouns without context ("he", "it", "that file")
- Asks to recall, repeat, or describe something from the past
- Asks "what do you remember", "what is my name", "what did I say"

DO NOT SEARCH if:
- It is a greeting or a farewell
- It is a simple command (ls, pwd, echo)
- It is a general question unrelated to the past

Answer with JSON ONLY:
{{"should_recall": true, "search_query": "search query"}}
or
{{"should_recall": false, "reason": "short reason"}}"""
|
|
||||||
|
|
||||||
|
|
||||||
class MemoryRecallService:
|
|
||||||
"""Active recall: the system itself decides what to look up in memory and when."""
|
|
||||||
|
|
||||||
def __init__(
|
|
||||||
self,
|
|
||||||
memory_interface: MemoryInterface | None,
|
|
||||||
recall_model: AsyncOrchestratorAdapter | None,
|
|
||||||
) -> None:
|
|
||||||
self._memory = memory_interface
|
|
||||||
self._model = recall_model
|
|
||||||
|
|
||||||
async def recall(
|
|
||||||
self,
|
|
||||||
task_input: str,
|
|
||||||
top_k: int = 5,
|
|
||||||
) -> dict[str, Any]:
|
|
||||||
"""
|
|
||||||
Decides whether recall is needed and runs the search.
|
|
||||||
|
|
||||||
Returns:
|
|
||||||
{
|
|
||||||
"should_recall": bool,
|
|
||||||
"reason": str,
|
|
||||||
"query": str,
|
|
||||||
"results": list[MemoryEntry],
|
|
||||||
"summary": str, # short summary for the orchestrator
|
|
||||||
}
|
|
||||||
"""
|
|
||||||
if not self._memory or not self._model:
|
|
||||||
logger.debug("Recall skipped: memory=%s, model=%s", self._memory is not None, self._model is not None)
|
|
||||||
return self._empty_result("memory_or_model_unavailable")
|
|
||||||
|
|
||||||
# 1. The LLM decides whether a search is needed
|
|
||||||
decision = await self._classify(task_input)
|
|
||||||
logger.debug("Recall decision: type=%s value=%s", type(decision), decision)
|
|
||||||
if not isinstance(decision, dict):
|
|
||||||
return self._empty_result("invalid_decision_type")
|
|
||||||
if not decision.get("should_recall"):
|
|
||||||
return self._empty_result(decision.get("reason", "not_needed"))
|
|
||||||
|
|
||||||
search_query = decision.get("search_query", task_input)
|
|
||||||
logger.info(f"Memory recall: query='{search_query}', reason='{decision.get('reason')}'")
|
|
||||||
|
|
||||||
# 2. Vector search
|
|
||||||
try:
|
|
||||||
raw_results = self._memory.search(query=search_query, top_k=top_k)
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning(f"Memory search failed: {e}")
|
|
||||||
return self._empty_result("search_failed")
|
|
||||||
|
|
||||||
# 3. Filtering: drop empty and insufficiently relevant results
|
|
||||||
filtered = self._filter(raw_results)
|
|
||||||
|
|
||||||
if not filtered:
|
|
||||||
return self._empty_result("no_relevant_results")
|
|
||||||
|
|
||||||
# 4. Summary for the orchestrator
|
|
||||||
summary = self._summarize(filtered, search_query)
|
|
||||||
|
|
||||||
return {
|
|
||||||
"should_recall": True,
|
|
||||||
"reason": decision.get("reason", ""),
|
|
||||||
"query": search_query,
|
|
||||||
"results": filtered,
|
|
||||||
"summary": summary,
|
|
||||||
}
|
|
||||||
|
|
||||||
async def _classify(self, task_input: str) -> dict[str, Any]:
|
|
||||||
"""LLM classification: decide whether memory should be searched."""
|
|
||||||
prompt = RECALL_PROMPT_TEMPLATE.format(task_input=task_input)
|
|
||||||
|
|
||||||
try:
|
|
||||||
raw = await self._model.generate(prompt, max_tokens=512)
|
|
||||||
data = self._parse_json(raw)
|
|
||||||
if "should_recall" in data:
|
|
||||||
return data
|
|
||||||
logger.warning(f"Recall classification missing 'should_recall': {raw[:200]}")
|
|
||||||
return {"should_recall": False, "reason": "parse_error"}
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning(f"Recall classification failed: {e}")
|
|
||||||
return {"should_recall": False, "reason": "classification_error"}
|
|
||||||
|
|
||||||
def _filter(
|
|
||||||
self,
|
|
||||||
results: list[tuple[MemoryEntry, float]],
|
|
||||||
min_score: float = 0.3,
|
|
||||||
) -> list[MemoryEntry]:
|
|
||||||
"""Filter results by score and remove duplicates."""
|
|
||||||
seen_texts: set[str] = set()
|
|
||||||
filtered: list[MemoryEntry] = []
|
|
||||||
|
|
||||||
for entry, score in results:
|
|
||||||
if score < min_score:
|
|
||||||
continue
|
|
||||||
# Normalize the text for deduplication
|
|
||||||
normalized = entry.text.strip().lower()[:100]
|
|
||||||
if normalized in seen_texts:
|
|
||||||
continue
|
|
||||||
seen_texts.add(normalized)
|
|
||||||
filtered.append(entry)
|
|
||||||
|
|
||||||
return filtered
|
|
||||||
|
|
||||||
def _summarize(
|
|
||||||
self,
|
|
||||||
results: list[MemoryEntry],
|
|
||||||
query: str,
|
|
||||||
) -> str:
|
|
||||||
"""Short summary of the findings for the orchestrator."""
|
|
||||||
parts = [f"Found {len(results)} entries for query '{query}':"]
|
|
||||||
for i, entry in enumerate(results[:5], 1):
|
|
||||||
text_preview = entry.text[:120].replace("\n", " ")
|
|
||||||
parts.append(f" {i}. [{entry.kind}] {text_preview}")
|
|
||||||
return "\n".join(parts)
|
|
||||||
|
|
||||||
def _parse_json(self, raw: str) -> dict[str, Any]:
|
|
||||||
"""Extract JSON from the model response, skipping any reasoning before it."""
|
|
||||||
try:
|
|
||||||
json_start = raw.find("{")
|
|
||||||
json_end = raw.rfind("}") + 1
|
|
||||||
|
|
||||||
if json_start < 0 or json_end <= 0:
|
|
||||||
return {}
|
|
||||||
|
|
||||||
# Try the whole text from the first { to the last }
|
|
||||||
try:
|
|
||||||
data = json.loads(raw[json_start:json_end])
|
|
||||||
if isinstance(data, dict):
|
|
||||||
return data
|
|
||||||
except json.JSONDecodeError:
|
|
||||||
pass
|
|
||||||
|
|
||||||
# Find every possible start of a JSON object
|
|
||||||
candidates = []
|
|
||||||
pos = 0
|
|
||||||
while True:
|
|
||||||
pos = raw.find("{", pos)
|
|
||||||
if pos < 0:
|
|
||||||
break
|
|
||||||
candidates.append(pos)
|
|
||||||
pos += 1
|
|
||||||
|
|
||||||
# Try each candidate, starting from the last
|
|
||||||
for start in reversed(candidates):
|
|
||||||
end = raw.rfind("}") + 1
|
|
||||||
if end <= start:
|
|
||||||
continue
|
|
||||||
try:
|
|
||||||
data = json.loads(raw[start:end])
|
|
||||||
if isinstance(data, dict):
|
|
||||||
return data
|
|
||||||
except json.JSONDecodeError:
|
|
||||||
continue
|
|
||||||
|
|
||||||
return {}
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning(f"Recall JSON parse failed: {e}")
|
|
||||||
return {}
|
|
||||||
|
|
||||||
@staticmethod
|
|
||||||
def _empty_result(reason: str) -> dict[str, Any]:
|
|
||||||
return {
|
|
||||||
"should_recall": False,
|
|
||||||
"reason": reason,
|
|
||||||
"query": "",
|
|
||||||
"results": [],
|
|
||||||
"summary": "",
|
|
||||||
}
|
|
||||||
|
|
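`_parse_json` above slices from a `{` to the last `}` in the reply, which fails when the model appends text containing braces after the JSON. One alternative, shown here only as a sketch and not as the removed module's method, is the standard library's `json.JSONDecoder.raw_decode`, which parses one value starting at a given index and ignores whatever follows. The function name `extract_first_json_object` is hypothetical. Unlike the original, it returns the first decodable object rather than preferring later candidates.

```python
import json
from typing import Any


def extract_first_json_object(raw: str) -> dict[str, Any]:
    """Return the first decodable JSON object found in `raw`, or {} if none."""
    decoder = json.JSONDecoder()
    pos = raw.find("{")
    while pos >= 0:
        try:
            # raw_decode parses a single JSON value starting at index `pos`
            # and tolerates trailing text after it.
            data, _end = decoder.raw_decode(raw, pos)
        except json.JSONDecodeError:
            data = None
        if isinstance(data, dict):
            return data
        pos = raw.find("{", pos + 1)
    return {}


reply = 'Reasoning {not json}... {"should_recall": true, "search_query": "user name"} done.'
decision = extract_first_json_object(reply)
```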
@@ -1,185 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import json
|
|
||||||
import sqlite3
|
|
||||||
from datetime import datetime, timezone
|
|
||||||
from pathlib import Path
|
|
||||||
from typing import Any, Sequence
|
|
||||||
from uuid import uuid4
|
|
||||||
|
|
||||||
from app.core.contracts import MemoryEntry
|
|
||||||
|
|
||||||
|
|
||||||
def utc_now() -> datetime:
|
|
||||||
return datetime.now(timezone.utc)
|
|
||||||
|
|
||||||
|
|
||||||
class MemoryStore:
|
|
||||||
def __init__(self, db_path: str | Path) -> None:
|
|
||||||
self._db_path = Path(db_path)
|
|
||||||
self._db_path.parent.mkdir(parents=True, exist_ok=True)
|
|
||||||
self._conn = sqlite3.connect(str(self._db_path), check_same_thread=False)
|
|
||||||
self._conn.row_factory = sqlite3.Row
|
|
||||||
self._init_tables()
|
|
||||||
|
|
||||||
def _init_tables(self) -> None:
|
|
||||||
self._conn.executescript("""
|
|
||||||
CREATE TABLE IF NOT EXISTS memory_items (
|
|
||||||
id TEXT PRIMARY KEY,
|
|
||||||
text TEXT NOT NULL,
|
|
||||||
kind TEXT NOT NULL,
|
|
||||||
source TEXT NOT NULL,
|
|
||||||
weight REAL NOT NULL DEFAULT 0.5,
|
|
||||||
task_id TEXT,
|
|
||||||
session_id TEXT,
|
|
||||||
metadata_json TEXT,
|
|
||||||
created_at TEXT NOT NULL,
|
|
||||||
updated_at TEXT NOT NULL
|
|
||||||
);
|
|
||||||
|
|
||||||
CREATE TABLE IF NOT EXISTS memory_embeddings (
|
|
||||||
memory_id TEXT PRIMARY KEY,
|
|
||||||
embedding BLOB NOT NULL,
|
|
||||||
embedding_model TEXT NOT NULL,
|
|
||||||
embedding_dim INTEGER NOT NULL,
|
|
||||||
created_at TEXT NOT NULL,
|
|
||||||
FOREIGN KEY (memory_id) REFERENCES memory_items(id) ON DELETE CASCADE
|
|
||||||
);
|
|
||||||
|
|
||||||
CREATE INDEX IF NOT EXISTS idx_memory_items_task ON memory_items(task_id);
|
|
||||||
CREATE INDEX IF NOT EXISTS idx_memory_items_session ON memory_items(session_id);
|
|
||||||
CREATE INDEX IF NOT EXISTS idx_memory_items_kind ON memory_items(kind);
|
|
||||||
CREATE INDEX IF NOT EXISTS idx_memory_embeddings_model ON memory_embeddings(embedding_model);
|
|
||||||
""")
|
|
||||||
self._conn.commit()
|
|
||||||
|
|
||||||
def insert(self, entry: MemoryEntry, embedding: bytes) -> None:
|
|
||||||
cursor = self._conn.cursor()
|
|
||||||
cursor.execute(
|
|
||||||
"""
|
|
||||||
INSERT INTO memory_items (id, text, kind, source, weight, task_id, session_id, metadata_json, created_at, updated_at)
|
|
||||||
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
|
||||||
""",
|
|
||||||
(
|
|
||||||
entry.id,
|
|
||||||
entry.text,
|
|
||||||
entry.kind,
|
|
||||||
entry.source,
|
|
||||||
entry.weight,
|
|
||||||
entry.task_id,
|
|
||||||
entry.session_id,
|
|
||||||
json.dumps(entry.metadata) if entry.metadata else None,
|
|
||||||
entry.created_at.isoformat(),
|
|
||||||
utc_now().isoformat(),
|
|
||||||
),
|
|
||||||
)
|
|
||||||
cursor.execute(
|
|
||||||
"""
|
|
||||||
INSERT INTO memory_embeddings (memory_id, embedding, embedding_model, embedding_dim, created_at)
|
|
||||||
VALUES (?, ?, ?, ?, ?)
|
|
||||||
""",
|
|
||||||
(
|
|
||||||
entry.id,
|
|
||||||
embedding,
|
|
||||||
entry.embedding_model,
|
|
||||||
entry.embedding_dim,
|
|
||||||
utc_now().isoformat(),
|
|
||||||
),
|
|
||||||
)
|
|
||||||
self._conn.commit()
|
|
||||||
|
|
||||||
def get(self, memory_id: str) -> MemoryEntry | None:
|
|
||||||
cursor = self._conn.cursor()
|
|
||||||
row = cursor.execute(
|
|
||||||
"SELECT * FROM memory_items WHERE id = ?", (memory_id,)
|
|
||||||
).fetchone()
|
|
||||||
if not row:
|
|
||||||
return None
|
|
||||||
return self._row_to_entry(row)
|
|
||||||
|
|
||||||
def get_embedding(self, memory_id: str) -> bytes | None:
|
|
||||||
cursor = self._conn.cursor()
|
|
||||||
row = cursor.execute(
|
|
||||||
"SELECT embedding FROM memory_embeddings WHERE memory_id = ?", (memory_id,)
|
|
||||||
).fetchone()
|
|
||||||
return bytes(row["embedding"]) if row else None
|
|
||||||
|
|
||||||
def get_all(self, limit: int = 1000) -> list[MemoryEntry]:
|
|
||||||
cursor = self._conn.cursor()
|
|
||||||
rows = cursor.execute(
|
|
||||||
"SELECT * FROM memory_items ORDER BY created_at DESC LIMIT ?", (limit,)
|
|
||||||
).fetchall()
|
|
||||||
return [self._row_to_entry(row) for row in rows]
|
|
||||||
|
|
||||||
def get_by_task(self, task_id: str) -> list[MemoryEntry]:
|
|
||||||
cursor = self._conn.cursor()
|
|
||||||
rows = cursor.execute(
|
|
||||||
"SELECT * FROM memory_items WHERE task_id = ? ORDER BY created_at DESC", (task_id,)
|
|
||||||
).fetchall()
|
|
||||||
return [self._row_to_entry(row) for row in rows]
|
|
||||||
|
|
||||||
def get_by_session(self, session_id: str, limit: int = 100) -> list[MemoryEntry]:
|
|
||||||
cursor = self._conn.cursor()
|
|
||||||
rows = cursor.execute(
|
|
||||||
"SELECT * FROM memory_items WHERE session_id = ? ORDER BY created_at DESC LIMIT ?",
|
|
||||||
(session_id, limit),
|
|
||||||
).fetchall()
|
|
||||||
return [self._row_to_entry(row) for row in rows]
|
|
||||||
|
|
||||||
def get_by_kind(self, kind: str, limit: int = 100) -> list[MemoryEntry]:
|
|
||||||
cursor = self._conn.cursor()
|
|
||||||
rows = cursor.execute(
|
|
||||||
"SELECT * FROM memory_items WHERE kind = ? ORDER BY created_at DESC LIMIT ?", (kind, limit)
|
|
||||||
).fetchall()
|
|
||||||
return [self._row_to_entry(row) for row in rows]
|
|
||||||
|
|
||||||
def delete(self, memory_id: str) -> bool:
|
|
||||||
cursor = self._conn.cursor()
|
|
||||||
cursor.execute("DELETE FROM memory_embeddings WHERE memory_id = ?", (memory_id,))
|
|
||||||
cursor.execute("DELETE FROM memory_items WHERE id = ?", (memory_id,))
|
|
||||||
self._conn.commit()
|
|
||||||
return cursor.rowcount > 0
|
|
||||||
|
|
||||||
def update_weight(self, memory_id: str, weight: float) -> bool:
|
|
||||||
cursor = self._conn.cursor()
|
|
||||||
cursor.execute(
|
|
||||||
"UPDATE memory_items SET weight = ?, updated_at = ? WHERE id = ?",
|
|
||||||
(weight, utc_now().isoformat(), memory_id),
|
|
||||||
)
|
|
||||||
self._conn.commit()
|
|
||||||
return cursor.rowcount > 0
|
|
||||||
|
|
||||||
def search_text(self, query: str, limit: int = 10) -> list[MemoryEntry]:
|
|
||||||
cursor = self._conn.cursor()
|
|
||||||
rows = cursor.execute(
|
|
||||||
"SELECT * FROM memory_items WHERE text LIKE ? ORDER BY created_at DESC LIMIT ?",
|
|
||||||
(f"%{query}%", limit),
|
|
||||||
).fetchall()
|
|
||||||
return [self._row_to_entry(row) for row in rows]
|
|
||||||
|
|
||||||
def count(self) -> int:
|
|
||||||
cursor = self._conn.cursor()
|
|
||||||
row = cursor.execute("SELECT COUNT(*) FROM memory_items").fetchone()
|
|
||||||
return row[0] if row else 0
|
|
||||||
|
|
||||||
def close(self) -> None:
|
|
||||||
self._conn.close()
|
|
||||||
|
|
||||||
def _row_to_entry(self, row: sqlite3.Row) -> MemoryEntry:
|
|
||||||
metadata = {}
|
|
||||||
if row["metadata_json"]:
|
|
||||||
import json
|
|
||||||
metadata = json.loads(row["metadata_json"])
|
|
||||||
return MemoryEntry(
|
|
||||||
id=row["id"],
|
|
||||||
text=row["text"],
|
|
||||||
kind=row["kind"],
|
|
||||||
source=row["source"],
|
|
||||||
weight=row["weight"],
|
|
||||||
task_id=row["task_id"],
|
|
||||||
session_id=row["session_id"],
|
|
||||||
metadata=metadata,
|
|
||||||
created_at=datetime.fromisoformat(row["created_at"]),
|
|
||||||
embedding_model="",
|
|
||||||
embedding_dim=0,
|
|
||||||
)
|
|
||||||
|
|
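The `memory_embeddings` table above declares `ON DELETE CASCADE`, yet `MemoryStore.delete` removes both rows by hand. That explicit delete matters, because SQLite enforces foreign keys only on connections where `PRAGMA foreign_keys = ON` has been issued, and the removed store never sets it. The sketch below illustrates the difference on a reduced schema. The function name and the trimmed columns are assumptions made for the example.

```python
import sqlite3

# Reduced schema (assumption): only the columns needed to show the cascade.
SCHEMA = """
CREATE TABLE memory_items (id TEXT PRIMARY KEY, text TEXT NOT NULL);
CREATE TABLE memory_embeddings (
    memory_id TEXT PRIMARY KEY,
    embedding BLOB NOT NULL,
    FOREIGN KEY (memory_id) REFERENCES memory_items(id) ON DELETE CASCADE
);
"""


def orphaned_embeddings_after_delete(enforce_foreign_keys: bool) -> int:
    """Delete a memory item and count the embedding rows left behind."""
    conn = sqlite3.connect(":memory:")
    # Set the pragma explicitly both ways; the default depends on the SQLite build.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_foreign_keys else 'OFF'}")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO memory_items VALUES ('m1', 'hello')")
    conn.execute("INSERT INTO memory_embeddings VALUES ('m1', x'00')")
    conn.execute("DELETE FROM memory_items WHERE id = 'm1'")
    remaining = conn.execute("SELECT COUNT(*) FROM memory_embeddings").fetchone()[0]
    conn.close()
    return remaining
```

With enforcement on, deleting the item removes its embedding row as well. With it off, the embedding row is orphaned unless it is deleted explicitly, which is what the store does.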
@@ -1,149 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import logging
|
|
||||||
import numpy as np
|
|
||||||
import hnswlib
|
|
||||||
from pathlib import Path
|
|
||||||
from typing import Any
|
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
|
||||||
|
|
||||||
|
|
||||||
class VectorIndex:
|
|
||||||
def __init__(
|
|
||||||
self,
|
|
||||||
index_path: str | Path | None = None,
|
|
||||||
embedding_dim: int = 384,
|
|
||||||
max_elements: int = 10000,
|
|
||||||
) -> None:
|
|
||||||
self._embedding_dim = embedding_dim
|
|
||||||
self._index_path = Path(index_path) if index_path else None
|
|
||||||
self._index: hnswlib.Index | None = None
|
|
||||||
self._max_elements = max_elements
|
|
||||||
self._loading = False # Prevent recursion
|
|
||||||
|
|
||||||
self._init_index()
|
|
||||||
|
|
||||||
def _init_index(self) -> None:
|
|
||||||
if self._loading:
|
|
||||||
return
|
|
||||||
self._loading = True
|
|
||||||
try:
|
|
||||||
if self._index_path and self._index_path.exists():
|
|
||||||
self._load()
|
|
||||||
else:
|
|
||||||
self._index = hnswlib.Index(
|
|
||||||
space="l2",
|
|
||||||
dim=self._embedding_dim,
|
|
||||||
)
|
|
||||||
self._index.init_index(
|
|
||||||
max_elements=self._max_elements,
|
|
||||||
ef_construction=200,
|
|
||||||
M=16,
|
|
||||||
)
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning(f"VectorIndex init failed: {e}")
|
|
||||||
self._index = hnswlib.Index(
|
|
||||||
space="l2",
|
|
||||||
dim=self._embedding_dim,
|
|
||||||
)
|
|
||||||
self._index.init_index(
|
|
||||||
max_elements=self._max_elements,
|
|
||||||
ef_construction=100,
|
|
||||||
M=16,
|
|
||||||
)
|
|
||||||
finally:
|
|
||||||
self._loading = False
|
|
||||||
|
|
||||||
def insert(self, memory_id: str, embedding: np.ndarray) -> None:
|
|
||||||
if self._index is None:
|
|
||||||
self._init_index()
|
|
||||||
if self._index is None:
|
|
||||||
return
|
|
||||||
|
|
||||||
try:
|
|
||||||
vector = self._normalize(embedding)
|
|
||||||
internal_id = self._get_internal_id(memory_id)
|
|
||||||
self._index.add_items(vector, ids=np.array([internal_id]))
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning(f"VectorIndex insert failed: {e}")
|
|
||||||
|
|
||||||
def search(
|
|
||||||
self,
|
|
||||||
query_embedding: np.ndarray,
|
|
||||||
k: int = 5,
|
|
||||||
) -> tuple[list[str], list[float]]:
|
|
||||||
if self._index is None:
|
|
||||||
return [], []
|
|
||||||
|
|
||||||
try:
|
|
||||||
if self._index.get_current_count() == 0:
|
|
||||||
return [], []
|
|
||||||
|
|
||||||
# Set ef to at least k for proper search
|
|
||||||
self._index.set_ef(max(k * 2, 50))
|
|
||||||
|
|
||||||
vector = self._normalize(query_embedding)
|
|
||||||
labels, distances = self._index.knn_query(vector, k=k)
|
|
||||||
|
|
||||||
memory_ids = [self._get_memory_id(int(label)) for label in labels[0]]
|
|
||||||
scores = [1.0 - dist for dist in distances[0]]
|
|
||||||
return memory_ids, scores
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning(f"VectorIndex search failed: {e}")
|
|
||||||
return [], []
|
|
||||||
|
|
||||||
def delete(self, memory_id: str) -> bool:
|
|
||||||
return False  # not implemented: the entry stays in the HNSW index
|
|
||||||
|
|
||||||
def get_items(self, memory_ids: list[str]) -> np.ndarray:
|
|
||||||
if self._index is None:
|
|
||||||
raise RuntimeError("Index not initialized")
|
|
||||||
internal_ids = [self._get_internal_id(mid) for mid in memory_ids]
|
|
||||||
return self._index.get_items(np.array(internal_ids))
|
|
||||||
|
|
||||||
def save(self) -> None:
|
|
||||||
if self._index and self._index_path:
|
|
||||||
try:
|
|
||||||
self._index_path.parent.mkdir(parents=True, exist_ok=True)
|
|
||||||
self._index.save_index(str(self._index_path))
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning(f"VectorIndex save failed: {e}")
|
|
||||||
|
|
||||||
def _load(self) -> None:
|
|
||||||
if self._loading:
|
|
||||||
return
|
|
||||||
self._loading = True
|
|
||||||
try:
|
|
||||||
if self._index_path and self._index_path.exists():
|
|
||||||
self._index = hnswlib.Index(space="l2", dim=self._embedding_dim)
|
|
||||||
self._index.load_index(
|
|
||||||
str(self._index_path),
|
|
||||||
max_elements=self._max_elements
|
|
||||||
)
|
|
||||||
except Exception as e:
|
|
||||||
logger.warning(f"VectorIndex load failed: {e}")
|
|
||||||
self._init_index()
|
|
||||||
finally:
|
|
||||||
self._loading = False
|
|
||||||
|
|
||||||
def _normalize(self, vector: np.ndarray) -> np.ndarray:
|
|
||||||
vec = vector.flatten()
|
|
||||||
norm = np.linalg.norm(vec)
|
|
||||||
if norm > 0:
|
|
||||||
vec = vec / norm
|
|
||||||
return vec.reshape(1, -1)
|
|
||||||
|
|
||||||
def _get_internal_id(self, memory_id: str) -> int:
|
|
||||||
return hash(memory_id) % (2**31)  # NOTE: str hashes differ between processes unless PYTHONHASHSEED is fixed
|
|
||||||
|
|
||||||
def _get_memory_id(self, internal_id: int) -> str:
|
|
||||||
return str(internal_id)  # NOTE: returns the hashed label as text, not the original memory_id
|
|
||||||
|
|
||||||
@property
|
|
||||||
def embedding_dim(self) -> int:
|
|
||||||
return self._embedding_dim
|
|
||||||
|
|
||||||
@property
|
|
||||||
def element_count(self) -> int:
|
|
||||||
return self._index.get_current_count() if self._index else 0
|
|
||||||
|
|
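Two details of the removed `VectorIndex` are worth spelling out. First, `_get_memory_id` cannot invert `hash(memory_id) % 2**31`, so search results carry labels that `MemoryStore.get` will not recognise. Second, the index uses the `l2` space on normalized vectors, where hnswlib reports squared L2 distance; for unit vectors that distance equals `2 - 2*cos`, so `1.0 - dist` is `2*cos - 1` rather than cosine similarity. The sketch below shows one possible remedy for each, in pure Python. `IdRegistry` and `cosine_from_squared_l2` are hypothetical names, and a real registry would also need to be persisted next to the index file.

```python
from __future__ import annotations

import math


class IdRegistry:
    """Hypothetical two-way map between string memory IDs and integer labels."""

    def __init__(self) -> None:
        self._label_by_id: dict[str, int] = {}
        self._id_by_label: dict[int, str] = {}

    def label_for(self, memory_id: str) -> int:
        """Return the label for memory_id, allocating the next integer if new."""
        if memory_id not in self._label_by_id:
            label = len(self._label_by_id)
            self._label_by_id[memory_id] = label
            self._id_by_label[label] = memory_id
        return self._label_by_id[memory_id]

    def memory_id_for(self, label: int) -> str | None:
        """Return the original memory ID for a label, or None if unknown."""
        return self._id_by_label.get(label)


def cosine_from_squared_l2(squared_distance: float) -> float:
    """For unit vectors, ||a - b||^2 = 2 - 2*cos(a, b), so cos = 1 - d/2."""
    return 1.0 - squared_distance / 2.0


registry = IdRegistry()
label = registry.label_for("3f2c9a1e-memory")
same_label = registry.label_for("3f2c9a1e-memory")
round_trip = registry.memory_id_for(label)

# Two unit vectors 60 degrees apart: cosine similarity is 0.5.
a = (1.0, 0.0)
b = (math.cos(math.pi / 3), math.sin(math.pi / 3))
squared = sum((x - y) ** 2 for x, y in zip(a, b))
cosine = cosine_from_squared_l2(squared)
```

Sequential labels are stable across processes as long as the registry is saved and reloaded together with the index, which the per-process randomization of `hash()` on strings does not guarantee.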
@@ -1,98 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from typing import Any, Literal
|
|
||||||
|
|
||||||
from app.core.contracts import CriticScore, MemoryEntry
|
|
||||||
|
|
||||||
|
|
||||||
class MemoryWritePolicy:
|
|
||||||
def __init__(
|
|
||||||
self,
|
|
||||||
store_threshold: float = 0.7,
|
|
||||||
min_usefulness: float = 0.3,
|
|
||||||
max_entries_per_session: int = 50,
|
|
||||||
) -> None:
|
|
||||||
self._store_threshold = store_threshold
|
|
||||||
self._min_usefulness = min_usefulness
|
|
||||||
self._max_entries_per_session = max_entries_per_session
|
|
||||||
|
|
||||||
def decide(
|
|
||||||
self,
|
|
||||||
critic_score: CriticScore,
|
|
||||||
memory_type: MemoryEntry.Kind,
|
|
||||||
session_id: str | None = None,
|
|
||||||
has_duplicate: bool = False,
|
|
||||||
current_session_count: int = 0,
|
|
||||||
) -> Literal["store", "store_with_weight", "skip", "merge"]:
|
|
||||||
if critic_score.safety < 0.5:
|
|
||||||
return "skip"
|
|
||||||
|
|
||||||
if has_duplicate:
|
|
||||||
return "merge"
|
|
||||||
|
|
||||||
if not critic_score.memory_store:
|
|
||||||
return "skip"
|
|
||||||
|
|
||||||
if critic_score.usefulness < self._min_usefulness:
|
|
||||||
return "skip"
|
|
||||||
|
|
||||||
if session_id and current_session_count >= self._max_entries_per_session:
|
|
||||||
return "skip"
|
|
||||||
|
|
||||||
base_decision = self._evaluate_scores(critic_score, memory_type)
|
|
||||||
|
|
||||||
if base_decision == "store" and critic_score.weight < self._store_threshold:
|
|
||||||
adjusted_weight = self._adjust_weight(critic_score, memory_type)
|
|
||||||
if adjusted_weight >= self._store_threshold:
|
|
||||||
return "store_with_weight"
|
|
||||||
return base_decision
|
|
||||||
|
|
||||||
return base_decision
|
|
||||||
|
|
||||||
def _evaluate_scores(
|
|
||||||
self,
|
|
||||||
critic_score: CriticScore,
|
|
||||||
memory_type: MemoryEntry.Kind,
|
|
||||||
) -> Literal["store", "store_with_weight", "skip", "merge"]:
|
|
||||||
avg_score = (critic_score.correctness + critic_score.usefulness + critic_score.safety) / 3.0
|
|
||||||
|
|
||||||
if memory_type in ("fact", "plan", "summary"):
|
|
||||||
if avg_score >= 0.8:
|
|
||||||
return "store"
|
|
||||||
elif avg_score >= 0.6:
|
|
||||||
return "store_with_weight"
|
|
||||||
|
|
||||||
if memory_type in ("tool_result", "critique"):
|
|
||||||
if avg_score >= self._store_threshold:
|
|
||||||
return "store"
|
|
||||||
elif avg_score >= 0.5:
|
|
||||||
return "store_with_weight"
|
|
||||||
|
|
||||||
if memory_type == "user_preference":
|
|
||||||
if avg_score >= 0.5:
|
|
||||||
return "store"
|
|
||||||
|
|
||||||
return "skip"
|
|
||||||
|
|
||||||
def _adjust_weight(
|
|
||||||
self,
|
|
||||||
critic_score: CriticScore,
|
|
||||||
memory_type: MemoryEntry.Kind,
|
|
||||||
) -> float:
|
|
||||||
base_weight = critic_score.weight
|
|
||||||
|
|
||||||
type_boost = {
|
|
||||||
"fact": 0.15,
|
|
||||||
"plan": 0.1,
|
|
||||||
"summary": 0.1,
|
|
||||||
"user_preference": 0.2,
|
|
||||||
"tool_result": 0.05,
|
|
||||||
"critique": 0.05,
|
|
||||||
}.get(memory_type, 0.0)
|
|
||||||
|
|
||||||
safety_boost = 0.0
|
|
||||||
if critic_score.safety >= 0.9:
|
|
||||||
safety_boost = 0.1
|
|
||||||
|
|
||||||
adjusted = base_weight + type_boost + safety_boost
|
|
||||||
return min(adjusted, 1.0)
|
|
||||||
|
|
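The tiering in `_evaluate_scores` above can be read as a lookup table: each memory kind has a threshold for `store` and, for most kinds, a lower one for `store_with_weight`. The sketch below restates only that averaging step under the default `store_threshold` of 0.7. `Scores` is a hypothetical stand-in for `CriticScore`, whose full definition is not shown in this diff, and the earlier gates in `decide` (safety, duplicates, session limits) are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scores:
    """Hypothetical stand-in for CriticScore with only the averaged fields."""

    correctness: float
    usefulness: float
    safety: float


# (store threshold, store_with_weight threshold) per memory kind,
# assuming the default store_threshold of 0.7 for tool results and critiques.
TIERS: dict[str, tuple[float, Optional[float]]] = {
    "fact": (0.8, 0.6),
    "plan": (0.8, 0.6),
    "summary": (0.8, 0.6),
    "tool_result": (0.7, 0.5),
    "critique": (0.7, 0.5),
    "user_preference": (0.5, None),
}


def evaluate(scores: Scores, kind: str) -> str:
    """Map the average of the three scores to a write decision for `kind`."""
    avg = (scores.correctness + scores.usefulness + scores.safety) / 3.0
    tier = TIERS.get(kind)
    if tier is None:
        return "skip"
    store_at, weighted_at = tier
    if avg >= store_at:
        return "store"
    if weighted_at is not None and avg >= weighted_at:
        return "store_with_weight"
    return "skip"
```

Written this way, it is easy to see that user preferences are the most permissive kind and that facts, plans, and summaries need the highest average to be stored at full weight.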
@@ -1,32 +0,0 @@
|
||||||
LLM_AVAILABLE = False
|
|
||||||
EMBEDDINGS_AVAILABLE = False
|
|
||||||
|
|
||||||
try:
|
|
||||||
from app.models.adapters import create_adapter, create_llama_adapter
|
|
||||||
from app.models.orchestrator import OrchestratorAdapter
|
|
||||||
from app.models.coder import CoderAdapter
|
|
||||||
from app.models.critic import CriticAdapter
|
|
||||||
LLM_AVAILABLE = True
|
|
||||||
except ImportError:
|
|
||||||
create_adapter = None
|
|
||||||
create_llama_adapter = None
|
|
||||||
OrchestratorAdapter = None
|
|
||||||
CoderAdapter = None
|
|
||||||
CriticAdapter = None
|
|
||||||
|
|
||||||
try:
|
|
||||||
from app.models.embeddings import EmbeddingsAdapter
|
|
||||||
EMBEDDINGS_AVAILABLE = True
|
|
||||||
except ImportError:
|
|
||||||
EmbeddingsAdapter = None
|
|
||||||
|
|
||||||
__all__ = [
|
|
||||||
"create_adapter",
|
|
||||||
"create_llama_adapter",
|
|
||||||
"OrchestratorAdapter",
|
|
||||||
"CoderAdapter",
|
|
||||||
"CriticAdapter",
|
|
||||||
"EmbeddingsAdapter",
|
|
||||||
"LLM_AVAILABLE",
|
|
||||||
"EMBEDDINGS_AVAILABLE",
|
|
||||||
]
|
|
||||||
|
|
@ -1,72 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from pathlib import Path
|
|
||||||
from typing import Any, Protocol, Iterator
|
|
||||||
import os
|
|
||||||
|
|
||||||
try:
|
|
||||||
from llama_cpp import Llama
|
|
||||||
LLAMA_AVAILABLE = True
|
|
||||||
except ImportError:
|
|
||||||
Llama = None
|
|
||||||
LLAMA_AVAILABLE = False
|
|
||||||
|
|
||||||
|
|
||||||
class BaseModelAdapter(Protocol):
|
|
||||||
async def generate(self, prompt: str, **kwargs: Any) -> str: ...
|
|
||||||
def stream(self, prompt: str, **kwargs: Any) -> Iterator[str]: ...
|
|
||||||
|
|
||||||
|
|
||||||
def create_llama_adapter(
|
|
||||||
model_path: str,
|
|
||||||
backend: str = "cpu",
|
|
||||||
n_gpu_layers: int = 0,
|
|
||||||
max_tokens: int = 2048,
|
|
||||||
temperature: float = 0.2,
|
|
||||||
base_dir: Path | None = None,
|
|
||||||
) -> "Llama":
|
|
||||||
if not LLAMA_AVAILABLE:
|
|
||||||
raise RuntimeError("llama-cpp-python not installed")
|
|
||||||
|
|
||||||
if base_dir:
|
|
||||||
model_path = str(base_dir / model_path)
|
|
||||||
else:
|
|
||||||
model_path = str(Path.cwd() / model_path)
|
|
||||||
|
|
||||||
return Llama(
|
|
||||||
model_path=model_path,
|
|
||||||
n_gpu_layers=n_gpu_layers,
|
|
||||||
n_ctx=4096,
|
|
||||||
n_threads=int(os.environ.get("DUCKLM_N_THREADS", max(4, min((os.cpu_count() or 4) // 2, 20)))),
|
|
||||||
n_threads_batch=-1,
|
|
||||||
max_tokens=max_tokens,
|
|
||||||
temperature=temperature,
|
|
||||||
verbose=False,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def create_adapter(
|
|
||||||
model_type: str,
|
|
||||||
config: dict[str, Any],
|
|
||||||
base_dir: Path | None = None,
|
|
||||||
) -> "Llama":
|
|
||||||
if not LLAMA_AVAILABLE:
|
|
||||||
raise RuntimeError("llama-cpp-python not installed")
|
|
||||||
|
|
||||||
model_path = config.get("path", "")
|
|
||||||
backend = config.get("backend", "cpu")
|
|
||||||
n_gpu_layers = config.get("n_gpu_layers", 0)
|
|
||||||
max_tokens = config.get("max_tokens", 2048)
|
|
||||||
temperature = config.get("temperature", 0.2)
|
|
||||||
|
|
||||||
if backend == "vulkan" and n_gpu_layers != 0:
|
|
||||||
n_gpu_layers = -1
|
|
||||||
|
|
||||||
return create_llama_adapter(
|
|
||||||
model_path=model_path,
|
|
||||||
backend=backend,
|
|
||||||
n_gpu_layers=n_gpu_layers,
|
|
||||||
max_tokens=max_tokens,
|
|
||||||
temperature=temperature,
|
|
||||||
base_dir=base_dir,
|
|
||||||
)
|
|
||||||
|
|
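The `n_threads` expression in `create_llama_adapter` is dense, so here is a minimal sketch of the same rule pulled out into a helper. `default_thread_count` is a hypothetical name, and the `env` parameter exists only to make the rule easy to exercise without touching the real environment.

```python
import os
from typing import Mapping, Optional


def default_thread_count(
    cpu_count: Optional[int],
    env: Mapping[str, str] = os.environ,
) -> int:
    """Mirror the n_threads rule: half the cores, clamped to the range 4..20.

    DUCKLM_N_THREADS, when set, overrides the computed value.
    """
    fallback = max(4, min((cpu_count or 4) // 2, 20))
    return int(env.get("DUCKLM_N_THREADS", fallback))
```

With 8 cores the helper yields 4 threads, with 64 cores it is capped at 20, and setting `DUCKLM_N_THREADS=12` yields 12 regardless of the core count.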
@ -1,58 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import asyncio
|
|
||||||
from typing import Any, AsyncIterator
|
|
||||||
|
|
||||||
from app.models.orchestrator import OrchestratorAdapter as SyncOrchestrator
|
|
||||||
|
|
||||||
|
|
||||||
class AsyncOrchestratorAdapter:
|
|
||||||
"""Async wrapper for orchestrator - runs in executor to avoid blocking event loop."""
|
|
||||||
|
|
||||||
def __init__(self, sync_adapter: SyncOrchestrator) -> None:
|
|
||||||
self._sync = sync_adapter
|
|
||||||
|
|
||||||
async def generate(self, prompt: str, max_tokens: int | None = None) -> str:
|
|
||||||
loop = asyncio.get_running_loop()
|
|
||||||
return await loop.run_in_executor(
|
|
||||||
None,
|
|
||||||
lambda: self._sync.generate(prompt, max_tokens)
|
|
||||||
)
|
|
||||||
|
|
||||||
async def stream(self, prompt: str, max_tokens: int | None = None) -> AsyncIterator[str]:
|
|
||||||
loop = asyncio.get_running_loop()
|
|
||||||
|
|
||||||
def gen():  # plain function: run_in_executor needs a synchronous callable
|
|
||||||
return list(self._sync.stream(prompt, max_tokens))
|
|
||||||
|
|
||||||
result = await loop.run_in_executor(None, gen)
|
|
||||||
for chunk in result:
|
|
||||||
yield chunk
|
|
||||||
|
|
||||||
|
|
||||||
class AsyncCoderAdapter:
|
|
||||||
"""Async wrapper for coder."""
|
|
||||||
|
|
||||||
def __init__(self, sync_adapter) -> None:
|
|
||||||
self._sync = sync_adapter
|
|
||||||
|
|
||||||
async def generate(self, prompt: str, max_tokens: int | None = None) -> str:
|
|
||||||
loop = asyncio.get_running_loop()
|
|
||||||
return await loop.run_in_executor(
|
|
||||||
None,
|
|
||||||
lambda: self._sync.generate(prompt, max_tokens)
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
class AsyncCriticAdapter:
|
|
||||||
"""Async wrapper for critic."""
|
|
||||||
|
|
||||||
def __init__(self, sync_adapter) -> None:
|
|
||||||
self._sync = sync_adapter
|
|
||||||
|
|
||||||
async def generate(self, prompt: str, max_tokens: int | None = None) -> str:
|
|
||||||
loop = asyncio.get_running_loop()
|
|
||||||
return await loop.run_in_executor(
|
|
||||||
None,
|
|
||||||
lambda: self._sync.generate(prompt, max_tokens)
|
|
||||||
)
|
|
||||||
|
|
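The `stream` wrapper in `AsyncOrchestratorAdapter` gathers every chunk into a list before yielding anything, so the caller sees no output until generation has finished. One possible alternative, which is not what the removed file does, is to forward chunks through an `asyncio.Queue` as a worker thread produces them. The helper name `iter_in_thread` and the `_DONE` sentinel are hypothetical.

```python
import asyncio
import threading
from typing import AsyncIterator, Callable, Iterator

_DONE = object()  # sentinel marking the end of the blocking iterator


async def iter_in_thread(make_iter: Callable[[], Iterator[str]]) -> AsyncIterator[str]:
    """Yield items from a blocking iterator without blocking the event loop."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def worker() -> None:
        # Runs in a separate thread; hands each item back to the loop thread.
        try:
            for item in make_iter():
                loop.call_soon_threadsafe(queue.put_nowait, item)
        except Exception as exc:  # forward errors to the consumer
            loop.call_soon_threadsafe(queue.put_nowait, exc)
        finally:
            loop.call_soon_threadsafe(queue.put_nowait, _DONE)

    threading.Thread(target=worker, daemon=True).start()

    while True:
        item = await queue.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

A caller could then write `async for chunk in iter_in_thread(lambda: adapter.stream(prompt))`, where `adapter` is any object with a synchronous `stream` method. The trade-off is one extra thread per stream in exchange for incremental delivery.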
@ -1,44 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from threading import RLock
|
|
||||||
from typing import Any, Iterator
|
|
||||||
from llama_cpp import Llama
|
|
||||||
|
|
||||||
|
|
||||||
class CoderAdapter:
|
|
||||||
def __init__(self, llm: Llama, system_prompt: str | None = None, lock: RLock | None = None) -> None:
|
|
||||||
self._llm = llm
|
|
||||||
self._lock = lock or RLock()
|
|
||||||
self._system_prompt = system_prompt or (
|
|
||||||
"You are an expert code generation model."
|
|
||||||
)
|
|
||||||
self._temperature = 0.2
|
|
||||||
|
|
||||||
def generate(self, prompt: str, max_tokens: int | None = None) -> str:
|
|
||||||
messages = [
|
|
||||||
{"role": "system", "content": self._system_prompt},
|
|
||||||
{"role": "user", "content": prompt},
|
|
||||||
]
|
|
||||||
with self._lock:
|
|
||||||
output = self._llm.create_chat_completion(
|
|
||||||
messages=messages,
|
|
||||||
max_tokens=max_tokens or 1024,
|
|
||||||
temperature=self._temperature,
|
|
||||||
)
|
|
||||||
return output["choices"][0]["message"]["content"]
|
|
||||||
|
|
||||||
def stream(self, prompt: str, max_tokens: int | None = None) -> Iterator[str]:
|
|
||||||
messages = [
|
|
||||||
{"role": "system", "content": self._system_prompt},
|
|
||||||
{"role": "user", "content": prompt},
|
|
||||||
]
|
|
||||||
with self._lock:
|
|
||||||
for chunk in self._llm.create_chat_completion(
|
|
||||||
messages=messages,
|
|
||||||
max_tokens=max_tokens or 1024,
|
|
||||||
temperature=self._temperature,
|
|
||||||
stream=True,
|
|
||||||
):
|
|
||||||
content = chunk["choices"][0].get("delta", {}).get("content")
|
|
||||||
if content:
|
|
||||||
yield content
|
|
||||||
|
|
@ -1,44 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from threading import RLock
|
|
||||||
from typing import Any, Iterator
|
|
||||||
from llama_cpp import Llama
|
|
||||||
|
|
||||||
|
|
||||||
class CriticAdapter:
|
|
||||||
def __init__(self, llm: Llama, system_prompt: str | None = None, lock: RLock | None = None) -> None:
|
|
||||||
self._llm = llm
|
|
||||||
self._lock = lock or RLock()
|
|
||||||
self._system_prompt = system_prompt or (
|
|
||||||
"You are a critic model. Evaluate tool results and respond with JSON."
|
|
||||||
)
|
|
||||||
self._temperature = 0.1
|
|
||||||
|
|
||||||
def generate(self, prompt: str, max_tokens: int | None = None) -> str:
|
|
||||||
messages = [
|
|
||||||
{"role": "system", "content": self._system_prompt},
|
|
||||||
{"role": "user", "content": prompt},
|
|
||||||
]
|
|
||||||
with self._lock:
|
|
||||||
output = self._llm.create_chat_completion(
|
|
||||||
messages=messages,
|
|
||||||
max_tokens=max_tokens or 512,
|
|
||||||
temperature=self._temperature,
|
|
||||||
)
|
|
||||||
return output["choices"][0]["message"]["content"]
|
|
||||||
|
|
||||||
def stream(self, prompt: str, max_tokens: int | None = None) -> Iterator[str]:
|
|
||||||
messages = [
|
|
||||||
{"role": "system", "content": self._system_prompt},
|
|
||||||
{"role": "user", "content": prompt},
|
|
||||||
]
|
|
||||||
with self._lock:
|
|
||||||
for chunk in self._llm.create_chat_completion(
|
|
||||||
messages=messages,
|
|
||||||
max_tokens=max_tokens or 512,
|
|
||||||
temperature=self._temperature,
|
|
||||||
stream=True,
|
|
||||||
):
|
|
||||||
content = chunk["choices"][0].get("delta", {}).get("content")
|
|
||||||
if content:
|
|
||||||
yield content
|
|
||||||
|
|
@ -1,37 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from pathlib import Path
|
|
||||||
from typing import Any
|
|
||||||
|
|
||||||
import numpy as np
|
|
||||||
from sentence_transformers import SentenceTransformer
|
|
||||||
|
|
||||||
|
|
||||||
class EmbeddingsAdapter:
|
|
||||||
def __init__(
|
|
||||||
self,
|
|
||||||
model_path: str | Path | None = None,
|
|
||||||
model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
|
|
||||||
embedding_dim: int = 384,
|
|
||||||
) -> None:
|
|
||||||
self._embedding_dim = embedding_dim
|
|
||||||
if model_path and Path(model_path).exists():
|
|
||||||
self._model = SentenceTransformer(str(model_path))
|
|
||||||
else:
|
|
||||||
self._model = SentenceTransformer(model_name)
|
|
||||||
|
|
||||||
def encode(self, texts: str | list[str]) -> np.ndarray:
|
|
||||||
is_single = isinstance(texts, str)
|
|
||||||
if is_single:
|
|
||||||
texts = [texts]
|
|
||||||
embeddings = self._model.encode(texts, convert_to_numpy=True)
|
|
||||||
if is_single:
|
|
||||||
return embeddings[0]
|
|
||||||
return embeddings
|
|
||||||
|
|
||||||
def encode_batch(self, texts: list[str], batch_size: int = 32) -> np.ndarray:
|
|
||||||
return self._model.encode(texts, batch_size=batch_size, convert_to_numpy=True)
|
|
||||||
|
|
||||||
@property
|
|
||||||
def embedding_dim(self) -> int:
|
|
||||||
return self._embedding_dim
|
|
||||||
|
|
@ -1,45 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from threading import RLock
|
|
||||||
from typing import Any, Iterator
|
|
||||||
from llama_cpp import Llama
|
|
||||||
|
|
||||||
|
|
||||||
class OrchestratorAdapter:
|
|
||||||
def __init__(self, llm: Llama, system_prompt: str | None = None, lock: RLock | None = None) -> None:
|
|
||||||
self._llm = llm
|
|
||||||
self._lock = lock or RLock()
|
|
||||||
self._system_prompt = system_prompt or (
|
|
||||||
"You are an expert orchestrator for a local AI agent system. "
|
|
||||||
"Your role is to analyze the user's task, decide whether planning is needed."
|
|
||||||
)
|
|
||||||
self._temperature = 0.2
|
|
||||||
|
|
||||||
def generate(self, prompt: str, max_tokens: int | None = None) -> str:
|
|
||||||
messages = [
|
|
||||||
{"role": "system", "content": self._system_prompt},
|
|
||||||
{"role": "user", "content": prompt},
|
|
||||||
]
|
|
||||||
with self._lock:
|
|
||||||
output = self._llm.create_chat_completion(
|
|
||||||
messages=messages,
|
|
||||||
max_tokens=max_tokens or 512,
|
|
||||||
temperature=self._temperature,
|
|
||||||
)
|
|
||||||
return output["choices"][0]["message"]["content"]
|
|
||||||
|
|
||||||
def stream(self, prompt: str, max_tokens: int | None = None) -> Iterator[str]:
|
|
||||||
messages = [
|
|
||||||
{"role": "system", "content": self._system_prompt},
|
|
||||||
{"role": "user", "content": prompt},
|
|
||||||
]
|
|
||||||
with self._lock:
|
|
||||||
for chunk in self._llm.create_chat_completion(
|
|
||||||
messages=messages,
|
|
||||||
max_tokens=max_tokens or 512,
|
|
||||||
temperature=self._temperature,
|
|
||||||
stream=True,
|
|
||||||
):
|
|
||||||
content = chunk["choices"][0].get("delta", {}).get("content")
|
|
||||||
if content:
|
|
||||||
yield content
|
|
||||||
|
|
@ -1,2 +0,0 @@
|
||||||
"""Permission and approval handling."""
|
|
||||||
|
|
||||||
|
|
@ -1,67 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import sqlite3
|
|
||||||
from pathlib import Path
|
|
||||||
|
|
||||||
from app.core.contracts import PermissionDecision
|
|
||||||
|
|
||||||
|
|
||||||
class SQLiteApprovalStore:
|
|
||||||
"""Stores persistent user approval decisions."""
|
|
||||||
|
|
||||||
def __init__(self, db_path: str | Path) -> None:
|
|
||||||
self._db_path = Path(db_path)
|
|
||||||
self._db_path.parent.mkdir(parents=True, exist_ok=True)
|
|
||||||
self._initialize()
|
|
||||||
|
|
||||||
def save(self, decision: PermissionDecision) -> PermissionDecision:
|
|
||||||
with sqlite3.connect(self._db_path) as conn:
|
|
||||||
conn.execute(
|
|
||||||
"""
|
|
||||||
INSERT OR REPLACE INTO approvals (action_type, pattern, decision, created_at)
|
|
||||||
VALUES (?, ?, ?, ?)
|
|
||||||
""",
|
|
||||||
(
|
|
||||||
decision.action_type,
|
|
||||||
decision.pattern,
|
|
||||||
decision.decision,
|
|
||||||
decision.created_at.isoformat(),
|
|
||||||
),
|
|
||||||
)
|
|
||||||
conn.commit()
|
|
||||||
return decision
|
|
||||||
|
|
||||||
def load(self, action_type: str, pattern: str) -> PermissionDecision | None:
|
|
||||||
with sqlite3.connect(self._db_path) as conn:
|
|
||||||
row = conn.execute(
|
|
||||||
"""
|
|
||||||
SELECT action_type, pattern, decision, created_at
|
|
||||||
FROM approvals
|
|
||||||
WHERE action_type = ? AND pattern = ?
|
|
||||||
""",
|
|
||||||
(action_type, pattern),
|
|
||||||
).fetchone()
|
|
||||||
if not row:
|
|
||||||
return None
|
|
||||||
return PermissionDecision(
|
|
||||||
action_type=row[0],
|
|
||||||
pattern=row[1],
|
|
||||||
decision=row[2],
|
|
||||||
created_at=row[3],
|
|
||||||
)
|
|
||||||
|
|
||||||
def _initialize(self) -> None:
|
|
||||||
with sqlite3.connect(self._db_path) as conn:
|
|
||||||
conn.execute(
|
|
||||||
"""
|
|
||||||
CREATE TABLE IF NOT EXISTS approvals (
|
|
||||||
action_type TEXT NOT NULL,
|
|
||||||
pattern TEXT NOT NULL,
|
|
||||||
decision TEXT NOT NULL,
|
|
||||||
created_at TEXT NOT NULL,
|
|
||||||
PRIMARY KEY (action_type, pattern)
|
|
||||||
)
|
|
||||||
"""
|
|
||||||
)
|
|
||||||
conn.commit()
|
|
||||||
|
|
||||||
|
|
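`SQLiteApprovalStore.save` relies on `INSERT OR REPLACE` together with the composite primary key `(action_type, pattern)`, so a later decision for the same pair overwrites the earlier one. The sketch below shows that behaviour against an in-memory database. The function name `demo_upsert` and the sample values (`shell_exec`, `ls *`, `deny`, `allow`) are illustrative assumptions, and the `created_at` column is omitted for brevity.

```python
import sqlite3


def demo_upsert() -> list[tuple[str, str, str]]:
    """Insert two decisions for the same key and return the surviving rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE approvals (
            action_type TEXT NOT NULL,
            pattern TEXT NOT NULL,
            decision TEXT NOT NULL,
            PRIMARY KEY (action_type, pattern)
        )
        """
    )
    # Same (action_type, pattern) twice: the second write replaces the first.
    for decision in ("deny", "allow"):
        conn.execute(
            "INSERT OR REPLACE INTO approvals (action_type, pattern, decision) "
            "VALUES (?, ?, ?)",
            ("shell_exec", "ls *", decision),
        )
    rows = conn.execute(
        "SELECT action_type, pattern, decision FROM approvals"
    ).fetchall()
    conn.close()
    return rows
```

Only one row remains after both writes, carrying the most recent decision.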
@ -1,2 +0,0 @@
|
||||||
"""Runtime loop and execution coordination."""
|
|
||||||
|
|
||||||
|
|
@ -1,148 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import asyncio
|
|
||||||
from app.core.context_builder import ContextBuilder
|
|
||||||
from app.core.contracts import ExecutionDirective, PermissionDecision, PermissionRequest, RuntimeEvent, TaskCheckpoint, UserTask
|
|
||||||
from app.core.execution_engine import ExecutionEngine
|
|
||||||
from app.core.async_router import AsyncRouter
|
|
||||||
from app.events.event_bus import EventBus
|
|
||||||
from app.events.event_types import CHECKPOINT_SAVED, CONTEXT_BUILT, TASK_AWAITING_PERMISSION, TASK_COMPLETED, TASK_FAILED, TASK_RECEIVED
|
|
||||||
from app.core.permission_service import PermissionService
|
|
||||||
from app.state.checkpoint_store import SQLiteCheckpointStore
|
|
||||||
from app.state.task_state_store import SQLiteTaskStateStore
|
|
||||||
|
|
||||||
|
|
||||||
class AsyncRuntimeLoop:
|
|
||||||
"""Async runtime loop using LLM orchestrator."""
|
|
||||||
|
|
||||||
def __init__(
|
|
||||||
self,
|
|
||||||
event_bus: EventBus,
|
|
||||||
task_state_store: SQLiteTaskStateStore,
|
|
||||||
checkpoint_store: SQLiteCheckpointStore,
|
|
||||||
context_builder: ContextBuilder,
|
|
||||||
router: AsyncRouter,
|
|
||||||
execution_engine: ExecutionEngine,
|
|
||||||
permission_service: PermissionService,
|
|
||||||
memory_interface=None,
|
|
||||||
) -> None:
|
|
||||||
self._event_bus = event_bus
|
|
||||||
self._task_state_store = task_state_store
|
|
||||||
self._checkpoint_store = checkpoint_store
|
|
||||||
self._context_builder = context_builder
|
|
||||||
self._router = router
|
|
||||||
self._execution_engine = execution_engine
|
|
||||||
self._permission_service = permission_service
|
|
||||||
self._memory_interface = memory_interface
|
|
||||||
|
|
||||||
async def run_task(self, task: UserTask) -> dict[str, object]:
|
|
||||||
state = self._task_state_store.create_task(
|
|
||||||
task.task_id,
|
|
||||||
{
|
|
||||||
"status": "received",
|
|
||||||
"session_id": task.session_id,
|
|
||||||
"plan": None,
|
|
||||||
"task_input": task.input,
|
|
||||||
"task_context": task.context,
|
|
||||||
},
|
|
||||||
)
|
|
||||||
self._publish(task, TASK_RECEIVED, {"status": "received"})
|
|
||||||
|
|
||||||
checkpoint = TaskCheckpoint(task_id=task.task_id, status="received")
|
|
||||||
self._checkpoint_store.save(checkpoint)
|
|
||||||
self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
|
|
||||||
|
|
||||||
context = self._context_builder.build(task=task, checkpoint=checkpoint)
|
|
||||||
self._publish(task, CONTEXT_BUILT, {"keys": sorted(context.keys())})
|
|
||||||
|
|
||||||
directive = await self._router.decide(state=state, context=context, task_id=task.task_id, session_id=task.session_id)
|
|
||||||
|
|
||||||
execution_result = await asyncio.to_thread(
|
|
||||||
self._execution_engine.execute,
|
|
||||||
task=task,
|
|
||||||
directive=directive,
|
|
||||||
)
|
|
||||||
|
|
||||||
state_patch = {"status": execution_result["status"], "last_directive": directive.model_dump(mode="json")}
|
|
||||||
|
|
||||||
if execution_result["status"] == "awaiting_permission":
|
|
||||||
state_patch["pending_permission_request"] = execution_result["result"].get("permission_request")
|
|
||||||
|
|
||||||
self._task_state_store.update_task(task.task_id, state_patch)
|
|
||||||
|
|
||||||
status = execution_result["status"]
|
|
||||||
|
|
||||||
if status == "completed":
|
|
||||||
self._publish(task, TASK_COMPLETED, {"directive": directive.model_dump(mode="json"), "execution_result": execution_result["result"]})
|
|
||||||
elif status == "failed":
|
|
||||||
self._publish(task, TASK_FAILED, {"error": execution_result.get("result", {}).get("error")})
|
|
||||||
|
|
||||||
checkpoint.status = status
|
|
||||||
self._checkpoint_store.save(checkpoint)
|
|
||||||
self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
|
|
||||||
|
|
||||||
# Save task and result to memory for session context
|
|
||||||
self._save_to_memory(task, execution_result, status)
|
|
||||||
|
|
||||||
return {
|
|
||||||
"task_id": task.task_id,
|
|
||||||
"status": status,
|
|
||||||
"directive": directive.model_dump(mode="json"),
|
|
||||||
"result": execution_result.get("result"),
|
|
||||||
"events": list(self._event_bus.get_task_events(task.task_id)),
|
|
||||||
}
|
|
||||||
|
|
||||||
def _publish(self, task: UserTask, event_type: str, payload: dict) -> None:
|
|
||||||
if not self._event_bus:
|
|
||||||
return
|
|
||||||
event = RuntimeEvent(
|
|
||||||
task_id=task.task_id,
|
|
||||||
session_id=task.session_id,
|
|
||||||
sequence=self._event_bus.next_sequence(task.task_id),
|
|
||||||
type=event_type,
|
|
||||||
payload=payload,
|
|
||||||
)
|
|
||||||
self._event_bus.publish(event)
|
|
||||||
|
|
||||||
def _save_to_memory(self, task: UserTask, execution_result: dict, status: str) -> None:
|
|
||||||
"""Save task input and result to memory for session context."""
|
|
||||||
if not self._memory_interface:
|
|
||||||
return
|
|
||||||
|
|
||||||
try:
|
|
||||||
# Save task input as summary
|
|
||||||
self._memory_interface.insert(
|
|
||||||
text=f"User request: {task.input}",
|
|
||||||
kind="summary",
|
|
||||||
source="user",
|
|
||||||
task_id=task.task_id,
|
|
||||||
session_id=task.session_id,
|
|
||||||
weight=0.8,
|
|
||||||
metadata={"status": status},
|
|
||||||
)
|
|
||||||
|
|
||||||
# Save execution result
|
|
||||||
result_text = ""
|
|
||||||
if status == "completed":
|
|
||||||
step_results = execution_result.get("result", {}).get("step_results", [])
|
|
||||||
if step_results:
|
|
||||||
for step in step_results:
|
|
||||||
tool_result = step.get("result", {}).get("result", {})
|
|
||||||
if tool_result.get("output"):
|
|
||||||
result_text += f" | {step.get('step_id')}: {tool_result.get('output')[:200]}"
|
|
||||||
elif status == "failed":
|
|
||||||
result_text = f" | Error: {execution_result.get('result', {}).get('error', 'Unknown')}"
|
|
||||||
|
|
||||||
if result_text:
|
|
||||||
self._memory_interface.insert(
|
|
||||||
text=f"Result: {status}{result_text}",
|
|
||||||
kind="tool_result",
|
|
||||||
source="system",
|
|
||||||
task_id=task.task_id,
|
|
||||||
session_id=task.session_id,
|
|
||||||
weight=0.7,
|
|
||||||
metadata={"status": status},
|
|
||||||
)
|
|
||||||
except Exception as e:
|
|
||||||
import logging
|
|
||||||
logging.getLogger(__name__).warning("Failed to save to memory: %s", e)
|
|
||||||
|
|
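The result text that `_save_to_memory` builds can be read as a small pure function. The sketch below is an assumption-laden restatement rather than the original code: `summarize_result` is a hypothetical name, the nested dictionary shape is inferred from the lookups above, and outputs are passed through `str()` before truncation, which the original does not do.

```python
from typing import Any


def summarize_result(status: str, execution_result: dict[str, Any], limit: int = 200) -> str:
    """Build the ' | step: output' suffix stored alongside the task status."""
    if status == "failed":
        error = execution_result.get("result", {}).get("error", "Unknown")
        return f" | Error: {error}"
    if status != "completed":
        return ""  # e.g. awaiting_permission: nothing to record yet

    parts = []
    for step in execution_result.get("result", {}).get("step_results", []):
        output = step.get("result", {}).get("result", {}).get("output")
        if output:
            # Truncate long tool output so memory entries stay small.
            parts.append(f" | {step.get('step_id')}: {str(output)[:limit]}")
    return "".join(parts)
```

Keeping this logic separate from the memory insert would make the truncation and the failure path testable without a memory backend.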
@ -1,643 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import json
|
|
||||||
from concurrent.futures import Future, ThreadPoolExecutor
|
|
||||||
from threading import RLock
|
|
||||||
from pathlib import Path
|
|
||||||
|
|
||||||
from app.core.config import AppConfig, load_app_config
|
|
||||||
from app.core.context_builder import ContextBuilder
|
|
||||||
from app.core.command_analyzer import CommandAnalyzer
|
|
||||||
from app.core.contracts import UserTask
|
|
||||||
from app.core.execution_engine import ExecutionEngine
|
|
||||||
from app.core.execution_scheduler import ExecutionScheduler
|
|
||||||
from app.core.async_router import AsyncRouter
|
|
||||||
from app.events.event_bus import EventBus
|
|
||||||
from app.events.event_store import SQLiteEventStore
|
|
||||||
from app.memory import MemoryInterface, MemoryStore, VectorIndex
|
|
||||||
from app.memory.recall import MemoryRecallService
|
|
||||||
from app.memory.write_policy import MemoryWritePolicy
|
|
||||||
from app.models import (
|
|
||||||
CoderAdapter,
|
|
||||||
CriticAdapter,
|
|
||||||
EmbeddingsAdapter,
|
|
||||||
OrchestratorAdapter,
|
|
||||||
create_adapter,
|
|
||||||
)
|
|
||||||
from app.models.async_adapters import AsyncOrchestratorAdapter, AsyncCriticAdapter, AsyncCoderAdapter
|
|
||||||
from app.permissions.approval_store import SQLiteApprovalStore
|
|
||||||
from app.core.permission_service import PermissionService
|
|
||||||
from app.runtime.runtime_loop import RuntimeLoop
|
|
||||||
from app.state.checkpoint_store import SQLiteCheckpointStore
|
|
||||||
from app.state.task_state_store import SQLiteTaskStateStore
|
|
||||||
from app.tools.file_read import FileReadTool
|
|
||||||
from app.tools.file_write import FileWriteTool
|
|
||||||
from app.tools.registry import ToolRegistry
|
|
||||||
from app.tools.sandbox import ToolSandbox
|
|
||||||
from app.tools.shell_exec import ShellExecTool
|
|
||||||
from app.tools.memory_tools import MemoryInsertTool, MemorySearchTool, MemoryListTool
|
|
||||||
|
|
||||||
|
|
||||||
class RuntimeController:
|
|
||||||
"""Composition root for the ducklm runtime."""
|
|
||||||
|
|
||||||
def __init__(self, base_dir: str | Path | None = None) -> None:
|
|
||||||
self.base_dir = Path(base_dir or Path(__file__).resolve().parents[2])
|
|
||||||
self.config: AppConfig = load_app_config(self.base_dir / "config")
|
|
||||||
|
|
||||||
self.event_bus = EventBus(
|
|
||||||
SQLiteEventStore(self.base_dir / "data" / "events" / "events.sqlite3")
|
|
||||||
)
|
|
||||||
self.task_state_store = SQLiteTaskStateStore(
|
|
||||||
self.base_dir / "data" / "state" / "task_state.sqlite3"
|
|
||||||
)
|
|
||||||
self.checkpoint_store = SQLiteCheckpointStore(
|
|
||||||
self.base_dir / "data" / "state" / "checkpoints.sqlite3"
|
|
||||||
)
|
|
||||||
self.approval_store = SQLiteApprovalStore(
|
|
||||||
self.base_dir / "data" / "permissions" / "approvals.sqlite3"
|
|
||||||
)
|
|
||||||
|
|
||||||
self._thinker: OrchestratorAdapter | None = None
|
|
||||||
self._json_compiler: OrchestratorAdapter | None = None
|
|
||||||
self._orchestrator: OrchestratorAdapter | None = None
|
|
||||||
self._coder: CoderAdapter | None = None
|
|
||||||
self._critic: CriticAdapter | None = None
|
|
||||||
self._sys_util: OrchestratorAdapter | None = None
|
|
||||||
self._model_cache: dict[tuple[object, ...], tuple[object, RLock]] = {}
|
|
||||||
self._memory_interface: MemoryInterface | None = None
|
|
||||||
self._memory_policy: MemoryWritePolicy | None = None
|
|
||||||
self._background_executor = ThreadPoolExecutor(max_workers=4, thread_name_prefix="ducklm-task")
|
|
||||||
self._background_tasks: dict[str, Future[dict[str, object]]] = {}
|
|
||||||
self.tool_registry = None
|
|
||||||
self.tool_sandbox = None
|
|
||||||
|
|
||||||
self._init_models()
|
|
||||||
self._init_memory()
|
|
||||||
|
|
||||||
runtime_config = self.config.runtime
|
|
||||||
|
|
||||||
self.tool_sandbox = ToolSandbox(
|
|
||||||
allowed_root=self.base_dir,
|
|
||||||
timeout_ms=runtime_config.step_timeout_ms,
|
|
||||||
command_timeout_ms=runtime_config.shell_command_timeout_ms,
|
|
||||||
idle_timeout_ms=runtime_config.shell_idle_timeout_ms,
|
|
||||||
)
|
|
||||||
|
|
||||||
self.tool_registry = self._create_tool_registry()
|
|
||||||
|
|
||||||
context_config = {
|
|
||||||
"max_context_tokens": runtime_config.max_context_tokens,
|
|
||||||
"context_budgets": runtime_config.context_budgets,
|
|
||||||
"reserve_for_generation_pct": runtime_config.reserve_for_generation_pct,
|
|
||||||
}
|
|
||||||
|
|
||||||
self.context_builder = ContextBuilder(
|
|
||||||
memory_interface=self._memory_interface,
|
|
||||||
tool_registry=self.tool_registry,
|
|
||||||
config=context_config,
|
|
||||||
)
|
|
||||||
|
|
||||||
self._prompts = self._load_prompts()
|
|
||||||
# Ensure the sys_util prompt is available to the router (prompts.json may have a "sys_util" key).
|
|
||||||
if "sys_util" not in self._prompts and "prompts" in self.config:
|
|
||||||
self._prompts["sys_util"] = self.config.get("sys_util")
|
|
||||||
|
|
||||||
self.context_builder = ContextBuilder(
|
|
||||||
memory_interface=self._memory_interface,
|
|
||||||
tool_registry=self.tool_registry,
|
|
||||||
config=context_config,
|
|
||||||
)
|
|
||||||
|
|
||||||
self.router = AsyncRouter(
|
|
||||||
thinker=None,
|
|
||||||
json_compiler=None,
|
|
||||||
intent_parser=None,
|
|
||||||
prompts=self._prompts,
|
|
||||||
event_bus=self.event_bus,
|
|
||||||
tool_registry=self.tool_registry,
|
|
||||||
retry_limit=runtime_config.orchestrator_retry_limit,
|
|
||||||
debug=getattr(runtime_config, "debug", False),
|
|
||||||
log_length=getattr(runtime_config, "debug_orchestrator_log_length", 500),
|
|
||||||
json_fix_retry_limit=getattr(runtime_config, "json_fix_retry_limit", 2),
|
|
||||||
json_fix_use_sys_util=getattr(runtime_config, "json_fix_use_sys_util", True),
|
|
||||||
intent_classifier=getattr(runtime_config, "intent_classifier", "thinker"),
|
|
||||||
)
|
|
||||||
|
|
||||||
self.permission_service = PermissionService(
|
|
||||||
config=self._load_permissions_config(),
|
|
||||||
)
|
|
||||||
self.command_analyzer = CommandAnalyzer(self.permission_service)
|
|
||||||
|
|
||||||
self.execution_engine = ExecutionEngine(
|
|
||||||
event_bus=self.event_bus,
|
|
||||||
tool_registry=self.tool_registry,
|
|
||||||
permission_service=self.permission_service,
|
|
||||||
scheduler=ExecutionScheduler(
|
|
||||||
retry_limit=runtime_config.planner_retry_limit
|
|
||||||
),
|
|
||||||
critic=self._critic,
|
|
||||||
memory_policy=self._memory_policy,
|
|
||||||
memory_interface=self._memory_interface,
|
|
||||||
prompts=self._prompts,
|
|
||||||
recovery_limit=runtime_config.tool_retry_limit,
|
|
||||||
critic_retry_limit=runtime_config.critic_retry_limit,
|
|
||||||
command_analyzer=self.command_analyzer,
|
|
||||||
)
|
|
||||||
|
|
||||||
self.runtime_loop = RuntimeLoop(
|
|
||||||
event_bus=self.event_bus,
|
|
||||||
task_state_store=self.task_state_store,
|
|
||||||
checkpoint_store=self.checkpoint_store,
|
|
||||||
context_builder=self.context_builder,
|
|
||||||
router=self.router,
|
|
||||||
execution_engine=self.execution_engine,
|
|
||||||
permission_service=self.permission_service,
|
|
||||||
memory_interface=self._memory_interface,
|
|
||||||
)
|
|
||||||
|
|
||||||
def _load_prompts(self) -> dict[str, str]:
|
|
||||||
prompts_dir = self.base_dir / "config" / "prompts"
|
|
||||||
prompts = {}
|
|
||||||
|
|
||||||
if prompts_dir.is_dir():
|
|
||||||
for md_file in prompts_dir.glob("*.md"):
|
|
||||||
role = md_file.stem
|
|
||||||
prompts[role] = md_file.read_text(encoding="utf-8")
|
|
||||||
|
|
||||||
if prompts:
|
|
||||||
return prompts
|
|
||||||
|
|
||||||
prompts_file = self.base_dir / "config" / "prompts.json"
|
|
||||||
if prompts_file.exists():
|
|
||||||
with open(prompts_file, encoding="utf-8") as f:
|
|
||||||
return json.load(f)
|
|
||||||
return {}
|
|
||||||
|
|
||||||
def _load_permissions_config(self) -> dict:
|
|
||||||
permissions_file = self.base_dir / "config" / "permissions.json"
|
|
||||||
if not permissions_file.exists():
|
|
||||||
return {}
|
|
||||||
with permissions_file.open("r", encoding="utf-8") as handle:
|
|
||||||
return json.load(handle)
|
|
||||||
|
|
||||||
def _init_models(self) -> None:
|
|
||||||
try:
|
|
||||||
memory_config = self.config.runtime.memory_thresholds or {}
|
|
||||||
if memory_config:
|
|
||||||
self._memory_policy = MemoryWritePolicy(
|
|
||||||
store_threshold=memory_config.get("default_store_weight", 0.8),
|
|
||||||
)
|
|
||||||
print("Models policy ready")
|
|
||||||
except Exception as e:
|
|
||||||
print(f"Models init failed: {e}")
|
|
||||||
|
|
||||||
def load_models_at_startup(self) -> None:
|
|
||||||
"""Load all LLM models synchronously. Called from startup hook in executor."""
|
|
||||||
import os
|
|
||||||
os.chdir(str(self.base_dir / "models"))
|
|
||||||
|
|
||||||
try:
|
|
||||||
print("Loading thinker model...")
|
|
||||||
thinker_config = self.config.models.thinker or {}
|
|
||||||
if thinker_config.get("path"):
|
|
||||||
llm, lock = self._get_or_create_llm("thinker", thinker_config)
|
|
||||||
self._thinker = OrchestratorAdapter(llm, system_prompt=self._prompts.get("thinker"), lock=lock)
|
|
||||||
print(f"Thinker loaded: {self._thinker} (model: {thinker_config.get('path')})")
|
|
||||||
|
|
||||||
print("Loading json_compiler model...")
|
|
||||||
compiler_config = self.config.models.json_compiler or {}
|
|
||||||
if compiler_config.get("path"):
|
|
||||||
llm, lock = self._get_or_create_llm("json_compiler", compiler_config)
|
|
||||||
self._json_compiler = OrchestratorAdapter(llm, system_prompt=self._prompts.get("json_compiler"), lock=lock)
|
|
||||||
print(f"JSON Compiler loaded: {self._json_compiler} (model: {compiler_config.get('path')})")
|
|
||||||
|
|
||||||
print("Loading coder model...")
|
|
||||||
coder_config = self.config.models.coder or {}
|
|
||||||
if coder_config.get("path"):
|
|
||||||
llm, lock = self._get_or_create_llm("coder", coder_config)
|
|
||||||
self._coder = CoderAdapter(llm, system_prompt=self._prompts.get("coder"), lock=lock)
|
|
||||||
print(f"Coder loaded: {self._coder} (model: {coder_config.get('path')})")
|
|
||||||
|
|
||||||
print("Loading critic model...")
|
|
||||||
critic_config = self.config.models.critic or {}
|
|
||||||
if critic_config.get("path"):
|
|
||||||
llm, lock = self._get_or_create_llm("critic", critic_config)
|
|
||||||
self._critic = CriticAdapter(llm, system_prompt=self._prompts.get("critic"), lock=lock)
|
|
||||||
print(f"Critic loaded: {self._critic} (model: {critic_config.get('path')})")
|
|
||||||
|
|
||||||
print("Loading sys_util model...")
|
|
||||||
sys_util_config = self.config.models.sys_util or {}
|
|
||||||
if sys_util_config.get("path"):
|
|
||||||
llm, lock = self._get_or_create_llm("sys_util", sys_util_config)
|
|
||||||
self._sys_util = OrchestratorAdapter(llm, system_prompt=self._prompts.get("sys_util"), lock=lock)
|
|
||||||
print(f"Sys_util loaded: {self._sys_util} (model: {sys_util_config.get('path')})")
|
|
||||||
|
|
||||||
print("All models loaded successfully")
|
|
||||||
|
|
||||||
async_thinker = AsyncOrchestratorAdapter(self._thinker) if self._thinker else None
|
|
||||||
async_compiler = AsyncOrchestratorAdapter(self._json_compiler) if self._json_compiler else None
|
|
||||||
async_coder = AsyncCoderAdapter(self._coder) if self._coder else None
|
|
||||||
async_critic = AsyncCriticAdapter(self._critic) if self._critic else None
|
|
||||||
async_sys_util = AsyncOrchestratorAdapter(self._sys_util) if self._sys_util else None
|
|
||||||
|
|
||||||
self.router.set_thinker(async_thinker)
|
|
||||||
self.router.set_json_compiler(async_compiler)
|
|
||||||
self.router.set_sys_util(async_sys_util)
|
|
||||||
self.router.set_tool_registry(self.tool_registry)
|
|
||||||
if async_critic:
|
|
||||||
self.execution_engine.set_critic(async_critic)
|
|
||||||
if async_coder:
|
|
||||||
self.execution_engine.set_coder(async_coder)
|
|
||||||
|
|
||||||
# Create MemoryRecallService using the configured model (default: sys_util)
|
|
||||||
# Reuses already-loaded async adapter — no duplicate model loading
|
|
||||||
recall_model_name = self.config.runtime.recall_model
|
|
||||||
recall_async_model = {
|
|
||||||
"sys_util": async_sys_util,
|
|
||||||
"thinker": async_thinker,
|
|
||||||
"json_compiler": async_compiler,
|
|
||||||
"critic": async_critic,
|
|
||||||
"coder": async_coder,
|
|
||||||
}.get(recall_model_name, async_sys_util)
|
|
||||||
|
|
||||||
self._recall_service = MemoryRecallService(
|
|
||||||
memory_interface=self._memory_interface,
|
|
||||||
recall_model=recall_async_model,
|
|
||||||
)
|
|
||||||
self.runtime_loop.set_recall_service(self._recall_service)
|
|
||||||
print(f"MemoryRecallService initialized with model: {recall_model_name}")
|
|
||||||
|
|
||||||
# Set memory policy in runtime loop
|
|
||||||
self.runtime_loop.set_memory_policy(self._memory_policy)
|
|
||||||
print(f"MemoryWritePolicy set: {self._memory_policy is not None}")
|
|
||||||
|
|
||||||
except Exception as e:
|
|
||||||
print(f"Failed to load models at startup: {e}")
|
|
||||||
raise RuntimeError(f"Model loading failed: {e}") from e
|
|
||||||
|
|
||||||
def _model_cache_key(self, model_config: dict) -> tuple[object, ...]:
|
|
||||||
path = str((self.base_dir / "models" / model_config.get("path", "")).resolve())
|
|
||||||
return (
|
|
||||||
path,
|
|
||||||
model_config.get("backend", "cpu"),
|
|
||||||
model_config.get("n_gpu_layers", 0),
|
|
||||||
model_config.get("n_ctx", 4096),
|
|
||||||
)
|
|
||||||
|
|
||||||
def _get_or_create_llm(self, model_type: str, model_config: dict):
|
|
||||||
key = self._model_cache_key(model_config)
|
|
||||||
cached = self._model_cache.get(key)
|
|
||||||
if cached is not None:
|
|
||||||
print(f"Reusing model instance: {model_config.get('path')} for {model_type}")
|
|
||||||
return cached
|
|
||||||
|
|
||||||
llm = create_adapter(model_type, model_config, self.base_dir / "models")
|
|
||||||
lock = RLock()
|
|
||||||
cached = (llm, lock)
|
|
||||||
self._model_cache[key] = cached
|
|
||||||
return cached
|
|
||||||
|
|
||||||
def _init_memory(self) -> None:
|
|
||||||
try:
|
|
||||||
emb_config = self.config.models.embeddings or {}
|
|
||||||
model_path = self.base_dir / emb_config.get("path", "models/all-MiniLM-L6-v2")
|
|
||||||
if not model_path.exists() and not Path(emb_config.get("path", "")).is_absolute():
|
|
||||||
model_path = self.base_dir / "models" / emb_config.get("path", "all-MiniLM-L6-v2")
|
|
||||||
if not model_path.exists():
|
|
||||||
print(f"Memory init skipped: embeddings model not found at {model_path}")
|
|
||||||
self._memory_interface = None
|
|
||||||
return
|
|
||||||
embeddings = EmbeddingsAdapter(
|
|
||||||
model_path=model_path,
|
|
||||||
embedding_dim=emb_config.get("embedding_dim", 384),
|
|
||||||
)
|
|
||||||
|
|
||||||
store = MemoryStore(
|
|
||||||
self.base_dir / "data" / "memory" / "memory.sqlite3"
|
|
||||||
)
|
|
||||||
vector_index = VectorIndex(
|
|
||||||
index_path=self.base_dir / "data" / "memory" / "index.bin",
|
|
||||||
embedding_dim=embeddings.embedding_dim,
|
|
||||||
)
|
|
||||||
|
|
||||||
self._memory_interface = MemoryInterface(store, vector_index, embeddings)
|
|
||||||
|
|
||||||
except Exception as e:
|
|
||||||
print(f"Memory init failed: {e}")
|
|
||||||
self._memory_interface = None
|
|
||||||
|
|
||||||
def _create_tool_registry(self) -> ToolRegistry:
|
|
||||||
from app.tools.registry import ToolRegistry
|
|
||||||
from app.tools.plugins.shell_exec import Tool as ShellExecTool
|
|
||||||
from app.tools.plugins.file_read import Tool as FileReadTool
|
|
||||||
from app.tools.plugins.file_write import Tool as FileWriteTool
|
|
||||||
from app.tools.plugins.memory_tools import Tool as MemoryTool
|
|
||||||
from app.tools.discover import ToolDiscovery
|
|
||||||
|
|
||||||
registry = ToolRegistry()
|
|
||||||
|
|
||||||
tool_init_map = {
|
|
||||||
"shell_exec": lambda m: ShellExecTool(self.tool_sandbox),
|
|
||||||
"file_read": lambda m: FileReadTool(self.tool_sandbox),
|
|
||||||
"file_write": lambda m: FileWriteTool(self.tool_sandbox),
|
|
||||||
"memory": lambda m: MemoryTool(self._memory_interface),
|
|
||||||
}
|
|
||||||
|
|
||||||
discovery = ToolDiscovery()
|
|
||||||
discovered = discovery.discover()
|
|
||||||
|
|
||||||
for name, data in discovered.items():
|
|
||||||
init_fn = tool_init_map.get(name)
|
|
||||||
if init_fn:
|
|
||||||
tool = init_fn(data.get("manifest", {}))
|
|
||||||
registry.register(tool)
|
|
||||||
registry._schemas[name] = {
|
|
||||||
"description": data.get("manifest", {}).get("description", ""),
|
|
||||||
"args_schema": data.get("manifest", {}).get("args_schema", {}),
|
|
||||||
"requires_permission": data.get("manifest", {}).get("requires_permission", False),
|
|
||||||
}
|
|
||||||
print(f"Registered tool: {name}")
|
|
||||||
else:
|
|
||||||
print(f"No init mapping for tool: {name} - skipping")
|
|
||||||
|
|
||||||
return registry
|
|
||||||
|
|
||||||
@property
|
|
||||||
def orchestrator(self) -> OrchestratorAdapter | None:
|
|
||||||
return self._orchestrator
|
|
||||||
|
|
||||||
@property
|
|
||||||
def coder(self) -> CoderAdapter | None:
|
|
||||||
return self._coder
|
|
||||||
|
|
||||||
@property
|
|
||||||
def critic(self) -> CriticAdapter | None:
|
|
||||||
return self._critic
|
|
||||||
|
|
||||||
@property
|
|
||||||
def memory_interface(self) -> MemoryInterface | None:
|
|
||||||
return self._memory_interface
|
|
||||||
|
|
||||||
def _ensure_orchestrator(self) -> OrchestratorAdapter | None:
|
|
||||||
if self._orchestrator is not None:
|
|
||||||
return self._orchestrator
|
|
||||||
try:
|
|
||||||
orch_config = self.config.models.orchestrator or {}
|
|
||||||
if orch_config.get("path"):
|
|
||||||
llm, lock = self._get_or_create_llm("orchestrator", orch_config)
|
|
||||||
self._orchestrator = OrchestratorAdapter(llm, lock=lock)
|
|
||||||
except Exception as e:
|
|
||||||
print(f"Orchestrator load failed: {e}")
|
|
||||||
return self._orchestrator
|
|
||||||
|
|
||||||
def _ensure_critic(self) -> CriticAdapter | None:
|
|
||||||
if self._critic is not None:
|
|
||||||
return self._critic
|
|
||||||
try:
|
|
||||||
critic_config = self.config.models.critic or {}
|
|
||||||
if critic_config.get("path"):
|
|
||||||
llm, lock = self._get_or_create_llm("critic", critic_config)
|
|
||||||
self._critic = CriticAdapter(llm, lock=lock)
|
|
||||||
except Exception as e:
|
|
||||||
print(f"Critic load failed: {e}")
|
|
||||||
return self._critic
|
|
||||||
|
|
||||||
def handle_task(self, task: UserTask) -> dict[str, object]:
|
|
||||||
return self.runtime_loop.run_task(task)
|
|
||||||
|
|
||||||
def submit_task(self, task: UserTask) -> dict[str, object]:
|
|
||||||
self._background_tasks[task.task_id] = self._background_executor.submit(
|
|
||||||
self.handle_task,
|
|
||||||
task,
|
|
||||||
)
|
|
||||||
return {"task_id": task.task_id, "status": "accepted"}
|
|
||||||
|
|
||||||
def resolve_permission(self, task_id: str, decision: str) -> dict[str, object]:
|
|
||||||
return self.runtime_loop.resolve_permission(
|
|
||||||
task_id=task_id, decision=decision
|
|
||||||
)
|
|
||||||
|
|
||||||
def submit_permission_resolution(self, task_id: str, decision: str) -> dict[str, object]:
|
|
||||||
if not self.task_state_store.get_task(task_id):
|
|
||||||
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
|
|
||||||
self._background_tasks[task_id] = self._background_executor.submit(
|
|
||||||
self.resolve_permission,
|
|
||||||
task_id,
|
|
||||||
decision,
|
|
||||||
)
|
|
||||||
return {"task_id": task_id, "status": "accepted"}
|
|
||||||
|
|
||||||
def resolve_secret(self, task_id: str, secret: str) -> dict[str, object]:
|
|
||||||
return self.runtime_loop.resolve_secret(
|
|
||||||
task_id=task_id, secret=secret
|
|
||||||
)
|
|
||||||
|
|
||||||
def submit_secret_resolution(self, task_id: str, secret: str) -> dict[str, object]:
|
|
||||||
if not self.task_state_store.get_task(task_id):
|
|
||||||
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
|
|
||||||
self._background_tasks[task_id] = self._background_executor.submit(
|
|
||||||
self.resolve_secret,
|
|
||||||
task_id,
|
|
||||||
secret,
|
|
||||||
)
|
|
||||||
return {"task_id": task_id, "status": "accepted"}
|
|
||||||
|
|
||||||
def resolve_password(self, task_id: str, password: str) -> dict[str, object]:
|
|
||||||
return self.runtime_loop.resolve_password(
|
|
||||||
task_id=task_id, password=password
|
|
||||||
)
|
|
||||||
|
|
||||||
def resolve_review(self, task_id: str, decision: str, correction: str | None = None) -> dict[str, object]:
|
|
||||||
return self.runtime_loop.resolve_review(
|
|
||||||
task_id=task_id,
|
|
||||||
decision=decision,
|
|
||||||
correction=correction,
|
|
||||||
)
|
|
||||||
|
|
||||||
def submit_review_resolution(self, task_id: str, decision: str, correction: str | None = None) -> dict[str, object]:
|
|
||||||
if not self.task_state_store.get_task(task_id):
|
|
||||||
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
|
|
||||||
self._background_tasks[task_id] = self._background_executor.submit(
|
|
||||||
self.resolve_review,
|
|
||||||
task_id,
|
|
||||||
decision,
|
|
||||||
correction,
|
|
||||||
)
|
|
||||||
return {"task_id": task_id, "status": "accepted"}
|
|
||||||
|
|
||||||
def submit_password_resolution(self, task_id: str, password: str) -> dict[str, object]:
|
|
||||||
if not self.task_state_store.get_task(task_id):
|
|
||||||
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
|
|
||||||
self._background_tasks[task_id] = self._background_executor.submit(
|
|
||||||
self.resolve_password,
|
|
||||||
task_id,
|
|
||||||
password,
|
|
||||||
)
|
|
||||||
return {"task_id": task_id, "status": "accepted"}
|
|
||||||
|
|
||||||
def handle_critic_feedback(
|
|
||||||
self,
|
|
||||||
feedback: str,
|
|
||||||
task_id: str | None = None,
|
|
||||||
session_id: str | None = None,
|
|
||||||
feedback_type: str | None = None,
|
|
||||||
severity: str | None = None,
|
|
||||||
correction: str | None = None,
|
|
||||||
remember: bool = True,
|
|
||||||
retry: bool = False,
|
|
||||||
assistant_answer: str | None = None,
|
|
||||||
correctness_override: float | None = None,
|
|
||||||
usefulness_override: float | None = None,
|
|
||||||
safety_override: float | None = None,
|
|
||||||
) -> dict[str, object]:
|
|
||||||
target_task_id = task_id
|
|
||||||
target_session_id = session_id
|
|
||||||
|
|
||||||
if not target_session_id and not target_task_id:
|
|
||||||
return {
|
|
||||||
"status": "error",
|
|
||||||
"message": "Either task_id or session_id must be provided",
|
|
||||||
}
|
|
||||||
|
|
||||||
state = self.task_state_store.get_task(target_task_id) if target_task_id else None
|
|
||||||
if not target_session_id and state:
|
|
||||||
target_session_id = state.get("session_id")
|
|
||||||
|
|
||||||
if not target_task_id and target_session_id:
|
|
||||||
recent_tasks = self.task_state_store.get_session_tasks(target_session_id, limit=1)
|
|
||||||
if recent_tasks:
|
|
||||||
target_task_id = recent_tasks[0]["task_id"]
|
|
||||||
|
|
||||||
min_weight = 0.3
|
|
||||||
max_weight = 0.95
|
|
||||||
user_weight = 0.9
|
|
||||||
|
|
||||||
final_weight = max(min_weight, min(max_weight, user_weight))
|
|
||||||
|
|
||||||
task_input = state.get("task_input") if state else None
|
|
||||||
last_directive = state.get("last_directive") if state else None
|
|
||||||
feedback_type = feedback_type or "other"
|
|
||||||
severity = severity or "major"
|
|
||||||
|
|
||||||
lesson = self._build_feedback_lesson(
|
|
||||||
feedback_type=feedback_type,
|
|
||||||
severity=severity,
|
|
||||||
feedback=feedback,
|
|
||||||
correction=correction,
|
|
||||||
task_input=task_input,
|
|
||||||
)
|
|
||||||
|
|
||||||
metadata = {
|
|
||||||
"feedback_text": feedback,
|
|
||||||
"feedback_type": feedback_type,
|
|
||||||
"severity": severity,
|
|
||||||
"correction": correction,
|
|
||||||
"assistant_answer": assistant_answer,
|
|
||||||
"task_input": task_input,
|
|
||||||
"last_directive": last_directive,
|
|
||||||
"overrides": {
|
|
||||||
"correctness": correctness_override,
|
|
||||||
"usefulness": usefulness_override,
|
|
||||||
"safety": safety_override,
|
|
||||||
},
|
|
||||||
"source": "user",
|
|
||||||
}
|
|
||||||
|
|
||||||
feedback_text = lesson
|
|
||||||
if correctness_override is not None:
|
|
||||||
feedback_text += f" | Correctness corrected to: {correctness_override}"
|
|
||||||
if usefulness_override is not None:
|
|
||||||
feedback_text += f" | Usefulness corrected to: {usefulness_override}"
|
|
||||||
if safety_override is not None:
|
|
||||||
feedback_text += f" | Safety corrected to: {safety_override}"
|
|
||||||
|
|
||||||
retry_result = None
|
|
||||||
stored = False
|
|
||||||
store_error = None
|
|
||||||
try:
|
|
||||||
if remember and self._memory_interface:
|
|
||||||
self._memory_interface.insert(
|
|
||||||
text=feedback_text,
|
|
||||||
kind="critique",
|
|
||||||
source="user",
|
|
||||||
task_id=target_task_id,
|
|
||||||
session_id=target_session_id,
|
|
||||||
weight=final_weight,
|
|
||||||
metadata=metadata,
|
|
||||||
)
|
|
||||||
stored = True
|
|
||||||
elif remember and not self._memory_interface:
|
|
||||||
store_error = "Memory not available"
|
|
||||||
except Exception as e:
|
|
||||||
store_error = str(e)
|
|
||||||
|
|
||||||
if retry and task_input:
|
|
||||||
retry_input = self._build_retry_input(
|
|
||||||
task_input=task_input,
|
|
||||||
feedback=feedback,
|
|
||||||
feedback_type=feedback_type,
|
|
||||||
correction=correction,
|
|
||||||
)
|
|
||||||
retry_task = UserTask(
|
|
||||||
session_id=target_session_id or "feedback-retry",
|
|
||||||
input=retry_input,
|
|
||||||
context={
|
|
||||||
"feedback_retry": True,
|
|
||||||
"original_task_id": target_task_id,
|
|
||||||
"feedback_type": feedback_type,
|
|
||||||
"severity": severity,
|
|
||||||
"correction": correction,
|
|
||||||
},
|
|
||||||
)
|
|
||||||
retry_result = self.handle_task(retry_task)
|
|
||||||
|
|
||||||
status = "ok" if stored or not remember else "error"
|
|
||||||
return {
|
|
||||||
"status": status,
|
|
||||||
"message": "Feedback saved" if stored else (store_error or "Feedback accepted"),
|
|
||||||
"stored": stored,
|
|
||||||
"task_id": target_task_id,
|
|
||||||
"session_id": target_session_id,
|
|
||||||
"lesson": lesson,
|
|
||||||
"retry_result": retry_result,
|
|
||||||
}
|
|
||||||
|
|
||||||
def _build_feedback_lesson(
|
|
||||||
self,
|
|
||||||
feedback_type: str,
|
|
||||||
severity: str,
|
|
||||||
feedback: str,
|
|
||||||
correction: str | None,
|
|
||||||
task_input: str | None,
|
|
||||||
) -> str:
|
|
||||||
parts = [
|
|
||||||
"User critique lesson.",
|
|
||||||
f"Error type: {feedback_type}.",
|
|
||||||
f"Severity: {severity}.",
|
|
||||||
]
|
|
||||||
if task_input:
|
|
||||||
parts.append(f"Original task: {task_input}")
|
|
||||||
if feedback:
|
|
||||||
parts.append(f"What was wrong: {feedback}")
|
|
||||||
if correction:
|
|
||||||
parts.append(f"Preferred correction: {correction}")
|
|
||||||
return " | ".join(parts)
|
|
||||||
|
|
||||||
def _build_retry_input(
|
|
||||||
self,
|
|
||||||
task_input: str,
|
|
||||||
feedback: str,
|
|
||||||
feedback_type: str,
|
|
||||||
correction: str | None,
|
|
||||||
) -> str:
|
|
||||||
retry_input = (
|
|
||||||
"Repeat the task, taking the feedback into account.\n"
|
|
||||||
f"Original task: {task_input}\n"
|
|
||||||
f"Error type: {feedback_type}\n"
|
|
||||||
f"What was wrong: {feedback}\n"
|
|
||||||
)
|
|
||||||
if correction:
|
|
||||||
retry_input += f"How it should be: {correction}\n"
|
|
||||||
return retry_input
|
|
||||||
|
|
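The removed container shares one loaded model between roles whose configuration resolves to the same cache key (resolved path, backend, `n_gpu_layers`, `n_ctx`), and pairs each instance with a single `RLock`. The sketch below illustrates that pattern in isolation. `ModelCache` and `fake_loader` are hypothetical names introduced here for illustration; in the diff the loader is `create_adapter` and the cache lives on the container itself.

```python
from pathlib import Path
from threading import RLock


class ModelCache:
    """Hypothetical stand-in for the container's model cache (illustration only)."""

    def __init__(self, models_dir: Path, loader):
        self._models_dir = models_dir
        self._loader = loader  # callable(model_type, config) -> model object
        self._cache: dict[tuple, tuple] = {}

    def cache_key(self, config: dict) -> tuple:
        # Same fields as _model_cache_key: resolved path plus load-time options.
        path = str((self._models_dir / config.get("path", "")).resolve())
        return (
            path,
            config.get("backend", "cpu"),
            config.get("n_gpu_layers", 0),
            config.get("n_ctx", 4096),
        )

    def get_or_create(self, model_type: str, config: dict):
        key = self.cache_key(config)
        cached = self._cache.get(key)
        if cached is not None:
            return cached
        # One lock per loaded instance, shared by every role that reuses it.
        entry = (self._loader(model_type, config), RLock())
        self._cache[key] = entry
        return entry


loads = []


def fake_loader(model_type, config):
    # Assumption: stands in for an expensive model load.
    loads.append(model_type)
    return object()


cache = ModelCache(Path("models"), fake_loader)
shared = {"path": "small.gguf", "n_ctx": 4096}
thinker_llm, thinker_lock = cache.get_or_create("thinker", shared)
sys_util_llm, sys_util_lock = cache.get_or_create("sys_util", dict(shared))
critic_llm, _ = cache.get_or_create("critic", {"path": "small.gguf", "n_ctx": 8192})
```

Because the lock travels with the instance, two roles backed by the same weights serialize their calls, while a role that differs in any key field (here `n_ctx`) gets its own load.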
@ -1,688 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import asyncio
|
|
||||||
|
|
||||||
from app.core.context_builder import ContextBuilder
|
|
||||||
from app.core.contracts import CriticScore, ExecutionDirective, PermissionDecision, PermissionRequest, RuntimeEvent, SecretRequest, TaskCheckpoint, UserTask
|
|
||||||
from app.core.execution_engine import ExecutionEngine
|
|
||||||
from app.core.async_router import AsyncRouter
|
|
||||||
from app.events.event_bus import EventBus
|
|
||||||
from app.events.event_types import CHECKPOINT_SAVED, CONTEXT_BUILT, MEMORY_RECALL_USED, MEMORY_WRITE_DECIDED, REVIEW_RESOLVED, TASK_AWAITING_INPUT, TASK_AWAITING_PERMISSION, TASK_AWAITING_REVIEW, TASK_COMPLETED, TASK_FAILED, TASK_RECEIVED
|
|
||||||
from app.core.permission_service import PermissionService
|
|
||||||
from app.memory.recall import MemoryRecallService
|
|
||||||
from app.memory.write_policy import MemoryWritePolicy
|
|
||||||
from app.state.checkpoint_store import SQLiteCheckpointStore
|
|
||||||
from app.state.task_state_store import SQLiteTaskStateStore
|
|
||||||
|
|
||||||
|
|
||||||
def _build_response_directive(execution_result: dict) -> dict | None:
|
|
||||||
"""Build a response_directive from step_results or direct output for the client."""
|
|
||||||
result = execution_result.get("result", {})
|
|
||||||
|
|
||||||
# Case 1: step_results from plan execution
|
|
||||||
step_results = result.get("step_results")
|
|
||||||
if step_results:
|
|
||||||
response_parts = []
|
|
||||||
for step in step_results:
|
|
||||||
result_data = step.get("result", {})
|
|
||||||
tool_result = result_data.get("result", result_data)
|
|
||||||
if tool_result.get("ok") and tool_result.get("output"):
|
|
||||||
response_parts.append(str(tool_result["output"]))
|
|
||||||
if response_parts:
|
|
||||||
response_text = "\n\n".join(response_parts)
|
|
||||||
return ExecutionDirective(
|
|
||||||
type="respond", payload={"text": response_text}
|
|
||||||
).model_dump(mode="json")
|
|
||||||
|
|
||||||
# Case 2: direct tool output (e.g. from resolve_secret -> execute_tool)
|
|
||||||
if result.get("ok") and result.get("output"):
|
|
||||||
return ExecutionDirective(
|
|
||||||
type="respond", payload={"text": str(result["output"])}
|
|
||||||
).model_dump(mode="json")
|
|
||||||
|
|
||||||
return None
|
|
||||||
|
|
||||||
|
|
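As a rough illustration of what `_build_response_directive` does with its input, the sketch below reproduces only the text-collection logic on plain dictionaries. `collect_response_text` is a hypothetical helper, not part of the repository, and it returns the text directly instead of wrapping it in an `ExecutionDirective`.

```python
from typing import Optional


def collect_response_text(execution_result: dict) -> Optional[str]:
    """Join successful step outputs, or fall back to a direct tool output."""
    result = execution_result.get("result", {})

    parts = []
    for step in result.get("step_results") or []:
        result_data = step.get("result", {})
        # A step may nest the tool result one level deeper under "result".
        tool_result = result_data.get("result", result_data)
        if tool_result.get("ok") and tool_result.get("output"):
            parts.append(str(tool_result["output"]))
    if parts:
        return "\n\n".join(parts)

    # No usable step output: use the direct tool output if there is one.
    if result.get("ok") and result.get("output"):
        return str(result["output"])
    return None


plan_result = {
    "result": {
        "step_results": [
            {"result": {"result": {"ok": True, "output": "a"}}},
            {"result": {"ok": False, "output": "err"}},
            {"result": {"ok": True, "output": 42}},
        ]
    }
}
direct_result = {"result": {"ok": True, "output": "done"}}

plan_text = collect_response_text(plan_result)
direct_text = collect_response_text(direct_result)
empty_text = collect_response_text({"result": {}})
```

Failed steps are skipped and non-string outputs are converted with `str()` before joining, which is why the integer output in the example does not break the join.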
||||||
class RuntimeLoop:
|
|
||||||
"""Central control loop skeleton coordinating task state and events."""
|
|
||||||
|
|
||||||
def __init__(
|
|
||||||
self,
|
|
||||||
event_bus: EventBus,
|
|
||||||
task_state_store: SQLiteTaskStateStore,
|
|
||||||
checkpoint_store: SQLiteCheckpointStore,
|
|
||||||
context_builder: ContextBuilder,
|
|
||||||
router: AsyncRouter,
|
|
||||||
execution_engine: ExecutionEngine,
|
|
||||||
permission_service: PermissionService,
|
|
||||||
memory_interface=None,
|
|
||||||
recall_service: MemoryRecallService | None = None,
|
|
||||||
memory_policy: MemoryWritePolicy | None = None,
|
|
||||||
) -> None:
|
|
||||||
self._event_bus = event_bus
|
|
||||||
self._task_state_store = task_state_store
|
|
||||||
self._checkpoint_store = checkpoint_store
|
|
||||||
self._context_builder = context_builder
|
|
||||||
self._router = router
|
|
||||||
self._execution_engine = execution_engine
|
|
||||||
self._permission_service = permission_service
|
|
||||||
self._memory_interface = memory_interface
|
|
||||||
self._recall_service = recall_service
|
|
||||||
self._memory_policy = memory_policy
|
|
||||||
|
|
||||||
def set_recall_service(self, recall_service: MemoryRecallService) -> None:
|
|
||||||
self._recall_service = recall_service
|
|
||||||
|
|
||||||
def set_memory_policy(self, policy: MemoryWritePolicy | None) -> None:
|
|
||||||
self._memory_policy = policy
|
|
||||||
|
|
||||||
def run_task(self, task: UserTask) -> dict[str, object]:
|
|
||||||
# Check input for hard-stop commands BEFORE processing
|
|
||||||
hard_stop_check = self._permission_service.check_shell_command(
|
|
||||||
task_id=task.task_id,
|
|
||||||
session_id=task.session_id,
|
|
||||||
command=task.input,
|
|
||||||
)
|
|
||||||
if hard_stop_check.get("decision") == "hard_stop":
|
|
||||||
# Immediately reject hard-stop commands
|
|
||||||
self._publish(task, TASK_RECEIVED, {"status": "received"})
|
|
||||||
checkpoint = TaskCheckpoint(task_id=task.task_id, status="received")
|
|
||||||
self._checkpoint_store.save(checkpoint)
|
|
||||||
self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
|
|
||||||
|
|
||||||
error_msg = f"⚠️ BLOCKED: {hard_stop_check.get('reason', 'Hard stop command')}"
|
|
||||||
self._publish(task, TASK_FAILED, {
|
|
||||||
"directive": {},
|
|
||||||
"execution_result": {"error": error_msg},
|
|
||||||
})
|
|
||||||
return {
|
|
||||||
"task_id": task.task_id,
|
|
||||||
"status": "failed",
|
|
||||||
"directive": {},
|
|
||||||
"result": {"error": error_msg},
|
|
||||||
"events": [e.model_dump(mode="json") for e in self._event_bus.list_for_task(task.task_id)],
|
|
||||||
}
|
|
||||||
|
|
||||||
state = self._task_state_store.create_task(
|
|
||||||
task.task_id,
|
|
||||||
{
|
|
||||||
"status": "received",
|
|
||||||
"session_id": task.session_id,
|
|
||||||
"plan": None,
|
|
||||||
"task_input": task.input,
|
|
||||||
"task_context": task.context,
|
|
||||||
},
|
|
||||||
)
|
|
||||||
self._publish(task, TASK_RECEIVED, {"status": "received"})
|
|
||||||
|
|
||||||
checkpoint = TaskCheckpoint(task_id=task.task_id, status="received")
|
|
||||||
self._checkpoint_store.save(checkpoint)
|
|
||||||
self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
|
|
||||||
|
|
||||||
context = self._context_builder.build(task=task, checkpoint=checkpoint)
|
|
||||||
self._publish(task, CONTEXT_BUILT, {"keys": sorted(context.keys())})
|
|
||||||
|
|
||||||
# Active memory recall: system decides if it needs to search memory
|
|
||||||
recall_result = asyncio.run(self._run_recall(task))
|
|
||||||
if recall_result["should_recall"]:
|
|
||||||
context["memory_recall"] = {
|
|
||||||
"query": recall_result["query"],
|
|
||||||
"summary": recall_result["summary"],
|
|
||||||
"entries": [
|
|
||||||
{"text": e.text, "kind": e.kind, "weight": e.weight}
|
|
||||||
for e in recall_result["results"]
|
|
||||||
],
|
|
||||||
}
|
|
||||||
self._publish(task, MEMORY_RECALL_USED, {
|
|
||||||
"query": recall_result["query"],
|
|
||||||
"results_count": len(recall_result["results"]),
|
|
||||||
"reason": recall_result["reason"],
|
|
||||||
})
|
|
||||||
|
|
||||||
directive = asyncio.run(
|
|
||||||
self._router.decide(state=state, context=context, task_id=task.task_id, session_id=task.session_id)
|
|
||||||
)
|
|
||||||
execution_result = self._execution_engine.execute(task=task, directive=directive)
|
|
||||||
state_patch = {"status": execution_result["status"], "last_directive": directive.model_dump(mode="json")}
|
|
||||||
if execution_result["status"] == "awaiting_permission":
|
|
||||||
state_patch["pending_permission_request"] = execution_result["result"]["permission_request"]
|
|
||||||
state_patch["pending_secret_request"] = None
|
|
||||||
state_patch["resolved_permission_decision"] = None
|
|
||||||
elif execution_result["status"] == "awaiting_input":
|
|
||||||
state_patch["pending_permission_request"] = None
|
|
||||||
state_patch["pending_secret_request"] = execution_result["result"]["secret_request"]
|
|
||||||
state_patch["resolved_permission_decision"] = None
|
|
||||||
elif execution_result["status"] == "awaiting_password":
|
|
||||||
state_patch["pending_permission_request"] = None
|
|
||||||
state_patch["pending_secret_request"] = None
|
|
||||||
state_patch["resolved_permission_decision"] = None
|
|
||||||
state_patch["pending_password_request"] = {
|
|
||||||
"command": execution_result["result"].get("command", ""),
|
|
||||||
"reason": "Permission denied - sudo password required",
|
|
||||||
"attempts": 0,
|
|
||||||
}
|
|
||||||
elif execution_result["status"] == "awaiting_review":
|
|
||||||
state_patch["pending_permission_request"] = None
|
|
||||||
state_patch["pending_secret_request"] = None
|
|
||||||
state_patch["resolved_permission_decision"] = None
|
|
||||||
state_patch["pending_review"] = execution_result["result"]["review"]
|
|
||||||
else:
|
|
||||||
state_patch["pending_permission_request"] = None
|
|
||||||
state_patch["pending_secret_request"] = None
|
|
||||||
state_patch["resolved_permission_decision"] = None
|
|
||||||
state_patch["pending_review"] = None
|
|
||||||
self._task_state_store.update_task(task.task_id, state_patch)
|
|
||||||
final_status = str(execution_result["status"])
|
|
||||||
|
|
||||||
# Awaiting states stay pending; only other statuses are written back as the final status
|
|
||||||
if final_status not in ("awaiting_permission", "awaiting_input", "awaiting_password", "awaiting_review"):
|
|
||||||
self._task_state_store.update_task(task.task_id, {"status": final_status})
|
|
||||||
|
|
||||||
final_checkpoint = TaskCheckpoint(
|
|
||||||
task_id=task.task_id,
|
|
||||||
status=final_status,
|
|
||||||
context_snapshot=context,
|
|
||||||
)
|
|
||||||
self._checkpoint_store.save(final_checkpoint)
|
|
||||||
|
|
||||||
# Generate response for user
|
|
||||||
# Case 1: step_results from plan execution
|
|
||||||
if final_status == "completed" and execution_result.get("result", {}).get("step_results"):
|
|
||||||
step_results = execution_result["result"]["step_results"]
|
|
||||||
response_parts = []
|
|
||||||
for step in step_results:
|
|
||||||
result_data = step.get("result", {})
|
|
||||||
tool_result = result_data.get("result", result_data)
|
|
||||||
if tool_result.get("ok") and tool_result.get("output"):
|
|
||||||
response_parts.append(str(tool_result["output"]))
|
|
||||||
if response_parts:
|
|
||||||
response_text = "\n\n".join(response_parts)
|
|
||||||
execution_result["response_directive"] = ExecutionDirective(
|
|
||||||
type="respond", payload={"text": response_text}
|
|
||||||
).model_dump(mode="json")
|
|
||||||
|
|
||||||
# Case 2: respond directive from orchestrator (direct response, no steps)
|
|
||||||
if final_status == "completed" and not execution_result.get("response_directive"):
|
|
||||||
# Use the original directive from router.decide()
|
|
||||||
if hasattr(directive, "type") and directive.type == "respond":
|
|
||||||
if directive.payload.get("text"):
|
|
||||||
execution_result["response_directive"] = directive.model_dump(mode="json")
|
|
||||||
elif isinstance(directive, dict) and directive.get("type") == "respond":
|
|
||||||
if directive.get("payload", {}).get("text"):
|
|
||||||
execution_result["response_directive"] = directive
|
|
||||||
|
|
||||||
# Map status to terminal event type
|
|
||||||
if final_status == "completed":
|
|
||||||
terminal_event_type = TASK_COMPLETED
|
|
||||||
elif final_status == "failed":
|
|
||||||
terminal_event_type = TASK_FAILED
|
|
||||||
elif final_status == "awaiting_permission":
|
|
||||||
terminal_event_type = TASK_AWAITING_PERMISSION
|
|
||||||
elif final_status == "awaiting_input":
|
|
||||||
terminal_event_type = TASK_AWAITING_INPUT
|
|
||||||
elif final_status == "awaiting_review":
|
|
||||||
terminal_event_type = TASK_AWAITING_REVIEW
|
|
||||||
elif final_status == "awaiting_password":
|
|
||||||
terminal_event_type = TASK_AWAITING_PERMISSION
|
|
||||||
else:
|
|
||||||
terminal_event_type = TASK_FAILED
|
|
||||||
self._publish(
|
|
||||||
task,
|
|
||||||
terminal_event_type,
|
|
||||||
{
|
|
||||||
"directive": directive.model_dump(mode="json"),
|
|
||||||
"execution_result": execution_result["result"],
|
|
||||||
},
|
|
||||||
)
|
|
||||||
|
|
||||||
# Save task and result to memory for session context
|
|
||||||
self._save_to_memory(task, execution_result, final_status)
|
|
||||||
|
|
||||||
return {
|
|
||||||
"task_id": task.task_id,
|
|
||||||
"status": final_status,
|
|
||||||
"directive": directive.model_dump(mode="json"),
|
|
||||||
"result": {
|
|
||||||
**execution_result["result"],
|
|
||||||
"response_directive": execution_result.get("response_directive"),
|
|
||||||
},
|
|
||||||
"events": [event.model_dump(mode="json") for event in self._event_bus.list_for_task(task.task_id)],
|
|
||||||
}
|
|
||||||
|
|
||||||
def resolve_permission(self, task_id: str, decision: str) -> dict[str, object]:
|
|
||||||
state = self._task_state_store.get_task(task_id)
|
|
||||||
if not state:
|
|
||||||
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
|
|
||||||
|
|
||||||
pending_request_payload = state.get("pending_permission_request")
|
|
||||||
last_directive_payload = state.get("last_directive")
|
|
||||||
if not pending_request_payload or not last_directive_payload:
|
|
||||||
return {"task_id": task_id, "status": "failed", "result": {"error": "No pending permission request"}}
|
|
||||||
|
|
||||||
task = UserTask(
|
|
||||||
task_id=task_id,
|
|
||||||
session_id=state["session_id"],
|
|
||||||
input=state["task_input"],
|
|
||||||
context=state.get("task_context", {}),
|
|
||||||
)
|
|
||||||
# Get command from pending request
|
|
||||||
command = pending_request_payload.get("command", "")
|
|
||||||
|
|
||||||
# Resolve permission using new service
|
|
||||||
resolved = self._permission_service.resolve_permission(
|
|
||||||
task_id=task_id,
|
|
||||||
session_id=state["session_id"],
|
|
||||||
command=command,
|
|
||||||
decision=decision,
|
|
||||||
)
|
|
||||||
|
|
||||||
if decision == "deny":
|
|
||||||
execution_result = {
|
|
||||||
"status": "failed",
|
|
||||||
"result": {
|
|
||||||
"error": "Permission denied by user.",
|
|
||||||
"permission_decision": resolved,
|
|
||||||
},
|
|
||||||
}
|
|
||||||
elif decision == "allow_with_password":
|
|
||||||
directive = ExecutionDirective.model_validate(last_directive_payload)
|
|
||||||
self._task_state_store.update_task(
|
|
||||||
task.task_id,
|
|
||||||
{
|
|
||||||
"status": "awaiting_password",
|
|
||||||
"pending_password_request": {
|
|
||||||
"command": command,
|
|
||||||
"reason": pending_request_payload.get("reason", "A password is required to run the command"),
|
|
||||||
"attempts": 0,
|
|
||||||
},
|
|
||||||
"pending_permission_request": None,
|
|
||||||
},
|
|
||||||
)
|
|
||||||
self._publish(task, TASK_AWAITING_PERMISSION, {
|
|
||||||
"password_required": True,
|
|
||||||
"command": command,
|
|
||||||
})
|
|
||||||
return {
|
|
||||||
"task_id": task_id,
|
|
||||||
"status": "awaiting_password",
|
|
||||||
"result": {"message": "Password entry required"},
|
|
||||||
}
|
|
||||||
else:
|
|
||||||
directive = ExecutionDirective.model_validate(last_directive_payload)
|
|
||||||
execution_result = self._execution_engine.execute(
|
|
||||||
task=task,
|
|
||||||
directive=directive,
|
|
||||||
)
|
|
||||||
|
|
||||||
final_status = str(execution_result["status"])
|
|
||||||
if decision != "allow_with_password":
|
|
||||||
self._task_state_store.update_task(
|
|
||||||
task.task_id,
|
|
||||||
{
|
|
||||||
"status": final_status,
|
|
||||||
"pending_permission_request": None,
|
|
||||||
"pending_secret_request": execution_result["result"].get("secret_request")
|
|
||||||
if final_status == "awaiting_input"
|
|
||||||
else None,
|
|
||||||
"pending_review": execution_result["result"].get("review")
|
|
||||||
if final_status == "awaiting_review"
|
|
||||||
else None,
|
|
||||||
"resolved_permission_decision": resolved,
|
|
||||||
},
|
|
||||||
)
|
|
||||||
checkpoint = TaskCheckpoint(task_id=task.task_id, status=final_status)
|
|
||||||
self._checkpoint_store.save(checkpoint)
|
|
||||||
self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
|
|
||||||
if final_status == "completed":
|
|
||||||
terminal_event_type = TASK_COMPLETED
|
|
||||||
elif final_status == "awaiting_input":
|
|
||||||
terminal_event_type = TASK_AWAITING_INPUT
|
|
||||||
elif final_status == "awaiting_permission":
|
|
||||||
terminal_event_type = TASK_AWAITING_PERMISSION
|
|
||||||
elif final_status == "awaiting_review":
|
|
||||||
terminal_event_type = TASK_AWAITING_REVIEW
|
|
||||||
else:
|
|
||||||
terminal_event_type = TASK_FAILED
|
|
||||||
self._publish(
|
|
||||||
task,
|
|
||||||
terminal_event_type,
|
|
||||||
{
|
|
||||||
"permission_resolution": resolved.model_dump(mode="json") if hasattr(resolved, 'model_dump') else resolved,
|
|
||||||
"execution_result": execution_result["result"],
|
|
||||||
},
|
|
||||||
)
|
|
||||||
|
|
||||||
# Save to memory after permission resolution
|
|
||||||
self._save_to_memory(task, execution_result, final_status)
|
|
||||||
|
|
||||||
return {
|
|
||||||
"task_id": task.task_id,
|
|
||||||
"status": final_status,
|
|
||||||
"result": {
|
|
||||||
**execution_result["result"],
|
|
||||||
"response_directive": _build_response_directive(execution_result),
|
|
||||||
},
|
|
||||||
"events": [event.model_dump(mode="json") for event in self._event_bus.list_for_task(task.task_id)],
|
|
||||||
}
|
|
||||||
|
|
||||||
def resolve_secret(self, task_id: str, secret: str) -> dict[str, object]:
|
|
||||||
state = self._task_state_store.get_task(task_id)
|
|
||||||
if not state:
|
|
||||||
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
|
|
||||||
pending_secret_payload = state.get("pending_secret_request")
|
|
||||||
last_directive_payload = state.get("last_directive")
|
|
||||||
resolved_permission_payload = state.get("resolved_permission_decision")
|
|
||||||
if not pending_secret_payload or not last_directive_payload:
|
|
||||||
return {"task_id": task_id, "status": "failed", "result": {"error": "No pending secret request"}}
|
|
||||||
if not resolved_permission_payload:
|
|
||||||
return {"task_id": task_id, "status": "failed", "result": {"error": "No resolved permission available"}}
|
|
||||||
|
|
||||||
task = UserTask(
|
|
||||||
task_id=task_id,
|
|
||||||
session_id=state["session_id"],
|
|
||||||
input=state["task_input"],
|
|
||||||
context=state.get("task_context", {}),
|
|
||||||
)
|
|
||||||
_secret_request = SecretRequest.model_validate(pending_secret_payload)
|
|
||||||
directive = ExecutionDirective.model_validate(last_directive_payload)
|
|
||||||
execution_result = self._execution_engine.execute(
|
|
||||||
task=task,
|
|
||||||
directive=directive,
|
|
||||||
permission_override=None,
|
|
||||||
secret_override=secret,
|
|
||||||
)
|
|
||||||
final_status = str(execution_result["status"])
|
|
||||||
pending_review = execution_result["result"].get("review") if final_status == "awaiting_review" else None
|
|
||||||
pending_secret = execution_result["result"].get("secret_request") if final_status == "awaiting_input" else None
|
|
||||||
self._task_state_store.update_task(
|
|
||||||
task.task_id,
|
|
||||||
{
|
|
||||||
"status": final_status,
|
|
||||||
"pending_secret_request": pending_secret,
|
|
||||||
"resolved_permission_decision": resolved_permission_payload if final_status == "awaiting_input" else None,
|
|
||||||
"pending_review": pending_review,
|
|
||||||
},
|
|
||||||
)
|
|
||||||
checkpoint = TaskCheckpoint(task_id=task.task_id, status=final_status)
|
|
||||||
self._checkpoint_store.save(checkpoint)
|
|
||||||
self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
|
|
||||||
if final_status == "completed":
|
|
||||||
terminal_event_type = TASK_COMPLETED
|
|
||||||
elif final_status == "awaiting_input":
|
|
||||||
terminal_event_type = TASK_AWAITING_INPUT
|
|
||||||
elif final_status == "awaiting_permission":
|
|
||||||
terminal_event_type = TASK_AWAITING_PERMISSION
|
|
||||||
elif final_status == "awaiting_review":
|
|
||||||
terminal_event_type = TASK_AWAITING_REVIEW
|
|
||||||
else:
|
|
||||||
terminal_event_type = TASK_FAILED
|
|
||||||
self._publish(
|
|
||||||
task,
|
|
||||||
terminal_event_type,
|
|
||||||
{
|
|
||||||
"secret_resolution": {"task_id": task_id},
|
|
||||||
"execution_result": execution_result["result"],
|
|
||||||
},
|
|
||||||
)
|
|
||||||
return {
|
|
||||||
"task_id": task.task_id,
|
|
||||||
"status": final_status,
|
|
||||||
"result": {
|
|
||||||
**execution_result["result"],
|
|
||||||
"response_directive": _build_response_directive(execution_result),
|
|
||||||
},
|
|
||||||
"events": [event.model_dump(mode="json") for event in self._event_bus.list_for_task(task.task_id)],
|
|
||||||
}
|
|
||||||
|
|
||||||
def resolve_review(self, task_id: str, decision: str, correction: str | None = None) -> dict[str, object]:
|
|
||||||
state = self._task_state_store.get_task(task_id)
|
|
||||||
if not state:
|
|
||||||
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
|
|
||||||
pending_review = state.get("pending_review")
|
|
||||||
if not pending_review:
|
|
||||||
return {"task_id": task_id, "status": "failed", "result": {"error": "No pending review"}}
|
|
||||||
|
|
||||||
task = UserTask(
|
|
||||||
task_id=task_id,
|
|
||||||
session_id=state["session_id"],
|
|
||||||
input=state["task_input"],
|
|
||||||
context={
|
|
||||||
**state.get("task_context", {}),
|
|
||||||
"previous_action_review": {
|
|
||||||
"decision": decision,
|
|
||||||
"correction": correction,
|
|
||||||
"review": pending_review,
|
|
||||||
},
|
|
||||||
},
|
|
||||||
)
|
|
||||||
self._publish(task, REVIEW_RESOLVED, {
|
|
||||||
"decision": decision,
|
|
||||||
"correction": correction,
|
|
||||||
"review": pending_review,
|
|
||||||
})
|
|
||||||
if self._memory_interface:
|
|
||||||
try:
|
|
||||||
self._memory_interface.insert(
|
|
||||||
text=f"User reviewed model action as {decision}. Correction: {correction or ''}. Review: {pending_review}",
|
|
||||||
kind="critique",
|
|
||||||
source="user",
|
|
||||||
task_id=task_id,
|
|
||||||
session_id=state["session_id"],
|
|
||||||
weight=0.9 if decision == "wrong_action" else 0.5,
|
|
||||||
metadata={"decision": decision, "review": pending_review},
|
|
||||||
)
|
|
||||||
except Exception:
|
|
||||||
pass  # Best effort: a failed memory write must not block replanning.
|
|
||||||
self._task_state_store.update_task(task_id, {"pending_review": None, "status": "replanning"})
|
|
||||||
return self.run_task(task)
|
|
||||||
|
|
||||||
def resolve_password(self, task_id: str, password: str) -> dict[str, object]:
|
|
||||||
state = self._task_state_store.get_task(task_id)
|
|
||||||
if not state:
|
|
||||||
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
|
|
||||||
|
|
||||||
pending_password_payload = state.get("pending_password_request")
|
|
||||||
last_directive_payload = state.get("last_directive")
|
|
||||||
if not pending_password_payload or not last_directive_payload:
|
|
||||||
return {"task_id": task_id, "status": "failed", "result": {"error": "No pending password request"}}
|
|
||||||
|
|
||||||
current_attempt = pending_password_payload.get("attempts", 0) + 1
|
|
||||||
|
|
||||||
task = UserTask(
|
|
||||||
task_id=task_id,
|
|
||||||
session_id=state["session_id"],
|
|
||||||
input=state["task_input"],
|
|
||||||
context=state.get("task_context", {}),
|
|
||||||
)
|
|
||||||
directive = ExecutionDirective.model_validate(last_directive_payload)
|
|
||||||
|
|
||||||
execution_result = self._execution_engine.execute(
|
|
||||||
task=task,
|
|
||||||
directive=directive,
|
|
||||||
password_override=password,
|
|
||||||
)
|
|
||||||
|
|
||||||
final_status = str(execution_result["status"])
|
|
||||||
|
|
||||||
if final_status == "failed":
|
|
||||||
error_msg = execution_result.get("result", {}).get("error", "")
|
|
||||||
is_password_error = "permission denied" in error_msg.lower() or "incorrect password" in error_msg.lower()
|
|
||||||
|
|
||||||
if is_password_error and current_attempt < 3:
|
|
||||||
self._task_state_store.update_task(
|
|
||||||
task.task_id,
|
|
||||||
{
|
|
||||||
"status": "awaiting_password",
|
|
||||||
"pending_password_request": {
|
|
||||||
"command": pending_password_payload.get("command"),
|
|
||||||
"reason": pending_password_payload.get("reason"),
|
|
||||||
"attempts": current_attempt,
|
|
||||||
},
|
|
||||||
},
|
|
||||||
)
|
|
||||||
self._publish(task, TASK_AWAITING_PERMISSION, {
|
|
||||||
"password_attempt_failed": True,
|
|
||||||
"attempts": current_attempt,
|
|
||||||
"max_attempts": 3,
"message": "Incorrect password. Please try again.",
|
|
||||||
})
|
|
||||||
return {
|
|
||||||
"task_id": task_id,
|
|
||||||
"status": "awaiting_password",
"result": {"error": "Incorrect password", "attempts": current_attempt, "max_attempts": 3},
|
|
||||||
}
|
|
||||||
else:
|
|
||||||
self._task_state_store.update_task(
|
|
||||||
task.task_id,
|
|
||||||
{
|
|
||||||
"status": "failed",
|
|
||||||
"pending_password_request": None,
|
|
||||||
"password_attempts": current_attempt,
|
|
||||||
},
|
|
||||||
)
|
|
||||||
self._publish(task, TASK_FAILED, {
|
|
||||||
"password_failed": True,
|
|
||||||
"attempts": current_attempt,
"message": "Incorrect password (3 attempts). Handing the decision over to the model.",
|
|
||||||
"execution_result": execution_result["result"],
|
|
||||||
})
|
|
||||||
return {
|
|
||||||
"task_id": task_id,
|
|
||||||
"status": "failed",
|
|
||||||
"result": {
|
|
||||||
"error": "Password failed after 3 attempts",
|
|
||||||
"attempts": current_attempt,
"message": "The user entered an incorrect password 3 times. The decision is yours.",
|
|
||||||
},
|
|
||||||
}
|
|
||||||
|
|
||||||
self._task_state_store.update_task(
|
|
||||||
task.task_id,
|
|
||||||
{
|
|
||||||
"status": final_status,
|
|
||||||
"pending_password_request": None,
|
|
||||||
},
|
|
||||||
)
|
|
||||||
checkpoint = TaskCheckpoint(task_id=task.task_id, status=final_status)
|
|
||||||
self._checkpoint_store.save(checkpoint)
|
|
||||||
self._publish(task, TASK_COMPLETED, {"execution_result": execution_result["result"]})
|
|
||||||
|
|
||||||
# Save to memory after password resolution
|
|
||||||
self._save_to_memory(task, execution_result, final_status)
|
|
||||||
|
|
||||||
return {
|
|
||||||
"task_id": task.task_id,
|
|
||||||
"status": final_status,
|
|
||||||
"result": {
|
|
||||||
**execution_result["result"],
|
|
||||||
"response_directive": _build_response_directive(execution_result),
|
|
||||||
},
|
|
||||||
"events": [event.model_dump(mode="json") for event in self._event_bus.list_for_task(task.task_id)],
|
|
||||||
}
|
|
||||||
|
|
||||||
def _publish(self, task: UserTask, event_type: str, payload: dict[str, object]) -> None:
|
|
||||||
event = RuntimeEvent(
|
|
||||||
task_id=task.task_id,
|
|
||||||
session_id=task.session_id,
|
|
||||||
sequence=self._event_bus.next_sequence(task.task_id),
|
|
||||||
type=event_type,
|
|
||||||
payload=payload,
|
|
||||||
)
|
|
||||||
self._event_bus.publish(event)
|
|
||||||
|
|
||||||
async def _run_recall(self, task: UserTask) -> dict:
|
|
||||||
"""Run active memory recall before orchestration."""
|
|
||||||
if not self._recall_service:
|
|
||||||
return {"should_recall": False, "reason": "no_recall_service", "query": "", "results": [], "summary": ""}
|
|
||||||
try:
|
|
||||||
return await self._recall_service.recall(task_input=task.input)
|
|
||||||
except Exception as e:
|
|
||||||
return {"should_recall": False, "reason": f"recall_error: {e}", "query": "", "results": [], "summary": ""}
|
|
||||||
|
|
||||||
def _save_to_memory(self, task: UserTask, execution_result: dict, status: str) -> None:
|
|
||||||
"""Save task input and result to memory for session context, using MemoryWritePolicy."""
|
|
||||||
if not self._memory_interface:
|
|
||||||
return
|
|
||||||
|
|
||||||
try:
|
|
||||||
# Build a synthetic critic_score for policy based on task status
|
|
||||||
# For summary/tool_result without real critic, we derive from execution outcome
|
|
||||||
if status == "completed":
|
|
||||||
synthetic_score = CriticScore(
|
|
||||||
correctness=0.9, usefulness=0.8, safety=0.95,
|
|
||||||
memory_store=True, weight=0.85, explanation="Task completed successfully"
|
|
||||||
)
|
|
||||||
elif status == "failed":
|
|
||||||
synthetic_score = CriticScore(
|
|
||||||
correctness=0.2, usefulness=0.3, safety=0.7,
|
|
||||||
memory_store=True, weight=0.5, explanation="Task failed — store for learning"
|
|
||||||
)
|
|
||||||
else:
|
|
||||||
synthetic_score = CriticScore(
|
|
||||||
correctness=0.5, usefulness=0.5, safety=0.8,
|
|
||||||
memory_store=False, weight=0.3, explanation=f"Status: {status}"
|
|
||||||
)
|
|
||||||
|
|
||||||
# Save task input as summary
|
|
||||||
decision = "store"
|
|
||||||
if self._memory_policy:
|
|
||||||
decision = self._memory_policy.decide(
|
|
||||||
critic_score=synthetic_score,
|
|
||||||
memory_type="summary",
|
|
||||||
session_id=task.session_id,
|
|
||||||
)
|
|
||||||
if decision in ("store", "store_with_weight"):
|
|
||||||
weight = synthetic_score.weight if decision == "store_with_weight" else 0.8
|
|
||||||
self._memory_interface.insert(
|
|
||||||
text=f"User request: {task.input}",
|
|
||||||
kind="summary",
|
|
||||||
source="user",
|
|
||||||
task_id=task.task_id,
|
|
||||||
session_id=task.session_id,
|
|
||||||
weight=weight,
|
|
||||||
metadata={"status": status, "policy_decision": decision},
|
|
||||||
)
|
|
||||||
self._publish(task, MEMORY_WRITE_DECIDED, {
|
|
||||||
"kind": "summary", "decision": decision, "text_preview": task.input[:80]
|
|
||||||
})
|
|
||||||
|
|
||||||
# Save execution result
|
|
||||||
result_text = ""
|
|
||||||
if status == "completed":
|
|
||||||
step_results = execution_result.get("result", {}).get("step_results", [])
|
|
||||||
if step_results:
|
|
||||||
for step in step_results:
|
|
||||||
tool_result = step.get("result", {}).get("result", {})
|
|
||||||
if tool_result.get("output"):
|
|
||||||
result_text += f" | {step.get('step_id')}: {tool_result.get('output')[:200]}"
|
|
||||||
elif status == "failed":
|
|
||||||
result_text = f" | Error: {execution_result.get('result', {}).get('error', 'Unknown')}"
|
|
||||||
|
|
||||||
if result_text:
|
|
||||||
decision = "store"
|
|
||||||
if self._memory_policy:
|
|
||||||
decision = self._memory_policy.decide(
|
|
||||||
critic_score=synthetic_score,
|
|
||||||
memory_type="tool_result",
|
|
||||||
session_id=task.session_id,
|
|
||||||
)
|
|
||||||
if decision in ("store", "store_with_weight"):
|
|
||||||
weight = synthetic_score.weight if decision == "store_with_weight" else 0.7
|
|
||||||
self._memory_interface.insert(
|
|
||||||
text=f"Result: {status}{result_text}",
|
|
||||||
kind="tool_result",
|
|
||||||
source="system",
|
|
||||||
task_id=task.task_id,
|
|
||||||
session_id=task.session_id,
|
|
||||||
weight=weight,
|
|
||||||
metadata={"status": status, "policy_decision": decision},
|
|
||||||
)
|
|
||||||
self._publish(task, MEMORY_WRITE_DECIDED, {
|
|
||||||
"kind": "tool_result", "decision": decision, "text_preview": result_text[:80]
|
|
||||||
})
|
|
||||||
except Exception as e:
|
|
||||||
import logging
|
|
||||||
logging.getLogger(__name__).warning("Failed to save to memory: %s", e)
|
|
||||||
|
|
@ -1,2 +0,0 @@
|
||||||
"""Shared services."""
|
|
||||||
|
|
||||||
|
|
@ -1,2 +0,0 @@
|
||||||
"""Task state and checkpoints."""
|
|
||||||
|
|
||||||
|
|
@ -1,75 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import json
|
|
||||||
import sqlite3
|
|
||||||
from pathlib import Path
|
|
||||||
|
|
||||||
from app.core.contracts import TaskCheckpoint
|
|
||||||
|
|
||||||
|
|
||||||
class SQLiteCheckpointStore:
|
|
||||||
"""Durable checkpoint store for resumable runtime state."""
|
|
||||||
|
|
||||||
def __init__(self, db_path: str | Path) -> None:
|
|
||||||
self._db_path = Path(db_path)
|
|
||||||
self._db_path.parent.mkdir(parents=True, exist_ok=True)
|
|
||||||
self._initialize()
|
|
||||||
|
|
||||||
def save(self, checkpoint: TaskCheckpoint) -> TaskCheckpoint:
|
|
||||||
with sqlite3.connect(self._db_path) as conn:
|
|
||||||
conn.execute(
|
|
||||||
"""
|
|
||||||
INSERT OR REPLACE INTO checkpoints (
|
|
||||||
task_id, status, active_step_id, plan_snapshot_json,
|
|
||||||
context_snapshot_json, updated_at
|
|
||||||
) VALUES (?, ?, ?, ?, ?, ?)
|
|
||||||
""",
|
|
||||||
(
|
|
||||||
checkpoint.task_id,
|
|
||||||
checkpoint.status,
|
|
||||||
checkpoint.active_step_id,
|
|
||||||
json.dumps(checkpoint.plan_snapshot, default=str),
|
|
||||||
json.dumps(checkpoint.context_snapshot, default=str),
|
|
||||||
checkpoint.updated_at.isoformat(),
|
|
||||||
),
|
|
||||||
)
|
|
||||||
conn.commit()
|
|
||||||
return checkpoint
|
|
||||||
|
|
||||||
def load(self, task_id: str) -> TaskCheckpoint | None:
|
|
||||||
with sqlite3.connect(self._db_path) as conn:
|
|
||||||
row = conn.execute(
|
|
||||||
"""
|
|
||||||
SELECT task_id, status, active_step_id, plan_snapshot_json,
|
|
||||||
context_snapshot_json, updated_at
|
|
||||||
FROM checkpoints
|
|
||||||
WHERE task_id = ?
|
|
||||||
""",
|
|
||||||
(task_id,),
|
|
||||||
).fetchone()
|
|
||||||
if not row:
|
|
||||||
return None
|
|
||||||
return TaskCheckpoint(
|
|
||||||
task_id=row[0],
|
|
||||||
status=row[1],
|
|
||||||
active_step_id=row[2],
|
|
||||||
plan_snapshot=json.loads(row[3]),
|
|
||||||
context_snapshot=json.loads(row[4]),
|
|
||||||
updated_at=row[5],
|
|
||||||
)
|
|
||||||
|
|
||||||
def _initialize(self) -> None:
|
|
||||||
with sqlite3.connect(self._db_path) as conn:
|
|
||||||
conn.execute(
|
|
||||||
"""
|
|
||||||
CREATE TABLE IF NOT EXISTS checkpoints (
|
|
||||||
task_id TEXT PRIMARY KEY,
|
|
||||||
status TEXT NOT NULL,
|
|
||||||
active_step_id TEXT,
|
|
||||||
plan_snapshot_json TEXT NOT NULL,
|
|
||||||
context_snapshot_json TEXT NOT NULL,
|
|
||||||
updated_at TEXT NOT NULL
|
|
||||||
)
|
|
||||||
"""
|
|
||||||
)
|
|
||||||
conn.commit()
|
|
||||||
|
|
@ -1,77 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import json
|
|
||||||
import sqlite3
|
|
||||||
from pathlib import Path
|
|
||||||
from typing import Any
|
|
||||||
|
|
||||||
|
|
||||||
class SQLiteTaskStateStore:
|
|
||||||
"""Durable task state store for runtime lifecycle state."""
|
|
||||||
|
|
||||||
def __init__(self, db_path: str | Path) -> None:
|
|
||||||
self._db_path = Path(db_path)
|
|
||||||
self._db_path.parent.mkdir(parents=True, exist_ok=True)
|
|
||||||
self._initialize()
|
|
||||||
|
|
||||||
def create_task(self, task_id: str, initial_state: dict[str, Any]) -> dict[str, Any]:
|
|
||||||
state = dict(initial_state)
|
|
||||||
session_id = state.get("session_id")
|
|
||||||
with sqlite3.connect(self._db_path) as conn:
|
|
||||||
conn.execute(
|
|
||||||
"""
|
|
||||||
INSERT OR REPLACE INTO task_states (task_id, state_json, session_id)
|
|
||||||
VALUES (?, ?, ?)
|
|
||||||
""",
|
|
||||||
(task_id, json.dumps(state), session_id),
|
|
||||||
)
|
|
||||||
conn.commit()
|
|
||||||
return state
|
|
||||||
|
|
||||||
def get_task(self, task_id: str) -> dict[str, Any] | None:
|
|
||||||
with sqlite3.connect(self._db_path) as conn:
|
|
||||||
row = conn.execute(
|
|
||||||
"SELECT state_json FROM task_states WHERE task_id = ?",
|
|
||||||
(task_id,),
|
|
||||||
).fetchone()
|
|
||||||
return json.loads(row[0]) if row else None
|
|
||||||
|
|
||||||
def update_task(self, task_id: str, patch: dict[str, Any]) -> dict[str, Any]:
|
|
||||||
state = self.get_task(task_id) or {}
|
|
||||||
state.update(patch)
|
|
||||||
with sqlite3.connect(self._db_path) as conn:
|
|
||||||
conn.execute(
|
|
||||||
"""
|
|
||||||
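INSERT OR REPLACE INTO task_states (task_id, state_json, session_id)
VALUES (?, ?, ?)
""",
# REPLACE rewrites the whole row, so session_id must be passed again or the column becomes NULL.
(task_id, json.dumps(state), state.get("session_id")),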
|
|
||||||
)
|
|
||||||
conn.commit()
|
|
||||||
return state
|
|
||||||
|
|
||||||
def _initialize(self) -> None:
|
|
||||||
with sqlite3.connect(self._db_path) as conn:
|
|
||||||
conn.execute(
|
|
||||||
"""
|
|
||||||
CREATE TABLE IF NOT EXISTS task_states (
|
|
||||||
task_id TEXT PRIMARY KEY,
|
|
||||||
state_json TEXT NOT NULL
|
|
||||||
)
|
|
||||||
"""
|
|
||||||
)
|
|
||||||
conn.commit()
|
|
||||||
try:
|
|
||||||
conn.execute("ALTER TABLE task_states ADD COLUMN session_id TEXT")
|
|
||||||
conn.commit()
|
|
||||||
except sqlite3.OperationalError:
|
|
||||||
pass  # Most likely the session_id column already exists from an earlier run.
|
|
||||||
|
|
||||||
def get_session_tasks(self, session_id: str, limit: int = 10) -> list[dict[str, Any]]:
|
|
||||||
with sqlite3.connect(self._db_path) as conn:
|
|
||||||
conn.row_factory = sqlite3.Row
|
|
||||||
rows = conn.execute(
|
|
||||||
"SELECT state_json FROM task_states WHERE session_id = ? ORDER BY rowid DESC LIMIT ?",
|
|
||||||
(session_id, limit),
|
|
||||||
).fetchall()
|
|
||||||
return [json.loads(row[0]) for row in rows]
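The store above keeps each task as a JSON blob keyed by `task_id`, with `session_id` held in a separate column. A minimal sketch, assuming an in-memory database and a hypothetical helper `patch_state` (neither is part of this diff), of merging a patch with an SQLite upsert so that only `state_json` changes on conflict. The `ON CONFLICT ... DO UPDATE` clause needs SQLite 3.24 or newer. The sketch also wraps the connection in `contextlib.closing`, because `with sqlite3.connect(...)` on its own only scopes the transaction and does not close the connection.

```python
import json
import sqlite3
from contextlib import closing


def patch_state(conn: sqlite3.Connection, task_id: str, patch: dict) -> dict:
    """Merge `patch` into the stored JSON state (hypothetical helper)."""
    row = conn.execute(
        "SELECT state_json FROM task_states WHERE task_id = ?", (task_id,)
    ).fetchone()
    state = json.loads(row[0]) if row else {}
    state.update(patch)
    # Upsert: on conflict only state_json is rewritten, so session_id survives.
    conn.execute(
        """
        INSERT INTO task_states (task_id, state_json, session_id)
        VALUES (?, ?, ?)
        ON CONFLICT(task_id) DO UPDATE SET state_json = excluded.state_json
        """,
        (task_id, json.dumps(state), state.get("session_id")),
    )
    conn.commit()
    return state


def demo() -> tuple:
    # closing() releases the connection when the block exits.
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute(
            "CREATE TABLE task_states ("
            "task_id TEXT PRIMARY KEY, state_json TEXT NOT NULL, session_id TEXT)"
        )
        conn.execute(
            "INSERT INTO task_states VALUES (?, ?, ?)",
            ("t1", json.dumps({"session_id": "s1", "status": "running"}), "s1"),
        )
        patch_state(conn, "t1", {"status": "completed"})
        return conn.execute(
            "SELECT state_json, session_id FROM task_states WHERE task_id = 't1'"
        ).fetchone()
```

Calling `demo()` returns the patched row, which makes it easy to check that a status update leaves the session column untouched and that a later lookup by session still finds the task.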
|
|
||||||
|
|
@ -1,2 +0,0 @@
|
||||||
"""Streaming projections."""
|
|
||||||
|
|
||||||
|
|
@ -1,46 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import asyncio
|
|
||||||
from collections import defaultdict
|
|
||||||
from dataclasses import dataclass
|
|
||||||
|
|
||||||
from app.core.contracts import RuntimeEvent
|
|
||||||
from app.events.event_bus import EventBus
|
|
||||||
|
|
||||||
|
|
||||||
class StreamingManager:
|
|
||||||
"""Simple in-process projection from event bus to websocket consumers."""
|
|
||||||
|
|
||||||
def __init__(self, event_bus: EventBus) -> None:
|
|
||||||
self._event_bus = event_bus
|
|
||||||
self._subscribers: dict[str, list[StreamSubscriber]] = defaultdict(list)
|
|
||||||
self._event_bus.subscribe(self._on_event)
|
|
||||||
|
|
||||||
def replay_events(self, task_id: str) -> list[RuntimeEvent]:
|
|
||||||
return self._event_bus.list_for_task(task_id)
|
|
||||||
|
|
||||||
def subscribe(self, task_id: str) -> asyncio.Queue[RuntimeEvent]:
|
|
||||||
queue: asyncio.Queue[RuntimeEvent] = asyncio.Queue()
|
|
||||||
self._subscribers[task_id].append(
|
|
||||||
StreamSubscriber(loop=asyncio.get_running_loop(), queue=queue)
|
|
||||||
)
|
|
||||||
return queue
|
|
||||||
|
|
||||||
def unsubscribe(self, task_id: str, queue: asyncio.Queue[RuntimeEvent]) -> None:
|
|
||||||
listeners = self._subscribers.get(task_id, [])
|
|
||||||
for listener in list(listeners):
|
|
||||||
if listener.queue is queue:
|
|
||||||
listeners.remove(listener)
|
|
||||||
break
|
|
||||||
if not listeners and task_id in self._subscribers:
|
|
||||||
del self._subscribers[task_id]
|
|
||||||
|
|
||||||
def _on_event(self, event: RuntimeEvent) -> None:
|
|
||||||
for listener in list(self._subscribers.get(event.task_id, [])):
|
|
||||||
listener.loop.call_soon_threadsafe(listener.queue.put_nowait, event)
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
|
||||||
class StreamSubscriber:
|
|
||||||
loop: asyncio.AbstractEventLoop
|
|
||||||
queue: asyncio.Queue[RuntimeEvent]
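The manager above hands each event to the subscriber's loop with `call_soon_threadsafe` instead of calling `put_nowait` directly. A minimal sketch of why, assuming a plain worker thread as the publisher and string event names in place of `RuntimeEvent` (both are illustrative stand-ins): `asyncio.Queue` is not thread-safe, so the put is scheduled on the loop that owns the queue.

```python
import asyncio
import threading


async def demo() -> list[str]:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue[str] = asyncio.Queue()

    def publish_from_worker() -> None:
        # Runs outside the event loop thread, so the put is scheduled
        # on the loop rather than executed here.
        for name in ("task.started", "task.completed"):
            loop.call_soon_threadsafe(queue.put_nowait, name)

    worker = threading.Thread(target=publish_from_worker)
    worker.start()
    # Callbacks scheduled this way run in the order they were submitted.
    received = [await queue.get(), await queue.get()]
    worker.join()
    return received


events = asyncio.run(demo())
```

The consumer side stays ordinary `await queue.get()` code, which is what a websocket handler reading from `subscribe()` would presumably do.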
|
|
||||||
|
|
@ -1,2 +0,0 @@
|
||||||
"""Tool registry and tool adapters."""
|
|
||||||
|
|
||||||
|
|
@ -1,24 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from abc import ABC, abstractmethod
|
|
||||||
from typing import Any
|
|
||||||
|
|
||||||
from app.core.contracts import ToolResult, UserTask
|
|
||||||
|
|
||||||
|
|
||||||
class BaseTool(ABC):
|
|
||||||
name: str = ""
|
|
||||||
description: str = ""
|
|
||||||
|
|
||||||
@property
|
|
||||||
def name(self) -> str:
|
|
||||||
return getattr(self, '_name', self.__class__.__name__.replace('Tool', '').lower())
|
|
||||||
|
|
||||||
@property
|
|
||||||
def description(self) -> str:
|
|
||||||
return getattr(self, '_description', "")
|
|
||||||
|
|
||||||
@abstractmethod
|
|
||||||
def execute(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
|
|
||||||
raise NotImplementedError
|
|
||||||
|
|
||||||
|
|
@ -1,83 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import importlib
|
|
||||||
import json
|
|
||||||
import logging
|
|
||||||
from pathlib import Path
|
|
||||||
from typing import Any
|
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
|
||||||
|
|
||||||
PLUGINS_DIR = Path(__file__).parent / "plugins"
|
|
||||||
|
|
||||||
|
|
||||||
class ToolDiscovery:
|
|
||||||
"""Decentralized tool discovery system."""
|
|
||||||
|
|
||||||
def __init__(self, plugins_dir: Path | None = None) -> None:
|
|
||||||
self._plugins_dir = plugins_dir or PLUGINS_DIR
|
|
||||||
|
|
||||||
def discover(self) -> dict[str, Any]:
|
|
||||||
"""Discover all tools from plugins directory."""
|
|
||||||
tools = {}
|
|
||||||
|
|
||||||
if not self._plugins_dir.exists():
|
|
||||||
logger.warning(f"Plugins directory not found: {self._plugins_dir}")
|
|
||||||
return tools
|
|
||||||
|
|
||||||
for folder in self._plugins_dir.iterdir():
|
|
||||||
if not folder.is_dir():
|
|
||||||
continue
|
|
||||||
|
|
||||||
manifest_file = folder / "manifest.json"
|
|
||||||
if not manifest_file.exists():
|
|
||||||
logger.warning(f"Missing manifest.json in {folder.name}")
|
|
||||||
continue
|
|
||||||
|
|
||||||
try:
|
|
||||||
manifest = self._load_manifest(manifest_file)
|
|
||||||
|
|
||||||
tool_name = manifest.get("name", folder.name)
|
|
||||||
tools[tool_name] = {
|
|
||||||
"manifest": manifest,
|
|
||||||
"tool_class": folder.name,
|
|
||||||
}
|
|
||||||
logger.info(f"Discovered tool: {tool_name}")
|
|
||||||
|
|
||||||
except Exception as e:
|
|
||||||
logger.error(f"Failed to load tool {folder.name}: {e}")
|
|
||||||
continue
|
|
||||||
|
|
||||||
return tools
|
|
||||||
|
|
||||||
def _load_manifest(self, manifest_file: Path) -> dict[str, Any]:
|
|
||||||
with open(manifest_file, encoding="utf-8") as f:
|
|
||||||
return json.load(f)
|
|
||||||
|
|
||||||
def _load_tool_class(self, tool_name: str, manifest: dict[str, Any]) -> Any:
|
|
||||||
entrypoint = manifest.get("entrypoint", "Tool")
|
|
||||||
module = importlib.import_module(f"app.tools.plugins.{tool_name}")
|
|
||||||
tool_class = getattr(module, entrypoint)
|
|
||||||
return tool_class
|
|
||||||
|
|
||||||
def get_tool_schemas(self) -> list[dict[str, Any]]:
|
|
||||||
"""Get schemas for all discovered tools."""
|
|
||||||
tools = self.discover()
|
|
||||||
schemas = []
|
|
||||||
|
|
||||||
for name, data in tools.items():
|
|
||||||
manifest = data.get("manifest", {})
|
|
||||||
schemas.append({
|
|
||||||
"name": name,
|
|
||||||
"description": manifest.get("description", ""),
|
|
||||||
"args_schema": manifest.get("args_schema", {}),
|
|
||||||
"requires_permission": manifest.get("requires_permission", False),
|
|
||||||
})
|
|
||||||
|
|
||||||
return schemas
|
|
||||||
|
|
||||||
|
|
||||||
def discover_tools() -> dict[str, Any]:
|
|
||||||
"""Convenience function for quick tool discovery."""
|
|
||||||
discovery = ToolDiscovery()
|
|
||||||
return discovery.discover()
|
|
||||||
|
|
@ -1,26 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from app.core.contracts import ToolResult, UserTask
|
|
||||||
from app.tools.base import BaseTool
|
|
||||||
from app.tools.sandbox import ToolSandbox
|
|
||||||
|
|
||||||
|
|
||||||
class FileReadTool(BaseTool):
|
|
||||||
name = "file_read"
|
|
||||||
|
|
||||||
def __init__(self, sandbox: ToolSandbox) -> None:
|
|
||||||
self._sandbox = sandbox
|
|
||||||
|
|
||||||
def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
|
|
||||||
path = args.get("path")
|
|
||||||
if not path:
|
|
||||||
return ToolResult(tool=self.name, ok=False, error="Missing path")
|
|
||||||
resolved = self._sandbox.ensure_path_allowed(str(path))
|
|
||||||
content = resolved.read_text(encoding="utf-8")
|
|
||||||
return ToolResult(
|
|
||||||
tool=self.name,
|
|
||||||
ok=True,
|
|
||||||
output=content,
|
|
||||||
metadata={"path": str(resolved), "size": len(content)},
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
@ -1,27 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from app.core.contracts import ToolResult, UserTask
|
|
||||||
from app.tools.base import BaseTool
|
|
||||||
from app.tools.sandbox import ToolSandbox
|
|
||||||
|
|
||||||
|
|
||||||
class FileWriteTool(BaseTool):
|
|
||||||
name = "file_write"
|
|
||||||
|
|
||||||
def __init__(self, sandbox: ToolSandbox) -> None:
|
|
||||||
self._sandbox = sandbox
|
|
||||||
|
|
||||||
def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
|
|
||||||
path = args.get("path")
|
|
||||||
content = str(args.get("content", ""))
|
|
||||||
if not path:
|
|
||||||
return ToolResult(tool=self.name, ok=False, error="Missing path")
|
|
||||||
resolved = self._sandbox.ensure_path_allowed(str(path))
|
|
||||||
resolved.parent.mkdir(parents=True, exist_ok=True)
|
|
||||||
resolved.write_text(content, encoding="utf-8")
|
|
||||||
return ToolResult(
|
|
||||||
tool=self.name,
|
|
||||||
ok=True,
|
|
||||||
output=f"Wrote {len(content)} characters",
|
|
||||||
metadata={"path": str(resolved), "size": len(content)},
|
|
||||||
)
|
|
||||||
|
|
@ -1,123 +0,0 @@
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import logging
|
|
||||||
from typing import Any
|
|
||||||
|
|
||||||
from app.tools.base import BaseTool
from app.core.contracts import ToolResult, UserTask
from app.tools.sandbox import ToolSandbox

logger = logging.getLogger(__name__)


class MemoryInsertTool(BaseTool):
    _name = "memory_insert"
    _description = "Store information in memory"

    def __init__(self, sandbox: ToolSandbox, memory_interface=None) -> None:
        super().__init__()
        self._sandbox = sandbox
        self._memory = memory_interface

    def execute(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
        text = args.get("text", "")
        kind = args.get("kind", "fact")
        source = args.get("source", "user")
        weight = args.get("weight", 0.5)

        if not text:
            return ToolResult(tool="memory_insert", ok=False, output="", error="text is required")
        if not self._memory:
            return ToolResult(tool="memory_insert", ok=False, output="", error="Memory not available")

        try:
            entry = self._memory.insert(
                text=text,
                kind=kind,
                source=source,
                task_id=task.task_id,
                session_id=task.session_id,
                weight=weight,
            )
            return ToolResult(
                tool="memory_insert",
                ok=True,
                output=f"Stored: {entry.id}",
                metadata={"entry_id": entry.id},
            )
        except Exception as e:
            logger.warning(f"Memory insert failed: {e}")
            return ToolResult(tool="memory_insert", ok=False, output="", error=str(e))


class MemorySearchTool(BaseTool):
    _name = "memory_search"
    _description = "Search memory for information"

    def __init__(self, sandbox: ToolSandbox, memory_interface=None) -> None:
        super().__init__()
        self._sandbox = sandbox
        self._memory = memory_interface

    def execute(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
        query = args.get("query", "")
        top_k = args.get("top_k", 5)

        if not query:
            return ToolResult(tool="memory_search", ok=False, output="", error="query is required")
        if not self._memory:
            return ToolResult(tool="memory_search", ok=False, output="", error="Memory not available")

        try:
            results = self._memory.search(query, top_k=top_k)
            if not results:
                return ToolResult(tool="memory_search", ok=True, output="No results found", metadata={"count": 0})

            output_lines = []
            for entry, score in results:
                output_lines.append(f"[{score:.2f}] {entry.text[:100]}")

            return ToolResult(
                tool="memory_search",
                ok=True,
                output="\n".join(output_lines),
                metadata={"count": len(results)},
            )
        except Exception as e:
            logger.warning(f"Memory search failed: {e}")
            return ToolResult(tool="memory_search", ok=False, output="", error=str(e))


class MemoryListTool(BaseTool):
    _name = "memory_list"
    _description = "List recent memories"

    def __init__(self, sandbox: ToolSandbox, memory_interface=None) -> None:
        super().__init__()
        self._sandbox = sandbox
        self._memory = memory_interface

    def execute(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
        limit = args.get("limit", 10)

        if not self._memory:
            return ToolResult(tool="memory_list", ok=False, output="", error="Memory not available")

        try:
            entries = self._memory.get_recent(limit=limit)
            if not entries:
                return ToolResult(tool="memory_list", ok=True, output="No memories", metadata={"count": 0})

            output_lines = []
            for entry in entries:
                output_lines.append(f"{entry.kind}: {entry.text[:80]}")

            return ToolResult(
                tool="memory_list",
                ok=True,
                output="\n".join(output_lines),
                metadata={"count": len(entries)},
            )
        except Exception as e:
            logger.warning(f"Memory list failed: {e}")
            return ToolResult(tool="memory_list", ok=False, output="", error=str(e))
|
|
||||||
|
|
@ -1,35 +0,0 @@
|
||||||
from __future__ import annotations

from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
from app.tools.sandbox import ToolSandbox


class Tool(BaseTool):
    name = "file_read"
    description = "Read file contents"

    def __init__(self, sandbox: ToolSandbox) -> None:
        self._sandbox = sandbox

    def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
        path = args.get("path")
        if not path:
            return ToolResult(tool=self.name, ok=False, error="Missing path")
        try:
            resolved = self._sandbox.ensure_path_allowed(str(path))
            if not resolved.exists():
                return ToolResult(tool=self.name, ok=False, error=f"File not found: {path}")
            content = resolved.read_text(encoding="utf-8")
            return ToolResult(
                tool=self.name,
                ok=True,
                output=content,
                metadata={"path": str(resolved), "size": len(content)},
            )
        except PermissionError as e:
            return ToolResult(tool=self.name, ok=False, error=f"Access denied: {e}")
        except FileNotFoundError:
            return ToolResult(tool=self.name, ok=False, error=f"File not found: {path}")
        except Exception as e:
            return ToolResult(tool=self.name, ok=False, error=f"Error: {e}")
|
|
||||||
|
|
@ -1,10 +0,0 @@
|
||||||
{
  "name": "file_read",
  "version": "1.0",
  "entrypoint": "Tool",
  "description": "Read file contents from allowed paths",
  "args_schema": {
    "path": {"type": "string", "required": true, "description": "File path to read"}
  },
  "requires_permission": false
}
|
|
||||||
|
|
@ -1,33 +0,0 @@
|
||||||
from __future__ import annotations

from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
from app.tools.sandbox import ToolSandbox


class Tool(BaseTool):
    name = "file_write"
    description = "Write content to file"

    def __init__(self, sandbox: ToolSandbox) -> None:
        self._sandbox = sandbox

    def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
        path = args.get("path")
        content = str(args.get("content", ""))
        if not path:
            return ToolResult(tool=self.name, ok=False, error="Missing path")
        try:
            resolved = self._sandbox.ensure_path_allowed(str(path))
            resolved.parent.mkdir(parents=True, exist_ok=True)
            resolved.write_text(content, encoding="utf-8")
            return ToolResult(
                tool=self.name,
                ok=True,
                # len(content) counts characters, not encoded bytes.
                output=f"Wrote {len(content)} characters",
                metadata={"path": str(resolved), "size": len(content)},
            )
        except PermissionError as e:
            return ToolResult(tool=self.name, ok=False, error=f"Access denied: {e}")
        except Exception as e:
            return ToolResult(tool=self.name, ok=False, error=f"Error: {e}")
|
|
||||||
|
|
@ -1,11 +0,0 @@
|
||||||
{
  "name": "file_write",
  "version": "1.0",
  "entrypoint": "Tool",
  "description": "Write content to file",
  "args_schema": {
    "path": {"type": "string", "required": true, "description": "File path to write"},
    "content": {"type": "string", "required": true, "description": "Content to write"}
  },
  "requires_permission": true
}
|
|
||||||
|
|
@ -1,112 +0,0 @@
|
||||||
from __future__ import annotations

import logging
from typing import Any

from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool

logger = logging.getLogger(__name__)


class Tool(BaseTool):
    name = "memory"
    description = "Memory operations: insert, search, list"

    def __init__(self, memory_interface=None) -> None:
        self._memory = memory_interface

    def execute(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
        action = args.get("action", "search")

        if action == "insert":
            return self._insert(task, args)
        elif action == "search":
            return self._search(task, args)
        elif action == "list":
            return self._list(task, args)
        else:
            return ToolResult(tool=self.name, ok=False, error=f"Unknown action: {action}")

    def _insert(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
        text = args.get("text", "")
        kind = args.get("kind", "fact")
        source = args.get("source", "user")
        weight = args.get("weight", 0.5)

        if not text:
            return ToolResult(tool=self.name, ok=False, output="", error="text is required")
        if not self._memory:
            return ToolResult(tool=self.name, ok=False, output="", error="Memory not available")

        try:
            entry = self._memory.insert(
                text=text,
                kind=kind,
                source=source,
                task_id=task.task_id,
                session_id=task.session_id,
                weight=weight,
            )
            return ToolResult(
                tool=self.name,
                ok=True,
                output=f"Stored: {entry.id}",
                metadata={"entry_id": entry.id},
            )
        except Exception as e:
            logger.warning(f"Memory insert failed: {e}")
            return ToolResult(tool=self.name, ok=False, output="", error=str(e))

    def _search(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
        query = args.get("query", "")
        top_k = args.get("top_k", 5)

        if not query:
            return ToolResult(tool=self.name, ok=False, output="", error="query is required")
        if not self._memory:
            return ToolResult(tool=self.name, ok=False, output="", error="Memory not available")

        try:
            results = self._memory.search(query, top_k=top_k)
            if not results:
                return ToolResult(tool=self.name, ok=True, output="No results found", metadata={"count": 0})

            output_lines = []
            for entry, score in results:
                output_lines.append(f"[{score:.2f}] {entry.text[:100]}")

            return ToolResult(
                tool=self.name,
                ok=True,
                output="\n".join(output_lines),
                metadata={"count": len(results)},
            )
        except Exception as e:
            logger.warning(f"Memory search failed: {e}")
            return ToolResult(tool=self.name, ok=False, output="", error=str(e))

    def _list(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
        limit = args.get("limit", 10)

        if not self._memory:
            return ToolResult(tool=self.name, ok=False, output="", error="Memory not available")

        try:
            entries = self._memory.get_recent(limit=limit)
            if not entries:
                return ToolResult(tool=self.name, ok=True, output="No memories", metadata={"count": 0})

            output_lines = []
            for entry in entries:
                output_lines.append(f"{entry.kind}: {entry.text[:80]}")

            return ToolResult(
                tool=self.name,
                ok=True,
                output="\n".join(output_lines),
                metadata={"count": len(entries)},
            )
        except Exception as e:
            logger.warning(f"Memory list failed: {e}")
            return ToolResult(tool=self.name, ok=False, output="", error=str(e))
|
|
||||||
|
|
@ -1,22 +0,0 @@
|
||||||
{
  "name": "memory",
  "version": "1.0",
  "entrypoint": "Tool",
  "description": "Memory operations: insert, search, list",
  "args_schema": {
    "action": {
      "type": "string",
      "required": true,
      "description": "Action: insert, search, or list",
      "enum": ["insert", "search", "list"]
    },
    "text": {"type": "string", "required": false, "description": "Text to store (insert)"},
    "query": {"type": "string", "required": false, "description": "Query string (search)"},
    "kind": {"type": "string", "required": false, "description": "Memory kind: fact, command, etc"},
    "source": {"type": "string", "required": false, "description": "Source: user, system, etc"},
    "weight": {"type": "number", "required": false, "description": "Memory weight 0-1"},
    "top_k": {"type": "number", "required": false, "description": "Max results (search)"},
    "limit": {"type": "number", "required": false, "description": "Max entries (list)"}
  },
  "requires_permission": false
}
|
|
||||||
|
|
@ -1,55 +0,0 @@
|
||||||
from __future__ import annotations

from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
from app.tools.sandbox import ToolSandbox


def _detect_sudo_auth_failure(output: str) -> bool:
    normalized = output.lower()
    return any(
        marker in normalized
        for marker in (
            "incorrect password",
            "incorrect password attempt",
            "sudo: no password was provided",
            "sorry, try again",
            "authentication failure",
        )
    )


class Tool(BaseTool):
    name = "shell_exec"
    description = "Execute shell commands"

    def __init__(self, sandbox: ToolSandbox) -> None:
        self._sandbox = sandbox

    def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
        command = str(args.get("command", "")).strip()
        if not command:
            return ToolResult(tool=self.name, ok=False, error="Missing command", metadata={"exit_code": -1})
        cwd = args.get("cwd")
        stdin_secret = args.get("stdin_secret")
        output_callback = args.get("__output_callback")
        completed = self._sandbox.run_shell(
            command=command,
            cwd=str(cwd) if cwd else None,
            stdin_data=str(stdin_secret) if stdin_secret is not None else None,
            output_callback=output_callback if callable(output_callback) else None,
        )
        output = completed.stdout if completed.returncode == 0 else completed.stderr or completed.stdout
        sudo_auth_failed = completed.returncode != 0 and _detect_sudo_auth_failure(
            f"{completed.stdout}\n{completed.stderr}"
        )
        return ToolResult(
            tool=self.name,
            ok=completed.returncode == 0,
            output=output,
            error=None if completed.returncode == 0 else f"Command failed with exit code {completed.returncode}",
            metadata={
                "exit_code": completed.returncode,
                "sudo_auth_failed": sudo_auth_failed,
            },
        )
|
|
||||||
|
|
@ -1,12 +0,0 @@
|
||||||
{
  "name": "shell_exec",
  "version": "1.0",
  "entrypoint": "Tool",
  "description": "Execute shell commands in sandboxed environment",
  "args_schema": {
    "command": {"type": "string", "required": true, "description": "Shell command to execute"},
    "cwd": {"type": "string", "required": false, "description": "Working directory"},
    "stdin_secret": {"type": "string", "required": false, "description": "Data to pass via stdin"}
  },
  "requires_permission": true
}
|
|
||||||
|
|
@ -1,61 +0,0 @@
|
||||||
from __future__ import annotations

import logging
from typing import Any, Callable

from app.tools.base import BaseTool
from app.tools.discover import ToolDiscovery

logger = logging.getLogger(__name__)


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, BaseTool] = {}
        self._schemas: dict[str, dict[str, Any]] = {}

    def register(self, tool: BaseTool) -> None:
        self._tools[tool.name] = tool

    def discover_and_init(
        self,
        init_factory: Callable[[dict], BaseTool] | None = None,
    ) -> None:
        """Discover tools from plugins and initialize them."""
        discovery = ToolDiscovery()
        discovered = discovery.discover()

        for name, data in discovered.items():
            manifest = data.get("manifest", {})

            # Build the tool with the factory when one is given; otherwise
            # fall back to the instance created during discovery.
            if init_factory:
                tool = init_factory({"name": name, "manifest": manifest})
            else:
                tool = data.get("instance")
            if not tool:
                logger.warning(f"No init_factory or instance available for {name}")
                continue

            self._tools[name] = tool
            self._schemas[name] = {
                "description": manifest.get("description", ""),
                "args_schema": manifest.get("args_schema", {}),
                "requires_permission": manifest.get("requires_permission", False),
            }
            logger.info(f"Registered tool: {name}")

    def get(self, name: str) -> BaseTool:
        if name not in self._tools:
            raise KeyError(f"Tool {name} is not registered")
        return self._tools[name]

    def list_names(self) -> list[str]:
        return list(self._tools.keys())

    def get_schema(self, name: str) -> dict[str, Any]:
        return self._schemas.get(name, {})

    def list_schemas(self) -> list[dict[str, Any]]:
        return [
            {"name": name, **schema}
            for name, schema in self._schemas.items()
        ]
|
|
||||||
|
|
||||||
|
|
@ -1,139 +0,0 @@
|
||||||
from __future__ import annotations

import os
import signal
import subprocess
import threading
import time
from pathlib import Path
from typing import Callable


class ToolSandbox:
    """Applies simple working directory and timeout restrictions."""

    def __init__(
        self,
        allowed_root: str | Path,
        timeout_ms: int,
        command_timeout_ms: int | None = None,
        idle_timeout_ms: int | None = None,
    ) -> None:
        self._allowed_root = Path(allowed_root).resolve()
        self._timeout_seconds = max(timeout_ms / 1000, 0.001)
        self._command_timeout_seconds = max((command_timeout_ms or timeout_ms) / 1000, 0.001)
        self._idle_timeout_seconds = max((idle_timeout_ms or timeout_ms) / 1000, 0.001)

    def ensure_path_allowed(self, path: str | Path) -> Path:
        resolved = Path(path).expanduser().resolve()
        # Permission-first model: every path is accepted here; no existence
        # or containment check is performed against allowed_root.
        # The permission service handles write/shell restrictions.
        return resolved

    def run_shell(
        self,
        command: str,
        cwd: str | Path | None = None,
        stdin_data: str | None = None,
        output_callback: Callable[[str, str], None] | None = None,
    ) -> subprocess.CompletedProcess[str]:
        working_directory = self.ensure_path_allowed(cwd or self._allowed_root)
        env = {"PATH": os.environ.get("PATH", "")}
        if output_callback is None:
            # Blocking path: subprocess.run raises TimeoutExpired on timeout.
            return subprocess.run(
                command,
                shell=True,
                cwd=str(working_directory),
                env=env,
                text=True,
                capture_output=True,
                input=stdin_data,
                timeout=self._command_timeout_seconds,
                check=False,
            )

        # Streaming path: run in a new session so the whole process group
        # can be killed on timeout.
        process = subprocess.Popen(
            command,
            shell=True,
            cwd=str(working_directory),
            env=env,
            text=True,
            stdin=subprocess.PIPE if stdin_data is not None else None,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            start_new_session=True,
        )

        stdout_chunks: list[str] = []
        stderr_chunks: list[str] = []
        output_lock = threading.Lock()
        last_output_at = time.monotonic()

        if stdin_data is not None and process.stdin is not None:
            process.stdin.write(stdin_data)
            process.stdin.close()

        def read_stream(stream_name: str) -> None:
            nonlocal last_output_at
            stream = process.stdout if stream_name == "stdout" else process.stderr
            if stream is None:
                return
            chunks = stdout_chunks if stream_name == "stdout" else stderr_chunks
            try:
                for line in iter(stream.readline, ""):
                    if not line:
                        break
                    chunks.append(line)
                    with output_lock:
                        last_output_at = time.monotonic()
                    output_callback(stream_name, line)
            finally:
                stream.close()

        stdout_thread = threading.Thread(target=read_stream, args=("stdout",), daemon=True)
        stderr_thread = threading.Thread(target=read_stream, args=("stderr",), daemon=True)
        stdout_thread.start()
        stderr_thread.start()

        timed_out = False
        timeout_reason: str | None = None
        started_at = time.monotonic()
        return_code: int | None = None
        while return_code is None:
            return_code = process.poll()
            if return_code is not None:
                break

            now = time.monotonic()
            with output_lock:
                idle_for = now - last_output_at
            if now - started_at > self._command_timeout_seconds:
                timed_out = True
                timeout_reason = f"Command timed out after {self._command_timeout_seconds:.0f}s"
                break
            if idle_for > self._idle_timeout_seconds:
                timed_out = True
                timeout_reason = f"Command produced no output for {self._idle_timeout_seconds:.0f}s"
                break
            time.sleep(0.1)

        if timed_out:
            try:
                os.killpg(process.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass
            except PermissionError:
                process.kill()
            return_code = process.wait()
            timeout_message = f"{timeout_reason}\n"
            stderr_chunks.append(timeout_message)
            output_callback("stderr", timeout_message)

        stdout_thread.join(timeout=1)
        stderr_thread.join(timeout=1)
        return subprocess.CompletedProcess(
            args=command,
            returncode=return_code if not timed_out else -9,
            stdout="".join(stdout_chunks),
            stderr="".join(stderr_chunks),
        )
|
|
||||||
|
|
@ -1,65 +0,0 @@
|
||||||
from __future__ import annotations

from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
from app.tools.sandbox import ToolSandbox


def _detect_sudo_auth_failure(output: str) -> bool:
    normalized = output.lower()
    return any(
        marker in normalized
        for marker in (
            "incorrect password",
            "incorrect password attempt",
            "sudo: no password was provided",
            "sudo: password incorrect",
            "sorry, try again",
            "authentication failure",
            "wrong password",
        )
    )


class ShellExecTool(BaseTool):
    name = "shell_exec"

    def __init__(self, sandbox: ToolSandbox) -> None:
        self._sandbox = sandbox

    def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
        command = str(args.get("command", "")).strip()
        if not command:
            return ToolResult(tool=self.name, ok=False, error="Missing command", metadata={"exit_code": -1})
        cwd = args.get("cwd")
        stdin_secret = args.get("stdin_secret")
        password = args.get("password")
        output_callback = args.get("__output_callback")

        if password:
            # Feed the password to `sudo -S` over stdin instead of
            # interpolating it into the command string, where it would be
            # open to shell injection and visible in the process list.
            command = f"sudo -S {command}"
            stdin_secret = f"{password}\n"

        completed = self._sandbox.run_shell(
            command=command,
            cwd=str(cwd) if cwd else None,
            stdin_data=str(stdin_secret) if stdin_secret is not None else None,
            output_callback=output_callback if callable(output_callback) else None,
        )
        output = completed.stdout if completed.returncode == 0 else completed.stderr or completed.stdout
        error_output = completed.stderr or completed.stdout
        sudo_auth_failed = completed.returncode != 0 and _detect_sudo_auth_failure(
            f"{completed.stdout}\n{completed.stderr}"
        )
        needs_sudo = completed.returncode != 0 and "permission denied" in error_output.lower() and not sudo_auth_failed

        return ToolResult(
            tool=self.name,
            ok=completed.returncode == 0,
            output=output,
            error=None if completed.returncode == 0 else f"Command failed with exit code {completed.returncode}",
            metadata={
                "exit_code": completed.returncode,
                "needs_sudo": needs_sudo,
                "sudo_auth_failed": sudo_auth_failed,
            },
        )
|
|
||||||
|
|
@ -1,42 +0,0 @@
|
||||||
{
  "thinker": {
    "path": "Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf",
    "backend": "vulkan",
    "n_gpu_layers": -1,
    "max_tokens": 2048,
    "temperature": 0.3
  },
  "json_compiler": {
    "path": "gemma-4-E4B-it-Q4_K_M.gguf",
    "backend": "cpu",
    "n_gpu_layers": 0,
    "max_tokens": 1024,
    "temperature": 0.1
  },
  "coder": {
    "path": "X-Coder-SFT-Qwen3-8B.Q6_K.gguf",
    "backend": "cpu",
    "n_gpu_layers": 0,
    "max_tokens": 2048,
    "temperature": 0.2
  },
  "critic": {
    "path": "gemma-4-E4B-it-Q4_K_M.gguf",
    "backend": "cpu",
    "n_gpu_layers": 0,
    "max_tokens": 1024,
    "temperature": 0.1
  },
  "sys_util": {
    "path": "Menlo_Lucy-Q4_K_M.gguf",
    "backend": "cpu",
    "n_gpu_layers": 0,
    "max_tokens": 1024,
    "temperature": 0.1
  },
  "embeddings": {
    "path": "all-MiniLM-L6-v2",
    "model_name": "sentence-transformers/all-MiniLM-L6-v2",
    "embedding_dim": 384
  }
}
|
|
||||||
|
|
@ -1,42 +0,0 @@
|
||||||
{
  "thinker": {
    "path": "Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf",
    "backend": "vulkan",
    "n_gpu_layers": -1,
    "max_tokens": 2048,
    "temperature": 0.3
  },
  "json_compiler": {
    "path": "gemma-4-E4B-it-Q4_K_M.gguf",
    "backend": "cpu",
    "n_gpu_layers": 0,
    "max_tokens": 1024,
    "temperature": 0.1
  },
  "coder": {
    "path": "X-Coder-SFT-Qwen3-8B.Q6_K.gguf",
    "backend": "cpu",
    "n_gpu_layers": 0,
    "max_tokens": 2048,
    "temperature": 0.2
  },
  "critic": {
    "path": "gemma-4-E4B-it-Q4_K_M.gguf",
    "backend": "cpu",
    "n_gpu_layers": 0,
    "max_tokens": 1024,
    "temperature": 0.1
  },
  "sys_util": {
    "path": "Menlo_Lucy-Q4_K_M.gguf",
    "backend": "cpu",
    "n_gpu_layers": 0,
    "max_tokens": 1024,
    "temperature": 0.1
  },
  "embeddings": {
    "path": "all-MiniLM-L6-v2",
    "model_name": "sentence-transformers/all-MiniLM-L6-v2",
    "embedding_dim": 384
  }
}
|
|
||||||
|
|
@ -1,42 +0,0 @@
|
||||||
{
  "thinker": {
    "path": "Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf",
    "backend": "vulkan",
    "n_gpu_layers": -1,
    "max_tokens": 2048,
    "temperature": 0.3
  },
  "json_compiler": {
    "path": "gemma-4-E4B-it-Q4_K_M.gguf",
    "backend": "cpu",
    "n_gpu_layers": 0,
    "max_tokens": 1024,
    "temperature": 0.1
  },
  "coder": {
    "path": "X-Coder-SFT-Qwen3-8B.Q6_K.gguf",
    "backend": "cpu",
    "n_gpu_layers": 0,
    "max_tokens": 2048,
    "temperature": 0.2
  },
  "critic": {
    "path": "gemma-4-E4B-it-Q4_K_M.gguf",
    "backend": "cpu",
    "n_gpu_layers": 0,
    "max_tokens": 1024,
    "temperature": 0.1
  },
  "sys_util": {
    "path": "Menlo_Lucy-Q4_K_M.gguf",
    "backend": "cpu",
    "n_gpu_layers": 0,
    "max_tokens": 1024,
    "temperature": 0.1
  },
  "embeddings": {
    "path": "all-MiniLM-L6-v2",
    "model_name": "sentence-transformers/all-MiniLM-L6-v2",
    "embedding_dim": 384
  }
}
|
|
||||||
|
|
@ -0,0 +1,53 @@
|
||||||
|
default_provider: llama_server

models:
  thinker:
    provider: llama_server
    base_url: http://127.0.0.1:8081/v1
    model: local-main
    purpose: free_cognition
    structured_output: false
    temperature: 0.4
    max_output_tokens: 8192
    system_prompt: prompts/roles/thinker.md

  critic:
    provider: llama_server
    base_url: http://127.0.0.1:8081/v1
    model: local-main
    purpose: reflection
    structured_output: false
    temperature: 0.1
    max_output_tokens: 4096
    system_prompt: prompts/roles/critic.md

  coder:
    provider: llama_server
    base_url: http://127.0.0.1:8081/v1
    model: local-main
    purpose: code_generation
    structured_output: false
    temperature: 0.2
    max_output_tokens: 16384
    system_prompt: prompts/roles/coder.md

  action:
    provider: llama_server
    base_url: http://127.0.0.1:8081/v1
    model: local-main
    purpose: action_directive
    structured_output: true
    temperature: 0.0
    max_output_tokens: 2048
    system_prompt: prompts/roles/action.md
    response_schema: duck_core/schemas/action_directive.schema.json

  summary:
    provider: llama_server
    base_url: http://127.0.0.1:8081/v1
    model: local-main
    purpose: context_summary
    structured_output: false
    temperature: 0.1
    max_output_tokens: 4096
    system_prompt: prompts/roles/summary.md
|
||||||
|
|
@ -1,94 +0,0 @@
|
||||||
{
|
|
||||||
"description": "Permission-first model configuration",
|
|
||||||
"settings": {
|
|
||||||
"allow_caching": true,
|
|
||||||
"cache_file": "data/runtime/allowed_commands.json",
|
|
||||||
"normalize_commands": true,
|
|
||||||
"split_chained": true
|
|
||||||
},
|
|
||||||
"command_categories": {
|
|
||||||
"hard_stop": {
|
|
||||||
"description": "Commands that are never executed - hard stop",
|
|
||||||
"allow_once": false,
|
|
||||||
"allow_always": false,
|
|
||||||
"commands": [
|
|
||||||
"rm -rf /",
|
|
||||||
"rm -rf /*",
|
|
||||||
"dd if=/dev/zero of=/dev/sd*",
|
|
||||||
"dd if=/dev/zero of=/dev/hd*",
|
|
||||||
"mkfs",
|
|
||||||
"> /dev/sd*",
|
|
||||||
"> /dev/hd*"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"no_always": {
|
|
||||||
"description": "Dangerous commands - allow once only",
|
|
||||||
"allow_once": true,
|
|
||||||
"allow_always": false,
|
|
||||||
"commands": [
|
|
||||||
"rm -rf *",
|
|
||||||
"rm -rf .*",
|
|
||||||
"curl |",
|
|
||||||
"wget -O- |",
|
|
||||||
":(){:|:&};:",
|
|
||||||
"fork",
|
|
||||||
"chmod -R 000",
|
|
||||||
"chmod -R 777",
|
|
||||||
"chown -R",
|
|
||||||
"apt",
|
|
||||||
"apt-get",
|
|
||||||
"dpkg",
|
|
||||||
"yum",
|
|
||||||
"dnf",
|
|
||||||
"pacman",
|
|
||||||
"shutdown",
|
|
||||||
"reboot",
|
|
||||||
"halt",
|
|
||||||
"init 0",
|
|
||||||
"init 6",
|
|
||||||
"telinit",
|
|
||||||
"systemctl stop",
|
|
||||||
"systemctl start",
|
|
||||||
"systemctl restart",
|
|
||||||
"service stop",
|
|
||||||
"service start",
|
|
||||||
"kill -9 -1",
|
|
||||||
"killall",
|
|
||||||
"pkill -9",
|
|
||||||
"reboot -f",
|
|
||||||
"shutdown -h now",
|
|
||||||
"poweroff",
|
|
||||||
"echo .* > /proc/",
|
|
||||||
"echo .* > /sys/"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"normal": {
|
|
||||||
"description": "Normal commands - allow once or always",
|
|
||||||
"allow_once": true,
|
|
||||||
"allow_always": true,
|
|
||||||
"commands": [
|
|
||||||
"shell_exec",
|
|
||||||
"file_write"
|
|
||||||
],
|
|
||||||
"file_extensions": [
|
|
||||||
".py",
|
|
||||||
".txt",
|
|
||||||
".json",
|
|
||||||
".md",
|
|
||||||
".yaml",
|
|
||||||
".yml",
|
|
||||||
".sh",
|
|
||||||
".bash"
|
|
||||||
]
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"path_settings": {
|
|
||||||
"allow_read_outside": true,
|
|
||||||
"allow_write_paths": [
|
|
||||||
"/home/mirivlad/git/ducklm",
|
|
||||||
"/tmp"
|
|
||||||
],
|
|
||||||
"require_confirmation_for_write": true,
|
|
||||||
"require_confirmation_for_shell": true
|
|
||||||
}
|
|
||||||
}
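The removed permission file groups commands into `hard_stop`, `no_always`, and `normal`, with `normalize_commands` and `split_chained` enabled. How the old runtime matched commands is not shown in this diff; the sketch below assumes whitespace normalization, splitting on `&&`, `||`, and `;`, and token-prefix matching. The helper names (`split_chained`, `classify`, `classify_chain`) are hypothetical and only a small subset of the listed patterns is used.

```python
import re

# Subset of the patterns from the removed file; matching rules are an assumption.
CATEGORIES = {
    "hard_stop": ["rm -rf /"],
    "no_always": ["chmod -R 777", "shutdown", "systemctl restart"],
}

# Severity order used to pick the strictest category in a chained command.
SEVERITY = ["normal", "no_always", "hard_stop"]


def split_chained(command: str) -> list:
    """Split on &&, || and ; and collapse repeated whitespace in each part."""
    parts = re.split(r"&&|\|\||;", command)
    return [" ".join(part.split()) for part in parts if part.strip()]


def classify(command: str) -> str:
    """Return the category whose pattern tokens prefix the command tokens."""
    tokens = command.split()
    for category in ("hard_stop", "no_always"):
        for pattern in CATEGORIES[category]:
            pattern_tokens = pattern.split()
            if tokens[: len(pattern_tokens)] == pattern_tokens:
                return category
    return "normal"


def classify_chain(command: str) -> str:
    """Classify every part of a chained command and keep the strictest result."""
    results = (classify(part) for part in split_chained(command))
    return max(results, key=SEVERITY.index, default="normal")
```

Token-prefix matching keeps `rm -rf /tmp/x` out of `hard_stop` while still catching `rm -rf /`; patterns that rely on globs or substrings (for example `mkfs` against `mkfs.ext4`) would need a different rule.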
|
|
||||||
|
|
@ -1,15 +0,0 @@
|
||||||
{
|
|
||||||
"thinker": "You are the orchestrator of a local AI agent runtime. Your job is to analyze the user's task and decide how to execute it.\n\n## Decision Types\n\n1. **Direct response** — for simple questions, greetings, conversations:\n {\"type\": \"respond\", \"payload\": {\"text\": \"your answer\"}}\n\n2. **Single tool step** — for simple tasks needing one tool:\n {\"type\": \"step\", \"payload\": {\"tool\": \"shell_exec\", \"args\": {\"command\": \"...\"}}}\n\n3. **Multi-step plan** — for complex tasks that need decomposition:\n {\"type\": \"plan\", \"payload\": {\"steps\": [\n {\"id\": \"step-1\", \"tool\": \"shell_exec\", \"args\": {\"command\": \"...\"}, \"description\": \"...\", \"depends_on\": []},\n {\"id\": \"step-2\", \"tool\": \"file_read\", \"args\": {\"path\": \"...\"}, \"description\": \"...\", \"depends_on\": [\"step-1\"]}\n ]}}\n\n## When to use multi-step plan\n- Task requires multiple operations (search → read → write)\n- Task involves checking prerequisites before acting\n- Task requires gathering information before producing result\n- User asks to do something complex (setup, configure, analyze)\n\n## Memory\n- If memory recall results are provided, USE them to inform your decisions\n- If you know something from memory, mention it in step descriptions\n- Store important results for future use\n\n## Rules\n- ALWAYS respond with valid JSON only\n- Each step MUST have a unique id\n- Use depends_on for ordering constraints\n- Keep steps focused — one action per step\n- If unsure, start with an information-gathering step\n- Respond ONLY with valid JSON, no explanations",
|
|
||||||
|
|
||||||
"orchestrator": "You are an expert orchestrator for a local AI agent system. Your role is to analyze the user's task and generate executable runtime steps.\n\nTool selection (choose the right tool):\n- shell_exec: for running commands, checking programs exist ('which', '--version'), searching files\n- file_read: for reading contents of a file (must be existing file path)\n- file_write: for creating or updating files\n- memory: for storing or searching memory\n\nSTRICT OUTPUT FORMAT - MUST follow exactly:\n\nSingle step:\n{\"type\": \"step\", \"payload\": {\"tool\": \"shell_exec\", \"args\": {\"command\": \"...\"}}}\n{\"type\": \"step\", \"payload\": {\"tool\": \"file_read\", \"args\": {\"path\": \"...\"}}}\n{\"type\": \"step\", \"payload\": {\"tool\": \"file_write\", \"args\": {\"path\": \"...\", \"content\": \"...\"}}}\n\nMulti-step plan:\n{\"type\": \"plan\", \"payload\": {\"steps\": [{\"tool\": \"file_read\", \"args\": {\"path\": \"...\"}, \"description\": \"...\", \"depends_on\": []}]}}\n\nDirect response:\n{\"type\": \"respond\", \"payload\": {\"text\": \"...\"}}\n\nIMPORTANT:\n- Use exactly {\"type\": \"step|plan|respond\", \"payload\": {...}} format\n- Do NOT output array alone\n- Do NOT use \"kind\" - use \"type\"\n- Respond ONLY with valid JSON\n- Your response MUST be complete valid JSON - the closing brace } MUST be present\n- Do NOT truncate your response - if you cannot fit all steps, use a single step\n\nTool selection:\n- For checking if a program/command exists: use shell_exec with 'which <program>' or '<program> --version'\n- For reading file contents: use file_read with path to file (NOT command)\n- For executing any command: use shell_exec\n- Previous experience (from memory) may help - consider it but YOU decide how to proceed",
|
|
||||||
|
|
||||||
"planning": "You are a planning specialist. Generate execution plans.\n\nOutput MUST be:\n{\"type\": \"plan\", \"version\": \"1.0\", \"payload\": {\"steps\": [{\"tool\": \"\", \"args\": {}, \"description\": \"...\", \"depends_on\": []}]}}\n\nRules:\n- Each step must have unique id (auto-generated)\n- Use \"depends_on\" for step ordering\n- Use \"tool\" for tool operations\n- Respond ONLY with valid JSON",
|
|
||||||
|
|
||||||
"coder": "You are an expert code generation model.\n\nOutput format:\n{\"type\": \"code\", \"payload\": {\"language\": \"python\", \"content\": \"...\"}}\n\nOR for completion:\n{\"type\": \"respond\", \"payload\": {\"text\": \"...\"}}\n\nGenerate clean, working code. Respond ONLY with valid JSON.",
|
|
||||||
|
|
||||||
"critic": "You are a critic model. Evaluate tool execution results.\n\nScoring criteria:\n- correctness: 0-1 (does result accomplish task?)\n- usefulness: 0-1 (is result useful?)\n- safety: 0-1 (is result safe?)\n- suggest_memory: boolean (should this be stored in memory?)\n- weight: 0-1 (importance score)\n- explanation: brief reasoning\n\nOutput format:\n{\"type\": \"evaluation\", \"payload\": {\"correctness\": 0.0-1.0, \"usefulness\": 0.0-1.0, \"safety\": 0.0-1.0, \"suggest_memory\": true|false, \"weight\": 0.0-1.0, \"explanation\": \"...\"}}\n\nRespond ONLY with valid JSON.",
|
|
||||||
|
|
||||||
"system": "You are ducklm, a local AI agent runtime.\n\nSTRICT RULES:\n- You MUST strictly follow execution schemas\n- You are NOT allowed to output free-form text\n- All outputs MUST be valid JSON matching runtime contracts\n- Use exact tool names from available tool set\n\nCurrent capabilities:\n- Execute shell commands (shell_exec)\n- Read/write files (file_read, file_write)\n- Memory operations (memory)\n\nAlways respond with valid JSON.",
|
|
||||||
|
|
||||||
"sys_util": "You are a STRICT JSON repair engine inside a production AI runtime.\nYour job is ONLY to fix invalid JSON syntax.\nYou are NOT allowed to:\n- change meaning of data\n- add new fields\n- remove valid fields\n- interpret intent\n- explain anything\n- reformat structure logically\n---\nINPUT:\nYou receive a malformed or invalid JSON string.\n---\nOUTPUT RULES:\n- Output ONLY valid JSON\n- No markdown\n- No comments\n- No explanations\n- No extra text\n---\nREPAIR RULES (STRICT):\nFix ONLY syntax issues:\n- missing or extra commas\n- missing quotes\n- incorrect brackets\n- trailing commas\n- invalid escaping\n- broken strings\n- unbalanced braces\nDO NOT:\n- rename keys\n- reorder fields intentionally\n- guess missing semantic data\n- \"improve\" structure\n---\nIMPORTANT:\nIf multiple valid repairs exist:\n→ choose the minimal change that makes JSON valid\n---\nOUTPUT MUST BE VALID JSON OR NOTHING ELSE\nInvalid JSON:"
|
|
||||||
}
|
|
||||||
|
|
@ -1,9 +0,0 @@
|
||||||
You are an expert code generation model.
|
|
||||||
|
|
||||||
Output format:
|
|
||||||
{"type": "code", "payload": {"language": "python", "content": "..."}}
|
|
||||||
|
|
||||||
OR for completion:
|
|
||||||
{"type": "respond", "payload": {"text": "..."}}
|
|
||||||
|
|
||||||
Generate clean, working code. Respond ONLY with valid JSON.
|
|
||||||
|
|
@ -1,14 +0,0 @@
|
||||||
You are a critic model. Evaluate tool execution results.
|
|
||||||
|
|
||||||
Scoring criteria:
|
|
||||||
- correctness: 0-1 (does result accomplish task?)
|
|
||||||
- usefulness: 0-1 (is result useful?)
|
|
||||||
- safety: 0-1 (is result safe?)
|
|
||||||
- suggest_memory: boolean (should this be stored in memory?)
|
|
||||||
- weight: 0-1 (importance score)
|
|
||||||
- explanation: brief reasoning
|
|
||||||
|
|
||||||
Output format:
|
|
||||||
{"type": "evaluation", "payload": {"correctness": 0.0-1.0, "usefulness": 0.0-1.0, "safety": 0.0-1.0, "suggest_memory": true|false, "weight": 0.0-1.0, "explanation": "..."}}
|
|
||||||
|
|
||||||
Respond ONLY with valid JSON.
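A reply in this format can be checked mechanically before it is trusted. The following is a minimal sketch, assuming the scores must be plain numbers in the closed range 0 to 1; `validate_evaluation` is a hypothetical helper, not a function from the repository.

```python
SCORE_FIELDS = ("correctness", "usefulness", "safety", "weight")


def validate_evaluation(data: dict) -> dict:
    """Return the payload if it matches the critic format, else raise ValueError."""
    if data.get("type") != "evaluation":
        raise ValueError('type must be "evaluation"')
    payload = data.get("payload")
    if not isinstance(payload, dict):
        raise ValueError("payload must be an object")
    for field in SCORE_FIELDS:
        value = payload.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{field} must be a number")
        if not 0 <= value <= 1:
            raise ValueError(f"{field} must be between 0 and 1")
    if not isinstance(payload.get("suggest_memory"), bool):
        raise ValueError("suggest_memory must be a boolean")
    if not isinstance(payload.get("explanation"), str):
        raise ValueError("explanation must be a string")
    return payload
```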
|
|
||||||
|
|
@ -1,25 +0,0 @@
|
||||||
You are a JSON Compiler. Convert semantic plan to strict JSON.
|
|
||||||
|
|
||||||
INPUT: Semantic plan from Thinker
|
|
||||||
OUTPUT: Valid JSON only
|
|
||||||
|
|
||||||
RULES:
|
|
||||||
- Convert ONLY, do not make decisions
|
|
||||||
- Do not invent tools
|
|
||||||
- Do not modify plan logic
|
|
||||||
- Do not skip steps
|
|
||||||
- Output ONLY valid JSON
|
|
||||||
|
|
||||||
AVAILABLE TOOLS:
|
|
||||||
- file_write (requires permission)
|
|
||||||
- shell_exec (execute shell commands, requires permission)
|
|
||||||
- memory (no permission needed)
|
|
||||||
- file_read (no permission needed)
|
|
||||||
- respond (just return text to user, no execution)
|
|
||||||
|
|
||||||
IMPORTANT: Use exactly "shell_exec" (not "shell") for shell commands!
|
|
||||||
|
|
||||||
OUTPUT FORMAT:
|
|
||||||
{"type": "plan", "payload": {"steps": [{"id": "1", "tool": "shell_exec", "args": {"command": "..."}, "depends_on": []}]}}
|
|
||||||
OR
|
|
||||||
{"type": "respond", "payload": {"text": "..."}}
|
|
||||||
|
|
@ -1,34 +0,0 @@
|
||||||
You are an expert orchestrator for a local AI agent system. Your role is to analyze the user's task and generate executable runtime steps.
|
|
||||||
|
|
||||||
Tool selection (choose the right tool):
|
|
||||||
- shell_exec: for running commands, checking programs exist ('which', '--version'), searching files
|
|
||||||
- file_read: for reading contents of a file (must be existing file path)
|
|
||||||
- file_write: for creating or updating files
|
|
||||||
- memory: for storing or searching memory
|
|
||||||
|
|
||||||
STRICT OUTPUT FORMAT - MUST follow exactly:
|
|
||||||
|
|
||||||
Single step:
|
|
||||||
{"type": "step", "payload": {"tool": "shell_exec", "args": {"command": "..."}}}
|
|
||||||
{"type": "step", "payload": {"tool": "file_read", "args": {"path": "..."}}}
|
|
||||||
{"type": "step", "payload": {"tool": "file_write", "args": {"path": "...", "content": "..."}}}
|
|
||||||
|
|
||||||
Multi-step plan:
|
|
||||||
{"type": "plan", "payload": {"steps": [{"tool": "file_read", "args": {"path": "..."}, "description": "...", "depends_on": []}]}}
|
|
||||||
|
|
||||||
Direct response:
|
|
||||||
{"type": "respond", "payload": {"text": "..."}}
|
|
||||||
|
|
||||||
IMPORTANT:
|
|
||||||
- Use exactly {"type": "step|plan|respond", "payload": {...}} format
|
|
||||||
- Do NOT output array alone
|
|
||||||
- Do NOT use "kind" - use "type"
|
|
||||||
- Respond ONLY with valid JSON
|
|
||||||
- Your response MUST be complete valid JSON - the closing brace } MUST be present
|
|
||||||
- Do NOT truncate your response - if you cannot fit all steps, use a single step
|
|
||||||
|
|
||||||
Tool selection:
|
|
||||||
- For checking if a program/command exists: use shell_exec with 'which <program>' or '<program> --version'
|
|
||||||
- For reading file contents: use file_read with path to file (NOT command)
|
|
||||||
- For executing any command: use shell_exec
|
|
||||||
- Previous experience (from memory) may help - consider it but YOU decide how to proceed
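The envelope rules in this prompt (object at the top level, `type` rather than `kind`, one of three types, a `payload` object) are easy to enforce on the receiving side. A minimal sketch, assuming the model reply arrives as a raw string; `parse_decision` is a hypothetical name and the old runtime's actual parser is not shown in this diff.

```python
import json

ALLOWED_TYPES = {"step", "plan", "respond"}


def parse_decision(raw: str) -> dict:
    """Parse a model reply and enforce the {"type": ..., "payload": {...}} envelope."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object, not a bare array")
    if "kind" in data and "type" not in data:
        raise ValueError('use "type", not "kind"')
    if data.get("type") not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}")
    if not isinstance(data.get("payload"), dict):
        raise ValueError("payload must be an object")
    return data
```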
|
|
||||||
|
|
@ -1,10 +0,0 @@
|
||||||
You are a planning specialist. Generate execution plans.
|
|
||||||
|
|
||||||
Output MUST be:
|
|
||||||
{"type": "plan", "version": "1.0", "payload": {"steps": [{"tool": "", "args": {}, "description": "...", "depends_on": []}]}}
|
|
||||||
|
|
||||||
Rules:
|
|
||||||
- Each step must have unique id (auto-generated)
|
|
||||||
- Use "depends_on" for step ordering
|
|
||||||
- Use "tool" for tool operations
|
|
||||||
- Respond ONLY with valid JSON
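Ordering by `depends_on` amounts to a topological sort over step ids. The sketch below uses the standard-library `graphlib` module and assumes every step carries an `id`, as in the thinker examples; `execution_order` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(steps: list) -> list:
    """Return step ids in an order that respects every depends_on constraint."""
    ids = [step["id"] for step in steps]
    if len(ids) != len(set(ids)):
        raise ValueError("step ids must be unique")
    # Map each step id to the set of ids it depends on (its predecessors).
    graph = {step["id"]: set(step.get("depends_on", [])) for step in steps}
    unknown = set().union(*graph.values()) - set(ids)
    if unknown:
        raise ValueError(f"depends_on references unknown steps: {sorted(unknown)}")
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("depends_on contains a cycle") from exc
```

Rejecting unknown ids and cycles up front keeps a malformed plan from reaching tool execution.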
|
|
||||||
|
|
@ -1,41 +0,0 @@
|
||||||
You are a STRICT JSON repair engine inside a production AI runtime.
|
|
||||||
Your job is ONLY to fix invalid JSON syntax.
|
|
||||||
You are NOT allowed to:
|
|
||||||
- change meaning of data
|
|
||||||
- add new fields
|
|
||||||
- remove valid fields
|
|
||||||
- interpret intent
|
|
||||||
- explain anything
|
|
||||||
- reformat structure logically
|
|
||||||
---
|
|
||||||
INPUT:
|
|
||||||
You receive a malformed or invalid JSON string.
|
|
||||||
---
|
|
||||||
OUTPUT RULES:
|
|
||||||
- Output ONLY valid JSON
|
|
||||||
- No markdown
|
|
||||||
- No comments
|
|
||||||
- No explanations
|
|
||||||
- No extra text
|
|
||||||
---
|
|
||||||
REPAIR RULES (STRICT):
|
|
||||||
Fix ONLY syntax issues:
|
|
||||||
- missing or extra commas
|
|
||||||
- missing quotes
|
|
||||||
- incorrect brackets
|
|
||||||
- trailing commas
|
|
||||||
- invalid escaping
|
|
||||||
- broken strings
|
|
||||||
- unbalanced braces
|
|
||||||
DO NOT:
|
|
||||||
- rename keys
|
|
||||||
- reorder fields intentionally
|
|
||||||
- guess missing semantic data
|
|
||||||
- "improve" structure
|
|
||||||
---
|
|
||||||
IMPORTANT:
|
|
||||||
If multiple valid repairs exist:
|
|
||||||
→ choose the minimal change that makes JSON valid
|
|
||||||
---
|
|
||||||
OUTPUT MUST BE VALID JSON OR NOTHING ELSE
|
|
||||||
Invalid JSON:
|
|
||||||
|
|
@ -1,14 +0,0 @@
|
||||||
You are ducklm, a local AI agent runtime.
|
|
||||||
|
|
||||||
STRICT RULES:
|
|
||||||
- You MUST strictly follow execution schemas
|
|
||||||
- You are NOT allowed to output free-form text
|
|
||||||
- All outputs MUST be valid JSON matching runtime contracts
|
|
||||||
- Use exact tool names from available tool set
|
|
||||||
|
|
||||||
Current capabilities:
|
|
||||||
- Execute shell commands (shell_exec)
|
|
||||||
- Read/write files (file_read, file_write)
|
|
||||||
- Memory operations (memory)
|
|
||||||
|
|
||||||
Always respond with valid JSON.
|
|
||||||
|
|
@ -1,36 +0,0 @@
|
||||||
You are a Thinker. Analyze user task and create execution plan.
|
|
||||||
|
|
||||||
CONTEXT:
|
|
||||||
{task_summary}
|
|
||||||
{memory_context}
|
|
||||||
|
|
||||||
AVAILABLE TOOLS (injected at runtime):
|
|
||||||
{tools_json}
|
|
||||||
|
|
||||||
INSTRUCTIONS:
|
|
||||||
1. Understand what user wants
|
|
||||||
2. Create step-by-step plan in natural language
|
|
||||||
3. Choose appropriate tools from available
|
|
||||||
4. If the user asks about the current local machine, filesystem, processes,
|
|
||||||
packages, logs, runtime state, or anything that must be observed rather than
|
|
||||||
answered from general knowledge, use an appropriate tool.
|
|
||||||
5. For exploratory tasks, prefer one robust inspection command over many brittle
|
|
||||||
dependent checks. Missing optional files should be treated as information, not
|
|
||||||
as a fatal failure.
|
|
||||||
|
|
||||||
MODE: {mode_hint}
|
|
||||||
- If mode is "execution": create a plan with TOOL STEPS (shell_exec, file_write, etc)
|
|
||||||
- If mode is "conversation": just respond with text, NO tool execution
|
|
||||||
- If mode is "clarification_needed": ask user for clarification
|
|
||||||
|
|
||||||
OUTPUT FORMAT (SEMANTIC PLAN - NOT JSON):
|
|
||||||
For execution mode:
|
|
||||||
ПЛАН:
|
|
||||||
Шаг 1: [use tool - e.g., shell_exec]
|
|
||||||
Шаг 2: [use tool]
|
|
||||||
|
|
||||||
For conversation mode:
|
|
||||||
ОТВЕТ: [just text, no tools needed]
|
|
||||||
|
|
||||||
For clarification:
|
|
||||||
ОТВЕТ: [clarifying question for the user]
|
|
||||||
|
|
@ -1,42 +0,0 @@
|
||||||
{
|
|
||||||
"step_timeout_ms": 30000,
|
|
||||||
"task_timeout_ms": 300000,
|
|
||||||
"shell_command_timeout_ms": 3600000,
|
|
||||||
"shell_idle_timeout_ms": 600000,
|
|
||||||
"planner_retry_limit": 2,
|
|
||||||
"tool_retry_limit": 1,
|
|
||||||
"replan_limit": 1,
|
|
||||||
"max_execution_steps": 20,
|
|
||||||
"retrieval_top_k": 5,
|
|
||||||
"max_context_tokens": 8192,
|
|
||||||
"context_budgets": {
|
|
||||||
"system": 512,
|
|
||||||
"task": 512,
|
|
||||||
"memory": 2048,
|
|
||||||
"execution": 2048,
|
|
||||||
"tools": 1024,
|
|
||||||
"safety": 512
|
|
||||||
},
|
|
||||||
"reserve_for_generation_pct": 25,
|
|
||||||
"orchestrator_retry_limit": 2,
|
|
||||||
"memory_thresholds": {
|
|
||||||
"default_store_weight": 0.8
|
|
||||||
},
|
|
||||||
"critic_fallback_policy": "continue_without_critic",
|
|
||||||
"checkpoint_policy": {
|
|
||||||
"save_on_transition": true
|
|
||||||
},
|
|
||||||
"event_retention_policy": {
|
|
||||||
"keep_all": true
|
|
||||||
},
|
|
||||||
"streaming_settings": {
|
|
||||||
"enabled": true
|
|
||||||
},
|
|
||||||
"debug": true,
|
|
||||||
"debug_orchestrator_log_length": 500,
|
|
||||||
"json_fix_retry_limit": 2,
|
|
||||||
"json_fix_use_sys_util": true,
|
|
||||||
"intent_classifier": "thinker",
|
|
||||||
"recall_model": "json_compiler",
|
|
||||||
"critic_retry_limit": 2
|
|
||||||
}
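The numbers in this removed file can be checked with simple arithmetic. The sketch below assumes `reserve_for_generation_pct` is taken from `max_context_tokens` and that each entry in `context_budgets` is an upper bound for one prompt section; the removed runtime may have interpreted them differently, and the function names are hypothetical.

```python
# Values copied from the removed runtime configuration.
MAX_CONTEXT_TOKENS = 8192
RESERVE_FOR_GENERATION_PCT = 25
CONTEXT_BUDGETS = {
    "system": 512,
    "task": 512,
    "memory": 2048,
    "execution": 2048,
    "tools": 1024,
    "safety": 512,
}


def prompt_budget(max_context_tokens: int, reserve_pct: int) -> int:
    """Tokens left for the prompt after reserving a share for generation."""
    reserved = max_context_tokens * reserve_pct // 100
    return max_context_tokens - reserved


def overcommit(budgets: dict, max_context_tokens: int, reserve_pct: int) -> int:
    """How many tokens the section budgets exceed the prompt budget by (0 if none)."""
    return max(0, sum(budgets.values()) - prompt_budget(max_context_tokens, reserve_pct))


print(sum(CONTEXT_BUDGETS.values()))                                   # 6656
print(prompt_budget(MAX_CONTEXT_TOKENS, RESERVE_FOR_GENERATION_PCT))   # 6144
print(overcommit(CONTEXT_BUDGETS, MAX_CONTEXT_TOKENS, RESERVE_FOR_GENERATION_PCT))  # 512
```

Under these assumptions the section budgets add up to 512 tokens more than the prompt budget, so at least one section would have to be trimmed whenever all of them are full.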
|
|
||||||
|
|
@ -0,0 +1,11 @@
|
||||||
|
services:
|
||||||
|
qdrant:
|
||||||
|
image: qdrant/qdrant:latest
|
||||||
|
ports:
|
||||||
|
- "6333:6333"
|
||||||
|
- "6334:6334"
|
||||||
|
volumes:
|
||||||
|
- qdrant_storage:/qdrant/storage
|
||||||
|
|
||||||
|
volumes:
|
||||||
|
qdrant_storage:
|
||||||
|
|
@ -0,0 +1,5 @@
|
||||||
|
# Architecture
|
||||||
|
|
||||||
|
DuckLM is organized as a WebChat front end and a FastAPI layer over Duck Core. Duck Core owns task state, context building, model calls, events, tools, approvals, skills, experience, and memory adapters.
|
||||||
|
|
||||||
|
The first vertical slice is WebChat -> FastAPI -> RuntimeLoop -> ModelClient -> llama-server -> SQLite event timeline.
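The slice can be pictured as one function that calls the model and writes every transition to an event table. This is a minimal sketch, not the Duck Core implementation: the table layout, the event names, and the functions `make_timeline`, `record`, `run_task`, and `event_kinds` are all assumptions, and the model client is replaced by a plain callable.

```python
import json
import sqlite3
from typing import Callable


def make_timeline() -> sqlite3.Connection:
    """Create an in-memory SQLite database with a hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "task_id TEXT NOT NULL, kind TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def record(conn: sqlite3.Connection, task_id: str, kind: str, payload: dict) -> None:
    """Append one event to the timeline."""
    conn.execute(
        "INSERT INTO events (task_id, kind, payload) VALUES (?, ?, ?)",
        (task_id, kind, json.dumps(payload)),
    )
    conn.commit()


def run_task(conn: sqlite3.Connection, task_id: str, message: str,
             model_call: Callable[[str], str]) -> str:
    """One pass of the loop: receive the task, call the model, log each step."""
    record(conn, task_id, "task_received", {"message": message})
    record(conn, task_id, "model_request", {"prompt": message})
    reply = model_call(message)
    record(conn, task_id, "model_response", {"text": reply})
    record(conn, task_id, "task_completed", {})
    return reply


def event_kinds(conn: sqlite3.Connection, task_id: str) -> list:
    """Return the event kinds for a task in insertion order."""
    rows = conn.execute(
        "SELECT kind FROM events WHERE task_id = ? ORDER BY id", (task_id,)
    )
    return [row[0] for row in rows]
```

Passing the model call in as a parameter lets the loop be exercised without a running `llama-server`.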
|
||||||
|
|
@ -0,0 +1,9 @@
|
||||||
|
# Experience Learning
|
||||||
|
|
||||||
|
Experience records are stored in SQLite. Suggested skill updates are written to `skills/_proposals/` and are not applied automatically.
|
||||||
|
|
||||||
|
Use:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl http://127.0.0.1:8000/v1/experience
|
||||||
|
```
|
||||||
|
|
@ -0,0 +1,71 @@
|
||||||
|
# How To Run
|
||||||
|
|
||||||
|
1. Install dependencies:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python3 -m venv .venv
|
||||||
|
. .venv/bin/activate
|
||||||
|
python -m pip install -e ".[dev]"
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Configure:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cp .env.example .env
|
||||||
|
```
|
||||||
|
|
||||||
|
The default `DUCK_MAIN_MODEL_PATH` points to `./models/Qwen3.6/nonMTP/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf`.
|
||||||
|
|
||||||
|
3. Start `llama-server`:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash scripts/llama/start_main.sh start
|
||||||
|
```
|
||||||
|
|
||||||
|
Useful process commands:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash scripts/llama/start_main.sh status
|
||||||
|
bash scripts/llama/start_main.sh logs --follow
|
||||||
|
bash scripts/llama/start_main.sh restart
|
||||||
|
bash scripts/llama/start_main.sh stop
|
||||||
|
```
|
||||||
|
|
||||||
|
4. Start DuckLM API:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python -m duck_core.api
|
||||||
|
```
|
||||||
|
|
||||||
|
5. Open WebChat:
|
||||||
|
|
||||||
|
```text
|
||||||
|
http://127.0.0.1:8000/
|
||||||
|
```
|
||||||
|
|
||||||
|
6. Send a task:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl -X POST http://127.0.0.1:8000/v1/chat \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"message":"Скажи коротко, что ты DuckLM","workspace":"./workspace","debug":true}'
|
||||||
|
```
|
||||||
|
|
||||||
|
7. Inspect events:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl http://127.0.0.1:8000/v1/tasks/<task_id>/events
|
||||||
|
```
|
||||||
|
|
||||||
|
8. Approvals:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl http://127.0.0.1:8000/v1/approvals/pending
|
||||||
|
```
|
||||||
|
|
||||||
|
9. Stop services:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash scripts/llama/start_main.sh stop
|
||||||
|
docker compose -f docker-compose.memory.yml down
|
||||||
|
```
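The request from step 6 can also be sent from Python with the standard library only. This is a sketch under assumptions: the JSON body mirrors the `curl` example, the response is assumed to be JSON, and `build_chat_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(message: str, workspace: str = "./workspace",
                       debug: bool = True,
                       base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build the POST /v1/chat request with the same body as the curl example."""
    body = json.dumps(
        {"message": message, "workspace": workspace, "debug": debug},
        ensure_ascii=False,
    ).encode("utf-8")
    return urllib.request.Request(
        base_url + "/v1/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the reply, assuming the API answers with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, `send(build_chat_request("Say briefly that you are DuckLM"))` performs the call.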
|
||||||
|
|
@ -0,0 +1,15 @@
|
||||||
|
# How To Test
|
||||||
|
|
||||||
|
Run smoke tests:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python -m pytest tests/smoke -v
|
||||||
|
```
|
||||||
|
|
||||||
|
Run verification scripts against a running API:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash scripts/verify/verify_basic_chat.sh
|
||||||
|
bash scripts/verify/verify_models_roles.sh
|
||||||
|
bash scripts/verify/verify_tool_blocking.sh
|
||||||
|
```
|
||||||
|
|
@ -0,0 +1,44 @@
|
||||||
|
# Local Llama Server
|
||||||
|
|
||||||
|
DuckLM expects an OpenAI-compatible `llama-server` at `http://127.0.0.1:8081/v1` by default.
|
||||||
|
|
||||||
|
On the current Radeon RX 580 system, `llama.cpp` is built locally with Vulkan:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash scripts/llama/build_vulkan.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
The main model is Qwen3.6 35B A3B nonMTP:
|
||||||
|
|
||||||
|
```text
|
||||||
|
models/Qwen3.6/nonMTP/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf
|
||||||
|
```
|
||||||
|
|
||||||
|
Start it in the background with:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash scripts/llama/start_main.sh start
|
||||||
|
```
|
||||||
|
|
||||||
|
Manage the process:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash scripts/llama/start_main.sh status
|
||||||
|
bash scripts/llama/start_main.sh logs
|
||||||
|
bash scripts/llama/start_main.sh logs --follow
|
||||||
|
bash scripts/llama/start_main.sh restart
|
||||||
|
bash scripts/llama/start_main.sh stop
|
||||||
|
```
|
||||||
|
|
||||||
|
The local `.env` uses:
|
||||||
|
|
||||||
|
```env
|
||||||
|
DUCK_LLAMA_SERVER_BIN=./vendor/llama.cpp/build/bin/llama-server
|
||||||
|
DUCK_CTX_SIZE=4096
|
||||||
|
DUCK_N_GPU_LAYERS=20
|
||||||
|
DUCK_PARALLEL=1
|
||||||
|
DUCK_LLAMA_DEVICE=Vulkan0
|
||||||
|
DUCK_LLAMA_EXTRA_ARGS="--reasoning off --cache-ram 0"
|
||||||
|
```
|
||||||
|
|
||||||
|
MTP is available only through `scripts/llama/start_thinker_mtp_experimental.sh` and is not used by the action JSON endpoint by default.
|
||||||
|
|
@ -0,0 +1,5 @@
|
||||||
|
# Memory Architecture
|
||||||
|
|
||||||
|
Semantic memory uses Qdrant as the vector store. Embeddings come from `/v1/embeddings` when the model backend supports it.
|
||||||
|
|
||||||
|
If embeddings are unavailable, `VectorMemory` fails explicitly with `EmbeddingsUnavailableError`; it does not invent a local embedding algorithm.
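The fail-explicit behaviour can be illustrated as follows. Only the names `VectorMemory` and `EmbeddingsUnavailableError` come from this document; the constructor, the `store` method, and the in-memory list standing in for Qdrant are assumptions made for the sketch.

```python
from typing import Callable, Optional, Sequence


class EmbeddingsUnavailableError(RuntimeError):
    """Raised when the model backend cannot produce embeddings."""


class VectorMemory:
    """Toy stand-in: a real implementation would write vectors to Qdrant."""

    def __init__(self, embed: Optional[Callable[[str], Sequence[float]]]):
        # `embed` is a hypothetical hook wrapping the /v1/embeddings call.
        self._embed = embed
        self._items = []

    def _vector(self, text: str) -> list:
        if self._embed is None:
            raise EmbeddingsUnavailableError("backend does not expose /v1/embeddings")
        try:
            return list(self._embed(text))
        except ConnectionError as exc:
            # Surface the failure instead of substituting a made-up embedding.
            raise EmbeddingsUnavailableError(str(exc)) from exc

    def store(self, text: str) -> int:
        """Embed and keep the text; return the number of stored items."""
        self._items.append((text, self._vector(text)))
        return len(self._items)
```

Raising a dedicated error lets callers decide whether to skip memory for the task or stop, rather than silently storing vectors that cannot be compared.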
|
||||||
|
|
@ -0,0 +1,7 @@
|
||||||
|
# Model Roles
|
||||||
|
|
||||||
|
Roles are logical, not physical. `thinker`, `critic`, `coder`, `action`, and `summary` may all point to the same model.
|
||||||
|
|
||||||
|
Each role can differ by prompt, temperature, output limit, response format, schema, memory scope, and endpoint. Request-level parameters can change per call. Backend-level parameters such as GGUF path, context size, GPU offload, MTP, and server port require the backend to be started with the desired settings.
|
||||||
|
|
||||||
|
See `config/models.yaml` for one model mapped to all roles.
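The split between request-level and backend-level parameters can be sketched in a few lines. The values below mirror the `action` and `summary` entries of `config/models.yaml` in this commit; `RoleConfig`, `ROLES`, and `build_request` are hypothetical names, and the request body uses the common OpenAI-compatible chat fields rather than DuckLM's actual client code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleConfig:
    base_url: str
    model: str
    temperature: float
    max_output_tokens: int
    system_prompt: str


# One physical backend shared by every logical role.
BACKEND = {"base_url": "http://127.0.0.1:8081/v1", "model": "local-main"}

ROLES = {
    "action": RoleConfig(**BACKEND, temperature=0.0, max_output_tokens=2048,
                         system_prompt="prompts/roles/action.md"),
    "summary": RoleConfig(**BACKEND, temperature=0.1, max_output_tokens=4096,
                          system_prompt="prompts/roles/summary.md"),
}


def build_request(role: str, system_text: str, user_text: str) -> dict:
    """Assemble the per-call request for a role; only request-level fields vary."""
    cfg = ROLES[role]
    return {
        "url": cfg.base_url + "/chat/completions",
        "body": {
            "model": cfg.model,
            "temperature": cfg.temperature,
            "max_tokens": cfg.max_output_tokens,
            "messages": [
                {"role": "system", "content": system_text},
                {"role": "user", "content": user_text},
            ],
        },
    }
```

Both roles resolve to the same URL and model name, so switching roles changes the prompt and sampling settings without restarting `llama-server`.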
|
||||||
|
|
@ -0,0 +1,5 @@
|
||||||
|
# Performance And MTP
|
||||||
|
|
||||||
|
MTP/speculative decoding is an inference backend concern. DuckLM keeps action JSON on the normal endpoint by default.
|
||||||
|
|
||||||
|
Use `scripts/llama/start_thinker_mtp_experimental.sh` only for experiments. Benchmark scaffolding is in `scripts/bench/bench_runtime.py`.
|
||||||
|
|
@ -1,24 +0,0 @@
|
||||||
# UI Bootstrap And Review Flow Plan
|
|
||||||
|
|
||||||
## Goal
|
|
||||||
|
|
||||||
Move the web chat UI to Bootstrap 5.3 with Bootswatch themes and improve review/password/terminal-output ergonomics.
|
|
||||||
|
|
||||||
## Required Changes
|
|
||||||
|
|
||||||
- Replace the current hand-written visual system in `app/api/static/index.html` with Bootstrap 5.3 layout/components.
|
|
||||||
- Add Bootswatch theme support with a visible theme selector and persistent localStorage choice.
|
|
||||||
- Password/secret input must submit on Enter as well as the "Отправить" button.
|
|
||||||
- Console/tool output must render inside a collapsed Bootstrap accordion item.
|
|
||||||
- The accordion body must contain terminal-style output inside `<pre></pre>`.
|
|
||||||
- The terminal accordion must expand only when the user clicks it.
|
|
||||||
- Review UI must show critic/system assessment and user voting buttons:
|
|
||||||
- `Ошибочное действие`
|
|
||||||
- `Всё верно`
|
|
||||||
- optional correction/comment text.
|
|
||||||
|
|
||||||
## Notes
|
|
||||||
|
|
||||||
- Keep runtime event handling WebSocket-driven.
|
|
||||||
- Do not mix console output with assistant prose.
|
|
||||||
- Keep raw tool output available for debugging, but collapsed by default.
|
|
||||||
|
|
@ -0,0 +1,9 @@
|
||||||
|
# Skills
|
||||||
|
|
||||||
|
Skills are procedural memory, not hardcoded routing. The first skill is `analyze_project`, loaded from `skills/analyze_project/skill.yaml`.
|
||||||
|
|
||||||
|
Use:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl http://127.0.0.1:8000/v1/skills
|
||||||
|
```
|
||||||
Some files were not shown because too many files have changed in this diff.