Import ducklm runtime

This commit is contained in:
mirivlad 2026-05-10 23:37:56 +08:00
parent fd1a045488
commit dc8267880a
90 changed files with 9171 additions and 0 deletions

1
.gitignore vendored
View File

@ -29,6 +29,7 @@ config/.env.*
# Runtime state # Runtime state
data/**/*.sqlite3 data/**/*.sqlite3
data/**/*.sqlite3-* data/**/*.sqlite3-*
data/runtime/*.json
data/runtime/*.pid data/runtime/*.pid
data/runtime/*.log data/runtime/*.log
logs/ logs/

347
ARCHITECTURE.md Normal file
View File

@ -0,0 +1,347 @@
# ARCHITECTURE
Этот документ фиксирует целевую архитектуру `ducklm` как локального event-driven multi-model execution runtime.
`TASK_3.md` — это директива для ИИ-кодера.
`ARCHITECTURE.md` — это короткая инженерная карта системы: что является ядром, какие есть слои, как течёт управление, где принимаются решения, а где только исполняются переходы.
## 1. Core Principle
Система строится вокруг `Runtime Loop Controller`.
Центр системы:
- не `router`
- не `orchestrator`
- не `execution engine`
Центр системы:
- `runtime loop`
Именно он замыкает жизненный цикл задачи:
```text
task
-> state load/create
-> context build
-> orchestration decision
-> plan/directive
-> execution
-> critic
-> memory policy
-> checkpoint
-> next step / complete / fail
```
## 2. Layer Model
Целевая форма системы:
```text
Client / CLI / API
|
v
Runtime Loop Controller
|
+--> State Store / Checkpoints
+--> Context Builder
+--> Router
+--> Orchestrator / Planner
+--> Execution Engine / Scheduler
| |
| +--> Tool Layer
| +--> Coder
|
+--> Critic
+--> Memory Write Policy
+--> Memory Store + Vector Index
+--> Event Bus + Event Store
+--> Streaming Projection
```
Принцип:
- `runtime loop` координирует
- `router` рекомендует
- `orchestrator` думает
- `execution engine` исполняет
- `tools/coder` делают работу
- `critic` оценивает
- `memory policy` решает запись
- `event bus` фиксирует историю
- `state store` даёт resume
## 3. Responsibility Boundaries
### Runtime Loop Controller
Отвечает за:
- task lifecycle
- state transitions
- вызов компонентов в правильном порядке
- применение decision objects
- checkpointing
- completion / failure path
Не отвечает за:
- policy reasoning
- raw tool execution
- prompt assembly inline
### Router
Это `policy evaluator + decision suggester`.
Контракт:
```text
(input state + assembled context) -> ExecutionDirective
```
Свойства:
- pure function
- no side effects
- no tool execution
- no state mutation
### Orchestrator / Planner
Отвечает за:
- orchestration reasoning
- deciding whether planning is needed
- generating plan JSON
- returning structured directives
Не отвечает за:
- execution
- direct state mutation
- tool invocation
### Execution Engine / Scheduler
Отвечает за:
- step scheduling
- task graph traversal
- step execution coordination
- calling tool/coder adapters
- reporting structured results
Не отвечает за:
- ownership of global lifecycle
- high-level policy
### Critic
Отвечает за:
- evaluation of tool/coder outputs
- returning structured scores and explanation
Не отвечает за:
- final memory write decision
- execution retry policy
### Memory Write Policy
Отвечает за:
- deterministic decision about storing memory
- dedup / merge / skip behavior
Не отвечает за:
- semantic retrieval
- critic scoring
## 4. Decision Model
Все decision-producing components должны возвращать структурированные объекты.
Базовый контракт:
```json
{
"type": "plan|tool|coder|respond|replan|store_memory|request_permission|complete|fail|noop",
"payload": {},
"requires_permission": false,
"confidence": 0.0,
"reason": "string"
}
```
Это главный антихаосный инвариант системы.
Следствие:
- компоненты не исполняют решения напрямую
- компоненты не мутируют state напрямую
- runtime loop применяет решения и переводит систему дальше
## 5. Execution Flow
Нормальный путь выполнения:
1. Клиент отправляет task.
2. Runtime loop создаёт или загружает task state.
3. Публикуется `task_received`.
4. Context builder собирает execution context.
5. Router возвращает decision object.
6. Orchestrator возвращает direct action или plan.
7. План валидируется и преобразуется в task graph.
8. Execution engine выбирает следующий шаг.
9. Tool или coder исполняет шаг через adapter.
10. Result возвращается в runtime loop.
11. Critic возвращает evaluation suggestion.
12. Memory policy возвращает decision по записи.
13. State checkpoint сохраняется.
14. Event bus фиксирует события.
15. Runtime loop выбирает `continue / replan / complete / fail`.
## 6. Task Graph Model
Внешний planner может вернуть список шагов.
Внутри runtime план должен жить как task graph:
```json
{
"nodes": [
{
"id": "step-1",
"kind": "tool",
"tool": "shell_exec",
"args": {"command": "hostnamectl"},
"depends_on": []
}
]
}
```
Сейчас допускается sequential DAG execution.
В будущем это даёт путь к parallel scheduling без переписывания модели.
## 7. Event Backbone
Система event-driven.
`EventBus` нужен не только для стриминга, а как внутренняя хребтовая шина.
Минимальные свойства:
- ordering per task
- monotonic sequence per task
- durable append to event store
- replay capability
- consumer idempotency
Минимальная модель доставки:
- `at least once`
Правило идемпотентности:
- событие дедуплицируется по `task_id + sequence`
Streaming layer — это projection от event bus, а не источник правды.
## 8. State Persistence
Так как runtime задуман как long-running autonomous system, in-memory lifecycle недостаточен.
Нужны:
- task state store
- checkpoint store
- resume from crash/restart
Минимальная стратегия:
- checkpoint after critical transitions
- latest valid checkpoint is resumable
Primary choice для MVP:
- `SQLite`
## 9. Async and Isolation
LLM loop не должен блокироваться долгими tool operations.
Поэтому нужны:
- async execution adapters
- timeout wrappers
- cancellation handling
- bounded concurrency
Для опасных или тяжёлых операций нужен отдельный sandbox layer.
Особенно для:
- `shell_exec`
- browser/web fallback
- generated helper scripts
## 10. Memory Architecture
Memory — отдельная подсистема хранения, а не JSON dump.
Рекомендуемая форма:
- metadata store: `SQLite`
- vector index: `FAISS` или `hnswlib`
Два разных процесса:
- retrieval
- write decision
Это специально разделено.
`critic` только оценивает.
`memory write policy` принимает финальное решение.
Минимальная логика записи должна быть детерминированной:
```text
(critic_score + memory_type + runtime_weight + dedup_state + safety_state) -> decision
```
## 11. Failure Model
Система должна быть устойчивой к частичным сбоям.
Ожидаемые controlled failure paths:
- invalid planner output -> replan or fail
- tool timeout -> retry or fail
- critic failure -> fallback policy
- memory failure -> skip write and continue where safe
- streaming failure -> sync fallback
Главный принцип:
- subsystem failure не должен автоматически означать runtime collapse
## 12. Why This Shape
Эта архитектура нужна, чтобы система не деградировала в один из плохих вариантов:
- `router-god-object`
- `runtime loop with hidden policy logic`
- `LLM that directly executes tools`
- `streaming instead of event model`
- `critic as memory authority`
- `in-memory only autonomous runtime`
Если держать эти границы жёстко, проект остаётся расширяемым.
Если границы размыть, система быстро превратится в трудноотлаживаемый procedural agent.

249
FOR_AI_REVIEW.md Normal file
View File

@ -0,0 +1,249 @@
# DuckLM Runtime Architecture Review
## 🧠 1. System Overview
**What is runtime?**
Runtime is the execution substrate of the system — a multi-layered cognitive execution environment that orchestrates LLMs, tools, memory, and permissions into a unified agentic workflow. It's the `RuntimeController` that composes `RuntimeLoop`, `ExecutionEngine`, `ContextBuilder`, `AsyncRouter`, `PermissionService`, and `EventBus`.
**What is the core loop?**
The core loop is the `RuntimeLoop.run_task()` method: it receives a `UserTask`, applies permission hard-stop checks, creates task state, builds context via `ContextBuilder`, routes via `AsyncRouter` to get a `directive`, executes via `ExecutionEngine`, applies `Critic` evaluation, saves via `MemoryPolicy`, publishes `RuntimeEvent`s through `EventBus`, and returns streaming output.
**Models (Orchestrator / Coder / Critic / Utility)**
- **Orchestrator** (`OrchestratorAdapter`/`AsyncOrchestratorAdapter`): LLM that decides plan vs direct respond vs tool; generates `ExecutionDirective` of type `plan`, `tool`, `respond`, `fail`, etc.
- **Coder** (`CoderAdapter`/`AsyncCoderAdapter`): LLM specialized for code generation and manipulation.
- **Critic** (`CriticAdapter`/`AsyncCriticAdapter`): Evaluates tool outputs with JSON scoring (correctness, usefulness, safety, memory_store, weight).
- **Utility**: The `sys_util` orchestrator — a fallback/orchestration layer for system-level operations.
**What is "truth"? (Event Store / State Store)**
- **Event Store** (`SQLiteEventStore`): Immutable append-only log of `RuntimeEvent`s per task. Source of truth for "what happened."
- **State Store** (`SQLiteTaskStateStore`): Current mutable task state (status, last_directive, pending requests). "Current truth" of task progress.
- **Checkpoint Store** (`SQLiteCheckpointStore`): Snapshots of task state + context at milestones.
- **Memory Store** (`MemoryStore` + `VectorIndex`): Long-term knowledge base with weighted entries.
---
## 🔁 2. End-to-End Flow
### High-Level Flow (as seen in logs)
```
User Input
→ Router (AsyncRouter.decide)
→ Context Builder (ContextBuilder.build)
→ Orchestrator (decides plan vs direct)
→ Plan / Direct Action
→ Execution Engine
→ Tool Layer (ToolRegistry + ToolSandbox)
→ Critic (AsyncCriticAdapter)
→ Memory Policy (MemoryWritePolicy)
→ Event Bus (SQLiteEventStore)
→ Streaming Output (via WebSocket / SSE)
```
### Conversation Flow
1. **Router** decides `plan` vs `respond` vs `tool` vs `fail` based on orchestrator output or intent parser.
2. **Context Builder** enriches task with memory context, tool context, execution context, and safety constraints.
3. **Orchestrator** (or direct respond) produces the initial `ExecutionDirective`.
4. **Execution Engine** schedules via `ExecutionScheduler`, then executes:
- `plan` → parse into `PlanStep`s, build task graph, execute ready steps
- `tool` → validate tool existence, check permissions, execute via `ToolRegistry`
- `respond` → direct response
- `fail` → immediate failure
5. **Tool Layer** (`ToolRegistry` + `ToolSandbox`):
- Plugin discovery via `ToolDiscovery`
- Manifest-based tool registration
- Sandboxed execution with timeout
6. **Critic** evaluates tool results (if enabled), outputs `CriticScore` JSON.
7. **Memory Policy** decides whether to insert `tool_result`, `critique`, `plan`, `fact`, `summary`, or `user_preference` into memory.
8. **Event Bus** (`SQLiteEventBus`) publishes `RuntimeEvent` with sequence ordering.
9. **Streaming Output** replays events via WebSocket and sends incremental responses.
### Failure Flow
- **Invalid JSON flow**: `ExecutionScheduler.parse_plan_steps` catches `JSONDecodeError` / `ValueError` / `TypeError`, logs warning, returns empty steps → `plan` fails with "Failed to parse plan steps."
- **Tool failure flow**: Tool execution returns `{"status": "failed", "result": {"error": "..."}}` → ExecutionEngine returns failed status → task state updated → event `TASK_FAILED` published → stops further plan steps.
- **Critic failure flow**: `_evaluate_with_critic` catches exception, logs warning, publishes `CRITIC_RESULT` with error → critic_score is `None` → execution continues without critique.
- **Orchestrator fallback flow**: If primary orchestrator fails or missing, `AsyncRouter` has `sys_util` fallback (utility orchestrator) for system-level decisions.
- **Permission denial flow**: `PermissionService.check_shell_command` / `check_write_path` returns `decision: "hard_stop"` or `decision: "deny"` → immediate failure with blocked reason; if `decision: "prompt"``TASK_AWAITING_PERMISSION` state.
### Repair Flow (JSON / Tool-call)
- Repair is triggered via `resolve_permission` or `resolve_secret` endpoints.
- Permission repair: user provides `decision` ("allow_once"/"allow_always"/"deny"/"ask_always") → `PermissionService.resolve_permission` → updates state → retries original directive.
- Secret repair: user provides secret string → `ExecutionEngine.execute` with `secret_override` → continues execution.
---
## ⚙️ 3. Component Breakdown
### `runtime_loop` (`RuntimeLoop`)
- **Responsibility**: Central task coordination; state management; event publishing.
- **Input**: `UserTask`
- **Output**: `{"task_id", "status", "directive", "result", "events"}`
- **Must NOT do**: Direct LLM calls (delegates to router/execution_engine); bypass state store.
### `execution_engine` (`ExecutionEngine`)
- **Responsibility**: Execute directives (plan/tool/respond/fail); integrate critic; interface with tool registry.
- **Input**: `UserTask`, `ExecutionDirective`, optional `permission_override`, `secret_override`
- **Output**: `{"status", "result", "step_results"}`
- **Must NOT do**: Bypass permission checks; skip critic evaluation when enabled; leak secrets in logs.
### `scheduler` (`ExecutionScheduler`)
- **Responsibility**: Parse plan JSON, build task dependency graph, yield ready steps, detect cycles.
- **Input**: JSON plan string, `task_id`
- **Output**: `list[PlanStep]`
- **Must NOT do**: Execute anything; modify task state directly.
### `tool_registry` (`ToolRegistry`)
- **Responsibility**: Register/manifest tools; execute via `ToolSandbox`; provide schema metadata.
- **Input**: tool name, args dict
- **Output**: `ToolResult`
- **Must NOT do**: Bypass sandbox; execute privileged host commands without sandbox.
### `event_bus` (`EventBus` → `SQLiteEventStore`)
- **Responsibility**: Append-only event persistence; sequence numbering; per-task query.
- **Input**: `RuntimeEvent`
- **Output**: event stream
- **Must NOT do**: Modify state store directly (state is separate); delete or mutate events.
### `memory` (`MemoryInterface` → `MemoryStore` + `VectorIndex`)
- **Responsibility**: Store/retrieve weighted memory entries; vector similarity search; integrate with context builder.
- **Input**: text, kind, source, weight, metadata
- **Output**: search results or insertion confirmation
- **Must NOT do**: Expose raw embeddings without access control; store secrets.
---
## 🧩 4. Data Contracts
### `PlanStep`
```python
id: str
kind: Literal["tool", "coder", "memory", "respond"]
tool: str | None
args: dict[str, Any]
description: str
requires_confirmation: bool
depends_on: list[str]
```
**Real example** (from `router` prompt engineering):
`{"id":"step-0","kind":"tool","tool":"shell_exec","args":{"command":"ls -la"},"description":"List directory","requires_confirmation":false,"depends_on":[]}`
### `ToolCall`
```python
tool: str
args: dict[str, Any]
task_id: str
step_id: str
```
**Real log**: `TOOL_CALLED` event with `{"tool":"shell_exec","args":{"command":"pwd"},"task_id":"xyz","step_id":"step-0"}`
### `ToolResult`
```python
tool: str
ok: bool
output: Any
error: str | None
metadata: dict[str, Any]
```
**Real output**: `{"tool":"shell_exec","ok":true,"output":"/app","error":null,"metadata":{}}`
### `RuntimeEvent`
```python
event_id: str
task_id: str
session_id: str
sequence: int
type: str # e.g. TASK_RECEIVED, TOOL_CALLED, TASK_COMPLETED
payload: dict[str, Any]
causation_id: str | None
correlation_id: str
```
**Real event stream**: `TASK_RECEIVED → CONTEXT_BUILT → PLAN_STARTED → TOOL_CALLED → TOOL_COMPLETED → TASK_COMPLETED`
### `MemoryEntry`
```python
id: str
text: str
kind: Literal["tool_result","plan","critique","fact","summary","user_preference"]
source: Literal["tool","critic","user","system"]
weight: float
task_id: str | None
session_id: str | None
metadata: dict[str, Any]
embedding_model: str
embedding_dim: int
```
**Real insertion**: After critic evaluation, `kind="critique"`, `source="critic"`, `weight=0.85`, metadata includes scores.
---
## 🔥 5. Failure Modes
### Invalid JSON Flow
- **Trigger**: Malformed plan JSON (e.g., missing braces, non-JSON string).
- **Detection**: `parse_plan_steps` catches `JSONDecodeError` / `ValueError` / `TypeError`.
- **Result**: Warning logged, empty steps returned → `PLAN_FAILED` with `"Failed to parse plan steps from directive"`.
### Tool Failure Flow
- **Trigger**: Tool returns `ok=False` or raises exception in sandbox.
- **Detection**: `_execute_tool` checks `tool_result.ok`.
- **Result**: Status `"failed"`, result contains `{"error": "...", "failed_step": step.id, "step_results": [...]}``TASK_FAILED` event; further plan steps skipped.
### Critic Failure Flow
- **Trigger**: Critic adapter raises exception or returns non-JSON output.
- **Detection**: `_evaluate_with_critic` catches exception, logs warning.
- **Result**: Event `CRITIC_RESULT` with error payload → `critic_score = None` → execution continues without critique; memory write skipped.
### Orchestrator Fallback Flow
- **Trigger**: Primary orchestrator model unavailable or returns invalid directive.
- **Detection**: `_ensure_orchestrator` returns `None`; router falls back to `sys_util` orchestrator.
- **Result**: Utility orchestrator handles system-level decisions (e.g., file operations, environment queries).
### Permission Denial Flow
- **Trigger**: `PermissionService` returns `decision: "hard_stop"` or `"deny"`.
- **Detection**: `_execute_tool` checks `permission_result`.
- **Result**: Immediate failure with `"Command blocked: ..."``TASK_FAILED`; no tool execution.
---
## 🧠 6. "Decision Logic Map"
### Orchestrator vs Direct Respond
- **Use orchestrator** when: task requires planning, multi-step tool usage, or unknown intent. Orchestrator decides to emit `plan` or `tool` directive.
- **Direct respond** when: intent parser classifies as simple query (`TASK_RECEIVED` → `router.intent_parser``respond` directive) or `respond` directive explicitly set.
### Utility Model Call
- Invoked when `sys_util` orchestrator is loaded (configurable). Used for system-level operations: environment inspection, file system queries, or when primary orchestrator fails and fallback is needed.
### Retry Logic
- **Planner retry**: `ExecutionScheduler` has `retry_limit=2`; on parse/validation failure, retries up to limit before failing plan.
- **Tool retry**: Not implemented natively; retry must be encoded in plan steps (`depends_on`, manual replan).
### Plan Creation
- **Trigger**: Orchestrator output contains `{type: "plan", ...}` or explicit `plan` directive.
- **Process**: `parse_plan_steps``validate_no_cycles``build_task_graph` → ready steps execution.
- **No plan**: Orchestrator outputs `respond` or `tool` → direct execution.
---
## 🧰 7. Tool System Architecture
### Plugin Discovery
- `ToolDiscovery` scans `app/tools/plugins/` for modules exporting `Tool` classes.
- Discovers: `shell_exec`, `file_read`, `file_write`, `memory` (search/insert/list).
### Manifest-Based Tools
- Each plugin has a `manifest.json` with:
- `description`: human-readable docstring.
- `args_schema`: JSON schema for validation.
- `requires_permission`: boolean for privileged tools (`shell_exec`, `file_write`).
- On discovery, registry registers tool and stores schema for permission/routing.
### Registry Bootstrap
- `RuntimeController._create_tool_registry()` initializes discovery, loads plugins, registers with init mapping (sandbox, permissions).
- Tools are initialized once at startup; `tool_registry` is shared across executions.
### Execution Isolation
- **ToolSandbox** (`ToolSandbox`):
- Restricts filesystem to `allowed_root` (project base dir).
- Timeout per execution (`step_timeout_ms`).
- Blocks `sudo` without secret override; requires secret injection for sudo commands.
- **Permission gating**: `shell_exec` and `file_write` require explicit permission decision before execution.

534
IMPLEMENTATION_PLAN.md Normal file
View File

@ -0,0 +1,534 @@
# IMPLEMENTATION PLAN
Этот документ описывает рекомендуемый порядок реализации `ducklm` от пустого репозитория до рабочего локального runtime с тестовым веб-чатом.
План опирается на [`TASK_3.md`](/home/mirivlad/git/ducklm/TASK_3.md) и [`ARCHITECTURE.md`](/home/mirivlad/git/ducklm/ARCHITECTURE.md).
## 1. Goal
Собрать систему по этапам так, чтобы после каждого этапа оставался рабочий, проверяемый инкремент, а не набор недоделанных слоёв.
Главный принцип:
- сначала каркас и контракты
- потом runtime core
- потом execution path
- потом memory / critic / recovery
- потом удобные интерфейсы проверки
## 2. Milestones Overview
1. Project skeleton and typed contracts
2. Config system and dependency wiring
3. Runtime loop skeleton
4. Event bus and event store
5. State persistence and checkpointing
6. Context builder and orchestrator adapter
7. Router and directive flow
8. Execution engine and task graph
9. Permission system and tool sandbox
10. MVP tools
11. FastAPI API and health surface
12. Web chat test client
13. Coder integration
14. Critic integration
15. Memory system
16. Memory write policy
17. Retry, recovery, replay
18. CLI and operator utilities
19. Hardening and tests
## 3. Detailed Stages
### Stage 1. Project Skeleton and Typed Contracts
Цель:
- создать структуру директорий
- завести базовые модели данных
- убрать двусмысленность интерфейсов между слоями
Сделать:
- создать `app/`, `config/`, `data/`, `tests/`
- добавить core contracts:
- `UserTask`
- `PlanStep`
- `ToolCall`
- `ToolResult`
- `CriticScore`
- `RuntimeEvent`
- `TaskCheckpoint`
- `ExecutionDirective`
Результат этапа:
- проект компилируется
- типы и схемы являются source of truth для остальных модулей
Проверка:
- unit tests на валидацию схем
### Stage 2. Config System and Dependency Wiring
Цель:
- вынести runtime behavior в конфиги
- зафиксировать единый способ загрузки настроек
Сделать:
- `config/models.json`
- `config/prompts.json`
- `config/permissions.json`
- `config/runtime.json`
- loader и typed config models
Результат этапа:
- runtime можно запускать с консистентной конфигурацией
Проверка:
- config load smoke test
### Stage 3. Runtime Loop Skeleton
Цель:
- создать heart of system без полной бизнес-логики
Сделать:
- `runtime_loop.py`
- `runtime_controller.py`
- минимальный lifecycle:
- receive task
- create state
- build empty context
- emit initial event
- return placeholder directive/result
Результат этапа:
- есть центральный control loop
- остальные слои начинают подстраиваться под него, а не наоборот
Проверка:
- smoke test на прохождение задачи через loop skeleton
### Stage 4. Event Bus and Event Store
Цель:
- создать внутреннюю event backbone
Сделать:
- `event_bus.py`
- `event_types.py`
- `event_store.py`
- monotonic sequence per task
- append-only storage
- базовый replay reader
Результат этапа:
- у каждой задачи есть воспроизводимая хронология
Проверка:
- event ordering tests
- dedup/idempotency tests
### Stage 5. State Persistence and Checkpointing
Цель:
- убрать зависимость task lifecycle от памяти процесса
Сделать:
- `task_state_store.py`
- `checkpoint_store.py`
- SQLite backend
- checkpoint after critical transitions
- resume loading primitives
Результат этапа:
- runtime готов к recovery после падения
Проверка:
- save/load checkpoint tests
### Stage 6. Context Builder and Orchestrator Adapter
Цель:
- зафиксировать правильный вход в reasoning path
Сделать:
- `context_builder.py`
- token-budget-aware assembly
- orchestrator adapter abstraction
- planning mode / orchestration mode interfaces
Результат этапа:
- все будущие вызовы reasoning model идут через один нормализованный путь
Проверка:
- tests на context assembly priorities
### Stage 7. Router and Directive Flow
Цель:
- зафиксировать router как pure decision layer
Сделать:
- `router.py`
- `state + context -> ExecutionDirective`
- no side effects
- routing rules for:
- retrieval needed
- planning needed
- permission needed
- critic needed
Результат этапа:
- runtime loop применяет решения, а не изобретает их сам
Проверка:
- unit tests на routing decisions
### Stage 8. Execution Engine and Task Graph
Цель:
- получить управляемое исполнение шагов, а не “вызовы по месту”
Сделать:
- `execution_engine.py`
- `execution_scheduler.py`
- task graph validation
- sequential DAG scheduler
- adapters for tool/coder execution
Результат этапа:
- runtime может исполнять direct action и multi-step plans
Проверка:
- task graph validation tests
- step ordering tests
### Stage 9. Permission System and Tool Sandbox
Цель:
- не дать runtime выполнять опасные действия напрямую
Сделать:
- permission rules
- persistent approval store
- shell safety classifier
- sandbox execution adapter
- timeout/resource/path restrictions
Результат этапа:
- опасные команды требуют policy decision до запуска
Проверка:
- permission flow tests
- sandbox boundary smoke tests
### Stage 10. MVP Tools
Цель:
- сделать минимально полезный execution path
Сделать:
- `shell_exec`
- `file_read`
- `file_write`
- unified tool registry
- unified `ToolResult`
Результат этапа:
- runtime уже может выполнять реальные локальные задачи
Проверка:
- integration tests для трёх базовых tools
### Stage 11. FastAPI API and Health Surface
Цель:
- открыть runtime наружу через стабильный backend interface
Сделать:
- `POST /chat`
- `WS /stream`
- `GET /health`
- базовый request/response models
- error handling
Результат этапа:
- систему уже можно дергать из внешнего клиента
Проверка:
- API smoke tests
### Stage 12. Web Chat Test Client
Цель:
- получить быстрый способ руками проверить поведение всей системы через браузер
Сделать:
- минимальный локальный веб-чат
- простую страницу с:
- вводом задачи
- окном сообщений
- панелью streaming events
- индикацией permission requests
- отображением final result
- подключение к `POST /chat` и `WS /stream`
Требования:
- это не production UI
- это не отдельный продуктовый frontend
- это thin test client для ручной проверки runtime
Лучше всего разместить как:
- `app/api/static/` или отдельный `web/` модуль с минимальным стеком
Результат этапа:
- можно открыть браузер и увидеть, как runtime планирует, исполняет шаги и стримит события
Проверка:
- ручной e2e smoke test через браузер
### Stage 13. Coder Integration
Цель:
- подключить отдельную coding model без смешивания ролей
Сделать:
- `core/coder.py`
- `generate_code`
- `fix_code`
- `refactor_code`
- structured coder result
Результат этапа:
- runtime может делегировать кодогенерацию специализированной модели
Проверка:
- tests на coder request/response flow
### Stage 14. Critic Integration
Цель:
- получить formal evaluation layer после tools/coder
Сделать:
- critic adapter
- `CriticScore`
- fallback policy when critic unavailable
Результат этапа:
- результаты можно оценивать единообразно
Проверка:
- critic scoring contract tests
### Stage 15. Memory System
Цель:
- добавить долговременную retrieval memory
Сделать:
- SQLite metadata store
- FAISS/hnswlib vector index
- insert/search/delete/reindex
- embedding versioning
Результат этапа:
- runtime получает semantic retrieval вместо контекста “только текущая задача”
Проверка:
- memory insert/search tests
### Stage 16. Memory Write Policy
Цель:
- не допустить хаотичной записи всего подряд
Сделать:
- deterministic write policy
- threshold model
- dedup / merge rules
- conflict handling
Результат этапа:
- память пополняется контролируемо, а не по одному score cutoff
Проверка:
- memory policy decision tests
### Stage 17. Retry, Recovery, Replay
Цель:
- довести runtime до устойчивого long-running поведения
Сделать:
- planner retry
- tool retry for allowed cases
- partial failure recovery
- replay path from event store
- resume from checkpoint
Результат этапа:
- система может переживать ошибки без полной потери исполнения
Проверка:
- recovery smoke tests
- replay tests
### Stage 18. CLI and Operator Utilities
Цель:
- дать локальный интерфейс помимо API/веб-чата
Сделать:
- send task
- show result
- follow events
- memory search
- replay task history
Результат этапа:
- разработчик может проверять runtime без браузера
Проверка:
- CLI smoke tests
### Stage 19. Hardening and Tests
Цель:
- довести проект до инженерно приемлемого состояния
Сделать:
- structured logging refinement
- failure-path tests
- concurrency edge cases
- docs refresh
- cleanup of temporary stubs
Результат этапа:
- проект становится пригодным для реальной итеративной разработки
Проверка:
- full critical-path smoke suite
## 4. Recommended First Working Demo
Первый нормальный demo checkpoint должен быть на этапе `Stage 12`.
Что должно работать к этому моменту:
- браузерный веб-чат открывается локально
- пользователь отправляет задачу
- runtime принимает task
- событие начала работы видно в UI
- если нужен plan, это видно в events panel
- tool execution видно в events panel
- final response возвращается в чат
На этом этапе memory, critic и recovery ещё могут быть частично stubbed, но:
- runtime loop
- event bus
- state persistence
- router
- execution engine
- permissions
- базовые tools
- API
- web chat
должны быть уже реальными.
## 5. Order Rationale
Почему веб-чат не в самом конце:
- он нужен как live inspection surface для runtime
- через него проще проверять streaming, permissions и event ordering
- он быстрее выявляет архитектурные проблемы, чем голые unit tests
Но веб-чат ставится только после:
- runtime core
- event bus
- persistence
- basic execution path
- API
Иначе он станет красивой оболочкой над несуществующей системой.

83
MVP_CHECKLIST.md Normal file
View File

@ -0,0 +1,83 @@
# MVP CHECKLIST
Этот чеклист фиксирует минимальный рабочий объём для первого демонстрационного запуска `ducklm`.
## 1. Core Runtime
- [x] Есть модульная структура проекта `app/`, `config/`, `data/`, `tests/`
- [x] Есть typed contracts для core entities
- [x] Есть `Runtime Loop Controller`
- [x] Runtime loop умеет принять задачу и создать task state
- [x] Runtime loop публикует стартовые и финальные события
## 2. Events and State
- [x] Есть `EventBus`
- [x] Есть `EventStore`
- [x] События имеют `task_id + sequence`
- [x] Есть `TaskStateStore`
- [x] Есть `CheckpointStore`
- [x] Есть сохранение checkpoint после critical transitions
- [ ] Есть базовый resume path
## 3. Decision and Execution
- [x] Есть `ContextBuilder`
- [x] Есть `Router` как pure decision layer
- [x] Есть `ExecutionDirective`
- [x] Есть `ExecutionEngine`
- [x] Есть `ExecutionScheduler`
- [ ] План валидируется и преобразуется в task graph
## 4. Tools and Safety
- [x] Есть `PermissionService`
- [x] Есть persistent store для user approvals
- [x] Есть `ToolSandbox`
- [x] Есть `ToolRegistry`
- [x] Работает `shell_exec`
- [x] Работает `file_read`
- [x] Работает `file_write`
## 5. Models and Evaluation
- [ ] Есть orchestrator adapter
- [ ] Есть planning mode interface
- [ ] Есть coder adapter
- [ ] Есть critic adapter
- [ ] Есть fallback policy при critic failure
## 6. Memory
- [ ] Есть SQLite metadata store
- [ ] Есть vector index adapter
- [ ] Работает memory insert/search
- [ ] Есть `MemoryWritePolicy`
- [ ] Запись в память не зависит только от critic score
## 7. Interfaces
- [x] Есть `POST /chat`
- [ ] Есть `WS /stream`
- [x] Есть `GET /health`
- [x] Есть локальный веб-чат для ручной проверки runtime
- [ ] Есть CLI для отправки задач и просмотра событий
## 8. Reliability
- [ ] Есть structured logging
- [ ] Есть retry/recovery policy skeleton
- [ ] Есть replay path from event store
- [ ] Есть critical-path smoke tests
## 9. Demo Definition
MVP считается достигнутым, если:
- [ ] можно открыть локальный веб-чат в браузере
- [ ] можно отправить задачу
- [ ] видно streaming events
- [ ] видно планирование или direct action
- [ ] видно выполнение tool step
- [ ] опасная команда требует подтверждения
- [ ] финальный ответ возвращается пользователю

1255
TASK_3.md Normal file

File diff suppressed because it is too large Load Diff

2
app/__init__.py Normal file
View File

@ -0,0 +1,2 @@
"""ducklm application package."""

2
app/api/__init__.py Normal file
View File

@ -0,0 +1,2 @@
"""API layer."""

118
app/api/server.py Normal file
View File

@ -0,0 +1,118 @@
from __future__ import annotations
import asyncio
from contextlib import asynccontextmanager
from pathlib import Path
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.responses import FileResponse
from pydantic import BaseModel
class CriticFeedbackRequest(BaseModel):
feedback: str
task_id: str | None = None
session_id: str | None = None
correctness_override: float | None = None
usefulness_override: float | None = None
safety_override: float | None = None
from app.core.permission_resolution import PermissionResolutionRequest, SecretResolutionRequest, PasswordResolutionRequest
from app.core.contracts import UserTask
from app.runtime.runtime_controller import RuntimeController
from app.streaming.manager import StreamingManager
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Load models on startup."""
print("Lifespan: Starting model loading...")
loop = asyncio.get_event_loop()
def load_models():
try:
print("Lifespan: Loading models...")
runtime.load_models_at_startup()
print("Lifespan: Models loaded")
except Exception as e:
print(f"Lifespan: Failed to load models: {e}")
import traceback
traceback.print_exc()
await loop.run_in_executor(None, load_models)
yield # Server runs here
print("Lifespan: Shutting down...")
app = FastAPI(title="ducklm", lifespan=lifespan)
runtime = RuntimeController(base_dir=Path(__file__).resolve().parents[2])
streaming = StreamingManager(runtime.event_bus)
@app.get("/")
def index() -> FileResponse:
return FileResponse(Path(__file__).resolve().parent / "static" / "index.html")
@app.get("/health")
def health() -> dict[str, str]:
return {"status": "ok"}
@app.post("/chat")
def chat(task: UserTask) -> dict[str, object]:
return runtime.handle_task(task)
@app.post("/permissions/resolve")
def resolve_permission(request: PermissionResolutionRequest) -> dict[str, object]:
return runtime.resolve_permission(task_id=request.task_id, decision=request.decision)
@app.post("/secrets/resolve")
def resolve_secret(request: SecretResolutionRequest) -> dict[str, object]:
return runtime.resolve_secret(task_id=request.task_id, secret=request.secret)
@app.post("/password/resolve")
def resolve_password(request: PasswordResolutionRequest) -> dict[str, object]:
return runtime.resolve_password(task_id=request.task_id, password=request.password)
@app.post("/critic/feedback")
def critic_feedback(request: CriticFeedbackRequest) -> dict[str, object]:
feedback = runtime.handle_critic_feedback(
feedback=request.feedback,
task_id=request.task_id,
session_id=request.session_id,
correctness_override=request.correctness_override,
usefulness_override=request.usefulness_override,
safety_override=request.safety_override,
)
return feedback
@app.websocket("/stream/{task_id}")
async def stream_task(websocket: WebSocket, task_id: str) -> None:
await websocket.accept()
replayed_events = streaming.replay_events(task_id)
for event in replayed_events:
await websocket.send_json(event.model_dump(mode="json"))
if replayed_events and replayed_events[-1].type in {"task_completed", "task_failed"}:
await websocket.close()
return
queue = streaming.subscribe(task_id)
try:
while True:
event = await asyncio.wait_for(queue.get(), timeout=15)
await websocket.send_json(event.model_dump(mode="json"))
if event.type in {"task_completed", "task_failed", "task_awaiting_permission", "task_awaiting_input"}:
break
except (asyncio.TimeoutError, WebSocketDisconnect):
pass
finally:
streaming.unsubscribe(task_id, queue)
await websocket.close()

510
app/api/static/index.html Normal file
View File

@ -0,0 +1,510 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>ducklm runtime test chat</title>
<style>
:root {
--bg: #f5f1e8;
--panel: #fffdf7;
--border: #c9bba4;
--text: #1d1a16;
--accent: #195c4b;
--muted: #6f6659;
}
* { box-sizing: border-box; }
body {
margin: 0;
font-family: Georgia, "Iowan Old Style", serif;
background: radial-gradient(circle at top, #fff7df 0%, var(--bg) 60%);
color: var(--text);
}
main {
max-width: 1100px;
margin: 0 auto;
padding: 32px 20px 48px;
}
h1 {
margin: 0 0 8px;
font-size: 2.2rem;
}
p {
color: var(--muted);
margin: 0 0 24px;
}
.layout {
display: grid;
grid-template-columns: 1fr 350px;
gap: 20px;
}
.panel {
background: var(--panel);
border: 1px solid var(--border);
border-radius: 18px;
padding: 18px;
box-shadow: 0 14px 30px rgba(35, 28, 17, 0.08);
max-width: 100%;
}
textarea, input {
width: 100%;
border: 1px solid var(--border);
background: #fff;
border-radius: 12px;
padding: 12px;
font: inherit;
}
textarea {
min-height: 120px;
resize: vertical;
}
button {
margin-top: 12px;
background: var(--accent);
color: white;
border: 0;
border-radius: 999px;
padding: 10px 18px;
font: inherit;
cursor: pointer;
}
.messages, .events {
display: grid;
gap: 12px;
max-height: 520px;
overflow: auto;
width: 100%;
}
.messages {
width: 100%;
}
.events {
width: 100%;
}
.bubble, .event {
border: 1px solid var(--border);
border-radius: 14px;
padding: 12px;
background: white;
}
.event code {
color: var(--accent);
}
.row {
display: grid;
gap: 10px;
margin-top: 12px;
}
@media (max-width: 860px) {
.layout { grid-template-columns: 1fr; }
}
</style>
</head>
<body>
<main>
<h1>ducklm runtime test chat</h1>
<p>Thin browser client for checking task submission, tool execution and event replay.</p>
<div class="layout">
<section class="panel">
<div class="messages" id="messages"></div>
<div class="row">
<textarea id="prompt" placeholder="Опиши задачу..." style="min-height: 60px;"></textarea>
<button id="sendBtn">Отправить</button>
</div>
</section>
<aside class="panel">
<strong>Events</strong>
<div class="events" id="events"></div>
</aside>
</div>
</main>
<script>
const messages = document.getElementById("messages");
const events = document.getElementById("events");
const promptEl = document.getElementById("prompt");
const sendBtn = document.getElementById("sendBtn");
let lastPermissionRequest = null;
let lastSecretRequest = null;
let lastPasswordRequest = null;
const seenEvents = new Set();
let activePermissionBubble = null;
let activeSecretBubble = null;
let activePasswordBubble = null;
function addBubble(title, body) {
const el = document.createElement("div");
el.className = "bubble";
el.innerHTML = `<strong>${title}</strong><div>${body}</div>`;
messages.appendChild(el);
messages.scrollTop = messages.scrollHeight;
}
function escapeHtml(value) {
return String(value)
.replaceAll("&", "&amp;")
.replaceAll("<", "&lt;")
.replaceAll(">", "&gt;");
}
function addSystemMessage(title, text) {
addBubble(title, `<div>${escapeHtml(text)}</div>`);
}
function addJsonBubble(title, data) {
addBubble(title, `<pre>${escapeHtml(JSON.stringify(data, null, 2))}</pre>`);
}
function renderRuntimeResult(result, status) {
if (!result) {
addSystemMessage("Runtime", "No result returned.");
return;
}
if (status === "awaiting_permission" && result.permission_request) {
lastPermissionRequest = result.permission_request;
renderPermissionControls(result.permission_request);
return;
}
if (status === "awaiting_input" && result.secret_request) {
addSystemMessage("System", result.secret_request.prompt || "Secret input required.");
lastSecretRequest = result.secret_request;
renderSecretControls(result.secret_request);
return;
}
if (status === "awaiting_password" && result.needs_sudo) {
lastPasswordRequest = result;
renderPasswordControls(result);
return;
}
if (status === "awaiting_permission" && result.error) {
addSystemMessage("System", result.error);
return;
}
if (result.message && !result.step_results) {
addSystemMessage("Runtime", result.message);
}
if (result.step_results && Array.isArray(result.step_results)) {
for (const step of result.step_results) {
const toolResult = step.result?.result || step.result;
if (toolResult && toolResult.output) {
addBubble("💻", escapeHtml(toolResult.output));
} else if (toolResult && toolResult.error) {
addSystemMessage("❌", toolResult.error);
}
}
return;
}
if (typeof result.output === "string") {
addBubble("Runtime", `<pre>${escapeHtml(result.output)}</pre>`);
return;
}
addJsonBubble("Runtime", result);
}
function addEvent(event) {
const eventKey = `${event.task_id || "na"}:${event.sequence || JSON.stringify(event)}`;
if (seenEvents.has(eventKey)) {
return;
}
seenEvents.add(eventKey);
const el = document.createElement("div");
el.className = "event";
el.innerHTML = `<div><code>${event.type}</code></div><pre>${JSON.stringify(event.payload ?? event, null, 2)}</pre>`;
events.appendChild(el);
events.scrollTop = events.scrollHeight;
if (event.type === "permission_requested") {
lastPermissionRequest = event.payload;
renderPermissionControls(event.payload);
}
if (event.type === "permission_resolved" || event.type === "task_failed" || event.type === "task_completed") {
clearPermissionControls();
}
if (event.type === "orchestrator_result") {
const p = event.payload || {};
const directive = p.directive || {};
if (directive.type === "respond") {
const text = directive.payload?.text || directive.payload?.message || "";
if (text) {
addBubble("🤖", escapeHtml(text));
}
}
}
if (event.type === "task_completed") {
const p = event.payload || {};
const execResult = p.execution_result || {};
if (execResult.step_results && Array.isArray(execResult.step_results)) {
for (const step of execResult.step_results) {
const toolResult = step.result?.result || step.result;
if (toolResult && toolResult.output) {
addBubble("💻", escapeHtml(toolResult.output));
}
}
}
}
if (event.type === "secret_requested") {
lastSecretRequest = event.payload;
renderSecretControls(event.payload);
}
if (event.type === "task_completed" || event.type === "task_failed") {
clearSecretControls();
clearPasswordControls();
}
}
function renderPermissionControls(request) {
clearPermissionControls();
const command = request.command || JSON.stringify(request);
const el = document.createElement("div");
el.className = "bubble";
el.innerHTML = `
<strong>⚠️ Требуется разрешение</strong>
<div style="margin: 8px 0; padding: 8px; background: #fff3cd; border-radius: 8px; font-family: monospace; font-size: 12px;">${escapeHtml(command)}</div>
<div>
<button data-decision="allow_once">Разрешить</button>
${request.allow_always !== false ? '<button data-decision="allow_always">Разрешить навсегда</button>' : ''}
<button data-decision="deny" style="background: #dc3545;">Запретить</button>
</div>
`;
el.querySelectorAll("button").forEach((button) => {
button.addEventListener("click", async () => {
if (button.disabled) return;
await resolvePermission(button.dataset.decision);
});
});
activePermissionBubble = el;
messages.appendChild(el);
messages.scrollTop = messages.scrollHeight;
}
function clearPermissionControls() {
if (activePermissionBubble) {
activePermissionBubble.remove();
activePermissionBubble = null;
}
}
function renderSecretControls(request) {
clearSecretControls();
const el = document.createElement("div");
el.className = "bubble";
el.innerHTML = `
<strong>Secret required</strong>
<div>${escapeHtml(request.prompt)}</div>
<div style="margin-top:8px;color:var(--muted)">Command: <code>${escapeHtml(request.command || "")}</code></div>
<input type="password" placeholder="Enter secret" />
<button>Submit secret</button>
`;
const input = el.querySelector("input");
const submitBtn = el.querySelector("button");
input.addEventListener("keydown", async (event) => {
if (event.key === "Enter") {
event.preventDefault();
await resolveSecret(input.value);
}
});
submitBtn.addEventListener("click", async () => {
await resolveSecret(input.value);
});
activeSecretBubble = el;
messages.appendChild(el);
messages.scrollTop = messages.scrollHeight;
input.focus();
}
function clearSecretControls() {
if (activeSecretBubble) {
activeSecretBubble.remove();
activeSecretBubble = null;
}
}
function renderPasswordControls(request) {
clearPasswordControls();
const command = request.command || "unknown command";
const el = document.createElement("div");
el.className = "bubble";
el.innerHTML = `
<strong>🔐 Требуется пароль sudo</strong>
<div style="margin: 8px 0; padding: 8px; background: #fff3cd; border-radius: 8px; font-family: monospace; font-size: 12px;">sudo ${escapeHtml(command)}</div>
<div style="margin-top:8px;">
<input type="password" placeholder="Введите пароль sudo" />
<button>Выполнить с sudo</button>
</div>
`;
const input = el.querySelector("input");
const button = el.querySelector("button");
input.addEventListener("keydown", async (event) => {
if (event.key === "Enter") {
event.preventDefault();
await resolvePassword(input.value);
}
});
button.addEventListener("click", async () => {
await resolvePassword(input.value);
});
activePasswordBubble = el;
messages.appendChild(el);
messages.scrollTop = messages.scrollHeight;
input.focus();
}
function clearPasswordControls() {
if (activePasswordBubble) {
activePasswordBubble.remove();
activePasswordBubble = null;
}
}
async function resolvePermission(decision) {
if (!lastPermissionRequest) {
addBubble("Client error", "No pending permission request.");
return;
}
if (activePermissionBubble) {
activePermissionBubble.querySelectorAll("button").forEach((button) => {
button.disabled = true;
});
}
const response = await fetch("/permissions/resolve", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
task_id: lastPermissionRequest.task_id,
decision
})
});
const data = await response.json();
clearPermissionControls();
if (!data || !data.status) {
addSystemMessage("Error", "Failed to resolve permission");
return;
}
if (decision === "deny") {
addSystemMessage("Permission", "Permission denied.");
} else if (decision === "allow_always") {
addSystemMessage("Permission", "Permission granted permanently for this pattern.");
} else {
addSystemMessage("Permission", "Permission granted once.");
}
events.innerHTML = "";
seenEvents.clear();
if (data.events && Array.isArray(data.events)) {
data.events.forEach(addEvent);
}
renderRuntimeResult(data.result, data.status);
}
async function resolveSecret(secret) {
if (!lastSecretRequest) {
addBubble("Client error", "No pending secret request.");
return;
}
if (!secret) {
addSystemMessage("System", "Secret cannot be empty.");
return;
}
if (activeSecretBubble) {
const input = activeSecretBubble.querySelector("input");
const button = activeSecretBubble.querySelector("button");
if (input) input.disabled = true;
if (button) button.disabled = true;
}
clearSecretControls();
addSystemMessage("System", "Secret submitted. Waiting for command result.");
const response = await fetch("/secrets/resolve", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
task_id: lastSecretRequest.task_id,
secret
})
});
const data = await response.json();
events.innerHTML = "";
seenEvents.clear();
data.events.forEach(addEvent);
renderRuntimeResult(data.result, data.status);
}
async function resolvePassword(password) {
if (!lastPasswordRequest) {
addBubble("Client error", "No pending password request.");
return;
}
if (!password) {
addSystemMessage("System", "Пароль не может быть пустым.");
return;
}
if (activePasswordBubble) {
const input = activePasswordBubble.querySelector("input");
const button = activePasswordBubble.querySelector("button");
if (input) input.disabled = true;
if (button) button.disabled = true;
}
clearPasswordControls();
addSystemMessage("System", "Выполняю команду с sudo...");
const response = await fetch("/password/resolve", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
task_id: lastPasswordRequest.task_id,
password
})
});
const data = await response.json();
events.innerHTML = "";
seenEvents.clear();
data.events.forEach(addEvent);
renderRuntimeResult(data.result, data.status);
}
async function sendTask() {
const body = {
input: promptEl.value || "browser task",
task_id: "web-" + Date.now(),
session_id: "web-session",
context: {}
};
if (!promptEl.value.trim()) return;
addBubble("User", promptEl.value);
promptEl.value = "";
const response = await fetch("/chat", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(body)
});
const data = await response.json();
if (!data || !data.events) {
addSystemMessage("Error", "Invalid response from server");
return;
}
lastPermissionRequest = null;
lastSecretRequest = null;
lastPasswordRequest = null;
clearPermissionControls();
clearSecretControls();
clearPasswordControls();
events.innerHTML = "";
seenEvents.clear();
data.events.forEach(addEvent);
renderRuntimeResult(data.result, data.status);
if (data.task_id) {
attachStream(data.task_id);
}
}
function attachStream(taskId) {
const protocol = location.protocol === "https:" ? "wss:" : "ws:";
const ws = new WebSocket(`${protocol}//${location.host}/stream/${taskId}`);
ws.onmessage = (message) => {
const data = JSON.parse(message.data);
addEvent(data);
};
}
sendBtn.addEventListener("click", sendTask);
</script>
</body>
</html>

2
app/cli/__init__.py Normal file
View File

@ -0,0 +1,2 @@
"""CLI layer."""

2
app/core/__init__.py Normal file
View File

@ -0,0 +1,2 @@
"""Core orchestration components."""

509
app/core/async_router.py Normal file
View File

@ -0,0 +1,509 @@
from __future__ import annotations
import asyncio
import json
import logging
from typing import Any
from app.core.contracts import ExecutionDirective
from app.core.intent_parser import IntentParser
from app.events.event_bus import EventBus
from app.events.event_types import (
ORCHESTRATOR_CALLED,
ORCHESTRATOR_FALLBACK_USED,
ORCHESTRATOR_RETRY,
ORCHESTRATOR_RESULT,
ORCHESTRATOR_UNAVAILABLE,
THINKER_CALLED,
THINKER_RESULT,
JSON_COMPILER_CALLED,
JSON_COMPILER_RESULT,
)
from app.models.async_adapters import AsyncOrchestratorAdapter
logger = logging.getLogger(__name__)
class AsyncRouter:
"""Async router using Thinker + JSON Compiler pipeline."""
def __init__(
self,
thinker: AsyncOrchestratorAdapter | None = None,
json_compiler: AsyncOrchestratorAdapter | None = None,
intent_parser: IntentParser | None = None,
prompts: dict[str, str] | None = None,
event_bus: EventBus | None = None,
tool_registry=None,
retry_limit: int = 2,
debug: bool = False,
log_length: int = 500,
json_fix_retry_limit: int = 2,
json_fix_use_sys_util: bool = True,
intent_classifier: str = "thinker",
) -> None:
self._thinker = thinker
self._json_compiler = json_compiler
self._intent_classifier = intent_classifier
self._sys_util = None
self._intent_parser = intent_parser or IntentParser()
self._prompts = prompts or {}
self._event_bus = event_bus
self._tool_registry = tool_registry
self._retry_limit = retry_limit
self._debug = debug
self._log_length = log_length
self._json_fix_retry_limit = json_fix_retry_limit
self._json_fix_use_sys_util = json_fix_use_sys_util
self._orchestrator = None # Set separately if needed for classification
def set_event_bus(self, event_bus: EventBus) -> None:
self._event_bus = event_bus
def set_thinker(self, thinker: AsyncOrchestratorAdapter) -> None:
self._thinker = thinker
def set_json_compiler(self, json_compiler: AsyncOrchestratorAdapter) -> None:
self._json_compiler = json_compiler
def set_sys_util(self, sys_util: AsyncOrchestratorAdapter) -> None:
self._sys_util = sys_util
def set_orchestrator(self, orchestrator: AsyncOrchestratorAdapter) -> None:
self._orchestrator = orchestrator
def set_tool_registry(self, tool_registry) -> None:
self._tool_registry = tool_registry
async def decide(
self,
state: dict[str, Any],
context: dict[str, Any],
task_id: str | None = None,
session_id: str | None = None,
) -> ExecutionDirective:
task_context = context.get("task_context", {})
requested_tool = task_context.get("requested_tool")
task_summary = str(context.get("task_summary", ""))
if requested_tool:
self._emit_event(
ORCHESTRATOR_RESULT,
{"reason": "explicit_tool_request", "tool": requested_tool},
task_id,
session_id,
)
return ExecutionDirective(
type="tool",
payload={
"tool": requested_tool,
"args": task_context.get("tool_args", {}),
},
requires_permission=requested_tool in {"shell_exec", "file_write"},
confidence=0.9,
reason="Task context explicitly requested a tool execution.",
)
if self._thinker is None:
fallback = self._fallback_directive(task_summary)
self._emit_event(
ORCHESTRATOR_FALLBACK_USED,
{"reason": "thinker_unavailable", "directive": fallback.model_dump(mode="json")},
task_id,
session_id,
)
return fallback
if self._json_compiler is None:
fallback = self._fallback_directive(task_summary)
self._emit_event(
ORCHESTRATOR_FALLBACK_USED,
{"reason": "json_compiler_unavailable", "directive": fallback.model_dump(mode="json")},
task_id,
session_id,
)
return fallback
mode_hint = await self._classify_intent(task_summary)
thinker_prompt = self._build_thinker_prompt(task_summary, context, mode_hint)
for thinker_attempt in range(self._retry_limit + 1):
if thinker_attempt > 0:
self._emit_event(
ORCHESTRATOR_RETRY,
{"attempt": thinker_attempt, "prompt": thinker_prompt},
task_id,
session_id,
)
thinker_prompt = self._add_thinker_feedback(thinker_prompt, last_thinker_error, thinker_attempt)
self._emit_event(
THINKER_CALLED,
{"attempt": thinker_attempt, "mode": mode_hint},
task_id,
session_id,
)
try:
thinker_result = await self._thinker.generate(thinker_prompt)
except Exception as e:
logger.warning(f"Thinker generate failed: {e}")
last_thinker_error = str(e)
continue
logger.info(f"Thinker result (attempt {thinker_attempt + 1}): {thinker_result}")
self._emit_event(
THINKER_RESULT,
{"result": thinker_result, "attempt": thinker_attempt},
task_id,
session_id,
)
# If mode_hint is conversation, only allow respond type
if mode_hint == "conversation" and not self._is_simple_response(thinker_result):
# Check if Thinker is trying to create an execution plan instead
if any(word in thinker_result.lower() for word in ["шаг", "step", "выполнить", "execute", "shell", "команда"]):
# Override to conversation-only response
respond_text = self._extract_conversation_response(thinker_result)
self._emit_event(
ORCHESTRATOR_RESULT,
{"directive": {"type": "respond", "payload": {"text": respond_text}}, "mode_violation": True},
task_id,
session_id,
)
return ExecutionDirective(
type="respond",
payload={"text": respond_text},
requires_permission=False,
reason="Mode violation: conversation only",
)
if self._is_simple_response(thinker_result):
json_compiler_prompt = self._build_json_compiler_prompt(thinker_result)
else:
json_compiler_prompt = self._build_json_compiler_prompt(thinker_result)
for compiler_attempt in range(self._json_fix_retry_limit + 1):
self._emit_event(
JSON_COMPILER_CALLED,
{"attempt": compiler_attempt, "plan": thinker_result},
task_id,
session_id,
)
try:
compiler_result = await self._json_compiler.generate(json_compiler_prompt)
except Exception as e:
logger.warning(f"JSON Compiler generate failed: {e}")
compiler_result = None
if compiler_result:
logger.info(f"JSON Compiler result (attempt {compiler_attempt + 1}): {compiler_result}")
self._emit_event(
JSON_COMPILER_RESULT,
{"result": compiler_result, "attempt": compiler_attempt},
task_id,
session_id,
)
directive = self._validate_directive(compiler_result, mode_hint) if compiler_result else None
if directive is not None:
directive = self._guard_rail_check(directive)
self._emit_event(
ORCHESTRATOR_RESULT,
{"directive": directive.model_dump(mode="json"), "thinker_attempt": thinker_attempt, "compiler_attempt": compiler_attempt},
task_id,
session_id,
)
return directive
if compiler_result:
logger.warning(f"JSON Compiler validation failed, attempting fix (attempt {compiler_attempt + 1})")
fix_result = await self._fix_invalid_json(compiler_result, compiler_attempt, task_id, session_id)
if fix_result:
fixed_directive = self._validate_directive(fix_result, mode_hint)
if fixed_directive is not None:
fixed_directive = self._guard_rail_check(fixed_directive)
self._emit_event(
ORCHESTRATOR_RESULT,
{"directive": fixed_directive.model_dump(mode="json"), "fixed": True},
task_id,
session_id,
)
return fixed_directive
last_thinker_error = f"JSON Compiler failed after {self._json_fix_retry_limit + 1} attempts"
self._emit_event(
ORCHESTRATOR_UNAVAILABLE,
{"reason": "retry_exhausted", "last_error": last_thinker_error},
task_id,
session_id,
)
raise RuntimeError(f"Thinker/Compiler pipeline failed after {self._retry_limit + 1} attempts")
def _fallback_directive(self, task_summary: str) -> ExecutionDirective:
parsed = self._intent_parser.parse(task_summary)
if parsed:
return parsed
return ExecutionDirective(
type="respond",
payload={"text": f"Runtime accepted task: {task_summary}"},
requires_permission=False,
confidence=0.4,
reason="Fallback response because local orchestration models are not loaded.",
)
def _is_simple_response(self, thinker_result: str) -> bool:
result_lower = thinker_result.lower().strip()
return result_lower.startswith("ответ:") or result_lower.startswith("response:") or "не нужно" in result_lower
def _extract_conversation_response(self, thinker_result: str) -> str:
"""Extract text response from thinker result for conversation mode."""
result_lower = thinker_result.lower()
# Skip the ПЛАН lines, just get the ОТВЕТ part
lines = thinker_result.split('\n')
response_lines = []
capture = False
for line in lines:
if line.strip().lower().startswith('ответ:') or line.strip().lower().startswith('response:'):
capture = True
response_lines.append(line)
elif capture and line.strip():
# Check if this is a new ПЛАН or step
if line.strip().lower().startswith('план') or line.strip().lower().startswith('step'):
break
response_lines.append(line)
if response_lines:
return '\n'.join(response_lines).replace('ответ:', '').replace('response:', '').strip()
# Fallback: return first few sentences
sentences = thinker_result.split('.')[:3]
return '. '.join(sentences).strip()
def _build_thinker_prompt(
self, task_summary: str, context: dict[str, Any], mode_hint: str
) -> str:
base_prompt = self._prompts.get("thinker", "")
memory_context = context.get("memory_context", [])
tools_json = "[]"
if self._tool_registry:
schemas = self._tool_registry.list_schemas()
tools_json = json.dumps(schemas, ensure_ascii=False, indent=2)
prompt_lines = [
base_prompt,
"",
f"Task: {task_summary}",
f"Mode hint: {mode_hint}",
]
if memory_context:
memory_text = "\n".join([f"- {m.get('text', '')}" for m in memory_context[:5]])
prompt_lines.append(f"\nRelevant memory:\n{memory_text}")
session_history = context.get("session_history", [])
if session_history:
history_text = "\n".join([f"- {h.get('text', '')}" for h in session_history[:3]])
prompt_lines.append(f"\nPrevious requests in this session:\n{history_text}")
prompt_lines.extend([
"",
f"AVAILABLE TOOLS (JSON):",
tools_json,
"",
])
return "\n".join(prompt_lines)
def _build_json_compiler_prompt(self, thinker_result: str) -> str:
base_prompt = self._prompts.get("json_compiler", "")
prompt_lines = [
base_prompt,
"",
"Thinker's plan:",
thinker_result,
"",
]
return "\n".join(prompt_lines)
def _determine_mode_from_context(self, context: dict[str, Any]) -> str:
"""Legacy method - kept for compatibility"""
task_summary = str(context.get("task_summary", "")).lower()
keywords = ["запусти", "выполни", "создай", "напиши", "удали", "run", "execute", "create"]
for kw in keywords:
if kw in task_summary:
return "execution"
return "conversation"
async def _classify_intent(self, task_summary: str) -> str:
"""LLM-based intent classification"""
if self._intent_classifier == "orchestrator" and self._orchestrator:
classifier_model = self._orchestrator
else:
classifier_model = self._thinker
if not classifier_model:
logger.warning("No classifier model available, using default")
return "conversation"
classification_prompt = f"""Классифицируй запрос пользователя: "{task_summary}"
Правила:
- execution: пользователь ХОЧЕТ выполнить действие (проверить, запустить, создать, удалить, найти, прочитать, записать)
- conversation: пользователь просто отвечает, задаёт вопрос или хочет информацию
- clarification_needed: непонятно что делать
Ответь ОДНИМ словом: execution / conversation / clarification_needed"""
try:
result = await classifier_model.generate(classification_prompt)
result = result.strip().lower()
# Extract first word - LLM often adds explanation
first_word = result.split()[0] if result.split() else ""
# Validate result is one of allowed values
allowed = {"execution", "conversation", "clarification_needed"}
if first_word in allowed:
logger.info(f"Intent classified: {first_word} for task: {task_summary}")
return first_word
if result in allowed:
logger.info(f"Intent classified: {result} for task: {task_summary}")
return result
logger.warning(f"Invalid classification result: {result}, defaulting to conversation")
return "conversation"
except Exception as e:
logger.warning(f"Intent classification failed: {e}, defaulting to conversation")
return "conversation"
def _validate_directive(self, output: str, mode_hint: str) -> ExecutionDirective | None:
if not output:
return None
try:
json_start = output.find("{")
json_end = output.rfind("}") + 1
if json_start < 0 or json_end <= 0:
return None
json_str = output[json_start:json_end]
data = json.loads(json_str)
if "type" not in data:
return None
msg_type = data.get("type", "")
payload = data.get("payload", {})
if msg_type == "step" and "tool" in payload:
tool = payload.get("tool", "")
args = payload.get("args", {})
payload = {"tool": tool, "args": args}
if msg_type == "plan":
payload = {"steps": payload.get("steps", [])}
return ExecutionDirective(
type=msg_type,
payload=payload,
confidence=data.get("confidence", 0.9),
reason=data.get("reason", ""),
)
except (json.JSONDecodeError, ValueError, TypeError) as e:
logger.warning(f"Directive JSON validation failed: {e}")
return None
def _guard_rail_check(self, directive: ExecutionDirective) -> ExecutionDirective:
tool_name = directive.payload.get("tool", "")
if tool_name in {"shell_exec", "file_write", "file_delete"}:
return ExecutionDirective(
type=directive.type,
payload=directive.payload,
requires_permission=True,
confidence=directive.confidence,
reason=directive.reason,
)
return directive
def _add_thinker_feedback(self, prompt: str, error: str, attempt: int) -> str:
feedback = f"\n[ATTEMPT {attempt + 1} FAILED: {error}]\n"
feedback += "Provide a valid semantic plan.\n"
return prompt + feedback
def _emit_event(
self,
event_type: str,
payload: dict[str, Any],
task_id: str | None,
session_id: str | None,
) -> None:
if self._event_bus and task_id:
from app.core.contracts import RuntimeEvent
event = RuntimeEvent(
task_id=task_id,
session_id=session_id or "unknown",
sequence=self._event_bus.next_sequence(task_id),
type=event_type,
payload=payload,
)
self._event_bus.publish(event)
SYS_UTIL_PROMPT = None
async def _fix_invalid_json(self, invalid_result: str, attempt: int, task_id: str | None, session_id: str | None) -> str | None:
"""Try to fix invalid JSON using sys_util model."""
if not self._sys_util:
return None
first_brace = invalid_result.find('{')
last_brace = invalid_result.rfind('}')
if first_brace < 0 or last_brace <= first_brace:
return None
truncated_json = invalid_result[first_brace:last_brace + 1]
error_msg = ""
try:
json.loads(truncated_json)
except json.JSONDecodeError as e:
error_msg = str(e)
sys_util_prompt = (
self._prompts.get("sys_util")
if self._prompts
else self.SYS_UTIL_PROMPT or (
"You are a STRICT JSON repair engine. "
"Your job is ONLY to fix invalid JSON syntax. "
"You MUST output valid JSON or nothing else."
)
)
fix_prompt = f"""{sys_util_prompt}
{error_msg}
Fixed JSON:"""
try:
logger.info(f"JSON fix using sys_util model (attempt {attempt + 1})")
fixed_result = await self._sys_util.generate(fix_prompt)
fixed_first = fixed_result.find('{')
fixed_last = fixed_result.rfind('}')
if fixed_first >= 0 and fixed_last > fixed_first:
return fixed_result[fixed_first:fixed_last + 1]
return None
except Exception as e:
logger.warning(f"JSON fix failed: {e}")
return None

89
app/core/config.py Normal file
View File

@ -0,0 +1,89 @@
from __future__ import annotations
import json
from pathlib import Path
from typing import Any
from pydantic import BaseModel, Field
class ModelsConfig(BaseModel):
orchestrator_path: str = "models/llama.gguf"
coder_path: str = "models/xcoder.gguf"
critic_path: str = "models/gemma.gguf"
embeddings_path: str = "models/all-MiniLM-L6-v2"
inference: dict[str, Any] = Field(default_factory=dict)
thinker: dict[str, Any] = Field(default_factory=dict)
json_compiler: dict[str, Any] = Field(default_factory=dict)
orchestrator: dict[str, Any] = Field(default_factory=dict)
coder: dict[str, Any] = Field(default_factory=dict)
critic: dict[str, Any] = Field(default_factory=dict)
sys_util: dict[str, Any] = Field(default_factory=dict)
embeddings: dict[str, Any] = Field(default_factory=dict)
class PromptsConfig(BaseModel):
orchestration_prompt: str = ""
planning_prompt: str = ""
coder_prompt: str = ""
critic_prompt: str = ""
class PermissionsConfig(BaseModel):
dangerous_commands: dict[str, str] = Field(default_factory=dict)
sensitive_paths: list[str] = Field(default_factory=list)
default_approval_behavior: str = "ask_always"
class RuntimeConfig(BaseModel):
step_timeout_ms: int = 30_000
task_timeout_ms: int = 300_000
planner_retry_limit: int = 2
tool_retry_limit: int = 1
replan_limit: int = 1
max_execution_steps: int = 20
retrieval_top_k: int = 5
max_context_tokens: int = 8192
context_budgets: dict[str, int] = Field(default_factory=lambda: {
"system": 512,
"task": 512,
"memory": 2048,
"execution": 2048,
"tools": 1024,
"safety": 512,
})
reserve_for_generation_pct: int = 25
orchestrator_retry_limit: int = 2
intent_classifier: str = "thinker"
memory_thresholds: dict[str, float] = Field(default_factory=dict)
critic_fallback_policy: str = "continue_without_critic"
checkpoint_policy: dict[str, Any] = Field(default_factory=dict)
event_retention_policy: dict[str, Any] = Field(default_factory=dict)
streaming_settings: dict[str, Any] = Field(default_factory=dict)
debug: bool = False
debug_orchestrator_log_length: int = 500
json_fix_retry_limit: int = 2
json_fix_use_sys_util: bool = True
class AppConfig(BaseModel):
models: ModelsConfig
prompts: PromptsConfig
permissions: PermissionsConfig
runtime: RuntimeConfig
def _load_json(path: Path) -> dict[str, Any]:
with path.open("r", encoding="utf-8") as handle:
return json.load(handle)
def load_app_config(config_dir: str | Path) -> AppConfig:
config_path = Path(config_dir)
return AppConfig(
models=ModelsConfig.model_validate(_load_json(config_path / "models.json")),
prompts=PromptsConfig.model_validate(_load_json(config_path / "prompts.json")),
permissions=PermissionsConfig.model_validate(_load_json(config_path / "permissions.json")),
runtime=RuntimeConfig.model_validate(_load_json(config_path / "runtime.json")),
)

172
app/core/context_builder.py Normal file
View File

@ -0,0 +1,172 @@
from __future__ import annotations
import logging
from typing import Any
from app.core.contracts import TaskCheckpoint, UserTask
logger = logging.getLogger(__name__)
DEFAULT_BUDGETS = {
"system": 512,
"task": 512,
"memory": 2048,
"execution": 2048,
"tools": 1024,
"safety": 512,
}
class ContextBuilder:
def __init__(
self,
memory_interface=None,
tool_registry=None,
config: dict[str, Any] | None = None,
) -> None:
self._memory = memory_interface
self._tool_registry = tool_registry
self._config = config or {}
self._max_tokens = self._config.get("max_context_tokens", 8192)
self._budgets = self._config.get("context_budgets", DEFAULT_BUDGETS)
self._reserve_pct = self._config.get("reserve_for_generation_pct", 25)
def build(
self,
task: UserTask,
checkpoint: TaskCheckpoint | None = None,
query: str | None = None,
) -> dict[str, Any]:
task_summary = task.input
search_query = query or task_summary
session_id = task.session_id
memory_context = []
if self._memory:
memory_context = self._retrieve_memory(search_query, session_id=session_id)
budgets = self._calculate_budgets()
reserved = self._reserve_for_generation()
system_budget = budgets.get("system", 512)
task_budget = budgets.get("task", 512)
safety_budget = budgets.get("safety", 512)
memory_budget = budgets.get("memory", 2048)
truncated_memory = self._truncate_memory(
memory_context, memory_budget
)
# Get session history for follow-up context
session_history = self._get_session_history(session_id)
context = {
"system_prompt": "",
"task_summary": task_summary[:task_budget],
"task_context": task.context,
"memory_context": truncated_memory,
"session_history": session_history,
"execution_context": checkpoint.model_dump() if checkpoint else {},
"tool_context": self._get_tool_context(),
"safety_context": {},
"constraints": {
"budgets": budgets,
"reserved_for_generation": reserved,
"original_memory_count": len(memory_context),
"truncated_memory_count": len(truncated_memory),
},
}
return context
def _get_tool_context(self) -> list[dict[str, Any]]:
"""Expose available tools to orchestrator."""
if not self._tool_registry:
return []
tools = []
for name in self._tool_registry.list_names():
tool = self._tool_registry.get(name)
tools.append({
"name": name,
"description": getattr(tool, "description", ""),
})
return tools
def _calculate_budgets(self) -> dict[str, int]:
return dict(self._budgets)
def _reserve_for_generation(self) -> int:
return int(self._max_tokens * self._reserve_pct / 100)
def _retrieve_memory(
self,
query: str,
session_id: str | None = None,
top_k: int = 5,
) -> list[dict[str, Any]]:
if not self._memory:
return []
try:
results = self._memory.search(query, top_k=top_k, session_id=session_id)
return [
{
"id": entry.id,
"text": entry.text,
"kind": entry.kind,
"source": entry.source,
"weight": entry.weight,
"score": score,
}
for entry, score in results
]
except Exception as e:
logger.warning(f"Memory retrieval failed: {e}")
return []
def _get_session_history(self, session_id: str | None = None) -> list[dict[str, Any]]:
"""Get previous task summaries from the same session for context."""
if not self._memory or not session_id:
return []
try:
# Get recent entries from same session
entries = self._memory.get_by_session(session_id, limit=5)
# Filter to only task summaries
summaries = [
{
"id": entry.id,
"text": entry.text,
"kind": entry.kind,
"source": entry.source,
"weight": entry.weight,
}
for entry in entries
if entry.kind in ("summary", "tool_result")
]
return summaries
except Exception as e:
logger.warning(f"Session history retrieval failed: {e}")
return []
def _truncate_memory(
self,
memory_context: list[dict[str, Any]],
budget: int,
) -> list[dict[str, Any]]:
if not memory_context:
return []
estimated_per_entry = 50
max_entries = max(budget // estimated_per_entry, 1)
if len(memory_context) > max_entries:
return memory_context[:max_entries]
return memory_context
def estimate_tokens(self, text: str) -> int:
if not text:
return 0
return len(text.split()) * 4 // 3

148
app/core/contracts.py Normal file
View File

@ -0,0 +1,148 @@
from __future__ import annotations
from datetime import datetime, timezone
from typing import Any, Literal
from uuid import uuid4
from pydantic import BaseModel, Field
def utc_now() -> datetime:
return datetime.now(timezone.utc)
class UserTask(BaseModel):
task_id: str = Field(default_factory=lambda: str(uuid4()))
session_id: str = Field(default_factory=lambda: str(uuid4()))
input: str
context: dict[str, Any] = Field(default_factory=dict)
created_at: datetime = Field(default_factory=utc_now)
class PlanStep(BaseModel):
id: str
kind: Literal["tool", "coder", "memory", "respond"]
tool: str | None = None
args: dict[str, Any] = Field(default_factory=dict)
description: str
requires_confirmation: bool = False
depends_on: list[str] = Field(default_factory=list)
class ToolCall(BaseModel):
tool: str
args: dict[str, Any] = Field(default_factory=dict)
task_id: str
step_id: str
class ToolResult(BaseModel):
tool: str
ok: bool
output: Any = None
error: str | None = None
metadata: dict[str, Any] = Field(default_factory=dict)
class CoderRequest(BaseModel):
mode: Literal["generate", "fix", "refactor"]
instruction: str
context: dict[str, Any] = Field(default_factory=dict)
task_id: str
class CriticScore(BaseModel):
correctness: float = Field(ge=0.0, le=1.0)
usefulness: float = Field(ge=0.0, le=1.0)
safety: float = Field(ge=0.0, le=1.0)
memory_store: bool
weight: float = Field(ge=0.0, le=1.0)
explanation: str
class MemoryEntry(BaseModel):
id: str = Field(default_factory=lambda: str(uuid4()))
text: str
kind: Literal["tool_result", "plan", "critique", "fact", "summary", "user_preference"]
source: Literal["tool", "critic", "user", "system"]
weight: float = Field(ge=0.0, le=1.0)
task_id: str | None = None
session_id: str | None = None
metadata: dict[str, Any] = Field(default_factory=dict)
created_at: datetime = Field(default_factory=utc_now)
embedding_model: str
embedding_dim: int
class PermissionDecision(BaseModel):
action_type: str
pattern: str
decision: Literal["allow_once", "allow_always", "deny", "ask_always"]
created_at: datetime = Field(default_factory=utc_now)
class RuntimeEvent(BaseModel):
event_id: str = Field(default_factory=lambda: str(uuid4()))
task_id: str
session_id: str
sequence: int
type: str
timestamp: datetime = Field(default_factory=utc_now)
payload: dict[str, Any] = Field(default_factory=dict)
causation_id: str | None = None
correlation_id: str = Field(default_factory=lambda: str(uuid4()))
class TaskCheckpoint(BaseModel):
task_id: str
status: str
active_step_id: str | None = None
plan_snapshot: dict[str, Any] = Field(default_factory=dict)
context_snapshot: dict[str, Any] = Field(default_factory=dict)
updated_at: datetime = Field(default_factory=utc_now)
class PermissionRequest(BaseModel):
task_id: str
session_id: str
action_type: str
pattern: str
command: str | None = None
path: str | None = None
requires_password: bool = False
class SecretRequest(BaseModel):
task_id: str
session_id: str
kind: str
prompt: str
command: str | None = None
class PasswordRequest(BaseModel):
task_id: str
session_id: str
command: str
reason: str
attempts: int = 0
max_attempts: int = 3
class ExecutionDirective(BaseModel):
type: Literal[
"plan",
"tool",
"coder",
"respond",
"replan",
"store_memory",
"request_permission",
"complete",
"fail",
"noop",
]
payload: dict[str, Any] = Field(default_factory=dict)
requires_permission: bool = False
confidence: float = Field(ge=0.0, le=1.0, default=0.0)
reason: str = ""

View File

@ -0,0 +1,591 @@
from __future__ import annotations
import asyncio
import json
import logging
from typing import Any
from app.core.contracts import (
CriticScore,
ExecutionDirective,
PermissionDecision,
PermissionRequest,
RuntimeEvent,
SecretRequest,
ToolCall,
UserTask,
)
from app.core.execution_scheduler import ExecutionScheduler
from app.events.event_bus import EventBus
from app.events.event_types import (
CRITIC_CALLED,
CRITIC_RESULT,
PERMISSION_REQUESTED,
PERMISSION_RESOLVED,
PLAN_FAILED,
PLAN_STARTED,
SECRET_REQUESTED,
STEP_STARTED,
STEPPED_COMPLETED,
TOOL_CALLED,
TOOL_COMPLETED,
)
from app.models.async_adapters import AsyncCriticAdapter, AsyncCoderAdapter
from app.memory.write_policy import MemoryWritePolicy
from app.memory.interface import MemoryInterface
logger = logging.getLogger(__name__)
class ExecutionEngine:
def __init__(
self,
event_bus: EventBus,
tool_registry,
permission_service,
scheduler: ExecutionScheduler | None = None,
critic: AsyncCriticAdapter | None = None,
memory_policy: MemoryWritePolicy | None = None,
memory_interface: MemoryInterface | None = None,
prompts: dict[str, str] | None = None,
) -> None:
self._event_bus = event_bus
self._tool_registry = tool_registry
self._permission_service = permission_service
self._scheduler = scheduler or ExecutionScheduler()
self._critic = critic
self._coder: AsyncCoderAdapter | None = None
self._memory_policy = memory_policy
self._memory_interface = memory_interface
self._prompts = prompts or {}
def set_critic(self, critic: AsyncCriticAdapter) -> None:
self._critic = critic
def set_coder(self, coder: AsyncCoderAdapter) -> None:
self._coder = coder
def set_memory_policy(self, policy: MemoryWritePolicy) -> None:
self._memory_policy = policy
def execute(
self,
task: UserTask,
directive: ExecutionDirective,
permission_override: PermissionDecision | None = None,
secret_override: str | None = None,
password_override: str | None = None,
) -> dict[str, Any]:
scheduled = self._scheduler.next_directive(directive)
self._publish(task, STEP_STARTED, {"directive_type": scheduled.type})
if scheduled.type == "plan":
return self._execute_plan(
task=task,
directive=scheduled,
permission_override=permission_override,
secret_override=secret_override,
password_override=password_override,
)
if scheduled.type == "tool":
return self._execute_tool(
task=task,
directive=scheduled,
permission_override=permission_override,
secret_override=secret_override,
password_override=password_override,
)
if scheduled.type == "respond":
return {
"status": "completed",
"result": {
"message": f"Runtime accepted task: {task.input}",
"mode": scheduled.payload.get("mode", "direct_response"),
},
}
if scheduled.type == "coder":
return self._execute_coder(
task=task,
directive=scheduled,
)
if scheduled.type == "fail":
return {
"status": "failed",
"result": {"error": scheduled.reason or "Execution failed."},
}
return {
"status": "completed",
"result": {
"message": "Directive accepted.",
"directive_type": scheduled.type,
},
}
def _execute_plan(
self,
task: UserTask,
directive: ExecutionDirective,
permission_override: PermissionDecision | None = None,
secret_override: str | None = None,
password_override: str | None = None,
) -> dict[str, Any]:
# Unified format: {"type": "plan", "payload": {"steps": [...]}}
# Need to extract steps from nested payload
import json
payload = directive.payload
steps_data = []
# If payload has "steps" directly, use them
if "steps" in payload:
steps_data = payload.get("steps", [])
# If payload is a string (JSON), parse it
elif isinstance(payload, str) and payload.strip().startswith("{"):
try:
parsed = json.loads(payload)
steps_data = parsed.get("payload", {}).get("steps", [])
except:
steps_data = []
if steps_data:
plan_json = json.dumps({"type": "plan", "payload": {"steps": steps_data}})
else:
plan_json = json.dumps(payload)
plan_steps = self._scheduler.parse_plan_steps(plan_json, task.task_id)
if not plan_steps:
return {
"status": "failed",
"result": {"error": "Failed to parse plan steps from directive"},
}
if not self._scheduler.validate_no_cycles(plan_steps):
self._publish(task, PLAN_FAILED, {"error": "Cycle detected in plan"})
return {
"status": "failed",
"result": {"error": "Cycle detected in plan"},
}
graph = self._scheduler.build_task_graph(plan_steps)
self._publish(task, PLAN_STARTED, {"steps": len(plan_steps)})
completed_steps: set[str] = set()
step_results: list[dict[str, Any]] = []
ready_steps = self._get_ready_steps(graph, completed_steps)
while ready_steps:
step = ready_steps.pop(0)
# Handle respond kind directly without tool execution
if step.kind == "respond":
result = {
"status": "completed",
"result": {
"message": step.args.get("text", step.description),
},
}
else:
step_directive = ExecutionDirective(
type=step.kind,
payload={
"tool": step.tool,
"args": step.args,
},
requires_permission=step.requires_confirmation,
reason=step.description,
)
result = self._execute_tool(
task=task,
directive=step_directive,
permission_override=permission_override,
secret_override=secret_override,
password_override=password_override,
)
# If tool needs permission - return immediately, don't continue execution
if result.get("status") == "awaiting_permission":
return {
"status": "awaiting_permission",
"result": result.get("result", {}),
"step_results": step_results,
}
step_results.append({
"step_id": step.id,
"result": result,
})
completed_steps.add(step.id)
self._publish(task, STEPPED_COMPLETED, {
"step_id": step.id,
"status": result.get("status"),
})
# If tool needs permission or failed - return immediately, don't continue execution
if result.get("status") == "failed":
return {
"status": "failed",
"result": {
"error": f"Step {step.id} failed",
"failed_step": step.id,
"step_results": step_results,
},
}
requires_execution = directive.payload.get("requires_execution", True)
if requires_execution and self._critic:
critic_result = self._evaluate_with_critic(
task, step, result
)
if critic_result:
# Convert to dict for JSON serialization
result["critic_score"] = critic_result.model_dump(mode="json") if hasattr(critic_result, 'model_dump') else dict(critic_result)
self._save_critique_to_memory(task, step, critic_result)
ready_steps = self._get_ready_steps(graph, completed_steps)
return {
"status": "completed",
"result": {
"message": f"Plan executed: {len(completed_steps)} steps completed",
"step_results": step_results,
},
}
def _get_ready_steps(
self,
graph: dict[str, Any],
completed: set[str],
) -> list:
if not graph or not graph.get("nodes"):
return []
step_map: dict = graph.get("step_map", {})
ready = []
for node in graph["nodes"]:
node_id = node["id"]
if node_id in completed:
continue
deps = node.get("depends_on", [])
if all(dep in completed for dep in deps):
step = step_map.get(node_id)
if step:
ready.append(step)
return ready
def _evaluate_with_critic(
self,
task: UserTask,
step,
result: dict[str, Any],
) -> CriticScore | None:
if not self._critic:
return None
critic_prompt = self._build_critic_prompt(step, result)
self._publish(task, CRITIC_CALLED, {"step_id": step.id})
try:
critic_output = asyncio.run(self._critic.generate(critic_prompt))
score = self._parse_critic_score(critic_output)
self._publish(task, CRITIC_RESULT, {
"step_id": step.id,
"score": score.model_dump(mode="json") if score else None,
})
if score:
result["critic_score"] = {
"correctness": score.correctness,
"usefulness": score.usefulness,
"safety": score.safety,
"memory_store": score.memory_store,
"weight": score.weight,
"explanation": score.explanation,
}
return score
except Exception as e:
logger.warning(f"Critic evaluation failed: {e}")
self._publish(task, CRITIC_RESULT, {
"step_id": step.id,
"error": str(e),
})
return None
def _save_critique_to_memory(
self,
task: UserTask,
step,
score: CriticScore,
) -> None:
"""Save critic evaluation as critique entry in memory."""
if not self._memory_interface:
return
try:
tool_name = step.tool
tool_args = step.args or {}
args_str = ", ".join([f"{k}={v}" for k, v in tool_args.items()])
critique_text = f"Tool: {tool_name}({args_str}) | Task: {task.input[:100]} | Scores: correctness={score.correctness}, usefulness={score.usefulness}, safety={score.safety} | {score.explanation}"
metadata = {
"task_input": task.input,
"tool": tool_name,
"args": tool_args,
"step_id": step.id,
"scores": {
"correctness": score.correctness,
"usefulness": score.usefulness,
"safety": score.safety,
},
}
self._memory_interface.insert(
text=critique_text,
kind="critique",
source="critic",
task_id=task.task_id,
session_id=task.session_id,
weight=score.weight,
metadata=metadata,
)
logger.info(f"Saved critique to memory: {tool_name} task_id={task.task_id}")
except Exception as e:
logger.warning(f"Failed to save critique to memory: {e}")
def _build_critic_prompt(self, step, result: dict[str, Any]) -> str:
base_prompt = self._prompts.get("critic", "")
tool_result = result.get("result", {})
return f"""{base_prompt}
Step: {step.description}
Tool: {step.tool}
Args: {step.args}
Result:
{json.dumps(tool_result, indent=2)}
Evaluate and respond with JSON:
{{"correctness": 0.0-1.0, "usefulness": 0.0-1.0, "safety": 0.0-1.0, "memory_store": true|false, "weight": 0.0-1.0, "explanation": "..."}}"""
def _parse_critic_score(self, output: str) -> CriticScore | None:
try:
json_start = output.find("{")
json_end = output.rfind("}") + 1
if json_start < 0:
return None
json_str = output[json_start:json_end]
data = json.loads(json_str)
return CriticScore(
correctness=data.get("correctness", 0.5),
usefulness=data.get("usefulness", 0.5),
safety=data.get("safety", 1.0),
memory_store=data.get("memory_store", False),
weight=data.get("weight", 0.5),
explanation=data.get("explanation", ""),
)
except (json.JSONDecodeError, ValueError, TypeError) as e:
logger.warning(f"Critic score parsing failed: {e}")
return None
def _execute_coder(
self,
task: UserTask,
directive: ExecutionDirective,
) -> dict[str, Any]:
if not self._coder:
return {"status": "failed", "result": {"error": "Coder model not available"}}
coder_task = directive.payload.get("task", "")
if not coder_task:
return {"status": "failed", "result": {"error": "Missing task for coder"}}
try:
output = asyncio.run(self._coder.generate(coder_task))
return {
"status": "completed",
"result": {"code": output},
}
except Exception as e:
logger.warning(f"Coder execution failed: {e}")
return {"status": "failed", "result": {"error": str(e)}}
def _execute_tool(
self,
task: UserTask,
directive: ExecutionDirective,
permission_override: PermissionDecision | None = None,
secret_override: str | None = None,
password_override: str | None = None,
) -> dict[str, Any]:
tool_name = str(directive.payload.get("tool", "")).strip()
tool_args = dict(directive.payload.get("args", {}))
if password_override:
tool_args["password"] = password_override
if not tool_name:
return {"status": "failed", "result": {"error": "Missing tool name"}}
# Tool-first: validate tool exists in registry
available_tools = self._tool_registry.list_names()
if tool_name not in available_tools:
return {"status": "failed", "result": {"error": f"Unknown tool: {tool_name}. Available tools: {available_tools}"}}
permission_result = None
# Check permission for shell_exec and file_write
if tool_name == "shell_exec":
permission_result = self._permission_service.check_shell_command(
task_id=task.task_id,
session_id=task.session_id,
command=str(tool_args.get("command", "")),
)
elif tool_name == "file_write":
# Allow writing to runtime data directory without permission check
write_path = str(tool_args.get("path", ""))
if "allowed_commands.json" in write_path or "/data/runtime" in write_path:
# Internal system write - allow without permission
permission_result = {"decision": "allowed", "path": write_path}
else:
permission_result = self._permission_service.check_write_path(
task_id=task.task_id,
session_id=task.session_id,
path=write_path,
)
# Handle permission result
if permission_result:
decision = permission_result.get("decision", "unknown")
# Hard stop - deny execution
if decision == "hard_stop":
self._publish(task, PERMISSION_REQUESTED, permission_result)
return {
"status": "failed",
"result": {
"error": f"Command blocked: {permission_result.get('reason', 'Hard stop command')}",
"command": permission_result.get("command", ""),
},
}
# Cached - already allowed
if decision in ("allowed_always", "allowed") or permission_result.get("cached"):
self._publish(task, PERMISSION_RESOLVED, permission_result)
# Need user confirmation - return immediately, don't continue execution
elif decision == "prompt":
self._publish(task, PERMISSION_REQUESTED, permission_result)
return {
"status": "awaiting_permission",
"result": {
"error": "Permission required before execution.",
"permission_request": permission_result,
},
}
# Hard stop - return immediately
elif decision == "deny":
self._publish(task, PERMISSION_RESOLVED, permission_result)
return {
"status": "failed",
"result": {
"error": "Permission denied",
"command": permission_result.get("command", ""),
},
}
# Deny
elif decision == "deny":
self._publish(task, PERMISSION_RESOLVED, permission_result)
return {
"status": "failed",
"result": {
"error": "Permission denied",
"command": permission_result.get("command", ""),
},
}
if tool_name == "shell_exec":
command = str(tool_args.get("command", ""))
if command.startswith("sudo ") and secret_override is None:
secret_request = SecretRequest(
task_id=task.task_id,
session_id=task.session_id,
kind="sudo_password",
prompt="Sudo password required",
command=command,
)
self._publish(task, SECRET_REQUESTED, secret_request.model_dump(mode="json"))
return {
"status": "awaiting_input",
"result": {
"error": "Secret required",
"secret_request": secret_request.model_dump(mode="json"),
},
}
if command.startswith("sudo ") and secret_override is not None:
tool_args["command"] = f"sudo -S -p '' {command[len('sudo '):]}"
tool_args["stdin_secret"] = f"{secret_override}\n"
tool_call = ToolCall(
tool=tool_name,
args=tool_args,
task_id=task.task_id,
step_id="step-1",
)
self._publish(task, TOOL_CALLED, tool_call.model_dump(mode="json"))
tool_result = self._tool_registry.get(tool_name).execute(task=task, args=tool_args)
self._publish(task, TOOL_COMPLETED, tool_result.model_dump(mode="json"))
needs_sudo = tool_result.metadata.get("needs_sudo", False) if tool_result.metadata else False
if not tool_result.ok and needs_sudo:
return {
"status": "awaiting_password",
"result": {
"task_id": task.task_id,
"needs_sudo": True,
"command": tool_args.get("command", ""),
"error": tool_result.error or "Permission denied",
"tool_result": tool_result.model_dump(mode="json"),
},
}
return {
"status": "completed" if tool_result.ok else "failed",
"result": tool_result.model_dump(mode="json"),
}
def _publish(self, task: UserTask, event_type: str, payload: dict[str, Any]) -> None:
if not self._event_bus:
return
event = RuntimeEvent(
task_id=task.task_id,
session_id=task.session_id,
sequence=self._event_bus.next_sequence(task.task_id),
type=event_type,
payload=payload,
)
self._event_bus.publish(event)

View File

@ -0,0 +1,212 @@
from __future__ import annotations
import json
import logging
from collections import deque
from typing import Any
from app.core.contracts import ExecutionDirective, PlanStep
logger = logging.getLogger(__name__)
class ExecutionScheduler:
def __init__(self, retry_limit: int = 2) -> None:
self._retry_limit = retry_limit
def parse_plan_steps(
self,
json_str: str,
task_id: str | None = None,
) -> list[PlanStep]:
try:
json_start = json_str.find("{")
json_end = json_str.rfind("}") + 1
if json_start < 0:
return []
json_str = json_str[json_start:json_end]
data = json.loads(json_str)
# Unified format: {"type": "plan", "payload": {"steps": [...]}}
# or direct: {"type": "step", "payload": {"tool": "...", "args": {...}}}
if isinstance(data, dict):
msg_type = data.get("type", "")
# Single step format: {"type": "step", "payload": {"tool": ..., "args": ...}}
if msg_type == "step":
payload = data.get("payload", {})
step = {
"id": "step-0",
"kind": "tool",
"tool": payload.get("tool"),
"args": payload.get("args", {}),
"description": payload.get("description", ""),
"depends_on": payload.get("depends_on", []),
}
data = [step]
# Plan format: {"type": "plan", "payload": {"steps": [...]}}
elif msg_type == "plan":
payload = data.get("payload", {})
steps_data = payload.get("steps", [])
# Normalize steps: handle {"type": "step", "payload": {"tool": ...}}
normalized = []
for step in steps_data:
if isinstance(step, dict) and step.get("type") == "step":
inner = step.get("payload", {})
normalized.append({
"tool": inner.get("tool"),
"args": inner.get("args", {}),
"description": inner.get("description", ""),
"depends_on": inner.get("depends_on", []),
})
else:
normalized.append(step)
steps_data = normalized
data = steps_data if steps_data else []
# Old format compatibility
elif "steps" in data:
data = data["steps"]
elif "plan" in data:
data = data["plan"]
else:
data = [data]
elif isinstance(data, str):
data = json.loads(data)
if isinstance(data, dict):
data = [data]
steps = []
for i, step_data in enumerate(data):
if isinstance(step_data, str):
step_data = {"id": f"step-{i}", "kind": "respond", "text": step_data}
if not isinstance(step_data, dict):
continue
step_data.setdefault("id", f"step-{i}")
# Tool-first: scheduler получает tool напрямую, без трансформаций
# kind определяется по наличию tool name
# args передаются напрямую
if step_data.get("tool"):
step_data["kind"] = "tool"
step_data.setdefault("kind", step_data.get("kind", "respond"))
step_data.setdefault("tool", step_data.get("tool"))
step_data.setdefault("args", step_data.get("args", {}))
step_data.setdefault("description", step_data.get("description", ""))
step_data.setdefault("requires_confirmation", False)
step_data.setdefault("depends_on", [])
if "description" not in step_data:
step_data["description"] = f"Step {i}"
steps.append(PlanStep(**step_data))
return steps
except (json.JSONDecodeError, ValueError, TypeError) as e:
logger.warning(f"Plan parsing failed: {e}")
return []
def validate_no_cycles(self, steps: list[PlanStep]) -> bool:
if not steps:
return True
graph: dict[str, set[str]] = {}
for step in steps:
graph[step.id] = set(step.depends_on)
visited: set[str] = set()
rec_stack: set[str] = set()
def has_cycle(node: str) -> bool:
if node in rec_stack:
return True
if node in visited:
return False
visited.add(node)
rec_stack.add(node)
for dep in graph.get(node, []):
if has_cycle(dep):
return True
rec_stack.remove(node)
return False
for step in steps:
if step.id not in visited:
if has_cycle(step.id):
logger.warning(f"Cycle detected in plan: {step.id}")
return False
return True
def build_task_graph(
self,
steps: list[PlanStep],
) -> dict[str, Any]:
if not steps:
return {"nodes": [], "edges": []}
if not self.validate_no_cycles(steps):
return {"nodes": [], "edges": [], "error": "Cycle detected in plan"}
nodes = []
edges = []
step_map = {s.id: s for s in steps}
for step in steps:
nodes.append({
"id": step.id,
"kind": step.kind,
"tool": step.tool,
"args": step.args,
"ready": len(step.depends_on) == 0,
})
for dep_id in step.depends_on:
edges.append({
"from": dep_id,
"to": step.id,
})
return {"nodes": nodes, "edges": edges, "step_map": step_map}
def get_ready_steps(
self,
graph: dict[str, Any],
completed: set[str],
) -> list[PlanStep]:
if not graph or not graph.get("nodes"):
return []
step_map: dict[str, PlanStep] = graph.get("step_map", {})
ready = []
for node in graph["nodes"]:
node_id = node["id"]
if node_id in completed:
continue
deps = node.get("depends_on", [])
if all(dep in completed for dep in deps):
step = step_map.get(node_id)
if step:
ready.append(step)
return ready
def next_directive(
self,
directive: ExecutionDirective,
) -> ExecutionDirective:
return directive

106
app/core/intent_parser.py Normal file
View File

@ -0,0 +1,106 @@
from __future__ import annotations
import re
from typing import Any
from app.core.contracts import ExecutionDirective
SHELL_PREFIXES = (
"run ",
"execute ",
"launch ",
"запусти ",
"выполни ",
"выполнить ",
)
MEMORY_STORE_PATTERNS = (
r"запомни\s+(.+)",
r"сохрани\s+(.+)",
r"запиши\s+(.+)",
r"remember\s+(.+)",
r"save\s+(.+)",
)
MEMORY_SEARCH_PATTERNS = (
r"найди\s+(.+)",
r"вспомни\s+(.+)",
r"search\s+(.+)",
r"find\s+(.+)",
)
class IntentParser:
"""Extracts explicit tool intents from natural-language task text."""
def __init__(self) -> None:
self._store_patterns = [re.compile(p, re.IGNORECASE) for p in MEMORY_STORE_PATTERNS]
self._search_patterns = [re.compile(p, re.IGNORECASE) for p in MEMORY_SEARCH_PATTERNS]
def parse(self, task_input: str) -> ExecutionDirective | None:
normalized = task_input.strip()
lowered = normalized.lower()
if matched := self._match_patterns(self._store_patterns, normalized):
return ExecutionDirective(
type="tool",
payload={
"tool": "memory_insert",
"args": {
"text": matched.group(1).strip(),
"kind": "fact",
"source": "user",
},
},
requires_permission=False,
confidence=0.85,
reason="User explicitly requested to store in memory.",
)
if matched := self._match_patterns(self._search_patterns, normalized):
return ExecutionDirective(
type="tool",
payload={
"tool": "memory_search",
"args": {"query": matched.group(1).strip()},
},
requires_permission=False,
confidence=0.85,
reason="User explicitly requested to search memory.",
)
for prefix in SHELL_PREFIXES:
if lowered.startswith(prefix):
command = normalized[len(prefix) :].strip()
if command:
return ExecutionDirective(
type="tool",
payload={
"tool": "shell_exec",
"args": {"command": command},
},
requires_permission=True,
confidence=0.92,
reason="Natural-language task explicitly requested shell execution.",
)
quoted = re.match(r"^`(.+)`$", normalized)
if quoted:
return ExecutionDirective(
type="tool",
payload={
"tool": "shell_exec",
"args": {"command": quoted.group(1)},
},
requires_permission=True,
confidence=0.75,
reason="Backticked input treated as direct shell command.",
)
return None
def _match_patterns(self, patterns: list[re.Pattern], text: str):
for pattern in patterns:
if match := pattern.match(text):
return match
return None

View File

@ -0,0 +1,18 @@
from __future__ import annotations
from pydantic import BaseModel
class PermissionResolutionRequest(BaseModel):
task_id: str
decision: str
class SecretResolutionRequest(BaseModel):
task_id: str
secret: str
class PasswordResolutionRequest(BaseModel):
task_id: str
password: str

View File

@ -0,0 +1,341 @@
from __future__ import annotations
import hashlib
import json
import logging
import os
import re
import shlex
from pathlib import Path
from typing import Any
logger = logging.getLogger(__name__)
class PermissionService:
"""Permission-first model - user is the authority."""
def __init__(self, config: dict[str, Any] | None = None, cache_file: Path | None = None):
self._config = config or self._load_config()
self._settings = self._config.get("settings", {})
self._cache_file = cache_file
self._categories = self._config.get("command_categories", {})
self._path_settings = self._config.get("path_settings", {})
self._legacy_dangerous_commands = self._config.get("dangerous_commands", {})
self._legacy_sensitive_paths = self._config.get("sensitive_paths", [])
def _load_config(self) -> dict[str, Any]:
try:
config_path = Path(__file__).parents[2] / "config" / "permissions.json"
with open(config_path) as f:
return json.load(f)
except Exception as e:
logger.warning(f"Failed to load permissions config: {e}")
return {"settings": {}, "command_categories": {}}
def _get_cache_file(self) -> Path:
if self._cache_file:
return self._cache_file
base_dir = Path(__file__).parents[2]
cache_relative = self._settings.get("cache_file", "data/runtime/allowed_commands.json")
return base_dir / cache_relative
def _load_cache(self) -> dict[str, Any]:
cache_file = self._get_cache_file()
try:
if cache_file.exists():
with open(cache_file) as f:
return json.load(f)
except Exception as e:
logger.warning(f"Failed to load cache: {e}")
return {"allowed_once": {}, "allowed_always": {}}
def _save_cache(self, cache: dict[str, Any]) -> None:
cache_file = self._get_cache_file()
cache_file.parent.mkdir(parents=True, exist_ok=True)
with open(cache_file, "w") as f:
json.dump(cache, f, indent=2)
def check_shell_command(
self,
task_id: str,
session_id: str,
command: str,
) -> dict[str, Any]:
"""Check if shell command requires permission."""
normalized = self._normalize_command(command)
command_hash = self._hash_command(normalized)
cache = self._load_cache()
# Check cache first
if command_hash in cache.get("allowed_always", {}):
return {
"decision": "allowed_always",
"command": normalized,
"cached": True,
}
if command_hash in cache.get("allowed_once", {}):
cached = cache["allowed_once"][command_hash]
if cached.get("task_id") == task_id:
return {
"decision": "allowed_once",
"command": normalized,
"cached": True,
}
# Check hard stop
if self._is_hard_stop(normalized):
return {
"decision": "hard_stop",
"command": normalized,
"reason": "Hard stop command - execution denied",
}
if not self._categories and self._legacy_dangerous_commands:
if self._matches_legacy_dangerous(normalized):
return {
"decision": "prompt",
"command": normalized,
"category": "legacy_dangerous",
"allow_always": False,
"task_id": task_id,
"session_id": session_id,
}
return {
"decision": "allowed",
"command": normalized,
"category": "legacy_safe",
"task_id": task_id,
"session_id": session_id,
}
# Check no_always category
category = self._get_category(normalized)
can_always = self._categories.get(category, {}).get("allow_always", True)
# Need user confirmation
return {
"decision": "prompt",
"command": normalized,
"category": category,
"allow_always": can_always,
"task_id": task_id,
"session_id": session_id,
}
def check_write_path(
self,
task_id: str,
session_id: str,
path: str,
) -> dict[str, Any]:
"""Check if write path requires permission."""
if not self._path_settings and self._legacy_sensitive_paths:
if any(path.startswith(sensitive) for sensitive in self._legacy_sensitive_paths):
return {
"decision": "prompt",
"path": path,
"task_id": task_id,
"session_id": session_id,
}
return {"decision": "allowed", "path": path}
allow_write_paths = self._path_settings.get("allow_write_paths", [])
# Check if path is in allowed list
for allowed in allow_write_paths:
if path.startswith(allowed):
return {"decision": "allowed", "path": path}
# Otherwise require permission
return {
"decision": "prompt",
"path": path,
"task_id": task_id,
"session_id": session_id,
}
def resolve_permission(
self,
task_id: str,
session_id: str,
command: str,
decision: str,
) -> dict[str, Any]:
"""Resolve permission decision from user."""
normalized = self._normalize_command(command)
command_hash = self._hash_command(normalized)
cache = self._load_cache()
if decision == "allow_once":
cache.setdefault("allowed_once", {})[command_hash] = {
"command": normalized,
"task_id": task_id,
"session_id": session_id,
}
self._save_cache(cache)
return {"status": "allowed_once", "command": normalized}
elif decision == "allow_always":
cache.setdefault("allowed_always", {})[command_hash] = {
"command": normalized,
"task_id": task_id,
"session_id": session_id,
}
self._save_cache(cache)
return {"status": "allowed_always", "command": normalized}
elif decision == "deny":
return {"status": "denied", "command": normalized}
return {"status": "unknown", "decision": decision}
def clear_cache(self) -> dict[str, Any]:
"""Clear permission cache."""
cache = {"allowed_once": {}, "allowed_always": {}}
self._save_cache(cache)
return {"status": "cache_cleared"}
def _normalize_command(self, command: str) -> str:
"""Normalize command for consistent hashing."""
if not self._settings.get("normalize_commands", True):
return command.strip()
normalized = command.strip()
# Split chained commands if enabled
if self._settings.get("split_chained", True):
# Replace ; and || with && for splitting
normalized = normalized.replace(";", " && ")
normalized = normalized.replace("||", " && ")
# Resolve environment variables
try:
normalized = os.path.expandvars(normalized)
except:
pass
# Resolve home directory
normalized = normalized.replace("~", os.path.expanduser("~"))
# Remove extra whitespace
normalized = " ".join(normalized.split())
return normalized
def _hash_command(self, command: str) -> str:
"""Generate hash for command."""
return hashlib.sha256(command.encode()).hexdigest()[:16]
def _matches_legacy_dangerous(self, command: str) -> bool:
cmd_lower = command.lower()
for pattern in self._legacy_dangerous_commands:
if pattern.lower() in cmd_lower:
return True
return False
def _is_hard_stop(self, command: str) -> bool:
"""Check if command is hard stop."""
hard_stop_commands = self._categories.get("hard_stop", {}).get("commands", [])
cmd_lower = command.lower()
for hs in hard_stop_commands:
if hs.lower() in cmd_lower:
return True
return False
def _get_category(self, command: str) -> str:
"""Get command category."""
cmd_lower = command.lower()
# Check no_always category
no_always = self._categories.get("no_always", {}).get("commands", [])
for cmd in no_always:
if cmd in cmd_lower:
return "no_always"
# Default to normal
return "normal"
SUDO_COMMANDS = {
"apt", "apt-get", "dpkg", "yum", "dnf", "pacman", "zypper",
"systemctl", "service", "mount", "umount",
"shutdown", "reboot", "halt", "poweroff",
"useradd", "usermod", "userdel", "groupadd", "groupmod",
"chmod", "chown", "chgrp",
"iptables", "ufw",
"kill", "killall", "pkill",
}
def _requires_sudo(command: str) -> bool:
"""Check if command requires sudo."""
if not command:
return False
cmd_lower = command.lower().strip()
first_word = cmd_lower.split()[0] if cmd_lower.split() else ""
return first_word in SUDO_COMMANDS
class PermissionRequest:
"""Permission request to user."""
def __init__(
self,
task_id: str,
session_id: str,
command: str,
category: str = "normal",
allow_always: bool = True,
) -> None:
self.task_id = task_id
self.session_id = session_id
self.command = command
self.category = category
self.allow_always = allow_always
self.requires_password = _requires_sudo(command)
def to_dict(self) -> dict[str, Any]:
return {
"task_id": self.task_id,
"session_id": self.session_id,
"command": self.command,
"category": self.category,
"allow_always": self.allow_always,
"requires_password": self.requires_password,
"buttons": self._get_buttons(),
}
def _get_buttons(self) -> list[dict[str, str]]:
buttons = [{"action": "deny", "label": "Запретить"}]
if self.allow_always:
buttons.insert(0, {"action": "allow_always", "label": "Разрешить навсегда"})
if self.requires_password:
buttons.insert(0, {"action": "allow_with_password", "label": "Разрешить с паролем"})
else:
buttons.insert(0, {"action": "allow_once", "label": "Разрешить"})
return buttons
class PermissionDecision:
"""Permission decision."""
def __init__(
self,
decision: str,
command: str | None = None,
cached: bool = False,
) -> None:
self.decision = decision
self.command = command
self.cached = cached

2
app/events/__init__.py Normal file
View File

@ -0,0 +1,2 @@
"""Event bus and event store."""

32
app/events/event_bus.py Normal file
View File

@ -0,0 +1,32 @@
from __future__ import annotations
from typing import Callable
from app.core.contracts import RuntimeEvent
from app.events.event_store import SQLiteEventStore
Subscriber = Callable[[RuntimeEvent], None]
class EventBus:
"""Per-task ordered event publishing with durable storage."""
def __init__(self, event_store: SQLiteEventStore) -> None:
self._store = event_store
self._subscribers: list[Subscriber] = []
def next_sequence(self, task_id: str) -> int:
return self._store.get_latest_sequence(task_id) + 1
def publish(self, event: RuntimeEvent) -> RuntimeEvent:
self._store.append(event)
for subscriber in self._subscribers:
subscriber(event)
return event
def subscribe(self, subscriber: Subscriber) -> None:
self._subscribers.append(subscriber)
def list_for_task(self, task_id: str) -> list[RuntimeEvent]:
return self._store.list_for_task(task_id)

94
app/events/event_store.py Normal file
View File

@ -0,0 +1,94 @@
from __future__ import annotations
import json
import sqlite3
from pathlib import Path
from app.core.contracts import RuntimeEvent
class SQLiteEventStore:
"""Append-only event store with per-task ordered history."""
def __init__(self, db_path: str | Path) -> None:
self._db_path = Path(db_path)
self._db_path.parent.mkdir(parents=True, exist_ok=True)
self._initialize()
def append(self, event: RuntimeEvent) -> None:
with sqlite3.connect(self._db_path) as conn:
conn.execute(
"""
INSERT INTO events (
event_id, task_id, session_id, sequence, type, timestamp,
payload_json, causation_id, correlation_id
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""",
(
event.event_id,
event.task_id,
event.session_id,
event.sequence,
event.type,
event.timestamp.isoformat(),
json.dumps(event.payload),
event.causation_id,
event.correlation_id,
),
)
conn.commit()
def list_for_task(self, task_id: str) -> list[RuntimeEvent]:
with sqlite3.connect(self._db_path) as conn:
rows = conn.execute(
"""
SELECT event_id, task_id, session_id, sequence, type, timestamp,
payload_json, causation_id, correlation_id
FROM events
WHERE task_id = ?
ORDER BY sequence ASC
""",
(task_id,),
).fetchall()
return [
RuntimeEvent(
event_id=row[0],
task_id=row[1],
session_id=row[2],
sequence=row[3],
type=row[4],
timestamp=row[5],
payload=json.loads(row[6]),
causation_id=row[7],
correlation_id=row[8],
)
for row in rows
]
def get_latest_sequence(self, task_id: str) -> int:
with sqlite3.connect(self._db_path) as conn:
row = conn.execute(
"SELECT COALESCE(MAX(sequence), 0) FROM events WHERE task_id = ?",
(task_id,),
).fetchone()
return int(row[0]) if row else 0
def _initialize(self) -> None:
with sqlite3.connect(self._db_path) as conn:
conn.execute(
"""
CREATE TABLE IF NOT EXISTS events (
event_id TEXT PRIMARY KEY,
task_id TEXT NOT NULL,
session_id TEXT NOT NULL,
sequence INTEGER NOT NULL,
type TEXT NOT NULL,
timestamp TEXT NOT NULL,
payload_json TEXT NOT NULL,
causation_id TEXT,
correlation_id TEXT NOT NULL,
UNIQUE(task_id, sequence)
)
"""
)
conn.commit()

31
app/events/event_types.py Normal file
View File

@ -0,0 +1,31 @@
TASK_RECEIVED = "task_received"
CONTEXT_BUILT = "context_built"
STEP_STARTED = "step_started"
TOOL_CALLED = "tool_called"
TOOL_COMPLETED = "tool_completed"
PERMISSION_REQUESTED = "permission_requested"
PERMISSION_RESOLVED = "permission_resolved"
TASK_AWAITING_PERMISSION = "task_awaiting_permission"
SECRET_REQUESTED = "secret_requested"
TASK_AWAITING_INPUT = "task_awaiting_input"
CHECKPOINT_SAVED = "checkpoint_saved"
TASK_COMPLETED = "task_completed"
TASK_FAILED = "task_failed"
ORCHESTRATOR_CALLED = "orchestrator_called"
ORCHESTRATOR_RESULT = "orchestrator_result"
ORCHESTRATOR_UNAVAILABLE = "orchestrator_unavailable"
ORCHESTRATOR_FALLBACK_USED = "orchestrator_fallback_used"
ORCHESTRATOR_RETRY = "orchestrator_retry"
PLANNER_CALLED = "planner_called"
PLANNER_RETRY = "planner_retry"
CRITIC_CALLED = "critic_called"
CRITIC_RESULT = "critic_result"
MEMORY_WRITE_DECIDED = "memory_write_decided"
PLAN_STARTED = "plan_started"
PLAN_FAILED = "plan_failed"
PLAN_COMPLETED = "plan_completed"
STEPPED_COMPLETED = "step_completed"
THINKER_CALLED = "thinker_called"
THINKER_RESULT = "thinker_result"
JSON_COMPILER_CALLED = "json_compiler_called"
JSON_COMPILER_RESULT = "json_compiler_result"

24
app/memory/__init__.py Normal file
View File

@ -0,0 +1,24 @@
MEMORY_AVAILABLE = False
VECTOR_AVAILABLE = False
try:
from app.memory.store import MemoryStore
from app.memory.vector_index import VectorIndex
from app.memory.interface import MemoryInterface
from app.memory.write_policy import MemoryWritePolicy
MEMORY_AVAILABLE = True
VECTOR_AVAILABLE = True
except ImportError:
MemoryStore = None
VectorIndex = None
MemoryInterface = None
MemoryWritePolicy = None
__all__ = [
"MemoryStore",
"VectorIndex",
"MemoryInterface",
"MemoryWritePolicy",
"MEMORY_AVAILABLE",
"VECTOR_AVAILABLE",
]

145
app/memory/interface.py Normal file
View File

@ -0,0 +1,145 @@
from __future__ import annotations
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Literal
import numpy as np
from app.core.contracts import MemoryEntry
from app.memory.store import MemoryStore
from app.memory.vector_index import VectorIndex
from app.models.embeddings import EmbeddingsAdapter
class MemoryInterface:
def __init__(
self,
store: MemoryStore,
vector_index: VectorIndex,
embeddings: EmbeddingsAdapter,
) -> None:
self._store = store
self._vector_index = vector_index
self._embeddings = embeddings
def insert(
self,
text: str,
kind: Literal["tool_result", "plan", "critique", "fact", "summary", "user_preference"],
source: Literal["tool", "critic", "user", "system"],
task_id: str | None = None,
session_id: str | None = None,
weight: float = 0.5,
metadata: dict[str, Any] | None = None,
) -> MemoryEntry:
entry = MemoryEntry(
text=text,
kind=kind,
source=source,
weight=weight,
task_id=task_id,
session_id=session_id,
metadata=metadata or {},
embedding_model=self._embeddings.__class__.__name__,
embedding_dim=self._embeddings.embedding_dim,
)
embedding = self._embeddings.encode(text)
embedding_bytes = embedding.astype("float32").tobytes()
self._store.insert(entry, embedding_bytes)
self._vector_index.insert(entry.id, embedding)
self._vector_index.save()
self.cleanup()
return entry
def search(
self,
query: str,
top_k: int = 5,
kind: str | None = None,
session_id: str | None = None,
) -> list[tuple[MemoryEntry, float]]:
query_embedding = self._embeddings.encode(query)
memory_ids, scores = self._vector_index.search(query_embedding, k=top_k)
results: list[tuple[MemoryEntry, float]] = []
for memory_id, score in zip(memory_ids, scores):
entry = self._store.get(memory_id)
if entry:
if kind and entry.kind != kind:
continue
if session_id and entry.session_id != session_id:
continue
results.append((entry, score))
return results[:top_k]
def get(self, memory_id: str) -> MemoryEntry | None:
return self._store.get(memory_id)
def delete(self, memory_id: str) -> bool:
entry = self._store.get(memory_id)
if entry:
self._vector_index.delete(memory_id)
return self._store.delete(memory_id)
return False
def get_by_task(self, task_id: str) -> list[MemoryEntry]:
return self._store.get_by_task(task_id)
def get_by_session(self, session_id: str, limit: int = 100) -> list[MemoryEntry]:
return self._store.get_by_session(session_id, limit)
def get_recent(self, limit: int = 10) -> list[MemoryEntry]:
return self._store.get_all(limit)
def count(self) -> int:
return self._store.count()
def reindex(self) -> None:
entries = self._store.get_all(limit=10000)
self._vector_index.save()
for entry in entries:
text = entry.text
embedding = self._embeddings.encode(text)
self._vector_index.insert(entry.id, embedding)
self._vector_index.save()
def close(self) -> None:
self._store.close()
def cleanup(self, max_items: int = 750, decay_factor: float = 0.95) -> int:
"""Remove low-weight entries when exceeding max_items limit.
Applies weight decay based on freshness before cleanup.
Returns number of removed entries.
"""
current_count = self._store.count()
if current_count <= max_items:
return 0
removed = 0
entries_to_remove = current_count - max_items
all_entries = self._store.get_all(limit=current_count)
def effective_weight(entry: MemoryEntry) -> float:
entry_weight = entry.weight
if entry.created_at:
age_days = (datetime.now(timezone.utc) - entry.created_at).total_seconds() / 86400
freshness_factor = max(0.1, decay_factor ** age_days)
return entry_weight * freshness_factor
return entry_weight
sorted_entries = sorted(all_entries, key=effective_weight)
for entry in sorted_entries[:entries_to_remove]:
self._store.delete(entry.id)
removed += 1
return removed

185
app/memory/store.py Normal file
View File

@ -0,0 +1,185 @@
from __future__ import annotations
import json
import sqlite3
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Sequence
from uuid import uuid4
from app.core.contracts import MemoryEntry
def utc_now() -> datetime:
return datetime.now(timezone.utc)
class MemoryStore:
def __init__(self, db_path: str | Path) -> None:
self._db_path = Path(db_path)
self._db_path.parent.mkdir(parents=True, exist_ok=True)
self._conn = sqlite3.connect(str(self._db_path), check_same_thread=False)
self._conn.row_factory = sqlite3.Row
self._init_tables()
def _init_tables(self) -> None:
self._conn.executescript("""
CREATE TABLE IF NOT EXISTS memory_items (
id TEXT PRIMARY KEY,
text TEXT NOT NULL,
kind TEXT NOT NULL,
source TEXT NOT NULL,
weight REAL NOT NULL DEFAULT 0.5,
task_id TEXT,
session_id TEXT,
metadata_json TEXT,
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS memory_embeddings (
memory_id TEXT PRIMARY KEY,
embedding BLOB NOT NULL,
embedding_model TEXT NOT NULL,
embedding_dim INTEGER NOT NULL,
created_at TEXT NOT NULL,
FOREIGN KEY (memory_id) REFERENCES memory_items(id) ON DELETE CASCADE
);
CREATE INDEX IF NOT EXISTS idx_memory_items_task ON memory_items(task_id);
CREATE INDEX IF NOT EXISTS idx_memory_items_session ON memory_items(session_id);
CREATE INDEX IF NOT EXISTS idx_memory_items_kind ON memory_items(kind);
CREATE INDEX IF NOT EXISTS idx_memory_embeddings_model ON memory_embeddings(embedding_model);
""")
self._conn.commit()
def insert(self, entry: MemoryEntry, embedding: bytes) -> None:
cursor = self._conn.cursor()
cursor.execute(
"""
INSERT INTO memory_items (id, text, kind, source, weight, task_id, session_id, metadata_json, created_at, updated_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""",
(
entry.id,
entry.text,
entry.kind,
entry.source,
entry.weight,
entry.task_id,
entry.session_id,
json.dumps(entry.metadata) if entry.metadata else None,
entry.created_at.isoformat(),
utc_now().isoformat(),
),
)
cursor.execute(
"""
INSERT INTO memory_embeddings (memory_id, embedding, embedding_model, embedding_dim, created_at)
VALUES (?, ?, ?, ?, ?)
""",
(
entry.id,
embedding,
entry.embedding_model,
entry.embedding_dim,
utc_now().isoformat(),
),
)
self._conn.commit()
def get(self, memory_id: str) -> MemoryEntry | None:
cursor = self._conn.cursor()
row = cursor.execute(
"SELECT * FROM memory_items WHERE id = ?", (memory_id,)
).fetchone()
if not row:
return None
return self._row_to_entry(row)
def get_embedding(self, memory_id: str) -> bytes | None:
cursor = self._conn.cursor()
row = cursor.execute(
"SELECT embedding FROM memory_embeddings WHERE memory_id = ?", (memory_id,)
).fetchone()
return bytes(row["embedding"]) if row else None
def get_all(self, limit: int = 1000) -> list[MemoryEntry]:
cursor = self._conn.cursor()
rows = cursor.execute(
"SELECT * FROM memory_items ORDER BY created_at DESC LIMIT ?", (limit,)
).fetchall()
return [self._row_to_entry(row) for row in rows]
def get_by_task(self, task_id: str) -> list[MemoryEntry]:
cursor = self._conn.cursor()
rows = cursor.execute(
"SELECT * FROM memory_items WHERE task_id = ? ORDER BY created_at DESC", (task_id,)
).fetchall()
return [self._row_to_entry(row) for row in rows]
def get_by_session(self, session_id: str, limit: int = 100) -> list[MemoryEntry]:
cursor = self._conn.cursor()
rows = cursor.execute(
"SELECT * FROM memory_items WHERE session_id = ? ORDER BY created_at DESC LIMIT ?",
(session_id, limit),
).fetchall()
return [self._row_to_entry(row) for row in rows]
def get_by_kind(self, kind: str, limit: int = 100) -> list[MemoryEntry]:
cursor = self._conn.cursor()
rows = cursor.execute(
"SELECT * FROM memory_items WHERE kind = ? ORDER BY created_at DESC LIMIT ?", (kind, limit)
).fetchall()
return [self._row_to_entry(row) for row in rows]
def delete(self, memory_id: str) -> bool:
cursor = self._conn.cursor()
cursor.execute("DELETE FROM memory_embeddings WHERE memory_id = ?", (memory_id,))
cursor.execute("DELETE FROM memory_items WHERE id = ?", (memory_id,))
self._conn.commit()
return cursor.rowcount > 0
def update_weight(self, memory_id: str, weight: float) -> bool:
cursor = self._conn.cursor()
cursor.execute(
"UPDATE memory_items SET weight = ?, updated_at = ? WHERE id = ?",
(weight, utc_now().isoformat(), memory_id),
)
self._conn.commit()
return cursor.rowcount > 0
def search_text(self, query: str, limit: int = 10) -> list[MemoryEntry]:
cursor = self._conn.cursor()
rows = cursor.execute(
"SELECT * FROM memory_items WHERE text LIKE ? ORDER BY created_at DESC LIMIT ?",
(f"%{query}%", limit),
).fetchall()
return [self._row_to_entry(row) for row in rows]
def count(self) -> int:
cursor = self._conn.cursor()
row = cursor.execute("SELECT COUNT(*) FROM memory_items").fetchone()
return row[0] if row else 0
def close(self) -> None:
self._conn.close()
def _row_to_entry(self, row: sqlite3.Row) -> MemoryEntry:
metadata = {}
if row["metadata_json"]:
import json
metadata = json.loads(row["metadata_json"])
return MemoryEntry(
id=row["id"],
text=row["text"],
kind=row["kind"],
source=row["source"],
weight=row["weight"],
task_id=row["task_id"],
session_id=row["session_id"],
metadata=metadata,
created_at=datetime.fromisoformat(row["created_at"]),
embedding_model="",
embedding_dim=0,
)

149
app/memory/vector_index.py Normal file
View File

@ -0,0 +1,149 @@
from __future__ import annotations
import logging
import numpy as np
import hnswlib
from pathlib import Path
from typing import Any
logger = logging.getLogger(__name__)
class VectorIndex:
def __init__(
self,
index_path: str | Path | None = None,
embedding_dim: int = 384,
max_elements: int = 10000,
) -> None:
self._embedding_dim = embedding_dim
self._index_path = Path(index_path) if index_path else None
self._index: hnswlib.Index | None = None
self._max_elements = max_elements
self._loading = False # Prevent recursion
self._init_index()
def _init_index(self) -> None:
if self._loading:
return
self._loading = True
try:
if self._index_path and self._index_path.exists():
self._load()
else:
self._index = hnswlib.Index(
space="l2",
dim=self._embedding_dim,
)
self._index.init_index(
max_elements=self._max_elements,
ef_construction=200,
M=16,
)
except Exception as e:
logger.warning(f"VectorIndex init failed: {e}")
self._index = hnswlib.Index(
space="l2",
dim=self._embedding_dim,
)
self._index.init_index(
max_elements=self._max_elements,
ef_construction=100,
M=16,
)
finally:
self._loading = False
def insert(self, memory_id: str, embedding: np.ndarray) -> None:
if self._index is None:
self._init_index()
if self._index is None:
return
try:
vector = self._normalize(embedding)
internal_id = self._get_internal_id(memory_id)
self._index.add_items(vector, ids=np.array([internal_id]))
except Exception as e:
logger.warning(f"VectorIndex insert failed: {e}")
def search(
self,
query_embedding: np.ndarray,
k: int = 5,
) -> tuple[list[str], list[float]]:
if self._index is None:
return [], []
try:
if self._index.get_current_count() == 0:
return [], []
# Set ef to at least k for proper search
self._index.set_ef(max(k * 2, 50))
vector = self._normalize(query_embedding)
labels, distances = self._index.knn_query(vector, k=k)
memory_ids = [self._get_memory_id(int(label)) for label in labels[0]]
scores = [1.0 - dist for dist in distances[0]]
return memory_ids, scores
except Exception as e:
logger.warning(f"VectorIndex search failed: {e}")
return [], []
def delete(self, memory_id: str) -> bool:
return False
def get_items(self, memory_ids: list[str]) -> np.ndarray:
if self._index is None:
raise RuntimeError("Index not initialized")
internal_ids = [self._get_internal_id(mid) for mid in memory_ids]
return self._index.get_items(np.array(internal_ids))
def save(self) -> None:
if self._index and self._index_path:
try:
self._index_path.parent.mkdir(parents=True, exist_ok=True)
self._index.save_index(str(self._index_path))
except Exception as e:
logger.warning(f"VectorIndex save failed: {e}")
def _load(self) -> None:
if self._loading:
return
self._loading = True
try:
if self._index_path and self._index_path.exists():
self._index = hnswlib.Index(space="l2", dim=self._embedding_dim)
self._index.load_index(
str(self._index_path),
max_elements=self._max_elements
)
except Exception as e:
logger.warning(f"VectorIndex load failed: {e}")
self._init_index()
finally:
self._loading = False
def _normalize(self, vector: np.ndarray) -> np.ndarray:
vec = vector.flatten()
norm = np.linalg.norm(vec)
if norm > 0:
vec = vec / norm
return vec.reshape(1, -1)
def _get_internal_id(self, memory_id: str) -> int:
return hash(memory_id) % (2**31)
def _get_memory_id(self, internal_id: int) -> str:
return str(internal_id)
@property
def embedding_dim(self) -> int:
return self._embedding_dim
@property
def element_count(self) -> int:
return self._index.get_current_count() if self._index else 0

View File

@ -0,0 +1,98 @@
from __future__ import annotations
from typing import Any, Literal
from app.core.contracts import CriticScore, MemoryEntry
class MemoryWritePolicy:
def __init__(
self,
store_threshold: float = 0.7,
min_usefulness: float = 0.3,
max_entries_per_session: int = 50,
) -> None:
self._store_threshold = store_threshold
self._min_usefulness = min_usefulness
self._max_entries_per_session = max_entries_per_session
def decide(
self,
critic_score: CriticScore,
memory_type: MemoryEntry.Kind,
session_id: str | None = None,
has_duplicate: bool = False,
current_session_count: int = 0,
) -> Literal["store", "store_with_weight", "skip", "merge"]:
if critic_score.safety < 0.5:
return "skip"
if has_duplicate:
return "merge"
if not critic_score.memory_store:
return "skip"
if critic_score.usefulness < self._min_usefulness:
return "skip"
if session_id and current_session_count >= self._max_entries_per_session:
return "skip"
base_decision = self._evaluate_scores(critic_score, memory_type)
if base_decision == "store" and critic_score.weight < self._store_threshold:
adjusted_weight = self._adjust_weight(critic_score, memory_type)
if adjusted_weight >= self._store_threshold:
return "store_with_weight"
return base_decision
return base_decision
def _evaluate_scores(
self,
critic_score: CriticScore,
memory_type: MemoryEntry.Kind,
) -> Literal["store", "store_with_weight", "skip", "merge"]:
avg_score = (critic_score.correctness + critic_score.usefulness + critic_score.safety) / 3.0
if memory_type in ("fact", "plan", "summary"):
if avg_score >= 0.8:
return "store"
elif avg_score >= 0.6:
return "store_with_weight"
if memory_type in ("tool_result", "critique"):
if avg_score >= self._store_threshold:
return "store"
elif avg_score >= 0.5:
return "store_with_weight"
if memory_type == "user_preference":
if avg_score >= 0.5:
return "store"
return "skip"
def _adjust_weight(
self,
critic_score: CriticScore,
memory_type: MemoryEntry.Kind,
) -> float:
base_weight = critic_score.weight
type_boost = {
"fact": 0.15,
"plan": 0.1,
"summary": 0.1,
"user_preference": 0.2,
"tool_result": 0.05,
"critique": 0.05,
}.get(memory_type, 0.0)
safety_boost = 0.0
if critic_score.safety >= 0.9:
safety_boost = 0.1
adjusted = base_weight + type_boost + safety_boost
return min(adjusted, 1.0)

32
app/models/__init__.py Normal file
View File

@ -0,0 +1,32 @@
LLM_AVAILABLE = False
EMBEDDINGS_AVAILABLE = False
try:
from app.models.adapters import create_adapter, create_llama_adapter
from app.models.orchestrator import OrchestratorAdapter
from app.models.coder import CoderAdapter
from app.models.critic import CriticAdapter
LLM_AVAILABLE = True
except ImportError:
create_adapter = None
create_llama_adapter = None
OrchestratorAdapter = None
CoderAdapter = None
CriticAdapter = None
try:
from app.models.embeddings import EmbeddingsAdapter
EMBEDDINGS_AVAILABLE = True
except ImportError:
EmbeddingsAdapter = None
__all__ = [
"create_adapter",
"create_llama_adapter",
"OrchestratorAdapter",
"CoderAdapter",
"CriticAdapter",
"EmbeddingsAdapter",
"LLM_AVAILABLE",
"EMBEDDINGS_AVAILABLE",
]

72
app/models/adapters.py Normal file
View File

@ -0,0 +1,72 @@
from __future__ import annotations
from pathlib import Path
from typing import Any, Protocol, Iterator
import os
try:
from llama_cpp import Llama
LLAMA_AVAILABLE = True
except ImportError:
Llama = None
LLAMA_AVAILABLE = False
class BaseModelAdapter(Protocol):
async def generate(self, prompt: str, **kwargs: Any) -> str: ...
def stream(self, prompt: str, **kwargs: Any) -> Iterator[str]: ...
def create_llama_adapter(
model_path: str,
backend: str = "cpu",
n_gpu_layers: int = 0,
max_tokens: int = 2048,
temperature: float = 0.2,
base_dir: Path | None = None,
) -> "Llama":
if not LLAMA_AVAILABLE:
raise RuntimeError("llama-cpp-python not installed")
if base_dir:
model_path = str(base_dir / model_path)
else:
model_path = str(Path.cwd() / model_path)
return Llama(
model_path=model_path,
n_gpu_layers=n_gpu_layers,
n_ctx=4096,
n_threads=int(os.environ.get("DUCKLM_N_THREADS", max(4, min((os.cpu_count() or 4) // 2, 20)))),
n_threads_batch=-1,
max_tokens=max_tokens,
temperature=temperature,
verbose=False,
)
def create_adapter(
model_type: str,
config: dict[str, Any],
base_dir: Path | None = None,
) -> "Llama":
if not LLAMA_AVAILABLE:
raise RuntimeError("llama-cpp-python not installed")
model_path = config.get("path", "")
backend = config.get("backend", "cpu")
n_gpu_layers = config.get("n_gpu_layers", 0)
max_tokens = config.get("max_tokens", 2048)
temperature = config.get("temperature", 0.2)
if backend == "vulkan" and n_gpu_layers != 0:
n_gpu_layers = -1
return create_llama_adapter(
model_path=model_path,
backend=backend,
n_gpu_layers=n_gpu_layers,
max_tokens=max_tokens,
temperature=temperature,
base_dir=base_dir,
)

View File

@ -0,0 +1,58 @@
from __future__ import annotations
import asyncio
from typing import Any, AsyncIterator
from app.models.orchestrator import OrchestratorAdapter as SyncOrchestrator
class AsyncOrchestratorAdapter:
"""Async wrapper for orchestrator - runs in executor to avoid blocking event loop."""
def __init__(self, sync_adapter: SyncOrchestrator) -> None:
self._sync = sync_adapter
async def generate(self, prompt: str, max_tokens: int | None = None) -> str:
loop = asyncio.get_event_loop()
return await loop.run_in_executor(
None,
lambda: self._sync.generate(prompt, max_tokens)
)
async def stream(self, prompt: str, max_tokens: int | None = None) -> AsyncIterator[str]:
loop = asyncio.get_event_loop()
async def gen():
return list(self._sync.stream(prompt, max_tokens))
result = await loop.run_in_executor(None, gen)
for chunk in result:
yield chunk
class AsyncCoderAdapter:
"""Async wrapper for coder."""
def __init__(self, sync_adapter) -> None:
self._sync = sync_adapter
async def generate(self, prompt: str, max_tokens: int | None = None) -> str:
loop = asyncio.get_event_loop()
return await loop.run_in_executor(
None,
lambda: self._sync.generate(prompt, max_tokens)
)
class AsyncCriticAdapter:
"""Async wrapper for critic."""
def __init__(self, sync_adapter) -> None:
self._sync = sync_adapter
async def generate(self, prompt: str, max_tokens: int | None = None) -> str:
loop = asyncio.get_event_loop()
return await loop.run_in_executor(
None,
lambda: self._sync.generate(prompt, max_tokens)
)

44
app/models/coder.py Normal file
View File

@ -0,0 +1,44 @@
from __future__ import annotations
from threading import RLock
from typing import Any, Iterator
from llama_cpp import Llama
class CoderAdapter:
def __init__(self, llm: Llama, system_prompt: str | None = None, lock: RLock | None = None) -> None:
self._llm = llm
self._lock = lock or RLock()
self._system_prompt = system_prompt or (
"You are an expert code generation model."
)
self._temperature = 0.2
def generate(self, prompt: str, max_tokens: int | None = None) -> str:
messages = [
{"role": "system", "content": self._system_prompt},
{"role": "user", "content": prompt},
]
with self._lock:
output = self._llm.create_chat_completion(
messages=messages,
max_tokens=max_tokens or 1024,
temperature=self._temperature,
)
return output["choices"][0]["message"]["content"]
def stream(self, prompt: str, max_tokens: int | None = None) -> Iterator[str]:
messages = [
{"role": "system", "content": self._system_prompt},
{"role": "user", "content": prompt},
]
with self._lock:
for chunk in self._llm.create_chat_completion(
messages=messages,
max_tokens=max_tokens or 1024,
temperature=self._temperature,
stream=True,
):
content = chunk["choices"][0].get("delta", {}).get("content")
if content:
yield content

44
app/models/critic.py Normal file
View File

@ -0,0 +1,44 @@
from __future__ import annotations
from threading import RLock
from typing import Any, Iterator
from llama_cpp import Llama
class CriticAdapter:
def __init__(self, llm: Llama, system_prompt: str | None = None, lock: RLock | None = None) -> None:
self._llm = llm
self._lock = lock or RLock()
self._system_prompt = system_prompt or (
"You are a critic model. Evaluate tool results and respond with JSON."
)
self._temperature = 0.1
def generate(self, prompt: str, max_tokens: int | None = None) -> str:
messages = [
{"role": "system", "content": self._system_prompt},
{"role": "user", "content": prompt},
]
with self._lock:
output = self._llm.create_chat_completion(
messages=messages,
max_tokens=max_tokens or 512,
temperature=self._temperature,
)
return output["choices"][0]["message"]["content"]
def stream(self, prompt: str, max_tokens: int | None = None) -> Iterator[str]:
messages = [
{"role": "system", "content": self._system_prompt},
{"role": "user", "content": prompt},
]
with self._lock:
for chunk in self._llm.create_chat_completion(
messages=messages,
max_tokens=max_tokens or 512,
temperature=self._temperature,
stream=True,
):
content = chunk["choices"][0].get("delta", {}).get("content")
if content:
yield content

37
app/models/embeddings.py Normal file
View File

@ -0,0 +1,37 @@
from __future__ import annotations
from pathlib import Path
from typing import Any
import numpy as np
from sentence_transformers import SentenceTransformer
class EmbeddingsAdapter:
def __init__(
self,
model_path: str | Path | None = None,
model_name: str = "sentence-transformers/all-MiniLM-L6-v2",
embedding_dim: int = 384,
) -> None:
self._embedding_dim = embedding_dim
if model_path and Path(model_path).exists():
self._model = SentenceTransformer(str(model_path))
else:
self._model = SentenceTransformer(model_name)
def encode(self, texts: str | list[str]) -> np.ndarray:
is_single = isinstance(texts, str)
if is_single:
texts = [texts]
embeddings = self._model.encode(texts, convert_to_numpy=True)
if is_single:
return embeddings[0]
return embeddings
def encode_batch(self, texts: list[str], batch_size: int = 32) -> np.ndarray:
return self._model.encode(texts, batch_size=batch_size, convert_to_numpy=True)
@property
def embedding_dim(self) -> int:
return self._embedding_dim

View File

@ -0,0 +1,45 @@
from __future__ import annotations
from threading import RLock
from typing import Any, Iterator
from llama_cpp import Llama
class OrchestratorAdapter:
def __init__(self, llm: Llama, system_prompt: str | None = None, lock: RLock | None = None) -> None:
self._llm = llm
self._lock = lock or RLock()
self._system_prompt = system_prompt or (
"You are an expert orchestrator for a local AI agent system. "
"Your role is to analyze the user's task, decide whether planning is needed."
)
self._temperature = 0.2
def generate(self, prompt: str, max_tokens: int | None = None) -> str:
messages = [
{"role": "system", "content": self._system_prompt},
{"role": "user", "content": prompt},
]
with self._lock:
output = self._llm.create_chat_completion(
messages=messages,
max_tokens=max_tokens or 512,
temperature=self._temperature,
)
return output["choices"][0]["message"]["content"]
def stream(self, prompt: str, max_tokens: int | None = None) -> Iterator[str]:
messages = [
{"role": "system", "content": self._system_prompt},
{"role": "user", "content": prompt},
]
with self._lock:
for chunk in self._llm.create_chat_completion(
messages=messages,
max_tokens=max_tokens or 512,
temperature=self._temperature,
stream=True,
):
content = chunk["choices"][0].get("delta", {}).get("content")
if content:
yield content

View File

@ -0,0 +1,2 @@
"""Permission and approval handling."""

View File

@ -0,0 +1,67 @@
from __future__ import annotations
import sqlite3
from pathlib import Path
from app.core.contracts import PermissionDecision
class SQLiteApprovalStore:
"""Stores persistent user approval decisions."""
def __init__(self, db_path: str | Path) -> None:
self._db_path = Path(db_path)
self._db_path.parent.mkdir(parents=True, exist_ok=True)
self._initialize()
def save(self, decision: PermissionDecision) -> PermissionDecision:
with sqlite3.connect(self._db_path) as conn:
conn.execute(
"""
INSERT OR REPLACE INTO approvals (action_type, pattern, decision, created_at)
VALUES (?, ?, ?, ?)
""",
(
decision.action_type,
decision.pattern,
decision.decision,
decision.created_at.isoformat(),
),
)
conn.commit()
return decision
def load(self, action_type: str, pattern: str) -> PermissionDecision | None:
with sqlite3.connect(self._db_path) as conn:
row = conn.execute(
"""
SELECT action_type, pattern, decision, created_at
FROM approvals
WHERE action_type = ? AND pattern = ?
""",
(action_type, pattern),
).fetchone()
if not row:
return None
return PermissionDecision(
action_type=row[0],
pattern=row[1],
decision=row[2],
created_at=row[3],
)
def _initialize(self) -> None:
with sqlite3.connect(self._db_path) as conn:
conn.execute(
"""
CREATE TABLE IF NOT EXISTS approvals (
action_type TEXT NOT NULL,
pattern TEXT NOT NULL,
decision TEXT NOT NULL,
created_at TEXT NOT NULL,
PRIMARY KEY (action_type, pattern)
)
"""
)
conn.commit()

2
app/runtime/__init__.py Normal file
View File

@ -0,0 +1,2 @@
"""Runtime loop and execution coordination."""

View File

@ -0,0 +1,148 @@
from __future__ import annotations
import asyncio
from app.core.context_builder import ContextBuilder
from app.core.contracts import ExecutionDirective, PermissionDecision, PermissionRequest, RuntimeEvent, TaskCheckpoint, UserTask
from app.core.execution_engine import ExecutionEngine
from app.core.async_router import AsyncRouter
from app.events.event_bus import EventBus
from app.events.event_types import CHECKPOINT_SAVED, CONTEXT_BUILT, TASK_AWAITING_PERMISSION, TASK_COMPLETED, TASK_FAILED, TASK_RECEIVED
from app.core.permission_service import PermissionService
from app.state.checkpoint_store import SQLiteCheckpointStore
from app.state.task_state_store import SQLiteTaskStateStore
class AsyncRuntimeLoop:
"""Async runtime loop using LLM orchestrator."""
def __init__(
self,
event_bus: EventBus,
task_state_store: SQLiteTaskStateStore,
checkpoint_store: SQLiteCheckpointStore,
context_builder: ContextBuilder,
router: AsyncRouter,
execution_engine: ExecutionEngine,
permission_service: PermissionService,
memory_interface=None,
) -> None:
self._event_bus = event_bus
self._task_state_store = task_state_store
self._checkpoint_store = checkpoint_store
self._context_builder = context_builder
self._router = router
self._execution_engine = execution_engine
self._permission_service = permission_service
self._memory_interface = memory_interface
async def run_task(self, task: UserTask) -> dict[str, object]:
state = self._task_state_store.create_task(
task.task_id,
{
"status": "received",
"session_id": task.session_id,
"plan": None,
"task_input": task.input,
"task_context": task.context,
},
)
self._publish(task, TASK_RECEIVED, {"status": "received"})
checkpoint = TaskCheckpoint(task_id=task.task_id, status="received")
self._checkpoint_store.save(checkpoint)
self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
context = self._context_builder.build(task=task, checkpoint=checkpoint)
self._publish(task, CONTEXT_BUILT, {"keys": sorted(context.keys())})
directive = await self._router.decide(state=state, context=context, task_id=task.task_id, session_id=task.session_id)
execution_result = await asyncio.to_thread(
self._execution_engine.execute,
task=task,
directive=directive,
)
state_patch = {"status": execution_result["status"], "last_directive": directive.model_dump(mode="json")}
if execution_result["status"] == "awaiting_permission":
state_patch["pending_permission_request"] = execution_result["result"].get("permission_request")
self._task_state_store.update_task(task.task_id, state_patch)
status = execution_result["status"]
if status == "completed":
self._publish(task, TASK_COMPLETED, {"directive": directive.model_dump(mode="json"), "execution_result": execution_result["result"]})
elif status == "failed":
self._publish(task, TASK_FAILED, {"error": execution_result.get("result", {}).get("error")})
checkpoint.status = status
self._checkpoint_store.save(checkpoint)
self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
# Save task and result to memory for session context
self._save_to_memory(task, execution_result, status)
return {
"task_id": task.task_id,
"status": status,
"directive": directive.model_dump(mode="json"),
"result": execution_result.get("result"),
"events": list(self._event_bus.get_task_events(task.task_id)),
}
def _publish(self, task: UserTask, event_type: str, payload: dict) -> None:
if not self._event_bus:
return
event = RuntimeEvent(
task_id=task.task_id,
session_id=task.session_id,
sequence=self._event_bus.next_sequence(task.task_id),
type=event_type,
payload=payload,
)
self._event_bus.publish(event)
def _save_to_memory(self, task: UserTask, execution_result: dict, status: str) -> None:
"""Save task input and result to memory for session context."""
if not self._memory_interface:
return
try:
# Save task input as summary
self._memory_interface.insert(
text=f"User request: {task.input}",
kind="summary",
source="user",
task_id=task.task_id,
session_id=task.session_id,
weight=0.8,
metadata={"status": status},
)
# Save execution result
result_text = ""
if status == "completed":
step_results = execution_result.get("result", {}).get("step_results", [])
if step_results:
for step in step_results:
tool_result = step.get("result", {}).get("result", {})
if tool_result.get("output"):
result_text += f" | {step.get('step_id')}: {tool_result.get('output')[:200]}"
elif status == "failed":
result_text = f" | Error: {execution_result.get('result', {}).get('error', 'Unknown')}"
if result_text:
self._memory_interface.insert(
text=f"Result: {status}{result_text}",
kind="tool_result",
source="system",
task_id=task.task_id,
session_id=task.session_id,
weight=0.7,
metadata={"status": status},
)
except Exception as e:
import logging
logging.getLogger(__name__).warning(f"Failed to save to memory: {e}")

View File

@ -0,0 +1,462 @@
from __future__ import annotations
import json
from threading import RLock
from pathlib import Path
from app.core.config import AppConfig, load_app_config
from app.core.context_builder import ContextBuilder
from app.core.contracts import UserTask
from app.core.execution_engine import ExecutionEngine
from app.core.execution_scheduler import ExecutionScheduler
from app.core.async_router import AsyncRouter
from app.events.event_bus import EventBus
from app.events.event_store import SQLiteEventStore
from app.memory import MemoryInterface, MemoryStore, VectorIndex
from app.memory.write_policy import MemoryWritePolicy
from app.models import (
CoderAdapter,
CriticAdapter,
EmbeddingsAdapter,
OrchestratorAdapter,
create_adapter,
)
from app.models.async_adapters import AsyncOrchestratorAdapter, AsyncCriticAdapter, AsyncCoderAdapter
from app.permissions.approval_store import SQLiteApprovalStore
from app.core.permission_service import PermissionService
from app.runtime.runtime_loop import RuntimeLoop
from app.state.checkpoint_store import SQLiteCheckpointStore
from app.state.task_state_store import SQLiteTaskStateStore
from app.tools.file_read import FileReadTool
from app.tools.file_write import FileWriteTool
from app.tools.registry import ToolRegistry
from app.tools.sandbox import ToolSandbox
from app.tools.shell_exec import ShellExecTool
from app.tools.memory_tools import MemoryInsertTool, MemorySearchTool, MemoryListTool
class RuntimeController:
"""Composition root for the ducklm runtime."""
def __init__(self, base_dir: str | Path | None = None) -> None:
self.base_dir = Path(base_dir or Path(__file__).resolve().parents[2])
self.config: AppConfig = load_app_config(self.base_dir / "config")
self.event_bus = EventBus(
SQLiteEventStore(self.base_dir / "data" / "events" / "events.sqlite3")
)
self.task_state_store = SQLiteTaskStateStore(
self.base_dir / "data" / "state" / "task_state.sqlite3"
)
self.checkpoint_store = SQLiteCheckpointStore(
self.base_dir / "data" / "state" / "checkpoints.sqlite3"
)
self.approval_store = SQLiteApprovalStore(
self.base_dir / "data" / "permissions" / "approvals.sqlite3"
)
self._thinker: OrchestratorAdapter | None = None
self._json_compiler: OrchestratorAdapter | None = None
self._orchestrator: OrchestratorAdapter | None = None
self._coder: CoderAdapter | None = None
self._critic: CriticAdapter | None = None
self._sys_util: OrchestratorAdapter | None = None
self._model_cache: dict[tuple[object, ...], tuple[object, RLock]] = {}
self._memory_interface: MemoryInterface | None = None
self._memory_policy: MemoryWritePolicy | None = None
self.tool_registry = None
self.tool_sandbox = None
self._init_models()
self._init_memory()
runtime_config = self.config.runtime
self.tool_sandbox = ToolSandbox(
allowed_root=self.base_dir,
timeout_ms=runtime_config.step_timeout_ms,
)
self.tool_registry = self._create_tool_registry()
context_config = {
"max_context_tokens": runtime_config.max_context_tokens,
"context_budgets": runtime_config.context_budgets,
"reserve_for_generation_pct": runtime_config.reserve_for_generation_pct,
}
self.context_builder = ContextBuilder(
memory_interface=self._memory_interface,
tool_registry=self.tool_registry,
config=context_config,
)
self._prompts = self._load_prompts()
# ensure sys_util prompt is present in prompts dict for router
# ensure sys_util prompt is available to router (prompts.json may have "sys_util" key)
if "sys_util" not in self._prompts and "prompts" in self.config:
self._prompts["sys_util"] = self.config.get("sys_util")
self.context_builder = ContextBuilder(
memory_interface=self._memory_interface,
tool_registry=self.tool_registry,
config=context_config,
)
self.router = AsyncRouter(
thinker=None,
json_compiler=None,
intent_parser=None,
prompts=self._prompts,
event_bus=self.event_bus,
tool_registry=self.tool_registry,
retry_limit=runtime_config.orchestrator_retry_limit,
debug=runtime_config.debug if hasattr(runtime_config, 'debug') else False,
log_length=runtime_config.debug_orchestrator_log_length if hasattr(runtime_config, 'debug_orchestrator_log_length') else 500,
json_fix_retry_limit=runtime_config.json_fix_retry_limit if hasattr(runtime_config, 'json_fix_retry_limit') else 2,
json_fix_use_sys_util=runtime_config.json_fix_use_sys_util if hasattr(runtime_config, "json_fix_use_sys_util") else True,
intent_classifier=runtime_config.intent_classifier if hasattr(runtime_config, "intent_classifier") else "thinker",
)
self.permission_service = PermissionService(
config=self._load_permissions_config(),
)
self.execution_engine = ExecutionEngine(
event_bus=self.event_bus,
tool_registry=self.tool_registry,
permission_service=self.permission_service,
scheduler=ExecutionScheduler(
retry_limit=runtime_config.planner_retry_limit
),
critic=self._critic,
memory_policy=self._memory_policy,
memory_interface=self._memory_interface,
prompts=self._prompts,
)
self.runtime_loop = RuntimeLoop(
event_bus=self.event_bus,
task_state_store=self.task_state_store,
checkpoint_store=self.checkpoint_store,
context_builder=self.context_builder,
router=self.router,
execution_engine=self.execution_engine,
permission_service=self.permission_service,
memory_interface=self._memory_interface,
)
def _load_prompts(self) -> dict[str, str]:
prompts_dir = self.base_dir / "config" / "prompts"
prompts = {}
if prompts_dir.is_dir():
for md_file in prompts_dir.glob("*.md"):
role = md_file.stem
prompts[role] = md_file.read_text(encoding="utf-8")
if prompts:
return prompts
prompts_file = self.base_dir / "config" / "prompts.json"
if prompts_file.exists():
with open(prompts_file) as f:
return json.load(f)
return {}
def _load_permissions_config(self) -> dict:
permissions_file = self.base_dir / "config" / "permissions.json"
if not permissions_file.exists():
return {}
with permissions_file.open("r", encoding="utf-8") as handle:
return json.load(handle)
def _init_models(self) -> None:
try:
memory_config = self.config.runtime.memory_thresholds or {}
if memory_config:
self._memory_policy = MemoryWritePolicy(
store_threshold=memory_config.get("default_store_weight", 0.8),
)
print("Models policy ready")
except Exception as e:
print(f"Models init failed: {e}")
def load_models_at_startup(self) -> None:
"""Load all LLM models synchronously. Called from startup hook in executor."""
import os
os.chdir(str(self.base_dir / "models"))
try:
print("Loading thinker model...")
thinker_config = self.config.models.thinker or {}
if thinker_config.get("path"):
llm, lock = self._get_or_create_llm("thinker", thinker_config)
self._thinker = OrchestratorAdapter(llm, system_prompt=self._prompts.get("thinker"), lock=lock)
print(f"Thinker loaded: {self._thinker} (model: {thinker_config.get("path")})")
print("Loading json_compiler model...")
compiler_config = self.config.models.json_compiler or {}
if compiler_config.get("path"):
llm, lock = self._get_or_create_llm("json_compiler", compiler_config)
self._json_compiler = OrchestratorAdapter(llm, system_prompt=self._prompts.get("json_compiler"), lock=lock)
print(f"JSON Compiler loaded: {self._json_compiler} (model: {compiler_config.get("path")})")
print("Loading coder model...")
coder_config = self.config.models.coder or {}
if coder_config.get("path"):
llm, lock = self._get_or_create_llm("coder", coder_config)
self._coder = CoderAdapter(llm, system_prompt=self._prompts.get("coder"), lock=lock)
print(f"Coder loaded: {self._coder} (model: {coder_config.get("path")})")
print("Loading critic model...")
critic_config = self.config.models.critic or {}
if critic_config.get("path"):
llm, lock = self._get_or_create_llm("critic", critic_config)
self._critic = CriticAdapter(llm, system_prompt=self._prompts.get("critic"), lock=lock)
print(f"Critic loaded: {self._critic} (model: {critic_config.get("path")})")
print("Loading sys_util model...")
sys_util_config = self.config.models.sys_util or {}
if sys_util_config.get("path"):
llm, lock = self._get_or_create_llm("sys_util", sys_util_config)
self._sys_util = OrchestratorAdapter(llm, system_prompt=self._prompts.get("sys_util"), lock=lock)
print(f"Sys_util loaded: {self._sys_util} (model: {sys_util_config.get("path")})")
print("All models loaded successfully")
async_thinker = AsyncOrchestratorAdapter(self._thinker) if self._thinker else None
async_compiler = AsyncOrchestratorAdapter(self._json_compiler) if self._json_compiler else None
async_coder = AsyncCoderAdapter(self._coder) if self._coder else None
async_critic = AsyncCriticAdapter(self._critic) if self._critic else None
async_sys_util = AsyncOrchestratorAdapter(self._sys_util) if self._sys_util else None
self.router.set_thinker(async_thinker)
self.router.set_json_compiler(async_compiler)
self.router.set_sys_util(async_sys_util)
self.router.set_tool_registry(self.tool_registry)
if async_critic:
self.execution_engine.set_critic(async_critic)
if async_coder:
self.execution_engine.set_coder(async_coder)
except Exception as e:
print(f"Failed to load models at startup: {e}")
raise RuntimeError(f"Model loading failed: {e}") from e
def _model_cache_key(self, model_config: dict) -> tuple[object, ...]:
path = str((self.base_dir / "models" / model_config.get("path", "")).resolve())
return (
path,
model_config.get("backend", "cpu"),
model_config.get("n_gpu_layers", 0),
model_config.get("n_ctx", 4096),
)
def _get_or_create_llm(self, model_type: str, model_config: dict):
key = self._model_cache_key(model_config)
cached = self._model_cache.get(key)
if cached:
print(f"Reusing model instance: {model_config.get('path')} for {model_type}")
return cached
llm = create_adapter(model_type, model_config, self.base_dir / "models")
lock = RLock()
cached = (llm, lock)
self._model_cache[key] = cached
return cached
def _init_memory(self) -> None:
try:
emb_config = self.config.models.embeddings or {}
model_path = self.base_dir / emb_config.get("path", "models/all-MiniLM-L6-v2")
if not model_path.exists():
print(f"Memory init skipped: embeddings model not found at {model_path}")
self._memory_interface = None
return
embeddings = EmbeddingsAdapter(
model_path=model_path,
embedding_dim=emb_config.get("embedding_dim", 384),
)
store = MemoryStore(
self.base_dir / "data" / "memory" / "memory.sqlite3"
)
vector_index = VectorIndex(
index_path=self.base_dir / "data" / "memory" / "index.bin",
embedding_dim=embeddings.embedding_dim,
)
self._memory_interface = MemoryInterface(store, vector_index, embeddings)
except Exception as e:
print(f"Memory init failed: {e}")
self._memory_interface = None
def _create_tool_registry(self) -> ToolRegistry:
from app.tools.registry import ToolRegistry
from app.tools.plugins.shell_exec import Tool as ShellExecTool
from app.tools.plugins.file_read import Tool as FileReadTool
from app.tools.plugins.file_write import Tool as FileWriteTool
from app.tools.plugins.memory_tools import Tool as MemoryTool
from app.tools.discover import ToolDiscovery
registry = ToolRegistry()
tool_init_map = {
"shell_exec": lambda m: ShellExecTool(self.tool_sandbox),
"file_read": lambda m: FileReadTool(self.tool_sandbox),
"file_write": lambda m: FileWriteTool(self.tool_sandbox),
"memory": lambda m: MemoryTool(self._memory_interface),
}
discovery = ToolDiscovery()
discovered = discovery.discover()
for name, data in discovered.items():
init_fn = tool_init_map.get(name)
if init_fn:
tool = init_fn(data.get("manifest", {}))
registry.register(tool)
registry._schemas[name] = {
"description": data.get("manifest", {}).get("description", ""),
"args_schema": data.get("manifest", {}).get("args_schema", {}),
"requires_permission": data.get("manifest", {}).get("requires_permission", False),
}
print(f"Registered tool: {name}")
else:
print(f"No init mapping for tool: {name} - skipping")
return registry
@property
def orchestrator(self) -> OrchestratorAdapter | None:
return self._orchestrator
@property
def coder(self) -> CoderAdapter | None:
return self._coder
@property
def critic(self) -> CriticAdapter | None:
return self._critic
@property
def memory_interface(self) -> MemoryInterface | None:
return self._memory_interface
def _ensure_orchestrator(self) -> OrchestratorAdapter | None:
if self._orchestrator is not None:
return self._orchestrator
try:
orch_config = self.config.models.orchestrator or {}
if orch_config.get("path"):
llm, lock = self._get_or_create_llm("orchestrator", orch_config)
self._orchestrator = OrchestratorAdapter(llm, lock=lock)
except Exception as e:
print(f"Orchestrator load failed: {e}")
return self._orchestrator
def _ensure_critic(self) -> CriticAdapter | None:
if self._critic is not None:
return self._critic
try:
critic_config = self.config.models.critic or {}
if critic_config.get("path"):
llm, lock = self._get_or_create_llm("critic", critic_config)
self._critic = CriticAdapter(llm, lock=lock)
except Exception as e:
print(f"Critic load failed: {e}")
return self._critic
def handle_task(self, task: UserTask) -> dict[str, object]:
return self.runtime_loop.run_task(task)
def resolve_permission(self, task_id: str, decision: str) -> dict[str, object]:
return self.runtime_loop.resolve_permission(
task_id=task_id, decision=decision
)
def resolve_secret(self, task_id: str, secret: str) -> dict[str, object]:
return self.runtime_loop.resolve_secret(
task_id=task_id, secret=secret
)
def resolve_password(self, task_id: str, password: str) -> dict[str, object]:
return self.runtime_loop.resolve_password(
task_id=task_id, password=password
)
def handle_critic_feedback(
self,
feedback: str,
task_id: str | None = None,
session_id: str | None = None,
correctness_override: float | None = None,
usefulness_override: float | None = None,
safety_override: float | None = None,
) -> dict[str, object]:
if not self._memory_interface:
return {"status": "error", "message": "Memory not available"}
target_task_id = task_id
target_session_id = session_id
if not target_session_id and not target_task_id:
return {
"status": "error",
"message": "Either task_id or session_id must be provided",
}
if not target_session_id and target_task_id:
state = self.task_state_store.get_task(target_task_id)
if state:
target_session_id = state.get("session_id")
if not target_task_id and target_session_id:
recent_tasks = self.task_state_store.get_session_tasks(target_session_id, limit=1)
if recent_tasks:
target_task_id = recent_tasks[0]["task_id"]
min_weight = 0.3
max_weight = 0.95
user_weight = 0.9
final_weight = max(min_weight, min(max_weight, user_weight))
metadata = {
"feedback_text": feedback,
"overrides": {
"correctness": correctness_override,
"usefulness": usefulness_override,
"safety": safety_override,
},
"source": "user",
}
feedback_text = f"User feedback: {feedback}"
if correctness_override is not None:
feedback_text += f" | Correctness corrected to: {correctness_override}"
if usefulness_override is not None:
feedback_text += f" | Usefulness corrected to: {usefulness_override}"
if safety_override is not None:
feedback_text += f" | Safety corrected to: {safety_override}"
try:
self._memory_interface.insert(
text=feedback_text,
kind="critique",
source="user",
task_id=target_task_id,
session_id=target_session_id,
weight=final_weight,
metadata=metadata,
)
return {
"status": "ok",
"message": "Feedback saved",
"task_id": target_task_id,
"session_id": target_session_id,
}
except Exception as e:
return {"status": "error", "message": str(e)}

504
app/runtime/runtime_loop.py Normal file
View File

@ -0,0 +1,504 @@
from __future__ import annotations
import asyncio
from app.core.context_builder import ContextBuilder
from app.core.contracts import ExecutionDirective, PermissionDecision, PermissionRequest, RuntimeEvent, SecretRequest, TaskCheckpoint, UserTask
from app.core.execution_engine import ExecutionEngine
from app.core.async_router import AsyncRouter
from app.events.event_bus import EventBus
from app.events.event_types import CHECKPOINT_SAVED, CONTEXT_BUILT, TASK_AWAITING_INPUT, TASK_AWAITING_PERMISSION, TASK_COMPLETED, TASK_FAILED, TASK_RECEIVED
from app.core.permission_service import PermissionService
from app.state.checkpoint_store import SQLiteCheckpointStore
from app.state.task_state_store import SQLiteTaskStateStore
class RuntimeLoop:
"""Central control loop skeleton coordinating task state and events."""
def __init__(
self,
event_bus: EventBus,
task_state_store: SQLiteTaskStateStore,
checkpoint_store: SQLiteCheckpointStore,
context_builder: ContextBuilder,
router: AsyncRouter,
execution_engine: ExecutionEngine,
permission_service: PermissionService,
memory_interface=None,
) -> None:
self._event_bus = event_bus
self._task_state_store = task_state_store
self._checkpoint_store = checkpoint_store
self._context_builder = context_builder
self._router = router
self._execution_engine = execution_engine
self._permission_service = permission_service
self._memory_interface = memory_interface
def run_task(self, task: UserTask) -> dict[str, object]:
# Check input for hard-stop commands BEFORE processing
hard_stop_check = self._permission_service.check_shell_command(
task_id=task.task_id,
session_id=task.session_id,
command=task.input,
)
if hard_stop_check.get("decision") == "hard_stop":
# Immediately reject hard-stop commands
self._publish(task, TASK_RECEIVED, {"status": "received"})
checkpoint = TaskCheckpoint(task_id=task.task_id, status="received")
self._checkpoint_store.save(checkpoint)
self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
error_msg = f"⚠️ BLOCKED: {hard_stop_check.get('reason', 'Hard stop command')}"
self._publish(task, TASK_FAILED, {
"directive": {},
"execution_result": {"error": error_msg},
})
return {
"task_id": task.task_id,
"status": "failed",
"directive": {},
"result": {"error": error_msg},
"events": [e.model_dump(mode="json") for e in self._event_bus.list_for_task(task.task_id)],
}
state = self._task_state_store.create_task(
task.task_id,
{
"status": "received",
"session_id": task.session_id,
"plan": None,
"task_input": task.input,
"task_context": task.context,
},
)
self._publish(task, TASK_RECEIVED, {"status": "received"})
checkpoint = TaskCheckpoint(task_id=task.task_id, status="received")
self._checkpoint_store.save(checkpoint)
self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
context = self._context_builder.build(task=task, checkpoint=checkpoint)
self._publish(task, CONTEXT_BUILT, {"keys": sorted(context.keys())})
directive = asyncio.run(
self._router.decide(state=state, context=context, task_id=task.task_id, session_id=task.session_id)
)
execution_result = self._execution_engine.execute(task=task, directive=directive)
state_patch = {"status": execution_result["status"], "last_directive": directive.model_dump(mode="json")}
if execution_result["status"] == "awaiting_permission":
state_patch["pending_permission_request"] = execution_result["result"]["permission_request"]
state_patch["pending_secret_request"] = None
state_patch["resolved_permission_decision"] = None
elif execution_result["status"] == "awaiting_input":
state_patch["pending_permission_request"] = None
state_patch["pending_secret_request"] = execution_result["result"]["secret_request"]
state_patch["resolved_permission_decision"] = None
elif execution_result["status"] == "awaiting_password":
state_patch["pending_permission_request"] = None
state_patch["pending_secret_request"] = None
state_patch["resolved_permission_decision"] = None
state_patch["pending_password_request"] = {
"command": execution_result["result"].get("command", ""),
"reason": "Permission denied - требуется sudo пароль",
"attempts": 0,
}
else:
state_patch["pending_permission_request"] = None
state_patch["pending_secret_request"] = None
state_patch["resolved_permission_decision"] = None
self._task_state_store.update_task(task.task_id, state_patch)
final_status = str(execution_result["status"])
# For awaiting states - do NOT mark task as completed, keep it in pending state
if final_status in ("awaiting_permission", "awaiting_input", "awaiting_password"):
# Task stays in pending state, don't update to completed
pass
else:
self._task_state_store.update_task(task.task_id, {"status": final_status})
final_checkpoint = TaskCheckpoint(
task_id=task.task_id,
status=final_status,
context_snapshot=context,
)
self._checkpoint_store.save(final_checkpoint)
# Generate response after plan execution
if final_status == "completed" and execution_result.get("result", {}).get("step_results"):
# Format tool results into response
step_results = execution_result["result"]["step_results"]
response_parts = []
for step in step_results:
result_data = step.get("result", {})
tool_result = result_data.get("result", result_data)
if tool_result.get("ok") and tool_result.get("output"):
response_parts.append(tool_result["output"])
if response_parts:
# Create respond directive
response_text = "\n\n".join(response_parts)
respond_directive = ExecutionDirective(
type="respond",
payload={"text": response_text},
)
# Add to execution result
execution_result["response_directive"] = respond_directive.model_dump(mode="json")
# Map status to terminal event type
if final_status == "completed":
terminal_event_type = TASK_COMPLETED
elif final_status == "failed":
terminal_event_type = TASK_FAILED
elif final_status == "awaiting_permission":
terminal_event_type = TASK_AWAITING_PERMISSION
elif final_status == "awaiting_input":
terminal_event_type = TASK_AWAITING_INPUT
elif final_status == "awaiting_password":
terminal_event_type = TASK_AWAITING_PERMISSION
else:
terminal_event_type = TASK_FAILED
self._publish(
task,
terminal_event_type,
{
"directive": directive.model_dump(mode="json"),
"execution_result": execution_result["result"],
},
)
# Save task and result to memory for session context
self._save_to_memory(task, execution_result, final_status)
return {
"task_id": task.task_id,
"status": final_status,
"directive": directive.model_dump(mode="json"),
"result": execution_result["result"],
"events": [event.model_dump(mode="json") for event in self._event_bus.list_for_task(task.task_id)],
}
def resolve_permission(self, task_id: str, decision: str) -> dict[str, object]:
state = self._task_state_store.get_task(task_id)
if not state:
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
pending_request_payload = state.get("pending_permission_request")
last_directive_payload = state.get("last_directive")
if not pending_request_payload or not last_directive_payload:
return {"task_id": task_id, "status": "failed", "result": {"error": "No pending permission request"}}
task = UserTask(
task_id=task_id,
session_id=state["session_id"],
input=state["task_input"],
context=state.get("task_context", {}),
)
# Get command from pending request
command = pending_request_payload.get("command", "")
# Resolve permission using new service
resolved = self._permission_service.resolve_permission(
task_id=task_id,
session_id=state["session_id"],
command=command,
decision=decision,
)
if decision == "deny":
execution_result = {
"status": "failed",
"result": {
"error": "Permission denied by user.",
"permission_decision": resolved,
},
}
elif decision == "allow_with_password":
directive = ExecutionDirective.model_validate(last_directive_payload)
self._task_state_store.update_task(
task.task_id,
{
"status": "awaiting_password",
"pending_password_request": {
"command": command,
"reason": pending_request_payload.get("reason", "Требуется пароль для выполнения команды"),
"attempts": 0,
},
"pending_permission_request": None,
},
)
self._publish(task, TASK_AWAITING_PERMISSION, {
"password_required": True,
"command": command,
})
return {
"task_id": task_id,
"status": "awaiting_password",
"result": {"message": "Требуется ввод пароля"},
}
else:
directive = ExecutionDirective.model_validate(last_directive_payload)
execution_result = self._execution_engine.execute(
task=task,
directive=directive,
)
final_status = str(execution_result["status"])
if decision != "allow_with_password":
self._task_state_store.update_task(
task.task_id,
{
"status": final_status,
"pending_permission_request": None,
"pending_secret_request": execution_result["result"].get("secret_request")
if final_status == "awaiting_input"
else None,
"resolved_permission_decision": resolved,
},
)
checkpoint = TaskCheckpoint(task_id=task.task_id, status=final_status)
self._checkpoint_store.save(checkpoint)
self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
if final_status == "completed":
terminal_event_type = TASK_COMPLETED
elif final_status == "awaiting_input":
terminal_event_type = TASK_AWAITING_INPUT
elif final_status == "awaiting_permission":
terminal_event_type = TASK_AWAITING_PERMISSION
else:
terminal_event_type = TASK_FAILED
self._publish(
task,
terminal_event_type,
{
"permission_resolution": resolved.model_dump(mode="json") if hasattr(resolved, 'model_dump') else resolved,
"execution_result": execution_result["result"],
},
)
# Save to memory after permission resolution
self._save_to_memory(task, execution_result, final_status)
return {
"task_id": task.task_id,
"status": final_status,
"result": execution_result["result"],
"events": [event.model_dump(mode="json") for event in self._event_bus.list_for_task(task.task_id)],
}
def resolve_secret(self, task_id: str, secret: str) -> dict[str, object]:
state = self._task_state_store.get_task(task_id)
if not state:
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
pending_secret_payload = state.get("pending_secret_request")
last_directive_payload = state.get("last_directive")
resolved_permission_payload = state.get("resolved_permission_decision")
if not pending_secret_payload or not last_directive_payload:
return {"task_id": task_id, "status": "failed", "result": {"error": "No pending secret request"}}
if not resolved_permission_payload:
return {"task_id": task_id, "status": "failed", "result": {"error": "No resolved permission available"}}
task = UserTask(
task_id=task_id,
session_id=state["session_id"],
input=state["task_input"],
context=state.get("task_context", {}),
)
_secret_request = SecretRequest.model_validate(pending_secret_payload)
directive = ExecutionDirective.model_validate(last_directive_payload)
execution_result = self._execution_engine.execute(
task=task,
directive=directive,
permission_override=None,
secret_override=secret,
)
final_status = str(execution_result["status"])
self._task_state_store.update_task(
task.task_id,
{
"status": final_status,
"pending_secret_request": None,
"resolved_permission_decision": None,
},
)
checkpoint = TaskCheckpoint(task_id=task.task_id, status=final_status)
self._checkpoint_store.save(checkpoint)
self._publish(task, CHECKPOINT_SAVED, checkpoint.model_dump(mode="json"))
if final_status == "completed":
terminal_event_type = TASK_COMPLETED
elif final_status == "awaiting_input":
terminal_event_type = TASK_AWAITING_INPUT
elif final_status == "awaiting_permission":
terminal_event_type = TASK_AWAITING_PERMISSION
else:
terminal_event_type = TASK_FAILED
self._publish(
task,
terminal_event_type,
{
"secret_resolution": {"task_id": task_id},
"execution_result": execution_result["result"],
},
)
return {
"task_id": task.task_id,
"status": final_status,
"result": execution_result["result"],
"events": [event.model_dump(mode="json") for event in self._event_bus.list_for_task(task.task_id)],
}
def resolve_password(self, task_id: str, password: str) -> dict[str, object]:
state = self._task_state_store.get_task(task_id)
if not state:
return {"task_id": task_id, "status": "failed", "result": {"error": "Unknown task_id"}}
pending_password_payload = state.get("pending_password_request")
last_directive_payload = state.get("last_directive")
if not pending_password_payload or not last_directive_payload:
return {"task_id": task_id, "status": "failed", "result": {"error": "No pending password request"}}
current_attempt = pending_password_payload.get("attempts", 0) + 1
task = UserTask(
task_id=task_id,
session_id=state["session_id"],
input=state["task_input"],
context=state.get("task_context", {}),
)
directive = ExecutionDirective.model_validate(last_directive_payload)
execution_result = self._execution_engine.execute(
task=task,
directive=directive,
password_override=password,
)
final_status = str(execution_result["status"])
if final_status == "failed":
error_msg = execution_result.get("result", {}).get("error", "")
is_password_error = "permission denied" in error_msg.lower() or "incorrect password" in error_msg.lower()
if is_password_error and current_attempt < 3:
self._task_state_store.update_task(
task.task_id,
{
"status": "awaiting_password",
"pending_password_request": {
"command": pending_password_payload.get("command"),
"reason": pending_password_payload.get("reason"),
"attempts": current_attempt,
},
},
)
self._publish(task, TASK_AWAITING_PERMISSION, {
"password_attempt_failed": True,
"attempts": current_attempt,
"max_attempts": 3,
"message": "Неверный пароль. Попробуйте снова.",
})
return {
"task_id": task_id,
"status": "awaiting_password",
"result": {"error": "Неверный пароль", "attempts": current_attempt, "max_attempts": 3},
}
else:
self._task_state_store.update_task(
task.task_id,
{
"status": "failed",
"pending_password_request": None,
"password_attempts": current_attempt,
},
)
self._publish(task, TASK_FAILED, {
"password_failed": True,
"attempts": current_attempt,
"message": "Неверный пароль (3 попытки). Передаю решение модели.",
"execution_result": execution_result["result"],
})
return {
"task_id": task_id,
"status": "failed",
"result": {
"error": "Password failed after 3 attempts",
"attempts": current_attempt,
"message": "Пользователь 3 раза ввёл неверный пароль. Решение за вами.",
},
}
self._task_state_store.update_task(
task.task_id,
{
"status": final_status,
"pending_password_request": None,
},
)
checkpoint = TaskCheckpoint(task_id=task.task_id, status=final_status)
self._checkpoint_store.save(checkpoint)
self._publish(task, TASK_COMPLETED, {"execution_result": execution_result["result"]})
# Save to memory after password resolution
self._save_to_memory(task, execution_result, final_status)
return {
"task_id": task.task_id,
"status": final_status,
"result": execution_result["result"],
"events": [event.model_dump(mode="json") for event in self._event_bus.list_for_task(task.task_id)],
}
def _publish(self, task: UserTask, event_type: str, payload: dict[str, object]) -> None:
event = RuntimeEvent(
task_id=task.task_id,
session_id=task.session_id,
sequence=self._event_bus.next_sequence(task.task_id),
type=event_type,
payload=payload,
)
self._event_bus.publish(event)
def _save_to_memory(self, task: UserTask, execution_result: dict, status: str) -> None:
"""Save task input and result to memory for session context."""
if not self._memory_interface:
return
try:
# Save task input as summary
self._memory_interface.insert(
text=f"User request: {task.input}",
kind="summary",
source="user",
task_id=task.task_id,
session_id=task.session_id,
weight=0.8,
metadata={"status": status},
)
# Save execution result
result_text = ""
if status == "completed":
step_results = execution_result.get("result", {}).get("step_results", [])
if step_results:
for step in step_results:
tool_result = step.get("result", {}).get("result", {})
if tool_result.get("output"):
result_text += f" | {step.get('step_id')}: {tool_result.get('output')[:200]}"
elif status == "failed":
result_text = f" | Error: {execution_result.get('result', {}).get('error', 'Unknown')}"
if result_text:
self._memory_interface.insert(
text=f"Result: {status}{result_text}",
kind="tool_result",
source="system",
task_id=task.task_id,
session_id=task.session_id,
weight=0.7,
metadata={"status": status},
)
except Exception as e:
# Log but don't fail the task
import logging
logging.getLogger(__name__).warning(f"Failed to save to memory: {e}")

2
app/services/__init__.py Normal file
View File

@ -0,0 +1,2 @@
"""Shared services."""

2
app/state/__init__.py Normal file
View File

@ -0,0 +1,2 @@
"""Task state and checkpoints."""

View File

@ -0,0 +1,75 @@
from __future__ import annotations
import json
import sqlite3
from pathlib import Path
from app.core.contracts import TaskCheckpoint
class SQLiteCheckpointStore:
"""Durable checkpoint store for resumable runtime state."""
def __init__(self, db_path: str | Path) -> None:
self._db_path = Path(db_path)
self._db_path.parent.mkdir(parents=True, exist_ok=True)
self._initialize()
def save(self, checkpoint: TaskCheckpoint) -> TaskCheckpoint:
with sqlite3.connect(self._db_path) as conn:
conn.execute(
"""
INSERT OR REPLACE INTO checkpoints (
task_id, status, active_step_id, plan_snapshot_json,
context_snapshot_json, updated_at
) VALUES (?, ?, ?, ?, ?, ?)
""",
(
checkpoint.task_id,
checkpoint.status,
checkpoint.active_step_id,
json.dumps(checkpoint.plan_snapshot, default=str),
json.dumps(checkpoint.context_snapshot, default=str),
checkpoint.updated_at.isoformat(),
),
)
conn.commit()
return checkpoint
def load(self, task_id: str) -> TaskCheckpoint | None:
with sqlite3.connect(self._db_path) as conn:
row = conn.execute(
"""
SELECT task_id, status, active_step_id, plan_snapshot_json,
context_snapshot_json, updated_at
FROM checkpoints
WHERE task_id = ?
""",
(task_id,),
).fetchone()
if not row:
return None
return TaskCheckpoint(
task_id=row[0],
status=row[1],
active_step_id=row[2],
plan_snapshot=json.loads(row[3]),
context_snapshot=json.loads(row[4]),
updated_at=row[5],
)
def _initialize(self) -> None:
with sqlite3.connect(self._db_path) as conn:
conn.execute(
"""
CREATE TABLE IF NOT EXISTS checkpoints (
task_id TEXT PRIMARY KEY,
status TEXT NOT NULL,
active_step_id TEXT,
plan_snapshot_json TEXT NOT NULL,
context_snapshot_json TEXT NOT NULL,
updated_at TEXT NOT NULL
)
"""
)
conn.commit()

View File

@ -0,0 +1,77 @@
from __future__ import annotations
import json
import sqlite3
from pathlib import Path
from typing import Any
class SQLiteTaskStateStore:
"""Durable task state store for runtime lifecycle state."""
def __init__(self, db_path: str | Path) -> None:
self._db_path = Path(db_path)
self._db_path.parent.mkdir(parents=True, exist_ok=True)
self._initialize()
def create_task(self, task_id: str, initial_state: dict[str, Any]) -> dict[str, Any]:
state = dict(initial_state)
session_id = state.get("session_id")
with sqlite3.connect(self._db_path) as conn:
conn.execute(
"""
INSERT OR REPLACE INTO task_states (task_id, state_json, session_id)
VALUES (?, ?, ?)
""",
(task_id, json.dumps(state), session_id),
)
conn.commit()
return state
def get_task(self, task_id: str) -> dict[str, Any] | None:
with sqlite3.connect(self._db_path) as conn:
row = conn.execute(
"SELECT state_json FROM task_states WHERE task_id = ?",
(task_id,),
).fetchone()
return json.loads(row[0]) if row else None
def update_task(self, task_id: str, patch: dict[str, Any]) -> dict[str, Any]:
state = self.get_task(task_id) or {}
state.update(patch)
with sqlite3.connect(self._db_path) as conn:
conn.execute(
"""
INSERT OR REPLACE INTO task_states (task_id, state_json)
VALUES (?, ?)
""",
(task_id, json.dumps(state)),
)
conn.commit()
return state
def _initialize(self) -> None:
with sqlite3.connect(self._db_path) as conn:
conn.execute(
"""
CREATE TABLE IF NOT EXISTS task_states (
task_id TEXT PRIMARY KEY,
state_json TEXT NOT NULL
)
"""
)
conn.commit()
try:
conn.execute("ALTER TABLE task_states ADD COLUMN session_id TEXT")
conn.commit()
except sqlite3.OperationalError:
pass
def get_session_tasks(self, session_id: str, limit: int = 10) -> list[dict[str, Any]]:
with sqlite3.connect(self._db_path) as conn:
conn.row_factory = sqlite3.Row
rows = conn.execute(
"SELECT state_json FROM task_states WHERE session_id = ? ORDER BY rowid DESC LIMIT ?",
(session_id, limit),
).fetchall()
return [json.loads(row[0]) for row in rows]

View File

@ -0,0 +1,2 @@
"""Streaming projections."""

36
app/streaming/manager.py Normal file
View File

@ -0,0 +1,36 @@
from __future__ import annotations
import asyncio
from collections import defaultdict
from app.core.contracts import RuntimeEvent
from app.events.event_bus import EventBus
class StreamingManager:
"""Simple in-process projection from event bus to websocket consumers."""
def __init__(self, event_bus: EventBus) -> None:
self._event_bus = event_bus
self._subscribers: dict[str, list[asyncio.Queue[RuntimeEvent]]] = defaultdict(list)
self._event_bus.subscribe(self._on_event)
def replay_events(self, task_id: str) -> list[RuntimeEvent]:
return self._event_bus.list_for_task(task_id)
def subscribe(self, task_id: str) -> asyncio.Queue[RuntimeEvent]:
queue: asyncio.Queue[RuntimeEvent] = asyncio.Queue()
self._subscribers[task_id].append(queue)
return queue
def unsubscribe(self, task_id: str, queue: asyncio.Queue[RuntimeEvent]) -> None:
listeners = self._subscribers.get(task_id, [])
if queue in listeners:
listeners.remove(queue)
if not listeners and task_id in self._subscribers:
del self._subscribers[task_id]
def _on_event(self, event: RuntimeEvent) -> None:
for queue in self._subscribers.get(event.task_id, []):
queue.put_nowait(event)

2
app/tools/__init__.py Normal file
View File

@ -0,0 +1,2 @@
"""Tool registry and tool adapters."""

24
app/tools/base.py Normal file
View File

@ -0,0 +1,24 @@
from __future__ import annotations
from abc import ABC, abstractmethod
from typing import Any
from app.core.contracts import ToolResult, UserTask
class BaseTool(ABC):
name: str = ""
description: str = ""
@property
def name(self) -> str:
return getattr(self, '_name', self.__class__.__name__.replace('Tool', '').lower())
@property
def description(self) -> str:
return getattr(self, '_description', "")
@abstractmethod
def execute(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
raise NotImplementedError

83
app/tools/discover.py Normal file
View File

@ -0,0 +1,83 @@
from __future__ import annotations
import importlib
import json
import logging
from pathlib import Path
from typing import Any
logger = logging.getLogger(__name__)
PLUGINS_DIR = Path(__file__).parent / "plugins"
class ToolDiscovery:
"""Decentralized tool discovery system."""
def __init__(self, plugins_dir: Path | None = None) -> None:
self._plugins_dir = plugins_dir or PLUGINS_DIR
def discover(self) -> dict[str, Any]:
"""Discover all tools from plugins directory."""
tools = {}
if not self._plugins_dir.exists():
logger.warning(f"Plugins directory not found: {self._plugins_dir}")
return tools
for folder in self._plugins_dir.iterdir():
if not folder.is_dir():
continue
manifest_file = folder / "manifest.json"
if not manifest_file.exists():
logger.warning(f"Missing manifest.json in {folder.name}")
continue
try:
manifest = self._load_manifest(manifest_file)
tool_name = manifest.get("name", folder.name)
tools[tool_name] = {
"manifest": manifest,
"tool_class": folder.name,
}
logger.info(f"Discovered tool: {tool_name}")
except Exception as e:
logger.error(f"Failed to load tool {folder.name}: {e}")
continue
return tools
def _load_manifest(self, manifest_file: Path) -> dict[str, Any]:
with open(manifest_file) as f:
return json.load(f)
def _load_tool_class(self, tool_name: str, manifest: dict[str, Any]) -> Any:
entrypoint = manifest.get("entrypoint", "Tool")
module = importlib.import_module(f"app.tools.plugins.{tool_name}")
tool_class = getattr(module, entrypoint)
return tool_class
def get_tool_schemas(self) -> list[dict[str, Any]]:
"""Get schemas for all discovered tools."""
tools = self.discover()
schemas = []
for name, data in tools.items():
manifest = data.get("manifest", {})
schemas.append({
"name": name,
"description": manifest.get("description", ""),
"args_schema": manifest.get("args_schema", {}),
"requires_permission": manifest.get("requires_permission", False),
})
return schemas
def discover_tools() -> dict[str, Any]:
"""Convenience function for quick tool discovery."""
discovery = ToolDiscovery()
return discovery.discover()

26
app/tools/file_read.py Normal file
View File

@ -0,0 +1,26 @@
from __future__ import annotations
from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
from app.tools.sandbox import ToolSandbox
class FileReadTool(BaseTool):
name = "file_read"
def __init__(self, sandbox: ToolSandbox) -> None:
self._sandbox = sandbox
def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
path = args.get("path")
if not path:
return ToolResult(tool=self.name, ok=False, error="Missing path")
resolved = self._sandbox.ensure_path_allowed(str(path))
content = resolved.read_text(encoding="utf-8")
return ToolResult(
tool=self.name,
ok=True,
output=content,
metadata={"path": str(resolved), "size": len(content)},
)

27
app/tools/file_write.py Normal file
View File

@ -0,0 +1,27 @@
from __future__ import annotations
from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
from app.tools.sandbox import ToolSandbox
class FileWriteTool(BaseTool):
name = "file_write"
def __init__(self, sandbox: ToolSandbox) -> None:
self._sandbox = sandbox
def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
path = args.get("path")
content = str(args.get("content", ""))
if not path:
return ToolResult(tool=self.name, ok=False, error="Missing path")
resolved = self._sandbox.ensure_path_allowed(str(path))
resolved.parent.mkdir(parents=True, exist_ok=True)
resolved.write_text(content, encoding="utf-8")
return ToolResult(
tool=self.name,
ok=True,
output=f"Wrote {len(content)} bytes",
metadata={"path": str(resolved), "size": len(content)},
)

123
app/tools/memory_tools.py Normal file
View File

@ -0,0 +1,123 @@
from __future__ import annotations
import logging
from typing import Any
from app.tools.base import BaseTool
from app.core.contracts import ToolResult, UserTask
from app.tools.sandbox import ToolSandbox
logger = logging.getLogger(__name__)
class MemoryInsertTool(BaseTool):
_name = "memory_insert"
_description = "Store information in memory"
def __init__(self, sandbox: ToolSandbox, memory_interface=None) -> None:
super().__init__()
self._sandbox = sandbox
self._memory = memory_interface
def execute(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
text = args.get("text", "")
kind = args.get("kind", "fact")
source = args.get("source", "user")
weight = args.get("weight", 0.5)
if not text:
return ToolResult(tool="memory_insert", ok=False, output="", error="text is required")
if not self._memory:
return ToolResult(tool="memory_insert", ok=False, output="", error="Memory not available")
try:
entry = self._memory.insert(
text=text,
kind=kind,
source=source,
task_id=task.task_id,
session_id=task.session_id,
weight=weight,
)
return ToolResult(
tool="memory_insert",
ok=True,
output=f"Stored: {entry.id}",
metadata={"entry_id": entry.id},
)
except Exception as e:
logger.warning(f"Memory insert failed: {e}")
return ToolResult(tool="memory_insert", ok=False, output="", error=str(e))
class MemorySearchTool(BaseTool):
_name = "memory_search"
_description = "Search memory for information"
def __init__(self, sandbox: ToolSandbox, memory_interface=None) -> None:
super().__init__()
self._sandbox = sandbox
self._memory = memory_interface
def execute(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
query = args.get("query", "")
top_k = args.get("top_k", 5)
if not query:
return ToolResult(tool="memory_search", ok=False, output="", error="query is required")
if not self._memory:
return ToolResult(tool="memory_search", ok=False, output="", error="Memory not available")
try:
results = self._memory.search(query, top_k=top_k)
if not results:
return ToolResult(tool="memory_search", ok=True, output="No results found", metadata={"count": 0})
output_lines = []
for entry, score in results:
output_lines.append(f"[{score:.2f}] {entry.text[:100]}")
return ToolResult(
tool="memory_search",
ok=True,
output="\n".join(output_lines),
metadata={"count": len(results)},
)
except Exception as e:
logger.warning(f"Memory search failed: {e}")
return ToolResult(tool="memory_search", ok=False, output="", error=str(e))
class MemoryListTool(BaseTool):
_name = "memory_list"
_description = "List recent memories"
def __init__(self, sandbox: ToolSandbox, memory_interface=None) -> None:
super().__init__()
self._sandbox = sandbox
self._memory = memory_interface
def execute(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
limit = args.get("limit", 10)
if not self._memory:
return ToolResult(tool="memory_list", ok=False, output="", error="Memory not available")
try:
entries = self._memory.get_recent(limit=limit)
if not entries:
return ToolResult(tool="memory_list", ok=True, output="No memories", metadata={"count": 0})
output_lines = []
for entry in entries:
output_lines.append(f"{entry.kind}: {entry.text[:80]}")
return ToolResult(
tool="memory_list",
ok=True,
output="\n".join(output_lines),
metadata={"count": len(entries)},
)
except Exception as e:
logger.warning(f"Memory list failed: {e}")
return ToolResult(tool="memory_list", ok=False, output="", error=str(e))

View File

@ -0,0 +1,35 @@
from __future__ import annotations
from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
from app.tools.sandbox import ToolSandbox
class Tool(BaseTool):
name = "file_read"
description = "Read file contents"
def __init__(self, sandbox: ToolSandbox) -> None:
self._sandbox = sandbox
def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
path = args.get("path")
if not path:
return ToolResult(tool=self.name, ok=False, error="Missing path")
try:
resolved = self._sandbox.ensure_path_allowed(str(path))
if not resolved.exists():
return ToolResult(tool=self.name, ok=False, error=f"File not found: {path}")
content = resolved.read_text(encoding="utf-8")
return ToolResult(
tool=self.name,
ok=True,
output=content,
metadata={"path": str(resolved), "size": len(content)},
)
except PermissionError as e:
return ToolResult(tool=self.name, ok=False, error=f"Access denied: {e}")
except FileNotFoundError as e:
return ToolResult(tool=self.name, ok=False, error=f"File not found: {path}")
except Exception as e:
return ToolResult(tool=self.name, ok=False, error=f"Error: {e}")

View File

@ -0,0 +1,10 @@
{
"name": "file_read",
"version": "1.0",
"entrypoint": "Tool",
"description": "Read file contents from allowed paths",
"args_schema": {
"path": {"type": "string", "required": true, "description": "File path to read"}
},
"requires_permission": false
}

View File

@ -0,0 +1,33 @@
from __future__ import annotations
from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
from app.tools.sandbox import ToolSandbox
class Tool(BaseTool):
name = "file_write"
description = "Write content to file"
def __init__(self, sandbox: ToolSandbox) -> None:
self._sandbox = sandbox
def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
path = args.get("path")
content = str(args.get("content", ""))
if not path:
return ToolResult(tool=self.name, ok=False, error="Missing path")
try:
resolved = self._sandbox.ensure_path_allowed(str(path))
resolved.parent.mkdir(parents=True, exist_ok=True)
resolved.write_text(content, encoding="utf-8")
return ToolResult(
tool=self.name,
ok=True,
output=f"Wrote {len(content)} bytes",
metadata={"path": str(resolved), "size": len(content)},
)
except PermissionError as e:
return ToolResult(tool=self.name, ok=False, error=f"Access denied: {e}")
except Exception as e:
return ToolResult(tool=self.name, ok=False, error=f"Error: {e}")

View File

@ -0,0 +1,11 @@
{
"name": "file_write",
"version": "1.0",
"entrypoint": "Tool",
"description": "Write content to file",
"args_schema": {
"path": {"type": "string", "required": true, "description": "File path to write"},
"content": {"type": "string", "required": true, "description": "Content to write"}
},
"requires_permission": true
}

View File

@ -0,0 +1,112 @@
from __future__ import annotations
import logging
from typing import Any
from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
logger = logging.getLogger(__name__)
class Tool(BaseTool):
name = "memory"
description = "Memory operations: insert, search, list"
def __init__(self, memory_interface=None) -> None:
self._memory = memory_interface
def execute(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
action = args.get("action", "search")
if action == "insert":
return self._insert(task, args)
elif action == "search":
return self._search(task, args)
elif action == "list":
return self._list(task, args)
else:
return ToolResult(tool=self.name, ok=False, error=f"Unknown action: {action}")
def _insert(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
text = args.get("text", "")
kind = args.get("kind", "fact")
source = args.get("source", "user")
weight = args.get("weight", 0.5)
if not text:
return ToolResult(tool=self.name, ok=False, output="", error="text is required")
if not self._memory:
return ToolResult(tool=self.name, ok=False, output="", error="Memory not available")
try:
entry = self._memory.insert(
text=text,
kind=kind,
source=source,
task_id=task.task_id,
session_id=task.session_id,
weight=weight,
)
return ToolResult(
tool=self.name,
ok=True,
output=f"Stored: {entry.id}",
metadata={"entry_id": entry.id},
)
except Exception as e:
logger.warning(f"Memory insert failed: {e}")
return ToolResult(tool=self.name, ok=False, output="", error=str(e))
def _search(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
query = args.get("query", "")
top_k = args.get("top_k", 5)
if not query:
return ToolResult(tool=self.name, ok=False, output="", error="query is required")
if not self._memory:
return ToolResult(tool=self.name, ok=False, output="", error="Memory not available")
try:
results = self._memory.search(query, top_k=top_k)
if not results:
return ToolResult(tool=self.name, ok=True, output="No results found", metadata={"count": 0})
output_lines = []
for entry, score in results:
output_lines.append(f"[{score:.2f}] {entry.text[:100]}")
return ToolResult(
tool=self.name,
ok=True,
output="\n".join(output_lines),
metadata={"count": len(results)},
)
except Exception as e:
logger.warning(f"Memory search failed: {e}")
return ToolResult(tool=self.name, ok=False, output="", error=str(e))
def _list(self, task: UserTask, args: dict[str, Any]) -> ToolResult:
limit = args.get("limit", 10)
if not self._memory:
return ToolResult(tool=self.name, ok=False, output="", error="Memory not available")
try:
entries = self._memory.get_recent(limit=limit)
if not entries:
return ToolResult(tool=self.name, ok=True, output="No memories", metadata={"count": 0})
output_lines = []
for entry in entries:
output_lines.append(f"{entry.kind}: {entry.text[:80]}")
return ToolResult(
tool=self.name,
ok=True,
output="\n".join(output_lines),
metadata={"count": len(entries)},
)
except Exception as e:
logger.warning(f"Memory list failed: {e}")
return ToolResult(tool=self.name, ok=False, output="", error=str(e))

View File

@ -0,0 +1,22 @@
{
"name": "memory",
"version": "1.0",
"entrypoint": "Tool",
"description": "Memory operations: insert, search, list",
"args_schema": {
"action": {
"type": "string",
"required": true,
"description": "Action: insert, search, or list",
"enum": ["insert", "search", "list"]
},
"text": {"type": "string", "required": false, "description": "Text to store (insert)"},
"query": {"type": "string", "required": false, "description": "Query string (search)"},
"kind": {"type": "string", "required": false, "description": "Memory kind: fact, command, etc"},
"source": {"type": "string", "required": false, "description": "Source: user, system, etc"},
"weight": {"type": "number", "required": false, "description": "Memory weight 0-1"},
"top_k": {"type": "number", "required": false, "description": "Max results (search)"},
"limit": {"type": "number", "required": false, "description": "Max entries (list)"}
},
"requires_permission": false
}

View File

@ -0,0 +1,33 @@
from __future__ import annotations
from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
from app.tools.sandbox import ToolSandbox
class Tool(BaseTool):
name = "shell_exec"
description = "Execute shell commands"
def __init__(self, sandbox: ToolSandbox) -> None:
self._sandbox = sandbox
def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
command = str(args.get("command", "")).strip()
if not command:
return ToolResult(tool=self.name, ok=False, error="Missing command", metadata={"exit_code": -1})
cwd = args.get("cwd")
stdin_secret = args.get("stdin_secret")
completed = self._sandbox.run_shell(
command=command,
cwd=str(cwd) if cwd else None,
stdin_data=str(stdin_secret) if stdin_secret is not None else None,
)
output = completed.stdout if completed.returncode == 0 else completed.stderr or completed.stdout
return ToolResult(
tool=self.name,
ok=completed.returncode == 0,
output=output,
error=None if completed.returncode == 0 else f"Command failed with exit code {completed.returncode}",
metadata={"exit_code": completed.returncode},
)

View File

@ -0,0 +1,12 @@
{
"name": "shell_exec",
"version": "1.0",
"entrypoint": "Tool",
"description": "Execute shell commands in sandboxed environment",
"args_schema": {
"command": {"type": "string", "required": true, "description": "Shell command to execute"},
"cwd": {"type": "string", "required": false, "description": "Working directory"},
"stdin_secret": {"type": "string", "required": false, "description": "Data to pass via stdin"}
},
"requires_permission": true
}

61
app/tools/registry.py Normal file
View File

@ -0,0 +1,61 @@
from __future__ import annotations
import logging
from typing import Any, Callable
from app.tools.base import BaseTool
from app.tools.discover import ToolDiscovery
logger = logging.getLogger(__name__)
class ToolRegistry:
def __init__(self) -> None:
self._tools: dict[str, BaseTool] = {}
self._schemas: dict[str, dict[str, Any]] = {}
def register(self, tool: BaseTool) -> None:
self._tools[tool.name] = tool
def discover_and_init(
self,
init_factory: Callable[[dict], BaseTool] | None = None,
) -> None:
"""Discover tools from plugins and initialize them."""
discovery = ToolDiscovery()
discovered = discovery.discover()
for name, data in discovered.items():
manifest = data.get("manifest", {})
if init_factory:
tool = init_factory({"name": name, "manifest": manifest})
else:
tool_instance = data.get("instance")
if tool_instance:
self._tools[name] = tool_instance
self._schemas[name] = {
"description": manifest.get("description", ""),
"args_schema": manifest.get("args_schema", {}),
"requires_permission": manifest.get("requires_permission", False),
}
logger.info(f"Registered tool: {name}")
logger.warning(f"No init_factory provided for {name}")
def get(self, name: str) -> BaseTool:
if name not in self._tools:
raise KeyError(f"Tool {name} is not registered")
return self._tools[name]
def list_names(self) -> list[str]:
return list(self._tools.keys())
def get_schema(self, name: str) -> dict[str, Any]:
return self._schemas.get(name, {})
def list_schemas(self) -> list[dict[str, Any]]:
return [
{"name": name, **schema}
for name, schema in self._schemas.items()
]

39
app/tools/sandbox.py Normal file
View File

@ -0,0 +1,39 @@
from __future__ import annotations
import os
import subprocess
from pathlib import Path
class ToolSandbox:
"""Applies simple working directory and timeout restrictions."""
def __init__(self, allowed_root: str | Path, timeout_ms: int) -> None:
self._allowed_root = Path(allowed_root).resolve()
self._timeout_seconds = max(timeout_ms / 1000, 1)
def ensure_path_allowed(self, path: str | Path) -> Path:
resolved = Path(path).expanduser().resolve()
# Permission-first model: path is allowed if it exists
# Permission service will handle write/shell restrictions
return resolved
def run_shell(
self,
command: str,
cwd: str | Path | None = None,
stdin_data: str | None = None,
) -> subprocess.CompletedProcess[str]:
working_directory = self.ensure_path_allowed(cwd or self._allowed_root)
env = {"PATH": os.environ.get("PATH", "")}
return subprocess.run(
command,
shell=True,
cwd=str(working_directory),
env=env,
text=True,
capture_output=True,
input=stdin_data,
timeout=self._timeout_seconds,
check=False,
)

47
app/tools/shell_exec.py Normal file
View File

@ -0,0 +1,47 @@
from __future__ import annotations
from app.core.contracts import ToolResult, UserTask
from app.tools.base import BaseTool
from app.tools.sandbox import ToolSandbox
class ShellExecTool(BaseTool):
name = "shell_exec"
def __init__(self, sandbox: ToolSandbox) -> None:
self._sandbox = sandbox
def execute(self, task: UserTask, args: dict[str, object]) -> ToolResult:
command = str(args.get("command", "")).strip()
if not command:
return ToolResult(tool=self.name, ok=False, error="Missing command", metadata={"exit_code": -1})
cwd = args.get("cwd")
stdin_secret = args.get("stdin_secret")
password = args.get("password")
if password:
command = f'echo "{password}" | sudo -S {command}'
completed = self._sandbox.run_shell(
command=command,
cwd=str(cwd) if cwd else None,
stdin_data=str(stdin_secret) if stdin_secret is not None else None,
)
output = completed.stdout if completed.returncode == 0 else completed.stderr or completed.stdout
error_output = completed.stderr or completed.stdout
is_sudo_error = (
completed.returncode != 0 and
("permission denied" in error_output.lower() or
"incorrect password" in error_output.lower() or
"sudo: password incorrect" in error_output.lower() or
"wrong password" in error_output.lower())
)
return ToolResult(
tool=self.name,
ok=completed.returncode == 0,
output=output,
error=None if completed.returncode == 0 else f"Command failed with exit code {completed.returncode}",
metadata={"exit_code": completed.returncode, "needs_sudo": is_sudo_error},
)

42
config/models.json Normal file
View File

@ -0,0 +1,42 @@
{
"thinker": {
"path": "Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M.gguf",
"backend": "vulkan",
"n_gpu_layers": -1,
"max_tokens": 2048,
"temperature": 0.3
},
"json_compiler": {
"path": "gemma-4-E4B-it-Q4_K_M.gguf",
"backend": "cpu",
"n_gpu_layers": 0,
"max_tokens": 1024,
"temperature": 0.1
},
"coder": {
"path": "X-Coder-SFT-Qwen3-8B.Q6_K.gguf",
"backend": "cpu",
"n_gpu_layers": 0,
"max_tokens": 2048,
"temperature": 0.2
},
"critic": {
"path": "gemma-4-E4B-it-Q4_K_M.gguf",
"backend": "cpu",
"n_gpu_layers": 0,
"max_tokens": 1024,
"temperature": 0.1
},
"sys_util": {
"path": "Menlo_Lucy-Q4_K_M.gguf",
"backend": "cpu",
"n_gpu_layers": 0,
"max_tokens": 1024,
"temperature": 0.1
},
"embeddings": {
"path": "all-MiniLM-L6-v2",
"model_name": "sentence-transformers/all-MiniLM-L6-v2",
"embedding_dim": 384
}
}

88
config/permissions.json Normal file
View File

@ -0,0 +1,88 @@
{
"description": "Permission-first model configuration",
"settings": {
"allow_caching": true,
"cache_file": "data/runtime/allowed_commands.json",
"normalize_commands": true,
"split_chained": true
},
"command_categories": {
"hard_stop": {
"description": "Commands that are never executed - hard stop",
"allow_once": false,
"allow_always": false,
"commands": [
"rm -rf /",
"rm -rf /*",
"dd if=/dev/zero of=/dev/sd*",
"dd if=/dev/zero of=/dev/hd*",
"mkfs",
"> /dev/sd*",
"> /dev/hd*"
]
},
"no_always": {
"description": "Dangerous commands - allow once only",
"allow_once": true,
"allow_always": false,
"commands": [
"rm -rf *",
"rm -rf .*",
"curl |",
"wget -O- |",
":(){:|:&};:",
"fork",
"chmod -R 000",
"chmod -R 777",
"chown -R",
"shutdown",
"reboot",
"halt",
"init 0",
"init 6",
"telinit",
"systemctl stop",
"systemctl start",
"systemctl restart",
"service stop",
"service start",
"kill -9 -1",
"killall",
"pkill -9",
"reboot -f",
"shutdown -h now",
"poweroff",
"echo .* > /proc/",
"echo .* > /sys/"
]
},
"normal": {
"description": "Normal commands - allow once or always",
"allow_once": true,
"allow_always": true,
"commands": [
"shell_exec",
"file_write"
],
"file_extensions": [
".py",
".txt",
".json",
".md",
".yaml",
".yml",
".sh",
".bash"
]
}
},
"path_settings": {
"allow_read_outside": true,
"allow_write_paths": [
"/home/mirivlad/git/ducklm",
"/tmp"
],
"require_confirmation_for_write": true,
"require_confirmation_for_shell": true
}
}

8
config/prompts.json Normal file
View File

@ -0,0 +1,8 @@
{
"orchestrator": "You are an expert orchestrator for a local AI agent system. Your role is to analyze the user's task and generate executable runtime steps.\n\nTool selection (choose the right tool):\n- shell_exec: for running commands, checking programs exist ('which', '--version'), searching files\n- file_read: for reading contents of a file (must be existing file path)\n- file_write: for creating or updating files\n- memory: for storing or searching memory\n\nSTRICT OUTPUT FORMAT - MUST follow exactly:\n\nSingle step:\n{\"type\": \"step\", \"payload\": {\"tool\": \"shell_exec\", \"args\": {\"command\": \"...\"}}}\n{\"type\": \"step\", \"payload\": {\"tool\": \"file_read\", \"args\": {\"path\": \"...\"}}}\n{\"type\": \"step\", \"payload\": {\"tool\": \"file_write\", \"args\": {\"path\": \"...\", \"content\": \"...\"}}}\n\nMulti-step plan:\n{\"type\": \"plan\", \"payload\": {\"steps\": [{\"tool\": \"file_read\", \"args\": {\"path\": \"...\"}, \"description\": \"...\", \"depends_on\": []}]}}\n\nDirect response:\n{\"type\": \"respond\", \"payload\": {\"text\": \"...\"}}\n\nIMPORTANT:\n- Use exactly {\"type\": \"step|plan|respond\", \"payload\": {...}} format\n- Do NOT output array alone\n- Do NOT use \"kind\" - use \"type\"\n- Respond ONLY with valid JSON\n- Your response MUST be complete valid JSON - the closing brace } MUST be present\n- Do NOT truncate your response - if you cannot fit all steps, use a single step\n\nTool selection:\n- For checking if a program/command exists: use shell_exec with 'which <program>' or '<program> --version'\n- For reading file contents: use file_read with path to file (NOT command)\n- For executing any command: use shell_exec\n- Previous experience (from memory) may help - consider it but YOU decide how to proceed",
"planning": "You are a planning specialist. Generate execution plans.\n\nOutput MUST be:\n{\"type\": \"plan\", \"version\": \"1.0\", \"payload\": {\"steps\": [{\"tool\": \"\", \"args\": {}, \"description\": \"...\", \"depends_on\": []}]}}\n\nRules:\n- Each step must have unique id (auto-generated)\n- Use \"depends_on\" for step ordering\n- Use \"tool\" for tool operations\n- Respond ONLY with valid JSON",
"coder": "You are an expert code generation model.\n\nOutput format:\n{\"type\": \"code\", \"payload\": {\"language\": \"python\", \"content\": \"...\"}}\n\nOR for completion:\n{\"type\": \"respond\", \"payload\": {\"text\": \"...\"}}\n\nGenerate clean, working code. Respond ONLY with valid JSON.",
"critic": "You are a critic model. Evaluate tool execution results.\n\nScoring criteria:\n- correctness: 0-1 (does result accomplish task?)\n- usefulness: 0-1 (is result useful?)\n- safety: 0-1 (is result safe?)\n- suggest_memory: boolean (should this be stored in memory?)\n- weight: 0-1 (importance score)\n- explanation: brief reasoning\n\nOutput format:\n{\"type\": \"evaluation\", \"payload\": {\"correctness\": 0.0-1.0, \"usefulness\": 0.0-1.0, \"safety\": 0.0-1.0, \"suggest_memory\": true|false, \"weight\": 0.0-1.0, \"explanation\": \"...\"}}\n\nRespond ONLY with valid JSON.",
"system": "You are ducklm, a local AI agent runtime.\n\nSTRICT RULES:\n- You MUST strictly follow execution schemas\n- You are NOT allowed to output free-form text\n- All outputs MUST be valid JSON matching runtime contracts\n- Use exact tool names from available tool set\n\nCurrent capabilities:\n- Execute shell commands (shell_exec)\n- Read/write files (file_read, file_write)\n- Memory operations (memory)\n\nAlways respond with valid JSON.",
"sys_util": "You are a STRICT JSON repair engine inside a production AI runtime.\nYour job is ONLY to fix invalid JSON syntax.\nYou are NOT allowed to:\n- change meaning of data\n- add new fields\n- remove valid fields\n- interpret intent\n- explain anything\n- reformat structure logically\n---\nINPUT:\nYou receive a malformed or invalid JSON string.\n---\nOUTPUT RULES:\n- Output ONLY valid JSON\n- No markdown\n- No comments\n- No explanations\n- No extra text\n---\nREPAIR RULES (STRICT):\nFix ONLY syntax issues:\n- missing or extra commas\n- missing quotes\n- incorrect brackets\n- trailing commas\n- invalid escaping\n- broken strings\n- unbalanced braces\nDO NOT:\n- rename keys\n- reorder fields intentionally\n- guess missing semantic data\n- \"improve\" structure\n---\nIMPORTANT:\nIf multiple valid repairs exist:\n\u2192 choose the minimal change that makes JSON valid\n---\nOUTPUT MUST BE VALID JSON OR NOTHING ELSE\nInvalid JSON:"
}

9
config/prompts/coder.md Normal file
View File

@ -0,0 +1,9 @@
You are an expert code generation model.
Output format:
{"type": "code", "payload": {"language": "python", "content": "..."}}
OR for completion:
{"type": "respond", "payload": {"text": "..."}}
Generate clean, working code. Respond ONLY with valid JSON.

14
config/prompts/critic.md Normal file
View File

@ -0,0 +1,14 @@
You are a critic model. Evaluate tool execution results.
Scoring criteria:
- correctness: 0-1 (does result accomplish task?)
- usefulness: 0-1 (is result useful?)
- safety: 0-1 (is result safe?)
- suggest_memory: boolean (should this be stored in memory?)
- weight: 0-1 (importance score)
- explanation: brief reasoning
Output format:
{"type": "evaluation", "payload": {"correctness": 0.0-1.0, "usefulness": 0.0-1.0, "safety": 0.0-1.0, "suggest_memory": true|false, "weight": 0.0-1.0, "explanation": "..."}}
Respond ONLY with valid JSON.

View File

@ -0,0 +1,25 @@
You are a JSON Compiler. Convert semantic plan to strict JSON.
INPUT: Semantic plan from Thinker
OUTPUT: Valid JSON only
RULES:
- Convert ONLY, do not make decisions
- Do not invent tools
- Do not modify plan logic
- Do not skip steps
- Output ONLY valid JSON
AVAILABLE TOOLS:
- file_write (requires permission)
- shell_exec (execute shell commands, requires permission)
- memory (no permission needed)
- file_read (no permission needed)
- respond (just return text to user, no execution)
IMPORTANT: Use exactly "shell_exec" (not "shell") for shell commands!
OUTPUT FORMAT:
{"type": "plan", "payload": {"steps": [{"id": "1", "tool": "shell_exec", "args": {"command": "..."}, "depends_on": []}]}}
OR
{"type": "respond", "payload": {"text": "..."}}

View File

@ -0,0 +1,34 @@
You are an expert orchestrator for a local AI agent system. Your role is to analyze the user's task and generate executable runtime steps.
Tool selection (choose the right tool):
- shell_exec: for running commands, checking programs exist ('which', '--version'), searching files
- file_read: for reading contents of a file (must be existing file path)
- file_write: for creating or updating files
- memory: for storing or searching memory
STRICT OUTPUT FORMAT - MUST follow exactly:
Single step:
{"type": "step", "payload": {"tool": "shell_exec", "args": {"command": "..."}}}
{"type": "step", "payload": {"tool": "file_read", "args": {"path": "..."}}}
{"type": "step", "payload": {"tool": "file_write", "args": {"path": "...", "content": "..."}}}
Multi-step plan:
{"type": "plan", "payload": {"steps": [{"tool": "file_read", "args": {"path": "..."}, "description": "...", "depends_on": []}]}}
Direct response:
{"type": "respond", "payload": {"text": "..."}}
IMPORTANT:
- Use exactly {"type": "step|plan|respond", "payload": {...}} format
- Do NOT output array alone
- Do NOT use "kind" - use "type"
- Respond ONLY with valid JSON
- Your response MUST be complete valid JSON - the closing brace } MUST be present
- Do NOT truncate your response - if you cannot fit all steps, use a single step
Tool selection:
- For checking if a program/command exists: use shell_exec with 'which <program>' or '<program> --version'
- For reading file contents: use file_read with path to file (NOT command)
- For executing any command: use shell_exec
- Previous experience (from memory) may help - consider it but YOU decide how to proceed

View File

@ -0,0 +1,10 @@
You are a planning specialist. Generate execution plans.
Output MUST be:
{"type": "plan", "version": "1.0", "payload": {"steps": [{"tool": "", "args": {}, "description": "...", "depends_on": []}]}}
Rules:
- Each step must have unique id (auto-generated)
- Use "depends_on" for step ordering
- Use "tool" for tool operations
- Respond ONLY with valid JSON

View File

@ -0,0 +1,41 @@
You are a STRICT JSON repair engine inside a production AI runtime.
Your job is ONLY to fix invalid JSON syntax.
You are NOT allowed to:
- change meaning of data
- add new fields
- remove valid fields
- interpret intent
- explain anything
- reformat structure logically
---
INPUT:
You receive a malformed or invalid JSON string.
---
OUTPUT RULES:
- Output ONLY valid JSON
- No markdown
- No comments
- No explanations
- No extra text
---
REPAIR RULES (STRICT):
Fix ONLY syntax issues:
- missing or extra commas
- missing quotes
- incorrect brackets
- trailing commas
- invalid escaping
- broken strings
- unbalanced braces
DO NOT:
- rename keys
- reorder fields intentionally
- guess missing semantic data
- "improve" structure
---
IMPORTANT:
If multiple valid repairs exist:
→ choose the minimal change that makes JSON valid
---
OUTPUT MUST BE VALID JSON OR NOTHING ELSE
Invalid JSON:

14
config/prompts/system.md Normal file
View File

@ -0,0 +1,14 @@
You are ducklm, a local AI agent runtime.
STRICT RULES:
- You MUST strictly follow execution schemas
- You are NOT allowed to output free-form text
- All outputs MUST be valid JSON matching runtime contracts
- Use exact tool names from available tool set
Current capabilities:
- Execute shell commands (shell_exec)
- Read/write files (file_read, file_write)
- Memory operations (memory)
Always respond with valid JSON.

30
config/prompts/thinker.md Normal file
View File

@ -0,0 +1,30 @@
You are a Thinker. Analyze user task and create execution plan.
CONTEXT:
{task_summary}
{memory_context}
AVAILABLE TOOLS (injected at runtime):
{tools_json}
INSTRUCTIONS:
1. Understand what user wants
2. Create step-by-step plan in natural language
3. Choose appropriate tools from available
MODE: {mode_hint}
- If mode is "execution": create a plan with TOOL STEPS (shell_exec, file_write, etc)
- If mode is "conversation": just respond with text, NO tool execution
- If mode is "clarification_needed": ask user for clarification
OUTPUT FORMAT (SEMANTIC PLAN - NOT JSON):
For execution mode:
ПЛАН:
Шаг 1: [use tool - e.g., shell_exec]
Шаг 2: [use tool]
For conversation mode:
ОТВЕТ: [just text, no tools needed]
For clarification:
ОТВЕТ: [вопрос пользователю для уточнения]

38
config/runtime.json Normal file
View File

@ -0,0 +1,38 @@
{
"step_timeout_ms": 30000,
"task_timeout_ms": 300000,
"planner_retry_limit": 2,
"tool_retry_limit": 1,
"replan_limit": 1,
"max_execution_steps": 20,
"retrieval_top_k": 5,
"max_context_tokens": 8192,
"context_budgets": {
"system": 512,
"task": 512,
"memory": 2048,
"execution": 2048,
"tools": 1024,
"safety": 512
},
"reserve_for_generation_pct": 25,
"orchestrator_retry_limit": 2,
"memory_thresholds": {
"default_store_weight": 0.8
},
"critic_fallback_policy": "continue_without_critic",
"checkpoint_policy": {
"save_on_transition": true
},
"event_retention_policy": {
"keep_all": true
},
"streaming_settings": {
"enabled": true
},
"debug": true,
"debug_orchestrator_log_length": 500,
"json_fix_retry_limit": 2,
"json_fix_use_sys_util": true,
"intent_classifier": "thinker"
}

1
data/.gitkeep Normal file
View File

@ -0,0 +1 @@

1
data/events/.gitkeep Normal file
View File

@ -0,0 +1 @@

1
data/memory/.gitkeep Normal file
View File

@ -0,0 +1 @@

View File

@ -0,0 +1 @@

1
data/state/.gitkeep Normal file
View File

@ -0,0 +1 @@

5
main.py Normal file
View File

@ -0,0 +1,5 @@
from app.api.server import app
__all__ = ["app"]

29
pyproject.toml Normal file
View File

@ -0,0 +1,29 @@
[project]
name = "ducklm"
version = "0.1.0"
description = "Local event-driven multi-model execution runtime"
requires-python = ">=3.11"
dependencies = [
"fastapi>=0.115",
"httpx>=0.28",
"pydantic>=2.7",
"uvicorn>=0.30",
"websockets>=15.0",
"llama-cpp-python>=0.2.0",
"hnswlib>=0.8.0",
"sentence-transformers>=3.0",
"numpy>=1.26",
]
[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"
[tool.setuptools]
packages = ["app"]
[tool.setuptools.package-dir]
"" = "."
[tool.pytest.ini_options]
pythonpath = ["."]

81
scripts/server.sh Executable file
View File

@ -0,0 +1,81 @@
#!/usr/bin/env bash
set -euo pipefail
ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
VENV_PYTHON="$ROOT_DIR/.venv/bin/python"
PID_FILE="$ROOT_DIR/data/runtime/server.pid"
LOG_FILE="$ROOT_DIR/data/runtime/server.log"
HOST="${HOST:-127.0.0.1}"
PORT="${PORT:-8000}"
mkdir -p "$ROOT_DIR/data/runtime"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}$ROOT_DIR/.venv/lib/python3.13/site-packages/llama_cpp/lib"
export GGML_VULKAN=1
is_running() {
if [[ -f "$PID_FILE" ]]; then
local pid
pid="$(cat "$PID_FILE")"
if kill -0 "$pid" >/dev/null 2>&1; then
return 0
fi
fi
return 1
}
start_server() {
if is_running; then
echo "Server already running with PID $(cat "$PID_FILE")"
exit 0
fi
nohup "$VENV_PYTHON" -m uvicorn main:app --host "$HOST" --port "$PORT" >"$LOG_FILE" 2>&1 &
echo $! >"$PID_FILE"
echo "Started server on http://$HOST:$PORT with PID $(cat "$PID_FILE")"
echo "Log: $LOG_FILE"
}
stop_server() {
if ! is_running; then
echo "Server is not running"
rm -f "$PID_FILE"
exit 0
fi
local pid
pid="$(cat "$PID_FILE")"
kill "$pid"
rm -f "$PID_FILE"
echo "Stopped server PID $pid"
}
status_server() {
if is_running; then
echo "Server is running with PID $(cat "$PID_FILE") on http://$HOST:$PORT"
else
echo "Server is not running"
fi
}
case "${1:-}" in
start)
start_server
;;
stop)
stop_server
;;
restart)
stop_server || true
start_server
;;
status)
status_server
;;
logs)
touch "$LOG_FILE"
tail -n 50 -f "$LOG_FILE"
;;
*)
echo "Usage: $0 {start|stop|restart|status|logs}"
exit 1
;;
esac

View File

@ -0,0 +1,27 @@
from app.api.server import chat, health, resolve_permission, resolve_secret
from app.core.permission_resolution import PermissionResolutionRequest, SecretResolutionRequest
from app.core.contracts import UserTask
def test_health_handler() -> None:
assert health() == {"status": "ok"}
def test_chat_handler_returns_runtime_events() -> None:
body = chat(UserTask(input="hello from handler test"))
assert body["status"] == "completed"
assert body["events"][0]["type"] == "task_received"
def test_resolve_permission_handler_allows_completion() -> None:
initial = chat(UserTask(input="запусти pwd"))
if initial["status"] == "awaiting_permission":
body = resolve_permission(
PermissionResolutionRequest(task_id=initial["task_id"], decision="allow_once")
)
assert body["status"] in {"completed", "failed"}
def test_resolve_secret_handler_requires_pending_request() -> None:
body = resolve_secret(SecretResolutionRequest(task_id="missing", secret="x"))
assert body["status"] == "failed"

37
tests/test_contracts.py Normal file
View File

@ -0,0 +1,37 @@
from app.core.contracts import CriticScore, ExecutionDirective, PlanStep, UserTask
def test_user_task_defaults() -> None:
task = UserTask(input="hello")
assert task.task_id
assert task.session_id
def test_plan_step_supports_dependencies() -> None:
step = PlanStep(
id="step-1",
kind="tool",
tool="shell_exec",
description="run command",
depends_on=[],
)
assert step.tool == "shell_exec"
def test_critic_score_bounds() -> None:
score = CriticScore(
correctness=1.0,
usefulness=0.5,
safety=0.0,
memory_store=False,
weight=0.2,
explanation="ok",
)
assert score.weight == 0.2
def test_execution_directive_defaults() -> None:
directive = ExecutionDirective(type="noop")
assert directive.payload == {}
assert directive.confidence == 0.0

View File

@ -0,0 +1,25 @@
from app.core.contracts import UserTask
from app.runtime.runtime_controller import RuntimeController
def test_runtime_loop_emits_basic_events() -> None:
controller = RuntimeController()
result = controller.handle_task(UserTask(input="hello runtime"))
event_types = [event["type"] for event in result["events"]]
assert result["status"] == "completed"
assert "message" in result["result"]
assert "task_received" in event_types
assert "context_built" in event_types
assert "task_completed" in event_types
def test_runtime_loop_routes_natural_language_shell_request_to_permission_flow() -> None:
controller = RuntimeController()
result = controller.handle_task(UserTask(input="запусти sudo apt update"))
event_types = [event["type"] for event in result["events"]]
assert result["status"] == "awaiting_permission"
assert result["directive"]["type"] == "tool"
assert result["directive"]["payload"]["tool"] == "shell_exec"
assert "permission_requested" in event_types
assert "task_awaiting_permission" in event_types
assert result["result"]["error"] == "Permission required before execution."

147
tests/test_tools_flow.py Normal file
View File

@ -0,0 +1,147 @@
import json
from pathlib import Path
from app.core.contracts import UserTask
from app.runtime.runtime_controller import RuntimeController
def _write_config_tree(base_dir: Path) -> None:
(base_dir / "config").mkdir()
(base_dir / "data" / "events").mkdir(parents=True, exist_ok=True)
(base_dir / "data" / "state").mkdir(parents=True, exist_ok=True)
(base_dir / "data" / "permissions").mkdir(parents=True, exist_ok=True)
(base_dir / "models").mkdir(exist_ok=True)
configs = {
"models.json": {
"orchestrator_path": "models/llama.gguf",
"coder_path": "models/xcoder.gguf",
"critic_path": "models/gemma.gguf",
"embeddings_path": "models/all-MiniLM-L6-v2",
"inference": {},
},
"prompts.json": {
"orchestration_prompt": "",
"planning_prompt": "",
"coder_prompt": "",
"critic_prompt": "",
},
"permissions.json": {
"dangerous_commands": {"rm": "ask_always", "sudo": "ask_always"},
"sensitive_paths": ["/etc", "/usr", "/var"],
"default_approval_behavior": "ask_always",
},
"runtime.json": {
"step_timeout_ms": 5000,
"task_timeout_ms": 30000,
"planner_retry_limit": 1,
"tool_retry_limit": 0,
"replan_limit": 0,
"max_execution_steps": 5,
"retrieval_top_k": 3,
"memory_thresholds": {},
"critic_fallback_policy": "continue_without_critic",
"checkpoint_policy": {"save_on_transition": True},
"event_retention_policy": {"keep_all": True},
"streaming_settings": {"enabled": True},
},
}
for name, payload in configs.items():
(base_dir / "config" / name).write_text(json.dumps(payload), encoding="utf-8")
def test_file_write_and_read_tool_flow(tmp_path: Path) -> None:
_write_config_tree(tmp_path)
controller = RuntimeController(base_dir=tmp_path)
target = tmp_path / "notes" / "test.txt"
write_result = controller.handle_task(
UserTask(
input="write a file",
context={
"requested_tool": "file_write",
"tool_args": {"path": str(target), "content": "hello from ducklm"},
},
)
)
assert write_result["status"] == "completed"
assert target.read_text(encoding="utf-8") == "hello from ducklm"
read_result = controller.handle_task(
UserTask(
input="read the file",
context={
"requested_tool": "file_read",
"tool_args": {"path": str(target)},
},
)
)
assert read_result["status"] == "completed"
assert read_result["result"]["output"] == "hello from ducklm"
def test_shell_exec_requires_permission_for_dangerous_command(tmp_path: Path) -> None:
_write_config_tree(tmp_path)
controller = RuntimeController(base_dir=tmp_path)
result = controller.handle_task(
UserTask(
input="run dangerous shell command",
context={
"requested_tool": "shell_exec",
"tool_args": {"command": "rm -rf /tmp/nonexistent"},
},
)
)
assert result["status"] == "awaiting_permission"
assert "permission_request" in result["result"]
def test_shell_exec_allows_safe_command(tmp_path: Path) -> None:
_write_config_tree(tmp_path)
controller = RuntimeController(base_dir=tmp_path)
result = controller.handle_task(
UserTask(
input="run safe shell command",
context={
"requested_tool": "shell_exec",
"tool_args": {"command": "pwd"},
},
)
)
assert result["status"] == "completed"
assert str(tmp_path) in result["result"]["output"]
def test_permission_resolution_can_resume_task(tmp_path: Path) -> None:
_write_config_tree(tmp_path)
controller = RuntimeController(base_dir=tmp_path)
initial = controller.handle_task(
UserTask(
input="запусти sudo apt update",
)
)
assert initial["status"] == "awaiting_permission"
resumed = controller.resolve_permission(task_id=initial["task_id"], decision="deny")
assert resumed["status"] == "failed"
assert resumed["result"]["error"] == "Permission denied by user."
def test_sudo_permission_resolution_requests_secret_input(tmp_path: Path) -> None:
_write_config_tree(tmp_path)
controller = RuntimeController(base_dir=tmp_path)
initial = controller.handle_task(UserTask(input="запусти sudo apt update"))
assert initial["status"] == "awaiting_permission"
resumed = controller.resolve_permission(task_id=initial["task_id"], decision="allow_once")
assert resumed["status"] == "awaiting_input"
assert resumed["result"]["secret_request"]["kind"] == "sudo_password"
def test_secret_resolution_continues_after_pending_secret_saved(tmp_path: Path) -> None:
_write_config_tree(tmp_path)
controller = RuntimeController(base_dir=tmp_path)
initial = controller.handle_task(UserTask(input="запусти sudo apt update"))
resumed = controller.resolve_permission(task_id=initial["task_id"], decision="allow_once")
assert resumed["status"] == "awaiting_input"
final = controller.resolve_secret(task_id=initial["task_id"], secret="wrongpass")
assert final["status"] in {"completed", "failed"}
assert "error" in final["result"] or "output" in final["result"]