# DuckLM Runtime Architecture Review

## 🧠 1. System Overview

**What is runtime?**
Runtime is the execution substrate of the system: a multi-layered cognitive execution environment that orchestrates LLMs, tools, memory, and permissions into a unified agentic workflow. Concretely, it is the `RuntimeController`, which composes `RuntimeLoop`, `ExecutionEngine`, `ContextBuilder`, `AsyncRouter`, `PermissionService`, and `EventBus`.

**What is the core loop?**
The core loop is the `RuntimeLoop.run_task()` method. It receives a `UserTask`, applies permission hard-stop checks, creates task state, builds context via `ContextBuilder`, routes via `AsyncRouter` to get a `directive`, executes via `ExecutionEngine`, applies `Critic` evaluation, saves via `MemoryPolicy`, publishes `RuntimeEvent`s through `EventBus`, and returns streaming output.

**Models (Orchestrator / Coder / Critic / Utility)**
- **Orchestrator** (`OrchestratorAdapter`/`AsyncOrchestratorAdapter`): The LLM that decides between planning, calling a tool, and responding directly; it generates an `ExecutionDirective` of type `plan`, `tool`, `respond`, `fail`, etc.
- **Coder** (`CoderAdapter`/`AsyncCoderAdapter`): An LLM specialized for code generation and manipulation.
- **Critic** (`CriticAdapter`/`AsyncCriticAdapter`): Evaluates tool outputs with JSON scoring (`correctness`, `usefulness`, `safety`, `memory_store`, `weight`).
- **Utility**: The `sys_util` orchestrator, a fallback/orchestration layer for system-level operations.

**What is "truth"? (Event Store / State Store)**
- **Event Store** (`SQLiteEventStore`): Immutable append-only log of `RuntimeEvent`s per task. Source of truth for "what happened."
- **State Store** (`SQLiteTaskStateStore`): Current mutable task state (status, last_directive, pending requests). "Current truth" of task progress.
- **Checkpoint Store** (`SQLiteCheckpointStore`): Snapshots of task state + context at milestones.
- **Memory Store** (`MemoryStore` + `VectorIndex`): Long-term knowledge base with weighted entries.

---

## 🔁 2. End-to-End Flow

### High-Level Flow (as seen in logs)
```
User Input
→ Router (AsyncRouter.decide)
→ Context Builder (ContextBuilder.build)
→ Orchestrator (decides plan vs direct)
→ Plan / Direct Action
→ Execution Engine
→ Tool Layer (ToolRegistry + ToolSandbox)
→ Critic (AsyncCriticAdapter)
→ Memory Policy (MemoryWritePolicy)
→ Event Bus (SQLiteEventStore)
→ Streaming Output (via WebSocket / SSE)
```

### Conversation Flow
1. **Router** decides `plan` vs `respond` vs `tool` vs `fail` based on orchestrator output or the intent parser.
2. **Context Builder** enriches the task with memory context, tool context, execution context, and safety constraints.
3. **Orchestrator** (or direct respond) produces the initial `ExecutionDirective`.
4. **Execution Engine** schedules via `ExecutionScheduler`, then executes:
   - `plan` → parse into `PlanStep`s, build task graph, execute ready steps
   - `tool` → validate tool existence, check permissions, execute via `ToolRegistry`
   - `respond` → direct response
   - `fail` → immediate failure
5. **Tool Layer** (`ToolRegistry` + `ToolSandbox`):
   - Plugin discovery via `ToolDiscovery`
   - Manifest-based tool registration
   - Sandboxed execution with timeout
6. **Critic** evaluates tool results (if enabled) and outputs `CriticScore` JSON.
7. **Memory Policy** decides whether to insert `tool_result`, `critique`, `plan`, `fact`, `summary`, or `user_preference` into memory.
8. **Event Bus** (`EventBus`, backed by `SQLiteEventStore`) publishes each `RuntimeEvent` with sequence ordering.
9. **Streaming Output** replays events via WebSocket and sends incremental responses.
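
The directive handling in step 4 can be pictured as a dispatch table keyed on directive type. The following is a minimal sketch, not the project's `ExecutionEngine`: the handler names, the `text` and `reason` keys, and the plain-dict directive shape are all assumptions made for illustration, and only the `respond` and `fail` branches are shown.

```python
from typing import Any, Callable

# Hypothetical handler signature: takes a directive dict, returns a result dict.
Handler = Callable[[dict[str, Any]], dict[str, Any]]


def handle_respond(directive: dict[str, Any]) -> dict[str, Any]:
    # Direct response: pass the text straight through ("text" is an assumed key).
    return {"status": "completed", "result": directive.get("text", "")}


def handle_fail(directive: dict[str, Any]) -> dict[str, Any]:
    # Immediate failure with the supplied reason ("reason" is an assumed key).
    return {"status": "failed", "result": {"error": directive.get("reason", "unknown")}}


# "plan" and "tool" handlers would be registered here in the same way.
HANDLERS: dict[str, Handler] = {"respond": handle_respond, "fail": handle_fail}


def dispatch(directive: dict[str, Any]) -> dict[str, Any]:
    """Route a directive to its handler; unknown types fail closed."""
    directive_type = directive.get("type", "")
    handler = HANDLERS.get(directive_type)
    if handler is None:
        message = f"unsupported directive type: {directive_type!r}"
        return {"status": "failed", "result": {"error": message}}
    return handler(directive)


reply = dispatch({"type": "respond", "text": "hello"})
failure = dispatch({"type": "fail", "reason": "no route"})
unknown = dispatch({"type": "teleport"})
```

Failing closed on an unknown type keeps a malformed orchestrator output from silently falling through to a default branch.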

### Failure Flow
- **Invalid JSON flow**: `ExecutionScheduler.parse_plan_steps` catches `JSONDecodeError` / `ValueError` / `TypeError`, logs a warning, and returns empty steps → `plan` fails with "Failed to parse plan steps."
- **Tool failure flow**: Tool execution returns `{"status": "failed", "result": {"error": "..."}}` → `ExecutionEngine` returns a failed status → task state is updated → a `TASK_FAILED` event is published → remaining plan steps are skipped.
- **Critic failure flow**: `_evaluate_with_critic` catches the exception, logs a warning, and publishes `CRITIC_RESULT` with the error → `critic_score` is `None` → execution continues without critique.
- **Orchestrator fallback flow**: If the primary orchestrator fails or is missing, `AsyncRouter` falls back to `sys_util` (the utility orchestrator) for system-level decisions.
- **Permission denial flow**: `PermissionService.check_shell_command` / `check_write_path` returns `decision: "hard_stop"` or `decision: "deny"` → immediate failure with the blocked reason; if `decision: "prompt"` → `TASK_AWAITING_PERMISSION` state.

### Repair Flow (JSON / Tool-call)
- Repair is triggered via the `resolve_permission` or `resolve_secret` endpoints.
- Permission repair: the user provides a `decision` ("allow_once"/"allow_always"/"deny"/"ask_always") → `PermissionService.resolve_permission` → updates state → retries the original directive.
- Secret repair: the user provides a secret string → `ExecutionEngine.execute` with `secret_override` → continues execution.

---

## ⚙️ 3. Component Breakdown

### `runtime_loop` (`RuntimeLoop`)
- **Responsibility**: Central task coordination; state management; event publishing.
- **Input**: `UserTask`
- **Output**: `{"task_id", "status", "directive", "result", "events"}`
- **Must NOT do**: Direct LLM calls (delegates to router/execution_engine); bypass state store.

### `execution_engine` (`ExecutionEngine`)
- **Responsibility**: Execute directives (plan/tool/respond/fail); integrate critic; interface with tool registry.
- **Input**: `UserTask`, `ExecutionDirective`, optional `permission_override`, `secret_override`
- **Output**: `{"status", "result", "step_results"}`
- **Must NOT do**: Bypass permission checks; skip critic evaluation when enabled; leak secrets in logs.

### `scheduler` (`ExecutionScheduler`)
- **Responsibility**: Parse plan JSON, build task dependency graph, yield ready steps, detect cycles.
- **Input**: JSON plan string, `task_id`
- **Output**: `list[PlanStep]`
- **Must NOT do**: Execute anything; modify task state directly.

### `tool_registry` (`ToolRegistry`)
- **Responsibility**: Register tools and their manifests; execute via `ToolSandbox`; provide schema metadata.
- **Input**: tool name, args dict
- **Output**: `ToolResult`
- **Must NOT do**: Bypass sandbox; execute privileged host commands without sandbox.

### `event_bus` (`EventBus` → `SQLiteEventStore`)
- **Responsibility**: Append-only event persistence; sequence numbering; per-task query.
- **Input**: `RuntimeEvent`
- **Output**: event stream
- **Must NOT do**: Modify state store directly (state is separate); delete or mutate events.
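
The three responsibilities listed for the event bus (append-only persistence, sequence numbering, per-task query) fit in a few lines of `sqlite3`. This is an illustrative sketch only: the class name `InMemoryEventStore`, the table layout, and the method names are assumptions and are not taken from `SQLiteEventStore`.

```python
import json
import sqlite3
from typing import Any


class InMemoryEventStore:
    """Append-only event log with a per-task sequence number (illustrative only)."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE events ("
            " task_id TEXT NOT NULL,"
            " sequence INTEGER NOT NULL,"
            " type TEXT NOT NULL,"
            " payload TEXT NOT NULL,"
            " PRIMARY KEY (task_id, sequence))"
        )

    def append(self, task_id: str, event_type: str, payload: dict[str, Any]) -> int:
        # Next sequence is the current maximum for this task plus one.
        # The class deliberately exposes no update or delete method.
        with self._db:  # commits on success, rolls back on error
            row = self._db.execute(
                "SELECT COALESCE(MAX(sequence), 0) + 1 FROM events WHERE task_id = ?",
                (task_id,),
            ).fetchone()
            sequence = row[0]
            self._db.execute(
                "INSERT INTO events VALUES (?, ?, ?, ?)",
                (task_id, sequence, event_type, json.dumps(payload)),
            )
        return sequence

    def events_for(self, task_id: str) -> list[tuple[int, str, dict[str, Any]]]:
        # Per-task query, returned in sequence order.
        rows = self._db.execute(
            "SELECT sequence, type, payload FROM events WHERE task_id = ? ORDER BY sequence",
            (task_id,),
        ).fetchall()
        return [(seq, etype, json.loads(raw)) for seq, etype, raw in rows]


store = InMemoryEventStore()
first = store.append("task-a", "TASK_RECEIVED", {})
second = store.append("task-a", "TOOL_CALLED", {"tool": "shell_exec"})
other = store.append("task-b", "TASK_RECEIVED", {})
history = store.events_for("task-a")
```

The composite primary key on `(task_id, sequence)` means a duplicate sequence number is rejected by the database rather than by application code.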

### `memory` (`MemoryInterface` → `MemoryStore` + `VectorIndex`)
- **Responsibility**: Store/retrieve weighted memory entries; vector similarity search; integrate with context builder.
- **Input**: text, kind, source, weight, metadata
- **Output**: search results or insertion confirmation
- **Must NOT do**: Expose raw embeddings without access control; store secrets.
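
To make "weighted entries" plus "vector similarity search" concrete, here is a small pure-Python sketch. The ranking rule (cosine similarity multiplied by the entry weight) is an assumption chosen for illustration; the document does not state how `VectorIndex` combines the two, and the function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query: list[float],
    entries: list[tuple[str, list[float], float]],
    top_k: int = 3,
) -> list[str]:
    """Rank (text, embedding, weight) entries by similarity * weight (assumed rule)."""
    scored = [(cosine(query, emb) * weight, text) for text, emb, weight in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


entries = [
    ("low-weight match", [1.0, 0.0], 0.5),
    ("high-weight match", [1.0, 0.0], 0.9),
    ("unrelated", [0.0, 1.0], 1.0),
]
top_two = search([1.0, 0.0], entries, top_k=2)
```

With this rule, two entries that match the query equally well are ordered by weight, and an orthogonal entry scores zero regardless of its weight.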

---

## 🧩 4. Data Contracts

### `PlanStep`
```python
from typing import Any, Literal

# PlanStep fields
id: str
kind: Literal["tool", "coder", "memory", "respond"]
tool: str | None
args: dict[str, Any]
description: str
requires_confirmation: bool
depends_on: list[str]
```
**Real example** (from `router` prompt engineering):
`{"id":"step-0","kind":"tool","tool":"shell_exec","args":{"command":"ls -la"},"description":"List directory","requires_confirmation":false,"depends_on":[]}`
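
The example above can be loaded into a typed object in a few lines. This sketch assumes a standard-library dataclass; the source does not say whether `PlanStep` is a dataclass, a Pydantic model, or something else, and the default values and the rule that a `tool` step must name a tool are assumptions added for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional

ALLOWED_KINDS = {"tool", "coder", "memory", "respond"}


@dataclass
class PlanStep:
    id: str
    kind: str
    description: str
    tool: Optional[str] = None
    args: dict[str, Any] = field(default_factory=dict)
    requires_confirmation: bool = False
    depends_on: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject kinds outside the documented Literal.
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown step kind: {self.kind!r}")
        # Assumed rule: a tool step is meaningless without a tool name.
        if self.kind == "tool" and not self.tool:
            raise ValueError("tool steps must name a tool")


raw = (
    '{"id":"step-0","kind":"tool","tool":"shell_exec",'
    '"args":{"command":"ls -la"},"description":"List directory",'
    '"requires_confirmation":false,"depends_on":[]}'
)
step = PlanStep(**json.loads(raw))
```

Validating in `__post_init__` means a malformed step raises `ValueError` at construction time, which is one of the exception types the plan parser is described as catching.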

### `ToolCall`
```python
from typing import Any

# ToolCall fields
tool: str
args: dict[str, Any]
task_id: str
step_id: str
```
**Real log**: `TOOL_CALLED` event with `{"tool":"shell_exec","args":{"command":"pwd"},"task_id":"xyz","step_id":"step-0"}`

### `ToolResult`
```python
from typing import Any

# ToolResult fields
tool: str
ok: bool
output: Any
error: str | None
metadata: dict[str, Any]
```
**Real output**: `{"tool":"shell_exec","ok":true,"output":"/app","error":null,"metadata":{}}`

### `RuntimeEvent`
```python
from typing import Any

# RuntimeEvent fields
event_id: str
task_id: str
session_id: str
sequence: int
type: str  # e.g. TASK_RECEIVED, TOOL_CALLED, TASK_COMPLETED
payload: dict[str, Any]
causation_id: str | None
correlation_id: str
```
**Real event stream**: `TASK_RECEIVED → CONTEXT_BUILT → PLAN_STARTED → TOOL_CALLED → TOOL_COMPLETED → TASK_COMPLETED`

### `MemoryEntry`
```python
from typing import Any, Literal

# MemoryEntry fields
id: str
text: str
kind: Literal["tool_result", "plan", "critique", "fact", "summary", "user_preference"]
source: Literal["tool", "critic", "user", "system"]
weight: float
task_id: str | None
session_id: str | None
metadata: dict[str, Any]
embedding_model: str
embedding_dim: int
```
**Real insertion**: After critic evaluation, `kind="critique"`, `source="critic"`, `weight=0.85`, metadata includes scores.

---

## 🔥 5. Failure Modes

### Invalid JSON Flow
- **Trigger**: Malformed plan JSON (e.g., missing braces, non-JSON string).
- **Detection**: `parse_plan_steps` catches `JSONDecodeError` / `ValueError` / `TypeError`.
- **Result**: Warning logged, empty steps returned → `PLAN_FAILED` with `"Failed to parse plan steps from directive"`.
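
The detect-and-degrade behaviour described above can be sketched as follows. This is not the project's `parse_plan_steps`: the accepted plan shapes (a top-level list, or an object with a `steps` key) are assumptions, and the helper `plan_outcome` is hypothetical. Only the exception types and the failure message come from the text.

```python
import json
import logging
from typing import Any

logger = logging.getLogger(__name__)


def parse_plan_steps(plan_json: Any) -> list[dict[str, Any]]:
    """Return the plan's steps, or an empty list if the plan cannot be parsed."""
    try:
        data = json.loads(plan_json)  # TypeError for None, JSONDecodeError for bad text
        # Assumed shapes: either a bare list of steps or {"steps": [...]}.
        steps = data.get("steps") if isinstance(data, dict) else data
        if not isinstance(steps, list):
            raise TypeError("plan steps must be a list")
        return [dict(step) for step in steps]  # non-mapping steps raise here
    except (json.JSONDecodeError, ValueError, TypeError) as exc:
        logger.warning("could not parse plan: %s", exc)
        return []


def plan_outcome(plan_json: Any) -> dict[str, Any]:
    # Hypothetical caller: an empty step list is turned into a plan failure.
    steps = parse_plan_steps(plan_json)
    if not steps:
        return {"status": "failed", "error": "Failed to parse plan steps from directive"}
    return {"status": "ok", "steps": steps}


good = parse_plan_steps('[{"id": "step-0"}]')
wrapped = parse_plan_steps('{"steps": [{"id": "a"}]}')
broken = parse_plan_steps("{not json")
missing = parse_plan_steps(None)
outcome = plan_outcome('{"steps": 5}')
```

Because `JSONDecodeError` is a subclass of `ValueError`, listing both is redundant in Python; it is kept here only to mirror the wording of the detection step.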

### Tool Failure Flow
- **Trigger**: Tool returns `ok=False` or raises exception in sandbox.
- **Detection**: `_execute_tool` checks `tool_result.ok`.
- **Result**: Status `"failed"`, result contains `{"error": "...", "failed_step": step.id, "step_results": [...]}` → `TASK_FAILED` event; further plan steps skipped.
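
A short sketch of the stop-at-first-failure behaviour follows. The function `execute_steps`, the `run_tool` callback, and the per-step dict layout are hypothetical; the result keys `error`, `failed_step`, and `step_results` follow the description above. Steps are run sequentially here for simplicity.

```python
from typing import Any, Callable

ToolRunner = Callable[[dict[str, Any]], dict[str, Any]]


def execute_steps(steps: list[dict[str, Any]], run_tool: ToolRunner) -> dict[str, Any]:
    """Run steps in order and stop at the first one that fails."""
    step_results: list[dict[str, Any]] = []
    for step in steps:
        try:
            outcome = run_tool(step)
        except Exception as exc:  # an exception in the tool counts as a failed step
            outcome = {"ok": False, "output": None, "error": str(exc)}
        step_results.append({"step_id": step["id"], **outcome})
        if not outcome["ok"]:
            return {
                "status": "failed",
                "result": {
                    "error": outcome["error"],
                    "failed_step": step["id"],
                    "step_results": step_results,
                },
            }
    return {"status": "completed", "result": {"step_results": step_results}}


calls: list[str] = []


def fake_tool(step: dict[str, Any]) -> dict[str, Any]:
    # Stand-in tool: "s1" reports failure, everything else succeeds.
    calls.append(step["id"])
    if step["id"] == "s1":
        return {"ok": False, "output": None, "error": "boom"}
    return {"ok": True, "output": "done", "error": None}


report = execute_steps([{"id": "s0"}, {"id": "s1"}, {"id": "s2"}], fake_tool)
```

In the demo, `s2` is never invoked, which is the "further plan steps skipped" behaviour.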

### Critic Failure Flow
- **Trigger**: Critic adapter raises exception or returns non-JSON output.
- **Detection**: `_evaluate_with_critic` catches exception, logs warning.
- **Result**: Event `CRITIC_RESULT` with error payload → `critic_score = None` → execution continues without critique; memory write skipped.
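
The key property of this flow is that a critic problem never fails the task. A minimal sketch, assuming the critic is a callable that returns a JSON string; `evaluate_with_critic`, `should_write_memory`, and the three stand-in critics are hypothetical names, and event publishing is omitted.

```python
import json
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def evaluate_with_critic(
    critic: Callable[[str], str], tool_output: str
) -> Optional[dict[str, Any]]:
    """Return the parsed critic score, or None if the critic fails in any way."""
    try:
        score = json.loads(critic(tool_output))
        if not isinstance(score, dict):
            raise ValueError("critic output must be a JSON object")
        return score
    except Exception as exc:  # adapter errors and non-JSON output are both tolerated
        logger.warning("critic evaluation failed: %s", exc)
        return None


def should_write_memory(score: Optional[dict[str, Any]]) -> bool:
    # No score means no critique entry is written.
    return score is not None and bool(score.get("memory_store"))


def good_critic(_output: str) -> str:
    return '{"correctness": 0.9, "usefulness": 0.8, "safety": 1.0, "memory_store": true, "weight": 0.85}'


def broken_critic(_output: str) -> str:
    raise RuntimeError("model offline")


def chatty_critic(_output: str) -> str:
    return "Looks fine to me!"


ok_score = evaluate_with_critic(good_critic, "/app")
offline_score = evaluate_with_critic(broken_critic, "/app")
prose_score = evaluate_with_critic(chatty_critic, "/app")
```

Returning `None` instead of raising lets the caller treat "no critique" as an ordinary value.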

### Orchestrator Fallback Flow
- **Trigger**: Primary orchestrator model unavailable or returns invalid directive.
- **Detection**: `_ensure_orchestrator` returns `None`; router falls back to `sys_util` orchestrator.
- **Result**: Utility orchestrator handles system-level decisions (e.g., file operations, environment queries).

### Permission Denial Flow
- **Trigger**: `PermissionService` returns `decision: "hard_stop"` or `"deny"`.
- **Detection**: `_execute_tool` checks `permission_result`.
- **Result**: Immediate failure with `"Command blocked: ..."` → `TASK_FAILED`; no tool execution.
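
The mapping from a permission decision to the next action can be written as one small function. This is a sketch under assumptions: the `reason` key, the `"allow"` decision value, and the returned dict layout are hypothetical, while `hard_stop`, `deny`, `prompt`, `TASK_FAILED`, and `TASK_AWAITING_PERMISSION` come from this document.

```python
from typing import Any


def gate(permission_result: dict[str, Any]) -> dict[str, Any]:
    """Translate a permission decision into what the engine should do next."""
    decision = permission_result.get("decision")
    reason = permission_result.get("reason", "no reason given")  # assumed key
    if decision in ("hard_stop", "deny"):
        # Blocked outright: the tool is never executed.
        return {"status": "failed", "event": "TASK_FAILED", "error": f"Command blocked: {reason}"}
    if decision == "prompt":
        # Park the task until the user resolves the permission request.
        return {"status": "awaiting_permission", "event": "TASK_AWAITING_PERMISSION"}
    if decision == "allow":  # assumed name for the permissive decision
        return {"status": "proceed"}
    # Unknown or missing decisions fail closed.
    return {"status": "failed", "event": "TASK_FAILED", "error": f"Command blocked: unknown decision {decision!r}"}


denied = gate({"decision": "deny", "reason": "rm -rf outside workspace"})
pending = gate({"decision": "prompt"})
allowed = gate({"decision": "allow"})
garbled = gate({})
```

Treating an unrecognized decision as a denial means a bug in the permission service cannot accidentally grant access.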

---

## 🧠 6. "Decision Logic Map"

### Orchestrator vs Direct Respond
- **Use orchestrator** when: the task requires planning, multi-step tool usage, or has unknown intent. The orchestrator then decides whether to emit a `plan` or `tool` directive.
- **Direct respond** when: the intent parser classifies the task as a simple query (`TASK_RECEIVED` → `router.intent_parser` → `respond` directive) or a `respond` directive is explicitly set.

### Utility Model Call
- Invoked when the `sys_util` orchestrator is loaded (configurable). It is used for system-level operations such as environment inspection and file system queries, and as the fallback when the primary orchestrator fails.

### Retry Logic
- **Planner retry**: `ExecutionScheduler` has `retry_limit=2`; on parse/validation failure, it retries up to that limit before failing the plan.
- **Tool retry**: Not implemented natively; retry must be encoded in plan steps (`depends_on`, manual replan).
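
A bounded retry of the kind described for the planner might look like the sketch below. The helper `with_retries` is hypothetical, and it interprets `retry_limit=2` as one initial attempt plus up to two retries; the source does not say whether the limit counts attempts or retries.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(operation: Callable[[], T], retry_limit: int = 2) -> T:
    """Call operation, retrying on ValueError up to retry_limit extra times."""
    last_error: Optional[ValueError] = None
    for _attempt in range(retry_limit + 1):  # assumed: 1 initial try + retry_limit retries
        try:
            return operation()
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"gave up after {retry_limit} retries") from last_error


attempts: list[int] = []


def flaky_parse() -> str:
    # Fails on the first two calls, succeeds on the third.
    attempts.append(1)
    if len(attempts) < 3:
        raise ValueError("malformed plan")
    return "ok"


result = with_retries(flaky_parse)


def always_bad() -> str:
    raise ValueError("still malformed")


try:
    with_retries(always_bad)
    exhausted = False
except RuntimeError:
    exhausted = True
```

Only `ValueError` is retried here, so unrelated errors propagate immediately instead of being masked by the loop.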

### Plan Creation
- **Trigger**: Orchestrator output contains `{type: "plan", ...}` or explicit `plan` directive.
- **Process**: `parse_plan_steps` → `validate_no_cycles` → `build_task_graph` → ready steps execution.
- **No plan**: Orchestrator outputs `respond` or `tool` → direct execution.
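
Cycle validation and ready-step selection over `depends_on` can be illustrated with the standard library's `graphlib` (Python 3.9+). This is a sketch of the technique, not the scheduler's actual code: the function bodies and the helper `execution_waves` are assumptions, and steps are plain dicts rather than `PlanStep` objects.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Any


def build_task_graph(steps: list[dict[str, Any]]) -> dict[str, set[str]]:
    # Map each step id to the set of step ids it depends on.
    return {step["id"]: set(step.get("depends_on", [])) for step in steps}


def validate_no_cycles(steps: list[dict[str, Any]]) -> bool:
    """True if the dependency graph is acyclic."""
    try:
        TopologicalSorter(build_task_graph(steps)).prepare()
    except CycleError:
        return False
    return True


def execution_waves(steps: list[dict[str, Any]]) -> list[list[str]]:
    """Group step ids into waves; every step in a wave has all dependencies done."""
    sorter = TopologicalSorter(build_task_graph(steps))
    sorter.prepare()
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        waves.append(ready)
        sorter.done(*ready)
    return waves


diamond = [
    {"id": "a", "depends_on": []},
    {"id": "b", "depends_on": ["a"]},
    {"id": "c", "depends_on": ["a"]},
    {"id": "d", "depends_on": ["b", "c"]},
]
looped = [
    {"id": "x", "depends_on": ["y"]},
    {"id": "y", "depends_on": ["x"]},
]
waves = execution_waves(diamond)
```

Each wave corresponds to one batch of "ready steps": `b` and `c` become ready together once `a` is marked done.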

---

## 🧰 7. Tool System Architecture

### Plugin Discovery
- `ToolDiscovery` scans `app/tools/plugins/` for modules exporting `Tool` classes.
- Discovers: `shell_exec`, `file_read`, `file_write`, `memory` (search/insert/list).

### Manifest-Based Tools
- Each plugin has a `manifest.json` with:
  - `description`: human-readable description.
  - `args_schema`: JSON schema for validation.
  - `requires_permission`: boolean for privileged tools (`shell_exec`, `file_write`).
- On discovery, the registry registers the tool and stores its schema for permission checks and routing.
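
A manifest check before registration could look like the following sketch. The three field names come from the list above; the validation function, the dict-based registry, and the example `args_schema` contents are assumptions for illustration and do not reflect the real `ToolRegistry` API.

```python
from typing import Any

# Field name -> expected Python type after json.load().
REQUIRED_FIELDS: dict[str, type] = {
    "description": str,
    "args_schema": dict,
    "requires_permission": bool,
}


def validate_manifest(manifest: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the manifest is usable."""
    problems: list[str] = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in manifest:
            problems.append(f"missing field: {name}")
        elif not isinstance(manifest[name], expected):
            problems.append(f"{name} must be of type {expected.__name__}")
    return problems


def register(registry: dict[str, dict[str, Any]], tool_name: str, manifest: dict[str, Any]) -> None:
    # Refuse to register a tool whose manifest is incomplete or mistyped.
    problems = validate_manifest(manifest)
    if problems:
        raise ValueError(f"invalid manifest for {tool_name}: {'; '.join(problems)}")
    registry[tool_name] = manifest


registry: dict[str, dict[str, Any]] = {}
shell_manifest = {
    "description": "Run a shell command",
    # Hypothetical schema contents, shown only to give args_schema a shape.
    "args_schema": {"type": "object", "properties": {"command": {"type": "string"}}},
    "requires_permission": True,
}
register(registry, "shell_exec", shell_manifest)
incomplete = validate_manifest({"description": "no schema", "requires_permission": "yes"})
```

Collecting every problem before raising gives a plugin author one complete error message instead of a fix-and-retry loop.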

### Registry Bootstrap
- `RuntimeController._create_tool_registry()` initializes discovery, loads plugins, registers with init mapping (sandbox, permissions).
- Tools are initialized once at startup; `tool_registry` is shared across executions.

### Execution Isolation
- **ToolSandbox** (`ToolSandbox`):
  - Restricts filesystem to `allowed_root` (project base dir).
  - Timeout per execution (`step_timeout_ms`).
  - Blocks `sudo` without secret override; requires secret injection for sudo commands.
- **Permission gating**: `shell_exec` and `file_write` require explicit permission decision before execution.
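
Two of the sandbox rules above, confinement to `allowed_root` and blocking `sudo` without a secret, can be sketched with the standard library. The function names are hypothetical and the `sudo` check is deliberately naive (it flags any command containing a `sudo` token); the real `ToolSandbox` may work differently. Timeout enforcement is not shown.

```python
import shlex
from pathlib import Path
from typing import Optional


def is_inside_root(candidate: str, allowed_root: str) -> bool:
    """True if candidate, resolved against allowed_root, stays inside it."""
    root = Path(allowed_root).resolve()
    # Joining an absolute candidate discards root, so "/etc/passwd" stays outside.
    target = (root / candidate).resolve()
    try:
        target.relative_to(root)
    except ValueError:
        return False
    return True


def needs_secret(command: str) -> bool:
    """Naive check: any 'sudo' token means a secret is required."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unbalanced quotes: fail closed
    return "sudo" in tokens


def check_command(command: str, secret_override: Optional[str] = None) -> str:
    if needs_secret(command) and secret_override is None:
        return "blocked"
    return "allowed"


inside = is_inside_root("src/main.py", "/tmp/project")
escaped = is_inside_root("../secrets.txt", "/tmp/project")
absolute = is_inside_root("/etc/passwd", "/tmp/project")
plain = check_command("ls -la")
privileged = check_command("sudo apt update")
unlocked = check_command("sudo apt update", secret_override="hunter2")
```

Resolving both paths before comparing them is what defeats `..` traversal; a plain string-prefix check on the unresolved path would not.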