# DuckLM Runtime Architecture Review

## 🧠 1. System Overview

**What is runtime?**
Runtime is the execution substrate of the system: a multi-layered cognitive execution environment that orchestrates LLMs, tools, memory, and permissions into a unified agentic workflow. It's the `RuntimeController` that composes `RuntimeLoop`, `ExecutionEngine`, `ContextBuilder`, `AsyncRouter`, `PermissionService`, and `EventBus`.

**What is the core loop?**
The core loop is the `RuntimeLoop.run_task()` method: it receives a `UserTask`, applies permission hard-stop checks, creates task state, builds context via `ContextBuilder`, routes via `AsyncRouter` to get a `directive`, executes via `ExecutionEngine`, applies `Critic` evaluation, saves via `MemoryPolicy`, publishes `RuntimeEvent`s through `EventBus`, and returns streaming output.

**Models (Orchestrator / Coder / Critic / Utility)**

- **Orchestrator** (`OrchestratorAdapter`/`AsyncOrchestratorAdapter`): LLM that decides plan vs direct respond vs tool; generates `ExecutionDirective` of type `plan`, `tool`, `respond`, `fail`, etc.
- **Coder** (`CoderAdapter`/`AsyncCoderAdapter`): LLM specialized for code generation and manipulation.
- **Critic** (`CriticAdapter`/`AsyncCriticAdapter`): Evaluates tool outputs with JSON scoring (correctness, usefulness, safety, memory_store, weight).
- **Utility**: The `sys_util` orchestrator, a fallback/orchestration layer for system-level operations.

**What is "truth"? (Event Store / State Store)**

- **Event Store** (`SQLiteEventStore`): Immutable append-only log of `RuntimeEvent`s per task. Source of truth for "what happened."
- **State Store** (`SQLiteTaskStateStore`): Current mutable task state (status, last_directive, pending requests). "Current truth" of task progress.
- **Checkpoint Store** (`SQLiteCheckpointStore`): Snapshots of task state + context at milestones.
- **Memory Store** (`MemoryStore` + `VectorIndex`): Long-term knowledge base with weighted entries.

---

## 🔁 2. End-to-End Flow

### High-Level Flow (as seen in logs)

```
User Input
→ Router (AsyncRouter.decide)
→ Context Builder (ContextBuilder.build)
→ Orchestrator (decides plan vs direct)
→ Plan / Direct Action
→ Execution Engine
→ Tool Layer (ToolRegistry + ToolSandbox)
→ Critic (AsyncCriticAdapter)
→ Memory Policy (MemoryWritePolicy)
→ Event Bus (SQLiteEventStore)
→ Streaming Output (via WebSocket / SSE)
```

### Conversation Flow

1. **Router** decides `plan` vs `respond` vs `tool` vs `fail` based on orchestrator output or intent parser.
2. **Context Builder** enriches task with memory context, tool context, execution context, and safety constraints.
3. **Orchestrator** (or direct respond) produces the initial `ExecutionDirective`.
4. **Execution Engine** schedules via `ExecutionScheduler`, then executes:
   - `plan` → parse into `PlanStep`s, build task graph, execute ready steps
   - `tool` → validate tool existence, check permissions, execute via `ToolRegistry`
   - `respond` → direct response
   - `fail` → immediate failure
5. **Tool Layer** (`ToolRegistry` + `ToolSandbox`):
   - Plugin discovery via `ToolDiscovery`
   - Manifest-based tool registration
   - Sandboxed execution with timeout
6. **Critic** evaluates tool results (if enabled), outputs `CriticScore` JSON.
7. **Memory Policy** decides whether to insert `tool_result`, `critique`, `plan`, `fact`, `summary`, or `user_preference` into memory.
8. **Event Bus** (`SQLiteEventBus`) publishes `RuntimeEvent` with sequence ordering.
9. **Streaming Output** replays events via WebSocket and sends incremental responses.

### Failure Flow

- **Invalid JSON flow**: `ExecutionScheduler.parse_plan_steps` catches `JSONDecodeError` / `ValueError` / `TypeError`, logs warning, returns empty steps → `plan` fails with "Failed to parse plan steps."
- **Tool failure flow**: Tool execution returns `{"status": "failed", "result": {"error": "..."}}` → ExecutionEngine returns failed status → task state updated → event `TASK_FAILED` published → stops further plan steps.
- **Critic failure flow**: `_evaluate_with_critic` catches exception, logs warning, publishes `CRITIC_RESULT` with error → critic_score is `None` → execution continues without critique.
- **Orchestrator fallback flow**: If primary orchestrator fails or missing, `AsyncRouter` has `sys_util` fallback (utility orchestrator) for system-level decisions.
- **Permission denial flow**: `PermissionService.check_shell_command` / `check_write_path` returns `decision: "hard_stop"` or `decision: "deny"` → immediate failure with blocked reason; if `decision: "prompt"` → `TASK_AWAITING_PERMISSION` state.

### Repair Flow (JSON / Tool-call)

- Repair is triggered via `resolve_permission` or `resolve_secret` endpoints.
- Permission repair: user provides `decision` ("allow_once"/"allow_always"/"deny"/"ask_always") → `PermissionService.resolve_permission` → updates state → retries original directive.
- Secret repair: user provides secret string → `ExecutionEngine.execute` with `secret_override` → continues execution.

---

## ⚙️ 3. Component Breakdown

### `runtime_loop` (`RuntimeLoop`)

- **Responsibility**: Central task coordination; state management; event publishing.
- **Input**: `UserTask`
- **Output**: `{"task_id", "status", "directive", "result", "events"}`
- **Must NOT do**: Direct LLM calls (delegates to router/execution_engine); bypass state store.

### `execution_engine` (`ExecutionEngine`)

- **Responsibility**: Execute directives (plan/tool/respond/fail); integrate critic; interface with tool registry.
- **Input**: `UserTask`, `ExecutionDirective`, optional `permission_override`, `secret_override`
- **Output**: `{"status", "result", "step_results"}`
- **Must NOT do**: Bypass permission checks; skip critic evaluation when enabled; leak secrets in logs.

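
The permission outcomes described under Failure Flow constrain how `execution_engine` may proceed before any tool runs. The following is a minimal sketch of that gate, not the actual implementation: the function name `gate_tool_call`, the `reason` key, and the `"awaiting_permission"` status string are hypothetical, and only the decision values `hard_stop`, `deny`, and `prompt` and the `{"status", "result", "step_results"}` output shape come from this review.

```python
from __future__ import annotations

from typing import Any

# Decisions that this review says end the task immediately.
BLOCKING_DECISIONS = {"hard_stop", "deny"}


def gate_tool_call(permission_result: dict[str, Any]) -> dict[str, Any] | None:
    """Return a terminal result if the tool must not run, else None to proceed.

    `permission_result` is assumed to look like {"decision": ..., "reason": ...};
    the "reason" key is a hypothetical name used for illustration.
    """
    decision = permission_result.get("decision")
    reason = permission_result.get("reason", "no reason given")

    if decision in BLOCKING_DECISIONS:
        # Immediate failure with the blocked reason; the tool is never executed.
        return {
            "status": "failed",
            "result": {"error": f"Command blocked: {reason}"},
            "step_results": [],
        }
    if decision == "prompt":
        # Hypothetical status string standing in for TASK_AWAITING_PERMISSION.
        return {
            "status": "awaiting_permission",
            "result": {"reason": reason},
            "step_results": [],
        }
    # Any other decision is treated as allowed in this sketch.
    return None
```

A caller would run the tool only when the gate returns `None`, which keeps the "must not bypass permission checks" rule in one place.
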
### `scheduler` (`ExecutionScheduler`)

- **Responsibility**: Parse plan JSON, build task dependency graph, yield ready steps, detect cycles.
- **Input**: JSON plan string, `task_id`
- **Output**: `list[PlanStep]`
- **Must NOT do**: Execute anything; modify task state directly.

### `tool_registry` (`ToolRegistry`)

- **Responsibility**: Register/manifest tools; execute via `ToolSandbox`; provide schema metadata.
- **Input**: tool name, args dict
- **Output**: `ToolResult`
- **Must NOT do**: Bypass sandbox; execute privileged host commands without sandbox.

### `event_bus` (`EventBus` → `SQLiteEventStore`)

- **Responsibility**: Append-only event persistence; sequence numbering; per-task query.
- **Input**: `RuntimeEvent`
- **Output**: event stream
- **Must NOT do**: Modify state store directly (state is separate); delete or mutate events.

### `memory` (`MemoryInterface` → `MemoryStore` + `VectorIndex`)

- **Responsibility**: Store/retrieve weighted memory entries; vector similarity search; integrate with context builder.
- **Input**: text, kind, source, weight, metadata
- **Output**: search results or insertion confirmation
- **Must NOT do**: Expose raw embeddings without access control; store secrets.

---

## 🧩 4. Data Contracts

### `PlanStep`

```python
id: str
kind: Literal["tool", "coder", "memory", "respond"]
tool: str | None
args: dict[str, Any]
description: str
requires_confirmation: bool
depends_on: list[str]
```

**Real example** (from `router` prompt engineering): `{"id":"step-0","kind":"tool","tool":"shell_exec","args":{"command":"ls -la"},"description":"List directory","requires_confirmation":false,"depends_on":[]}`

### `ToolCall`

```python
tool: str
args: dict[str, Any]
task_id: str
step_id: str
```

**Real log**: `TOOL_CALLED` event with `{"tool":"shell_exec","args":{"command":"pwd"},"task_id":"xyz","step_id":"step-0"}`

### `ToolResult`

```python
tool: str
ok: bool
output: Any
error: str | None
metadata: dict[str, Any]
```

**Real output**: `{"tool":"shell_exec","ok":true,"output":"/app","error":null,"metadata":{}}`

### `RuntimeEvent`

```python
event_id: str
task_id: str
session_id: str
sequence: int
type: str  # e.g. TASK_RECEIVED, TOOL_CALLED, TASK_COMPLETED
payload: dict[str, Any]
causation_id: str | None
correlation_id: str
```

**Real event stream**: `TASK_RECEIVED → CONTEXT_BUILT → PLAN_STARTED → TOOL_CALLED → TOOL_COMPLETED → TASK_COMPLETED`

### `MemoryEntry`

```python
id: str
text: str
kind: Literal["tool_result","plan","critique","fact","summary","user_preference"]
source: Literal["tool","critic","user","system"]
weight: float
task_id: str | None
session_id: str | None
metadata: dict[str, Any]
embedding_model: str
embedding_dim: int
```

**Real insertion**: After critic evaluation, `kind="critique"`, `source="critic"`, `weight=0.85`, metadata includes scores.

---

## 🔥 5. Failure Modes

### Invalid JSON Flow

- **Trigger**: Malformed plan JSON (e.g., missing braces, non-JSON string).
- **Detection**: `parse_plan_steps` catches `JSONDecodeError` / `ValueError` / `TypeError`.
- **Result**: Warning logged, empty steps returned → `PLAN_FAILED` with `"Failed to parse plan steps from directive"`.

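
The invalid JSON flow above can be illustrated with a small, self-contained sketch. It is an approximation under stated assumptions rather than the project's code: it assumes the plan arrives as a top-level JSON array of step objects, it models `PlanStep` as a dataclass with hypothetical default values, and it narrows `kind` to a plain `str` for brevity. The exception set and the "log a warning, return empty steps" behavior are taken from this review.

```python
from __future__ import annotations

import json
import logging
from dataclasses import dataclass, field
from typing import Any

logger = logging.getLogger(__name__)


@dataclass
class PlanStep:
    # Field names follow the PlanStep contract; the defaults are assumptions.
    id: str
    kind: str
    tool: str | None = None
    args: dict[str, Any] = field(default_factory=dict)
    description: str = ""
    requires_confirmation: bool = False
    depends_on: list[str] = field(default_factory=list)


def parse_plan_steps(raw_plan: str) -> list[PlanStep]:
    """Parse a JSON array of steps; return an empty list on malformed input."""
    try:
        data = json.loads(raw_plan)
        if not isinstance(data, list):
            # Assumed shape: a top-level array of step objects.
            raise ValueError("plan must be a JSON array of steps")
        # Unknown or missing keys, or non-object items, raise TypeError here.
        return [PlanStep(**item) for item in data]
    except (json.JSONDecodeError, ValueError, TypeError) as exc:
        logger.warning("Failed to parse plan steps: %s", exc)
        return []
```

Returning an empty list instead of raising lets the caller treat "no steps" as the single signal for emitting `PLAN_FAILED`.
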
### Tool Failure Flow

- **Trigger**: Tool returns `ok=False` or raises exception in sandbox.
- **Detection**: `_execute_tool` checks `tool_result.ok`.
- **Result**: Status `"failed"`, result contains `{"error": "...", "failed_step": step.id, "step_results": [...]}` → `TASK_FAILED` event; further plan steps skipped.

### Critic Failure Flow

- **Trigger**: Critic adapter raises exception or returns non-JSON output.
- **Detection**: `_evaluate_with_critic` catches exception, logs warning.
- **Result**: Event `CRITIC_RESULT` with error payload → `critic_score = None` → execution continues without critique; memory write skipped.

### Orchestrator Fallback Flow

- **Trigger**: Primary orchestrator model unavailable or returns invalid directive.
- **Detection**: `_ensure_orchestrator` returns `None`; router falls back to `sys_util` orchestrator.
- **Result**: Utility orchestrator handles system-level decisions (e.g., file operations, environment queries).

### Permission Denial Flow

- **Trigger**: `PermissionService` returns `decision: "hard_stop"` or `"deny"`.
- **Detection**: `_execute_tool` checks `permission_result`.
- **Result**: Immediate failure with `"Command blocked: ..."` → `TASK_FAILED`; no tool execution.

---

## 🧠 6. "Decision Logic Map"

### Orchestrator vs Direct Respond

- **Use orchestrator** when: task requires planning, multi-step tool usage, or unknown intent. Orchestrator decides to emit `plan` or `tool` directive.
- **Direct respond** when: intent parser classifies as simple query (`TASK_RECEIVED` → `router.intent_parser` → `respond` directive) or `respond` directive explicitly set.

### Utility Model Call

- Invoked when `sys_util` orchestrator is loaded (configurable). Used for system-level operations: environment inspection, file system queries, or when primary orchestrator fails and fallback is needed.

### Retry Logic

- **Planner retry**: `ExecutionScheduler` has `retry_limit=2`; on parse/validation failure, retries up to limit before failing plan.
- **Tool retry**: Not implemented natively; retry must be encoded in plan steps (`depends_on`, manual replan).

### Plan Creation

- **Trigger**: Orchestrator output contains `{type: "plan", ...}` or explicit `plan` directive.
- **Process**: `parse_plan_steps` → `validate_no_cycles` → `build_task_graph` → ready steps execution.
- **No plan**: Orchestrator outputs `respond` or `tool` → direct execution.

---

## 🧰 7. Tool System Architecture

### Plugin Discovery

- `ToolDiscovery` scans `app/tools/plugins/` for modules exporting `Tool` classes.
- Discovers: `shell_exec`, `file_read`, `file_write`, `memory` (search/insert/list).

### Manifest-Based Tools

- Each plugin has a `manifest.json` with:
  - `description`: human-readable docstring.
  - `args_schema`: JSON schema for validation.
  - `requires_permission`: boolean for privileged tools (`shell_exec`, `file_write`).
- On discovery, registry registers tool and stores schema for permission/routing.

### Registry Bootstrap

- `RuntimeController._create_tool_registry()` initializes discovery, loads plugins, registers with init mapping (sandbox, permissions).
- Tools are initialized once at startup; `tool_registry` is shared across executions.

### Execution Isolation

- **ToolSandbox** (`ToolSandbox`):
  - Restricts filesystem to `allowed_root` (project base dir).
  - Timeout per execution (`step_timeout_ms`).
  - Blocks `sudo` without secret override; requires secret injection for sudo commands.
- **Permission gating**: `shell_exec` and `file_write` require explicit permission decision before execution.
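
Two of the isolation rules above, confinement to `allowed_root` and blocking `sudo` unless a secret is supplied, can be sketched with the standard library. This is an illustrative approximation and not how `ToolSandbox` is known to work: the helper names `is_within_root` and `needs_secret` are hypothetical, and the `sudo` check is deliberately conservative (it flags any token named `sudo`, even as an argument).

```python
from __future__ import annotations

import shlex
from pathlib import Path


def is_within_root(candidate: str, allowed_root: str) -> bool:
    """True if `candidate` resolves to `allowed_root` or a path beneath it."""
    root = Path(allowed_root).resolve()
    # Relative paths are interpreted against the root; absolute paths replace it.
    target = (root / candidate).resolve()
    return target == root or root in target.parents


def needs_secret(command: str) -> bool:
    """True if the command appears to invoke sudo (conservative token check)."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: treat as privileged so the caller refuses to run it.
        return True
    return any(Path(token).name == "sudo" for token in tokens)
```

Resolving both paths before comparing them is what defeats `..` traversal; a plain string-prefix check on the raw input would not.
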