ducklm/FOR_AI_REVIEW.md

# DuckLM Runtime Architecture Review

## 🧠 1. System Overview

**What is runtime?**
The runtime is the execution substrate of the system: a multi-layered cognitive execution environment that orchestrates LLMs, tools, memory, and permissions into a unified agentic workflow. Concretely, it is the `RuntimeController`, which composes `RuntimeLoop`, `ExecutionEngine`, `ContextBuilder`, `AsyncRouter`, `PermissionService`, and `EventBus`.

**What is the core loop?**
The core loop is the `RuntimeLoop.run_task()` method: it receives a `UserTask`, applies permission hard-stop checks, creates task state, builds context via `ContextBuilder`, routes via `AsyncRouter` to get a `directive`, executes via `ExecutionEngine`, applies `Critic` evaluation, saves via `MemoryPolicy`, publishes `RuntimeEvent`s through `EventBus`, and returns streaming output.

**Models (Orchestrator / Coder / Critic / Utility)**
- **Orchestrator** (`OrchestratorAdapter`/`AsyncOrchestratorAdapter`): LLM that chooses between planning, calling a tool, and responding directly; it emits an `ExecutionDirective` of type `plan`, `tool`, `respond`, `fail`, etc.
- **Coder** (`CoderAdapter`/`AsyncCoderAdapter`): LLM specialized for code generation and manipulation.
- **Critic** (`CriticAdapter`/`AsyncCriticAdapter`): Evaluates tool outputs with JSON scoring (correctness, usefulness, safety, memory_store, weight).
- **Utility**: The `sys_util` orchestrator — a fallback/orchestration layer for system-level operations.

**What is "truth"? (Event Store / State Store)**
- **Event Store** (`SQLiteEventStore`): Immutable append-only log of `RuntimeEvent`s per task. Source of truth for "what happened."
- **State Store** (`SQLiteTaskStateStore`): Current mutable task state (status, last_directive, pending requests). "Current truth" of task progress.
- **Checkpoint Store** (`SQLiteCheckpointStore`): Snapshots of task state + context at milestones.
- **Memory Store** (`MemoryStore` + `VectorIndex`): Long-term knowledge base with weighted entries.
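
The append-only, per-task sequencing behaviour described for the Event Store can be illustrated with a minimal sketch. The class name `InMemoryEventStore`, the table layout, and the method names are assumptions made for this example; they are not taken from `SQLiteEventStore` itself.

```python
import json
import sqlite3


class InMemoryEventStore:
    """Hypothetical append-only event log with per-task sequence numbers."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE events ("
            " task_id TEXT NOT NULL,"
            " sequence INTEGER NOT NULL,"
            " type TEXT NOT NULL,"
            " payload TEXT NOT NULL,"
            " PRIMARY KEY (task_id, sequence))"
        )

    def append(self, task_id: str, type_: str, payload: dict) -> int:
        # The next sequence is one past the current maximum for this task.
        row = self._db.execute(
            "SELECT COALESCE(MAX(sequence), 0) + 1 FROM events WHERE task_id = ?",
            (task_id,),
        ).fetchone()
        sequence = row[0]
        self._db.execute(
            "INSERT INTO events VALUES (?, ?, ?, ?)",
            (task_id, sequence, type_, json.dumps(payload)),
        )
        self._db.commit()
        return sequence

    def replay(self, task_id: str) -> list[tuple[int, str, dict]]:
        # Events are only ever read back in sequence order; there is no
        # update or delete method, which keeps the log immutable.
        rows = self._db.execute(
            "SELECT sequence, type, payload FROM events"
            " WHERE task_id = ? ORDER BY sequence",
            (task_id,),
        ).fetchall()
        return [(seq, type_, json.loads(raw)) for seq, type_, raw in rows]


store = InMemoryEventStore()
store.append("task-1", "TASK_RECEIVED", {"text": "list files"})
store.append("task-1", "TOOL_CALLED", {"tool": "shell_exec"})
store.append("task-2", "TASK_RECEIVED", {"text": "other"})
```

Replaying `task-1` yields its two events in order, independent of events appended for `task-2`.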

---

## 🔁 2. End-to-End Flow

### High-Level Flow (as seen in logs)
```
User Input
→ Router (AsyncRouter.decide)
→ Context Builder (ContextBuilder.build)
→ Orchestrator (decides plan vs direct)
→ Plan / Direct Action
→ Execution Engine
→ Tool Layer (ToolRegistry + ToolSandbox)
→ Critic (AsyncCriticAdapter)
→ Memory Policy (MemoryWritePolicy)
→ Event Bus (SQLiteEventStore)
→ Streaming Output (via WebSocket / SSE)
```

### Conversation Flow
1. **Router** decides `plan` vs `respond` vs `tool` vs `fail` based on orchestrator output or intent parser.
2. **Context Builder** enriches task with memory context, tool context, execution context, and safety constraints.
3. **Orchestrator** (or direct respond) produces the initial `ExecutionDirective`.
4. **Execution Engine** schedules via `ExecutionScheduler`, then executes:
   - `plan` → parse into `PlanStep`s, build task graph, execute ready steps
   - `tool` → validate tool existence, check permissions, execute via `ToolRegistry`
   - `respond` → direct response
   - `fail` → immediate failure
5. **Tool Layer** (`ToolRegistry` + `ToolSandbox`):
   - Plugin discovery via `ToolDiscovery`
   - Manifest-based tool registration
   - Sandboxed execution with timeout
6. **Critic** evaluates tool results (if enabled), outputs `CriticScore` JSON.
7. **Memory Policy** decides whether to insert `tool_result`, `critique`, `plan`, `fact`, `summary`, or `user_preference` into memory.
8. **Event Bus** (`EventBus`, backed by `SQLiteEventStore`) publishes `RuntimeEvent` with sequence ordering.
9. **Streaming Output** replays events via WebSocket and sends incremental responses.
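
The directive dispatch in step 4 can be sketched as a small routing function. This is an illustration only: the directive is modelled as a plain dict, and the `reason` and `text` keys, the `"completed"` status, and the handler table are assumptions, not the actual `ExecutionEngine` interface.

```python
from typing import Any, Callable

Directive = dict[str, Any]
Handler = Callable[[Directive], dict[str, Any]]


def dispatch(directive: Directive, handlers: dict[str, Handler]) -> dict[str, Any]:
    """Route a directive to the handler registered for its type."""
    kind = directive.get("type")
    if kind == "fail":
        # `fail` short-circuits: nothing is executed.
        reason = directive.get("reason", "directive requested failure")
        return {"status": "failed", "result": {"error": reason}}
    handler = handlers.get(kind)
    if handler is None:
        return {"status": "failed",
                "result": {"error": f"unknown directive type: {kind!r}"}}
    return handler(directive)


# Hypothetical handlers standing in for the real plan/tool/respond paths.
handlers: dict[str, Handler] = {
    "respond": lambda d: {"status": "completed", "result": {"text": d["text"]}},
    "tool": lambda d: {"status": "completed", "result": {"tool": d["tool"]}},
}

print(dispatch({"type": "respond", "text": "hello"}, handlers))
print(dispatch({"type": "fail", "reason": "no route"}, handlers))
```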

### Failure Flow
- **Invalid JSON flow**: `ExecutionScheduler.parse_plan_steps` catches `JSONDecodeError` / `ValueError` / `TypeError`, logs a warning, and returns empty steps → `plan` fails with "Failed to parse plan steps from directive".
- **Tool failure flow**: Tool execution returns `{"status": "failed", "result": {"error": "..."}}` → `ExecutionEngine` returns a failed status → task state is updated → `TASK_FAILED` event is published → remaining plan steps are skipped.
- **Critic failure flow**: `_evaluate_with_critic` catches exception, logs warning, publishes `CRITIC_RESULT` with error → critic_score is `None` → execution continues without critique.
- **Orchestrator fallback flow**: If the primary orchestrator fails or is missing, `AsyncRouter` falls back to `sys_util` (the utility orchestrator) for system-level decisions.
- **Permission denial flow**: `PermissionService.check_shell_command` / `check_write_path` returns `decision: "hard_stop"` or `decision: "deny"` → immediate failure with blocked reason; if `decision: "prompt"` → `TASK_AWAITING_PERMISSION` state.

### Repair Flow (JSON / Tool-call)
- Repair is triggered via `resolve_permission` or `resolve_secret` endpoints.
- Permission repair: user provides `decision` ("allow_once"/"allow_always"/"deny"/"ask_always") → `PermissionService.resolve_permission` → updates state → retries original directive.
- Secret repair: user provides secret string → `ExecutionEngine.execute` with `secret_override` → continues execution.

---

## ⚙️ 3. Component Breakdown

### `runtime_loop` (`RuntimeLoop`)
- **Responsibility**: Central task coordination; state management; event publishing.
- **Input**: `UserTask`
- **Output**: `{"task_id", "status", "directive", "result", "events"}`
- **Must NOT do**: Direct LLM calls (delegates to router/execution_engine); bypass state store.

### `execution_engine` (`ExecutionEngine`)
- **Responsibility**: Execute directives (plan/tool/respond/fail); integrate critic; interface with tool registry.
- **Input**: `UserTask`, `ExecutionDirective`, optional `permission_override`, `secret_override`
- **Output**: `{"status", "result", "step_results"}`
- **Must NOT do**: Bypass permission checks; skip critic evaluation when enabled; leak secrets in logs.

### `scheduler` (`ExecutionScheduler`)
- **Responsibility**: Parse plan JSON, build task dependency graph, yield ready steps, detect cycles.
- **Input**: JSON plan string, `task_id`
- **Output**: `list[PlanStep]`
- **Must NOT do**: Execute anything; modify task state directly.

### `tool_registry` (`ToolRegistry`)
- **Responsibility**: Register tools from their manifests; execute them via `ToolSandbox`; provide schema metadata.
- **Input**: tool name, args dict
- **Output**: `ToolResult`
- **Must NOT do**: Bypass sandbox; execute privileged host commands without sandbox.

### `event_bus` (`EventBus` → `SQLiteEventStore`)
- **Responsibility**: Append-only event persistence; sequence numbering; per-task query.
- **Input**: `RuntimeEvent`
- **Output**: event stream
- **Must NOT do**: Modify state store directly (state is separate); delete or mutate events.

### `memory` (`MemoryInterface` → `MemoryStore` + `VectorIndex`)
- **Responsibility**: Store/retrieve weighted memory entries; vector similarity search; integrate with context builder.
- **Input**: text, kind, source, weight, metadata
- **Output**: search results or insertion confirmation
- **Must NOT do**: Expose raw embeddings without access control; store secrets.

---

## 🧩 4. Data Contracts

### `PlanStep`
```python
id: str
kind: Literal["tool", "coder", "memory", "respond"]
tool: str | None
args: dict[str, Any]
description: str
requires_confirmation: bool
depends_on: list[str]
```
**Real example** (from `router` prompt engineering):
`{"id":"step-0","kind":"tool","tool":"shell_exec","args":{"command":"ls -la"},"description":"List directory","requires_confirmation":false,"depends_on":[]}`

### `ToolCall`
```python
tool: str
args: dict[str, Any]
task_id: str
step_id: str
```
**Real log**: `TOOL_CALLED` event with `{"tool":"shell_exec","args":{"command":"pwd"},"task_id":"xyz","step_id":"step-0"}`

### `ToolResult`
```python
tool: str
ok: bool
output: Any
error: str | None
metadata: dict[str, Any]
```
**Real output**: `{"tool":"shell_exec","ok":true,"output":"/app","error":null,"metadata":{}}`

### `RuntimeEvent`
```python
event_id: str
task_id: str
session_id: str
sequence: int
type: str  # e.g. TASK_RECEIVED, TOOL_CALLED, TASK_COMPLETED
payload: dict[str, Any]
causation_id: str | None
correlation_id: str
```
**Real event stream**: `TASK_RECEIVED → CONTEXT_BUILT → PLAN_STARTED → TOOL_CALLED → TOOL_COMPLETED → TASK_COMPLETED`

### `MemoryEntry`
```python
id: str
text: str
kind: Literal["tool_result","plan","critique","fact","summary","user_preference"]
source: Literal["tool","critic","user","system"]
weight: float
task_id: str | None
session_id: str | None
metadata: dict[str, Any]
embedding_model: str
embedding_dim: int
```
**Real insertion**: After critic evaluation, `kind="critique"`, `source="critic"`, `weight=0.85`, metadata includes scores.

---

## 🔥 5. Failure Modes

### Invalid JSON Flow
- **Trigger**: Malformed plan JSON (e.g., missing braces, non-JSON string).
- **Detection**: `parse_plan_steps` catches `JSONDecodeError` / `ValueError` / `TypeError`.
- **Result**: Warning logged, empty steps returned → `PLAN_FAILED` with `"Failed to parse plan steps from directive"`.
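
A minimal sketch of this parse-then-fail behaviour follows. The plan wire format (a JSON list, or an object with a `steps` key) and the helper `run_plan_directive` are assumptions; only the caught exception types and the error message come from the description above.

```python
import json
import logging
from typing import Any

logger = logging.getLogger("scheduler_sketch")


def parse_plan_steps(raw_plan: Any) -> list[dict[str, Any]]:
    """Return parsed steps, or an empty list if the plan is malformed."""
    try:
        data = json.loads(raw_plan)
        # Assumed shapes: a bare list of steps, or {"steps": [...]}.
        steps = data.get("steps") if isinstance(data, dict) else data
        if not isinstance(steps, list):
            raise TypeError("plan steps must be a list")
        for step in steps:
            if not isinstance(step, dict) or "id" not in step:
                raise ValueError("each step must be an object with an 'id'")
        return steps
    except (json.JSONDecodeError, ValueError, TypeError) as exc:
        logger.warning("could not parse plan: %s", exc)
        return []


def run_plan_directive(raw_plan: Any) -> dict[str, Any]:
    steps = parse_plan_steps(raw_plan)
    if not steps:
        return {"status": "failed",
                "result": {"error": "Failed to parse plan steps from directive"}}
    return {"status": "ok", "steps": steps}


print(run_plan_directive('{"steps": ['))          # truncated JSON
print(run_plan_directive('[{"id": "step-0"}]'))   # valid plan
```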

### Tool Failure Flow
- **Trigger**: Tool returns `ok=False` or raises exception in sandbox.
- **Detection**: `_execute_tool` checks `tool_result.ok`.
- **Result**: Status `"failed"`, result contains `{"error": "...", "failed_step": step.id, "step_results": [...]}` → `TASK_FAILED` event; further plan steps skipped.
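
The stop-on-first-failure behaviour can be sketched as below. Steps are run sequentially for simplicity, `run_tool` is a hypothetical callable returning a `ToolResult`-shaped dict, and the `"completed"` status name is an assumption.

```python
from typing import Any, Callable

Step = dict[str, Any]


def execute_steps(
    steps: list[Step], run_tool: Callable[[Step], dict[str, Any]]
) -> dict[str, Any]:
    step_results: list[dict[str, Any]] = []
    for step in steps:
        try:
            tool_result = run_tool(step)
        except Exception as exc:
            # An exception inside the tool is treated like ok=False.
            tool_result = {"tool": step.get("tool"), "ok": False,
                           "output": None, "error": str(exc), "metadata": {}}
        step_results.append({"step_id": step["id"], **tool_result})
        if not tool_result["ok"]:
            # Remaining steps are skipped once one step fails.
            return {"status": "failed",
                    "result": {"error": tool_result["error"],
                               "failed_step": step["id"],
                               "step_results": step_results}}
    return {"status": "completed", "result": {"step_results": step_results}}


def fake_tool(step: Step) -> dict[str, Any]:
    # Stand-in tool: fails only for the command "false".
    ok = step["args"]["command"] != "false"
    return {"tool": step["tool"], "ok": ok, "output": "" if ok else None,
            "error": None if ok else "exit code 1", "metadata": {}}


plan = [
    {"id": "step-0", "tool": "shell_exec", "args": {"command": "pwd"}},
    {"id": "step-1", "tool": "shell_exec", "args": {"command": "false"}},
    {"id": "step-2", "tool": "shell_exec", "args": {"command": "ls"}},
]
outcome = execute_steps(plan, fake_tool)
print(outcome["status"], outcome["result"]["failed_step"])
```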

### Critic Failure Flow
- **Trigger**: Critic adapter raises exception or returns non-JSON output.
- **Detection**: `_evaluate_with_critic` catches exception, logs warning.
- **Result**: Event `CRITIC_RESULT` with error payload → `critic_score = None` → execution continues without critique; memory write skipped.

### Orchestrator Fallback Flow
- **Trigger**: Primary orchestrator model unavailable or returns invalid directive.
- **Detection**: `_ensure_orchestrator` returns `None`; router falls back to `sys_util` orchestrator.
- **Result**: Utility orchestrator handles system-level decisions (e.g., file operations, environment queries).

### Permission Denial Flow
- **Trigger**: `PermissionService` returns `decision: "hard_stop"` or `"deny"`.
- **Detection**: `_execute_tool` checks `permission_result`.
- **Result**: Immediate failure with `"Command blocked: ..."` → `TASK_FAILED`; no tool execution.
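
The mapping from permission decision to outcome can be summarised in a short sketch. The `reason` field, the returned dict shape, and the treatment of any other decision value as "allowed" are assumptions; the decision names and event types are those listed in this review.

```python
from typing import Any


def gate_tool_call(permission_result: dict[str, Any]) -> dict[str, Any]:
    """Translate a permission decision into the next runtime action."""
    decision = permission_result["decision"]
    if decision in ("hard_stop", "deny"):
        reason = permission_result.get("reason", "not permitted")
        return {"status": "failed", "event": "TASK_FAILED",
                "result": {"error": f"Command blocked: {reason}"}}
    if decision == "prompt":
        # The task is parked until the user resolves the permission request.
        return {"status": "awaiting_permission",
                "event": "TASK_AWAITING_PERMISSION"}
    # Assumed: anything else means the tool may run.
    return {"status": "allowed"}


print(gate_tool_call({"decision": "hard_stop", "reason": "rm -rf /"}))
print(gate_tool_call({"decision": "prompt"}))
```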

---

## 🧠 6. "Decision Logic Map"

### Orchestrator vs Direct Respond
- **Use orchestrator** when: task requires planning, multi-step tool usage, or unknown intent. Orchestrator decides to emit `plan` or `tool` directive.
- **Direct respond** when: the intent parser classifies the task as a simple query (`TASK_RECEIVED` → `router.intent_parser` → `respond` directive), or a `respond` directive is explicitly set.

### Utility Model Call
- Invoked when `sys_util` orchestrator is loaded (configurable). Used for system-level operations: environment inspection, file system queries, or when primary orchestrator fails and fallback is needed.

### Retry Logic
- **Planner retry**: `ExecutionScheduler` has `retry_limit=2`; on a parse or validation failure, it retries up to that limit before failing the plan.
- **Tool retry**: Not implemented natively; retry must be encoded in plan steps (`depends_on`, manual replan).
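
One possible reading of the planner retry is sketched below: the plan is regenerated and re-parsed until it yields steps or the limit is exhausted. Whether the real scheduler regenerates the plan or re-parses the same text is not stated here, so `generate_plan` and the "one initial attempt plus `retry_limit` retries" counting are assumptions.

```python
import json
from typing import Any, Callable


def parse_steps(raw: str) -> list[dict[str, Any]]:
    try:
        steps = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return []
    return steps if isinstance(steps, list) else []


def plan_with_retry(
    generate_plan: Callable[[], str], retry_limit: int = 2
) -> tuple[list[dict[str, Any]], int]:
    """Return (steps, attempts); steps is empty if every attempt failed."""
    attempts = 0
    while attempts <= retry_limit:  # one initial attempt plus retry_limit retries
        attempts += 1
        steps = parse_steps(generate_plan())
        if steps:
            return steps, attempts
    return [], attempts


# Hypothetical planner that only produces valid JSON on its third reply.
replies = iter(["not json", "{broken", '[{"id": "step-0"}]'])
steps, attempts = plan_with_retry(lambda: next(replies))
print(steps, attempts)
```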

### Plan Creation
- **Trigger**: Orchestrator output contains `{"type": "plan", ...}` or an explicit `plan` directive.
- **Process**: `parse_plan_steps` → `validate_no_cycles` → `build_task_graph` → ready steps execution.
- **No plan**: Orchestrator outputs `respond` or `tool` → direct execution.
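
The cycle check and ready-step selection in the process above can be illustrated with a small dependency-resolution sketch. The function bodies are written for this example and only borrow the name `validate_no_cycles` and the `depends_on` field from the document; `ready_steps` is a hypothetical helper.

```python
from typing import Any

Step = dict[str, Any]


def validate_no_cycles(steps: list[Step]) -> bool:
    """True if every step can eventually run.

    Also returns False when a step depends on an id that is not in the plan.
    """
    deps = {s["id"]: set(s.get("depends_on", [])) for s in steps}
    done: set[str] = set()
    while len(done) < len(deps):
        ready = [sid for sid, d in deps.items() if sid not in done and d <= done]
        if not ready:
            return False  # nothing can progress: cycle or unknown dependency
        done.update(ready)
    return True


def ready_steps(steps: list[Step], completed: set[str]) -> list[str]:
    """Ids of steps whose dependencies have all completed."""
    return [s["id"] for s in steps
            if s["id"] not in completed
            and set(s.get("depends_on", [])) <= completed]


plan = [
    {"id": "a", "depends_on": []},
    {"id": "b", "depends_on": ["a"]},
    {"id": "c", "depends_on": ["a"]},
    {"id": "d", "depends_on": ["b", "c"]},
]
print(validate_no_cycles(plan))
print(ready_steps(plan, completed={"a"}))
```

With `a` completed, both `b` and `c` become ready, while `d` waits for both of them.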

---

## 🧰 7. Tool System Architecture

### Plugin Discovery
- `ToolDiscovery` scans `app/tools/plugins/` for modules exporting `Tool` classes.
- Discovers: `shell_exec`, `file_read`, `file_write`, `memory` (search/insert/list).

### Manifest-Based Tools
- Each plugin has a `manifest.json` with:
  - `description`: human-readable docstring.
  - `args_schema`: JSON schema for validation.
  - `requires_permission`: boolean for privileged tools (`shell_exec`, `file_write`).
- On discovery, registry registers tool and stores schema for permission/routing.
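
A minimal sketch of manifest validation and registration, assuming only the three manifest fields listed above. The `registry` dict, the `register` helper, and passing the tool name separately are illustrative assumptions rather than the actual `ToolRegistry` API.

```python
import json
from typing import Any

REQUIRED_FIELDS = {
    "description": str,
    "args_schema": dict,
    "requires_permission": bool,
}


def load_manifest(text: str) -> dict[str, Any]:
    """Parse a manifest.json body and check the expected fields and types."""
    manifest = json.loads(text)
    if not isinstance(manifest, dict):
        raise ValueError("manifest must be a JSON object")
    for key, expected in REQUIRED_FIELDS.items():
        if not isinstance(manifest.get(key), expected):
            raise ValueError(f"manifest field {key!r} must be {expected.__name__}")
    return manifest


registry: dict[str, dict[str, Any]] = {}


def register(name: str, manifest_text: str) -> None:
    # Hypothetical registration: keep the validated manifest under the tool name.
    registry[name] = load_manifest(manifest_text)


register("shell_exec", json.dumps({
    "description": "Run a shell command",
    "args_schema": {"type": "object",
                    "properties": {"command": {"type": "string"}},
                    "required": ["command"]},
    "requires_permission": True,
}))
print(registry["shell_exec"]["requires_permission"])
```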

### Registry Bootstrap
- `RuntimeController._create_tool_registry()` initializes discovery, loads plugins, and registers each tool together with its init dependencies (sandbox, permissions).
- Tools are initialized once at startup; `tool_registry` is shared across executions.

### Execution Isolation
- **`ToolSandbox`**:
  - Restricts filesystem to `allowed_root` (project base dir).
  - Timeout per execution (`step_timeout_ms`).
  - Blocks `sudo` commands unless a secret is injected via the secret override.
- **Permission gating**: `shell_exec` and `file_write` require explicit permission decision before execution.
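
Two of the sandbox checks above, the filesystem root restriction and the `sudo` gate, can be sketched as pure functions. Both helpers are hypothetical and deliberately conservative; they do not reproduce the actual `ToolSandbox` logic, and timeout handling is omitted.

```python
import shlex
from pathlib import Path


def is_within_root(path: str, allowed_root: str) -> bool:
    """True if `path` resolves to a location inside `allowed_root`."""
    root = Path(allowed_root).resolve()
    # Relative paths are interpreted against the root; absolute paths replace it.
    target = (root / path).resolve()
    return target == root or root in target.parents


def needs_secret(command: str) -> bool:
    """True if any token of the command is `sudo` (token-level, conservative)."""
    return "sudo" in shlex.split(command)


root = "/tmp/sandbox_root"
print(is_within_root("src/main.py", root))
print(is_within_root("../etc/passwd", root))
print(needs_secret("sudo apt update"))
```

Resolving both paths before comparing them is what defeats `..` traversal; a plain string-prefix check would not.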