DuckLM Runtime Architecture Review
🧠 1. System Overview
What is runtime?
Runtime is the execution substrate of the system: a multi-layered cognitive execution environment that orchestrates LLMs, tools, memory, and permissions into a unified agentic workflow. Concretely, it is the RuntimeController, which composes RuntimeLoop, ExecutionEngine, ContextBuilder, AsyncRouter, PermissionService, and EventBus.
What is the core loop?
The core loop is the RuntimeLoop.run_task() method: it receives a UserTask, applies permission hard-stop checks, creates task state, builds context via ContextBuilder, routes via AsyncRouter to get a directive, executes via ExecutionEngine, applies Critic evaluation, saves via MemoryPolicy, publishes RuntimeEvents through EventBus, and returns streaming output.
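The ordering of stages in run_task() can be sketched as a small function. This is a simplified illustration, not the actual RuntimeLoop: the callables is_hard_stopped, build_context, route, and execute are hypothetical stand-ins for PermissionService, ContextBuilder, AsyncRouter, and ExecutionEngine, the task is a plain dict, and the critic, memory, and streaming stages are left out.

```python
from typing import Any, Callable

Event = tuple[str, dict[str, Any]]


def run_task(
    task: dict[str, Any],
    *,
    is_hard_stopped: Callable[[dict[str, Any]], bool],
    build_context: Callable[[dict[str, Any]], dict[str, Any]],
    route: Callable[[dict[str, Any], dict[str, Any]], dict[str, Any]],
    execute: Callable[[dict[str, Any], dict[str, Any]], dict[str, Any]],
) -> dict[str, Any]:
    """Run one task through the stages in order and collect the emitted events."""
    events: list[Event] = [("TASK_RECEIVED", {"task_id": task["task_id"]})]

    # The permission hard stop comes first: nothing else runs if it triggers.
    if is_hard_stopped(task):
        events.append(("TASK_FAILED", {"reason": "hard_stop"}))
        return {"task_id": task["task_id"], "status": "failed", "events": events}

    context = build_context(task)
    events.append(("CONTEXT_BUILT", {"keys": sorted(context)}))

    directive = route(task, context)
    result = execute(task, directive)

    # "completed" is an assumed status string; the source only names "failed".
    final = "TASK_COMPLETED" if result["status"] == "completed" else "TASK_FAILED"
    events.append((final, {"status": result["status"]}))
    return {
        "task_id": task["task_id"],
        "status": result["status"],
        "directive": directive,
        "result": result,
        "events": events,
    }


outcome = run_task(
    {"task_id": "t1", "text": "say hi"},
    is_hard_stopped=lambda task: False,
    build_context=lambda task: {"memory": [], "tools": []},
    route=lambda task, ctx: {"type": "respond"},
    execute=lambda task, directive: {"status": "completed", "result": "hi"},
)
print([name for name, _ in outcome["events"]])
```

Passing the stages in as callables keeps the loop free of direct LLM calls, which matches the constraint that the loop delegates to the router and execution engine.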
Models (Orchestrator / Coder / Critic / Utility)
- Orchestrator (OrchestratorAdapter / AsyncOrchestratorAdapter): LLM that decides plan vs direct respond vs tool; generates ExecutionDirective of type plan, tool, respond, fail, etc.
- Coder (CoderAdapter / AsyncCoderAdapter): LLM specialized for code generation and manipulation.
- Critic (CriticAdapter / AsyncCriticAdapter): Evaluates tool outputs with JSON scoring (correctness, usefulness, safety, memory_store, weight).
- Utility: The sys_util orchestrator, a fallback/orchestration layer for system-level operations.
What is "truth"? (Event Store / State Store)
- Event Store (SQLiteEventStore): Immutable append-only log of RuntimeEvents per task. Source of truth for "what happened."
- State Store (SQLiteTaskStateStore): Current mutable task state (status, last_directive, pending requests). "Current truth" of task progress.
- Checkpoint Store (SQLiteCheckpointStore): Snapshots of task state + context at milestones.
- Memory Store (MemoryStore + VectorIndex): Long-term knowledge base with weighted entries.
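An append-only event log with per-task sequence numbers can be sketched with the standard sqlite3 module. The class name EventStoreSketch, the table layout, and the method names are assumptions made for illustration; the real SQLiteEventStore schema is not shown in this review.

```python
import json
import sqlite3


class EventStoreSketch:
    """Append-only event log with a per-task sequence number (illustrative)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            " task_id TEXT NOT NULL,"
            " sequence INTEGER NOT NULL,"
            " type TEXT NOT NULL,"
            " payload TEXT NOT NULL,"
            " PRIMARY KEY (task_id, sequence))"
        )

    def append(self, task_id: str, type_: str, payload: dict) -> int:
        # Next sequence is the current maximum for this task plus one,
        # starting at 0. The primary key rejects duplicate sequence numbers.
        with self.conn:
            row = self.conn.execute(
                "SELECT COALESCE(MAX(sequence), -1) + 1 FROM events WHERE task_id = ?",
                (task_id,),
            ).fetchone()
            sequence = row[0]
            self.conn.execute(
                "INSERT INTO events (task_id, sequence, type, payload)"
                " VALUES (?, ?, ?, ?)",
                (task_id, sequence, type_, json.dumps(payload)),
            )
        return sequence

    def replay(self, task_id: str) -> list[tuple[int, str, dict]]:
        # There is deliberately no update or delete method: events are immutable.
        rows = self.conn.execute(
            "SELECT sequence, type, payload FROM events"
            " WHERE task_id = ? ORDER BY sequence",
            (task_id,),
        ).fetchall()
        return [(seq, type_, json.loads(raw)) for seq, type_, raw in rows]


store = EventStoreSketch(sqlite3.connect(":memory:"))
store.append("t1", "TASK_RECEIVED", {})
store.append("t1", "TOOL_CALLED", {"tool": "shell_exec"})
print(store.replay("t1"))
```

Keeping mutable task state in a separate store means the log can always be replayed to audit how the current state was reached.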
🔁 2. End-to-End Flow
High-Level Flow (as seen in logs)
User Input
→ Router (AsyncRouter.decide)
→ Context Builder (ContextBuilder.build)
→ Orchestrator (decides plan vs direct)
→ Plan / Direct Action
→ Execution Engine
→ Tool Layer (ToolRegistry + ToolSandbox)
→ Critic (AsyncCriticAdapter)
→ Memory Policy (MemoryWritePolicy)
→ Event Bus (SQLiteEventStore)
→ Streaming Output (via WebSocket / SSE)
Conversation Flow
- Router decides plan vs respond vs tool vs fail based on orchestrator output or intent parser.
- Context Builder enriches task with memory context, tool context, execution context, and safety constraints.
- Orchestrator (or direct respond) produces the initial ExecutionDirective.
- Execution Engine schedules via ExecutionScheduler, then executes:
  - plan → parse into PlanSteps, build task graph, execute ready steps
  - tool → validate tool existence, check permissions, execute via ToolRegistry
  - respond → direct response
  - fail → immediate failure
- Tool Layer (ToolRegistry + ToolSandbox):
  - Plugin discovery via ToolDiscovery
  - Manifest-based tool registration
  - Sandboxed execution with timeout
- Critic evaluates tool results (if enabled), outputs CriticScore JSON.
- Memory Policy decides whether to insert tool_result, critique, plan, fact, summary, or user_preference into memory.
- Event Bus (SQLiteEventBus) publishes RuntimeEvent with sequence ordering.
- Streaming Output replays events via WebSocket and sends incremental responses.
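The branching on directive type in the Execution Engine step can be illustrated as a lookup table of handlers. The dispatch function, the handler signatures, and the "completed" status are hypothetical; only the directive types and the {"status", "result"} shape come from this review.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], dict[str, Any]]


def dispatch(directive: dict[str, Any], handlers: dict[str, Handler]) -> dict[str, Any]:
    """Route a directive to the handler registered for its type."""
    kind = directive.get("type")
    handler = handlers.get(kind)
    if handler is None:
        # Unknown types fail closed instead of being silently ignored.
        return {"status": "failed", "result": {"error": f"Unknown directive type: {kind!r}"}}
    return handler(directive)


# Only two of the four types are wired up here to keep the sketch short.
handlers: dict[str, Handler] = {
    "respond": lambda d: {"status": "completed", "result": d.get("text", "")},
    "fail": lambda d: {"status": "failed", "result": {"error": d.get("reason", "")}},
}

print(dispatch({"type": "respond", "text": "hi"}, handlers))
```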
Failure Flow
- Invalid JSON flow: ExecutionScheduler.parse_plan_steps catches JSONDecodeError / ValueError / TypeError, logs warning, returns empty steps → plan fails with "Failed to parse plan steps."
- Tool failure flow: Tool execution returns {"status": "failed", "result": {"error": "..."}} → ExecutionEngine returns failed status → task state updated → event TASK_FAILED published → stops further plan steps.
- Critic failure flow: _evaluate_with_critic catches exception, logs warning, publishes CRITIC_RESULT with error → critic_score is None → execution continues without critique.
- Orchestrator fallback flow: If primary orchestrator fails or is missing, AsyncRouter has sys_util fallback (utility orchestrator) for system-level decisions.
- Permission denial flow: PermissionService.check_shell_command / check_write_path returns decision: "hard_stop" or decision: "deny" → immediate failure with blocked reason; if decision: "prompt" → TASK_AWAITING_PERMISSION state.
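The permission denial flow maps a small set of decision strings to task outcomes. The sketch below shows one way to express that mapping. The function name, the "allow" decision, and the "awaiting_permission" and "allowed" status strings are assumptions; the decisions "hard_stop", "deny", and "prompt" and the event names come from the flow above.

```python
from typing import Any


def apply_permission_decision(decision: str, reason: str = "") -> dict[str, Any]:
    """Translate a permission decision into a task outcome."""
    if decision in ("hard_stop", "deny"):
        return {
            "status": "failed",
            "event": "TASK_FAILED",
            "result": {"error": f"Command blocked: {reason}"},
        }
    if decision == "prompt":
        # The task pauses until the user resolves the request.
        return {"status": "awaiting_permission", "event": "TASK_AWAITING_PERMISSION"}
    if decision == "allow":  # assumed name for the permissive outcome
        return {"status": "allowed", "event": None}
    # Anything unrecognized fails closed rather than running the tool.
    return {
        "status": "failed",
        "event": "TASK_FAILED",
        "result": {"error": f"Unknown permission decision: {decision!r}"},
    }


print(apply_permission_decision("deny", "rm -rf /"))
```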
Repair Flow (JSON / Tool-call)
- Repair is triggered via resolve_permission or resolve_secret endpoints.
- Permission repair: user provides decision ("allow_once" / "allow_always" / "deny" / "ask_always") → PermissionService.resolve_permission → updates state → retries original directive.
- Secret repair: user provides secret string → ExecutionEngine.execute with secret_override → continues execution.
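A rough sketch of the permission repair path follows. It assumes the task state is a dict with last_directive and a hypothetical pending_permission field, and that retry is a callable taking the directive and a permission override flag. The review does not describe what "ask_always" does to the pending request, so the sketch leaves that branch unimplemented rather than guessing.

```python
from typing import Any, Callable

Retry = Callable[[dict[str, Any], bool], dict[str, Any]]


def resolve_permission(state: dict[str, Any], decision: str, retry: Retry) -> dict[str, Any]:
    """Apply the user's decision to a task that is awaiting permission."""
    if decision not in {"allow_once", "allow_always", "deny", "ask_always"}:
        raise ValueError(f"Unsupported decision: {decision!r}")
    if decision == "ask_always":
        # Not specified in the source; deliberately left out of this sketch.
        raise NotImplementedError("ask_always semantics are not specified")

    directive = state["last_directive"]
    state["pending_permission"] = None
    if decision == "deny":
        state["status"] = "failed"
        return {"status": "failed", "result": {"error": "Permission denied by user"}}

    if decision == "allow_always":
        # Remember the tool so later checks could skip the prompt (assumed behavior).
        state.setdefault("always_allowed", []).append(directive["tool"])
    # Retry the original directive with the permission override set.
    result = retry(directive, True)
    state["status"] = result["status"]
    return result
```

The secret repair path would follow the same shape, passing the secret through secret_override instead of a boolean.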
⚙️ 3. Component Breakdown
runtime_loop (RuntimeLoop)
- Responsibility: Central task coordination; state management; event publishing.
- Input: UserTask
- Output: {"task_id", "status", "directive", "result", "events"}
- Must NOT do: Direct LLM calls (delegates to router/execution_engine); bypass state store.
execution_engine (ExecutionEngine)
- Responsibility: Execute directives (plan/tool/respond/fail); integrate critic; interface with tool registry.
- Input: UserTask, ExecutionDirective, optional permission_override, secret_override
- Output: {"status", "result", "step_results"}
- Must NOT do: Bypass permission checks; skip critic evaluation when enabled; leak secrets in logs.
scheduler (ExecutionScheduler)
- Responsibility: Parse plan JSON, build task dependency graph, yield ready steps, detect cycles.
- Input: JSON plan string, task_id
- Output: list[PlanStep]
- Must NOT do: Execute anything; modify task state directly.
tool_registry (ToolRegistry)
- Responsibility: Register/manifest tools; execute via ToolSandbox; provide schema metadata.
- Input: tool name, args dict
- Output: ToolResult
- Must NOT do: Bypass sandbox; execute privileged host commands without sandbox.
event_bus (EventBus → SQLiteEventStore)
- Responsibility: Append-only event persistence; sequence numbering; per-task query.
- Input: RuntimeEvent
- Output: event stream
- Must NOT do: Modify state store directly (state is separate); delete or mutate events.
memory (MemoryInterface → MemoryStore + VectorIndex)
- Responsibility: Store/retrieve weighted memory entries; vector similarity search; integrate with context builder.
- Input: text, kind, source, weight, metadata
- Output: search results or insertion confirmation
- Must NOT do: Expose raw embeddings without access control; store secrets.
🧩 4. Data Contracts
PlanStep
id: str
kind: Literal["tool", "coder", "memory", "respond"]
tool: str | None
args: dict[str, Any]
description: str
requires_confirmation: bool
depends_on: list[str]
Real example (from router prompt engineering):
{"id":"step-0","kind":"tool","tool":"shell_exec","args":{"command":"ls -la"},"description":"List directory","requires_confirmation":false,"depends_on":[]}
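The PlanStep contract maps naturally onto a dataclass. The sketch below loads the example above into one. The default values are assumptions (the review lists only field names and types), and the Literal annotation is documentation only, since dataclasses do not enforce it at runtime.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Literal, Optional


@dataclass
class PlanStep:
    id: str
    kind: Literal["tool", "coder", "memory", "respond"]
    description: str
    tool: Optional[str] = None
    args: dict[str, Any] = field(default_factory=dict)
    requires_confirmation: bool = False
    depends_on: list[str] = field(default_factory=list)


raw = (
    '{"id":"step-0","kind":"tool","tool":"shell_exec",'
    '"args":{"command":"ls -la"},"description":"List directory",'
    '"requires_confirmation":false,"depends_on":[]}'
)
# Keyword expansion raises TypeError on unknown keys, which surfaces malformed steps early.
step = PlanStep(**json.loads(raw))
print(step)
```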
ToolCall
tool: str
args: dict[str, Any]
task_id: str
step_id: str
Real log: TOOL_CALLED event with {"tool":"shell_exec","args":{"command":"pwd"},"task_id":"xyz","step_id":"step-0"}
ToolResult
tool: str
ok: bool
output: Any
error: str | None
metadata: dict[str, Any]
Real output: {"tool":"shell_exec","ok":true,"output":"/app","error":null,"metadata":{}}
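The ok flag on a ToolResult is what drives the tool failure flow. The sketch below shows how a failed result could stop the remaining plan steps and produce the failed-step payload. The execute_steps function, the run_tool callable, and the "completed" status are hypothetical; results are plain dicts with the ToolResult fields.

```python
from typing import Any, Callable


def execute_steps(
    steps: list[dict[str, Any]],
    run_tool: Callable[[dict[str, Any]], dict[str, Any]],
) -> dict[str, Any]:
    """Run steps in order and stop at the first failed tool result."""
    step_results: list[dict[str, Any]] = []
    for step in steps:
        try:
            tool_result = run_tool(step)
        except Exception as exc:  # an exception in the sandbox counts as a failure
            tool_result = {"ok": False, "output": None, "error": str(exc)}
        step_results.append({"step_id": step["id"], **tool_result})
        if not tool_result["ok"]:
            # Remaining steps are skipped; the caller would publish TASK_FAILED.
            return {
                "status": "failed",
                "result": {
                    "error": tool_result["error"],
                    "failed_step": step["id"],
                    "step_results": step_results,
                },
            }
    return {"status": "completed", "result": {"step_results": step_results}}
```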
RuntimeEvent
event_id: str
task_id: str
session_id: str
sequence: int
type: str # e.g. TASK_RECEIVED, TOOL_CALLED, TASK_COMPLETED
payload: dict[str, Any]
causation_id: str | None
correlation_id: str
Real event stream: TASK_RECEIVED → CONTEXT_BUILT → PLAN_STARTED → TOOL_CALLED → TOOL_COMPLETED → TASK_COMPLETED
MemoryEntry
id: str
text: str
kind: Literal["tool_result","plan","critique","fact","summary","user_preference"]
source: Literal["tool","critic","user","system"]
weight: float
task_id: str | None
session_id: str | None
metadata: dict[str, Any]
embedding_model: str
embedding_dim: int
Real insertion: After critic evaluation, kind="critique", source="critic", weight=0.85, metadata includes scores.
🔥 5. Failure Modes
Invalid JSON Flow
- Trigger: Malformed plan JSON (e.g., missing braces, non-JSON string).
- Detection: parse_plan_steps catches JSONDecodeError / ValueError / TypeError.
- Result: Warning logged, empty steps returned → PLAN_FAILED with "Failed to parse plan steps from directive".
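The detection step can be sketched as a parser that never raises and returns an empty list on bad input. The top-level "steps" key and the run_plan wrapper are assumptions about the plan layout; the caught exception types and the error message follow the description above.

```python
import json
import logging
from typing import Any

logger = logging.getLogger("scheduler_sketch")


def parse_plan_steps(plan_json: str) -> list[dict[str, Any]]:
    """Return the plan's steps, or an empty list if the plan cannot be parsed."""
    try:
        data = json.loads(plan_json)
        # The "steps" key is an assumption about how a plan is laid out.
        steps = data.get("steps") if isinstance(data, dict) else None
        if not isinstance(steps, list):
            raise TypeError("plan must be an object with a list of steps")
        return [dict(step) for step in steps]
    except (json.JSONDecodeError, ValueError, TypeError) as exc:
        logger.warning("Failed to parse plan steps: %s", exc)
        return []


def run_plan(plan_json: str) -> dict[str, Any]:
    steps = parse_plan_steps(plan_json)
    if not steps:
        return {
            "status": "failed",
            "result": {"error": "Failed to parse plan steps from directive"},
        }
    return {"status": "parsed", "steps": steps}


print(run_plan("{not json"))
```

Returning an empty list instead of raising keeps the failure decision in one place, at the cost of treating a valid plan with zero steps as a parse failure.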
Tool Failure Flow
- Trigger: Tool returns ok=False or raises exception in sandbox.
- Detection: _execute_tool checks tool_result.ok.
- Result: Status "failed", result contains {"error": "...", "failed_step": step.id, "step_results": [...]} → TASK_FAILED event; further plan steps skipped.
Critic Failure Flow
- Trigger: Critic adapter raises exception or returns non-JSON output.
- Detection: _evaluate_with_critic catches exception, logs warning.
- Result: Event CRITIC_RESULT with error payload → critic_score = None → execution continues without critique; memory write skipped.
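The critic is treated as optional: any failure degrades to "no score" instead of failing the task. A minimal sketch of that pattern follows, assuming the critic is a callable that returns a JSON string and publish is a callable taking an event type and payload (both hypothetical signatures).

```python
import json
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("critic_sketch")


def evaluate_with_critic(
    critic: Callable[[Any], str],
    tool_output: Any,
    publish: Callable[[str, dict[str, Any]], None],
) -> Optional[dict[str, Any]]:
    """Return the parsed critic score, or None if the critic fails."""
    try:
        score = json.loads(critic(tool_output))
        if not isinstance(score, dict):
            raise ValueError("critic output is not a JSON object")
    except Exception as exc:  # adapter errors and bad JSON are handled the same way
        logger.warning("Critic evaluation failed: %s", exc)
        publish("CRITIC_RESULT", {"error": str(exc)})
        return None
    publish("CRITIC_RESULT", {"score": score})
    return score
```

A caller would skip the memory write whenever this returns None, which matches the result described above.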
Orchestrator Fallback Flow
- Trigger: Primary orchestrator model unavailable or returns invalid directive.
- Detection: _ensure_orchestrator returns None; router falls back to sys_util orchestrator.
- Result: Utility orchestrator handles system-level decisions (e.g., file operations, environment queries).
Permission Denial Flow
- Trigger: PermissionService returns decision: "hard_stop" or "deny".
- Detection: _execute_tool checks permission_result.
- Result: Immediate failure with "Command blocked: ..." → TASK_FAILED; no tool execution.
🧠 6. "Decision Logic Map"
Orchestrator vs Direct Respond
- Use orchestrator when: task requires planning, multi-step tool usage, or unknown intent. Orchestrator decides to emit plan or tool directive.
- Direct respond when: intent parser classifies as simple query (TASK_RECEIVED → router.intent_parser → respond directive) or respond directive explicitly set.
Utility Model Call
- Invoked when sys_util orchestrator is loaded (configurable). Used for system-level operations: environment inspection, file system queries, or when primary orchestrator fails and fallback is needed.
Retry Logic
- Planner retry: ExecutionScheduler has retry_limit=2; on parse/validation failure, retries up to limit before failing plan.
- Tool retry: Not implemented natively; retry must be encoded in plan steps (depends_on, manual replan).
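The planner retry can be pictured as a bounded loop around plan generation and parsing. In this sketch, request_plan and parse are hypothetical callables, and retry_limit is read as the number of retries after the first attempt (so three attempts in total with the default of 2). The review does not say whether the limit counts attempts or retries.

```python
from typing import Any, Callable


def plan_with_retry(
    request_plan: Callable[[int], str],
    parse: Callable[[str], list[dict[str, Any]]],
    retry_limit: int = 2,
) -> list[dict[str, Any]]:
    """Ask for a plan until it parses, giving up after retry_limit retries."""
    # One initial attempt plus retry_limit retries (an assumed interpretation).
    for attempt in range(retry_limit + 1):
        steps = parse(request_plan(attempt))
        if steps:
            return steps
    # An empty list signals that the plan should be failed.
    return []
```

Passing the attempt number to request_plan lets the caller adjust the prompt on retries, for example by including the previous parse error.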
Plan Creation
- Trigger: Orchestrator output contains {type: "plan", ...} or explicit plan directive.
- Process: parse_plan_steps → validate_no_cycles → build_task_graph → ready steps execution.
- No plan: Orchestrator outputs respond or tool → direct execution.
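Cycle validation and ready-step selection over depends_on can be sketched with the standard library's graphlib.TopologicalSorter. This is a stand-in for validate_no_cycles and build_task_graph, whose actual implementations are not shown; the function name execution_waves and the unknown-dependency check are assumptions.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Any


def execution_waves(steps: list[dict[str, Any]]) -> list[list[str]]:
    """Group step ids into waves; each wave depends only on earlier waves."""
    graph = {s["id"]: set(s.get("depends_on", [])) for s in steps}
    unknown = set().union(*graph.values()) - set(graph)
    if unknown:
        raise ValueError(f"depends_on references unknown steps: {sorted(unknown)}")

    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if depends_on forms a cycle

    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


plan = [
    {"id": "a", "depends_on": []},
    {"id": "b", "depends_on": ["a"]},
    {"id": "c", "depends_on": ["a"]},
    {"id": "d", "depends_on": ["b", "c"]},
]
print(execution_waves(plan))

try:
    execution_waves([{"id": "x", "depends_on": ["y"]}, {"id": "y", "depends_on": ["x"]}])
except CycleError as exc:
    print("cycle detected:", exc.args[0])
```

Steps within one wave have no dependencies on each other, so a scheduler could run them concurrently.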
🧰 7. Tool System Architecture
Plugin Discovery
- ToolDiscovery scans app/tools/plugins/ for modules exporting Tool classes.
- Discovers: shell_exec, file_read, file_write, memory (search/insert/list).
Manifest-Based Tools
- Each plugin has a manifest.json with:
  - description: human-readable docstring.
  - args_schema: JSON schema for validation.
  - requires_permission: boolean for privileged tools (shell_exec, file_write).
- On discovery, registry registers tool and stores schema for permission/routing.
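To make the manifest fields concrete, the sketch below validates tool arguments against a manifest's args_schema. The manifest content is a made-up example for shell_exec, and the validator covers only the required, properties, and type keywords of JSON Schema; a real implementation would more likely use a full JSON Schema validator.

```python
from typing import Any

_JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "object": (dict,),
    "array": (list,),
}


def validate_args(manifest: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Check args against the required/properties/type subset of args_schema."""
    schema = manifest.get("args_schema", {})
    properties = schema.get("properties", {})
    errors = [
        f"missing required argument: {name}"
        for name in schema.get("required", [])
        if name not in args
    ]
    for name, value in args.items():
        expected = properties.get(name, {}).get("type")
        allowed = _JSON_TYPES.get(expected)
        if allowed is None:
            continue  # undeclared argument or a type this sketch does not cover
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        bool_mismatch = isinstance(value, bool) and expected != "boolean"
        if bool_mismatch or not isinstance(value, allowed):
            errors.append(f"argument {name} should be of type {expected}")
    return errors


# Hypothetical manifest content, shaped after the three fields listed above.
manifest = {
    "description": "Run a shell command",
    "args_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
    "requires_permission": True,
}
print(validate_args(manifest, {"command": "ls -la"}))
print(validate_args(manifest, {}))
```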
Registry Bootstrap
- RuntimeController._create_tool_registry() initializes discovery, loads plugins, registers with init mapping (sandbox, permissions).
- Tools are initialized once at startup; tool_registry is shared across executions.
Execution Isolation
- ToolSandbox (ToolSandbox):
  - Restricts filesystem to allowed_root (project base dir).
  - Timeout per execution (step_timeout_ms).
  - Blocks sudo without secret override; requires secret injection for sudo commands.
- Permission gating: shell_exec and file_write require explicit permission decision before execution.
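The allowed_root restriction comes down to resolving a requested path and checking that it stays inside the root. The sketch below shows one way to do that with pathlib. The function name resolve_in_root is hypothetical, and the timeout and sudo checks are not covered here.

```python
from pathlib import Path


def resolve_in_root(allowed_root: str, requested: str) -> Path:
    """Resolve requested against allowed_root and reject paths that escape it."""
    root = Path(allowed_root).resolve()
    # Joining with an absolute path discards root, so "/etc/passwd" is caught too.
    # resolve() collapses ".." segments and follows symlinks that exist.
    target = (root / requested).resolve()
    if not target.is_relative_to(root):  # requires Python 3.9+
        raise PermissionError(f"Path escapes allowed_root: {requested}")
    return target


print(resolve_in_root("/srv/project", "src/main.py"))

for bad in ("../etc/passwd", "/etc/passwd", "src/../../secret"):
    try:
        resolve_in_root("/srv/project", bad)
    except PermissionError as exc:
        print(exc)
```

Resolving before comparing matters: a plain string prefix check would accept "/srv/project/../etc" and would also confuse "/srv/project" with "/srv/project-other".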