# Web API

Endpoints:

```text
GET /health
GET /v1/status
GET /v1/models/roles
GET /v1/models/ping
POST /v1/chat
POST /v1/tasks
GET /v1/tasks
GET /v1/tasks/{task_id}
GET /v1/tasks/{task_id}/events
GET /v1/tasks/{task_id}/stream
GET /v1/approvals/pending
POST /v1/approvals/{approval_id}/allow_once
POST /v1/approvals/{approval_id}/allow_forever
POST /v1/approvals/{approval_id}/deny
GET /v1/skills
GET /v1/skills/{skill_id}
GET /v1/experience
GET /v1/experience/{id}
GET /v1/memory/search?q=...
```

`GET /v1/status` returns a fast runtime snapshot without live backend checks:

```json
{
  "name": "DuckLM",
  "api": {"host": "127.0.0.1", "port": 8000},
  "paths": {"workspace": "./workspace", "db_path": "./data/duck.sqlite3"},
  "token_budget": {"ctx_size": 65536},
  "models": {
    "default_provider": "llama_server",
    "endpoints": ["llama_server:http://127.0.0.1:8081/v1:local-main"],
    "roles": {"thinker": {"model": "local-main"}}
  },
  "services": {
    "duck_api": {"ok": true, "probed": true},
    "llama": {"ok": null, "probed": false, "roles": {}},
    "vector_memory": {"ok": null, "probed": false}
  }
}
```

Use `GET /v1/status?probe=true` to also call the model backend and Qdrant.

Structured utility roles are validated locally before side effects:

- `action` output must match `duck_core/schemas/action_directive.schema.json`; invalid directives are logged as `action_directive_failed` and no tool runs.
- `memory_policy` output must match its JSON schema; invalid decisions fall back to `should_store=false`.
- Large tool outputs are compacted before being sent back to the model; full outputs remain in task events and command audit.
- Duplicate identical tool actions in one task are skipped and logged as `tool_call_skipped`.
- Critic reflection is controlled by `DUCK_ENABLE_REFLECTION`; the default is off for responsive single-slot local chat.

Chat requests accept optional `reasoning`:

```json
{
  "message": "hello",
  "reasoning": "auto"
}
```

Allowed values:

- `auto`: use the `llama-server` default.
- `on`: pass `enable_thinking=true` for the thinker response.
- `off`: pass `enable_thinking=false` and `thinking_budget_tokens=0`.
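
The request shapes above can be exercised from a small client. The following is a minimal sketch, not the project's own client: the helper names (`status_url`, `build_chat_payload`, `reasoning_to_backend_params`, `post_chat`) are hypothetical, the base URL is assumed from the `api` host and port in the status example, and the chat response is returned undecoded beyond JSON parsing because its schema is not described here. `reasoning_to_backend_params` only illustrates the mapping listed under "Allowed values"; the server performs the real translation.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL, taken from the host/port shown in the status example.
BASE_URL = "http://127.0.0.1:8000"

ALLOWED_REASONING = ("auto", "on", "off")


def status_url(base_url=BASE_URL, probe=False):
    """Build the status URL; probe=True appends ?probe=true for live backend checks."""
    url = base_url.rstrip("/") + "/v1/status"
    if probe:
        url += "?" + urllib.parse.urlencode({"probe": "true"})
    return url


def build_chat_payload(message, reasoning=None):
    """Build the JSON body for POST /v1/chat; `reasoning` is optional."""
    payload = {"message": message}
    if reasoning is not None:
        if reasoning not in ALLOWED_REASONING:
            raise ValueError(f"reasoning must be one of {ALLOWED_REASONING}")
        payload["reasoning"] = reasoning
    return payload


def reasoning_to_backend_params(reasoning):
    """Illustrative mapping of `reasoning` to the backend flags listed above."""
    if reasoning == "on":
        return {"enable_thinking": True}
    if reasoning == "off":
        return {"enable_thinking": False, "thinking_budget_tokens": 0}
    return {}  # "auto": leave the llama-server default untouched


def post_chat(message, reasoning=None, base_url=BASE_URL, timeout=60):
    """Send a chat request and return the parsed JSON response as-is."""
    body = json.dumps(build_chat_payload(message, reasoning)).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running locally, `post_chat("hello", reasoning="off")` would send the request shown above with thinking disabled. Validating `reasoning` on the client side is a convenience that fails fast on typos; it does not replace whatever validation the server applies.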