Web API

Endpoints:

GET  /health
GET  /v1/status
GET  /v1/models/roles
GET  /v1/models/ping
POST /v1/chat
POST /v1/tasks
GET  /v1/tasks
GET  /v1/tasks/{task_id}
GET  /v1/tasks/{task_id}/events
GET  /v1/tasks/{task_id}/stream
GET  /v1/approvals/pending
POST /v1/approvals/{approval_id}/allow_once
POST /v1/approvals/{approval_id}/allow_forever
POST /v1/approvals/{approval_id}/deny
GET  /v1/skills
GET  /v1/skills/{skill_id}
GET  /v1/experience
GET  /v1/experience/{id}
GET  /v1/memory/search?q=...
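Several endpoints take path parameters or a query string. Below is a minimal client-side sketch that builds those URLs safely. It assumes the API is reachable at http://127.0.0.1:8000 (the host and port shown in the /v1/status example); the helper names task_url, approval_url and memory_search_url are hypothetical and not part of DuckLM.

```python
from urllib.parse import quote, urlencode

# Assumption: host/port taken from the /v1/status example; adjust as needed.
BASE_URL = "http://127.0.0.1:8000"

# The three approval actions listed in the endpoint table.
APPROVAL_ACTIONS = ("allow_once", "allow_forever", "deny")


def task_url(task_id: str, suffix: str = "") -> str:
    """Hypothetical helper: /v1/tasks/{task_id}, optionally /events or /stream."""
    if suffix not in ("", "events", "stream"):
        raise ValueError(f"unknown task suffix: {suffix!r}")
    # Percent-encode the id so characters like "/" cannot alter the path.
    path = f"/v1/tasks/{quote(task_id, safe='')}"
    return BASE_URL + (f"{path}/{suffix}" if suffix else path)


def approval_url(approval_id: str, action: str) -> str:
    """Hypothetical helper: POST target /v1/approvals/{approval_id}/{action}."""
    if action not in APPROVAL_ACTIONS:
        raise ValueError(f"unknown approval action: {action!r}")
    return f"{BASE_URL}/v1/approvals/{quote(approval_id, safe='')}/{action}"


def memory_search_url(query: str) -> str:
    """Hypothetical helper: /v1/memory/search?q=... with q form-encoded."""
    return f"{BASE_URL}/v1/memory/search?{urlencode({'q': query})}"
```

For example, memory_search_url("duck facts") yields a URL ending in ?q=duck+facts, and approval_url rejects actions other than the three listed endpoints before any request is sent.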

GET /v1/status returns a fast runtime snapshot without live backend checks:

{
  "name": "DuckLM",
  "api": {"host": "127.0.0.1", "port": 8000},
  "paths": {"workspace": "./workspace", "db_path": "./data/duck.sqlite3"},
  "token_budget": {"ctx_size": 65536},
  "models": {
    "default_provider": "llama_server",
    "endpoints": ["llama_server:http://127.0.0.1:8081/v1:local-main"],
    "roles": {"thinker": {"model": "local-main"}}
  },
  "services": {
    "duck_api": {"ok": true, "probed": true},
    "llama": {"ok": null, "probed": false, "roles": {}},
    "vector_memory": {"ok": null, "probed": false}
  }
}
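In this snapshot, a service that was not probed reports "ok": null and "probed": false. A client reading the response can fold each entry into a readable state, as in this sketch. The function name summarize_services is hypothetical, and treating "probed": true with a falsy "ok" as "down" is an assumption about how probe results are reported.

```python
import json


def summarize_services(status: dict) -> dict:
    """Hypothetical helper: map each service to 'ok', 'down' or 'not probed'."""
    summary = {}
    for name, info in status.get("services", {}).items():
        if not info.get("probed"):
            # "ok" is null until a live check has actually run.
            summary[name] = "not probed"
        elif info.get("ok"):
            summary[name] = "ok"
        else:
            # Assumption: probed but not ok means the backend check failed.
            summary[name] = "down"
    return summary


# Usage with the "services" part of the example response above.
example = json.loads("""
{"services": {
  "duck_api": {"ok": true, "probed": true},
  "llama": {"ok": null, "probed": false, "roles": {}},
  "vector_memory": {"ok": null, "probed": false}
}}
""")
print(summarize_services(example))
```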

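Use GET /v1/status?probe=true to also call the model backend and Qdrant. Without probe=true, the llama and vector_memory entries are reported with "ok": null and "probed": false, as in the example above.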
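A minimal sketch of calling the status endpoint with and without probing, using only the Python standard library. It assumes the API is at http://127.0.0.1:8000 (as in the example snapshot); status_url and fetch_status are hypothetical names, and fetch_status needs a running DuckLM server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: host/port from the example /v1/status response.
BASE_URL = "http://127.0.0.1:8000"


def status_url(probe: bool = False) -> str:
    """Hypothetical helper: /v1/status, with ?probe=true when probing."""
    url = f"{BASE_URL}/v1/status"
    return f"{url}?{urlencode({'probe': 'true'})}" if probe else url


def fetch_status(probe: bool = False, timeout: float = 10.0) -> dict:
    """Hypothetical helper: GET /v1/status and decode the JSON body."""
    with urlopen(status_url(probe), timeout=timeout) as resp:
        return json.load(resp)
```

Because a probed request calls the model backend and Qdrant, it may take noticeably longer than the plain snapshot; the timeout parameter is there so a caller can allow for that.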

Chat requests (POST /v1/chat) accept an optional reasoning field:

{
  "message": "hello",
  "reasoning": "auto"
}
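A sketch of building this request body on the client side. It assumes a JSON POST to http://127.0.0.1:8000/v1/chat; chat_request and send_chat are hypothetical helpers, and the shape of the chat response is not documented in this section, so send_chat simply returns the decoded JSON.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen

# Assumption: host/port from the example /v1/status response.
BASE_URL = "http://127.0.0.1:8000"


def chat_request(message: str, reasoning: Optional[str] = None) -> Request:
    """Hypothetical helper: build a POST /v1/chat request with a JSON body."""
    body = {"message": message}
    if reasoning is not None:
        # Optional field: left out of the body entirely when not given.
        body["reasoning"] = reasoning
    return Request(
        f"{BASE_URL}/v1/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(req: Request, timeout: float = 60.0) -> dict:
    """Hypothetical helper: send the request and decode the JSON reply."""
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```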

Allowed values:

auto: use the llama-server default.
on: pass enable_thinking=true for the thinker response.
off: pass enable_thinking=false and thinking_budget_tokens=0.
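The mapping above can be expressed as a small lookup from the reasoning value to the extra backend parameters. This is only an illustration of the documented behaviour, not DuckLM's code: thinking_params is a hypothetical name, and exactly where these parameters are placed in the llama-server request is not stated here, so the sketch returns them as a flat dict.

```python
from typing import Any, Dict

REASONING_VALUES = ("auto", "on", "off")


def thinking_params(reasoning: str) -> Dict[str, Any]:
    """Hypothetical helper: extra parameters for the thinker response."""
    if reasoning == "auto":
        return {}  # send nothing extra; llama-server's default applies
    if reasoning == "on":
        return {"enable_thinking": True}
    if reasoning == "off":
        # Disable thinking and give it a zero token budget.
        return {"enable_thinking": False, "thinking_budget_tokens": 0}
    raise ValueError(f"reasoning must be one of {REASONING_VALUES}, got {reasoning!r}")
```

Rejecting unknown values up front keeps a typo such as "of" from silently falling back to the server default.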
