Web API
Endpoints:
GET /health
GET /v1/status
GET /v1/models/roles
GET /v1/models/ping
POST /v1/chat
POST /v1/tasks
GET /v1/tasks
GET /v1/tasks/{task_id}
GET /v1/tasks/{task_id}/events
GET /v1/tasks/{task_id}/stream
GET /v1/approvals/pending
POST /v1/approvals/{approval_id}/allow_once
POST /v1/approvals/{approval_id}/allow_forever
POST /v1/approvals/{approval_id}/deny
GET /v1/skills
GET /v1/skills/{skill_id}
GET /v1/experience
GET /v1/experience/{id}
GET /v1/memory/search?q=...
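Two of the routes above take caller-supplied values: the q parameter of the memory search and the approval_id path segment. A minimal sketch of building those URLs safely might look like the following. The base URL is an assumption taken from the host and port in the status example, and the helper names memory_search_url and approval_url are hypothetical, not part of the project.

```python
from urllib.parse import quote, urlencode

# Assumption: the API listens on the host/port shown in the status example.
BASE_URL = "http://127.0.0.1:8000"

# Decisions taken from the three approval routes in the endpoint list.
APPROVAL_DECISIONS = ("allow_once", "allow_forever", "deny")


def memory_search_url(query: str, base_url: str = BASE_URL) -> str:
    """Build the memory search URL with a percent-encoded q parameter."""
    return f"{base_url}/v1/memory/search?{urlencode({'q': query})}"


def approval_url(approval_id: str, decision: str, base_url: str = BASE_URL) -> str:
    """Build the URL for one of the approval decision routes (to be POSTed)."""
    if decision not in APPROVAL_DECISIONS:
        raise ValueError(f"unknown decision: {decision!r}")
    # Encode the id so that characters such as '/' cannot alter the path.
    return f"{base_url}/v1/approvals/{quote(approval_id, safe='')}/{decision}"
```

Encoding the id with safe='' is a defensive choice; if approval ids are always plain tokens, the encoding is a no-op.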
GET /v1/status returns a fast runtime snapshot without live backend checks:
{
  "name": "DuckLM",
  "api": {"host": "127.0.0.1", "port": 8000},
  "paths": {"workspace": "./workspace", "db_path": "./data/duck.sqlite3"},
  "token_budget": {"ctx_size": 65536},
  "models": {
    "default_provider": "llama_server",
    "endpoints": ["llama_server:http://127.0.0.1:8081/v1:local-main"],
    "roles": {"thinker": {"model": "local-main"}}
  },
  "services": {
    "duck_api": {"ok": true, "probed": true},
    "llama": {"ok": null, "probed": false, "roles": {}},
    "vector_memory": {"ok": null, "probed": false}
  }
}
Use GET /v1/status?probe=true to also call the model backend and Qdrant.
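In the snapshot, a service with "probed": false carries "ok": null, so a client should treat it as unknown, not as failed. The sketch below shows one way to pick out such services and decide whether a probing request is worthwhile. The helper names status_url and unprobed_services are hypothetical, and the base URL is assumed from the api block of the example.

```python
import json


def status_url(probe: bool = False, base_url: str = "http://127.0.0.1:8000") -> str:
    """Return the status URL, adding probe=true when live checks are wanted."""
    suffix = "?probe=true" if probe else ""
    return f"{base_url}/v1/status{suffix}"


def unprobed_services(status: dict) -> list:
    """List service names whose health was not actually checked."""
    services = status.get("services", {})
    return sorted(name for name, info in services.items() if not info.get("probed"))


# The services block from the example response, parsed as JSON.
snapshot = json.loads("""
{
  "services": {
    "duck_api": {"ok": true, "probed": true},
    "llama": {"ok": null, "probed": false, "roles": {}},
    "vector_memory": {"ok": null, "probed": false}
  }
}
""")

pending = unprobed_services(snapshot)
print(pending)                         # ['llama', 'vector_memory']
print(status_url(probe=bool(pending)))  # http://127.0.0.1:8000/v1/status?probe=true
```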
Chat requests accept an optional reasoning field:
{
  "message": "hello",
  "reasoning": "auto"
}
Allowed values:
auto: use the llama-server default.
on: pass enable_thinking=true for the thinker response.
off: pass enable_thinking=false and thinking_budget_tokens=0.
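As an illustration of the reasoning field, here is a minimal client-side sketch that validates the value before sending. The function names build_chat_body and post_chat are hypothetical, the base URL is assumed from the status example, and the response is returned as raw text because its shape is not described here.

```python
import json
import urllib.request

# The three values listed above.
ALLOWED_REASONING = ("auto", "on", "off")


def build_chat_body(message: str, reasoning=None) -> str:
    """Serialize a chat request, including reasoning only when it is given."""
    body = {"message": message}
    if reasoning is not None:
        if reasoning not in ALLOWED_REASONING:
            raise ValueError(f"reasoning must be one of {ALLOWED_REASONING}")
        body["reasoning"] = reasoning
    return json.dumps(body)


def post_chat(message: str, reasoning=None,
              base_url: str = "http://127.0.0.1:8000") -> str:
    """POST the request to /v1/chat and return the raw response text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat",
        data=build_chat_body(message, reasoning).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Rejecting unknown values on the client avoids a round trip; whether the server would reject or ignore them is not stated in this section.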