# Web API

Endpoints:

```text
GET  /health
GET  /v1/status
GET  /v1/models/roles
GET  /v1/models/ping
POST /v1/chat
POST /v1/tasks
GET  /v1/tasks
GET  /v1/tasks/{task_id}
GET  /v1/tasks/{task_id}/events
GET  /v1/tasks/{task_id}/stream
GET  /v1/approvals/pending
POST /v1/approvals/{approval_id}/allow_once
POST /v1/approvals/{approval_id}/allow_forever
POST /v1/approvals/{approval_id}/deny
GET  /v1/skills
GET  /v1/skills/{skill_id}
GET  /v1/experience
GET  /v1/experience/{id}
GET  /v1/memory/search?q=...
```
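The path templates above can be filled in on the client side with a few URL helpers. This is a minimal sketch that uses only the Python standard library. It assumes the API is reachable at `http://127.0.0.1:8000`, which is the `api` host and port shown in the `/v1/status` example. The helper names (`task_url`, `approval_url`, `memory_search_url`) are hypothetical and not part of DuckLM.

```python
from urllib.parse import quote, urlencode

# Assumption: host/port taken from the "api" field of the /v1/status example.
BASE_URL = "http://127.0.0.1:8000"

# The three approval decisions listed in the endpoint table, as path segments.
APPROVAL_ACTIONS = ("allow_once", "allow_forever", "deny")


def task_url(task_id: str, suffix: str = "") -> str:
    """URL for /v1/tasks/{task_id}, optionally followed by /events or /stream."""
    if suffix not in ("", "events", "stream"):
        raise ValueError(f"unknown task sub-resource: {suffix!r}")
    # Percent-encode the id so characters like '/' or ' ' cannot alter the path.
    url = f"{BASE_URL}/v1/tasks/{quote(task_id, safe='')}"
    return f"{url}/{suffix}" if suffix else url


def approval_url(approval_id: str, action: str) -> str:
    """URL to POST an approval decision (allow_once, allow_forever, deny) to."""
    if action not in APPROVAL_ACTIONS:
        raise ValueError(f"unknown approval action: {action!r}")
    return f"{BASE_URL}/v1/approvals/{quote(approval_id, safe='')}/{action}"


def memory_search_url(query: str) -> str:
    """URL for /v1/memory/search with the q parameter form-encoded."""
    return f"{BASE_URL}/v1/memory/search?{urlencode({'q': query})}"
```

Validating the action name on the client catches typos before a request is sent, rather than relying on whatever error the server returns for an unknown path.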

`GET /v1/status` returns a fast runtime snapshot without live backend checks:

```json
{
  "name": "DuckLM",
  "api": {"host": "127.0.0.1", "port": 8000},
  "paths": {"workspace": "./workspace", "db_path": "./data/duck.sqlite3"},
  "token_budget": {"ctx_size": 65536},
  "models": {
    "default_provider": "llama_server",
    "endpoints": ["llama_server:http://127.0.0.1:8081/v1:local-main"],
    "roles": {"thinker": {"model": "local-main"}}
  },
  "services": {
    "duck_api": {"ok": true, "probed": true},
    "llama": {"ok": null, "probed": false, "roles": {}},
    "vector_memory": {"ok": null, "probed": false}
  }
}
```
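A client can read the `services` block of this snapshot to tell which backends were actually checked. The following is a minimal sketch that assumes the same base URL as in the example. It also assumes that `"probed": false` means no check was run and that `ok` is only meaningful when `probed` is true. That reading is inferred from the example, not from a documented schema. `fetch_status` and `service_health` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # assumption: host/port from the example


def fetch_status(probe: bool = False) -> dict:
    """GET /v1/status; probe=True appends ?probe=true to request live checks."""
    url = f"{BASE_URL}/v1/status" + ("?probe=true" if probe else "")
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def service_health(status: dict) -> dict:
    """Classify each service as 'ok', 'failing', or 'not probed'."""
    result = {}
    for name, info in status.get("services", {}).items():
        if not info.get("probed"):
            # No live check ran, so "ok" (null in the example) says nothing.
            result[name] = "not probed"
        elif info.get("ok"):
            result[name] = "ok"
        else:
            result[name] = "failing"
    return result
```

Treating an unprobed service as its own state avoids reporting `"ok": null` as a failure.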
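
Use `GET /v1/status?probe=true` to also run live checks against the model backend and Qdrant. Without probing, those services report `"ok": null` and `"probed": false`, as `llama` and `vector_memory` do in the example above.

`POST /v1/chat` requests accept an optional `reasoning` field:

```json
{
  "message": "hello",
  "reasoning": "auto"
}
```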
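A chat request with this body can be built using `urllib.request`. This is a minimal sketch that assumes the same base URL as in the status example. It leaves out `reasoning` when the caller does not set it, since the field is optional. `chat_request` is a hypothetical helper name. The response format of `/v1/chat` is not documented in this section, so the sketch stops at building the request.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8000"  # assumption: host/port from the status example


def chat_request(message: str, reasoning: Optional[str] = None) -> Request:
    """Build a POST /v1/chat request; 'reasoning' is included only if given."""
    payload = {"message": message}
    if reasoning is not None:
        payload["reasoning"] = reasoning
    return Request(
        f"{BASE_URL}/v1/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it against a running server, pass the result to `urllib.request.urlopen(chat_request("hello", reasoning="auto"))`.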

Allowed values:

- `auto`: use the `llama-server` default.
- `on`: pass `enable_thinking=true` for the thinker response.
- `off`: pass `enable_thinking=false` and `thinking_budget_tokens=0`.
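The three values above map to `llama-server` parameters as a small lookup, shown here as a sketch. `thinking_params` is a hypothetical helper. It only returns the key/value pairs listed above. This section does not say where in the `llama-server` request these keys are placed, so the sketch leaves that out.

```python
from typing import Dict, Union


def thinking_params(reasoning: str) -> Dict[str, Union[bool, int]]:
    """Extra llama-server parameters implied by each documented reasoning value."""
    if reasoning == "auto":
        return {}  # send nothing extra, so llama-server's own default applies
    if reasoning == "on":
        return {"enable_thinking": True}
    if reasoning == "off":
        # Disable thinking and also cap the thinking budget at zero tokens.
        return {"enable_thinking": False, "thinking_budget_tokens": 0}
    raise ValueError(f"reasoning must be 'auto', 'on', or 'off', got {reasoning!r}")
```

Returning an empty dict for `auto`, rather than explicit defaults, keeps the behaviour tied to whatever the server is configured with.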