How To Run

Install dependencies:

python3 -m venv .venv
. .venv/bin/activate
python -m pip install -e ".[dev]"

Configure:

cp .env.example .env

The default DUCK_MAIN_MODEL_PATH points to ./models/Qwen3.6/nonMTP/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf.

DUCK_ENABLE_REFLECTION=0 is the recommended default for the local single-slot stack. Set it to 1 only when you explicitly want critic reflection after each chat and accept that it can slow down the next request.
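A quick pre-flight check can catch a wrong model path before the stack is started. The sketch below is not part of the repository: `check_model_path` is a hypothetical helper, and it assumes `.env` contains plain `KEY=value` lines that a POSIX shell can source.

```shell
# Sketch: confirm the configured model file exists before starting the stack.
# ASSUMPTION: .env uses plain KEY=value lines that a POSIX shell can source.
check_model_path() {
  # $1: path to the GGUF model file
  if [ -f "$1" ]; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

if [ -f .env ]; then
  set -a          # export every variable sourced from .env
  . ./.env
  set +a
  check_model_path "${DUCK_MAIN_MODEL_PATH:-}" || true
fi
```

If the check prints `missing:`, fix DUCK_MAIN_MODEL_PATH in `.env` before running the start command.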

Start DuckLM:

bash scripts/duck.sh start

This starts the local stack:

Qdrant vector memory on http://127.0.0.1:6333/
llama-server on http://127.0.0.1:8081/v1
DuckLM API/WebChat on http://127.0.0.1:8000/
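After starting, it can help to wait until all three endpoints accept connections. This is a minimal sketch, not a script shipped with DuckLM: `wait_for_url` is a hypothetical helper, and it treats any HTTP response (even an error status) as "up", since the health routes of each service are not documented here.

```shell
# Sketch: poll a local endpoint until it answers or the attempts run out.
wait_for_url() {
  # $1: URL, $2: max attempts (default 30), one second apart
  url=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # Any HTTP response counts as "up"; only connection failures are retried.
    if curl --noproxy '*' -s -o /dev/null --max-time 2 "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "down: $url"
  return 1
}

# Usage (run after `bash scripts/duck.sh start`):
#   wait_for_url http://127.0.0.1:6333/
#   wait_for_url http://127.0.0.1:8081/v1
#   wait_for_url http://127.0.0.1:8000/
```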

Useful process commands:

bash scripts/duck.sh status
bash scripts/duck.sh logs --follow
bash scripts/duck.sh restart
bash scripts/duck.sh stop

Use live probes when you need backend diagnostics, not just process status:

bash scripts/duck.sh status --probe
curl --noproxy '*' 'http://127.0.0.1:8000/v1/status?probe=true'
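The probe output is easier to read when indented. The helper below is a sketch that assumes the status endpoint returns JSON (not confirmed here); `pretty_json` is a hypothetical name, and it falls back to printing the raw body when the input is not valid JSON or `python3` is unavailable.

```shell
# Sketch: pretty-print a JSON response, falling back to the raw text.
pretty_json() {
  # Reads stdin; prints indented JSON, or the raw text when parsing fails.
  body=$(cat)
  printf '%s' "$body" | python3 -m json.tool 2>/dev/null || printf '%s\n' "$body"
}

# Usage:
#   curl --noproxy '*' -s 'http://127.0.0.1:8000/v1/status?probe=true' | pretty_json
```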

Open WebChat:

http://127.0.0.1:8000/

Low-level llama-only commands are still available when needed:

bash scripts/llama/start_main.sh status
bash scripts/llama/start_main.sh logs --follow

MTP/speculative variant:

bash scripts/duck.sh stop
bash scripts/duck-mtp.sh start
bash scripts/duck-mtp.sh status
bash scripts/duck-mtp.sh logs --follow

duck-mtp.sh keeps DuckLM on http://127.0.0.1:8000/ and starts the MTP-backed llama-server on the normal role endpoint http://127.0.0.1:8081/v1, so config/models.yaml does not need to change.

Send a task:

curl -X POST http://127.0.0.1:8000/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"message":"Briefly say that you are DuckLM","workspace":"./workspace","debug":true}'
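Hand-written JSON breaks as soon as the message contains quotes or newlines. The sketch below builds the same three fields (`message`, `workspace`, `debug`) with proper escaping; `chat_payload` is a hypothetical helper and relies on `python3` being on the PATH.

```shell
# Sketch: build the /v1/chat request body with correct JSON escaping.
chat_payload() {
  # $1: message text, $2: workspace path (defaults to ./workspace)
  python3 -c 'import json, sys
print(json.dumps({"message": sys.argv[1], "workspace": sys.argv[2], "debug": True},
                 ensure_ascii=False))' "$1" "${2:-./workspace}"
}

# Usage:
#   curl -X POST http://127.0.0.1:8000/v1/chat \
#     -H "Content-Type: application/json" \
#     -d "$(chat_payload 'Summarize the README')"
```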

Inspect events (replace <task_id> with the ID of the task you want to inspect):

curl http://127.0.0.1:8000/v1/tasks/<task_id>/events
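To chain the two calls in a script, the task ID has to be pulled out of the chat response. The response schema is not documented here, so the sketch below assumes a JSON body with a top-level `task_id` field; both that field name and the `extract_task_id` helper are hypothetical and should be adjusted to the actual response.

```shell
# Sketch: read a JSON document on stdin and print its top-level "task_id".
# ASSUMPTION: the field name "task_id" is hypothetical; check the real response.
extract_task_id() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["task_id"])'
}

# Usage:
#   task_id=$(curl -s -X POST http://127.0.0.1:8000/v1/chat \
#     -H "Content-Type: application/json" \
#     -d '{"message":"hello","workspace":"./workspace","debug":true}' | extract_task_id)
#   curl "http://127.0.0.1:8000/v1/tasks/$task_id/events"
```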

Approvals:

curl http://127.0.0.1:8000/v1/approvals/pending

Stop services:

bash scripts/duck.sh stop
