# Local Llama Server

DuckLM expects an OpenAI-compatible `llama-server` at `http://127.0.0.1:8081/v1` by default.

On the current Radeon RX580 system, `llama.cpp` is built locally with Vulkan:

```bash
bash scripts/llama/build_vulkan.sh
```

The main model is Qwen3.6 35B A3B nonMTP:

```text
models/Qwen3.6/nonMTP/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf
```

Start it in the background with:

```bash
bash scripts/llama/start_main.sh start
```

Manage the process:

```bash
bash scripts/llama/start_main.sh status
bash scripts/llama/start_main.sh logs
bash scripts/llama/start_main.sh logs --follow
bash scripts/llama/start_main.sh restart
bash scripts/llama/start_main.sh stop
```

The local `.env` uses:

```env
DUCK_LLAMA_SERVER_BIN=./vendor/llama.cpp/build/bin/llama-server
DUCK_CTX_SIZE=4096
DUCK_N_GPU_LAYERS=20
DUCK_PARALLEL=1
DUCK_LLAMA_DEVICE=Vulkan0
DUCK_LLAMA_EXTRA_ARGS="--reasoning off --cache-ram 0"
```

MTP is available only through `scripts/llama/start_thinker_mtp_experimental.sh` and is not used by the action JSON endpoint by default.
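
To confirm that the server is answering at the default address once it has been started, a small check along these lines may help. This is a sketch, not part of the DuckLM scripts: the function names and the `DUCK_LLAMA_BASE_URL` variable are hypothetical, and it assumes the server exposes the usual OpenAI-compatible model listing route under the `/v1` base URL.

```bash
# Hypothetical helpers; not shipped with DuckLM.
# DUCK_LLAMA_BASE_URL is an assumed variable name. Its default mirrors
# the address DuckLM expects: http://127.0.0.1:8081/v1

duck_models_url() {
  base="${DUCK_LLAMA_BASE_URL:-http://127.0.0.1:8081/v1}"
  # Strip one trailing slash so the joined path never contains "//".
  printf '%s/models\n' "${base%/}"
}

duck_server_ready() {
  # -f: treat HTTP errors as failure, -sS: quiet but still print errors,
  # --max-time 5: give up instead of hanging if nothing is listening.
  curl -fsS --max-time 5 "$(duck_models_url)" >/dev/null
}
```

After sourcing these definitions, `duck_server_ready && echo "server is up"` returns success only if the request completes. A large model can take a while to load, so a failure right after `start` does not necessarily mean the launch failed; `bash scripts/llama/start_main.sh logs` is the place to look in that case.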