ducklm/docs/performance_mtp.md

Performance And MTP

MTP (multi-token prediction) and speculative decoding are concerns of the inference backend, not of DuckLM itself. By default, DuckLM keeps action JSON on the normal (non-MTP) endpoint.

Use scripts/llama/start_thinker_mtp_experimental.sh only for experiments. Benchmark scaffolding is in scripts/bench/bench_runtime.py.
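When comparing the normal endpoint against an experimental MTP backend, the comparison usually comes down to latency and tokens per second over repeated runs. The following is a minimal sketch of such a measurement, not the contents of scripts/bench/bench_runtime.py. The names `bench` and `speedup`, and the `generate` callable (assumed to return the list of generated tokens for a prompt), are hypothetical and chosen only for illustration.

```python
import statistics
import time
from typing import Callable, Dict, List


def bench(
    generate: Callable[[str], List[str]],
    prompt: str,
    runs: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Time generate(prompt) over several runs and summarize the results.

    `generate` is a hypothetical stand-in for whatever calls the backend
    and returns the generated tokens. `clock` is injectable so the
    harness can be tested without real waiting.
    """
    latencies = []
    rates = []
    for _ in range(runs):
        start = clock()
        tokens = generate(prompt)
        elapsed = clock() - start
        latencies.append(elapsed)
        # Guard against a zero-length interval on very coarse clocks.
        rates.append(len(tokens) / elapsed if elapsed > 0 else 0.0)
    return {
        "median_latency_s": statistics.median(latencies),
        "median_tokens_per_s": statistics.median(rates),
    }


def speedup(baseline: Dict[str, float], candidate: Dict[str, float]) -> float:
    """Ratio of candidate throughput to baseline throughput."""
    return candidate["median_tokens_per_s"] / baseline["median_tokens_per_s"]
```

Medians are used rather than means so that a single slow warm-up run does not dominate the summary. A ratio above 1.0 from `speedup` would suggest the candidate backend is faster on that prompt. Output quality has to be checked separately.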