# Performance And MTP

MTP and speculative decoding are inference-backend concerns. By default, DuckLM keeps action JSON on the normal endpoint.

Use `scripts/llama/start_thinker_mtp_experimental.sh` only for experiments. Benchmark scaffolding lives in `scripts/bench/bench_runtime.py`.
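
When comparing the normal endpoint against an experimental MTP backend, one reasonable metric is generated tokens per second over the same prompts. The following is a minimal sketch of that comparison, not the contents of `scripts/bench/bench_runtime.py`. The names `measure_throughput` and `speedup`, and the `generate` callable (which returns the number of tokens produced for a prompt), are hypothetical and stand in for whatever client the backend actually exposes.

```python
import time
from typing import Callable, Dict, List


def measure_throughput(generate: Callable[[str], int], prompts: List[str]) -> Dict[str, float]:
    """Time a hypothetical `generate` callable over a list of prompts.

    `generate` is an assumption: it takes a prompt and returns the number
    of tokens the backend produced for it.
    """
    total_tokens = 0
    start = time.perf_counter()
    for prompt in prompts:
        total_tokens += generate(prompt)
    elapsed = time.perf_counter() - start
    return {
        "tokens": float(total_tokens),
        "seconds": elapsed,
        # Guard against a zero-length interval on very fast fakes.
        "tokens_per_second": total_tokens / elapsed if elapsed > 0 else 0.0,
    }


def speedup(baseline: Dict[str, float], candidate: Dict[str, float]) -> float:
    """Ratio of candidate throughput to baseline throughput (0.0 if baseline is zero)."""
    base = baseline["tokens_per_second"]
    return candidate["tokens_per_second"] / base if base > 0 else 0.0


if __name__ == "__main__":
    # Stand-in generators that only sleep; replace with real endpoint calls.
    def normal_backend(prompt: str) -> int:
        time.sleep(0.02)
        return 16

    def mtp_backend(prompt: str) -> int:
        time.sleep(0.01)
        return 16

    prompts = ["list files", "open door", "say hello"]
    base = measure_throughput(normal_backend, prompts)
    mtp = measure_throughput(mtp_backend, prompts)
    print(f"normal: {base['tokens_per_second']:.1f} tok/s")
    print(f"mtp:    {mtp['tokens_per_second']:.1f} tok/s")
    print(f"speedup: {speedup(base, mtp):.2f}x")
```

Running both backends over an identical prompt list keeps the comparison fair. A real benchmark would also want warm-up requests and several repetitions before any conclusion is drawn.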