Utility Role Model Benchmark

Scope: service roles only (action, memory_policy, recall, summary, critic). The main user-facing thinker is not evaluated for replacement here.

| Model | Quality | Avg latency, s | Avg tok/s | Notes |
|---|---|---|---|---|
| Qwen3.6-35B nonMTP GPU baseline | 0.97 | 17.93 | 4.51 | critic/reflection_quality: missing=['lesson'] |

Case Details

Qwen3.6-35B nonMTP GPU baseline

| Role | Case | Score | Latency, s | tok/s | Note |
|---|---|---|---|---|---|
| action | direct_answer_no_tools | 1.00 | 15.32 | 2.94 | ok |
| action | read_specific_file | 1.00 | 19.64 | 4.12 | ok |
| memory_policy | store_user_preference | 1.00 | 18.42 | 4.78 | ok |
| memory_policy | ignore_trivial_tool_call | 1.00 | 14.98 | 4.07 | ok |
| recall | select_relevant_memory | 1.00 | 15.04 | 4.39 | ok |
| summary | preserve_decisions | 1.00 | 9.99 | 4.40 | ok |
| critic | reflection_quality | 0.80 | 32.16 | 6.84 | missing=['lesson'] |
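
As a cross-check on the summary row, the sketch below recomputes the three averages from the per-case rows. It assumes the summary values are plain unweighted means over the seven cases, rounded to two decimals; the report does not state how they were aggregated. The names `CASES` and `summarize` are illustrative and are not taken from the benchmark code.

```python
from statistics import mean

# Per-case rows copied from the case table:
# (role, case, score, latency_s, tok_per_s)
CASES = [
    ("action", "direct_answer_no_tools", 1.00, 15.32, 2.94),
    ("action", "read_specific_file", 1.00, 19.64, 4.12),
    ("memory_policy", "store_user_preference", 1.00, 18.42, 4.78),
    ("memory_policy", "ignore_trivial_tool_call", 1.00, 14.98, 4.07),
    ("recall", "select_relevant_memory", 1.00, 15.04, 4.39),
    ("summary", "preserve_decisions", 1.00, 9.99, 4.40),
    ("critic", "reflection_quality", 0.80, 32.16, 6.84),
]


def summarize(cases):
    """Return unweighted means over all cases, rounded to two decimals.

    Assumption: every case carries equal weight, regardless of role.
    """
    return {
        "quality": round(mean(c[2] for c in cases), 2),
        "avg_latency_s": round(mean(c[3] for c in cases), 2),
        "avg_tok_s": round(mean(c[4] for c in cases), 2),
    }


summary = summarize(CASES)
print(summary)
```

Under this assumption, quality (0.97) and tok/s (4.51) reproduce the summary row exactly. Latency comes out as 17.94 rather than the reported 17.93; a 0.01 s gap of this kind would be expected if the report averaged unrounded per-case latencies, though that is a guess rather than something the report confirms.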