# Utility Role Model Benchmark

Scope: service roles only (`action`, `memory_policy`, `recall`, `summary`, `critic`). The main user-facing thinker is not evaluated for replacement here.

| Model | Quality | Avg latency, s | Avg tok/s | Notes |
| --- | ---: | ---: | ---: | --- |
| Qwen3.6-35B nonMTP GPU baseline | 0.97 | 17.93 | 4.51 | critic/reflection_quality: missing=['lesson'] |

## Case Details

### Qwen3.6-35B nonMTP GPU baseline

| Role | Case | Score | Latency, s | tok/s | Note |
| --- | --- | ---: | ---: | ---: | --- |
| action | direct_answer_no_tools | 1.00 | 15.32 | 2.94 | ok |
| action | read_specific_file | 1.00 | 19.64 | 4.12 | ok |
| memory_policy | store_user_preference | 1.00 | 18.42 | 4.78 | ok |
| memory_policy | ignore_trivial_tool_call | 1.00 | 14.98 | 4.07 | ok |
| recall | select_relevant_memory | 1.00 | 15.04 | 4.39 | ok |
| summary | preserve_decisions | 1.00 | 9.99 | 4.40 | ok |
| critic | reflection_quality | 0.80 | 32.16 | 6.84 | missing=['lesson'] |
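
The summary row looks consistent with an unweighted mean over the seven per-case rows. The report does not state how the aggregate is computed, so the sketch below is only an assumption used to cross-check the numbers; `CASES` and `summarize` are illustrative names, not part of the benchmark harness.

```python
from statistics import mean

# (role, case, score, latency_s, tok_per_s), copied from the case table.
CASES = [
    ("action", "direct_answer_no_tools", 1.00, 15.32, 2.94),
    ("action", "read_specific_file", 1.00, 19.64, 4.12),
    ("memory_policy", "store_user_preference", 1.00, 18.42, 4.78),
    ("memory_policy", "ignore_trivial_tool_call", 1.00, 14.98, 4.07),
    ("recall", "select_relevant_memory", 1.00, 15.04, 4.39),
    ("summary", "preserve_decisions", 1.00, 9.99, 4.40),
    ("critic", "reflection_quality", 0.80, 32.16, 6.84),
]


def summarize(cases):
    """Return unweighted means over all cases (assumed aggregation rule)."""
    return {
        "quality": mean(c[2] for c in cases),
        "avg_latency_s": mean(c[3] for c in cases),
        "avg_tok_per_s": mean(c[4] for c in cases),
    }


summary = summarize(CASES)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under this assumption the quality and tok/s means round to the reported 0.97 and 4.51. The latency mean of the rounded per-case values lands within 0.01 s of the reported 17.93, which may indicate that the report averaged unrounded measurements.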