Utility Role Model Benchmark
Scope: service roles only (action, memory_policy, recall, summary, critic).
The main user-facing model (the "thinker") is not evaluated here as a replacement candidate.
| Model | Quality | Avg latency, s | Avg tok/s | Notes |
|---|---|---|---|---|
| Qwen3.6-35B nonMTP GPU baseline | 0.97 | 17.93 | 4.51 | critic/reflection_quality: missing=['lesson'] |
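The summary row looks like an unweighted mean over the seven per-case rows listed under Case Details. The aggregation rule is an assumption, not something the report states. A minimal sketch that recomputes it from those rows:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CaseResult:
    role: str
    case: str
    score: float
    latency_s: float
    tok_per_s: float


# Per-case rows copied from the Case Details table (values already rounded).
CASES = [
    CaseResult("action", "direct_answer_no_tools", 1.00, 15.32, 2.94),
    CaseResult("action", "read_specific_file", 1.00, 19.64, 4.12),
    CaseResult("memory_policy", "store_user_preference", 1.00, 18.42, 4.78),
    CaseResult("memory_policy", "ignore_trivial_tool_call", 1.00, 14.98, 4.07),
    CaseResult("recall", "select_relevant_memory", 1.00, 15.04, 4.39),
    CaseResult("summary", "preserve_decisions", 1.00, 9.99, 4.40),
    CaseResult("critic", "reflection_quality", 0.80, 32.16, 6.84),
]


def summarize(cases: list[CaseResult]) -> dict[str, float]:
    """Assumed rule: unweighted mean of each metric across cases, rounded to 2 places."""
    return {
        "quality": round(mean(c.score for c in cases), 2),
        "avg_latency_s": round(mean(c.latency_s for c in cases), 2),
        # Mean of per-case rates, not total tokens / total time.
        "avg_tok_per_s": round(mean(c.tok_per_s for c in cases), 2),
    }


summary = summarize(CASES)
print(summary)
```

This reproduces Quality 0.97 and Avg tok/s 4.51. From the rounded per-case latencies, the mean comes to 17.94 rather than the reported 17.93. The summary was presumably computed from unrounded timings.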
Case Details
Qwen3.6-35B nonMTP GPU baseline
| Role | Case | Score | Latency, s | tok/s | Note |
|---|---|---|---|---|---|
| action | direct_answer_no_tools | 1.00 | 15.32 | 2.94 | ok |
| action | read_specific_file | 1.00 | 19.64 | 4.12 | ok |
| memory_policy | store_user_preference | 1.00 | 18.42 | 4.78 | ok |
| memory_policy | ignore_trivial_tool_call | 1.00 | 14.98 | 4.07 | ok |
| recall | select_relevant_memory | 1.00 | 15.04 | 4.39 | ok |
| summary | preserve_decisions | 1.00 | 9.99 | 4.40 | ok |
| critic | reflection_quality | 0.80 | 32.16 | 6.84 | missing=['lesson'] |
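The critic score of 0.80 with `missing=['lesson']` fits a field-coverage check in which one of five required fields is absent. The sketch below shows that reading under stated assumptions. The scoring rule and the four field names other than `lesson` are hypothetical placeholders and are not taken from the benchmark.

```python
from typing import Any, Mapping

# Hypothetical required fields for a critic reflection. Only "lesson" appears
# in the benchmark notes; the other four names are illustrative placeholders.
REQUIRED_FIELDS = ("summary", "what_went_wrong", "root_cause", "next_step", "lesson")


def score_reflection(
    output: Mapping[str, Any], required: tuple[str, ...] = REQUIRED_FIELDS
) -> tuple[float, list[str]]:
    """Return (fraction of required fields present, list of missing fields).

    A field counts as missing if it is absent or falsy (e.g. an empty string).
    """
    missing = [field for field in required if not output.get(field)]
    score = (len(required) - len(missing)) / len(required)
    return round(score, 2), missing


# Example critic output that omits the "lesson" field.
reflection = {
    "summary": "Answered after reading the wrong file first.",
    "what_went_wrong": "Opened a file that did not match the request.",
    "root_cause": "Path was inferred instead of taken from the prompt.",
    "next_step": "Use the exact path the user gave.",
}

score, missing = score_reflection(reflection)
print(f"score={score:.2f} missing={missing}")
```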