# Utility Role Model Benchmark

Scope: service roles only (`action`, `memory_policy`, `recall`, `summary`, `critic`). The main user-facing thinker model is not evaluated as a replacement candidate here.

Each model runs seven cases: two `action`, two `memory_policy`, and one each for `recall`, `summary`, and `critic`. In the summary table, Quality, Avg latency, and Avg tok/s are the unweighted means of the per-case Score, Latency, and tok/s values listed under Case Details.

| Model | Quality | Avg latency, s | Avg tok/s | Notes |
| --- | ---: | ---: | ---: | --- |
| Qwen3.6-35B nonMTP GPU baseline | 0.97 | 17.94 | 4.51 | critic/reflection_quality: missing=['lesson'] |
| Menlo_Lucy-Q4_K_M CPU | 0.77 | 4.41 | 16.21 | memory_policy/ignore_trivial_tool_call: stored_trivial={'should_store': True, 'memory_type': 'fact', 'summary': 'Password was successfully launched and user was informed.', 'importance': 0.7, 'scope': 'global', 'metadata': {}}; recall/select_relevant_memory: wrong_ids=[]; summary/preserve_decisions: missing=['approval'] |
| Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M CPU | 0.40 | 61.94 | 2.56 | memory_policy/store_user_preference: invalid_json: Expecting value: line 1 column 1 (char 0); memory_policy/ignore_trivial_tool_call: invalid_json: Expecting value: line 1 column 1 (char 0); recall/select_relevant_memory: invalid_json: Expecting property name enclosed in double quotes: line 1 column 2 (char 1) |
| X-Coder-SFT-Qwen3-8B.Q6_K CPU | 0.76 | 60.12 | 2.51 | action/direct_answer_no_tools: invalid_json: Expecting ',' delimiter: line 13 column 6 (char 632); memory_policy/ignore_trivial_tool_call: stored_trivial={'should_store': True, 'memory_type': 'event', 'summary': 'User executed pwd command and received /tmp/project as output.', 'importance': 0.8, 'scope': 'conversation', 'metadata': {}} |
| gemma-4-E4B-it-Q4_K_M CPU | 0.97 | 21.23 | 5.36 | critic/reflection_quality: missing=['lesson'] |

## Case Details

### Qwen3.6-35B nonMTP GPU baseline

| Role | Case | Score | Latency, s | tok/s | Note |
| --- | --- | ---: | ---: | ---: | --- |
| action | direct_answer_no_tools | 1.00 | 15.31 | 2.94 | ok |
| action | read_specific_file | 1.00 | 19.61 | 4.13 | ok |
| memory_policy | store_user_preference | 1.00 | 18.53 | 4.75 | ok |
| memory_policy | ignore_trivial_tool_call | 1.00 | 15.00 | 4.07 | ok |
| recall | select_relevant_memory | 1.00 | 15.09 | 4.38 | ok |
| summary | preserve_decisions | 1.00 | 9.95 | 4.42 | ok |
| critic | reflection_quality | 0.80 | 32.09 | 6.86 | missing=['lesson'] |

### Menlo_Lucy-Q4_K_M CPU

| Role | Case | Score | Latency, s | tok/s | Note |
| --- | --- | ---: | ---: | ---: | --- |
| action | direct_answer_no_tools | 1.00 | 3.23 | 9.60 | ok |
| action | read_specific_file | 1.00 | 3.03 | 15.84 | ok |
| memory_policy | store_user_preference | 1.00 | 3.62 | 14.92 | ok |
| memory_policy | ignore_trivial_tool_call | 0.30 | 3.19 | 18.17 | stored_trivial={'should_store': True, 'memory_type': 'fact', 'summary': 'Password was successfully launched and user was informed.', 'importance': 0.7, 'scope': 'global', 'metadata': {}} |
| recall | select_relevant_memory | 0.30 | 3.74 | 16.05 | wrong_ids=[] |
| summary | preserve_decisions | 0.80 | 3.33 | 18.29 | missing=['approval'] |
| critic | reflection_quality | 1.00 | 10.70 | 20.57 | ok |

### Qwen3.5-9B-GLM5.1-Distill-v1-Q4_K_M CPU

| Role | Case | Score | Latency, s | tok/s | Note |
| --- | --- | ---: | ---: | ---: | --- |
| action | direct_answer_no_tools | 1.00 | 68.08 | 1.06 | ok |
| action | read_specific_file | 1.00 | 72.15 | 1.19 | ok |
| memory_policy | store_user_preference | 0.00 | 67.76 | 2.66 | invalid_json: Expecting value: line 1 column 1 (char 0) |
| memory_policy | ignore_trivial_tool_call | 0.00 | 64.65 | 2.47 | invalid_json: Expecting value: line 1 column 1 (char 0) |
| recall | select_relevant_memory | 0.00 | 59.45 | 2.69 | invalid_json: Expecting property name enclosed in double quotes: line 1 column 2 (char 1) |
| summary | preserve_decisions | 0.20 | 47.05 | 3.83 | missing=['8000', '8081', 'approval', 'allow_forever'] |
| critic | reflection_quality | 0.60 | 54.43 | 4.04 | missing=['risk', 'lesson'] |

### X-Coder-SFT-Qwen3-8B.Q6_K CPU

| Role | Case | Score | Latency, s | tok/s | Note |
| --- | --- | ---: | ---: | ---: | --- |
| action | direct_answer_no_tools | 0.00 | 121.05 | 1.49 | invalid_json: Expecting ',' delimiter: line 13 column 6 (char 632) |
| action | read_specific_file | 1.00 | 37.56 | 3.57 | ok |
| memory_policy | store_user_preference | 1.00 | 66.98 | 1.19 | ok |
| memory_policy | ignore_trivial_tool_call | 0.30 | 21.77 | 2.85 | stored_trivial={'should_store': True, 'memory_type': 'event', 'summary': 'User executed pwd command and received /tmp/project as output.', 'importance': 0.8, 'scope': 'conversation', 'metadata': {}} |
| recall | select_relevant_memory | 1.00 | 58.66 | 1.53 | ok |
| summary | preserve_decisions | 1.00 | 53.24 | 3.38 | ok |
| critic | reflection_quality | 1.00 | 61.55 | 3.57 | ok |

### gemma-4-E4B-it-Q4_K_M CPU

| Role | Case | Score | Latency, s | tok/s | Note |
| --- | --- | ---: | ---: | ---: | --- |
| action | direct_answer_no_tools | 1.00 | 35.72 | 1.48 | ok |
| action | read_specific_file | 1.00 | 13.32 | 6.60 | ok |
| memory_policy | store_user_preference | 1.00 | 27.13 | 3.61 | ok |
| memory_policy | ignore_trivial_tool_call | 1.00 | 10.23 | 8.80 | ok |
| recall | select_relevant_memory | 1.00 | 19.39 | 3.20 | ok |
| summary | preserve_decisions | 1.00 | 14.37 | 6.12 | ok |
| critic | reflection_quality | 0.80 | 28.48 | 7.72 | missing=['lesson'] |
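
The summary-table figures for each model agree with a plain unweighted mean of that model's seven case rows. The sketch below recomputes the baseline's summary row from its case table to show the aggregation. The tuple layout and the `summarize` helper are illustrative and are not taken from the benchmark harness.

```python
from statistics import mean

# Per-case rows copied from the "Qwen3.6-35B nonMTP GPU baseline" case table,
# stored as (score, latency_s, tok_s).
BASELINE_CASES = [
    (1.00, 15.31, 2.94),  # action/direct_answer_no_tools
    (1.00, 19.61, 4.13),  # action/read_specific_file
    (1.00, 18.53, 4.75),  # memory_policy/store_user_preference
    (1.00, 15.00, 4.07),  # memory_policy/ignore_trivial_tool_call
    (1.00, 15.09, 4.38),  # recall/select_relevant_memory
    (1.00, 9.95, 4.42),   # summary/preserve_decisions
    (0.80, 32.09, 6.86),  # critic/reflection_quality
]


def summarize(cases):
    """Return (quality, avg latency, avg tok/s) as unweighted column means."""
    scores, latencies, rates = zip(*cases)
    return (
        round(mean(scores), 2),
        round(mean(latencies), 2),
        round(mean(rates), 2),
    )


print(summarize(BASELINE_CASES))  # (0.97, 17.94, 4.51)
```

Because Avg tok/s averages the per-case rates, a short, fast case counts as much as a long one. A token-weighted throughput (total tokens divided by total time) would require raw token counts, and the tables do not report them.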
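
The `invalid_json` notes use the same message format as Python's `json.JSONDecodeError`. The `missing=[...]` scores also follow a consistent pattern: one missing keyword gives 0.80, two give 0.60, and four give 0.20. The sketch below shows a keyword-based scorer that reproduces those numbers. It is a hypothetical reconstruction. The function name `score_keyword_case`, the flat 0.2 penalty, and the keyword-in-serialized-JSON check are all assumptions inferred from the tables, not the benchmark's actual scoring code.

```python
import json

# HYPOTHETICAL: penalty inferred from the case tables (1 missing -> 0.80,
# 2 -> 0.60, 4 -> 0.20); not taken from the real benchmark harness.
MISSING_PENALTY = 0.2


def score_keyword_case(raw_output: str, required: list[str]) -> tuple[float, str]:
    """Hypothetical scorer: parse the reply as JSON, then check required keywords."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        # str(exc) yields e.g. "Expecting value: line 1 column 1 (char 0)".
        return 0.0, f"invalid_json: {exc}"
    # Simplification: search the whole serialized payload (keys and values).
    text = json.dumps(payload).lower()
    missing = [kw for kw in required if kw.lower() not in text]
    if not missing:
        return 1.0, "ok"
    score = max(0.0, 1.0 - MISSING_PENALTY * len(missing))
    # round() removes float noise such as 0.19999999999999996.
    return round(score, 2), f"missing={missing}"


print(score_keyword_case('{"lesson": "retry"}', ["risk", "lesson"]))
```

For this input, the call returns `(0.8, "missing=['risk']")`, which has the same shape as the critic notes above. Cases such as `stored_trivial` and `wrong_ids` (both scored 0.30) probably rely on role-specific checks that this sketch does not attempt to model.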