many fixes
parent 282085c384
commit f63125e945
|
|
@ -69,6 +69,7 @@ WebChat is available through FastAPI at `http://127.0.0.1:8000/`.
|
||||||
- reflection
|
- reflection
|
||||||
- experience records
|
- experience records
|
||||||
- Skill candidate selection is now used in both regular and streaming chat.
|
- Skill candidate selection is now used in both regular and streaming chat.
|
||||||
|
- `scripts/duck.sh` and `scripts/duck-mtp.sh` manage the entire local stack: Qdrant, llama-server, and the DuckLM API.
|
||||||
- `scripts/duck.sh status --probe` and `scripts/duck-mtp.sh status --probe` show the live state of the DuckLM runtime, model backend, and vector memory.
|
- `scripts/duck.sh status --probe` and `scripts/duck-mtp.sh status --probe` show the live state of the DuckLM runtime, model backend, and vector memory.
|
||||||
- Structured utility outputs are validated locally against a JSON schema; this protects the tool loop and memory writes from malformed model output.
|
- Structured utility outputs are validated locally against a JSON schema; this protects the tool loop and memory writes from malformed model output.
|
||||||
- Live E2E testing uncovered two runtime defects, both now fixed: large stdout no longer bloats the next planning prompt, and repeated identical actions are no longer re-executed.
|
- Live E2E testing uncovered two runtime defects, both now fixed: large stdout no longer bloats the next planning prompt, and repeated identical actions are no longer re-executed.
|
||||||
|
|
|
||||||
|
|
@ -14,10 +14,14 @@ bash scripts/duck.sh start
|
||||||
|
|
||||||
Open `http://127.0.0.1:8000/`.
|
Open `http://127.0.0.1:8000/`.
|
||||||
|
|
||||||
|
`duck.sh` starts and stops the local Qdrant vector memory service together with
|
||||||
|
`llama-server` and the DuckLM API. Use `status --probe` for live backend checks.
|
||||||
|
|
||||||
Useful commands:
|
Useful commands:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
bash scripts/duck.sh status
|
bash scripts/duck.sh status
|
||||||
|
bash scripts/duck.sh status --probe
|
||||||
bash scripts/duck.sh logs --follow
|
bash scripts/duck.sh logs --follow
|
||||||
bash scripts/duck.sh restart
|
bash scripts/duck.sh restart
|
||||||
bash scripts/duck.sh stop
|
bash scripts/duck.sh stop
|
||||||
|
|
|
||||||
|
|
@ -26,8 +26,9 @@ chat and accept that it can slow down the next request.
|
||||||
bash scripts/duck.sh start
|
bash scripts/duck.sh start
|
||||||
```
|
```
|
||||||
|
|
||||||
This starts both processes:
|
This starts the local stack:
|
||||||
|
|
||||||
|
- Qdrant vector memory on `http://127.0.0.1:6333/`
|
||||||
- `llama-server` on `http://127.0.0.1:8081/v1`
|
- `llama-server` on `http://127.0.0.1:8081/v1`
|
||||||
- DuckLM API/WebChat on `http://127.0.0.1:8000/`
|
- DuckLM API/WebChat on `http://127.0.0.1:8000/`
|
||||||
|
|
||||||
|
|
@ -97,5 +98,4 @@ curl http://127.0.0.1:8000/v1/approvals/pending
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
bash scripts/duck.sh stop
|
bash scripts/duck.sh stop
|
||||||
docker compose -f docker-compose.memory.yml down
|
|
||||||
```
|
```
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,68 @@
|
||||||
# Memory Architecture
|
# Memory Architecture
|
||||||
|
|
||||||
Semantic memory uses Qdrant as the vector store. Embeddings come from `/v1/embeddings` when the model backend supports it.
|
DuckLM currently has two memory layers:
|
||||||
|
|
||||||
If embeddings are unavailable, `VectorMemory` fails explicitly with `EmbeddingsUnavailableError`; it does not invent a local embedding algorithm.
|
- SQLite memory in `duck_core.memory.store.MemoryStore` for durable structured records.
|
||||||
|
- Vector memory in `duck_core.memory.vector_memory.VectorMemory` for semantic search through Qdrant.
|
||||||
|
|
||||||
|
## SQLite Memory
|
||||||
|
|
||||||
|
SQLite is the primary durable store. Runtime writes memory records after
|
||||||
|
`memory_policy` decides that a completed task contains reusable information.
|
||||||
|
Manual memory records can also be added through `/v1/memory` and the WebChat
|
||||||
|
memory drawer.
|
||||||
|
|
||||||
|
SQLite memory remains available even when Qdrant is down.
|
||||||
|
|
||||||
|
## Vector Memory
|
||||||
|
|
||||||
|
Vector memory stores the same useful memory summaries in Qdrant when vector
|
||||||
|
storage is configured and reachable. Qdrant is managed by the local service
|
||||||
|
scripts:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash scripts/duck.sh start
|
||||||
|
bash scripts/duck.sh status --probe
|
||||||
|
bash scripts/duck.sh stop
|
||||||
|
```
|
||||||
|
|
||||||
|
The MTP stack uses the same memory lifecycle through `scripts/duck-mtp.sh`.
|
||||||
|
|
||||||
|
## Embeddings
|
||||||
|
|
||||||
|
The default embedding source is a local `sentence-transformers` model:
|
||||||
|
|
||||||
|
```text
|
||||||
|
./models/all-MiniLM-L6-v2
|
||||||
|
```
|
||||||
|
|
||||||
|
`VectorMemory` lazy-loads that model only when it needs to write or search
|
||||||
|
vectors. Health checks do not load the embedding model; they only probe Qdrant.
|
||||||
|
|
||||||
|
A remote OpenAI-compatible embeddings endpoint can be used by setting
|
||||||
|
`embeddings_base_url`, but the normal local stack does not rely on
|
||||||
|
`llama-server` embeddings.
|
||||||
|
|
||||||
|
If no embedding source is configured, `VectorMemory` raises
|
||||||
|
`EmbeddingsUnavailableError`. It does not silently invent fallback embeddings.
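
The lazy-loading and explicit-failure behavior described above can be illustrated with a small sketch. This is not the actual `VectorMemory` code: `LazyEmbedder`, its `loader` argument, and `load_count` are hypothetical names, and the exception class here is a local stand-in that only shares its name with `EmbeddingsUnavailableError`.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-in; the real exception lives in the DuckLM code base.
class EmbeddingsUnavailableError(RuntimeError):
    """Raised when no embedding source is configured."""


Encoder = Callable[[str], Sequence[float]]


class LazyEmbedder:
    """Loads the embedding model on first use, never at construction time."""

    def __init__(self, loader: Optional[Callable[[], Encoder]] = None) -> None:
        # `loader` is an assumed factory that returns an encode function,
        # for example one that wraps a local sentence-transformers model.
        self._loader = loader
        self._encode: Optional[Encoder] = None
        self.load_count = 0  # exposed only to make the laziness observable

    def embed(self, text: str) -> list[float]:
        if self._loader is None:
            # Fail loudly instead of inventing fallback embeddings.
            raise EmbeddingsUnavailableError("no embedding source configured")
        if self._encode is None:
            # First write or search: load the model exactly once.
            self._encode = self._loader()
            self.load_count += 1
        return list(self._encode(text))
```

Because construction does no work, a health check that only probes Qdrant never pays the model-loading cost; only a call such as `embed("some text")` triggers the load.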
|
||||||
|
|
||||||
|
## Status And Verification
|
||||||
|
|
||||||
|
Runtime status is available through:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
curl --noproxy '*' 'http://127.0.0.1:8000/v1/status?probe=true'
|
||||||
|
```
|
||||||
|
|
||||||
|
`scripts/duck.sh status --probe` prints the same backend result plus Docker
|
||||||
|
Compose state for Qdrant. WebChat also shows model and vector memory state in
|
||||||
|
the Runtime panel.
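
As a rough illustration of how a client could turn one service entry from the probed status response into a short label, here is a minimal Python sketch. It mirrors the `serviceStatusText` helper in the WebChat JavaScript from this commit: the field names `probed`, `ok`, `latency_ms`, and `configured` are taken from that helper, while the function name `service_status_text` is hypothetical and the full response schema is not confirmed by the source.

```python
import math
from typing import Any, Optional


def service_status_text(service: Optional[dict[str, Any]]) -> str:
    """Summarize one entry of the `services` map as a short label."""
    # A missing entry or an explicit probed=False means no live check ran.
    if service is None or service.get("probed") is False:
        return "not probed"
    if service.get("ok") is True:
        latency = service.get("latency_ms")
        if latency is not None:
            # floor(x + 0.5) approximates JavaScript's Math.round.
            return f"ok {math.floor(latency + 0.5)}ms"
        return "ok"
    if service.get("ok") is False:
        return "failed"
    # No verdict either way: distinguish "switched off" from "unknown".
    return "disabled" if service.get("configured") is False else "unknown"
```

An entry such as `{"ok": True, "latency_ms": 12.4}` would be labeled `ok 12ms`, and an entry with `"configured": False` and no `ok` field would be labeled `disabled`.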
|
||||||
|
|
||||||
|
The live smoke test for Qdrant write/search is:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
.venv/bin/python -m pytest tests/smoke/test_vector_memory_live.py -q
|
||||||
|
```
|
||||||
|
|
||||||
|
The test skips when Qdrant is not reachable, and runs a real add/search cycle
|
||||||
|
when the local stack is up.
|
||||||
|
|
|
||||||
|
|
@ -245,12 +245,15 @@ def create_app() -> FastAPI:
|
||||||
memory_records = await relevant_memory(
|
memory_records = await relevant_memory(
|
||||||
body.message, conversation.workspace, conversation.conversation_id
|
body.message, conversation.workspace, conversation.conversation_id
|
||||||
)
|
)
|
||||||
|
memory_sufficient_to_answer = False
|
||||||
# Use recall-role to filter relevant memories via LLM
|
# Use recall-role to filter relevant memories via LLM
|
||||||
if memory_records and runtime.context_builder._model_client is not None:
|
if memory_records and runtime.context_builder._model_client is not None:
|
||||||
try:
|
try:
|
||||||
memory_records = await runtime.context_builder.recall_relevant_memory(
|
recall_decision = await runtime.context_builder.recall_relevant_memory_decision(
|
||||||
body.message, memory_records
|
body.message, memory_records
|
||||||
)
|
)
|
||||||
|
memory_records = recall_decision.records
|
||||||
|
memory_sufficient_to_answer = recall_decision.sufficient_to_answer
|
||||||
except Exception:
|
except Exception:
|
||||||
pass # Fallback to unfiltered memory_records
|
pass # Fallback to unfiltered memory_records
|
||||||
result = await runtime.run_chat(
|
result = await runtime.run_chat(
|
||||||
|
|
@ -262,6 +265,7 @@ def create_app() -> FastAPI:
|
||||||
skill_summary=await selected_skill_summary(body.message),
|
skill_summary=await selected_skill_summary(body.message),
|
||||||
reasoning=body.reasoning,
|
reasoning=body.reasoning,
|
||||||
reflect=bool(settings.enable_reflection),
|
reflect=bool(settings.enable_reflection),
|
||||||
|
skip_action_loop=memory_sufficient_to_answer,
|
||||||
)
|
)
|
||||||
await conversations.add_message(
|
await conversations.add_message(
|
||||||
conversation.conversation_id,
|
conversation.conversation_id,
|
||||||
|
|
@ -377,6 +381,16 @@ def create_app() -> FastAPI:
|
||||||
memory_records = await relevant_memory(
|
memory_records = await relevant_memory(
|
||||||
body.message, conversation.workspace, conversation.conversation_id
|
body.message, conversation.workspace, conversation.conversation_id
|
||||||
)
|
)
|
||||||
|
memory_sufficient_to_answer = False
|
||||||
|
if memory_records and runtime.context_builder._model_client is not None:
|
||||||
|
try:
|
||||||
|
recall_decision = await runtime.context_builder.recall_relevant_memory_decision(
|
||||||
|
body.message, memory_records
|
||||||
|
)
|
||||||
|
memory_records = recall_decision.records
|
||||||
|
memory_sufficient_to_answer = recall_decision.sufficient_to_answer
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
task = await task_store.create_task(
|
task = await task_store.create_task(
|
||||||
body.message, conversation.workspace, body.debug
|
body.message, conversation.workspace, body.debug
|
||||||
)
|
)
|
||||||
|
|
@ -407,9 +421,17 @@ def create_app() -> FastAPI:
|
||||||
"planning",
|
"planning",
|
||||||
"Планирую, нужны ли локальные действия...",
|
"Планирую, нужны ли локальные действия...",
|
||||||
)
|
)
|
||||||
|
tool_observations = []
|
||||||
|
if not memory_sufficient_to_answer:
|
||||||
tool_observations = await runtime._run_action_loop(
|
tool_observations = await runtime._run_action_loop(
|
||||||
task.task_id, messages, conversation.workspace
|
task.task_id, messages, conversation.workspace
|
||||||
)
|
)
|
||||||
|
else:
|
||||||
|
await event_store.append(
|
||||||
|
task.task_id,
|
||||||
|
"action_loop_skipped",
|
||||||
|
{"reason": "recall_sufficient_to_answer"},
|
||||||
|
)
|
||||||
if tool_observations:
|
if tool_observations:
|
||||||
yield runtime_status(
|
yield runtime_status(
|
||||||
task.task_id,
|
task.task_id,
|
||||||
|
|
|
||||||
|
|
@ -2,6 +2,7 @@ from __future__ import annotations
|
||||||
|
|
||||||
import json
|
import json
|
||||||
import logging
|
import logging
|
||||||
|
from dataclasses import dataclass
|
||||||
from typing import Any
|
from typing import Any
|
||||||
|
|
||||||
from duck_core.tasks.state import TaskState
|
from duck_core.tasks.state import TaskState
|
||||||
|
|
@ -12,6 +13,13 @@ logger = logging.getLogger(__name__)
|
||||||
_CHARS_PER_TOKEN = 4
|
_CHARS_PER_TOKEN = 4
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class RecallDecision:
|
||||||
|
records: list[dict[str, str]]
|
||||||
|
sufficient_to_answer: bool = False
|
||||||
|
reasoning: str = ""
|
||||||
|
|
||||||
|
|
||||||
def estimate_tokens(text: str) -> int:
|
def estimate_tokens(text: str) -> int:
|
||||||
"""Rough token estimate based on character count."""
|
"""Rough token estimate based on character count."""
|
||||||
return max(len(text) // _CHARS_PER_TOKEN, 1)
|
return max(len(text) // _CHARS_PER_TOKEN, 1)
|
||||||
|
|
@ -64,20 +72,27 @@ class ContextBuilder:
|
||||||
Returns only the memories that are relevant to the query.
|
Returns only the memories that are relevant to the query.
|
||||||
Falls back to returning all records if LLM is unavailable.
|
Falls back to returning all records if LLM is unavailable.
|
||||||
"""
|
"""
|
||||||
|
return (await self.recall_relevant_memory_decision(query, memory_records)).records
|
||||||
|
|
||||||
|
async def recall_relevant_memory_decision(
|
||||||
|
self,
|
||||||
|
query: str,
|
||||||
|
memory_records: list[dict[str, str]],
|
||||||
|
) -> RecallDecision:
|
||||||
if not memory_records or self._model_client is None:
|
if not memory_records or self._model_client is None:
|
||||||
return memory_records
|
return RecallDecision(records=memory_records, sufficient_to_answer=False)
|
||||||
|
|
||||||
try:
|
try:
|
||||||
return await self._llm_recall(query, memory_records)
|
return await self._llm_recall(query, memory_records)
|
||||||
except Exception as exc:
|
except Exception as exc:
|
||||||
logger.warning("Recall failed, using all memories: %s", exc)
|
logger.warning("Recall failed, using all memories: %s", exc)
|
||||||
return memory_records
|
return RecallDecision(records=memory_records, sufficient_to_answer=False)
|
||||||
|
|
||||||
async def _llm_recall(
|
async def _llm_recall(
|
||||||
self,
|
self,
|
||||||
query: str,
|
query: str,
|
||||||
memory_records: list[dict[str, str]],
|
memory_records: list[dict[str, str]],
|
||||||
) -> list[dict[str, str]]:
|
) -> RecallDecision:
|
||||||
"""Call recall-role LLM to identify relevant memories."""
|
"""Call recall-role LLM to identify relevant memories."""
|
||||||
memories_text = "\n".join(
|
memories_text = "\n".join(
|
||||||
f"[{m.get('memory_id', i)}] {m.get('text', '')}"
|
f"[{m.get('memory_id', i)}] {m.get('text', '')}"
|
||||||
|
|
@ -105,6 +120,10 @@ class ContextBuilder:
|
||||||
"type": "array",
|
"type": "array",
|
||||||
"items": {"type": "string"},
|
"items": {"type": "string"},
|
||||||
},
|
},
|
||||||
|
"sufficient_to_answer": {
|
||||||
|
"type": "boolean",
|
||||||
|
"description": "True when selected memories are enough to answer without local tools/actions.",
|
||||||
|
},
|
||||||
"reasoning": {"type": "string"},
|
"reasoning": {"type": "string"},
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
|
|
@ -115,8 +134,16 @@ class ContextBuilder:
|
||||||
data = json.loads(response.content)
|
data = json.loads(response.content)
|
||||||
relevant_ids = set(data.get("relevant_ids", []))
|
relevant_ids = set(data.get("relevant_ids", []))
|
||||||
if not relevant_ids:
|
if not relevant_ids:
|
||||||
return []
|
return RecallDecision(records=[], sufficient_to_answer=False, reasoning=str(data.get("reasoning", "")))
|
||||||
return [m for i, m in enumerate(memory_records) if m.get("memory_id", str(i)) in relevant_ids]
|
records = [
|
||||||
|
m for i, m in enumerate(memory_records)
|
||||||
|
if m.get("memory_id", str(i)) in relevant_ids
|
||||||
|
]
|
||||||
|
return RecallDecision(
|
||||||
|
records=records,
|
||||||
|
sufficient_to_answer=bool(data.get("sufficient_to_answer", False)) and bool(records),
|
||||||
|
reasoning=str(data.get("reasoning", "")),
|
||||||
|
)
|
||||||
|
|
||||||
def build_basic_messages(
|
def build_basic_messages(
|
||||||
self,
|
self,
|
||||||
|
|
|
||||||
|
|
@ -20,6 +20,7 @@ from duck_core.tools.gateway import ToolGateway
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
ACTION_DIRECTIVE_SCHEMA = load_json_schema("duck_core/schemas/action_directive.schema.json")
|
ACTION_DIRECTIVE_SCHEMA = load_json_schema("duck_core/schemas/action_directive.schema.json")
|
||||||
MAX_TOOL_OBSERVATION_TEXT_CHARS = 2000
|
MAX_TOOL_OBSERVATION_TEXT_CHARS = 2000
|
||||||
|
MAX_MEMORY_TRANSCRIPT_CHARS = 6000
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
|
|
@ -67,6 +68,7 @@ class RuntimeLoop:
|
||||||
skill_summary: str | None = None,
|
skill_summary: str | None = None,
|
||||||
reflect: bool = True,
|
reflect: bool = True,
|
||||||
reasoning: ReasoningMode | None = None,
|
reasoning: ReasoningMode | None = None,
|
||||||
|
skip_action_loop: bool = False,
|
||||||
) -> ChatResult:
|
) -> ChatResult:
|
||||||
task = await self.task_store.create_task(message, workspace, debug)
|
task = await self.task_store.create_task(message, workspace, debug)
|
||||||
await self.event_store.append(
|
await self.event_store.append(
|
||||||
|
|
@ -78,7 +80,15 @@ class RuntimeLoop:
|
||||||
messages = await self.context_builder.build_async_messages(
|
messages = await self.context_builder.build_async_messages(
|
||||||
task, history_messages, memory_records, skill_summary=skill_summary
|
task, history_messages, memory_records, skill_summary=skill_summary
|
||||||
)
|
)
|
||||||
|
tool_observations = []
|
||||||
|
if not skip_action_loop:
|
||||||
tool_observations = await self._run_action_loop(task.task_id, messages, workspace)
|
tool_observations = await self._run_action_loop(task.task_id, messages, workspace)
|
||||||
|
else:
|
||||||
|
await self.event_store.append(
|
||||||
|
task.task_id,
|
||||||
|
"action_loop_skipped",
|
||||||
|
{"reason": "recall_sufficient_to_answer"},
|
||||||
|
)
|
||||||
if any(observation.get("requires_approval") for observation in tool_observations):
|
if any(observation.get("requires_approval") for observation in tool_observations):
|
||||||
await self.task_store.waiting_for_approval(task.task_id)
|
await self.task_store.waiting_for_approval(task.task_id)
|
||||||
await self.event_store.append(
|
await self.event_store.append(
|
||||||
|
|
@ -288,7 +298,8 @@ class RuntimeLoop:
|
||||||
if self.memory_store is None:
|
if self.memory_store is None:
|
||||||
return
|
return
|
||||||
try:
|
try:
|
||||||
decision = await self.memory_policy.classify(final_response, task_id)
|
transcript = await self._build_memory_policy_transcript(task_id, final_response)
|
||||||
|
decision = await self.memory_policy.classify(transcript, task_id)
|
||||||
await self.event_store.append(
|
await self.event_store.append(
|
||||||
task_id,
|
task_id,
|
||||||
"memory_policy_decision",
|
"memory_policy_decision",
|
||||||
|
|
@ -326,6 +337,47 @@ class RuntimeLoop:
|
||||||
{"error": str(exc)},
|
{"error": str(exc)},
|
||||||
)
|
)
|
||||||
|
|
||||||
|
async def _build_memory_policy_transcript(self, task_id: str, final_response: str) -> str:
|
||||||
|
task = await self.task_store.get_task(task_id)
|
||||||
|
parts = []
|
||||||
|
if task is not None:
|
||||||
|
parts.append(f"User message:\n{task.user_message}")
|
||||||
|
if task.workspace:
|
||||||
|
parts.append(f"Workspace:\n{task.workspace}")
|
||||||
|
parts.append(f"Assistant final response:\n{final_response}")
|
||||||
|
|
||||||
|
event_summaries = []
|
||||||
|
for event in await self.event_store.list_events(task_id):
|
||||||
|
summary = self._summarize_memory_event(event.event_type, event.payload)
|
||||||
|
if summary:
|
||||||
|
event_summaries.append(f"- {event.event_type}: {summary}")
|
||||||
|
if event_summaries:
|
||||||
|
parts.append("Relevant runtime events:\n" + "\n".join(event_summaries))
|
||||||
|
|
||||||
|
transcript = "\n\n".join(parts)
|
||||||
|
if len(transcript) > MAX_MEMORY_TRANSCRIPT_CHARS:
|
||||||
|
return transcript[:MAX_MEMORY_TRANSCRIPT_CHARS] + "\n...[truncated]"
|
||||||
|
return transcript
|
||||||
|
|
||||||
|
def _summarize_memory_event(self, event_type: str, payload: dict[str, Any]) -> str:
|
||||||
|
if event_type == "action_directive":
|
||||||
|
intent = payload.get("intent")
|
||||||
|
hints = payload.get("memory_hints") or []
|
||||||
|
if hints:
|
||||||
|
return f"intent={intent}; memory_hints={hints}"
|
||||||
|
return f"intent={intent}" if intent else ""
|
||||||
|
if event_type == "tool_call_finished":
|
||||||
|
result = payload.get("result") or {}
|
||||||
|
output = str(result.get("output") or result.get("error") or "").strip()
|
||||||
|
if len(output) > 500:
|
||||||
|
output = output[:500] + "...[truncated]"
|
||||||
|
return "{} ok={} {}".format(payload.get("tool"), result.get("ok"), output).strip()
|
||||||
|
if event_type == "tool_approval_requested":
|
||||||
|
return str(payload.get("reason") or payload.get("tool") or "")
|
||||||
|
if event_type == "tool_call_skipped":
|
||||||
|
return str(payload.get("reason") or "")
|
||||||
|
return ""
|
||||||
|
|
||||||
async def _run_reflection(self, task_id: str) -> None:
|
async def _run_reflection(self, task_id: str) -> None:
|
||||||
"""Run critic reflection on completed task and record experience."""
|
"""Run critic reflection on completed task and record experience."""
|
||||||
if self.experience_recorder is None:
|
if self.experience_recorder is None:
|
||||||
|
|
@ -463,6 +515,18 @@ class RuntimeLoop:
|
||||||
if seen_action_keys is not None:
|
if seen_action_keys is not None:
|
||||||
seen_action_keys.add(action_key)
|
seen_action_keys.add(action_key)
|
||||||
tool_name = str(action.get("tool", ""))
|
tool_name = str(action.get("tool", ""))
|
||||||
|
if tool_name not in gateway.tools:
|
||||||
|
await self.event_store.append(
|
||||||
|
task_id,
|
||||||
|
"tool_call_skipped",
|
||||||
|
{
|
||||||
|
"index": index,
|
||||||
|
"tool": tool_name,
|
||||||
|
"reason": "unknown_tool",
|
||||||
|
"action": action,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
continue
|
||||||
await self.event_store.append(
|
await self.event_store.append(
|
||||||
task_id,
|
task_id,
|
||||||
"tool_call_started",
|
"tool_call_started",
|
||||||
|
|
|
||||||
|
|
@ -894,19 +894,46 @@ async function sendMessage() {
|
||||||
|
|
||||||
async function checkRuntime() {
|
async function checkRuntime() {
|
||||||
try {
|
try {
|
||||||
await jsonFetch("/health");
|
const status = await jsonFetch("/v1/status?probe=true");
|
||||||
|
const services = status.services || {};
|
||||||
|
const llama = services.llama || {};
|
||||||
|
const vector = services.vector_memory || {};
|
||||||
|
const roles = Object.keys(status.models?.roles || {}).sort();
|
||||||
|
|
||||||
setStatus("#api-status", "online", "ok");
|
setStatus("#api-status", "online", "ok");
|
||||||
|
setStatus("#model-status", serviceStatusText(llama), serviceTone(llama));
|
||||||
|
setStatus("#vector-status", serviceStatusText(vector), serviceTone(vector));
|
||||||
|
setStatus("#embedding-status", compactEmbeddingSource(vector.embedding_source), vector.configured ? "ok" : "warn");
|
||||||
|
setStatus("#roles-status", roles.length ? String(roles.length) : "none", roles.length ? "ok" : "warn");
|
||||||
} catch {
|
} catch {
|
||||||
setStatus("#api-status", "offline", "bad");
|
setStatus("#api-status", "offline", "bad");
|
||||||
|
setStatus("#model-status", "unknown", "bad");
|
||||||
|
setStatus("#vector-status", "unknown", "bad");
|
||||||
|
setStatus("#embedding-status", "unknown", "bad");
|
||||||
|
setStatus("#roles-status", "unknown", "bad");
|
||||||
}
|
}
|
||||||
|
}
|
||||||
|
|
||||||
try {
|
function serviceStatusText(service) {
|
||||||
const roles = await jsonFetch("/v1/models/ping");
|
if (!service || service.probed === false) return "not probed";
|
||||||
const ok = Object.values(roles).every((item) => item.ok);
|
if (service.ok === true) {
|
||||||
setStatus("#model-status", ok ? "online" : "degraded", ok ? "ok" : "warn");
|
return service.latency_ms !== undefined ? `ok ${Math.round(service.latency_ms)}ms` : "ok";
|
||||||
} catch {
|
|
||||||
setStatus("#model-status", "offline", "bad");
|
|
||||||
}
|
}
|
||||||
|
if (service.ok === false) return "failed";
|
||||||
|
return service.configured === false ? "disabled" : "unknown";
|
||||||
|
}
|
||||||
|
|
||||||
|
function serviceTone(service) {
|
||||||
|
if (!service || service.ok === false) return "bad";
|
||||||
|
if (service.ok === true) return "ok";
|
||||||
|
return service.configured === false ? "warn" : "neutral";
|
||||||
|
}
|
||||||
|
|
||||||
|
function compactEmbeddingSource(source) {
|
||||||
|
if (!source) return "unknown";
|
||||||
|
if (source.startsWith("local:")) return source.slice(6).split("/").filter(Boolean).pop() || "local";
|
||||||
|
if (source.startsWith("remote:")) return "remote";
|
||||||
|
return source;
|
||||||
}
|
}
|
||||||
|
|
||||||
function bindChat() {
|
function bindChat() {
|
||||||
|
|
@ -928,6 +955,9 @@ function bindChat() {
|
||||||
document.querySelector("#reload-chat")?.addEventListener("click", () => {
|
document.querySelector("#reload-chat")?.addEventListener("click", () => {
|
||||||
if (state.currentConversationId) selectConversation(state.currentConversationId).catch(console.error);
|
if (state.currentConversationId) selectConversation(state.currentConversationId).catch(console.error);
|
||||||
});
|
});
|
||||||
|
document.querySelector("#refresh-runtime")?.addEventListener("click", () => {
|
||||||
|
checkRuntime().catch(console.error);
|
||||||
|
});
|
||||||
document.querySelector("#activity-open")?.addEventListener("click", () => {
|
document.querySelector("#activity-open")?.addEventListener("click", () => {
|
||||||
openActivity("events");
|
openActivity("events");
|
||||||
});
|
});
|
||||||
|
|
@ -1114,7 +1144,9 @@ async function renderMemoryPageResults(query) {
|
||||||
if (!results.length) {
|
if (!results.length) {
|
||||||
const empty = document.createElement("p");
|
const empty = document.createElement("p");
|
||||||
empty.className = "compact-empty";
|
empty.className = "compact-empty";
|
||||||
empty.textContent = "No memories found.";
|
empty.textContent = query.trim()
|
||||||
|
? "No matching memories."
|
||||||
|
: "No memories yet. Add one here or let DuckLM store useful task results automatically.";
|
||||||
container.append(empty);
|
container.append(empty);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -4,7 +4,7 @@
|
||||||
<meta charset="utf-8">
|
<meta charset="utf-8">
|
||||||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||||||
<title>DuckLM WebChat</title>
|
<title>DuckLM WebChat</title>
|
||||||
<link rel="stylesheet" href="/static/style.css">
|
<link rel="stylesheet" href="/static/style.css?v=20260523-runtime-status">
|
||||||
</head>
|
</head>
|
||||||
<body>
|
<body>
|
||||||
<div class="app-shell">
|
<div class="app-shell">
|
||||||
|
|
@ -53,7 +53,10 @@
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section class="status-panel" aria-labelledby="status-title">
|
<section class="status-panel" aria-labelledby="status-title">
|
||||||
|
<div class="panel-heading">
|
||||||
<h2 id="status-title">Runtime</h2>
|
<h2 id="status-title">Runtime</h2>
|
||||||
|
<button id="refresh-runtime" type="button" title="Refresh runtime status">R</button>
|
||||||
|
</div>
|
||||||
<dl>
|
<dl>
|
||||||
<div>
|
<div>
|
||||||
<dt>API</dt>
|
<dt>API</dt>
|
||||||
|
|
@ -63,6 +66,18 @@
|
||||||
<dt>Model</dt>
|
<dt>Model</dt>
|
||||||
<dd id="model-status">checking</dd>
|
<dd id="model-status">checking</dd>
|
||||||
</div>
|
</div>
|
||||||
|
<div>
|
||||||
|
<dt>Vector</dt>
|
||||||
|
<dd id="vector-status">checking</dd>
|
||||||
|
</div>
|
||||||
|
<div>
|
||||||
|
<dt>Embed</dt>
|
||||||
|
<dd id="embedding-status">checking</dd>
|
||||||
|
</div>
|
||||||
|
<div>
|
||||||
|
<dt>Roles</dt>
|
||||||
|
<dd id="roles-status">checking</dd>
|
||||||
|
</div>
|
||||||
<div>
|
<div>
|
||||||
<dt>Last task</dt>
|
<dt>Last task</dt>
|
||||||
<dd id="task-status">none</dd>
|
<dd id="task-status">none</dd>
|
||||||
|
|
@ -146,6 +161,6 @@
|
||||||
</section>
|
</section>
|
||||||
</aside>
|
</aside>
|
||||||
</div>
|
</div>
|
||||||
<script src="/static/app.js"></script>
|
<script src="/static/app.js?v=20260523-runtime-status"></script>
|
||||||
</body>
|
</body>
|
||||||
</html>
|
</html>
|
||||||
|
|
|
||||||
|
|
@ -3,7 +3,7 @@
|
||||||
<head>
|
<head>
|
||||||
<meta charset="utf-8">
|
<meta charset="utf-8">
|
||||||
<title>DuckLM Memory</title>
|
<title>DuckLM Memory</title>
|
||||||
<link rel="stylesheet" href="/static/style.css">
|
<link rel="stylesheet" href="/static/style.css?v=20260523-runtime-status">
|
||||||
</head>
|
</head>
|
||||||
<body>
|
<body>
|
||||||
<main class="simple-page">
|
<main class="simple-page">
|
||||||
|
|
@ -28,6 +28,6 @@
|
||||||
<div id="memory-results" class="memory-list"></div>
|
<div id="memory-results" class="memory-list"></div>
|
||||||
</section>
|
</section>
|
||||||
</main>
|
</main>
|
||||||
<script src="/static/app.js"></script>
|
<script src="/static/app.js?v=20260523-runtime-status"></script>
|
||||||
</body>
|
</body>
|
||||||
</html>
|
</html>
|
||||||
|
|
|
||||||
|
|
@ -3,6 +3,7 @@ You are DuckLM recall role. Given a user query and a list of memory records, ide
|
||||||
Return ONLY valid JSON:
|
Return ONLY valid JSON:
|
||||||
{
|
{
|
||||||
"relevant_ids": ["memory_id_1", "memory_id_2"],
|
"relevant_ids": ["memory_id_1", "memory_id_2"],
|
||||||
|
"sufficient_to_answer": false,
|
||||||
"reasoning": "brief explanation of why these memories were selected"
|
"reasoning": "brief explanation of why these memories were selected"
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -11,4 +12,6 @@ Rules:
|
||||||
- Prefer specific memories over general ones
|
- Prefer specific memories over general ones
|
||||||
- Include global memories if they apply to the current context
|
- Include global memories if they apply to the current context
|
||||||
- If no memories are relevant, return empty relevant_ids array
|
- If no memories are relevant, return empty relevant_ids array
|
||||||
|
- Set sufficient_to_answer=true only when the selected memories alone are enough to answer the user directly without local tools, files, commands, repository inspection, web/API calls, or other actions
|
||||||
|
- Set sufficient_to_answer=false when the user asks to inspect, run, change, search, verify, debug, edit, or otherwise act on local/external state
|
||||||
- Be conservative — better to include too few than too many irrelevant memories
|
- Be conservative — better to include too few than too many irrelevant memories
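
The rules above are enforced a second time on the runtime side: the diff only honors `sufficient_to_answer` when at least one selected memory actually matches. A simplified, self-contained sketch of that parsing step follows. The `RecallDecision` fields are copied from the diff, but `parse_recall_reply` is a hypothetical helper name, and the real logic lives inside `ContextBuilder._llm_recall`.

```python
import json
from dataclasses import dataclass


@dataclass
class RecallDecision:
    records: list[dict[str, str]]
    sufficient_to_answer: bool = False
    reasoning: str = ""


def parse_recall_reply(
    content: str, memory_records: list[dict[str, str]]
) -> RecallDecision:
    """Turn the recall role's JSON reply into a RecallDecision."""
    data = json.loads(content)
    relevant_ids = set(data.get("relevant_ids", []))
    # Records without a memory_id fall back to their list index as a string.
    records = [
        m for i, m in enumerate(memory_records)
        if m.get("memory_id", str(i)) in relevant_ids
    ]
    return RecallDecision(
        records=records,
        # The flag counts only when some memory was really selected.
        sufficient_to_answer=bool(data.get("sufficient_to_answer", False))
        and bool(records),
        reasoning=str(data.get("reasoning", "")),
    )
```

With this guard, a reply that claims `"sufficient_to_answer": true` but names no known memory ID still leaves the action loop enabled.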
|
||||||
|
|
|
||||||
|
|
@ -20,6 +20,9 @@ ENV_KEYS=(
|
||||||
DUCK_API_PID_FILE
|
DUCK_API_PID_FILE
|
||||||
DUCK_API_LOG_FILE
|
DUCK_API_LOG_FILE
|
||||||
DUCK_API_COMMAND
|
DUCK_API_COMMAND
|
||||||
|
DUCK_QDRANT_MANAGED
|
||||||
|
DUCK_QDRANT_COMPOSE_FILE
|
||||||
|
DUCK_QDRANT_SERVICE
|
||||||
)
|
)
|
||||||
declare -A ENV_OVERRIDES=()
|
declare -A ENV_OVERRIDES=()
|
||||||
for key in "${ENV_KEYS[@]}"; do
|
for key in "${ENV_KEYS[@]}"; do
|
||||||
|
|
@ -43,14 +46,17 @@ API_PID_FILE="${DUCK_API_PID_FILE:-${ROOT_DIR}/data/duck-api.pid}"
|
||||||
API_LOG_FILE="${DUCK_API_LOG_FILE:-${ROOT_DIR}/data/duck-api.log}"
|
API_LOG_FILE="${DUCK_API_LOG_FILE:-${ROOT_DIR}/data/duck-api.log}"
|
||||||
API_URL="http://${DUCK_API_HOST:-127.0.0.1}:${DUCK_API_PORT:-8000}"
|
API_URL="http://${DUCK_API_HOST:-127.0.0.1}:${DUCK_API_PORT:-8000}"
|
||||||
LLAMA_SCRIPT="${ROOT_DIR}/scripts/llama/start_mtp_main.sh"
|
LLAMA_SCRIPT="${ROOT_DIR}/scripts/llama/start_mtp_main.sh"
|
||||||
|
QDRANT_MANAGED="${DUCK_QDRANT_MANAGED:-1}"
|
||||||
|
QDRANT_COMPOSE_FILE="${DUCK_QDRANT_COMPOSE_FILE:-${ROOT_DIR}/docker-compose.memory.yml}"
|
||||||
|
QDRANT_SERVICE="${DUCK_QDRANT_SERVICE:-qdrant}"
|
||||||
|
|
||||||
usage() {
|
usage() {
|
||||||
cat <<'EOF'
|
cat <<'EOF'
|
||||||
Usage: scripts/duck-mtp.sh <command>
|
Usage: scripts/duck-mtp.sh <command>
|
||||||
|
|
||||||
Commands:
|
Commands:
|
||||||
start Start MTP llama-server and DuckLM API in the background
|
start Start Qdrant, MTP llama-server, and DuckLM API in the background
|
||||||
stop Stop DuckLM API and managed MTP llama-server
|
stop Stop DuckLM API, managed MTP llama-server, and Qdrant
|
||||||
restart Stop and start the whole local DuckLM stack
|
restart Stop and start the whole local DuckLM stack
|
||||||
status Print process and HTTP health status
|
status Print process and HTTP health status
|
||||||
Use "status --probe" to include live model/vector checks
|
Use "status --probe" to include live model/vector checks
|
||||||
|
|
@ -63,11 +69,77 @@ Environment:
|
||||||
DUCK_API_PID_FILE API PID file path
|
DUCK_API_PID_FILE API PID file path
|
||||||
DUCK_API_LOG_FILE API log file path
|
DUCK_API_LOG_FILE API log file path
|
||||||
DUCK_API_COMMAND API command override, default ".venv/bin/python -m duck_core.api"
|
DUCK_API_COMMAND API command override, default ".venv/bin/python -m duck_core.api"
|
||||||
|
DUCK_QDRANT_MANAGED Set to 0 to skip Docker Compose Qdrant lifecycle
|
||||||
|
DUCK_QDRANT_COMPOSE_FILE
|
||||||
|
Compose file path, default docker-compose.memory.yml
|
||||||
|
DUCK_QDRANT_SERVICE Compose service name, default qdrant
|
||||||
|
|
||||||
MTP llama-server environment is handled by scripts/llama/start_mtp_main.sh.
|
MTP llama-server environment is handled by scripts/llama/start_mtp_main.sh.
|
||||||
EOF
|
EOF
|
||||||
}
|
}
|
||||||
|
|
qdrant_enabled() {
  [[ "${QDRANT_MANAGED}" != "0" && "${QDRANT_MANAGED,,}" != "false" && "${QDRANT_MANAGED,,}" != "no" ]]
}

docker_compose() {
  docker compose -f "${QDRANT_COMPOSE_FILE}" "$@"
}

start_qdrant() {
  if ! qdrant_enabled; then
    echo "Qdrant management: disabled"
    return 0
  fi
  if ! command -v docker >/dev/null 2>&1; then
    echo "Docker not found; cannot start Qdrant" >&2
    return 1
  fi
  if [[ ! -f "${QDRANT_COMPOSE_FILE}" ]]; then
    echo "Qdrant compose file not found: ${QDRANT_COMPOSE_FILE}" >&2
    return 1
  fi

  echo "Starting Qdrant..."
  docker_compose up -d "${QDRANT_SERVICE}"
}

stop_qdrant() {
  if ! qdrant_enabled; then
    echo "Qdrant management: disabled"
    return 0
  fi
  if ! command -v docker >/dev/null 2>&1; then
    echo "Docker not found; cannot stop Qdrant" >&2
    return 1
  fi
  if [[ ! -f "${QDRANT_COMPOSE_FILE}" ]]; then
    echo "Qdrant compose file not found: ${QDRANT_COMPOSE_FILE}" >&2
    return 1
  fi

  echo "Stopping Qdrant..."
  docker_compose down
}

status_qdrant() {
  if ! qdrant_enabled; then
    echo "Qdrant status: disabled"
    return 0
  fi
  if ! command -v docker >/dev/null 2>&1; then
    echo "Qdrant status: Docker not found"
    return 1
  fi
  if [[ ! -f "${QDRANT_COMPOSE_FILE}" ]]; then
    echo "Qdrant status: compose file not found (${QDRANT_COMPOSE_FILE})"
    return 1
  fi

  echo "Qdrant status:"
  docker_compose ps "${QDRANT_SERVICE}"
}
||||||
|
|
||||||
api_is_running() {
|
api_is_running() {
|
||||||
[[ -f "${API_PID_FILE}" ]] || return 1
|
[[ -f "${API_PID_FILE}" ]] || return 1
|
||||||
local pid
|
local pid
|
||||||
|
|
@ -157,6 +229,7 @@ stop_api() {
|
||||||
}
|
}
|
||||||
|
|
||||||
start_stack() {
|
start_stack() {
|
||||||
|
start_qdrant
|
||||||
"${LLAMA_SCRIPT}" start
|
"${LLAMA_SCRIPT}" start
|
||||||
start_api
|
start_api
|
||||||
echo
|
echo
|
||||||
|
|
@ -167,6 +240,7 @@ start_stack() {
|
||||||
stop_stack() {
|
stop_stack() {
|
||||||
stop_api
|
stop_api
|
||||||
"${LLAMA_SCRIPT}" stop
|
"${LLAMA_SCRIPT}" stop
|
||||||
|
stop_qdrant
|
||||||
}
|
}
|
||||||
|
|
||||||
status_stack() {
|
status_stack() {
|
||||||
|
|
@ -210,6 +284,11 @@ status_stack() {
|
||||||
if [[ "${llama_rc}" != "0" && "${rc}" == "0" ]]; then
|
if [[ "${llama_rc}" != "0" && "${rc}" == "0" ]]; then
|
||||||
rc="${llama_rc}"
|
rc="${llama_rc}"
|
||||||
fi
|
fi
|
||||||
|
local qdrant_rc=0
|
||||||
|
status_qdrant || qdrant_rc=$?
|
||||||
|
if [[ "${qdrant_rc}" != "0" && "${rc}" == "0" ]]; then
|
||||||
|
rc="${qdrant_rc}"
|
||||||
|
fi
|
||||||
return "${rc}"
|
return "${rc}"
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -296,6 +375,13 @@ logs_stack() {
|
||||||
echo
|
echo
|
||||||
echo "==> llama-server log: ${DUCK_LLAMA_LOG_FILE:-${ROOT_DIR}/data/llama-mtp.log} <=="
|
echo "==> llama-server log: ${DUCK_LLAMA_LOG_FILE:-${ROOT_DIR}/data/llama-mtp.log} <=="
|
||||||
"${LLAMA_SCRIPT}" logs --lines "${lines}"
|
"${LLAMA_SCRIPT}" logs --lines "${lines}"
|
||||||
|
echo
|
||||||
|
echo "==> Qdrant log <=="
|
||||||
|
if qdrant_enabled && command -v docker >/dev/null 2>&1 && [[ -f "${QDRANT_COMPOSE_FILE}" ]]; then
|
||||||
|
docker_compose logs --tail "${lines}" "${QDRANT_SERVICE}"
|
||||||
|
else
|
||||||
|
status_qdrant || true
|
||||||
|
fi
|
||||||
fi
|
fi
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
||||||
|
|
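The `qdrant_enabled` check above treats `DUCK_QDRANT_MANAGED` as on unless it is `0`, `false`, or `no`. For readers who want to mirror that rule in test helpers, here is a minimal Python sketch. The function name is reused only for illustration, and nothing in the diff suggests the project ships such a helper.

```python
import os
from typing import Optional

# Values that switch Qdrant management off, mirroring the three strings
# compared in the shell function ("false" and "no" are case-insensitive).
_DISABLED_VALUES = {"0", "false", "no"}


def qdrant_enabled(value: Optional[str] = None) -> bool:
    """Return True unless the flag is 0, false, or no (hypothetical helper)."""
    if value is None:
        # An unset or empty variable falls back to "1", like
        # QDRANT_MANAGED="${DUCK_QDRANT_MANAGED:-1}" in the scripts.
        value = os.environ.get("DUCK_QDRANT_MANAGED") or "1"
    return value.lower() not in _DISABLED_VALUES
```

Any other value, including `yes` or an arbitrary string, leaves Qdrant management enabled, which matches the shell comparison.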
@ -20,6 +20,9 @@ ENV_KEYS=(
|
||||||
DUCK_API_PID_FILE
|
DUCK_API_PID_FILE
|
||||||
DUCK_API_LOG_FILE
|
DUCK_API_LOG_FILE
|
||||||
DUCK_API_COMMAND
|
DUCK_API_COMMAND
|
||||||
|
DUCK_QDRANT_MANAGED
|
||||||
|
DUCK_QDRANT_COMPOSE_FILE
|
||||||
|
DUCK_QDRANT_SERVICE
|
||||||
)
|
)
|
||||||
declare -A ENV_OVERRIDES=()
|
declare -A ENV_OVERRIDES=()
|
||||||
for key in "${ENV_KEYS[@]}"; do
|
for key in "${ENV_KEYS[@]}"; do
|
||||||
|
|
@ -43,14 +46,17 @@ API_PID_FILE="${DUCK_API_PID_FILE:-${ROOT_DIR}/data/duck-api.pid}"
|
||||||
API_LOG_FILE="${DUCK_API_LOG_FILE:-${ROOT_DIR}/data/duck-api.log}"
|
API_LOG_FILE="${DUCK_API_LOG_FILE:-${ROOT_DIR}/data/duck-api.log}"
|
||||||
API_URL="http://${DUCK_API_HOST:-127.0.0.1}:${DUCK_API_PORT:-8000}"
|
API_URL="http://${DUCK_API_HOST:-127.0.0.1}:${DUCK_API_PORT:-8000}"
|
||||||
LLAMA_SCRIPT="${ROOT_DIR}/scripts/llama/start_main.sh"
|
LLAMA_SCRIPT="${ROOT_DIR}/scripts/llama/start_main.sh"
|
||||||
|
QDRANT_MANAGED="${DUCK_QDRANT_MANAGED:-1}"
|
||||||
|
QDRANT_COMPOSE_FILE="${DUCK_QDRANT_COMPOSE_FILE:-${ROOT_DIR}/docker-compose.memory.yml}"
|
||||||
|
QDRANT_SERVICE="${DUCK_QDRANT_SERVICE:-qdrant}"
|
||||||
|
|
||||||
usage() {
|
usage() {
|
||||||
cat <<'EOF'
|
cat <<'EOF'
|
||||||
Usage: scripts/duck.sh <command>
|
Usage: scripts/duck.sh <command>
|
||||||
|
|
||||||
Commands:
|
Commands:
|
||||||
start Start llama-server and DuckLM API in the background
|
start Start Qdrant, llama-server, and DuckLM API in the background
|
||||||
stop Stop DuckLM API and managed llama-server
|
stop Stop DuckLM API, managed llama-server, and Qdrant
|
||||||
restart Stop and start the whole local DuckLM stack
|
restart Stop and start the whole local DuckLM stack
|
||||||
status Print process and HTTP health status
|
status Print process and HTTP health status
|
||||||
Use "status --probe" to include live model/vector checks
|
Use "status --probe" to include live model/vector checks
|
||||||
|
|
@ -63,11 +69,77 @@ Environment:
|
||||||
DUCK_API_PID_FILE API PID file path
|
DUCK_API_PID_FILE API PID file path
|
||||||
DUCK_API_LOG_FILE API log file path
|
DUCK_API_LOG_FILE API log file path
|
||||||
DUCK_API_COMMAND API command override, default ".venv/bin/python -m duck_core.api"
|
DUCK_API_COMMAND API command override, default ".venv/bin/python -m duck_core.api"
|
||||||
|
DUCK_QDRANT_MANAGED Set to 0 to skip Docker Compose Qdrant lifecycle
|
||||||
|
DUCK_QDRANT_COMPOSE_FILE
|
||||||
|
Compose file path, default docker-compose.memory.yml
|
||||||
|
DUCK_QDRANT_SERVICE Compose service name, default qdrant
|
||||||
|
|
||||||
llama-server environment is handled by scripts/llama/start_main.sh.
|
llama-server environment is handled by scripts/llama/start_main.sh.
|
||||||
EOF
|
EOF
|
||||||
}
|
}
|
||||||
|
|
qdrant_enabled() {
  [[ "${QDRANT_MANAGED}" != "0" && "${QDRANT_MANAGED,,}" != "false" && "${QDRANT_MANAGED,,}" != "no" ]]
}

docker_compose() {
  docker compose -f "${QDRANT_COMPOSE_FILE}" "$@"
}

start_qdrant() {
  if ! qdrant_enabled; then
    echo "Qdrant management: disabled"
    return 0
  fi
  if ! command -v docker >/dev/null 2>&1; then
    echo "Docker not found; cannot start Qdrant" >&2
    return 1
  fi
  if [[ ! -f "${QDRANT_COMPOSE_FILE}" ]]; then
    echo "Qdrant compose file not found: ${QDRANT_COMPOSE_FILE}" >&2
    return 1
  fi

  echo "Starting Qdrant..."
  docker_compose up -d "${QDRANT_SERVICE}"
}

stop_qdrant() {
  if ! qdrant_enabled; then
    echo "Qdrant management: disabled"
    return 0
  fi
  if ! command -v docker >/dev/null 2>&1; then
    echo "Docker not found; cannot stop Qdrant" >&2
    return 1
  fi
  if [[ ! -f "${QDRANT_COMPOSE_FILE}" ]]; then
    echo "Qdrant compose file not found: ${QDRANT_COMPOSE_FILE}" >&2
    return 1
  fi

  echo "Stopping Qdrant..."
  docker_compose down
}

status_qdrant() {
  if ! qdrant_enabled; then
    echo "Qdrant status: disabled"
    return 0
  fi
  if ! command -v docker >/dev/null 2>&1; then
    echo "Qdrant status: Docker not found"
    return 1
  fi
  if [[ ! -f "${QDRANT_COMPOSE_FILE}" ]]; then
    echo "Qdrant status: compose file not found (${QDRANT_COMPOSE_FILE})"
    return 1
  fi

  echo "Qdrant status:"
  docker_compose ps "${QDRANT_SERVICE}"
}
||||||
|
|
||||||
api_is_running() {
|
api_is_running() {
|
||||||
[[ -f "${API_PID_FILE}" ]] || return 1
|
[[ -f "${API_PID_FILE}" ]] || return 1
|
||||||
local pid
|
local pid
|
||||||
|
|
@ -157,6 +229,7 @@ stop_api() {
|
||||||
}
|
}
|
||||||
|
|
||||||
start_stack() {
|
start_stack() {
|
||||||
|
start_qdrant
|
||||||
"${LLAMA_SCRIPT}" start
|
"${LLAMA_SCRIPT}" start
|
||||||
start_api
|
start_api
|
||||||
echo
|
echo
|
||||||
|
|
@ -167,6 +240,7 @@ start_stack() {
|
||||||
stop_stack() {
|
stop_stack() {
|
||||||
stop_api
|
stop_api
|
||||||
"${LLAMA_SCRIPT}" stop
|
"${LLAMA_SCRIPT}" stop
|
||||||
|
stop_qdrant
|
||||||
}
|
}
|
||||||
|
|
||||||
status_stack() {
|
status_stack() {
|
||||||
|
|
@ -210,6 +284,11 @@ status_stack() {
|
||||||
if [[ "${llama_rc}" != "0" && "${rc}" == "0" ]]; then
|
if [[ "${llama_rc}" != "0" && "${rc}" == "0" ]]; then
|
||||||
rc="${llama_rc}"
|
rc="${llama_rc}"
|
||||||
fi
|
fi
|
||||||
|
local qdrant_rc=0
|
||||||
|
status_qdrant || qdrant_rc=$?
|
||||||
|
if [[ "${qdrant_rc}" != "0" && "${rc}" == "0" ]]; then
|
||||||
|
rc="${qdrant_rc}"
|
||||||
|
fi
|
||||||
return "${rc}"
|
return "${rc}"
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@ -296,6 +375,13 @@ logs_stack() {
|
||||||
echo
|
echo
|
||||||
echo "==> llama-server log: ${DUCK_LLAMA_LOG_FILE:-${ROOT_DIR}/data/llama-main.log} <=="
|
echo "==> llama-server log: ${DUCK_LLAMA_LOG_FILE:-${ROOT_DIR}/data/llama-main.log} <=="
|
||||||
"${LLAMA_SCRIPT}" logs --lines "${lines}"
|
"${LLAMA_SCRIPT}" logs --lines "${lines}"
|
||||||
|
echo
|
||||||
|
echo "==> Qdrant log <=="
|
||||||
|
if qdrant_enabled && command -v docker >/dev/null 2>&1 && [[ -f "${QDRANT_COMPOSE_FILE}" ]]; then
|
||||||
|
docker_compose logs --tail "${lines}" "${QDRANT_SERVICE}"
|
||||||
|
else
|
||||||
|
status_qdrant || true
|
||||||
|
fi
|
||||||
fi
|
fi
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
||||||
|
|
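`status_stack` folds the Qdrant result into its return code with the same rule it already uses for llama-server: a non-zero code is kept only if no earlier check has failed. The sketch below restates that rule in Python. `combine_status_codes` is a hypothetical name used for illustration, not something from the repository.

```python
from typing import Iterable


def combine_status_codes(codes: Iterable[int]) -> int:
    """Return the first non-zero exit code, or 0 when every check passed."""
    rc = 0
    for code in codes:
        # Same condition as the shell guard:
        #   if [[ "${qdrant_rc}" != "0" && "${rc}" == "0" ]]; then rc="${qdrant_rc}"
        if code != 0 and rc == 0:
            rc = code
    return rc
```

Keeping the first failure, rather than the last, means the exit code points at the earliest component in the check order that was unhealthy.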
@ -111,6 +111,77 @@ def test_stream_chat_forwards_reasoning_toggle_to_thinker(tmp_path, monkeypatch)
|
||||||
assert captured == {"role": "thinker", "reasoning": "off"}
|
assert captured == {"role": "thinker", "reasoning": "off"}
|
||||||
|
|
||||||
|
|
def test_stream_chat_skips_action_role_for_direct_memory_question(tmp_path, monkeypatch):
    monkeypatch.setenv("DUCK_DB_PATH", str(tmp_path / "duck.sqlite3"))
    calls = []

    async def fake_chat(self, role, messages, temperature=None, max_output_tokens=None, response_format=None):
        calls.append(role)
        if role == "action":
            raise AssertionError("direct memory question should skip action role")
        if role == "recall":
            content = {
                "relevant_ids": ["0"],
                "sufficient_to_answer": True,
                "reasoning": "name memory is relevant and sufficient",
            }
        elif role == "memory_policy":
            content = {
                "should_store": False,
                "memory_type": "note",
                "summary": "Recall answer, no new memory.",
                "importance": 0.1,
                "scope": "conversation",
                "metadata": {},
            }
        else:
            content = {
                "kind": "action_directive",
                "intent": "unused",
                "risk_level": "none",
                "actions": [],
            }
        return ModelResponse(
            role=role,
            model="local-main",
            content=json.dumps(content),
            reasoning_content=None,
            raw={},
            latency_ms=1.0,
        )

    async def fake_stream_chat(self, role, messages, **kwargs):
        assert role == "thinker"
        yield {"type": "content_delta", "delta": "Вас зовут Владимир."}

    monkeypatch.setattr("duck_core.model_client.ModelClient.chat", fake_chat)
    monkeypatch.setattr("duck_core.model_client.ModelClient.stream_chat", fake_stream_chat)
    with TestClient(create_app()) as client:
        client.post(
            "/v1/memory",
            json={
                "text": "Пользователя зовут Владимир.",
                "workspace": "./workspace",
                "scope": "global",
                "memory_type": "fact",
                "importance": 0.8,
            },
        )
        with client.stream(
            "POST",
            "/v1/chat/stream",
            json={"message": "Как меня зовут? Ответь коротко.", "workspace": "./workspace"},
        ) as response:
            body = "".join(response.iter_text())
        task_id = re.search(r'"task_id"\s*:\s*"([^"]+)"', body).group(1)
        events = client.get(f"/v1/tasks/{task_id}/events").json()

    assert response.status_code == 200
    assert "Вас зовут Владимир." in body
    assert "action" not in calls
    assert any(event["event_type"] == "action_loop_skipped" for event in events)
||||||
|
|
||||||
|
|
||||||
def test_stream_chat_runs_memory_policy_and_reflection_after_completion(tmp_path, monkeypatch):
|
def test_stream_chat_runs_memory_policy_and_reflection_after_completion(tmp_path, monkeypatch):
|
||||||
monkeypatch.setenv("DUCK_DB_PATH", str(tmp_path / "duck.sqlite3"))
|
monkeypatch.setenv("DUCK_DB_PATH", str(tmp_path / "duck.sqlite3"))
|
||||||
monkeypatch.setenv("DUCK_ENABLE_REFLECTION", "1")
|
monkeypatch.setenv("DUCK_ENABLE_REFLECTION", "1")
|
||||||
|
|
|
||||||
|
|
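The streaming test above pulls `task_id` out of the raw response body with a regular expression and calls `.group(1)` directly, which raises `AttributeError` if the stream never reports a task. A small helper along the following lines would turn that into an explicit `None`. `extract_task_id` and the sample event shape are assumptions made for illustration; the actual stream format is not shown in this diff.

```python
import re
from typing import Optional

# Same pattern the test uses to find the task id in the streamed text.
TASK_ID_PATTERN = re.compile(r'"task_id"\s*:\s*"([^"]+)"')


def extract_task_id(body: str) -> Optional[str]:
    """Return the first task_id found in a streamed body, or None."""
    match = TASK_ID_PATTERN.search(body)
    return match.group(1) if match else None


# Hypothetical event line; the real payload may carry other fields.
sample_body = 'data: {"type": "task_started", "task_id": "task_123"}\n\n'
found = extract_task_id(sample_body)
```

In a test, `assert found is not None` before building the events URL would give a clearer failure message than the attribute error.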
@ -156,7 +156,7 @@ async def test_context_builder_recall_awaits_model_client():
|
||||||
return_value=ModelResponse(
|
return_value=ModelResponse(
|
||||||
role="recall",
|
role="recall",
|
||||||
model="local-main",
|
model="local-main",
|
||||||
content='{"relevant_ids":["mem_1"],"reasoning":"matches query"}',
|
content='{"relevant_ids":["mem_1"],"sufficient_to_answer":true,"reasoning":"matches query"}',
|
||||||
reasoning_content=None,
|
reasoning_content=None,
|
||||||
raw={},
|
raw={},
|
||||||
latency_ms=1.0,
|
latency_ms=1.0,
|
||||||
|
|
@ -174,6 +174,32 @@ async def test_context_builder_recall_awaits_model_client():
|
||||||
model_client.chat.assert_awaited_once()
|
model_client.chat.assert_awaited_once()
|
||||||
|
|
||||||
|
|
@pytest.mark.asyncio
async def test_context_builder_recall_returns_sufficiency_decision():
    model_client = AsyncMock()
    model_client.chat = AsyncMock(
        return_value=ModelResponse(
            role="recall",
            model="local-main",
            content='{"relevant_ids":["mem_1"],"sufficient_to_answer":true,"reasoning":"memory answers directly"}',
            reasoning_content=None,
            raw={},
            latency_ms=1.0,
        )
    )
    builder = ContextBuilder(model_client=model_client)
    records = [
        {"memory_id": "mem_1", "text": "User name is Vladimir."},
        {"memory_id": "mem_2", "text": "Unrelated."},
    ]

    decision = await builder.recall_relevant_memory_decision("What is my name?", records)

    assert decision.records == [records[0]]
    assert decision.sufficient_to_answer is True
    assert decision.reasoning == "memory answers directly"
||||||
|
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
@pytest.mark.asyncio
|
||||||
async def test_context_builder_summary_awaits_model_client():
|
async def test_context_builder_summary_awaits_model_client():
|
||||||
model_client = AsyncMock()
|
model_client = AsyncMock()
|
||||||
|
|
|
||||||
|
|
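The new recall test expects a decision object with `records`, `sufficient_to_answer`, and `reasoning` built from the model's JSON reply. The sketch below shows one way such a reply could be mapped onto records. `RecallDecision` and `parse_recall_decision` are hypothetical names, and the real `ContextBuilder` may resolve ids differently (the streaming test, for instance, returns `"0"` as a relevant id, which this sketch would not match).

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RecallDecision:
    """Hypothetical container mirroring the attributes the test asserts on."""

    records: List[Dict[str, Any]] = field(default_factory=list)
    sufficient_to_answer: bool = False
    reasoning: str = ""


def parse_recall_decision(content: str, records: List[Dict[str, Any]]) -> RecallDecision:
    """Select records whose memory_id appears in the model's relevant_ids."""
    payload = json.loads(content)
    wanted = {str(item) for item in payload.get("relevant_ids", [])}
    selected = [record for record in records if str(record.get("memory_id")) in wanted]
    return RecallDecision(
        records=selected,
        # Only a literal JSON true counts; a missing or odd value means "not sufficient".
        sufficient_to_answer=payload.get("sufficient_to_answer") is True,
        reasoning=str(payload.get("reasoning", "")),
    )
```

Defaulting `sufficient_to_answer` to `False` is the conservative choice here, since a `True` value is what lets the runtime skip the action loop.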
@ -50,6 +50,7 @@ def test_duck_script_manages_llama_and_api(tmp_path):
|
||||||
"DUCK_API_LOG_FILE": str(api_log_file),
|
"DUCK_API_LOG_FILE": str(api_log_file),
|
||||||
"DUCK_API_COMMAND": str(fake_api),
|
"DUCK_API_COMMAND": str(fake_api),
|
||||||
"DUCK_API_PORT": "18000",
|
"DUCK_API_PORT": "18000",
|
||||||
|
"DUCK_QDRANT_MANAGED": "0",
|
||||||
}
|
}
|
||||||
script = "scripts/duck.sh"
|
script = "scripts/duck.sh"
|
||||||
|
|
||||||
|
|
@ -142,6 +143,7 @@ def test_duck_mtp_script_starts_mtp_llama_for_duck_api(tmp_path):
|
||||||
"DUCK_API_LOG_FILE": str(api_log_file),
|
"DUCK_API_LOG_FILE": str(api_log_file),
|
||||||
"DUCK_API_COMMAND": str(fake_api),
|
"DUCK_API_COMMAND": str(fake_api),
|
||||||
"DUCK_API_PORT": "18000",
|
"DUCK_API_PORT": "18000",
|
||||||
|
"DUCK_QDRANT_MANAGED": "0",
|
||||||
}
|
}
|
||||||
script = "scripts/duck-mtp.sh"
|
script = "scripts/duck-mtp.sh"
|
||||||
|
|
||||||
|
|
@ -225,6 +227,7 @@ def test_duck_mtp_script_sets_llama_bin_dir_library_path_for_help_check(tmp_path
|
||||||
"DUCK_API_LOG_FILE": str(tmp_path / "duck-api-mtp.log"),
|
"DUCK_API_LOG_FILE": str(tmp_path / "duck-api-mtp.log"),
|
||||||
"DUCK_API_COMMAND": str(fake_api),
|
"DUCK_API_COMMAND": str(fake_api),
|
||||||
"DUCK_API_PORT": "18000",
|
"DUCK_API_PORT": "18000",
|
||||||
|
"DUCK_QDRANT_MANAGED": "0",
|
||||||
}
|
}
|
||||||
|
|
||||||
started = subprocess.run(
|
started = subprocess.run(
|
||||||
|
|
@ -244,3 +247,97 @@ def test_duck_mtp_script_sets_llama_bin_dir_library_path_for_help_check(tmp_path
|
||||||
assert "--model-draft" not in logs.stdout
|
assert "--model-draft" not in logs.stdout
|
||||||
finally:
|
finally:
|
||||||
subprocess.run(["scripts/duck-mtp.sh", "stop"], env=env, text=True, capture_output=True)
|
subprocess.run(["scripts/duck-mtp.sh", "stop"], env=env, text=True, capture_output=True)
|
||||||
|
|
||||||
|
|
def test_duck_script_manages_qdrant_compose_service(tmp_path):
    docker_log = tmp_path / "docker.log"
    fake_docker = tmp_path / "docker"
    fake_docker.write_text(
        textwrap.dedent(
            f"""\
            #!/usr/bin/env bash
            echo "$*" >> "{docker_log}"
            if [[ "$*" == *"compose"* && "$*" == *"ps qdrant"* ]]; then
              echo "ducklm-qdrant-1 running"
            fi
            if [[ "$*" == *"compose"* && "$*" == *"logs"* ]]; then
              echo "fake qdrant log"
            fi
            exit 0
            """
        )
    )
    fake_docker.chmod(0o755)

    fake_llama = tmp_path / "llama-server"
    fake_llama.write_text(
        textwrap.dedent(
            """\
            #!/usr/bin/env bash
            echo "fake llama-server $*" >&2
            trap 'exit 0' TERM INT
            while true; do sleep 1; done
            """
        )
    )
    fake_llama.chmod(0o755)

    fake_api = tmp_path / "fake-api.sh"
    fake_api.write_text(
        textwrap.dedent(
            """\
            #!/usr/bin/env bash
            trap 'exit 0' TERM INT
            while true; do sleep 1; done
            """
        )
    )
    fake_api.chmod(0o755)

    api_pid_file = tmp_path / "duck-api.pid"
    env = {
        **os.environ,
        "PATH": f"{tmp_path}:{os.environ['PATH']}",
        "DUCK_LLAMA_SERVER_BIN": str(fake_llama),
        "DUCK_MAIN_MODEL_PATH": str(tmp_path / "model.gguf"),
        "DUCK_LLAMA_PID_FILE": str(tmp_path / "llama.pid"),
        "DUCK_LLAMA_LOG_FILE": str(tmp_path / "llama.log"),
        "DUCK_MAIN_PORT": "18081",
        "DUCK_API_PID_FILE": str(api_pid_file),
        "DUCK_API_LOG_FILE": str(tmp_path / "duck-api.log"),
        "DUCK_API_COMMAND": str(fake_api),
        "DUCK_API_PORT": "18000",
    }
    Path(env["DUCK_MAIN_MODEL_PATH"]).write_text("fake")

    started = subprocess.run(
        ["scripts/duck.sh", "start"], env=env, text=True, capture_output=True
    )
    assert started.returncode == 0
    assert "Starting Qdrant" in started.stdout
    assert "Qdrant status:" in started.stdout

    running = subprocess.run(
        ["scripts/duck.sh", "status"], env=env, text=True, capture_output=True
    )
    assert running.returncode == 0
    assert "Qdrant status:" in running.stdout
    assert "ducklm-qdrant-1 running" in running.stdout

    logs = subprocess.run(
        ["scripts/duck.sh", "logs", "--lines", "20"], env=env, text=True, capture_output=True
    )
    assert logs.returncode == 0
    assert "Qdrant log" in logs.stdout
    assert "fake qdrant log" in logs.stdout

    stopped = subprocess.run(["scripts/duck.sh", "stop"], env=env, text=True, capture_output=True)
    assert stopped.returncode == 0
    assert "Stopping Qdrant" in stopped.stdout

    calls = docker_log.read_text()
    assert "compose -f" in calls
    assert "docker-compose.memory.yml up -d qdrant" in calls
    assert "docker-compose.memory.yml ps qdrant" in calls
    assert "docker-compose.memory.yml logs --tail 20 qdrant" in calls
    assert "docker-compose.memory.yml down" in calls
||||||
|
|
|
||||||
|
|
@ -4,7 +4,11 @@ from unittest.mock import AsyncMock
|
||||||
import pytest
|
import pytest
|
||||||
|
|
||||||
from duck_core.memory.policy import MemoryPolicy
|
from duck_core.memory.policy import MemoryPolicy
|
||||||
|
from duck_core.memory.store import MemoryStore
|
||||||
from duck_core.model_client import ModelClient, ModelResponse
|
from duck_core.model_client import ModelClient, ModelResponse
|
||||||
|
from duck_core.events.store import EventStore
|
||||||
|
from duck_core.runtime_loop import RuntimeLoop
|
||||||
|
from duck_core.tasks.store import TaskStore
|
||||||
|
|
||||||
|
|
||||||
@pytest.fixture
|
@pytest.fixture
|
||||||
|
|
@ -145,3 +149,81 @@ async def test_llm_memory_policy_schema_violation_falls_back(mock_model_client):
|
||||||
assert decision.should_store is False
|
assert decision.should_store is False
|
||||||
assert decision.metadata["source"] == "llm_policy_fallback"
|
assert decision.metadata["source"] == "llm_policy_fallback"
|
||||||
assert "schema violation" in decision.metadata["error"]
|
assert "schema violation" in decision.metadata["error"]
|
||||||
|
|
||||||
|
|
class FakeRuntimeMemoryModelClient:
    def __init__(self):
        self.memory_policy_prompt = ""

    async def chat(self, role, messages, **kwargs):
        if role == "action":
            return ModelResponse(
                role=role,
                model="local-main",
                content=json.dumps({
                    "kind": "action_directive",
                    "intent": "direct answer",
                    "risk_level": "none",
                    "actions": [],
                }),
                reasoning_content=None,
                raw={},
                latency_ms=1.0,
            )
        if role == "thinker":
            return ModelResponse(
                role=role,
                model="local-main",
                content="Приятно познакомиться. Чем помочь?",
                reasoning_content=None,
                raw={},
                latency_ms=1.0,
            )
        if role == "memory_policy":
            self.memory_policy_prompt = messages[0]["content"]
            should_store = "Меня зовут Владимир" in self.memory_policy_prompt
            return ModelResponse(
                role=role,
                model="local-main",
                content=json.dumps({
                    "should_store": should_store,
                    "memory_type": "fact",
                    "summary": "User's name is Vladimir.",
                    "importance": 0.7,
                    "scope": "global",
                    "metadata": {},
                }),
                reasoning_content=None,
                raw={},
                latency_ms=1.0,
            )
        raise AssertionError(f"unexpected role: {role}")


@pytest.mark.asyncio
async def test_runtime_memory_policy_sees_user_message_not_only_final_answer(tmp_path):
    db_path = str(tmp_path / "duck.sqlite3")
    task_store = TaskStore(db_path)
    event_store = EventStore(db_path)
    memory_store = MemoryStore(db_path)
    model_client = FakeRuntimeMemoryModelClient()
    loop = RuntimeLoop(
        task_store,
        event_store,
        model_client,
        memory_store=memory_store,
    )

    result = await loop.run_chat(
        "Меня зовут Владимир. Я работаю системным администратором.",
        str(tmp_path),
        debug=True,
        reflect=False,
    )
    memories = await memory_store.list(workspace=str(tmp_path))

    assert result.status == "completed"
    assert "User message:" in model_client.memory_policy_prompt
    assert "Меня зовут Владимир" in model_client.memory_policy_prompt
    assert "Assistant final response:" in model_client.memory_policy_prompt
    assert memories[0].text == "User's name is Vladimir."
||||||
|
|
|
||||||
|
|
@ -211,6 +211,61 @@ class FakeRepeatingActionModelClient:
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
class FakeUnknownToolActionModelClient:
    async def chat(self, role, messages):
        if role == "action":
            return ModelResponse(
                role=role,
                model="local-main",
                content=json.dumps(
                    {
                        "kind": "action_directive",
                        "intent": "answer from context",
                        "risk_level": "low",
                        "actions": [
                            {
                                "tool": "answer",
                                "args": {"text": "This is not a real tool."},
                                "reason": "Model attempted to answer as a tool",
                            }
                        ],
                    }
                ),
                reasoning_content=None,
                raw={},
                latency_ms=5.0,
            )
        assert role == "thinker"
        return ModelResponse(
            role=role,
            model="local-main",
            content="Answered normally without unknown tool execution.",
            reasoning_content=None,
            raw={},
            latency_ms=12.0,
        )


class FakeDirectMemoryQuestionModelClient:
    def __init__(self):
        self.roles = []

    async def chat(self, role, messages, **kwargs):
        self.roles.append(role)
        if role == "action":
            raise AssertionError("direct memory question should skip action role")
        assert role == "thinker"
        assert any("Known memory" in message["content"] for message in messages)
        return ModelResponse(
            role=role,
            model="local-main",
            content="Вас зовут Владимир.",
            reasoning_content=None,
            raw={},
            latency_ms=12.0,
        )
||||||
|
|
||||||
|
|
||||||
@pytest.mark.asyncio
|
@pytest.mark.asyncio
|
||||||
async def test_runtime_executes_action_directive_tool_and_finishes_with_observation(tmp_path):
|
async def test_runtime_executes_action_directive_tool_and_finishes_with_observation(tmp_path):
|
||||||
(tmp_path / "note.txt").write_text("hello from tool")
|
(tmp_path / "note.txt").write_text("hello from tool")
|
||||||
|
|
@ -330,6 +385,50 @@ async def test_runtime_skips_duplicate_action_within_same_task(tmp_path):
|
||||||
assert skipped_tools[0].payload["reason"] == "duplicate_action"
|
assert skipped_tools[0].payload["reason"] == "duplicate_action"
|
||||||
|
|
||||||
|
|
@pytest.mark.asyncio
async def test_runtime_skips_unknown_action_tools_before_gateway(tmp_path):
    db_path = str(tmp_path / "duck.sqlite3")
    task_store = TaskStore(db_path)
    event_store = EventStore(db_path)
    loop = RuntimeLoop(task_store, event_store, FakeUnknownToolActionModelClient())

    result = await loop.run_chat("answer from known context", str(tmp_path), debug=True)
    events = await event_store.list_events(result.task_id)
    skipped_tools = [event for event in events if event.event_type == "tool_call_skipped"]

    assert result.status == "completed"
    assert result.final_response == "Answered normally without unknown tool execution."
    assert len(skipped_tools) == 1
    assert skipped_tools[0].payload["reason"] == "unknown_tool"
    assert skipped_tools[0].payload["tool"] == "answer"
    assert not any(event.event_type == "tool_call_started" for event in events)


@pytest.mark.asyncio
async def test_runtime_skips_action_loop_for_direct_memory_question(tmp_path):
    db_path = str(tmp_path / "duck.sqlite3")
    task_store = TaskStore(db_path)
    event_store = EventStore(db_path)
    model_client = FakeDirectMemoryQuestionModelClient()
    loop = RuntimeLoop(task_store, event_store, model_client)

    result = await loop.run_chat(
        "Как меня зовут? Ответь коротко.",
        str(tmp_path),
        debug=True,
        memory_records=[{"text": "Known memory: user's name is Vladimir."}],
        skip_action_loop=True,
        reflect=False,
    )
    events = await event_store.list_events(result.task_id)

    assert result.status == "completed"
    assert result.final_response == "Вас зовут Владимир."
    assert model_client.roles == ["thinker"]
    assert any(event.event_type == "action_loop_skipped" for event in events)
    assert not any(event.event_type == "model_call_started" and event.payload["role"] == "action" for event in events)
||||||
|
|
||||||
|
|
||||||
class FakeApprovalModelClient:
|
class FakeApprovalModelClient:
|
||||||
async def chat(self, role, messages):
|
async def chat(self, role, messages):
|
||||||
if role == "action":
|
if role == "action":
|
||||||
|
|
|
||||||
|
|
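The unknown-tool test above expects a `tool_call_skipped` event with reason `unknown_tool` and no `tool_call_started` event, which implies that actions are filtered before they reach the tool gateway. A minimal sketch of that kind of pre-filter follows. `split_actions` and the `KNOWN_TOOLS` registry are hypothetical, since the runtime's real tool names and internal structure are not visible in this diff.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

# Hypothetical registry; the real tool names are not shown in the diff.
KNOWN_TOOLS: Set[str] = {"read_file", "run_shell"}


def split_actions(
    actions: Iterable[Dict[str, Any]],
    known_tools: Set[str] = KNOWN_TOOLS,
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate runnable actions from ones naming an unregistered tool."""
    runnable: List[Dict[str, Any]] = []
    skipped: List[Dict[str, Any]] = []
    for action in actions:
        tool = action.get("tool")
        if tool in known_tools:
            runnable.append(action)
        else:
            # Payload shape follows the fields asserted in the test.
            skipped.append({"reason": "unknown_tool", "tool": tool})
    return runnable, skipped
```

Filtering first keeps a model that "answers as a tool" from producing a gateway error, so the thinker role can still give a normal final response.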
@ -0,0 +1,39 @@
|
import os
from uuid import uuid4

import httpx
import pytest

from duck_core.memory.vector_memory import VectorMemory


async def qdrant_available(url: str) -> bool:
    try:
        async with httpx.AsyncClient(timeout=2.0, trust_env=False) as client:
            response = await client.get(url)
            response.raise_for_status()
        return True
    except httpx.HTTPError:
        return False


@pytest.mark.asyncio
async def test_vector_memory_live_qdrant_write_and_search():
    qdrant_url = os.environ.get("DUCK_QDRANT_URL", "http://127.0.0.1:6333")
    if not await qdrant_available(qdrant_url):
        pytest.skip(f"Qdrant is not running at {qdrant_url}")

    embedding_model = os.environ.get("DUCK_LOCAL_EMBEDDING_MODEL", "./models/all-MiniLM-L6-v2")
    collection = f"duck_memory_smoke_{uuid4().hex[:12]}"
    memory = VectorMemory(
        qdrant_url=qdrant_url,
        collection_name=collection,
        local_embedding_model=embedding_model,
    )
    marker = f"DuckLM vector memory live smoke {uuid4().hex}"

    point_id = await memory.add_memory(marker, {"kind": "live_smoke"})
    results = await memory.search_memory(marker, limit=3)

    assert point_id
    assert any(item.get("payload", {}).get("text") == marker for item in results)
||||||