Agent loop
Gaby's investigation engine is a homegrown state machine, not a LangChain, LlamaIndex, or AutoGen runtime. The motivation is reviewability: every transition is named, snapshottable, and resumable from disk.
States
PLANNING → RETRIEVING → SAFETY_CHK → ACTING → OBSERVING → VERDICT → WRITING_BACK
Branching off the main chain: DONE, HALTED, WAITING_APPROVAL.
- PLANNING: call the planner LLM with the working-memory envelope and get a tool plan.
- RETRIEVING: pull relevant KB chunks via the hybrid BM25 + vector retriever.
- SAFETY_CHK: run the authz matrix; emit NeedsApproval for dangerous writes.
- ACTING: dispatch the tool call to the MCP host.
- OBSERVING: parse the connector's response back into working memory.
- VERDICT: call the verdict LLM and get auto_resolved, propose_resolution, or escalate.
- WRITING_BACK: post the reply to the ticket source.
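A minimal sketch of the state list as a Python enum plus a successor function, for readers who want to see the shape of a named-transition machine. The main chain follows the States line above; the branch edges are assumptions, not taken from loop.py: SAFETY_CHK goes to WAITING_APPROVAL when a write needs approval, any error goes to HALTED, and WRITING_BACK goes to DONE. `State`, `MAIN_CHAIN`, `STOP_STATES`, and `next_state` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    PLANNING = "PLANNING"
    RETRIEVING = "RETRIEVING"
    SAFETY_CHK = "SAFETY_CHK"
    ACTING = "ACTING"
    OBSERVING = "OBSERVING"
    VERDICT = "VERDICT"
    WRITING_BACK = "WRITING_BACK"
    DONE = "DONE"
    HALTED = "HALTED"
    WAITING_APPROVAL = "WAITING_APPROVAL"


# Main chain, in the order the States diagram lists it.
MAIN_CHAIN = [
    State.PLANNING, State.RETRIEVING, State.SAFETY_CHK, State.ACTING,
    State.OBSERVING, State.VERDICT, State.WRITING_BACK,
]

# States with no automatic successor: the run ends or parks here.
STOP_STATES = {State.DONE, State.HALTED, State.WAITING_APPROVAL}


def next_state(current: State, *, needs_approval: bool = False,
               error: bool = False) -> State:
    """Return the successor of `current` (branch edges are hypothetical)."""
    if current in STOP_STATES:
        raise ValueError(f"{current.name} has no automatic successor")
    if error:
        return State.HALTED  # assumption: any failure halts the run
    if current is State.SAFETY_CHK and needs_approval:
        return State.WAITING_APPROVAL  # dangerous write: park for a human
    if current is State.WRITING_BACK:
        return State.DONE  # assumption: posting the reply ends the run
    return MAIN_CHAIN[MAIN_CHAIN.index(current) + 1]


# Walk a run with no errors and no approval needed.
state, trace = State.PLANNING, [State.PLANNING]
while state not in STOP_STATES:
    state = next_state(state)
    trace.append(state)
print(" -> ".join(s.name for s in trace))
# PLANNING -> RETRIEVING -> SAFETY_CHK -> ACTING -> OBSERVING -> VERDICT -> WRITING_BACK -> DONE
```

Keeping every edge in one small function is what makes each transition nameable and easy to review in a diff.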
Each transition snapshots to investigations.working_memory_snapshot,
so if the process is killed (kill -9) mid-investigation, it resumes from the last completed state
without re-running tool calls.
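A minimal sketch of snapshot-and-resume, using the standard-library sqlite3 module as a stand-in for the real database. Only the names investigations and working_memory_snapshot come from these docs; the `id` and `state` columns, the JSON encoding, and the `snapshot` / `resume` helpers are assumptions. The point it illustrates: tool results are written into working memory before the state is recorded as complete, so a restart picks up after the last completed state and never replays ACTING for a call whose result was already saved.

```python
from __future__ import annotations

import json
import sqlite3
from typing import Optional

# Hypothetical stand-in schema: only the table and column names
# investigations / working_memory_snapshot come from the docs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS investigations (
    id TEXT PRIMARY KEY,
    state TEXT NOT NULL,
    working_memory_snapshot TEXT NOT NULL
)
"""


def snapshot(conn: sqlite3.Connection, inv_id: str, state: str, memory: dict) -> None:
    """Record the state that just completed, with its working memory."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT OR REPLACE INTO investigations "
            "(id, state, working_memory_snapshot) VALUES (?, ?, ?)",
            (inv_id, state, json.dumps(memory)),
        )


def resume(conn: sqlite3.Connection, inv_id: str) -> Optional[tuple[str, dict]]:
    """Return (last completed state, working memory), or None if never snapshotted."""
    row = conn.execute(
        "SELECT state, working_memory_snapshot FROM investigations WHERE id = ?",
        (inv_id,),
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# OBSERVING finished: the (hypothetical) tool result is already in working memory.
snapshot(conn, "inv-1", "OBSERVING",
         {"observations": [{"tool": "lookup_user", "result": "ok"}]})
# ...process killed here; after restart:
state, memory = resume(conn, "inv-1")
# Continue with the state after OBSERVING; ACTING is not replayed.
```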
Source: backend/src/gaby/agent/loop.py
One LLM call per purpose
Instead of one big planner-with-tools loop, Gaby splits the agent into four LLM purposes, each with its own prompt:
- planner.py: what to do next.
- tool_selector.py: which tool to call.
- summarizer.py: compacts prior observations to keep the context small.
- verdict.py: auto-resolve / propose / escalate.
Plus ask.py for the read-only operator console (Iter 16) and
replay/engine.py for batch knowledge extraction from historical tickets.
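A minimal sketch of the one-call-per-purpose split for the four core purposes, assuming a hypothetical LLM client shaped as `(system_prompt, user_input) -> str`. `Purpose`, `PurposeRunner`, and `parse_verdict` are illustrative names, not Gaby's code, and falling back to `escalate` on an unrecognised verdict label is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Purpose(Enum):
    PLANNER = "planner"
    TOOL_SELECTOR = "tool_selector"
    SUMMARIZER = "summarizer"
    VERDICT = "verdict"


# Hypothetical LLM client: (system_prompt, user_input) -> completion text.
LLM = Callable[[str, str], str]

VERDICTS = {"auto_resolved", "propose_resolution", "escalate"}


@dataclass
class PurposeRunner:
    prompts: dict[Purpose, str]  # exactly one prompt per purpose
    llm: LLM

    def run(self, purpose: Purpose, user_input: str) -> str:
        # One LLM call per purpose: no tool loop, no chaining inside it.
        return self.llm(self.prompts[purpose], user_input)


def parse_verdict(text: str) -> str:
    """Map verdict output to one of the three labels."""
    label = text.strip().lower()
    # Assumption: anything unrecognised goes to a human.
    return label if label in VERDICTS else "escalate"


runner = PurposeRunner(
    prompts={p: f"<{p.value} prompt>" for p in Purpose},
    llm=lambda system, user: "propose_resolution",  # fake model for the demo
)
verdict = parse_verdict(runner.run(Purpose.VERDICT, "ticket transcript"))
```

Because each purpose is a single call with a single prompt, a bad decision can be traced to exactly one prompt and one input.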
Prompts live as plain Markdown under backend/src/gaby/agent/prompts/
— no prompt framework, no chained calls inside one purpose.
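Since prompts are plain Markdown files, loading them needs nothing beyond pathlib. A minimal sketch, assuming one file per purpose named after it (planner.md, tool_selector.md, and so on); those file names and the `load_prompts` / `require` helpers are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Assumed file stems, one per purpose.
PURPOSES = ("planner", "tool_selector", "summarizer", "verdict")


def load_prompts(prompt_dir: Path) -> dict[str, str]:
    """Read every *.md file in prompt_dir, keyed by file stem."""
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in sorted(prompt_dir.glob("*.md"))
    }


def require(prompts: dict[str, str], names=PURPOSES) -> None:
    """Fail fast at startup when a purpose has no prompt file."""
    missing = [name for name in names if name not in prompts]
    if missing:
        raise FileNotFoundError(f"missing prompt files: {missing}")
```

Usage would be something like `require(load_prompts(Path("backend/src/gaby/agent/prompts")))` at startup, so a missing prompt surfaces before the first investigation rather than mid-run.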
4-breakpoint prompt cache
See Architecture §8.2 for the canonical layout. The short version: the four breakpoints are tools, system, KB chunks, and messages, sized so the planner reuses 95%+ of its input on every iteration.
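These docs do not name the LLM provider. A sketch assuming Anthropic's Messages API, where `cache_control: {"type": "ephemeral"}` markers (at most four per request) can be placed on tool definitions, system content blocks, and message content blocks. Putting the KB chunks in a second system block is an assumption, as are `build_request`, `count_breakpoints`, and the `max_tokens` value. Segments go from most to least stable, so on each planner iteration presumably only the tail of the messages is new and the rest is served from cache.

```python
from __future__ import annotations


def _breakpoint() -> dict:
    # Anthropic-style cache marker; the provider is an assumption.
    return {"type": "ephemeral"}


def build_request(model: str, tools: list[dict], system_prompt: str,
                  kb_chunks: list[str], messages: list[dict]) -> dict:
    """Order segments from most to least stable, ending each with a breakpoint."""
    tools = [dict(t) for t in tools]  # copy: never mutate the caller's list
    if tools:
        tools[-1]["cache_control"] = _breakpoint()            # 1: tools
    system = [{"type": "text", "text": system_prompt,
               "cache_control": _breakpoint()}]               # 2: system
    if kb_chunks:
        system.append({"type": "text", "text": "\n\n".join(kb_chunks),
                       "cache_control": _breakpoint()})       # 3: KB chunks
    messages = [dict(m) for m in messages]
    if messages:
        last = messages[-1]
        content = last["content"]
        blocks = ([{"type": "text", "text": content}] if isinstance(content, str)
                  else [dict(b) for b in content])
        blocks[-1]["cache_control"] = _breakpoint()           # 4: messages
        messages[-1] = {"role": last["role"], "content": blocks}
    return {"model": model, "max_tokens": 1024,  # max_tokens: arbitrary demo value
            "tools": tools, "system": system, "messages": messages}


def count_breakpoints(obj) -> int:
    """Count cache_control markers anywhere in a request body."""
    if isinstance(obj, dict):
        return int("cache_control" in obj) + sum(count_breakpoints(v) for v in obj.values())
    if isinstance(obj, list):
        return sum(count_breakpoints(v) for v in obj)
    return 0
```

The returned dict's keys (`model`, `max_tokens`, `tools`, `system`, `messages`) correspond to keyword arguments of `messages.create` in the Anthropic Python SDK; check Architecture §8.2 for the layout Gaby actually uses.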