Agent loop¶

Gaby's investigation engine is a homegrown state machine — not a LangChain / LlamaIndex / Autogen runtime. The motivation is reviewability: every transition is named, snapshottable, and resumable from disk.

States¶

PLANNING → RETRIEVING → SAFETY_CHK → ACTING → OBSERVING → VERDICT → WRITING_BACK
              ↓             ↓           ↓        ↓
            DONE         HALTED   WAITING_APPROVAL

PLANNING — call the planner LLM with the working-memory envelope, get a tool plan.
RETRIEVING — pull relevant KB chunks via the hybrid BM25 + vector retriever.
SAFETY_CHK — run the authz matrix; emit NeedsApproval for dangerous writes.
ACTING — dispatch the tool call to the MCP host.
OBSERVING — parse the connector's response back into working memory.
VERDICT — call the verdict LLM, get auto_resolved / propose_resolution / escalate.
WRITING_BACK — post the reply to the ticket source.

Each transition snapshots to investigations.working_memory_snapshot, so a kill -9 mid-investigation resumes from the last completed state without re-running tool calls.

Source: backend/src/gaby/agent/loop.py

One LLM call per purpose¶

Instead of one big planner-with-tools loop, Gaby splits the agent into four LLM purposes, each with its own prompt:

planner.py — what to do next.
tool_selector.py — which tool to call.
summarizer.py — compact prior observations to keep the context small.
verdict.py — auto-resolve / propose / escalate.

Plus ask.py for the read-only operator console (Iter 16) and replay/engine.py for batch knowledge extraction from historical tickets.

Prompts live as plain Markdown under backend/src/gaby/agent/prompts/ — no prompt framework, no chained calls inside one purpose.

4-breakpoint prompt cache¶

See Architecture §8.2 for the canonical layout. The short version: tools / system / KB chunks / messages are the four breakpoints, sized so the planner re-uses 95%+ of the input on every iteration.