
Agent loop

Gaby's investigation engine is a homegrown state machine, not a LangChain / LlamaIndex / AutoGen runtime. The motivation is reviewability: every transition is named, snapshottable, and resumable from disk.

States

PLANNING → RETRIEVING → SAFETY_CHK → ACTING → OBSERVING → VERDICT → WRITING_BACK
              ↓             ↓           ↓        ↓
            DONE         HALTED   WAITING_APPROVAL
  • PLANNING — call the planner LLM with the working-memory envelope, get a tool plan.
  • RETRIEVING — pull relevant KB chunks via the hybrid BM25 + vector retriever.
  • SAFETY_CHK — run the authz matrix; emit NeedsApproval for dangerous writes.
  • ACTING — dispatch the tool call to the MCP host.
  • OBSERVING — parse the connector's response back into working memory.
  • VERDICT — call the verdict LLM, get auto_resolved / propose_resolution / escalate.
  • WRITING_BACK — post the reply to the ticket source.
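The main path above can be expressed as an explicit transition table. This is a minimal illustration, not the code in loop.py: `State`, `MAIN_PATH`, `SIDE_EXITS`, and `next_state` are hypothetical names. The diagram does not make clear which states branch to DONE, HALTED, or WAITING_APPROVAL, so the sketch assumes that any active state may divert to one of them, that WRITING_BACK ends in DONE, and that an approved WAITING_APPROVAL resumes at ACTING.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    PLANNING = "PLANNING"
    RETRIEVING = "RETRIEVING"
    SAFETY_CHK = "SAFETY_CHK"
    ACTING = "ACTING"
    OBSERVING = "OBSERVING"
    VERDICT = "VERDICT"
    WRITING_BACK = "WRITING_BACK"
    DONE = "DONE"
    HALTED = "HALTED"
    WAITING_APPROVAL = "WAITING_APPROVAL"


# Default successor of each state along the main path of the diagram.
MAIN_PATH = {
    State.PLANNING: State.RETRIEVING,
    State.RETRIEVING: State.SAFETY_CHK,
    State.SAFETY_CHK: State.ACTING,
    State.ACTING: State.OBSERVING,
    State.OBSERVING: State.VERDICT,
    State.VERDICT: State.WRITING_BACK,
    State.WRITING_BACK: State.DONE,        # assumption: write-back ends the run
    State.WAITING_APPROVAL: State.ACTING,  # assumption: approval resumes the tool call
}

# Off-path targets; assumed reachable from any non-terminal state.
SIDE_EXITS = {State.DONE, State.HALTED, State.WAITING_APPROVAL}
TERMINAL = {State.DONE, State.HALTED}


def next_state(current: State, divert_to: Optional[State] = None) -> State:
    """Return the state after `current`, optionally diverting off the main path."""
    if current in TERMINAL:
        raise ValueError(f"{current.name} is terminal")
    if divert_to is not None:
        if divert_to not in SIDE_EXITS:
            raise ValueError(f"cannot divert to {divert_to.name}")
        return divert_to
    return MAIN_PATH[current]


# Walk the happy path from PLANNING until a terminal state is reached.
path = [State.PLANNING]
while path[-1] not in TERMINAL:
    path.append(next_state(path[-1]))
print(" → ".join(s.name for s in path))
```

Keeping the edges in one dictionary keeps every transition named and greppable, which is the reviewability property the engine is built around.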

Each transition snapshots working memory to investigations.working_memory_snapshot, so an investigation interrupted by kill -9 resumes from the last completed state without re-running its tool calls.
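A minimal sketch of snapshot-and-resume, assuming working memory is a JSON-serialisable dict stored in a `working_memory_snapshot` column of an `investigations` table. SQLite stands in for whatever database Gaby actually uses, and `save_snapshot`, `load_snapshot`, `run_tool_once`, and the `tool_results` key are hypothetical. Because each completed tool result is written into the snapshot, a resumed run finds it in memory and skips the call.

```python
import json
import sqlite3
from typing import Any, Callable, Dict, Tuple


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE investigations ("
        " id TEXT PRIMARY KEY, state TEXT NOT NULL, working_memory_snapshot TEXT NOT NULL)"
    )
    return conn


def save_snapshot(conn: sqlite3.Connection, inv_id: str, state: str, memory: Dict[str, Any]) -> None:
    # `with conn` commits on exit, so the snapshot survives a crash right after this call.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO investigations (id, state, working_memory_snapshot)"
            " VALUES (?, ?, ?)",
            (inv_id, state, json.dumps(memory)),
        )


def load_snapshot(conn: sqlite3.Connection, inv_id: str) -> Tuple[str, Dict[str, Any]]:
    row = conn.execute(
        "SELECT state, working_memory_snapshot FROM investigations WHERE id = ?", (inv_id,)
    ).fetchone()
    if row is None:
        raise KeyError(inv_id)
    return row[0], json.loads(row[1])


def run_tool_once(memory: Dict[str, Any], call_id: str, tool: Callable[[], Any]) -> Any:
    # Only invoke the tool if its result is not already recorded in working memory.
    results = memory.setdefault("tool_results", {})
    if call_id not in results:
        results[call_id] = tool()
    return results[call_id]


calls = []

def fake_tool() -> Dict[str, str]:
    calls.append(1)  # count real invocations
    return {"status": "ok"}

conn = open_db()
memory: Dict[str, Any] = {}
run_tool_once(memory, "call-1", fake_tool)
save_snapshot(conn, "inv-42", "OBSERVING", memory)

# Simulate a restart: rebuild memory from the snapshot and replay the step.
state, restored = load_snapshot(conn, "inv-42")
run_tool_once(restored, "call-1", fake_tool)  # served from the snapshot; tool not called again
```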

Source: backend/src/gaby/agent/loop.py

One LLM call per purpose

Instead of one big planner-with-tools loop, Gaby splits the agent into four LLM purposes, each with its own prompt:

  • planner.py — what to do next.
  • tool_selector.py — which tool to call.
  • summarizer.py — compact prior observations to keep the context small.
  • verdict.py — auto-resolve / propose / escalate.
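The split above amounts to a simple rule: each purpose owns one prompt and makes exactly one LLM call. The sketch below illustrates that rule under stated assumptions. `Purpose`, `call_purpose`, and `decide_verdict` are hypothetical names, the LLM client is modelled as a plain `(system_prompt, user_content) -> str` callable, the verdict labels come from the VERDICT state description, and falling back to `escalate` on unexpected output is an assumed design choice.

```python
from enum import Enum
from typing import Callable, Dict


class Purpose(Enum):
    PLANNER = "planner"
    TOOL_SELECTOR = "tool_selector"
    SUMMARIZER = "summarizer"
    VERDICT = "verdict"


# Hypothetical client signature: (system_prompt, user_content) -> completion text.
LLM = Callable[[str, str], str]

VERDICTS = {"auto_resolved", "propose_resolution", "escalate"}


def call_purpose(purpose: Purpose, prompts: Dict[Purpose, str], user_content: str, llm: LLM) -> str:
    """Make exactly one LLM call, using only this purpose's prompt."""
    return llm(prompts[purpose], user_content)


def decide_verdict(prompts: Dict[Purpose, str], case_summary: str, llm: LLM) -> str:
    raw = call_purpose(Purpose.VERDICT, prompts, case_summary, llm).strip().lower()
    # Assumed fail-safe: anything outside the three labels goes to a human.
    return raw if raw in VERDICTS else "escalate"


prompts = {p: f"You are the {p.value} step." for p in Purpose}
seen = []

def fake_llm(system: str, user: str) -> str:
    seen.append(system)  # record which prompt was used
    return "propose_resolution"

verdict = decide_verdict(prompts, "Password reset loop on SSO", fake_llm)
```

Because each purpose has its own prompt and a single call, a bad verdict can be traced to one prompt file and one request rather than to a step inside a longer chain.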

Two more modules sit alongside these: ask.py for the read-only operator console (Iter 16) and replay/engine.py for batch knowledge extraction from historical tickets.

Prompts live as plain Markdown files under backend/src/gaby/agent/prompts/. There is no prompt framework, and no purpose chains multiple calls internally.
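Without a prompt framework, loading and filling a prompt needs nothing beyond the standard library. The sketch assumes one `<purpose>.md` file per purpose and `$name` placeholders filled with `string.Template`; both the file naming and the placeholder syntax are assumptions, as are the helper names `load_prompts` and `render`.

```python
import tempfile
from pathlib import Path
from string import Template
from typing import Dict


def load_prompts(prompt_dir: Path) -> Dict[str, str]:
    """Map each Markdown file's stem (e.g. 'planner') to its raw text."""
    return {p.stem: p.read_text(encoding="utf-8") for p in sorted(prompt_dir.glob("*.md"))}


def render(template_text: str, **values: str) -> str:
    # substitute() raises KeyError on a missing placeholder, so a renamed
    # variable fails loudly instead of producing a half-filled prompt.
    return Template(template_text).substitute(**values)


# Usage: write a throwaway prompt file, load it, and fill its placeholder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "verdict.md").write_text("# Verdict\nTicket: $ticket\n", encoding="utf-8")
    prompts = load_prompts(Path(tmp))

rendered = render(prompts["verdict"], ticket="VPN drops every 10 min")
```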

4-breakpoint prompt cache

See Architecture §8.2 for the canonical layout. The short version: the four breakpoints are tools, system, KB chunks, and messages, sized so that the planner reuses 95%+ of its input on every iteration.
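Why the ordering matters can be shown with a toy model of prefix caching. The sketch assumes the cache can reuse the previous request up to any of its breakpoints, provided the new request starts with exactly that prefix. Character counts stand in for tokens, and the segment sizes are made up for illustration rather than taken from Gaby. `cached_prefix_len` is a hypothetical helper.

```python
from itertools import accumulate
from typing import List


def cached_prefix_len(prev_segments: List[str], new_segments: List[str]) -> int:
    """Longest breakpoint-aligned prefix of the previous request that the new one repeats."""
    prev_text = "".join(prev_segments)
    new_text = "".join(new_segments)
    best = 0
    # A breakpoint sits at the end of each segment of the previous request.
    for offset in accumulate(len(s) for s in prev_segments):
        if new_text[:offset] != prev_text[:offset]:
            break  # prefixes diverged; no later breakpoint can match either
        best = offset
    return best


tools, system, kb, history = "T" * 4000, "S" * 2000, "K" * 3000, "M" * 800
iter1 = [tools, system, kb, history]
iter2 = [tools, system, kb, history + "O" * 200]        # one new observation appended
iter3 = [tools, system, "k" * 3000, history + "O" * 200]  # KB chunks changed

ratio = cached_prefix_len(iter1, iter2) / len("".join(iter2))
```

With stable segments first and append-only messages last, the second request reuses 98% of its input in this toy; changing the KB chunks drops reuse to the tools and system blocks only.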