Foundation¶

Foundation Plan — Gaby v0.1 → v1.0¶

Status: Draft v0.1 · Owner: Guilliano · Last updated: 2026-04-11

This is the foundation plan. It sits one step below SPEC.md (the what) and above ARCHITECTURE.md (the detailed how). Its job is to lock the decisions that are expensive to change later:

What language(s) we write in

How the repo is laid out

What the data model looks like

How we test it

How we ship it

How the prototype UI becomes a real design system

Everything in this doc is opinionated and defaulted. If a section says "we pick X", assume we start with X and revisit only if we have measured data showing X isn't working. The goal is to stop debating foundation and start building.

Grounding: the first roadmap target is the Founder persona. Every decision below is sized for "ship v0.1 in a small-team time frame" but structured so v0.2 (MSP), v0.3 (Support Lead), and v0.4 (SRE) slot in without rewrites.

0. Guiding principles (for any decision not listed below)¶

Boring tech by default. Python, React, Postgres, SQLite, Docker. Every novel dependency is a liability paid in debugging hours at 3 a.m.
Two languages max. One backend (Python), one frontend (TypeScript/React). No Go CLI, no Rust agent core, no Elixir sidecar. Adding a third language requires a written justification.
Ship a running docker compose up on day one of every week. The demo always works. If it doesn't, that's the next thing you fix.
Tests are part of the feature. Every PR ships with the tests that prove it works. No "will add tests in a follow-up".
The prototypes in personas/ are the UX spec, not a moodboard. The React implementation is a port, not a reinterpretation.
Every closed item ships with an HTML status report. For each delivered feature, module, or roadmap item, write a standalone HTML file under reports/ that describes what was done and how to test it. The file is written by whoever closes the item and linked from reports/index.html. No exceptions — this is the paper trail for review and handoff. Reports match the visual language of the landing page (Tailwind CDN, Inter font, clean cards) so they can be shared with non-engineers.

1. Stack decisions (locked)¶

1.1 Backend — Python¶

Concern	Choice	Why
Runtime	Python 3.12+	Pattern matching, performance wins, the best typing story in the Python 3 line, supported until 2028. Drop support when <5% of users.
Package manager	uv	Rust-based, ~10-100× faster than pip/poetry, single-tool for venv + install + lockfile + tool runs. The clear future.
Web framework	FastAPI	Async-first, automatic OpenAPI generation (needed to generate the TypeScript API client), Pydantic-native, excellent DX.
Agent loop	Homegrown (≤500 LOC) on top of the Anthropic + OpenAI SDKs via litellm	The loop is the product. We do not want to fight LangGraph/pydantic-ai abstractions when we tune prompts and error handling. We adopt a framework piece only when we find ourselves re-implementing it.
MCP	Official `mcp` Python SDK (as a client / host)	Standard. Supports both stdio and streamable HTTP transports. Every connector we ship and every community one is an MCP server.
LLM SDK abstraction	litellm (SDK only, not proxy) with direct Anthropic + OpenAI for hot paths	One provider interface for BYOK. Hot paths (planner, verdict) use the direct SDK. We never run the LiteLLM proxy in-process — it has known 2026 production issues. Pin `litellm` to a known-good version; install with hash verification. See `ARCHITECTURE.md §21` for the full rationale.
ORM	SQLAlchemy 2.x (async)	Async is non-negotiable for an I/O-heavy service. SQLAlchemy 2 is the mature, typed choice. No SQLModel — it's a thinner wrapper that we don't need.
Migrations	Alembic	Paired with SQLAlchemy. Boring, reliable.
Primary DB	SQLite (default) / Postgres (opt-in)	SQLite for single-node installs and first-run demos — zero external deps. Postgres via a one-line config change for scale. Same SQLAlchemy models.
Vector store	sqlite-vec (default) / pgvector (Postgres)	Matches the DB choice. sqlite-vec is maintained and production-capable; pgvector is the standard for Postgres. No external Qdrant/Pinecone at v0.1.
Full-text search	SQLite FTS5 / Postgres tsvector	Hybrid retrieval = BM25-ish + vector. Native to each DB, no extra service.
Memory graph	`MemoryGraph` protocol with `SQLiteMemoryGraph` default + opt-in `PostgresAGEMemoryGraph` + opt-in `FalkorDBLiteMemoryGraph`	Long-term agent memory is graph-shaped from day one (nodes + typed edges + POLE+O-style labels). Default backend is two SQLite tables with recursive CTEs — zero ops cost. Users who want graph-native from day one can opt into Apache AGE (Postgres extension, openCypher, recommended) or FalkorDBLite (embedded Cypher, Beta embedding on a production-stable engine) via `docker compose --profile graph-age` or `GABY_MEMORY_BACKEND=falkor-lite`. All three backends implement the same 11-method protocol; Iter 4 ships the 3-backend round-trip test that proves the migration path. See `ARCHITECTURE.md §22` for the full design.
Background jobs	arq (Redis-backed async) with a local in-process fallback	arq is async-native and minimal. For the "docker compose up" founder install, we fall back to an in-process worker so Redis isn't required until scale demands it.
Config & secrets	pydantic-settings + pluggable secret providers (env / file / Vault / AWS SM / GCP SM)	Types for free; one interface per provider.
Logging	structlog with JSON renderer	Structured JSON to stdout. 12-factor. Works with every log aggregator.
Tracing	OpenTelemetry SDK with OTLP exporter	Turn on by env var. Every tool call and LLM call is a span. Non-negotiable for the SRE persona later.
Metrics	`prometheus-client` with `/metrics` endpoint	Standard. No external push gateway.
Linter / formatter	ruff (both) + mypy --strict	ruff replaces black + isort + flake8 + many plugins, in one Rust-fast tool. mypy strict on the core package.
Test runner	pytest + pytest-asyncio + pytest-cov + hypothesis	Standard Python testing. Hypothesis for property-based tests on anything safety-critical.
HTTP mocking	respx	Async-compatible, integrates cleanly with httpx (which litellm uses).
Time / clock	`time-machine` in tests	Deterministic time; faster than freezegun.

Why not…¶

LangGraph / pydantic-ai / smolagents. They each have real strengths, but the investigation loop is the soul of the product and we need to iterate on prompts, tool-call retry semantics, and error recovery without a framework's opinions in the way. We ship our own 300-500 line loop. If a year in we find we are re-implementing langgraph.checkpoint, we adopt that specific piece.
Django. Gaby is an API-first service with a React frontend. FastAPI is the better fit; Django Admin is not something we want to maintain.
Go. Strong single-binary story but adds a second language and a second hiring pipeline. Docker Compose gives us "5-minute install" without it. We can add a Go CLI wrapper in v0.5+ if the install friction demands it.
Node for the backend. The Python AI ecosystem (tokenization, evals, RAG tooling, connector SDKs) is meaningfully ahead of Node's. Every AI team that has switched has switched toward Python, not away.

1.2 Frontend — TypeScript / React¶

Concern	Choice	Why
Framework	Vite + React 19 + React Router 7	No SSR complexity. The admin UI is a SPA served as static assets from the backend. Vite build is fast, HMR is instant.
Language	TypeScript strict	Non-negotiable. Any `any` requires an inline justification.
Styling	Tailwind CSS 4	Matches `shared/styles.css` in the prototypes. No migration tax.
Component primitives	shadcn/ui	Copy-paste Radix components, fully themable, matches the aesthetic we already shipped. Not a dependency — code we own.
Server state	TanStack Query v5	Cache, refetch, optimistic updates. The category winner.
Client state	Zustand	For the small handful of cases that aren't server state (wizard step, theme, transient UI).
Forms	React Hook Form + Zod	Zod schemas can be shared with the API client for end-to-end typing.
API client	Auto-generated from OpenAPI via `openapi-typescript`	FastAPI emits OpenAPI; we codegen the TS client. Adding a backend route never requires a manual frontend type update.
Charts	Recharts	Simple, well-known, good enough for dashboards.
Icons	Custom SVG set (port of `shared/icons.js`) + lucide-react for generic icons	Keep brand identity; reuse lucide for everything generic.
Unit tests	Vitest + React Testing Library	Vitest shares the Vite config; RTL is the standard.
E2E tests	Playwright (already in `tests/`)	Reuse the existing harness and page-object pattern.
Linter / formatter	Biome (primary) + ESLint for `react-hooks` only	Biome is ~25× faster and covers ~80% of ESLint rules. The gap is type-aware rules — specifically `eslint-plugin-react-hooks`, which Biome does not yet replicate. We run Biome on every file and keep a minimal ESLint config only for the React Hooks plugin until Biome closes that gap. Biome is the canonical formatter.
Package manager	pnpm	Disk-efficient, fast, handles monorepos natively.

Why not…¶

Next.js. We don't need SSR for an admin UI; we do want trivial Docker packaging. Next pulls in a server runtime we don't want inside the Docker image. Vite wins on simplicity.
shadcn+Radix alternatives (Chakra, MUI, Mantine). They're fine frameworks but they own the theme. shadcn leaves us in control of every pixel — and our prototypes already have a distinctive look we don't want to lose.
ESLint + Prettier. Biome is simply faster and one tool. ESLint is still the fallback if Biome lacks a rule we need.

1.3 Cross-cutting¶

Concern	Choice
Version control	Git + GitHub
Commit signing	DCO (`git commit -s`) — not a full CLA
Issue tracker	GitHub Issues
Docs site	MkDocs Material (Python-native, fast build, great search)
API docs	Auto from FastAPI's OpenAPI → rendered in MkDocs via `mkdocs-swagger-ui-tag`
Changelog	`CHANGELOG.md` hand-edited, enforced on release PRs
Release automation	GitHub Actions → Docker Hub / GHCR + PyPI + npm (for widget) + Helm chart repo
Container registry	GHCR primary, Docker Hub mirror
License headers in source	SPDX short form (`# SPDX-License-Identifier: Apache-2.0`) in every file

2. Repository layout¶

gaby/                                       # the repo root (name matches the product)
├── README.md                                # "what is this / 5-minute install"
├── SPEC.md                                  # (exists) product spec
├── FOUNDATION.md                            # (this document) foundation plan
├── ARCHITECTURE.md                          # (next) detailed technical architecture
├── ROADMAP.md                               # (next) dated roadmap across the 4 personas
├── CHANGELOG.md                             # (next) user-facing changes per release
├── LICENSE                                  # Apache 2.0
├── LICENSE-EE                               # commercial enterprise edition license
├── TRADEMARK.md
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
├── SECURITY.md                              # vuln disclosure
│
├── BUSINESS.md                              # OSS-vs-commercial position, kept honest
├── docs/                                    # MkDocs Material site — canonical marketing
│                                            # + narrative docs, deployed to gaby.skycloak.io
├── personas/                                # (exists) persona prototypes — canonical UX spec
│   ├── founder/
│   ├── msp/
│   ├── sre/
│   └── support-lead/
├── shared/                                  # (exists) prototype shared assets
│
├── backend/                                 # Python service
│   ├── pyproject.toml                       # single source of truth for deps, tools, build
│   ├── uv.lock
│   ├── README.md                            # "how to run the backend locally"
│   ├── src/gaby/
│   │   ├── __init__.py
│   │   ├── __main__.py                      # `python -m gaby`
│   │   ├── cli.py                           # `gaby` entry point (Typer)
│   │   │
│   │   ├── api/                             # FastAPI app
│   │   │   ├── app.py                       # FastAPI() instance + wiring
│   │   │   ├── deps.py                      # dependency-injection helpers
│   │   │   ├── middleware.py                # auth, CSRF, request-id, structlog binding
│   │   │   ├── errors.py                    # typed exception → HTTP mapping
│   │   │   └── routers/
│   │   │       ├── health.py                # /health /ready /metrics
│   │   │       ├── onboarding.py            # wizard-driven setup
│   │   │       ├── tickets.py
│   │   │       ├── investigations.py
│   │   │       ├── connectors.py
│   │   │       ├── knowledge.py
│   │   │       ├── chat.py                  # widget + operator console backend
│   │   │       ├── settings.py
│   │   │       └── admin.py
│   │   │
│   │   ├── agent/                           # THE investigation loop
│   │   │   ├── loop.py                      # run_investigation(ticket) → Investigation
│   │   │   ├── planner.py                   # decides next tool / next action
│   │   │   ├── memory.py                    # working memory within one investigation
│   │   │   ├── verdict.py                   # classify auto-resolved / needs-tech / etc.
│   │   │   ├── prompts/                     # versioned prompt templates (just .md files)
│   │   │   │   ├── planner.md
│   │   │   │   ├── tool_selector.md
│   │   │   │   ├── summarizer.md
│   │   │   │   └── verdict.md
│   │   │   └── safety_check.py              # scope enforcement *before* any action
│   │   │
│   │   ├── connectors/                      # MCP host + connector framework
│   │   │   ├── base.py                      # Connector abstract + Scope / Action types
│   │   │   ├── registry.py                  # load/persist connector configs
│   │   │   ├── mcp_host.py                  # spawn + supervise MCP server subprocesses
│   │   │   ├── mcp_client.py                # thin wrapper around the official SDK client
│   │   │   ├── catalog.yaml                 # first-party connector catalog (metadata only)
│   │   │   └── builtin/                     # connectors we ship in-tree for v0.1
│   │   │       ├── postgresql.py
│   │   │       ├── keycloak.py
│   │   │       └── zoho_desk.py
│   │   │
│   │   ├── knowledge/
│   │   │   ├── ingest.py                    # fs walker, git puller, url crawler, pdf reader
│   │   │   ├── chunker.py                   # token-aware Markdown/code chunking
│   │   │   ├── embeddings.py                # pluggable embedding provider
│   │   │   ├── store.py                     # write to vector + FTS indices
│   │   │   ├── retrieve.py                  # hybrid BM25 + vector, with citations
│   │   │   └── citations.py                 # "where did this claim come from?" helpers
│   │   │
│   │   ├── ticketing/                       # help desk adapters (source + sink)
│   │   │   ├── base.py                      # TicketSource + TicketSink interfaces
│   │   │   ├── zoho_desk.py                 # v0.1 canonical adapter
│   │   │   ├── linear.py
│   │   │   ├── github_issues.py
│   │   │   └── email_inbox.py
│   │   │
│   │   ├── llm/
│   │   │   ├── provider.py                  # Protocol: chat, stream, tool_call
│   │   │   ├── anthropic.py
│   │   │   ├── openai.py
│   │   │   ├── litellm_gateway.py           # fallback for any OpenAI-compatible endpoint
│   │   │   ├── budget.py                    # per-investigation token & dollar ceilings
│   │   │   ├── router.py                    # cheap-model for classification / big-model for verdict
│   │   │   └── cache.py                     # prompt-cache friendly hashing
│   │   │
│   │   ├── safety/                          # the machine will not wreck production
│   │   │   ├── authz.py                     # scope evaluation
│   │   │   ├── audit.py                     # append-only hash-chained log
│   │   │   ├── approvals.py                 # the approval queue
│   │   │   ├── scopes.py                    # scope DSL (read/write/dry_run)
│   │   │   └── redaction.py                 # PII redaction before sending to LLM
│   │   │
│   │   ├── chat/                            # human chat surface
│   │   │   ├── sessions.py
│   │   │   ├── widget.py                    # end-user widget backend
│   │   │   ├── slack.py                     # Slack app
│   │   │   ├── teams.py                     # Microsoft Teams app
│   │   │   └── handoff.py                   # Gaby → human takeover
│   │   │
│   │   ├── storage/
│   │   │   ├── db.py                        # engine, session factory, init
│   │   │   ├── encryption.py                # symmetric envelope for sensitive columns
│   │   │   └── models/                      # one file per aggregate
│   │   │       ├── workspace.py
│   │   │       ├── user.py
│   │   │       ├── connector.py
│   │   │       ├── knowledge.py
│   │   │       ├── ticket.py
│   │   │       ├── investigation.py
│   │   │       ├── action.py
│   │   │       ├── approval.py
│   │   │       ├── audit.py
│   │   │       ├── llm_call.py
│   │   │       └── chat.py
│   │   │
│   │   ├── observability/
│   │   │   ├── logging.py
│   │   │   ├── tracing.py
│   │   │   └── metrics.py
│   │   │
│   │   ├── workers/
│   │   │   ├── runner.py                    # in-process vs arq dispatcher
│   │   │   ├── ticket_poller.py             # poll help desks
│   │   │   ├── investigation_worker.py      # run the agent loop
│   │   │   └── summary_mailer.py            # nightly email for founder persona
│   │   │
│   │   ├── events.py                        # internal event bus (pub/sub)
│   │   └── config.py                        # the single Settings object
│   │
│   └── tests/
│       ├── conftest.py
│       ├── unit/                            # mirrors src/gaby/ structure 1:1
│       ├── integration/                     # marked @pytest.mark.integration
│       ├── contract/                        # connector contract tests
│       ├── property/                        # hypothesis tests
│       └── fixtures/
│           ├── tickets/
│           ├── docs/                        # small KB corpora
│           └── llm_transcripts/             # recorded LLM responses for deterministic replay
│
├── web/                                     # React admin UI
│   ├── package.json
│   ├── pnpm-lock.yaml
│   ├── vite.config.ts
│   ├── tsconfig.json
│   ├── biome.json
│   ├── tailwind.config.ts                   # imports tokens from ../shared/styles.css
│   ├── README.md
│   ├── index.html
│   ├── src/
│   │   ├── main.tsx
│   │   ├── App.tsx
│   │   ├── router.tsx
│   │   ├── routes/                          # one folder per top-level route
│   │   │   ├── onboarding/
│   │   │   ├── dashboard/
│   │   │   ├── investigation/
│   │   │   ├── tickets/
│   │   │   ├── connectors/
│   │   │   ├── knowledge/
│   │   │   ├── chat-console/                # operator view of the widget
│   │   │   └── settings/
│   │   ├── components/
│   │   │   ├── ui/                          # shadcn primitives (button, dialog, etc.)
│   │   │   └── domain/                      # TicketRow, TimelineStep, StatCard, Sidebar...
│   │   ├── lib/
│   │   │   ├── api.ts                       # generated OpenAPI client
│   │   │   ├── query.ts                     # TanStack Query setup
│   │   │   ├── auth.ts
│   │   │   └── theme.ts                     # persona palette (indigo/violet/emerald/sky)
│   │   ├── hooks/
│   │   ├── styles/globals.css
│   │   └── icons/                           # port of shared/icons.js as .tsx
│   ├── public/
│   └── tests/
│       ├── unit/                            # Vitest
│       └── components/
│
├── widget/                                  # embeddable end-user chat widget
│   ├── package.json
│   ├── vite.config.ts                       # library mode; outputs a single JS bundle
│   ├── README.md                            # "how to drop the snippet into any site"
│   ├── src/
│   │   ├── index.ts                         # entry: window.Gaby.init({...})
│   │   ├── widget.tsx                       # React root mounted into a shadow DOM
│   │   ├── api.ts                           # talks to /api/chat
│   │   └── styles.css                       # scoped to shadow DOM
│   └── dist/                                # built bundle shipped to npm + CDN
│
├── connectors/                              # first-party MCP servers (separately publishable)
│   ├── README.md                            # how to build a connector
│   ├── _contract/                           # the contract tests every connector must pass
│   ├── postgres/
│   ├── keycloak/
│   ├── stripe/
│   ├── zoho-desk/
│   ├── m365/
│   ├── entra-id/
│   ├── ninjaone/
│   ├── halopsa/
│   ├── kubernetes/
│   └── datadog/
│
├── docs/                                    # MkDocs Material
│   ├── mkdocs.yml
│   ├── index.md
│   ├── quickstart/
│   │   ├── docker-compose.md
│   │   ├── helm.md
│   │   └── managed-cloud.md
│   ├── concepts/
│   │   ├── agent-loop.md
│   │   ├── connectors.md
│   │   ├── knowledge.md
│   │   ├── safety.md
│   │   └── autonomy-levels.md
│   ├── connectors/                          # one page per connector
│   ├── personas/                            # how to set up Gaby for each persona
│   ├── operations/
│   │   ├── observability.md
│   │   ├── backups.md
│   │   ├── upgrades.md
│   │   └── security.md
│   ├── reference/
│   │   ├── api.md                           # OpenAPI rendered
│   │   ├── cli.md
│   │   └── config.md
│   └── contributing/
│       ├── dev-setup.md
│       ├── testing.md
│       └── connector-authoring.md
│
├── ops/                                     # deployment artifacts
│   ├── docker/
│   │   ├── Dockerfile.backend
│   │   ├── Dockerfile.web
│   │   ├── Dockerfile.widget
│   │   └── docker-compose.yml               # canonical v0.1 install path
│   ├── helm/
│   │   └── gaby/                            # chart
│   └── profiles/
│       ├── founder-quickstart/              # smallest, friendliest install
│       ├── msp-multiworkspace/
│       └── sre-readonly/
│
├── .github/
│   ├── workflows/
│   │   ├── backend-ci.yml
│   │   ├── web-ci.yml
│   │   ├── widget-ci.yml
│   │   ├── e2e.yml
│   │   ├── release.yml
│   │   └── docs.yml
│   ├── ISSUE_TEMPLATE/
│   └── PULL_REQUEST_TEMPLATE.md
│
└── scripts/
    ├── dev.sh                               # runs backend + web concurrently
    ├── seed-dev-data.py                     # loads fixture tickets + docs
    ├── gen-api-client.sh                    # OpenAPI → TS client
    └── mint-dev-secrets.py

Why this structure¶

Monorepo, not multiple repos. Three apps (backend/, web/, widget/) stay in lockstep; the alternative is broken releases and drift.
personas/ stays at the root as the canonical UX prototypes (per SPEC.md Section 4). The repo-root index.html was retired in v0.3.1 — docs/index.md (deployed to https://gaby.skycloak.io) is now the canonical marketing surface.
connectors/ is a sibling of backend/, not a sub-package. Each connector is its own MCP server, publishable to PyPI independently. The backend depends on MCP as a protocol, not on connector implementations.
Tests live next to the code they test (backend/tests/, web/tests/), not in a top-level tests/ folder — except for the existing tests/ harness for the prototype Playwright tests, which stays until the React UI replaces the HTML prototypes.
docs/ is MkDocs-managed, not a grab-bag of loose Markdown. One mkdocs.yml, deployable as a static site.

3. Data model core¶

These entities carry us from v0.1 (Founder) to v0.4 (SRE) without schema rewrites. The MSP persona's multi-workspace is baked in by having a workspace_id column on every row from day one, even though v0.1 uses a single hard-coded "default" workspace.

Aggregate	Key fields	Notes
workspaces	`id`, `name`, `plan`, `compliance_profile`, `residency_region`, `created_at`	Single "default" workspace in v0.1. Everything else joins on this.
users	`id`, `workspace_id`, `email`, `role` (admin/agent/viewer), `password_hash`, `disabled`	Operators of Gaby, not end-users. RBAC defaults to three roles; custom roles are EE.
api_keys	`id`, `workspace_id`, `user_id?`, `prefix`, `hash`, `scopes`, `expires_at`	For CLI and machine access.
sessions	`id`, `user_id`, `expires_at`, `csrf_token`, `operator_notes` (jsonb)	HTTP cookie sessions for the web UI. `operator_notes` holds medium-term, session-scoped notes (e.g. "operator just approved X — don't re-prompt for the rest of this session").
connectors	`id`, `workspace_id`, `kind`, `name`, `config_encrypted`, `status`, `last_health_check`, `scopes`, `autonomy_level`	`kind` = `postgresql` / `m365` / `zoho_desk` / … ; `scopes` is a JSON scope DSL; `autonomy_level` ∈ {investigate, propose, act}.
connector_events	`id`, `connector_id`, `ts`, `kind`, `payload`	Healthchecks, auth failures, permission denials.
knowledge_sources	`id`, `workspace_id`, `kind` (git / confluence / dir / url), `locator`, `config`, `last_sync`	Where to ingest from.
documents	`id`, `workspace_id`, `source_id`, `uri`, `title`, `content_hash`, `last_ingested_at`	One row per source document (a runbook, a PDF, a Confluence page).
document_chunks	`id`, `document_id`, `workspace_id`, `ordinal`, `text`, `token_count`, `embedding`, `fts`	The retrievable unit. Embedding column uses sqlite-vec or pgvector.
tickets	`id`, `workspace_id`, `source_id`, `external_id`, `title`, `body`, `customer`, `priority`, `status`, `sla_at`, `created_at`	Canonical form — every help-desk adapter maps to this.
investigations	`id`, `workspace_id`, `ticket_id`, `started_at`, `finished_at`, `verdict`, `summary`, `token_cost`, `dollar_cost`	One per ticket. Verdict ∈ {auto_resolved, needs_tech, needs_l2, needs_client, investigating, failed}.
investigation_steps	`id`, `investigation_id`, `ordinal`, `system`, `action`, `detail`, `type` (read/query/action/verify/verdict), `ts`	Exactly matches `renderTimelineStep` in the prototypes.
actions	`id`, `investigation_id`, `connector_id`, `scope`, `payload`, `dry_run`, `result`, `status`, `applied_at`, `rolled_back_at`	Every write Gaby does is recorded here — so is every dry-run shadow.
approvals	`id`, `action_id`, `requested_at`, `decided_at`, `decided_by`, `decision`, `reason`	Drives the approval queue for `propose` autonomy.
audit_log	`id`, `workspace_id`, `ts`, `actor_kind` (user/agent/system), `actor_id`, `event`, `payload`, `prev_hash`, `hash`	Append-only, hash-chained. Exportable to SIEM (EE).
llm_calls	`id`, `investigation_id?`, `purpose`, `model`, `prompt_hash`, `prompt_tokens`, `completion_tokens`, `latency_ms`, `cost`, `cached`	Cost dashboard + prompt debugging.
chat_sessions	`id`, `workspace_id`, `channel` (widget/slack/teams), `external_user_id`, `started_at`, `handed_off_at?`	The human-chat surface.
chat_messages	`id`, `session_id`, `role` (user/gaby/operator), `content`, `attachments`, `ts`
escalations	`id`, `ticket_id?`, `session_id?`, `channel` (slack/teams/pagerduty/email), `sent_at`, `acknowledged_at?`	Per-persona escalation routing.
kb_candidates	`id`, `workspace_id`, `staged_from_investigation_id`, `proposed_title`, `proposed_body`, `status` (pending/accepted/rejected/expired), `reviewed_by?`, `reviewed_at?`, `expires_at`, `created_at`	Staging area for auto-resolved investigations that want to be promoted to KB entries. Everything here is provisional until a human reviews. TTL 30 days default; expired rows auto-archive.
memory_nodes	`id`, `workspace_id`, `label` (customer/user/system/connector/ticket/investigation/fact/observation/resolution), `natural_key`, `properties` (jsonb), `provenance` (operator/proposed/imported), `status` (active/provisional/archived), `first_seen_at`, `last_seen_at`, `last_used_at`, `approved_by?`, `approved_at?`	Nodes of the long-term memory graph. See `ARCHITECTURE.md §22` for the full model, the `MemoryGraph` protocol, and the three backend implementations (SQLite default, Postgres+AGE opt-in, FalkorDBLite opt-in). Unique index on `(workspace_id, label, natural_key)`.
memory_edges	`id`, `workspace_id`, `from_node_id`, `to_node_id`, `relation` (7 typed categories), `weight`, `properties` (jsonb), `observed_at`, `decayed_at?`	Edges of the long-term memory graph. Typed relations per `ARCHITECTURE.md §22` (Causal / Solution / Context / Learning / Similarity / Workflow / Quality). Composite indexes on `(workspace_id, from_node_id, relation)` and `(workspace_id, to_node_id, relation)`.

Design rules the models must follow¶

Every row has workspace_id. Even in v0.1. Cheap now, impossible to retrofit later.
Secrets are encrypted at rest. connectors.config_encrypted, any API key-bearing column. Envelope encryption with a data key from the secrets provider.
Nothing ever hard-deletes. Soft-delete with deleted_at. We need audit reconstructibility.
Every sensitive column has a redaction rule. Before an investigation step is sent to an LLM, PII is stripped per the workspace's compliance profile.
UUIDv7 for all primary keys (time-sortable, index-friendly).

4. Test strategy¶

Testing an LLM-driven system is where most AI projects rot. We avoid that by separating the deterministic parts from the LLM-driven parts and testing each on its own terms.

4.1 Layers¶

Layer	Tool	Scope	Where it runs	Budget
Unit	pytest	Pure functions and classes. Chunker, scope evaluator, audit hash chain, prompt builders, retrieval scoring, DB models against in-memory SQLite. No network.	Every PR, pre-commit	<30s full run
Property	hypothesis	Invariants on the critical-path: scope evaluator, audit hash chain, retrieval top-k containment, redaction idempotence.	Every PR	<60s
Integration	pytest + testcontainers	Real SQLite on disk; real Postgres via testcontainers for the Postgres profile. FastAPI TestClient. Real MCP server spawned as a subprocess. LLM calls go to a deterministic mock provider that replays recorded transcripts.	Every PR	<5 min
Connector contract	pytest (shared fixtures)	Every connector (ours + community) must pass: tool-list endpoint, scope declaration, healthcheck, dry-run of each declared action.	On every connector PR	<2 min/connector
End-to-end	Playwright (existing harness)	Full stack in Docker Compose: backend + web + mock connectors + deterministic LLM. Walks through founder onboarding → dashboard → investigation.	Every PR, nightly full sweep	<10 min
Load	k6	API throughput: 100 concurrent investigations, p95 latency, error rate.	Weekly scheduled	15 min
Evals (LLM-specific)	promptfoo (v0.1) → Inspect AI (v0.2+) for full agent evals with tool calls	Fixed corpus of ≥50 tickets with known-good resolutions. Measures auto-resolution rate, citation accuracy, safety-boundary compliance. promptfoo handles prompt regression well; Inspect AI (from UK AISI) is purpose-built for agent evaluation and fits our v0.2 tool-calling tests better.	Manual before every release, automated weekly	30 min

4.2 Making LLM-driven code deterministic for CI¶

The agent loop calls an LLM. We do two things to make that testable:

Transcript replay. Every integration test that exercises the agent uses a "fake LLM provider" that is seeded with a recorded transcript (tool-call sequences plus final text) stored under backend/tests/fixtures/llm_transcripts/. The test asserts on the deterministic behavior (scope checks, DB writes, audit log shape), not the LLM text.
Evals are a separate beast. Evals use real LLM calls against a fixed corpus, run on a schedule (not per-PR), and measure quality metrics like auto-resolution rate and citation accuracy. They gate releases but not PRs.

This separation is the difference between "our tests pass in 3 minutes" and "our tests cost $400/month in OpenAI credits".

4.3 What we refuse to test with mocks¶

The DB schema. Integration tests use a real DB engine, not an in-memory mock. Schema bugs cost 10× to find in prod.
The MCP protocol. Integration tests spawn a real MCP server (a tiny stub is fine) and exchange real messages.
The FastAPI request lifecycle. Integration tests use TestClient, which runs the real middleware stack.

4.4 Test file conventions¶

backend/tests/unit/ mirrors backend/src/gaby/ 1:1. If you touch agent/loop.py, you touch tests/unit/agent/test_loop.py.
Integration tests are in backend/tests/integration/ and are marked @pytest.mark.integration. The default pytest command runs unit + property only; pytest -m integration runs the slower ones.
Every test file starts with from __future__ import annotations.
No shared mutable fixtures across tests — everything is function-scoped or rebuilt per test.

4.5 Coverage¶

Target: 85% line coverage on backend/src/gaby/, enforced in CI. Not because coverage is a quality metric, but because it catches "forgot to wire this up" regressions.
Critical paths are 100%: safety/, audit, scopes, authz. A PR that drops these below 100% is auto-rejected.
Frontend coverage is best-effort; we rely on Playwright e2e for confidence there.

5. Design system — from prototype to real UI¶

The persona prototypes under personas/ are already the spec. Porting them to React is a translation, not a redesign. Here is how.

5.1 The tokens we keep¶

Pulled from shared/styles.css:

Token family	Prototype source	Target
Color palettes	`.hero-gradient-`, `.btn-primary-`, `.selected-`, `.active-` (indigo / violet / emerald / sky)	`web/src/lib/theme.ts` → CSS variables + Tailwind theme extension
Typography	Inter + JetBrains Mono from `index.html`	`tailwind.config.ts` `fontFamily`
Spacing / radius	Tailwind defaults	Unchanged
Shadows	`stat-card`, `feature-card` styles	`shadow-sm/md/lg` tokens in Tailwind
Animations	`float-up`, `slide-in`, `pulse-dot`, `progress-bar`	`tailwindcss-animate` plugin + custom keyframes in `globals.css`

5.2 Domain components to build¶

Each maps to a function in shared/components.js:

Component	Prototype function	Responsibility
`<OnboardingWizard>`	steps in `personas/*/index.html`	Generic 5-6 step wizard container, consumes a config array
`<ProgressDots>`	`renderProgress`	Wizard progress indicator
`<Sidebar>`	`renderSidebar`	Persona-themed nav
`<StatCard>`	`renderStatCard`	Dashboard stat tile
`<TicketRow>`	`renderTicketRow`	One ticket in a queue
`<TicketFilters>`	`filterTickets` logic	Filter buttons over a ticket list
`<InvestigationTimeline>`	`renderTimelineStep`	Full investigation view with timeline steps
`<TimelineStep>`	`renderTimelineStep`	One step in an investigation
`<ConnectorCard>`	`renderServerCard`	A connector in the catalog or settings
`<Toast>`	`showToast`	Transient notification
`<SimulatedInvestigation>`	`simulateInvestigation`	The "live demo" step in onboarding

5.3 Persona theming¶

Persona colors become CSS variables scoped to a wrapper class:

/* web/src/styles/globals.css */
:root[data-persona="founder"]      { --gaby-primary: 99 102 241; /* indigo-500 */ }
:root[data-persona="support-lead"] { --gaby-primary: 139 92 246; /* violet-500 */ }
:root[data-persona="sre"]          { --gaby-primary: 5 150 105;  /* emerald-600 */ }
:root[data-persona="msp"]          { --gaby-primary: 2 132 199;  /* sky-600 */ }

Components then use bg-[rgb(var(--gaby-primary))] / text-[rgb(var(--gaby-primary))]. One set of components, per-persona skin with no code duplication.

5.4 What happens to the prototypes once the React UI ships¶

They stay as static files, served from the repo, and remain the visual spec and marketing.
The React app is available at /app (served by the backend); the landing page stays at /.
When the React UI reaches parity with a persona's prototype, the prototype is marked [reference] in its header and the React app becomes the runtime.
We never break the prototypes. They are easier to share with non-engineers than a running SPA.

5.5 Accessibility baseline¶

All new components are built on Radix primitives (via shadcn), which ship with keyboard navigation and ARIA.
Every interactive element has a data-testid matching the prototype conventions (already observed in the HTML prototypes).
Color contrast meets WCAG AA at minimum; we test with @axe-core/react in dev.
Every form has labels, and error messages are read out via aria-live.

6. Observability, ops, and security baselines (day-1 musts)¶

6.1 Observability (cross-cutting)¶

Signal	Default	Opt-in
Logs	Structured JSON to stdout via structlog	Ship to Loki / Datadog via OTLP
Traces	OpenTelemetry, off by default	`GABY_OTEL_EXPORTER=otlp://...` turns it on
Metrics	Prometheus `/metrics` endpoint always on	Dashboards ship as JSON in `docs/operations/dashboards/`
Status	Built-in `/status` page in the web UI	`/health`, `/ready` for probes

6.2 Security (non-negotiable at v0.1)¶

Control	Implementation
Authenticated API by default	No "open mode" in v0.1. First run mints an admin user.
Secrets at rest	Envelope encryption (AES-GCM) with data keys from the configured secrets provider
CSRF	Session cookie + CSRF token on every state-changing web route
CSP	Strict-ish CSP by default; MkDocs docs are served from a separate origin
Input validation	Pydantic models on every route; reject unknown fields
Rate limits	Per-IP and per-API-key, token-bucket, in Redis or in-process
Dependency scanning	Dependabot + `pip-audit` + `pnpm audit` in CI
SBOM	`syft` in the release workflow; SBOM attached to every Docker image
Vuln disclosure	`SECURITY.md` with a PGP-signed security address

6.3 CI pipelines (GitHub Actions)¶

Workflow	Triggers	Steps	Gate
`backend-ci.yml`	PR + main	uv sync → ruff check → mypy strict → pytest (unit + property) → pytest-cov ≥85%	Blocking
`backend-int.yml`	PR + nightly	`pytest -m integration` with testcontainers Postgres	Blocking
`web-ci.yml`	PR + main	pnpm i → biome check → tsc --noEmit → vitest	Blocking
`widget-ci.yml`	PR + main	pnpm i → biome check → vitest → build size-limit check	Blocking
`e2e.yml`	PR + nightly	docker compose up → Playwright run → upload traces on failure	Blocking (fast suite) / nightly (full)
`connector-ci.yml`	PR touching `connectors/**`	Run contract tests against the touched connectors	Blocking
`docs.yml`	PR + main	MkDocs strict build + broken-link check	Blocking
`release.yml`	tag `v..*`	Build + push Docker images, Helm chart, PyPI wheel, npm widget; publish SBOM; draft release notes	Manual review
`eval.yml`	weekly + manual	Run the LLM eval harness; post a summary comment to a tracking issue	Informational

Cache strategy: uv and pnpm caches keyed on their respective lockfiles.

6.4 Release cadence¶

v0.x: weekly or whenever a meaningful slice is ready. Breaking changes are expected and flagged in CHANGELOG.md.
v1.0: the four personas ship, API is stable, upgrade path is documented. Semver from here.

7. v0.1 exit criteria (Founder persona) — what shipping actually means¶

We ship v0.1 when a technical founder can docker compose up Gaby, connect their stack in under 10 minutes, and receive a Slack DM by the next morning showing a real ticket Gaby auto-resolved overnight.

7.1 In scope for v0.1¶

[ ] Installation: docker compose up works on macOS, Linux, WSL2. SQLite only; no Postgres / Redis required.
[ ] Onboarding wizard: the Founder flow from personas/founder/index.html, fully real in React.
[ ] Connectors shipped: PostgreSQL (read-only + limited write), Keycloak (read-only), Zoho Desk (read + reply).
[ ] Ticket source: Zoho Desk polling adapter. Canonical ticket model in the DB.
[ ] Knowledge ingestion: point at a local ./docs folder; Markdown + PDF.
[ ] Agent loop: Anthropic Claude via litellm, homegrown loop, token + dollar budget per investigation.
[ ] Safety: three autonomy modes; dry-run by default on writes; approval queue for propose.
[ ] Audit log: append-only, hash-chained.
[ ] Web UI: onboarding → dashboard → investigation detail → basic settings.
[ ] Slack escalation: outbound only. Nightly summary email via SMTP.
[ ] Observability: structured logs, /metrics, /health, /ready.
[ ] Docs: quickstart page that matches the install experience; one-page "how Gaby works"; one-page security overview.
[ ] Tests: every layer above passing in CI. 85% backend coverage. 100% on safety/.

7.2 Explicitly NOT in v0.1¶

Multi-workspace mode (it's built into the schema but we serve one default workspace)
Per-client autonomy rules (MSP persona)
Human chat widget (v0.3 at earliest)
Slack / Teams inbound bot (only outbound escalation)
SSO / SAML / SCIM (Enterprise Edition)
Air-gapped install mode (Enterprise Edition)
All connectors not listed above
The Support Lead / SRE / MSP persona wizards
The operator chat console

7.3 Definition of done for the v0.1 release¶

[ ] All in-scope checkboxes above are ticked
[ ] The founder quickstart (docs/quickstart/docker-compose.md) runs green on all three target OSes
[ ] A five-ticket live test against a staging Zoho Desk: at least three of five tickets are auto-resolved, all writes go through the safety layer, the audit log reconstructs the full sequence
[ ] The eval suite (50 fixture tickets) achieves ≥60% auto-resolution with zero safety violations
[ ] CHANGELOG.md has a v0.1.0 entry written by a human
[ ] A tagged release with Docker images on GHCR, Helm chart in the chart repo, and Python wheel on PyPI
[ ] The landing page's "Join Waitlist" button is replaced with "Install" once v0.1 ships

8. Which things we will probably get wrong (so future-us can forgive us)¶

These are the calls I am least confident about. They will likely change. Document them here so the change is expected, not a crisis.

Agent loop as homegrown. If after 3 months we are spending more time on loop mechanics than on prompts and tools, we adopt the matching piece of pydantic-ai or LangGraph. The loop's public interface is small enough to swap.
arq vs Celery. arq is simpler but newer. If we hit production pain (worker supervision, retries, visibility), we switch to Celery before v0.5.
sqlite-vec vs pgvector lock-in. We abstract the vector operations behind a tiny interface so swapping is a day of work, not a week.
Widget in a shadow DOM. Theoretically clean, practically has quirks (fonts, CSP). If it becomes a maintenance sink, we fall back to an iframe.
Keeping the HTML prototypes as reference after the React UI ships. This might create drift. Mitigation: a CI check that both prototype and React page use the same data-testid set.
In-process worker vs external. Great for v0.1 demos, but "Gaby is slow" will probably mean "the worker is blocking the API". v0.2 promotes arq+Redis to default in Compose.

9. What this document is not¶

Not the detailed technical architecture. That is ARCHITECTURE.md — sequence diagrams, exact class contracts, concurrency model, scaling notes. It follows this document.
Not a dated roadmap. That is ROADMAP.md — v0.1 through v1.0 with estimates and milestones.
Not a contributor guide. That is CONTRIBUTING.md — dev setup, branch rules, code review rituals.
Not a product spec. That is SPEC.md — the what and the why.

Read in order: SPEC.md → FOUNDATION.md (this) → ARCHITECTURE.md → ROADMAP.md.

10. Appendix — decisions at a glance¶

Area	Choice
Backend lang	Python 3.12+
Backend framework	FastAPI
Agent loop	Homegrown, ≤500 LOC
LLM abstraction	litellm (BYOK) + direct Anthropic/OpenAI (hot paths)
MCP	Official Python SDK
ORM	SQLAlchemy 2.x async + Alembic
DB default	SQLite (+ sqlite-vec + FTS5)
DB opt-in	Postgres (+ pgvector + tsvector)
Background jobs	arq (Redis) with in-process fallback
Package manager	uv
Linter	ruff + mypy --strict
Frontend	Vite + React 19 + RR7 + TS strict
Styling	Tailwind 4 + shadcn/ui
Server state	TanStack Query
Client state	Zustand
Forms	React Hook Form + Zod
API client	`openapi-typescript` generated from FastAPI OpenAPI
Widget	Vite library mode → shadow DOM React
Frontend lint	Biome
Frontend pm	pnpm
Tests (BE)	pytest + hypothesis + testcontainers + respx
Tests (FE)	Vitest + RTL + Playwright (reuse `tests/`)
Deterministic LLM testing	transcript replay via a fake provider
Evals	promptfoo or homegrown, scheduled not per-PR
Docs	MkDocs Material
Containers	Docker + Compose (v0.1), Helm (v0.2)
Registry	GHCR primary, Docker Hub mirror
License (core)	Apache 2.0
License (EE)	Commercial
Contribution	DCO
Release cadence	Weekly v0.x, monthly from v1.0