API Contract Validation (Master Plan)

This SOP defines how we validate API contracts across the home/office stack (Portal/Dash/Auth/Agents/MCP/Device services) so we can move services between environments (home → office) with minimal surprises.

Scope (current stack)

  • Web-edge (VM103 / office web-edge): Caddy, Authentik, Portal, Docs, Dash FE, agents-orchestrator, second-brain backend, vm-health, sensors web.
  • Core-services (VM102 / office core): MCP gateway.
  • LoRaWAN (VM101): ChirpStack stack + UDP/1700.

Contract rules (non-negotiable)

R1) Version every API

  • Every JSON API path must include a version segment: /v1/...
  • Breaking changes require a new version segment: /v2/...
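
A minimal sketch of how R1 could be checked mechanically. It assumes the version segment may sit anywhere in the path (as in /api/vm-health/v1/status) and that versions are whole numbers starting at 1; the function name api_version is hypothetical.

```python
import re
from typing import Optional

# Matches a whole path segment such as "v1" or "v12".
# Assumption: no minor versions (v1.2) and no v0.
_VERSION_SEGMENT = re.compile(r"^v([1-9][0-9]*)$")


def api_version(path: str) -> Optional[int]:
    """Return the first version number found in the path, or None if unversioned."""
    for segment in path.split("/"):
        match = _VERSION_SEGMENT.match(segment)
        if match:
            return int(match.group(1))
    return None
```

A contract test could call api_version on every registered JSON route and fail when it returns None.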

R2) Machine-validated schemas

For each endpoint we maintain:

  • Request schema (JSON)
  • Response schema (JSON)
  • Example payloads (minimal + typical)

Recommended schema options:

  • Zod (Node) or Pydantic (Python)
  • OpenAPI generated from code when possible

Known rule (Agents SDK / Zod): when structured outputs tooling requires it, a field that may be absent must be modeled as nullable rather than only optional (we previously fixed a crash by making an .optional() field .nullable()).
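
The idea behind the nullable rule can be illustrated in plain Python (this is not the Zod API): every declared key is always present, and "no value" is sent as null instead of omitting the key. The field spec NOTE_FIELDS and the function shape_errors are hypothetical names used only for this sketch.

```python
from typing import Any, Dict, List

# Hypothetical field spec: name -> (expected type, whether null is allowed).
NOTE_FIELDS = {
    "title": (str, False),
    "summary": (str, True),  # "optional" in spirit, but modeled as nullable
}


def shape_errors(payload: Dict[str, Any], fields=NOTE_FIELDS) -> List[str]:
    """List every way the payload deviates from the always-present, nullable model."""
    errors = []
    for name, (expected_type, nullable) in fields.items():
        if name not in payload:
            errors.append(f"{name}: missing (send null instead of omitting)")
        elif payload[name] is None:
            if not nullable:
                errors.append(f"{name}: null not allowed")
        elif not isinstance(payload[name], expected_type):
            errors.append(f"{name}: expected {expected_type.__name__}")
    return errors
```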

R3) Strict JSON shape at edges

At the edge boundary (Caddy → upstream), responses from JSON endpoints must have:

  • Content-Type: application/json
  • deterministic keys
  • a stable error envelope
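
A small sketch of two checks that follow from R3. It assumes "deterministic keys" can be enforced by serializing with sorted keys, which is one possible reading; both function names are hypothetical.

```python
import json
from typing import Any


def is_json_content_type(header_value: str) -> bool:
    """True for "application/json", ignoring parameters such as charset."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == "application/json"


def canonical_json(payload: Any) -> str:
    """Serialize with sorted keys so equal payloads always produce equal text."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))
```

Comparing canonical_json output between two environments (home and office) is a cheap way to spot shape drift.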

R4) Error envelope

All JSON APIs should return:

{
  "ok": false,
  "error": {
    "code": "string",
    "message": "string",
    "details": {}
  }
}

And success:

{ "ok": true, "data": { } }
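
The two envelopes above could be built and checked with helpers like these. This is a sketch: the strict key-set check (no extra keys, data must be an object) is an assumption, and the bare /health responses described later in this SOP carry no data key, so they would need a looser check.

```python
from typing import Any, Dict, Optional


def success(data: Dict[str, Any]) -> Dict[str, Any]:
    """Build the success envelope."""
    return {"ok": True, "data": data}


def failure(code: str, message: str,
            details: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build the error envelope; details defaults to an empty object."""
    return {"ok": False,
            "error": {"code": code, "message": message, "details": details or {}}}


def is_valid_envelope(body: Any) -> bool:
    """Check that body is exactly one of the two envelope shapes."""
    if not isinstance(body, dict) or not isinstance(body.get("ok"), bool):
        return False
    if body["ok"]:
        return set(body) == {"ok", "data"} and isinstance(body["data"], dict)
    error = body.get("error")
    return (
        set(body) == {"ok", "error"}
        and isinstance(error, dict)
        and set(error) == {"code", "message", "details"}
        and isinstance(error["code"], str)
        and isinstance(error["message"], str)
        and isinstance(error["details"], dict)
    )
```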

R5) Idempotency

  • Any ingest endpoint that may be retried must support an idempotency key (header or field) and/or accept duplicates safely.
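
A minimal in-memory sketch of the idempotency-key approach. The class name IdempotentIngest is hypothetical, and a real service would need persistent storage and key expiry, which this sketch leaves out.

```python
from typing import Any, Callable, Dict


class IdempotentIngest:
    """Remembers the response for each idempotency key (in memory only)."""

    def __init__(self, handler: Callable[[Dict[str, Any]], Dict[str, Any]]) -> None:
        self._handler = handler
        self._seen: Dict[str, Dict[str, Any]] = {}

    def ingest(self, idempotency_key: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        if idempotency_key in self._seen:
            # Retry: return the stored response without running the handler again.
            return self._seen[idempotency_key]
        response = self._handler(payload)
        self._seen[idempotency_key] = response
        return response
```

A retried request with the same key gets the original response back, so the ingest side effect happens once.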

Validation layers (what we test)

L1 — Unit validation (in code)

  • Parse + validate payload with schema.
  • Reject unknown keys where possible.
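
As an illustration of L1, here is a stdlib-only validator for the body of POST /sensors/v1/serial/start. The types (port as string, baud as integer) are assumptions, since this SOP only names the keys; in practice Zod or Pydantic would replace this hand-written check.

```python
from typing import Any, List

# Assumed types for the {port, baud} body; the SOP does not specify them.
ALLOWED_KEYS = {"port": str, "baud": int}


def validate_serial_start(body: Any) -> List[str]:
    """Return a list of validation errors; an empty list means the body is valid."""
    if not isinstance(body, dict):
        return ["body must be a JSON object"]
    errors = [f"unknown key: {key}" for key in sorted(body) if key not in ALLOWED_KEYS]
    for key, expected in ALLOWED_KEYS.items():
        value = body.get(key)
        if key not in body:
            errors.append(f"missing key: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{key}: expected {expected.__name__}")
    return errors
```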

L2 — Contract tests (local)

  • Use a small battery of curl tests and JSON schema checks.
  • Minimum requirement: a script that runs on the server and returns PASS/FAIL.
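
The PASS/FAIL script could be structured like the sketch below. The HTTP call is injected as a fetch function so the logic can be exercised without a network; the check names and the run_contract_checks function are hypothetical, and a real fetch might wrap curl or urllib.

```python
import json
from typing import Callable, List, Tuple

# (name, path) pairs; the path comes from this SOP, the name is illustrative.
CHECKS: List[Tuple[str, str]] = [
    ("vm-health status", "/api/vm-health/v1/status"),
]

# fetch(path) -> (HTTP status, Content-Type header, response body)
Fetch = Callable[[str], Tuple[int, str, str]]


def run_contract_checks(fetch: Fetch, checks=CHECKS) -> Tuple[bool, List[str]]:
    """Run every check and return (overall result, one report line per check)."""
    lines, all_passed = [], True
    for name, path in checks:
        try:
            status, content_type, body = fetch(path)
            parsed = json.loads(body)
            passed = (
                status == 200
                and content_type.startswith("application/json")
                and isinstance(parsed, dict)
                and parsed.get("ok") is True
            )
        except Exception:  # network error, invalid JSON, and so on
            passed = False
        all_passed = all_passed and passed
        lines.append(f"{'PASS' if passed else 'FAIL'} {name} {path}")
    lines.append("PASS" if all_passed else "FAIL")
    return all_passed, lines
```

The last report line is the overall verdict, so a wrapper can print the lines and exit non-zero when the first return value is False.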

L3 — Integration (edge → upstream)

  • Verify via the public domain behind Authentik (SSO protected) and via LAN IP.
  • Ensure no endpoint relies on private IPs from the browser when used remotely.
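
One way to catch private IPs leaking into browser code is to scan the built front-end files for them. The sketch below uses the standard ipaddress module; the function name private_ips_in is hypothetical, and it only looks for IPv4 literals.

```python
import ipaddress
import re
from typing import List

# Candidate dotted quads; validity is checked by ipaddress below.
_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def private_ips_in(text: str) -> List[str]:
    """Return each distinct private (or loopback) IPv4 address found in text."""
    found = []
    for candidate in _IPV4.findall(text):
        try:
            address = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # e.g. 999.1.1.1 is not a valid address
        if address.is_private and candidate not in found:
            found.append(candidate)
    return found
```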

L4 — Observability

  • Log request id + short reason on schema failures.
  • No secrets in logs.
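
A sketch of both observability rules using the standard logging module. The list of secret key names, the logger name "contract", and the 80-character reason limit are assumptions to adjust for the real payloads.

```python
import logging
from typing import Any, Dict

# Assumption: these key names mark secret values.
SECRET_KEYS = {"authorization", "cookie", "password", "token", "api_key"}

logger = logging.getLogger("contract")


def redact(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Replace values of secret-looking keys before anything is logged."""
    return {k: ("[redacted]" if k.lower() in SECRET_KEYS else v)
            for k, v in fields.items()}


def schema_failure_line(request_id: str, reason: str, max_reason: int = 80) -> str:
    """Format the log line: request id plus a truncated reason."""
    return f"schema_failure request_id={request_id} reason={reason[:max_reason]}"


def log_schema_failure(request_id: str, reason: str) -> None:
    logger.warning(schema_failure_line(request_id, reason))
```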

Canonical contracts (current)

A) Portal health API (vm-health)

Purpose: A single, SSO-protected API that Portal can call from any network.

Endpoint:

  • GET /api/vm-health/v1/status (public, behind Authentik)

Minimal success example:

{
  "ok": true,
  "data": {
    "updatedAt": 1739999999,
    "services": {
      "caddy": {"ok": true, "detail": "active"},
      "authentik": {"ok": true, "detail": "healthy"}
    }
  }
}

Minimal error example:

{
  "ok": false,
  "error": {
    "code": "not_found",
    "message": "unknown path",
    "details": {}
  }
}
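
A contract check for the success payload might look like the sketch below. The field types are inferred from the minimal example above (updatedAt as a number, each service with a boolean ok and a string detail) and are assumptions rather than a published schema; vm_health_errors is a hypothetical name.

```python
from typing import Any, List


def vm_health_errors(body: Any) -> List[str]:
    """Return shape errors for a vm-health status response; empty means OK."""
    errors: List[str] = []
    data = body.get("data") if isinstance(body, dict) else None
    if not isinstance(data, dict) or body.get("ok") is not True:
        return ["expected a success envelope with a data object"]
    updated_at = data.get("updatedAt")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(updated_at, bool) or not isinstance(updated_at, (int, float)):
        errors.append("data.updatedAt must be a number")
    services = data.get("services")
    if not isinstance(services, dict):
        return errors + ["data.services must be an object"]
    for name, entry in services.items():
        if not (isinstance(entry, dict)
                and isinstance(entry.get("ok"), bool)
                and isinstance(entry.get("detail"), str)):
            errors.append(f"data.services.{name} must have ok (bool) and detail (string)")
    return errors
```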

B) Agents-orchestrator

Purpose: internal agent workflows, proxied under Portal.

Public path (behind Authentik):

  • GET /agent/* → 127.0.0.1:8090

Contract requirement: add a simple health endpoint:

  • GET /health → { "ok": true }

C) Second-brain backend

Purpose: A2UI/AG-UI backend services.

LAN/internal:

  • GET http://10.10.9.103:8000/health

Contract requirement: expose an SSO-protected proxy path so that a remote Portal can check it, e.g.:

  • GET /api/health/second-brain → server-side check to 127.0.0.1:8000/health

D) MCP gateway

Purpose: internal tool gateway.

LAN/internal:

  • http://10.10.9.104:8787

Contract requirement:

  • GET /health → { "ok": true, "version": "..." }
  • Do NOT expose directly to public internet; if needed, proxy via Portal as /api/health/mcp.

E) Sensors web (RAK3112)

Purpose: demo UI + serial parsing.

Public path (behind Authentik):

  • GET /sensors/ (UI)
  • GET /sensors/v1/events (SSE)
  • POST /sensors/v1/serial/start {port, baud}

Sample parse input line (serial):

sensor=sgp40 fields=tvoc_ppb:12,raw:345 err=0
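
A parser for this line format could look like the sketch below. The output keys mirror the sensorSample object in the contract seed dataset; treating err=0 as "ok" and converting whole numbers to int are assumptions, and the function name parse_sensor_line is hypothetical. Malformed lines raise an exception instead of being repaired.

```python
import time
from typing import Any, Dict, Optional


def parse_sensor_line(line: str, ts: Optional[float] = None) -> Dict[str, Any]:
    """Parse 'sensor=<name> fields=<k:v,...> err=<int>' into a sample dict."""
    tokens = dict(token.split("=", 1) for token in line.strip().split())
    fields: Dict[str, Any] = {}
    for pair in filter(None, tokens.get("fields", "").split(",")):
        key, value = pair.split(":", 1)
        number = float(value)
        # Keep whole numbers as int so 12 stays 12 rather than 12.0.
        fields[key] = int(number) if number.is_integer() else number
    err = int(tokens["err"])
    return {
        "ts": time.time() if ts is None else ts,
        "sensor": tokens["sensor"],
        "ok": err == 0,  # assumption: err=0 means the reading is good
        "fields": fields,
        "err": err,
        "raw": line.strip(),
    }
```

Passing ts explicitly keeps the result reproducible in tests; in production the default uses the current time.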

Small-data “contract seed” dataset

Use this tiny dataset to validate end-to-end JSON shapes without real hardware:

{
  "sensorSample": {
    "ts": 1739999999.123,
    "sensor": "sgp40",
    "ok": true,
    "fields": {"tvoc_ppb": 12, "raw": 345},
    "err": 0,
    "raw": "sensor=sgp40 fields=tvoc_ppb:12,raw:345 err=0"
  },
  "health": {"ok": true, "data": {"status": "ok"}}
}

Data-flow (minimal) — Mermaid

flowchart LR
U[User Browser] -->|HTTPS| C["Caddy (edge)"]
C -->|forward_auth| OP[Authentik Proxy Outpost]
OP --> AK[Authentik Server]

C --> P[Portal static]
P -->|fetch JSON| VH[/"vm-health API\n/api/vm-health/v1/status"/]

VH -->|check| SB[Second-brain backend\n:8000 /health]
VH -->|check| AO[Agents-orchestrator\n:8090 /health]
VH -->|check| MCP[MCP gateway\n10.10.9.104:8787 /health]

C --> SENS[Sensors UI\n/sensors/*]
SENS -->|SSE| U

State-machine (minimal) — Mermaid

This models service health at the Portal boundary (what users experience):

stateDiagram-v2
[*] --> Unavailable

Unavailable --> SSO_Required: portal/dash/docs opened
SSO_Required --> Available: login success + forward_auth allows
SSO_Required --> Unavailable: auth unreachable

Available --> Degraded: upstream health check fails
Degraded --> Available: health restored

Available --> Unavailable: edge down (Caddy) or DNS/NAT failure
Degraded --> Unavailable: multiple upstreams down
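
The same state machine can be expressed as a transition table, which is handy for testing Portal status logic. The event names below are shorthand invented for the diagram's edge labels, and ignoring undefined events is an assumption.

```python
from typing import Dict, Tuple

# (state, event) -> next state. Event names are shorthand for the diagram labels.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Unavailable", "page_opened"): "SSO_Required",
    ("SSO_Required", "login_ok"): "Available",
    ("SSO_Required", "auth_unreachable"): "Unavailable",
    ("Available", "health_check_failed"): "Degraded",
    ("Degraded", "health_restored"): "Available",
    ("Available", "edge_down"): "Unavailable",
    ("Degraded", "multiple_upstreams_down"): "Unavailable",
}


def next_state(state: str, event: str) -> str:
    """Return the next state; events the diagram does not define leave it unchanged."""
    return TRANSITIONS.get((state, event), state)
```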

Next actions to complete the master plan

  1. Implement unified health endpoints (server-side, SSO protected):
    • /api/health/second-brain
    • /api/health/agents
    • /api/health/mcp
  2. Update Portal UI to only call /api/... endpoints (never 10.10.9.x from browser).
  3. Add contract-test script (curl + jq) that runs on office/home servers and prints PASS/FAIL.
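
For action 1, the server-side handler behind each /api/health/... path could follow the sketch below. The upstream addresses are the ones listed in this SOP; the short service names, the error codes, and the injected probe function (which returns True when the upstream answers healthy) are assumptions.

```python
from typing import Any, Callable, Dict

# Upstream health URLs as listed in this SOP; the short names are illustrative.
UPSTREAMS = {
    "second-brain": "http://127.0.0.1:8000/health",
    "agents": "http://127.0.0.1:8090/health",
    "mcp": "http://10.10.9.104:8787/health",
}


def health_response(name: str, probe: Callable[[str], bool]) -> Dict[str, Any]:
    """Check one upstream server-side and wrap the result in the R4 envelope."""
    url = UPSTREAMS.get(name)
    if url is None:
        return {"ok": False, "error": {"code": "not_found",
                                       "message": "unknown service", "details": {}}}
    try:
        healthy = probe(url)
    except Exception:  # timeouts and connection errors count as unhealthy
        healthy = False
    if healthy:
        return {"ok": True, "data": {"status": "ok"}}
    return {"ok": False, "error": {"code": "upstream_unhealthy",
                                   "message": f"{name} health check failed",
                                   "details": {}}}
```

Because the probe runs on the server, the browser only ever sees the /api/... path and never a 10.10.9.x address.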