API Contract Validation (Master Plan)
This SOP defines how we validate API contracts across the home/office stack (Portal/Dash/Auth/Agents/MCP/Device services) so we can move services between environments (home → office) with minimal surprises.
Scope (current stack)
- Web-edge (VM103 / office web-edge): Caddy, Authentik, Portal, Docs, Dash FE, agents-orchestrator, second-brain backend, vm-health, sensors web.
- Core-services (VM102 / office core): MCP gateway.
- LoRaWAN (VM101): ChirpStack stack + UDP/1700.
Contract rules (non-negotiable)
R1) Version every API
- Every JSON API path must include a version segment: /v1/...
- Breaking changes require /v2/...
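A minimal sketch of how R1 could be checked mechanically. The helper name `api_version` and the regular expression are illustrative assumptions, not part of any existing service:

```python
import re
from typing import Optional

# Hypothetical pattern: a path segment such as /v1/ or /v12 at the end of the path.
_VERSION_SEGMENT = re.compile(r"/v(\d+)(?:/|$)")


def api_version(path: str) -> Optional[int]:
    """Return the major version found in the path, or None if the path is unversioned."""
    match = _VERSION_SEGMENT.search(path)
    return int(match.group(1)) if match else None


print(api_version("/api/vm-health/v1/status"))  # 1
print(api_version("/api/health/mcp"))           # None
```

A check like this can run over the route table in CI so unversioned JSON paths are caught before deployment.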
R2) Machine-validated schemas
For each endpoint we maintain:
- Request schema (JSON)
- Response schema (JSON)
- Example payloads (minimal + typical)
Recommended schema options:
- Zod (Node) or Pydantic (Python)
- OpenAPI generated from code when possible
Known rule (Agents SDK / Zod): Optional fields that may be absent must be modeled as nullable when required by structured outputs tooling (we previously fixed a crash by changing .optional() → .nullable()).
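The required-but-nullable distinction can be illustrated without Zod. The sketch below is a plain-Python stand-in, not the Agents SDK or Zod API; the schema format, the field names `summary` and `nextStep`, and the `validate` helper are all assumptions made for illustration:

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema format: field name -> (expected type, nullable?).
# Every key must be present; "nullable" fields may carry None instead of a value.
AGENT_RESULT_SCHEMA: Dict[str, Tuple[type, bool]] = {
    "summary": (str, False),
    "nextStep": (str, True),  # may be null, but the key itself must exist
}


def validate(payload: Dict[str, Any], schema: Dict[str, Tuple[type, bool]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for name, (expected, nullable) in schema.items():
        if name not in payload:
            problems.append(f"missing key: {name}")
        elif payload[name] is None:
            if not nullable:
                problems.append(f"null not allowed: {name}")
        elif not isinstance(payload[name], expected):
            problems.append(f"wrong type: {name}")
    return problems


print(validate({"summary": "done", "nextStep": None}, AGENT_RESULT_SCHEMA))  # []
print(validate({"summary": "done"}, AGENT_RESULT_SCHEMA))  # ['missing key: nextStep']
```

The second call shows the case the rule guards against: an absent key is a schema failure, while an explicit null is accepted.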
R3) Strict JSON shape at edges
At the edge boundary (Caddy → upstream), responses must be:
- Content-Type: application/json for JSON endpoints
- deterministic keys
- stable error envelope
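One way to approach the first two points, assuming "deterministic keys" means that equal payloads always serialize to identical bytes. Both helper names are hypothetical:

```python
import json


def serialize(body: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so equal payloads yield equal bytes."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def is_json_content_type(header_value: str) -> bool:
    """Accept 'application/json' with optional parameters such as '; charset=utf-8'."""
    return header_value.split(";")[0].strip().lower() == "application/json"


print(serialize({"ok": True, "data": {}}))  # b'{"data":{},"ok":true}'
print(is_json_content_type("application/json; charset=utf-8"))  # True
```

Sorted keys also make response diffs and snapshot-style contract tests stable across runs.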
R4) Error envelope
All JSON APIs should return:
{
"ok": false,
"error": {
"code": "string",
"message": "string",
"details": {}
}
}
And success:
{ "ok": true, "data": { } }
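A small sketch of envelope builders and a shape check for R4. The function names are assumptions; the check treats `data` as required on success, which is stricter than the bare `{ "ok": true }` health bodies listed under the canonical contracts, so those would need a looser rule or a `data` field:

```python
from typing import Any, Optional


def ok(data: dict) -> dict:
    """Build a success envelope."""
    return {"ok": True, "data": data}


def err(code: str, message: str, details: Optional[dict] = None) -> dict:
    """Build an error envelope; details defaults to an empty object."""
    return {"ok": False, "error": {"code": code, "message": message, "details": details or {}}}


def is_valid_envelope(body: Any) -> bool:
    """Check that a decoded JSON body matches one of the two envelope shapes."""
    if not isinstance(body, dict) or not isinstance(body.get("ok"), bool):
        return False
    if body["ok"]:
        return isinstance(body.get("data"), dict)
    error = body.get("error")
    return (
        isinstance(error, dict)
        and isinstance(error.get("code"), str)
        and isinstance(error.get("message"), str)
        and isinstance(error.get("details"), dict)
    )


print(is_valid_envelope(err("not_found", "unknown path")))  # True
print(is_valid_envelope({"ok": "yes"}))                     # False
```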
R5) Idempotency
- Any ingest endpoint that may be retried must support an idempotency key (header or field) and/or accept duplicates safely.
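A minimal in-memory sketch of the idempotency-key behaviour. The class name, the `accepted` field, and the storage model are assumptions; a real service would persist keys with an expiry rather than hold them in a dict:

```python
class IdempotentIngest:
    """Replays the first response for a repeated idempotency key instead of storing twice."""

    def __init__(self) -> None:
        self._responses = {}  # idempotency key -> response returned the first time
        self.stored = []      # events that were actually ingested

    def ingest(self, key: str, event: dict) -> dict:
        if key in self._responses:
            # Retry of a request we already handled: no side effects, same answer.
            return self._responses[key]
        self.stored.append(event)
        response = {"ok": True, "data": {"accepted": True}}
        self._responses[key] = response
        return response


service = IdempotentIngest()
first = service.ingest("req-1", {"sensor": "sgp40"})
retry = service.ingest("req-1", {"sensor": "sgp40"})
print(first == retry, len(service.stored))  # True 1
```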
Validation layers (what we test)
L1 — Unit validation (in code)
- Parse + validate payload with schema.
- Reject unknown keys where possible.
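Rejecting unknown keys can be as small as a set difference. The sketch below uses the `{port, baud}` request body of the sensors serial endpoint as its example; the helper name and the sample port value are hypothetical:

```python
from typing import Iterable


def reject_unknown_keys(payload: dict, allowed: Iterable[str]) -> None:
    """Raise ValueError naming every key that the schema does not define."""
    unknown = sorted(set(payload) - set(allowed))
    if unknown:
        raise ValueError("unknown keys: " + ", ".join(unknown))


SERIAL_START_KEYS = {"port", "baud"}

reject_unknown_keys({"port": "/dev/ttyUSB0", "baud": 115200}, SERIAL_START_KEYS)  # passes

try:
    reject_unknown_keys({"port": "/dev/ttyUSB0", "baud": 115200, "debug": True}, SERIAL_START_KEYS)
except ValueError as exc:
    print(exc)  # unknown keys: debug
```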
L2 — Contract tests (local)
- Use a small battery of curl tests and JSON schema checks.
- Minimum requirement: a script that runs on the server and returns PASS/FAIL.
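The PASS/FAIL requirement could also be met with a short Python runner instead of curl. This is a sketch under assumptions: the function names are hypothetical, only the `ok` flag and the Content-Type header are checked, and the fetcher is injectable so the logic can be exercised without a live server:

```python
import json
import urllib.request
from typing import Callable, Dict, Tuple


def fetch_json(url: str, timeout: float = 5.0) -> Tuple[str, object]:
    """Return (Content-Type header, decoded JSON body); raises on HTTP or parse errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.headers.get("Content-Type", ""), json.loads(resp.read())


def check(name: str, url: str, fetch: Callable = fetch_json) -> Tuple[bool, str]:
    """Run one contract check and return (passed, printable result line)."""
    try:
        content_type, body = fetch(url)
    except Exception as exc:  # network error, HTTP error status, or invalid JSON
        return False, f"FAIL {name}: {exc}"
    if not content_type.startswith("application/json"):
        return False, f"FAIL {name}: unexpected content type {content_type!r}"
    if not isinstance(body, dict) or body.get("ok") is not True:
        return False, f"FAIL {name}: ok flag is not true"
    return True, f"PASS {name}"


def run(targets: Dict[str, str], fetch: Callable = fetch_json) -> int:
    """Print one line per target and return a process exit code (0 = all passed)."""
    results = [check(name, url, fetch) for name, url in targets.items()]
    for _, line in results:
        print(line)
    return 0 if all(passed for passed, _ in results) else 1

# Example use on the server: sys.exit(run({"second-brain": "http://127.0.0.1:8000/health"}))
```

Returning a non-zero exit code on any failure lets the same script gate a deploy step or a cron alert.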
L3 — Integration (edge → upstream)
- Verify via the public domain behind Authentik (SSO protected) and via LAN IP.
- Ensure no endpoint relies on private IPs from the browser when used remotely.
L4 — Observability
- Log request id + short reason on schema failures.
- No secrets in logs.
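A sketch of both observability rules using the standard `logging` module. The logger name, the list of secret-bearing keys, and the log line format are assumptions to adapt per service:

```python
import logging
import uuid
from typing import Optional

logger = logging.getLogger("contract")

# Hypothetical list of keys whose values must never reach the logs.
SECRET_KEYS = {"authorization", "cookie", "password", "token"}


def redact(fields: dict) -> dict:
    """Replace values of secret-bearing keys (case-insensitive match) with a placeholder."""
    return {k: ("[redacted]" if k.lower() in SECRET_KEYS else v) for k, v in fields.items()}


def log_schema_failure(reason: str, request_id: Optional[str] = None,
                       context: Optional[dict] = None) -> str:
    """Log a short reason with a request id; generate an id if the caller has none."""
    request_id = request_id or uuid.uuid4().hex
    logger.warning("schema_failure request_id=%s reason=%s context=%s",
                   request_id, reason, redact(context or {}))
    return request_id
```

Returning the request id lets the handler echo it in the error envelope's `details`, so a user report can be matched to the log line.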
Canonical contracts (current)
A) Portal health API (vm-health)
Purpose: A single, SSO-protected API that Portal can call from any network.
Endpoint:
GET /api/vm-health/v1/status (public, behind Authentik)
Minimal success example:
{
"ok": true,
"data": {
"updatedAt": 1739999999,
"services": {
"caddy": {"ok": true, "detail": "active"},
"authentik": {"ok": true, "detail": "healthy"}
}
}
}
Minimal error example:
{
"ok": false,
"error": {
"code": "not_found",
"message": "unknown path",
"details": {}
}
}
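The success example above implies a shape that can be checked field by field. The sketch below infers the rules from that single example (a numeric `updatedAt`, and per-service entries with a boolean `ok` and a string `detail`); the function name and the exact rules are assumptions:

```python
from typing import Any, List


def status_problems(body: Any) -> List[str]:
    """Return contract violations for a status response; an empty list means it conforms."""
    if not isinstance(body, dict) or body.get("ok") is not True:
        return ["not a success envelope"]
    data = body.get("data")
    if not isinstance(data, dict):
        return ["data must be an object"]
    problems = []
    updated = data.get("updatedAt")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(updated, bool) or not isinstance(updated, (int, float)):
        problems.append("updatedAt must be a number")
    services = data.get("services")
    if not isinstance(services, dict):
        return problems + ["services must be an object"]
    for name, entry in services.items():
        if not (isinstance(entry, dict)
                and isinstance(entry.get("ok"), bool)
                and isinstance(entry.get("detail"), str)):
            problems.append(f"bad service entry: {name}")
    return problems
```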
B) Agents-orchestrator
Purpose: internal agent workflows, proxied under Portal.
Public path (behind Authentik):
GET /agent/* → 127.0.0.1:8090
Contract requirement: add a simple health endpoint:
GET /health → { "ok": true }
C) Second-brain backend
Purpose: A2UI/AG-UI backend services.
LAN/internal:
GET http://10.10.9.103:8000/health
Contract requirement: expose an SSO-protected proxy path (so remote Portal can check it) e.g.
GET /api/health/second-brain → server-side check to 127.0.0.1:8000/health
D) MCP gateway
Purpose: internal tool gateway.
LAN/internal:
http://10.10.9.104:8787
Contract requirement:
- GET /health → { "ok": true, "version": "..." }
- Do NOT expose directly to public internet; if needed, proxy via Portal as /api/health/mcp.
E) Sensors web (RAK3112)
Purpose: demo UI + serial parsing.
Public path (behind Authentik):
- GET /sensors/ (UI)
- GET /sensors/v1/events (SSE)
- POST /sensors/v1/serial/start {port, baud}
Sample parse input line (serial):
sensor=sgp40 fields=tvoc_ppb:12,raw:345 err=0
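A sketch of a parser for this line format that produces the same shape as the `sensorSample` entry in the seed dataset. Deriving `ok` from `err == 0` and the int/float handling of field values are assumptions; the function name is hypothetical:

```python
import time
from typing import Optional


def parse_line(line: str, ts: Optional[float] = None) -> dict:
    """Parse 'sensor=<name> fields=<k:v,...> err=<int>' into a sample dict.

    Raises ValueError or KeyError on malformed input so the caller can log and skip.
    """
    tokens = dict(token.split("=", 1) for token in line.split())
    fields = {}
    for pair in tokens.get("fields", "").split(","):
        if pair:
            name, value = pair.split(":", 1)
            fields[name] = float(value) if "." in value else int(value)
    err = int(tokens["err"])
    return {
        "ts": time.time() if ts is None else ts,
        "sensor": tokens["sensor"],
        "ok": err == 0,  # assumption: a zero error code means a good reading
        "fields": fields,
        "err": err,
        "raw": line,
    }


sample = parse_line("sensor=sgp40 fields=tvoc_ppb:12,raw:345 err=0", ts=1739999999.123)
print(sample["fields"])  # {'tvoc_ppb': 12, 'raw': 345}
```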
Small-data “contract seed” dataset
Use this tiny dataset to validate end-to-end JSON shapes without real hardware:
{
"sensorSample": {
"ts": 1739999999.123,
"sensor": "sgp40",
"ok": true,
"fields": {"tvoc_ppb": 12, "raw": 345},
"err": 0,
"raw": "sensor=sgp40 fields=tvoc_ppb:12,raw:345 err=0"
},
"health": {"ok": true, "data": {"status": "ok"}}
}
Data-flow (minimal) — Mermaid
flowchart LR
U[User Browser] -->|HTTPS| C["Caddy (edge)"]
C -->|forward_auth| OP[Authentik Proxy Outpost]
OP --> AK[Authentik Server]
C --> P[Portal static]
P -->|fetch JSON| VH["vm-health API\n/api/vm-health/v1/status"]
VH -->|check| SB[Second-brain backend\n:8000 /health]
VH -->|check| AO[Agents-orchestrator\n:8090 /health]
VH -->|check| MCP[MCP gateway\n10.10.9.104:8787 /health]
C --> SENS[Sensors UI\n/sensors/*]
SENS -->|SSE| U
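The fan-out from vm-health to its upstreams in the flow above can be sketched as a single function that builds the status payload. The upstream URLs follow the ports shown in the diagram, but the localhost addresses, the `detail` strings, and the function names are assumptions; the probe is injected so the aggregation logic stays testable:

```python
import time
from typing import Callable, Optional

# Assumed upstream health URLs, taken from the ports in the diagram.
UPSTREAMS = {
    "second-brain": "http://127.0.0.1:8000/health",
    "agents-orchestrator": "http://127.0.0.1:8090/health",
    "mcp": "http://10.10.9.104:8787/health",
}


def build_status(probe: Callable[[str], bool], now: Optional[float] = None) -> dict:
    """Probe every upstream and return a success envelope with per-service results."""
    services = {}
    for name, url in UPSTREAMS.items():
        try:
            healthy = bool(probe(url))
        except Exception:
            healthy = False  # a probe that raises counts as an unhealthy upstream
        services[name] = {"ok": healthy, "detail": "healthy" if healthy else "unreachable"}
    updated_at = int(time.time() if now is None else now)
    return {"ok": True, "data": {"updatedAt": updated_at, "services": services}}
```

The envelope stays `ok: true` even when an upstream is down, because the status call itself succeeded; the per-service flags carry the degradation.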
State-machine (minimal) — Mermaid
This models service health at the Portal boundary (what users experience):
stateDiagram-v2
[*] --> Unavailable
Unavailable --> SSO_Required: portal/dash/docs opened
SSO_Required --> Available: login success + forward_auth allows
SSO_Required --> Unavailable: auth unreachable
Available --> Degraded: upstream health check fails
Degraded --> Available: health restored
Available --> Unavailable: edge down (Caddy) or DNS/NAT failure
Degraded --> Unavailable: multiple upstreams down
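The same transitions can be written as a lookup table, which is convenient for tests of the Portal status indicator. The state names match the diagram; the event names are hypothetical labels for the transition descriptions:

```python
# (current state, event) -> next state, mirroring the diagram above.
TRANSITIONS = {
    ("Unavailable", "page_opened"): "SSO_Required",
    ("SSO_Required", "login_ok"): "Available",
    ("SSO_Required", "auth_unreachable"): "Unavailable",
    ("Available", "health_check_failed"): "Degraded",
    ("Degraded", "health_restored"): "Available",
    ("Available", "edge_down"): "Unavailable",
    ("Degraded", "multiple_upstreams_down"): "Unavailable",
}


def step(state: str, event: str) -> str:
    """Return the next state; events with no defined transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = "Unavailable"
for event in ["page_opened", "login_ok", "health_check_failed", "health_restored"]:
    state = step(state, event)
print(state)  # Available
```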
Next actions to complete the master plan
- Implement unified health endpoints (server-side, SSO protected): /api/health/second-brain, /api/health/agents, /api/health/mcp
- Update Portal UI to only call /api/... endpoints (never 10.10.9.x from browser).
- Add contract-test script (curl + jq) that runs on office/home servers and prints PASS/FAIL.