Project: Harness Engineering (Care Flow Orchestration)

Purpose

Create a concrete, verifiable Care Flow orchestration system using harness engineering principles:

  • humans steer (intent + constraints)
  • agents execute (automation + glue)
  • repo/KB is the system of record
  • feedback loops (logs/metrics/tests) make automation reliable

Care Flow functional areas (from the earlier voice note):

  • monitoring
  • integration
  • provisioning
  • dashboard
  • alerts
  • messaging

Scope

In scope

  • Define Care Flow boundaries, actors, and events
  • Event schemas and idempotent processing
  • Monitoring + alerting pipeline (health, metrics, incidents)
  • Provisioning workflow (device onboarding)
  • Messaging workflow (status, incidents, escalations)
  • Dashboards (operational + management)
  • AI agent runbooks for common incidents

Out of scope (for initial MVP)

  • Full production HA/DR across regions
  • Full multi-tenant RBAC (unless required immediately)

Owner / Team

  • Owner: Arif
  • Team: (assign)

Status

  • Status: Proposed
  • Start date: 2026-02-13
  • Target date: (set)

Architecture

High-level flow

Devices/Gateways → ingestion → event store → rules/alerts → dashboards + messaging

AI orchestration sits alongside:

  • generates/updates execution plans
  • proposes actions
  • runs verified scripts/playbooks

Key artifacts

  • Event schema(s)
  • Runbooks (SOPs)
  • “Check health” scripts
  • “Provision device” scripts

Prereqs

  • Knowledge hub established (Docusaurus KB)
  • Auth + dashboard workflow stable (see harness-engineering-dad)

Inputs

Deliverables

  • Care Flow system map (components + ownership)
  • Event schema (JSON) + examples
  • MVP monitoring + alert rules + notification routing
  • MVP provisioning workflow + verification steps
  • Incident runbooks (top 5 incidents)
  • Dashboard pages/tiles for Care Flow
  • A single script, “careflow-check.sh”, that validates the pipeline end-to-end

Procedure (Step-by-step)

Step 1 — Define boundaries + event taxonomy

  • Actors: devices, gateways, backends
  • Event types: heartbeat, status, error, provision, alert

Step 2 — Define data contracts

  • JSON schema + versioning strategy
  • Idempotency keys
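
As a starting point for the data contract discussion, here is a minimal sketch of an event envelope check and an idempotency key. The event types come from Step 1; everything else (the field names `schema_version`, `device_id`, `occurred_at`, `payload`, the single supported version, and hashing a canonical JSON form) is an assumption for illustration, not a decided contract.

```python
import hashlib
import json

# Event types listed in Step 1.
EVENT_TYPES = {"heartbeat", "status", "error", "provision", "alert"}

# Assumption: one integer schema version, bumped on breaking changes.
SUPPORTED_VERSIONS = {1}

# Assumption: hypothetical envelope fields and their expected Python types.
REQUIRED_FIELDS = {
    "schema_version": int,
    "event_type": str,
    "device_id": str,
    "occurred_at": str,  # ISO 8601 timestamp set by the device or gateway
    "payload": dict,
}


def validate_event(event: dict) -> list[str]:
    """Return a list of contract violations; an empty list means valid."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in event:
            problems.append(f"missing field: {name}")
        elif not isinstance(event[name], expected_type):
            problems.append(f"wrong type for field: {name}")
    if problems:
        return problems
    if event["schema_version"] not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported schema_version: {event['schema_version']}")
    if event["event_type"] not in EVENT_TYPES:
        problems.append(f"unknown event_type: {event['event_type']}")
    return problems


def idempotency_key(event: dict) -> str:
    """Derive a stable key so a retransmitted event maps to the same key."""
    identity = {
        name: event[name]
        for name in ("event_type", "device_id", "occurred_at", "payload")
    }
    # Sorted keys and fixed separators make the JSON form canonical.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Deriving the key from content means a gateway that resends the same event after a timeout produces the same key. An alternative worth weighing is a sender-assigned UUID carried in the envelope.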

Step 3 — Build MVP ingestion + persistence

  • Minimal reliable storage
  • Replay capability
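
One possible shape for the MVP store is an append-only table with a unique idempotency key, sketched below with SQLite from the Python standard library. The table layout, the function names, and the choice of SQLite are all assumptions for illustration; the key is assumed to be supplied by the caller.

```python
import json
import sqlite3
from typing import Iterator


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical append-only event table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " idempotency_key TEXT NOT NULL UNIQUE,"
        " body TEXT NOT NULL)"
    )
    return conn


def append_event(conn: sqlite3.Connection, key: str, event: dict) -> bool:
    """Store the event once; return False when the key was already seen."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO events (idempotency_key, body) VALUES (?, ?)",
            (key, json.dumps(event)),
        )
    return cursor.rowcount == 1


def replay(conn: sqlite3.Connection, after_seq: int = 0) -> Iterator[tuple[int, dict]]:
    """Yield stored events in insertion order, starting after a sequence number."""
    rows = conn.execute(
        "SELECT seq, body FROM events WHERE seq > ? ORDER BY seq", (after_seq,)
    )
    for seq, body in rows:
        yield seq, json.loads(body)
```

A consumer (rules engine, dashboard) can remember the last `seq` it processed and call `replay(conn, after_seq=last_seq)` to catch up or to reprocess history after a rule change.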

Step 4 — Alerting + messaging

  • Rules
  • Escalation
  • Discord/WhatsApp/email integration as needed
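
To make "rules" and "escalation" concrete, here is a minimal sketch of one rule (missed heartbeat) and a time-based escalation ladder. The 5-minute timeout, the delays, and the channel order are assumptions chosen for illustration; only the channel names come from this plan, and no real messaging API is called.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a device is considered stale after 5 minutes without a heartbeat.
HEARTBEAT_TIMEOUT = timedelta(minutes=5)

# Assumption: escalation ladder as (time since alert opened, channel).
ESCALATION_STEPS = [
    (timedelta(0), "discord"),
    (timedelta(minutes=15), "whatsapp"),
    (timedelta(minutes=60), "email"),
]


def stale_devices(last_seen: dict[str, datetime], now: datetime) -> list[str]:
    """Return device IDs whose last heartbeat is older than the timeout."""
    return sorted(
        device_id
        for device_id, seen_at in last_seen.items()
        if now - seen_at > HEARTBEAT_TIMEOUT
    )


def channels_due(alert_age: timedelta) -> list[str]:
    """Return every channel that should have been notified by this alert age."""
    return [channel for delay, channel in ESCALATION_STEPS if alert_age >= delay]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    seen = {"gw-001": now - timedelta(minutes=1), "gw-002": now - timedelta(minutes=10)}
    print(stale_devices(seen, now))
    print(channels_due(timedelta(minutes=20)))
```

Keeping the ladder as data rather than code lets a human adjust thresholds and routing in the repo without touching rule logic.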

Step 5 — Dashboards

  • Operational dashboard
  • Management summary

Step 6 — AI runbooks + automation

  • Agent reads logs/metrics
  • Proposes action
  • Executes scripts with verification
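
The propose, execute, verify loop above can be sketched as a small gate. The `ProposedAction` type, the `run_action` function, and the explicit human approval flag are hypothetical; they illustrate one way to keep humans steering while agents execute, with a switch matching the manual-only mode listed under Rollback.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    """A hypothetical runbook step proposed by the agent."""
    name: str
    execute: Callable[[], None]   # runs the script or playbook
    verify: Callable[[], bool]    # re-checks health after execution


def run_action(action: ProposedAction, automation_enabled: bool, approved: bool) -> str:
    """Run an action only when automation is on and a human has approved it."""
    if not automation_enabled:
        return "skipped: manual-only mode"
    if not approved:
        return "pending: awaiting human approval"
    action.execute()
    if action.verify():
        return "verified"
    return "failed verification: escalate to a human"


if __name__ == "__main__":
    state = {"restarted": False}
    restart = ProposedAction(
        name="restart-ingestion",
        execute=lambda: state.update(restarted=True),
        verify=lambda: state["restarted"],
    )
    print(run_action(restart, automation_enabled=True, approved=True))
```

Separating `execute` from `verify` means an action never counts as done just because the script exited cleanly; the verification result is what gets logged.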

Config

  • Define environment variables and secrets policy (no secrets in KB)
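
A minimal sketch of how scripts might read configuration from the environment and fail fast when something is missing. The variable names `CAREFLOW_INGEST_URL` and `CAREFLOW_API_TOKEN` are hypothetical placeholders; the point is that values come from the environment and secrets are redacted before anything is logged or pasted into the KB.

```python
import os
from typing import Mapping

# Assumption: hypothetical variable names, to be replaced by the real list.
REQUIRED_VARS = ("CAREFLOW_INGEST_URL", "CAREFLOW_API_TOKEN")
SECRET_VARS = {"CAREFLOW_API_TOKEN"}


def load_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read required settings from the environment; fail fast if any is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def redacted(config: Mapping[str, str]) -> dict[str, str]:
    """Return a copy that is safe to print in logs or attach to a KB page."""
    return {
        name: ("***" if name in SECRET_VARS else value)
        for name, value in config.items()
    }
```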

Verification

  • Synthetic test events appear in dashboards
  • Alerts fire and route correctly
  • Provisioning flow completes end-to-end
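
The three checks above could be driven by a small runner like the sketch below, which reports PASS or FAIL per check and an overall result. It is written in Python only to show the logic; the check names and the placeholder lambdas are hypothetical, and real checks would query the actual dashboard, alert router, and provisioning flow.

```python
from typing import Callable


def run_checks(checks: dict[str, Callable[[], bool]]) -> tuple[bool, list[str]]:
    """Run each named check; return overall success and one report line per check."""
    report = []
    all_ok = True
    for name, check in checks.items():
        try:
            ok = bool(check())
            label = name
        except Exception as exc:  # a crashing check counts as a failure
            ok = False
            label = f"{name} ({exc})"
        all_ok = all_ok and ok
        report.append(f"{'PASS' if ok else 'FAIL'} {label}")
    return all_ok, report


if __name__ == "__main__":
    # Placeholder checks; each lambda stands in for a real probe.
    ok, lines = run_checks({
        "synthetic event visible in dashboard": lambda: True,
        "alert fired and routed": lambda: True,
        "provisioning completed": lambda: True,
    })
    print("\n".join(lines))
    raise SystemExit(0 if ok else 1)
```

A non-zero exit status on any failure lets the same check run from CI or from an agent runbook without parsing output.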

Rollback

  • Disable automated actions (manual-only mode)
  • Disable individual alert rules that prove noisy

Troubleshooting

  • Missing events → check ingestion + auth
  • Duplicate events → check idempotency

References

Changelog

  • 2026-02-13: Created project plan page.