Project: VM Rebuild (Local-only) — Auth + Docs/KB + Dashboard + Mattermost
Purpose
Recreate the service VMs for a local-only stack and standardize operations using harness engineering:
- clear VM/service boundaries
- predictable folder/layout conventions
- verification-first runbooks
- mechanical checks (healthcheck scripts)
- small, indexed KB docs as the system of record
Target services:
- Authentik (SSO)
- Docs/KB (Docusaurus)
- Dashboard (Homepage)
- Mattermost (team chat / Discord-like)
- GitLab (source control + issues/CI, local-only)
Edge/proxy notes:
- Existing Nginx Proxy Manager + Cloudflared VM at 10.10.8.100 will remain the edge.
- Arif will manage Cloudflared + NPM configuration; this project focuses on LAN-side VMs and clean, verifiable deployments.
Scope
In scope
- VM plan (IPs, sizing, OS, storage)
- Docker-compose based deployments
- Local-only access policy (LAN, no public exposure)
- Internal DNS naming (optional)
- Backups + restore drills
- SOP/runbook creation in Docusaurus KB
Out of scope
- Public domains/Cloudflare/NAT
- External SSO providers
Owner / Team
- Owner: Arif
- Main orchestrator: Auro (main agent)
- Sub-agents: one per service
Status
- Status: Proposed
- Start date: 2026-02-13
Architecture
Local-only traffic model
LAN client → (optional internal reverse proxy) → services
Topology (confirmed by Arif)
One VM per service, local-only.
- Existing edge VM: Nginx Proxy Manager + Cloudflared at 10.10.8.100 (managed by Arif)
  - NPM host-based routing (Arif will configure)
  - Cloudflared tunnel config (Arif will configure)
- VM 1: Authentik
  - Authentik + Postgres + Redis
  - LAN-only; firewall allows inbound 9000/tcp only from the NPM VM (10.10.8.100) and admin subnet(s)
- VM 2: Docs/KB
  - Docusaurus build + static serve
  - Bind the service to the VM IP (not 127.0.0.1) so NPM can reach it
- VM 3: Dashboard
  - Homepage
- VM 4: Mattermost
  - Mattermost + Postgres (+ optional MinIO)
- VM 5: GitLab
  - GitLab CE (+ a runner later, if needed)
Note: Because NPM runs in Docker on a different VM, upstream services must not bind to 127.0.0.1 only; they must listen on the VM’s LAN IP or 0.0.0.0, and the VM firewall should restrict who can reach the ports.
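As a sketch, the binding rule above looks like this in a compose file (the service name, image, and port are illustrative stand-ins, not the actual stack):

```yaml
# Hypothetical fragment of /opt/docs/docker-compose.yml — values are illustrative.
services:
  docs:
    image: nginx:alpine           # stand-in for the Docusaurus static server
    ports:
      - "0.0.0.0:3000:80"         # reachable by NPM at <vm-ip>:3000
      # NOT "127.0.0.1:3000:80"   # loopback-only; NPM on 10.10.8.100 could not connect
```

With the port published on all interfaces, the VM firewall (not the bind address) is what keeps access restricted to the NPM VM and admin subnet.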
Prereqs
- Proxmox host reachable on LAN
- Storage configured for VM disks + backups
Inputs
- LAN subnet(s): 10.10.8.0/24 (confirmed)
- Local DNS: recommended (AdGuard/Pi-hole/router DNS overrides)
- SSO requirement: Yes (Authentik in front of Docs + Mattermost + GitLab)
Deliverables
- VM IP plan + sizing
- Compose stacks for each service
- SOP per service:
  - install
  - configure
  - verification
  - rollback
  - troubleshooting
- Cross-service SOP:
  - "Stack Map" (single source of truth)
  - backup + restore drill
- Healthcheck scripts per VM
Procedure (Step-by-step)
Phase 0 — Design + inventory
- Confirm required services + local-only constraints
- Choose VM IPs and hostnames
- Choose where persistent data lives
Phase 0.5 — Pilot VM (verification-first) ✅
Before building all service VMs, create one test VM to validate the whole LAN posture:
- Proxmox VM + Debian 12 baseline works
- Static IP works on 10.10.8.0/24
- Docker works
- Firewall allowlist works
- NPM (10.10.8.100) can reach the VM upstream (important: no 127.0.0.1 binding issue)
Pilot service: run a tiny HTTP container on the VM (e.g., nginx:alpine) bound to 0.0.0.0:8080 and allow inbound 8080/tcp only from 10.10.8.100.
Success criteria:
- From NPM VM: curl -I http://<pilot-vm-ip>:8080 returns 200
- From other LAN device: blocked (optional, if you enable a firewall deny-all)
- NPM proxy host to this upstream works end-to-end
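The pilot setup above can be sketched as two helper functions — one to run on the pilot VM, one to run from the NPM VM. A sketch, assuming docker and ufw are already installed; container and IP values follow the plan:

```shell
# Pilot VM sketch — defines the steps; call run_pilot on the pilot VM itself.
run_pilot() {
  # tiny HTTP container bound to all interfaces, per the pilot plan
  docker run -d --name pilot -p 0.0.0.0:8080:80 nginx:alpine
  # allow inbound 8080/tcp only from the NPM VM
  sudo ufw allow from 10.10.8.100 to any port 8080 proto tcp
}

verify_pilot() {
  # run from the NPM VM; success criterion is an HTTP 200 response line
  curl -sI "http://${1:?pilot vm ip}:8080" | head -n1
}
```

Usage: `run_pilot` on the pilot VM, then `verify_pilot <pilot-vm-ip>` from 10.10.8.100 and expect a 200 status line.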
Phase 1 — Base VM standard
For every VM:
- Debian 12
- static IP
- docker + docker compose
- firewall allowlist (NPM VM + admin subnet)
- folder standard:
  - /opt/<service>/ for compose + persistent data
  - /opt/<service>/scripts/ for check/update scripts
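A minimal sketch of the per-VM baseline as a function that prints the folder and firewall commands for review before running them as root (the service name and port are parameters; the admin-subnet SSH rule is an assumption):

```shell
# Print the Phase 1 baseline commands for one service VM (review, then run as root).
baseline_plan() {
  local svc="$1" port="$2"
  local npm_vm="10.10.8.100" admin_net="10.10.8.0/24"
  echo "mkdir -p /opt/${svc}/scripts"
  echo "ufw default deny incoming"
  echo "ufw allow from ${npm_vm} to any port ${port} proto tcp"
  echo "ufw allow from ${admin_net} to any port 22 proto tcp"
  echo "ufw --force enable"
}
baseline_plan authentik 9000
```

Printing first keeps the allowlist auditable per VM; the same function covers every service by swapping the name and port.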
Phase 2 — Deploy services (one by one)
Order (recommended):
- Authentik
- Docs/KB
- Mattermost
- GitLab
- Homepage
Phase 3 — Harness engineering hardening
- Add check-*.sh scripts per VM
- Add KB "Stack Map" page (ports, IPs, domains, upstreams)
- Add backup/restore drills
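One way to keep the per-VM check-*.sh scripts uniform is a small shared helper; a sketch (the function name and file layout are assumptions, not an existing convention):

```shell
# /opt/<service>/scripts/check-common.sh — shared helper sketch.
# Each check-<service>.sh can then do:
#   classify_http "$(curl -s -o /dev/null -w '%{http_code}' "$URL")" "$URL"
classify_http() {
  local code="$1" url="$2"
  case "$code" in
    2??|3??) echo "OK   ${url} (${code})"; return 0 ;;
    *)       echo "FAIL ${url} (${code})"; return 1 ;;
  esac
}
```

The exit code makes the scripts usable both interactively and from cron/monitoring.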
Edge VM (10.10.8.100) — useful audit commands (NPM + Cloudflared)
Run these on the NPM VM to understand what’s already listening, what containers exist, and what firewall rules are active.
Network / ports
ip -br a
ip r
# Listening TCP/UDP sockets (what ports are open)
sudo ss -lntup
# Optional (if installed): map ports to processes
sudo lsof -nP -iTCP -sTCP:LISTEN
Docker (NPM / Cloudflared / other services)
docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}'
docker compose ls
# If you know the compose directory for NPM:
# cd /path/to/npm && docker compose ps && docker compose logs -n 200
Logs (common)
# last 200 lines per container
docker logs --tail=200 <container_name>
# follow logs
docker logs -f <container_name>
Firewall rules
If you use UFW:
sudo ufw status verbose
sudo ufw status numbered
If you use nftables:
sudo nft list ruleset | less
If you use iptables legacy:
sudo iptables -S
sudo iptables -L -n -v
System services
systemctl --failed
systemctl status docker --no-pager
sudo journalctl -u docker --no-pager -n 200
Quick connectivity checks (from NPM VM)
# verify NPM VM can reach an upstream
curl -I http://10.10.8.126:8080
# verify DNS resolution if you use local DNS
getent hosts auth.aurbotstem.lan || true
Verification
- All services reachable from LAN only (or only via NPM as intended)
- Auth works for docs + mattermost + gitlab
- Backups succeed and restore test passes
Rollback
- Each service can be rolled back with docker compose down (re-deploying the previous compose file as needed)
- Restore from Proxmox backup snapshots if needed
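On the Proxmox host, the backup + restore drill can be sketched with vzdump/qmrestore. The VMIDs, storage names, and dump path are illustrative; the commands are printed for review rather than executed:

```shell
# Print the Proxmox backup + restore-drill commands (VMID 101 and storages are examples).
restore_drill_plan() {
  local vmid="$1" drill_vmid="$2"
  echo "vzdump ${vmid} --mode snapshot --compress zstd --storage local"
  echo "# restore into a throwaway VMID to prove the backup is actually usable:"
  echo "qmrestore /var/lib/vz/dump/vzdump-qemu-${vmid}-*.vma.zst ${drill_vmid} --storage local-lvm"
}
restore_drill_plan 101 9101
```

Restoring into a throwaway VMID is what turns a backup job into a restore drill: the drill passes only when the restored VM boots and the service answers.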
Troubleshooting
- Add a standard troubleshooting flow:
  - DNS → IP reachability → ports → container status → logs
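The flow above can be sketched as a function that prints the triage commands in order; the hostname, IP, port, and container name below are placeholders to substitute per incident:

```shell
# Print the standard triage commands in order: DNS → IP → port → container → logs.
triage_plan() {
  local host="$1" ip="$2" port="$3" ctr="$4"
  echo "getent hosts ${host}"
  echo "ping -c1 -W2 ${ip}"
  echo "nc -zv -w2 ${ip} ${port}"
  echo "docker ps --filter name=${ctr}"
  echo "docker logs --tail=100 ${ctr}"
}
triage_plan docs.aurbotstem.lan 10.10.8.126 3000 docusaurus
```

Stopping at the first failing step localizes the fault layer (DNS vs. network vs. firewall vs. container) before touching any service config.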
References
- Harness engineering principles:
Changelog
- 2026-02-13: Created project plan.