Matthew Boback · Backend & Platform Engineer
Private Engagement · 2026-04 to Ongoing · Solo

Case Study

Access Control Platform

Cloud-managed physical access control with offline-first site orchestrators and hardware reconciliation.

A commissioned access-control platform that manages card credentials, doors, and access levels across remote sites from a centralized cloud control plane. An on-premises Go orchestrator owns each site's state in SQLite and keeps the physical controllers enforcing access even when the cloud link is down; the cloud holds only a projected read model.

0

Inbound Ports At The Site

2

Transports: Durable + Live

SQLite

Site Source Of Truth

On-Prem

Enforcement Survives WAN Loss

Go · SQLite · NATS JetStream · WireGuard · SvelteKit 5 · Hono · Protobuf Contracts · Embedded Hardware

Read this first

The cloud is convenient; the doors must work without it.
This system controls physical doors. The writeup is framed around the constraint that matters: every site keeps enforcing access locally even when its link to the cloud is down.

Operators manage people, credentials, doors, and access levels for remote sites from one dashboard, with command outcomes visible per action.

Site enforcement is independent of cloud availability: controllers keep their card tables locally and the orchestrator keeps working against SQLite.

Credential changes survive outages end-to-end: queued durably, applied transactionally, reconciled after reconnect, and projected back with an audit trail.

Overview

Why access control needs an orchestrator.

Physical access control has an unforgiving constraint: the doors must keep working when the network does not. This platform manages people, credentials, doors, and access levels for vendor access controllers — embedded boards speaking a proprietary binary TCP protocol — across remote sites, from a single cloud dashboard.

The architecture follows the constraint. Each site runs a Go orchestrator that owns its state in SQLite and talks to the controllers directly on the LAN. The cloud is a control plane: a Hono API and SvelteKit dashboard over a PostgreSQL read model that is explicitly a projection, never the source of truth. Commands flow down through NATS JetStream over WireGuard; state flows back up through events and compressed batch sync.

Highlights

The parts I would inspect first

Offline-first by construction: the site orchestrator's SQLite database is the authoritative source of truth, and doors keep working when the WAN drops.

Two command transports with different guarantees: durable JetStream delivery for credential changes, fail-fast NATS request/reply for door opens — a queued door-open arriving an hour late is a security hole, not a feature.

A reconcile loop re-applies unresolved credentials to controllers on startup and board reconnect, so hardware converges to desired state after any outage.

Sites connect outbound over WireGuard to reach NATS and MinIO bound to the VPN interface — zero inbound ports at the site.

Dual sync paths: sub-second NATS events for command outcomes and card-swipe activity, plus guaranteed batch entity sync through a compressed MinIO outbox.

Problem

Centralized management, but the doors can't depend on the cloud.

Managing access control across remote sites from vendor desktop software means driving to the site, or worse, exposing controller ports to the internet. A naive cloud rebuild inverts the real requirement: if the cloud database is the source of truth and the WAN drops, nobody gets in — or a revoked card keeps working because the controller never heard about it.

The hardware adds its own constraints. The controllers speak a proprietary binary TCP protocol with no delivery guarantees, no idempotency, and no notion of desired state — just imperative writes. Commands can fail halfway through a multi-controller update, boards reboot, and the link between site and cloud is the least reliable part of the whole system. Different operations also need different failure semantics: a credential change should be queued and retried until applied, but a remote door-open that arrives late must be dropped, not replayed.

Solution

An orchestrator per site that owns the truth and reconciles the hardware.

Each site runs a single Go binary: SQLite as the authoritative store for that site's people, credentials, doors, and access levels; a board manager holding TCP connections to each controller with backoff reconnect; and a NATS session reaching the cloud outbound over WireGuard. Credential changes arrive as durable JetStream commands, are applied transactionally to SQLite, compiled into per-controller policy payloads — door masks, time groups, enable flags — and written to the boards. Door opens use core NATS request/reply with a short timeout: the operator gets 'applied' or 'site offline' within seconds, never a stale replay.

State flows back to the cloud on two parallel paths: real-time NATS events for command outcomes and card-swipe activity, and a compressed JSONL outbox synced through MinIO for guaranteed entity-state delivery. Cloud projectors fold both into the PostgreSQL read model that serves the dashboard. A reconcile loop closes the remaining gap: on startup or board reconnect, every unresolved credential is re-applied until the hardware matches the intended state.

Workflow

Author -> Dispatch -> Reconcile
Operators author credentials and access levels in the cloud dashboard. Commands travel over a durable bus to each site's orchestrator, which owns local state and reconciles the physical controllers against it.
QUEUED
APPLYING
APPLIED
FAILED

commands (durable, per site)

create_person

create_access_level

issue_credential

disable_credential

restore_credential

commands (live, per site)

open_door (request/reply, fail-fast)

events (site → cloud)

command outcomes

card-swipe events

controller inventory
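The four status labels above imply a small state machine per command. The following is a minimal sketch of that lifecycle, not the private implementation: the `Status` type, the `allowed` table, and the `Transition` function are hypothetical names, and the FAILED to APPLYING edge is an assumption that models the reconciler retrying an unresolved command.

```go
package main

import "fmt"

// Status mirrors the four labels shown in the workflow.
type Status string

const (
	Queued   Status = "QUEUED"
	Applying Status = "APPLYING"
	Applied  Status = "APPLIED"
	Failed   Status = "FAILED"
)

// allowed lists the transitions this sketch assumes. FAILED -> APPLYING
// models a reconcile retry; APPLIED is treated as terminal.
var allowed = map[Status][]Status{
	Queued:   {Applying},
	Applying: {Applied, Failed},
	Failed:   {Applying},
	Applied:  {},
}

// Transition returns the new status, or the unchanged status and an error
// if the move is not in the allowed table.
func Transition(from, to Status) (Status, error) {
	for _, next := range allowed[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("illegal transition %s -> %s", from, to)
}

func main() {
	s := Queued
	// A command that fails once and succeeds on the retry.
	for _, next := range []Status{Applying, Failed, Applying, Applied} {
		var err error
		if s, err = Transition(s, next); err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(s)
	}
}
```

Keeping the table explicit makes it cheap to reject impossible moves, such as an APPLIED command going back to QUEUED, before they reach storage.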

Key Endpoints

POST

/api/credentials/people/:id/credentials

Issue a credential: validated, persisted as queued, dispatched durably to the site.

POST

/api/doors/:id/open

Live door open via NATS request/reply; 200 applied or 504 site offline.

GET

/api/people / /api/doors / /api/access-levels

Read-model queries serving the dashboard, populated by the projectors.

GET

/api/events

Site activity: card swipes, command outcomes, controller status.

POST

/agent/site-bootstrap

Site orchestrator bootstrap channel for configuration handoff.

Architecture

The system shape behind the product.

Three runtime surfaces with strict ownership boundaries: a Hono REST API and SvelteKit 5 dashboard in the cloud over a projected PostgreSQL read model, and a Go site orchestrator on-premises owning SQLite state, board connections, and sync. Protobuf contracts define the command and event shapes shared across the boundary, and workspace dependency rules keep the site runtime importable without any cloud code.

Cloud UI


SvelteKit 5 dashboard with SSR auth guard

People, doors, credentials, access levels, events

Session cookie forwarded on server-side fetches

Cloud API


Hono REST API with JWT + RBAC middleware

Command dispatch: durable publish or live request

Batch and NATS projectors feeding the read model

Transport


NATS JetStream: durable per-site command streams

Core NATS request/reply for live door commands

MinIO inbox/outbox: JSONL+zstd entity batches

Network Boundary


WireGuard tunnel per site, outbound-only

NATS and MinIO bound to the VPN interface

No inbound ports at any site

Site Orchestrator


Go binary with SQLite source of truth

Command executor with policy compilation

Reconcile loop on startup and board reconnect

Hardware


Vendor access controllers on the site LAN

Proprietary binary TCP protocol adapter

Card-swipe event streaming back through NATS

Platform Surfaces

What the platform handles

Offline-First Site Authority

core

The site's SQLite database is the source of truth for all site-scoped entities. The cloud holds a projection. A WAN outage degrades management, never enforcement.

Durable Credential Commands

core

Issue, disable, and restore credentials flow through per-site JetStream streams with acknowledgement-based delivery. The cloud answers 202 immediately and projects the outcome when the site reports it.

Fail-Fast Live Commands

security

Remote door-open uses core NATS request/reply with a short timeout and expiry validation. If the site is offline the operator sees that immediately — late commands are dropped, never replayed against a door.

Hardware Reconciliation

core

On startup and board reconnect, the orchestrator queries unresolved credentials, recompiles their policy payloads, and re-applies them through the board adapter until the controllers converge.

Policy Compilation

core

A credential's access level resolves to doors, grouped per controller, and compiles into board-level payloads — door bitmasks, time groups, and enable/blacklist flags — so one logical change fans out correctly across hardware.
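To make the fan-out concrete, here is a hedged sketch of the compilation step: doors are grouped by controller and folded into one bitmask per controller. The `Door`, `Payload`, and `Compile` names, the 8-door limit, and the field layout are all assumptions for illustration; the real board payloads are proprietary, and time groups are left out to keep the example short.

```go
package main

import (
	"fmt"
	"sort"
)

// Door identifies a door by controller and port. Both fields are
// hypothetical; the real board addressing is not public.
type Door struct {
	ControllerID string
	Port         uint8 // 0-based door index, assumed to be below 8
}

// Payload is the per-controller result of compiling one credential.
type Payload struct {
	ControllerID string
	CardNumber   uint32
	DoorMask     uint8 // bit i set means door port i is permitted
	Enabled      bool
}

// Compile groups the doors by controller and folds each group into a
// door bitmask, returning payloads sorted by controller ID.
func Compile(card uint32, enabled bool, doors []Door) ([]Payload, error) {
	masks := map[string]uint8{}
	for _, d := range doors {
		if d.Port >= 8 {
			return nil, fmt.Errorf("door port %d out of range on %s", d.Port, d.ControllerID)
		}
		masks[d.ControllerID] |= 1 << d.Port
	}
	ids := make([]string, 0, len(masks))
	for id := range masks {
		ids = append(ids, id)
	}
	sort.Strings(ids) // deterministic order for logging and tests
	out := make([]Payload, 0, len(ids))
	for _, id := range ids {
		out = append(out, Payload{ControllerID: id, CardNumber: card, DoorMask: masks[id], Enabled: enabled})
	}
	return out, nil
}

func main() {
	doors := []Door{{"ctrl-a", 0}, {"ctrl-a", 2}, {"ctrl-b", 1}}
	payloads, err := Compile(1234, true, doors)
	if err != nil {
		fmt.Println("compile failed:", err)
		return
	}
	for _, p := range payloads {
		fmt.Printf("%s card=%d mask=%08b enabled=%t\n", p.ControllerID, p.CardNumber, p.DoorMask, p.Enabled)
	}
}
```

In this shape, disabling a credential would be the same call with `enabled` set to false, so one logical change always produces one payload per affected controller.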

Dual-Path State Sync

core

Sub-second NATS events carry command outcomes and card-swipe activity; a compressed JSONL outbox through MinIO guarantees entity-state delivery even across long outages. Cloud projectors fold both into PostgreSQL.

Zero Inbound Site Exposure

security

Sites reach the cloud outbound over WireGuard; NATS and MinIO bind to the VPN interface. Controllers are never reachable from the internet, and the site needs no port forwarding.

RBAC With Schema-Level Tenancy

security

Owner, admin, and user roles gate every route, with membership re-checked from the database per request. Composite (site_id, id) foreign keys make cross-tenant references structurally impossible.

Vendor Protocol Adapter

dx

The board protocol lives in one Go package with a stdlib-only dependency rule, behind an adapter interface — the rest of the system deals in credentials and doors, not wire formats.
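The adapter boundary might look roughly like the sketch below. The `BoardAdapter` interface, its two methods, and the in-memory `fakeBoard` are hypothetical stand-ins written with the standard library only; the real interface and wire format are private. The fake keys its card table by card number, which is one way to make repeated writes safe.

```go
package main

import "fmt"

// Credential is the platform-level view of a card on one controller.
type Credential struct {
	CardNumber uint32
	DoorMask   uint8
	Enabled    bool
}

// BoardAdapter is a hypothetical interface between the platform and a
// vendor protocol package. Callers never see wire formats.
type BoardAdapter interface {
	WriteCredential(c Credential) error
	OpenDoor(port uint8) error
}

// fakeBoard is an in-memory adapter for tests. Writes overwrite by card
// number, so applying the same credential twice leaves one entry.
type fakeBoard struct {
	cards  map[uint32]Credential
	opened []uint8
}

func newFakeBoard() *fakeBoard {
	return &fakeBoard{cards: map[uint32]Credential{}}
}

func (b *fakeBoard) WriteCredential(c Credential) error {
	b.cards[c.CardNumber] = c
	return nil
}

func (b *fakeBoard) OpenDoor(port uint8) error {
	if port >= 8 {
		return fmt.Errorf("door port %d out of range", port)
	}
	b.opened = append(b.opened, port)
	return nil
}

func main() {
	fake := newFakeBoard()
	var board BoardAdapter = fake
	c := Credential{CardNumber: 1234, DoorMask: 0b101, Enabled: true}
	for i := 0; i < 2; i++ { // the second write is a deliberate repeat
		if err := board.WriteCredential(c); err != nil {
			fmt.Println("write failed:", err)
		}
	}
	fmt.Println("cards stored:", len(fake.cards))
}
```

A second vendor would then be another type satisfying the same interface, with the executor and reconciler unchanged.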

Tradeoffs

Decisions shaped by physical consequences.

Why is the site, not the cloud, the source of truth?

The system's one non-negotiable is that access enforcement works during a WAN outage. If the cloud owned the truth, every site would be a cache with invalidation problems and doors that fail closed (or worse, open) on disconnect. Making the site authoritative and the cloud a projection turns the unreliable link into a sync problem instead of a correctness problem.

Alternatives considered: Cloud-authoritative with site caching · Multi-master replication · Direct cloud-to-controller control

Why two command transports instead of one queue for everything?

Credential changes and door opens have opposite delivery requirements. A credential change must eventually apply, so it rides durable JetStream with retries and outcome events. A door open must apply now or not at all — a queued open replaying after an hour is a physical security failure. Core NATS request/reply with a short timeout and command expiry gives exactly fail-fast semantics.

Alternatives considered: Single durable queue for all commands · HTTP polling from the site · Direct gRPC calls to the site
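The expiry half of the fail-fast path can be sketched independently of NATS. This is an illustrative example under stated assumptions: the `OpenDoor` struct, the `ExpiresAt` field, and the `Validate` function are hypothetical, and the check presumes the cloud and site clocks are reasonably synchronized.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// OpenDoor is a hypothetical live command carrying its own deadline.
type OpenDoor struct {
	DoorID    string
	ExpiresAt time.Time
}

// ErrExpired is returned when a command arrives at or after its deadline.
var ErrExpired = errors.New("command expired")

// Validate rejects a command whose deadline has passed. The site would
// run this before touching the board, so a late message is dropped.
func Validate(cmd OpenDoor, now time.Time) error {
	if !now.Before(cmd.ExpiresAt) {
		return ErrExpired
	}
	return nil
}

func main() {
	issued := time.Date(2026, 4, 1, 12, 0, 0, 0, time.UTC)
	cmd := OpenDoor{DoorID: "front", ExpiresAt: issued.Add(5 * time.Second)}

	fmt.Println("after 2s:", Validate(cmd, issued.Add(2*time.Second)))
	fmt.Println("after 1h:", Validate(cmd, issued.Add(time.Hour)))
}
```

Passing `now` as a parameter keeps the check deterministic in tests; the request timeout on the cloud side and this deadline on the site side are two separate guards against the same late delivery.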

Why batch sync through object storage when NATS already streams events?

JetStream is at-least-once and ack-based, but stream retention is bounded and a site can be offline longer than any reasonable retention window. The MinIO outbox — compressed JSONL batches uploaded when connectivity returns — guarantees entity state eventually lands in the cloud regardless of outage length. The two paths cover each other: NATS for latency, the outbox for completeness.

Alternatives considered: JetStream as the only sync path · Postgres logical replication over VPN · Periodic full-database snapshots

Why WireGuard instead of exposing brokers with TLS and auth?

Controllers and the site orchestrator should never be internet-reachable, and the brokers shouldn't be either. Binding NATS and MinIO to the WireGuard interface means the entire command path exists only inside the VPN, the site needs zero inbound ports, and broker authentication becomes defense in depth rather than the only wall.

Alternatives considered: Public brokers with mTLS · SSH tunnels per site · Cloud-initiated connections to sites

Why reconcile from unresolved commands instead of diffing full state?

The controllers can't report their full configuration cheaply, so a state diff would mean trusting a shadow copy anyway. Tracking command resolution in SQLite gives a precise worklist: anything not APPLIED gets recompiled and re-applied on startup or reconnect. The protocol itself offers no idempotency, so board writes are structured to be repeatable, which makes re-applying the same command safe.

Alternatives considered: Full desired-state diff per reconnect · Periodic blind re-push of everything · Manual operator-triggered resync
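The worklist idea reduces to a short loop. The sketch below uses an in-memory slice in place of the SQLite command table and a function value in place of the board adapter; `Command`, `Applier`, `Reconcile`, and `ErrBoardOffline` are hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBoardOffline simulates a controller that cannot be reached.
var ErrBoardOffline = errors.New("board offline")

// Command is a tracked command with its resolution status.
type Command struct {
	ID     string
	Status string // "QUEUED", "APPLYING", "APPLIED", or "FAILED"
}

// Applier stands in for recompiling a command and writing it to a board.
type Applier func(cmd Command) error

// Reconcile re-applies every command that is not APPLIED and returns
// how many are still unresolved afterwards.
func Reconcile(cmds []Command, apply Applier) int {
	unresolved := 0
	for i := range cmds {
		if cmds[i].Status == "APPLIED" {
			continue // already resolved, not part of the worklist
		}
		cmds[i].Status = "APPLYING"
		if err := apply(cmds[i]); err != nil {
			cmds[i].Status = "FAILED"
			unresolved++
			continue
		}
		cmds[i].Status = "APPLIED"
	}
	return unresolved
}

func main() {
	cmds := []Command{{"c1", "APPLIED"}, {"c2", "QUEUED"}, {"c3", "FAILED"}}
	boardUp := false
	apply := func(cmd Command) error {
		if !boardUp {
			return ErrBoardOffline
		}
		return nil
	}

	fmt.Println("unresolved while board is down:", Reconcile(cmds, apply))
	boardUp = true // simulate the board reconnecting
	fmt.Println("unresolved after reconnect:", Reconcile(cmds, apply))
}
```

Because the loop only ever looks at unresolved rows, its cost scales with the backlog rather than with the total number of credentials at the site.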

Tech Stack

The pieces doing the work.

Cloud Control Plane

Hono + Node 22

REST API, auth, RBAC, command dispatch, and projection services

SvelteKit 5

Server-rendered operator dashboard with session-guarded routes

PostgreSQL 16

Auth tables plus the projected read model of all site state

Site Runtime

Go 1.25

Site orchestrator binary: executor, board manager, sync, reconcile

SQLite

Authoritative site database with command and sync outbox tables

Protobuf

Shared command and event contracts across the cloud/site boundary

Transport

NATS JetStream

Durable command delivery and real-time event streaming per site

MinIO

Inbox/outbox batch sync with compressed JSONL entity payloads

WireGuard

Outbound-only site connectivity; brokers bound to the VPN interface

Infrastructure

Podman Quadlet

Cloud containers as systemd units on the VPS

systemd (user)

Site orchestrator service with automatic restart

Caddy

TLS termination and routing for the cloud dashboard and API

Challenges

Failure modes I had to design around.

Controllers accept imperative writes only — no desired state, no delivery guarantees, no idempotency at the protocol level.

The executor compiles each logical change into per-controller policy payloads and tracks resolution in SQLite; the reconcile loop re-applies anything unresolved after restarts or reconnects, and board writes are structured so repetition is safe.

A door-open command that arrives late is a security failure, not a retry success.

Live commands bypass the durable stream entirely: core NATS request/reply with a short timeout and expiry validation at the site, so a command either applies within seconds or dies with an explicit site-offline error.

Sites can be offline longer than any message-broker retention window.

Entity state changes are journaled to a SQLite outbox and shipped as compressed JSONL batches through MinIO whenever connectivity returns — guaranteed delivery decoupled from stream retention, with NATS events covering the low-latency path.
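As a rough illustration of the batch format, the sketch below encodes journaled changes as JSONL (one JSON object per line) and decodes them again the way a cloud-side poller might. The `Change` fields and both function names are assumptions. Compression and the MinIO upload are omitted: zstd needs a third-party package, so a real pipeline would wrap these bytes before shipping them.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
)

// Change is a hypothetical outbox row describing one entity mutation.
type Change struct {
	Seq    int64  `json:"seq"`
	Entity string `json:"entity"`
	ID     string `json:"id"`
	Op     string `json:"op"`
}

// EncodeBatch writes one JSON object per line. json.Encoder appends a
// newline after each value, which is exactly the JSONL framing.
func EncodeBatch(changes []Change) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for _, c := range changes {
		if err := enc.Encode(c); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// DecodeBatch reads a JSONL batch back into changes, skipping blank lines.
func DecodeBatch(data []byte) ([]Change, error) {
	var out []Change
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		if len(sc.Bytes()) == 0 {
			continue
		}
		var c Change
		if err := json.Unmarshal(sc.Bytes(), &c); err != nil {
			return nil, err
		}
		out = append(out, c)
	}
	return out, sc.Err()
}

func main() {
	batch, err := EncodeBatch([]Change{
		{Seq: 1, Entity: "credential", ID: "cred-1", Op: "upsert"},
		{Seq: 2, Entity: "door", ID: "door-7", Op: "upsert"},
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Print(string(batch))
}
```

A monotonically increasing sequence number, as assumed here, would let a projector apply batches in order and ignore rows it has already seen.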

A multi-controller credential update can fail halfway through.

Commands persist per-credential resolution state; partial failures resolve as FAILED with the outcome published to the cloud, and the reconciler retries until every affected controller has converged.
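One way to picture the partial-failure rule is as a fold over per-controller write results. This sketch is an assumption about the shape, not the actual code: `Result`, `Resolve`, and `ErrWriteFailed` are hypothetical, and the returned retry list is simply the set of controllers a reconciler would need to revisit.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrWriteFailed simulates a board write that did not complete.
var ErrWriteFailed = errors.New("board write failed")

// Result is the outcome of writing one payload to one controller.
type Result struct {
	ControllerID string
	Err          error
}

// Resolve folds per-controller results into a command status. Any failed
// write makes the whole command FAILED and lists the controllers to retry.
func Resolve(results []Result) (string, []string) {
	var retry []string
	for _, r := range results {
		if r.Err != nil {
			retry = append(retry, r.ControllerID)
		}
	}
	if len(retry) > 0 {
		return "FAILED", retry
	}
	return "APPLIED", nil
}

func main() {
	status, retry := Resolve([]Result{
		{ControllerID: "ctrl-a", Err: nil},
		{ControllerID: "ctrl-b", Err: ErrWriteFailed},
	})
	fmt.Println(status, retry)
}
```

Treating the command as FAILED until every controller succeeds keeps the reported outcome conservative: the dashboard never shows a credential as applied while one door still has the old policy.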

Cloud dashboards need fresh data without ever becoming a write path to the site.

Two projector services — a JetStream consumer and a MinIO batch poller — are the only writers of site data into PostgreSQL, and the read model is structurally marked as projection so no API path mutates site entities directly.

Outcomes

What the platform does now.

Operators manage people, credentials, doors, and access levels for remote sites from one dashboard, with command outcomes visible per action.

Site enforcement is independent of cloud availability: controllers keep their card tables locally and the orchestrator keeps working against SQLite.

Credential changes survive outages end-to-end: queued durably, applied transactionally, reconciled after reconnect, and projected back with an audit trail.

Remote door-open behaves like a physical button: immediate success or an explicit site-offline error, never a delayed replay.

Card-swipe activity streams to the cloud events view in near real time over the same outbound tunnel.

The vendor protocol is contained in one adapter package, so supporting different controller hardware means writing one new adapter, not touching the platform.

Next Step

Ask about the implementation

The code is private because it runs a client's physical security, but the architecture is worth a conversation: durable command dispatch, offline-first site state, and hardware reconciliation.