Node-B: The CR90 Corvette, Data Integrity & The Operations Hub

Category: Storage / Identity / Data Operations


Post 04 covered the node that crashed. This is the node that saved the investigation.

When the Millennium Falcon hard-locked with zero local crash artifacts, a root cause analysis was possible only because the CR90 Corvette (QCM1255) had been collecting Telegraf metrics into InfluxDB at 10-second intervals. The Flux queries that reconstructed the crash timeline and the data that eliminated every software cause all lived on the Corvette's ECC-protected, ZFS-backed storage.

That's not luck. That's the design.


1. Core Specs

| Component | Spec |
| --- | --- |
| Chassis | QCM1255 (Micro Form Factor) |
| Processor | AMD Ryzen 7 PRO (native ECC support, high multi-threaded efficiency) |
| Memory | 64GB DDR5 ECC |
| Storage | 4TB ZFS |

The "PRO" series is a deliberate choice for ECC support. ECC detects and corrects single-bit memory errors at the hardware level before they corrupt a file during a write operation. Non-negotiable for a node hosting authentication databases, time-series metrics, and encrypted credentials.
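To make "detects and corrects single-bit errors" concrete, here is a toy sketch of the idea using a Hamming(7,4) code. This is an illustration only: real ECC DIMMs use wider codes implemented in hardware, and the function names below are made up for this example.

```python
from typing import Tuple


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit Hamming codeword (bit i = position i+1)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]  # parity over positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # parity over positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # parity over positions 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(codeword: int) -> Tuple[int, int]:
    """Return (nibble, corrected_position); position 0 means no error was seen."""
    bits = [(codeword >> i) & 1 for i in range(7)]
    # XOR of the positions of all set bits is 0 for a valid codeword,
    # and equals the flipped position after a single-bit error.
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos
    if syndrome:
        bits[syndrome - 1] ^= 1  # flip the bad bit back
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, syndrome


word = hamming74_encode(0b1011)
flipped = word ^ (1 << 4)         # simulate a bit flip at position 5
print(hamming74_decode(flipped))  # (11, 5): data recovered, error located
```

The point is that the correction happens before any software sees the data, which is why ECC sits underneath everything else on this node.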


2. ZFS: The Alliance Shield

ZFS is the gold standard for data protection:

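Self-Healing. Every block is checksummed (fletcher4 by default, with cryptographic hashes such as SHA-256 available as an option), and the checksum is verified on every read and during scrubs. If bit-rot is detected, ZFS automatically repairs the corrupted data from the clean copy on the mirror.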

Atomic Snapshots. Near-instant to create, and they allow immediate rollback if a deployment goes sideways.
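The self-healing read path can be sketched in a few lines. This is a simplified model, not how ZFS is implemented: the class name `MirroredBlock` is hypothetical, SHA-256 stands in for whatever checksum the pool actually uses, and snapshots are not modeled at all.

```python
import hashlib


class MirroredBlock:
    """One logical block stored as two copies plus a checksum kept separately."""

    def __init__(self, data: bytes):
        self.checksum = hashlib.sha256(data).digest()
        self.copies = [bytearray(data), bytearray(data)]

    def _is_valid(self, copy: bytearray) -> bool:
        return hashlib.sha256(copy).digest() == self.checksum

    def read(self) -> bytes:
        # Find any copy whose contents still match the stored checksum.
        good = next((bytes(c) for c in self.copies if self._is_valid(c)), None)
        if good is None:
            raise IOError("all copies failed checksum verification")
        # Overwrite every copy that failed verification with the good data.
        for i, copy in enumerate(self.copies):
            if not self._is_valid(copy):
                self.copies[i] = bytearray(good)
        return good


block = MirroredBlock(b"authentik: user row")
block.copies[0][0] ^= 0x01  # simulate bit-rot on one side of the mirror
assert block.read() == b"authentik: user row"
assert bytes(block.copies[0]) == b"authentik: user row"  # repaired on read
```

Without the second copy, the same check would still detect the corruption but would have nothing to repair from.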

Why ECC + ZFS matters here specifically: A silent bit-flip in InfluxDB poisons your monitoring history (the same data that just saved a crash investigation). A corrupted row in Postgres-Archives breaks SSO for every service in the fleet. ECC covers the gap ZFS cannot: data that is corrupted in RAM before it is checksummed gets written to disk as if it were valid. ZFS catches what a standard filesystem would never detect on disk.
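A small sketch shows why on-disk checksums alone are not enough. It assumes a simplified write path where the checksum is computed from whatever is in memory at write time; `commit` and `verify` are hypothetical helpers, and SHA-256 is used purely for illustration.

```python
import hashlib
from typing import Tuple


def commit(buffer: bytes) -> Tuple[bytes, str]:
    """Checksum whatever is in RAM at write time, then 'store' both."""
    return buffer, hashlib.sha256(buffer).hexdigest()


def verify(stored: bytes, checksum: str) -> bool:
    """Re-hash the stored bytes and compare against the stored checksum."""
    return hashlib.sha256(stored).hexdigest() == checksum


intended = b"session_token=abc123"
ram = bytearray(intended)
ram[0] ^= 0x01  # bit flip in non-ECC RAM, before the write happens

stored, checksum = commit(bytes(ram))
print(verify(stored, checksum))  # True: the checksum matches the bad data
print(stored == intended)        # False: what landed on disk is wrong
```

The filesystem has no way to know the buffer was wrong when it arrived, so the corruption has to be stopped in memory.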


3. Workload Distribution

The CR90 Corvette is the reliable backbone: built for sustained operation, carrying critical data and maintaining long-running services.

| Service | Codename | Purpose |
| --- | --- | --- |
| Authentik SSO | Authentik-ChainCode | Identity Core. The Chain Code is the universal digital ID for the Alliance. SSO via OIDC/SAML across 15+ services, MFA enforced (TOTP + WebAuthn). Reduced password surface from 12+ credentials per user to one SSO login. 100% MFA coverage. Full login audit trail forwarded to Wazuh. |
| PostgreSQL | Postgres-Archives | The Jedi Archives. Central relational database for Authentik, n8n, and stateful services. On ECC-protected storage. |
| Redis | Redis-HoloNet | The FTL communication network. In-memory session cache for Authentik. Instant transmission across the fleet. |
| n8n | Tactical Droid | API automation. Processes Wazuh security webhooks, pushes alerts to Discord via Admiral Ackbar. Previously included automated IP blocking via OPNsense API. Rebuilding for UDM platform. |
| InfluxDB 2.x | | Time-series database storing 10-second telemetry from all nodes. The dataset that made the VFIO lockup forensics possible. |
| Grafana | | TIG Stack visualization. Dashboards being expanded. |
| Vaultwarden | | Self-hosted Bitwarden-compatible password manager. Credentials on ZFS. |
| Home Assistant | | IoT automation on isolated VLAN 30. On Node-B for data persistence, device network fully segmented. |
| Homepage | | Centralized dashboard for fleet-wide service access. |
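Fixed-interval telemetry is useful for forensics partly because a hard lockup shows up as a hole in the series. The sketch below is Python rather than Flux and is not the query used in the actual investigation. It assumes you have already exported a node's sample timestamps. The `find_gaps` name, the tolerance factor, and the sample data are all invented for the example.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def find_gaps(
    timestamps: List[datetime],
    interval: timedelta = timedelta(seconds=10),
    tolerance: float = 2.5,
) -> List[Tuple[datetime, datetime]]:
    """Return (last_seen, next_seen) pairs where samples stopped arriving."""
    ordered = sorted(timestamps)
    limit = interval * tolerance  # allow some jitter before calling it a gap
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


# Synthetic data: six samples, a silent period, then three more after reboot.
start = datetime(2024, 1, 1, 12, 0, 0)
before = [start + timedelta(seconds=10 * i) for i in range(6)]
after = [start + timedelta(minutes=10, seconds=10 * i) for i in range(3)]

for last_seen, next_seen in find_gaps(before + after):
    print(f"no data between {last_seen} and {next_seen}")
```

The first timestamp of each gap is the last moment the node was known to be alive, which bounds the crash to within one collection interval.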

4. Why Centralize on One Node

Deliberate choice, not default. In earlier iterations, services were distributed more evenly across all three nodes. The problem was inconsistent reliability: a service on ZFS-backed ECC storage had different data integrity guarantees than the same service on standard NVMe.

Consolidating every stateful service on the Corvette means they all share the same ECC + ZFS protection, the same snapshot/rollback capability, the same fault domain. Simplifies backup strategy too: one node to snapshot, one storage pool to protect.

The tradeoff is concentration risk. If Node-B fails, every operational service goes with it. That is acceptable because Node-C's security monitoring and Node-A's compute are independent, and ECC + ZFS makes silent data corruption unlikely. The planned addition of Proxmox Backup Server will add scheduled VM/CT backups with retention policies, and offsite encrypted replication is on the roadmap.


Backups are a requirement. Redundancy is a strategy. ECC + ZFS is the trust layer that every other service in the fleet depends on.

Next: Post 06, Node-C: The Gozanti Cruiser