Node-B: The CR90 Corvette, Data Integrity & The Operations Hub
Category: Storage / Identity / Data Operations
Post 04 covered the node that crashed. This is the node that saved the investigation.
When the Millennium Falcon hard-locked with zero local crash artifacts, a root cause analysis was possible only because the CR90 Corvette (QCM1255) had been collecting Telegraf metrics into InfluxDB at 10-second intervals. The Flux queries that reconstructed the crash timeline and the data that eliminated every software cause all lived on the Corvette's ECC-protected, ZFS-backed storage.
That's not luck. That's the design.
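The Flux queries themselves aren't reproduced here. As a hedged illustration of why a 10-second interval matters for forensics, here is a minimal Python sketch (not the actual queries) that scans one host's sample timestamps for reporting gaps. The last sample before a gap bounds the moment of the lockup to within one interval. The helper name `find_gaps`, the three-missed-intervals threshold, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, interval=timedelta(seconds=10), max_missed=3):
    """Return (last_seen, next_seen) pairs where a host stopped reporting.

    Hypothetical helper: `timestamps` is a sorted list of datetimes for one host.
    A gap is flagged when more than `max_missed` intervals pass between samples.
    """
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > interval * max_missed:
            gaps.append((prev, curr))
    return gaps

# Synthetic data: six samples 10 s apart, then silence until the host comes back.
start = datetime(2024, 1, 1, 12, 0, 0)
samples = [start + timedelta(seconds=10 * i) for i in range(6)]
samples.append(start + timedelta(minutes=5))

for last_seen, next_seen in find_gaps(samples):
    print(f"last sample {last_seen:%H:%M:%S}, next sample {next_seen:%H:%M:%S}")
```

With a 60-second collection interval the same gap would only be bounded to within a minute, which is the difference between correlating a lockup with a specific event and guessing.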
1. Core Specs
| Component | Spec |
|---|---|
| Chassis | QCM1255 (Micro Form Factor) |
| Processor | AMD Ryzen 7 PRO (native ECC support, high multi-threaded efficiency) |
| Memory | 64GB DDR5 ECC |
| Storage | 4TB ZFS |
The "PRO" series is a deliberate choice for ECC support. ECC memory corrects single-bit errors (and typically detects double-bit errors) in hardware, so a flipped bit in RAM is caught before it can be written to disk as part of a file. That is non-negotiable for a node hosting authentication databases, time-series metrics, and encrypted credentials.
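To make "detects and corrects" concrete, here is a toy Python sketch of the underlying idea using a Hamming(7,4) code. This is an assumption-laden simplification: real ECC memory applies a wider code (typically SECDED or stronger) in the memory controller, but the mechanism is the same. Extra parity bits are stored alongside the data, and on read the recomputed parity (the syndrome) points directly at the flipped bit. The function names are hypothetical.

```python
def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit codeword; parity bits sit at positions 1, 2 and 4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(codeword):
    """Correct at most one flipped bit; return (data bits, 1-based error position or 0)."""
    bits = list(codeword)
    syndrome = 0
    for pos, bit in enumerate(bits, start=1):
        if bit:
            syndrome ^= pos  # XOR of the positions of all set bits; 0 for a valid codeword
    if syndrome:
        bits[syndrome - 1] ^= 1  # the syndrome names the flipped position, so flip it back
    return [bits[2], bits[4], bits[5], bits[6]], syndrome

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # simulate a single-bit flip at position 5
data, pos = hamming74_decode(word)
print(data, pos)  # [1, 0, 1, 1] 5
```

The correction happens before the data is handed to software, which is why the filesystem above never sees the bad bit.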
2. ZFS: The Alliance Shield
ZFS is the gold standard for data protection:
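Self-Healing. Every block is checksummed (Fletcher-4 by default, SHA-256 optionally), and the checksum is verified on every read and during scrubs. If bit-rot is detected, ZFS automatically repairs the corrupted block from the clean copy on the mirror. Self-healing depends on that redundancy (a mirror, RAIDZ, or `copies=2`); a pool without it can detect corruption but not repair it.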
Atomic Snapshots. Instantaneous to create. Allows immediate rollback if a deployment goes sideways.
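The self-healing read path can be modeled in a few lines. The Python sketch below is a simplified model, not ZFS code. The checksum is kept apart from the data (in ZFS it lives in the parent block pointer), each mirror copy is verified on read, and any copy that fails is overwritten with a verified one. The checksum follows the Fletcher-4 scheme ZFS uses by default, assuming a little-endian host; `fletcher4`, `read_mirrored`, and the sample block are hypothetical.

```python
import struct

MASK64 = (1 << 64) - 1

def fletcher4(block: bytes) -> tuple:
    """Fletcher-4: four running 64-bit sums over the block's 32-bit words.

    Words are read little-endian here, matching a little-endian (x86) host.
    """
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", block):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return (a, b, c, d)

def read_mirrored(copies: list, expected: tuple):
    """Verify each mirror copy against the stored checksum and repair bad copies.

    `copies` models the sides of a mirror vdev; `expected` models the checksum
    stored in the parent block pointer, separate from the data itself.
    Returns the good data and the indices of copies that were rewritten.
    """
    good = next((blk for blk in copies if fletcher4(blk) == expected), None)
    if good is None:
        raise OSError("checksum mismatch on every copy: unrecoverable")
    repaired = [i for i, blk in enumerate(copies) if fletcher4(blk) != expected]
    for i in repaired:
        copies[i] = good  # self-heal: overwrite the bad copy with the verified one
    return good, repaired

block = bytes(range(16)) * 32          # one 512-byte block of sample data
checksum = fletcher4(block)            # computed at write time
rotted = bytearray(block)
rotted[100] ^= 0x01                    # bit-rot on one side of the mirror
mirror = [bytes(rotted), block]
data, repaired = read_mirrored(mirror, checksum)
print(repaired)  # [0]
```

If only one copy existed, `read_mirrored` could still detect the mismatch but would have nothing to repair from, which is the redundancy caveat above.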
Why ECC + ZFS matters here specifically: A silent bit-flip in InfluxDB poisons your monitoring history (the same data that just saved a crash investigation). A corrupted row in Postgres-Archives breaks SSO for every service in the fleet. ECC catches what ZFS might miss in RAM. ZFS catches what a standard filesystem would never detect on disk.
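"ECC catches what ZFS might miss in RAM" comes down to ordering. ZFS computes its checksum over whatever bytes are in memory at write time, so a flip that happens in RAM before checksumming is faithfully checksummed and stored as if it were valid. This minimal Python model, with SHA-256 standing in for whatever checksum the pool actually uses and hypothetical names throughout, contrasts the two cases.

```python
import hashlib

def write_block(data: bytes, ram_flip: bool):
    """Model a write: data sits in RAM, the filesystem checksums it, then it goes to disk."""
    buf = bytearray(data)
    if ram_flip:
        buf[0] ^= 0x01  # a bit flips in (non-ECC) RAM *before* the checksum is computed
    checksum = hashlib.sha256(buf).digest()  # the checksum covers the already-corrupted bytes
    return bytes(buf), checksum

def checksum_ok(on_disk: bytes, checksum: bytes) -> bool:
    """Model a read: recompute the checksum and compare it with the stored one."""
    return hashlib.sha256(on_disk).digest() == checksum

row = b"user=admin mfa=totp"

# Case 1: corruption in RAM. The checksum matches, so the bad row looks valid.
stored, cs = write_block(row, ram_flip=True)
print(checksum_ok(stored, cs), stored == row)  # True False

# Case 2: corruption on disk after the write. The checksum catches it.
stored, cs = write_block(row, ram_flip=False)
rotted = bytes([stored[0] ^ 0x01]) + stored[1:]
print(checksum_ok(rotted, cs))  # False
```

Case 1 is the gap that ECC closes; case 2 is the gap that ZFS closes. Neither layer covers both on its own.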
3. Workload Distribution
The CR90 Corvette is the reliable backbone: built for sustained operation, carrying critical data and maintaining long-running services.
| Service | Codename | Purpose |
|---|---|---|
| Authentik SSO | Authentik-ChainCode | Identity Core. The Chain Code is the universal digital ID for the Alliance. SSO via OIDC/SAML across 15+ services, MFA enforced (TOTP + WebAuthn). Reduced password surface from 12+ credentials per user to one SSO login. 100% MFA coverage. Full login audit trail forwarded to Wazuh. |
| PostgreSQL | Postgres-Archives | The Jedi Archives. Central relational database for Authentik, n8n, and stateful services. On ECC-protected storage. |
| Redis | Redis-HoloNet | The FTL communication network. In-memory session cache for Authentik. Instant transmission across the fleet. |
| n8n | (Tactical Droid) | API automation. Processes Wazuh security webhooks, pushes alerts to Discord via Admiral Ackbar. Previously included automated IP blocking via OPNsense API. Rebuilding for UDM platform. |
| InfluxDB 2.x | | Time-series database storing 10-second telemetry from all nodes. The dataset that made the VFIO lockup forensics possible. |
| Grafana | | TIG Stack visualization. Dashboards being expanded. |
| Vaultwarden | | Self-hosted Bitwarden-compatible password manager. Credentials on ZFS. |
| Home Assistant | | IoT automation on isolated VLAN 30. On Node-B for data persistence, device network fully segmented. |
| Homepage | | Centralized dashboard for fleet-wide service access. |
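Authentik implements TOTP itself, and this is not its code. As a hedged sketch of what the TOTP half of the MFA requirement actually verifies, here is the standard RFC 6238 algorithm in Python, using HMAC-SHA1, a 30-second step, and 6 digits (the common authenticator-app defaults). The one-step drift `window` in `verify_totp` is an assumed tolerance, not a documented Authentik setting.

```python
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over an 8-byte big-endian counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low nibble of the last byte picks a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF  # drop the sign bit
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, for_time=None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP whose counter is the number of `step`-second periods since the epoch."""
    now = time.time() if for_time is None else for_time
    return hotp(secret, int(now // step), digits)

def verify_totp(secret: bytes, code: str, for_time=None, step: int = 30, window: int = 1) -> bool:
    """Accept a code from the current step or `window` steps either side (assumed drift tolerance)."""
    now = time.time() if for_time is None else for_time
    counter = int(now // step)
    return any(
        hmac.compare_digest(hotp(secret, c, len(code)), code)  # constant-time comparison
        for c in range(counter - window, counter + window + 1)
        if c >= 0
    )

secret = b"12345678901234567890"  # RFC 6238 test secret; never reuse it for a real account
print(totp(secret, for_time=59, digits=8))  # 94287082 (RFC 6238 test vector)
```

The shared secret is the only thing an attacker needs to mint valid codes, which is one more reason it belongs in Postgres-Archives on ECC + ZFS storage rather than on a node without those guarantees.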
4. Why Centralize on One Node
Deliberate choice, not default. In earlier iterations, services were distributed more evenly across all three nodes. The problem was inconsistent reliability: a service on the Corvette's ECC memory and ZFS storage had different data integrity guarantees than the same service on standard NVMe elsewhere.
Consolidating every stateful service on the Corvette means they all share the same ECC + ZFS protection, the same snapshot/rollback capability, the same fault domain. Simplifies backup strategy too: one node to snapshot, one storage pool to protect.
The tradeoff is concentration risk. If Node-B fails, operational services go with it. Acceptable because Node-C's security monitoring and Node-A's compute are independent, and ECC + ZFS makes undetected data loss unlikely. The planned addition of Proxmox Backup Server will add scheduled VM/CT snapshots with retention policies, and offsite encrypted replication is on the roadmap.
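Proxmox Backup Server expresses retention with prune options such as keep-last and keep-daily. Its exact prune algorithm has more options and edge cases than shown here, so the Python sketch below is a simplified, hypothetical model of how such a policy decides what survives, not PBS code. It keeps the newest N backups plus the newest backup from each of the most recent M days; everything else becomes a prune candidate. The function name, defaults, and schedule are assumptions.

```python
from datetime import datetime, timedelta

def select_kept(backups, keep_last=3, keep_daily=7):
    """Return the backups a simplified keep-last + keep-daily policy would retain.

    Hypothetical model: keep the `keep_last` newest backups, then the newest backup
    from each of the `keep_daily` most recent days that have backups.
    """
    newest_first = sorted(backups, reverse=True)
    kept = set(newest_first[:keep_last])
    days = set()
    for ts in newest_first:
        if len(days) == keep_daily:
            break
        if ts.date() not in days:
            days.add(ts.date())
            kept.add(ts)  # first hit per day is that day's newest backup
    return kept

# Synthetic schedule: one backup every 6 hours for 10 days.
start = datetime(2024, 1, 1)
backups = [start + timedelta(hours=6 * i) for i in range(40)]
kept = select_kept(backups)
prune = sorted(set(backups) - kept)
print(len(kept), len(prune))  # 9 31
```

Layering short-term (keep-last) and longer-term (keep-daily) rules keeps recent restore points dense while bounding storage growth on the backup target.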
Backups are a requirement. Redundancy is a strategy. ECC + ZFS is the trust layer that every other service in the fleet depends on.