The Architecture of the Alliance: Measure Twice, Cut Once
Category: Engineering / Strategy / Proxmox
Post 001 laid out the design philosophy. Now let's look at how it translates into hardware and provisioning.
Before a single Micro PC was powered on or a single ISO flashed, I architected the Alliance on paper. In my experience as a Senior IT Engineer, the most expensive mistake is building without a blueprint. A cluster isn't a collection of hardware. It's a balanced ecosystem designed for failure and built for scale.
The Alliance: Cluster Overview
PROXMOX VE CLUSTER (3 Nodes)
Corosync Quorum · Kernel 6.17.x-pve
┌────────────────────┬────────────────────┬────────────────────┐
│ NODE-A             │ NODE-B             │ NODE-C             │
│ Millennium Falcon  │ CR90 Corvette      │ Gozanti Cruiser    │
│ (FCM2250)          │ (QCM1255)          │ (OptiPlex 7050)    │
├────────────────────┼────────────────────┼────────────────────┤
│ Core Ultra 9       │ Ryzen 7 PRO        │ i7-7700            │
│ 64GB DDR5          │ 64GB DDR5 ECC      │ 32GB DDR4          │
│ 2TB NVMe Gen4      │ 4TB Storage        │ 512GB + 1TB SATA   │
│ RTX 4000 Ada 20GB  │                    │ 2.5GbE NIC (mod)   │
├────────────────────┼────────────────────┼────────────────────┤
│ AI/ML Compute      │ Data & Operations  │ Network & Security │
│                    │                    │                    │
│ Tantive-III VM     │ PostgreSQL         │ Wazuh SIEM         │
│ · Ollama LLM       │ n8n Automation     │ AdGuard DNS        │
│ · OpenWebUI        │ Authentik SSO      │ Nginx Proxy Mgr    │
│ · ComfyUI          │ InfluxDB           │ UptimeKuma         │
│ · AnythingLLM      │ Grafana            │                    │
│                    │ Vaultwarden        │                    │
│                    │ HomeAssistant      │                    │
│                    │ Homepage           │                    │
└────────────────────┴────────────────────┴────────────────────┘
1. The MFF Strategy: High-Density Efficiency
Micro Form Factor PCs are the backbone of the Alliance. Three units replicate an enterprise-grade cluster within a tiny physical footprint, running 25+ concurrent services on energy-efficient, bargain-sourced hardware.
Node A — Millennium Falcon · FCM2250
Role: AI/ML Compute
| Spec | Detail |
|---|---|
| CPU | Intel Core Ultra 9 |
| RAM | 64GB DDR5 |
| Storage | 2TB NVMe Gen4 |
| Special | RTX 4000 Ada 20GB (VFIO) |
Node B — CR90 Corvette · QCM1255
Role: Data & Operations
| Spec | Detail |
|---|---|
| CPU | AMD Ryzen 7 PRO |
| RAM | 64GB DDR5 ECC |
| Storage | 4TB ZFS |
| Special | ECC for data integrity |
Node C — Gozanti Cruiser · OptiPlex 7050
Role: Network & Security
| Spec | Detail |
|---|---|
| CPU | Intel i7-7700 |
| RAM | 32GB DDR4 |
| Storage | 512GB NVMe + 1TB SATA |
| Special | 2.5GbE NIC hardware mod |
Each chassis was selected for a specific workload profile. Posts 04, 05, and 06 cover each node's hardware rationale.
2. The Provisioning Protocol: Ventoy & Clean Slate
I use Ventoy as the primary deployment engine. Instead of flashing a fresh USB stick for every image, Ventoy carries a toolbox of ISOs (Proxmox VE, Debian, recovery utilities) on a single drive; new ISOs are simply copied onto it. Any node can be provisioned or re-imaged in minutes.
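That copy step is also the right moment to catch a corrupted download. A minimal sketch, assuming GNU coreutils and a Ventoy partition already mounted; the ISO filename and mount point are illustrative, not taken from this cluster:

```shell
# stage_iso ISO SUMFILE DEST
# Verify an ISO against the vendor's SHA256 list before dropping it onto
# the Ventoy partition mounted at DEST. Names here are illustrative.
stage_iso() {
    iso="$1"; sumfile="$2"; dest="$3"
    # --ignore-missing: the vendor sum file may list images we don't have
    sha256sum -c --ignore-missing "$sumfile" >/dev/null || {
        echo "checksum mismatch: refusing to stage $iso" >&2
        return 1
    }
    cp "$iso" "$dest/" && echo "staged: $iso"
}

# Typical use: stage_iso proxmox-ve_8.3-1.iso SHA256SUMS /mnt/ventoy
```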
The Clean Slate Rule: every node boots in UEFI mode with Secure Boot disabled in firmware. From the Ventoy boot menu, the Proxmox installer wipes the local NVMe entirely. Zero legacy interference, zero ghost files from previous installs.
This matters because I've rebuilt this cluster multiple times. Testing, breaking, starting fresh. Repeatable provisioning means rebuilds are minutes, not hours.
3. Fault Domain Isolation
The core design tenet. Specific roles on specific hardware. A failure in one domain doesn't cascade:
Millennium Falcon (Node-A): Pure Compute. One node, one VM (Tantive-III), one GPU. If the RTX 4000 Ada triggers a PCIe bus stall, the only casualty is AI inference.
CR90 Corvette (Node-B): Data & Operations. Every stateful service: Authentik-ChainCode (identity), Postgres-Archives (databases), n8n (automation), InfluxDB (metrics), Vaultwarden (credentials). All on ECC-protected ZFS.
Gozanti Cruiser (Node-C): Network & Security. Wazuh SIEM, AdGuard DNS, Nginx Proxy Manager, Uptime Kuma. These stay up during compute or data-tier maintenance. Security monitoring never goes dark.
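This role pinning can also be enforced in Proxmox itself. A sketch using an HA group (the post doesn't say whether HA is configured; QCM1255 is the only node name confirmed by the logs, and the group name is mine):

```
# /etc/pve/ha/groups.cfg (illustrative excerpt)
group: data-tier
        nodes QCM1255
        restricted 1
        nofailback 1
```

With `restricted 1`, HA-managed resources in the group run only on the listed node, so a stateful service can never drift onto the compute or network tier.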
This design was validated during the VFIO lockup on Node-A. The GPU froze the entire host. Identity, automation, SIEM, and observability on Nodes B and C continued without interruption.
4. Corosync & Quorum
The three nodes form a Proxmox VE cluster (kernel 6.17.x-pve), with Corosync handling membership and quorum. That gives real-time cluster health awareness, not just management convenience.
During the VFIO lockup, corosync on Node-B produced the first usable data point, since the crashed node's own local logs were lost:
Feb 09 07:55:07 QCM1255 corosync[1374]: [KNET] host: host: 1 has no active links
The 17-second delta between the last Telegraf metric and the corosync alert maps exactly to corosync's token timeout. Useful for calibrating monitoring thresholds.
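If that detection threshold ever needs tuning, the knob lives in the totem section of corosync.conf. An illustrative excerpt; the cluster name and values here are placeholders, not the Alliance's actual settings:

```
# /etc/pve/corosync.conf (excerpt; values illustrative)
totem {
  version: 2
  cluster_name: alliance
  token: 3000                              # ms of silence before token loss
  token_retransmits_before_loss_const: 10  # retries before declaring loss
}
```

Raising `token` makes the cluster more tolerant of brief stalls at the cost of slower failure detection, which is exactly the trade-off the 17-second delta exposes.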
Next: Post 003.1, The Fleet Manifest maps every service to its node with full resource allocation.