The Architecture of the Alliance: Measure Twice, Cut Once

Category: Engineering / Strategy / Proxmox


Post 001 laid out the design philosophy. Now let's look at how it translates into hardware and provisioning.

Before a single Micro PC was powered on or a single ISO flashed, I architected the Alliance on paper. In my experience as a Senior IT Engineer, the most expensive mistake is building without a blueprint. A cluster isn't a collection of hardware. It's a balanced ecosystem designed for failure and built for scale.


The Alliance: Cluster Overview

PROXMOX VE CLUSTER (3 Nodes)
Corosync Quorum · Kernel 6.17.x-pve

┌───────────────────┬───────────────────┬────────────────────┐
│      NODE-A       │      NODE-B       │       NODE-C       │
│ Millennium Falcon │   CR90 Corvette   │   Gozanti Cruiser  │
│     (FCM2250)     │     (QCM1255)     │   (OptiPlex 7050)  │
├───────────────────┼───────────────────┼────────────────────┤
│ Core Ultra 9      │ Ryzen 7 PRO       │ i7-7700            │
│ 64GB DDR5         │ 64GB DDR5 ECC     │ 32GB DDR4          │
│ 2TB NVMe Gen4     │ 4TB Storage       │ 512GB + 1TB SATA   │
│ RTX 4000 Ada 20GB │                   │ 2.5GbE NIC (mod)   │
├───────────────────┼───────────────────┼────────────────────┤
│ AI/ML Compute     │ Data & Operations │ Network & Security │
│                   │                   │                    │
│ Tantive-III VM    │ PostgreSQL        │ Wazuh SIEM         │
│ · Ollama LLM      │ n8n Automation    │ AdGuard DNS        │
│ · OpenWebUI       │ Authentik SSO     │ Nginx Proxy Mgr    │
│ · ComfyUI         │ InfluxDB          │ Uptime Kuma        │
│ · AnythingLLM     │ Grafana           │                    │
│                   │ Vaultwarden       │                    │
│                   │ HomeAssistant     │                    │
│                   │ Homepage          │                    │
└───────────────────┴───────────────────┴────────────────────┘

1. The MFF Strategy: High-Density Efficiency

Micro Form Factor PCs are the backbone of the Alliance. Three units replicate an enterprise-grade cluster within a tiny physical footprint, running 25+ concurrent services on energy-efficient, bargain-sourced hardware.

Node A — Millennium Falcon · FCM2250

Role: AI/ML Compute

Spec      Detail
CPU       Intel Core Ultra 9
RAM       64GB DDR5
Storage   2TB NVMe Gen4
Special   RTX 4000 Ada 20GB (VFIO)

Node B — CR90 Corvette · QCM1255

Role: Data & Operations

Spec      Detail
CPU       AMD Ryzen 7 PRO
RAM       64GB DDR5 ECC
Storage   4TB ZFS
Special   ECC for data integrity

Node C — Gozanti Cruiser · OptiPlex 7050

Role: Network & Security

Spec      Detail
CPU       Intel i7-7700
RAM       32GB DDR4
Storage   512GB NVMe + 1TB SATA
Special   2.5GbE NIC hardware mod

Each chassis was selected for a specific workload profile. Posts 04, 05, and 06 cover each node's hardware rationale.


2. The Provisioning Protocol: Ventoy & Clean Slate

I use Ventoy as the primary deployment engine. Instead of flashing a dedicated USB drive for each node, Ventoy carries a toolbox of ISOs (Proxmox VE, Debian, recovery utilities) on a single stick. ISOs are simply copied onto the drive, and any node can be provisioned or re-imaged in minutes.
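For reference, a minimal sketch of that workflow. /dev/sdX is a placeholder device path (confirm the real one with lsblk first, since the Ventoy installer wipes its target), so the commands are printed as a dry run rather than executed:

```shell
#!/bin/sh
# Ventoy workflow sketch. /dev/sdX is a placeholder -- verify the real
# device with `lsblk` before running, since Ventoy2Disk wipes the drive.
VENTOY_DEV=/dev/sdX
ISO_DIR=/media/ventoy          # mount point of the Ventoy data partition

# One-time flash of the stick (printed, not executed):
echo "sudo sh Ventoy2Disk.sh -i $VENTOY_DEV"

# Afterwards, adding or updating an ISO is a plain file copy:
echo "cp proxmox-ve_*.iso $ISO_DIR/"
```

That copy step is the whole appeal: refreshing the Proxmox ISO to a new release never requires re-flashing the stick.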

The Clean Slate Rule: every node boots in UEFI mode with Secure Boot disabled in firmware setup. From the Ventoy boot menu, the local NVMe is wiped entirely during Proxmox installation. Zero legacy interference, zero ghost files from previous installs.
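A quick pre-install check along these lines can confirm the firmware state before wiping anything. check_boot_mode is my own helper name, and the mokutil call assumes that package is present:

```shell
#!/bin/sh
# Report UEFI vs legacy boot by probing the EFI sysfs directory.
# (The argument lets the path be overridden; defaults to the real one.)
check_boot_mode() {
    if [ -d "${1:-/sys/firmware/efi}" ]; then
        echo "UEFI boot"
    else
        echo "legacy BIOS boot"
    fi
}
check_boot_mode

# Secure Boot state -- expect "SecureBoot disabled" before installing:
#   mokutil --sb-state
```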

This matters because I've rebuilt this cluster multiple times. Testing, breaking, starting fresh. Repeatable provisioning means rebuilds are minutes, not hours.


3. Fault Domain Isolation

The core design tenet. Specific roles on specific hardware. A failure in one domain doesn't cascade:

Millennium Falcon (Node-A): Pure Compute. One node, one VM (Tantive-III), one GPU. If the RTX 4000 Ada triggers a PCIe bus stall, the only casualty is AI inference.

CR90 Corvette (Node-B): Data & Operations. Every stateful service: Authentik-ChainCode (identity), Postgres-Archives (databases), n8n (automation), InfluxDB (metrics), Vaultwarden (credentials). All on ECC-protected ZFS.

Gozanti Cruiser (Node-C): Network & Security. Wazuh SIEM, AdGuard DNS, Nginx Proxy Manager, Uptime Kuma. These stay up during compute or data-tier maintenance. Security monitoring never goes dark.
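If HA is ever enabled, the same isolation can be enforced in software with restricted Proxmox HA groups, so a recovered workload never restarts outside its fault domain. The group and node names below are illustrative, not this cluster's real IDs, and the commands are printed rather than executed:

```shell
#!/bin/sh
# Sketch: pin each tier to its node via restricted HA groups (dry run).
# Group and node names here are illustrative placeholders.
for pair in "compute:node-a" "data:node-b" "netsec:node-c"; do
    group=${pair%%:*}
    node=${pair##*:}
    echo "ha-manager groupadd ${group}-tier --nodes ${node} --restricted 1"
done
```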

This design was validated during the VFIO lockup on Node-A. The GPU froze the entire host. Identity, automation, SIEM, and observability on Nodes B and C continued without interruption.


4. Corosync & Quorum

The three nodes form a Proxmox VE cluster (Kernel 6.17.x-pve) linked by Corosync, which maintains quorum. This provides real-time cluster health awareness, not just management convenience.
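The arithmetic behind the three-node choice: quorum requires a strict majority of votes, so three nodes survive exactly one failure while two nodes survive none.

```shell
#!/bin/sh
# Quorum for an N-vote Corosync cluster is floor(N/2) + 1.
nodes=3
quorum=$(( nodes / 2 + 1 ))
tolerated=$(( nodes - quorum ))
echo "quorum: $quorum of $nodes votes (tolerates $tolerated failure)"
```

On a live cluster, pvecm status reports the same Expected votes and Quorum figures.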

During the VFIO lockup, corosync on Node-B produced the first data point, since all local logs on the crashed node were lost:

Feb 09 07:55:07 QCM1255 corosync[1374]: [KNET] host: host: 1 has no active links

The 17-second delta between the last Telegraf metric and the corosync alert maps exactly to corosync's token timeout. Useful for calibrating monitoring thresholds.
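The timeout itself is tunable in the totem section of corosync.conf (on Proxmox, the authoritative copy lives at /etc/pve/corosync.conf). A sketch of the relevant knobs; the values and cluster name here are illustrative, not this cluster's actual settings:

```
totem {
  version: 2
  cluster_name: alliance

  # Milliseconds without the token before a node is declared lost.
  token: 3000

  # Extra milliseconds added per node beyond the second.
  token_coefficient: 650
}
```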


Next: Post 003.1, The Fleet Manifest maps every service to its node with full resource allocation.