Node-A: The Millennium Falcon, High-Performance Compute & AI
Category: Hardware / Artificial Intelligence / Virtualization
At the end of Post 3.2 I promised we'd start with the node that crashed.
The Millennium Falcon (FCM2250) is the Alliance's heavy hitter. Fast, heavily armed (GPU), and needs modifications that keep you on your toes (passthrough config). In the current architecture it has one job: GPU-accelerated AI/ML inference. All operational services have been moved off to ensure zero contention with the GPU workload.
1. Core Specs
| Component | Spec |
|---|---|
| Chassis | FCM2250 (Micro Form Factor) |
| Processor | Intel Core Ultra 9 (hybrid P-core/E-core architecture) |
| Memory | 64GB DDR5 |
| Storage | 2TB NVMe Gen4 |
| Accelerator | NVIDIA RTX 4000 SFF Ada (20GB ECC VRAM) |
2. GPU Passthrough: VFIO and the Tantive-III
The defining feature of Node-A is PCIe passthrough via VFIO/IOMMU. Proxmox hands the RTX 4000 SFF Ada directly to a dedicated VM: Tantive-III (the diplomatic consular ship, focused on translating complex data into human language).
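For readers who want to see what that hand-off looks like in practice, here is an illustrative fragment of the VM definition. The VMID and PCI address are hypothetical placeholders, not the Falcon's actual values:

```
# /etc/pve/qemu-server/<vmid>.conf (illustrative fragment)
machine: q35                   # PCIe-capable machine type, required for passthrough
bios: ovmf                     # UEFI firmware, typical for modern GPU passthrough
hostpci0: 0000:01:00,pcie=1    # hand the whole GPU (all functions) to the guest
```

With `hostpci0` in place, the host's drivers never touch the card; the guest owns it end to end, which is exactly what makes host-side diagnostics so thin when the card misbehaves.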
The RTX 4000 SFF Ada packs 20GB of ECC VRAM into a single-slot, low-power card: enough to serve large quantized models at near-native performance inside an MFF chassis.
Benchmarks:
- 50 tok/s on Llama3:8b
- 500+ document RAG corpus
- <3s query latency with retrieval
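To put the 50 tok/s figure in context, a quick back-of-envelope sketch of decode time versus answer length (the 50 tok/s is the measured number above; the answer length is illustrative):

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time spent decoding num_tokens at a steady token rate."""
    return num_tokens / tokens_per_second

# A ~150-token answer at the measured 50 tok/s costs about 3 seconds
# of decode time on top of whatever retrieval adds.
print(generation_seconds(150, 50.0))
```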
Tantive-III's crew:
| Service | Codename | Role |
|---|---|---|
| Ollama | | LLM inference engine, multiple model sizes across 20GB VRAM |
| OpenWebUI | | Browser-based chat interface for Ollama |
| ComfyUI | | Node-based Stable Diffusion workflows via CUDA |
| AnythingLLM | AnythingLLM-C3PO | RAG platform. Fluent in six million forms of communication. Manages the 500+ doc corpus with sub-3s retrieval. |
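As a sketch of how these services get driven programmatically, here is a minimal call against Ollama's HTTP API using only the standard library. The hostname is hypothetical; `llama3:8b` is the model from the benchmarks above:

```python
import json
import urllib.request

def build_generate_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def query_ollama(prompt: str, model: str = "llama3:8b",
                 host: str = "http://tantive-iii:11434") -> str:
    """POST a prompt to Ollama and return the generated text."""
    body = json.dumps(build_generate_payload(model, prompt)).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Example (requires a reachable Ollama instance):
# query_ollama("Summarize the VFIO lockup incident in one sentence.")
```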
3. The VFIO Lockup
On February 9, 2026, Node-A hard-locked. Fans spinning, LEDs lit, completely unresponsive to network or console input. Required a physical hard reset.
Full writeup: VFIO Lockup Forensics
The problem. The node ran log2ram, which holds system logs in RAM. Instantaneous failure prevented a disk sync. Nine days of journal data, irrecoverably lost. No kernel panic, no pstore dump, no dmesg evidence.
The pivot. Used the TIG Stack on the CR90 Corvette (Node-B), covered in Post 3.2, to reconstruct a second-by-second timeline from externally-stored Telegraf metrics. System was idle at the moment of failure. Every software cause eliminated.
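The forensic idea reduces to something simple: with metrics stored off-node, the last sample received pins down the moment of failure even when the node's own logs are gone. A sketch of that logic, with invented data points for illustration:

```python
from datetime import datetime, timedelta

def last_heartbeat(samples: list[datetime], interval: timedelta) -> datetime:
    """Return the last metric timestamp before reporting stopped.

    Telegraf pushes on a fixed interval, so any gap wider than that
    interval marks the moment the host locked up.
    """
    samples = sorted(samples)
    for prev, cur in zip(samples, samples[1:]):
        if cur - prev > interval:
            return prev
    return samples[-1]
```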
Root cause. PCIe bus stall from the NVIDIA GPU under VFIO passthrough. The GPU hangs, the PCIe bus locks, the entire host freezes without triggering any kernel panic or MCE because the CPU can't execute the crash handler while the bus is stalled.
Mitigations applied:
```
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_aspm=off pci=noaer"
```
| Parameter | Purpose |
|---|---|
| `pcie_aspm=off` | Disables PCIe Active State Power Management, preventing GPU power-state transitions that trigger bus stalls. |
| `pci=noaer` | Disables Advanced Error Reporting handling, preventing AER recovery attempts from stalling the bus further. |
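A small self-check worth pairing with a change like this, verifying the flags actually made it into the running kernel's command line. The helper and its name are my own sketch, not part of the node's tooling:

```python
# Mitigation flags from the GRUB line above
REQUIRED = {"intel_iommu=on", "iommu=pt", "pcie_aspm=off", "pci=noaer"}

def missing_params(cmdline: str) -> set[str]:
    """Return the required kernel parameters absent from a cmdline string."""
    return REQUIRED - set(cmdline.split())

# On the node itself, after update-grub and a reboot:
# missing_params(open("/proc/cmdline").read())  # empty set means all applied
```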
log2ram disabled. Future crashes preserve journal data on disk.
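With log2ram gone, what actually keeps journals on disk is persistent journald storage. A minimal setting, assuming stock systemd defaults otherwise:

```
# /etc/systemd/journald.conf
[Journal]
Storage=persistent   # write journals to /var/log/journal instead of RAM
```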
Node-A has been stable since.
4. Why a Dedicated AI Node
In previous iterations, Node-A also hosted n8n, Home Assistant, and Uptime Kuma alongside the GPU workload. The VFIO lockup validated stripping it down to a single purpose.
If the GPU stalls again, the only casualty is AI inference. Authentik-ChainCode on Node-B keeps authenticating. Wazuh on Node-C keeps ingesting logs. n8n keeps processing alerts. Admiral Ackbar keeps posting to Discord. Fault-domain isolation doing its job: not in theory, but in a real incident.
5. Future-Proofing
The Ultra 9 and DDR5 give this node overhead to absorb the next several years of model size increases without a chassis swap. The "Measure Twice, Cut Once" philosophy in hardware form.
When the mission requires raw power, the Falcon is the first ship out of the hangar.