Node-A: The Millennium Falcon, High-Performance Compute & AI

Category: Hardware / Artificial Intelligence / Virtualization


At the end of Post 3.2 I promised we'd start with the node that crashed.

The Millennium Falcon (FCM2250) is the Alliance's heavy hitter. Fast, heavily armed (GPU), and needs modifications that keep you on your toes (passthrough config). In the current architecture it has one job: GPU-accelerated AI/ML inference. All operational services have been moved off to ensure zero contention with the GPU workload.


1. Core Specs

| Component | Spec |
| --- | --- |
| Chassis | FCM2250 (Micro Form Factor) |
| Processor | Intel Core Ultra 9 (hybrid P-core/E-core architecture) |
| Memory | 64GB DDR5 |
| Storage | 2TB NVMe Gen4 |
| Accelerator | NVIDIA RTX 4000 SFF Ada (20GB ECC VRAM) |

2. GPU Passthrough: VFIO and the Tantive-III

The defining feature of Node-A is PCIe passthrough via VFIO/IOMMU. Proxmox hands the RTX 4000 SFF Ada directly to a dedicated VM: Tantive-III (the diplomatic consular ship, focused on translating complex data into human language).

20GB of ECC VRAM in a single-slot, low-power form factor. Enough to run 70B parameter models with near-native performance inside an MFF chassis.
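The passthrough wiring on a Proxmox host looks roughly like this. A sketch only: the vendor:device IDs, PCI address, and VM ID below are illustrative placeholders, not the Falcon's actual values.

```shell
# Load the VFIO modules at boot
cat >> /etc/modules <<'EOF'
vfio
vfio_iommu_type1
vfio_pci
EOF

# Bind the GPU (and its audio function) to vfio-pci instead of the host
# driver. Find your real vendor:device IDs with: lspci -nn | grep -i nvidia
echo "options vfio-pci ids=10de:1234,10de:5678" > /etc/modprobe.d/vfio.conf

# Keep host drivers off the card entirely
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
update-initramfs -u

# Hand the whole device to the Tantive-III VM (VM ID 100 is hypothetical)
qm set 100 -hostpci0 0000:01:00,pcie=1,x-vga=1
```

After a reboot, the hypervisor never initializes the GPU; the guest owns it end to end.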

Benchmarks:

  • 50 tok/s on Llama3:8b
  • 500+ document RAG corpus
  • <3s query latency with retrieval

Tantive-III's crew:

| Service | Codename | Role |
| --- | --- | --- |
| Ollama | | LLM inference engine, multiple model sizes across 20GB VRAM |
| OpenWebUI | | Browser-based chat interface for Ollama |
| ComfyUI | | Node-based Stable Diffusion workflows via CUDA |
| AnythingLLM | AnythingLLM-C3PO | RAG platform. Fluent in six million forms of communication. Manages the 500+ doc corpus with sub-3s retrieval. |
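For orientation, Ollama exposes a local HTTP API that the other services build on; a one-off query against it looks roughly like this (model name and prompt are just examples):

```shell
# Pull a model into Ollama's local store
ollama pull llama3:8b

# Single non-streaming generation against the REST API
# (Ollama listens on localhost:11434 by default)
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Summarize VFIO passthrough in one sentence.",
  "stream": false
}'
```

OpenWebUI, ComfyUI, and AnythingLLM all ride this same endpoint from their own containers, which is why a single GPU stall takes the whole crew down at once.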

3. The VFIO Lockup

On February 9, 2026, Node-A hard-locked. Fans spinning, LEDs lit, completely unresponsive to network or console input. Required a physical hard reset.

Full writeup: VFIO Lockup Forensics

The problem. The node ran log2ram, which holds system logs in RAM and syncs them to disk periodically. The failure was instantaneous, so no sync ever happened: nine days of journal data, irrecoverably lost. No kernel panic, no pstore dump, no dmesg evidence.

The pivot. The TIG Stack on the CR90 Corvette (Node-B), covered in Post 3.2, made it possible to reconstruct a second-by-second timeline from externally stored Telegraf metrics. The system was idle at the moment of failure; every software cause was eliminated.

Root cause. A PCIe bus stall from the NVIDIA GPU under VFIO passthrough. The GPU hangs, the PCIe bus locks, and the entire host freezes without triggering a kernel panic or Machine Check Exception, because the CPU cannot execute the crash handler while the bus is stalled.
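Part of why one device can freeze an entire host: devices share IOMMU groups, and the group is the smallest unit that can be isolated. A common way to confirm the IOMMU is active and see what sits alongside the GPU:

```shell
# Confirm the IOMMU came up at boot
dmesg | grep -i -e DMAR -e IOMMU

# List every PCI device by IOMMU group; the GPU should sit in a clean,
# isolated group (ideally just the card and its audio function)
for d in /sys/kernel/iommu_groups/*/devices/*; do
  g=${d#/sys/kernel/iommu_groups/}; g=${g%%/*}
  printf 'IOMMU group %s: ' "$g"
  lspci -nns "${d##*/}"
done
```

A clean group does not prevent the GPU itself from stalling the bus, which is why the kernel-parameter mitigations below were still needed.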

Mitigations applied:

```shell
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_aspm=off pci=noaer"
```

| Parameter | Purpose |
| --- | --- |
| `pcie_aspm=off` | Disables PCIe Active State Power Management. Prevents GPU power state transitions that trigger bus stalls. |
| `pci=noaer` | Disables Advanced Error Reporting handling. Prevents AER recovery attempts from stalling the bus further. |

log2ram disabled, so a future crash will leave its journal data intact on disk.
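Applying both mitigations on a Debian-based Proxmox host comes down to a few commands:

```shell
# Regenerate the GRUB config after editing
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub
update-grub

# Stop holding logs in RAM; journald writes straight to disk again
systemctl disable --now log2ram

# New kernel parameters take effect on the next boot
reboot
```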

Node-A has been stable since.


4. Why a Dedicated AI Node

In previous iterations, Node-A also hosted n8n, Home Assistant, and Uptime Kuma alongside the GPU workload. The VFIO lockup validated stripping it down to a single purpose.

If the GPU stalls again, the only casualty is AI inference. Authentik-ChainCode on Node-B keeps authenticating. Wazuh on Node-C keeps ingesting logs. n8n keeps processing alerts. Admiral Ackbar keeps posting to Discord. Fault domain isolation doing its job: not in theory, but in a real incident.


5. Future-Proofing

The Ultra 9 and DDR5 give this node overhead to absorb the next several years of model size increases without a chassis swap. The "Measure Twice, Cut Once" philosophy in hardware form.

When the mission requires raw power, the Falcon is the first ship out of the hangar.

Next: Post 05, Node-B: The CR90 Corvette