Project Post 002
How I built a complete security monitoring pipeline across a 3-node Proxmox homelab, from Wazuh agent enrollment to real-time Discord alerts, and the silent HTTP 400 bug that almost killed the whole thing.
The Alliance Fleet runs 25+ services across three Proxmox nodes, four VLANs, and a mix of VMs and LXC containers. For months, security monitoring consisted of me occasionally SSH-ing into a host and grepping through journals when something felt off. No centralized logging. No file integrity monitoring. No alerting. Individual hosts logged locally, and some, like Node-A running log2ram, stored journals entirely in RAM. An unexpected reboot meant those logs were gone.
The VFIO lockup incident on Node-A in February 2026 made the cost of that gap concrete. The node suffered a complete hard lockup with zero local crash artifacts: no kernel panic, no pstore dump, no journal entries. The only reason I could reconstruct what happened was because externally-stored InfluxDB telemetry gave me a second-by-second performance timeline. If that Telegraf pipeline hadn't existed, the incident would have been completely unrecoverable.
That was the wake-up call. I needed a SIEM, and I needed it routed to where I actually work: Discord.
The Pipeline Architecture
The end-to-end flow looks like this:
Wazuh Agents (×10)
  └─► Wazuh Manager (LXC 110 · 192.168.20.30 · VLAN 20)
      └─► Rule engine evaluates events against 3,000+ built-in rules
          └─► Alerts at level 5+ trigger the integration module
              └─► slack.py builds a Slack-compatible JSON payload
                  └─► Discord /slack webhook endpoint
                      └─► #alert-triage with severity color coding
Wazuh runs as an all-in-one deployment (manager, indexer, and dashboard) on LXC 110 on Node-C (Gozanti Cruiser). Agents on every host in the fleet ship security events back to the manager over port 1514, with enrollment on 1515. The manager evaluates those events against its rule engine, and anything that hits level 5 or above gets routed through the Slack-compatible integration module to Discord.
Alerts land in #alert-triage with severity color mapping: green for level 4 and below, yellow for 5–7, red for 8+. Each embed includes MITRE ATT&CK technique tags when available, so I can see at a glance whether I'm looking at a brute-force attempt or a privilege escalation.
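The mapping is simple to express in code. A minimal sketch of the level-to-color logic (the function name and hex values are illustrative, not Wazuh's actual identifiers, but the thresholds mirror the scheme above):

```python
def severity_color(level: int) -> str:
    """Map a Wazuh rule level to an embed color for #alert-triage.

    Mirrors the channel's scheme: green for level 4 and below,
    yellow for 5-7, red for 8 and above.
    """
    if level <= 4:
        return "#36a64f"   # green: informational
    if level <= 7:
        return "#ffcc00"   # yellow: worth a look
    return "#ff0000"       # red: investigate now

# A level 8 alert (e.g. new user created) renders red
print(severity_color(8))  # → #ff0000
```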
Why Wazuh
Wazuh won on three criteria that disqualified everything else at homelab scale.
Over Splunk: cost. Splunk's free tier caps at 500MB/day with no alerting. The Alliance Fleet generates more than that during active scan periods. Splunk is the industry standard, but its per-GB pricing is designed for enterprises with security budgets.
Over ELK Stack: operational overhead. ELK is more flexible for custom log parsing, but Wazuh ships with 3,000+ detection rules out of the box and native file integrity monitoring. For a homelab where I'm not building a custom SIEM from scratch, Wazuh delivers more security value per hour invested.
Over cloud SIEM (Sentinel, Datadog): no cloud dependency. The entire detection and response pipeline runs on-prem. An internet outage shouldn't disable threat response, and homelab telemetry shouldn't leave the cluster.
What tipped the scale: Wazuh's Slack-compatible webhook integration slots directly into the Discord pipeline that already powers fleet operations. Detection to alert in under three seconds, landing in a channel I actually monitor.
The Seven-Hour Installation Lesson
The manual install path was a trap. Seven hours across two attempts, installing each component individually on Debian 13, produced a partially functional stack that broke on restart. The indexer worked. The manager sort-of worked. The dashboard never got installed. Debian 13 (Trixie) wasn't officially supported, and the dependency tree had just enough mismatches to make the manual path unreliable.
The fix was humbling: destroy the broken container, create a fresh one, run the community automation script. Twenty minutes later, every component was up. The script pins package versions, handles certificate generation, configures inter-component auth, and tests health at each step.
The manual attempts weren't wasted. They taught me how the manager, indexer, and dashboard communicate via mutual TLS, where the config files live, and what breaks when certificates don't match. But the goal was a working SIEM, not a PhD in Wazuh packaging. Know when to stop fighting and use the script.
Enrolling Ten Agents
Agent enrollment used a reusable deployment script saved on OptiPlex7050: SSH into the target host, run the script with a hostname argument, and it installs the amd64 DEB agent package (v4.14.2), sets the manager IP, and enables and starts the service.
The fleet now has ten active agents across all three Proxmox hypervisors, the AI workload VM (Tantive-III), the identity platform (Home One), the automation engine (Phoenix-Nest), the Discord bot host (Stinger Mantis), the DMZ reverse proxy (NPM), InfluxDB, and AdGuard DNS. Grafana remains the only pending enrollment.
Two gotchas worth documenting for minimal LXC containers. First, lsb-release is often missing and needs to be installed before the agent package will configure properly. Second, if the agent shows a configuration error on first start, the manager address likely wasn't set correctly in ossec.conf; a quick sed replacement and a service restart fix it.
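The sed fix amounts to rewriting the `<address>` element inside the agent's configuration. A hedged Python equivalent of that one-liner (the function name is mine; the element layout matches a stock agent install, but verify against your own ossec.conf):

```python
import re

def set_manager_address(ossec_conf: str, manager_ip: str) -> str:
    """Replace the first manager <address> value in ossec.conf text."""
    return re.sub(r"<address>[^<]*</address>",
                  f"<address>{manager_ip}</address>",
                  ossec_conf, count=1)

# Simulated fragment of an agent config with an unset manager address
conf = "<client><server><address>MANAGER_IP</address></server></client>"
print(set_manager_address(conf, "192.168.20.30"))
```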
Configuring the Discord Integration
The integration block in the manager's ossec.conf is deceptively simple:
<integration>
<name>slack</name>
<hook_url>DISCORD_WEBHOOK_URL/slack</hook_url>
<level>5</level>
<alert_format>json</alert_format>
</integration>
The webhook URL must end with /slack to use Discord's Slack-compatible endpoint. The integration module (integratord) starts with the manager and processes alerts through slack.py, which builds a Slack-compatible JSON payload using the attachments format.
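For context, the payload slack.py emits uses Slack's legacy attachments format, which Discord's /slack endpoint accepts. A hedged sketch of that structure (field choices are illustrative; the exact fields Wazuh's slack.py emits vary by version):

```python
import json

def build_payload(alert: dict) -> str:
    """Build a Slack-compatible attachments payload from a Wazuh alert dict.

    Sketch of the shape slack.py produces, not a verbatim copy of it.
    """
    attachment = {
        "color": "#ff0000" if alert["rule"]["level"] >= 8 else "warning",
        "pretext": "WAZUH Alert",
        "title": alert["rule"]["description"],
        "text": alert["full_log"],
        "fields": [
            {"title": "Agent", "value": alert["agent"]["name"]},
            {"title": "Rule ID", "value": str(alert["rule"]["id"])},
            {"title": "Level", "value": str(alert["rule"]["level"])},
        ],
    }
    return json.dumps({"attachments": [attachment]})

# Simulated level 8 alert (rule 5902, new user added)
alert = {
    "rule": {"level": 8, "id": 5902, "description": "New user added to the system"},
    "agent": {"name": "tantive-iii"},
    "full_log": "new user: name=wazuh_test",
}
print(build_payload(alert))
```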
One critical lesson learned here: do not add a <group> tag. Wazuh's internal group names (like syslog or authentication_failed) don't match what you'd expect, so adding a group filter silently drops every alert. During initial troubleshooting, integratord was running, alerts were firing at level 7+, but nothing reached Discord. The group filter was rejecting everything with no error logged. Removing it fixed the flow immediately.
The Silent HTTP 400: The slack.py Bug
This was the most frustrating issue in the entire deployment, and the most instructive.
After getting the integration block configured and integratord running, alerts still weren't reaching Discord. No errors in the integration log. No indication anything was wrong. The manager was generating alerts, the integration module was picking them up, and then... nothing. Silence.
The breakthrough came from running slack.py manually with debug output against a test alert file. The response came back: HTTP 400.
The root cause was a single line in /var/ossec/integrations/slack.py:
msg['ts'] = alert['id']
This passes the Wazuh alert ID (something like 1773279793.212108) as a Slack timestamp field. Real Slack accepts this without complaint. Discord's /slack compatibility endpoint is stricter about the ts format and rejects the entire payload. Silently. No error in Wazuh's logs unless you manually enable debug mode.
The fix was one line: comment out msg['ts'] = alert['id']. Restart the manager. Alerts started flowing immediately: rootcheck events, user creation (rule 5902, level 8), PAM failures, SSH events, all rendering in #alert-triage with proper severity colors.
I backed up the original at slack.py.bak before patching. This is the kind of bug that costs hours to find and seconds to fix, and it's worth documenting because anyone running Wazuh-to-Discord will hit it.
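The patch can also be expressed defensively: strip the field before posting rather than relying on a commented-out line surviving upgrades. A minimal sketch (the helper name is mine, not part of Wazuh's slack.py):

```python
def strip_slack_ts(msg: dict) -> dict:
    """Remove the 'ts' field that Discord's /slack endpoint rejects.

    Wazuh's slack.py sets msg['ts'] = alert['id'] (e.g. '1773279793.212108');
    real Slack tolerates this, but Discord answers HTTP 400 and silently
    drops the whole payload.
    """
    msg.pop("ts", None)   # safe whether or not the field is present
    return msg

msg = {"attachments": [], "ts": "1773279793.212108"}
print(strip_slack_ts(msg))  # → {'attachments': []}
```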
Testing the Pipeline
The quickest way to force a visible alert:
sudo useradd wazuh_test && sudo userdel wazuh_test
This fires rule 5902, "New user added to the system," at level 8, well above the level 5 threshold. The alert should appear in Discord within 30 seconds.
For full pipeline validation, I watch three things simultaneously: the alerts log (tail -f /var/ossec/logs/alerts/alerts.json), the integration log (tail -f /var/ossec/logs/integrations.log), and the Discord channel. If alerts appear in the log but not in Discord, it's either the <group> tag issue or the slack.py bug. If they don't appear in the alerts log at all, the agent isn't shipping events or the rule isn't triggering.
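That first check can be scripted: alerts.json holds one JSON object per line, so a few lines of Python can report which alerts cleared the integration threshold. A minimal sketch, assuming the stock alert schema:

```python
import json

def alerts_over_threshold(lines, threshold=5):
    """Yield (rule_id, level, description) for alerts at or above threshold.

    Each line of /var/ossec/logs/alerts/alerts.json is one JSON alert.
    """
    for line in lines:
        try:
            alert = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial writes from an in-progress flush
        rule = alert.get("rule", {})
        if rule.get("level", 0) >= threshold:
            yield rule.get("id"), rule.get("level"), rule.get("description")

# Simulated log lines: one alert above the threshold, one below
sample = [
    '{"rule": {"id": "5902", "level": 8, "description": "New user added to the system"}}',
    '{"rule": {"id": "5501", "level": 3, "description": "PAM: Login session opened"}}',
]
for hit in alerts_over_threshold(sample):
    print(hit)  # → ('5902', 8, 'New user added to the system')
```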
Managing the Noise
Rootcheck false positives are the primary noise source. LXC containers on Proxmox expose /dev/.lxc/* paths that Wazuh's rootcheck module flags as suspicious hidden files. Every scan, every container, every time. The AdGuard agent alone fired rule 510 fifty-five times on the same benign LXC runtime file. Debian 13 (Trixie) also triggers generic trojaned-binary signatures on legitimate system utilities like /bin/chsh and /bin/passwd.
These require custom suppression rules in local_rules.xml to silence without losing real detections. The alternative (raising the integration threshold to level 7 or restricting to specific rule IDs) works as a stopgap but trades coverage for quiet.
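A suppression rule for the /dev/.lxc noise might look like the following in local_rules.xml. The rule ID and match string here are illustrative; verify the pattern against your actual alert text before deploying, since an over-broad match silences real detections:

```xml
<group name="local,rootcheck,">
  <!-- Downgrade rootcheck hits on benign LXC runtime paths to level 0 (no alert) -->
  <rule id="100510" level="0">
    <if_sid>510</if_sid>
    <match>/dev/.lxc</match>
    <description>Benign Proxmox LXC runtime file flagged by rootcheck.</description>
  </rule>
</group>
```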
Alert noise management is an ongoing process, not a one-time configuration. Every new agent enrolled and every OS update can introduce new false positive patterns. The goal isn't zero noise; it's a signal-to-noise ratio where every Discord notification is worth reading.
Tradeoffs I'm Living With
Node-C is a single point of failure. The Wazuh Manager on LXC 110 has no clustering, no hot standby. If Node-C goes down, all security monitoring goes dark. Acceptable at homelab scale, but it means the security stack has the same HA gap as the services it's protecting.
The level 5 threshold is a judgment call. Level 3–4 events (informational but potentially relevant) are only visible in the Wazuh dashboard, which realistically gets checked less often than Discord. Lowering the threshold would increase noise; raising it would miss early warning signals.
Wazuh doesn't do behavioral baselining. Every detection rule is hand-written. No ML-driven correlation, no anomaly detection. It catches what you tell it to catch. For a homelab with predictable traffic patterns, this is manageable. At enterprise scale, it's a limitation.
The community is smaller. Troubleshooting edge cases sometimes means reading source code rather than finding a Stack Overflow answer. The slack.py ts field bug is a perfect example: nowhere in the documentation, nowhere in the forums, discovered only through manual debugging.
What This Enables
With ten agents reporting, custom suppression rules tuning the noise floor, and alerts flowing to a channel I check throughout the day, the Alliance Fleet went from zero security visibility to fleet-wide detection coverage. Failed SSH attempts, file integrity changes on critical configs, privilege escalation, new user creation. Events that were previously invisible now generate actionable alerts within seconds.
The pipeline also serves as the foundation for the next phase: 57 custom detection rules organized into a five-phase deployment plan covering SSH hardening, identity platform monitoring, network segmentation validation, and more. Each phase builds on this base infrastructure: agents enrolled, alerts flowing, Discord integration stable.
The seven-hour install failure, the silent group filter bug, the slack.py timestamp rejection. Each one was a wall. But the result is a security monitoring pipeline that runs 24/7, costs nothing beyond the hardware it sits on, and tells me what's happening across the fleet before I have to go looking.
This post is part of the Alliance Fleet infrastructure series on Holocron Labs. It maps to blog posts 008 (Wazuh deployment script), 017 (Wazuh → Discord integration), and 030 (false positive reduction). The full Wazuh project documentation, including PROBLEM, TRADEOFFS, and IMPLEMENTATION files, lives in the security-monitoring GitHub repo.