Post 019


Why Uptime Kuma

The TIG stack (Post 017) tells you how healthy your infrastructure is. Uptime Kuma tells you whether your services are actually reachable. Different questions, different tools.

Uptime Kuma runs lightweight HTTP/TCP/ping checks against every service endpoint and fires a Discord webhook as soon as a check fails. It's the canary - fast, dumb, and reliable.


Deployment

Uptime Kuma runs on Node-C (Gozanti Cruiser) as a Docker container:

Host:    Node-C (OptiPlex 7050)
IP:      192.168.20.61
Port:    3001
Access:  https://uptime.tima.dev (via NPM)

docker run -d \
  --name uptime-kuma \
  -p 3001:3001 \
  -v uptime-kuma:/app/data \
  --restart unless-stopped \
  louislam/uptime-kuma:1

Monitor Configuration

Every service in the Alliance Fleet gets a monitor. The configuration follows a pattern:

HTTP Monitors (Web Services)

Service      URL                             Interval  Expected Status
Grafana      http://192.168.20.40:3000       60s       200
Authentik    http://192.168.20.10:9000       60s       200/302
Portainer    https://192.168.20.10:9443      60s       200
n8n          http://192.168.20.50:5678       60s       200
Vaultwarden  http://192.168.20.51:80         60s       200
OpenWebUI    http://192.168.20.20:3000       60s       200
Homepage     http://192.168.20.60:3000       60s       200
Ghost Blog   https://holocron-labs.tima.dev  300s      200
Portfolio    https://tima.dev                300s      200
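
To make the "Expected" column concrete, here is a minimal Python sketch of what an HTTP check amounts to. It is not Uptime Kuma's implementation; the function names (parse_expected, fetch_status, is_up) are hypothetical, and the 10-second timeout is an assumption.

```python
import urllib.error
import urllib.request
from typing import Optional, Set


def parse_expected(spec: str) -> Set[int]:
    """Turn a table entry such as "200/302" into {200, 302}."""
    return {int(code) for code in spec.split("/")}


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses instead of silently following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code, or None if no response arrived."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 3xx/4xx/5xx still carry a status code
    except OSError:
        return None  # DNS failure, refused connection, TLS error, timeout


def is_up(status: Optional[int], expected: Set[int]) -> bool:
    """A check passes only if a response arrived with an accepted code."""
    return status is not None and status in expected
```

For the Authentik row, is_up(fetch_status("http://192.168.20.10:9000"), parse_expected("200/302")) would pass on either a 200 or a redirect to the login flow. An HTTPS endpoint with a self-signed certificate would come back as None in this sketch unless certificate verification is configured separately.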

TCP Monitors (Infrastructure)

Service        Host           Port   Interval
InfluxDB       192.168.20.41  8086   60s
PostgreSQL     192.168.20.10  5432   60s
Redis          192.168.20.10  6379   60s
Wazuh Manager  192.168.20.30  1514   60s
Wazuh API      192.168.20.30  55000  60s
Ollama         192.168.20.20  11434  60s
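
A TCP monitor only asks whether something accepts a connection on the port. The sketch below shows the idea in Python, under the assumption of a 5-second timeout; tcp_port_open is a hypothetical helper, not part of Uptime Kuma.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP handshake with host:port completes in time."""
    try:
        # create_connection resolves the host, connects, and applies the timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts
        return False
```

For example, tcp_port_open("192.168.20.41", 8086) corresponds to the InfluxDB row. A passing check says the port is listening, not that the database behind it is answering queries.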

Ping Monitors (Hosts)

Host               IP            Interval
Node-A (Falcon)    192.168.1.10  60s
Node-B (Corvette)  192.168.1.11  60s
Node-C (Gozanti)   192.168.1.12  60s
UDM Pro            192.168.1.1   60s
AdGuard            192.168.1.4   60s

Discord Webhook Integration - Admiral Ackbar

Uptime Kuma sends alerts to the #uptime-beacon channel in the Alliance Fleet Discord server via webhook. The bot identity is Admiral Ackbar - consistent with the alert bot used by n8n and Wazuh.

Webhook Setup

In Discord: Server Settings → Integrations → Webhooks → New Webhook

Name:    Admiral Ackbar
Channel: #uptime-beacon

Copy the webhook URL, then in Uptime Kuma → Settings → Notifications → Setup Notification:

Type:             Discord
Webhook URL:      (paste Discord webhook URL)
Bot Display Name: Admiral Ackbar
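
Before wiring the webhook into Uptime Kuma, it can help to confirm the URL works on its own. The Python sketch below builds a POST with the "content" and "username" fields that Discord's webhook endpoint accepts. The helper names, the User-Agent string, and the placeholder URL in the usage note are assumptions for illustration.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, content: str,
                          username: str = "Admiral Ackbar") -> urllib.request.Request:
    """Build (but do not send) a Discord webhook POST."""
    payload = {"username": username, "content": content}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: an explicit User-Agent, in case the default
            # Python one is rejected upstream.
            "User-Agent": "uptime-webhook-test/0.1",
        },
        method="POST",
    )


def send_test_message(webhook_url: str, content: str) -> int:
    """Send the message and return the HTTP status Discord responds with."""
    req = build_webhook_request(webhook_url, content)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Calling send_test_message with the copied webhook URL (of the form https://discord.com/api/webhooks/<id>/<token>) and a short test string should make a message appear in #uptime-beacon under the Admiral Ackbar name.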

Alert Format

When a service goes down, Admiral Ackbar posts:

🔴 [DOWN] Grafana - http://192.168.20.40:3000
Time: 2026-02-15 14:32:00 UTC
Duration: 0s

When it recovers:

🟢 [UP] Grafana - http://192.168.20.40:3000
Time: 2026-02-15 14:35:12 UTC
Duration: 3m 12s
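
The Duration line in the recovery message is the gap between the DOWN and UP timestamps. The Python sketch below reproduces the message layout shown above to make that arithmetic explicit; it is an illustration of the format, not Uptime Kuma's own template, and format_duration and format_alert are hypothetical names.

```python
from datetime import datetime, timezone


def format_duration(seconds: int) -> str:
    """Render a duration as '0s', '3m 12s', or '1h 2m 3s'."""
    if seconds < 60:
        return f"{seconds}s"
    minutes, secs = divmod(seconds, 60)
    if minutes < 60:
        return f"{minutes}m {secs}s"
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes}m {secs}s"


def format_alert(is_up: bool, name: str, url: str,
                 when: datetime, duration_seconds: int) -> str:
    """Build the three-line alert text; `when` must be timezone-aware."""
    icon, label = ("🟢", "UP") if is_up else ("🔴", "DOWN")
    stamp = when.astimezone(timezone.utc)
    return "\n".join([
        f"{icon} [{label}] {name} - {url}",
        f"Time: {stamp:%Y-%m-%d %H:%M:%S} UTC",
        f"Duration: {format_duration(duration_seconds)}",
    ])


down_at = datetime(2026, 2, 15, 14, 32, 0, tzinfo=timezone.utc)
up_at = datetime(2026, 2, 15, 14, 35, 12, tzinfo=timezone.utc)
outage_seconds = int((up_at - down_at).total_seconds())  # 192
print(format_alert(True, "Grafana", "http://192.168.20.40:3000", up_at, outage_seconds))
```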

IP Source-of-Truth Document

With 25+ monitors configured, keeping track of which IP belongs to which service becomes its own problem. I maintain an IP source-of-truth document that maps every service to its:

  • Internal IP and port
  • VLAN membership
  • Host node
  • NPM subdomain
  • Monitor type in Uptime Kuma

This document serves double duty: it's the reference for configuring new monitors and the first thing I check during an incident to confirm I'm looking at the right endpoint.
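
One way to picture an entry in that document is as a small record with the five fields listed above. The Python sketch below is a hypothetical structure, not the actual document format. The single sample entry uses the Uptime Kuma details from the Deployment section; the VLAN is left unset because this post does not state it, and the "http" monitor type is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ServiceRecord:
    name: str
    ip: str
    port: int
    vlan: Optional[int]           # VLAN membership
    host_node: str                # physical host running the service
    npm_subdomain: Optional[str]  # public name behind NPM, if any
    monitor_type: str             # "http", "tcp", or "ping"


RECORDS = [
    ServiceRecord(
        name="Uptime Kuma",
        ip="192.168.20.61",
        port=3001,
        vlan=None,  # not stated in this post
        host_node="Node-C",
        npm_subdomain="uptime.tima.dev",
        monitor_type="http",  # assumed for illustration
    ),
]


def find_by_endpoint(records: Iterable[ServiceRecord],
                     ip: str, port: int) -> Optional[ServiceRecord]:
    """Answer the incident question: which service lives at this IP and port?"""
    for record in records:
        if record.ip == ip and record.port == port:
            return record
    return None
```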


Operational Value

Uptime Kuma catches three categories of issues:

  1. Service crashes - Container exits, application errors. Uptime Kuma detects these within one 60-second check interval and alerts via Discord.

  2. Network path failures - Inter-VLAN firewall rule changes, NPM misconfigurations, DNS issues. Comparing monitor types narrows the fault: if the HTTP check fails but the TCP/ping check passes, the host is reachable and the problem is at the application or proxy layer, not the network.

  3. Silent degradation - Services that respond with 200 but are actually broken (empty dashboards, login pages stuck in redirect loops). For these, Uptime Kuma's keyword matching feature checks that the response body contains expected content.
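
The layered reasoning in the list above can be written down as a small decision function. This Python sketch is a manual triage aid under simplifying assumptions (one ping, one TCP, and one HTTP result per service, and ICMP not blocked by a firewall); triage and body_has_keyword are hypothetical names, not Uptime Kuma features.

```python
def triage(ping_ok: bool, tcp_ok: bool, http_ok: bool) -> str:
    """Map the three check results to the layer most likely at fault."""
    if not ping_ok:
        return "host or network path"
    if not tcp_ok:
        return "service not listening or port blocked"
    if not http_ok:
        return "application or proxy layer"
    return "healthy"


def body_has_keyword(body: str, keyword: str) -> bool:
    """Keyword-style check: a 200 only counts if the expected text is present."""
    return keyword in body
```

For example, triage(ping_ok=True, tcp_ok=True, http_ok=False) returns "application or proxy layer", which is the case described in item 2.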


What Uptime Kuma Doesn't Do

Uptime Kuma is a binary up/down checker. It doesn't tell you:

  • Why a service is slow (that's Grafana + TIG stack)
  • What security events are happening (that's Wazuh)
  • Whether the service is functionally correct (that's application-level testing)

It's intentionally simple. The value is in the speed and reliability of the alert - not the depth of the diagnosis.


Related: Post 017 - TIG Stack Observability | Post 024 - Discord as an Ops Console