Post 019
Why Uptime Kuma
The TIG stack (Post 017) tells you how healthy your infrastructure is. Uptime Kuma tells you whether your services are actually reachable. Different questions, different tools.
Uptime Kuma runs lightweight HTTP/TCP/ping checks against every service endpoint and fires a Discord webhook the moment something goes down. It's the canary - fast, dumb, and reliable.
Deployment
Uptime Kuma runs on Node-C (Gozanti Cruiser) as a Docker container:
Host: Node-C (OptiPlex 7050)
IP: 192.168.20.61
Port: 3001
Access: https://uptime.tima.dev (via NPM)
docker run -d \
--name uptime-kuma \
-p 3001:3001 \
-v uptime-kuma:/app/data \
--restart unless-stopped \
louislam/uptime-kuma:1
Monitor Configuration
Every service in the Alliance Fleet gets a monitor. The configuration follows a pattern:
HTTP Monitors (Web Services)
| Service | URL | Interval | Expected |
|---|---|---|---|
| Grafana | http://192.168.20.40:3000 | 60s | 200 |
| Authentik | http://192.168.20.10:9000 | 60s | 200/302 |
| Portainer | https://192.168.20.10:9443 | 60s | 200 |
| n8n | http://192.168.20.50:5678 | 60s | 200 |
| Vaultwarden | http://192.168.20.51:80 | 60s | 200 |
| OpenWebUI | http://192.168.20.20:3000 | 60s | 200 |
| Homepage | http://192.168.20.60:3000 | 60s | 200 |
| Ghost Blog | https://holocron-labs.tima.dev | 300s | 200 |
| Portfolio | https://tima.dev | 300s | 200 |
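Each HTTP monitor boils down to "GET the URL, check the status code against the expected list." A rough stand-in for that check using only the Python standard library (the function name and structure are illustrative, not Uptime Kuma's internals); the demo probes a throwaway local server instead of the fleet IPs:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def http_check(url: str, expected=(200,), timeout: float = 10.0) -> bool:
    """Return True if a GET on url yields an expected status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status in expected
    except urllib.error.HTTPError as err:
        # Non-2xx responses raise; codes like 302 can still count as "up".
        return err.code in expected
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, DNS failure

class _OKHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")
    def log_message(self, *_):
        pass  # keep the demo quiet

# Demo against a local throwaway server, not the real endpoints.
srv = HTTPServer(("127.0.0.1", 0), _OKHandler)
threading.Thread(target=srv.serve_forever, daemon=True).start()
result = http_check(f"http://127.0.0.1:{srv.server_port}/")
srv.shutdown()
```

In Uptime Kuma itself this is just the HTTP(s) monitor type with "Accepted Status Codes" set per service; the sketch only shows what the check amounts to.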
TCP Monitors (Infrastructure)
| Service | Host | Port | Interval |
|---|---|---|---|
| InfluxDB | 192.168.20.41 | 8086 | 60s |
| PostgreSQL | 192.168.20.10 | 5432 | 60s |
| Redis | 192.168.20.10 | 6379 | 60s |
| Wazuh Manager | 192.168.20.30 | 1514 | 60s |
| Wazuh API | 192.168.20.30 | 55000 | 60s |
| Ollama | 192.168.20.20 | 11434 | 60s |
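A TCP monitor is even simpler: it only verifies that something accepts a connection on the port, saying nothing about whether the service behind it is healthy. A minimal sketch in stdlib Python (names are illustrative), demoed against a local listener rather than the fleet IPs:

```python
import socket

def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo: open a local listener, probe it, then close it and probe again.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
while_up = tcp_check("127.0.0.1", port)     # listener accepting -> True
listener.close()
after_close = tcp_check("127.0.0.1", port)  # connection refused -> False
```

This is why the TCP monitors cover databases and APIs that don't speak HTTP: a refused connection on 5432 means PostgreSQL itself is gone, not just a web layer.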
Ping Monitors (Hosts)
| Host | IP | Interval |
|---|---|---|
| Node-A (Falcon) | 192.168.1.10 | 60s |
| Node-B (Corvette) | 192.168.1.11 | 60s |
| Node-C (Gozanti) | 192.168.1.12 | 60s |
| UDM Pro | 192.168.1.1 | 60s |
| AdGuard | 192.168.1.4 | 60s |
Discord Webhook Integration - Admiral Ackbar
Uptime Kuma sends alerts to the #uptime-beacon channel in the Alliance Fleet Discord server via webhook. The bot identity is Admiral Ackbar - consistent with the alert bot used by n8n and Wazuh.
Webhook Setup
In Discord: Server Settings → Integrations → Webhooks → New Webhook
Name: Admiral Ackbar
Channel: #uptime-beacon
Copy the webhook URL, then in Uptime Kuma → Settings → Notifications → Setup Notification:
Type: Discord
Webhook URL: (paste Discord webhook URL)
Bot Display Name: Admiral Ackbar
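Under the hood the webhook is just an HTTPS POST of JSON. Discord's webhook API accepts a `content` field plus an optional `username` that overrides the display name per message, which is how the Admiral Ackbar identity works. A sketch of the manual equivalent of a DOWN alert (the exact fields Uptime Kuma sends differ; this only mimics the format shown below):

```python
import json

# Placeholder - use the URL copied from Discord's webhook settings.
WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"

def down_alert(service: str, url: str) -> bytes:
    """Build a JSON body approximating the DOWN alert format."""
    payload = {
        "username": "Admiral Ackbar",  # per-message display name override
        "content": f"🔴 [DOWN] {service} - {url}",
    }
    return json.dumps(payload).encode("utf-8")

body = down_alert("Grafana", "http://192.168.20.40:3000")
# To actually send it:
#   req = urllib.request.Request(WEBHOOK_URL, data=body,
#                                headers={"Content-Type": "application/json"})
#   urllib.request.urlopen(req)
```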
Alert Format
When a service goes down, Admiral Ackbar posts:
🔴 [DOWN] Grafana - http://192.168.20.40:3000
Time: 2026-02-15 14:32:00 UTC
Duration: 0s
When it recovers:
🟢 [UP] Grafana - http://192.168.20.40:3000
Time: 2026-02-15 14:35:12 UTC
Duration: 3m 12s
IP Source-of-Truth Document
With 25+ monitors configured, keeping track of which IP belongs to which service becomes its own problem. I maintain an IP source-of-truth document that maps every service to its:
- Internal IP and port
- VLAN membership
- Host node
- NPM subdomain
- Monitor type in Uptime Kuma
This document serves double duty: it's the reference for configuring new monitors and the first thing I check during an incident to confirm I'm looking at the right endpoint.
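The post doesn't pin down a format for that document; as an illustration only, the entry for Uptime Kuma itself (values taken from the deployment section above, field names an assumed schema) might look like:

```python
# Illustrative record shape for the IP source-of-truth document.
# Values come from the deployment details above; the field names
# are an assumption, not a fixed schema.
UPTIME_KUMA_ENTRY = {
    "service": "Uptime Kuma",
    "ip": "192.168.20.61",
    "port": 3001,
    "vlan": 20,                      # inferred from 192.168.20.0/24
    "host_node": "Node-C (Gozanti)",
    "npm_subdomain": "uptime.tima.dev",
    "monitor_type": "HTTP",
}
```

One record per service, and both the monitor config and incident response read from the same place.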
Operational Value
Uptime Kuma catches three categories of issues:
- Service crashes - Container exits, application errors. Uptime Kuma detects it within 60 seconds and alerts via Discord.
- Network path failures - Inter-VLAN firewall rule changes, NPM misconfigurations, DNS issues. If the HTTP check fails but the TCP/ping check passes, the problem is at the application or proxy layer, not the network.
- Silent degradation - Services that respond with 200 but are actually broken (empty dashboards, login pages stuck in redirect loops). For these, Uptime Kuma's keyword matching feature checks that the response body contains expected content.
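The keyword idea is easy to see in miniature: the check passes only if the page loads *and* the body contains an expected string. A hedged stdlib sketch (not Uptime Kuma's implementation), demoed against a local server that plays the part of a "200 but broken" dashboard:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def keyword_check(url: str, keyword: str, timeout: float = 10.0) -> bool:
    """True only if the page returns 200 AND contains the keyword -
    catches '200 but broken' pages a bare status check misses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and keyword in body
    except OSError:
        return False

class _EmptyDashboard(BaseHTTPRequestHandler):
    """Simulates silent degradation: HTTP 200 with an empty page."""
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"<html><body></body></html>")
    def log_message(self, *_):
        pass

srv = HTTPServer(("127.0.0.1", 0), _EmptyDashboard)
threading.Thread(target=srv.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{srv.server_port}/"
degraded = keyword_check(url, "Dashboard")  # 200, keyword absent -> False
sanity = keyword_check(url, "<html>")       # keyword present -> True
srv.shutdown()
```

A plain status-code monitor would call this page healthy; the keyword variant flags it.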
What Uptime Kuma Doesn't Do
Uptime Kuma is a binary up/down checker. It doesn't tell you:
- Why a service is slow (that's Grafana + TIG stack)
- What security events are happening (that's Wazuh)
- Whether the service is functionally correct (that's application-level testing)
It's intentionally simple. The value is in the speed and reliability of the alert - not the depth of the diagnosis.
Related: Post 017 - TIG Stack Observability | Post 024 - Discord as an Ops Console