2025-11-05
Monitoring and Observability Setup
Logs, metrics, and alerts matter, but fifty dashboards nobody opens do not. A straight-shooting setup for knowing what broke, where, and who should wake up.
7 min read
Observability comes down to three things: logs tell you what happened in words, metrics tell you how much and how fast, and alerts wake someone up when a human needs to act. Fancy tools do not replace clear signals. This is how we set up production systems so incidents are shorter and less mysterious.
Logging
Use structured logs (JSON) with stable fields: time, level, message, request id, tenant or user id when safe. Plain strings alone are hard to search at scale.
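As a minimal sketch of what structured logs with stable fields can look like, the formatter below uses only Python's standard `logging` module to emit one JSON object per line. The class and function names (`JsonFormatter`, `build_logger`) and the context field names (`request_id`, `tenant_id`) are assumptions made for illustration; match them to whatever your services already carry.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    # Hypothetical context fields; rename to match your own services.
    CONTEXT_FIELDS = ("request_id", "tenant_id")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # Stable fields that every line carries.
            "time": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
        }
        # Optional context, included only when the caller supplied it via `extra`.
        for field in self.CONTEXT_FIELDS:
            value = getattr(record, field, None)
            if value is not None:
                payload[field] = value
        return json.dumps(payload)


def build_logger(name: str = "app") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr at INFO and above."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)  # debug lines are dropped at this level
    logger.propagate = False
    return logger
```

A call such as `build_logger().info("order created", extra={"request_id": "req-123"})` then produces a single searchable JSON line rather than a free-form string.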
Use levels honestly: error for failures, warn for degradation, info for business-relevant events, debug for local troubleshooting. Turn debug off in prod.
Never log passwords, full tokens, or unnecessary PII. Use correlation ids when you need to stitch requests without exposing raw data.
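One way to enforce this is to pass every payload through a redaction step before it reaches the logger. The sketch below is illustrative only: the `redact` helper, the `SENSITIVE_KEYS` set, and the simple email pattern are assumptions, and a real service would need a list tuned to its own data.

```python
import re

# Hypothetical key names treated as secret; extend for your own payloads.
SENSITIVE_KEYS = {"password", "token", "authorization", "api_key", "secret"}

# Deliberately simple email pattern; it is a sketch, not a full validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(value):
    """Return a copy of `value` that is safer to log; the input is not modified."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return EMAIL_RE.sub("[EMAIL]", value)
    return value
```

Matching on key names catches the obvious cases cheaply. It will not catch a secret stored under an unexpected key, so treat it as a safety net rather than a guarantee.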
Ship logs to one place (Loki, Datadog, CloudWatch, ELK, etc.) so you can search across services in one query.
Metrics
Export RED-style basics where they apply: request rate, errors, and duration histograms. Add CPU, memory, and queue depth for infrastructure. When the product team cares about business metrics (signups, orders), put them on the same dashboards as uptime.
Name metrics consistently (http_requests_total, not reqs in one service and requests in another). Dashboards should answer “is this service healthy?” in one glance, with links to runbooks.
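To make the RED idea concrete, here is a small in-memory sketch using only the standard library. It is not a replacement for a real metrics client; the `RedMetrics` class, the bucket bounds, and the label names (`route`, `status`) are assumptions chosen for illustration. The text output only resembles the Prometheus exposition format and is not a complete implementation of it.

```python
from bisect import bisect_left
from collections import Counter

# Histogram upper bounds in seconds; illustrative values, not a recommendation.
BUCKETS = (0.05, 0.1, 0.25, 0.5, 1.0, 2.5)


class RedMetrics:
    """In-memory request, error, and duration counters for one service."""

    def __init__(self):
        self.requests = Counter()       # (route, status) -> request count
        self.bucket_counts = Counter()  # (route, bucket index) -> request count

    def observe(self, route: str, status: int, seconds: float) -> None:
        self.requests[(route, status)] += 1
        # Index i means seconds <= BUCKETS[i]; len(BUCKETS) is the overflow bucket.
        self.bucket_counts[(route, bisect_left(BUCKETS, seconds))] += 1

    def error_ratio(self, route: str) -> float:
        """Share of requests on `route` that returned a 5xx status."""
        total = sum(n for (r, _), n in self.requests.items() if r == route)
        errors = sum(n for (r, s), n in self.requests.items() if r == route and s >= 500)
        return errors / total if total else 0.0

    def render(self) -> str:
        """Render request counts under one consistent metric name."""
        lines = [
            f'http_requests_total{{route="{route}",status="{status}"}} {count}'
            for (route, status), count in sorted(self.requests.items())
        ]
        return "\n".join(lines)
```

Keeping one metric name with labels, instead of a differently named counter per service, is what lets a single dashboard query cover every service.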
Alerting
Alert on symptoms users feel: error rate spikes, latency SLOs missed, availability drops. Paging on every CPU blip creates alert fatigue; people start ignoring everything.
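A symptom-based paging rule can be sketched as a function over recent evaluation windows. Everything here is an assumption for illustration: the `Window` shape, the `should_page` name, and the default thresholds (2% errors, 1 second p95, three consecutive windows, at least 100 requests per window) are placeholders, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Aggregated request stats for one evaluation window."""
    total: int
    errors: int
    p95_seconds: float


def should_page(windows, max_error_ratio=0.02, max_p95_seconds=1.0,
                sustained=3, min_requests=100):
    """Page only when the last `sustained` windows all breach a user-facing threshold."""
    recent = list(windows)[-sustained:]
    if len(recent) < sustained:
        return False  # not enough history to call it sustained

    def breached(window):
        if window.total < min_requests:
            return False  # too little traffic for the ratio to mean much
        too_many_errors = window.errors / window.total > max_error_ratio
        too_slow = window.p95_seconds > max_p95_seconds
        return too_many_errors or too_slow

    return all(breached(window) for window in recent)
```

Requiring several consecutive breaches and a minimum traffic level is one simple way to keep a single blip from paging anyone. Note that CPU does not appear as an input at all.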
Every alert should have a runbook: what it means, how to confirm, how to fix or escalate. On-call rotation and ownership should be obvious before the pager fires.
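One lightweight way to hold that line is a check, run in CI or at deploy time, that rejects alert definitions without a runbook or an owner. The sketch below assumes a hypothetical `AlertRule` shape and `missing_fields` helper; your alerting tool's real configuration format will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical alert definition; field names are illustrative."""
    name: str
    summary: str
    runbook_url: str = ""
    owner: str = ""


def missing_fields(rules):
    """Return {alert name: [missing fields]} for rules that are not ready to page."""
    problems = {}
    for rule in rules:
        missing = [
            field for field in ("runbook_url", "owner")
            if not getattr(rule, field).strip()
        ]
        if missing:
            problems[rule.name] = missing
    return problems
```

An empty result means every rule names both a runbook and an owner; anything else can fail the build before the alert ever reaches a pager.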
Summary
Structure logs, centralize them, expose a small set of meaningful metrics, and page humans only when action is required. We help teams wire this up for their stack without boiling the ocean.
Cogent Softwares, DevOps and production readiness.