End-to-End Grafana Setup
Monitor Server Performance + Application Performance (Metrics, Logs, Traces)
If you want a single pane of glass for:
server health (CPU, RAM, disk, network)
application performance (latency, error rates)
centralized logs
distributed traces
Grafana works best when paired with the following stack:
Prometheus → metrics
Node Exporter → Linux server metrics
cAdvisor → container metrics (optional but very useful)
Loki → logs (optional)
Tempo → distributed traces (optional)
OpenTelemetry → application instrumentation (recommended)
This post walks through a practical, production-style setup on Ubuntu using Docker Compose, then shows how to instrument applications for full APM.
What You’ll Get (Architecture)
Server Performance Monitoring
CPU, RAM, load average
Disk usage and disk I/O
Network traffic and errors
System and process health (via exporters)
Application Performance Monitoring (APM)
RED metrics (Rate, Errors, Duration)
Centralized, searchable logs
Distributed traces for request-level root cause analysis
Grafana Dashboards + Alerting
Prebuilt dashboards
Alerts when things break (CPU high, disk full, error spikes)
Architecture Overview
Ubuntu Server
├── Prometheus (metrics)
├── Node Exporter (server metrics)
├── cAdvisor (container metrics)
├── Loki (logs)
├── Tempo (traces)
└── Grafana (visualization + alerting)
Prerequisites
Ubuntu host (VM or bare metal)
Docker & Docker Compose
Ports open (at least locally):
3000 → Grafana
9090 → Prometheus
9100 → Node Exporter
3100 → Loki (optional)
3200 → Tempo (optional)
Install Docker & Docker Compose (skip if already installed)
sudo apt update
sudo apt install -y docker.io docker-compose
sudo systemctl enable docker
sudo systemctl start docker
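A quick sanity check that both tools are available (exact versions will vary):
docker --version
docker-compose --version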
Step 1: Create the Monitoring Stack with Docker Compose
Create a working directory:
mkdir grafana-monitoring && cd grafana-monitoring
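The Prometheus and Tempo config files created in later steps live in subdirectories of this folder, so create those as well:
mkdir -p prometheus tempo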
Create docker-compose.yml:
version: '3.8'

services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - prometheus
      - loki
      - tempo

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/etc/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"

  node_exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro

  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"

  tempo:
    image: grafana/tempo:latest
    ports:
      - "3200:3200"
      - "4317:4317"
      - "4318:4318"
    volumes:
      - ./tempo:/etc/tempo
    command: ["-config.file=/etc/tempo/tempo.yaml"]

volumes:
  grafana-data:
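Optionally, check that the file parses cleanly before starting anything; docker-compose config validates the YAML and prints the resolved configuration:
docker-compose config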
Step 2: Configure Prometheus Scraping
Create prometheus/prometheus.yml:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['node_exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
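You can sanity-check this config before starting the stack by running promtool from the prom/prometheus image against the file you just created (this assumes you are still in the grafana-monitoring directory):
docker run --rm --entrypoint promtool \
  -v "$(pwd)/prometheus:/etc/prometheus" \
  prom/prometheus check config /etc/prometheus/prometheus.yml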
Step 3: Configure Tempo (Optional but Recommended)
Create tempo/tempo.yaml:
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        http:
          endpoint: "0.0.0.0:4318"
        grpc:
          endpoint: "0.0.0.0:4317"

storage:
  trace:
    backend: local
    wal:
      path: /tmp/tempo/wal
    local:
      path: /tmp/tempo/blocks
Step 4: Start the Stack
docker-compose up -d
Open:
Grafana → http://YOUR_SERVER_IP:3000 (admin / admin)
Prometheus → http://YOUR_SERVER_IP:9090
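If something doesn't load, confirm the containers are up and the standard health endpoints respond:
docker-compose ps
curl -s http://localhost:9090/-/healthy          # Prometheus
curl -s http://localhost:9100/metrics | head     # Node Exporter
curl -s http://localhost:3100/ready              # Loki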
Step 5: Add Data Sources in Grafana
In Grafana:
Connections → Data sources → Add data source
Add Prometheus
URL:
http://prometheus:9090
Save & Test
Optional:
Loki → http://loki:3100
Tempo → http://tempo:3200
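If you prefer configuration-as-code over clicking through the UI, Grafana can also load data sources from provisioning files. A minimal sketch, assuming you create grafana/provisioning/datasources/datasources.yml locally and mount ./grafana/provisioning:/etc/grafana/provisioning into the grafana service:
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200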
Step 6: Import Dashboards (Fastest Value)
In Grafana:
Dashboards → New → Import
Import community dashboards for:
Node Exporter (server metrics) – the widely used "Node Exporter Full" dashboard (ID 1860) is a good starting point
cAdvisor (container metrics)
At this point you should have:
A server overview dashboard
A container overview dashboard
Alerting (for example CPU > 90%) is covered in Step 8.
Step 7: Application Performance Monitoring (APM)
A) Metrics (Prometheus)
Expose /metrics from your application and add a scrape target to prometheus/prometheus.yml (here app:8081 assumes the app runs as a container named app on the same Docker network):
  - job_name: 'my_app'
    static_configs:
      - targets: ['app:8081']
Track at minimum:
Request rate (R)
Error rate (E)
Duration / latency (D)
Saturation (queues, DB pools)
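A minimal sketch of what this looks like in Python with the prometheus_client library; the metric names, port 8081, and simulated handler are illustrative, and most web frameworks have ready-made Prometheus middleware that does this for you:
# pip install prometheus-client
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# RED metrics: request count (rate + errors via labels) and duration
REQUESTS = Counter("http_requests_total", "Total HTTP requests", ["method", "status"])
LATENCY = Histogram("http_request_duration_seconds", "Request latency in seconds")

def handle_request():
    with LATENCY.time():                        # duration (D)
        time.sleep(random.uniform(0.01, 0.2))   # simulated work
        status = "500" if random.random() < 0.05 else "200"
        REQUESTS.labels(method="GET", status=status).inc()  # rate (R) / errors (E)

if __name__ == "__main__":
    start_http_server(8081)  # exposes /metrics on :8081 for Prometheus to scrape
    while True:
        handle_request()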
B) Traces (OpenTelemetry → Tempo)
Instrument your app using OpenTelemetry SDK and export traces to:
HTTP (OTLP/HTTP) → http://YOUR_SERVER_IP:4318
gRPC (OTLP/gRPC) → YOUR_SERVER_IP:4317
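A minimal Python sketch that sends spans to Tempo over OTLP/HTTP, assuming the opentelemetry-sdk and opentelemetry-exporter-otlp-proto-http packages; the service name and span names are placeholders, and auto-instrumentation packages (Flask, Django, requests, etc.) can generate most spans for you:
# pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Identify the service and point the exporter at Tempo's OTLP/HTTP port
provider = TracerProvider(resource=Resource.create({"service.name": "my-app"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://YOUR_SERVER_IP:4318/v1/traces"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("handle-request"):
    with tracer.start_as_current_span("db-query"):
        pass  # real work goes here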
Grafana then correlates:
Metric spike → trace waterfall → root cause
C) Logs (Loki – Optional)
Run Promtail or Grafana Alloy to ship logs:
/var/log/syslog
application logs
nginx logs
The exact Promtail or Alloy config depends on your log paths and runtime (VM, Docker, or Kubernetes); a minimal VM-style example is sketched below.
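A minimal Promtail sketch that tails syslog and an application log file; the paths and labels are illustrative, and Promtail itself needs to run somewhere (for example as another service in the compose file with /var/log mounted read-only):
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml   # where Promtail remembers how far it has read

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: syslog
          __path__: /var/log/syslog
  - job_name: myapp
    static_configs:
      - targets: [localhost]
        labels:
          job: myapp
          __path__: /var/log/myapp/*.log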
Step 8: Alerting – What to Alert First
Start simple:
CPU > 90% for 5–10 minutes
Disk free < 10%
Memory pressure
App error rate above baseline
p95 latency above SLO
Grafana alerts can notify via Email, Slack, PagerDuty, etc.
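Example PromQL expressions you could use behind those first alerts; the node_* metrics come from Node Exporter, while the latency histogram name matches the earlier Python sketch and will differ for your own app:
# CPU usage above 90% (per instance)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90

# Disk free below 10%
(node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes) * 100 < 10

# Available memory below 10%
(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 < 10

# p95 request latency above 500ms
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5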
Security Hardening (Important)
Change the Grafana admin password immediately (Grafana prompts for a new one on first login with admin / admin)
Put Grafana behind Nginx/Caddy with TLS
Restrict Prometheus & exporters to private network
Prefer VPN or zero-trust access
Troubleshooting Checklist
Prometheus → Status → Targets
If Node Exporter is DOWN:
Check port 9100
docker-compose logs node_exporter
If Grafana can’t reach Prometheus:
Use http://prometheus:9090 (the service name on the Docker network), not localhost
Final Thoughts
This Grafana stack gives you true full-stack observability on a single Ubuntu server:
Metrics for visibility
Logs for context
Traces for root cause analysis
It scales cleanly from a single VM to production systems.
Written from real-world DevOps and production monitoring experience.