End-to-End Grafana Setup

Monitor Server Performance + Application Performance (Metrics, Logs, Traces)

If you want a single pane of glass for:

  • server health (CPU, RAM, disk, network)

  • application performance (latency, error rates)

  • centralized logs

  • distributed traces

Grafana works best when paired with the following stack:

  • Prometheus → metrics

  • Node Exporter → Linux server metrics

  • cAdvisor → container metrics (optional but very useful)

  • Loki → logs (optional)

  • Tempo → distributed traces (optional)

  • OpenTelemetry → application instrumentation (recommended)

This post walks through a practical, production-style setup on Ubuntu using Docker Compose, then shows how to instrument applications for full APM.


What You’ll Get (Architecture)

Server Performance Monitoring

  • CPU, RAM, load average

  • Disk usage and disk I/O

  • Network traffic and errors

  • System and process health (via exporters)

Application Performance Monitoring (APM)

  • RED metrics (Rate, Errors, Duration)

  • Centralized, searchable logs

  • Distributed traces for request-level root cause analysis

Grafana Dashboards + Alerting

  • Prebuilt dashboards

  • Alerts when things break (CPU high, disk full, error spikes)


Architecture Overview

Ubuntu Server
  ├── Prometheus  (metrics)
  ├── Node Exporter (server metrics)
  ├── cAdvisor (container metrics)
  ├── Loki (logs)
  ├── Tempo (traces)
  └── Grafana (visualization + alerting)

Prerequisites

  • Ubuntu host (VM or bare metal)

  • Docker & Docker Compose

  • Ports open (at least locally):

    • 3000 → Grafana

    • 9090 → Prometheus

    • 9100 → Node Exporter

    • 3100 → Loki (optional)

    • 3200 → Tempo (optional)

Install Docker & Docker Compose (skip if already installed)

sudo apt update
sudo apt install -y docker.io docker-compose
sudo systemctl enable docker
sudo systemctl start docker

Step 1: Create the Monitoring Stack with Docker Compose

Create a working directory:

mkdir grafana-monitoring && cd grafana-monitoring

Create docker-compose.yml:

version: '3.8'

services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    depends_on:
      - prometheus
      - loki
      - tempo

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/etc/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"

  node_exporter:
    image: prom/node-exporter:latest
    pid: host
    ports:
      - "9100:9100"
    volumes:
      # mount the host filesystem read-only so metrics reflect the host, not the container
      - /:/host:ro,rslave
    command:
      - "--path.rootfs=/host"

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro

  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"

  tempo:
    image: grafana/tempo:latest
    ports:
      - "3200:3200"
      - "4317:4317"
      - "4318:4318"
    volumes:
      - ./tempo:/etc/tempo
    command: ["-config.file=/etc/tempo/tempo.yaml"]

volumes:
  grafana-data:

Step 2: Configure Prometheus Scraping

Create prometheus/prometheus.yml:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['node_exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

Step 3: Configure Tempo (Optional but Recommended)

Create tempo/tempo.yaml:

server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: "0.0.0.0:4317"
        http:
          endpoint: "0.0.0.0:4318"

storage:
  trace:
    backend: local
    wal:
      path: /tmp/tempo/wal
    local:
      path: /tmp/tempo/blocks

Note: in Tempo, the OTLP receivers live under distributor, and recent versions bind them to localhost unless an endpoint is set explicitly.

Step 4: Start the Stack

docker-compose up -d

Open:

  • Grafana → http://YOUR_SERVER_IP:3000 (admin / admin)

  • Prometheus → http://YOUR_SERVER_IP:9090
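
Before moving on, a quick sanity check (run on the server) confirms the containers are up and Prometheus can see its scrape targets:

docker-compose ps
curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"[a-z]*"'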


Step 5: Add Data Sources in Grafana

In Grafana:

  1. Connections → Data sources → Add data source

  2. Add Prometheus

    • URL: http://prometheus:9090

  3. Save & Test

Optional:

  • Loki → http://loki:3100

  • Tempo → http://tempo:3200
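
If you prefer configuration as code, Grafana can also load data sources from a provisioning file instead of the UI. A minimal sketch, assuming you add a volume such as ./grafana/provisioning:/etc/grafana/provisioning to the grafana service (this mount is not in the compose file above):

# grafana/provisioning/datasources/datasources.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    access: proxy
    isDefault: true
  - name: Loki
    type: loki
    url: http://loki:3100
    access: proxy
  - name: Tempo
    type: tempo
    url: http://tempo:3200
    access: proxy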


Step 6: Import Dashboards (Fastest Value)

In Grafana:

  1. Dashboards → New → Import

  2. Import dashboards for:

    • Node Exporter (server metrics), for example the widely used "Node Exporter Full" dashboard, ID 1860

    • cAdvisor (container metrics)

Once imported, you'll immediately have:

  • A server overview dashboard (CPU, memory, disk, network)

  • A container overview dashboard

  • A natural place to attach alerts (for example, CPU > 90%, configured in Step 8)


Step 7: Application Performance Monitoring (APM)

A) Metrics (Prometheus)

Expose /metrics from your application and add a scrape target:

- job_name: 'my_app'
  static_configs:
    - targets: ['app:8081']

Track at minimum:

  • Request rate (R)

  • Error rate (E)

  • Duration / latency (D)

  • Saturation (queues, DB pools)
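
To make the RED metrics concrete, here is a minimal sketch of a Python service exposing /metrics with the prometheus_client library. The metric names, port 8081, and the my_app job above are illustrative, not part of the stack itself:

# pip install prometheus-client
import random, time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("http_requests_total", "Total HTTP requests", ["status"])
LATENCY = Histogram("http_request_duration_seconds", "Request duration in seconds")

def handle_request():
    with LATENCY.time():                       # duration (D)
        time.sleep(random.uniform(0.01, 0.2))  # simulated work
        status = "500" if random.random() < 0.05 else "200"
        REQUESTS.labels(status=status).inc()   # rate (R) and errors (E)

if __name__ == "__main__":
    start_http_server(8081)  # serves /metrics on port 8081
    while True:
        handle_request()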


B) Traces (OpenTelemetry → Tempo)

Instrument your app using OpenTelemetry SDK and export traces to:

  • HTTP: http://YOUR_SERVER_IP:4318

  • gRPC: http://YOUR_SERVER_IP:4317
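
As a minimal sketch, this is what that looks like with the OpenTelemetry Python SDK exporting over OTLP/HTTP to Tempo. The service name my-app is a placeholder; swap in the gRPC exporter if you prefer port 4317:

# pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Identify the service and send spans to Tempo's OTLP/HTTP receiver
provider = TracerProvider(resource=Resource.create({"service.name": "my-app"}))
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint="http://YOUR_SERVER_IP:4318/v1/traces")
    )
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("handle-request"):
    pass  # application work happens here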

Grafana then correlates:

  • Metric spike → trace waterfall → root cause


C) Logs (Loki – Optional)

Run Promtail or Grafana Alloy to ship logs:

  • /var/log/syslog

  • application logs

  • nginx logs

The exact Promtail or Alloy configuration depends on your log paths and runtime (VM / Docker / Kubernetes); a starting point is shown below.
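
This Promtail config (an assumed promtail-config.yml; adjust paths and labels for your environment) tails /var/log/syslog and pushes to the Loki container above:

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: syslog
          __path__: /var/log/syslog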


Step 8: Alerting – What to Alert First

Start simple:

  • CPU > 90% for 5–10 minutes

  • Disk free < 10%

  • Memory pressure

  • App error rate above baseline

  • p95 latency above SLO
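
For example, the CPU and disk thresholds above translate into PromQL queries over Node Exporter metrics roughly like these (thresholds and mountpoint are illustrative):

100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90

(node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10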

Grafana alerts can notify via Email, Slack, PagerDuty, etc.


Security Hardening (Important)

  • Change Grafana admin password immediately

  • Put Grafana behind Nginx/Caddy with TLS (a sample Nginx config follows this list)

  • Restrict Prometheus & exporters to private network

  • Prefer VPN or zero-trust access
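
A minimal Nginx reverse-proxy sketch for the TLS point above, assuming a grafana.example.com domain and existing Let's Encrypt certificates (adjust names and paths for your environment):

server {
    listen 443 ssl;
    server_name grafana.example.com;

    ssl_certificate     /etc/letsencrypt/live/grafana.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/grafana.example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:3000;   # Grafana on the same host
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}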


Troubleshooting Checklist

  • Prometheus → Status → Targets

  • If Node Exporter is DOWN:

    • Check port 9100

    • docker-compose logs node_exporter

  • If Grafana can’t reach Prometheus:

    • Use http://prometheus:9090 (container network)


Final Thoughts

This Grafana stack gives you true full-stack observability on a single Ubuntu server:

  • Metrics for visibility

  • Logs for context

  • Traces for root cause analysis

It scales cleanly from a single VM to production systems.


Written from real-world DevOps and production monitoring experience.
