pgBalancer Documentation
Monitoring & Metrics
Configure Prometheus Scraping
pgBalancer exposes Prometheus metrics on the /metrics endpoint:
Prometheus Configuration
# prometheus.yml configuration
global:
  scrape_interval: 15s          # Scrape every 15 seconds
  evaluation_interval: 15s      # Evaluate rules every 15 seconds

scrape_configs:
  # pgBalancer metrics
  - job_name: 'pgbalancer'
    static_configs:
      - targets:
          - 'pgbalancer1.internal:8080'
          - 'pgbalancer2.internal:8080'
          - 'pgbalancer3.internal:8080'
    metrics_path: '/metrics'
    scrape_interval: 15s
    scrape_timeout: 10s

# Load alert rules
rule_files:
  - 'pgbalancer-alerts.yml'
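Before loading this configuration, you can validate it with promtool, which ships with Prometheus. A minimal sketch; the file paths below are assumptions, so point them at wherever your configuration and rule files actually live:
# Validate the main configuration (also checks any referenced rule_files)
promtool check config /etc/prometheus/prometheus.yml

# Validate the alert rules file on its own
promtool check rules /etc/prometheus/pgbalancer-alerts.yml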
Verify Metrics Endpoint
# Test metrics endpoint
curl -s http://localhost:8080/metrics
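A healthy endpoint returns metrics in the Prometheus text exposition format. The lines below are illustrative only: the metric names come from the reference table at the end of this page, but the HELP text, label values (hostname, sample counts), and exact label sets are assumptions that will differ on your deployment:
# Illustrative output only; actual HELP text and labels depend on your pgBalancer build
# HELP pgbalancer_up pgBalancer server status (1 = up)
# TYPE pgbalancer_up gauge
pgbalancer_up 1
# HELP pgbalancer_backend_up Backend status by node_id (1 = up)
# TYPE pgbalancer_backend_up gauge
pgbalancer_backend_up{node_id="0",hostname="pg1.internal",port="5432"} 1
# HELP pgbalancer_backend_queries_total Total queries per backend
# TYPE pgbalancer_backend_queries_total counter
pgbalancer_backend_queries_total{node_id="0"} 184327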
# Reload Prometheus configuration
# (requires Prometheus to be started with --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
Monitor Key Metrics
Track critical pgBalancer metrics:
Backend Health Metrics
# Number of backends currently up
sum(pgbalancer_backend_up)
# Backend uptime percentage (last 24 hours)
avg_over_time(pgbalancer_backend_up[24h]) * 100
# Backends currently down (for alerting)
pgbalancer_backend_up == 0
Load Distribution Metrics
# Queries per second by backend
rate(pgbalancer_backend_queries_total[5m])
# Total cluster queries per second
sum(rate(pgbalancer_backend_queries_total[5m]))
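Dashboard panels re-evaluate these rate() expressions on every refresh, so it can pay to pre-compute them as recording rules (see the best practices below). A minimal sketch, assuming a hypothetical pgbalancer-recording-rules.yml that you also list under rule_files in prometheus.yml; the rule names follow the usual level:metric:operation convention but are otherwise arbitrary:
# pgbalancer-recording-rules.yml (hypothetical file name; add it to rule_files)
groups:
  - name: pgbalancer_recording_rules
    interval: 30s
    rules:
      # Per-backend QPS, pre-computed so dashboards don't recompute rate() each refresh
      - record: pgbalancer:backend_queries:rate5m
        expr: rate(pgbalancer_backend_queries_total[5m])
      # Cluster-wide QPS
      - record: pgbalancer:cluster_queries:rate5m
        expr: sum(rate(pgbalancer_backend_queries_total[5m]))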
Configure Alert Rules
Set up critical alerts for pgBalancer monitoring:
pgbalancer-alerts.yml
groups:
  - name: pgbalancer_alerts
    interval: 30s
    rules:
      # Critical: pgBalancer server down
      - alert: PgbalancerDown
        expr: pgbalancer_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "pgBalancer server is down"
          description: "pgBalancer instance {{ $labels.instance }} is down"

      # Warning: backend node down
      - alert: PgbalancerBackendDown
        expr: pgbalancer_backend_up == 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Backend node {{ $labels.node_id }} is down"
          description: "Backend {{ $labels.hostname }}:{{ $labels.port }} (node {{ $labels.node_id }}) has been down for 2 minutes"
Grafana Dashboard
Import the pre-built pgBalancer Grafana dashboard:
Import Dashboard
# Download dashboard JSON
wget https://raw.githubusercontent.com/pgElephant/pgbalancer/main/monitoring/grafana/pgbalancer-dashboard.json
# Import via Grafana UI
# 1. Go to: http://localhost:3000/dashboard/import
# 2. Upload pgbalancer-dashboard.json
# 3. Select Prometheus data source
# 4. Click "Import"
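If you provision Grafana without the UI, the same JSON can be pushed through Grafana's dashboard import API. A sketch assuming a local Grafana on port 3000 and an API token with dashboard write permissions (GRAFANA_TOKEN is a placeholder):
# Wrap the downloaded dashboard JSON in an import payload and POST it
jq '{dashboard: ., overwrite: true}' pgbalancer-dashboard.json > import-payload.json

curl -X POST http://localhost:3000/api/dashboards/db \
  -H "Authorization: Bearer $GRAFANA_TOKEN" \
  -H "Content-Type: application/json" \
  -d @import-payload.json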
Alertmanager Notifications
Configure alert notifications via Slack, email, or PagerDuty:
Alertmanager Configuration
# alertmanager.yml
global:
  resolve_timeout: 5m
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'

route:
  receiver: 'pgbalancer-alerts'
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
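The route above sends everything to a receiver named 'pgbalancer-alerts', which still needs to be defined. A minimal Slack receiver sketch (the channel name is an assumption); email or PagerDuty receivers follow the same pattern with email_configs or pagerduty_configs blocks:
receivers:
  - name: 'pgbalancer-alerts'
    slack_configs:
      - channel: '#pgbalancer-alerts'   # assumption: replace with your Slack channel
        send_resolved: true
        title: '{{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'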
Monitoring Best Practices
✓ DO
- Scrape metrics every 15-30 seconds (balance freshness vs. load)
- Retain metrics for 30+ days for trend analysis
- Use recording rules for expensive queries in dashboards
- Configure Alertmanager for critical alerts
- Test failover and alert pipelines regularly
✗ DON'T
- Don't scrape more often than every 10 seconds (adds unnecessary load)
- Don't set the alert 'for' duration too low (avoid flapping)
- Don't create high-cardinality metrics (per-connection labels)
- Don't ignore warning alerts for extended periods
- Don't rely on metrics alone; monitor logs too
Metrics Reference
| Metric | Type | Description |
|---|---|---|
| pgbalancer_up | Gauge | Server status (1=up) |
| pgbalancer_backend_up | Gauge | Backend status by node_id |
| pgbalancer_backend_queries_total | Counter | Total queries per backend |
| pgbalancer_pool_utilization_percent | Gauge | Pool utilization (0-100) |