pgBalancer Documentation

Restore pgBalancer Health

Use the sections below to resolve connectivity issues, saturated pools, failover drift, and alert noise. Each step includes the exact CLI or SQL needed to validate and fix the problem.

Connection Problems

Troubleshoot listeners, TLS, and authentication mappings that block client sessions.

Run connection diagnostics

ss -ltn | grep 6432
journalctl -u pgbalancer -n 100
psql "postgres://appuser:secret@pgbalancer:6432/appdb" -c 'SELECT 1;'

Common fixes

  • Expose the listener on 0.0.0.0 (or the appropriate subnet) when running inside containers.
  • Align TLS expectations between clients and pgBalancer; disable require_client_tls temporarily during debugging (a quick TLS check follows this list).
  • Regenerate credentials with pgbalancer admin users set if authentication fails.
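
If TLS is the suspect, compare a forced-TLS connection against a plaintext one; whichever attempt fails points at the misconfigured side. The commands below reuse the appuser/appdb placeholders from the diagnostics above.

# Force TLS; this fails fast when the listener or the client is missing certificates
psql "postgres://appuser:secret@pgbalancer:6432/appdb?sslmode=require" -c 'SELECT 1;'

# Force plaintext; this succeeds only while require_client_tls is disabled
psql "postgres://appuser:secret@pgbalancer:6432/appdb?sslmode=disable" -c 'SELECT 1;'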

Wait Queues & Saturation

Address exhausted pools and keep query latency predictable.

Inspect pool utilisation

pgbalancer admin pools stats --format table
pgbalancer admin pools resize primary --max-clients 400 --max-servers 50
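
To confirm a resize takes effect under live traffic, keep the stats view refreshing while clients reconnect; this simply wraps the stats command above in a watch loop.

# Refresh pool statistics every 5 seconds while applying changes
watch -n 5 'pgbalancer admin pools stats --format table'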

Check backend database load

SELECT datname,
       numbackends,
       xact_commit,
       blks_read,
       blks_hit
  FROM pg_stat_database
 ORDER BY numbackends DESC;
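
When numbackends is high, it also helps to see what those backends are doing. The following query against standard pg_stat_activity (run on the backend host; 'appdb' is the placeholder database used earlier) groups sessions by state so idle-in-transaction sessions holding pool slots stand out.

SELECT state,
       count(*)                  AS sessions,
       max(now() - state_change) AS longest_in_state
  FROM pg_stat_activity
 WHERE datname = 'appdb'
 GROUP BY state
 ORDER BY sessions DESC;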

Relief strategies

  • Raise max_servers only if upstream PostgreSQL hosts can accept more backends.
  • Enable AI routing with policy = adaptive to spread hot shards automatically.
  • Throttle chatty tenants via max_client_rate or max_query_rate rules (see the configuration sketch after this list).
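
The settings above belong in the pool configuration. The exact file layout is deployment-specific, so treat this INI-style fragment (key names taken from the bullets above, values purely illustrative) as a sketch of intent rather than copy-paste configuration.

; Sketch only: assumes an INI-style pool section; adjust names to your config format
[pool primary]
policy          = adaptive   ; AI routing to spread hot shards
max_servers     = 50         ; raise only if PostgreSQL can accept more backends
max_client_rate = 200        ; illustrative tenant connection-rate cap
max_query_rate  = 5000       ; illustrative per-tenant query-rate cap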

Failover & Node Health

Ensure unhealthy replicas are demoted quickly and automation callbacks succeed.

Probe node status

pgbalancer admin nodes list
pgbalancer admin nodes check replica-2
pgbalancer admin nodes promote replica-3

Health check guidance

  • Set health_check_query per pool and keep health_check_interval low for timely demotions.
  • Quarantine nodes during maintenance with pgbalancer admin nodes quarantine.
  • Verify failover webhooks respond with 200; pgBalancer retries five times before dropping an alert (a curl probe is sketched below).
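
To check the webhook endpoint itself, a plain curl probe is enough. The URL and JSON body below are placeholders; the actual payload pgBalancer sends depends on your callback integration.

# Expect "200"; anything else means pgBalancer will exhaust its five retries
curl -s -o /dev/null -w '%{http_code}\n' \
     -X POST https://hooks.example.com/pgbalancer/failover \
     -H 'Content-Type: application/json' \
     -d '{"event": "failover", "node": "replica-2"}'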

Prometheus & Alert Tuning

Reduce alert noise by calibrating thresholds once baselines are known.

Example alert rule

- alert: PgBalancerHighLatency
  expr: histogram_quantile(0.95, sum(rate(pgbalancer_query_duration_seconds_bucket[5m])) by (le,pool)) > 0.2
  for: 3m
  labels:
    severity: warning
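
If pgBalancer exports a wait-queue gauge, a companion rule catches saturation before latency degrades. The metric name below is an assumption; confirm the exact name on the /metrics endpoint before deploying the rule.

- alert: PgBalancerWaitQueueBacklog
  expr: max by (pool) (pgbalancer_wait_queue_clients) > 20
  for: 5m
  labels:
    severity: warning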

Alerting tips

  • Scrape /metrics every 15 seconds for accurate percentiles (an example scrape job follows this list).
  • Set threshold_wait_queue and threshold_latency_ms in configuration to match SLAs.
  • Correlate pgBalancer alerts with pg_stat_activity and infrastructure telemetry before paging engineers.
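
For the 15-second scrape cadence mentioned above, a standard Prometheus scrape job is sufficient. The target port is a placeholder; point it at your pgBalancer metrics listener.

scrape_configs:
  - job_name: pgbalancer
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets: ['pgbalancer:9127']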