DocumentationpgBalancer Documentation
Restore pgBalancer Health
Use the sections below to resolve connectivity issues, saturated pools, failover drift, and alert noise. Each step includes the exact CLI or SQL needed to validate and fix the problem.
Connection Problems
Troubleshoot listeners, TLS, and authentication mappings that block client sessions.
Run connection diagnostics
ss -ltn | grep 6432
journalctl -u pgbalancer -n 100
psql "postgres://appuser:secret@pgbalancer:6432/appdb" -c 'SELECT 1;'Common fixes
- Expose the listener on
0.0.0.0(or the appropriate subnet) when running inside containers. - Align TLS expectations between clients and pgBalancer; disable
require_client_tlstemporarily during debugging. - Regenerate credentials with
pgbalancer admin users setif authentication fails.
Wait Queues & Saturation
Address exhausted pools and keep query latency predictable.
Inspect pool utilisation
pgbalancer admin pools stats --format table
pgbalancer admin pools resize primary --max-clients 400 --max-servers 50Check backend database load
SELECT datname,
numbackends,
xact_commit,
blks_read,
blks_hit
FROM pg_stat_database
ORDER BY numbackends DESC;Relief strategies
- Raise
max_serversonly if upstream PostgreSQL hosts can accept more backends. - Enable AI routing with
policy = adaptiveto spread hot shards automatically. - Throttle chatty tenants via
max_client_rateormax_query_raterules.
Failover & Node Health
Ensure unhealthy replicas are demoted quickly and automation callbacks succeed.
Probe node status
pgbalancer admin nodes list
pgbalancer admin nodes check replica-2
pgbalancer admin nodes promote replica-3Health check guidance
- Set
health_check_queryper pool and keephealth_check_intervallow for timely demotions. - Quarantine nodes during maintenance with
pgbalancer admin nodes quarantine. - Verify failover webhooks respond with
200; pgBalancer retries five times before dropping an alert.
Prometheus & Alert Tuning
Reduce alert noise by calibrating thresholds once baselines are known.
Example alert rule
- alert: PgBalancerHighLatency
expr: histogram_quantile(0.95, sum(rate(pgbalancer_query_duration_seconds_bucket[5m])) by (le,pool)) > 0.2
for: 3m
labels:
severity: warningAlerting tips
- Scrape
/metricsevery 15 seconds for accurate percentiles. - Set
threshold_wait_queueandthreshold_latency_msin configuration to match SLAs. - Correlate pgBalancer alerts with
pg_stat_activityand infrastructure telemetry before paging engineers.