pgraft Troubleshooting
Resolve common issues and errors with the pgraft PostgreSQL Raft extension.
Common Issues
Extension Won't Load
The pgraft extension fails to load or initialize.
Symptoms:
- Extension not found error
- PostgreSQL fails to start
- Shared library not found
Solutions:
# Check extension installation
ls -la /usr/lib/postgresql/16/lib/pgraft.so
# Reinstall extension
make clean && make && sudo make install
# Check PostgreSQL configuration (changing shared_preload_libraries requires a server restart)
grep shared_preload_libraries postgresql.conf
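The library path above is specific to a PostgreSQL 16 package layout, and a plain `grep` also matches commented-out settings. As a minimal sketch, the helper below checks whether a `shared_preload_libraries` line actually lists a given library. The name `preload_lists` is illustrative and not part of pgraft, and the usage lines assume `pg_config` is on the `PATH` and `PGDATA` points at the data directory.

```shell
# Return 0 if library $1 is listed in the shared_preload_libraries line $2.
# Hypothetical helper for illustration; not shipped with pgraft.
preload_lists() {
    lib=$1
    line=$2
    # Drop any comment, keep the part after '=', strip quotes and spaces.
    value=$(printf '%s\n' "$line" | sed -e 's/#.*$//' -e 's/^[^=]*=//' | tr -d "' \"")
    case ",$value," in
        *",$lib,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   ls -l "$(pg_config --pkglibdir)/pgraft.so"
#   preload_lists pgraft "$(grep 'shared_preload_libraries' "$PGDATA/postgresql.conf" | tail -n 1)" \
#       || echo "pgraft is not in shared_preload_libraries"
```

A commented-out line such as `#shared_preload_libraries = ''` is treated as "not listed", which is the case a bare `grep` tends to hide.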
Cluster Split-Brain
Multiple nodes think they are the leader.
Symptoms:
- Multiple leaders reported
- Inconsistent cluster state
- Write conflicts
Solutions:
-- Check cluster status
SELECT * FROM pgraft_cluster_status('my-cluster');
-- Force new election
SELECT pgraft_force_election('my-cluster');
# Check network connectivity (ping accepts one host per invocation)
for node in node1 node2 node3; do ping -c 3 "$node"; done
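Detecting split-brain comes down to counting how many nodes claim leadership. The sketch below assumes you have already collected one `node state` pair per line from each node. That input format and the name `count_leaders` are assumptions for illustration, since the columns returned by `pgraft_cluster_status` are not shown here.

```shell
# Read "node state" pairs on stdin and print how many nodes report "leader".
# The two-column input format is an assumption, not pgraft output.
count_leaders() {
    awk '$2 == "leader" { n++ } END { print n + 0 }'
}

# Usage:
#   printf 'node1 leader\nnode2 follower\nnode3 leader\n' | count_leaders
```

Any result greater than 1 means at least two nodes believe they are the leader and the cluster needs attention before accepting writes.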
High Replication Latency
Slow log replication between nodes.
Symptoms:
- Large commit index lag
- Slow failover times
- Inconsistent read results
Solutions:
-- Reduce heartbeat interval
SELECT pgraft_set_config('my-cluster', 'heartbeat_interval', '50ms');
# Check network latency (traceroute accepts one host per invocation)
for node in node1 node2 node3; do traceroute "$node"; done
-- Monitor replication metrics
SELECT * FROM pgraft_metrics('my-cluster');
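To compare network latency against the heartbeat interval, it helps to reduce `ping` output to a single number per node. This is a minimal sketch, assuming `ping` prints the usual `min/avg/max` summary line (as iputils and BSD `ping` do). The helper name `avg_rtt` is hypothetical.

```shell
# Extract the average round-trip time (in ms) from ping's summary line,
# e.g. "rtt min/avg/max/mdev = 0.045/0.052/0.061/0.007 ms".
avg_rtt() {
    awk -F'/' '/min\/avg\/max/ { print $5 }'
}

# Usage:
#   for node in node1 node2 node3; do
#       printf '%s %s ms\n' "$node" "$(ping -c 5 "$node" | avg_rtt)"
#   done
```

If the average round-trip time approaches the configured heartbeat interval, lowering the interval further is unlikely to help and the network path is the more probable bottleneck.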
Error Codes and Messages
pgraft Error Codes
Error Code | Description | Solution |
---|---|---|
PGR01 | Cluster not initialized | Initialize cluster with pgraft_init_cluster() |
PGR02 | Node not found in cluster | Add node with pgraft_add_member() |
PGR03 | Not the leader | Wait for leader or check cluster status |
PGR04 | Network timeout | Check network connectivity and increase timeout |
PGR05 | Configuration error | Validate configuration parameters |
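The table can be turned into a small lookup for use while scanning logs. This is a sketch only: the function name `pgraft_hint` is hypothetical, and the usage line assumes the codes appear literally in the PostgreSQL log file, which may not match how your installation reports them.

```shell
# Map a pgraft error code from the table above to its suggested action.
# Hypothetical helper; the texts are copied from the table.
pgraft_hint() {
    case "$1" in
        PGR01) echo "Cluster not initialized: initialize cluster with pgraft_init_cluster()" ;;
        PGR02) echo "Node not found in cluster: add node with pgraft_add_member()" ;;
        PGR03) echo "Not the leader: wait for leader or check cluster status" ;;
        PGR04) echo "Network timeout: check network connectivity and increase timeout" ;;
        PGR05) echo "Configuration error: validate configuration parameters" ;;
        *)     echo "Unknown code: $1"; return 1 ;;
    esac
}

# Usage (log path is a placeholder):
#   grep -o 'PGR0[1-5]' postgresql.log | sort -u | while read -r code; do
#       pgraft_hint "$code"
#   done
```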
Diagnostic Commands
Cluster Health Checks
-- Comprehensive cluster status
SELECT * FROM pgraft_cluster_status('my-cluster');
-- Check leader information
SELECT * FROM pgraft_leader('my-cluster');
-- Node-specific information
SELECT * FROM pgraft_node_info('my-cluster');
-- Performance metrics
SELECT * FROM pgraft_metrics('my-cluster');
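Running the same health checks on every node makes disagreements between nodes easy to spot. The sketch below only generates the four statements listed above for a given cluster name. The wrapper name `health_queries` is an assumption, and the usage line reuses the `psql` connection options shown elsewhere on this page.

```shell
# Print the four health-check statements for cluster $1, one per line.
# The cluster name is interpolated as-is, so pass trusted values only.
health_queries() {
    cluster=$1
    for fn in pgraft_cluster_status pgraft_leader pgraft_node_info pgraft_metrics; do
        printf "SELECT * FROM %s('%s');\n" "$fn" "$cluster"
    done
}

# Usage:
#   for node in node1 node2 node3; do
#       echo "== $node =="
#       health_queries my-cluster | psql -X -h "$node" -p 5432 -U postgres
#   done
```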
Configuration Validation
-- View current configuration
SELECT * FROM pgraft_get_config('my-cluster');
-- Check PostgreSQL settings
SHOW shared_preload_libraries;
SHOW max_connections;
-- Verify extension is loaded
SELECT * FROM pg_extension WHERE extname = 'pgraft';
Network Diagnostics
# Test connectivity to other nodes
telnet node1 5433
telnet node2 5433
telnet node3 5433
# Check network latency
ping -c 10 node1
traceroute node1
# Test PostgreSQL connectivity
psql -h node1 -p 5432 -U postgres -c "SELECT 1"
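Interactive `telnet` sessions are awkward to script. As a hedged alternative, the sketch below loops over `host:port` targets and reports which ones accept a TCP connection. The function names are illustrative, and `tcp_probe` assumes a netcat build that supports `-z` (connect without sending data) and `-w` (timeout in seconds).

```shell
# Run probe command $1 against each remaining "host:port" argument.
# Prints one ok/FAIL line per target; returns 1 if any target failed.
check_targets() {
    probe=$1
    shift
    failed=0
    for target in "$@"; do
        host=${target%:*}
        port=${target##*:}
        if "$probe" "$host" "$port" >/dev/null 2>&1; then
            echo "ok   $target"
        else
            echo "FAIL $target"
            failed=1
        fi
    done
    return "$failed"
}

# Probe a TCP port with netcat (assumes nc supports -z and -w).
tcp_probe() {
    nc -z -w 3 "$1" "$2"
}

# Usage:
#   check_targets tcp_probe node1:5433 node2:5433 node3:5433
```

Passing the probe as an argument keeps the loop independent of the tool used, so `tcp_probe` can be swapped for another check if netcat is unavailable.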
Recovery Procedures
Node Recovery
1. Stop PostgreSQL: gracefully stop PostgreSQL on the failed node.
2. Clean Up State: remove any corrupted state files or logs.
3. Restart Services: restart PostgreSQL and rejoin the cluster.
4. Verify Recovery: check cluster status and node health.
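The four steps above can be sketched as a dry-run script. This assumes `pg_ctl` and `psql` are on the `PATH` (packaged installations often use a service manager instead), and it deliberately leaves step 2 manual because the location of pgraft state files is not documented here. The names `run` and `recover_node` are hypothetical.

```shell
# Print a command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Walk through node recovery for data directory $1 and cluster $2.
recover_node() {
    pgdata=$1
    cluster=${2:-my-cluster}
    # 1. Graceful stop: fast mode disconnects clients and shuts down cleanly.
    run pg_ctl stop -D "$pgdata" -m fast || return 1
    # 2. Left manual on purpose: state file locations are deployment-specific.
    echo "# clean up corrupted state files or logs by hand before continuing"
    # 3. Restart and wait until the server accepts connections.
    run pg_ctl start -D "$pgdata" -w || return 1
    # 4. Verify that the node is back in the cluster.
    run psql -X -c "SELECT * FROM pgraft_cluster_status('$cluster');"
}

# Usage:
#   recover_node "$PGDATA"              # print the plan
#   DRY_RUN=0 recover_node "$PGDATA"    # execute it
```

Defaulting to a dry run means the plan can be reviewed before anything is stopped.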
Cluster Recovery
-- If the cluster is completely down
-- 1. Start a majority of nodes
-- 2. Reinitialize the cluster
SELECT pgraft_init_cluster('my-cluster');
-- 3. Add the remaining nodes
SELECT pgraft_add_member('my-cluster', 'node1', 'host=node1 port=5432');
SELECT pgraft_add_member('my-cluster', 'node2', 'host=node2 port=5432');
SELECT pgraft_add_member('my-cluster', 'node3', 'host=node3 port=5432');
-- 4. Verify cluster health
SELECT * FROM pgraft_cluster_status('my-cluster');
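Step 1 depends on knowing how many nodes make up a majority. In Raft, a majority of an n-node cluster is floor(n/2) + 1. The one-line helper below (the name `majority` is illustrative) computes it:

```shell
# Print the number of nodes needed for a Raft majority in a cluster of $1 nodes.
majority() {
    echo $(( $1 / 2 + 1 ))
}

# Usage:
#   majority 3    # 2 of 3 nodes must be running
#   majority 5    # 3 of 5 nodes must be running
```

An even-sized cluster gains no extra fault tolerance: four nodes need three running, so it tolerates one failure, the same as three nodes.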
Performance Troubleshooting
Slow Operations
Check Metrics:
SELECT * FROM pgraft_metrics('my-cluster');
SELECT * FROM pg_stat_activity;
SELECT * FROM pg_stat_database;
Common Solutions:
- Increase heartbeat frequency
- Optimize network settings
- Check PostgreSQL configuration
- Monitor resource usage
Memory Issues
Symptoms:
- High memory usage
- Out of memory errors
- Slow garbage collection
Solutions:
-- Reduce log entries
SELECT pgraft_set_config('my-cluster', 'max_log_entries', '5000');
-- Force snapshot
SELECT pgraft_snapshot('my-cluster');
-- Check PostgreSQL memory settings
SHOW shared_buffers;
SHOW work_mem;