pgraft Cluster Management
Manage PostgreSQL clusters with pgraft, which uses Raft consensus for high availability and automatic failover.
Cluster Lifecycle Management
1. Cluster Creation
-- Create a new cluster
SELECT pgraft_init_cluster('production-cluster');
-- Verify cluster creation
SELECT * FROM pgraft_cluster_status('production-cluster');
2. Add Nodes
-- Add primary node
SELECT pgraft_add_member('production-cluster', 'node1', 'host=192.168.1.10 port=5432');
-- Add replica nodes
SELECT pgraft_add_member('production-cluster', 'node2', 'host=192.168.1.11 port=5432');
SELECT pgraft_add_member('production-cluster', 'node3', 'host=192.168.1.12 port=5432');
3. Monitor Cluster Health
-- Check cluster status
SELECT * FROM pgraft_cluster_status('production-cluster');
-- Check leader
SELECT * FROM pgraft_leader('production-cluster');
-- Get metrics
SELECT * FROM pgraft_metrics('production-cluster');
Node Operations
Adding New Nodes
1. Prepare New Node: Install PostgreSQL and the pgraft extension on the new node.
2. Configure PostgreSQL: Enable the pgraft extension and configure cluster settings.
3. Add to Cluster: Use pgraft_add_member() to add the node to the cluster.
Removing Nodes
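-- Remove node from cluster
SELECT pgraft_remove_member('production-cluster', 'node3');
-- Verify removal
SELECT * FROM pgraft_cluster_status('production-cluster');
Ensure the remaining nodes can still form a majority (quorum) after removal. Removing one node from an odd-sized cluster leaves an even number of members, which tolerates no more failures than the next smaller odd size, so plan to restore an odd count.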
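The majority requirement can be checked by hand before a member is removed. The following is a minimal sketch in plain Python, not part of pgraft: the function names are hypothetical, and it assumes the standard Raft majority rule of floor(n/2) + 1 votes.

```python
def quorum(members: int) -> int:
    """Smallest number of votes that forms a majority of `members`."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def safe_to_remove(members: int, healthy: int, removing_healthy: bool = True) -> bool:
    """Return True if a majority is still reachable after removing one member.

    members: current member count (hypothetical input, e.g. read from the
             cluster status output).
    healthy: members that are currently reachable.
    removing_healthy: whether the member being removed is one of the healthy ones.
    """
    remaining = members - 1
    if remaining < 1:
        return False  # removing the last member leaves no cluster
    healthy_after = healthy - 1 if removing_healthy else healthy
    return healthy_after >= quorum(remaining)
```

For example, `safe_to_remove(3, 3)` holds because the two remaining nodes are both healthy, while `safe_to_remove(3, 2)` does not: removing a healthy node would leave one reachable member out of two.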
Node Maintenance
-- Check node health
SELECT * FROM pgraft_node_info('production-cluster');
-- Check if node is leader
SELECT pgraft_is_leader('production-cluster');
-- Force election if needed
SELECT pgraft_force_election('production-cluster');
Failover Management
Automatic Failover
pgraft automatically handles failover when the leader becomes unavailable. The failover process:
1. Leader Detection: Followers detect that the leader is unavailable through missed heartbeats.
2. Election Process: A follower becomes a candidate and requests votes from the other nodes.
3. New Leader: The candidate that receives a majority of votes becomes the new leader.
4. Service Continuation: The new leader takes over and continues serving requests.
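The election steps above can be illustrated with a small simulation. This is a simplified sketch of a generic Raft election round, not pgraft's implementation: the function name, the 150 to 300 ms timeout range, and the assumption that every reachable node grants its vote are all illustrative.

```python
import random


def run_election(node_ids, failed_leader, seed=0, timeout_range_ms=(150, 300)):
    """Simulate one simplified election round after the leader fails.

    Returns (new_leader, votes); new_leader is None if no majority is reached.
    """
    rng = random.Random(seed)
    alive = [n for n in node_ids if n != failed_leader]
    if not alive:
        return None, 0
    # Each follower waits a randomized election timeout; the follower whose
    # timeout expires first notices the missing heartbeats and becomes candidate.
    timeouts = {n: rng.uniform(*timeout_range_ms) for n in alive}
    candidate = min(timeouts, key=timeouts.get)
    # Simplification: the candidate votes for itself and every other reachable
    # node grants its vote (logs are assumed to be equally up to date).
    votes = len(alive)
    # The majority is computed over the full membership, including the failed node.
    majority = len(node_ids) // 2 + 1
    return (candidate if votes >= majority else None), votes
```

With three members and `node1` failed, the two survivors hold 2 of 3 votes and one of them wins. With only two members, the single survivor cannot reach a majority of 2, so no leader is elected.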
Manual Failover
-- Trigger manual failover
SELECT pgraft_force_election('production-cluster');
-- Check new leader
SELECT * FROM pgraft_leader('production-cluster');
Monitoring and Maintenance
Health Monitoring
-- Comprehensive cluster status
SELECT * FROM pgraft_cluster_status('production-cluster');
-- Individual node status
SELECT * FROM pgraft_node_info('production-cluster');
-- Performance metrics
SELECT * FROM pgraft_metrics('production-cluster');
Configuration Management
-- View current configuration
SELECT * FROM pgraft_get_config('production-cluster');
-- Update configuration
SELECT pgraft_set_config('production-cluster', 'heartbeat_interval', '50ms');
SELECT pgraft_set_config('production-cluster', 'election_timeout', '500ms');
Backup and Recovery
Regular Backups
# PostgreSQL backup (run from the shell)
pg_dump -h leader-node -U postgres database_name > backup.sql
-- Raft log backup (run in psql)
SELECT pgraft_backup_logs('production-cluster');
Disaster Recovery
# Restore from backup (run from the shell)
psql -h new-node -U postgres database_name < backup.sql
-- Rebuild cluster (run in psql)
SELECT pgraft_init_cluster('production-cluster');
SELECT pgraft_add_member('production-cluster', 'node1', 'host=new-node port=5432');
Best Practices
Cluster Design
- Use an odd number of nodes (3, 5, 7) for proper majority voting
- Deploy nodes across different availability zones
- Ensure low-latency network connectivity between nodes
- Monitor disk space for Raft logs and snapshots
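The preference for odd cluster sizes follows from majority arithmetic. The short sketch below is plain Python with a hypothetical helper name, and it assumes the usual Raft majority quorum. It prints how many node failures each cluster size can absorb.

```python
def tolerated_failures(members: int) -> int:
    """Members that can fail while a majority of `members` stays available."""
    return (members - 1) // 2


# Compare cluster sizes: an even size adds a node without adding fault tolerance.
for n in range(3, 8):
    print(f"{n} nodes: majority {n // 2 + 1}, tolerates {tolerated_failures(n)} failure(s)")
```

A 4-node cluster tolerates one failure, the same as a 3-node cluster, and a 6-node cluster tolerates two, the same as a 5-node cluster. The extra node raises the majority threshold without improving availability.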
Performance Tuning
- Adjust heartbeat_interval based on network latency
- Set election_timeout to 3-5x heartbeat_interval
- Configure snapshot_threshold based on log growth
- Monitor and tune PostgreSQL settings for optimal performance
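The 3-5x guideline for election_timeout can be turned into a quick calculation. This is a minimal sketch in plain Python: the helper names are hypothetical and values are in milliseconds. It only applies the multiplier stated above and does not read or write pgraft settings.

```python
def election_timeout_range_ms(heartbeat_interval_ms: int, low: int = 3, high: int = 5):
    """Return the (min, max) election timeout suggested by the 3-5x guideline."""
    if heartbeat_interval_ms <= 0:
        raise ValueError("heartbeat interval must be positive")
    return heartbeat_interval_ms * low, heartbeat_interval_ms * high


def within_guideline(heartbeat_interval_ms: int, election_timeout_ms: int) -> bool:
    """Check whether a timeout falls inside the suggested range."""
    lo, hi = election_timeout_range_ms(heartbeat_interval_ms)
    return lo <= election_timeout_ms <= hi
```

For a 50 ms heartbeat, `election_timeout_range_ms(50)` gives `(150, 250)`, so a 200 ms election timeout falls inside the suggested range.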
Operational Procedures
- Plan maintenance windows for non-leader nodes first
- Always verify cluster health after changes
- Keep detailed logs of cluster operations
- Test failover procedures regularly