pgraft Cluster Management

This guide covers managing PostgreSQL clusters with pgraft, which uses Raft consensus to provide high availability and automatic failover.

Cluster Lifecycle Management

1. Cluster Creation

-- Create a new cluster
SELECT pgraft_init_cluster('production-cluster');

-- Verify cluster creation
SELECT * FROM pgraft_cluster_status('production-cluster');

2. Add Nodes

-- Add primary node
SELECT pgraft_add_member('production-cluster', 'node1', 'host=192.168.1.10 port=5432');

-- Add replica nodes
SELECT pgraft_add_member('production-cluster', 'node2', 'host=192.168.1.11 port=5432');
SELECT pgraft_add_member('production-cluster', 'node3', 'host=192.168.1.12 port=5432');

3. Monitor Cluster Health

-- Check cluster status
SELECT * FROM pgraft_cluster_status('production-cluster');

-- Check leader
SELECT * FROM pgraft_leader('production-cluster');

-- Get metrics
SELECT * FROM pgraft_metrics('production-cluster');

Node Operations

Adding New Nodes

1. Prepare New Node: Install PostgreSQL and the pgraft extension on the new node.
2. Configure PostgreSQL: Enable the pgraft extension and configure cluster settings.
3. Add to Cluster: Use pgraft_add_member() to add the node to the cluster.

Removing Nodes

-- Remove node from cluster
SELECT pgraft_remove_member('production-cluster', 'node3');

-- Verify removal
SELECT * FROM pgraft_cluster_status('production-cluster');

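Ensure the remaining nodes can still form a majority after removal. A cluster of N voting nodes needs floor(N/2) + 1 of them available, so removing one node from a three-node cluster leaves two nodes that must both stay up. Add a replacement node to return to an odd node count.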
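The quorum requirement can be checked with simple arithmetic before running pgraft_remove_member(). The following is a minimal Python sketch, not part of pgraft. The function names quorum_size and safe_to_remove are hypothetical, and the sketch assumes the standard Raft rule that a majority is floor(N/2) + 1 voting members.

```python
def quorum_size(members: int) -> int:
    """Votes needed for a majority in a Raft group of `members` voters."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def safe_to_remove(total: int, healthy: int, removing_healthy: bool = True) -> bool:
    """Return True if the cluster still has a quorum after removing one node.

    total            -- current number of voting members
    healthy          -- how many of them are currently reachable
    removing_healthy -- whether the node being removed is one of the healthy ones
    """
    remaining_total = total - 1
    if remaining_total < 1:
        return False
    remaining_healthy = healthy - 1 if removing_healthy else healthy
    return remaining_healthy >= quorum_size(remaining_total)


if __name__ == "__main__":
    # Three healthy nodes, removing one: two remain and both are needed.
    print(safe_to_remove(total=3, healthy=3))  # True
    # One node is already down; removing a healthy one leaves 1 of 2 reachable.
    print(safe_to_remove(total=3, healthy=2))  # False
```

The healthy count would in practice come from inspecting pgraft_cluster_status() output; how that output is structured is not shown here, so the sketch takes plain integers.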

Node Maintenance

-- Check node health
SELECT * FROM pgraft_node_info('production-cluster');

-- Check if node is leader
SELECT pgraft_is_leader('production-cluster');

-- Force election if needed
SELECT pgraft_force_election('production-cluster');

Failover Management

Automatic Failover

pgraft automatically handles failover when the leader becomes unavailable. The failover process:

1. Leader Detection: Followers detect leader unavailability through missed heartbeats.
2. Election Process: A follower becomes a candidate and requests votes from the other nodes.
3. New Leader: The candidate that receives a majority of votes becomes the new leader.
4. Service Continuation: The new leader takes over and continues serving requests.
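The election steps above can be illustrated with a toy vote count. This is a simplified Python sketch of majority voting as described by the Raft algorithm, not pgraft's implementation. The run_election function and the node names are hypothetical, and the sketch deliberately ignores terms, log comparison, and randomized timeouts.

```python
def run_election(candidate: str, members: list[str], reachable: set[str]) -> bool:
    """Return True if `candidate` collects votes from a majority of `members`.

    members   -- all voting members, including the candidate
    reachable -- members that respond to the candidate's vote request
    """
    if candidate not in members:
        raise ValueError("candidate must be a cluster member")
    votes = 1  # a candidate always votes for itself
    for node in members:
        if node != candidate and node in reachable:
            votes += 1  # simplification: every reachable node grants its vote
    majority = len(members) // 2 + 1
    return votes >= majority


if __name__ == "__main__":
    members = ["node1", "node2", "node3"]
    # node1 (the old leader) is down; node2 asks node3 for a vote.
    print(run_election("node2", members, reachable={"node3"}))  # True
    # node2 is isolated from both peers and cannot win.
    print(run_election("node2", members, reachable=set()))      # False
```

The second case shows why an isolated node cannot promote itself: without a majority, no new leader is elected on that side of a network partition.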

Manual Failover

-- Trigger manual failover
SELECT pgraft_force_election('production-cluster');

-- Check new leader
SELECT * FROM pgraft_leader('production-cluster');

Monitoring and Maintenance

Health Monitoring

-- Comprehensive cluster status
SELECT * FROM pgraft_cluster_status('production-cluster');

-- Individual node status
SELECT * FROM pgraft_node_info('production-cluster');

-- Performance metrics
SELECT * FROM pgraft_metrics('production-cluster');

Configuration Management

-- View current configuration
SELECT * FROM pgraft_get_config('production-cluster');

-- Update configuration
SELECT pgraft_set_config('production-cluster', 'heartbeat_interval', '50ms');
SELECT pgraft_set_config('production-cluster', 'election_timeout', '500ms');

Backup and Recovery

Regular Backups

# PostgreSQL backup (run from a shell)
pg_dump -h leader-node -U postgres database_name > backup.sql

-- Raft log backup (run in psql)
SELECT pgraft_backup_logs('production-cluster');
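For scheduled backups it helps to write each dump to a distinct file instead of overwriting backup.sql. The following Python sketch builds the same pg_dump invocation with a timestamped output path. The build_pg_dump_command helper, the /var/backups directory, and the file naming scheme are assumptions for illustration; only the standard pg_dump options -h, -U, and -f are used.

```python
from datetime import datetime
from pathlib import PurePosixPath


def build_pg_dump_command(host: str, user: str, database: str,
                          out_dir: str, now: datetime) -> list[str]:
    """Build a pg_dump argument list that writes a timestamped SQL file."""
    stamp = now.strftime("%Y%m%d-%H%M%S")
    out_file = PurePosixPath(out_dir) / f"{database}-{stamp}.sql"
    # -f writes the dump to a file, replacing the shell redirect.
    return ["pg_dump", "-h", host, "-U", user, "-f", str(out_file), database]


if __name__ == "__main__":
    cmd = build_pg_dump_command("leader-node", "postgres", "database_name",
                                "/var/backups", datetime(2024, 1, 2, 3, 4, 5))
    print(" ".join(cmd))
    # To execute the backup: subprocess.run(cmd, check=True)
```

Passing the timestamp in as an argument keeps the helper deterministic, which makes it easy to test without touching a database.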

Disaster Recovery

# Restore from backup (run from a shell)
psql -h new-node -U postgres database_name < backup.sql

-- Rebuild cluster (run in psql)
SELECT pgraft_init_cluster('production-cluster');
SELECT pgraft_add_member('production-cluster', 'node1', 'host=new-node port=5432');

Best Practices

Cluster Design

  • Use an odd number of nodes (3, 5, 7) for proper majority voting
  • Deploy nodes across different availability zones
  • Ensure low-latency network connectivity between nodes
  • Monitor disk space for Raft logs and snapshots

Performance Tuning

  • Adjust heartbeat_interval based on network latency
  • Set election_timeout to 3-5x heartbeat_interval
  • Configure snapshot_threshold based on log growth
  • Monitor and tune PostgreSQL settings for optimal performance
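The 3-5x guideline above can be turned into a quick check before calling pgraft_set_config(). This is a small Python sketch, not a pgraft utility. The helper names are hypothetical, and it assumes duration values are written as whole numbers with an "ms" or "s" suffix, as in the '50ms' and '500ms' examples; other formats that pgraft may accept are not handled.

```python
import re

_UNITS_MS = {"ms": 1, "s": 1000}


def parse_duration_ms(value: str) -> int:
    """Parse strings such as '50ms' or '1s' into whole milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s)", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNITS_MS[match.group(2)]


def election_timeout_range(heartbeat: str) -> tuple[int, int]:
    """Return the (min, max) election timeout in ms for a 3-5x multiplier."""
    hb = parse_duration_ms(heartbeat)
    return 3 * hb, 5 * hb


def within_guideline(heartbeat: str, election_timeout: str) -> bool:
    """Check whether election_timeout falls in the 3-5x heartbeat range."""
    low, high = election_timeout_range(heartbeat)
    return low <= parse_duration_ms(election_timeout) <= high


if __name__ == "__main__":
    print(election_timeout_range("50ms"))     # (150, 250)
    print(within_guideline("50ms", "200ms"))  # True
    print(within_guideline("50ms", "500ms"))  # False
```

With a 50ms heartbeat the guideline gives 150-250ms. A 500ms election timeout is a 10x ratio, which trades slower failure detection for fewer spurious elections on a noisy network.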

Operational Procedures

  • Plan maintenance windows for non-leader nodes first
  • Always verify cluster health after changes
  • Keep detailed logs of cluster operations
  • Test failover procedures regularly
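The "non-leader nodes first" rule can be expressed as a simple ordering step in a maintenance script. The Python sketch below is illustrative only: the maintenance_order helper is hypothetical, and the leader name is assumed to have been read beforehand, for example from pgraft_leader().

```python
def maintenance_order(nodes: list[str], leader: str) -> list[str]:
    """Order nodes so followers are serviced first and the leader last."""
    if leader not in nodes:
        raise ValueError("leader must be one of the cluster nodes")
    followers = [n for n in nodes if n != leader]
    return followers + [leader]


if __name__ == "__main__":
    # node1 currently leads, so it is scheduled after both followers.
    print(maintenance_order(["node1", "node2", "node3"], leader="node1"))
    # ['node2', 'node3', 'node1']
```

Servicing the leader last means at most one leadership change per maintenance round. Verify cluster health between nodes, since the leader may change while the round is in progress.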