
pgraft Cluster Management

Bootstrap a New Cluster

Run these commands after installing pgraft and configuring the leader node. They initialize metadata, elect the first leader, and confirm that the cluster is healthy.

Initialize Raft metadata

-- Run on the leader after CREATE EXTENSION
SELECT pgraft_init();

-- Optional: set a human-friendly cluster label
SELECT pgraft_set_config('cluster_name', 'production-cluster');

-- Verify leader election and quorum
SELECT pgraft_is_leader() AS is_leader,
       pgraft_get_term() AS current_term,
       pgraft_quorum_met() AS quorum_ready;
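Leader election may take a moment to finish after pgraft_init(). For that reason, scripted bootstraps usually poll instead of checking once. Below is a minimal POSIX shell sketch. It assumes psql reaches the leader through the standard PG* environment variables. The helper name wait_for_true and its retry and interval arguments are hypothetical and are not part of pgraft.

```shell
#!/bin/sh
# Hypothetical helper: run COMMAND repeatedly until it prints "t" (psql's
# unaligned rendering of boolean true) or RETRIES attempts have been made.
# Usage: wait_for_true RETRIES INTERVAL_SECONDS COMMAND [ARGS...]
wait_for_true() {
  local retries interval attempt out
  retries=$1
  interval=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    # A failing command (e.g. server still starting) counts as "not yet".
    out=$("$@" 2>/dev/null) || out=""
    if [ "$out" = "t" ]; then
      return 0
    fi
    # Sleep only between attempts, not after the final one.
    if [ "$attempt" -lt "$retries" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example, wait_for_true 30 2 psql -At -c "SELECT pgraft_quorum_met();" waits roughly a minute for quorum before giving up. The command is passed as arguments, so the helper can be exercised with a stub instead of a live cluster.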

Review current members

-- Shows the local node (leader) after initialization
SELECT * FROM pgraft_get_nodes();

-- Detailed cluster status including commit indexes
SELECT * FROM pgraft_get_cluster_status();

Add and Remove Nodes

Prepare each follower with the same postgresql.conf identity settings, then register it with the leader. Removing nodes requires leader confirmation to maintain quorum.

Add follower nodes

-- Execute on the elected leader once the follower database is running
SELECT pgraft_add_node(2, '10.0.0.12', 7002);
SELECT pgraft_add_node(3, '10.0.0.13', 7003);

-- Monitor replication catch-up
SELECT node_id,
       state,
       match_index,
       commit_index
  FROM pgraft_get_nodes();
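When you register several followers at once, generating the statements from an inventory helps avoid typos in IDs and ports. The sketch below assumes a hypothetical followers.txt file with one "node_id host port" entry per line. gen_add_node_sql is an illustrative helper, not a pgraft function.

```shell
#!/bin/sh
# Hypothetical helper: convert "node_id host port" inventory lines into the
# pgraft_add_node() calls shown above. Blank lines and '#' comments are skipped;
# a malformed line aborts with a non-zero status instead of emitting bad SQL.
is_uint() {
  case $1 in
    '' | *[!0-9]*) return 1 ;;
  esac
  return 0
}

gen_add_node_sql() {
  local id host port host_ok
  # The -n test also processes a final line that lacks a trailing newline.
  while read -r id host port || [ -n "$id" ]; do
    case $id in
      '' | '#'*) continue ;;
    esac
    # Restrict hosts to name/IP characters so the quoted literal cannot break.
    case $host in
      '' | *[!A-Za-z0-9.:-]*) host_ok=no ;;
      *) host_ok=yes ;;
    esac
    if ! is_uint "$id" || ! is_uint "$port" || [ "$host_ok" = no ]; then
      echo "invalid inventory line: $id $host $port" >&2
      return 1
    fi
    printf "SELECT pgraft_add_node(%s, '%s', %s);\n" "$id" "$host" "$port"
  done
}
```

Running gen_add_node_sql < followers.txt | psql -v ON_ERROR_STOP=1 on the leader then registers each follower and stops at the first failing statement.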

Remove a node gracefully

-- Run on the leader to revoke node 3's membership
SELECT pgraft_remove_node(3);

-- Confirm quorum health, then check that node 3 is no longer listed
SELECT pgraft_quorum_met() AS quorum_ok;
SELECT * FROM pgraft_get_nodes();
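The quorum check matters because Raft elects leaders and commits entries only with a strict majority of voting members. The sketch below assumes pgraft follows standard Raft majority rules. Under that assumption, removing node 3 from the three-node example leaves a two-member cluster. That cluster still needs both votes, so it tolerates no further failures. The function names are illustrative.

```shell
#!/bin/sh
# Raft needs a strict majority of voting members to elect a leader or commit.
# quorum_size N      -> votes required out of N members
# fault_tolerance N  -> members that can be lost while a majority remains
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}
```

The arithmetic shows why odd cluster sizes are the usual choice. Four members tolerate one failure, which is no better than three.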

Operational Monitoring

pgraft exposes diagnostic functions for Raft internals. Use them to track leadership, log replication, and worker health in dashboards or alerts.

Health overview

-- Leader identity, Raft term, and election metrics
SELECT * FROM pgraft_get_cluster_status();

-- Per-node connectivity and Raft lag
SELECT node_id,
       state,
       last_heartbeat_ms,
       replication_lag_bytes
  FROM pgraft_get_nodes();
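You can feed these columns into an external check. The sketch below is a minimal alert filter. It assumes the query runs with psql -At, which produces unaligned, pipe-separated output with no headers. lagging_nodes and its byte threshold are hypothetical. The filter passes last_heartbeat_ms through without evaluating it, because its exact semantics are not specified here.

```shell
#!/bin/sh
# Hypothetical alert filter for the per-node query above. Input rows are
# "node_id|state|last_heartbeat_ms|replication_lag_bytes" (psql -At output).
# Prints each node whose lag exceeds MAX_LAG_BYTES ($1) and exits 1 if any did.
lagging_nodes() {
  awk -F'|' -v max="${1:-0}" '
    NF >= 4 && ($4 + 0) > (max + 0) {
      printf "node %s (%s) lag=%s bytes\n", $1, $2, $4
      flagged = 1
    }
    END { if (flagged) exit 1; exit 0 }
  '
}
```

For example, psql -At -c "SELECT node_id, state, last_heartbeat_ms, replication_lag_bytes FROM pgraft_get_nodes();" | lagging_nodes 1048576 prints every node more than 1 MiB behind and exits non-zero. Most cron or monitoring wrappers treat a non-zero exit as an alert.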

Log and snapshot telemetry

-- Append/commit counts, snapshot cadence, and RPC statistics
SELECT * FROM pgraft_log_get_stats();

-- Inspect the five most recent cluster events (e.g., leadership transitions)
SELECT *
  FROM pgraft_get_events()
 ORDER BY event_timestamp DESC
 LIMIT 5;

Failover & Leadership Control

Automatic elections occur when the leader misses heartbeat deadlines. Use the following procedures to simulate failover, promote a new leader, or pause elections during maintenance.

Manual leadership transfer

-- Ask the current leader to step down in favor of node 2 (triggers an election)
SELECT pgraft_transfer_leadership(2);

-- Pause elections when taking the leader offline (e.g., maintenance)
SELECT pgraft_set_config('failover_enabled', 'false');
-- Resume elections after maintenance concludes
SELECT pgraft_set_config('failover_enabled', 'true');
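If you forget to re-enable failover, the cluster is left without automatic elections. Wrapping maintenance so that the second statement always runs guards against this. The sketch below assumes both pgraft_set_config() calls above are issued on the node psql connects to. with_failover_paused and the PSQL override variable are hypothetical names. The wrapper restores the setting whether the maintenance command succeeds or fails, but not if the script itself is killed.

```shell
#!/bin/sh
# Hypothetical wrapper: disable automatic failover, run a maintenance command,
# then re-enable failover regardless of the command's exit status.
# PSQL names the client command (default: psql); connection settings come
# from the usual PG* environment variables.
with_failover_paused() {
  local psql rc
  psql=${PSQL:-psql}
  "$psql" -c "SELECT pgraft_set_config('failover_enabled', 'false');" || return 1
  rc=0
  "$@" || rc=$?
  # Re-enable elections whether or not the maintenance command succeeded.
  "$psql" -c "SELECT pgraft_set_config('failover_enabled', 'true');" || rc=1
  return "$rc"
}
```

For example, with_failover_paused ./patch-node.sh (a hypothetical maintenance script) propagates that script's exit status after failover has been switched back on.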

Failover drill checklist

# 1. Confirm cluster is healthy
psql -c "SELECT pgraft_quorum_met();"

# 2. Trigger leadership transfer
psql -c "SELECT pgraft_transfer_leadership(2);"

# 3. Validate new leader
psql -c "SELECT pgraft_is_leader(), pgraft_get_leader();"

# 4. Re-enable automatic failover if disabled
psql -c "SELECT pgraft_set_config('failover_enabled', 'true');"
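You can also script the checklist so that a failed step stops the drill instead of continuing against an unhealthy cluster. The sketch below assumes psql connects to the current leader through the PG* environment variables. failover_drill and PSQL are hypothetical names. Step 3 only prints what pgraft_is_leader() and pgraft_get_leader() return and does not interpret the result.

```shell
#!/bin/sh
# Hypothetical scripted version of the drill above. Each step must succeed
# before the next runs; the drill aborts if quorum is not met beforehand.
# PSQL names the client command (default: psql).
failover_drill() {
  local target psql
  target=$1
  psql=${PSQL:-psql}
  case $target in
    '' | *[!0-9]*) echo "usage: failover_drill NODE_ID" >&2; return 2 ;;
  esac
  # 1. Confirm the cluster is healthy before touching leadership.
  if [ "$("$psql" -At -c 'SELECT pgraft_quorum_met();')" != "t" ]; then
    echo "quorum not met; aborting drill" >&2
    return 1
  fi
  # 2. Ask the current leader to hand leadership to the target node.
  "$psql" -At -c "SELECT pgraft_transfer_leadership($target);" || return 1
  # 3. Show leadership as seen from this connection.
  "$psql" -At -c 'SELECT pgraft_is_leader(), pgraft_get_leader();' || return 1
  # 4. Leave automatic failover enabled.
  "$psql" -At -c "SELECT pgraft_set_config('failover_enabled', 'true');" || return 1
}
```

Run failover_drill 2 to move leadership to node 2. Then connect to node 2 directly to confirm that pgraft_is_leader() reports true there.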

Rolling Maintenance Workflow

Keep quorum while patching or restarting individual members. Always drain workload and verify replication catch-up before shutting down a node.

1. Drain client traffic. Redirect application connections away from the target node or remove it from connection poolers.
2. Ensure follower status. If the node is the leader, run pgraft_transfer_leadership() with another member's node ID to promote that server.
3. Wait for log sync. Query replication_lag_bytes from pgraft_get_nodes() and confirm that lag is zero.
4. Stop PostgreSQL. Apply OS patches or package upgrades and restart the instance.
5. Rejoin the cluster. After startup, pgraft automatically reconnects and catches up; monitor pgraft_get_nodes() until the node reports state = 'follower'.
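A polling loop makes step 3 easy to automate. The sketch below assumes the node's ID is known and that psql can reach any cluster member. node_caught_up is a hypothetical helper. It reads psql -At output and succeeds only when the given node reports zero replication lag.

```shell
#!/bin/sh
# Hypothetical check for step 3: reads "node_id|replication_lag_bytes" rows
# (psql -At output) and succeeds only if NODE_ID ($1) is listed with zero lag.
node_caught_up() {
  awk -F'|' -v id="$1" '
    $1 == id { found = 1; lag = $2 + 0 }
    END { if (found && lag == 0) exit 0; exit 1 }
  '
}
```

For example, until psql -At -c "SELECT node_id, replication_lag_bytes FROM pgraft_get_nodes();" | node_caught_up 3; do sleep 5; done blocks until node 3 has caught up. The same loop can confirm catch-up after the node rejoins in step 5.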
