pgraft Raft Protocol
Understanding the Raft consensus protocol implementation in the pgraft PostgreSQL extension.
Raft Consensus Protocol
Raft is a consensus algorithm designed to be understandable and implementable. It lets a cluster of servers agree on the same sequence of state changes even when some servers fail, as long as a majority of them remains available. pgraft implements Raft as a PostgreSQL extension, providing distributed consensus capabilities directly within the database.
Leader Election
Automatic leader selection with term-based voting
Log Replication
Consistent log replication across all nodes
Safety
Split-brain prevention and consistency guarantees
Node States
Leader
The leader handles all client requests and manages log replication. It sends periodic heartbeats to maintain leadership and replicates log entries to followers.
-- Check if node is leader
SELECT pgraft_is_leader('my-cluster');
-- Returns: true if leader, false otherwise
Follower
Followers receive log entries from the leader and respond to heartbeats. A follower becomes a candidate if it does not receive a heartbeat within the election timeout.
Candidate
Candidates request votes from other nodes during leader election. They become leaders if they receive votes from a majority of the cluster.
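The three roles and the transitions between them can be written down as a small lookup table. The following is a minimal Python sketch of generic Raft behavior, not pgraft's internal code; the `Role` enum, the event names, and `next_role` are all hypothetical names chosen for illustration.

```python
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


# (current role, event) -> next role. Event names are illustrative only.
TRANSITIONS = {
    (Role.FOLLOWER, "election_timeout"): Role.CANDIDATE,
    (Role.CANDIDATE, "election_timeout"): Role.CANDIDATE,  # start a new election
    (Role.CANDIDATE, "won_majority"): Role.LEADER,
    (Role.CANDIDATE, "leader_discovered"): Role.FOLLOWER,
    (Role.CANDIDATE, "higher_term_seen"): Role.FOLLOWER,
    (Role.LEADER, "higher_term_seen"): Role.FOLLOWER,
}


def next_role(role, event):
    """Return the role after an event; unknown events leave the role unchanged."""
    return TRANSITIONS.get((role, event), role)
```

For example, `next_role(Role.FOLLOWER, "election_timeout")` yields `Role.CANDIDATE`, while a leader that sees a higher term steps down to follower.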
Leader Election Process
Leader election occurs when a follower doesn't receive heartbeats from the current leader within the election timeout period.
Election Timeout
Follower doesn't receive heartbeat within election timeout (default: 1000ms)
Become Candidate
Follower becomes candidate, increments its term, and votes for itself
Request Votes
Candidate sends RequestVote RPCs to all other nodes
Become Leader
Candidate becomes leader if it receives votes from a majority of the cluster
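The election steps above can be sketched as a single function that counts votes. This is an illustrative Python model of standard Raft, under the assumption that each peer's RequestVote reply has already been reduced to a boolean; `majority` and `run_election` are hypothetical helpers, not pgraft functions.

```python
def majority(cluster_size):
    """Smallest number of nodes that forms a majority of the cluster."""
    return cluster_size // 2 + 1


def run_election(current_term, peer_votes):
    """Simulate one election round for a single candidate.

    peer_votes holds one boolean per *other* node: True if that node
    granted its vote. Returns (new_term, became_leader).
    """
    new_term = current_term + 1          # candidate increments its term
    votes = 1                            # candidate votes for itself
    votes += sum(1 for granted in peer_votes if granted)
    cluster_size = len(peer_votes) + 1   # peers plus the candidate
    return new_term, votes >= majority(cluster_size)
```

In a three-node cluster, `run_election(3, [True, False])` wins with two of three votes. In a five-node cluster, a candidate that collects only one peer vote stays short of the three votes it needs.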
Log Replication
The leader replicates log entries to all followers to maintain consistency across the cluster. Entries are committed when they are replicated to a majority of nodes.
When the leader receives a client request, it:
1. Appends the entry to its own log
2. Sends AppendEntries RPCs to all followers
3. Waits for acknowledgment from a majority
4. Applies the entry to the state machine
5. Responds to the client
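The five steps above can be modeled in a few lines. The sketch below is a simplified, synchronous Python illustration, assuming each follower is represented by a callable that returns True when it acknowledges the entry; real implementations send RPCs concurrently and retry. All names here are hypothetical and none of them are part of pgraft's API.

```python
def majority(cluster_size):
    return cluster_size // 2 + 1


def handle_client_request(command, term, log, followers, state_machine):
    """Process one client request on the leader. Returns True if committed."""
    # 1. Append the entry to the leader's own log.
    log.append((term, command))
    index = len(log)  # 1-based index of the new entry

    # 2. Send the entry to every follower and count acknowledgments.
    acks = 1  # the leader's own copy counts toward the majority
    for send_append_entries in followers:
        if send_append_entries(index, term, command):
            acks += 1

    # 3. Without a majority the entry stays in the log, uncommitted.
    if acks < majority(len(followers) + 1):
        return False

    # 4. Apply the committed entry to the state machine.
    state_machine.append(command)

    # 5. Report success to the client.
    return True


log, applied = [], []
followers = [lambda i, t, c: True, lambda i, t, c: False]
ok = handle_client_request("SET x = 1", 1, log, followers, applied)
```

Here one of two followers acknowledges, so the leader plus that follower form a majority of three and the command is applied.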
Commit Index
The index of the highest log entry known to be replicated to a majority of nodes and therefore safe to apply.
Last Applied
The index of the highest log entry that has been applied to the state machine. It never exceeds the commit index.
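To show how the two indexes relate, here is a minimal Python sketch, assuming 1-based log indexes and a leader that tracks the highest index each follower has stored. The function names are hypothetical. Full Raft additionally requires that the entry at the candidate commit index belongs to the leader's current term, which this sketch omits for brevity.

```python
def majority_replicated_index(leader_last_index, follower_match_indexes):
    """Highest log index that is stored on a majority of nodes."""
    indexes = sorted([leader_last_index, *follower_match_indexes], reverse=True)
    majority = len(indexes) // 2 + 1
    # The majority-th largest value is present on at least `majority` nodes.
    return indexes[majority - 1]


def apply_committed(log, commit_index, last_applied, state_machine):
    """Apply entries (last_applied, commit_index] in order; return new last_applied."""
    while last_applied < commit_index:
        last_applied += 1
        state_machine.append(log[last_applied - 1])  # log is 1-indexed
    return last_applied
```

With a leader at index 7 and followers at 4, 2, 6, and 1, the commit index is 4, because indexes 7, 6, and 4 are the three largest in a five-node cluster.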
Safety Properties
Election Safety
At most one leader can be elected in a given term. This prevents split-brain scenarios where multiple nodes think they are the leader.
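Election Safety follows from each node granting at most one vote per term, since two candidates cannot both collect a majority from the same set of voters. The sketch below illustrates that rule in Python; the `Voter` class is a hypothetical model of standard Raft, not pgraft code, and it leaves out the log comparison a voter also performs.

```python
class Voter:
    """Tracks the single vote a node may grant in each term."""

    def __init__(self):
        self.current_term = 0
        self.voted_for = None

    def handle_request_vote(self, term, candidate_id):
        if term < self.current_term:
            return False  # stale candidate from an older term
        if term > self.current_term:
            # A newer term resets the vote.
            self.current_term = term
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False  # already voted for someone else in this term
```

A voter that granted its vote to node `a` in term 1 rejects node `b` in the same term, but may vote for `b` once `b` campaigns in term 2.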
Leader Append-Only
A leader never overwrites or deletes entries in its log. It only appends new entries, ensuring log consistency.
Log Matching
If two logs contain an entry with the same index and term, then the logs are identical in all entries up to and including that index.
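In standard Raft, Log Matching is maintained by a consistency check on every AppendEntries call: the follower accepts new entries only if its log contains the entry that immediately precedes them. The following Python sketch shows that check on the follower side, assuming a log stored as a list of `(term, command)` tuples with 1-based indexes. The function name and signature are hypothetical.

```python
def append_entries(log, prev_index, prev_term, entries):
    """Follower-side consistency check. Returns True if the entries are accepted."""
    # Reject if the follower lacks the preceding entry or its term differs.
    if prev_index > 0:
        if prev_index > len(log) or log[prev_index - 1][0] != prev_term:
            return False

    for offset, entry in enumerate(entries):
        position = prev_index + offset  # 0-based slot for this entry
        if position < len(log):
            if log[position][0] != entry[0]:
                # Conflict: drop this entry and everything after it.
                del log[position:]
                log.append(entry)
        else:
            log.append(entry)
    return True
```

Note that only followers truncate conflicting entries; this is compatible with Leader Append-Only, which constrains the leader's own log.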
Leader Completeness
If a log entry is committed in a given term, then that entry will be present in the logs of all leaders for higher-numbered terms.
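In standard Raft, Leader Completeness relies on a voting restriction: a node refuses to vote for a candidate whose log is less up to date than its own. A minimal Python sketch of that comparison follows; the function name and parameters are hypothetical and are not taken from pgraft.

```python
def candidate_log_is_up_to_date(cand_last_term, cand_last_index,
                                voter_last_term, voter_last_index):
    """True if the candidate's log is at least as up to date as the voter's."""
    if cand_last_term != voter_last_term:
        # The log whose last entry has the later term is more up to date.
        return cand_last_term > voter_last_term
    # Same last term: the longer log is more up to date.
    return cand_last_index >= voter_last_index
```

Because a committed entry is stored on a majority and a winning candidate needs votes from a majority, at least one voter holds the committed entry and will only vote for a candidate that holds it too.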
Configuration Parameters
heartbeat_interval
How often the leader sends heartbeats (default: 100ms)
pgraft.heartbeat_interval = 50ms
election_timeout
Timeout before follower becomes candidate (default: 1000ms)
pgraft.election_timeout = 500ms
snapshot_threshold
Number of entries before taking snapshot (default: 1000)
pgraft.snapshot_threshold = 5000
max_log_entries
Maximum log entries before compaction (default: 10000)
pgraft.max_log_entries = 20000
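When tuning the timing parameters, the election timeout should stay well above the heartbeat interval, otherwise followers may start elections while the leader is still healthy. The Python sketch below checks this for settings written in the form shown above. It is an illustration only: the parser handles just `ms` values, and the `min_ratio` of 3 is a hypothetical rule of thumb, not a limit enforced or documented by pgraft.

```python
import re


def parse_ms(value):
    """Parse a duration such as '50ms' into an integer number of milliseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*ms\s*", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1))


def parse_settings(text):
    """Parse 'name = value' lines, ignoring blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def timing_is_sane(settings, min_ratio=3):
    """Check election_timeout >= min_ratio * heartbeat_interval (assumed rule)."""
    heartbeat = parse_ms(settings["pgraft.heartbeat_interval"])
    election = parse_ms(settings["pgraft.election_timeout"])
    return election >= min_ratio * heartbeat


example = """
pgraft.heartbeat_interval = 50ms
pgraft.election_timeout = 500ms
"""
sane = timing_is_sane(parse_settings(example))
```

The example values from this section (50ms heartbeats, 500ms election timeout) pass the check, as do the stated defaults of 100ms and 1000ms.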