pgraft Raft Protocol

Understanding the Raft consensus protocol as implemented in the pgraft PostgreSQL extension.

Raft Consensus Protocol

Raft is a consensus algorithm designed to be easy to understand and implement. It lets a cluster of servers agree on the same sequence of state changes even when some servers fail, as long as a majority of them remain running and can communicate. pgraft implements Raft as a PostgreSQL extension, providing distributed consensus directly within the database.
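Raft can only make progress while a strict majority (a quorum) of the cluster is reachable, so cluster size directly determines fault tolerance. The short Python sketch below is a generic illustration, not part of pgraft; the function names are hypothetical.

```python
def quorum_size(n: int) -> int:
    """Smallest number of nodes that is a strict majority of an n-node cluster."""
    if n < 1:
        raise ValueError("a cluster needs at least one node")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many nodes can fail while the remaining nodes still form a majority."""
    return n - quorum_size(n)


for n in range(1, 6):
    print(f"{n} node(s): quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

For example, 3 nodes tolerate 1 failure and 5 nodes tolerate 2, while a 4-node cluster still tolerates only 1. This is why odd cluster sizes are usually preferred.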

Leader Election

Automatic leader selection with term-based voting

Log Replication

Consistent log replication across all nodes

Safety

Split-brain prevention and consistency guarantees

Node States

Leader

The leader handles all client requests and replicates log entries to the followers. It also sends periodic heartbeats to maintain its leadership and keep followers from starting new elections.

-- Check if this node is the leader
SELECT pgraft_is_leader('my-cluster');
-- Returns: true if leader, false otherwise

Follower

Followers are passive: they accept log entries from the leader, respond to heartbeats, and vote in elections. A follower becomes a candidate if it receives no heartbeat within the election timeout.

Candidate

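Candidates request votes from other nodes during leader election. A candidate becomes leader if it receives votes from a majority of the cluster. It returns to follower if it discovers a current leader or a higher term, and it starts a new election if the vote is split and its election timeout expires again.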
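The transitions between the three node states can be summarized as a small lookup table. This is a minimal Python sketch of the standard Raft rules. The event names (`election_timeout`, `won_majority`, `leader_heartbeat`, `higher_term_seen`) are hypothetical labels, not pgraft identifiers.

```python
from enum import Enum


class State(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


# (current state, event) -> next state. Event names are hypothetical labels.
TRANSITIONS = {
    (State.FOLLOWER, "election_timeout"): State.CANDIDATE,
    (State.CANDIDATE, "election_timeout"): State.CANDIDATE,  # split vote: retry in a new term
    (State.CANDIDATE, "won_majority"): State.LEADER,
    (State.CANDIDATE, "leader_heartbeat"): State.FOLLOWER,   # a valid leader already exists
    (State.CANDIDATE, "higher_term_seen"): State.FOLLOWER,
    (State.LEADER, "higher_term_seen"): State.FOLLOWER,      # a deposed leader steps down
}


def next_state(state: State, event: str) -> State:
    """Return the next state; events without a rule leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = State.FOLLOWER
for event in ["election_timeout", "won_majority", "higher_term_seen"]:
    state = next_state(state, event)
    print(f"{event} -> {state.value}")
```

The loop walks one node through a full cycle: follower, candidate, leader, and back to follower after it sees a higher term.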

Leader Election Process

Leader election occurs when a follower doesn't receive heartbeats from the current leader within the election timeout period.

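1. Election Timeout: A follower receives no heartbeat within the election timeout (default: 1000ms).
2. Become Candidate: The follower becomes a candidate, increments its current term, and votes for itself.
3. Request Votes: The candidate sends RequestVote RPCs to all other nodes.
4. Become Leader: The candidate becomes leader if it receives votes from a majority of the cluster, counting its own vote. It then starts sending heartbeats to assert its leadership.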
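Steps 2 to 4 can be sketched in Python as follows. This is a simplified, single-process illustration of standard Raft voting, not pgraft's implementation. The `Node` class and `run_election` function are hypothetical, RPCs are plain method calls, and the check that the candidate's log is up to date is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None

    def handle_request_vote(self, term: int, candidate_id: str) -> bool:
        """Grant at most one vote per term (log up-to-date check omitted in this sketch)."""
        if term < self.current_term:
            return False                  # stale candidate
        if term > self.current_term:
            self.current_term = term      # newer term: forget any earlier vote
            self.voted_for = None
        if self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(candidate: Node, peers: list[Node]) -> bool:
    """Hypothetical election round; returns True if the candidate wins."""
    candidate.current_term += 1               # step 2: start a new term
    candidate.voted_for = candidate.node_id   # ... and vote for itself
    votes = 1
    for peer in peers:                        # step 3: RequestVote to every other node
        if peer.handle_request_vote(candidate.current_term, candidate.node_id):
            votes += 1
    cluster_size = len(peers) + 1
    return votes > cluster_size // 2          # step 4: strict majority of the whole cluster


a, b, c = Node("a"), Node("b"), Node("c")
print(run_election(a, [b, c]))
```

Because `voted_for` is reset only when a higher term arrives, a node that has voted for `a` in term 1 refuses any other candidate in that same term.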

Log Replication

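The leader replicates log entries to all followers to keep their logs consistent with its own. An entry is committed once the leader has replicated it to a majority of nodes. Raft additionally requires that a leader count replicas this way only for entries from its current term; earlier entries are committed indirectly along with them.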

# Leader receives a client request:
# 1. Append the entry to the leader's log
# 2. Send AppendEntries RPCs to all followers
# 3. Wait until a majority of nodes (leader included) have stored the entry
# 4. Mark the entry committed and apply it to the state machine
# 5. Respond to the client
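The five steps can be sketched from the leader's point of view. In this hypothetical Python model, each follower is a function that returns True when it has stored the entry, standing in for a successful AppendEntries reply. There are no retries, terms, or network failures, so the sketch only illustrates the majority rule and in-order application.

```python
from typing import Callable

# Hypothetical stand-in for an AppendEntries RPC: returns True when the follower
# holds every entry up to `index` (Raft's consistency check guarantees this).
Follower = Callable[[int, str], bool]


class Leader:
    def __init__(self, followers: list[Follower]):
        self.followers = followers
        self.log: list[str] = []
        self.commit_index = 0   # 1-based; 0 means nothing committed yet
        self.last_applied = 0
        self.state_machine: list[str] = []

    def handle_client_request(self, command: str) -> bool:
        self.log.append(command)                      # 1. append to the leader's log
        index = len(self.log)
        acks = 1                                      # the leader's own copy counts
        for send in self.followers:                   # 2. "send" AppendEntries
            if send(index, command):
                acks += 1
        cluster_size = len(self.followers) + 1
        if acks <= cluster_size // 2:                 # 3. no majority yet
            return False                              # entry stays uncommitted
        self.commit_index = index
        while self.last_applied < self.commit_index:  # 4. apply in log order
            self.last_applied += 1
            self.state_machine.append(self.log[self.last_applied - 1])
        return True                                   # 5. respond to the client
```

With four followers (a 5-node cluster), two successful replies plus the leader's own copy make 3 of 5, which is enough to commit. A single successful reply is not.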

Commit Index

The index of the highest log entry known to be committed, that is, replicated to a majority of nodes and therefore safe to apply.

Last Applied

The index of the highest log entry applied to the state machine. It never exceeds the commit index; entries between the two are applied in log order.
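A leader derives its commit index from how far each node has replicated its log. The sketch below follows the rule in the Raft paper: the new commit index is the largest index N that is stored on a majority of nodes and whose entry was created in the leader's current term. The sketch then advances the last-applied index up to the commit index, one entry at a time. Function and parameter names are hypothetical, not pgraft's.

```python
def advance_commit_index(match_index: list[int], log_terms: list[int],
                         current_term: int, commit_index: int) -> int:
    """
    match_index: highest replicated index on each node, including the leader itself.
    log_terms:   term of each leader log entry; log_terms[i] is the term of index i + 1.
    """
    majority = len(match_index) // 2 + 1
    # The majority-th highest value is stored on at least `majority` nodes.
    highest_on_majority = sorted(match_index, reverse=True)[majority - 1]
    for n in range(highest_on_majority, commit_index, -1):
        if log_terms[n - 1] == current_term:   # only count entries from the current term
            return n
    return commit_index


def apply_committed(log: list[str], commit_index: int, last_applied: int,
                    state_machine: list[str]) -> int:
    """Apply entries last_applied+1 .. commit_index in order; return the new last_applied."""
    while last_applied < commit_index:
        last_applied += 1
        state_machine.append(log[last_applied - 1])
    return last_applied
```

With match indices `[5, 5, 4, 2, 1]`, index 4 is on three of five nodes, so it is committed if its entry is from the current term. If index 4 came from an older term, the commit index stays where it was until a current-term entry reaches a majority.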

Safety Properties

Election Safety

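At most one leader can be elected in a given term, because each node grants at most one vote per term and a leader needs votes from a majority. This prevents split-brain: a deposed leader from an older term may briefly still believe it leads, but it cannot commit new entries, because a majority of nodes have already moved to a higher term.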
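Election Safety rests on a counting argument. Each node votes at most once per term, and any two majorities of the same cluster share at least one node, so two candidates cannot both collect a majority in one term. The hypothetical Python helpers below check the overlap property by brute force for small clusters.

```python
from itertools import combinations


def majorities(nodes: list[str]) -> list[frozenset[str]]:
    """All subsets of `nodes` that form a strict majority."""
    need = len(nodes) // 2 + 1
    return [frozenset(group)
            for size in range(need, len(nodes) + 1)
            for group in combinations(nodes, size)]


def every_pair_overlaps(nodes: list[str]) -> bool:
    """True if any two majorities share at least one node."""
    return all(a & b for a, b in combinations(majorities(nodes), 2))


print(every_pair_overlaps(["n1", "n2", "n3", "n4", "n5"]))
```

The shared node in any two majorities is the one that would have to vote twice in the same term, which the one-vote-per-term rule forbids.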

Leader Append-Only

A leader never overwrites or deletes entries in its log. It only appends new entries, ensuring log consistency.

Log Matching

If two logs contain an entry with the same index and term, then the logs are identical in all preceding entries.
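Raft maintains Log Matching through a consistency check in AppendEntries. The leader sends the index and term of the entry immediately preceding the new ones, and the follower rejects the request unless its own log has a matching entry at that position. Below is a minimal Python sketch of the follower side, with hypothetical names and 1-based log indices.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    term: int
    command: str


def append_entries(log: list[Entry], prev_log_index: int, prev_log_term: int,
                   entries: list[Entry]) -> bool:
    """Follower side of AppendEntries; prev_log_index 0 means "before the first entry"."""
    if prev_log_index > len(log):
        return False                                   # follower is missing entries
    if prev_log_index > 0 and log[prev_log_index - 1].term != prev_log_term:
        return False                                   # preceding entry does not match
    for offset, entry in enumerate(entries):
        pos = prev_log_index + offset                  # 0-based position of this entry
        if pos < len(log):
            if log[pos].term == entry.term:
                continue                               # already present: keep it
            del log[pos:]                              # conflict: drop it and all that follow
        log.append(entry)
    return True
```

When the follower rejects a request, the leader retries with an earlier `prev_log_index` until the logs agree, and the follower's conflicting suffix is then replaced by the leader's entries.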

Leader Completeness

If a log entry is committed in a given term, then that entry will be present in the logs of all leaders for higher-numbered terms.
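Leader Completeness is enforced by a restriction on voting. A committed entry is stored on a majority, and a winning candidate needs votes from a majority, so at least one voter holds that entry. That voter refuses any candidate whose log is less up to date than its own. The comparison is sketched below as a hypothetical Python helper.

```python
def candidate_log_is_up_to_date(candidate_last_term: int, candidate_last_index: int,
                                voter_last_term: int, voter_last_index: int) -> bool:
    """
    A voter grants its vote only if the candidate's log is at least as up to date
    as its own: compare the terms of the last entries first; if they are equal,
    the longer log wins.
    """
    if candidate_last_term != voter_last_term:
        return candidate_last_term > voter_last_term
    return candidate_last_index >= voter_last_index
```

A candidate whose last entry has a higher term wins the comparison even with a shorter log, because entries from an older term may not have been committed.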

Configuration Parameters

heartbeat_interval

How often the leader sends heartbeats (default: 100ms)

pgraft.heartbeat_interval = 50ms

election_timeout

How long a follower waits without hearing from the leader before becoming a candidate (default: 1000ms)

pgraft.election_timeout = 500ms

snapshot_threshold

Number of log entries after which a snapshot is taken (default: 1000)

pgraft.snapshot_threshold = 5000

max_log_entries

Maximum number of log entries before the log is compacted (default: 10000)

pgraft.max_log_entries = 20000
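The election timeout should stay well above the heartbeat interval. Otherwise, a few delayed heartbeats can trigger unnecessary elections. The defaults (100ms and 1000ms) and the example values (50ms and 500ms) both keep a 10x ratio. The hypothetical helper below checks that relationship. The minimum ratio of 5 is an assumed rule of thumb, not a pgraft requirement, and only the `ms`, `s`, and `min` units are parsed.

```python
import re

# Hypothetical helper: unit table covers only ms, s and min.
_UNITS_MS = {"ms": 1, "s": 1000, "min": 60_000}


def to_ms(value: str) -> int:
    """Parse a duration such as '50ms' or '1s' into milliseconds."""
    m = re.fullmatch(r"\s*(\d+)\s*(ms|s|min)\s*", value)
    if m is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    return int(m.group(1)) * _UNITS_MS[m.group(2)]


def check_timing(heartbeat_interval: str, election_timeout: str,
                 min_ratio: int = 5) -> list[str]:
    """Return warnings if the election timeout is not comfortably above the heartbeat interval."""
    hb, et = to_ms(heartbeat_interval), to_ms(election_timeout)
    warnings = []
    if et <= hb:
        warnings.append("election_timeout must exceed heartbeat_interval")
    elif et < min_ratio * hb:
        warnings.append(f"election_timeout is less than {min_ratio}x heartbeat_interval; "
                        "a few delayed heartbeats could trigger needless elections")
    return warnings


print(check_timing("50ms", "500ms"))
```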