pgraft Troubleshooting

Resolve common issues and errors with the pgraft PostgreSQL Raft extension.

Common Issues

Extension Won't Load

The pgraft extension fails to load or initialize.

Symptoms:

  • Extension not found error
  • PostgreSQL fails to start
  • Shared library not found

Solutions:

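# Check that the shared library is installed (the path varies by PostgreSQL version and platform)
ls -la /usr/lib/postgresql/16/lib/pgraft.so

# Rebuild and reinstall the extension from the pgraft source directory
make clean && make && sudo make install

# Check that pgraft is listed in shared_preload_libraries
grep shared_preload_libraries postgresql.conf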
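
A plain grep also matches commented-out lines, so the configuration check can be made a little stricter. The sketch below is a hypothetical helper (the name preload_has_pgraft is not part of pgraft) that reads the last uncommented shared_preload_libraries line of a given postgresql.conf and reports whether pgraft is listed. It assumes the setting lives in that file and is not overridden by an include file or ALTER SYSTEM; SHOW shared_preload_libraries on a running server remains the authoritative check.

```shell
# Hypothetical helper: succeed if pgraft is listed in shared_preload_libraries.
# $1 is the path to postgresql.conf.
preload_has_pgraft() {
    # When a parameter appears more than once, PostgreSQL uses the last entry.
    value=$(grep -E "^[[:space:]]*shared_preload_libraries[[:space:]]*=" "$1" | tail -n 1)
    # Drop a trailing inline comment.
    value=${value%%#*}
    # Match pgraft as a whole list item, not as part of another library name.
    printf '%s\n' "$value" | grep -Eq "(^|[',=[:space:]])pgraft([',[:space:]]|\$)"
}
```

Example use, assuming PGDATA points at the data directory: preload_has_pgraft "$PGDATA/postgresql.conf" && echo "pgraft is preloaded"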

Cluster Split-Brain

Multiple nodes think they are the leader.

Symptoms:

  • Multiple leaders reported
  • Inconsistent cluster state
  • Write conflicts

Solutions:

-- Check cluster status (SQL)
SELECT * FROM pgraft_cluster_status('my-cluster');

-- Force a new election (SQL)
SELECT pgraft_force_election('my-cluster');

# Check network connectivity (shell); ping accepts one host per invocation
for host in node1 node2 node3; do ping -c 3 "$host"; done
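
In Raft, at most one leader can be elected in any given term, so when two nodes both claim leadership, the one with the lower term is stale and should step down once it hears from the rest of the cluster. The sketch below assumes you have collected one line per node in the form "name role term". That format is hypothetical and is not the actual output of pgraft_cluster_status. The function prints the nodes whose leadership claim is stale.

```shell
# Hypothetical helper: read "name role term" lines on stdin and print every
# self-reported leader whose term is lower than the highest leader term.
find_stale_leaders() {
    awk '
        BEGIN { n = 0; max = -1 }
        $2 == "leader" {
            name[n] = $1
            term[n] = $3 + 0
            if (term[n] > max) max = term[n]
            n++
        }
        END {
            for (i = 0; i < n; i++)
                if (term[i] < max) print name[i]
        }
    '
}
```

If two nodes report leadership in the same term, the function prints nothing; that situation should not occur in a correct Raft implementation and usually points to stale or misreported status data.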

High Replication Latency

Log replication between nodes is slow.

Symptoms:

  • Large commit index lag
  • Slow failover times
  • Inconsistent read results

Solutions:

-- Reduce heartbeat interval (SQL)
SELECT pgraft_set_config('my-cluster', 'heartbeat_interval', '50ms');

# Check network latency (shell); traceroute accepts one host per invocation
for host in node1 node2 node3; do traceroute "$host"; done

-- Monitor replication metrics (SQL)
SELECT * FROM pgraft_metrics('my-cluster');
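
Lowering the heartbeat interval only helps if it stays consistent with the network round-trip time and the election timeout. A common rule of thumb for Raft deployments is to keep the heartbeat interval at or above the round-trip time between nodes, and the election timeout roughly an order of magnitude larger than the heartbeat. The sketch below encodes that rule. The function name and the factor of ten are assumptions for illustration, not pgraft defaults, and this page does not confirm how pgraft exposes an election timeout setting.

```shell
# Hypothetical sanity check for Raft timing values, all in milliseconds.
# Usage: check_raft_timing RTT_MS HEARTBEAT_MS ELECTION_TIMEOUT_MS
check_raft_timing() {
    rtt=$1; heartbeat=$2; election=$3
    status=0
    if [ "$heartbeat" -lt "$rtt" ]; then
        echo "heartbeat ${heartbeat}ms is below round-trip time ${rtt}ms"
        status=1
    fi
    if [ "$election" -lt $((heartbeat * 10)) ]; then
        echo "election timeout ${election}ms is below 10x heartbeat"
        status=1
    fi
    return "$status"
}
```

For example, check_raft_timing 5 50 1000 succeeds silently, while check_raft_timing 80 50 1000 reports that a 50ms heartbeat is shorter than the measured round trip.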

Error Codes and Messages

pgraft Error Codes

Error Code  Description                Solution
PGR01       Cluster not initialized    Initialize cluster with pgraft_init_cluster()
PGR02       Node not found in cluster  Add node with pgraft_add_member()
PGR03       Not the leader             Wait for leader or check cluster status
PGR04       Network timeout            Check network connectivity and increase timeout
PGR05       Configuration error        Validate configuration parameters
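
When scanning server logs, the error code list above can be turned into a small lookup. The sketch below is a hypothetical helper: it assumes the code appears somewhere in the message as PGR followed by two digits, which this page does not specify, and it returns the solution text from the list.

```shell
# Hypothetical helper: extract a PGRnn code from a message and print the
# matching solution from the error code list.
pgraft_hint() {
    code=$(printf '%s\n' "$1" | sed -n 's/.*\(PGR[0-9][0-9]\).*/\1/p')
    case "$code" in
        PGR01) echo "Initialize cluster with pgraft_init_cluster()" ;;
        PGR02) echo "Add node with pgraft_add_member()" ;;
        PGR03) echo "Wait for leader or check cluster status" ;;
        PGR04) echo "Check network connectivity and increase timeout" ;;
        PGR05) echo "Validate configuration parameters" ;;
        *)     echo "No known pgraft error code in message"; return 1 ;;
    esac
}
```

Example use: pgraft_hint "ERROR: PGR03: not the leader" prints the PGR03 solution, and the function returns a non-zero status when no known code is found.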

Diagnostic Commands

Cluster Health Checks

-- Comprehensive cluster status
SELECT * FROM pgraft_cluster_status('my-cluster');

-- Check leader information
SELECT * FROM pgraft_leader('my-cluster');

-- Node-specific information
SELECT * FROM pgraft_node_info('my-cluster');

-- Performance metrics
SELECT * FROM pgraft_metrics('my-cluster');

Configuration Validation

-- View current configuration
SELECT * FROM pgraft_get_config('my-cluster');

-- Check PostgreSQL settings
SHOW shared_preload_libraries;
SHOW max_connections;

-- Verify the extension is installed in the current database
SELECT * FROM pg_extension WHERE extname = 'pgraft';

Network Diagnostics

# Test connectivity to other nodes
telnet node1 5433
telnet node2 5433
telnet node3 5433

# Check network latency
ping -c 10 node1
traceroute node1

# Test PostgreSQL connectivity
psql -h node1 -p 5432 -U postgres -c "SELECT 1"
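
The PostgreSQL connectivity test can be repeated for every node with pg_isready, the connection-check utility that ships with PostgreSQL. The sketch below assumes the node names and port 5432 used in the commands above; the function name check_nodes is hypothetical.

```shell
# Hypothetical helper: report which of the given hosts accept PostgreSQL
# connections on port 5432. Succeeds only if every host is ready.
check_nodes() {
    failed=0
    for host in "$@"; do
        # -t 3 gives up after three seconds per host.
        if pg_isready -h "$host" -p 5432 -t 3 >/dev/null 2>&1; then
            echo "$host: accepting connections"
        else
            echo "$host: not ready (pg_isready exit code $?)"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

Example use: check_nodes node1 node2 node3 || echo "at least one node is unreachable"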

Recovery Procedures

Node Recovery

1. Stop PostgreSQL: gracefully stop PostgreSQL on the failed node.

2. Clean up state: remove any corrupted state files or logs.

3. Restart services: restart PostgreSQL and rejoin the cluster.

4. Verify recovery: check cluster status and node health.

Cluster Recovery

-- If the cluster is completely down:
-- 1. Start a majority of the nodes
-- 2. Reinitialize the cluster
SELECT pgraft_init_cluster('my-cluster');

-- 3. Add the remaining nodes
SELECT pgraft_add_member('my-cluster', 'node1', 'host=node1 port=5432');
SELECT pgraft_add_member('my-cluster', 'node2', 'host=node2 port=5432');
SELECT pgraft_add_member('my-cluster', 'node3', 'host=node3 port=5432');

-- 4. Verify cluster health
SELECT * FROM pgraft_cluster_status('my-cluster');
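
Step 1 depends on what counts as a majority: a Raft cluster of N voting members needs floor(N/2) + 1 of them running to elect a leader and commit entries. The helper names in this sketch are hypothetical.

```shell
# Smallest number of nodes that forms a majority of a cluster of size $1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Succeed if UP running nodes out of TOTAL are enough for a majority.
# Usage: has_quorum TOTAL UP
has_quorum() {
    [ "$2" -ge "$(quorum "$1")" ]
}
```

For the three-node cluster used in these examples, quorum 3 prints 2, so at least two nodes must be running before the cluster can elect a leader.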

Performance Troubleshooting

Slow Operations

Check Metrics:

SELECT * FROM pgraft_metrics('my-cluster');
SELECT * FROM pg_stat_activity;
SELECT * FROM pg_stat_database;

Common Solutions:

  • Increase heartbeat frequency
  • Optimize network settings
  • Check PostgreSQL configuration
  • Monitor resource usage

Memory Issues

Symptoms:

  • High memory usage
  • Out of memory errors
  • Slow garbage collection

Solutions:

-- Reduce the number of retained log entries
SELECT pgraft_set_config('my-cluster', 'max_log_entries', '5000');

-- Force a snapshot
SELECT pgraft_snapshot('my-cluster');

-- Check PostgreSQL memory settings
SHOW shared_buffers;
SHOW work_mem;