GPU Acceleration

Accelerate vector operations on NVIDIA and AMD GPUs with CUDA and ROCm: up to 100x speedup for batch distance operations and about 23x faster K-Means clustering in the benchmarks below.

Overview

NeuronDB provides optional GPU acceleration for compute-intensive vector operations using NVIDIA CUDA or AMD ROCm. GPU support is entirely optional: when no GPU is available, NeuronDB automatically falls back to the CPU.

Batch Distance Speedup: 100x
K-Means Clustering: 23x
Avg GPU Latency: 2.3ms

GPU-Accelerated Operations

Operation	CUDA	ROCm	Speedup
L2 Distance	✓ cuBLAS	✓ rocBLAS	100x (batch)
Cosine Distance	✓ cuBLAS	✓ rocBLAS	100x (batch)
Inner Product	✓ GEMM	✓ GEMM	100x (batch)
K-Means Clustering	✓ Custom	✓ Custom	23x
Quantization (INT8/FP16)	✓ Kernels	✓ Kernels	50x
ONNX Inference	✓ CUDA EP	Partial	10-15x
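
The table lists cuBLAS/rocBLAS and GEMM for the distance kernels but does not describe how they are used. One common way to map batch L2 distance onto a matrix multiply is the identity ‖v − q‖² = ‖v‖² + ‖q‖² − 2·v·q, where all the v·q terms for a batch come from a single GEMM. The pure-Python sketch below only illustrates that identity; the function names are illustrative and are not NeuronDB APIs, and whether NeuronDB's kernels use this exact formulation is an assumption.

```python
import math

def batch_l2_direct(vectors, query):
    """Reference: L2 distance computed element by element."""
    return [math.sqrt(sum((x - q) ** 2 for x, q in zip(v, query))) for v in vectors]

def batch_l2_via_dot(vectors, query):
    """Same distances via ||v||^2 + ||q||^2 - 2 * v.q.

    On a GPU the v.q terms for the whole batch would come from one
    matrix multiply; here they are computed with plain loops.
    """
    q_norm_sq = sum(q * q for q in query)
    out = []
    for v in vectors:
        v_norm_sq = sum(x * x for x in v)
        dot = sum(x * q for x, q in zip(v, query))
        # Clamp tiny negative values caused by floating-point rounding.
        out.append(math.sqrt(max(v_norm_sq + q_norm_sq - 2.0 * dot, 0.0)))
    return out

vectors = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
query = [0.0, 0.0]
print(batch_l2_direct(vectors, query))
print(batch_l2_via_dot(vectors, query))
```

Both functions should agree up to floating-point rounding; the second form is the one that benefits from a fast batched dot product.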

Configuration

PostgreSQL Configuration

# Add to postgresql.conf
shared_preload_libraries = 'neurondb'

# GPU Configuration (all optional)
neurondb.gpu_enabled = off              # Enable GPU (default: off)
neurondb.gpu_device = 0                 # GPU device ID
neurondb.gpu_batch_size = 8192          # Batch size for GPU ops
neurondb.gpu_streams = 2                # CUDA/HIP streams
neurondb.gpu_memory_pool_mb = 512       # Memory pool size
neurondb.gpu_fail_open = on             # Fallback to CPU on error
neurondb.gpu_kernels = 'l2,cosine,ip'   # Enabled kernels
neurondb.gpu_timeout_ms = 30000         # Kernel timeout
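
To get a feel for neurondb.gpu_batch_size, the sketch below splits a workload into batches, assuming each batch is filled to the configured size and the last batch holds the remainder. How NeuronDB actually schedules batches is not stated here, and plan_batches is a hypothetical helper rather than part of the extension.

```python
def plan_batches(n_vectors, batch_size=8192):
    """Return the sizes of the batches needed to cover n_vectors.

    Assumption: batches are filled greedily up to batch_size.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, remainder = divmod(n_vectors, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])

# 10,000 vectors with the batch size shown above
print(plan_batches(10_000))  # [8192, 1808]
```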

SQL Examples

Enable GPU Acceleration

-- Check GPU availability
SELECT neurondb_gpu_info();

-- Enable GPU for current session
SELECT neurondb_gpu_enable(true);

-- Check GPU statistics
SELECT * FROM pg_stat_neurondb_gpu;

GPU-Accelerated Distance

-- Batch GPU distance calculation (100x faster)
SELECT vector_l2_distance_gpu(
  embedding, 
  '[0.1, 0.2, ...]'::vector
) FROM documents;

-- GPU cosine distance: the 10 nearest products to query_vector
SELECT vector_cosine_distance_gpu(
  features, 
  query_vector
) FROM products
ORDER BY 1 LIMIT 10;

GPU K-Means Clustering

-- GPU-accelerated clustering (23x faster)
SELECT cluster_kmeans_gpu(
  'customer_data',
  'features',
  10,        -- number of clusters
  100        -- max iterations
);

-- Result: {"clusters": 10, "iterations": 18, 
--          "inertia": 234.5, "device": "GPU", 
--          "speedup": "23.4x"}
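
The result reports iterations and inertia. As a reminder of what those fields mean, here is a minimal CPU sketch of Lloyd's algorithm in pure Python. It is not NeuronDB's implementation (the GPU kernels are described only as "Custom"); the function names, the naive initialization from the first k points, and the returned keys are illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared L2 distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm; returns centroids, iteration count and inertia."""
    if max_iters < 1 or k < 1 or len(points) < k:
        raise ValueError("need max_iters >= 1 and 1 <= k <= len(points)")
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    assignment = None
    iterations = 0
    for iterations in range(1, max_iters + 1):
        # Assignment step: nearest centroid by squared L2 distance.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Inertia: sum of squared distances from each point to its centroid.
    inertia = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignment))
    return {"centroids": centroids, "iterations": iterations, "inertia": inertia}

points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(kmeans(points, k=2))
```

Lower inertia means tighter clusters. The iteration count reported here includes the final pass that detects convergence.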

Building with GPU Support

Using build.sh (Recommended)

# CPU-only build (default)
./build.sh

# With GPU support (auto-detects CUDA/ROCm)
./build.sh --with-gpu

# With custom paths
./build.sh --with-gpu --cuda-path /opt/cuda --onnx-path /usr/local

NVIDIA GPU (CUDA)

# Install CUDA Toolkit 12.6
# Ubuntu 22.04, x86_64 (the repository URL below is distro-specific)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-6

# Build NeuronDB with CUDA
./build.sh --with-gpu

AMD GPU (ROCm)

# Install ROCm
# Ubuntu
wget https://repo.radeon.com/amdgpu-install/6.0/ubuntu/jammy/amdgpu-install_6.0.60000-1_all.deb
sudo dpkg -i amdgpu-install_6.0.60000-1_all.deb
sudo amdgpu-install -y --usecase=rocm

# Build NeuronDB with ROCm
./build.sh --with-gpu --rocm-path /opt/rocm

Performance Benchmarks

Measured on an NVIDIA RTX 4090 (24 GB) with 10,000 vectors of 768 dimensions:

Operation	CPU Time	GPU Time	Speedup
Batch L2 Distance (10K)	450ms	4.5ms	100x
K-Means (10 clusters)	421ms	18ms	23.4x
INT8 Quantization (10K)	234ms	4.7ms	50x
FP16 Quantization (10K)	189ms	3.8ms	50x
ONNX Inference (batch 32)	156ms	12ms	13x
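
The Speedup column is CPU time divided by GPU time. A short check of the table's arithmetic, using only the timings listed above (the speedup helper is just for illustration):

```python
def speedup(cpu_ms, gpu_ms):
    """Speedup factor: how many times faster the GPU run was."""
    return cpu_ms / gpu_ms

# (CPU ms, GPU ms) copied from the benchmark table
benchmarks = {
    "Batch L2 Distance (10K)": (450.0, 4.5),
    "K-Means (10 clusters)": (421.0, 18.0),
    "INT8 Quantization (10K)": (234.0, 4.7),
    "FP16 Quantization (10K)": (189.0, 3.8),
    "ONNX Inference (batch 32)": (156.0, 12.0),
}

for name, (cpu_ms, gpu_ms) in benchmarks.items():
    print(f"{name}: {speedup(cpu_ms, gpu_ms):.1f}x")
```

The two quantization rows come out slightly under 50x (about 49.8x and 49.7x), so the table's 50x figures are rounded.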

Monitoring GPU Usage

-- View GPU statistics per backend
SELECT * FROM pg_stat_neurondb_gpu;

-- Columns:
--   backend_pid: PostgreSQL backend process ID
--   queries: Total GPU queries executed
--   batches: Number of batches processed
--   avg_batch_size: Average vectors per batch
--   avg_latency_ms: Average GPU kernel latency
--   fallback_count: Times fell back to CPU
--   oom_count: Out-of-memory errors
--   last_error: Last GPU error (if any)

Pro Tip: Monitor fallback_count and oom_count to detect GPU memory pressure. If these increase, consider reducing neurondb.gpu_batch_size or increasing neurondb.gpu_memory_pool_mb.
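
One way to act on the tip above is to compare two snapshots of pg_stat_neurondb_gpu and flag backends whose counters grew. The sketch below works on rows already fetched as dictionaries (how you fetch them depends on your client library). The column names come from the list above; find_pressure is a hypothetical helper, and it assumes the counters are cumulative per backend.

```python
def find_pressure(before, after):
    """Return {backend_pid: {counter: delta}} for counters that increased.

    before/after: lists of dict rows from two snapshots of the stats view.
    """
    previous = {row["backend_pid"]: row for row in before}
    flagged = {}
    for row in after:
        old = previous.get(row["backend_pid"])
        if old is None:
            continue  # backend started between snapshots; no baseline
        deltas = {
            col: row[col] - old[col]
            for col in ("fallback_count", "oom_count")
            if row[col] > old[col]
        }
        if deltas:
            flagged[row["backend_pid"]] = deltas
    return flagged

# Made-up sample rows, for illustration only
snapshot_1 = [
    {"backend_pid": 101, "fallback_count": 0, "oom_count": 0},
    {"backend_pid": 102, "fallback_count": 2, "oom_count": 1},
]
snapshot_2 = [
    {"backend_pid": 101, "fallback_count": 0, "oom_count": 0},
    {"backend_pid": 102, "fallback_count": 5, "oom_count": 1},
]
print(find_pressure(snapshot_1, snapshot_2))
```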

Automatic CPU Fallback

NeuronDB automatically falls back to the CPU when a GPU is unavailable or a GPU operation fails. In fail-open mode (the default), this lets queries complete on the CPU even when GPU resources are exhausted.

Fail-Open Mode (Default)

Query continues on CPU if GPU fails
Logs warning but returns result
Best for production reliability

Fail-Closed Mode

Query fails if GPU unavailable
Returns error to client
Use for GPU-required workloads

-- Set fail-open (default, recommended)
SET neurondb.gpu_fail_open = on;

-- Set fail-closed (strict GPU requirement)
SET neurondb.gpu_fail_open = off;
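
The two modes boil down to what happens when the GPU path raises an error. A minimal sketch of that control flow follows; GpuError, execute, and the gpu_fn/cpu_fn callables are hypothetical stand-ins, not NeuronDB internals.

```python
import logging

class GpuError(RuntimeError):
    """Stand-in for any GPU failure (device missing, OOM, timeout)."""

def execute(op, gpu_fn, cpu_fn, fail_open=True):
    """Try the GPU path first; on failure either fall back or re-raise."""
    try:
        return gpu_fn(op)
    except GpuError as exc:
        if not fail_open:
            raise  # fail-closed: surface the error to the caller
        # fail-open: log a warning and still return a result from the CPU
        logging.warning("GPU failed (%s); falling back to CPU", exc)
        return cpu_fn(op)

def broken_gpu(op):
    raise GpuError("out of memory")

def cpu_double(op):
    return op * 2

print(execute(21, broken_gpu, cpu_double))  # fail-open: CPU result
try:
    execute(21, broken_gpu, cpu_double, fail_open=False)
except GpuError as exc:
    print(f"fail-closed: {exc}")
```

Fail-open trades a possible latency spike for availability; fail-closed makes the GPU a hard requirement, so a failure is visible immediately.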

Next Steps

ML & Embeddings: Text and image embeddings

ML Analytics: Clustering and analysis