GPU Acceleration
Accelerate vector operations with CUDA and ROCm: up to 100x speedup for batch distance operations and about 23x faster K-Means clustering on NVIDIA and AMD GPUs.
Overview
NeuronDB provides optional GPU acceleration for compute-intensive vector operations using NVIDIA CUDA or AMD ROCm. GPU support is entirely optional: when no GPU is available, operations automatically fall back to the CPU.
- 100x: Batch Distance Speedup
- 23x: K-Means Clustering
- 2.3ms: Avg GPU Latency
GPU-Accelerated Operations
| Operation | CUDA | ROCm | Speedup |
|---|---|---|---|
| L2 Distance | ✓ cuBLAS | ✓ rocBLAS | 100x (batch) |
| Cosine Distance | ✓ cuBLAS | ✓ rocBLAS | 100x (batch) |
| Inner Product | ✓ GEMM | ✓ GEMM | 100x (batch) |
| K-Means Clustering | ✓ Custom | ✓ Custom | 23x |
| Quantization (INT8/FP16) | ✓ Kernels | ✓ Kernels | 50x |
| ONNX Inference | ✓ CUDA EP | Partial | 10-15x |
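The table lists BLAS and GEMM backends for the distance operations. One common reason such libraries help is that batch distances can be rewritten in terms of inner products, which is exactly the workload matrix-multiply routines are built for. The sketch below illustrates that identity in plain Python; it is not NeuronDB's kernel code, and the function names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def batch_l2_distances(query, vectors):
    """L2 distance from `query` to every row of `vectors`.

    Uses ||q - v||^2 = ||q||^2 + ||v||^2 - 2 * (q . v), so the dominant
    cost is the batch of inner products (a matrix-vector product).
    """
    q_norm_sq = dot(query, query)
    distances = []
    for v in vectors:
        sq = q_norm_sq + dot(v, v) - 2.0 * dot(query, v)
        # Clamp tiny negative values caused by floating-point rounding.
        distances.append(math.sqrt(max(sq, 0.0)))
    return distances


def batch_cosine_distances(query, vectors):
    """Cosine distance (1 - cosine similarity) for each row.

    Assumes neither the query nor any row is the zero vector.
    """
    q_norm = math.sqrt(dot(query, query))
    return [
        1.0 - dot(query, v) / (q_norm * math.sqrt(dot(v, v)))
        for v in vectors
    ]
```

On a GPU the per-row loop would typically become a single matrix product over the whole batch, which is where batching pays off.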
Configuration
PostgreSQL Configuration
# Add to postgresql.conf
shared_preload_libraries = 'neurondb'
# GPU Configuration (all optional)
neurondb.gpu_enabled = off # Enable GPU (default: off)
neurondb.gpu_device = 0 # GPU device ID
neurondb.gpu_batch_size = 8192 # Batch size for GPU ops
neurondb.gpu_streams = 2 # CUDA/HIP streams
neurondb.gpu_memory_pool_mb = 512 # Memory pool size
neurondb.gpu_fail_open = on # Fallback to CPU on error
neurondb.gpu_kernels = 'l2,cosine,ip' # Enabled kernels
neurondb.gpu_timeout_ms = 30000 # Kernel timeout
SQL Examples
Enable GPU Acceleration
-- Check GPU availability
SELECT neurondb_gpu_info();
-- Enable GPU for current session
SELECT neurondb_gpu_enable(true);
-- Check GPU statistics
SELECT * FROM pg_stat_neurondb_gpu;
GPU-Accelerated Distance
-- Batch GPU distance calculation (100x faster)
SELECT vector_l2_distance_gpu(
embedding,
'[0.1, 0.2, ...]'::vector
) FROM documents;
-- GPU cosine similarity
SELECT vector_cosine_distance_gpu(
features,
query_vector
) FROM products
ORDER BY 1 LIMIT 10;
GPU K-Means Clustering
-- GPU-accelerated clustering (23x faster)
SELECT cluster_kmeans_gpu(
'customer_data',
'features',
10, -- number of clusters
100 -- max iterations
);
-- Result: {"clusters": 10, "iterations": 18,
-- "inertia": 234.5, "device": "GPU",
-- "speedup": "23.4x"}
Building with GPU Support
Using build.sh (Recommended)
# CPU-only build (default)
./build.sh
# With GPU support (auto-detects CUDA/ROCm)
./build.sh --with-gpu
# With custom paths
./build.sh --with-gpu --cuda-path /opt/cuda --onnx-path /usr/local
NVIDIA GPU (CUDA)
# Install CUDA Toolkit 12.6
# Ubuntu/Debian
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-6
# Build NeuronDB with CUDA
./build.sh --with-gpu
AMD GPU (ROCm)
# Install ROCm
# Ubuntu
wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/amdgpu-install_6.0.60000-1_all.deb
sudo dpkg -i amdgpu-install_6.0.60000-1_all.deb
sudo amdgpu-install -y --usecase=rocm
# Build NeuronDB with ROCm
./build.sh --with-gpu --rocm-path /opt/rocm
Performance Benchmarks
Tested on an NVIDIA RTX 4090 (24 GB) with 10,000 vectors of 768 dimensions:
| Operation | CPU Time | GPU Time | Speedup |
|---|---|---|---|
| Batch L2 Distance (10K) | 450ms | 4.5ms | 100x |
| K-Means (10 clusters) | 421ms | 18ms | 23.4x |
| INT8 Quantization (10K) | 234ms | 4.7ms | 50x |
| FP16 Quantization (10K) | 189ms | 3.8ms | 50x |
| ONNX Inference (batch 32) | 156ms | 12ms | 13x |
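The Speedup column is the CPU time divided by the GPU time. The short sketch below recomputes it from the figures in the table and also derives a throughput figure for the batch L2 case. The helper names are hypothetical, and the throughput is simple arithmetic on the table values, not an additional measurement.

```python
# (operation, cpu_ms, gpu_ms) copied from the benchmark table.
BENCHMARKS = [
    ("Batch L2 Distance (10K)", 450.0, 4.5),
    ("K-Means (10 clusters)", 421.0, 18.0),
    ("INT8 Quantization (10K)", 234.0, 4.7),
    ("FP16 Quantization (10K)", 189.0, 3.8),
    ("ONNX Inference (batch 32)", 156.0, 12.0),
]


def speedup(cpu_ms, gpu_ms):
    """Speedup factor: how many times faster the GPU run was."""
    return cpu_ms / gpu_ms


def vectors_per_second(n_vectors, elapsed_ms):
    """Throughput implied by processing n_vectors in elapsed_ms."""
    return n_vectors / (elapsed_ms / 1000.0)


if __name__ == "__main__":
    for name, cpu_ms, gpu_ms in BENCHMARKS:
        print(f"{name}: {speedup(cpu_ms, gpu_ms):.1f}x")
    rate = vectors_per_second(10_000, 4.5)
    print(f"GPU batch L2 throughput: {rate:,.0f} vectors/s")
```

The quantization rows come out slightly under 50x before rounding, so the table values are best read as rounded figures for this one hardware configuration.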
Monitoring GPU Usage
-- View GPU statistics per backend
SELECT * FROM pg_stat_neurondb_gpu;
-- Columns:
-- backend_pid: PostgreSQL backend process ID
-- queries: Total GPU queries executed
-- batches: Number of batches processed
-- avg_batch_size: Average vectors per batch
-- avg_latency_ms: Average GPU kernel latency
-- fallback_count: Times fell back to CPU
-- oom_count: Out-of-memory errors
-- last_error: Last GPU error (if any)
Pro Tip: Monitor fallback_count and oom_count to detect GPU memory pressure. If these increase, consider reducing neurondb.gpu_batch_size or increasing neurondb.gpu_memory_pool_mb.
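The tip above depends on noticing when the counters increase, which means comparing two readings of pg_stat_neurondb_gpu taken some time apart. The sketch below shows one way to do that comparison once the rows are on the client side. It assumes the counters are cumulative per backend and that each snapshot has already been fetched into a dict keyed by backend_pid; the function name and the sample numbers are hypothetical.

```python
def pressure_report(before, after):
    """Return backends whose fallback or OOM counters grew between snapshots.

    `before` and `after` map backend_pid -> row dict holding at least the
    fallback_count and oom_count columns of pg_stat_neurondb_gpu.
    """
    report = {}
    for pid, row in after.items():
        # A backend that is new in `after` is compared against zero.
        prev = before.get(pid, {"fallback_count": 0, "oom_count": 0})
        new_fallbacks = row["fallback_count"] - prev["fallback_count"]
        new_ooms = row["oom_count"] - prev["oom_count"]
        if new_fallbacks > 0 or new_ooms > 0:
            report[pid] = {"new_fallbacks": new_fallbacks, "new_ooms": new_ooms}
    return report


if __name__ == "__main__":
    # Made-up snapshots for illustration only.
    earlier = {101: {"fallback_count": 2, "oom_count": 0}}
    later = {
        101: {"fallback_count": 5, "oom_count": 1},
        102: {"fallback_count": 0, "oom_count": 0},
    }
    print(pressure_report(earlier, later))
```

A non-empty report is the signal to revisit the batch size or memory pool settings; an empty one means no new fallbacks or out-of-memory events occurred in the interval.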
Automatic CPU Fallback
By default, NeuronDB falls back to the CPU when a GPU is unavailable or a GPU operation fails, so queries still return results even if GPU resources are exhausted. The neurondb.gpu_fail_open setting controls this behavior.
Fail-Open Mode (Default)
- Query continues on CPU if GPU fails
- Logs warning but returns result
- Best for production reliability
Fail-Closed Mode
- Query fails if GPU unavailable
- Returns error to client
- Use for GPU-required workloads
-- Set fail-open (default, recommended)
SET neurondb.gpu_fail_open = on;
-- Set fail-closed (strict GPU requirement)
SET neurondb.gpu_fail_open = off;
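The difference between the two modes can be sketched as a small dispatch wrapper. This is a conceptual illustration in Python, not NeuronDB's internal logic: GpuUnavailableError, run_with_fallback and the fail_open parameter are hypothetical names that mirror the behavior described in the two lists.

```python
import logging

logger = logging.getLogger("gpu_dispatch")


class GpuUnavailableError(RuntimeError):
    """Hypothetical error standing in for any GPU failure."""


def run_with_fallback(gpu_fn, cpu_fn, *args, fail_open=True):
    """Try the GPU path first; on failure, either fall back or re-raise.

    fail_open=True  -> log a warning and return the CPU result.
    fail_open=False -> propagate the GPU error to the caller.
    """
    try:
        return gpu_fn(*args)
    except GpuUnavailableError as exc:
        if not fail_open:
            raise
        logger.warning("GPU path failed (%s); falling back to CPU", exc)
        return cpu_fn(*args)


def broken_gpu_sum(values):
    """Simulated GPU routine that always fails."""
    raise GpuUnavailableError("no device")


def cpu_sum(values):
    """CPU implementation that always works."""
    return sum(values)


if __name__ == "__main__":
    # Fail-open: the caller still gets a result.
    print(run_with_fallback(broken_gpu_sum, cpu_sum, [1, 2, 3]))
```

Fail-open trades predictable latency for availability, since a silent fallback can be much slower than the GPU path. That is one reason to watch fallback_count even when queries are succeeding.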