GPU Acceleration

Supercharge vector operations with CUDA and ROCm: up to 100x faster batch distance computation and 23x faster K-means clustering on NVIDIA and AMD GPUs.

Overview

NeuronDB provides optional GPU acceleration for compute-intensive vector operations using NVIDIA CUDA or AMD ROCm. GPU support is disabled by default (neurondb.gpu_enabled = off) and, in the default fail-open mode, automatically falls back to CPU when a GPU is unavailable.

  • 100x batch distance speedup
  • 23x K-means clustering speedup
  • 2.3 ms average GPU latency

GPU-Accelerated Operations

Operation                 CUDA        ROCm        Speedup
L2 Distance               ✓ cuBLAS    ✓ rocBLAS   100x (batch)
Cosine Distance           ✓ cuBLAS    ✓ rocBLAS   100x (batch)
Inner Product             ✓ GEMM      ✓ GEMM      100x (batch)
K-Means Clustering        ✓ Custom    ✓ Custom    23x
Quantization (INT8/FP16)  ✓ Kernels   ✓ Kernels   50x
ONNX Inference            ✓ CUDA EP   Partial     10-15x
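Why the three distance rows map onto BLAS: for a batch of stored vectors x and queries q, L2, inner product, and cosine distance can all be derived from the matrix of dot products x·q plus per-vector squared norms, using ||x - q||² = ||x||² - 2 x·q + ||q||². The pure-Python sketch below computes that cross term with a nested loop, which is the part a single GEMM call on cuBLAS or rocBLAS would replace. It illustrates the identity only; it is not NeuronDB's kernel code, `batch_distances` is a hypothetical helper, and cosine distance assumes no zero vectors.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def batch_distances(db, queries):
    """Return (l2, ip, cosine) matrices, each len(db) x len(queries).

    Hypothetical helper. Cosine distance assumes no zero vectors.
    """
    # Cross term: every db vector against every query. On a GPU this whole
    # matrix is one GEMM call; here it is a plain nested loop.
    cross = [[dot(x, q) for q in queries] for x in db]

    # Per-vector squared norms, computed once and reused for every pair.
    db_sq = [dot(x, x) for x in db]
    q_sq = [dot(q, q) for q in queries]

    rows, cols = range(len(db)), range(len(queries))
    # ||x - q|| = sqrt(||x||^2 - 2 x.q + ||q||^2); clamp tiny negatives from rounding.
    l2 = [[math.sqrt(max(db_sq[i] - 2 * cross[i][j] + q_sq[j], 0.0)) for j in cols]
          for i in rows]
    # Inner product is the cross term itself.
    ip = [row[:] for row in cross]
    # Cosine distance = 1 - x.q / (||x|| * ||q||).
    cos = [[1.0 - cross[i][j] / math.sqrt(db_sq[i] * q_sq[j]) for j in cols]
           for i in rows]
    return l2, ip, cos


l2, ip, cos = batch_distances([[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]], [[1.0, 0.0]])
print(l2, ip, cos)
```

Computing the norms once and the cross term as one matrix product is what makes batching pay off: the per-pair work collapses into a single large, GPU-friendly operation.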

Configuration

PostgreSQL Configuration

# Add to postgresql.conf (changing shared_preload_libraries requires a server restart)
shared_preload_libraries = 'neurondb'

# GPU configuration (all optional)
neurondb.gpu_enabled = off              # Enable GPU (default: off)
neurondb.gpu_device = 0                 # GPU device ID
neurondb.gpu_batch_size = 8192          # Batch size for GPU ops
neurondb.gpu_streams = 2                # CUDA/HIP streams
neurondb.gpu_memory_pool_mb = 512       # Memory pool size (MB)
neurondb.gpu_fail_open = on             # Fall back to CPU on error
neurondb.gpu_kernels = 'l2,cosine,ip'   # Enabled kernels
neurondb.gpu_timeout_ms = 30000         # Kernel timeout (ms)
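A rough way to reason about neurondb.gpu_batch_size and neurondb.gpu_memory_pool_mb together: if rows are sent to the GPU in chunks of at most gpu_batch_size vectors (an assumption about how batching works, not documented behavior), then 10,000 vectors at the default 8192 take two batches, and one full batch of 768-dimensional float32 vectors is 24 MiB of input before any output or scratch space. Both helper names are hypothetical, and 4 bytes per element assumes float32 storage.

```python
def split_into_batches(n_vectors, batch_size=8192):
    """Sizes of the chunks n_vectors rows would be split into (hypothetical model)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, rest = divmod(n_vectors, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


def batch_input_bytes(batch_size, dims, bytes_per_element=4):
    """Input size of one full batch; 4 bytes per element assumes float32."""
    return batch_size * dims * bytes_per_element


print(split_into_batches(10_000))                   # [8192, 1808]
print(batch_input_bytes(8192, 768) / 2**20, "MiB")  # 24.0 MiB
```

Under these assumptions a single batch uses only a small slice of the 512 MB pool, which leaves headroom for outputs, intermediate buffers, and multiple streams.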

SQL Examples

Enable GPU Acceleration

-- Check GPU availability
SELECT neurondb_gpu_info();

-- Enable GPU for the current session
SELECT neurondb_gpu_enable(true);

-- Check GPU statistics
SELECT * FROM pg_stat_neurondb_gpu;

GPU-Accelerated Distance

-- Batch GPU L2 distance (up to 100x faster on large batches)
SELECT vector_l2_distance_gpu(
    embedding,
    '[0.1, 0.2, ...]'::vector
)
FROM documents;

-- GPU cosine distance: 10 closest products (smallest distance first)
SELECT vector_cosine_distance_gpu(features, query_vector)
FROM products
ORDER BY 1
LIMIT 10;
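The second query relies on distance ordering. Under the usual convention (the one pgvector uses) that cosine distance is 1 - cosine similarity, `ORDER BY 1` ascending returns the most similar rows first. The CPU sketch below shows that top-k selection on plain Python lists; that vector_cosine_distance_gpu follows the same convention is an assumption, and `top_k_by_cosine` is a hypothetical helper.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 = same direction, 1 = orthogonal, 2 = opposite."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def top_k_by_cosine(rows, query, k=10):
    """Return up to k (distance, row_id) pairs, smallest distance first.

    Hypothetical helper mirroring ORDER BY distance LIMIT k.
    rows is an iterable of (row_id, vector) pairs with non-zero vectors.
    """
    scored = ((cosine_distance(vec, query), row_id) for row_id, vec in rows)
    return heapq.nsmallest(k, scored)


products = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0]), ("d", [-1.0, 0.0])]
print(top_k_by_cosine(products, [1.0, 0.0], k=2))
```

`heapq.nsmallest` keeps only k candidates in memory, the same reason a database can answer `ORDER BY ... LIMIT k` without fully sorting every row.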

GPU K-Means Clustering

-- GPU-accelerated K-means clustering (about 23x faster)
SELECT cluster_kmeans_gpu(
    'customer_data',  -- source table
    'features',       -- vector column
    10,               -- number of clusters
    100               -- max iterations
);

-- Result:
-- {"clusters": 10, "iterations": 18, "inertia": 234.5,
--  "device": "GPU", "speedup": "23.4x"}
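For reference, the fields in that result correspond to textbook Lloyd's K-means: assign each vector to its nearest centroid, move each centroid to the mean of its members, and stop when nothing changes or the iteration limit is reached. Inertia is conventionally the sum of squared distances from each vector to its assigned centroid. The CPU sketch below assumes NeuronDB reports inertia the same way; it is not NeuronDB's custom GPU kernel, and the random initialization is a simple choice (K-means++ is a common alternative).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-means on CPU; returns fields shaped like the SQL result."""
    if not 1 <= k <= len(points) or max_iter < 1:
        raise ValueError("need 1 <= k <= len(points) and max_iter >= 1")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # simple random init
    for iteration in range(1, max_iter + 1):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Mean of the members; an empty cluster keeps its old centroid.
            new_centroids.append([sum(col) / len(members) for col in zip(*members)]
                                 if members else centroids[c])
        converged = new_centroids == centroids
        centroids = new_centroids
        if converged:
            break
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))
    return {"clusters": k, "iterations": iteration, "inertia": inertia, "labels": labels}


pts = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
print(kmeans(pts, k=2, max_iter=100))
```

Both steps are data-parallel across vectors (every assignment and every per-cluster sum is independent), which is why K-means maps well onto GPU kernels.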

Building with GPU Support

Using build.sh (Recommended)

# CPU-only build (default)
./build.sh

# With GPU support (auto-detects CUDA/ROCm)
./build.sh --with-gpu

# With custom paths
./build.sh --with-gpu --cuda-path /opt/cuda --onnx-path /usr/local

NVIDIA GPU (CUDA)

# Install CUDA Toolkit 12.6
# Ubuntu 22.04 (for other Ubuntu or Debian releases, use the matching repos/ path)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-6

# Build NeuronDB with CUDA
./build.sh --with-gpu

AMD GPU (ROCm)

# Install ROCm
# Ubuntu 22.04 (jammy)
wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/amdgpu-install_6.0.60000-1_all.deb
sudo dpkg -i amdgpu-install_6.0.60000-1_all.deb
sudo amdgpu-install -y --usecase=rocm

# Build NeuronDB with ROCm
./build.sh --with-gpu --rocm-path /opt/rocm

Performance Benchmarks

Tested on NVIDIA RTX 4090 (24GB), 10,000 vectors, 768 dimensions:

Operation                  CPU Time   GPU Time   Speedup
Batch L2 Distance (10K)    450 ms     4.5 ms     100x
K-Means (10 clusters)      421 ms     18 ms      23.4x
INT8 Quantization (10K)    234 ms     4.7 ms     50x
FP16 Quantization (10K)    189 ms     3.8 ms     50x
ONNX Inference (batch 32)  156 ms     12 ms      13x
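The INT8 row times converting float vectors to 8-bit codes. This page does not specify NeuronDB's INT8 format, so the sketch below assumes one common scheme: symmetric per-vector quantization, where each vector is scaled by max|x| / 127 and rounded. Under that scheme a 768-dimensional float32 vector shrinks from 3,072 bytes to 768 bytes of codes plus one scale factor, and every unclamped element is reconstructed to within half a scale step.

```python
def quantize_int8(vec):
    """Symmetric per-vector INT8 quantization; returns (codes, scale).

    Assumed scheme for illustration; NeuronDB's INT8 format may differ.
    """
    max_abs = max((abs(x) for x in vec), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero
    # round() then clamp, so every code fits in a signed byte [-127, 127].
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def dequantize_int8(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]


vec = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(vec)
restored = dequantize_int8(codes, scale)
print(codes, max(abs(a - b) for a, b in zip(vec, restored)))
```

Each element is quantized independently of the others, so the work is embarrassingly parallel; that fits the roughly 50x GPU speedup reported for this operation.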

Monitoring GPU Usage

-- View GPU statistics per backend
SELECT * FROM pg_stat_neurondb_gpu;

-- Columns:
--   backend_pid      PostgreSQL backend process ID
--   queries          Total GPU queries executed
--   batches          Number of batches processed
--   avg_batch_size   Average vectors per batch
--   avg_latency_ms   Average GPU kernel latency (ms)
--   fallback_count   Number of times execution fell back to CPU
--   oom_count        Number of GPU out-of-memory errors
--   last_error       Last GPU error, if any

Pro Tip: Monitor fallback_count and oom_count to detect GPU memory pressure. If these counters keep increasing, consider lowering neurondb.gpu_batch_size or raising neurondb.gpu_memory_pool_mb, and check last_error to confirm the cause.
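Because these counters are cumulative, the tip is easiest to apply to the difference between two snapshots of the same backend's row in pg_stat_neurondb_gpu, taken some time apart. The sketch below takes the two rows as plain dicts (fetching them with a PostgreSQL driver is left out), assumes fallback_count is counted in the same units as queries, and uses a hypothetical 5% fallback threshold that you would tune for your workload.

```python
def counter_deltas(before, after, keys=("queries", "fallback_count", "oom_count")):
    """Per-counter change between two snapshots of one backend's row."""
    return {key: after[key] - before[key] for key in keys}


def check_gpu_pressure(before, after, max_fallback_ratio=0.05):
    """Return (fallback_ratio, warnings) for the interval between snapshots.

    max_fallback_ratio is a hypothetical threshold, not a NeuronDB default.
    """
    delta = counter_deltas(before, after)
    ratio = delta["fallback_count"] / delta["queries"] if delta["queries"] > 0 else 0.0
    warnings = []
    if delta["oom_count"] > 0:
        warnings.append("GPU OOM: lower neurondb.gpu_batch_size or raise neurondb.gpu_memory_pool_mb")
    if ratio > max_fallback_ratio:
        warnings.append(f"{ratio:.0%} of GPU queries fell back to CPU: check last_error")
    return ratio, warnings


before = {"queries": 100, "fallback_count": 2, "oom_count": 0}
after = {"queries": 200, "fallback_count": 12, "oom_count": 3}
print(check_gpu_pressure(before, after))
```

Working on deltas rather than totals means a burst of failures shows up immediately instead of being diluted by a long history of successful queries.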

Automatic CPU Fallback

NeuronDB automatically falls back to CPU when the GPU is unavailable or a GPU operation fails. In fail-open mode (the default), queries still succeed even when GPU resources are exhausted; in fail-closed mode, they return an error instead.

Fail-Open Mode (Default)

  • Query continues on CPU if GPU fails
  • Logs a warning and still returns the result
  • Best for production reliability

Fail-Closed Mode

  • Query fails if the GPU is unavailable or a GPU error occurs
  • Returns the error to the client
  • Use for GPU-required workloads

-- Set fail-open (default, recommended)
SET neurondb.gpu_fail_open = on;

-- Set fail-closed (strict GPU requirement)
SET neurondb.gpu_fail_open = off;
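The difference between the two modes comes down to what happens when the GPU path raises an error. The sketch below models that control flow with a hypothetical `GpuError` type and caller-supplied `gpu_fn`/`cpu_fn` callables. It illustrates the documented behavior and is not NeuronDB's internal code.

```python
import logging

log = logging.getLogger("neurondb_gpu_sketch")


class GpuError(RuntimeError):
    """Hypothetical stand-in for any GPU failure (no device, OOM, kernel timeout)."""


def run_with_fallback(gpu_fn, cpu_fn, fail_open=True):
    """Run gpu_fn; on GpuError either fall back to cpu_fn or re-raise."""
    try:
        return gpu_fn()
    except GpuError as exc:
        if not fail_open:
            raise  # fail-closed: the error reaches the client
        log.warning("GPU path failed (%s); falling back to CPU", exc)
        return cpu_fn()  # fail-open: same result, computed on CPU


def gpu_out_of_memory():
    raise GpuError("out of memory")


print(run_with_fallback(gpu_out_of_memory, lambda: "cpu result"))  # logs a warning, prints "cpu result"
```

Fail-open trades predictable latency for availability: a query that silently lands on CPU succeeds but can be far slower, which is why tracking fallback_count matters in that mode.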