Resolve NeurondB Operational Issues
Fast triage checklist
Before troubleshooting, verify:
- Enable log_min_messages = debug1 temporarily when reproducing issues
- Verify SELECT * FROM pg_extension WHERE extname = 'neurondb'; returns the expected version
- Collect EXPLAIN (ANALYZE, BUFFERS) plans for slow queries before tuning
- Ensure GPU drivers (CUDA/ROCm) are the same version used during compilation
Note: Run commands in a staging environment first, and revert any temporary settings once the fix is confirmed.
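The first two checklist items can be run from a single psql session. The sketch below assumes a superuser session (by default log_min_messages cannot be changed by ordinary roles) and uses only standard PostgreSQL catalogs. RESET restores the previous logging level afterwards.

```sql
-- Raise server log verbosity for this session only (superuser-settable parameter).
SET log_min_messages = debug1;

-- Installed version of the extension in the current database.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'neurondb';

-- Version shipped on disk versus version installed; a mismatch may mean that
-- ALTER EXTENSION neurondb UPDATE has not been run after an upgrade.
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'neurondb';

-- ... reproduce the issue here ...

-- Restore the previous logging level.
RESET log_min_messages;
```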
GPU acceleration issues
Fix runtime failures when enabling CUDA or ROCm acceleration.
Error: "GPU function not available"
NeurondB cannot locate compiled GPU kernels or drivers.
Confirm that GPU support was compiled and drivers are visible.
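From the SQL side, the standard pg_settings view shows which GPU-related parameters the loaded library has registered. This is only a sketch: it assumes the neurondb.gpu_ prefix used by the settings elsewhere on this page, and custom parameters typically appear only once the extension library has been loaded.

```sql
-- List GPU-related NeurondB settings and where each value comes from.
-- The backslash escapes "_" so LIKE treats it literally rather than as a wildcard.
SELECT name, setting, source
FROM pg_settings
WHERE name LIKE 'neurondb.gpu\_%'
ORDER BY name;
```

An empty result may simply mean the library has not been loaded in this session yet, so treat it as a hint rather than proof that GPU support is missing.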
Diagnostic commands
# Confirm NeurondB was built with GPU support
strings $(pg_config --pkglibdir)/neurondb.so | grep USE_GPU
# Check driver visibility
nvidia-smi # CUDA
rocm-smi # ROCm
ls /dev/nvidia* # CUDA devices
ls /dev/kfd # ROCm device
Fallback to CPU while debugging
-- Allow CPU fallback if GPU init fails
ALTER SYSTEM SET neurondb.gpu_fail_open = on;
SELECT pg_reload_conf();
GPU slower than CPU
Batch sizes or stream counts are too small to saturate the GPU.
Increase GPU parallelism
SET neurondb.gpu_batch_size = 5000;
SET neurondb.gpu_streams = 8;
SET neurondb.gpu_memory_pool_mb = 2048;
Re-run workload and compare latency using \timing.
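One way to compare settings without leaving them in place is to scope them to a transaction with SET LOCAL and read the timing from EXPLAIN ANALYZE. The sketch below is illustrative: the parameter values are the ones shown above, id is assumed to be the key column of documents, and the nearest-neighbour query (using a stored embedding as the query vector) is a stand-in for your real workload.

```sql
-- Baseline: run with the session's current settings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM documents
ORDER BY embedding <-> (SELECT embedding FROM documents ORDER BY id LIMIT 1)
LIMIT 10;

-- Candidate settings, visible only inside this transaction.
BEGIN;
SET LOCAL neurondb.gpu_batch_size = 5000;
SET LOCAL neurondb.gpu_streams = 8;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM documents
ORDER BY embedding <-> (SELECT embedding FROM documents ORDER BY id LIMIT 1)
LIMIT 10;
ROLLBACK;  -- SET LOCAL values end with the transaction
```

Running each variant several times helps separate the effect of the settings from cache warm-up on the first run.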
Error: GPU out of memory
Reduce GPU batch sizes and memory pool before reattempting.
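If you go on to use the optional quantization step in this section, note that it overwrites embedding in place, so the full-precision values are lost. A cautious sketch is to copy the originals first. It assumes documents has an id key column (as in the normalization example later on this page), and documents_embedding_backup is a hypothetical table name.

```sql
-- Keep a full-precision copy before any in-place quantization.
-- documents_embedding_backup is a hypothetical name; id is assumed to be the key.
CREATE TABLE documents_embedding_backup AS
SELECT id, embedding
FROM documents;

-- Sanity check: both counts should match before proceeding.
SELECT
  (SELECT count(*) FROM documents)                  AS source_rows,
  (SELECT count(*) FROM documents_embedding_backup) AS backup_rows;
```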
Reduce footprint
SET neurondb.gpu_batch_size = 500;
SET neurondb.gpu_memory_pool_mb = 256;
-- Optional: quantize vectors to int8 to shrink memory
UPDATE documents SET embedding = vector_to_int8_gpu(embedding);
ML clustering & analytics issues
Address convergence, accuracy, and data quality warnings from NeurondB ML pipelines.
"K-Means did not converge"
Increase iteration budget or relax tolerance for the dataset.
Retry K-Means with relaxed thresholds
SELECT *
FROM cluster_kmeans(
(SELECT embedding FROM documents),
5, -- k
500, -- max_iter
0.001 -- tol
);
Clustering quality is poor
Normalize embeddings and reassess the optimal k value.
Normalize before clustering
WITH normalized AS (
SELECT id,
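-- L2-normalize (embedding / ||embedding||). l2_normalize() is an assumed, pgvector-style name; use your build's equivalent.
l2_normalize(embedding) AS norm_embedding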
FROM documents
)
SELECT *
FROM cluster_kmeans(
(SELECT norm_embedding FROM normalized),
6,
150,
0.0005
);
Outlier detection: insufficient data
Collect more points or lower the Z-score threshold.
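To judge whether the dataset is large enough, a plain count of usable rows is a reasonable first check. This sketch relies only on standard SQL and the documents table used throughout this page. What counts as enough points depends on the detector and is not specified here.

```sql
-- count(*) counts all rows; count(embedding) skips rows where embedding IS NULL.
SELECT count(*)                    AS total_rows,
       count(embedding)            AS rows_with_embedding,
       count(*) - count(embedding) AS rows_missing_embedding
FROM documents;
```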
Adjust Z-score
SELECT *
FROM detect_outliers_zscore(
(SELECT embedding FROM documents),
2.5 -- threshold
);
Index build & query diagnostics
Resolve index build failures, poor recall, and sequential scans.
Index not used / sequential scan
Verify plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM documents
ORDER BY embedding <-> '{0.1,0.2,...}'::vector
LIMIT 10;
If the planner chooses a sequential scan, ensure the index operator class matches the query operator, or temporarily SET enable_seqscan = off to test whether the index can be used at all.
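Both checks can be sketched with standard PostgreSQL tools. pg_indexes shows each index definition, including its operator class, and SET LOCAL keeps enable_seqscan = off confined to one transaction. The query vector here is taken from a stored row purely for illustration, and id is assumed to be the table's key.

```sql
-- 1. Inspect index definitions; the operator class (e.g. vector_l2_ops)
--    should correspond to the operator used in ORDER BY (<-> here).
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'documents';

-- 2. Refresh planner statistics for the table.
ANALYZE documents;

-- 3. Discourage sequential scans for one transaction only and re-check the plan.
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM documents
ORDER BY embedding <-> (SELECT embedding FROM documents ORDER BY id LIMIT 1)
LIMIT 10;
ROLLBACK;
```

If the plan still shows a sequential scan with enable_seqscan off, the index most likely cannot serve this query, for example because of an operator class mismatch.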
Index build failed: out of memory
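Before changing index parameters, it can help to see how large the table is relative to the memory available for the build, and to watch a running build. The sketch uses only standard PostgreSQL functions and views. pg_stat_progress_create_index requires PostgreSQL 12 or later, and custom index access methods may report only coarse progress.

```sql
-- Memory budget currently available to index builds in this session.
SHOW maintenance_work_mem;

-- On-disk size of the table (including TOAST) as a rough guide to build size.
SELECT pg_size_pretty(pg_table_size('documents'))          AS table_size,
       pg_size_pretty(pg_total_relation_size('documents')) AS table_plus_indexes;

-- From a second session: progress of any CREATE INDEX currently running.
SELECT pid, phase, blocks_done, blocks_total, tuples_done, tuples_total
FROM pg_stat_progress_create_index;
```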
Tune HNSW / IVF
SET maintenance_work_mem = '4GB';
-- HNSW with lower memory
CREATE INDEX docs_hnsw ON documents
USING hnsw (embedding vector_l2_ops)
WITH (m = 12, ef_construction = 32);
-- Alternative IVF index
CREATE INDEX docs_ivf ON documents
USING ivfflat (embedding vector_l2_ops)
WITH (lists = 100);
Low recall / missing neighbours
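Before widening the search, it can be useful to measure recall for a sample query by comparing the index result with an exact scan. The following is a rough sketch, not a NeurondB feature: it assumes documents has an id key, uses a stored embedding as the query vector, and obtains the exact result by discouraging index scans for part of a transaction. The temporary table names are hypothetical.

```sql
BEGIN;

-- Exact top 10: with index scans discouraged, the planner falls back to a full scan and sort.
SET LOCAL enable_indexscan = off;
CREATE TEMP TABLE exact_ids ON COMMIT DROP AS
SELECT id
FROM documents
ORDER BY embedding <-> (SELECT embedding FROM documents ORDER BY id LIMIT 1)
LIMIT 10;

-- Approximate top 10: index scans allowed again.
SET LOCAL enable_indexscan = on;
CREATE TEMP TABLE approx_ids ON COMMIT DROP AS
SELECT id
FROM documents
ORDER BY embedding <-> (SELECT embedding FROM documents ORDER BY id LIMIT 1)
LIMIT 10;

-- Recall = share of exact neighbours that the index query also returned.
SELECT count(a.id)::float8 / NULLIF(count(e.id), 0) AS recall_at_10
FROM exact_ids e
LEFT JOIN approx_ids a USING (id);

COMMIT;  -- temp tables are dropped here
```

A single query vector gives a noisy estimate, so repeating this for several sample rows gives a better picture. It is also worth confirming with EXPLAIN that the second query actually uses the index.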
Increase search width
SET hnsw.ef_search = 200;
-- IVF equivalent
SET ivfflat.probes = 20;
Embedding & LLM integration issues
Troubleshoot embedding API failures, timeouts, and dimension mismatches.
LLM API unauthorized
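To check whether a key is visible to the current session without printing it, the standard current_setting() function can be called with missing_ok = true. This is a sketch under the assumption that neurondb.llm_api_key, as used in this section, is the parameter the extension reads.

```sql
-- Returns true when the parameter is set to a non-empty value in this session.
-- The second argument (missing_ok) makes current_setting return NULL instead of
-- raising an error when the parameter is not defined.
SELECT coalesce(current_setting('neurondb.llm_api_key', true), '') <> '' AS api_key_is_set;
```

A value stored with ALTER DATABASE applies to sessions started afterwards, so reconnect before re-checking.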
Set API key
SET neurondb.llm_api_key = 'sk-...';
ALTER DATABASE mydb SET neurondb.llm_api_key = 'sk-...';
Embedding API timeout
Extend timeout & retries
SET neurondb.llm_timeout_ms = 60000;
SET neurondb.llm_max_retries = 5;
Dimension mismatch errors
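Before altering the column type, it is worth checking which dimensions the stored vectors actually have, since rows that do not match the new declared dimension would be expected to make the ALTER fail. The sketch assumes a pgvector-compatible vector_dims() function is available in your NeurondB build. That is an assumption, not something this page confirms.

```sql
-- Distribution of stored vector dimensions.
-- vector_dims() is assumed (pgvector-style); substitute your build's equivalent.
SELECT vector_dims(embedding) AS dims,
       count(*)               AS row_count
FROM documents
WHERE embedding IS NOT NULL
GROUP BY vector_dims(embedding)
ORDER BY row_count DESC;
```

More than one row in the result means the table holds mixed dimensions, and the mismatched rows would need re-embedding before the column type can be changed.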
Align vector dimensions
ALTER TABLE documents
ALTER COLUMN embedding
TYPE vector(3072);
-- Confirm new dimension
SELECT attname, atttypmod
FROM pg_attribute
WHERE attrelid = 'documents'::regclass
AND attname = 'embedding';
Next Steps
- Configuration Reference - Verify each GUC parameter and recommended value after making changes.
- Performance Tuning - Benchmark NeurondB after applying fixes to confirm SLO improvements.
- Open GitHub Issue - Share logs and repro steps with the community for unresolved bugs.