NeuronDB: PostgreSQL AI Vector Database Extension
📦 View on GitHub | 📥 Download Latest Release | 📖 Documentation
Executive Summary
Modern AI applications require efficient vector similarity search, semantic retrieval, and machine learning inference capabilities directly in the database. NeuronDB provides a production-ready PostgreSQL extension that transforms your database into a complete AI platform with vector search, ML inference, GPU acceleration, and hybrid retrieval—all while maintaining full pgvector compatibility.
Introduction: The AI Database Challenge
Building AI applications with PostgreSQL traditionally requires multiple tools: pgvector for vectors, separate ML frameworks for embeddings, external services for GPU acceleration, and custom code for hybrid search. This fragmentation creates complexity, latency, and operational overhead.
NeuronDB unifies these capabilities into a single PostgreSQL extension—giving you semantic search, RAG (Retrieval Augmented Generation), recommendation systems, and ML inference directly in your database.
What Makes NeuronDB Different?
Vector Search Capabilities
NeuronDB provides enterprise-grade vector search with advanced indexing:
Indexing Algorithms
- HNSW (Hierarchical Navigable Small World) - Sub-10ms queries on 100M+ vectors
- IVFFlat - Memory-efficient approximate nearest neighbor search
- Flat - Exact nearest neighbor for small datasets
- DiskANN - Billion-scale vectors with SSD storage
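To make the options concrete, here is a minimal declaration sketch. The hnsw and ivfflat syntax matches the configuration examples later in this post; the diskann access-method name is an assumption, so check the documentation for the exact spelling.

```sql
-- HNSW: the usual default for large in-memory datasets
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- IVFFlat: lower memory footprint, approximate results
CREATE INDEX ON documents USING ivfflat (embedding vector_l2_ops) WITH (lists = 1000);

-- DiskANN: access-method name assumed here; consult the docs for the real syntax
-- CREATE INDEX ON documents USING diskann (embedding vector_cosine_ops);
```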
Distance Metrics (10+ supported)
- L2 distance (Euclidean)
- Inner product (dot product)
- Cosine similarity
- Hamming distance
- Jaccard distance
- Manhattan (L1)
- Chebyshev
- Minkowski
- Canberra
- Bray-Curtis
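Since NeuronDB keeps full pgvector compatibility, the three most common metrics should be reachable through the standard pgvector operators. A self-contained sketch, assuming the usual operator spellings:

```sql
-- Tiny illustrative table; in practice these are your embedding columns
CREATE TABLE metric_demo (id INT, v vector(3));
INSERT INTO metric_demo VALUES (1, '[1,0,0]'), (2, '[0.5,0.5,0]');

SELECT id,
       v <-> '[1,0,0]' AS l2_distance,       -- L2 (Euclidean)
       v <#> '[1,0,0]' AS neg_inner_product, -- inner product (negated, by pgvector convention)
       v <=> '[1,0,0]' AS cosine_distance    -- cosine distance (1 - cosine similarity)
FROM metric_demo
ORDER BY v <=> '[1,0,0]';
```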
Vector Optimization
- Scalar quantization (4x memory reduction)
- Product quantization (8-16x reduction)
- Binary quantization for Hamming distance
- GPU-accelerated search (10-100x faster)
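To make those ratios concrete, here is a back-of-the-envelope footprint calculation in plain SQL (no NeuronDB functions involved; a 4-byte float32 baseline is assumed):

```sql
-- 100M vectors at 768 dimensions: raw vs. quantized storage, in GB
SELECT round((100e6 * 768 * 4 / 1024^3)::numeric, 1) AS float32_gb, -- ~286.1 (4 bytes/dim)
       round((100e6 * 768 * 1 / 1024^3)::numeric, 1) AS int8_gb,    -- ~71.5 (scalar quantization, 4x)
       round((100e6 * 768 / 8 / 1024^3)::numeric, 1) AS binary_gb;  -- ~8.9 (binary, 1 bit/dim)
```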
ML Inference Engine
Built-in machine learning capabilities eliminate external API dependencies:
Embedding Generation
- 50+ pre-trained models (BERT, sentence-transformers, OpenAI-compatible)
- Automatic text-to-vector conversion
- Batch processing for high throughput
- Multi-modal embeddings (text, image, audio)
Model Formats
- ONNX runtime integration
- Hugging Face model support
- Custom model loading
- GPU inference acceleration
Inference Modes
- Real-time embedding generation
- Batch background processing
- Streaming inference
- Multi-model support
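A minimal sketch of the real-time and batch paths, using only the neurondb.embed_text function that appears throughout the examples below (table and model names are illustrative):

```sql
-- Real-time: embed a single string at query time
SELECT neurondb.embed_text('all-MiniLM-L6-v2', 'hello world');

-- Batch: backfill embeddings for existing rows in one statement
UPDATE documents
SET embedding = neurondb.embed_text('all-MiniLM-L6-v2', content)
WHERE embedding IS NULL;
```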
Hybrid Search
Combine vector similarity with traditional search for superior relevance:
Search Types
- Vector similarity search
- Full-text search (PostgreSQL FTS)
- BM25 ranking
- Multi-vector search
- Faceted filtering
Fusion Algorithms
- Reciprocal Rank Fusion (RRF)
- Weighted scoring
- Custom rank aggregation
- Score normalization
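As an illustration of Reciprocal Rank Fusion, the query below fuses a full-text ranking with a vector ranking using the standard RRF formula score(d) = Σ 1/(k + rank(d)), with the conventional k = 60. This is a hand-rolled sketch in plain SQL against the documents table from the examples below, not NeuronDB's built-in hybrid-search API:

```sql
WITH vec_rank AS (
    SELECT id, row_number() OVER (
        ORDER BY embedding <=> neurondb.embed_text('all-MiniLM-L6-v2', 'database tuning')
    ) AS r
    FROM documents
),
fts_rank AS (
    SELECT id, row_number() OVER (
        ORDER BY ts_rank_cd(to_tsvector('english', content),
                            plainto_tsquery('english', 'database tuning')) DESC
    ) AS r
    FROM documents
    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'database tuning')
)
SELECT d.content,
       COALESCE(1.0 / (60 + v.r), 0) + COALESCE(1.0 / (60 + f.r), 0) AS rrf_score
FROM documents d
LEFT JOIN vec_rank v ON v.id = d.id
LEFT JOIN fts_rank f ON f.id = d.id
ORDER BY rrf_score DESC
LIMIT 10;
```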
GPU Acceleration
Optional CUDA support for 10-100x performance improvements:
GPU Features
- CUDA kernel optimization
- Batch query processing
- Multi-GPU support
- Automatic CPU/GPU switching
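Because the GPU settings are ordinary PostgreSQL GUCs (shown again in the configuration section below), switching can be scoped to a session or even a single transaction:

```sql
-- Enable GPU execution for the current session
SET neurondb.use_gpu = on;

-- Or scope it to one transaction with SET LOCAL
BEGIN;
SET LOCAL neurondb.use_gpu = on;
SELECT id FROM documents
ORDER BY embedding <=> neurondb.embed_text('all-MiniLM-L6-v2', 'gpu test')
LIMIT 10;
COMMIT;
```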
Performance
- 100M vectors: <10ms search latency
- 1B vectors with DiskANN: <50ms
- 10,000+ QPS on single GPU
- Linear scaling with multiple GPUs
Supported Hardware
- NVIDIA RTX series (RTX 3090, 4090, A6000)
- Data center GPUs (A100, H100, V100)
- CUDA 11.0+ compatibility
Installation and Configuration
Prerequisites
- PostgreSQL 12, 13, 14, 15, 16, or 17
- Linux (Ubuntu 20.04+, Rocky 8+), macOS, or Windows (WSL2)
- Optional: NVIDIA GPU with CUDA 11.0+ for GPU acceleration
Quick Installation
Ubuntu/Debian
```bash
# Install dependencies
sudo apt-get install -y postgresql-server-dev-all build-essential

# Download and install NeuronDB
wget https://github.com/pgElephant/NeurondB/releases/latest/download/neurondb-pg16-ubuntu.tar.gz
tar -xzf neurondb-pg16-ubuntu.tar.gz
cd neurondb
sudo make install

# Enable extension
psql -c "CREATE EXTENSION neurondb;"
```
macOS
```bash
# Install with Homebrew
brew install pgelephant/tap/neurondb

# Enable extension
psql -c "CREATE EXTENSION neurondb;"
```
Build from Source
```bash
git clone https://github.com/pgElephant/NeurondB.git
cd NeurondB
make PG_CONFIG=/path/to/pg_config
sudo make install
```
GPU Support (Optional)
```bash
# Install CUDA toolkit
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda

# Build NeuronDB with GPU support
make USE_CUDA=1
sudo make install
```
Real-World Use Cases
Semantic Search
Build Google-like semantic search over your documents:
```sql
-- Create table with embeddings
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(768)
);

-- Auto-generate embeddings
INSERT INTO documents (content, embedding) VALUES (
    'PostgreSQL is a powerful relational database',
    neurondb.embed_text('all-MiniLM-L6-v2', 'PostgreSQL is a powerful relational database')
);

-- Create HNSW index
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Semantic search
SELECT content,
       1 - (embedding <=> neurondb.embed_text('all-MiniLM-L6-v2', 'database system')) AS similarity
FROM documents
ORDER BY embedding <=> neurondb.embed_text('all-MiniLM-L6-v2', 'database system')
LIMIT 10;
```
RAG (Retrieval Augmented Generation)
Power ChatGPT-like applications with your own data:
```sql
-- Store knowledge base with embeddings
CREATE TABLE knowledge_base (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding vector(1536)  -- OpenAI ada-002 dimensions
);

-- Create function for RAG retrieval
-- (column references are table-qualified to avoid ambiguity with the
--  RETURNS TABLE output columns)
CREATE FUNCTION get_context(query_text TEXT, top_k INT DEFAULT 5)
RETURNS TABLE(content TEXT, score FLOAT) AS $$
    SELECT kb.content,
           1 - (kb.embedding <=> neurondb.embed_text('text-embedding-ada-002', query_text)) AS score
    FROM knowledge_base kb
    ORDER BY kb.embedding <=> neurondb.embed_text('text-embedding-ada-002', query_text)
    LIMIT top_k;
$$ LANGUAGE SQL;

-- Retrieve context for LLM
SELECT * FROM get_context('How does PostgreSQL handle transactions?');
```
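To hand the retrieved chunks to an LLM as a single prompt context, a plain-SQL aggregation over the function's output is enough:

```sql
-- Concatenate the top chunks into one context block for the LLM prompt
SELECT string_agg(content, E'\n---\n' ORDER BY score DESC) AS context
FROM get_context('How does PostgreSQL handle transactions?');
```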
Recommendation System
Build Netflix-style recommendations:
```sql
-- User preference vectors
CREATE TABLE user_preferences (
    user_id INT PRIMARY KEY,
    preference_vector vector(128)
);

-- Item embeddings
CREATE TABLE items (
    item_id INT PRIMARY KEY,
    title TEXT,
    item_vector vector(128)
);

-- Get personalized recommendations
SELECT i.title,
       1 - (i.item_vector <=> u.preference_vector) AS match_score
FROM user_preferences u
CROSS JOIN items i
WHERE u.user_id = 12345
ORDER BY i.item_vector <=> u.preference_vector
LIMIT 20;
```
Image Search
Find similar images by visual features:
```sql
-- Image embeddings from CLIP
CREATE TABLE images (
    id SERIAL PRIMARY KEY,
    filename TEXT,
    image_embedding vector(512)
);

-- Text-to-image search
SELECT filename,
       1 - (image_embedding <=> neurondb.embed_text('clip-vit-base', 'sunset over ocean')) AS similarity
FROM images
ORDER BY image_embedding <=> neurondb.embed_text('clip-vit-base', 'sunset over ocean')
LIMIT 50;
```
Performance and Benchmarks
Query Performance
100M Vector Dataset (768 dimensions)
- HNSW index: 5-8ms average latency
- IVFFlat index: 15-25ms average latency
- GPU HNSW: 0.5-2ms average latency
1 Billion Vector Dataset (DiskANN)
- SSD-backed index: 30-50ms average latency
- 95th percentile: <100ms
- Memory usage: <16GB
Throughput
Single PostgreSQL Instance
- CPU-only: 1,000-2,000 queries/second
- Single GPU: 10,000-15,000 queries/second
- Multi-GPU: 50,000+ queries/second
Accuracy
Recall@10 on Standard Benchmarks
- HNSW (ef_search=100): 98-99%
- IVFFlat (nprobe=20): 95-97%
- DiskANN: 96-98%
Configuration Options
Vector Index Tuning
```sql
-- HNSW parameters
CREATE INDEX ON vectors USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Query-time tuning
SET neurondb.hnsw_ef_search = 100;  -- Higher = better recall, slower

-- IVFFlat parameters
CREATE INDEX ON vectors USING ivfflat (embedding vector_l2_ops)
WITH (lists = 1000);

SET neurondb.ivfflat_probes = 20;  -- Higher = better recall
```
GPU Configuration
```sql
-- Enable GPU acceleration
SET neurondb.use_gpu = on;

-- GPU device selection
SET neurondb.gpu_device_id = 0;  -- Use first GPU

-- Batch size for GPU queries
SET neurondb.gpu_batch_size = 1000;
```
Embedding Models
```sql
-- List available models
SELECT * FROM neurondb.list_models();

-- Load custom model
SELECT neurondb.load_model('custom-bert', '/path/to/model.onnx');

-- Set default embedding model
SET neurondb.default_model = 'all-MiniLM-L6-v2';
```
Integration Examples
Python with psycopg2
```python
import psycopg2

conn = psycopg2.connect("dbname=mydb")
cur = conn.cursor()

# Create table
cur.execute("""
    CREATE TABLE IF NOT EXISTS embeddings (
        id SERIAL PRIMARY KEY,
        text TEXT,
        vector vector(768)
    )
""")

# Insert with auto-embedding
cur.execute("""
    INSERT INTO embeddings (text, vector)
    VALUES (%s, neurondb.embed_text('all-MiniLM-L6-v2', %s))
""", ("Hello world", "Hello world"))
conn.commit()

# Semantic search
query = "greeting message"
cur.execute("""
    SELECT text,
           1 - (vector <=> neurondb.embed_text('all-MiniLM-L6-v2', %s)) AS similarity
    FROM embeddings
    ORDER BY vector <=> neurondb.embed_text('all-MiniLM-L6-v2', %s)
    LIMIT 5
""", (query, query))

for text, similarity in cur.fetchall():
    print(f"{text}: {similarity:.4f}")
```
Node.js with pg
```javascript
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: 'postgresql://localhost/mydb'
});

async function semanticSearch(query) {
  const result = await pool.query(`
    SELECT content,
           1 - (embedding <=> neurondb.embed_text('all-MiniLM-L6-v2', $1)) AS score
    FROM documents
    ORDER BY embedding <=> neurondb.embed_text('all-MiniLM-L6-v2', $1)
    LIMIT 10
  `, [query]);
  return result.rows;
}

semanticSearch('database performance').then(results => {
  results.forEach(row => {
    console.log(`${row.content}: ${row.score}`);
  });
});
```
LangChain Integration
```python
from langchain.vectorstores import NeuronDB
from langchain.embeddings import HuggingFaceEmbeddings

# Initialize embeddings
embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')

# Create vector store
vectorstore = NeuronDB(
    connection_string="postgresql://localhost/mydb",
    table_name="documents",
    embeddings=embeddings
)

# Add documents
texts = ["PostgreSQL is powerful", "Vector search is fast"]
vectorstore.add_texts(texts)

# Similarity search
results = vectorstore.similarity_search("database system", k=5)
for doc in results:
    print(doc.page_content)
```
Monitoring and Observability
Performance Views
```sql
-- Index statistics
SELECT * FROM neurondb.index_stats;

-- Query performance
SELECT * FROM neurondb.query_stats
ORDER BY avg_latency DESC;

-- GPU utilization
SELECT * FROM neurondb.gpu_stats;

-- Embedding cache hits
SELECT * FROM neurondb.cache_stats;
```
Maintenance
```sql
-- Rebuild HNSW index
REINDEX INDEX CONCURRENTLY vectors_hnsw_idx;

-- Vacuum embedding cache
SELECT neurondb.vacuum_cache();

-- Update index statistics
ANALYZE embeddings;
```
Migration from pgvector
NeuronDB is designed as a drop-in replacement for pgvector:
```sql
-- Works with existing pgvector tables
CREATE TABLE vectors (
    id SERIAL PRIMARY KEY,
    embedding vector(1536)
);

-- Use NeuronDB indexes for better performance
CREATE INDEX ON vectors USING hnsw (embedding vector_cosine_ops);

-- All pgvector operators work
SELECT * FROM vectors ORDER BY embedding <=> '[1,2,3...]' LIMIT 10;
```
Migration Benefits
- 10-100x faster queries with HNSW
- GPU acceleration option
- Built-in embedding generation
- Hybrid search capabilities
- No query changes required
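A minimal migration sketch, assuming an existing pgvector table whose old IVFFlat index is named vectors_ivf_idx (both index names here are illustrative):

```sql
-- Drop the old pgvector index
DROP INDEX IF EXISTS vectors_ivf_idx;

-- Replace it with a NeuronDB HNSW index; queries stay exactly the same
CREATE INDEX CONCURRENTLY vectors_hnsw_idx
    ON vectors USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
```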
Roadmap
Current and Planned Features
- ✅ HNSW and IVFFlat indexing
- ✅ GPU acceleration (CUDA)
- ✅ 50+ embedding models
- ✅ Hybrid search
- 🚧 Quantization improvements
- 🚧 Distributed indexing
- 📋 Multi-modal search (image + text)
- 📋 Sparse vector support
- 📋 Graph-based retrieval
Community and Support
Get Involved
Report issues, open pull requests, and follow development at https://github.com/pgElephant/NeurondB.
Commercial Support
For production deployments, enterprise support, and custom features, contact support@pgelephant.com.
Conclusion
NeuronDB transforms PostgreSQL into a complete AI platform, eliminating the need for separate vector databases, ML services, and complex integrations. With production-ready performance, GPU acceleration, and comprehensive AI capabilities, NeuronDB enables you to build semantic search, RAG applications, and recommendation systems entirely within PostgreSQL.
Get Started Today
Download the latest release from https://github.com/pgElephant/NeurondB/releases, install it for your PostgreSQL version, and run CREATE EXTENSION neurondb; to begin.
About pgElephant
pgElephant builds production-ready PostgreSQL extensions for modern data workloads. Our mission is to extend PostgreSQL's capabilities while maintaining its reliability, simplicity, and open-source philosophy.
Other projects: pg_stat_insights | pgBalancer | pgRaft