Performance
Sub-millisecond vector searches, 100x GPU acceleration, SIMD-optimized operations, and intelligent caching for production-scale AI workloads.
Performance Benchmarks
Test Environment: AWS r6i.2xlarge (8 vCPU, 64GB RAM), 10M vectors, 768 dimensions
| Operation | Throughput | Latency (p95) | Notes |
|---|---|---|---|
| Vector Insert | 50K/sec | 2ms | Bulk COPY |
| HNSW Search (k=10) | 10K QPS | 5ms | ef_search=40 |
| Embedding Generation | 1K/sec | 10ms | Batch size 32 |
| Hybrid Search | 5K QPS | 8ms | Vector+FTS |
| Reranking | 2K/sec | 15ms | Cross-encoder |
| GPU K-Means | 55K vectors/sec | 18ms | 10 clusters |
Optimization Techniques
1. SIMD Acceleration
Automatic SIMD (Single Instruction Multiple Data) optimization for distance calculations using AVX2, AVX-512 (x86) or NEON (ARM).
- AVX2 Speedup: 4-8x
- AVX-512 Speedup: 8-16x
- Detection: Auto
2. Intelligent Caching
- Embedding Cache: 95%+ hit rate, 50x faster than generation
- Model Cache: Models loaded in shared memory, 99.8% hit rate
- ANN Buffer: Hot centroids and entry points cached
- Index Page Cache: 92%+ hit rate for frequently accessed vectors
3. Query Planning
Intelligent cost-based query planning chooses optimal execution paths:
- Small result sets → Sequential scan
- Medium result sets → IVF index
- Large result sets → HNSW index
- GPU available + large batch → GPU acceleration
- Hybrid query → Parallel vector + FTS execution
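To see which of these paths the planner actually picked for a given query, PostgreSQL's standard `EXPLAIN` can be used. The following is a minimal sketch under assumptions: it reuses the `docs` table and `embedding` column from the examples on this page, and it assumes a pgvector-style `<->` distance operator, which this page does not confirm. Substitute whatever distance operator or function your installation provides.

```sql
-- Sketch only: the "<->" distance operator is an assumption (pgvector convention),
-- and "docs" / "embedding" are the example names used elsewhere on this page.
-- The query vector is taken from an existing row so the statement is self-contained.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM docs
ORDER BY embedding <-> (SELECT embedding FROM docs WHERE id = 1)
LIMIT 10;
```

In the output, a `Seq Scan` node indicates the sequential path and an `Index Scan` node names the index that was chosen. The `BUFFERS` option adds shared-buffer hit and read counts for each node.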
Best Practices
1. Index Selection
| Dataset Size | Recommended Index | Parameters |
|---|---|---|
| < 100K vectors | HNSW | m=16, ef=200 |
| 100K - 10M vectors | HNSW or IVF | m=32, ef=400 or nlist=sqrt(n) |
| > 10M vectors | IVF + PQ | nlist=4000, PQ compression |
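The table above can be turned into a quick check against a live table. The query below is a sketch in plain PostgreSQL: it counts the rows in `docs` (the example table used on this page) and prints the matching recommendation, computing `nlist = sqrt(n)` rounded up to an integer. The thresholds and parameter values are copied from the table; the query itself is illustrative and not part of NeuronDB.

```sql
-- Sketch: map the row count of "docs" to the recommendation in the table above.
-- nlist = sqrt(n) is rounded up with ceil(); e.g. n = 1,000,000 gives nlist = 1000.
SELECT n,
       CASE
           WHEN n < 100000    THEN 'HNSW (m=16, ef=200)'
           WHEN n <= 10000000 THEN 'HNSW (m=32, ef=400) or IVF (nlist='
                                   || ceil(sqrt(n))::int || ')'
           ELSE 'IVF + PQ (nlist=4000, PQ compression)'
       END AS suggestion
FROM (SELECT count(*) AS n FROM docs) AS t;
```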
2. Use Batch Operations
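-- Good: Batch embedding generation (5x faster)
-- Rows are grouped into 100 buckets and each bucket is embedded in one call.
-- Both arrays are built in id order and unnested in lockstep, so each id is
-- paired with its own embedding (assumes embed_text_batch preserves input order).
UPDATE docs SET embedding = batch.emb
FROM (
    SELECT unnest(array_agg(id ORDER BY id)) AS id,
           unnest(embed_text_batch(array_agg(content ORDER BY id))) AS emb
    FROM docs GROUP BY id % 100
) batch WHERE docs.id = batch.id;
-- Bad: Individual calls
UPDATE docs SET embedding = embed_text(content); -- Slow!
3. Monitor Cache Hit Rates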
SELECT * FROM neurondb_cache_stats();
-- Target hit rates:
-- Embeddings: > 50%
-- Models: > 95%
-- Index pages: > 90%
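As a cross-check on the index page target, PostgreSQL's own statistics views report buffer hits and disk reads per index. The sketch below uses the standard `pg_statio_user_indexes` view and assumes the indexes live on the example `docs` table. It measures PostgreSQL shared-buffer hits, which may not be the same counter that `neurondb_cache_stats()` reports.

```sql
-- Sketch: per-index buffer hit percentage from PostgreSQL's built-in statistics.
-- NULLIF avoids division by zero for indexes that have not been read yet.
SELECT indexrelname,
       idx_blks_hit,
       idx_blks_read,
       round(100.0 * idx_blks_hit
             / NULLIF(idx_blks_hit + idx_blks_read, 0), 2) AS hit_pct
FROM pg_statio_user_indexes
WHERE relname = 'docs'
ORDER BY hit_pct NULLS FIRST;
```

Indexes that show a `hit_pct` well below the 90% target are candidates for a larger `shared_buffers` setting or a more compact index.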