Vector Types in NeurondB
Understanding vectors, their purpose, and choosing the right type for your use case
What Are Vectors?
A vector is a mathematical object represented as an array of numbers. In AI and machine learning, vectors represent data (text, images, audio) in numerical format that computers can process and compare.
Example Vector:
[0.234, -0.891, 0.456, 0.123, -0.678]
This is a 5-dimensional vector where each number represents a feature
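For a concrete feel of the syntax, the same 5-dimensional example can be written as a vector literal in SQL using the ::vector cast shown later on this page; the standalone SELECT below is only an illustrative sketch.
-- The 5-dimensional example above, written as a NeurondB vector literal
SELECT '[0.234, -0.891, 0.456, 0.123, -0.678]'::vector AS example_vector;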
Traditional Database
Stores structured data: numbers, text, dates. Searches using exact matches or patterns.
Vector Database
Stores numerical representations of data. Searches by semantic similarity and meaning.
Why Use Vectors?
Semantic Search
Find similar items based on meaning, not just keywords. Search for "laptop" and get results for "notebook computer" or "portable PC".
Recommendations
Build recommendation systems that suggest related products, content, or services based on similarity.
Anomaly Detection
Identify unusual patterns by finding data points that are distant from normal clusters.
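As a sketch of the anomaly-detection idea, the query below flags rows whose embedding lies far from a known-normal reference point. The table name, column name, reference vector, and threshold are all hypothetical; the <-> (L2 distance) operator is the one used in the similarity-search examples at the end of this page.
-- Flag readings whose embedding is far from a "normal" reference vector
-- (sensor_readings, embedding, the reference literal, and the 1.5 threshold are hypothetical)
SELECT id,
       embedding <-> '[0.1, 0.2, 0.3]'::vector AS distance_from_normal
FROM sensor_readings
WHERE embedding <-> '[0.1, 0.2, 0.3]'::vector > 1.5
ORDER BY distance_from_normal DESC;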
How Vector Similarity Works:
Query: "laptop computers" ↓ Convert to vector: [0.8, 0.2, 0.1, ...] ↓ Find similar vectors in database: • "notebook PCs" distance: 0.15 ✅ Very similar • "tablets" distance: 0.45 ✅ Somewhat similar • "bicycles" distance: 2.30 ❌ Not similar
NeurondB Vector Types
vector (Standard Precision)
The primary vector type using 32-bit floating-point numbers (float32).
-- Create table with vector column
CREATE TABLE embeddings (
id SERIAL PRIMARY KEY,
data vector(384) -- 384-dimensional vector
);
-- Insert vector
INSERT INTO embeddings (data)
VALUES ('[0.1, 0.2, 0.3, ...]'::vector);
Specifications:
- Precision: 32-bit float (~7 decimal digits)
- Storage: 4 bytes × dimensions
- Range: ±1.175e-38 to ±3.402e+38
Best For:
- General-purpose embeddings
- Research and development
- High-accuracy requirements
Storage Example: 1 million 768-dimensional vectors = 1M × 768 × 4 bytes ≈ 3 GB
float16 (Half Precision)
Compressed format using 16-bit floating-point for 2x storage savings.
Specifications:
- Precision: 16-bit float
- Storage: 2 bytes × dimensions (50% savings)
- Accuracy: 99%+ recall maintained
Best For:
- Production deployments
- Large-scale applications
- Storage-constrained systems
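As a sketch, a half-precision table looks the same as the float32 example, with float16 in place of vector. The column type matches the Code Examples section below; whether a plain vector literal is narrowed to float16 automatically on insert is an assumption.
-- Same schema as the float32 example, stored at half precision
CREATE TABLE embeddings_half (
    id SERIAL PRIMARY KEY,
    data float16(384)  -- 2 bytes per dimension instead of 4
);

-- Assumption: the literal is accepted and narrowed to 16-bit floats on insert
INSERT INTO embeddings_half (data)
VALUES ('[0.1, 0.2, 0.3, ...]');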
int8 (Quantized)
Highly compressed format using 8-bit integers for 4x storage savings.
Specifications:
- Precision: 8-bit integer (-128 to 127)
- Storage: 1 byte × dimensions (75% savings)
- Accuracy: 95-98% recall
Best For:
- Very large datasets (100M+ vectors)
- Cost-optimized deployments
- Acceptable accuracy tradeoffs
binary (Maximum Compression)
Extreme compression using 1-bit binary representation for 32x storage savings.
Specifications:
- Precision: 1-bit (0 or 1)
- Storage: 0.125 bytes × dimensions (96.875% savings)
- Speed: Fastest similarity calculations
Best For:
- Massive-scale deployments (1B+ vectors)
- Real-time filtering/ranking
- Memory-constrained environments
Choosing the Right Type
| Type | Storage (768-dim) | Accuracy | Speed | Use Case |
|---|---|---|---|---|
| vector (float32) | 3.0 KB | 100% | Fast | Development, research, high accuracy |
| float16 | 1.5 KB | 99%+ | Faster | Production, balanced performance |
| int8 | 768 bytes | 95-98% | Very Fast | Large scale, cost-optimized |
| binary | 96 bytes | 85-90% | Fastest | Massive scale, filtering |
💡 Recommendation: Start with vector (float32) for development. Switch to float16 or int8 in production when you understand your accuracy requirements.
Code Examples
Creating Vector Columns
-- Standard precision
CREATE TABLE docs (id SERIAL, embedding vector(384));

-- Half precision (2x storage savings)
CREATE TABLE docs_half (id SERIAL, embedding float16(384));

-- Quantized (4x storage savings)
CREATE TABLE docs_int8 (id SERIAL, embedding int8(384));

-- Binary (32x storage savings)
CREATE TABLE docs_binary (id SERIAL, embedding binary(384));
Similarity Search
-- L2 distance (Euclidean)
SELECT id, embedding <-> '[0.1, 0.2, ...]'::vector AS distance
FROM docs
ORDER BY distance
LIMIT 10;

-- Cosine similarity
SELECT id, embedding <=> '[0.1, 0.2, ...]'::vector AS similarity
FROM docs
ORDER BY similarity DESC
LIMIT 10;

-- Inner product
SELECT id, embedding <#> '[0.1, 0.2, ...]'::vector AS score
FROM docs
ORDER BY score DESC
LIMIT 10;
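Following the recommendation above, one way to move data from the float32 table into the half-precision one is a plain INSERT ... SELECT. Whether NeurondB narrows vector values to float16 implicitly on insert, or requires an explicit cast, is an assumption here rather than documented behavior.
-- Backfill the half-precision table from the float32 table defined above
-- Assumption: float32 -> float16 narrowing happens automatically on insert;
-- if not, an explicit cast on the embedding column would be needed
INSERT INTO docs_half (id, embedding)
SELECT id, embedding
FROM docs;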