Core Features

Vector Types in NeurondB

Understanding vectors, their purpose, and how to choose the right type for your use case

What Are Vectors?

A vector is a mathematical object represented as an array of numbers. In AI and machine learning, vectors represent data (text, images, audio) in numerical format that computers can process and compare.

Example Vector:

[0.234, -0.891, 0.456, 0.123, -0.678]

This is a 5-dimensional vector in which each number represents a single feature.

Traditional Database

Stores structured data: numbers, text, dates. Searches using exact matches or patterns.

Vector Database

Stores numerical representations of data. Searches by semantic similarity and meaning.

Why Use Vectors?

Semantic Search

Find similar items based on meaning, not just keywords. Search for "laptop" and get results such as "notebook computer" and "portable PC".

Recommendations

Build recommendation systems that suggest related products, content, or services based on similarity.

Anomaly Detection

Identify unusual patterns by finding data points that are distant from normal clusters.

How Vector Similarity Works:

Query: "laptop computers"
   ↓
Convert to vector: [0.8, 0.2, 0.1, ...]
   ↓
Find similar vectors in database:
   • "notebook PCs"     distance: 0.15 ✅ Very similar
   • "tablets"          distance: 0.45 ✅ Somewhat similar  
   • "bicycles"         distance: 2.30 ❌ Not similar

NeurondB Vector Types

vector (Standard Precision)

The primary vector type using 32-bit floating-point numbers (float32).

-- Create table with vector column
CREATE TABLE embeddings (
    id SERIAL PRIMARY KEY,
    data vector(384)  -- 384-dimensional vector
);

-- Insert vector
INSERT INTO embeddings (data) 
VALUES ('[0.1, 0.2, 0.3, ...]'::vector);

Specifications:

  • Precision: 32-bit float (7 decimal digits)
  • Storage: 4 bytes × dimensions
  • Range: ±1.175e-38 to ±3.402e+38

Best For:

  • General-purpose embeddings
  • Research and development
  • High-accuracy requirements

Storage Example: 1 million 768-dimensional vectors = 1M × 768 × 4 bytes ≈ 3 GB of raw vector data
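
To sanity-check that estimate against a loaded table, the standard PostgreSQL size functions report the actual on-disk footprint, which also includes row headers, TOAST data, and any indexes, so expect it to exceed the raw calculation:

-- Actual on-disk size of the embeddings table created above
-- (pg_total_relation_size and pg_size_pretty are standard PostgreSQL functions)
SELECT pg_size_pretty(pg_total_relation_size('embeddings'));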

float16 (Half Precision)

Compressed format using 16-bit floating-point for 2x storage savings.

Specifications:

  • Precision: 16-bit float
  • Storage: 2 bytes × dimensions (50% savings)
  • Accuracy: 99%+ recall maintained

Best For:

  • Production deployments
  • Large-scale applications
  • Storage-constrained systems
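
A minimal sketch of using the type, assuming float16 columns accept the same bracketed literal syntax shown above for vector (the table name is illustrative):

-- Half precision: 2 bytes per dimension instead of 4
CREATE TABLE embeddings_fp16 (
    id SERIAL PRIMARY KEY,
    data float16(384)
);

-- Assumes the same '[...]' literal format and cast style used for vector
INSERT INTO embeddings_fp16 (data)
VALUES ('[0.1, 0.2, 0.3, ...]'::float16);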

int8 (Quantized)

Highly compressed format using 8-bit integers for 4x storage savings.

Specifications:

  • Precision: 8-bit integer (-128 to 127)
  • Storage: 1 byte × dimensions (75% savings)
  • Accuracy: 95-98% recall

Best For:

  • Very large datasets (100M+ vectors)
  • Cost-optimized deployments
  • Acceptable accuracy tradeoffs

binary (Maximum Compression)

Extreme compression using 1-bit binary representation for 32x storage savings.

Specifications:

  • Precision: 1-bit (0 or 1)
  • Storage: 0.125 bytes × dimensions (96.875% savings)
  • Speed: Fastest similarity calculations

Best For:

  • Massive-scale deployments (1B+ vectors)
  • Real-time filtering/ranking
  • Memory-constrained environments
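
For a sense of scale: at 768 dimensions a binary vector occupies 768 × 0.125 = 96 bytes, so 1 billion vectors need roughly 96 GB of raw vector data, compared with roughly 3 TB for the same vectors stored as float32.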

Choosing the Right Type

Type               Storage (768 dims)   Accuracy   Speed       Use Case
vector (float32)   3.0 KB               100%       Fast        Development, research, high accuracy
float16            1.5 KB               99%+       Faster      Production, balanced performance
int8               768 bytes            95-98%     Very Fast   Large scale, cost-optimized
binary             96 bytes             85-90%     Fastest     Massive scale, filtering

💡 Recommendation: Start with vector (float32) for development. Switch to float16 or int8 in production when you understand your accuracy requirements.

Code Examples

Creating Vector Columns

-- Standard precision
CREATE TABLE docs (id SERIAL, embedding vector(384));

-- Half precision (2x storage savings)
CREATE TABLE docs_half (id SERIAL, embedding float16(384));

-- Quantized (4x storage savings)
CREATE TABLE docs_int8 (id SERIAL, embedding int8(384));

-- Binary (32x storage savings)
CREATE TABLE docs_binary (id SERIAL, embedding binary(384));
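
If a table started out with float32 during development, one possible path to half precision in production is retyping the column in place. ALTER TABLE ... USING is standard PostgreSQL; whether NeurondB provides the vector-to-float16 cast used here is an assumption to verify against your installed version.

-- Convert an existing float32 column to half precision in place
-- (the vector -> float16 cast is assumed; check it exists in your version)
ALTER TABLE docs
    ALTER COLUMN embedding TYPE float16(384)
    USING embedding::float16(384);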

Similarity Search

-- L2 distance (Euclidean)
SELECT id, embedding <-> '[0.1, 0.2, ...]'::vector AS distance
FROM docs ORDER BY distance LIMIT 10;

-- Cosine similarity
SELECT id, embedding <=> '[0.1, 0.2, ...]'::vector AS similarity
FROM docs ORDER BY similarity DESC LIMIT 10;

-- Inner product
SELECT id, embedding <#> '[0.1, 0.2, ...]'::vector AS score
FROM docs ORDER BY score DESC LIMIT 10;
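
Similarity search also composes with ordinary SQL clauses. Below is a small sketch that combines a metadata filter with the L2 operator shown above; the category column is hypothetical and not part of the docs table defined earlier.

-- Filter first, then rank the remaining rows by distance
-- (category is an illustrative column, not defined in the examples above)
SELECT id, embedding <-> '[0.1, 0.2, ...]'::vector AS distance
FROM docs
WHERE category = 'hardware'
ORDER BY distance
LIMIT 10;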