NeurondB Documentation

RAG Pipeline

Text Chunking

Split long documents into smaller chunks. Consecutive chunks overlap by a fixed number of characters, so context that spans a chunk boundary is preserved in both neighbors.

Overlap-aware chunking

WITH long_document AS (
    SELECT 'This is a very long document that needs to be chunked...' as doc
)
SELECT 
    unnest(neurondb.chunk(
        doc,       -- Text to chunk
        100,       -- Chunk size (characters)
        20         -- Overlap (characters)
    )) as chunk
FROM long_document;
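To make the effect of the chunk size and overlap parameters concrete, here is a minimal Python sketch of a sliding-window chunker. It is an illustration only, not NeurondB's implementation: the function name `chunk_text` is hypothetical, and it assumes the window advances by `size - overlap` characters, which may differ from how `neurondb.chunk` treats boundaries.

```python
def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most `size` characters, where each chunk
    starts with the last `overlap` characters of the previous chunk.

    Hypothetical helper for illustration; not part of NeurondB.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far the window advances each iteration
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


# A 250-character sample document, chunked with the same parameters
# as the SQL example (size 100, overlap 20).
document = "".join(str(i % 10) for i in range(250))
chunks = chunk_text(document, size=100, overlap=20)
for index, chunk in enumerate(chunks):
    print(index, len(chunk))
```

With these parameters each new chunk begins 80 characters after the previous one, so the final 20 characters of one chunk reappear at the start of the next. A larger overlap preserves more context at the cost of more chunks to embed and store.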

Text Embeddings

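Generate vector embeddings from text by passing a model name, such as all-MiniLM-L6-v2, and the input text to neurondb.embed. A third argument controls whether GPU acceleration is used.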

Generate embeddings

WITH text_samples AS (
    SELECT 'Machine learning in databases is powerful' as text, 1 as id
    UNION ALL
    SELECT 'PostgreSQL extensions enable ML capabilities' as text, 2 as id
)
SELECT 
    id,
    text,
    neurondb.embed(
        'all-MiniLM-L6-v2',  -- Model name
        text,                 -- Text to embed
        true                  -- Use GPU acceleration
    ) as embedding
FROM text_samples;

Ranking

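Rank documents by relevance to a query. Each document embedding is scored against the query embedding with a chosen metric, such as cosine, and the highest-scoring documents are returned first.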

Rank documents

SELECT 
    id,
    content,
    neurondb.rank(
        query_embedding,     -- Embedding of the search query
        document_embedding,  -- Embedding of the document being scored
        'cosine'             -- Metric used for scoring
    ) as relevance_score
FROM documents
ORDER BY relevance_score DESC
LIMIT 10;
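The following Python sketch shows what a cosine-based ranking computes, using the standard definition of cosine similarity. It is an assumption that `neurondb.rank` with 'cosine' returns a similarity (higher means more relevant), which the descending sort in the query above suggests. The helper names `cosine_similarity` and `rank_documents` and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_documents(query_embedding, documents, limit=10):
    """Score (id, embedding) pairs against the query and return the
    top `limit` as (id, score), highest score first."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, embedding))
        for doc_id, embedding in documents
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy two-dimensional embeddings; real embeddings have hundreds of dimensions.
documents = [
    (1, [1.0, 0.0]),
    (2, [0.0, 1.0]),
    (3, [1.0, 1.0]),
]
ranked = rank_documents([1.0, 0.0], documents, limit=2)
for doc_id, score in ranked:
    print(doc_id, round(score, 4))
```

Document 1 points in the same direction as the query and ranks first, document 3 ranks second, and document 2, which is orthogonal to the query, is cut off by the limit. If a metric returns a distance rather than a similarity, the sort order must be ascending instead.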

Next Steps