RAG Pipeline
A complete Retrieval-Augmented Generation pipeline in PostgreSQL. Build production-ready RAG systems entirely in-database, with document processing, semantic retrieval, reranking, and LLM generation.
What is RAG?
Retrieval-Augmented Generation (RAG) enhances LLM responses by retrieving relevant context from your database before generating an answer. This grounds the LLM's output in your actual data, reducing hallucinations and improving accuracy.
RAG Workflow
1. User Question: "What is PostgreSQL replication?"
2. Retrieve: Find relevant documents using hybrid search
3. Rerank: Score and sort results by relevance
4. Generate: LLM creates answer using retrieved context
5. Return: Grounded, accurate answer with sources
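Conceptually, the Generate step just stitches the retrieved passages into the LLM prompt. A minimal sketch in plain SQL of what that prompt assembly looks like (the `retrieved_docs` CTE stands in for the output of the retrieval step; the built-in functions shown later do this for you):

```sql
-- Build a grounded prompt from top-ranked rows (pure-SQL sketch)
WITH retrieved_docs(title, content) AS (
    VALUES
        ('PostgreSQL Replication Guide',
         'PostgreSQL supports streaming replication for high availability...')
)
SELECT 'Answer the question using only the context below.' || E'\n\n'
    || 'Context:' || E'\n'
    || string_agg('- ' || title || ': ' || content, E'\n')
    || E'\n\n'
    || 'Question: What is PostgreSQL replication?' AS prompt
FROM retrieved_docs;
```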
Complete RAG Example
Step 1: Setup Knowledge Base
-- Create knowledge base table
CREATE TABLE knowledge_docs (
id SERIAL PRIMARY KEY,
title TEXT,
content TEXT,
embedding vector(384),
ts_vector tsvector,
metadata JSONB,
created_at TIMESTAMPTZ DEFAULT now()
);
-- Create indexes
CREATE INDEX ON knowledge_docs USING hnsw (embedding vector_l2_ops);
CREATE INDEX ON knowledge_docs USING gin (ts_vector);
CREATE INDEX ON knowledge_docs USING gin (metadata);

Step 2: Ingest Documents
-- Insert documents with automatic embedding generation
INSERT INTO knowledge_docs (title, content, embedding, ts_vector, metadata)
VALUES (
'PostgreSQL Replication Guide',
'PostgreSQL supports streaming replication for high availability...',
embed_text('PostgreSQL supports streaming replication for high availability...'),
to_tsvector('PostgreSQL supports streaming replication for high availability...'),
'{"category": "database", "tags": ["replication", "ha"]}'::jsonb
);
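Long documents retrieve better when split into smaller chunks before embedding. A naive paragraph-level sketch using `regexp_split_to_table` (splits on blank lines; production chunkers also enforce token limits and overlap — `embed_text` is the embedding function shown above):

```sql
-- Split a long document into paragraph chunks, embed each chunk
INSERT INTO knowledge_docs (title, content, embedding, ts_vector)
SELECT
    'PostgreSQL Replication Guide (chunk ' || ord || ')',
    chunk,
    embed_text(chunk),
    to_tsvector(chunk)
FROM regexp_split_to_table(
         'First paragraph...' || E'\n\n' || 'Second paragraph...',
         E'\n\n+'
     ) WITH ORDINALITY AS t(chunk, ord);
```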
-- Batch insert with embeddings
INSERT INTO knowledge_docs (title, content, embedding, ts_vector)
SELECT
title,
content,
embed_cached(content), -- Uses cache for duplicate content
to_tsvector(content)
FROM imported_documents;

Step 3: Retrieve Relevant Documents
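The `metadata` GIN index from step 1 also lets you pre-filter candidates before ranking. A sketch combining a JSONB containment filter with vector ordering (`embed_text` is the embedding function from step 2; `<->` is pgvector's L2 distance operator):

```sql
-- Restrict retrieval to one category, then rank by vector distance
SELECT id, title,
       embedding <-> embed_text('What is PostgreSQL replication?') AS distance
FROM knowledge_docs
WHERE metadata @> '{"category": "database"}'::jsonb
ORDER BY distance
LIMIT 5;
```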
-- Hybrid search: 70% vector + 30% text
WITH candidates AS (
SELECT * FROM hybrid_search(
'knowledge_docs',
'content',
'embedding',
'What is PostgreSQL replication?',
20, -- fetch 20 candidates
0.7, -- 70% vector weight
0.3 -- 30% text weight
)
)
-- Rerank top candidates for precision
SELECT * FROM rerank_cross_encoder(
'What is PostgreSQL replication?',
(SELECT array_agg(content) FROM candidates),
'ms-marco-MiniLM-L-6-v2',
5 -- return top 5
);

Step 4: Generate Answer
-- Complete RAG query
SELECT neurondb_rag_answer(
'What is PostgreSQL replication?', -- question
'knowledge_docs', -- table
'content', -- content column
'embedding', -- vector column
'{
"model": "gpt-4",
"temperature": 0.7,
"max_tokens": 500,
"retrieve_count": 5,
"rerank": true
}'::jsonb
);
-- Returns:
-- {
-- "answer": "PostgreSQL replication is...",
-- "sources": [{"id": 1, "title": "...", "score": 0.95}],
-- "tokens_used": 342,
-- "latency_ms": 1234
-- }

Advanced RAG Features
Multi-Query RAG
Generate multiple query variations to improve recall.
SELECT neurondb_multiquery_rag(
'complex question',
'docs', 'content', 'embedding',
3 -- generate 3 variations
);

Contextual Compression
Extract only relevant sections from retrieved documents.
SELECT compress_context(
'question',
documents_array,
model := 'gpt-4'
);

Guardrails
Content safety checks and policy enforcement.
SELECT check_guardrails(
answer_text,
'{
"toxicity": 0.1,
"pii_detection": true
}'::jsonb
);

Cost Tracking
Monitor LLM token usage and costs per query.
SELECT *
FROM neurondb_llm_costs
ORDER BY timestamp DESC
LIMIT 10;
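To watch spend over time, aggregate the same view. A sketch assuming it exposes per-query `model` and `tokens_used` columns (these column names are illustrative; check the view definition in your installed version):

```sql
-- Daily token totals per model (assumed columns: model, tokens_used)
SELECT date_trunc('day', timestamp) AS day,
       model,
       sum(tokens_used) AS total_tokens
FROM neurondb_llm_costs
GROUP BY 1, 2
ORDER BY day DESC;
```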