RAG Pipeline

A complete Retrieval-Augmented Generation (RAG) pipeline in PostgreSQL. Build production-ready RAG systems entirely in-database, with document processing, semantic retrieval, reranking, and LLM generation.

What is RAG?

Retrieval-Augmented Generation (RAG) enhances LLM responses by retrieving relevant context from your database before generating an answer. Grounding the LLM's output in your actual data reduces hallucinations and improves accuracy.

RAG Workflow

1. User Question: "What is PostgreSQL replication?"
2. Retrieve: Find relevant documents using hybrid search
3. Rerank: Score and sort results by relevance
4. Generate: LLM creates answer using retrieved context
5. Return: Grounded, accurate answer with sources

Complete RAG Example

Step 1: Setup Knowledge Base

-- Create knowledge base table
CREATE TABLE knowledge_docs (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding vector(384),
    ts_vector tsvector,
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT now()
);

-- Create indexes
CREATE INDEX ON knowledge_docs USING hnsw (embedding vector_l2_ops);
CREATE INDEX ON knowledge_docs USING gin (ts_vector);
CREATE INDEX ON knowledge_docs USING gin (metadata);

Step 2: Ingest Documents

-- Insert documents with automatic embedding generation
INSERT INTO knowledge_docs (title, content, embedding, ts_vector, metadata)
VALUES (
    'PostgreSQL Replication Guide',
    'PostgreSQL supports streaming replication for high availability...',
    embed_text('PostgreSQL supports streaming replication for high availability...'),
    to_tsvector('PostgreSQL supports streaming replication for high availability...'),
    '{"category": "database", "tags": ["replication", "ha"]}'::jsonb
);

-- Batch insert with embeddings
INSERT INTO knowledge_docs (title, content, embedding, ts_vector)
SELECT
    title,
    content,
    embed_cached(content),  -- uses cache for duplicate content
    to_tsvector(content)
FROM imported_documents;

Step 3: Retrieve Relevant Documents

-- Hybrid search: 70% vector + 30% text
WITH candidates AS (
    SELECT * FROM hybrid_search(
        'knowledge_docs',
        'content',
        'embedding',
        'What is PostgreSQL replication?',
        20,   -- fetch 20 candidates
        0.7,  -- 70% vector weight
        0.3   -- 30% text weight
    )
)
-- Rerank top candidates for precision
SELECT * FROM rerank_cross_encoder(
    'What is PostgreSQL replication?',
    (SELECT array_agg(content) FROM candidates),
    'ms-marco-MiniLM-L-6-v2',
    5  -- return top 5
);
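As a rough illustration of the 70/30 weighting above, here is a client-side sketch (not the actual hybrid_search implementation). It assumes vector distances and text ranks have already been normalized to [0, 1] similarity scores; the fused score is then a simple weighted sum.

```python
def fuse_scores(vector_sim: float, text_sim: float,
                vector_weight: float = 0.7, text_weight: float = 0.3) -> float:
    """Weighted fusion of normalized similarity scores.

    Hypothetical sketch of the 70/30 blend; assumes both inputs
    are already normalized to the [0, 1] range.
    """
    return vector_weight * vector_sim + text_weight * text_sim

# A document that matches strongly on semantics but weakly on keywords
# still ranks first under a vector-heavy blend.
candidates = {
    "doc_semantic": fuse_scores(0.9, 0.2),  # 0.63 + 0.06 = 0.69
    "doc_keyword":  fuse_scores(0.3, 0.9),  # 0.21 + 0.27 = 0.48
}
best = max(candidates, key=candidates.get)
```

Shifting the weights toward the text side favors exact keyword matches; the 0.7/0.3 split favors semantic similarity while still letting strong keyword hits surface.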

Step 4: Generate Answer

-- Complete RAG query
SELECT neurondb_rag_answer(
    'What is PostgreSQL replication?',  -- question
    'knowledge_docs',                   -- table
    'content',                          -- content column
    'embedding',                        -- vector column
    '{
        "model": "gpt-4",
        "temperature": 0.7,
        "max_tokens": 500,
        "retrieve_count": 5,
        "rerank": true
    }'::jsonb                           -- options
);

-- Returns:
-- {
--   "answer": "PostgreSQL replication is...",
--   "sources": [{"id": 1, "title": "...", "score": 0.95}],
--   "tokens_used": 342,
--   "latency_ms": 1234
-- }

Advanced RAG Features

Multi-Query RAG

Generate multiple query variations to improve recall.

SELECT neurondb_multiquery_rag(
    'complex question',
    'docs',
    'content',
    'embedding',
    3  -- generate 3 query variations
);
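To see why query variations improve recall, here is a hypothetical client-side sketch: each variation retrieves its own candidate set, and the union is deduplicated before reranking. The retriever below is a stand-in, not a NeuronDB API.

```python
def multiquery_retrieve(variations, retrieve):
    """Union the candidates retrieved for each query variation,
    deduplicating by document id while preserving first-seen order."""
    seen, merged = set(), []
    for query in variations:
        for doc_id in retrieve(query):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged

# Stand-in retriever: each phrasing surfaces a slightly different set,
# so the union covers documents any single phrasing would have missed.
fake_index = {
    "what is postgres replication": [1, 2, 3],
    "how does postgresql replicate data": [2, 4],
    "postgres high availability streaming": [5, 1],
}
docs = multiquery_retrieve(fake_index.keys(), fake_index.get)
# docs == [1, 2, 3, 4, 5]
```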

Contextual Compression

Extract only relevant sections from retrieved documents.

SELECT compress_context(
    'question',
    documents_array,
    model := 'gpt-4'
);

Guardrails

Content safety checks and policy enforcement.

SELECT check_guardrails(
    answer_text,
    '{"toxicity": 0.1, "pii_detection": true}'::jsonb
);
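As a rough sketch of what such a policy check might evaluate, the snippet below flags an answer if its toxicity score exceeds the threshold or it contains a PII-like pattern. The scoring input and the PII regex are hypothetical stand-ins, not the check_guardrails internals.

```python
import re

# Hypothetical PII pattern: US SSN-like strings.
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def guardrail_check(text: str, toxicity_score: float,
                    max_toxicity: float = 0.1,
                    pii_detection: bool = True) -> dict:
    """Hypothetical policy check mirroring the SQL call above:
    pass only if toxicity is under the threshold and no PII is found."""
    toxicity_ok = toxicity_score <= max_toxicity
    pii_found = bool(PII_PATTERN.search(text)) if pii_detection else False
    return {
        "passed": toxicity_ok and not pii_found,
        "toxicity_ok": toxicity_ok,
        "pii_found": pii_found,
    }

result = guardrail_check("My SSN is 123-45-6789", toxicity_score=0.02)
# result["passed"] is False: toxicity is fine, but a PII-like pattern matched
```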

Cost Tracking

Monitor LLM token usage and costs per query.

SELECT *
FROM neurondb_llm_costs
ORDER BY timestamp DESC
LIMIT 10;
