RAG Pipeline

A complete Retrieval-Augmented Generation (RAG) pipeline in PostgreSQL. Build production-ready RAG systems entirely in-database, with document processing, semantic retrieval, reranking, and LLM generation.

What is RAG?

Retrieval-Augmented Generation (RAG) enhances LLM responses by retrieving relevant context from your database before generating an answer. Grounding the LLM's output in your actual data reduces hallucinations and improves accuracy.

RAG Workflow

1. User Question: "What is PostgreSQL replication?"
2. Retrieve: Find relevant documents using hybrid search
3. Rerank: Score and sort results by relevance
4. Generate: LLM creates answer using retrieved context
5. Return: Grounded, accurate answer with sources

Complete RAG Example

Step 1: Setup Knowledge Base

-- Create knowledge base table
CREATE TABLE knowledge_docs (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    embedding vector(384),
    ts_vector tsvector,
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT now()
);

-- Create indexes
CREATE INDEX ON knowledge_docs USING hnsw (embedding vector_l2_ops);
CREATE INDEX ON knowledge_docs USING gin (ts_vector);
CREATE INDEX ON knowledge_docs USING gin (metadata);

Step 2: Ingest Documents

-- Insert documents with automatic embedding generation
INSERT INTO knowledge_docs (title, content, embedding, ts_vector, metadata)
VALUES (
    'PostgreSQL Replication Guide',
    'PostgreSQL supports streaming replication for high availability...',
    embed_text('PostgreSQL supports streaming replication for high availability...'),
    to_tsvector('PostgreSQL supports streaming replication for high availability...'),
    '{"category": "database", "tags": ["replication", "ha"]}'::jsonb
);

-- Batch insert with embeddings
INSERT INTO knowledge_docs (title, content, embedding, ts_vector)
SELECT
    title,
    content,
    embed_cached(content),  -- uses cache for duplicate content
    to_tsvector(content)
FROM imported_documents;

Step 3: Retrieve Relevant Documents

-- Hybrid search: 70% vector + 30% text
WITH candidates AS (
    SELECT * FROM hybrid_search(
        'knowledge_docs',
        'content',
        'embedding',
        'What is PostgreSQL replication?',
        20,   -- fetch 20 candidates
        0.7,  -- 70% vector weight
        0.3   -- 30% text weight
    )
)
-- Rerank top candidates for precision
SELECT * FROM rerank_cross_encoder(
    'What is PostgreSQL replication?',
    (SELECT array_agg(content) FROM candidates),
    'ms-marco-MiniLM-L-6-v2',
    5  -- return top 5
);
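As a rough illustration of the 70/30 weighting above, here is a client-side sketch (not the actual hybrid_search implementation). It assumes vector distances and text ranks have already been normalized to [0, 1] similarity scores; the fused score is then a simple weighted sum.

```python
def fuse_scores(vector_sim: float, text_sim: float,
                vector_weight: float = 0.7, text_weight: float = 0.3) -> float:
    """Weighted fusion of normalized similarity scores.

    Hypothetical sketch of the 70/30 blend; assumes both inputs
    are already normalized to the [0, 1] range.
    """
    return vector_weight * vector_sim + text_weight * text_sim

# A document that matches strongly on semantics but weakly on keywords
# still ranks first under a vector-heavy blend.
candidates = {
    "doc_semantic": fuse_scores(0.9, 0.2),  # 0.63 + 0.06 = 0.69
    "doc_keyword":  fuse_scores(0.3, 0.9),  # 0.21 + 0.27 = 0.48
}
best = max(candidates, key=candidates.get)
```

Shifting the weights toward the text side favors exact keyword matches; the 0.7/0.3 split favors semantic similarity while still letting strong keyword hits surface.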

Step 4: Generate Answer

-- Complete RAG query
SELECT neurondb_rag_answer(
    'What is PostgreSQL replication?',  -- question
    'knowledge_docs',                   -- table
    'content',                          -- content column
    'embedding',                        -- vector column
    '{
        "model": "gpt-4",
        "temperature": 0.7,
        "max_tokens": 500,
        "retrieve_count": 5,
        "rerank": true
    }'::jsonb                           -- options
);

-- Returns:
-- {
--   "answer": "PostgreSQL replication is...",
--   "sources": [{"id": 1, "title": "...", "score": 0.95}],
--   "tokens_used": 342,
--   "latency_ms": 1234
-- }

Advanced RAG Features

Multi-Query RAG

Generate multiple query variations to improve recall.

SELECT neurondb_multiquery_rag(
    'complex question',
    'docs',
    'content',
    'embedding',
    3  -- generate 3 query variations
);
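To see why query variations improve recall, here is a hypothetical client-side sketch: each variation retrieves its own candidate set, and the union is deduplicated before reranking. The retriever below is a stand-in, not a NeuronDB API.

```python
def multiquery_retrieve(variations, retrieve):
    """Union the candidates retrieved for each query variation,
    deduplicating by document id while preserving first-seen order."""
    seen, merged = set(), []
    for query in variations:
        for doc_id in retrieve(query):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged

# Stand-in retriever: each phrasing surfaces a slightly different set,
# so the union covers documents any single phrasing would have missed.
fake_index = {
    "what is postgres replication": [1, 2, 3],
    "how does postgresql replicate data": [2, 4],
    "postgres high availability streaming": [5, 1],
}
docs = multiquery_retrieve(fake_index.keys(), fake_index.get)
# docs == [1, 2, 3, 4, 5]
```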

Contextual Compression

Extract only relevant sections from retrieved documents.

SELECT compress_context(
    'question',
    documents_array,
    model := 'gpt-4'
);

Guardrails

Content safety checks and policy enforcement.

SELECT check_guardrails(
    answer_text,
    '{"toxicity": 0.1, "pii_detection": true}'::jsonb
);
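As a rough sketch of what such a policy check might evaluate, the snippet below flags an answer if its toxicity score exceeds the threshold or it contains a PII-like pattern. The scoring input and the PII regex are hypothetical stand-ins, not the check_guardrails internals.

```python
import re

# Hypothetical PII pattern: US SSN-like strings.
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def guardrail_check(text: str, toxicity_score: float,
                    max_toxicity: float = 0.1,
                    pii_detection: bool = True) -> dict:
    """Hypothetical policy check mirroring the SQL call above:
    pass only if toxicity is under the threshold and no PII is found."""
    toxicity_ok = toxicity_score <= max_toxicity
    pii_found = bool(PII_PATTERN.search(text)) if pii_detection else False
    return {
        "passed": toxicity_ok and not pii_found,
        "toxicity_ok": toxicity_ok,
        "pii_found": pii_found,
    }

result = guardrail_check("My SSN is 123-45-6789", toxicity_score=0.02)
# result["passed"] is False: toxicity is fine, but a PII-like pattern matched
```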

Cost Tracking

Monitor LLM token usage and costs per query.

SELECT *
FROM neurondb_llm_costs
ORDER BY timestamp DESC
LIMIT 10;
