NeuronDB Documentation
Document Processing
Overview
This page covers text processing and NLP functions: cleaning and normalizing text, splitting documents into chunks, and tokenizing text.
Text Processing
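As a rough picture of what the lowercase and remove_extra_spaces options used in this section suggest, here is a minimal sketch in plain PostgreSQL. It is only a stand-in built from standard functions (lower, regexp_replace, btrim); the exact semantics of process_text are an assumption here, and the input string is a hypothetical example.

```sql
-- Plain PostgreSQL stand-in, not NeuronDB's implementation.
-- Assumption: "lowercase" folds case and "remove_extra_spaces"
-- collapses runs of whitespace into a single space.
SELECT btrim(
         regexp_replace(
           lower('Raw   text  with   multiple spaces'),  -- hypothetical input
           '\s+', ' ', 'g'                               -- 'g' replaces every run, not just the first
         )
       ) AS processed_text;
-- processed_text: raw text with multiple spaces
```

Running the sketch next to process_text on the same input is a quick way to check whether the two agree for your data.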
Clean and normalize text with process_text, passing the options as a jsonb object:
-- Clean and normalize text
SELECT process_text(
'Raw   text  with   multiple spaces',
'{"lowercase": true, "remove_extra_spaces": true}'::jsonb
) AS processed_text;
Chunking
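To picture what a chunk size and an overlap mean before calling chunk_text, here is a minimal sketch in plain PostgreSQL. It assumes character-based sliding windows, which may not match how chunk_text measures size (it could count tokens or respect sentence boundaries); the column names and the small values 10 and 3 are hypothetical and chosen only to keep the output readable.

```sql
-- Plain PostgreSQL stand-in, not NeuronDB's implementation.
-- Assumption: size and overlap are counted in characters.
WITH params AS (
  SELECT 'long document text...'::text AS doc,  -- hypothetical input (21 characters)
         10 AS chunk_size,                       -- characters per chunk
         3  AS chunk_overlap                     -- characters shared by consecutive chunks
)
SELECT row_number() OVER (ORDER BY start_pos) AS chunk_no,
       start_pos,
       substr(doc, start_pos, chunk_size)     AS chunk
FROM params,
     -- Each window starts (chunk_size - chunk_overlap) characters after the
     -- previous one; stopping at length - overlap avoids a final window
     -- that would contain nothing but already-covered text.
     generate_series(
       1,
       greatest(length(doc) - chunk_overlap, 1),
       chunk_size - chunk_overlap
     ) AS start_pos
ORDER BY start_pos;
```

With these values the windows start at positions 1, 8, and 15, so each chunk repeats the last 3 characters of the one before it. The overlap must stay smaller than the chunk size, otherwise the step is zero or negative and generate_series raises an error or returns no rows.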
Split documents into chunks, giving the chunk size and the overlap between consecutive chunks:
-- Chunk text
SELECT chunk_text(
'long document text...',
500, -- chunk size
50 -- overlap
) AS chunks;
Tokenization
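Whitespace tokenization, the mode named by the 'whitespace' argument in this section's call, can be pictured with a standard PostgreSQL function. This is a minimal sketch under the assumption that tokens are the substrings between runs of whitespace; the return type and edge-case behavior of tokenize_text are not specified here and may differ.

```sql
-- Plain PostgreSQL stand-in, not NeuronDB's implementation.
-- btrim removes leading and trailing spaces so that no empty token is produced.
SELECT regexp_split_to_array(btrim('Hello world'), '\s+') AS tokens;
-- tokens: {Hello,world}
```

Punctuation stays attached to the neighboring word under this rule (for example 'world.' would be one token), which is the usual trade-off of whitespace splitting.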
-- Tokenize text
SELECT tokenize_text('Hello world', 'whitespace') AS tokens;
Learn More
For detailed documentation on document processing, chunking strategies, tokenization, and NLP features, see the Document Processing Documentation.
Related Topics
- RAG Overview - RAG pipeline
- Embedding Generation - Generate embeddings