NeuronDB Documentation

Document Processing

Overview

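NeuronDB provides SQL functions for text processing and NLP: cleaning and normalizing text, splitting documents into chunks, and tokenizing text.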

Text Processing

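The process_text function cleans and normalizes a string according to a JSONB options object. The example below asks for lowercasing and removal of extra spaces: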

-- Clean and normalize text
SELECT process_text(
    'Raw text with   multiple   spaces',
    '{"lowercase": true, "remove_extra_spaces": true}'::jsonb
) AS processed_text;
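
As a rough illustration of what these two options are expected to produce, the same cleanup can be approximated with built-in PostgreSQL functions. This is a sketch of the assumed behavior, not NeuronDB's implementation; the exact semantics of the process_text options may differ.

```sql
-- Approximation using only built-in PostgreSQL functions (not process_text itself).
-- lower() stands in for the "lowercase" option; regexp_replace() collapses each
-- run of spaces into one space, standing in for "remove_extra_spaces".
SELECT regexp_replace(
    lower('Raw text with   multiple   spaces'),
    ' +', ' ', 'g'
) AS processed_text;
-- returns: raw text with multiple spaces
```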

Chunking

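The chunk_text function splits a document into chunks. The example below requests a chunk size of 500 with an overlap of 50 between consecutive chunks: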

-- Chunk text
SELECT chunk_text(
    'long document text...',
    500,  -- chunk size
    50    -- overlap
) AS chunks;
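
To make the interaction between chunk size and overlap concrete, the following sketch builds overlapping chunks with built-in PostgreSQL functions only. It assumes both values are measured in characters and that each chunk starts (chunk size - overlap) characters after the previous one; the source does not state the unit chunk_text uses or how it treats the final chunk, so treat this as an illustration of the general sliding-window idea rather than NeuronDB's algorithm. Small values are used so the result is easy to check by eye.

```sql
-- Sliding-window chunking with built-in functions (illustration only).
-- Assumption: sizes are in characters; the step between chunk starts is
-- chunk_size - overlap, so consecutive chunks share `overlap` characters.
WITH params AS (
    SELECT 'abcdefghijklmnopqrstuvwxyz'::text AS doc,
           10 AS chunk_size,   -- 500 in the example above
           3  AS overlap       -- 50 in the example above
)
SELECT g.start_pos,
       substr(p.doc, g.start_pos, p.chunk_size) AS chunk
FROM params AS p,
     generate_series(1, length(p.doc), p.chunk_size - p.overlap) AS g(start_pos)
ORDER BY g.start_pos;
-- start_pos | chunk
--         1 | abcdefghij
--         8 | hijklmnopq
--        15 | opqrstuvwx
--        22 | vwxyz
```

The last chunk is shorter than the others because the window runs past the end of the text. With a chunk size of 500 and an overlap of 50, the same arithmetic gives a step of 450 characters between chunk starts.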

Tokenization

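The tokenize_text function splits text into tokens using the tokenizer named in its second argument, here 'whitespace':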

-- Tokenize text
SELECT tokenize_text('Hello world', 'whitespace') AS tokens;
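
For comparison, whitespace tokenization can be approximated with PostgreSQL's built-in regexp_split_to_array. This sketch only shows what splitting on whitespace means; the return type and edge-case handling of tokenize_text are not described in the source and may differ.

```sql
-- Whitespace splitting with a built-in function (not tokenize_text itself).
-- btrim() strips leading and trailing spaces so no empty tokens appear at the
-- ends; '\s+' treats any run of whitespace as a single separator.
SELECT regexp_split_to_array(btrim('Hello world'), '\s+') AS tokens;
-- returns: {Hello,world}
```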

Learn More

For details on document processing, chunking strategies, tokenization, and NLP features, see the Document Processing Documentation.
