ML Analytics Suite
Machine learning algorithms for clustering, dimensionality reduction, outlier detection, and embedding quality assessment, all callable from SQL.
Clustering Algorithms
K-Means Clustering
Lloyd's K-Means with k-means++ initialization, for finding customer segments, topic clusters, and other groupings in your data.
-- CPU K-Means
SELECT cluster_kmeans(
'customer_data', -- table
'features', -- vector column
5, -- number of clusters
100 -- max iterations
);
-- GPU K-Means (23x faster)
SELECT cluster_kmeans_gpu(
'customer_data', 'features', 5, 100
);
-- Get cluster assignments
SELECT id, cluster_id, centroid_distance
FROM neurondb_cluster_assignments('customer_data', 'features', 5)
ORDER BY cluster_id, centroid_distance
LIMIT 100;
Time complexity: O(n·k·i·d) for n points, k clusters, i iterations, and d dimensions
GPU speedup: 23x
Initialization: k-means++
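To make the algorithm concrete outside the database, here is a minimal sketch of k-means++ seeding followed by Lloyd's iterations in plain Python. It is an illustration only, not NeuronDB's implementation; the function names (`sq_dist`, `kmeanspp_init`, `kmeans`) and the `seed` parameter are hypothetical, and it assumes `max_iter` is at least 1.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: the first centroid is uniform at random; each
    later one is drawn with probability proportional to its squared
    distance from the nearest centroid chosen so far."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a centroid; fall back to uniform.
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=d2)[0])
    return centroids


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update until
    the labels stop changing or max_iter is reached."""
    rng = random.Random(seed)
    centroids = kmeanspp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (100.0, 0.0), (101.0, 0.0)]
    labels, centroids = kmeans(pts, k=2)
    print(labels, sorted(centroids))
```

Each iteration compares every point with every centroid across all dimensions, which is where the O(n·k·i·d) cost comes from.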
DBSCAN (Density-Based)
Density-based clustering that automatically discovers the number of clusters and identifies outliers.
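As a rough illustration of how density-based clustering finds the cluster count on its own, the sketch below implements textbook DBSCAN in plain Python with a brute-force neighbor search. It is not NeuronDB's implementation; the names `region_query` and `dbscan` are hypothetical, and it assumes a point counts as one of its own neighbors.

```python
import math

NOISE = -1  # label for outliers


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_points):
    """Return one label per point: 0, 1, 2, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_points:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_points:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 0.3), (0.3, 0), (5, 5), (5, 5.3), (5.3, 5), (20, 20)]
    print(dbscan(pts, eps=0.5, min_points=3))
```

The number of clusters is never passed in: a new cluster starts whenever an unvisited point has enough neighbors, and points that no cluster reaches keep the label -1.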
-- DBSCAN clustering (auto-discovers cluster count)
SELECT cluster_dbscan(
'customer_data',
'features',
0.5, -- epsilon (neighborhood radius)
5 -- min_points (minimum cluster size)
);
-- Get clusters and outliers
SELECT cluster_id, COUNT(*) as size
FROM neurondb_dbscan_assignments('customer_data', 'features', 0.5, 5)
GROUP BY cluster_id
ORDER BY cluster_id;
-- cluster_id = -1 means outlier
Dimensionality Reduction
PCA (Principal Component Analysis)
Reduce high-dimensional vectors to lower dimensions while preserving variance.
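To show what "preserving variance" means, the sketch below computes the share of variance captured by each principal component for 2-D data, using the closed-form eigenvalues of the 2×2 covariance matrix. It is a small illustration in plain Python, not NeuronDB's implementation; the function name `explained_variance_2d` is hypothetical, and real embeddings would need a general eigensolver.

```python
import math


def explained_variance_2d(points):
    """Fraction of total variance along the first and second principal
    components of a list of (x, y) points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    mean = (sxx + syy) / 2
    half = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = mean + half, mean - half
    total = lam1 + lam2
    if total == 0:
        raise ValueError("all points are identical")
    return lam1 / total, lam2 / total


if __name__ == "__main__":
    # Points on the line y = 2x: one component carries all the variance.
    print(explained_variance_2d([(0, 0), (1, 2), (2, 4), (3, 6)]))
    # Equal spread in both directions: the components split the variance.
    print(explained_variance_2d([(1, 0), (-1, 0), (0, 1), (0, -1)]))
```

Dropping a component whose share is near zero loses almost nothing, which is why strongly correlated dimensions compress well.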
-- Reduce dimensions: 768 → 128
SELECT reduce_dimensionality_pca(
'embeddings_table',
'vector_column',
128 -- target dimensions
);
-- Returns: {"components": 128,
-- "explained_variance": [0.45, 0.23, 0.12, ...],
-- "total_variance_explained": 0.80}
-- 80% of the variance retained with an 83% reduction in dimensions (768 → 128)
Outlier Detection
Isolation Forest
Detect anomalies and unusual patterns in your vector data using the Isolation Forest algorithm.
-- Detect outliers with 95% confidence
SELECT detect_outliers(
'customer_data',
'features',
0.95 -- confidence level
) AS outlier_count;
-- Get outlier details
SELECT id, anomaly_score
FROM neurondb_outlier_scores('customer_data', 'features', 0.95)
WHERE is_outlier = true
ORDER BY anomaly_score DESC;
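For intuition about where an anomaly score comes from, here is a minimal sketch of the standard Isolation Forest scoring rule in plain Python: points that random splits isolate quickly get scores near 1. It is not NeuronDB's implementation; the names (`avg_path`, `build_tree`, `path_length`, `anomaly_scores`) and the defaults of 100 trees and a sample size of 256 are assumptions, and how the 0.95 confidence level maps to a score threshold is not covered here.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path(n):
    """Average path length of an unsuccessful binary-search-tree lookup
    over n points; used to normalize path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random dimension at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[dim] < split]
    right = [r for r in rows if r[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, plus the leaf-size adjustment."""
    if tree[0] == "leaf":
        return depth + avg_path(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if x[dim] < split else right, x, depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Score in (0, 1] per row; requires at least two rows."""
    rng = random.Random(seed)
    m = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(m))
    trees = [build_tree(rng.sample(rows, m), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = n_trees * avg_path(m)
    return [2.0 ** (-sum(path_length(t, x) for t in trees) / norm)
            for x in rows]


if __name__ == "__main__":
    data = [(i * 0.05, (i * 7 % 20) * 0.05) for i in range(20)]
    data.append((100.0, 100.0))  # an obvious outlier
    scores = anomaly_scores(data)
    print(max(range(len(data)), key=scores.__getitem__))
```

The far-away point is usually separated by the first split, so its average path is short and its score is the highest in the set.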