Gradient Boosting
Overview
NeuronDB supports three gradient boosting frameworks, XGBoost, LightGBM, and CatBoost, all trained and queried through the unified ML API.
XGBoost
XGBoost (Extreme Gradient Boosting) is a powerful gradient boosting framework. NeuronDB uses the unified ML API for XGBoost training and prediction.
Train XGBoost Model
Train XGBoost classifier
-- Train XGBoost classifier using unified API
CREATE TEMP TABLE xgb_model AS
SELECT neurondb.train(
    'default',                                      -- project name
    'xgboost',                                      -- algorithm
    'training_table',                               -- source table
    'label',                                        -- target column (integer for classification)
    ARRAY['features'],                              -- feature columns
    '{"max_depth": 6, "n_estimators": 100}'::jsonb  -- hyperparameters
)::integer AS model_id;
-- For regression, use numeric target column
CREATE TEMP TABLE xgb_reg_model AS
SELECT neurondb.train(
    'default',
    'xgboost',
    'training_table',
    'target',           -- numeric target column
    ARRAY['features'],
    '{}'::jsonb         -- use default hyperparameters
)::integer AS model_id;
Function Signature:
neurondb.train(
    project TEXT,
    algorithm TEXT,          -- 'xgboost'
    table_name TEXT,
    target_column TEXT,
    feature_columns TEXT[],
    hyperparameters JSONB
) RETURNS INTEGER            -- Returns model_id
Common XGBoost Hyperparameters:
- max_depth (integer): Maximum tree depth. Default: 6. Range: 1-20.
- n_estimators (integer): Number of boosting rounds. Default: 100.
- learning_rate (float): Step size shrinkage. Default: 0.1. Range: 0.01-1.0.
- subsample (float): Fraction of samples per tree. Default: 1.0. Range: 0.1-1.0.
- colsample_bytree (float): Fraction of features per tree. Default: 1.0.
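As a rough sketch, several of these hyperparameters can be combined in a single training call; the values below are illustrative, not tuned recommendations:
-- Illustrative hyperparameter values; tune against your own data
CREATE TEMP TABLE xgb_tuned_model AS
SELECT neurondb.train(
    'default',
    'xgboost',
    'training_table',
    'label',
    ARRAY['features'],
    '{"max_depth": 8, "n_estimators": 200, "learning_rate": 0.05,
      "subsample": 0.8, "colsample_bytree": 0.8}'::jsonb
)::integer AS model_id;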
LightGBM
LightGBM is a fast, distributed gradient boosting framework optimized for efficiency and accuracy.
Train LightGBM model
-- Train LightGBM classifier
CREATE TEMP TABLE lgbm_model AS
SELECT neurondb.train(
    'default',
    'lightgbm',         -- algorithm
    'training_table',
    'label',
    ARRAY['features'],
    '{"num_leaves": 31, "learning_rate": 0.1}'::jsonb
)::integer AS model_id;
Common LightGBM Hyperparameters:
- num_leaves (integer): Maximum tree leaves. Default: 31.
- learning_rate (float): Boosting learning rate. Default: 0.1.
- feature_fraction (float): Random feature subset ratio. Default: 1.0.
- bagging_fraction (float): Random data subset ratio. Default: 1.0.
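As with XGBoost, these keys can be passed together in one call; the values below are illustrative only:
-- Illustrative hyperparameter values; tune against your own data
CREATE TEMP TABLE lgbm_tuned_model AS
SELECT neurondb.train(
    'default',
    'lightgbm',
    'training_table',
    'label',
    ARRAY['features'],
    '{"num_leaves": 63, "learning_rate": 0.05,
      "feature_fraction": 0.8, "bagging_fraction": 0.8}'::jsonb
)::integer AS model_id;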
CatBoost
CatBoost handles categorical features automatically and is robust to overfitting.
Train CatBoost model
-- Train CatBoost classifier
CREATE TEMP TABLE catboost_model AS
SELECT neurondb.train(
    'default',
    'catboost',         -- algorithm
    'training_table',
    'label',
    ARRAY['features'],
    '{}'::jsonb         -- use defaults
)::integer AS model_id;
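This page does not list CatBoost hyperparameters, but the same JSONB argument presumably accepts them. The sketch below uses standard CatBoost parameter names (iterations, depth, learning_rate); their support here is an unverified assumption, so confirm against the gradient boosting documentation:
-- Assumed keys: iterations, depth, learning_rate are standard CatBoost names,
-- but their support in NeuronDB is unverified
CREATE TEMP TABLE catboost_tuned_model AS
SELECT neurondb.train(
    'default',
    'catboost',
    'training_table',
    'label',
    ARRAY['features'],
    '{"iterations": 500, "depth": 6, "learning_rate": 0.05}'::jsonb
)::integer AS model_id;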
Prediction
Use the unified neurondb.predict function for all gradient boosting models:
Make predictions
-- Predict with trained XGBoost model
SELECT
    id,
    neurondb.predict(
        (SELECT model_id FROM xgb_model),
        features
    ) AS prediction
FROM test_table
LIMIT 10;
-- Get model from catalog if needed
SELECT
    id,
    neurondb.predict(
        (SELECT model_id FROM neurondb.ml_models
         WHERE algorithm = 'xgboost'
         ORDER BY model_id DESC LIMIT 1),
        features
    ) AS prediction
FROM test_table;
Function Signature:
neurondb.predict(
    model_id INTEGER,  -- Model ID from neurondb.train()
    features VECTOR    -- Feature vector
) RETURNS NUMERIC      -- Prediction value
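For a quick sanity check, predictions can be listed next to the known labels; this assumes test_table carries a label column, as in the evaluation example below:
-- Compare predictions with ground-truth labels side by side
SELECT
    id,
    label,
    neurondb.predict(
        (SELECT model_id FROM xgb_model),
        features
    ) AS prediction
FROM test_table
LIMIT 20;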
Model Evaluation
Evaluate gradient boosting models using the unified evaluation API:
Evaluate model
-- Evaluate XGBoost model
DO $$
DECLARE
    mid integer;
    metrics_result jsonb;
BEGIN
    -- Get model_id
    SELECT model_id INTO mid FROM xgb_model LIMIT 1;
    -- Evaluate
    metrics_result := neurondb.evaluate(
        mid,
        'test_table',
        'features',
        'label'
    );
    -- Display metrics
    RAISE NOTICE 'Accuracy: %', metrics_result->>'accuracy';
    RAISE NOTICE 'Precision: %', metrics_result->>'precision';
    RAISE NOTICE 'Recall: %', metrics_result->>'recall';
    RAISE NOTICE 'F1 Score: %', metrics_result->>'f1_score';
END $$;
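Based on the call above, the evaluate function appears to have the following shape; the parameter names here are inferred from that example, not taken from an authoritative reference:
Function Signature (inferred):
-- Inferred from the example above; parameter names are assumptions
neurondb.evaluate(
    model_id INTEGER,      -- Model ID from neurondb.train()
    table_name TEXT,       -- evaluation table
    feature_column TEXT,   -- feature vector column
    target_column TEXT     -- ground-truth label column
) RETURNS JSONB            -- metrics: accuracy, precision, recall, f1_score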
Learn More
For detailed documentation on gradient boosting algorithms, hyperparameter tuning, feature importance, and model comparison, visit: Gradient Boosting Documentation
Related Topics
- Random Forest - Ensemble methods
- Classification - Classification algorithms