Generalized Additive Models
MLTK currently supports building low-dimensional additive components in GAMs: one-dimensional shape functions (GAM), pairwise feature interactions (GA2M), and sparse partially linear additive models (SPLAM).
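All of the learners below consume an MLTK `Instances` object (`trainSet` in the snippets). A minimal loading sketch, assuming MLTK's `InstancesReader`; the file names are placeholders:

```java
// Minimal sketch: load attribute descriptions and data files.
// Assumes mltk.core.io.InstancesReader; file names are placeholders.
Instances trainSet = InstancesReader.read("data.attr", "train.txt");
Instances testSet = InstancesReader.read("data.attr", "test.txt");
```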
GAM works on both classification and regression problems. The following code trains a GAM object using the standard method. The base learner is a tree ensemble with 100 trees, each with at most 3 leaves.
```java
GAMLearner learner = new GAMLearner();
// Base learner "tr:3:100": a tree ensemble with 100 trees,
// each with at most 3 leaves
learner.setBaseLearner("tr:3:100");
learner.setMaxNumIters(100);
learner.setLearningRate(0.01);
learner.setTask(Task.REGRESSION);
learner.setMetric(new RMSE());

GAM gam = learner.build(trainSet);
```
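Once built, the model can be applied to new data. A minimal sketch, assuming `GAM` implements MLTK's `Regressor` interface (so it exposes `regress(Instance)`) and that `testSet` was loaded as above:

```java
// Minimal sketch: compute test RMSE with the trained GAM.
// Assumes GAM exposes regress(Instance) via MLTK's Regressor interface.
double se = 0;
for (Instance instance : testSet) {
    double pred = gam.regress(instance);
    double err = pred - instance.getTarget();
    se += err * err;
}
System.out.println("Test RMSE: " + Math.sqrt(se / testSet.size()));
```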
GA2M works on both classification and regression problems. The following code takes a GAM object and learns pairwise feature interactions on (f1, f2), (f2, f3), and (f1, f3).
```java
// Feature interactions are specified as pairs of attribute indices
List<IntPair> terms = new ArrayList<>();
terms.add(new IntPair(0, 1)); // (f1, f2)
terms.add(new IntPair(1, 2)); // (f2, f3)
terms.add(new IntPair(0, 2)); // (f1, f3)

GA2MLearner learner = new GA2MLearner();
learner.setGAM(gam); // start from the GAM trained above
learner.setMaxNumIters(100);
learner.setTask(Task.REGRESSION);
learner.setMetric(new RMSE());
learner.setPairs(terms);
learner.setLearningRate(0.01);

GAM ga2m = learner.build(trainSet);
```
Note that MLTK currently supports feature interactions only on binned and nominal attributes. Numeric attributes should be discretized into binned attributes; it is recommended to discretize all features before building the GAM.
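A hypothetical sketch of this preprocessing step, assuming MLTK's `Discretizer` utility (`mltk.core.processor.Discretizer`); the exact method signatures may differ across MLTK versions:

```java
// Hypothetical sketch: bin every numeric attribute in place
// (here into at most 256 bins) before training GAM/GA2M.
for (int i = 0; i < trainSet.getAttributes().size(); i++) {
    if (trainSet.getAttributes().get(i).getType() == Attribute.Type.NUMERIC) {
        Discretizer.discretize(trainSet, i, 256); // assumed signature
    }
}
```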
Sparse partially linear additive models (SPLAMs) automatically discover which features should be included in the model and, for the included features, which enter nonlinearly and which stay linear. Currently only the cubic spline basis is supported for SPLAM. SPLAM is a special form of GAM that works on both classification and regression problems. The following code trains a GAM object using the standard method.
```java
SPLAMLearner learner = new SPLAMLearner();
learner.setNumKnots(10);     // cubic spline basis with 10 knots
learner.setMaxNumIters(100);
learner.setAlpha(0.6);       // balances linear vs. nonlinear terms
learner.setLambda(0.1);      // regularization strength
learner.setTask(Task.REGRESSION);

GAM gam = learner.build(trainSet);
```
The code above trains a SPLAM model using a cubic spline basis with 10 knots. `lambda` is the regularization parameter, and `alpha` (which should be in (0, 1]) controls the regularization on linear and nonlinear terms. When `alpha` is set to 1, we are essentially training a sparse additive model (SPAM), and therefore no linear terms will be included in the model. When `alpha` is close to 0, SPLAM reduces to the lasso and we are essentially training a linear model.
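To see the effect of `alpha` in practice, one option is a small validation sweep. A hypothetical sketch (the `evaluate` helper and `validSet` are stand-ins for your own validation code, not part of MLTK):

```java
// Hypothetical tuning loop: sweep alpha over (0, 1] and keep the
// model with the best validation score. evaluate(...) and validSet
// are stand-ins for your own validation code, not part of MLTK.
GAM bestModel = null;
double bestRmse = Double.POSITIVE_INFINITY;
for (double alpha : new double[] {0.2, 0.4, 0.6, 0.8, 1.0}) {
    SPLAMLearner learner = new SPLAMLearner();
    learner.setNumKnots(10);
    learner.setMaxNumIters(100);
    learner.setAlpha(alpha);
    learner.setLambda(0.1);
    learner.setTask(Task.REGRESSION);
    GAM model = learner.build(trainSet);
    double rmse = evaluate(model, validSet); // stand-in helper
    if (rmse < bestRmse) {
        bestRmse = rmse;
        bestModel = model;
    }
}
```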