Boosting is an ensemble method that combines multiple weak learners (typically decision trees) to
form a strong learner.
It builds models sequentially, with each new model trying to correct the errors made by the previous
ones.
In classical boosting algorithms such as AdaBoost, data points misclassified by earlier models are given more weight, ensuring that subsequent models focus on the hard-to-classify examples; gradient-boosting variants instead fit each new model to the residual errors of the current ensemble.
Common boosting algorithms include AdaBoost, Gradient Boosting, and XGBoost, each improving
performance and efficiency in different ways.
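As a minimal, hedged sketch of how such ensembles are typically used in practice (the dataset, split, and hyperparameters below are illustrative assumptions, not part of the text), scikit-learn ships ready-made AdaBoost and gradient-boosting classifiers:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Both models build their trees sequentially; only the way each new tree
# corrects the previous ones differs (re-weighting vs. fitting residuals).
for model in (AdaBoostClassifier(n_estimators=100, random_state=0),
              GradientBoostingClassifier(n_estimators=100, random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))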
AdaBoost (Adaptive Boosting) is a powerful ensemble learning algorithm that combines multiple weak learners to form a strong classifier.
A weak learner is a model that performs only slightly better than random guessing, for example a decision stump (a one-level decision tree).
AdaBoost adapts by focusing on misclassified samples, giving them more importance in subsequent
rounds.
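To make the weak-learner idea concrete, a single decision stump can be scored against a constant-prediction baseline; on its own it is far weaker than the boosted ensemble, even when it beats chance. The dataset below is an illustrative assumption:

from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# Baseline that always predicts the majority class vs. a one-level tree.
chance = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=5).mean()
stump = cross_val_score(DecisionTreeClassifier(max_depth=1), X, y, cv=5).mean()
print(f"majority-class baseline: {chance:.3f}, single stump: {stump:.3f}")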
Key features
1. Adaptive Weighting
AdaBoost adjusts the weights of the training samples after each round: it increases the weights of misclassified samples so that the next weak learner pays more attention to them (a short numeric sketch of this re-weighting follows this list).
2. Uses Weak Learners
AdaBoost typically uses simple models such as decision stumps. Although each weak learner performs only slightly better than random, their combination results in a strong overall classifier.
3. Reduces Bias and Variance
By combining multiple weak learners, AdaBoost reduces bias and can also help control variance, which leads to better generalization performance on unseen data.
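A minimal numeric sketch of that re-weighting step, assuming the standard AdaBoost formulas for the weighted error rate and classifier weight (the toy labels and predictions below are invented for illustration):

import numpy as np

y_true = np.array([ 1, -1,  1,  1, -1])        # toy labels in {-1, +1} (illustrative)
y_pred = np.array([ 1,  1,  1, -1, -1])        # one weak learner's predictions
w = np.full(len(y_true), 1 / len(y_true))      # start with equal sample weights

miss = y_pred != y_true
eps = np.sum(w[miss])                          # weighted error rate
alpha = 0.5 * np.log((1 - eps) / eps)          # classifier weight: more accurate -> larger
w = w * np.exp(alpha * np.where(miss, 1, -1))  # grow misclassified weights, shrink correct ones
w = w / w.sum()                                # renormalize so the weights sum to 1
print(eps, alpha, w)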
How AdaBoost Works
1. Initialize weights: assign equal weights to all training examples.
2. Train a weak learner: fit a weak classifier on the weighted dataset.
3. Evaluate errors: calculate the weighted error rate (the total weight of the examples the classifier got wrong).
4. Compute the classifier weight: give the trained classifier a weight based on its accuracy (more accurate → higher weight).
5. Update weights: increase the weights of misclassified examples so that the next classifier focuses more on them.
6. Repeat: keep training new weak learners on the re-weighted dataset for a specified number of rounds or until accuracy is sufficient.
7. Final model: the final prediction is a weighted majority vote of all the weak learners (the sketch below walks through these steps in code).
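A from-scratch sketch of the seven steps above for binary labels in {-1, +1}. The dataset, number of rounds, and choice of decision stumps are illustrative assumptions; production implementations (for example scikit-learn's AdaBoostClassifier) add safeguards such as stopping early when a round's error reaches 0 or 0.5:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
y = np.where(y == 1, 1, -1)                     # recode labels as {-1, +1}

n_rounds = 50
w = np.full(len(y), 1 / len(y))                 # 1. equal initial weights
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)            # 2. train a weak learner on weighted data
    pred = stump.predict(X)
    miss = pred != y
    eps = np.clip(np.sum(w[miss]), 1e-10, 1 - 1e-10)   # 3. weighted error rate
    alpha = 0.5 * np.log((1 - eps) / eps)       # 4. classifier weight from its accuracy
    w = w * np.exp(alpha * np.where(miss, 1.0, -1.0))  # 5. boost misclassified weights
    w = w / w.sum()
    stumps.append(stump)                        # 6. repeat for the next round
    alphas.append(alpha)

# 7. final model: weighted majority vote (sign of the weighted sum of votes)
scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("Training accuracy:", np.mean(np.sign(scores) == y))

Training accuracy climbs with the number of rounds here because each new stump concentrates on the examples the current ensemble still gets wrong.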
Advantages
• Works well with a variety of weak learners.
• Often resistant to overfitting in practice (though not immune).
• Simple and effective in practice.
Limitations
• Sensitive to noisy data and outliers.
• Typically slower to train than parallel ensemble methods such as bagging, because the weak learners must be fitted sequentially.