AdaBoost, short for Adaptive Boosting, is an ensemble machine learning
algorithm that can be used in a wide variety of classification and regression
tasks. It is a supervised learning algorithm that is used to classify data by
combining multiple weak or base learners (e.g., decision trees) into a strong
learner. AdaBoost works by weighting the instances in the training dataset
based on the accuracy of previous classifications.
AdaBoost Algorithm
Freund and Schapire first presented boosting as an ensemble modelling
approach in 1997 with the AdaBoost algorithm. Boosting has since become a
popular strategy for dealing with binary classification problems. These
algorithms improve prediction power by combining a large number of weak
learners into a strong learner.
Boosting algorithms work on the idea of first building a model on the training
dataset and then building a second model to correct the faults of the first
model. This procedure is repeated until the mistakes are reduced and the
dataset is predicted accurately. All boosting algorithms function in this way:
they combine numerous models (weak learners) to produce the final result
(a strong learner).
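As a rough sketch of this idea (not the exact AdaBoost update rule; the simple weight-doubling rule and variable names below are purely illustrative), a sequence of weak learners can be trained so that each one concentrates on the points its predecessor got wrong:

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

# Toy binary classification data
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Start with equal weights and boost for a few rounds
weights = np.full(len(y), 1 / len(y))
learners = []
for _ in range(3):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=weights)   # train on the weighted data
    wrong = stump.predict(X) != y            # find the mistakes
    weights[wrong] *= 2                      # emphasise them next round (illustrative rule)
    weights /= weights.sum()                 # keep the weights summing to 1
    learners.append(stump)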
There are three widely used kinds of boosting algorithms:
• The AdaBoost (adaptive boosting) algorithm
• The gradient boosting algorithm
• The extreme gradient boosting (XGBoost) algorithm
Let's understand what the AdaBoost algorithm is.
What is AdaBoost Algorithm in Machine Learning?
What is AdaBoost in machine learning? There are several machine learning
algorithms to choose from for a given problem statement, and AdaBoost is one
of these predictive modelling techniques. AdaBoost, also known as Adaptive
Boosting, is a machine learning approach that is used as an ensemble method.
Its most commonly used estimator is a decision tree with one level, that is,
a decision tree with just one split.
These trees are often referred to as Decision Stumps.
This approach builds a model and assigns equal weights to all data points. It
then assigns larger weights to incorrectly classified points, so that the next
model pays more attention to them. Models keep being trained in this way
until the error becomes small.
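Since decision stumps are the default weak learner in scikit-learn's AdaBoostClassifier, a small sketch (here on the built-in Iris data, with a toy number of estimators) can confirm that each fitted weak learner really is a one-split tree:

from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier

# By default, AdaBoostClassifier uses decision stumps (trees with max_depth=1)
X, y = load_iris(return_X_y=True)
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

# Every fitted weak learner is a one-split tree, i.e. a decision stump
print(ada.estimators_[0].get_depth())   # 1
print(len(ada.estimators_))             # number of stumps actually fitted (at most 50)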
AdaBoost in Machine Learning
To illustrate, imagine you trained a decision tree algorithm on the Titanic
dataset and obtained an accuracy of 80%. Following that, you try other
methods and assess their accuracy, getting 75% for KNN and 70% for logistic
regression.
The accuracy varies whenever we develop a new model on the same dataset.
What if we combine all of these algorithms to make the final prediction?
Averaging the outcomes from the various models often yields more accurate
results, and this is how we can improve prediction power.
Understanding the Working of the AdaBoost Classifier Algorithm
Step 1:
The table below shows a small example dataset for the AdaBoost algorithm. It
is a classification problem, since the target column (Illness) is binary.
First and foremost, these data points will be weighted. At first, all of the
weights will be equal.
Row No. | Gender | Age | Income | Illness | Sample Weight
1 | Male | 41 | 40000 | Yes | 1/5
2 | Male | 54 | 30000 | No | 1/5
3 | Female | 42 | 25000 | No | 1/5
4 | Female | 40 | 60000 | Yes | 1/5
5 | Male | 46 | 50000 | Yes | 1/5
The sample weights are calculated using the following formula:

Sample weight = 1 / N

where N denotes the total number of data points. Because we have 5 data
points, each sample weight will be 1/5.
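As a tiny sanity check (nothing beyond the table above; the variable names are just illustrative), the initial weights can be produced with NumPy:

import numpy as np

n = 5                           # number of data points in the example dataset
weights = np.full(n, 1 / n)     # every sample starts with weight 1/N
print(weights)                  # [0.2 0.2 0.2 0.2 0.2]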
Step 2:
We will examine how well "Gender" classifies the samples, followed by how the
other variables (Age and Income) classify them. We will make a decision stump
for each feature and then compute each stump's Gini Index. The tree with the
lowest Gini Index will be our first stump.
Let's suppose Gender has the lowest Gini Index in our dataset, so it will be
our first stump.
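To make the stump-selection step concrete, here is a small sketch of how a weighted Gini Index could be computed for a candidate split. The helper functions, the Yes=1/No=0 encoding, and the boolean Gender mask are illustrative choices, not part of the original walkthrough:

import numpy as np

def gini(labels, weights):
    """Weighted Gini impurity of one node: 1 minus the sum of squared class proportions."""
    total = weights.sum()
    if total == 0:
        return 0.0
    impurity = 1.0
    for c in np.unique(labels):
        p = weights[labels == c].sum() / total
        impurity -= p ** 2
    return impurity

def split_gini(goes_left, labels, weights):
    """Weighted average of the Gini impurity of the two child nodes of a stump."""
    left, right = goes_left, ~goes_left
    total = weights.sum()
    return (weights[left].sum() / total) * gini(labels[left], weights[left]) \
         + (weights[right].sum() / total) * gini(labels[right], weights[right])

# Toy example: split the 5-row dataset above on Gender (Male vs Female)
illness = np.array([1, 0, 0, 1, 1])                  # Yes = 1, No = 0
is_male = np.array([True, True, False, False, True])
weights = np.full(5, 1 / 5)
print(round(split_gini(is_male, illness, weights), 3))   # weighted Gini of the Gender stump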
Step 3:
Using the following formula, we will now determine the "Amount of Say" (also
called "Importance" or "Influence") of this classifier in classifying the
data points:

Amount of Say (alpha) = 1/2 * ln((1 - Total Error) / Total Error)

The Total Error is simply the sum of the sample weights of all misclassified
data points. Since there is one incorrectly classified point in our dataset,
the Total Error is 1/5, and the alpha (performance of the stump) is:

alpha = 1/2 * ln((1 - 1/5) / (1/5)) = 1/2 * ln(4) ≈ 0.69
The Total Error always lies between 0 and 1, where 0 represents a flawless
stump and 1 represents a terrible stump.
• When there is no misclassification (Total Error = 0), the "amount of say"
(alpha) is a large positive value.
• When the classifier predicts half correctly and half incorrectly (Total
Error = 0.5), the classifier's significance (amount of say) equals 0.
• If all of the samples were misclassified, the error is very large (close to
1), and the alpha value becomes a large negative value.
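These three regimes can be checked numerically with the formula above; the small epsilon guard against division by zero is an implementation detail added here, not part of the original derivation:

import numpy as np

def amount_of_say(total_error, eps=1e-10):
    # alpha = 1/2 * ln((1 - TE) / TE); eps keeps the log finite at TE = 0 or 1
    te = np.clip(total_error, eps, 1 - eps)
    return 0.5 * np.log((1 - te) / te)

print(amount_of_say(1 / 5))   # ~0.69: one of five points misclassified
print(amount_of_say(0.5))     # 0.0: no better than random guessing
print(amount_of_say(0.9))     # negative: mostly wrong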
Step 4:
You may be wondering why it is necessary to calculate a stump's Total Error
and performance. The answer is simple: we need to update the weights, because
if the same weights are used in the next model, the result will be the same
as in the previous model.
The weights of the incorrect predictions will be increased, while the weights
of the correct predictions will be decreased. When we build the next model on
the updated weights, it will pay more attention to the points with higher
weights.
After determining the classifier's significance and total error, we update
the weights using the following formula:

New sample weight = old sample weight * e^(±alpha)

• When the sample is correctly classified, alpha is applied with a negative
sign, so the weight decreases.
• When the sample is misclassified, alpha is applied with a positive sign, so
the weight increases.
• There are four correctly classified samples and one misclassified sample.
The sample weight of each data point is 1/5, and the amount of say
(performance) of the Gender stump is 0.69.

The new weight for each correctly classified sample is:

1/5 * e^(-0.69) ≈ 0.1004

The adjusted weight for the incorrectly classified sample is:

1/5 * e^(+0.69) ≈ 0.3988
Row No. | Gender | Age | Income | Illness | Sample Weight | New Sample Weight
1 | Male | 41 | 40000 | Yes | 1/5 | 0.1004
2 | Male | 54 | 30000 | No | 1/5 | 0.1004
3 | Female | 42 | 25000 | No | 1/5 | 0.1004
4 | Female | 40 | 60000 | Yes | 1/5 | 0.3988
5 | Male | 46 | 50000 | Yes | 1/5 | 0.1004
We know that the total sum of the sample weights must equal 1, but if we add
all of the new sample weights together, we get 0.8004. To bring this sum to
1, we normalise these weights by dividing each of them by the total sum of
the updated weights, which is 0.8004. After normalising the sample weights,
we get the dataset below, and the sum is now equal to 1.
Row No. | Gender | Age | Income | Illness | Sample Weight | New Sample Weight
1 | Male | 41 | 40000 | Yes | 1/5 | 0.1004 / 0.8004 = 0.1254
2 | Male | 54 | 30000 | No | 1/5 | 0.1004 / 0.8004 = 0.1254
3 | Female | 42 | 25000 | No | 1/5 | 0.1004 / 0.8004 = 0.1254
4 | Female | 40 | 60000 | Yes | 1/5 | 0.3988 / 0.8004 = 0.4982
5 | Male | 46 | 50000 | Yes | 1/5 | 0.1004 / 0.8004 = 0.1254
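The weight update and normalisation above can be reproduced with a few lines of NumPy; the array layout (row order, with row 4 flagged as the misclassified point) is just a convenient representation of the table:

import numpy as np

alpha = 0.69                                # amount of say of the Gender stump
weights = np.full(5, 1 / 5)                 # initial sample weights
misclassified = np.array([False, False, False, True, False])   # row 4 was wrong

# Increase the weight of the misclassified point, decrease the others
new_weights = weights * np.exp(np.where(misclassified, alpha, -alpha))
print(new_weights)        # approximately [0.1004 0.1004 0.1004 0.3988 0.1004]
print(new_weights.sum())  # approximately 0.8004

# Normalise so the weights sum to 1 again
normalized = new_weights / new_weights.sum()
print(normalized)         # approximately [0.1254 0.1254 0.1254 0.4982 0.1254]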
Step 5:
We must now create a fresh dataset to see whether or not the mistakes have
decreased. To do this, we will delete the "sample weights" and "new sample
weights" columns and then split our data points into buckets based on the
"new sample weights.”
Row No. | Gender | Age | Income | Illness | Sample Weight | New Sample Weight | Bucket
1 | Male | 41 | 40000 | Yes | 1/5 | 0.1254 | 0 to 0.1254
2 | Male | 54 | 30000 | No | 1/5 | 0.1254 | 0.1254 to 0.2508
3 | Female | 42 | 25000 | No | 1/5 | 0.1254 | 0.2508 to 0.3762
4 | Female | 40 | 60000 | Yes | 1/5 | 0.4982 | 0.3762 to 0.8744
5 | Male | 46 | 50000 | Yes | 1/5 | 0.1254 | 0.8744 to 0.9998
Step 6:
We're nearly there. The algorithm now draws random numbers between 0 and 1.
Because incorrectly classified records have larger sample weights, the
probability of selecting them is relatively high.
Suppose the five random numbers drawn by our algorithm are 0.38, 0.26, 0.98,
0.40 and 0.55.
Now we'll examine where these random numbers go in the bucket and create
our new dataset, which is displayed below.
Row No. | Gender | Age | Income | Illness
1 | Female | 40 | 60000 | Yes
2 | Male | 54 | 30000 | No
3 | Female | 42 | 25000 | No
4 | Female | 40 | 60000 | Yes
5 | Female | 40 | 60000 | Yes
This is our new dataset, and we can see that the data point that was
incorrectly categorised has been picked three times since it has a greater
weight.
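Steps 5 and 6 together amount to weighted resampling. A compact way to mimic the bucket construction and the random draws is sketched below; np.searchsorted maps each random number to the bucket it falls into, and depending on how the bucket boundaries are treated the exact draws may differ slightly from the table above:

import numpy as np

normalized = np.array([0.1254, 0.1254, 0.1254, 0.4982, 0.1254])
buckets = np.cumsum(normalized)    # upper edges: 0.1254, 0.2508, 0.3762, 0.8744, 0.9998

randoms = np.array([0.38, 0.26, 0.98, 0.40, 0.55])
chosen_rows = np.searchsorted(buckets, randoms)   # index of the bucket each draw lands in
print(chosen_rows)   # [3 2 4 3 3]: index 3 (row 4, the misclassified point) is drawn three times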
Step 7:
This now serves as our new dataset, and we must repeat all of the preceding
steps: assign each data point an equal weight, find the stump that best
classifies the new set of samples by computing each candidate's Gini Index
and picking the lowest one, compute the "Amount of Say" and Total Error to
update the previous sample weights, and normalise the newly calculated sample
weights. Iterate through these steps until a low training error is obtained.
Assume that we have built three decision trees (DT1, DT2, and DT3)
sequentially on our dataset. If we now pass our test data through them, each
record goes through all of the decision trees, and the class predicted by the
majority becomes the final prediction for that record.
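In AdaBoost this vote is weighted by each stump's amount of say (a plain majority vote is the unweighted special case). The sketch below assumes three already-fitted stumps and made-up alpha values purely for illustration:

import numpy as np

# Predictions of three stumps for one test record (classes 0/1) and their amounts of say
stump_predictions = np.array([1, 0, 1])
alphas = np.array([0.69, 0.40, 0.25])    # illustrative values

# Weighted vote: sum the alphas for each class and pick the larger total
score_for_1 = alphas[stump_predictions == 1].sum()
score_for_0 = alphas[stump_predictions == 0].sum()
final_class = 1 if score_for_1 > score_for_0 else 0
print(final_class)                        # 1, because 0.69 + 0.25 > 0.40

The complete scikit-learn example below trains an AdaBoost classifier on the Iris dataset and reports its test accuracy.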
import numpy as np
import pandas as pd

# Load the Iris dataset
df = pd.read_csv('Iris.csv')
df.head()
df.info()

# Select the four feature columns and the target column
x = df[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']]
x.head()
y = df['Species']
y.head()

# Encode the string class labels as integers
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)

# Split the data into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

# Import the AdaBoost classifier
from sklearn.ensemble import AdaBoostClassifier

# Create the AdaBoost classifier object (decision stumps are the default weak learner)
abc = AdaBoostClassifier(n_estimators=50, learning_rate=1, random_state=0)

# Train the AdaBoost classifier
model1 = abc.fit(x_train, y_train)

# Predict the response for the test dataset
y_pred = model1.predict(x_test)

# Evaluate accuracy on the held-out test set
from sklearn.metrics import accuracy_score
print("Accuracy:", accuracy_score(y_test, y_pred))