AdaBoost, short for Adaptive Boosting, is an ensemble machine learning
algorithm that can be used in a wide variety of classification and regression
tasks. It is a supervised learning algorithm that is used to classify data by
combining multiple weak or base learners (e.g., decision trees) into a strong
learner. AdaBoost works by weighting the instances in the training dataset
based on the accuracy of previous classifications.
AdaBoost Algorithm
Freund and Schapire first presented boosting as an ensemble modelling
approach in 1997 with the AdaBoost algorithm. Boosting has since become a
popular strategy for dealing with binary classification problems. These
algorithms improve prediction power by combining a large number of weak
learners into a strong learner.
Boosting algorithms work on the idea of first building a model on the training
dataset and then building a second model to correct the faults of the first
model. This procedure is repeated until the mistakes are reduced and the
dataset is predicted accurately. All boosting algorithms function in this way:
they combine numerous models (weak learners) to produce the final result
(a strong learner).
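As a rough sketch of this idea (not the exact AdaBoost update rule; the simple weight-doubling rule and variable names below are purely illustrative), a sequence of weak learners can be trained so that each one concentrates on the points its predecessor got wrong:

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

# Toy binary classification data
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Start with equal weights and boost for a few rounds
weights = np.full(len(y), 1 / len(y))
learners = []
for _ in range(3):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=weights)   # train on the weighted data
    wrong = stump.predict(X) != y            # find the mistakes
    weights[wrong] *= 2                      # emphasise them next round (illustrative rule)
    weights /= weights.sum()                 # keep the weights summing to 1
    learners.append(stump)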
There are three widely used kinds of boosting algorithms:
• The AdaBoost (adaptive boosting) algorithm
• The gradient boosting algorithm
• The extreme gradient boosting (XGBoost) algorithm
Let's understand what the AdaBoost algorithm is.
What is AdaBoost Algorithm in Machine Learning?
What is AdaBoost in machine learning? There are several machine learning
algorithms to choose from for a given problem statement, and AdaBoost is one
of these predictive modelling techniques. AdaBoost, also known as Adaptive
Boosting, is a machine learning approach that is used as an ensemble method.
Its most commonly used estimator is a decision tree with one level, that is,
a decision tree with just one split.
These trees are often referred to as Decision Stumps.
This approach builds a model and assigns equal weights to all data points. It
then assigns larger weights to incorrectly classified points, so that the next
model pays more attention to them. Models keep being trained in this way
until the error becomes small.
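Since decision stumps are the default weak learner in scikit-learn's AdaBoostClassifier, a small sketch (here on the built-in Iris data, with a toy number of estimators) can confirm that each fitted weak learner really is a one-split tree:

from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier

# By default, AdaBoostClassifier uses decision stumps (trees with max_depth=1)
X, y = load_iris(return_X_y=True)
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

# Every fitted weak learner is a one-split tree, i.e. a decision stump
print(ada.estimators_[0].get_depth())   # 1
print(len(ada.estimators_))             # number of stumps actually fitted (at most 50)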
AdaBoost in Machine Learning
To illustrate, imagine you trained a decision tree algorithm on the Titanic
dataset and obtained an accuracy of 80%. Following that, you try other
methods and assess their accuracy, getting 75% for KNN and 70% for logistic
regression.
The accuracy varies whenever we develop a new model on the same dataset.
What if we combine all of these algorithms to make the final prediction?
Averaging the outcomes from the various models often yields more accurate
results, and this is how we can improve prediction power.
Understanding the Working of the AdaBoost Classifier Algorithm
Step 1:
The table below shows a small example dataset for the AdaBoost algorithm. It
is a classification problem, since the target column (Illness) is binary.
First and foremost, these data points will be weighted. At first, all of the
weights will be equal.
Row No. | Gender | Age | Income | Illness | Sample Weight
1 | Male | 41 | 40000 | Yes | 1/5
2 | Male | 54 | 30000 | No | 1/5
3 | Female | 42 | 25000 | No | 1/5
4 | Female | 40 | 60000 | Yes | 1/5
5 | Male | 46 | 50000 | Yes | 1/5
The sample weights are calculated using the following formula:

Sample weight = 1 / N

where N denotes the total number of data points. Because we have 5 data
points, each sample weight will be 1/5.
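As a tiny sanity check (nothing beyond the table above; the variable names are just illustrative), the initial weights can be produced with NumPy:

import numpy as np

n = 5                           # number of data points in the example dataset
weights = np.full(n, 1 / n)     # every sample starts with weight 1/N
print(weights)                  # [0.2 0.2 0.2 0.2 0.2]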
Step 2:
We will examine how well "Gender" classifies the samples, followed by how the
other variables (Age and Income) classify them. We will make a decision stump
for each feature and then compute each stump's Gini Index. The tree with the
lowest Gini Index will be our first stump.
Let's suppose Gender has the lowest Gini Index in our dataset, so it will be
our first stump.
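To make the stump-selection step concrete, here is a small sketch of how a weighted Gini Index could be computed for a candidate split. The helper functions, the Yes=1/No=0 encoding, and the boolean Gender mask are illustrative choices, not part of the original walkthrough:

import numpy as np

def gini(labels, weights):
    """Weighted Gini impurity of one node: 1 minus the sum of squared class proportions."""
    total = weights.sum()
    if total == 0:
        return 0.0
    impurity = 1.0
    for c in np.unique(labels):
        p = weights[labels == c].sum() / total
        impurity -= p ** 2
    return impurity

def split_gini(goes_left, labels, weights):
    """Weighted average of the Gini impurity of the two child nodes of a stump."""
    left, right = goes_left, ~goes_left
    total = weights.sum()
    return (weights[left].sum() / total) * gini(labels[left], weights[left]) \
         + (weights[right].sum() / total) * gini(labels[right], weights[right])

# Toy example: split the 5-row dataset above on Gender (Male vs Female)
illness = np.array([1, 0, 0, 1, 1])                  # Yes = 1, No = 0
is_male = np.array([True, True, False, False, True])
weights = np.full(5, 1 / 5)
print(round(split_gini(is_male, illness, weights), 3))   # weighted Gini of the Gender stump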
Step 3:
Using the following formula, we will now determine the "Amount of Say" (also
called "Importance" or "Influence") of this classifier in classifying the
data points:

Amount of Say (alpha) = 1/2 * ln((1 - Total Error) / Total Error)

The Total Error is simply the sum of the sample weights of all misclassified
data points. Since there is one incorrectly classified point in our dataset,
the Total Error is 1/5, and the alpha (performance of the stump) is:

alpha = 1/2 * ln((1 - 1/5) / (1/5)) = 1/2 * ln(4) ≈ 0.69
The Total Error always lies between 0 and 1, where 0 represents a flawless
stump and 1 represents a terrible stump.
• When there is no misclassification (Total Error = 0), the "amount of say"
(alpha) is a large positive value.
• When the classifier predicts half correctly and half incorrectly (Total
Error = 0.5), the classifier's significance (amount of say) equals 0.
• If all of the samples were misclassified, the error is very large (close to
1), and the alpha value becomes a large negative value.
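These three regimes can be checked numerically with the formula above; the small epsilon guard against division by zero is an implementation detail added here, not part of the original derivation:

import numpy as np

def amount_of_say(total_error, eps=1e-10):
    # alpha = 1/2 * ln((1 - TE) / TE); eps keeps the log finite at TE = 0 or 1
    te = np.clip(total_error, eps, 1 - eps)
    return 0.5 * np.log((1 - te) / te)

print(amount_of_say(1 / 5))   # ~0.69: one of five points misclassified
print(amount_of_say(0.5))     # 0.0: no better than random guessing
print(amount_of_say(0.9))     # negative: mostly wrong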
Step 4:
You may be wondering why it is necessary to calculate a stump's Total Error
and performance. The answer is simple: we need to update the weights, because
if the same weights are used in the next model, the result will be the same
as in the previous model.
The weights of the incorrect predictions will be increased, while the weights
of the correct predictions will be decreased. When we build the next model on
the updated weights, it will pay more attention to the points with higher
weights.
After determining the classifier's significance and total error, we update
the weights using the following formula:

New sample weight = old sample weight * e^(±alpha)

• When the sample is correctly classified, alpha is applied with a negative
sign, so the weight decreases.
• When the sample is misclassified, alpha is applied with a positive sign, so
the weight increases.
• There are four correctly classified samples and one misclassified sample.
The sample weight of each data point is 1/5, and the amount of say
(performance) of the Gender stump is 0.69.

The new weight for each correctly classified sample is:

1/5 * e^(-0.69) ≈ 0.1004

The adjusted weight for the incorrectly classified sample is:

1/5 * e^(+0.69) ≈ 0.3988
Row No. | Gender | Age | Income | Illness | Sample Weight | New Sample Weight
1 | Male | 41 | 40000 | Yes | 1/5 | 0.1004
2 | Male | 54 | 30000 | No | 1/5 | 0.1004
3 | Female | 42 | 25000 | No | 1/5 | 0.1004
4 | Female | 40 | 60000 | Yes | 1/5 | 0.3988
5 | Male | 46 | 50000 | Yes | 1/5 | 0.1004
We know that the total sum of the sample weights must equal 1, but if we add
all of the new sample weights together, we get 0.8004. To bring this sum to
1, we normalise these weights by dividing each of them by the total sum of
the updated weights, which is 0.8004. After normalising the sample weights,
we get the dataset below, and the sum is now equal to 1.
Row No. | Gender | Age | Income | Illness | Sample Weight | New Sample Weight
1 | Male | 41 | 40000 | Yes | 1/5 | 0.1004 / 0.8004 = 0.1254
2 | Male | 54 | 30000 | No | 1/5 | 0.1004 / 0.8004 = 0.1254
3 | Female | 42 | 25000 | No | 1/5 | 0.1004 / 0.8004 = 0.1254
4 | Female | 40 | 60000 | Yes | 1/5 | 0.3988 / 0.8004 = 0.4982
5 | Male | 46 | 50000 | Yes | 1/5 | 0.1004 / 0.8004 = 0.1254
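The weight update and normalisation above can be reproduced with a few lines of NumPy; the array layout (row order, with row 4 flagged as the misclassified point) is just a convenient representation of the table:

import numpy as np

alpha = 0.69                                # amount of say of the Gender stump
weights = np.full(5, 1 / 5)                 # initial sample weights
misclassified = np.array([False, False, False, True, False])   # row 4 was wrong

# Increase the weight of the misclassified point, decrease the others
new_weights = weights * np.exp(np.where(misclassified, alpha, -alpha))
print(new_weights)        # approximately [0.1004 0.1004 0.1004 0.3988 0.1004]
print(new_weights.sum())  # approximately 0.8004

# Normalise so the weights sum to 1 again
normalized = new_weights / new_weights.sum()
print(normalized)         # approximately [0.1254 0.1254 0.1254 0.4982 0.1254]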
Step 5:
We must now create a fresh dataset to see whether or not the mistakes have
decreased. To do this, we will delete the "sample weights" and "new sample
weights" columns and then split our data points into buckets based on the
"new sample weights.”
Row No. | Gender | Age | Income | Illness | Sample Weight | New Sample Weight | Bucket
1 | Male | 41 | 40000 | Yes | 1/5 | 0.1254 | 0 to 0.1254
2 | Male | 54 | 30000 | No | 1/5 | 0.1254 | 0.1254 to 0.2508
3 | Female | 42 | 25000 | No | 1/5 | 0.1254 | 0.2508 to 0.3762
4 | Female | 40 | 60000 | Yes | 1/5 | 0.4982 | 0.3762 to 0.8744
5 | Male | 46 | 50000 | Yes | 1/5 | 0.1254 | 0.8744 to 0.9998
Step 6:
We're nearly there. The algorithm now draws random numbers between 0 and 1.
Because incorrectly classified records have larger sample weights, the
probability of selecting them is relatively high.
Suppose the five random numbers drawn by our algorithm are 0.38, 0.26, 0.98,
0.40 and 0.55.
Now we'll examine where these random numbers go in the bucket and create
our new dataset, which is displayed below.
Row No. | Gender | Age | Income | Illness
1 | Female | 40 | 60000 | Yes
2 | Male | 54 | 30000 | No
3 | Female | 42 | 25000 | No
4 | Female | 40 | 60000 | Yes
5 | Female | 40 | 60000 | Yes
This is our new dataset, and we can see that the data point that was
incorrectly categorised has been picked three times since it has a greater
weight.
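Steps 5 and 6 together amount to weighted resampling. A compact way to mimic the bucket construction and the random draws is sketched below; np.searchsorted maps each random number to the bucket it falls into, and depending on how the bucket boundaries are treated the exact draws may differ slightly from the table above:

import numpy as np

normalized = np.array([0.1254, 0.1254, 0.1254, 0.4982, 0.1254])
buckets = np.cumsum(normalized)    # upper edges: 0.1254, 0.2508, 0.3762, 0.8744, 0.9998

randoms = np.array([0.38, 0.26, 0.98, 0.40, 0.55])
chosen_rows = np.searchsorted(buckets, randoms)   # index of the bucket each draw lands in
print(chosen_rows)   # [3 2 4 3 3]: index 3 (row 4, the misclassified point) is drawn three times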
Step 7:
This now serves as our new dataset, and we must repeat all of the preceding
steps: assign each data point an equal weight, find the stump that best
classifies the new set of samples by computing each candidate's Gini Index
and picking the lowest one, compute the "Amount of Say" and Total Error to
update the previous sample weights, and normalise the newly calculated sample
weights. Iterate through these steps until a low training error is obtained.
Assume that we have built three decision trees (DT1, DT2, and DT3)
sequentially on our dataset. If we now pass our test data through them, each
record goes through all of the decision trees, and the class predicted by the
majority becomes the final prediction for that record.
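In AdaBoost this vote is weighted by each stump's amount of say (a plain majority vote is the unweighted special case). The sketch below assumes three already-fitted stumps and made-up alpha values purely for illustration:

import numpy as np

# Predictions of three stumps for one test record (classes 0/1) and their amounts of say
stump_predictions = np.array([1, 0, 1])
alphas = np.array([0.69, 0.40, 0.25])    # illustrative values

# Weighted vote: sum the alphas for each class and pick the larger total
score_for_1 = alphas[stump_predictions == 1].sum()
score_for_0 = alphas[stump_predictions == 0].sum()
final_class = 1 if score_for_1 > score_for_0 else 0
print(final_class)                        # 1, because 0.69 + 0.25 > 0.40

The complete scikit-learn example below trains an AdaBoost classifier on the Iris dataset and reports its test accuracy.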
import numpy as np
import pandas as pd

# Load the Iris dataset
df = pd.read_csv('Iris.csv')
df.head()
df.info()

# Select the four feature columns and the target column
x = df[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']]
x.head()
y = df['Species']
y.head()

# Encode the string class labels as integers
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)

# Split the data into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

# Import the AdaBoost classifier
from sklearn.ensemble import AdaBoostClassifier

# Create the AdaBoost classifier object (decision stumps are the default weak learner)
abc = AdaBoostClassifier(n_estimators=50, learning_rate=1, random_state=0)

# Train the AdaBoost classifier
model1 = abc.fit(x_train, y_train)

# Predict the response for the test dataset
y_pred = model1.predict(x_test)

# Evaluate accuracy on the held-out test set
from sklearn.metrics import accuracy_score
print("Accuracy:", accuracy_score(y_test, y_pred))