
PART A

(PART A: TO BE REFERRED BY STUDENTS)

Experiment No. 4
A.1 Aim:
To implement ensemble learning algorithms.

A.2 Prerequisite:
Python Basic Concepts

A.3 Outcome:
Students will be able to implement ensemble learning algorithms.

A.4 Theory:

Ensemble Learning Techniques in Machine Learning: Machine learning models suffer from bias
and/or variance. Bias is the difference between the values predicted by the model and the
actual values. Bias is introduced when the model does not capture the variation in the data
and instead builds an overly simple model. Such a simple model does not follow the patterns
in the data, and hence it makes errors on both the training and the testing data, i.e. it is
a model with high bias (underfitting).
When the model treats even random quirks of the data as patterns, it may do very well on the
training dataset, i.e. it has low bias, but it fails on the test data and therefore has high
variance (overfitting).
Therefore, to improve the accuracy (estimate) of the model, ensemble learning methods were
developed. An ensemble is a machine learning concept in which several models are trained
using machine learning algorithms. It combines low-performing classifiers (also called weak
learners or base learners) and aggregates their individual predictions to produce the final
prediction.
On the basis of the type of base learners, ensemble methods can be categorized as homogeneous
and heterogeneous. If the base learners are all of the same type, it is a homogeneous
ensemble method; if the base learners are of different types, it is a heterogeneous ensemble
method.
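
As an illustration (not part of the prescribed experiment), the following minimal scikit-learn sketch contrasts a homogeneous ensemble (many decision trees combined by bagging) with a heterogeneous ensemble (a decision tree and a k-NN classifier combined by voting). The toy dataset, estimator choices and variable names are assumptions made only for demonstration, and the 'estimator' parameter name assumes a recent scikit-learn version.

# Hypothetical illustration: homogeneous vs. heterogeneous ensembles
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Homogeneous ensemble: every base learner is of the same type (a decision tree)
homogeneous = BaggingClassifier(estimator=DecisionTreeClassifier(),
                                n_estimators=25, random_state=0)

# Heterogeneous ensemble: base learners are of different types
heterogeneous = VotingClassifier(estimators=[
    ('dt', DecisionTreeClassifier(random_state=0)),
    ('knn', KNeighborsClassifier())
])

print("Homogeneous :", cross_val_score(homogeneous, X, y, cv=5).mean())
print("Heterogeneous:", cross_val_score(heterogeneous, X, y, cv=5).mean())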

Ensemble Learning Methods


Ensemble techniques are classified into three types:

1. Bagging
2. Boosting
3. Stacking

Bagging
Consider a scenario where you are looking at users' ratings for a product. Instead of relying
on a single user's good/bad rating, we consider the average rating given to the product. With
the average rating, we can be considerably more sure of the quality of the product. Bagging
makes use of this principle: instead of depending on one model, it runs the data through
multiple models in parallel and averages their outputs to obtain the final output of the model.

What is Bagging? How does it work?

 Bagging is an acronym for Bootstrap Aggregation (Bootstrapped Aggregating). Bootstrapping
means random selection of records with replacement from the training dataset. 'Random
selection with replacement' can be explained as follows:

a. Consider that there are 8 samples in the training dataset. Out of these 8 samples,
every weak learner gets 5 samples as training data for the model. These 5 samples
need not be unique or non-repetitive.
b. The model (weak learner) is allowed to receive the same sample multiple times. For example,
as shown in the figure, Rec5 is selected 2 times by the model. Therefore, weak
learner 1 gets Rec2, Rec5, Rec8, Rec5, Rec4 as training data.
c. All the samples remain available for selection by the next weak learners. Thus all 8
samples will be available to the next weak learner, and any sample can be selected
multiple times by the next weak learners.

 Bagging is a parallel method, which means several weak learners learn the data
pattern independently and simultaneously. This is best shown in the diagram below:
1. The output of each weak learner is averaged to generate the final output of the model.
2. Since the weak learners' outputs are averaged, this mechanism helps to reduce
variance or variability in the predictions. However, it does not help to reduce the bias
of the model.
3. Since the final prediction is an average of the outputs of the weak learners,
each weak learner has an equal say or weight in the final output.
To summarize:

1. Bagging is Bootstrap Aggregation (Bootstrapped Aggregating)
2. It is a parallel method
3. The final output is calculated by averaging the outputs produced by the individual weak
learners
4. Each weak learner has an equal say
5. Bagging reduces variance
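
To make the bootstrap sampling idea concrete, here is a minimal illustrative sketch (not part of the prescribed experiment) that draws bootstrap samples with replacement and averages the predictions of the resulting weak learners. The dataset, number of learners, tree depth and variable names are assumptions chosen only for demonstration.

# Hypothetical sketch of bagging: bootstrap sampling with replacement + averaging predictions
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

rng = np.random.default_rng(0)
n_learners = 25
predictions = []

for _ in range(n_learners):
    # Bootstrap sample: draw indices with replacement (a record may appear multiple times)
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Shallow tree as the weak learner (the depth is an arbitrary choice)
    learner = DecisionTreeClassifier(max_depth=3)
    learner.fit(X_train[idx], y_train[idx])
    predictions.append(learner.predict(X_test))

# Each weak learner has an equal say: average the 0/1 predictions and take the majority
y_pred = (np.mean(predictions, axis=0) >= 0.5).astype(int)
print("Hand-rolled bagging accuracy:", accuracy_score(y_test, y_pred))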

Boosting
We saw that in bagging every model is given equal preference; but if one model predicts the
data more correctly than another, then a higher weightage should be given to that model.
Also, the ensemble should attempt to reduce bias. These ideas are applied in the second
ensemble method that we are going to learn, namely Boosting.

What is Boosting?

1. To start with, boosting assigns equal weights to all data points, as all points are
equally important in the beginning. For example, if a training dataset has N
samples, it assigns weight = 1/N to each sample.
2. A weak learner classifies the data. The weak classifier classifies some samples
correctly while making mistakes on others.
3. After classification, the sample weights are changed: the weight of a correctly classified
sample is reduced, and the weight of an incorrectly classified sample is increased. Then
the next weak classifier is trained on the re-weighted data.
4. This process continues until the model as a whole gives strong predictions.
Note: AdaBoost (Adaptive Boosting) is a classic boosting algorithm; in its original form it is designed for binary classification, although multiclass extensions exist. A minimal weight-update sketch is shown below.
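
As an illustration of the weight-update idea described above (and not the exact algorithm prescribed in this experiment), the following minimal AdaBoost-style sketch trains decision stumps sequentially on re-weighted samples. The choice of base learner, number of rounds and variable names are assumptions made only for demonstration.

# Hypothetical AdaBoost-style sketch: sequential weak learners with sample re-weighting
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
y = np.where(y == 0, -1, 1)  # use {-1, +1} labels for the weight-update formulas
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

n_rounds = 20
n = len(X_train)
weights = np.full(n, 1.0 / n)        # step 1: equal weights = 1/N for every sample
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)         # weak learner (decision stump)
    stump.fit(X_train, y_train, sample_weight=weights)  # step 2: classify the weighted data
    pred = stump.predict(X_train)
    err = np.sum(weights[pred != y_train]) / np.sum(weights)
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))     # better learners get a bigger say
    # step 3: increase weights of misclassified samples, decrease those classified correctly
    weights *= np.exp(-alpha * y_train * pred)
    weights /= weights.sum()
    stumps.append(stump)
    alphas.append(alpha)

# step 4: the strong prediction is the weighted vote of all weak learners
scores = sum(a * s.predict(X_test) for a, s in zip(alphas, stumps))
y_pred = np.where(scores >= 0, 1, -1)
print("Hand-rolled boosting accuracy:", accuracy_score(y_test, y_pred))
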
PART B
(PART B : TO BE COMPLETED BY STUDENTS)

(Students must submit the soft copy as per the following segments within two hours of the practical.
The soft copy must be uploaded on Blackboard or emailed to the concerned lab in-charge
faculty at the end of the practical in case there is no Blackboard access available.)

Roll. No. BE-A10 Name: Nishad Sutar


Class: BE-Comps A Batch: A1
Date of Experiment: 28/07/2025 Date of Submission: 04/08/2025
Grade:

B.1 Software Code written by student:


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.datasets import load_breast_cancer

# Load a dataset (Breast Cancer dataset for classification)


data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name='target')

# No missing values in this dataset, but including imputation for completeness


# Identify categorical and numerical features
# The breast cancer dataset only has numerical features, but steps for categorical
# features are included for completeness
numerical_features = X.columns
categorical_features = []  # No categorical features in this dataset

# Create transformers for numerical and categorical features
numerical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

# Create a column transformer to apply different transformations to different columns
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)
    ])

# Create a preprocessing pipeline


preprocess_pipeline = Pipeline(steps=[('preprocessor', preprocessor)])

# Apply preprocessing to the data


X_processed = preprocess_pipeline.fit_transform(X)

# Convert the processed data back to a DataFrame (optional, but useful for inspection)
# If there were categorical features, the column names would be different after one-hot
# encoding. For this dataset, since only numerical features are present and scaled,
# we can keep the original column names.
X_processed_df = pd.DataFrame(X_processed, columns=numerical_features)

# Split the preprocessed data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(
    X_processed_df, y, test_size=0.2, random_state=42
)
print("Original data shape:", X.shape)
print("Processed data shape:", X_processed_df.shape)
print("Training data shape:", X_train.shape)
print("Testing data shape:", X_test.shape)
print("Training target shape:", y_train.shape)
print("Testing target shape:", y_test.shape)

display(X_train.head())
display(y_train.head())

from sklearn.ensemble import BaggingClassifier


from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Instantiate a DecisionTreeClassifier as the base estimator


dt_classifier = DecisionTreeClassifier(random_state=42)

# Instantiate a BaggingClassifier with the corrected parameter name


bagging_classifier = BaggingClassifier(
    estimator=dt_classifier, n_estimators=100, random_state=42
)

# Train the BaggingClassifier


bagging_classifier.fit(X_train, y_train)

# Predict on the test data


y_pred_bagging = bagging_classifier.predict(X_test)

# Calculate the accuracy


accuracy_bagging = accuracy_score(y_test, y_pred_bagging)

print(f"Bagging Classifier Accuracy: {accuracy_bagging:.4f}")

from sklearn.ensemble import GradientBoostingClassifier


from sklearn.metrics import accuracy_score

# Instantiate a GradientBoostingClassifier
boosting_classifier = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)

# Train the Boosting Classifier


boosting_classifier.fit(X_train, y_train)

# Predict on the test data


y_pred_boosting = boosting_classifier.predict(X_test)

# Calculate the accuracy


accuracy_boosting = accuracy_score(y_test, y_pred_boosting)

print(f"Boosting Classifier Accuracy: {accuracy_boosting:.4f}")

from sklearn.ensemble import StackingClassifier


from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Define a list of base models (estimators)


estimators = [
    ('dt', DecisionTreeClassifier(random_state=42)),
    ('knn', KNeighborsClassifier())
]

# Define a meta-classifier
meta_classifier = LogisticRegression(random_state=42)

# Instantiate StackingClassifier
stacking_classifier = StackingClassifier(
    estimators=estimators,
    final_estimator=meta_classifier,
    cv=5  # Cross-validation for training base models
)

# Train the StackingClassifier


stacking_classifier.fit(X_train, y_train)

# Make predictions on the test data


y_pred_stacking = stacking_classifier.predict(X_test)

# Calculate the accuracy


accuracy_stacking = accuracy_score(y_test, y_pred_stacking)

# Print the calculated accuracy


print(f"Stacking Classifier Accuracy: {accuracy_stacking:.4f}")

from sklearn.metrics import accuracy_score


from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Calculate accuracy for Decision Tree base model


dt_classifier = DecisionTreeClassifier(random_state=42)
dt_classifier.fit(X_train, y_train)
y_pred_dt = dt_classifier.predict(X_test)
accuracy_dt = accuracy_score(y_test, y_pred_dt)

# Calculate accuracy for K-Nearest Neighbors base model


knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)
accuracy_knn = accuracy_score(y_test, y_pred_knn)

# Store accuracies in a dictionary


model_accuracies = {
    'Bagging': accuracy_bagging,
    'Boosting': accuracy_boosting,
    'Stacking': accuracy_stacking,
    'Decision Tree (Base)': accuracy_dt,
    'K-Nearest Neighbors (Base)': accuracy_knn
}

# Print accuracies for comparison


print("Model Accuracies:")
for model, accuracy in model_accuracies.items():
    print(f"{model}: {accuracy:.4f}")

import matplotlib.pyplot as plt

# Model names and accuracies


models = [
    'Decision Tree (Base)',
    'K-Nearest Neighbors (Base)',
    'Bagging',
    'Boosting',
    'Stacking'
]

accuracies = [0.9474, 0.9474, 0.9561, 0.9561, 0.9649]

# Convert to percentages for display


accuracies_percent = [acc * 100 for acc in accuracies]

# Define colors (optional)


colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']

# Create the bar chart


plt.figure(figsize=(10, 6))
bars = plt.bar(models, accuracies_percent, color=colors)

# Add value labels on top of each bar


for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, height + 0.2,
             f'{height:.2f}%', ha='center', va='bottom', fontsize=10)

# Chart formatting
plt.title('Model Accuracy Comparison', fontsize=14)
plt.ylabel('Accuracy (%)', fontsize=12)
plt.ylim(94, 98)
plt.xticks(rotation=15)
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Show the plot


plt.tight_layout()
plt.show()
B.2 Input and Output:
B.3 Observations and learning:
In this experiment, I implemented Ensemble Learning techniques, an approach in machine learning
designed to enhance model accuracy. I observed that the core principle of ensemble methods is to
combine the predictions of several base models, or "weak learners," to produce a single, superior
"strong learner." This strategy directly addresses the fundamental trade-off between bias and
variance that affects individual models. I explored three ensemble methods: Bagging, Boosting,
and Stacking. I noted that Bagging, or Bootstrap Aggregating, works in parallel, training multiple
models on different random subsets of the data and averaging their outputs to reduce variance. In
contrast, I observed that Boosting works sequentially, with each new model focusing on correcting
the errors made by its predecessor by adjusting data point weights, thereby reducing bias. Stacking
combines different types of base models by training a meta-classifier on their predictions.
B.4 Conclusion:
In conclusion, this experiment successfully achieved its aim of implementing Ensemble algorithms. I
have learned that instead of relying on a single model, combining multiple models can significantly
improve predictive performance and robustness. The experiment demonstrated that Bagging is an
effective method for reducing variance and preventing overfitting, while Boosting is a powerful
technique for reducing bias and building highly accurate classifiers. This practical implementation
reinforces the understanding that ensemble learning is a critical concept in machine learning, proving
that the collective "wisdom" of multiple models is often more powerful and reliable than any single
model alone.
B.5 Question of Curiosity
(To be answered by student based on the practical performed and learning/observations)
