L3_Classification_RandomForest - Jupyter Notebook

The document outlines the process of loading the Iris dataset, training a Random Forest Classifier, and evaluating its accuracy. It demonstrates the importance of feature selection by removing the least significant feature, which improved model accuracy from approximately 91.11% to 93.33%. Additionally, it includes visualizations of decision trees from the Random Forest model.


In [1]:

# Loading dataset

# importing required libraries


# importing Scikit-learn library and datasets package

from sklearn import datasets

import pandas as pd

# importing random forest classifier from ensemble module

from sklearn.ensemble import RandomForestClassifier

# Loading the iris plants dataset (classification)

iris = datasets.load_iris()

In [2]:

print(iris.target_names) # Dependent Variable

['setosa' 'versicolor' 'virginica']

In [3]:

print(iris.feature_names) # Independent features or columns

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

In [4]:

# iris.data holds all the independent columns; here we convert it to a pandas DataFrame.

dataset = pd.DataFrame(iris.data)

In [5]:

# printing the top 5 rows in iris dataset

print(dataset.head())

0 1 2 3
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
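
(Aside: on scikit-learn 0.23 or newer, load_iris can return a DataFrame directly; a minimal sketch, assuming that version is available:)

# as_frame=True (scikit-learn >= 0.23) returns the data with named columns;
# .frame combines the features and the target in one DataFrame
iris_frame = datasets.load_iris(as_frame=True)
print(iris_frame.frame.head())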

In [6]:

# Create a new column named 'species' in the DataFrame. Its values come from iris.target:
# 0 = setosa, 1 = versicolor, 2 = virginica.

dataset['species'] = iris.target

In [7]:

# Adding column name to the respective columns

dataset.columns =['sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species']

# displaying the DataFrame

print(dataset)

sepallength sepalwidth petallength petalwidth species


0 5.1 3.5 1.4 0.2 0
1 4.9 3.0 1.4 0.2 0
2 4.7 3.2 1.3 0.2 0
3 4.6 3.1 1.5 0.2 0
4 5.0 3.6 1.4 0.2 0
.. ... ... ... ... ...
145 6.7 3.0 5.2 2.3 2
146 6.3 2.5 5.0 1.9 2
147 6.5 3.0 5.2 2.0 2
148 6.2 3.4 5.4 2.3 2
149 5.9 3.0 5.1 1.8 2

[150 rows x 5 columns]

In [8]:

# Splitting arrays or matrices into random train and test subsets

from sklearn.model_selection import train_test_split

X = dataset.iloc[:, : -1]
y = dataset.iloc[:, -1]

# i.e. 70 % training dataset and 30 % test datasets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state = 8)
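
(Optional: since Iris has exactly 50 samples per class, passing stratify=y preserves that balance in both subsets; a sketch of the same split with stratification:)

# stratify=y keeps the class proportions equal in the train and test subsets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30,
                                                    random_state=8, stratify=y)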

In [9]:

from sklearn.ensemble import RandomForestClassifier

# Create a Random Forest Classifier

# n_estimators : int, default=100 : The number of trees in the forest.


# criterion : {"gini", "entropy"}, default="gini"

clf = RandomForestClassifier(n_estimators=100) # 100 trees

# Train the model using the training set

clf.fit(X_train,y_train)

# Prediction on test set

y_pred=clf.predict(X_test)

# Import scikit-learn metrics module for accuracy calculation

from sklearn import metrics

# Model Accuracy, how often is the classifier correct?

print("Accuracy: ",metrics.accuracy_score(y_test, y_pred))

Accuracy: 0.9111111111111111
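
(For more detail than a single accuracy number, one can also print a confusion matrix and per-class report; a sketch reusing y_test and y_pred from above:)

from sklearn.metrics import classification_report, confusion_matrix

# rows = true classes, columns = predicted classes
print(confusion_matrix(y_test, y_pred))

# per-species precision, recall and F1
print(classification_report(y_test, y_pred, target_names=iris.target_names))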

In [10]:

# predicting which type of flower it is.

clf.predict([[3, 3, 2, 2]])

Out[10]:

array([0])

In [11]:

# This implies it is the setosa flower type, since the dataset has three species or classes:
# setosa, versicolor, and virginica (encoded as 0, 1, and 2).
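
(A quick sketch that maps the predicted class index back to its species name via iris.target_names:)

pred = clf.predict([[3, 3, 2, 2]])
print(iris.target_names[pred[0]])  # prints 'setosa' for class index 0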
In [12]:

clf.predict([[3, 5, 5, 2]])

# Here, array([2]) indicates the flower type Virginica.

Out[12]:

array([2])

In [13]:

# Now we will find the important features in the Iris dataset and use them for feature selection.

In [14]:

# importing random forest classifier from ensemble module

from sklearn.ensemble import RandomForestClassifier

# Create a Random forest Classifier

clf = RandomForestClassifier(n_estimators = 100)

# Train the model using the training sets

clf.fit(X_train, y_train)

Out[14]:

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)

In [15]:

# using the feature importance variable

feature_imp = pd.Series(clf.feature_importances_,
                        index=iris.feature_names).sort_values(ascending=False)

feature_imp

Out[15]:

petal width (cm)     0.519967
petal length (cm)    0.349479
sepal length (cm)    0.103166
sepal width (cm)     0.027388
dtype: float64
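
(The importances are easier to compare visually; a sketch that plots the Series above as a horizontal bar chart:)

import matplotlib.pyplot as plt

feature_imp.plot(kind='barh')            # one bar per feature, sorted as above
plt.xlabel('Feature importance score')
plt.title('Random Forest feature importances')
plt.show()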

In [16]:

# Generating the Model on Selected Features


# Here, we can remove the "sepal width" feature because it has very low importance,
# and select the 3 remaining features.

# Import train_test_split function

from sklearn.model_selection import train_test_split

# Split dataset into features and labels

X = dataset[['petallength', 'petalwidth', 'sepallength']]

# Removed feature "sepal width"

y = dataset['species']

# Split dataset into training set and test set

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=5)


In [17]:

from sklearn.ensemble import RandomForestClassifier

# Create Random Forest Classifier

# n_estimators : int, default=100 : The number of trees in the forest.

clf=RandomForestClassifier(n_estimators=100)

# Train the model using the training set

clf.fit(X_train,y_train)

# Prediction on test set

y_pred=clf.predict(X_test)

# Import scikit-learn metrics module for accuracy calculation

from sklearn import metrics

# Model Accuracy, how often is the classifier correct?

print("Accuracy: ",metrics.accuracy_score(y_test, y_pred))

Accuracy: 0.9333333333333333

In [18]:

# After removing the least important feature (sepal width), accuracy increased from
# about 91.11% to 93.33%. Dropping a noisy, uninformative feature can reduce
# overfitting, and fewer features also shorten training time. Note, however, that
# this run used a different split (random_state=5 vs. 8), so part of the difference
# may come from the split itself; see the cross-validation sketch below.
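
(Because the two accuracy figures come from different train/test splits, a fairer check is cross-validation on both feature sets; a sketch whose scores will vary from run to run:)

from sklearn.model_selection import cross_val_score

# 5-fold CV accuracy with all four features vs. the three selected ones
scores_all = cross_val_score(RandomForestClassifier(n_estimators=100),
                             dataset[['sepallength', 'sepalwidth', 'petallength', 'petalwidth']],
                             y, cv=5)
scores_sel = cross_val_score(RandomForestClassifier(n_estimators=100), X, y, cv=5)
print("All features:", scores_all.mean(), "Selected features:", scores_sel.mean())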

In [19]:

# The first decision tree is estimators_[0]; with n_estimators=100 the trees are indexed 0 to 99.

clf.estimators_[0]

Out[19]:

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=None, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=1364519456, splitter='best')

In [20]:

# Plot first decision tree

from sklearn import tree

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(25,20))

a = tree.plot_tree(clf.estimators_[0], feature_names = X.columns, filled=True)
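
(The same tree can also be inspected as plain-text rules; a sketch using sklearn's export_text on the three-feature model trained above:)

from sklearn.tree import export_text

# prints the nested if/else split rules of the first tree
print(export_text(clf.estimators_[0], feature_names=list(X.columns)))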

In [21]:

clf.estimators_[1]

Out[21]:

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=None, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=9010534, splitter='best')

In [22]:

# Plot second decision tree

from sklearn import tree

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(25,20))

a = tree.plot_tree(clf.estimators_[1], feature_names = X.columns, filled=True)

In [ ]:
