Assgn_06_ML.ipynb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.tree import DecisionTreeClassifier
df = pd.read_csv("Cancer_data.csv")
print(df.head())
df.info()
   mean_radius  mean_texture  mean_perimeter  mean_area  mean_smoothness  diagnosis
0        17.99         10.38          122.80     1001.0          0.11840          0
1        20.57         17.77          132.90     1326.0          0.08474          0
2        19.69         21.25          130.00     1203.0          0.10960          0
3        11.42         20.38           77.58      386.1          0.14250          0
4        20.29         14.34          135.10     1297.0          0.10030          0
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   mean_radius      564 non-null    float64
 1   mean_texture     562 non-null    float64
 2   mean_perimeter   563 non-null    float64
 3   mean_area        562 non-null    float64
 4   mean_smoothness  566 non-null    float64
 5   diagnosis        569 non-null    int64
dtypes: float64(5), int64(1)
memory usage: 26.8 KB
# Impute missing values with each column's mean
df.fillna(df.mean(), inplace=True)
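# Quick check (a sketch, not part of the original run): confirm the imputation
# left no missing values behind
print(df.isnull().sum())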
# Independent and dependent features
x = df.iloc[:, :-1]
y = df['diagnosis'] # Target variable
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=2)
# Post-pruning => initially taking some parameters (a small max_depth to start)
treemodel = DecisionTreeClassifier(max_depth=2)
treemodel.fit(X_train, y_train)
DecisionTreeClassifier(max_depth=2)
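# Sketch (assumes the same split above; names path/pruned_scores are hypothetical):
# fixing max_depth before fitting is strictly pre-pruning — scikit-learn's true
# post-pruning grows the full tree and then prunes it back via ccp_alpha.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
pruned_scores = []
for alpha in path.ccp_alphas[:-1]:  # the largest alpha would prune the tree to its root
    t = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)
    pruned_scores.append(t.score(X_test, y_test))  # a held-out validation split would be cleaner
print(max(pruned_scores))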
# Plot
from sklearn import tree
plt.figure(figsize=(10, 10))
tree.plot_tree(treemodel, filled=True)
[Figure: depth-2 decision tree.
 Root: x[3] <= 697.8, gini = 0.463, samples = 398, value = [145, 253].
 Left (True): x[2] <= 90.365, gini = 0.217, samples = 283, value = [35, 248];
   leaves: gini = 0.126 (236 samples, [16, 220]) and gini = 0.482 (47 samples, [19, 28]).
 Right (False): x[1] <= 16.575, gini = 0.083, samples = 115, value = [110, 5];
   leaves: gini = 0.496 (11 samples, [6, 5]) and gini = 0.0 (104 samples, [104, 0]).]
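# Sketch: passing feature names makes the splits above self-describing
# (x[1] = mean_texture, x[2] = mean_perimeter, x[3] = mean_area)
plt.figure(figsize=(10, 10))
tree.plot_tree(treemodel, filled=True, feature_names=list(x.columns))
plt.show()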
# Prediction
y_pred = treemodel.predict(X_test)
y_pred
array([1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1,
1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1,
1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1,
0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1,
1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0,
1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0,
1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1])
# Accuracy
from sklearn.metrics import accuracy_score, classification_report
score = accuracy_score(y_test, y_pred)
print(score)
0.8771929824561403
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

           0       0.91      0.76      0.83        67
           1       0.86      0.95      0.90       104

    accuracy                           0.88       171
   macro avg       0.89      0.86      0.87       171
weighted avg       0.88      0.88      0.87       171
# Pre-pruning using GridSearchCV
parameter = {'criterion': ['gini', 'entropy'], 'max_depth': [4, 5, 6, 7, 8, 9],
             'splitter': ['best', 'random'],
             'max_features': ['auto', 'sqrt', 'log2']}  # 'auto' is no longer valid in recent scikit-learn -> the failed fits below
from sklearn.model_selection import GridSearchCV
cv = GridSearchCV(treemodel, param_grid=parameter, cv=5, scoring='accuracy')
cv.fit(X_train, y_train)
/usr/local/lib/python3.11/dist-packages/sklearn/model_selection/_validation.py:528: FitFailedWarning:
120 fits failed out of a total of 360.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
120 fits failed with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/sklearn/model_selection/_validation.py", line 866, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/usr/local/lib/python3.11/dist-packages/sklearn/base.py", line 1382, in wrapper
estimator._validate_params()
File "/usr/local/lib/python3.11/dist-packages/sklearn/base.py", line 436, in _validate_params
validate_parameter_constraints(
File "/usr/local/lib/python3.11/dist-packages/sklearn/utils/_param_validation.py", line 98, in validate_parameter_constraints
raise InvalidParameterError(
sklearn.utils._param_validation.InvalidParameterError: The 'max_features' parameter of DecisionTreeClassifier must be an int in the range [1, inf), a float in the range (0.0, 1.0], a str among {'sqrt', 'log2'} or None. Got 'auto' instead.
warnings.warn(some_fits_failed_message, FitFailedWarning)
/usr/local/lib/python3.11/dist-packages/sklearn/model_selection/_search.py:1108: UserWarning: One or more of the test scores are non-finite:
nan nan 0.89946203 0.8518038 0.87939873 0.90202532
nan nan 0.88446203 0.87943038 0.89443038 0.90449367
nan nan 0.88458861 0.87708861 0.88702532 0.86183544
nan nan 0.89199367 0.88446203 0.88949367 0.89433544
nan nan 0.85917722 0.91205696 0.87439873 0.88949367
nan nan 0.87949367 0.88202532 0.89202532 0.84702532
nan nan 0.86936709 0.87183544 0.90455696 0.86689873
nan nan 0.88949367 0.86677215 0.88449367 0.88449367
nan nan 0.89458861 0.91699367 0.87436709 0.89699367
nan nan 0.89202532 0.88446203 0.88705696 0.89696203
nan nan 0.86949367 0.90205696 0.87686709 0.90205696]
warnings.warn(
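# Sketch (hypothetical names clean_grid/cv_clean): the 120 failed fits come from
# max_features='auto', which recent scikit-learn rejects; dropping it yields a
# grid with no nan scores.
clean_grid = {'criterion': ['gini', 'entropy'], 'max_depth': [4, 5, 6, 7, 8, 9],
              'splitter': ['best', 'random'], 'max_features': ['sqrt', 'log2', None]}
cv_clean = GridSearchCV(DecisionTreeClassifier(), param_grid=clean_grid,
                        cv=5, scoring='accuracy')
# cv_clean.fit(X_train, y_train)  # the outputs below still come from the original grid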
cv.best_params_
{'criterion': 'entropy',
'max_depth': 7,
'max_features': 'sqrt',
'splitter': 'random'}
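# Sketch: GridSearchCV also exposes the refit model and its mean CV accuracy
print(cv.best_score_)           # mean cross-validated accuracy of the best combination
best_tree = cv.best_estimator_  # already refit on all of X_train with best_params_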
y_pred = cv.predict(X_test)
y_pred
array([1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1,
1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1,
1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0,
0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1,
1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1,
1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1])
score = accuracy_score(y_test, y_pred)
print(score)
0.8888888888888888
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

           0       0.93      0.78      0.85        67
           1       0.87      0.96      0.91       104

    accuracy                           0.89       171
   macro avg       0.90      0.87      0.88       171
weighted avg       0.89      0.89      0.89       171
# Random Forest
from sklearn.ensemble import RandomForestClassifier
X = x.values # Convert to numpy array for Random Forest
y = y.values
# Split again for Random Forest
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
# Create and fit Random Forest
rf = RandomForestClassifier(n_estimators=4, n_jobs=-1)
rf.fit(X_train, y_train)
RandomForestClassifier(n_estimators=4, n_jobs=-1)
# Check accuracy of Random Forest
rf.score(X_test, y_test)
0.9202127659574468
# Re-fit 10 times to see the score variance that comes from bootstrap randomness
for _ in range(10):
    rf = RandomForestClassifier(n_estimators=4, n_jobs=-1)
    rf.fit(X_train, y_train)
    print(rf.score(X_test, y_test))
0.898936170212766
0.9308510638297872
0.9202127659574468
0.8776595744680851
0.8936170212765957
0.8670212765957447
0.898936170212766
0.9095744680851063
0.8670212765957447
0.925531914893617
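# Sketch: the spread above comes from using only 4 trees; more estimators plus
# a fixed random_state give a stable (and usually higher) score
rf_stable = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
rf_stable.fit(X_train, y_train)
print(rf_stable.score(X_test, y_test))  # value not recorded in the original run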
from sklearn.svm import SVC
svm_linear = SVC(kernel='linear')
svm_rbf = SVC(kernel='rbf')
# Kernel functions handle non-linear decision boundaries by implicitly
# mapping the data from a low-dimensional to a higher-dimensional space
svm_linear.fit(X_train, y_train)
svm_rbf.fit(X_train, y_train)
SVC()
# Accuracy score
y_pred_linear = svm_linear.predict(X_test)
y_pred_rbf = svm_rbf.predict(X_test)
accuracy_linear = accuracy_score(y_test, y_pred_linear)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)
accuracy_linear, accuracy_rbf
(0.925531914893617, 0.8882978723404256)
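# Sketch: SVMs (the rbf kernel especially) are scale-sensitive, and these features
# span very different ranges (mean_area ~1000 vs mean_smoothness ~0.1); scaling
# inside a pipeline typically lifts the rbf score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
svm_scaled = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
svm_scaled.fit(X_train, y_train)
print(accuracy_score(y_test, svm_scaled.predict(X_test)))  # value not recorded in the original run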