DMML Lab Report 05

This lab report focuses on feature selection for predicting stroke using data mining techniques. It details the process of dividing the dataset into independent and dependent variables, applying SelectKBest with ANOVA for feature selection, and visualizing the results through bar plots and correlation heatmaps. The report concludes by identifying the top 8 features most correlated with the target variable 'stroke'.

Uploaded by

Atick Arman


Lab report

Course code: CSE326

Course Title: Data Mining and Machine Learning Lab
Lab report: 05
Topic: Feature Selection.

Submitted To:
Name: Sadman Sadik Khan
Designation: Lecturer
Department: CSE
Daffodil International University

Submitted By:
Name: Fardus Alam
ID: 222-15-6167
Section: 62-G
Department: CSE
Daffodil International University

Submission Date: 15-03-2025

Code: Dividing the dataset into input and target

x = df2.drop('stroke', axis=1)  # independent variables: every column except the target

y = df2['stroke']  # dependent variable: the target class

Explanation:
Here I divide the whole dataset into two parts: the independent variables (x) and the dependent variable, i.e. the target class (y).
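As a minimal sketch of the split, the snippet below applies the same two lines to a small hypothetical DataFrame. The column names and values are made up for illustration and are not the lab's df2. It also shows that drop() returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical stand-in for df2 (illustrative values only)
df2 = pd.DataFrame({
    "age": [67, 80, 49, 30],
    "hypertension": [0, 1, 0, 0],
    "avg_glucose_level": [228.7, 105.9, 171.2, 95.1],
    "stroke": [1, 1, 1, 0],
})

x = df2.drop('stroke', axis=1)  # features: every column except the target
y = df2['stroke']               # target: a single Series

print(list(x.columns))  # ['age', 'hypertension', 'avg_glucose_level']
print(y.tolist())       # [1, 1, 1, 0]
```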

Code: Feature Selection using SelectKBest and the ANOVA F-test

import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Score every feature with the ANOVA F-test; k='all' keeps all columns because only the scores are used
fit_features = SelectKBest(score_func=f_classif, k='all')
fit_features.fit(x, y)

# Collect the F-scores in a DataFrame indexed by feature name
fs = pd.DataFrame(fit_features.scores_, index=x.columns,
                  columns=['score values'])

# Show the 7 features with the highest F-scores
fs.nlargest(7, 'score values')

Output:
Explanation:
This code performs feature selection with SelectKBest to identify the most important features for predicting stroke.
Steps:
1. SelectKBest with f_classif: Scores each feature with the ANOVA F-value between that feature and the target class. A higher score means the class means of that feature are further apart relative to the spread within each class.
2. x = df2.drop('stroke', axis=1): Drops the target column (stroke) and stores the remaining features in x.
3. y = df2['stroke']: Stores the target column (stroke) in y.
4. fit_features.fit(x, y): Fits SelectKBest on the data, which computes a score for every feature.
5. pd.DataFrame(fit_features.scores_, ...): Creates a DataFrame of the feature scores, indexed by feature name.
6. fs.nlargest(7, 'score values'): Selects the 7 features with the highest scores.
Purpose:
- Identifies the most relevant features for predicting stroke.
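To show what f_classif actually scores, the sketch below recomputes the one-way ANOVA F-statistic of each feature by hand, F = [SS_between / (k - 1)] / [SS_within / (n - k)] for n samples and k classes, and compares it with scikit-learn's result. The helper anova_f and the toy data are hypothetical, written only for this illustration.

```python
import numpy as np
from sklearn.feature_selection import f_classif

def anova_f(feature: np.ndarray, labels: np.ndarray) -> float:
    """One-way ANOVA F-statistic of a single feature across the classes in labels."""
    classes = np.unique(labels)
    n, k = len(feature), len(classes)
    grand_mean = feature.mean()
    # Between-class sum of squares: distance of each class mean from the overall mean
    ss_between = sum(
        (labels == c).sum() * (feature[labels == c].mean() - grand_mean) ** 2
        for c in classes
    )
    # Within-class sum of squares: spread of samples around their own class mean
    ss_within = sum(
        ((feature[labels == c] - feature[labels == c].mean()) ** 2).sum()
        for c in classes
    )
    return float((ss_between / (k - 1)) / (ss_within / (n - k)))

# Hypothetical toy data: column 0 differs between classes, column 1 does not
X = np.array([[67, 0], [80, 1], [49, 0], [79, 1],
              [30, 0], [25, 1], [45, 0], [58, 1]], dtype=float)
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = stroke, 0 = no stroke

manual = np.array([anova_f(X[:, j], y) for j in range(X.shape[1])])
sk_scores, p_values = f_classif(X, y)
print(manual)     # hand-computed F-scores
print(sk_scores)  # scikit-learn's F-scores (should match)
```

Column 1 has the same mean in both classes, so its F-score is 0 and SelectKBest would rank it last.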

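Code: Visualizing the top 7 feature scores

import matplotlib.pyplot as plt

# Horizontal bar chart of the 7 highest ANOVA F-scores
fs.nlargest(7, 'score values').plot(kind="barh",
                                     figsize=(10, 5), color='y', edgecolor='black')
plt.title('Top 7 Features by Score Value', fontsize=16)
plt.xlabel('Score Value', fontsize=12)
plt.ylabel('Feature', fontsize=12)
plt.grid(True, axis='x')  # vertical grid lines make the scores easier to compare
plt.tight_layout()
plt.show()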

Output:
Explanation:
This code generates a horizontal bar plot to visualize the top 7 features based on their score values from
the SelectKBest feature selection.

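Code: Correlation

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(12, 8))
# Pairwise correlations of all columns, with the value written in each cell
sns.heatmap(df2.corr(), annot=True, cmap="coolwarm", linewidths=1)
plt.title("Feature Correlation Matrix")
plt.show()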

Output:

Explanation:
This code generates a correlation heatmap to visualize the relationships between the features in df2.
1. plt.figure(figsize=(12, 8)): Sets the figure size to 12x8 inches.
2. sns.heatmap(df2.corr(), annot=True, cmap="coolwarm", linewidths=1):
- df2.corr() computes the pairwise Pearson correlation matrix of the columns of df2.
- annot=True writes the correlation value inside each cell.
- cmap="coolwarm" sets the color map (cool colors for negative, warm colors for positive correlations).
- linewidths=1 draws separation lines between cells.
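df2.corr() uses the Pearson coefficient by default. As a minimal sketch, the hypothetical helper pearson below computes it directly (covariance divided by the product of the standard deviations) on made-up data and checks it against pandas. It also shows that every column correlates perfectly (1.0) with itself, which is why the diagonal of the heatmap is always 1.

```python
import numpy as np
import pandas as pd

def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation: covariance of a and b divided by the product of their spreads."""
    a_c, b_c = a - a.mean(), b - b.mean()
    return float((a_c * b_c).sum() / np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum()))

# Hypothetical toy data (not the lab's df2)
toy = pd.DataFrame({
    "age":          [67, 80, 49, 79, 30, 25, 45, 58],
    "hypertension": [0, 1, 0, 1, 0, 0, 1, 0],
    "stroke":       [1, 1, 1, 1, 0, 0, 0, 0],
})

corr = toy.corr()  # Pearson is the default method
manual = pearson(toy["age"].to_numpy(dtype=float), toy["stroke"].to_numpy(dtype=float))
print(manual, corr.loc["age", "stroke"])  # the two values agree
print(np.diag(corr.to_numpy()))           # every column correlates 1.0 with itself
```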

Code: Top 8 features by correlation with the target

corr_matrix = df2.corr()

# Correlation of every column with the target; drop 'stroke' itself (its self-correlation is always 1.0)
corr_with_target = corr_matrix['stroke'].drop('stroke').abs()

# Keep the 8 features with the strongest (absolute) correlation
top_8 = corr_with_target.sort_values(ascending=False).head(8)
print(f"Top 8 Features based on Correlation with Target: \n{top_8}")

Output:

Explanation:
This code calculates and displays the 8 features with the highest absolute correlation to the target variable stroke.
1. df2.corr(): Generates the Pearson correlation matrix for all columns in df2.
2. corr_matrix['stroke']: Extracts the correlation of every column with stroke.
3. .drop('stroke'): Removes the correlation of stroke with itself, which is always 1.0 and would otherwise occupy the first place in the ranking.
4. .abs(): Converts correlations to absolute values to focus on the strength of the relationships, ignoring the direction.
5. .sort_values(ascending=False).head(8): Sorts the correlation values in descending order and selects the 8 features with the strongest correlation.
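The same ranking can be wrapped in a small reusable function. This is a sketch: top_k_correlated and the toy data are hypothetical, and it assumes every column is numeric so that DataFrame.corr() can include it.

```python
import pandas as pd

def top_k_correlated(df: pd.DataFrame, target: str, k: int) -> pd.Series:
    """Return the k columns with the largest absolute Pearson correlation to target."""
    corr_with_target = df.corr()[target].drop(target).abs()  # drop the target's self-correlation (1.0)
    return corr_with_target.sort_values(ascending=False).head(k)

# Hypothetical toy data (not the lab's df2)
toy = pd.DataFrame({
    "age":          [67, 80, 49, 79, 30, 25, 45, 58],
    "hypertension": [0, 1, 0, 1, 0, 0, 1, 0],
    "bmi":          [36.6, 32.5, 34.4, 24.0, 22.1, 28.9, 30.2, 27.0],
    "stroke":       [1, 1, 1, 1, 0, 0, 0, 0],
})

print(top_k_correlated(toy, "stroke", k=2))
```

With the lab's data the equivalent call would be top_k_correlated(df2, 'stroke', k=8).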
