SPA Group 13 - Assignment 2 Problem Statement

Uploaded by

2023dc04090


Objective of Assignment:

• To apply a Machine Learning model to the given dataset.

• To prepare a Jupyter notebook or Google Colab notebook to build, train, and evaluate Machine
Learning models using MLlib with PySpark DataFrames on Databricks for the given dataset.
• To provide appropriate analysis, run predictions on the test data, and display the results
for inference.

Please read the instructions carefully.

Dataset - https://www.kaggle.com/datasets/hosammhmdali/heart-disease-dataset

1. Import Libraries/Dataset
a. Download the dataset
b. Import the required libraries

2. Data Visualization and Exploration

a. Print at least 5 rows as a sanity check to identify all the features present in the dataset
and to check whether the target matches them.
b. Print the description and shape of the dataset.
c. Provide appropriate visualization to get an insight about the dataset.
d. Try exploring the data and see what insights can be drawn from the dataset.

3. Data Pre-processing and cleaning

a. Preprocess the data appropriately: identify NULL or missing values (if any), handle
outliers (if present in the dataset), address skewed data, etc. Apply appropriate
feature engineering techniques for them.
b. Apply the feature transformation techniques like Standardization, Normalization, etc.
You are free to apply the appropriate transformations depending upon the structure and
the complexity of your dataset.
c. Perform a correlation analysis on the dataset. Provide a visualization for the same.
4. Data Preparation
a. Do the final feature selection and extract the selected features into Column X and the
class label into Column Y.
b. Split the dataset into training and test sets.
5. Model Building
a. Perform model development using at least three models, separately. You are free to
apply any Machine Learning models to the dataset using MLlib with PySpark. Deep
Learning models are strictly not allowed.
b. Train the model and print the training accuracy and loss values.
6. Performance Evaluation
a. Print the confusion matrix and provide an appropriate analysis of it.
b. Run predictions on the test data and display the results for inference.
Instructions for Assignment Evaluation
1. Since this is a group assignment, only one ZIP file needs to be uploaded to Canvas. It
must contain two files: the HTML export and the .ipynb notebook.
2. Please follow the naming convention <Group no>_<Dataset name>.ipynb and
<Group no>_<Dataset name>.html
E.g., for group 1 with a weather dataset, your notebooks should be named
Group01_WeatherDataset.ipynb and Group01_WeatherDataset.html
3. Inside each Jupyter notebook, you are required to mention your name, group details, and the
assignment dataset you will be working on.
4. Organize your code in separate sections for each task. Add comments to make the code
readable.
5. Deep Learning models are strictly not allowed. You are encouraged to learn classical Machine
Learning techniques and experience their behaviour.
6. Notebooks without output shall not be considered for evaluation.
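
The naming convention in instruction 2 can be checked with a small helper such as the sketch below. The function name `submission_names` is hypothetical, and the two-digit zero padding and word capitalisation are assumptions inferred from the Group01_WeatherDataset example.

```python
from typing import Tuple


def submission_names(group_no: int, dataset_name: str) -> Tuple[str, str]:
    """Return the (.ipynb, .html) file names for a group submission."""
    # Assumption: spaces are dropped and each word starts with a capital letter.
    compact = "".join(word[:1].upper() + word[1:] for word in dataset_name.split())
    # Assumption: the group number is zero-padded to two digits, as in "Group01".
    stem = f"Group{group_no:02d}_{compact}"
    return f"{stem}.ipynb", f"{stem}.html"


print(submission_names(1, "weather dataset"))
```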
Mark Allocation - 10 Marks
1. Import Libraries/Dataset - 1 mark
2. Data Visualization and Exploration - 2 marks
3. Data Pre-processing and cleaning - 2 marks
4. Data Preparation - 2 marks
5. Model Building - 2 marks
6. Performance Evaluation - 1 mark

Reference:
https://docs.databricks.com/getting-started/dataframes-python.html
https://www.kaggle.com/code/towhidultonmoy/end-to-end-pyspark-project
https://www.kaggle.com/code/tientd95/advanced-pyspark-for-exploratory-data-analysis
