Data Science Lecture No 01
AI-7th, SEN-5th
12/15/2024
Course Objective
Objective 1:
To understand the complete process of data science.
Objective 2:
To gain a deep understanding of classification algorithms, regression methods,
feature extraction, and segmentation techniques.
Objective 3:
To understand data analysis methods, various evaluation measures, and ensemble-based methods.
https://covid19.uthm.edu.my/wp-content/uploads/2020/04/Data-Science-from-Scratch-First-Principles-with-Python-by-Joel-Grus-z-lib.org_.epub_.pdf
Course Outline
Data Science Complete Process
Data Collection
Data Preprocessing Process
Data Preprocessing Techniques
Regression
Types of Regression
Regression Techniques
Regression Analysis
Feature Engineering
Feature Extraction Techniques
Feature Selection Techniques
Course Outline
Learning Paradigms
Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
Classification Algorithms
Common Classification Algorithms
Ensemble-Based Algorithms
Evaluation Measures
Metrics to Evaluate Model Performance
Course Outline
Clustering Techniques
Segmentation Techniques
Dimensionality Reduction Techniques
Data Analysis Techniques
Reinforcement Learning
Natural Language Processing
Data Imbalance
Techniques to Handle Imbalanced Data
Deep Learning
Course Weight Breakdown
Assessment Instruments with Weights
(Homework, Quizzes, Midterms, Final, Assignments, etc.)
Theory
o Quizzes 10%
o Assignment 10%
o Midterm 30%
o Presentation/Other Activities 10%
o Final 40%
Course book
Text & Reference Books
The focus will be on topics rather than on the chapters of any one text
Many textbooks are available on the market
o Data Science from Scratch by Joel Grus
Copyright © 2019 Joel Grus. All rights
reserved. Printed in the United States of
America.
https://covid19.uthm.edu.my/wp-content/uploads/2020/04/Data-Science-from-Scratch-First-Principles-with-Python-by-Joel-Grus-z-lib.org_.epub_.pdf
Semester Project
Students will work in pairs. Each pair will choose a real-world dataset and
find insights in it by applying AI techniques (ML, DL, etc.)
Datasets can be obtained from reputable sources such as Kaggle,
the IBM data center, the Google data center, and many more
The first assignment is a description of your data and its parameters,
why you chose this dataset, and which technique you are
going to apply
Introduction to Data Science
Lecture Contents
Course introduction
Introduction to Data Science
List of best jobs
Data Science and Artificial Intelligence
Data science is not machine learning
Data science is not statistics
Data science is not big data
Data Science as a UNIFIER
Applications of data science
“Data! Data! Data!” he cried impatiently. “I can’t make bricks without
clay.” —Arthur Conan Doyle
Introduction to Data Science
Data Science
Data science is the application of computational and statistical techniques to
address or gain insight into some problem in the real world
Data science = statistics +
data processing +
machine learning +
scientific inquiry +
visualization +
business analytics +
big data + …
Data Scientists
https://www.ranker.com/crowdranked-list/best-jobs-in-the-world
https://www.panelplace.com/blogs/top-8-coolest-jobs-world
Data Scientists
Data Scientist
The hottest job of the 21st century
They find stories, extract knowledge
They are not reporters
Lists of Best Jobs
https://www.glassdoor.com/List/Best-Jobs-in-America-LST_KQ0,20.htm
https://money.usnews.com/careers/best-jobs/rankings/the-100-best-jobs
Data science and AI
Data science is not machine learning
Machine learning involves computation and statistics, but has not
(traditionally) been very concerned with answering scientific
questions
Machine learning has a heavy focus on fancy algorithms
Data science is not statistics
“Analyzing data computationally,
to understand some phenomenon
in the real world, you say? …
that sounds an awful lot like statistics”
Not many statistics courses have a lecture
on, e.g., web scraping, or on data
processing more generally
Plus, statisticians use R, while data scientists
use Python ... clearly these are completely different fields
https://www.linkedin.com/pulse/data-science-statistics-paulo-rios-jr-
Data science is not big data
Sometimes, in order to truly understand and
answer your question, you need massive
amounts of data
But sometimes you don’t
Don’t create more work for yourself than
you need to
DATA SCIENCE APPLICATIONS
Fraud detection
Investigate fraud patterns in past data
Early detection is important
o Before damage propagates
o Harder than late detection
Precision is important
o False positives and false negatives are both bad
Real-time analytics
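The point about false positives and false negatives can be made concrete with a small sketch. The counts below are made-up numbers for illustration only, not results from any real fraud system, and the helper name `precision_recall` is hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts.

    tp: fraud cases correctly flagged
    fp: legitimate transactions wrongly flagged (false positives)
    fn: fraud cases that were missed (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for a fraud detector (assumed, not from the lecture).
tp, fp, fn = 80, 20, 40
precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Precision falls as false positives grow, while recall falls as false negatives grow, which is why both kinds of error matter in fraud detection.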
Cont.
Recommender systems
The ability to offer unique personalized service
Increase sales, click-through rates, conversions, …
Netflix recommender system valued at $1B per year
Amazon recommender system drives a 20-35% lift in sales annually
Collaborative filtering at scale
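As a rough illustration of collaborative filtering, here is a minimal user-based sketch in plain Python. The ratings, user names, and the function names `cosine` and `recommend` are all hypothetical, and production systems such as those named above use far more elaborate methods than this.

```python
import math

# Hypothetical ratings (user -> {item: rating}); made-up data for illustration.
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "D": 4},
    "carol": {"A": 1, "C": 5, "D": 2},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(user, ratings):
    """Predict ratings for items the user has not rated.

    Each prediction is the similarity-weighted average of the ratings
    given by the other users. Returns (predicted_rating, item) pairs,
    highest prediction first.
    """
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    predictions = [
        (scores[item] / weights[item], item)
        for item in scores
        if weights[item] > 0
    ]
    return sorted(predictions, reverse=True)


recs = recommend("alice", ratings)
print(recs)
```

Here the only item "alice" has not rated is "D", and its predicted rating is pulled toward the rating given by "bob", whose tastes are closer to hers than "carol"'s.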
Cont.
Predicting why patients are being readmitted
Reduce costs
Improve population health
Find the “why” behind specific populations being readmitted
Data lakes of multiple data sources
Investigate ties between readmission and socioeconomic data points, patient history, and genetics
Cont.
Moneyball
How to build a baseball team on a very
low budget by relying on data
Sabermetrics: the statistical analysis of baseball
data to objectively evaluate performance
The Oakland Athletics' 2002 record of 103-59 was joint best in MLB
o Team salary budget: $40 million
Other team: Yankees
o Team salary budget: $120 million
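The budget comparison can be turned into a quick back-of-the-envelope calculation. The sketch below uses only the figures on this slide. Because the record is described as joint best, it assumes both teams won 103 games, and the dictionary keys are hypothetical labels.

```python
# Figures taken from the slide above.
wins, losses = 103, 59
payroll = {
    "moneyball_team": 40_000_000,  # low-budget team
    "yankees": 120_000_000,        # assumed to have 103 wins too ("joint best")
}

# Share of games won over the season.
win_pct = wins / (wins + losses)

# Salary budget spent per win for each team.
cost_per_win = {team: budget / wins for team, budget in payroll.items()}

print(f"win percentage: {win_pct:.3f}")
for team, cost in cost_per_win.items():
    print(f"{team}: ${cost:,.0f} per win")
```

Under these assumptions the low-budget team paid one third as much per win, which is the core of the Moneyball argument.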
References
https://dsg.uwaterloo.ca/CDSW/slides/Data%20Science%20Presentation.pdf
https://github.com/MicrosoftLearning/Data-Science-Essentials/blob/master/Slides/Module%201%20-%20Intro%20to%20Data%20Science.pptx
Data Science, Computer Science and Mathematical Sciences, College of Engineering, Tennessee State University
Upcoming lecture
IDE for data science
Exploration of data
Data and types
Thank You!