
Data Science Lecture No 01

Uploaded by abdul baqi

Lecture No. 01

AI-7th, SEN-5th

Course: Introduction to Data Science


Instructor: Dr. Maryum Nisar

12/15/2024 1
Course Objectives
Objective 1:
• To understand the complete data science process.
Objective 2:
• To gain a deep understanding of classification algorithms, regression methods, feature extraction, and segmentation techniques.
Objective 3:
• To understand data analysis methods, various evaluation measures, and ensemble-based methods.

https://covid19.uthm.edu.my/wp-content/uploads/2020/04/Data-Science-from-Scratch-First-Principles-with-Python-by-Joel-Grus-z-lib.org_.epub_.pdf
Course Outline
Data Science Complete Process
• Data Collection
• Data Preprocessing Process
• Data Preprocessing Techniques
Regression
• Types of Regression
• Regression Techniques
• Regression Analysis
Feature Engineering
• Feature Extraction Techniques
• Feature Selection Techniques
Course Outline
Learning Paradigms
• Supervised Learning
• Unsupervised Learning
• Semi-Supervised Learning
Classification Algorithms
• Common Classification Algorithms
• Ensemble-Based Algorithms
Evaluation Measures
• Metrics to Evaluate Model Performance

Course Outline
Clustering Techniques
• Segmentation Techniques
Dimensionality Reduction Techniques
Data Analysis Techniques
Reinforcement Learning
Natural Language Processing
Data Imbalance
Techniques to Handle Imbalanced Data
Deep Learning
Course Weight Breakdown
Assessment Instruments with Weights (Homework, Quizzes, Midterms, Final, Assignments, etc.)
• Theory
o Quizzes 10%
o Assignment 10%
o Midterm 30%
o Presentation/Other Activities 10%
o Final 40%
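The weights above combine into a single course mark as a weighted average. A minimal sketch, assuming every component is scored out of 100; the dictionary keys and the sample scores are hypothetical names chosen for illustration, not part of the course policy.

```python
# Weights from the Course Weight Breakdown slide (they sum to 100%).
# The key names are hypothetical labels for the assessment instruments.
WEIGHTS = {
    "quizzes": 0.10,
    "assignment": 0.10,
    "midterm": 0.30,
    "presentation_other": 0.10,
    "final": 0.40,
}


def final_mark(scores):
    """Return the weighted course mark on a 0-100 scale.

    Assumes each component score in `scores` is itself on a 0-100 scale.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Hypothetical student, each component scored out of 100.
student = {
    "quizzes": 80,
    "assignment": 90,
    "midterm": 70,
    "presentation_other": 100,
    "final": 75,
}
print(round(final_mark(student), 2))
```

Raising an error for a missing component, rather than silently treating it as zero, makes an incomplete record visible instead of quietly lowering the mark.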

Course book
Text & Reference Books
• The focus will be on topics rather than on the chapters of any single text.
• Many textbooks are available on the market:
o Data Science from Scratch by Joel Grus (Copyright © 2019 Joel Grus. All rights reserved. Printed in the United States of America.)

Semester Project
Students will work in pairs. Each pair will choose a real-world dataset and derive insights from it by applying AI techniques (ML, DL, etc.).
Datasets can be obtained from genuine data sources such as Kaggle, the IBM data center, the Google data center, and many more.
The first assignment will be a description of your dataset: its parameters, why you chose it, and which techniques you plan to apply.
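A first pass at describing a dataset (its columns, their types, and how many values are missing) can be done with Python's standard library alone. A minimal sketch, assuming the data arrives as CSV text; `describe_csv` and the sample rows are hypothetical, and libraries such as pandas (`DataFrame.describe()`) offer richer summaries.

```python
import csv
import io
import statistics


def describe_csv(text):
    """Summarize each column of a CSV table given as a string.

    For each column: how many values are present or missing, whether the
    present values all parse as numbers, and their mean if they do.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    summary = {}
    for column in reader.fieldnames or []:
        # Empty strings (and short rows, which DictReader fills with None) count as missing.
        values = [row[column] for row in rows if row[column] not in ("", None)]
        info = {"present": len(values), "missing": len(rows) - len(values)}
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            numbers = None  # at least one value is not numeric
        if not values:
            info["type"] = "empty"
        elif numbers is not None:
            info["type"] = "numeric"
            info["mean"] = statistics.mean(numbers)
        else:
            info["type"] = "text"
        summary[column] = info
    return summary


# Hypothetical sample: three records, one with a missing income.
SAMPLE = """age,city,income
34,Lahore,52000
29,Karachi,
41,Islamabad,61000
"""

for name, info in describe_csv(SAMPLE).items():
    print(name, info)
```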

Introduction to Data Science

Lecture Contents
Course introduction
• Introduction to Data Science
• Lists of best jobs
• Data Science and Artificial Intelligence
• Data science is not machine learning
• Data science is not statistics
• Data science is not big data
• Data Science as a UNIFIER
• Applications of data science

“Data! Data! Data!” he cried impatiently. “I can’t make bricks without
clay.” —Arthur Conan Doyle

Introduction to Data Science
Data Science
• Data science is the application of computational and statistical techniques to address or gain insight into some problem in the real world.
• Data science = statistics + data processing + machine learning + scientific inquiry + visualization + business analytics + big data + …

Data Scientists

https://www.ranker.com/crowdranked-list/best-jobs-in-the-world
https://www.panelplace.com/blogs/top-8-coolest-jobs-world
Data Scientists

• Data scientist
• The hottest job of the 21st century
• They find stories and extract knowledge
• They are not reporters

Lists of Best Jobs

https://www.ranker.com/crowdranked-list/best-jobs-in-the-world
https://www.panelplace.com/blogs/top-8-coolest-jobs-world
Lists of Best Jobs

https://www.glassdoor.com/List/Best-Jobs-in-America-LST_KQ0,20.htm
https://money.usnews.com/careers/best-jobs/rankings/the-100-best-jobs
Data science and AI

“Data science produces insights. Machine learning produces predictions.”
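To make the slogan concrete, one fitted model can serve both purposes: read its parameters and you get an insight about the observed data; apply it to an unseen input and you get a prediction. A minimal sketch with hypothetical data, using ordinary least squares written out by hand so that it needs nothing beyond the standard library.

```python
from statistics import mean

# Hypothetical data: hours studied and exam score for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 74]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(hours, scores)

# Insight: a statement about the data we already have.
print(f"Each extra hour is associated with about {slope:.1f} more points")

# Prediction: a number for a case we have not observed.
print(f"Predicted score after 7 hours: {slope * 7 + intercept:.1f}")
```

The slope describes an association in this sample, not a causal effect; turning it into a causal claim would need the kind of scientific inquiry the lecture lists as part of data science.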

Data science is not machine learning
Machine learning involves computation and statistics, but has not
(traditionally) been very concerned about answering scientific
questions
Machine learning has a heavy focus on fancy algorithms

Data science is not statistics
“Analyzing data computationally,
to understand some phenomenon
in the real world, you say? …
that sounds an awful lot like statistics”
Not many statistics courses have a lecture
on e.g. web scraping, or a lot of data
processing more generally
Plus, statisticians use R, while data scientists
use Python ... clearly these are completely different fields
https://www.linkedin.com/pulse/data-science-statistics-paulo-rios-jr-
Data science is not big data
Sometimes, in order to truly understand and
answer your question, you need massive
amounts of data
But sometimes you don’t
Don’t create more work for yourself than
you need to

Gendered language in professor reviews. Image source: http://benschmidt.org/profGender
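When the question is a simple rate or average, a random sample often answers it about as well as the full dataset does. A minimal sketch with simulated (hypothetical) data, using the standard normal-approximation margin of error for a proportion, 1.96 * sqrt(p(1 - p) / n).

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical "full" dataset: one million yes/no outcomes with roughly a 30% rate.
population = [1 if random.random() < 0.30 else 0 for _ in range(1_000_000)]
true_rate = sum(population) / len(population)

# A simple random sample of 2,000 records.
sample = random.sample(population, 2_000)
estimate = sum(sample) / len(sample)

# Approximate 95% margin of error for a proportion.
margin = 1.96 * (estimate * (1 - estimate) / len(sample)) ** 0.5

print(f"full data: {true_rate:.3f}  sample: {estimate:.3f} +/- {margin:.3f}")
```

Sampling only helps when the question really is about a typical value; rare events or fine-grained subgroups may still need the full data.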
DATA SCIENCE AS A UNIFIER

DATA SCIENCE APPLICATIONS
• Fraud detection
• Investigate fraud patterns in past data
• Early detection is important
o Before damage propagates
o Harder than late detection
• Precision is important
o False positives and false negatives are both bad
• Real-time analytics
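The cost of both error types can be measured directly: precision falls as false positives rise (legitimate transactions flagged as fraud), and recall falls as false negatives rise (fraud that slips through). A minimal sketch; the labels and model flags below are hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for a binary fraud label (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed fraud
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical ground truth for ten transactions and a model's flags.
actual = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
flagged = [0, 1, 1, 0, 0, 0, 0, 1, 0, 0]
print(precision_recall(actual, flagged))
```

Reporting the two numbers separately keeps the trade-off visible; a single accuracy figure would hide it, because fraud is usually rare.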

Cont.
• Recommender systems
• The ability to offer unique, personalized service
• Increase sales, click-through rates, conversions, …
• Netflix recommender system valued at $1B per year
• Amazon recommender system drives a 20-35% lift in sales annually
• Collaborative filtering at scale
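Collaborative filtering predicts a user's interest in an item from the ratings of similar users. A minimal user-based sketch, assuming a tiny in-memory ratings table (the users, items, and ratings are hypothetical) and cosine similarity computed over co-rated items only. At real scale, item-based variants or matrix factorization are common choices; this is only the core idea.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data on a 1-5 scale.
ratings = {
    "ali": {"A": 5, "B": 3, "C": 4},
    "sara": {"A": 4, "B": 3, "C": 5, "D": 4},
    "omar": {"A": 1, "B": 5, "D": 2},
}


def cosine(u, v):
    """Cosine similarity between two rating dicts, over items both have rated."""
    shared = u.keys() & v.keys()
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None


# Ali has not rated item D; estimate it from Sara and Omar.
print(predict("ali", "D"))
```

Sara's tastes resemble Ali's more closely than Omar's do, so her rating of D pulls the estimate harder.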

Cont.
Predicting why patients are being readmitted
• Reduce costs
• Improve population health
• Find the “why” behind specific populations being readmitted
• Data lakes of multiple data sources
• Investigate ties between readmission and socioeconomic data points, patient history, and genetics
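A common first step in investigating ties between readmission and socioeconomic factors is comparing readmission rates across groups. A minimal sketch; the records, the income-band grouping, and the 30-day window are all hypothetical. A gap between groups is a lead for further analysis, not evidence of a cause on its own.

```python
from collections import defaultdict

# Hypothetical discharge records: (income_band, readmitted_within_30_days).
records = [
    ("low", True), ("low", False), ("low", True), ("low", False),
    ("middle", False), ("middle", True), ("middle", False),
    ("high", False), ("high", False), ("high", True), ("high", False),
]


def readmission_rate_by(records):
    """Return {group: fraction readmitted} for (group, readmitted) pairs."""
    totals = defaultdict(int)
    readmits = defaultdict(int)
    for group, readmitted in records:
        totals[group] += 1
        readmits[group] += readmitted  # True counts as 1, False as 0
    return {group: readmits[group] / totals[group] for group in totals}


for group, rate in readmission_rate_by(records).items():
    print(f"{group:>6}: {rate:.0%}")
```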

Cont.
Moneyball
• How to build a baseball team on a very low budget by relying on data
• Sabermetrics: the statistical analysis of baseball data to objectively evaluate performance
• The 2002 Oakland Athletics' record of 103-59 was the joint best in MLB
o Team salary budget: $40 million
• Comparison team: New York Yankees
o Team salary budget: $120 million
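Sabermetrics reduces performance to simple ratios. A minimal sketch: the record and budget come from the slide, while the player's season line is hypothetical. On-base percentage (OBP), a statistic closely associated with the Moneyball approach, uses its standard definition, shown in the docstring.

```python
def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF).

    h: hits, bb: walks, hbp: hit by pitch, ab: at-bats, sf: sacrifice flies.
    """
    return (h + bb + hbp) / (ab + bb + hbp + sf)


# Figures from the slide: 2002 record and team salary budget.
wins, losses = 103, 59
budget = 40_000_000

win_pct = wins / (wins + losses)
cost_per_win = budget / wins
print(f"Win percentage: {win_pct:.3f}")
print(f"Salary budget per win: ${cost_per_win:,.0f}")

# Hypothetical season line for one player.
print(f"OBP: {on_base_percentage(h=150, bb=80, hbp=5, ab=520, sf=5):.3f}")
```

Cost per win is the kind of ratio that let a low-budget team compare itself with a far richer one on equal terms.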

References
https://dsg.uwaterloo.ca/CDSW/slides/Data%20Science%20Presentation.pdf
https://github.com/MicrosoftLearning/Data-Science-Essentials/blob/master/Slides/Module%201%20-%20Intro%20to%20Data%20Science.pptx
Data Science, Computer Science and Mathematical Sciences, College of Engineering, Tennessee State University

Upcoming lecture
IDEs for data science
Exploration of data
Data and its types

Thank You!
