DATA SCIENCE, MACHINE LEARNING & AI
BY UMANG KEJRIWAL
DATA SCIENCE
© UMANG KEJRIWAL
What is Data Science?
Also known as “Data Driven Science”, it makes use of scientific methods, processes, algorithms
and systems to extract knowledge or insights from raw data, with the goal of discovering hidden
patterns.
In other words, it is about finding and exploring data in the real world and then using that knowledge
to solve business problems.
Data Science is applied by the Data Scientist, whose primary role is to design and create processes for
complex as well as large-scale datasets.
The Data Scientist is involved in processing, cleaning and verifying the integrity of the data for analysis,
and also builds predictive models using Machine Learning algorithms.
Examples
Customer Prediction
A system can be trained on customer behaviour patterns to predict the likelihood of a customer
buying a product.
Service Planning
Restaurants can predict how many customers will visit on a weekend and plan their food
inventory to handle the demand.
Programming Languages for Data Science
Data Life Cycle
1 Discovery
2 Data Preparation
3 Model Planning
4 Model Building
5 Operationalize
6 Communicate Results
DATA SCIENTIST: ONE STOP SOLUTION OF ALL THESE OPERATIONS
Why Python for Data Science?
• Powerful libraries for Machine Learning applications & other scientific computations
• Simple & Easy to learn
• Scripting Language as well
• Fit for many platforms
• Perform Data Manipulation, analysis and visualization
• High Level and Interpreted language
Data Manipulation
It helps us to extract, filter, and transform our data quickly and efficiently.
NumPy Pandas
NumPy
A Python package whose name stands for "Numerical Python".
It is used for scientific computing and provides an N-dimensional array object.
Installation – “conda install numpy” | “pip install numpy”
“import numpy”
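A minimal sketch of the array object in use. The alias np is a common convention, and the sample values are made up for illustration.

```python
import numpy as np  # "np" is the conventional alias, not a requirement

# Build a 1-D array from a Python list (sample values are made up)
prices = np.array([10.0, 20.0, 30.0, 40.0])

# Arithmetic applies to every element at once, with no explicit loop
discounted = prices * 0.5

# Aggregations are available as methods on the array object
mean_price = prices.mean()

# The same data can be viewed as a 2 x 2 matrix
grid = prices.reshape(2, 2)

print(discounted)
print(mean_price)
print(grid.shape)
```

Element-wise operations like these are the main reason to use an array instead of a plain list.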
Pandas
A Python package used for data manipulation and analysis.
It handles matrix data, tabular data, statistical data
and many other kinds of heterogeneous data.
Installation - "conda install pandas" | "pip install pandas"
"import pandas"
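A minimal sketch of the three data manipulation steps (extract, filter, transform) with Pandas. The table, its column names and its values are hypothetical.

```python
import pandas as pd  # "pd" is the conventional alias

# Hypothetical sales table; column names and values are made up
sales = pd.DataFrame({
    "product": ["pen", "book", "bag", "pen"],
    "units": [10, 3, 1, 5],
    "unit_price": [1.5, 12.0, 30.0, 1.5],
})

# Extract: pick a single column
units = sales["units"]

# Filter: keep only rows with at least 5 units sold
big_orders = sales[sales["units"] >= 5]

# Transform: add a derived column, then aggregate it per product
sales["revenue"] = sales["units"] * sales["unit_price"]
revenue_by_product = sales.groupby("product")["revenue"].sum()

print(big_orders)
print(revenue_by_product)
```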
MACHINE LEARNING
What is Machine Learning?
It is a type of Artificial Intelligence that allows software applications to learn from data and
become more accurate in predicting outcomes without human intervention.
In other words, it allows software applications to learn on their own by following a set of instructions.
The basic idea behind Machine Learning is to mimic the way our brain works.
Machine Learning - Flow
Training Data → Learn Algorithm → Build Model → Perform → Feedback
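The flow can be sketched in plain Python as below. The "model" here is a deliberately simple threshold rule chosen for illustration, and the data and function names are hypothetical.

```python
# Training data: (hours studied, passed exam?) pairs, made up for illustration
training_data = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]

def learn_threshold(data):
    """Learn step: place the threshold midway between the two class means."""
    zeros = [x for x, y in data if y == 0]
    ones = [x for x, y in data if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def predict(threshold, x):
    """Perform step: apply the built model to a new input."""
    return 1 if x >= threshold else 0

# Build the model from the training data
threshold = learn_threshold(training_data)

# Perform on unseen data and measure how well the model does
test_data = [(2.5, 0), (5, 1), (9, 1)]
correct = sum(predict(threshold, x) == y for x, y in test_data)
accuracy = correct / len(test_data)

# Feedback: fold the newly labelled examples back in and retrain
threshold_retrained = learn_threshold(training_data + test_data)

print(threshold, accuracy, threshold_retrained)
```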
Machine Learning - Types
1 SUPERVISED LEARNING
2 UNSUPERVISED LEARNING
3 REINFORCEMENT LEARNING
SUPERVISED LEARNING
Here we have an input variable (X) and an output variable (Y), and we use an algorithm to learn the
mapping function from the input to the output.
Y = f(X)
LINEAR REGRESSION
LOGISTIC REGRESSION
DECISION TREE
RANDOM FOREST
NAIVE BAYES CLASSIFIER
KNN
SUPERVISED LEARNING - ALGORITHMS
LINEAR REGRESSION
Used to estimate real (continuous) values such as the cost of a house, a salary or the number of sales in a company.
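A minimal sketch of fitting a straight line by ordinary least squares, written in plain Python. The house data is made up, and fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: house area (square metres) and price (thousands)
areas = [50, 60, 80, 100]
prices = [150, 180, 240, 300]

slope, intercept = fit_line(areas, prices)

# Estimate the price of a house that was not in the data
predicted_price = slope * 70 + intercept
print(slope, intercept, predicted_price)
```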
LOGISTIC REGRESSION
Used to estimate discrete values, i.e. binary values such as 0 or 1, Yes/No, True/False. Examples are spam detection and
predicting whether a person will buy a certain product or not.
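A minimal sketch of logistic regression with one input feature, trained by plain gradient descent. The data, the learning rate and the number of epochs are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight and bias by gradient descent on the average log-loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(weight * x + bias) - y
            grad_w += error * x / n
            grad_b += error / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias

# Made-up data: minutes spent on a product page, and whether the visitor bought (1) or not (0)
minutes_on_page = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
bought = [0, 0, 0, 0, 1, 1, 1, 1]

weight, bias = train_logistic(minutes_on_page, bought)

def predict(x):
    """Return 1 (will buy) when the estimated probability is at least 0.5."""
    return 1 if sigmoid(weight * x + bias) >= 0.5 else 0

print(predict(1.0), predict(4.0))
```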
DECISION TREE
Mainly used for classification problems, where we want to sort data into different categories. Example - A review
rating greater than 4 is excellent, a rating less than 2 is worst.
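A minimal sketch of the idea, using a one-level tree (a "decision stump") that learns a single split on one feature. A full decision tree stacks many such splits. The ratings, labels and helper names are made up for illustration.

```python
def fit_stump(values, labels):
    """Find the threshold on one feature that misclassifies the fewest samples."""
    best_threshold, best_errors = None, len(values) + 1
    candidates = sorted(set(values))
    # Try the midpoint between each pair of adjacent distinct values
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        errors = sum((v > threshold) != label for v, label in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

# Made-up review ratings and whether each review was labelled excellent
ratings = [1.0, 1.5, 2.0, 3.0, 4.5, 4.8, 5.0]
is_excellent = [False, False, False, False, True, True, True]

threshold = fit_stump(ratings, is_excellent)

def classify(rating):
    """Apply the learned split to a new rating."""
    return "excellent" if rating > threshold else "not excellent"

print(threshold, classify(4.2), classify(2.0))
```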
RANDOM FOREST
Builds on the Decision Tree: it combines many trees, which usually gives higher accuracy than a single tree.
NAIVE BAYES CLASSIFIER
Based on Bayes' Theorem, with the assumption that the different predictors are independent of one another.
Example - Probability of playing golf at different weather conditions.
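A minimal sketch of a Naive Bayes classifier for the golf example, in plain Python. The observations are made up, no smoothing is applied, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up observations: (outlook, wind) and the decision taken that day
observations = [
    (("sunny", "calm"), "play"),
    (("sunny", "windy"), "stay"),
    (("rainy", "windy"), "stay"),
    (("rainy", "calm"), "play"),
    (("overcast", "calm"), "play"),
    (("overcast", "windy"), "play"),
]

def train(data):
    """Count how often each class and each (feature value, class) pair occurs."""
    class_counts = Counter(label for _, label in data)
    # feature_counts[(position, label)][value] = number of matching rows
    feature_counts = defaultdict(Counter)
    for features, label in data:
        for position, value in enumerate(features):
            feature_counts[(position, label)][value] += 1
    return class_counts, feature_counts

def posterior(model, features):
    """Return P(class | features) for every class.

    No smoothing is used, so a value never seen with a class gives that class
    probability 0, and a value never seen at all would divide by zero.
    """
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for position, value in enumerate(features):
            # Independence assumption: multiply the per-feature likelihoods
            score *= feature_counts[(position, label)][value] / count
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}

model = train(observations)
probabilities = posterior(model, ("sunny", "windy"))
print(probabilities)
```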
KNN (K-NEAREST NEIGHBORS)
An object is classified by a majority vote of its nearest neighbours, which are found using a similarity (distance) measure.
Example – Search Results
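A minimal sketch of the majority vote in plain Python, using Euclidean distance as the similarity measure (an assumption, other measures are possible). The points and labels are made up.

```python
import math
from collections import Counter

def knn_predict(points, labels, query, k=3):
    """Classify the query by a majority vote of its k nearest training points."""
    # Rank the training points by Euclidean distance to the query
    ranked = sorted(zip(points, labels), key=lambda pair: math.dist(pair[0], query))
    # Count the labels of the k closest points and return the most common one
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up 2-D points in two groups
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["small", "small", "small", "large", "large", "large"]

near_origin = knn_predict(points, labels, (2, 2))
far_away = knn_predict(points, labels, (6, 5))
print(near_origin, far_away)
```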
UNSUPERVISED LEARNING
It is the training of a model using information that is neither classified nor labelled, so
no explanation of the data is provided.
A common form is "Clustering Analysis".
Example - Unlabelled pictures or audio downloaded from the internet
K-MEANS CLUSTERING
HIERARCHICAL CLUSTERING
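A minimal sketch of K-Means clustering in plain Python. The points, the starting centroids and the fixed number of iterations are assumptions chosen for illustration.

```python
import math

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and moving the centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Made-up unlabelled 2-D points that form two visible groups
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]

final_centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(final_centroids)
print(clusters)
```

No labels are passed in: the grouping comes only from the distances between points.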
REINFORCEMENT LEARNING
It is an area of machine learning where an RL agent learns from the consequences of its
actions, rather than being taught explicitly.
It selects actions on the basis of its past experiences (exploitation) and also by trying new choices
(exploration).
Here the machine takes decisions on its own.
Q-LEARNING
SARSA
DQN
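A minimal sketch of tabular Q-Learning with exploration and exploitation. The environment (a 5-cell corridor with a reward at the right end), the hyperparameters and all names are assumptions made up for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = [-1, +1]  # index 0 moves left, index 1 moves right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # assumed learning rate, discount, exploration rate

# Q-table: one row per state, one value per action, all starting at 0
q_table = [[0.0, 0.0] for _ in range(N_STATES)]

def choose_action(state):
    # Exploration: occasionally try a random action
    if random.random() < EPSILON:
        return random.randrange(len(ACTIONS))
    # Exploitation: pick the best-known action, breaking ties at random
    return max(range(len(ACTIONS)),
               key=lambda a: (q_table[state][a], random.random()))

for episode in range(500):
    state = 0
    for step in range(100):  # cap on steps per episode
        action = choose_action(state)
        # Move, but stay inside the corridor
        next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Q-Learning update: learn from the consequence of the action taken
        best_next = max(q_table[next_state])
        q_table[state][action] += ALPHA * (reward + GAMMA * best_next - q_table[state][action])
        state = next_state
        if state == N_STATES - 1:
            break

# Greedy policy for every non-goal cell
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)
```

Nobody tells the agent that moving right is correct: it finds this out from the reward alone.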
Applications of Machine Learning
• Chat Bots, Virtual Assistant: CUSTOMER SUPPORT
• eCommerce: PRODUCT RECOMMENDATION, ADVERTISING
• Image Recognition: FACE RECOGNITION
• Healthcare: DRUG DISCOVERY, DISEASE DIAGNOSIS
• Social Media: SENTIMENT ANALYSIS, SPAM FILTRATION
• Financial Services: ALGORITHMIC TRADING, FRAUD DETECTION
ARTIFICIAL INTELLIGENCE
Intelligence: “The capacity to learn and solve problems”
Artificial Intelligence (AI): The simulation of human intelligence by machines, which includes:
• The ability to solve problems
• The ability to act rationally
• The ability to act like humans