Intro To Data Science

The document provides an introduction to data science, defining key terms such as data science, analytics, artificial intelligence, and machine learning. It also discusses applications of data science in industries such as aviation and manufacturing, using examples such as Southwest Airlines and Procter & Gamble.

Uploaded by

Navjivan Thorat
Introduction to Data Science

Swati Satpute
What is Data Science?
• Data science is the science of collecting, storing,
processing, describing, and modeling data.

• The objective is to obtain meaningful insights and
information from data.

• Data science is the practice of extracting
knowledge from massive amounts of data, using
methods such as statistics, machine learning,
data mining, and predictive analytics.

Collect → Store → Process → Describe → Model
Data Science
• E.g. when analyzing Covid data for the world, a country, or a state:
what is the data trying to tell us? What can be inferred?
• Expected / anticipated inferences
– Number of positive cases
– Number of people tested
– What does the curve look like?
– When can we expect it to go down?
– What will the situation be after 30 days?
• Unexpected inferences the data might reveal, e.g.
– People below the poverty line are less likely to get infected
– People using the internet are less likely to get infected
– People who are malnourished have the highest chance of
getting affected
Use of Data Science in the Aviation Industry
How Big Data and the Industrial Internet Can Help Southwest Save
$100 Million on Fuel

https://www.ge.com/news/reports/big-data-industrial-internet-can-help-southwest-save-100-million-fuel
Case Study - Southwest Airlines
• Southwest Airlines: one million flights per year

• Objectives
– Increased efficiency in processing
– Increased business

• Difficulties
– Each flight involves countless variables, such as
the air’s humidity and the fuel load on each leg, which
makes it hard to calculate their exact impact on flight performance
Case Study - Southwest Airlines
• Solution – jet engines talk to GE technologies
through a data cloud

• Flight Analytics worked on attributes collected for
each flight, such as
– wind speeds, ambient temperatures, weight of the plane,
maximum thrust, and so on

• Output
– Revealed patterns in aircraft performance that were hard
to see before
– Flight analytics helped decide whether to add or remove a few
flights on a particular route
Case Study - Southwest Airlines
• Found improvements, e.g. planes on a particular
route were carrying extra fuel.
– Reducing that extra fuel cuts fuel cost and might
allow extra cargo or extra tickets
– Which means an increase in revenue
• The system also found ways to save fuel
– In 2014, airlines wasted $4.3 billion of fuel while planes
idled on the tarmac
– The system estimates that a 1 percent reduction in jet
fuel use could save the global commercial aviation
industry $30 billion over 15 years.
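
The savings estimate above can be put on a per-year basis with simple arithmetic. The sketch below assumes the $30 billion is spread evenly over the 15 years and that savings scale linearly with fuel use; the implied yearly fuel bill is a back-of-the-envelope figure under those assumptions, not a number from the source.

```python
# Figures quoted on the slide
total_saving_usd = 30e9   # saving over the whole period
years = 15
reduction = 0.01          # 1 percent less jet fuel

# Assumption: the saving is spread evenly across the years
saving_per_year = total_saving_usd / years

# Assumption: savings scale linearly with fuel burned, so a 1% cut
# that saves this much implies the following yearly fuel bill
implied_fuel_bill_per_year = saving_per_year / reduction

print(f"Saving per year: ${saving_per_year / 1e9:.1f} billion")
print(f"Implied fuel bill per year: ${implied_fuel_bill_per_year / 1e9:.0f} billion")
```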
Attributes with respect to flights
• Flight phases: taxi, takeoff, climb, cruise, descent, approach,
landing
• Altitude actually flown (which changes with unexpected
wind conditions), outside air pressure, speed,
and weather
• The weight of otherwise identical aircraft can vary by
as much as 2,000 pounds because of each
plane’s maintenance history
• Real-time data from hundreds of aircraft, coupled
with data on everything from weather to
navigation, allows airlines to plan more precisely.
Use of Data Science in Layoffs
• Recently, during the pandemic, an aircraft
company decided to lay off employees
• Data gathered
– Personal records such as performance and liabilities
– Relationship with the company (seniority, loyalty)
– Flight records (carrying extra fuel / travelling time /
delays)

• Layoffs were decided after combining all these
attributes and finding patterns.
Use of Data Science in the
Manufacturing Industry
How P&G Uses Big Data
• Procter & Gamble, or P&G, is a large multinational
consumer goods company
– operates in over 70 countries; its products are sold in over 180
countries
– serves almost 5 billion customers with its brands

• They use simulation analytics to design new products
– Simulation analytics helps to ensure optimal product
performance by taking into account many different variables and
creating and altering different models or designs virtually.
– Instead of hand-crafting a new design for a disposable diaper,
P&G uses modelling and simulation to create thousands of
iterations in seconds in order to find the best design

https://datafloq.com/read/pg-big-data-turn-diapers-insights/312
How P&G Uses Big Data
• Use of Predictive Analytics
– when developing a new dishwashing liquid, they used
predictive analytics and simulation models to predict
how moisture would excite various fragrance
molecules so that throughout the dishwashing
process consumers get the right fragrance notes at
the right time.
Terms used
• Data Science
• Analytics
• Artificial Intelligence
• Machine Learning
• Deep Learning
What is Data Science?
• The application of scientific methods, such as statistics
and machine learning, to understand a
phenomenon and gain control over it.

• It employs techniques from fields such as computer
science, statistics, and mathematics

• Data science involves machine learning,
clustering, visualization, and many other things
related to data
What is Analytics?
• Analytics is the discovery, interpretation, and
communication of meaningful patterns in data.

• It is the science of analyzing raw data in order to draw
conclusions about the information.

• Data analytics techniques can reveal trends and metrics.

E.g.
• Manufacturing companies often record the runtime,
downtime, and work queue for various machines and
then analyze the data to better plan the workloads so the
machines operate closer to peak capacity.
• Data analytics can point out bottlenecks in
production.
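
The manufacturing example above can be sketched in a few lines. The machine names and figures are hypothetical, and treating the longest work queue as the bottleneck is a simplifying assumption for illustration.

```python
# Hypothetical machine logs: hours running, hours down, jobs waiting
machine_logs = {
    "press_1": {"runtime_h": 140, "downtime_h": 28, "queue": 3},
    "press_2": {"runtime_h": 96, "downtime_h": 72, "queue": 11},
    "lathe_1": {"runtime_h": 160, "downtime_h": 8, "queue": 1},
}

def utilization(log):
    """Share of logged hours in which the machine was actually running."""
    total = log["runtime_h"] + log["downtime_h"]
    return log["runtime_h"] / total

def find_bottleneck(logs):
    """Assumption: the machine with the longest work queue is the bottleneck."""
    return max(logs, key=lambda name: logs[name]["queue"])

for name, log in machine_logs.items():
    print(f"{name}: utilization {utilization(log):.0%}, queue {log['queue']}")

print("Likely bottleneck:", find_bottleneck(machine_logs))
```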
Types of Analytics
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics
Types of Analytics
• Descriptive analytics – What happened?
– Uses historical data to describe what happened.
E.g. How many students visited the college website? Which
program has the fewest enrollments?
Tools used – Simple mathematical and statistical tools, e.g.
spreadsheets

• Diagnostic analytics – Why did it happen?
– Helps to identify anomalies, find patterns,
identify correlations, and determine causal
relationships.
Tools used – Machine learning techniques to
identify drivers of change and determine causation
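
As a minimal sketch of descriptive analytics, the enrollment question above reduces to counting historical records. The program names and records are hypothetical.

```python
from collections import Counter

# Hypothetical enrollment records: one entry per enrolled student
enrollments = ["BSc CS", "BSc CS", "MBA", "BSc CS", "MSc Stats", "MBA"]

def enrollments_per_program(records):
    """Count how many students enrolled in each program."""
    return Counter(records)

def least_enrolled(records):
    """Return the program with the fewest enrollments."""
    counts = enrollments_per_program(records)
    return min(counts, key=counts.get)

counts = enrollments_per_program(enrollments)
print(dict(counts))
print("Fewest enrollments:", least_enrolled(enrollments))
```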
Types of Analytics
• Predictive analytics – What might happen next?
– The focus shifts from understanding historical events to
creating insights about a current or future state.

– Predictive analytics helps organizations identify the
likelihood of possible outcomes, which can guide them toward
the best course of action.

E.g.
a) The aerospace industry, to predict the effect of maintenance
operations on fuel use
b) The manufacturing industry, to predict future requirements
and optimize warehouse stocking accordingly
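
One simple way to predict future requirements is to fit a straight-line trend to past demand and extend it. The sketch below uses ordinary least squares on hypothetical monthly figures; real forecasts would usually need richer models and more data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical monthly demand for a part (units)
months = [1, 2, 3, 4, 5, 6]
demand = [100, 110, 120, 130, 140, 150]

slope, intercept = fit_line(months, demand)
forecast_month_7 = slope * 7 + intercept
print(f"Forecast for month 7: {forecast_month_7:.0f} units")
```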
Types of Analytics
Prescriptive analytics – What do I need to do?
• A complex type of analytics

• It combines internal data, external sources, and machine
learning techniques to provide the most effective
recommendations for business decisions.

• A decision-making process is applied to descriptive
and predictive models.
– This leads to finding a combination of existing conditions and
possible decisions that is likely to have the most effect in the
future.

• Prescriptive analytics can provide immense value to an
organization.
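
A minimal sketch of the idea: take the output of a predictive model (here, a hypothetical demand forecast with probabilities) and apply a decision rule that recommends the option with the highest expected profit. The prices, costs, and scenarios are all assumed for illustration.

```python
# Hypothetical output of a predictive model: demand scenarios and probabilities
demand_forecast = {80: 0.2, 100: 0.5, 120: 0.3}

UNIT_COST = 6.0    # assumed cost to stock one unit
UNIT_PRICE = 10.0  # assumed selling price per unit

def expected_profit(stock, forecast):
    """Average profit over the demand scenarios if we stock `stock` units."""
    profit = 0.0
    for demand, probability in forecast.items():
        units_sold = min(stock, demand)
        profit += probability * (units_sold * UNIT_PRICE - stock * UNIT_COST)
    return profit

def best_stock_level(options, forecast):
    """Recommend the decision with the highest expected profit."""
    return max(options, key=lambda stock: expected_profit(stock, forecast))

options = [80, 100, 120]
for stock in options:
    print(stock, round(expected_profit(stock, demand_forecast), 2))
print("Recommended stock level:", best_stock_level(options, demand_forecast))
```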
Artificial Intelligence
• Literal meaning: “human-made thinking power”
• Simulations are built to think like humans; they
have
– Logic, if-then rules, and decision-making capacity

• The objective is to “create intelligent
systems that can simulate human
intelligence”.
• E.g. Siri, AI in chess games,
Google's AlphaGo
AlphaGo is the first computer program to defeat a professional human Go player,
the first to defeat a Go world champion, and is arguably the strongest Go player in
history.

Two players, using either white or black stones, take turns placing their stones on a
board. The goal is to surround and capture their opponent's stones or strategically
create spaces of territory. Once all possible moves have been played, both the stones
on the board and the empty points are tallied. The highest number wins.

https://deepmind.com/research/case-studies/alphago-the-story-so-far
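
The "logic, if-then rules" style of AI mentioned earlier can be sketched with a tiny rule-based tic-tac-toe player. This is an illustrative toy with hand-written rules, chosen for brevity; it is not how AlphaGo works, which relies on machine learning.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def completing_move(board, player):
    """Return a cell that gives `player` three in a row, or None."""
    for line in WIN_LINES:
        marks = [board[i] for i in line]
        if marks.count(player) == 2 and marks.count(" ") == 1:
            return line[marks.index(" ")]
    return None

def choose_move(board, me="X", opponent="O"):
    """Rule-based player: each rule is a plain if-then statement."""
    move = completing_move(board, me)
    if move is not None:            # Rule 1: if I can win, then win
        return move
    move = completing_move(board, opponent)
    if move is not None:            # Rule 2: if the opponent can win, then block
        return move
    if board[4] == " ":             # Rule 3: if the centre is free, then take it
        return 4
    return board.index(" ")         # Rule 4: otherwise take the first free cell

board = ["X", "X", " ",
         "O", "O", " ",
         " ", " ", " "]
print(choose_move(board))
```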
Artificial Intelligence
• AI uses machine learning internally

• Based on capabilities, AI can be classified as
– Weak (narrow) AI
– General (strong) AI
– Super AI
Machine Learning
• Machine learning enables a computer system to
make predictions or take decisions using
historical data without being explicitly
programmed.

• Machine learning uses massive amounts of
structured, semi-structured, and unstructured
data so that a machine learning model can
generate accurate results or give predictions
based on that data.
Machine Learning
• Machine learning algorithms are based on
statistical and mathematical concepts

• ML algorithms analyse the patterns in the
captured data and can be used to build a
predictive model of an existing business
phenomenon.

• E.g. predicting the coronavirus trend, recommender
systems, Facebook's automatic friend-tagging
suggestions
Machine Learning
Machine learning is broadly categorized into
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Supervised Learning
• Supervised learning algorithms are used for
classification or prediction

• Classification models predict categorical
class labels, and prediction models predict
continuous-valued functions.
• Here the target variable (the variable to be predicted) is
known.

• E.g. churn (classification), house pricing (prediction)
Supervised Learning
• Regression case: Sales are influenced by
variables such as advertisement expenses, manpower
deployed for sales, cost of products, number of
dealers, etc.
Sales = function(Adv. Exp, Manpower, Cost,
Dealers, …)

• Classification case: A customer may purchase a
particular product based on conditions such as their
need, age, income, place of residence, etc.
Hence we have
Prob(Customer Purchases) = function(Age, Income,
Residence, …)
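
Both cases can be sketched with one simple supervised method. The example below uses k-nearest neighbours, which is a stand-in chosen for brevity and not a method named on the slides: the regression case averages the known sales of similar rows, and the classification case takes a majority vote among similar customers. All data values and feature choices are hypothetical.

```python
import math
from collections import Counter

def nearest(train, query, k):
    """Return the targets of the k training rows closest to `query`."""
    ranked = sorted(train, key=lambda row: math.dist(row[0], query))
    return [target for _, target in ranked[:k]]

def predict_value(train, query, k=3):
    """Regression case: average the numeric targets of the neighbours."""
    targets = nearest(train, query, k)
    return sum(targets) / len(targets)

def predict_label(train, query, k=3):
    """Classification case: majority vote among the neighbours' labels."""
    return Counter(nearest(train, query, k)).most_common(1)[0][0]

# Hypothetical labelled data: (advertising spend, dealers) -> sales
sales_data = [((10, 2), 120), ((12, 3), 150), ((20, 5), 260),
              ((22, 6), 290), ((30, 8), 400)]

# Hypothetical labelled data: (age, income in thousands) -> purchased?
purchase_data = [((22, 20), "no"), ((25, 24), "no"), ((28, 30), "no"),
                 ((40, 60), "yes"), ((45, 70), "yes"), ((50, 80), "yes")]

print(predict_value(sales_data, (21, 5)))
print(predict_label(purchase_data, (42, 65)))
```

In both cases the target variable (sales, or the yes/no purchase label) is known for every training row, which is what makes the learning supervised.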
Unsupervised Learning
• Unsupervised learning algorithms are used
where there is no outcome variable to predict
or classify.

• Association rules, data reduction methods, and
clustering techniques are all unsupervised
learning methods.

Reference: Unsupervised Machine Learning: Examples and Use Cases | AltexSoft


Clustering
ML algorithms group similar data pieces into clusters that are not defined
beforehand.

An ML model finds patterns, similarities, and/or differences within
an uncategorized data structure by itself. If any natural groups or classes
exist in the data, a model will be able to discover them.

There is no right or wrong way to perform grouping, as there was no task set in
advance.

Clustering helps uncover business insights you never knew were there.
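
As a minimal sketch of how groups can be discovered without labels, the code below implements k-means, one common clustering algorithm (the slides do not name a specific one). The customer data and the choice of starting centres are hypothetical.

```python
import math

def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
            for p in points]

def update(points, labels, centres):
    """Move each centre to the mean of the points assigned to it."""
    new_centres = []
    for c, old in enumerate(centres):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:               # keep an empty cluster's centre in place
            new_centres.append(old)
            continue
        new_centres.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return new_centres

def k_means(points, centres, max_iterations=100):
    """Alternate assignment and update steps until the centres stop moving."""
    for _ in range(max_iterations):
        labels = assign(points, centres)
        new_centres = update(points, labels, centres)
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres

# Hypothetical customers: (visits per month, average basket value)
customers = [(1, 10), (2, 12), (1, 11), (9, 50), (10, 52), (8, 49)]
labels, centres = k_means(customers, centres=[customers[0], customers[3]])
print(labels)
print(centres)
```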
Types of Clustering
• Exclusive or “hard” clustering is the kind
of grouping in which one piece of data can belong
to only one cluster.

• Overlapping or “soft” clustering allows
data items to be members of more than one cluster
with different degrees of belonging

• Hierarchical clustering aims, as the name
suggests, at creating a hierarchy of clustered data
items. To obtain clusters, data items are either
decomposed or merged based on the hierarchy.
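
The difference between hard and soft clustering can be sketched for a single point and two cluster centres. The inverse-distance weighting used for the soft case is an assumption made for illustration; real soft-clustering methods define degrees of belonging in their own ways.

```python
import math

def hard_assignment(point, centres):
    """Exclusive clustering: the point belongs to exactly one cluster."""
    return min(range(len(centres)), key=lambda c: math.dist(point, centres[c]))

def soft_assignment(point, centres):
    """Overlapping clustering: a degree of belonging for every cluster.

    Assumption: membership is proportional to inverse distance, which is
    one simple weighting choice among many.
    """
    distances = [math.dist(point, c) for c in centres]
    on_centre = [d == 0.0 for d in distances]
    if any(on_centre):                # the point sits exactly on a centre
        share = 1.0 / sum(on_centre)
        return [share if hit else 0.0 for hit in on_centre]
    weights = [1.0 / d for d in distances]
    total = sum(weights)
    return [w / total for w in weights]

cluster_centres = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
print(hard_assignment(point, cluster_centres))
print(soft_assignment(point, cluster_centres))
```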
Unsupervised Learning
Examples
• Market Basket Analysis
• Customer segmentation for credit cards
• User categorization by social media activity
Association Rules
• This technique is widely used to analyze customer
purchasing habits, allowing companies to understand
relationships between different products and build more
effective business strategies.
• Recommender systems
– The association rules method is widely used to analyze buyer
baskets and detect cross-category purchase correlations.
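
Association rules are usually scored with support, confidence, and lift. The sketch below computes these three standard measures for the rule "bread → butter" on a handful of hypothetical baskets.

```python
# Hypothetical shopping baskets
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets with the antecedent, the share that also hold the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Ratio of the rule's confidence to the consequent's overall support."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

antecedent, consequent = {"bread"}, {"butter"}
print("support:", support(antecedent | consequent, baskets))
print("confidence:", confidence(antecedent, consequent, baskets))
print("lift:", lift(antecedent, consequent, baskets))
```

A lift above 1 suggests the items are bought together more often than chance would predict; a lift below 1 suggests the opposite.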
Reinforcement Learning
• In this type, an agent receives
information from the environment and learns to
choose actions based on the rewards or punishments
it receives

Examples include:
• Self-driving cars
• Robotics

Algorithms:
• Upper Confidence Bound
• Thompson Sampling
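
The agent, action, and reward loop can be sketched with UCB1, a common variant of the Upper Confidence Bound algorithm, on a multi-armed bandit. The reward probabilities, number of rounds, and seed are hypothetical; each "arm" stands for one action the agent can choose.

```python
import math
import random

def ucb1(reward_probabilities, rounds, seed=0):
    """Play a Bernoulli multi-armed bandit with the UCB1 rule.

    Returns how many times each arm (action) was chosen.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probabilities)
    pulls = [0] * n_arms        # times each arm was chosen
    totals = [0.0] * n_arms     # summed reward per arm

    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1         # try every arm once first
        else:
            # mean reward plus an exploration bonus that shrinks with more pulls
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2 * math.log(t) / pulls[a]),
            )
        # the environment answers with a reward of 1 or 0
        reward = 1.0 if rng.random() < reward_probabilities[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls

pulls = ucb1([0.2, 0.5, 0.9], rounds=3000)
print(pulls)
```

Over many rounds the agent should pull the arm with the highest reward probability most often, while still occasionally exploring the others.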
Deep Learning
• Deep learning is a subfield of machine learning
concerned with algorithms inspired by the
structure and function of the brain, called
artificial neural networks.

• You still do prediction and classification, but
for
– Large amounts of data
– At large scale
– Data with a huge set of attributes
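
To make "artificial neural network" concrete, the sketch below runs a forward pass through a tiny two-layer network that computes XOR. The weights are hand-picked for illustration rather than learned; in practice a deep network has many more layers and its weights are trained from data.

```python
import math

def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hand-picked (not trained) weights, for illustration only
HIDDEN_W = [[20.0, 20.0],     # neuron 1 behaves like OR
            [-20.0, -20.0]]   # neuron 2 behaves like NAND
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]     # output neuron behaves like AND
OUTPUT_B = [-30.0]

def xor_network(a, b):
    """Two stacked layers compute XOR, which a single layer cannot."""
    hidden = dense([a, b], HIDDEN_W, HIDDEN_B)
    return dense(hidden, OUTPUT_W, OUTPUT_B)[0]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(xor_network(a, b)))
```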
Strength of Deep Learning
• Best-in-class performance
– Deep networks have achieved accuracies far
beyond those of classical ML methods in many domains,
including speech, natural language, vision, and game
playing
• Scales effectively with data
• Less need for feature engineering
– Manual feature selection can often be avoided
• Adaptable and transferable
– In computer vision, pre-trained image
classification networks are often used as a feature
extraction front-end for object detection and segmentation
networks
Why is Deep Learning Picking Up?
• We now have fast enough computers and
enough data to actually train large neural
networks
Machine Learning vs Deep Learning

As we construct larger neural networks and train them with more and more
data, their performance continues to increase.

https://towardsdatascience.com/deep-learning-vs-classical-machine-learning-9a42c6d48aa
Advantages of Classical Machine Learning over Deep Learning
• Works better on small data
• Financially and computationally cheap
• Easier to interpret
https://www.geospatialworld.net/blogs/difference-between-ai%EF%BB%BF-machine-learning-and-deep-learning/
Nature of Data
• Structured Data
• Unstructured Data
• Semi-structured Data

https://www.bigdataframework.org/data-types-structured-vs-unstructured-data/
Nature of Data
• Structured Data
– Has a predefined data model
– Tabular format with relationships between the different
rows and columns
– Straightforward to analyse
– E.g. Excel spreadsheets or SQL databases
• Unstructured Data
– Does not have a predefined data model
– Typically text-heavy, but
may also contain data such as dates, numbers, and facts
– Difficult to analyze
– E.g. audio and video files, or NoSQL databases
– MongoDB is optimised to store documents
Nature of Data
• Semi-structured Data
– Semi-structured data is a form of structured data that
does not conform to the formal structure of the data
models associated with relational databases or other
forms of data table
– But it contains tags or other markers to separate
semantic elements and enforce hierarchies of
records and fields within the data
– Has a self-describing structure
– E.g. JSON and XML
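
The three kinds of data can be contrasted in a short sketch. The records below are hypothetical: a CSV table stands in for structured data, a JSON document for semi-structured data, and a free-text note for unstructured data.

```python
import csv
import io
import json

# Structured: every row follows the same fixed columns
csv_text = "id,name,city\n1,Asha,Pune\n2,Ravi,Mumbai\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: keys act as tags; records may nest and differ in shape
json_text = """
[
  {"id": 1, "name": "Asha", "phones": ["111", "222"]},
  {"id": 2, "name": "Ravi", "address": {"city": "Mumbai"}}
]
"""
records = json.loads(json_text)

# Unstructured: free text with no data model; facts must be dug out
note = "Asha called on 2021-03-05 about order 4471."

print(rows[1]["city"])
print(records[1]["address"]["city"])
print(sorted(records[0].keys()))
```

Note how the two JSON records carry different fields yet remain readable on their own, which is what "self-describing" means here.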
The Data Science Life Cycle
• Capture
• Maintain
• Process
• Analyze
• Communicate

https://datascience.berkeley.edu/about/what-is-data-science/
What Does a Data Scientist Do?
• Work with business stakeholders to understand
their goals and objectives
• Design data modeling processes, and create
algorithms and predictive models to
extract the data the business needs
• Analyze data
• Share insights with management
Technologies For Data Science
Technical skills
• R Programming
• Python Coding
• Hadoop Platform
• SQL Database/Coding
• Apache Spark
• Machine Learning and AI
• Data Visualization

Non-technical skills
• Intellectual curiosity
• Domain knowledge
• Communication skills
• Teamwork
Jobs
• Data Scientist
• Data Analyst
• Data Engineer
Web References
• https://www.ge.com/news/reports/big-data-industrial-internet-can-help-southwest-save-100-million-fuel
• https://datafloq.com/read/pg-big-data-turn-diapers-insights/312
• https://www.dezyre.com/article/types-of-analytics-descriptive-predictive-prescriptive-analytics/209
• https://deepmind.com/research/case-studies/alphago-the-story-so-far
• https://towardsdatascience.com/deep-learning-vs-classical-machine-learning-9a42c6d48aa
• https://www.geospatialworld.net/blogs/difference-between-ai%EF%BB%BF-machine-learning-and-deep-learning/
• https://www.bigdataframework.org/data-types-structured-vs-unstructured-data
• https://datascience.berkeley.edu/about/what-is-data-science/
Question Bank
• What is the difference between artificial intelligence
and machine learning?
• Give two examples where data science is used to
solve problems in India.
• Describe how we can make use of data science
in the education sector.
• Compare machine learning and deep learning.
• Describe reinforcement learning.
• Name various algorithms under supervised,
unsupervised, and semi-supervised learning.
