AGENDA
- What is Data Science
- Why is Data Science Important
- The Data Science Process
- Tools Used in Data Science
- Applications of Data Science
- Challenges in Data Science
- Future of Data Science
- Q&A
WHAT IS DATA SCIENCE?
Data + Science = Data Science
Data Science is the study of data to extract meaningful
insights.
When we put these two elements together, “data + science”
refers to the scientific study of data.
Simple Analogy: Think of it like a detective solving a mystery
using clues (data).
WHAT IS DATA?
•We often use the term data to refer to
RAW information
•This information is either transmitted or
stored
•Data comes in numerous forms,
generated from various sources such as
sensors, social media, transactions, and
more.
•Any kind of information, whether
numbers, text, or pictures, is termed
Data
TYPES OF
DATA
Data comes in different types. Some
of the common types of data
include:
•Text
•Image
•Video
•Numbers
•Spreadsheets
•Sound
QUALITATIVE VS QUANTITATIVE
DATA
Qualitative
• Qualitative data is descriptive information.
• For example, “What a nice day it is”.
Quantitative
• Quantitative data is numerical information.
• For example, “1”, “3.65”, etc.
QUANTITATIVE DATA CAN BE OF TWO
TYPES
DISCRETE VS CONTINUOUS
DATA
Discrete
• Takes only specific, countable values.
• For example, “Number of months in a year”,
“Number of members in a family”, etc.
Continuous
• Can be any value in an interval.
• For example, “The amount of oxygen in the
atmosphere”, “Age of members in a family”.
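The categories above map loosely onto everyday Python types. The following is a minimal sketch, assuming a hypothetical helper named `classify` and made-up sample values. The type of a value is only a rough hint: a count stored as `2.0` would be mislabeled as continuous.

```python
def classify(value):
    """Label a value as qualitative, discrete, or continuous (rough heuristic)."""
    # bool is a subclass of int in Python, so check it with str first
    # and treat it as a category rather than a number.
    if isinstance(value, (str, bool)):
        return "qualitative"
    if isinstance(value, int):
        return "quantitative (discrete)"
    if isinstance(value, float):
        return "quantitative (continuous)"
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Made-up sample values: a sentence, a count of months, a measurement.
samples = ["What a nice day it is", 12, 3.65]
for sample in samples:
    print(repr(sample), "->", classify(sample))
```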
WHAT IS
SCIENCE?
Science in data science refers to the methodical
approach used to gather, analyze, and interpret
data. It involves applying rigorous, systematic
techniques to understand patterns, test hypotheses,
and draw conclusions based on empirical evidence.
Scenario: Predicting customer churn for a subscription service
Systematic Approach:
•Definition: The data scientist follows a structured plan. They
start by defining the problem (predicting which customers
are likely to cancel their subscription), collecting relevant
data (e.g., customer usage patterns, service interactions,
demographics), and preparing the data for analysis.
•Example: They gather data on customer behavior, such as
frequency of service use, customer support interactions, and
payment history.
Empirical Evidence:
•Definition: Using the data to find patterns and test
hypotheses. The data scientist applies statistical
methods to analyze the data and identify factors
that are associated with customer churn.
•Example: They discover that customers who have
lower usage rates and more frequent complaints are
more likely to churn.
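A first look at such a pattern might compare churn rates between groups. This is a minimal sketch on made-up records. The field layout, the threshold of 10 logins, and the helper `churn_rate` are assumptions for illustration only.

```python
# Hypothetical customer records: (monthly_logins, complaints, churned)
customers = [
    (2, 3, True), (3, 2, True), (1, 4, True), (4, 1, False),
    (20, 0, False), (15, 1, False), (18, 0, False), (5, 3, True),
    (22, 0, False), (12, 2, False),
]


def churn_rate(rows):
    """Fraction of rows whose churned flag is True."""
    return sum(1 for row in rows if row[2]) / len(rows)


# Split customers at an arbitrary usage threshold (assumed: 10 logins).
low_usage = [row for row in customers if row[0] < 10]
high_usage = [row for row in customers if row[0] >= 10]

print(f"low usage churn rate:  {churn_rate(low_usage):.2f}")
print(f"high usage churn rate: {churn_rate(high_usage):.2f}")
```

With only ten records, a gap between the two rates is a hint, not proof. A real analysis would use far more data and a statistical test.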
Repeatability and Accuracy:
•Definition: Ensuring the findings are reliable and
can be consistently replicated. The data scientist
tests their model on different subsets of data to
verify its accuracy and adjust it if necessary.
•Example: They build a predictive model and
validate it by comparing its predictions against
actual outcomes in a separate test dataset.
Theory and Modeling:
•Definition: Developing and applying models to
make predictions. The data scientist uses techniques
like logistic regression or machine learning
algorithms to create a model that predicts customer
churn based on the identified factors.
•Example: They create a model that uses historical
data to predict which customers are at high risk of
leaving.
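The modeling and validation steps can be sketched in plain Python. This example assumes synthetic data and a hand-written logistic regression trained by gradient descent. The function names, learning rate, and the rule used to generate the data are all illustrative and not taken from any real service. In practice a library implementation would normally be used instead.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_customers(n, rng):
    """Synthetic data: churn is likelier with low usage and many complaints."""
    rows = []
    for _ in range(n):
        usage = rng.uniform(0, 1)        # scaled usage (assumed feature)
        complaints = rng.uniform(0, 1)   # scaled complaint count (assumed feature)
        p_churn = sigmoid(4 * complaints - 4 * usage)
        rows.append(((usage, complaints), 1 if rng.random() < p_churn else 0))
    return rows


def train(rows, lr=0.5, epochs=300):
    """Fit two weights and a bias by batch gradient descent on the log loss."""
    w, b, n = [0.0, 0.0], 0.0, len(rows)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in rows:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b


def predict(model, x):
    """Return 1 (churn) if the predicted probability is at least 0.5."""
    w, b = model
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0


rng = random.Random(0)                 # fixed seed so the run is repeatable
data = make_customers(400, rng)
train_rows, test_rows = data[:300], data[300:]   # hold out 100 rows for testing

model = train(train_rows)
accuracy = sum(predict(model, x) == y for x, y in test_rows) / len(test_rows)
print(f"held-out accuracy: {accuracy:.2f}")
```

The model is scored only on rows it never saw during training, which is the check described under Repeatability and Accuracy.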
REAL WORLD APPLICATIONS OF DATA
SCIENCE
•Predicting the interests of the audience on different online video
streaming platforms
•Getting insights from customer reviews in online stores, food
delivery apps, etc.
•Effective targeting of advertisements
WHY IS DATA SCIENCE
IMPORTANT?
Everyday Examples:
- Shopping Recommendations: Why online stores
suggest items you might like.
- Navigation Apps: How apps find the best route
and avoid traffic.
Impact: Helps businesses make better decisions,
improves customer experiences, and drives
innovation.
DATA SCIENCE AS A UNIFIER
THE DATA SCIENCE PROCESS
Step-by-Step:
1. Ask a Question: What do we want to learn or solve?
2. Collect Data: Gather the necessary information.
3. Clean Data: Remove any errors or irrelevant parts.
4. Analyze Data: Look for patterns and trends.
5. Make Predictions: Use the insights to forecast future outcomes.
6. Share Results: Communicate findings to help make decisions.
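The six steps can be walked through on a tiny made-up dataset. This is a sketch only: the question, the order counts, and the naive forecasting rule are assumptions chosen to keep the example short.

```python
# 1. Ask a question (hypothetical): are orders rising, and what comes next?

# 2. Collect data: hard-coded here; None and -1 stand in for bad records.
raw_orders = [120, 132, None, 128, -1, 140, 151]

# 3. Clean data: drop missing and impossible (negative) values.
clean = [x for x in raw_orders if x is not None and x >= 0]

# 4. Analyze data: the average, and the average change between
#    consecutive valid records.
average = sum(clean) / len(clean)
changes = [later - earlier for earlier, later in zip(clean, clean[1:])]
avg_change = sum(changes) / len(changes)

# 5. Make predictions: naive forecast = last value + average change.
forecast = clean[-1] + avg_change

# 6. Share results: a plain-language summary for decision makers.
print(f"Average orders: {average:.1f}")
print(f"Average change between records: {avg_change:+.2f}")
print(f"Naive forecast for the next record: {forecast:.2f}")
```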
TOOLS USED IN DATA SCIENCE
Popular Tools:
- Excel: For basic data handling.
- Python & R: Programming languages for data analysis.
- Tableau: For creating easy-to-understand visualizations.
Analogy: Tools are like different kitchen appliances used to
prepare a meal.
APPLICATIONS OF DATA
SCIENCE
- Healthcare: Predicting disease outbreaks, personalizing
treatments.
- Finance: Detecting fraud, managing risks.
- Entertainment: Recommending movies or music.
- Marketing: Understanding customer preferences.
USES OF DATA SCIENCE
Descriptive analysis
• Descriptive analysis examines data to gain
insights into what happened or what is
happening in the data environment.
• It is characterized by data visualizations such as
pie charts, bar charts, line graphs, tables, or
generated narratives.
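A small descriptive summary might look like the sketch below, which uses Python's standard `statistics` module and `collections.Counter`. The ratings are made up, and the text bar chart stands in for a real visualization.

```python
import statistics
from collections import Counter

# Hypothetical customer ratings on a 1-5 scale.
ratings = [5, 4, 4, 3, 5, 2, 4, 5, 1, 4]

mean_rating = statistics.mean(ratings)
median_rating = statistics.median(ratings)
counts = Counter(ratings)

print("mean:", mean_rating)
print("median:", median_rating)

# A text-only bar chart: one '#' per rating received.
for score in sorted(counts):
    print(score, "#" * counts[score])
```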
USES OF DATA SCIENCE
Diagnostic analysis
• Diagnostic analysis is a deep-dive or detailed
data examination to understand why
something happened.
• It is characterized by techniques such as
drill-down, data discovery, data mining, and
correlations.
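Drill-down can be illustrated by breaking a total into its parts. In this sketch, the regions and sales figures are invented. The aim is to find which region explains a drop in total sales.

```python
# Hypothetical monthly sales by region.
last_month = {"North": 100, "South": 90, "East": 110, "West": 95}
this_month = {"North": 102, "South": 60, "East": 108, "West": 97}

# Top level: what happened overall?
total_change = sum(this_month.values()) - sum(last_month.values())

# Drill down: how much did each region contribute to the change?
change_by_region = {
    region: this_month[region] - last_month[region] for region in last_month
}
worst_region = min(change_by_region, key=change_by_region.get)

print("total change:", total_change)
for region, change in change_by_region.items():
    print(f"  {region}: {change:+d}")
print("largest drop:", worst_region)
```

The drill-down shows where the drop happened. Explaining why it happened would need further data about that region.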
USES OF DATA SCIENCE
Predictive analysis
• Predictive analysis uses historical data to make
accurate forecasts about data patterns that may
occur in the future.
• It is characterized by techniques such as machine
learning, forecasting, pattern matching, and
predictive modeling.
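One of the simplest forecasting techniques is a straight-line trend fitted by least squares. The sketch below assumes made-up monthly sales figures and a hypothetical helper `fit_line`. Real forecasts usually need more data and a check of how well the line fits.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.5, 15.5, 18.0]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept

print(f"trend: {slope:.2f} per month")
print(f"forecast for month 6: {forecast_month_6:.2f}")
```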
USES OF DATA SCIENCE
Prescriptive analysis
• Prescriptive analysis takes predictive data to
the next level.
• It not only predicts what is likely to happen
but also suggests an optimum response to that
outcome.
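Suggesting a response can be sketched as picking the action with the highest expected value, given a prediction. Every number below (the churn probability, customer value, action costs, and their assumed effects) is invented for illustration.

```python
# Hypothetical output of a predictive model for one customer.
churn_probability = 0.30
customer_value = 200.0   # assumed value of keeping the customer

# Candidate actions: (cost, multiplier applied to the churn probability).
# These effects are assumptions, not measured results.
actions = {
    "do nothing": (0.0, 1.0),
    "send discount": (20.0, 0.5),
    "personal call": (50.0, 0.3),
}


def expected_value(cost, multiplier):
    """Expected retained value minus the cost of the action."""
    retain_probability = 1.0 - churn_probability * multiplier
    return retain_probability * customer_value - cost


best_action = max(actions, key=lambda name: expected_value(*actions[name]))

for name, (cost, multiplier) in actions.items():
    print(f"{name}: {expected_value(cost, multiplier):.2f}")
print("suggested response:", best_action)
```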
BENEFITS OF DATA
SCIENCE
Discover unknown transformative patterns
• Data science allows businesses to uncover new
patterns and relationships that have the potential
to transform the organization.
• It can reveal low-cost changes to resource
management for maximum impact on profit
margins.
BENEFITS OF DATA
SCIENCE
Innovate new products and solutions
• Data science can reveal gaps and problems that
would otherwise go unnoticed.
• Greater insight about purchase decisions,
customer feedback, and business processes can
drive innovation in internal operations and
external solutions.
BENEFITS OF DATA
SCIENCE
Real-time optimization
• It’s very challenging for businesses, especially
large-scale enterprises, to respond to changing
conditions in real-time.
• This can cause significant losses or disruptions in
business activity.
• Data science can help businesses anticipate change
and respond to it sooner.
CHALLENGES IN DATA SCIENCE
Common Issues:
- Data Privacy: Ensuring personal data is protected.
- Data Quality: Making sure data is accurate and reliable.
- Understanding Results: Explaining findings in a simple way.
Analogy: Like trying to solve a puzzle with some missing or
extra pieces.
CONCENTRATION IN DATA
SCIENCE
Mathematics and Applied Mathematics
Applied Statistics/Data Analysis
Solid Programming Skills (R, Python, Julia, SQL)
Data Mining
Database Storage and Management
Machine Learning and discovery
FUTURE OF DATA SCIENCE
Artificial intelligence and machine learning innovations have
made data processing faster and more efficient. Industry
demand has created an ecosystem of courses, degrees, and
job positions within the field of data science.
Because of the cross-functional skillset and expertise
required, data science shows strong projected growth over
the coming decades.
Emerging Trends:
- More Automation: Tools that do the heavy lifting.
- Better Predictions: Improving accuracy.
- Ethical Data Use: Focusing on responsible practices.