INTRODUCTION TO DATA SCIENCE
“DATA RULES THE WORLD”
Agenda
• What is Data Science?
• Who is a Data Scientist?
– Types of Job Roles
• Prerequisites for Data Science
• Real-time Environment
• How Does Data Science Work?
• Tools for Data Science
• Applications, Advantages, Challenges
• What is Data? – Types of data
What is Data Science?
• Data Science is a multidisciplinary field that combines
1. Statistics,
2. Data analysis, and
3. Machine learning
to analyze data and extract knowledge and insights from it.
Why Data Science?
• Glassdoor has ranked data scientist as its top job.
• High-paying job role.
• Impactful problem solving.
• Example – a commercial business that wants to maximize its
sales.
Who is a Data Scientist?
• Data scientist is one of the most in-demand jobs of 2024.
• A data scientist is a professional who works with large
amounts of data to derive compelling business insights
using various tools, techniques, methodologies, and
algorithms.
Real-time Environment
• Data Science can be applied in nearly every part of a
business where data is available.
• Some examples:
– Stock markets
– Industry
– Politics
– Logistics companies
– E-commerce
– Healthcare
Data → Algorithm → Insight
• Data – e.g., names, ages, test scores, and extracurricular
activities.
• Algorithm – step-by-step instructions applied to the data.
• Insight – a meaningful understanding drawn from the result
of the data analysis.
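To make the data, algorithm, insight flow concrete, here is a minimal Python sketch. The student records are invented for illustration and the function name `average_score` is hypothetical; the "algorithm" is simply a filter followed by an average.

```python
# Hypothetical student records (the "data"): names, ages, test scores,
# and whether each student takes part in extracurricular activities.
students = [
    {"name": "Asha", "age": 15, "score": 82, "extracurricular": True},
    {"name": "Ben", "age": 16, "score": 74, "extracurricular": False},
    {"name": "Chen", "age": 15, "score": 91, "extracurricular": True},
    {"name": "Dana", "age": 16, "score": 68, "extracurricular": False},
]


def average_score(records, extracurricular):
    """The "algorithm": step-by-step instructions that keep only the
    records in one group and average their test scores."""
    scores = [r["score"] for r in records if r["extracurricular"] == extracurricular]
    return sum(scores) / len(scores)


active = average_score(students, True)
inactive = average_score(students, False)

# The "insight": a meaningful statement drawn from the analysis result.
print(f"With activities: {active:.1f}, without: {inactive:.1f}")
```

In this tiny invented sample, students with extracurricular activities score higher on average (86.5 vs. 71.0). With real data, such a gap would need a larger sample and further analysis before it counts as a reliable insight, and on its own it would not show cause and effect.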
What is Data?
• Data can take the form of text, observations, figures, images,
numbers, graphs, or symbols.
• For example
– Weights
– Addresses
– Ages
– Names
– Temperatures
– Dates
– Distances
• Data is a raw form of knowledge and, on its own, doesn't carry
any significance or purpose.
Types of Data
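• There are two types of data:
– Categorical Data
• Examples – Marital Status, Hair Color, Political Party
– Numerical Data
• Discrete – countable values (e.g., number of children)
• Continuous – any value within a range (e.g., height, temperature)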
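As a rough illustration of the categorical, discrete, and continuous distinction, the Python sketch below labels a column of values by their Python types. The function `classify` and the sample columns are hypothetical, and type-based rules are only a heuristic: a ZIP code stored as an integer is really categorical, so the meaning of a variable matters more than how it is stored.

```python
from numbers import Integral, Real


def classify(values):
    """Heuristic: label a column as categorical, discrete numerical,
    or continuous numerical based on the Python types of its values."""
    # Booleans are a subclass of int in Python, so treat them as categories,
    # along with anything that is not a real number (e.g. strings).
    if any(isinstance(v, bool) or not isinstance(v, Real) for v in values):
        return "categorical"
    # Whole numbers (counts) are treated as discrete.
    if all(isinstance(v, Integral) for v in values):
        return "discrete numerical"
    # Any other numeric values (measurements with decimals) are continuous.
    return "continuous numerical"


# Hypothetical columns from a small survey.
columns = {
    "marital_status": ["Single", "Married", "Single"],
    "hair_color": ["Black", "Brown", "Red"],
    "children": [0, 2, 1],
    "height_m": [1.62, 1.75, 1.80],
}
for name, values in columns.items():
    print(name, "->", classify(values))
```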
Data Evolution
• Example – Retail Industry
• (Pre-2000) Retail businesses primarily relied on traditional
data sources, such as sales records and customer feedback
forms.
• (Mid-2000s) With the rise of e-commerce and the internet,
retailers started to collect vast amounts of data from online
transactions.
• (Mid-2000s) Power of data analytics
• (Late 2000s) Personalization and customer experience
• (Present) Machine learning and AI
Data Evolution
• 1920s-1950s – Statistics
– Statistical methods were applied to data in many domains
(e.g., agriculture, biology, social science).
• 1960s – Computers – storing data on computers
– Organizations and governments started using
computers to store and analyze data.
• 1970s – Databases – storing large datasets
– Database management systems (DBMS) and the concept of
data warehousing allowed for more efficient storage and
retrieval of data.
Prerequisites for Data Science
• Non-Technical Skills
– Curiosity
– Critical Thinking
– Communication Skills
• Technical Skills
– Programming Language – Python or R
– Machine Learning
– Statistics
– Databases
How Does Data Science Work?
• Identify the business problem.
• Ask the right questions to understand the problem.
• Explore and collect the data related to that problem.
• Transform the data into a standardized format.
• Clean the data – remove erroneous values.
• Handle missing values with statistical techniques (e.g.,
filling gaps with the mean or median).
• Analyze the data with data visualization.
• Build a model for the specific dataset.
• Deploy the model and use it to make decisions about the
business problem.
Tools for Data Science
• Jupyter Notebook
• Python IDE
• R Studio
• Microsoft Power BI
• Tableau
• SQL
• Excel
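SQL, listed above, is the standard language for storing and querying data in relational databases. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `orders` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Store: create a table and insert a few hypothetical orders.
conn.execute("CREATE TABLE orders (customer TEXT, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, city, amount) VALUES (?, ?, ?)",
    [
        ("Asha", "Pune", 120.0),
        ("Ben", "Delhi", 80.0),
        ("Chen", "Pune", 200.0),
        ("Dana", "Delhi", 40.0),
    ],
)
conn.commit()

# Retrieve: total order amount per city, largest total first.
query = """
    SELECT city, SUM(amount) AS total
    FROM orders
    GROUP BY city
    ORDER BY total DESC
"""
results = conn.execute(query).fetchall()
for city, total in results:
    print(city, total)

conn.close()
```

The same `SELECT ... GROUP BY` query would run largely unchanged against other relational databases; only the connection code differs.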
Challenges
• Finding and Accessing the Right Data
• Data Quality and Cleaning
• Handling Large Volumes of Data
• Balancing Data Accessibility and Security
• Communicating Results to Non-Technical Stakeholders
Applications
• Search Engine – Google
• Transport – Uber
• Finance – Credit Card Fraud Detection
• E-Commerce – Amazon
• Healthcare – Early Disease Detection
• Autocomplete – Email, Text Editors
• Image Recognition – Facebook
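Credit card fraud detection is often framed as spotting transactions that look unusual for a cardholder. The Python sketch below is a deliberately simplified stand-in for that idea, not how production systems work (they generally use machine learning models over many features): it flags an amount whose z-score against the cardholder's past amounts exceeds a threshold. The function `is_suspicious`, the default threshold of 3 standard deviations, and the sample history are assumptions for illustration.

```python
import statistics


def is_suspicious(history, amount, threshold=3.0):
    """Return True if `amount` lies more than `threshold` standard
    deviations away from the mean of the past amounts in `history`."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation; needs >= 2 values
    if stdev == 0:
        # Every past amount was identical, so any different amount is unusual.
        return amount != mean
    z_score = abs(amount - mean) / stdev
    return z_score > threshold


# Hypothetical past transaction amounts for one cardholder.
past_amounts = [42.0, 55.5, 38.2, 61.0, 47.3, 50.0, 44.8]

print(is_suspicious(past_amounts, 52.0))   # an ordinary purchase
print(is_suspicious(past_amounts, 900.0))  # far outside the usual range
```

A fixed threshold is easy to explain to non-technical stakeholders, but it ignores context such as location or merchant, which is one reason real systems learn from many features at once.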
Roadmap to Data Science
• Domain Knowledge
• Mathematical Foundations
» Statistics and Probability
• Computer Science
» Programming Language – Python or R
» Databases – SQL and MongoDB
» Machine Learning
» Deep Learning
• Communication
» Data Visualization – Tableau, Power BI Dashboards