Applied Data Science MODULE 1 SEM8
Introduction to Data Science
Q Data Science Techniques.
Ans
The techniques used in the steps of a data science process, and commonly associated with the term "data science", are:
➢ Descriptive statistics
➢ Exploratory visualization
➢ Dimensional slicing
➢ Hypothesis testing
➢ Data engineering
➢ Business intelligence
Descriptive statistics: Computing the mean, standard deviation, correlation, and other descriptive statistics quantifies the aggregate structure of a dataset. This is essential information for understanding any dataset: the structure of the data and the relationships within it. Descriptive statistics are used in the exploration stage of the data science process.
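For illustration, here is a minimal sketch that computes such statistics with pandas on a small made-up dataset; the column names and values are assumptions:

```python
import pandas as pd

# Hypothetical dataset: monthly advertising spend and revenue (illustrative values only)
df = pd.DataFrame({
    "ad_spend": [12.0, 15.5, 9.8, 20.1, 18.3, 14.7],
    "revenue":  [48.2, 55.1, 40.9, 70.4, 66.0, 52.8],
})

print(df.describe())                       # mean, standard deviation, min/max, quartiles per column
print(df["ad_spend"].corr(df["revenue"]))  # Pearson correlation between spend and revenue
```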
Exploratory visualization: The process of expressing data in visual coordinates enables users to find patterns and relationships in the data and to comprehend large datasets. Similar to descriptive statistics, these techniques are integral to the pre- and post-processing steps in data science.
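A small sketch of exploratory visualization with matplotlib, reusing the hypothetical spend and revenue values from the previous example:

```python
import matplotlib.pyplot as plt

# Hypothetical values (same as the descriptive-statistics sketch above)
ad_spend = [12.0, 15.5, 9.8, 20.1, 18.3, 14.7]
revenue = [48.2, 55.1, 40.9, 70.4, 66.0, 52.8]

plt.scatter(ad_spend, revenue)   # each point is one month; a pattern becomes visible at a glance
plt.xlabel("Advertising spend")
plt.ylabel("Revenue")
plt.title("Exploratory visualization")
plt.show()
```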
Dimensional slicing: Online analytical processing (OLAP) applications, which are prevalent in organizations, mainly provide information on the data through dimensional slicing, filtering, and pivoting. OLAP analysis is enabled by a unique database schema design where the data are organized as dimensions (e.g., products, regions, dates) and quantitative facts or measures (e.g., revenue, quantity). With a well-defined database structure, it is easy to slice the yearly revenue by product or by a combination of region and product. These techniques are extremely useful and may unveil patterns in the data (e.g., trends in candy sales).
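A minimal sketch of dimensional slicing with a pandas pivot table (the fact table below is invented; a real OLAP system would query a star-schema database rather than an in-memory frame):

```python
import pandas as pd

# Hypothetical fact table: one row per sale, with dimensions (year, region, product) and a measure (revenue)
sales = pd.DataFrame({
    "year":    [2022, 2022, 2022, 2023, 2023, 2023],
    "region":  ["East", "West", "East", "West", "East", "West"],
    "product": ["Candy", "Candy", "Soda", "Candy", "Soda", "Soda"],
    "revenue": [100, 150, 80, 120, 90, 110],
})

# Slice yearly revenue by product (one dimension)...
print(sales.pivot_table(values="revenue", index="year", columns="product", aggfunc="sum"))

# ...or by a combination of region and product (two dimensions)
print(sales.pivot_table(values="revenue", index=["region", "product"], columns="year", aggfunc="sum"))
```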
Hypothesis testing: In confirmatory data analysis, experimental data are collected to evaluate whether a hypothesis has enough evidence to be supported or not. There are many types of statistical tests, and they have a wide variety of business applications (e.g., A/B testing in marketing). In general, data science is a process in which many hypotheses are generated and tested based on observational data. Since data science algorithms are iterative, solutions can be refined in each step.
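As an illustration of the A/B-testing case mentioned above, here is a minimal sketch using scipy; the conversion figures are made up:

```python
from scipy import stats

# Hypothetical daily conversion rates (%) for two versions of a web page
version_a = [2.1, 2.4, 1.9, 2.6, 2.3, 2.0, 2.5]
version_b = [2.8, 3.1, 2.7, 3.0, 2.9, 3.3, 2.6]

# Two-sample t-test: is the difference in mean conversion rate statistically significant?
t_stat, p_value = stats.ttest_ind(version_a, version_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value (e.g., < 0.05) means the data provide evidence against the
# null hypothesis that both versions convert equally well.
```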
Data engineering: Data engineering is the process of sourcing, organizing, assembling, storing, and distributing data for effective analysis and usage. Database engineering, distributed storage and computing frameworks (e.g., Apache Hadoop, Spark, Kafka), parallel computing, extraction, transformation, and loading (ETL) processing, and data warehousing constitute data engineering techniques. Data engineering helps source and prepare data for data science learning algorithms.
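A minimal ETL sketch in plain Python (the file name, columns, and cleaning rule are hypothetical; production pipelines typically use frameworks such as Spark or a dedicated warehouse loader):

```python
import sqlite3
import pandas as pd

# Extract: read raw records from a hypothetical CSV export
raw = pd.read_csv("orders_raw.csv")

# Transform: drop incomplete rows and derive a total-price column
clean = raw.dropna(subset=["quantity", "unit_price"])
clean = clean.assign(total_price=clean["quantity"] * clean["unit_price"])

# Load: store the prepared table where analysts and learning algorithms can use it
with sqlite3.connect("warehouse.db") as conn:
    clean.to_sql("orders", conn, if_exists="replace", index=False)
```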
Business intelligence: Business intelligence (BI) helps users query data ad hoc without the need to write technical query commands, and uses dashboards or visualizations to communicate facts and trends. Business intelligence specializes in the secure delivery of information to the right roles and the distribution of information at scale. Historical trends are usually reported, but in combination with data science, both past data and predicted future data can be combined. BI can also hold and distribute the results of data science.
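Behind a BI dashboard tile, an ad hoc report usually resolves to a simple aggregate query; the sketch below assumes the hypothetical warehouse.db / orders table from the ETL example above:

```python
import sqlite3
import pandas as pd

with sqlite3.connect("warehouse.db") as conn:
    # Headline figures of the kind a dashboard widget would display
    report = pd.read_sql(
        "SELECT COUNT(*) AS num_orders, SUM(total_price) AS total_revenue FROM orders",
        conn,
    )
print(report)
```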
Q Components of Data Science
Ans
1. Statistics: Statistics is one of the most important components of data science. Statistics is a way to collect and analyze numerical data in large amounts and find meaningful insights from it.
2. Domain Expertise: Domain expertise binds the other components of data science together. Domain expertise means specialized knowledge or skills in a particular area. In data science, there are various areas for which domain experts are needed.
3. Data engineering: Data engineering is the part of data science that involves acquiring, storing, retrieving, and transforming data. Data engineering also involves adding metadata (data about data) to the data.
4. Visualization: Data visualization means representing data in a visual context so that people can easily understand its significance. Data visualization makes it easy to grasp huge amounts of data through visuals.
5. Advanced computing: Advanced computing does the heavy lifting of data science. It involves designing, writing, debugging, and maintaining the source code of computer programs.
6. Mathematics: Mathematics is a critical part of data science. Mathematics involves the study of quantity, structure, space, and change. For a data scientist, a good knowledge of mathematics is essential.
7. Machine learning: Machine learning is the backbone of data science. It is all about training a machine so that it can act like a human brain. In data science, various machine learning algorithms are used to solve problems, as in the sketch below.
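A tiny illustration of this component (a minimal scikit-learn sketch; the dataset and the choice of classifier are assumptions made purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A small benchmark dataset, split into training and testing portions
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a simple classifier and check how well it generalizes to unseen data
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```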
Q Difference between Data Science and Data Analytics
Ans
1. Coding
   Data Science: Python is the most commonly used …
   Data Analytics: The knowledge of Python and …
…
   Data Science: … field including data engineering, computer science, statistics, machine learning, and predictive analytics in addition to presentation of findings.
   Data Analytics: … which includes data integration, data analysis, and data presentation.
11. Approach
   Data Science: Data scientists prepare, manage, and explore large data sets and then develop custom analytical models and algorithms to produce the required business insights. They also communicate and collaborate with stakeholders to define project goals and share findings.
   Data Analytics: Data analysts prepare, manage, and analyze well-defined datasets to identify trends and create visual presentations that help organizations make better, data-driven decisions.
12. Business Impact
   Data Science: Helps businesses forecast, optimize, and innovate using data.
   Data Analytics: Helps businesses understand and utilize data.
13. Key Skills
   Data Science: Advanced statistical analysis, data visualization.
   Data Analytics: Statistical analysis, data visualization.
14. Primary Objective
   Data Science: Analyze and model data to predict and optimize outcomes.
   Data Analytics: Analyze data to find actionable insights.
15. Tools Used
   Data Science: SQL, Python/R (used for advanced analytics), advanced analytics tools.
   Data Analytics: SQL, Excel, basic analytics tools (e.g., R).
Q Data Science Process:
Ans
Data Preparation – Once the data is extracted, it then enters the data preparation stage. Data preparation, often referred to as "pre-processing", is the stage at which raw data is cleaned and organized for the following stage of data processing. During preparation, raw data is rigorously checked for the presence of any errors. The purpose of this step is to eliminate poor data (redundant, incomplete, or incorrect data) and begin to create the high-quality data needed for the best business intelligence.
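A minimal sketch of this cleaning step with pandas (the file name, columns, and validity rule are hypothetical):

```python
import pandas as pd

# Hypothetical raw extract
raw = pd.read_csv("customers_raw.csv")

# Eliminate poor data: redundant, incomplete, and incorrect records
prepared = (
    raw.drop_duplicates()                        # redundant rows
       .dropna(subset=["customer_id", "email"])  # incomplete rows
       .query("age > 0 and age < 120")           # obviously incorrect values
)

print(f"kept {len(prepared)} of {len(raw)} rows after preparation")
```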
Model Building – In this step, the model building process actually starts. Here, data scientists split the dataset into training and testing portions. Techniques like regression, classification, and clustering are applied to the training dataset. Once the model has been prepared, it is tested against the "testing" dataset.
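A minimal sketch of the train/test workflow described above (scikit-learn with a synthetic dataset; the specific technique, linear regression, is chosen only for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a prepared dataset
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Distribute the dataset into training and testing portions
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Apply a technique (here, regression) to the training set, then test on the held-out data
model = LinearRegression().fit(X_train, y_train)
print("R^2 on the testing dataset:", model.score(X_test, y_test))
```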
Q Motivation for using Data Science techniques
Ans
Each key motivation for using data science techniques is explored here:
➢ Volume
➢ Dimensions
➢ Complex questions
Volume: As data become more granular, the need to use large volumes of data to extract information increases.
Dimensions: The three characteristics of the Big Data phenomenon are high volume, high velocity, and high variety. The variety of data relates to the multiple types of values (numerical, categorical), formats of data (audio files, video files), and applications of the data (location coordinates, graph data). Every single record or data point contains multiple attributes or variables that provide context for the record. For example, every user record of an e-commerce site can contain attributes such as products viewed, products purchased, user demographics, frequency of purchase, clickstream, etc. Determining the most effective offer for an e-commerce user can involve computing information across these attributes. Each attribute can be thought of as a dimension in the data space.
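To make "attributes as dimensions" concrete, here is a minimal sketch; the user record and the scoring rule are invented purely for illustration:

```python
# One hypothetical user record from an e-commerce site: each attribute is a dimension
user = {
    "products_viewed": 42,
    "products_purchased": 5,
    "age": 31,
    "purchase_frequency_per_month": 1.5,
    "clickstream_length": 310,
}

# Choosing an offer can mean combining information across several dimensions,
# for example with a simple (made-up) engagement score:
engagement = (
    0.1 * user["products_viewed"]
    + 2.0 * user["products_purchased"]
    + 5.0 * user["purchase_frequency_per_month"]
)
print("engagement score:", engagement)
```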
Complex questions: As more complex data become available for analysis, the complexity of the information that needs to be extracted from the data is increasing as well. If the natural clusters in a dataset with hundreds of dimensions need to be found, then traditional analysis like hypothesis testing cannot be used in a scalable fashion; what is needed instead is a technique where the parameters of the model are estimated from the data. Hypothesis-driven techniques were highly successful in modeling simple relationships between response and input variables, but answering these more complex questions calls for such data-driven techniques.
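A minimal sketch of finding natural clusters in a data-driven way (scikit-learn k-means on synthetic high-dimensional data; the algorithm and parameters are illustrative assumptions):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 50 dimensions and a hidden cluster structure
X, _ = make_blobs(n_samples=500, n_features=50, centers=4, random_state=0)

# The cluster centers (the model parameters) are estimated directly from the data,
# without stating an explicit hypothesis about where the clusters lie
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print("cluster sizes:", [int((kmeans.labels_ == k).sum()) for k in range(4)])
```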
Q Applications of Data Science
Ans
Gaming world: In the gaming world, the use of machine learning algorithms is increasing day by day. EA Sports, Sony, and Nintendo are widely using data science to enhance the user experience.
Product recommendation: Suppose you purchased something on Amazon and you started getting suggestions for similar products; this is because of data science technology.
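A minimal sketch of the idea behind such suggestions (item-to-item similarity on a tiny invented purchase matrix; real recommender systems are far more sophisticated):

```python
import numpy as np

# Rows = users, columns = products; 1 means the user bought the product (made-up data)
purchases = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
])

# Cosine similarity between product columns: products bought by the same users score high
norms = np.linalg.norm(purchases, axis=0)
similarity = (purchases.T @ purchases) / np.outer(norms, norms)

# Products most similar to product 0 (excluding itself) are natural suggestions
ranked = np.argsort(similarity[0])[::-1]
print("suggest after buying product 0:", [int(p) for p in ranked if p != 0])
```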