CHAPTER TWO: Data Science
After completing this chapter, the students will be able to:
➢ Describe what data science is and the role of data scientists.
➢ Differentiate data and information.
➢ Describe the data processing life cycle.
➢ Understand different data types from diverse perspectives
➢ Describe the data value chain in the emerging era of big data.
➢ Understand the basics of Big Data.
➢ Describe the purpose of the Hadoop ecosystem components.
2.1. Overview of Data Science
Data science is a multi-disciplinary field that uses scientific
methods, processes, algorithms, and systems to extract knowledge
and insights from structured, semi-structured, and unstructured data.
In other words, data science is the area of study that extracts
insights from vast amounts of data using various scientific methods,
algorithms, and processes, and it helps you discover hidden patterns
in raw data.
Skills important for data science
• Statistics
• Linear algebra
• Programming knowledge
Significant advantages of using Data Science
Data is the oil of today's world. With the right tools, technologies,
and algorithms, we can use data and convert it into a distinctive
business advantage.
Data science can help you detect fraud using advanced machine
learning algorithms.
It helps you prevent significant monetary losses.
It allows you to build intelligent abilities into machines.
You can perform sentiment analysis to gauge customer brand loyalty.
It enables you to make better and faster decisions.
It helps you recommend the right product to the right customer to
enhance your business.
Challenges of Data Science
A high variety of information and data is required for accurate analysis.
The available data science talent pool is not adequate.
Management does not provide financial support for a data science
team.
Unavailability of, or difficult access to, data.
Data science results are not effectively used by business decision-
makers.
Explaining data science to others is difficult.
Privacy issues.
Lack of domain experts.
If an organization is very small, it cannot have a data science
team.
What are data and information?
Data can be defined as a representation of facts, concepts, or
instructions in a formalized manner, which should be suitable for
communication, interpretation, or processing by humans or electronic
machines.
It can be described as unprocessed facts and figures.
It is represented with the help of characters such as letters (A-Z,
a-z), digits (0-9), or special characters (+, -, /, *, <, >, =, etc.).
Information is the processed data on which decisions and actions
are based.
Information is data that has been processed into a form that is
meaningful to the recipient and is of real or perceived value in the
current or prospective actions or decisions of the recipient.
Furthermore, information is interpreted data; it is created from
organized, structured, and processed data in a particular context.
Data Processing Cycle
Data processing is the re-structuring or re-ordering of data by people
or machines to increase its usefulness and add value for a
particular purpose.
Data processing consists of the following basic steps: Input,
Processing and Output. These three steps constitute the data
processing cycle.
Fig. 1. Data Processing Cycle
Input - in this step, the input data is prepared in some convenient form
for processing.
The form will depend on the processing machine.
For example, when electronic computers are used, the input data can
be recorded on any one of several types of storage media, such as a
hard disk, CD, flash disk, and so on.
Processing - in this step, the input data is changed to produce data in a
more useful form.
For example, interest can be calculated on a deposit to a bank, or a
summary of sales for the month can be calculated from the sales
orders.
Output - at this stage, the result of the preceding processing step is
collected.
The particular form of the output data depends on the use of the
data.
For example, output data may be the payroll for employees.
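To make the cycle concrete, here is a minimal Python sketch of the three steps, using the bank-deposit interest example above. The deposit amounts and the 5% rate are illustrative assumptions, not figures from this chapter.

```python
# A minimal sketch of the Input -> Processing -> Output cycle,
# using the bank-deposit interest example from the slides.
def process(deposits, annual_rate=0.05):
    """Processing step: compute simple interest for each deposit."""
    return [amount * annual_rate for amount in deposits]

# Input step: data prepared in a convenient form (here, a Python list).
deposits = [1000.0, 2500.0, 400.0]

# Processing step: transform the input into a more useful form.
interest = process(deposits)

# Output step: collect and present the result.
for amount, earned in zip(deposits, interest):
    print(f"Deposit {amount:>8.2f} earns interest {earned:>7.2f}")
```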
Data types and their representation
Data types can be described from diverse perspectives.
In computer science and computer programming, for
instance, a data type is simply an attribute of data that tells
the compiler or interpreter how the programmer intends to
use the data.
1. Data types from Computer programming perspective
Almost all programming languages explicitly include the notion of
data type, though different languages may use different terminology.
Common data types include:
Integers (int) - used to represent whole numbers, mathematically
known as integers.
Booleans (bool) - used to represent values restricted to one of two
values: true or false.
Characters (char) - used to represent a single character.
Floating-point numbers (float) - used to represent real numbers.
Alphanumeric strings (string) - used to represent a combination of
characters and numbers.
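As a quick illustration, the following sketch shows one value of each of these types in Python (one of the tools mentioned later in this chapter). Note that Python has no separate char type, so a single character is represented as a one-character string; the variable names are illustrative.

```python
# One illustrative value for each of the common data types above.
age = 25                # integer (int): a whole number
is_student = True       # Boolean (bool): one of two values, True or False
grade = "A"             # character: a single character (1-character string)
temperature = 36.6      # floating-point (float): a real number
user_id = "user42"      # alphanumeric string: letters and digits combined

for value in (age, is_student, grade, temperature, user_id):
    print(value, type(value).__name__)
```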
2. Data types from Data Analytics perspective
From a data analytics point of view, it is important to
understand that there are three common data types or
structures:
Structured
Semi-structured and
Unstructured data types.
Structured Data
Structured data is data that adheres to a pre-defined data
model and is therefore straightforward to analyze.
Structured data conforms to a tabular format with a
relationship between the different rows and columns.
Common examples of structured data are Excel files or SQL
databases.
Each of these has structured rows and columns that can be
sorted.
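As a small sketch of structured data, the following Python snippet builds a tiny SQL table with the built-in sqlite3 module. The table and column names are illustrative assumptions, not part of the slides.

```python
# A minimal sketch of structured data: a small SQL table built with
# Python's built-in sqlite3 module, held in a throwaway in-memory DB.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "laptop", 999.0), (2, "phone", 450.0), (3, "tablet", 300.0)],
)

# Because every row follows the same pre-defined model (id, product,
# amount), the data is straightforward to sort and analyze.
for row in conn.execute("SELECT product, amount FROM sales ORDER BY amount DESC"):
    print(row)
conn.close()
```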
Semi-structured Data
Semi-structured data is a form of structured data that does not conform with
the formal structure of data models associated with relational databases or
other forms of data tables, but nonetheless, contains tags or other markers to
separate semantic elements and enforce hierarchies of records and fields
within the data.
Therefore, it is also known as a self-describing structure.
JSON and XML are common examples of semi-structured data.
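For instance, a minimal JSON document can be parsed with Python's built-in json module, as sketched below; the record contents are purely illustrative.

```python
# A minimal sketch of semi-structured data: a JSON document whose
# tags make the structure self-describing, without a table schema.
import json

doc = """
{
  "name": "Abebe",
  "email": "abebe@example.com",
  "phones": ["+251-11-000-0000", "+251-91-000-0000"]
}
"""

record = json.loads(doc)

# The tags ("name", "email", "phones") separate the semantic elements
# and enforce a hierarchy of fields within the data.
print(record["name"], "has", len(record["phones"]), "phone numbers")
```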
Unstructured Data
Unstructured data is information that either does not have a predefined data
model or is not organized in a pre-defined manner.
Unstructured information is typically text-heavy but may contain data such as
dates, numbers, and facts as well.
This results in irregularities and ambiguities that make it difficult to understand
using traditional programs as compared to data stored in structured databases.
Common examples of unstructured data include audio files, video files, and data held in NoSQL databases.
Metadata – Data about Data
• The last category of data type is metadata.
• From a technical point of view, this is not a separate
data structure, but it is one of the most important
elements for Big Data analysis and big data solutions.
• Metadata is data about data.
• It provides additional information about a specific set
of data.
• In a set of photographs, for example, metadata could
describe when and where the photos were taken.
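A minimal sketch of this idea in Python follows: a plain dictionary standing in for a photo's metadata. The field names and values are illustrative assumptions (real image metadata would typically live in EXIF tags).

```python
# A minimal sketch of metadata: a dictionary describing a photo file.
photo = "vacation_001.jpg"          # the data itself (an image file)

metadata = {                        # data about that data
    "taken_at": "2024-07-15 14:32",
    "location": "Addis Ababa",
    "camera": "Pixel 7",
    "resolution": "4032x3024",
}

# The metadata tells us when and where the photo was taken
# without opening or decoding the image itself.
print(f"{photo} was taken at {metadata['location']} on {metadata['taken_at']}")
```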
Data Value Chain
The Data Value Chain is introduced to describe the information flow
within a big data system as a series of steps needed to generate value
and useful insights from data. The Big Data Value Chain identifies the
following key high-level activities:
Fig. 2. Data Value Chain
1. Data Acquisition
• It is the process of gathering, filtering, and cleaning data
before it is put in a data warehouse or any other storage
solution on which data analysis can be carried out (a small
sketch follows this list).
• Data acquisition is one of the major big data challenges in
terms of infrastructure requirements.
• The infrastructure required to support the acquisition of big
data must deliver low, predictable latency both in capturing
data and in executing queries; be able to handle very high
transaction volumes, often in a distributed environment; and
support flexible and dynamic data structures.
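The sketch referenced above is a minimal, single-machine Python illustration of the gather-filter-clean idea; the "product,amount" event format and the filtering rule are illustrative assumptions, not part of any real acquisition pipeline.

```python
# A minimal sketch of acquisition: gather raw events, filter out
# malformed ones, and clean the rest before they reach storage.
raw_events = ["  purchase,100 ", "view,", "purchase,40", None]

def clean(event):
    """Cleaning step: strip whitespace; drop missing events."""
    return event.strip() if event else None

acquired = []
for event in map(clean, raw_events):
    # Filtering step: keep only well-formed "name,amount" events.
    if event and event.count(",") == 1 and event.split(",")[1].isdigit():
        acquired.append(event)

print(acquired)  # these records are now ready for the storage layer
```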
2. Data Analysis
• It is concerned with making the raw data acquired
amenable to use in decision-making as well as domain-
specific usage.
• Data analysis involves exploring, transforming, and
modeling data with the goal of highlighting relevant
data, synthesizing and extracting useful hidden
information with high potential from a business point of
view.
• Related areas include data mining, business intelligence,
and machine learning.
3. Data Curation
• It is the active management of data over its life cycle to ensure it meets
the necessary data quality requirements for its effective usage.
• Data curation processes can be categorized into different activities
such as content creation, selection, classification, transformation,
validation, and preservation (a small validation sketch follows this list).
• Data curation is performed by expert curators who are responsible for
improving the accessibility and quality of data.
• Data curators (also known as scientific curators or data annotators) hold
the responsibility of ensuring that data are trustworthy, discoverable,
accessible, reusable, and fit for their purpose.
• A key trend for the curation of big data utilizes community and
crowdsourcing approaches.
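The validation sketch mentioned in the list above, in Python: a minimal check that keeps only plausible records before they are published for reuse. The record fields and the plausibility rules are illustrative assumptions.

```python
# A minimal sketch of one curation activity (validation): keeping
# only records that look trustworthy before they are made reusable.
def is_valid(record):
    """Keep records with a non-empty name and a plausible age."""
    return bool(record.get("name")) and 0 <= record.get("age", -1) <= 120

raw_records = [
    {"name": "Sara", "age": 29},
    {"name": "", "age": 41},         # fails: empty name
    {"name": "Bekele", "age": 300},  # fails: implausible age
]

curated = [r for r in raw_records if is_valid(r)]
print(f"kept {len(curated)} of {len(raw_records)} records")
```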
4. Data Storage
• It is the persistence and management of data in a scalable way
that satisfies the needs of applications that require fast access
to the data.
• Relational Database Management Systems (RDBMS) have
been the main, and almost unique, solution to the storage
paradigm for nearly 40 years.
• However, their ability to scale degrades as data volumes and
complexity grow, making them unsuitable for big data
scenarios.
• NoSQL technologies have been designed with the scalability
goal in mind and present a wide range of solutions based on
alternative data models.
5. Data Usage
• It covers the data-driven business activities that need
access to data, its analysis, and the tools needed to
integrate the data analysis within the business activity.
• Data usage in business decision making can enhance
competitiveness through the reduction of costs,
increased added value, or any other parameter that can
be measured against existing performance criteria.
Basic concepts of big data
What Is Big Data?
• Big data is the term for a collection of data sets so large and
complex that it becomes difficult to process using on-hand
database management tools or traditional data processing
applications.
• In this context, a “large dataset” means a dataset too large to
reasonably process or store with traditional tooling or on a single
computer.
• This means that the common scale of big datasets is constantly
shifting and may vary significantly from organization to
organization.
Big data is characterized by the 4Vs:
• Volume: large amounts of data (zettabytes / massive datasets)
• Velocity: data is live-streaming or in motion
• Variety: data comes in many different forms from diverse sources
• Veracity: can we trust the data? How accurate is it? etc.
Fig. 3. Characteristics of Big Data
Big Data Solutions
Clustered Computing and Hadoop Ecosystem
Clustered Computing
Because of the qualities of big data, individual computers are
often inadequate for handling the data at most stages.
To better address the high storage and computational needs of
big data, computer clusters are a better fit.
Cluster computing: a form of computing in which a group of computers
are connected through a network and perform like a single machine.
Big data clustering software combines the resources of many
smaller machines, seeking to provide a number of benefits:
• Resource Pooling: Combining the available storage space to hold data
is a clear benefit, but CPU and memory pooling are also extremely
important.
• Processing large datasets requires large amounts of all three of these
resources.
• High Availability: Clusters can provide varying levels of fault tolerance
and availability guarantees to prevent hardware or software failures
from affecting access to data and processing.
• This becomes increasingly important as we continue to emphasize the
importance of real-time analytics.
• Easy Scalability: Clusters make it easy to scale horizontally by adding
additional machines to the group.
• This means the system can react to changes in resource requirements
without expanding the physical resources on a machine.
Hadoop and its Ecosystem
Hadoop is an open-source framework intended to make
interaction with big data easier.
It is an Apache open-source software framework for reliable,
scalable, distributed computing over massive amounts of data.
It is a framework that allows for the distributed processing of
large datasets across clusters of computers using simple
programming models.
It is inspired by a technical document published by Google.
The four key characteristics of Hadoop are:
• Economical: Its systems are highly economical as ordinary
computers can be used for data processing.
• Reliable: It is reliable as it stores copies of the data on
different machines and is resistant to hardware failure.
• Scalable: It is easily scalable, both horizontally and vertically.
A few extra nodes help in scaling up the framework.
• Flexible: It is flexible, and you can store as much structured and
unstructured data as you need and decide to use it later.
Hadoop has an ecosystem that has evolved from its four core
components: data management, access, processing, and
storage.
It is continuously growing to meet the needs of Big Data.
It comprises the following components and many others:
HDFS: Hadoop Distributed File System-where Hadoop stores
data
YARN: Yet Another Resource Negotiator-A framework for job
scheduling and cluster resource management.
MapReduce: Programming based Data Processing
Spark: In-Memory data processing
PIG, HIVE: Query-based processing of data services
HBase: NoSQL Database
Mahout, Spark MLLib: Machine Learning algorithm libraries
Solr, Lucene: Searching and indexing
ZooKeeper: Managing the cluster
Oozie: Job Scheduling
Fig. 4. Hadoop Ecosystem
Big Data Life Cycle with Hadoop
1. Ingesting data into the system
The first stage of Big Data processing is Ingest.
The data is ingested or transferred to Hadoop from various
sources such as relational databases, systems, or local files.
Sqoop transfers data from RDBMS to HDFS, whereas
Flume transfers event data.
Sqoop is a tool to easily import information from structured
databases (MySQL, Oracle, etc.) and related Hadoop
systems (such as Hive and HBase) into your Hadoop cluster.
2. Processing the data in storage
The second stage is Processing. In this stage, the data is
stored and processed.
The data is stored in the distributed file system, HDFS, and
the NoSQL distributed database, HBase.
Spark and MapReduce perform the data processing.
Spark: a fast and general compute engine for Hadoop data.
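To make the MapReduce idea concrete, here is a minimal single-machine Python sketch of the classic word-count job. Real Hadoop MapReduce distributes the map and reduce phases across the cluster; this sketch only imitates the logic, and the sample documents are illustrative.

```python
# A minimal single-machine sketch of the MapReduce word-count idea.
from collections import defaultdict

documents = ["big data is big", "data science uses big data"]

# Map phase: emit a (word, 1) pair for every word in every document.
pairs = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group the emitted values by key (the word).
groups = defaultdict(list)
for word, count in pairs:
    groups[word].append(count)

# Reduce phase: sum the counts for each word.
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts)  # e.g. {'big': 3, 'data': 3, 'is': 1, ...}
```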
3. Computing and analyzing data
The third stage is to Analyze.
Here, the data is analyzed by processing frameworks
such as Pig, Hive, and Impala.
Pig converts the data using map and reduce steps and then
analyzes it.
Hive is also based on the map and reduce programming model
and is most suitable for structured data.
4. Visualizing the results
The fourth stage is Access, which is performed by tools
such as Hue and Cloudera Search.
In this stage, the analyzed data can be accessed by users.
Advantages and disadvantages of Hadoop
Hadoop is good for:
Processing massive amounts of data through parallelism
Handling a variety of data (structured, unstructured, semi-structured)
Using inexpensive commodity hardware
Hadoop is not good for:
Processing transactions (random access)
Work that cannot be parallelized
Low-latency data access
Processing lots of small files
Intensive calculations with small amounts of data
Big Data vs Data Science

Factors         Big Data                           Data Science
Concept         Handling large data                Analyzing data
Responsibility  Processing huge volumes of data    Understanding patterns in the
                and generating insights            data and making decisions
Industry        E-commerce, security services,     Sales, image recognition,
                telecommunications                 advertisement, risk analytics
Tools           Hadoop                             Python, R
THANK YOU!
Quiz 1 (10%) Time allotted: 18'
Write your Name, ID, and Section.
1. What is clustered computing? List the benefits of clustered computing.
2. What is data science? Explain the data processing cycle.
3. What is Hadoop, and what are the four core components of Hadoop?
4. What is big data? Explain the characteristics of big data.
5. What is a data type? Explain data types from the data analytics perspective.
6. List the key high-level activities in the Data Value Chain and explain them.