Chapter 1 Introduction To Big Data

This document provides an introduction to big data, including definitions and characteristics. It defines big data as massive datasets from various sources that are analyzed to reveal patterns and optimize decision making. Key characteristics of big data include volume, velocity, and variety. There are three main types of data: structured, unstructured, and semi-structured. Traditional data management stored structured data in data warehouses, while big data tools like Hadoop can handle large volumes and varieties of data more efficiently. The document provides examples of how insurance companies, manufacturers, hotels, and public services can benefit from big data analytics.

Uploaded by

shubham.ojha2102


Chapter 1

Introduction to Big Data

Introduction

1. What is Big Data?

2. Big Data Characteristics

3. Types of Big Data

4. Traditional vs. Big Data Business Approach

5. Case Study of Big Data Solutions

What is Big Data?

• Massive datasets.

• Collected from a variety of data sources.

• E-business and social media create 2.5 exabytes (10^18 bytes) of data per day.

• Analyzed to reveal new insights for optimized decision making.

• Stored for analysis to reveal hidden correlations and patterns; this analysis is called "Big Data Analytics".
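
To put the daily figure above in perspective, the arithmetic can be sketched as follows. This is only an illustration: it assumes the 2.5 exabytes per day quoted on the slide holds constant over a 365-day year, and the function names are hypothetical.

```python
# Decimal (SI) storage units, expressed in bytes.
EXABYTE = 10**18
ZETTABYTE = 10**21


def bytes_per_year(daily_exabytes: float, days: int = 365) -> float:
    """Scale a daily volume given in exabytes to a yearly volume in bytes."""
    return daily_exabytes * EXABYTE * days


def to_zettabytes(n_bytes: float) -> float:
    """Convert a byte count to zettabytes."""
    return n_bytes / ZETTABYTE


# Assumption: the slide's 2.5 EB/day rate stays constant for a whole year.
yearly_bytes = bytes_per_year(2.5)
yearly_zb = to_zettabytes(yearly_bytes)
print(f"{yearly_zb:.4f} ZB per year")
```

Under that assumption, a year of such traffic amounts to a little under one zettabyte.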

Trends of Data Generation

Year    Data
2006    10 ZB
2010    20 ZB
2017    30 ZB
2020    50 ZB

Big Data: Result of 3 Computing Trends

• Social networks
• Cloud computing
• Mobile computing

Volume of Big Data

Data volume grows at each tier:

• ERP (in megabytes)
• CRM (in gigabytes)
• Web (in terabytes)
• Big Data (in petabytes)

Example data labels from the figure: transaction operations, customer segmentation, support, offer history, dynamic pricing, behavior, weblogs, sensor, RFID, user click, mobile web.

Characteristics of Big Data

1. Volume

2. Velocity

3. Variety

Five V’s of Big Data

• The five-V model is commonly stated as volume, velocity, variety, veracity, and value.

Types of Big Data

1. Structured

2. Unstructured

3. Semi-structured

What is Structured Data?

• Structured data usually resides in relational databases (RDBMS).

• Even text strings of variable length, like names, are contained in records, making it a simple matter to search.

• Data may be human- or machine-generated as long as the data is created within an RDBMS structure.

• This format is eminently searchable, both with human-generated queries and via algorithms using the type of data and field names, such as alphabetical or numeric, currency or date.

• Common relational database applications with structured data include airline reservation systems, inventory control, sales transactions, and ATM activity.

• Structured Query Language (SQL) enables queries on this type of structured data within relational databases.
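
As a rough illustration of querying structured data with SQL, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The `sales` table, its columns, and the sample rows are hypothetical and chosen only to echo the "sales transactions" example above; they do not come from the slides.

```python
import sqlite3


def total_sales_by_customer(rows):
    """Load (customer, amount, sold_on) rows into a typed table and total them per customer."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        # The schema is fixed up front: every record has the same typed fields.
        conn.execute(
            "CREATE TABLE sales ("
            "id INTEGER PRIMARY KEY, "
            "customer TEXT NOT NULL, "
            "amount REAL NOT NULL, "
            "sold_on TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO sales (customer, amount, sold_on) VALUES (?, ?, ?)",
            rows,
        )
        # Field names and types make the data directly searchable with SQL.
        cursor = conn.execute(
            "SELECT customer, SUM(amount) FROM sales "
            "GROUP BY customer ORDER BY customer"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data.
sample_rows = [
    ("Asha", 120.0, "2024-01-05"),
    ("Ravi", 80.5, "2024-01-06"),
    ("Asha", 30.0, "2024-01-09"),
]
totals = total_sales_by_customer(sample_rows)
print(totals)
```

Because the schema is declared before any data is inserted, the query can rely on field names and types, which is what makes this kind of data easy to search.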

What is Unstructured Data?

• Unstructured data has internal structure but is not structured via pre-defined data models or schemas.
• It may be textual or non-textual, and human- or machine-generated.
• It may also be stored in a non-relational (NoSQL) database.

• Typical human-generated unstructured data includes:

1. Text files: Word processing, spreadsheets, presentations, email, logs.
2. Social Media: Data from Facebook, Twitter, LinkedIn.
3. Website: YouTube, Instagram, photo sharing sites.
4. Mobile data: Text messages, locations.
5. Communications: Chat, IM, phone recordings, collaboration software.
6. Media: MP3, digital photos, audio and video files.
7. Business applications: MS Office documents, productivity applications.

• Typical machine-generated unstructured data includes:

1. Satellite imagery: Weather data, land forms, military movements.
2. Scientific data: Oil and gas exploration, space exploration, seismic imagery,
atmospheric data.
3. Digital surveillance: Surveillance photos and video.
4. Sensor data: Traffic, weather, oceanographic sensors.

What is Semi-structured Data?

• Semi-structured data maintains internal tags and markings that identify separate data elements, which enables information grouping and hierarchies.
• Both documents and databases can be semi-structured.
• Email is a very common example of a semi-structured data type.

• Examples of semi-structured data:

1. Markup language XML: XML is a set of document encoding rules that defines a human- and machine-readable format.
2. Open standard JSON (JavaScript Object Notation): Its structure consists of name/value pairs (or object, hash table, etc.) and an ordered value list (or array, sequence, list).
3. NoSQL: NoSQL databases differ from relational databases because they do not separate the organization (schema) from the data. This also allows for easier data exchange between databases. Examples of newer NoSQL databases include MongoDB and Couchbase.
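
The JSON structure described above (name/value pairs plus ordered lists) can be sketched with Python's standard json module. The sample record and the `field_paths` helper are hypothetical and serve only to show how the tags inside the data describe its own grouping and hierarchy.

```python
import json

# Hypothetical semi-structured record: the field names travel with the data.
raw = '{"name": "Asha", "tags": ["big data", "hadoop"], "address": {"city": "Pune"}}'
record = json.loads(raw)


def field_paths(value, prefix=""):
    """Return dotted paths to every non-object value, following nested objects."""
    if isinstance(value, dict):
        paths = []
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            paths.extend(field_paths(child, child_prefix))
        return paths
    # Strings, numbers, and lists are treated as leaf values in this sketch.
    return [prefix]


paths = field_paths(record)
print(paths)
print(record["tags"][0])  # lists keep their order, so positional access works
```

No schema was declared anywhere; the hierarchy is recovered from the record itself, which is the defining property of semi-structured data.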

Traditional Data Management Approach

• Traditional data management stores structured data in data marts and data warehouses, which are distributed throughout the organization.

• Copying all the data from each of these systems to a centralized location and keeping it updated is not an easy task.

• Moreover, sampling the data will not serve the purpose of extracting the required information.

• This approach could handle a huge volume of transactions, but only up to a point.

Big Data Approach

• Many IT tools are available for Big Data projects.

• Hadoop: addresses the storage requirement.

• Apache Spark: stream processing.

• When used, these tools can dramatically reduce the time-to-value, in most cases from more than 2 years to less than 4 months.

Advantages of using Hadoop:

1. Scalability
2. No pre-processing of data
3. Handles unstructured data
4. No limit on data and time
5. Protection against hardware failure

Beneficial Domains

• Insurance companies: understand the likelihood of fraud by accessing internal and external data while processing claims.

• Manufacturers and distributors: recognize supply chain issues earlier, so that they can decide on different logistical approaches to avoid the additional cost associated with material delays, overstock, or stock-out conditions.

• Hotels and telecommunications companies: gain better clarity on customer needs in order to serve customers better.

• Public services: services such as traffic management, ambulance services, and transportation can optimize their delivery mechanisms.

• Smart city: make cities more efficient and sustainable to improve the lives of citizens.
Case Study
1. Clickstream Analytics
2. Feedback analysis using word count
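
The slides name the word count case study without showing an implementation. The sketch below is a minimal single-machine illustration of the map and reduce steps behind a word count, not Hadoop code; the function names, the tokenizing rule, and the sample feedback lines are all assumptions made for this example.

```python
import re
from collections import Counter


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one feedback line."""
    return [(word, 1) for word in re.findall(r"[a-z']+", line.lower())]


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each distinct word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts


def word_count(lines):
    """Run the map step over every line, then reduce all pairs together."""
    pairs = []
    for line in lines:
        pairs.extend(map_words(line))
    return reduce_counts(pairs)


# Hypothetical customer feedback.
feedback = [
    "Great service, great rooms",
    "Service was slow",
    "Rooms were clean",
]
counts = word_count(feedback)
print(counts.most_common(3))
```

In a distributed setting the map step would run in parallel over chunks of the input and the reduce step would combine the partial results; here both run in one process to keep the idea visible.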
Thank you
