Chapter 14
Big Data and NoSQL
Learning Objectives
●
Explain the role of Big Data in modern business
●
Describe the primary characteristics of Big Data and
how these go beyond the traditional “3 Vs”
●
Explain how the core components of the Hadoop
framework operate
●
Identify the major components of the Hadoop
ecosystem
Learning Objectives
●
Summarize the four major approaches of the NoSQL
data model and how they differ from the relational model
●
Describe the characteristics of NewSQL databases
●
Understand how to work with document databases
using MongoDB
●
Understand how to work with graph databases using
Neo4j
Big Data: Definitions
●
Volume: quantity of data to be stored
●
Scaling up: keeping the same number of systems
but migrating each one to a larger system
●
Scaling out: when the workload exceeds server
capacity, it is spread out across a number of
servers
Big Data: Definitions
●
Velocity: speed at which data is entered into system
and must be processed
●
Stream processing: focuses on input processing and requires analysis of the data stream as it enters the system
●
Feedback loop processing: analysis of data to
produce actionable results
Feedback Loop Processing
[Figure: feedback loop processing]
Big Data: Definitions
●
Variety: variations in the structure of data to be
stored
●
Structured data: fits into a predefined data
model
●
Unstructured data: does not fit into a predefined
model
Big Data
●
Big Data generally refers to a set of data that
displays the characteristics of volume, velocity,
and variety (the 3 Vs) to an extent that makes
the data unsuitable for management by a
relational database management system.
Other Definitions
●
Variability: changes in meaning of data based on context
●
Sentiment analysis: attempts to determine if a statement conveys a positive, negative, or neutral attitude about a topic
●
Veracity: trustworthiness of data
●
Value: degree to which data can be analyzed to provide meaningful insights
●
Visualization: ability to graphically present data in ways that make it understandable
Big Data: What to Do?
●
Use Hadoop
●
De facto standard for most Big Data storage
and processing
●
Java-based framework for distributing and
processing very large data sets across clusters
of computers
Hadoop Components
●
Hadoop Distributed File System (HDFS): low-
level distributed file processing system that can
be used directly for data storage
●
MapReduce: programming model that supports
processing large data sets
HDFS Characteristics
●
High volume: default block size is 64 MB and can be configured to even larger values
●
Write-once, read-many: model simplifies concurrency issues and
improves data throughput
●
Streaming access: optimized for batch processing of entire files as a
continuous stream of data
●
Fault tolerance: designed to replicate data across many different
devices so that when one fails, data is still available from another
device
HDFS
[Figure: HDFS architecture]
HDFS
●
Client Node: writes or accesses data
●
Name Node: holds metadata
– Which blocks are associated with which files
– Where the blocks are stored
●
Data Node: holds data
– Data is replicated over multiple data nodes
Adding a New File
●
Client node tells name node it wants to add a file
●
The name node...
– Adds the new file name to the metadata
– Determines new block numbers for the file
– Determines the list of data nodes where the blocks will be stored
– Passes that information back to the client node
●
Client node sends blocks to data nodes
●
Data nodes write the data
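A minimal sketch of this write path using the Hadoop Java client; the cluster address and file path are placeholders, and the name node negotiation happens behind the create() call:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder address
        FileSystem fs = FileSystem.get(conf);
        // create() contacts the name node for block assignments, then the
        // stream sends the blocks to the chosen data nodes
        try (FSDataOutputStream out = fs.create(new Path("/data/example.txt"))) {
            out.writeUTF("hello HDFS");
        }
        fs.close();
    }
}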
Reading Data
●
Client node tells name node it wants to read a file
●
Name node returns the list of blocks and the data nodes where they are stored
●
Client node contacts closest data nodes on the
network for the data
●
Data nodes send data to client node
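A matching read sketch with the same placeholder address; open() fetches the block list from the name node, and the reads are then served by nearby data nodes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder address
        FileSystem fs = FileSystem.get(conf);
        // open() asks the name node where the blocks live; the stream then
        // pulls the data from the closest data node holding each block
        try (FSDataInputStream in = fs.open(new Path("/data/example.txt"))) {
            System.out.println(in.readUTF());
        }
        fs.close();
    }
}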
MapReduce
●
Framework used to process large data sets across clusters
●
Breaks down complex tasks into smaller subtasks, performs the subtasks, and produces a final result
●
Map function takes a collection of data and sorts and filters it into a
set of key-value pairs
– Mapper program performs the map function
●
Reduce function summarizes the results of the map function to produce a single result
– Reducer program performs the reduce function
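The canonical example is counting words. A minimal mapper and reducer sketch using the Hadoop Java API; job setup, input format, and cluster configuration are omitted:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map function: sorts/filters each input line into key-value pairs (word, 1)
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE); // emit (word, 1)
            }
        }
    }
}

// Reduce function: summarizes the mapper output into one count per word
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        context.write(word, new IntWritable(sum)); // emit (word, total)
    }
}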
More Than Just Hadoop
[Figure: the Hadoop ecosystem]
More Than Just Hadoop
●
Hive
– Data warehousing system that sits on top of HDFS
and supports its own SQL-like language
●
Pig
– Tool that compiles a high-level scripting language, named Pig Latin, into MapReduce jobs for execution in Hadoop
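As a sketch of Hive's SQL-like interface, here is a query against HiveServer2 from Java over JDBC; the host, database, table, and credentials are all placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // register the Hive driver
        // Placeholder HiveServer2 endpoint and credentials
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver:10000/default", "user", "");
        try (Statement stmt = conn.createStatement();
             // The SQL-like query is compiled into jobs that run on the cluster
             ResultSet rs = stmt.executeQuery(
                     "SELECT status, COUNT(*) FROM inventory GROUP BY status")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + ": " + rs.getLong(2));
            }
        } finally {
            conn.close();
        }
    }
}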
More Than Just Hadoop
●
Flume
– Component for ingesting data into Hadoop
●
Sqoop
– Tool for converting data back and forth between a
relational database and the HDFS
More Than Just Hadoop
●
HBase
– Column-oriented NoSQL database, designed to sit on top of HDFS, that quickly processes sparse data sets
●
Impala
– The first SQL-on-Hadoop application
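A minimal sketch of writing and reading one cell with the HBase Java client, assuming a table named inventory with a column family named stock already exists and that the HBase configuration on the classpath points at the cluster:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath (assumed to point at the cluster)
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("inventory"))) {
            // Write one cell: row key, column family, column qualifier, value
            Put put = new Put(Bytes.toBytes("journal"));
            put.addColumn(Bytes.toBytes("stock"), Bytes.toBytes("qty"), Bytes.toBytes("25"));
            table.put(put);
            // Read the cell back by row key
            Result result = table.get(new Get(Bytes.toBytes("journal")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("stock"), Bytes.toBytes("qty"))));
        }
    }
}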
NoSQL
●
Unfortunate name
– Does not mean "no SQL"
– "Not Only" SQL
●
A new generation of database management
systems that is not based on the traditional
relational database model
NoSQL Examples
[Figure: examples of NoSQL database products]
Key Value Databases
[Figure: key-value database structure]
Document Databases
[Figure: document database structure]
MongoDB
●
Popular document database
– Among the NoSQL databases currently available, MongoDB has been
one of the most successful in penetrating the database market
●
The name MongoDB comes from the word "humongous," as its developers intended their new product to support extremely large data sets
– High availability
– High scalability
– High performance
MongoDB Uses JSON Documents
[Figure: a sample JSON document]
Mongo Commands
db.inventory.insertMany([
{ item: "journal", qty: 25, size: { h: 14, w: 21, uom: "cm" }, status: "A" },
{ item: "notebook", qty: 50, size: { h: 8.5, w: 11, uom: "in" }, status: "A" },
{ item: "paper", qty: 100, size: { h: 8.5, w: 11, uom: "in" }, status: "D" },
{ item: "planner", qty: 75, size: { h: 22.85, w: 30, uom: "cm" }, status: "D" },
{ item: "postcard", qty: 45, size: { h: 10, w: 15.25, uom: "cm" }, status: "A" }
]);
db.inventory.find( {} )    // equivalent SQL: SELECT * FROM inventory
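The same collection can be queried from application code. A minimal sketch using the MongoDB Java driver, assuming a local server and that the documents above live in a database named test; the filtered find() below corresponds to SELECT * FROM inventory WHERE status = 'A':

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

public class MongoQuery {
    public static void main(String[] args) {
        // Placeholder connection string and database name
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> inventory =
                    client.getDatabase("test").getCollection("inventory");
            // Find only the documents whose status field is "A"
            for (Document doc : inventory.find(Filters.eq("status", "A"))) {
                System.out.println(doc.toJson());
            }
        }
    }
}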
Column/Row Oriented Databases
[Figure: column-oriented versus row-oriented storage]
Graph Databases
[Figure: graph database structure]
Neo4j
●
Even though Neo4j is not yet as widely adopted as MongoDB, it has been one of the fastest-growing NoSQL databases
●
Graph databases still work with concepts similar to entities and relationships
– Focus is on the relationships
●
Graph databases are used in environments with complex relationships among
entities
– Heavily reliant on interdependence among their data
●
Neo4j provides several interface options
– Designed with Java programming in mind
Neo4j Commands
CREATE (rob:Person{name:'Roberto'}), (isidro:Person{name:'Isidro'}),
(tony:Person{name:'Antonio'}), (nora:Person{name:'Nora'}),
(lily:Person{name:'Lilian'}), (freddy:Person{name:'Alfredo'}),
(lucas:Person{name:'Lucas'}), (mau:Person{name:'Mauricio'}),
(alb:Person{name:'Albina'}), (reg:Person{name:'Regina'}),
(j:Person{name:'Joaquín'}), (julian:Person{name:'Julián'})
CREATE
(rob)-[:FriendsWith]->(isidro), (rob)-[:FriendsWith]->(tony), (rob)-[:FriendsWith]->(reg),
(rob)-[:FriendsWith]->(mau), (rob)-[:FriendsWith]->(julian),
(tony)-[:FriendsWith]->(reg), (tony)-[:FriendsWith]->(j),
(alb)-[:FriendsWith]->(reg), (lily)-[:FriendsWith]->(isidro), (lily)-[:FriendsWith]->(j),
(mau)-[:FriendsWith]->(lucas), (lucas)-[:FriendsWith]->(nora), (freddy)-[:FriendsWith]->(nora);
Neo4j Commands
MATCH friendships=()-[:FriendsWith]-()
RETURN friendships

MATCH friends=(a:Person{name:'Lucas'})-[:FriendsWith]-(friend)
RETURN friends
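These queries can also be issued from application code, in keeping with Neo4j's Java focus. A minimal sketch using the Neo4j Java driver; the Bolt URI and credentials are placeholders:

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Record;
import org.neo4j.driver.Result;
import org.neo4j.driver.Session;

public class Neo4jQuery {
    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password")); // placeholder credentials
             Session session = driver.session()) {
            // List the names of everyone Lucas has a FriendsWith relationship with
            Result result = session.run(
                    "MATCH (:Person{name:'Lucas'})-[:FriendsWith]-(friend) " +
                    "RETURN friend.name AS name");
            while (result.hasNext()) {
                Record record = result.next();
                System.out.println(record.get("name").asString());
            }
        }
    }
}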
NewSQL
●
Database model that attempts to provide ACID-
compliant transactions across a highly distributed
infrastructure
– Latest technologies to appear in the data
management area to address Big Data problems
– No proven track record
– Have been adopted by relatively few organizations
NewSQL
●
NewSQL databases support:
– SQL as the primary interface
– ACID-compliant transactions
●
Similar to NoSQL, NewSQL databases also support:
– Highly distributed clusters
– Key-value or column-oriented data stores