BigData NoSQL

Uploaded by

Huzaifa Ahmed

Big Data and NoSQL

Modified from slides by Perry Hoekstra (Perficient, Inc.) and Database System Concepts, 7th Ed.

Motivation

▪ Very large volumes of data being collected
– Driven by growth of the web, social media, and more recently the internet of things
– Web logs were an early source of data
• Analytics on web logs has great value for advertisements, web site structuring, what posts to show to a user, etc.
▪ Big Data: differentiated from data handled by earlier-generation databases
– Volume: much larger amounts of data stored
– Velocity: much higher rates of insertions
– Variety: many types of data, beyond relational data

2
Querying Big Data

▪ Transaction processing systems that need very high scalability
– Many applications are willing to sacrifice ACID properties and other database features if they can get very high scalability
▪ Query processing systems that
– Need very high scalability, and
– Need to support non-relational data

3
History of the World

▪ Relational Databases – mainstay of business
▪ Web-based applications caused spikes in load
– Especially true for public-facing e-commerce sites
▪ Developers began to front the RDBMS with memcached or to integrate other caching mechanisms within the application (e.g., Ehcache)

4
Scaling Up

▪ Issues with scaling up when the dataset is just too big
▪ RDBMSs were not designed to be distributed
▪ Began to look at multi-node database solutions
▪ Known as ‘scaling out’ or ‘horizontal scaling’
▪ Different approaches include:
– Master-slave
– Sharding

5
Scaling RDBMS – Master/Slave

▪ Master-Slave
– All writes are written to the master. All reads are performed against the replicated slave databases
– Critical reads may be incorrect, as writes may not yet have been propagated down
– Large data sets can pose problems, as the master needs to duplicate data to the slaves

6
Scaling RDBMS – Sharding

▪ Partitioning or sharding
– Scales well for both reads and writes
– Not transparent; the application needs to be partition-aware
– Can no longer have relationships/joins across partitions
– Loss of referential integrity across shards
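A minimal sketch of why a sharded application has to be partition-aware: the application itself, not the database, decides which partition holds a row. The hash-modulo scheme and the names shard_for and ShardedStore are assumptions made for illustration, not something the slides prescribe.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy stand-in for several independent databases, one dict per shard."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, row: dict) -> None:
        # The caller-side routing step: pick the shard before writing.
        self.shards[shard_for(key, len(self.shards))][key] = row

    def get(self, key: str):
        # Reads must repeat the same routing to find the row again.
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
store.put("user:1", {"name": "Ada"})
store.put("user:2", {"name": "Grace"})
print(store.get("user:1"))
```

Because each shard is a separate store, a join between rows that land on different shards has to be done in application code, which is the cost the slide lists.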

7
Other ways to scale RDBMS

▪ Multi-master replication
▪ INSERT only, no UPDATEs/DELETEs
▪ No JOINs, thereby reducing query time
– This involves de-normalizing data
▪ In-memory databases

8
What is NoSQL?

▪ Stands for Not Only SQL
▪ Class of non-relational data storage systems
▪ Usually do not require a fixed table schema, nor do they use the concept of joins
▪ All NoSQL offerings relax one or more of the ACID properties (will talk about the CAP theorem)

9
Why NoSQL?

▪ For data storage, an RDBMS cannot be the be-all/end-all
▪ Just as there are different programming languages, we need to have other data storage tools in the toolbox
▪ A NoSQL solution is more acceptable to a client now than 5 years ago

10
How did we get here?

▪ Explosion of social media sites (Facebook, Twitter) with large data needs
▪ Rise of cloud-based solutions such as Amazon S3 (Simple Storage Service)
▪ Just as with the move to dynamically-typed languages (Ruby/Groovy), a shift to dynamically-typed data with frequent schema changes
▪ Open-source community

11
Dynamo and BigTable

▪ Three major papers were the seeds of the NoSQL movement
– BigTable (Google)
– Dynamo (Amazon)
• Gossip protocol (discovery and error detection)
• Distributed key-value data store
• Eventual consistency
– CAP Theorem (discussed in a moment)

12
The Perfect Storm

▪ Large datasets, acceptance of alternatives, and dynamically-typed data have come together in a perfect storm
▪ Not a backlash/rebellion against RDBMS
▪ SQL is a rich query language that cannot be rivaled by the current list of NoSQL offerings

13
CAP Theorem

▪ Three properties of a system: consistency, availability, and partition tolerance
▪ You can have at most two of these three properties for any shared-data system
▪ To scale out, you have to partition. That leaves either consistency or availability to choose from
– In almost all cases, you would choose availability over consistency

14
15
Availability

▪ Traditionally thought of as the server/process being available five 9’s (99.999%) of the time.
▪ However, for a system with a large number of nodes, at almost any point in time there’s a good chance that a node is down or that there is a network disruption among the nodes.
– Want a system that is resilient in the face of network disruption
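A rough back-of-the-envelope sketch of the point above. It assumes node failures are independent and that every node has the same availability; both are simplifying assumptions, and the function name is hypothetical.

```python
def prob_some_node_down(per_node_availability: float, num_nodes: int) -> float:
    """Probability that at least one of num_nodes is down at a given instant,
    assuming independent failures (a simplification)."""
    return 1.0 - per_node_availability ** num_nodes


for n in (1, 1_000, 100_000):
    p = prob_some_node_down(0.99999, n)
    print(f"{n:>7} nodes at five 9's each: P(some node down) = {p:.5f}")
```

Under these assumptions a single five-9's node is almost never down, but across 100,000 such nodes the chance that at least one is down at any instant is roughly 63%. Network disruptions between nodes, which the slide also mentions, are not modeled here.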

16
Consistency Model

▪ A consistency model determines rules for visibility and apparent order of updates.
▪ For example:
– Row X is replicated on nodes M and N
– Client A writes row X to node N
– Some period of time t elapses.
– Client B reads row X from node M
– Does client B see the write from client A?
– Consistency is a continuum with tradeoffs
– For NoSQL, the answer would be: maybe
– CAP Theorem states: strict consistency can't be achieved at the same time as availability and partition tolerance.
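The M/N example can be sketched as a toy simulation. The Cluster class and its pending-delivery queue are assumptions made for illustration; real systems replicate over a network with their own protocols.

```python
class Cluster:
    """Two replicas, M and N, with asynchronous replication (toy model)."""

    def __init__(self):
        self.nodes = {"M": {}, "N": {}}
        self.pending = []  # (target_node, key, value) not yet delivered

    def write(self, node: str, key: str, value: str) -> None:
        # The write is applied locally and only queued for the other replicas.
        self.nodes[node][key] = value
        for other in self.nodes:
            if other != node:
                self.pending.append((other, key, value))

    def read(self, node: str, key: str):
        return self.nodes[node].get(key)

    def propagate(self) -> None:
        # Stands in for "some period of time t elapses".
        for target, key, value in self.pending:
            self.nodes[target][key] = value
        self.pending.clear()


cluster = Cluster()
cluster.write("N", "X", "v1")      # client A writes row X to node N
print(cluster.read("M", "X"))      # client B reads from M before propagation
cluster.propagate()
print(cluster.read("M", "X"))      # client B reads from M after propagation
```

Whether client B sees the write depends on whether propagation has happened yet, which is the "maybe" in the slide.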

17
Eventual Consistency

▪ When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent
▪ For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service
▪ Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID
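One way to picture convergence is a sketch in which replicas repeatedly exchange state and keep the newest version of each key. The last-write-wins rule, the ring-shaped exchange order, and the function names are assumptions for this example; the slides do not specify a reconciliation scheme.

```python
def merge(a: dict, b: dict) -> None:
    """Exchange state between two replicas; for each key, both keep the
    (timestamp, value) pair that compares highest (last write wins)."""
    for key in set(a) | set(b):
        candidates = [v for v in (a.get(key), b.get(key)) if v is not None]
        newest = max(candidates)  # tuples compare by timestamp first
        a[key] = newest
        b[key] = newest


def converge(replicas: list) -> int:
    """Run exchange rounds around a ring until all replicas are identical.
    Returns the number of rounds that were needed."""
    rounds = 0
    while any(r != replicas[0] for r in replicas):
        for i in range(len(replicas)):
            merge(replicas[i], replicas[(i + 1) % len(replicas)])
        rounds += 1
    return rounds


# Three replicas that accepted different updates; values are (timestamp, value).
replicas = [
    {"X": (1, "old")},
    {"X": (3, "new"), "Y": (2, "y1")},
    {},
]
rounds = converge(replicas)
print(rounds, replicas[2])
```

Once no new updates arrive, a bounded number of exchange rounds leaves every replica with the same data, which is the behavior the first bullet describes.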

18
What kinds of NoSQL

▪ NoSQL solutions fall into two major areas:
– Key/Value or ‘the big hash table’.
• Amazon S3 (Dynamo)
• Voldemort
• Scalaris
– Schema-less, which comes in multiple flavors: column-based, document-based, or graph-based.
• Cassandra (column-based)
• CouchDB (document-based)
• Neo4J (graph-based)
• HBase (column-based)

19
Key/Value

Pros:
– very fast
– very scalable
– simple model
– able to distribute horizontally

Cons:
- many data structures (objects) can't be easily modeled
as key value pairs
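A small sketch of both sides of this trade-off, using an in-memory dict as a stand-in for a key/value store. The KeyValueStore class and its scan helper are hypothetical, and storing objects as JSON blobs is only one possible workaround.

```python
import json


class KeyValueStore:
    """Toy key/value store: opaque values looked up by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value) -> None:
        # Structured objects have to be flattened into a single blob.
        self._data[key] = json.dumps(value)

    def get(self, key: str):
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)

    def scan(self, predicate) -> list:
        # No secondary indexes: finding values by content means
        # deserializing and checking every entry.
        return [k for k, raw in self._data.items() if predicate(json.loads(raw))]


store = KeyValueStore()
store.put("user:1", {"name": "Ada", "city": "London"})
store.put("user:2", {"name": "Grace", "city": "New York"})
print(store.get("user:1"))
print(store.scan(lambda user: user["city"] == "London"))
```

Lookups by key are a single hash access, which is where the speed comes from; anything that needs to look inside the value falls back to a full scan in this sketch.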

20
Schema-Less

Pros:
- Schema-less data model is richer than key/value pairs
- eventual consistency
- many are distributed
- still provide excellent performance and scalability

Cons:
- typically no ACID transactions or joins

21
Common Advantages

▪ Cheap, easy to implement (open source)
▪ Data are replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned
– Down nodes easily replaced
– No single point of failure
▪ Easy to distribute
▪ Don't require a schema
▪ Can scale up and down
▪ Relax the data consistency requirement (CAP)

22
What am I giving up?

▪ joins
▪ group by
▪ order by
▪ ACID transactions
▪ SQL as a sometimes frustrating but still powerful query language
▪ easy integration with other applications that support SQL

23
Cassandra

▪ Originally developed at Facebook
▪ Follows the BigTable data model: column-oriented
▪ Uses the Dynamo eventual consistency model
▪ Written in Java
▪ Open-sourced and exists within the Apache family
▪ Uses Apache Thrift as its API

24
Cassandra and Consistency

▪ Talked previously about eventual consistency
▪ Cassandra has programmable read/write consistency
– One: Return from the first node that responds
– Quorum: Query all nodes and respond with the value that has the latest timestamp once a majority of nodes have responded
– All: Query all nodes and respond with the value that has the latest timestamp once all nodes have responded. An unresponsive node will fail the read
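The read levels above can be sketched as a small simulation. This is not Cassandra's client API: the function names, the level strings, and the list-of-responses input are all assumptions made for illustration.

```python
def required_responses(level: str, num_replicas: int) -> int:
    """How many replica responses a read waits for at each level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return num_replicas // 2 + 1
    if level == "ALL":
        return num_replicas
    raise ValueError(f"unknown consistency level: {level}")


def read(responses: list, level: str):
    """responses holds one entry per replica, in arrival order:
    a (timestamp, value) tuple, or None if the replica never answered."""
    needed = required_responses(level, len(responses))
    answered = [r for r in responses if r is not None]
    if len(answered) < needed:
        raise TimeoutError(f"{level} needs {needed} responses, got {len(answered)}")
    # Only the first `needed` responses are considered; newest timestamp wins.
    considered = answered[:needed]
    return max(considered, key=lambda r: r[0])[1]


# Replica 1 is stale, replica 2 is unresponsive, replica 3 has the latest write.
responses = [(1, "old"), None, (5, "new")]
print(read(responses, "ONE"))
print(read(responses, "QUORUM"))
```

In this sketch a read at level One can return the stale value, Quorum waits for a majority and picks up the newer timestamp, and All raises an error because one replica never responds.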

27
Cassandra and Consistency

– Zero: Ensure nothing. Asynchronous write done in the background
– Any: Ensure that the write is written to at least 1 node
– One: Ensure that the write is written to at least 1 node’s commit log and memory table before acknowledging the client
– Quorum: Ensure that the write goes to (nodes / 2) + 1 nodes
– All: Ensure that writes go to all nodes. An unresponsive node would fail the write
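The quorum arithmetic can be sketched in a few lines. The overlap rule R + W > N is not stated in the slides; it is the usual condition from Dynamo-style systems under which every read set shares at least one replica with the most recent write set. The function names are hypothetical.

```python
def quorum(num_replicas: int) -> int:
    """Majority of replicas: floor(n / 2) + 1."""
    return num_replicas // 2 + 1


def read_sees_latest_write(r: int, w: int, n: int) -> bool:
    """True when any r replicas read must overlap the w replicas written."""
    return r + w > n


n = 3
print(quorum(n))                                        # replicas needed for Quorum
print(read_sees_latest_write(quorum(n), quorum(n), n))  # Quorum reads + Quorum writes
print(read_sees_latest_write(1, 1, n))                  # One reads + One writes
```

With three replicas, Quorum on both reads and writes touches two nodes each, so the sets must overlap; One on both sides gives no such guarantee, which is the price of the faster levels.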

28
Some Statistics

▪ Facebook Search
▪ MySQL > 50 GB Data
– Writes Average: ~300 ms
– Reads Average: ~350 ms
▪ Rewritten with Cassandra > 50 GB Data
– Writes Average: 0.12 ms
– Reads Average: 15 ms

29
Don’t forget about the DBA

▪ It does not matter if the data is deployed on a NoSQL platform instead of an RDBMS.
▪ Still need to address:
– Backups & recovery
– Capacity planning
– Performance monitoring
– Data integration
– Tuning & optimization
▪ What happens when things don’t work as expected and nodes are out of sync, or data corruption occurs at 2 a.m.?
▪ Who you gonna call?
– DBA and SysAdmin need to be on board

30
Where would I use it?

▪ Where would I use a NoSQL database?
▪ Do you have a large set of uncontrolled, unstructured data somewhere that you are trying to fit into an RDBMS?
– Log analysis
– Social networking feeds (many firms hooked in through Facebook or Twitter)
– External feeds from partners (EAI)
– Data that is not easily analyzed in an RDBMS, such as time-based data
– Large data feeds that need to be massaged before entry into an RDBMS

31
Summary

▪ Leading users of NoSQL datastores are social networking sites such as Twitter, Facebook, LinkedIn, and Digg.
▪ To implement a single feature in Cassandra, Digg has a dataset that is 3 terabytes and 76 billion columns.
▪ Not every problem is a nail and not every solution is a hammer.
