
ECS781P

CLOUD COMPUTING

Cloud Data Management


Lecturer: Dr. Sukhpal Singh Gill
School of Electronic Engineering and Computer Science



Cloud Computing: roadmap for this module

▪ Network layer:
▪ Networking
▪ Application layer:
▪ Client/server, RPC, Web Services
▪ REST
▪ Performance:
▪ SLA
▪ Management
▪ Security
▪ Trends
▪ Monolithic applications → microservices
▪ Serverless: “hide complexity”
Contents

▪ Data at Cloud Scale


▪ Dealing with partitioning
▪ Dealing with replication
▪ Cloud Data Management Systems
Traditional Data Management Systems

▪ Table model for data, based on relational algebra


▪ Very powerful query language (SQL)
▪ Requires expertise, careful design of table structure, indices and
queries
▪ Data consistency and reliability are guaranteed by ACID
properties
▪ Atomicity, Consistency, Isolation, and Durability
▪ Single, powerful machines (vertical scalability)
SQL Tables

products
  PID  DESCRIPTION    PRICE
  1    Intel i7       £400
  2    Surface Pro    £899
  3    32Gb Ram       £200
  4    Raspberry PI+  £30
  5    Geforce 1080   £4000

orders
  OID  CID  ODATE      SDATE
  1    1    29/4/2018  NULL
  2    2    20/1/2018  24/1/2018

clients
  CID  NAME     COMPANY  ADDRESS
  1    Theresa  GOVUK    Downing st
  2    Colin    QMUL     Mile End Rd

order_details
  OID  PID  AMOUNT
  1    1    2
  1    5    2
  2    2    1
Sample SQL Queries

▪ INSERT INTO orders VALUES (1000, 1, '2018-04-29', '2018-05-01');
▪ UPDATE products SET PRICE=8000, DESCRIPTION='Geforce 1080' WHERE PID = '1';
▪ DELETE FROM orders WHERE ODATE < '2012-12-31' AND SDATE IS NOT NULL;
▪ SELECT DESCRIPTION, PRICE FROM products;
▪ SELECT * FROM products;
▪ SELECT * FROM products WHERE PRICE < 300;
▪ SELECT * FROM products WHERE PID = '1';
ACID Database Properties

▪ Atomicity
▪ All or nothing: transactions must act as a whole, or not at all
▪ No changes within a transaction can persist if any change within the transaction fails
▪ Consistency
▪ Must transform database from one valid state to another: changes made by a
transaction respect all database integrity constraints and the database remains in a
consistent state at the end of a transaction
▪ Isolation
▪ Transactions cannot see changes made by other concurrent transactions
▪ Concurrent transactions leave the database in the same state as if the transactions
were executed serially
▪ Durability
▪ Database data are persistent
▪ Changes made by a transaction that completed successfully are stored permanently
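
To make atomicity concrete, here is a minimal sketch using Python's standard sqlite3 module (an illustration, not one of the lecture's examples): a transaction containing a failing statement leaves no trace of its earlier statements.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (OID INTEGER PRIMARY KEY, CID INTEGER)")

try:
    with conn:  # one transaction: commit on success, rollback on any exception
        conn.execute("INSERT INTO orders VALUES (1, 1)")
        conn.execute("INSERT INTO orders VALUES (1, 2)")  # duplicate key -> IntegrityError
except sqlite3.IntegrityError:
    pass

# Atomicity: the first insert was rolled back together with the failing one
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # prints 0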
How much data is generated in the cloud?
Evolution of Cloud applications
Concerns of cloud application developers

▪ Reliability: the system should continue to work correctly in the face of adversity

▪ Scalability: as the system grows, there should be reasonable ways of dealing with that growth

▪ Maintainability: over time, it should be productive to not only maintain the current behaviour of the system, but also adapt it to new use cases
Requirements for Cloud Data management

▪ Reliability: preventing data loss, making database services highly available
▪ Preventing single points of failure
▪ Scalability: more data to store, more users requesting data
▪ Latency: Internet users from all locations
▪ Maintainability
▪ As services/applications change, so might the data model
NoSQL databases

▪ NoSQL (Not only SQL) systems


▪ Started in the 2000s, exploded in 2010s
▪ Motivation:
▪ scalability
▪ simpler data models
▪ programmer friendly
ACID vs BASE

▪ NoSQL sacrifices ACID for performance/scalability
▪ Updates are eventually propagated, but there are limited guarantees on the consistency of reads
▪ “BASE” instead of “ACID”:
▪ BASE = Basically Available, Soft state, Eventually consistent
▪ ACID = Atomicity, Consistency, Isolation, Durability
NoSQL database types and examples

▪ Key/value Databases
▪ Simple value or row, indexed by a key
▪ e.g. Voldemort, Vertica, memcached
▪ Big table Databases
▪ “a sparse, distributed, persistent multidimensional sorted map”
▪ e.g. Google BigTable, Azure Table Storage, Amazon SimpleDB, Apache
Cassandra
▪ Document Databases
▪ Multi-field documents (or objects) with JSON access
▪ e.g. MongoDB, RavenDB (.NET specific), CouchDB
▪ Graph Databases
▪ Manage nodes, edges, and properties
▪ e.g. Neo4j, TitanDB
NoSQL databases landscape

▪ Support for various interfaces for data access
▪ Logically model data using a loosely typed, extensible data schema
▪ Horizontal scaling through a data distribution model across multiple nodes
▪ Data persists on disk, in memory, or both; sometimes in pluggable custom stores
Taking advantage of cloud elasticity

▪ Vertical scaling
▪ Use more powerful VMs… up to a point
▪ Horizontal scaling
▪ Data is partitioned into multiple cloud instances
▪ Need mapping from data to partition
▪ Replication
▪ Achieve reliability
▪ Replica management (updates)
Contents

▪ Data at Cloud Scale


▪ Dealing with partitioning
▪ Dealing with replication
▪ Cloud Data Management Systems
How to scale: Partition

▪ Partition: division of a logical database or its constituent elements into distinct independent parts
▪ Critical for scalability
▪ Two main types of partitioning:
▪ Horizontal
▪ Vertical
Types of data partitioning

▪ Horizontal partitioning: different rows into different tables
▪ Vertical partitioning: different columns into different tables
(a small sketch of both follows below)

[Figure: horizontal vs vertical partitioning]
https://docs.microsoft.com/en-us/azure/architecture/best-practices/data-partitioning
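
As a quick illustration (a sketch using in-memory Python dicts, not a real database), the same products data can be split either way:

# Rows modelled as dicts, mirroring the earlier products table
rows = [
    {"PID": 1, "DESCRIPTION": "Intel i7", "PRICE": 400},
    {"PID": 4, "DESCRIPTION": "Raspberry PI+", "PRICE": 30},
]

# Horizontal partitioning: whole rows end up in different partitions (here, split by price)
cheap_rows = [r for r in rows if r["PRICE"] < 100]
expensive_rows = [r for r in rows if r["PRICE"] >= 100]

# Vertical partitioning: different columns end up in different partitions, linked by the key
descriptions = {r["PID"]: r["DESCRIPTION"] for r in rows}
prices = {r["PID"]: r["PRICE"] for r in rows}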
Horizontal partitioning

▪ Horizontally scaling data storage


▪ Known as sharding in the database world
▪ Data partitions are served from different machines
▪ Improves scalability → queries can be parallelised
▪ Clients need to know how to locate the data
Finding the data: request routing

▪ Clients need to know where the data is located


▪ Multiple approaches

[Kleppmann, Designing Data-Intensive Applications, pg 215]


Common partitioning strategies

▪ Range partitioning: split the space of keys into ranges, and allocate ranges to partitions

https://docs.oracle.com/cd/B10500_01/server.920/a96524/c12parti.htm
Common partitioning strategies

▪ Range partitioning: split the space of keys into ranges, and allocate ranges to partitions

  A-H: Ben, Dami, Felix
  I-P: Ignacio, Joseph
  Q-Z: Zico

https://docs.oracle.com/cd/B10500_01/server.920/a96524/c12parti.htm
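
A minimal Python sketch of range partitioning, assuming the A-H / I-P / Q-Z ranges from the example above:

import bisect

boundaries = ["H", "P", "Z"]                    # upper bound of each range
partitions = ["A-H", "I-P", "Q-Z"]

def range_partition(key):
    # Pick the first range whose upper bound is >= the key's first letter
    return partitions[bisect.bisect_left(boundaries, key[0].upper())]

for name in ["Ben", "Dami", "Felix", "Ignacio", "Joseph", "Zico"]:
    print(name, "->", range_partition(name))    # Ben/Dami/Felix -> A-H, etc.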
Common partitioning strategies

▪ Range partitioning: split the space of keys into ranges, and allocate ranges to partitions
▪ Hash partitioning: hash each key (essentially obtaining a random number), and allocate these numbers to partitions (using the modulo operation)

https://docs.oracle.com/cd/B10500_01/server.920/a96524/c12parti.htm
Common partitioning strategies

▪ Hash partitioning: hash each key (essentially obtaining a random number), and allocate these numbers to partitions using the modulo operation

  Key      Hash(x)  Mod 3
  Ben      18       0
  Dami     06       0
  Zico     22       1
  Ignacio  20       2
  Felix    95       2
  Joseph   19       1

https://docs.oracle.com/cd/B10500_01/server.920/a96524/c12parti.htm
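
A minimal sketch of hash partitioning in Python (md5 is used only to obtain a stable number; Python's built-in hash() is salted per process). The toy hash values on the slide are illustrative; a real hash function produces different numbers.

import hashlib

NUM_PARTITIONS = 3

def hash_partition(key):
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)  # key -> "random" number
    return digest % NUM_PARTITIONS                           # number -> partition

for name in ["Ben", "Dami", "Zico", "Ignacio", "Felix", "Joseph"]:
    print(name, "-> partition", hash_partition(name))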
Partition challenges

▪ Scalability:
▪ dealing with data growth

▪ Partition rebalancing:
▪ mapping data to nodes

▪ Querying across partitions:


▪ finding the right data
Scalability challenges: skew

▪ Data skew: some partitions might hold more data than others
▪ Some partition schemes don’t guarantee an equal number of elements
▪ Some keys might hold more values than others

▪ Load skew: partitions might not receive the same amount of requests
Scalability challenges: rebalancing partitions

▪ Changing the number of nodes might require remapping between nodes and partitions
▪ Worst-case scenario: hash mod N
▪ Adding a new node to a hash-partitioned element would change the location of every item
▪ Good strategy for most schemes:
▪ Fixed number of virtual partitions
▪ Nodes have tokens; these tokens can be handed over gracefully as elements are added to the system
Challenges in Rebalancing Partitions

Adding a new node to a hash-partitioned element would change the location of every item!

  Key      Hash(x)  Mod 3  Mod 4
  Ben      18       0      2
  Dami     06       0      2
  Zico     22       1      2
  Ignacio  20       2      0
  Felix    95       2      3
  Joseph   19       1      3

https://docs.oracle.com/cd/B10500_01/server.920/a96524/c12parti.htm
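
A quick sketch showing why hash mod N rebalances badly: going from 3 to 4 nodes moves roughly three quarters of all keys.

import hashlib

def partition(key, n):
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"user-{i}" for i in range(10_000)]
moved = sum(1 for k in keys if partition(k, 3) != partition(k, 4))
print(f"{moved / len(keys):.0%} of keys change partition")  # typically ~75%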
Common partitioning strategies

▪ Hash partitioning + ranges: hash each key, then allocate the hashed values to range partitions

  Range   Keys (Hash(x))
  0-10    Dami (06)
  11-20   Ben (18), Joseph (19), Ignacio (20)
  21-30   Zico (22)
  76-100  Felix (95)
Solution: Consistent Hashing

▪ A special type of partitioning strategy that reduces the impact of nodes entering and leaving the system
▪ Each database object is mapped to a point on a circumference (‘hash ring’) by hashing its key value (as in hash partitioning)
▪ Each available machine (node) is mapped to a point on the edge of the same circle
▪ To find the node that stores an object:
▪ Hash the object’s key to a point on the edge of the circle
▪ Walk clockwise around the circle until the first node is encountered
Consistent Hashing

[Figure: hash ring with nodes A, B, C and objects o1-o4]
▪ Objects o1 and o4 are stored on node A
▪ Object o2 is stored on node B
▪ Object o3 is stored on node C
Consistent Hashing: node changes

▪ A node leaves the network: the next node in the clockwise direction stores all new objects that would belong to the failed node
▪ The data objects of the failed node have to be redistributed to the remaining nodes
▪ Nodes added to the network:
▪ Mapped to a point on the hash ring
▪ All objects that map between the point of the new node and its first counter-clockwise neighbour map to the new node
Consistent Hashing: node changes

[Figure: hash ring with nodes A, B, C and objects o1-o4, before any change]
Consistent Hashing: node changes

Node C has left and node D has entered the network

[Figure: hash ring with nodes A, B, D; node C has been removed]
Consistent Hashing: node changes

Node C has left and node D has entered the network

[Figure: hash ring with nodes A, B, D and objects o1-o4]
▪ Object o1 is stored on node A
▪ Object o2 is stored on node B
▪ Objects o3 and o4 are stored on node D
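
A minimal Python sketch of the consistent-hashing ring described above (no virtual nodes, hypothetical node names): only the keys owned by a departing node, or falling between a new node and its counter-clockwise neighbour, are reassigned.

import bisect
import hashlib

class HashRing:
    def __init__(self, nodes=()):
        self._ring = []                                   # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _pos(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        bisect.insort(self._ring, (self._pos(node), node))

    def remove_node(self, node):
        self._ring.remove((self._pos(node), node))

    def node_for(self, key):
        # Walk clockwise from the key's position to the first node (wrapping around)
        idx = bisect.bisect_right(self._ring, (self._pos(key), ""))
        return self._ring[idx % len(self._ring)][1]

ring = HashRing(["A", "B", "C"])
before = {o: ring.node_for(o) for o in ["o1", "o2", "o3", "o4"]}
ring.remove_node("C")
ring.add_node("D")
after = {o: ring.node_for(o) for o in ["o1", "o2", "o3", "o4"]}
print(before, after)                                      # only some objects move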
Query challenges: Range queries across partitions

▪ Example query: SELECT count(*) FROM Emp WHERE age > 50 AND sal > 90,000;
▪ The Emp table is split across k partitions, each held by its own query server
▪ A query coordinator sends the query to every partition and sums the partial counts:
  Ans = 62 + 440 + ... + 1,123 = 99,000
▪ Secondary indexes help find matching rows

[Figure: query coordinator fanning the query out to Partition-1 (Server-1) ... Partition-k (Server-k)]
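
A sketch of the scatter/gather pattern behind that query, with hypothetical in-memory partitions standing in for the query servers:

from concurrent.futures import ThreadPoolExecutor

def count_on_partition(partition):
    # In a real system this query runs on the server owning the partition
    return sum(1 for emp in partition if emp["age"] > 50 and emp["sal"] > 90_000)

def scatter_gather_count(partitions):
    # The coordinator fans the query out to every partition and sums the partial answers
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(count_on_partition, partitions))

partitions = [
    [{"age": 55, "sal": 95_000}, {"age": 30, "sal": 50_000}],   # partition 1
    [{"age": 61, "sal": 120_000}, {"age": 52, "sal": 99_000}],  # partition 2
]
print(scatter_gather_count(partitions))  # 3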
Contents

▪ Data at Cloud Scale


▪ Dealing with partitioning
▪ Dealing with replication
▪ Cloud Data Management Systems
Data replication

▪ Horizontally scaling data storage
▪ Multiple nodes are responsible for storing the same data
▪ All replicas must be synchronised
▪ Multiple variant techniques that use replication:
▪ Memory caches
▪ Leader-based replication
▪ Clustering
Memory Caches

▪ Transient, partitioned and replicated in-memory databases


▪ Replicate most frequently requested data
▪ Benefits:
▪ Fast response to clients
▪ Off-loading database servers
Memcached

▪ High-performance distributed in-memory caching service that manages key-value pairs
▪ Very similar to a hash table or dictionary
▪ Key-value API: get and set methods
▪ Goal: serve the majority of requests for read-heavy workloads
▪ Popular: used at Facebook, LinkedIn, Flickr…
Memcached use with standard DB

[Figure: Web App → Memcached client → Memcached node, with get_from_db(key) as the fallback to the DB]

# Cache-aside read: try the cache first, fall back to the database on a miss
foo = memcached.get(fooId)
if foo is None:
    foo = get_from_db(fooId)       # cache miss: load from the database
    memcached.set(fooId, foo)      # populate the cache for future reads
Scaling Horizontally Memcached

[Figure: Web App → Memcached client hashes the key to one of Memcached Node #1 … #N, with get_from_db(key) as the fallback to the DB]

▪ Multiple Memcached nodes, oblivious that they are part of a cluster
▪ The client hashes keys, partitioning them between Memcached nodes (see the sketch below)
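
A minimal sketch of that client-side partitioning, with plain Python dicts standing in for the Memcached nodes; real Memcached client libraries implement the same idea, often with consistent hashing.

import hashlib

class ShardedCacheClient:
    def __init__(self, nodes):
        self.nodes = nodes                            # one dict per stand-in cache node

    def _node_for(self, key):
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[digest % len(self.nodes)]   # the client picks the node

    def get(self, key):
        return self._node_for(key).get(key)

    def set(self, key, value):
        self._node_for(key)[key] = value

client = ShardedCacheClient([{}, {}, {}])             # three independent cache nodes
client.set("fooId", {"name": "foo"})
print(client.get("fooId"))                            # served by the node that stored it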
Leader based replication

▪ One server is dedicated to writes (the leader)
▪ A number of replica servers are dedicated to serving reads (the followers)
▪ The leader propagates data updates to the followers
▪ If the leader crashes before completing replication to at least one follower, the write operation is lost
▪ Otherwise, the most up-to-date follower takes over the role of the leader
Clustering

▪ Cluster: a group of redundant computers that provide a continued service and can sustain high user workloads
▪ High-availability clusters provide continued service in the event of failures of any element of the infrastructure
▪ Redundancy eliminates single points of failure
▪ When an HA cluster detects a hardware/software fault, it immediately restarts the application on another system without requiring administrative intervention, a process known as failover
Synchronous vs Asynchronous replication

▪ Synchronous: the leader waits for confirmation that the write was received, then reports success to the user and makes the write visible

▪ Asynchronous: the leader sends the message but does not wait for the follower’s response

[Kleppmann, Designing Data-Intensive Applications, pg 154]
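
A toy sketch of the difference (hypothetical Leader/Follower classes; real systems replicate over the network, not via local calls or threads): the synchronous write returns only after every follower has applied the change, while the asynchronous one acknowledges immediately.

import threading
import time

class Follower:
    def __init__(self):
        self.data = {}
    def apply(self, key, value):
        time.sleep(0.01)                 # simulated network + apply delay
        self.data[key] = value

class Leader:
    def __init__(self, followers):
        self.data, self.followers = {}, followers
    def write_sync(self, key, value):
        self.data[key] = value
        for f in self.followers:         # wait for every follower before acknowledging
            f.apply(key, value)
        return "ok"
    def write_async(self, key, value):
        self.data[key] = value
        for f in self.followers:         # acknowledge without waiting for followers
            threading.Thread(target=f.apply, args=(key, value)).start()
        return "ok"

leader = Leader([Follower(), Follower()])
leader.write_sync("k1", "v1")            # slower, but followers are up to date on return
leader.write_async("k2", "v2")           # fast, but followers may lag behind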


The problem with synchronous updates

▪ Safe, but negatively impacts latency
▪ Some update strategies help (e.g. chain replication)
▪ High availability of a service is often characterized by low latency
▪ Amazon: 0.1 secs more response time would cost them 1% in sales
▪ Akamai: 0.1 secs more response time causes a 7% drop in traffic
▪ Google: 0.5 secs more latency caused traffic to drop by 20%
▪ Asynchronous updates improve latency, at the cost of potential side effects
Risk 1: Non-monotonic reads

▪ A user can see things “moving backward in time”
▪ Can happen if a user makes several reads from different replicas

[Figure: a user makes the same query twice against different replicas]
▪ Follower 1 has little lag and reflects the change
▪ Follower 2 has a larger lag and is not yet up to date

[Kleppmann, Designing Data-Intensive Applications]


Risk 2: Violation of causality

▪ If some partitions are replicated more slowly than others, an observer may see the answer before they see the question

[Figure: information does not propagate homogeneously across partitions; the observer sees the answer before the question]
Goal: linearisability

▪ Intuition: the system should appear as if there were only one copy of the data
▪ The usual expectation from developers
▪ Equivalent to atomic consistency in ACID databases
▪ As soon as a write operation is complete, all reads will see the updated value
▪ If one client reads an updated value, all future reads will also see that updated value (or any further update)
▪ Requires constraints on the replica update rules
Impact of network failures (partitions)

▪ A network interruption forces a choice between linearizability and availability
Brewer’s CAP Theorem

▪ Brewer’s Conjecture (2000)


▪ Symposium on Principles of Distributed Computing
▪ Formally proven in 2002 (Gilbert and Lynch, MIT)
▪ A distributed system cannot guarantee at the
same time all three of:
▪ Consistency (Linearisability)
▪ Availability
▪ Network Partitioning tolerance
Brewer’s CAP Theorem

[Figure: databases positioned on the CAP triangle of Consistency, Availability and Partition tolerance]

João Ricardo Lourenço et al. "Choosing the right NoSQL database for the job: a quality attribute evaluation." Journal of Big Data 2.1 (2015): 18.
CAP Theorem properties

▪ Network Partition tolerance: the service operates normally in the presence of network partitions
▪ A network partition occurs if two or more “islands” of network nodes cannot connect to each other, or messages are highly delayed or lost
▪ Strong Consistency: linearisability; reads either obtain the latest updated value, or an error
▪ Availability: the system is available to serve requests

Network partitioning is a network failure that causes the members to split into multiple groups such that a member in a group cannot communicate with members in other groups. In a partition scenario, all sides of the original cluster operate independently, assuming members on the other sides have failed.
Consistent systems

▪ If the application requires consistency (linearizability) and some replicas are disconnected (e.g. a network problem)
▪ those replicas cannot process requests while disconnected ➔ unavailability:
▪ wait until the network problem is fixed, or
▪ return an error
Eventually Consistent systems

▪ Asynchronous updates:
“replicas update in the background”
▪ The system converges eventually
▪ Not a single model:
▪ active research area,
▪ many consistency models: between linearizability and pure asynchronous
updates
Diversity of database features

Critique of CAP

▪ No choice: network partitions will happen (whether you like it or not)

▪ A better way of phrasing CAP would be: either Consistent or Available when Partitioned
Contents

▪ Data at Cloud Scale


▪ Dealing with partitioning
▪ Dealing with replication
▪ Cloud Data Management Systems
Cassandra

▪ A free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers
▪ Initially developed at Facebook to power the Facebook inbox search feature
▪ Facebook released Cassandra as an open-source project on Google Code in July 2008
Apache Cassandra

▪ Open Source Distributed Database


▪ Based on two influential NoSQL systems:
▪ Amazon Dynamo and Google BigTable
▪ Released in 2008, used by Facebook, eBay, etc.
▪ Great example of partition AND replication features
Cassandra features

1. Elastic: read and write throughput increases linearly as new machines are added
▪ High performance for write operations
2. Supports content replication, per node and per datacentre
3. Decentralised: fault tolerant with no single point of failure; no “master” node
4. Data model: column-based, multi-dimensional hashtable, no join capabilities
5. Tunable consistency, per operation
Cassandra partitioning

▪ Consistent hashing: multiple Cassandra nodes form a ring


▪ Rings can span multiple datacentres (Cassandra is aware of topology)
▪ Virtual nodes (vnodes): every physical node is split into many
virtual nodes (256 default) for easier partition rebalancing

[Hewitt and Carpenter, Cassandra: The Definitive Guide, 2nd Ed]


Cassandra Data Replication

▪ Each key has an associated replication factor (the number of replicas across the network)
▪ The nodes in charge of storing the replicas are the ones immediately to the right (i.e., clockwise) of the node that has to store the key copy, based on the consistent hash function
Cassandra: Access to Data Replicas

[Figure: client → coordinator → ring of nodes #1–#6; the coordinator hashes the value to find the owning node. ReplicationFactor = 1]
Cassandra: Access to Data Replicas

[Figure: the same ring with ReplicationFactor = 3; the key is stored on the owning node and the following nodes clockwise]
Remember: update propagation across replicas

[Figure: a write propagating to the replicas, with different numbers of replies awaited]

▪ Which option is best?


Tunable Consistency: Write / Read Operations

▪ How many replicas need to reply for the operation to become a success?

  Level          Description
  ANY            One node (eventually consistent system)
  QUORUM         N/2 + 1 replicas
  LOCAL_QUORUM   N/2 + 1 replicas in the local data centre
  EACH_QUORUM    N/2 + 1 replicas in each data centre
  ALL            All replicas (consistent system: no response if one replica is not accessible!)

https://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
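
For example, with the DataStax Python driver (cassandra-driver) the consistency level is chosen per statement; the keyspace and table below are hypothetical:

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("shop")   # hypothetical keyspace

query = SimpleStatement(
    "SELECT description, price FROM products WHERE pid = 1",
    consistency_level=ConsistencyLevel.QUORUM,     # N/2 + 1 replicas must reply
)
row = session.execute(query).one()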
Tunable consistency : Write / Read Operations

▪ The level of consistency depends on:
▪ N = replication factor
▪ W = number of write nodes
▪ R = number of read nodes
▪ Consistency is a spectrum!
▪ “Golden rule” (i.e., rule of thumb):
▪ R + W <= N: eventual consistency
▪ R + W > N: “strong” consistency (still not as strong as linearizability!)
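
The rule of thumb as a one-liner (a sketch using the slide's symbols N, W, R):

def consistency_mode(n, w, r):
    # Overlapping read and write quorums -> "strong" consistency; otherwise eventual
    return "strong" if r + w > n else "eventual"

print(consistency_mode(n=3, w=2, r=2))   # strong  (QUORUM writes + QUORUM reads)
print(consistency_mode(n=3, w=1, r=1))   # eventual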
Data model of Cassandra

▪ Based on Bigtable (Google):


▪ 1 single table
▪ Chang, Fay, et al. "Bigtable: A distributed storage system for
structured data." ACM Transactions on Computer Systems
(TOCS) 26.2 (2008): 1-26.
▪ Similarities with SQL databases (tables, columns…)
▪ Less powerful
▪ Similar query language
eBay Cassandra Deployment

https://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376
eBay Cassandra Deployment
High availability / minimise latency

2 copies in
each DC

Only 1 node
needs to reply
https://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376
Recommended bibliography

▪ M. Kleppmann, Designing Data-Intensive Applications.


▪ Part II: Distributed Data
▪ Chapters 5, 6, 9
▪ DeCandia et al., “Dynamo: Amazon’s Highly Available Key-value Store”, SOSP 2007
▪ W. Vogels, “Eventually Consistent”, CACM 2009
Cloud Computing: roadmap for this module

▪ Network layer:
▪ Networking
▪ Application layer:
▪ Client/server, RPC, Web Services
▪ REST
▪ Performance:
▪ SLA
▪ Management
▪ Security
▪ Trends
▪ Monolithic applications → microservices
▪ Serverless: “hide complexity”
