ECS781P-9-Cloud Data Management
CLOUD COMPUTING
▪ Network layer:
▪ Networking
▪ Application layer:
▪ Client/server, RPC, Web Services
▪ REST
▪ Performance:
▪ SLA
▪ Management
▪ Security
▪ Trends
▪ Monolithic applications → microservices
▪ Serverless: “hide complexity”
Contents
products
PID  DESCRIPTION    PRICE
1    Intel i7       £400
2    Surface Pro    £899
3    32Gb Ram       £200
4    Raspberry PI+  £30
5    Geforce 1080   £4000

orders
OID  CID  ODATE      SDATE
1    1    29/4/2018  NULL
2    2    20/1/2018  24/1/2008

clients
CID  NAME     COMPANY  ADDRESS
1    Theresa  GOVUK    Downing st
2    Colin    QMUL     Mile End Rd

order_details
OID  PID  AMOUNT
1    1    2
1    5    2
2    2    1
Sample SQL Queries
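As an illustration over the sample schema above, here is a minimal sketch (assumed, not taken from the slide) that loads the clients and orders tables into an in-memory SQLite database and runs one join query:

import sqlite3

# Recreate two of the sample tables in an in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (cid INTEGER PRIMARY KEY, name TEXT, company TEXT, address TEXT);
    CREATE TABLE orders  (oid INTEGER PRIMARY KEY, cid INTEGER REFERENCES clients(cid),
                          odate TEXT, sdate TEXT);
    INSERT INTO clients VALUES (1, 'Theresa', 'GOVUK', 'Downing st'),
                               (2, 'Colin',   'QMUL',  'Mile End Rd');
    INSERT INTO orders  VALUES (1, 1, '29/4/2018', NULL),
                               (2, 2, '20/1/2018', '24/1/2008');
""")

# Example query: which clients have orders that have not shipped yet (sdate IS NULL)?
for row in conn.execute("""
        SELECT c.name, o.oid, o.odate
        FROM clients c JOIN orders o ON o.cid = c.cid
        WHERE o.sdate IS NULL"""):
    print(row)        # ('Theresa', 1, '29/4/2018')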
▪ Atomicity
▪ All or nothing: transactions must act as a whole, or not at all
▪ No changes within a transaction can persist if any change within the transaction fails
▪ Consistency
▪ Must transform database from one valid state to another: changes made by a
transaction respect all database integrity constraints and the database remains in a
consistent state at the end of a transaction
▪ Isolation
▪ Transactions cannot see changes made by other concurrent transactions
▪ Concurrent transactions leave the database in the same state as if the transactions
were executed serially
▪ Durability
▪ Database data are persistent
▪ Changes made by a transaction that completed successfully are stored permanently
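A minimal sketch of atomicity and durability in practice, using SQLite transactions (the accounts table and the amounts are made up for illustration):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY,"
             " balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

try:
    # Both updates belong to one transaction: transfer 200 from account 1 to account 2
    conn.execute("UPDATE accounts SET balance = balance + 200 WHERE id = 2")
    conn.execute("UPDATE accounts SET balance = balance - 200 WHERE id = 1")  # violates CHECK
    conn.commit()            # durability: once committed, the changes persist
except sqlite3.IntegrityError:
    conn.rollback()          # atomicity: the first update is discarded as well

print(conn.execute("SELECT * FROM accounts").fetchall())   # [(1, 100), (2, 50)]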
How much data is generated in the cloud?
Evolution of Cloud applications
Concerns of cloud application developers
▪ Key/value Databases
▪ Simple value or row, indexed by a key
▪ e.g. Voldemort, Vertica, memcached
▪ Bigtable Databases
▪ “a sparse, distributed, persistent multidimensional sorted map”
▪ e.g. Google BigTable, Azure Table Storage, Amazon SimpleDB, Apache
Cassandra
▪ Document Databases
▪ Multi-field documents (or objects) with JSON access
▪ e.g. MongoDB, RavenDB (.NET specific), CouchDB
▪ Graph Databases
▪ Manage nodes, edges, and properties
▪ e.g. Neo4j, TitanDB
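A minimal, server-free sketch of how the four data models differ, using plain Python structures (keys, fields and names are made up) as stand-ins for the real systems listed above:

# Key/value: an opaque value indexed by a key (get/set by key only)
kv = {"user:42": b'{"name": "Ada"}'}

# Bigtable-style: a sparse, sorted map keyed by (row key, column)
bigtable = {("user:42", "profile:name"): "Ada",
            ("user:42", "profile:city"): "London"}

# Document: multi-field JSON-like documents, queryable by field
documents = [{"_id": 1, "name": "Ada", "city": "London"}]
londoners = [d for d in documents if d.get("city") == "London"]

# Graph: nodes, edges and properties
graph = {"nodes": {"ada": {"role": "engineer"}, "qmul": {"type": "org"}},
         "edges": [("ada", "works_at", "qmul")]}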
NoSQL databases landscape
▪ Vertical scaling
▪ Use more powerful VMs… up to a point
▪ Horizontal scaling
▪ Data is partitioned into multiple cloud instances
▪ Need mapping from data to partition (see the sketch after this list)
▪ Replication
▪ Achieve reliability
▪ Replica management (updates)
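As referenced above, a minimal sketch of a data-to-partition mapping via hash partitioning (the node names are hypothetical):

import zlib

# Hash partitioning: each key is deterministically assigned to one instance
nodes = ["node-0", "node-1", "node-2"]

def partition_for(key: str) -> str:
    return nodes[zlib.crc32(key.encode()) % len(nodes)]   # same key -> same node

print(partition_for("Ben"), partition_for("Dami"))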
Contents
https://docs.microsoft.com/en-us/azure/architecture/best-practices/data-partitioning
Horizontal partitioning
https://docs.oracle.com/cd/B10500_01/server.920/a96524/c12parti.htm
Common partitioning strategies
[Figure: range partitioning — keys are assigned to partitions by key range: Ben, Dami, Felix → A-H; Ignacio, Joseph → I-P; Zico → Q-Z]
Partition challenges
▪ Scalability:
▪ dealing with data growth
▪ Partition rebalancing:
▪ mapping data to nodes (see the sketch below)
[Figure: hash partitioning — Hash(x) maps each key to a bucket range: Dami → 06 (0-10); Ben → 18, Joseph → 19, Ignacio → 20 (11-20); Zico → 22 (21-30); Felix → 95 (76-100)]
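A minimal sketch of why rebalancing is hard with a naive hash(x) mod N mapping: adding one node changes the assignment of most keys, so most data would have to move.

import zlib

keys = ["Ben", "Dami", "Zico", "Ignacio", "Felix", "Joseph"]

def node_for(key, n_nodes):
    return zlib.crc32(key.encode()) % n_nodes

before = {k: node_for(k, 3) for k in keys}     # 3 nodes
after  = {k: node_for(k, 4) for k in keys}     # a 4th node is added
moved  = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)}/{len(keys)} keys move to a different node")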
Solution: Consistent Hashing
[Figure: consistent hashing ring — nodes and objects are hashed onto the same circle and each object is stored on the next node along the ring: object o2 is stored on node B, object o3 on node C]
Consistent Hashing: node changes
[Figure: ring with nodes A, B and C holding objects o1, o2, o3 and o4, before the node change]
Consistent Hashing: node changes
[Figure: after the node change the ring shows nodes D, B and C with objects o3 and o4; only the objects in the affected arc have to move to a different node]
Consistent Hashing: node changes
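A minimal sketch of a consistent-hashing ring (the node and object names follow the figures above; the choice of hash function is an assumption):

import bisect
import hashlib

def ring_pos(name: str) -> int:
    # Position on the ring: hash mapped onto a fixed circular range
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % 2**32

class Ring:
    def __init__(self, nodes):
        self.ring = sorted((ring_pos(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        # An object is stored on the first node clockwise from its position
        positions = [p for p, _ in self.ring]
        i = bisect.bisect_right(positions, ring_pos(key)) % len(self.ring)
        return self.ring[i][1]

ring = Ring(["A", "B", "C"])
print({o: ring.node_for(o) for o in ["o1", "o2", "o3", "o4"]})
# Adding or removing a node only reassigns the objects in one arc of the ring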
[Figure: the Query Coordinator scatters the query to every partition of the Emp table, one per server (..., Server-2, ..., Server-k); each server returns a partial result (62, 440, ..., 1,123) and the coordinator aggregates them: Ans = 62 + 440 + ... + 1,123 = 99,000]
Secondary indexes help find matching rows
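A minimal sketch of the scatter/gather pattern in the figure: each partition answers the query locally and the coordinator aggregates the partial results (the row values below are made up):

# Each partition of the Emp table answers the query locally;
# the coordinator gathers and aggregates the partial results.
partitions = [
    [60, 2],          # Server-1's rows -> partial result 62
    [400, 40],        # Server-2's rows -> partial result 440
    [1000, 123],      # Server-k's rows -> partial result 1123
]

def query_partition(rows):
    return sum(rows)                                   # local answer per server

answer = sum(query_partition(p) for p in partitions)   # coordinator's aggregation
print(answer)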
Contents
Example: Memcached
[Figure: the Web App's Memcached client issues get(key) against a Memcached node and falls back to get_from_db(key) on the DB only on a cache miss]

# Cache-aside read: try the cache first, fall back to the database on a miss
foo = memcached.get(fooId)            # memcached: a connected Memcached client
if foo is None:                       # cache miss
    foo = get_from_db(fooId)          # load the value from the database
    memcached.set(fooId, foo)         # populate the cache for future reads

Scaling Memcached Horizontally
[Figure: with N Memcached nodes (Node #1 ... Node #N), the Memcached client hashes the key (e.g. hash(key) = 2) to pick the node to query; get_from_db(key) hits the DB only on a cache miss]
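A minimal sketch of the sharded cache-aside pattern in the figure, with plain dicts standing in for N Memcached nodes and get_from_db supplied by the caller:

import zlib

# Plain dicts stand in for N Memcached nodes reachable over the network
cache_nodes = [dict(), dict(), dict()]

def node_for(key: str) -> dict:
    return cache_nodes[zlib.crc32(key.encode()) % len(cache_nodes)]

def get(key, get_from_db):
    node = node_for(key)                 # hash(key) picks the cache node
    value = node.get(key)
    if value is None:                    # miss on the chosen node
        value = get_from_db(key)         # fall back to the database
        node[key] = value                # populate that node's cache
    return value

print(get("foo", lambda k: f"row-for-{k}"))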
▪ Synchronous: the leader waits for confirmation that the write was received, then reports success to the user and makes the write visible
▪ Asynchronous: the leader sends the message but does not wait for
the follower’s response
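A minimal single-process sketch of the two write paths (Follower, write_sync and write_async are illustrative, not a real replication API):

import queue

class Follower:
    def __init__(self):
        self.log = []                    # records this follower has applied
        self.pending = queue.Queue()     # records received but not yet applied

    def apply(self, record):             # synchronous path: applied before the ack
        self.log.append(record)

    def enqueue(self, record):           # asynchronous path: applied later
        self.pending.put(record)

def write_sync(leader_log, followers, record):
    leader_log.append(record)
    for f in followers:
        f.apply(record)                  # leader waits for every follower
    return "ok"                          # success reported only once all replicas have it

def write_async(leader_log, followers, record):
    leader_log.append(record)
    for f in followers:
        f.enqueue(record)                # send and return without waiting
    return "ok"                          # followers may lag behind the leader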
Follower 1 has little lag and reflects changes. Follower 2 has larger lag and is not up to date. Information does not propagate homogeneously across partitions!
João Ricardo Lourenço et al. "Choosing the right NoSQL database for the job: a quality attribute evaluation." Journal of Big Data 2.1 (2015): 18.
CAP Theorem properties
Network partitioning is a network failure that causes the members to split into multiple groups such that a member in one group cannot communicate with members in other groups. In a partition scenario, each side of the original cluster operates independently, assuming that the members on the other sides have failed.
Consistent systems
▪ Asynchronous updates:
“replicas update in the background”
▪ The system converges eventually
▪ Not a single model:
▪ active research area,
▪ many consistency models: between linearizability and pure asynchronous
updates
Diversity of database features
▪ No choice: network partitions will happen
(whether you like it or not)
▪ Thus, a better way of phrasing CAP would be:
either Consistent or Available when Partitioned
Critique of CAP
[Figure: Cassandra ring with nodes #1-#6 and ReplicationFactor = 1 — the client contacts a coordinator node, which computes the hash value of the key and routes the request to the node owning that position on the ring]
Cassandra: Access to Data Replicas
[Figure: the same ring with ReplicationFactor = 3 — the coordinator sends the value to three replica nodes on the ring]
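A minimal sketch of replica placement on the ring for ReplicationFactor = 3, assuming the replicas go to consecutive nodes clockwise from the key's position (as in Cassandra's simple placement strategy); the key and hash function are assumptions:

import bisect
import hashlib

def ring_pos(name: str) -> int:
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % 2**32

nodes = ["#1", "#2", "#3", "#4", "#5", "#6"]
ring = sorted((ring_pos(n), n) for n in nodes)

def replicas_for(key: str, replication_factor: int):
    positions = [p for p, _ in ring]
    i = bisect.bisect_right(positions, ring_pos(key)) % len(ring)
    # Owning node plus the next nodes clockwise hold the extra copies
    return [ring[(i + k) % len(ring)][1] for k in range(replication_factor)]

print(replicas_for("some-row-key", 3))   # three consecutive nodes on the ring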
Remember: update propagation across replicas
https://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376
eBay Cassandra Deployment
High availability / minimise latency: 2 copies in each DC; only 1 node needs to reply
https://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376
Recommended bibliography