Felix Gessert
Wolfram Wingerath
Norbert Ritter

Fast and Scalable Cloud Data Management
Felix Gessert
Baqend GmbH
Hamburg, Germany
Wolfram Wingerath
Baqend GmbH
Hamburg, Germany
Norbert Ritter
Department of Informatics
University of Hamburg
Hamburg, Germany
ISBN 978-3-030-43505-9
ISBN 978-3-030-43506-6 (eBook)
https://doi.org/10.1007/978-3-030-43506-6
© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Our research for this book goes back a long way. It all started as early as
2010 with a bachelor’s thesis on how to make use of the purely expiration-based
caching mechanisms of the web in a database application with rigorous consistency
requirements. Strong encouragement by fellow researchers and growing developer
interest eventually made us realize that providing low latency for users
in globally distributed applications poses not only an interesting research
challenge but also an actual real-world problem that was still mostly unsolved.
This revelation eventually led to the creation of Baqend, a Backend-as-a-Service
platform designed for developing fast web applications. We built Baqend on
knowledge gathered in many bachelor’s and master’s theses and a number of PhD theses.
Technically, Baqend is rooted in our research systems, Orestes for web caching with
tunable consistency, its extension Quaestor for query result caching, and InvaliDB
for scalable push-based real-time queries for end-users. We are telling you all
this for a reason: given its origin, this book not only condenses our knowledge
from years of research done in a practical context, but also encapsulates
our view on the concepts and systems that are currently out there. While we try to
provide a balanced overview of the current state of affairs in data management and
web technology, we are clearly opinionated with regard to certain best practices and
architectural patterns. We would like to consider this a positive trait of this book—
and we hope you agree ;-)
Hamburg, Germany
December 2019
Felix Gessert
Wolfram Wingerath
Norbert Ritter
Contents

1 Introduction  1
  1.1 Modern Data Management and the Web  2
  1.2 Latency vs. Throughput  3
  1.3 Challenges in Modern Data Management  3
  1.4 Outline of the Book  6
  References  8
2 Latency in Cloud-Based Applications  13
  2.1 Three-Tier Architectures  14
    2.1.1 Request Flow  15
    2.1.2 Implementation  15
    2.1.3 Problems of Server-Rendered Architectures  16
  2.2 Two-Tier Architectures  17
    2.2.1 Request Flow  19
    2.2.2 Implementation  19
    2.2.3 Problems of Client-Rendered Architectures  21
  2.3 Latency and Round-Trip Time  21
  2.4 Cloud Computing as a Source of Latency  23
    2.4.1 Characteristics  23
    2.4.2 Service Models  24
    2.4.3 Deployment Models  25
    2.4.4 Latency in Cloud Architectures  26
  References  27
3 HTTP for Globally Distributed Applications  33
  3.1 HTTP and the REST Architectural Style  33
  3.2 Latency on the Web: TCP, TLS and Network Optimizations  35
  3.3 Web Caching for Scalability and Low Latency  38
    3.3.1 Types of Web Caches  39
    3.3.2 Scalability of Web Caching  40
    3.3.3 Expiration-Based Web Caching  41
    3.3.4 Invalidation-Based Web Caching  43
    3.3.5 Challenges of Web Caching for Data Management  44
  3.4 The Client Perspective: Processing, Rendering, and Caching for Mobile and Web Applications  45
    3.4.1 Client-Side Rendering and Processing  46
    3.4.2 Client-Side Caching and Storage  48
  3.5 Challenges and Opportunities: Using Web Caching for Cloud Data Management  50
  References  50
4 Systems for Scalable Data Management  57
  4.1 NoSQL Database Systems  57
  4.2 Data Models: Key-Value, Wide-Column and Document Stores  59
    4.2.1 Key-Value Stores  59
    4.2.2 Document Stores  60
    4.2.3 Wide-Column Stores  60
  4.3 Pivotal Trade-Offs: Latency vs. Consistency vs. Availability  61
    4.3.1 CAP  62
    4.3.2 PACELC  63
  4.4 Relaxed Consistency Models  63
    4.4.1 Strong Consistency Models  65
    4.4.2 Staleness-Based Consistency Models  66
    4.4.3 Session-Based Consistency Models  68
  4.5 Offloading Complexity to the Cloud: Database- and Backend-as-a-Service  70
    4.5.1 Database-as-a-Service  70
    4.5.2 Backend-as-a-Service  72
    4.5.3 Multi-Tenancy  73
  4.6 Summary  75
  References  75
5 Caching in Research and Industry  85
  5.1 Reducing Latency: Replication, Caching and Edge Computing  87
    5.1.1 Eager and Lazy Geo-Replication  87
    5.1.2 Caching  88
    5.1.3 Edge Computing  88
    5.1.4 Challenges  89
  5.2 Server-Side, Client-Side and Web Caching  89
    5.2.1 Server-Side Caching  89
    5.2.2 Client-Side Database Caching  90
    5.2.3 Caching in Object-Relational and Object-Document Mappers  92
    5.2.4 Web Caching  93
  5.3 Cache Coherence: Expiration-Based vs. Invalidation-Based Caching  94
    5.3.1 Expiration-Based Caching  94
    5.3.2 Leases  94
    5.3.3 Piggybacking  95
    5.3.4 Time-to-Live (TTL)  96
    5.3.5 Invalidation-Based Caching  96
    5.3.6 Browser Caching  99
    5.3.7 Web Performance  100
  5.4 Query Caching  100
    5.4.1 Peer-to-Peer Query Caching  101
    5.4.2 Mediators  101
    5.4.3 Query Caching Proxies and Middlewares  101
    5.4.4 Search Result Caching  102
    5.4.5 Summary Data Structures for Caching  103
  5.5 Eager vs. Lazy Geo Replication  104
    5.5.1 Replication and Caching  104
    5.5.2 Eager Geo-Replication  105
    5.5.3 Lazy Geo-Replication  108
  5.6 Summary  113
  References  114
6 Transactional Semantics for Globally Distributed Applications  131
  6.1 Latency vs. Distributed Transaction Processing  131
    6.1.1 Distributed Transaction Architectures  132
  6.2 Entity Group Transactions  137
  6.3 Multi-Shard Transactions  138
  6.4 Client-Coordinated Transactions  139
  6.5 Middleware-Coordinated Transactions  140
  6.6 Deterministic Transactions  141
  6.7 Summary: Consistency vs. Latency in Distributed Applications  142
  References  143
7 Polyglot Persistence in Data Management  149
  7.1 Functional and Non-functional Requirements  150
    7.1.1 Implementation of Polyglot Persistence  152
  7.2 Multi-Tenancy and Virtualization in Cloud-Based Deployments  154
    7.2.1 Database Privacy and Encryption  156
    7.2.2 Service Level Agreements  157
  7.3 Auto-Scaling and Elasticity  158
  7.4 Database Benchmarking  159
    7.4.1 Consistency Benchmarking  159
  7.5 REST APIs, Multi-Model Databases and Backend-as-a-Service  160
    7.5.1 Backend-as-a-Service  161
    7.5.2 Polyglot Persistence  163
  7.6 Summary  164
  References  164
8 The NoSQL Toolbox: The NoSQL Landscape in a Nutshell  175
  8.1 Sharding  176
    8.1.1 Range Partitioning  176
    8.1.2 Hash Partitioning  177
    8.1.3 Entity-Group Sharding  177
  8.2 Replication  178
  8.3 Storage Management  179
  8.4 Query Processing  182
  8.5 Summary: System Studies  183
  References  187
9 Summary and Future Trends  191
  9.1 From Abstract Requirements to Concrete Systems  192
  9.2 Future Prospects  193
About the Authors
Felix Gessert is the CEO and co-founder of the Backend-as-a-Service company
Baqend.1 During his PhD studies at the University of Hamburg, he developed the
core technology behind Baqend’s web performance service. He is passionate about
making the web faster by turning research results into real-world applications. He
frequently talks at conferences about exciting technology trends in data management
and web performance. As a Junior Fellow of the German Informatics Society (GI),
he is working on new ideas to facilitate the research transfer of academic computer
science innovation into practice.
Wolfram “Wolle” Wingerath is the leading data engineer at Baqend,1 where he is
responsible for data analytics and all things related to real-time query processing.
During his PhD studies at the University of Hamburg, he conceived the scalable
design behind Baqend’s real-time query engine and thereby also developed a strong
background in real-time databases and related technology such as scalable stream
processing, NoSQL database systems, cloud computing, and Big Data analytics.
Eager to connect with others and share his experiences, he regularly speaks at
developer and research conferences.
Norbert Ritter is a full professor of computer science at the University of Hamburg,
where he heads the Databases and Information Systems (DBIS) group. He received
his PhD from the University of Kaiserslautern in 1997. His research interests
include distributed and federated database systems, transaction processing, caching,
cloud data management, information integration, and autonomous database systems.
He has been teaching NoSQL topics in various courses for several years. Seeing
the many open challenges for NoSQL systems, he, Wolle, and Felix have been
organizing the annual Scalable Cloud Data Management Workshop2 to promote
research in this area.
1 Baqend: https://www.baqend.com/.
2 Scalable Cloud Data Management Workshop: https://scdm.cloud.
Chapter 1
Introduction
Today, web performance is governed by round-trip latencies between end devices
and cloud services. Depending on their location, users therefore often experience
latency as loading delays when browsing through websites and interacting with
content from apps. Since latency largely determines page load times, it strongly
affects user satisfaction and central business metrics such as customer retention
rates or the time spent on a site. Users expect websites to load quickly and respond
immediately. However, client devices are always separated from cloud backends by
a physical network. The latency for data to travel between devices and cloud servers
therefore dominates the perceived performance of many applications today.
A wealth of studies [Eve16] shows that many business metrics as well as basic
user behavior heavily depend on web performance. At the same time, websites
and workloads continuously become more complex while the amount of processed
and stored data increases. Additionally, more and more users access websites and
services from unreliable mobile networks and different geographical locations.
Performance therefore constitutes one of the central challenges of web technology.
Cloud computing has emerged as a means to simplify operations and deployment
as well as improve the performance of application backends. The rise of cloud
computing enables applications to leverage storage and compute resources from
a large shared pool of infrastructure. The volume and velocity at which data is
generated and delivered have led to the creation of NoSQL databases that provide
scalability, availability, and performance for data-driven workloads. Combining
these two technology trends as cloud data management, scalable database systems
are now frequently deployed and managed through cloud infrastructures. While
cloud data management supports various scalability requirements that have been
impossible with deployments on-premises [LS13, Zha+14], it introduces a performance problem: high latency between application users and cloud services is an
inherent characteristic of the distributed nature of cloud computing and the web.
In this book, we present a comprehensive discussion of current latency reduction
techniques in cloud data management. Throughout the book, we therefore explore
the different ways to improve application performance in the frontend (e.g. through
optimizing the critical rendering path), at the network level (using protocol optimizations), and within the backend (via caching, replication, and other measures).
In doing so, we aim to facilitate an understanding of the related trade-offs between
performance, scalability, and data freshness.
1.1 Modern Data Management and the Web
Across the application stack, slow page load times have three sources, as illustrated
in Fig. 1.1. When a web page is requested, the first source of loading time is the
backend. It consists of application servers and database systems that assemble the
page. The latency of individual OLTP queries and the processing time for rendering
HTML further slow down the delivery of the site [TS07]. The frontend, i.e. the
page displayed and executed in the browser, is another source of delay. Parsing of
HTML, CSS, and JavaScript as well as the execution of JavaScript that can block
other parts of the rendering pipeline all contribute to the overall waiting time. As of
2018, loading an average website requires more than 100 HTTP requests [Arc] that
need to be transferred over the network. This requires numerous round-trip times
that are bounded by physical network latency. This source of delay typically has the
most significant impact on page load time in practice [Gri13].
Any performance problem in web applications can be attributed to these three
drivers of latency. When a website is requested by a client, it is generated by the
Fig. 1.1 The three primary sources of latency and performance problems of web applications:
frontend rendering, network delay, and backend processing
backend, thus causing processing time. The website’s HTML is transferred to the
browser and all included resources (e.g., scripts, images, stylesheets, data, queries)
are requested individually causing additional network latency. Rendering and script
execution in the client also contribute to overall latency.
1.2 Latency vs. Throughput
Network bandwidth, client resources, computing power, and database technology
have improved significantly in recent years [McK16]. Nonetheless, latency is
still restricted by physical network round-trip times as shown in Fig. 1.2. When
network bandwidth increases, page load time does not improve significantly above
5 MBit/s for typical websites. If latency can be reduced, however, there is a
proportional decrease in overall page load time. These results illustrate that cloud-based applications can only be accelerated through latency reduction. Since latency
is incurred at the frontend, network, and backend levels, it can only be minimized
with an end-to-end system design.
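The intuition behind this can be captured in a small back-of-the-envelope model: page load time is roughly the sum of sequential round trips (connection setup, dependent requests) plus the raw transfer time of the payload. The sketch below uses illustrative numbers (20 round trips, a 2 MBit payload) that are assumptions for demonstration, not the measured data from Fig. 1.2:

```python
def page_load_time_ms(rtt_ms: float, bandwidth_mbit: float,
                      round_trips: int = 20, payload_mbit: float = 2.0) -> float:
    """Toy model: sequential round trips plus raw payload transfer time."""
    transfer_ms = payload_mbit / bandwidth_mbit * 1_000
    return round_trips * rtt_ms + transfer_ms

# Doubling bandwidth from 5 to 10 MBit/s only shaves off the (small) transfer time:
bandwidth_gain = page_load_time_ms(60, 5) - page_load_time_ms(60, 10)  # 200 ms
# Halving latency removes half of the dominant round-trip cost:
latency_gain = page_load_time_ms(60, 5) - page_load_time_ms(30, 5)     # 600 ms
```

Under these assumed parameters, halving latency is three times as effective as doubling bandwidth, mirroring the diminishing returns above 5 MBit/s.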
The increasing adoption of cloud computing has led to a growing significance of latency for overall performance. Both users and different application
components are now frequently separated by wide-area networks. Database-as-a-Service (DBaaS) and Backend-as-a-Service (BaaS) models allow storing data in
the cloud to substantially simplify application development [Cur+11a]. However,
their distributed nature makes network latency critical [Coo13]. When clients (e.g.,
browsers or mobile devices) and application servers request data from a remote
DBaaS or BaaS, the application is blocked until results are received from the cloud
data center. As web applications usually rely on numerous queries for every screen,
latency can quickly become the central performance bottleneck.
Fueled by the availability of DBaaS and BaaS systems with powerful
REST/HTTP APIs for developing websites and mobile apps, the single-page
application architecture gained popularity. In this two-tier architecture, clients
directly consume data from cloud services without intermediate web and application
servers as in three-tier architectures. Single-page applications allow more flexible
frontends and facilitate a more agile development process. In single-page
applications, data is not aggregated and pre-rendered in the application server, but
assembled in the client through many individual requests. Consequently, the number
of latency-critical data requests tends to be even higher in two-tier applications than
in typical three-tier stacks [Wag17].
1.3 Challenges in Modern Data Management
The latency problem has been tackled mainly by replication [DeC+07, Cha+08,
Hba, Qia+13, Coo+08, Sov+11, Llo+13, Llo+11] and caching techniques [Lab+09,
Fig. 1.2 The dependency of page load time on bandwidth (data rate) and latency. For typical
websites, increased bandwidth has a diminishing return above 5 MBit/s, whereas any decrease in
latency leads to a proportional decrease in page load time. The data points were collected by Belshe
[Bel10], who used the 25 most accessed websites. [Two plots: page load time (ms) against bandwidth
in MBit/s (at 60 ms latency), and page load time (ms) against latency in ms (at 5 MBit/s bandwidth).]
PB03, Dar+96, Alt+03, LGZ04, Luo+02, Bor+03] to distribute the database system
and its data. However, the majority of work on replication and caching is limited by
a lack of generality: Most solutions are tied to specific types of data or applications
(e.g., static web content), trade read latency against higher write latency, or do
not bound data staleness. Furthermore, latency and performance improvements for
database systems do not solve the end-to-end performance problem. The core
problem is that state-of-the-art database systems are not designed to be directly
accessed by browsers and mobile devices as they lack the necessary abstractions
for access control and business logic. Therefore, servers still need to aggregate data
for clients and thus increase latency [Feh+14]. The goal of this book is to provide an
overview of the spectrum of techniques for low latency across all architectural layers,
ranging from the frontend through the network to the backend.
Improving the performance of mostly static data has a long history [GHa+96].
However, latency and consistency are particularly challenging for dynamic data
that, in contrast to static data, can change unpredictably at any point in time
[WGW+20]. A typical website consists of some mostly static files such as scripts,
stylesheets, images, and fonts. Web APIs, JSON data, and HTML files, on the
other hand, are dynamic and therefore commonly considered uncacheable [Lab+09].
Dynamic data can have various forms depending on the type of the application and
the underlying storage [Kle17]. Hence, the latency problem is equally relevant for
both standard file- and record-based access via a primary key or an identifier (e.g.,
a URL) as well as query results that offer a dynamic view of the data based on
query predicates. As an example, consider an e-commerce website. For the website
to load fast, files that make up the application frontend have to be delivered with
low latency, e.g., the HTML page for displaying the shop’s landing page. Next,
data from the database systems also needs to be delivered fast, e.g., the state of the
shopping cart or product detail information. And lastly, the performance of queries
like retrieving recommended products, filtering the product catalog or displaying
search results also heavily depends on latency.
Latency is not only problematic for end users, but it also has a detrimental effect
on transaction processing [Bak+11, Shu+13, DAEA10, PD10, Kra+13, DFR15a,
Kal+08, Zha+15b, Dra+15]. Many applications require the strong guarantees of
transactions to preserve application invariants and correct semantics. However, both
lock-based and optimistic concurrency control protocols have an abort probability
that depends on the overall transaction duration [BN09, Tho98]. If individual
operations are subject to high latency, the overall transaction duration is prolonged
and consequently, the probability of a deadlock or conflict exhibits a superlinear
increase [WV02]. In environments with high latency, the performance of transaction
processing is thus determined by latency. This is for example the case if an end user
is involved in the transaction (e.g., during the checkout in a reservation system) or if
the server runs the transaction against a remote DBaaS. To increase the effectiveness
of transactions, low latency is therefore required, too.
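The effect of latency on transaction success can be illustrated with a minimal Monte Carlo sketch of optimistic concurrency control. All parameters here (client count, operations per transaction, keyspace size, assumed write rate) are illustrative assumptions; the point is only that a longer round-trip time stretches the window in which conflicting writes can occur:

```python
import random

def abort_rate(rtt_ms: float, clients: int = 20, ops: int = 5,
               keys: int = 1_000, trials: int = 2_000, seed: int = 42) -> float:
    """Fraction of optimistically executed transactions that fail validation.

    A transaction performs `ops` sequential operations (one round trip each)
    and aborts if any key in its read set was written concurrently.
    """
    rng = random.Random(seed)
    duration_ms = ops * rtt_ms          # sequential operations, one RTT each
    write_rate = clients * ops / 1_000  # assumed concurrent writes per ms
    aborts = 0
    for _ in range(trials):
        read_set = set(rng.sample(range(keys), ops))
        concurrent_writes = int(write_rate * duration_ms)
        written = {rng.randrange(keys) for _ in range(concurrent_writes)}
        if read_set & written:
            aborts += 1
    return aborts / trials
```

In this toy setting, a 10 ms round trip causes only a small fraction of transactions to abort, while a 100 ms round trip aborts a substantially larger share; since every abort triggers a retry that itself costs further round trips, high latency compounds into poor transaction throughput.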
Besides the problem of high network latencies, the applicability of database
systems in cloud environments is considerably restricted by the lack of elastic
horizontal scalability mechanisms and missing abstraction of storage and data
models [DAEA13, Stö+15]. In today’s cloud data management, most DBaaS
systems offer their functionalities through REST APIs. Yet to date, there has been
no systematic effort to derive a unified REST interface that takes into account the
different data models, schemas, consistency concepts, transactions, access-control
mechanisms, and query languages to expose cloud data stores through a common
interface without restricting their functionality or scalability.
The complete ecosystem of data management is currently undergoing heavy changes. The unprecedented scale at which data is consumed and generated today has created a strong demand for scalable data management and given rise to non-relational, distributed NoSQL database systems [DeC+07, Cha+08, Hba, LM10, CD13, SF12, ZS17]. Two central problems triggered this process:
• vast amounts of user-generated content in modern applications and the resulting
request loads and data volumes
• the desire of the developer community to employ problem-specific data models
for storage and querying
To address these needs, various data stores have been developed by both industry
and research, arguing that the era of one-size-fits-all database systems is over
[Sto+07]. Therefore, these systems are frequently combined to leverage each system
in its respective sweet spot. Polyglot persistence is the concept of using different
database systems within a single application domain, addressing different functional
and non-functional needs with each system [SF12]. Complex applications need
polyglot persistence to deal with a wide range of data management requirements.
The overhead of and the necessary know-how for managing multiple database
systems prevent many applications from employing efficient polyglot persistence
architectures. Instead, developers are often forced to implement one-size-fits-all solutions that do not scale well and cannot be operated efficiently. Even with state-of-the-art DBaaS systems, applications still have to choose one specific database technology [HIM02, Cur+11a].
The rise of polyglot persistence [SF12] introduces two specific problems. First,
it imposes the constraint that any performance and latency optimization must not
be limited to only a single database system. Second, the heterogeneity and sheer
amount of these systems make it increasingly difficult to select the most appropriate
system for a given application. Current research and industry initiatives focus on
solving specific problems by introducing new database systems or new approaches
within the scope of specific, existing data stores. However, the problem of selecting the most suitable systems and orchestrating their interaction remains unsolved, as does the problem of offering low latency in a polyglot application architecture.
In this book, we address each of the aforementioned challenges by providing
a detailed view on the current technology for achieving and maintaining high
performance in cloud-based application stacks. We aim to clarify today’s challenges,
discuss available solutions, and indicate problems that are yet unsolved. Reading
this book will help you choose the right technology for building fast applications—
throughout the entire software stack.
1.4 Outline of the Book
In Chap. 2, we start with a 10,000-foot view of today’s web-based application stacks and make the case that low latency for end users can only be achieved by optimizing the backend, the frontend, and the network in between. To paint the big picture of where latency arises in globally distributed applications, we contrast client-rendered two-tier with server-rendered three-tier application stacks and provide a quick overview of current cloud service and deployment models.
Next in Chap. 3, we address the relevance of the HTTP infrastructure and REST
communication paradigm as the prevalent mechanisms for global data access in
many Web and mobile applications. After a technical primer on the lower-level
protocols involved in information exchange over HTTP, we summarize caching
mechanisms defined in the HTTP specification for achieving low latency and
scalability. We subsequently turn to performance considerations for application
developers. To this end, we first describe the rendering process in modern web
applications and then establish the context between the critical rendering path, client
caching and storage, and application performance.
Chapter 4 then focuses on technology for scalable data management as the
primary challenge for achieving low latency. We first explore the design space of
(NoSQL) data management systems along different axes, especially data models
(key-value, document, wide-column) and different notions of consistency (CAP/PACELC, weak/eventual/strong consistency). In greater detail, we consider the
relationship between latency, consistency, and availability in order to make clear that they cannot be considered separately during system design:
Optimizing system performance with respect to one of these properties often leads
to degraded performance with respect to the others.
In Chap. 5, we drill further down into caching technology as a means to
compensate physical distance between servers and clients, categorizing the topic
along three different dimensions. First, we distinguish the different locations at
which caching takes place, namely the client, the server, or intermediate parties
such as reverse proxies. Second, we distinguish the granularity of the cached entities
such as files, records, or pages. Third, we distinguish expiration-based and invalidation-based caching as competing strategies for keeping caches up to date. To underpin
the importance of caching for performance in modern data management, we put an
emphasis on approaches for caching database query results in this chapter and also
shed light on the relationship between (geo-)replication and caching.
We start Chap. 6 with a brief recap of the ACID principle as the gold standard
for transactional guarantees and of the different concurrency control schemes,
such as pessimistic lock-based, multi-version, and forward-oriented and backward-oriented optimistic approaches. After a brief illustration of the impact of transactions
on latency, we discuss the wealth of systems in research and practice that offer
relaxed guarantees in favor of accelerated data access. We then consider different
approaches for transactions in distributed architectures such as entity group or
multi-shard transactions. Finally, we contrast client- and middleware-coordinated
transactions and discuss the benefits and limitations of purely deterministic transactions.
In Chap. 7, we consider the different challenges related to polyglot persistence
architectures. We discuss cloud-hosted data management services (DBaaS/BaaS)
as an alternative to on-premises solutions and the challenges of operating such systems in federated deployments, including multi-tenancy and virtualization, privacy
and encryption, as well as resource management and scalability. We briefly dive
into the area of database performance benchmarking, covering not only traditional measures such as latency and throughput but also consistency properties and ways to quantify them. The chapter closes with a brief consideration of REST APIs,
multi-model databases, and Backend-as-a-Service offerings available today.
The gist of the book is condensed into Chap. 8, where we present the NoSQL Toolbox as a concept for mapping functional and non-functional application requirements to the concrete technologies used for implementing them. The toolbox identifies four major building blocks of data management systems, which are dissected one by one: sharding, replication, storage management, and query
processing. The chapter closes with succinct case studies to exemplify how the
capabilities of real data management systems translate to a toolbox classification.
Finally, in Chap. 9, we present the NoSQL Decision Tree as the primary takeaway of this book: it provides a simple way to steer application architects towards a set of potentially useful data management systems, given only a few input parameters on the application requirements. The remainder of the book then covers aspects of data management that are not treated extensively here, specifically stream management and push-based real-time database systems, business analytics, and machine learning.
References
[Alt+03] Mehmet Altinel et al. “Cache Tables: Paving the Way for an Adaptive Database
Cache”. In: VLDB. 2003, pp. 718–729. URL: http://www.vldb.org/conf/2003/papers/
S22P01.pdf.
[Arc] HTTP Archive. http://httparchive.org/trends.php. Accessed: 2018-07-14. 2018.
[Bak+11] J. Baker et al. “Megastore: Providing scalable, highly available storage for interactive
services”. In: Proc. of CIDR. Vol. 11. 2011, pp. 223–234.
[Bel10] Mike Belshe. More Bandwidth Doesn’t Matter (Much). Tech. rep. Google Inc., 2010.
[BN09] Philip A. Bernstein and Eric Newcomer. Principles of Transaction Processing.
Morgan Kaufmann, 2009. ISBN: 1-55860-415-4.
[Bor+03] Christof Bornhövd et al. “DBCache: Middle-tier Database Caching for Highly
Scalable e-Business Architectures”. In: Proceedings of the 2003 ACM SIGMOD
International Conference on Management of Data, San Diego, California, USA, June
9–12, 2003. Ed. by Alon Y. Halevy, Zachary G. Ives, and AnHai Doan. ACM, 2003,
p. 662. DOI: 10.1145/872757.872849.
[CD13] Kristina Chodorow and Michael Dirolf. MongoDB - The Definitive Guide.
O’Reilly, 2013. ISBN: 978-1-449-38156-1. URL: http://www.oreilly.de/catalog/
9781449381561/index.html.
[Cha+08] Fay Chang et al. “Bigtable: A distributed storage system for structured data”. In: ACM
Transactions on Computer Systems (TOCS) 26.2 (2008), p. 4.
[Coo+08] B. F. Cooper et al. “PNUTS: Yahoo!’s hosted data serving platform”. In: PVLDB 1.2
(2008), pp. 1277–1288. URL: http://dl.acm.org/citation.cfm?id=1454167 (visited on
09/12/2012).
[Coo13] Brian F. Cooper. “Spanner: Google’s globally-distributed database”. In: 6th Annual International Systems and Storage Conference, SYSTOR ’13, Haifa, Israel, June 30 – July 02, 2013. Ed. by Ronen I. Kat, Mary Baker, and Sivan Toledo. ACM, 2013, p. 9. DOI: 10.1145/2485732.2485756.
[Cur+11a] Carlo Curino et al. “Relational Cloud: A Database-as-a-Service for the Cloud”. In:
Proc. of CIDR. 2011. URL: http://dspace.mit.edu/handle/1721.1/62241 (visited on
04/15/2014).
[DAEA10] Sudipto Das, Divyakant Agrawal, and Amr El Abbadi. “G-store: a scalable data store
for transactional multi key access in the cloud”. In: Proceedings of the 1st ACM
symposium on Cloud computing. ACM. 2010, pp. 163–174.
[DAEA13] Sudipto Das, Divyakant Agrawal, and Amr El Abbadi. “ElasTraS: An elastic,
scalable, and self-managing transactional database for the cloud”. en. In: ACM
Transactions on Database Systems 38.1 (Apr. 2013), pp. 1–45. ISSN: 03625915.
DOI: 10.1145/2445583.2445588. URL : http://dl.acm.org/citation.cfm?doid=2445583.
2445588 (visited on 11/25/2016).
[Dar+96] Shaul Dar et al. “Semantic Data Caching and Replacement”. In: VLDB’96, Proceedings of 22th International Conference on Very Large Data Bases, September 3–6,
1996, Mumbai (Bombay), India. Ed. by T. M. Vijayaraman et al. Morgan Kaufmann,
1996, pp. 330–341. URL: http://www.vldb.org/conf/1996/P330.PDF.
[DeC+07] G. DeCandia et al. “Dynamo: amazon’s highly available key-value store”. In: ACM
SOSP. Vol. 14. 17. ACM. 2007, pp. 205–220. URL: http://dl.acm.org/citation.cfm?id=
1294281 (visited on 09/12/2012).
[DFR15a] A. Dey, A. Fekete, and U. Röhm. “Scalable distributed transactions across heterogeneous stores”. In: 2015 IEEE 31st International Conference on Data Engineering.
2015, pp. 125–136. DOI: 10.1109/ICDE.2015.7113278.
[Dra+15] Aleksandar Dragojević et al. “No compromises: distributed transactions with consistency, availability, and performance”. In: Proceedings of the 25th Symposium on Operating Systems Principles. ACM Press, 2015, pp. 54–70. ISBN: 978-1-4503-3834-9. DOI: 10.1145/2815400.2815425. URL: http://dl.acm.org/citation.cfm?doid=2815400.2815425 (visited on 11/25/2016).
[Eve16] Tammy Everts. Time Is Money: The Business Value of Web Performance. O’Reilly Media, 2016.
[Feh+14] Christoph Fehling et al. Cloud Computing Patterns - Fundamentals to Design, Build,
and Manage Cloud Applications. Springer, 2014. ISBN: 978-3-7091-1567-1. DOI:
10.1007/978-3-7091-1568-8.
[GHa+96] Jim Gray, Pat Helland, et al. “The dangers of replication and a solution”. In: SIGMOD Rec. 25.2 (June 1996), pp. 173–182.
[Gri13] Ilya Grigorik. High Performance Browser Networking. O’Reilly Media, 2013. ISBN: 978-1-4493-4476-4. URL: https://books.google.de/books?id=tf-AAAAQBAJ.
[Hba] HBase. 2017. URL: http://hbase.apache.org/ (visited on 05/25/2017).
[HIM02] H. Hacigumus, B. Iyer, and S. Mehrotra. “Providing database as a service”. In:
Data Engineering, 2002. Proceedings. 18th International Conference on. 2002, pp.
29–38. URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=994695 (visited on
10/16/2012).
[Kal+08] R. Kallman et al. “H-store: a high-performance, distributed main memory transaction
processing system”. In: Proceedings of the VLDB Endowment 1.2 (2008), pp. 1496–
1499.
[Kle17] Martin Kleppmann. Designing Data-Intensive Applications. 1st ed. O’Reilly Media, Jan. 2017. ISBN: 978-1-4493-7332-0.
[Kra+13] Tim Kraska et al. “MDCC: Multi-data center consistency”. In: EuroSys. ACM,
2013, pp. 113–126. URL: http://dl.acm.org/citation.cfm?id=2465363 (visited on
04/15/2014).
[Lab+09] Alexandros Labrinidis et al. “Caching and Materialization for Web Databases”.
In: Foundations and Trends in Databases 2.3 (2009), pp. 169–266. DOI:
10.1561/1900000005.
[LGZ04] Per-Åke Larson, Jonathan Goldstein, and Jingren Zhou. “MTCache: Transparent Mid-Tier Database Caching in SQL Server”. In: Proceedings of the 20th International
Conference on Data Engineering, ICDE 2004, 30 March - 2 April 2004, Boston, MA,
USA. Ed. by Z. Meral Özsoyoglu and Stanley B. Zdonik. IEEE Computer Society,
2004, pp. 177–188. DOI: 10.1109/ICDE.2004.1319994.
[Llo+11] Wyatt Lloyd et al. “Don’t settle for eventual: scalable causal consistency for wide-area storage with COPS”. In: Proceedings of the Twenty-Third ACM Symposium
on Operating Systems Principles. ACM, 2011, pp. 401–416. URL: http://dl.acm.org/
citation.cfm?id=2043593 (visited on 01/03/2015).
[Llo+13] Wyatt Lloyd et al. “Stronger semantics for low-latency geo-replicated storage”. In:
Presented as part of the 10th USENIX Symposium on Networked Systems Design and
Implementation (NSDI 13). 2013, pp. 313–328.
[LM10] Avinash Lakshman and Prashant Malik. “Cassandra: a decentralized structured
storage system”. In: ACM SIGOPS Operating Systems Review 44.2 (2010), pp. 35–
40. URL: http://dl.acm.org/citation.cfm?id=1773922 (visited on 04/15/2014).
[LS13] Wolfgang Lehner and Kai-Uwe Sattler. Web-Scale Data Management for the Cloud. New York: Springer, 2013. ISBN: 978-1-4614-6855-4.
[Luo+02] Qiong Luo et al. “Middle-tier database caching for e-business”. In: Proceedings of the
2002 ACM SIGMOD International Conference on Management of Data, Madison,
Wisconsin, June 3–6, 2002. Ed. by Michael J. Franklin, Bongki Moon, and Anastassia
Ailamaki. ACM, 2002, pp. 600–611. DOI: 10.1145/564691.564763.
[McK16] Martin McKeay. Akamai’s State of the Internet Report Q4 2016. Tech. rep. Akamai, 2016.
[PB03] Stefan Podlipnig and László Böszörményi. “A survey of Web cache replacement strategies”. In: ACM Comput. Surv. 35.4 (2003), pp. 374–398. DOI:
10.1145/954339.954341.
[PD10] Daniel Peng and Frank Dabek. “Large-scale Incremental Processing Using Distributed Transactions and Notifications.” In: OSDI. Vol. 10. 2010, pp. 1–15.
URL : https://www.usenix.org/legacy/events/osdi10/tech/full_papers/Peng.pdf?origin=
publication_detail (visited on 01/03/2015).
[Qia+13] Lin Qiao et al. “On brewing fresh espresso: LinkedIn’s distributed data serving
platform”. In: Proceedings of the 2013 international conference on Management of
data. ACM, 2013, pp. 1135–1146. URL: http://dl.acm.org/citation.cfm?id=2465298
(visited on 09/28/2014).
[SF12] Pramod J. Sadalage and Martin Fowler. NoSQL distilled: a brief guide to the emerging
world of polyglot persistence. Pearson Education, 2012.
[Shu+13] Jeff Shute et al. “F1: A distributed SQL database that scales”. In: Proceedings of the VLDB Endowment 6.11 (2013), pp. 1068–1079.
[Sov+11] Yair Sovran et al. “Transactional storage for geo-replicated systems”. In: Proceedings
of the Twenty-Third ACM Symposium on Operating Systems Principles. ACM, 2011,
pp. 385–400.
[Sto+07] M. Stonebraker et al. “The end of an architectural era:(it’s time for a complete
rewrite)”. In: Proceedings of the 33rd international conference on Very large data
bases. 2007, pp. 1150–1160. URL: http://dl.acm.org/citation.cfm?id=1325981 (visited
on 07/05/2012).
[Stö+15] Uta Störl et al. “Schemaless NoSQL Data Stores - Object-NoSQL Mappers to the
Rescue?” In: Datenbanksysteme für Business, Technologie und Web (BTW), 16.
Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS),
4.-6.3.2015 in Hamburg, Germany. Proceedings. Ed. by Thomas Seidl et al.
Vol. 241. LNI. GI, 2015, pp. 579–599. URL: http://subs.emis.de/LNI/Proceedings/
Proceedings241/article13.html (visited on 03/10/2015).
[Tho98] A. Thomasian. “Concurrency control: methods, performance, and analysis”. In: ACM Computing Surveys (CSUR) 30.1 (1998), pp. 70–119. URL: http://dl.acm.org/citation.cfm?id=274443 (visited on 10/18/2012).
[TS07] Andrew S. Tanenbaum and Maarten van Steen. Distributed systems - principles and
paradigms, 2nd Edition. Pearson Education, 2007. ISBN: 978-0-13-239227-3.
[Wag17] Jeremy Wagner. Web Performance in Action: Building Faster Web Pages. Manning Publications, 2017. ISBN: 1617293776.
[WGW+20] Wolfram Wingerath, Felix Gessert, Erik Witt, et al. “Speed Kit: A Polyglot & GDPR-Compliant Approach For Caching Personalized Content”. In: 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, Texas, April 20–24, 2020. 2020.
[WV02] G. Weikum and G. Vossen. Transactional Information Systems. Series in Data Management Systems. Morgan Kaufmann, 2002. ISBN: 9781558605084.
[Zha+14] Liang Zhao et al. Cloud Data Management. Springer, 2014.
[Zha+15b] Irene Zhang et al. “Building consistent transactions with inconsistent replication”. In:
Proceedings of the 25th Symposium on Operating Systems Principles, SOSP 2015,
Monterey, CA, USA, October 4–7, 2015. Ed. by Ethan L. Miller and Steven Hand.
ACM, 2015, pp. 263–278. DOI: 10.1145/2815400.2815404.
[ZS17] Albert Y. Zomaya and Sherif Sakr, eds. Handbook of Big Data Technologies. Springer,
2017. ISBN: 978-3-319-49339-8. DOI: 10.1007/978-3-319-49340-4.
Chapter 2
Latency in Cloud-Based Applications
The continuous shift towards cloud computing has established two primary architectures: two-tier and three-tier applications. Both architectures are susceptible to
latency at different levels. The concrete realization can build upon different cloud
models, in particular, Database/Backend-as-a-Service, Platform-as-a-Service, and
Infrastructure-as-a-Service [YBDS08].
Modern web applications need to fulfill several non-functional requirements:
• High availability guarantees that applications remain operational despite failure
conditions such as network partitions, server failures, connectivity issues, and
human error.
• Elastic scalability enables applications to handle any growth and decrease in
load (e.g., user requests and data volume), by automatically allocating or freeing
storage and computing resources in a distributed cluster.
• Fast page loads and response times are essential to maximize user satisfaction,
traffic, and revenue.
• An engaging user experience significantly helps to make users productive and
efficient.
• A fast time-to-market is the result of the appropriate development, testing, and
deployment abstractions to quickly release an application to production.1
In this chapter, we discuss the three- and two-tier architectures in the context of
the above requirements, before examining the technical foundations of the backend,
network, and frontend in the following chapters.
1 Despite all recent advances in programming languages, tooling, cloud platforms, and frameworks,
studies indicate that over 30% of all web projects are delivered late or over-budget, while 21% fail
to meet their defined requirements [Kri15].
2.1 Three-Tier Architectures
The three-tier architecture is a well-known pattern for structuring client-server
applications [TS07, Feh+14, HW03]. The idea is to segregate application concerns into three different functional tiers (components). This has the advantage that tiers are loosely coupled, thus facilitating easier development. Furthermore, each tier can be scaled independently based on required resources. The canonical tiers are the presentation tier, the business logic tier, and the data tier. In the literature,
different definitions of three-tier architectures are used. Tanenbaum and van Steen
[TS07] differentiate between web servers, application servers and database servers
as three different tiers of a web application. Fehling et al. [Feh+14] argue that
web and application servers are typically just one tier, whereas in a real three-tier
application, the presentation tier is completely decoupled from the business logic
tier, e.g., by message queues.
We will distinguish between the two-tier and three-tier architecture based on
the location of the presentation tier. As shown in Fig. 2.1, the classic three-tier
architecture includes the presentation layer as part of the backend application. This
means that an application or web server executes the presentation and business logic
while the data tier serves and stores data using one or more database systems. The
client’s browser is served the rendered representation, typically in the form of an
HTML file and supporting stylesheets (CSS) and JavaScript files (JS). As the client
does not execute any significant portion of the presentation and business logic, this
architecture is also referred to as a thin client architecture. Any user interactions that
require business logic (e.g., posting a comment on a social network) are forwarded to
the server tiers, which are responsible for performing the desired task. This usually
implies the server-rendering of a new HTML view representing a response to the
invoked action. An advantage of separating the data tier and business logic tier is
that business logic can be stateless and scales efficiently.
Fig. 2.1 The three-tier web application architecture
2.1.1 Request Flow
The high-level request flow in a server-rendered three-tier architecture is the
following (cf. [Feh+14]):
1. The client requests the website over the HTTP protocol.
2. The web server accepts the request and calls the components for handling the
corresponding URL. Usually, the web server is not requested directly, but a
load balancer distributes requests over available web servers. The request can
be directly executed in the web server (e.g., in PHP) or invoked over the network
(e.g., through AJP) or using a queuing system (e.g., RabbitMQ) [Cha15].
3. In the application server, the business logic is executed.
4. Any data required to render the current view is queried from the database and
updates are applied to reflect the application state.
5. The response is sent to the client as an HTML document. The web server directly
answers subsequent requests for static resources like images and scripts.
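The steps above can be sketched as a minimal WSGI handler (all names and the example data are hypothetical; a real deployment would sit behind a web server and load balancer):

```python
def fetch_view_data(path):
    # Stand-in for step 4: query the database for whatever the requested
    # view needs (hypothetical example data).
    return {"title": "Product 42", "price": "19.99 EUR"}

def application(environ, start_response):
    # Steps 2 and 3: the web server dispatches the request to this
    # handler, which executes the business logic for the requested URL.
    data = fetch_view_data(environ.get("PATH_INFO", "/"))
    # Step 5: the presentation tier renders the complete HTML view on
    # the server; the client receives a ready-to-display document.
    html = ("<html><body><h1>{title}</h1>"
            "<p>{price}</p></body></html>").format(**data)
    body = html.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Any WSGI-compliant web server could host this `application` callable (step 1 being the client’s HTTP request to that server); static resources such as images and scripts would be answered by the web server directly.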
2.1.2 Implementation
As a large part of the web uses three-tier architectures, a considerable number of environments and frameworks for developing and hosting three-tier applications exists. In the context of cloud computing, three-tier architectures can be implemented
on Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) clouds
[HDF13, MB16].
PaaS cloud providers such as Microsoft Azure [Azu], Google App Engine
[App], and Heroku [Clob] offer managed operating systems, application servers,
and middleware for running web applications in a scalable fashion. While the
provider prescribes the runtime environment (e.g., supporting Python applications),
the application logic can be freely defined. The PaaS abstracts from maintenance
and provisioning of operating systems and servers to unburden the application from
operational aspects such as scaling, system upgrades, and network configuration. It
therefore provides a useful paradigm for the development of three-tier applications.
For example, Microsoft Azure [Azu] has a built-in notion of the three tiers, as it
distinguishes between web roles (the presentation tier), storage services (the data
tier) and worker roles (the business logic tier). Web roles and worker roles are scaled
independently and decoupled by storage abstractions such as queues, wide-column
models, and file systems [Cal+11].
In the IaaS model, full control over virtual machines is left to the tenant.
This implies that three-tier architectures can use the same technology stacks as
applications in non-cloud environments (on-premises). For example, Amazon Web
Services (AWS) [Amab] and Google Cloud Platform (GCP) [Gooa] provide the
management infrastructure to provision individual virtual machines or containers
that can run arbitrary software for each tier in the architectures. Typically, a web
16
2 Latency in Cloud-Based Applications
server (e.g., Apache, IIS, or Nginx [Ree08]), application server (e.g., Tomcat or
Wildfly [Wil]) or reverse proxy (e.g., Varnish [Kam17]) is combined with a web
application framework in a particular programming language running the business
logic and parts of the presentation tier (e.g., Python with Django, Java with Spring
MVC, or Ruby with Sinatra [The, Wal14]). The business logic tier in turn either employs a database system also hosted on the IaaS provider or connects to Database-as-a-Service offerings to persist and retrieve data.
The microservice architecture is a refinement of the three-tier architecture that
decomposes the three tiers of the backend [New15, Nad+16]. The central idea of
microservices is to decompose the application into functional units that are loosely
coupled and interact with each other through REST APIs. Microservices thus offer a
lightweight alternative to service-oriented architectures (SOA) and the Web Service standards [Alo+04]. In contrast to three-tier applications, microservices do not share
state through a data tier. Instead, each microservice is responsible for separately
maintaining the data it requires to fulfill its specified functionality. One of the major
reasons for the adoption of microservices is that they allow scaling the development
of large distributed applications: each team can individually develop, deploy, and
test microservices as long as the API contracts are kept intact. When combined with
server-rendering, i.e., the generation of HTML views for each interaction in a web application, microservices still exhibit the same performance properties as three-tier architectures. Some aspects even increase in complexity: each microservice is a point of failure, and the response time of a request that aggregates multiple microservice responses is subject to latency stragglers.
2.1.3 Problems of Server-Rendered Architectures
The non-functional requirements introduced at the beginning of this chapter are particularly challenging to fulfill in three-tier and service architectures with a server-side presentation tier:
High Availability. As all tiers depend upon the data tier for shared state, the
underlying database systems have to be highly available. Any unavailability in
the data tier will propagate to the other tiers, thus amplifying potential partial
failures into application unavailability.
Elastic Scalability. All tiers need to be independently and elastically scalable,
which can induce severe architectural complexity. For instance, if requests passed
from the presentation tier to the business logic tier exceed the capacities of the
business logic tier, scaling rules have to be triggered without dropping requests.
Alternatively, non-trivial backpressure (flow control) mechanisms [Kle17] have
to be applied to throttle upstream throughput. In practice, tiers are often
decoupled through message queues, which—similar to database systems—have
inherent availability-consistency-performance trade-offs.
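The core idea of such a backpressure mechanism can be sketched with a bounded queue between two tiers (a minimal illustration using Python’s standard library; the names and the capacity are made up):

```python
import queue

def submit_with_backpressure(work_queue, request, timeout_s=0.05):
    """Hand a request to the next tier's bounded queue; if that tier is
    saturated (queue full), report overload to the caller instead of
    buffering without limit. Illustrative sketch, not tied to any
    specific middleware."""
    try:
        work_queue.put(request, timeout=timeout_s)
        return True   # accepted by the downstream tier
    except queue.Full:
        return False  # caller must retry later or shed the request

# A bounded queue decoupling the presentation tier from the business
# logic tier; its capacity limit is what produces backpressure.
requests_for_business_logic = queue.Queue(maxsize=2)
```

When the business logic tier drains the queue too slowly, the presentation tier is forced to slow down or shed requests rather than silently overwhelming its downstream neighbor.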
Fast Page Loads. Server-rendering implies that the delivery of a response is blocked until the slowest service or query returns, which hinders fast page loads.
Even if each query and service produces a low average or median response time,
the aggregate response times are governed by extreme value distributions that
have a significantly higher expected value [WJW15, VM14]. While the request
is blocked, the client cannot perform any work as the initial HTML document is
the starting point for any further processing in the browser and for subsequent
requests [WGW+20]. Of the potentially hundreds of requests [Arc], each is
furthermore bounded by network latency that increases with the distance to the
server-side application logic.
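This straggler effect can be illustrated with a short simulation (the log-normal latency model and all numbers are assumptions for illustration): a page that blocks on many parallel requests loads only when the slowest one returns, so its load time is the maximum of the individual latencies.

```python
import math
import random
import statistics

def median_page_load(num_requests, median_latency_ms=20.0,
                     sigma=0.8, trials=10_000, seed=7):
    """Median load time of a page that waits for `num_requests` parallel
    backend calls, assuming heavy-tailed (log-normal) per-request
    latencies with the given median. Illustrative model only."""
    rng = random.Random(seed)
    mu = math.log(median_latency_ms)  # log-normal median = exp(mu)
    samples = [
        max(rng.lognormvariate(mu, sigma) for _ in range(num_requests))
        for _ in range(trials)
    ]
    return statistics.median(samples)
```

Under these assumptions, a single request has a median of about 20 ms, while a page waiting on 50 such requests has a median load time several times higher, even though every individual service looks fast in isolation.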
Engaging User Experience. As each user interaction (e.g., navigation or submitting a form) produces a new HTML document, the indirection between the user’s interactions and the observed effects becomes noticeable. A well-studied
result from psychology and usability engineering is that for the user to gain the
impression of directly modifying objects in the user interface, response times
have to be below 100 ms [Mil68, Nie94, Mye85]. Even if the delivery of static
assets is fast, rendering an HTML document, applying updates to the database
and performing relevant queries is usually infeasible if any significant network
latency is involved. For users, this conveys the feeling of an unnatural, indirect
interaction pattern [Nie94].
Fast Time-to-Market. Besides the above performance problems, server-side
rendering also induces problems for the software development process. All user
interactions need to be executed on the server. In modern web applications,
the user interface has to be engaging and responsive. Therefore, parts of the
presentation logic are replicated between the server-side presentation tier and
the JavaScript logic of the frontend. This duplicates functionality, increasing
development complexity and hindering maintainability. Furthermore, by splitting
the frontend from the server-side processing, unintended interdependencies arise: frontend developers or teams have to wait for backend development to proceed before they can work on the design and structure of the frontend. This hinders
agile, iterative development methodologies such as Scrum [SB02] and Extreme
Programming (XP) [Bec00] from being applied to frontend and backend teams
separately. As applications shift towards more complex frontends, the coupling
of frontend and backend development inevitably increases time-to-market.
2.2 Two-Tier Architectures
The two-tier architecture evolved [Feh+14] to tackle the problems of rigid three-tier architectures. By two-tier architectures, we will refer to applications that shift
the majority of presentation logic into the client. Business logic can be shared
or divided between client and server, whereas the data tier resides on the server,
to reflect application state across users. The two-tier model is popular for native mobile applications, which are fundamentally based on the user interface components
18
2 Latency in Cloud-Based Applications
Fig. 2.2 The two-tier web application architecture
offered by the respective mobile operating system (iOS, Windows, Android) and
packaged into an installable app bundle [Hil16]. Many web applications also follow
this model and are referred to as single-page applications (SPAs), due to their
ability to perform user interactions without loading a new HTML page [MP14]. We
will discuss the two-tier architecture in the context of web applications, but most
aspects also apply to native mobile apps.
The two-tier architecture is illustrated in Fig. 2.2. Rendering in the client is performed through the browser’s JavaScript runtime engine that consumes structured
data directly from the server (e.g., product detail information), usually in the form of
JSON² [Cro06]. The data tier is therefore responsible for directly serving database
objects and queries to clients. The business logic tier is optional and split into
unprotected parts directly executed in the client and parts that require confidentiality,
security and stricter control and are therefore executed co-located with the data tier.
Server-side business logic includes enforcing access control, validating inputs, and
performing any protected business logic (e.g., placing an order in an e-commerce
shop). Actions carried out by the client can be directly modeled as update operations
on the database, with a potential validation and rewriting step enforced by the server.
² The JavaScript Object Notation (JSON) is a self-contained document format, consisting of objects
(key-value pairs) and arrays (ordered lists), that can be arbitrarily nested. JSON has gained
popularity due to its simpler structure compared to XML. It can be easily processed in JavaScript
and thus became the widely-used format for document databases such as MongoDB [CD13],
CouchDB [ALS10], Couchbase [Lak+16], and Espresso [Qia+13] to reduce the impedance
mismatch [Mai90].
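To make the footnote concrete, here is a minimal sketch of the nested object/array structure that lets JSON map directly onto document databases and client-side values; the field names are invented for illustration:

```python
import json

# A nested JSON document as a document store such as MongoDB might hold it:
# objects (key-value pairs) and arrays (ordered lists), arbitrarily nested.
product = {
    "name": "Example Product",
    "price": 19.99,
    "tags": ["new", "sale"],                      # array of values
    "vendor": {"city": "Hamburg", "rating": 4.5}  # nested object
}

# Serializing and parsing round-trips the structure without any schema
# mapping layer, which is what reduces the impedance mismatch.
encoded = json.dumps(product)
decoded = json.loads(encoded)
assert decoded == product
print(decoded["vendor"]["city"])  # → Hamburg
```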
2.2.1 Request Flow
The request flow in two-tier web application architectures is slightly different from
three-tier architectures:
1. With the initial request, the client retrieves the HTML document containing the
single-page application logic.
2. The server or cloud service returns the HTML document and the accompanying
JavaScript files. In contrast to server-rendered architectures, the frontend’s
structure is data-independent and therefore does not require any database
queries or business logic.
3. The client evaluates the HTML and fetches any referenced files, in particular, the
JavaScript containing the presentation logic.
4. Via JavaScript, the data required to display the current application view are
fetched from the server via a REST/HTTP³ API either in individual read
operations or using a query language (e.g., MongoDB [CD13] or GraphQL
[Gra]).
5. The frontend renders the data using the presentation logic of the JavaScript
frontend, typically expressed through a template language.
6. User interactions are sent as individual requests and encode the exact operation
performed. The response returns the data necessary to update the frontend
accordingly.
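The data-fetching steps (4–6) above can be sketched as follows. The endpoint, the JSON shape, and the template are invented for illustration, and the network call is replaced by a stub so the flow stays self-contained:

```python
from string import Template

# Step 4: fetch the data for the current view from a REST/HTTP data API.
# A real client would issue e.g. GET /db/products/42; this stub returns
# the JSON the server would send (shape is hypothetical).
def fetch_product(product_id):
    return {"id": product_id, "name": "Example Product", "price": 19.99}

# Step 5: render the fetched data client-side with a template language.
view_template = Template("<h1>$name</h1><span>$price €</span>")

def render_view(product_id):
    data = fetch_product(product_id)
    return view_template.substitute(name=data["name"], price=data["price"])

# Step 6 would send each user interaction as an individual request encoding
# the exact operation; only the data needed to update the view comes back.
print(render_view(42))  # → <h1>Example Product</h1><span>19.99 €</span>
```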
2.2.2 Implementation
The technology choices for three-tier architectures also apply to the realization of
two-tier architectures. IaaS and PaaS offer low-level abstractions for building REST
APIs consumed by single-page applications. Most web application frameworks
have support for developing not only server-rendered HTML views, but also
for structuring REST APIs. In the Java ecosystem, REST interfaces have been
standardized [HS07]. In most other web languages such as (server-side) JavaScript
(Node.js), Ruby, Python, and PHP, frameworks employ domain-specific languages
or method annotations for minimizing the overhead of defining REST endpoints
(e.g., in Ruby on Rails, Django, .NET WCF, Grails, Express, and the Play
framework [WP11, The]). Static files of single-page applications are delivered from
a web server, the web application framework, or a content delivery network. The
REST APIs are consumed by the frontend that is technologically independent of
the backend and only requires knowledge about the REST resources to implement
client-server interactions. One notable exception is the idea of isomorphic (also
³ Besides HTTP, real-time-capable protocols like WebSockets, Server-Sent Events (SSE), or WebRTC can be employed [Gri13].
called universal) JavaScript that applies the concept of sharing code (e.g., validation
of user inputs) between a frontend and backend that are both implemented in
JavaScript [HS16, Dep, Hoo, Par].
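The method annotations and domain-specific languages mentioned above can be emulated in a few lines. The following sketch (routes and handlers are invented) shows the core idea shared by frameworks such as Express or Rails: a decorator maps an HTTP verb and path onto a handler function in a routing table:

```python
# Minimal decorator-based routing table, in the spirit of the REST DSLs
# mentioned above; the routes are illustrative only.
routes = {}

def route(method, path):
    def register(handler):
        routes[(method, path)] = handler
        return handler
    return register

@route("GET", "/products/<id>")
def get_product(id):
    return {"id": id, "name": "Example Product"}

@route("POST", "/orders")
def create_order(payload):
    # Protected business logic (validation, access control) would run here.
    return {"status": "created", "items": payload["items"]}

# Dispatching a request means looking up the (method, path) pair.
handler = routes[("GET", "/products/<id>")]
print(handler(42))  # → {'id': 42, 'name': 'Example Product'}
```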
Database-as-a-Service (DBaaS) and Backend-as-a-Service (BaaS) models provide high-level abstractions for building and hosting two-tier applications. In the
case of a DBaaS, the data tier is directly exposed to clients. As this is insufficient if
protected business logic or access control are required, BaaS systems extend the data
APIs with common building blocks for business logic in single-page applications.
Typical BaaS APIs and functionalities consumed in two-tier applications are:
• Delivery of static files, in particular, the single-page application assets
• DBaaS APIs for access to structured data
• Login and registration of users
• Authorization on protected data
• Execution of server-side business logic and invocation of third-party services
• Sending of push notifications
• Logging and tracking of user data
In Sect. 4.5 we will discuss the characteristics of the DBaaS and BaaS models in
detail.
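The building blocks listed above typically surface to the client as one unified API. The following is a minimal, purely in-memory sketch (class and method names are invented, no real BaaS is modeled) of how a two-tier frontend might consume user management, authorization, and data access from such a service:

```python
# Hypothetical in-memory BaaS stub: user registration and login,
# authorization on protected data, and DBaaS-style object access.
class BackendStub:
    def __init__(self):
        self.users = {}      # username -> password
        self.objects = {}    # object id -> (owner, data)

    def register(self, user, password):
        self.users[user] = password

    def login(self, user, password):
        return self.users.get(user) == password

    def save(self, user, obj_id, data):
        self.objects[obj_id] = (user, data)

    def load(self, user, obj_id):
        owner, data = self.objects[obj_id]
        if owner != user:                      # authorization on protected data
            raise PermissionError("access denied")
        return data

baas = BackendStub()
baas.register("alice", "secret")
assert baas.login("alice", "secret")
baas.save("alice", "todo:1", {"title": "write book"})
print(baas.load("alice", "todo:1"))  # → {'title': 'write book'}
```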
As the frontend becomes more complex and handles the presentation logic
and significant parts of the business logic, appropriate tooling and architectures
have gained relevance. Therefore, numerous JavaScript frameworks for developing and
structuring single-page applications have been developed. A large part of these
frameworks is based on the Model-View-Controller (MVC) pattern [KP+88] or
variants thereof (e.g., Model-View-ViewModel [Gos05]). In client-side MVC architectures, the views generate the document visible to the end user, usually by defining
a template language. The model contains the data displayed in the views, so that it
embodies both application state and user interface state. A model is filled with data
retrieved from the server’s data APIs. Controllers handle the interaction between
views and models (e.g., events from user inputs) and are responsible for client-server communication. The MVC pattern has been adopted by most widely-used
JavaScript frameworks such as Angular [Ang], Ember [Emb], Vue [Vue], and
Backbone [Bac]. Recently, component-based architectures have been proposed as
an alternative to MVC frameworks through projects such as Facebook’s React [Rea].
Components represent views, but also encompass event handling and user interface
state.
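The division of labor described above can be reduced to a few lines. This is a deliberately minimal client-side MVC sketch (class and field names are invented, rendering is plain string markup):

```python
# Minimal client-side MVC sketch: the model holds application and UI state,
# the view turns model data into markup, and the controller mediates events;
# in a real framework it would also handle client-server communication.
class Model:
    def __init__(self):
        self.items = []    # application state, e.g., fetched from a data API

class View:
    def render(self, model):
        return "<ul>" + "".join(f"<li>{i}</li>" for i in model.items) + "</ul>"

class Controller:
    def __init__(self, model, view):
        self.model, self.view = model, view

    def on_add_item(self, item):   # event from a user input
        self.model.items.append(item)
        return self.view.render(self.model)

controller = Controller(Model(), View())
print(controller.on_add_item("first entry"))  # → <ul><li>first entry</li></ul>
```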
In contrast to three-tier applications, any technological decisions made in the frontend are largely independent of the backend, as a REST API is the only point of
coupling. Some frontend frameworks additionally offer server-side tooling to prerender client views. This can improve the performance of the initial page load and is
necessary for crawlers of search engines that do not evaluate JavaScript for indexing.
In native mobile applications, the same principles as for single-page applications
apply. A major architectural difference is that the frontend is compiled ahead-of-time so that its business and presentation logic can only be changed with an explicit
update of the app. Furthermore, static files are usually not provided by the backend,
but packaged into an installable app bundle, which shifts the problem of initial load
times to both client-side performance and latency of the consumed server APIs.
2.2.3 Problems of Client-Rendered Architectures
Two-tier architectures can improve on several of the difficulties imposed by three-tier architectures, while other non-functional requirements remain challenging:
High Availability and Elastic Scalability. The task of providing high availability with elastic scaling is shifted to the BaaS or DBaaS backend. As these systems
employ a standard architecture shared between all applications built on them,
availability and scalability can be tackled in a generic, application-independent
fashion. As a DBaaS/BaaS is a managed service, it can furthermore eliminate
availability and scalability problems introduced by operational errors such as
flawed deployments, inappropriate autoscaling rules, or incompatible versions.
Fast Page Loads. Navigation inside a single-page application is fast, as only
the missing data required for the next view is fetched, instead of reloading the
complete page. On the other hand, data requests become very latency-critical, as
the initial page load depends on data being available for client-side rendering. In
two-tier applications, the client can start its processing earlier as there is no initial
HTML request blocked in the server by database queries and business logic.
Engaging User Experience. Single-page applications are able to achieve a high
degree of interactivity, as much of the business logic can be directly executed on
the client. This allows applying updates immediately to remain under the critical
threshold of 100 ms for interaction delays.
Fast Time-to-Market. As the frontend and backend are loosely coupled through
a REST API and based on different technology stacks, the development process
is accelerated. The implementation of the frontend and backend can proceed
independently, enabling individual development, testing and deployment cycles
for a faster time-to-market.
In summary, many applications are moving towards client-rendered, two-tier
architectures, to improve the user experience and development process. This shift
reinforces the requirement for low latency, as data transferred from the server to the
client is critical for fast navigation and initial page loads.
2.3 Latency and Round-Trip Time
Two primary factors influence network performance: latency and bandwidth
[Gri13]. Latency refers to the time that passes from the moment a packet or signal
is sent from a source to the moment it is received by the destination. Bandwidth
refers to the throughput of data transfer for a network link. We will use the widespread term bandwidth (measured in Megabit per second; MBit/s) throughout this
book, though the formal term data rate (or transmission rate) is more precise,
as bandwidth in signal theory defines the difference between an upper and lower
frequency [Cha15].
Network packets sent from one host to another host travel through several routers
and are nested in different network protocols (e.g., Ethernet, IP, TCP, TLS, HTTP).
There are different delays at each hop that add up to the end-to-end latency [KR10]:
Processing Delay (d_proc). The time for parsing the protocol header information,
determining the destination of a packet and calculating checksums determines
the processing delay. In modern networks, the processing delay is in the order of
microseconds [Cha15].
Queuing Delay (d_queue). Before a packet is sent over a physical network link, it
is added to a queue. Thus, the number of packets that arrived earlier defines for
how long a packet will be queued before transmission over the link. If queues
overflow, packets are dropped. This packet loss leads to increased latency as the
network protocols have to detect the loss and resend the packet.⁴
Transmission Delay (d_trans). The transmission delay denotes the time for completely submitting a packet to the network link. Given the size of a packet S and the link's bandwidth, resp. transmission rate R, the transmission delay is S/R. For example, to transfer a packet with S = 1500 B over a Gigabit Ethernet with R = 1 Gb/s, a transmission delay of d_trans = 12 µs is incurred.
Propagation Delay (d_prop). The physical medium of the network link, e.g., fiber
optics or copper wires, defines how long it takes to transfer the signal encoding
the packet to the next hop. Given the propagation speed of the medium in m/s
and the distance between two hops, the propagation delay can be calculated.
If a packet has to pass through N − 1 routers between the sender and receiver,
the end-to-end latency L is defined through the average processing, queuing,
transmission and propagation delays [KR10]:
L = N · (d_proc + d_queue + d_trans + d_prop)    (2.1)
Latency (also called one-way latency) is unidirectional, as it does not include the
time for a packet to travel back. Round-trip time (RTT) on the other hand, measures
the time from the source sending a request until receiving a response. RTT therefore
includes the latency in each direction and the processing time d_server required for generating a response:
RTT = 2 · L + d_server    (2.2)

⁴ Large buffer sizes can also lead to a problem called buffer bloat, in which queues are always operating at their maximum capacity. This is often caused by TCP congestion algorithms that increase throughput until packet loss occurs. With large queues, many packets can be buffered and delayed before a packet loss occurs, which negatively impacts latency [APB09, Gri13].
In most cases, the propagation delay will play the key role in latency, as
networking infrastructure has improved many aspects of queuing, transmission,
and processing delay significantly. However, propagation delay depends on the
constant speed of light and the geographic distance between two hosts. For example,
the linear distance between Hamburg and San Francisco is 8879 km. Given an
ideal network without any delays except the propagation at the speed of light
(299,792,458 m/s), the minimum achievable latency is L ≈ 29.62 ms and the round-trip time RTT ≈ 59.23 ms. Therefore, to reduce end-to-end latency, distances have
to be shortened.
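The numbers above follow directly from Eqs. 2.1 and 2.2. As a sketch, the following plugs the 1500 B Gigabit Ethernet packet and the Hamburg–San Francisco distance into the formulas (an idealized single-hop network with propagation delay only):

```python
# End-to-end latency per Eq. 2.1: L = N * (d_proc + d_queue + d_trans + d_prop).
def latency(n_hops, d_proc, d_queue, d_trans, d_prop):
    return n_hops * (d_proc + d_queue + d_trans + d_prop)

def rtt(l, d_server):          # Eq. 2.2: RTT = 2 * L + d_server
    return 2 * l + d_server

C = 299_792_458                # speed of light in m/s

# Transmission delay for a 1500 B packet on Gigabit Ethernet: S/R = 12 µs.
d_trans = (1500 * 8) / 1e9
assert round(d_trans * 1e6) == 12

# Ideal network: only propagation over the 8879 km linear distance between
# Hamburg and San Francisco, treated as a single logical hop.
L = latency(1, 0, 0, 0, 8_879_000 / C)
print(round(L * 1000, 2), round(rtt(L, 0) * 1000, 2))  # → 29.62 59.23
```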
We will discuss the effects of network protocols such as HTTP and TLS on end-to-end latency in Chap. 3. Grigorik [Gri13] gives an in-depth overview of latency
and network protocols specifically relevant for the web. Kurose and Ross [KR10]
as well as Forouzan [For12] discuss the foundations of computer networking. Van
Mieghem [VM14] provides a formal treatment of how networks can be modeled,
analyzed and simulated stochastically.
2.4 Cloud Computing as a Source of Latency
Besides the two-tier and three-tier architecture, there are numerous other ways to
structure applications [Feh+14]. Cloud computing is quickly becoming the major
backbone of novel technologies across application fields such as web and mobile
applications, Internet of Things (IoT), smart cities, virtual and augmented reality,
gaming, streaming, data science, and Big Data analytics. Cloud computing delivers
on the idea of utility computing introduced by John McCarthy in 1961 that
suggests that computing should be a ubiquitous utility similar to electricity and
telecommunications [AG17]. In the context of cloud computing, there are several
sources of latency across all types of application architectures. In this section, we
will summarize the architecture-independent latency bottlenecks that contribute to
the overall performance of cloud-based applications.
In the literature, cloud computing has been defined in various different ways
[LS13, YBDS08, MB16, Feh+14, Buy+09, MG09, TCB14]. Throughout this book,
we will use the widely accepted NIST definition [MG09]. It distinguishes between
five characteristics of cloud offerings and groups them into three service models
and four deployment models. The nature of the service and deployment models
motivates why latency is of utmost relevance in cloud computing.
2.4.1 Characteristics
The characteristics of cloud offerings explain how cloud computing is desirable for
both customers and providers. Providers offer on-demand self-service, which means
that consumers can provision services and resources in a fully automated process.
Broad network access enables the cloud services to be consumed by any client
technology that has Internet access. Cloud providers apply resource pooling (multitenancy) to share storage, networking, and processing resources across tenants
to leverage economies of scale for reduced costs. Rapid elasticity demands that
resources can be freed and allocated with minimal delay, building the foundation
for scalability. The provider exposes a measured service that is used for pay-per-use pricing models with fine-grained control, monitoring and reporting of resource
usage to the consumer. In practice, the major reason for companies to adopt cloud computing is the ability to convert capital expenditures (CAPEX) that would have been necessary to acquire hardware and software into operational expenditures (OPEX) incurred by the usage of pay-per-use cloud services. The major incentive
for providers is the ability to exploit economies of scale and accommodate new
business models.
2.4.2 Service Models
Based on an increasing degree of abstraction, three high-level service models can be
distinguished:
Infrastructure-as-a-Service (IaaS). In an IaaS cloud, low-level resources such
as computing (e.g., containers [Mer14] and virtual machines [Bar+03]), networking (e.g., subnets, load balancers, and firewalls [GJP11]) and storage
(e.g., network-attached storage) can be provisioned. This allows deploying
arbitrary applications in the cloud while leaving control of the infrastructure
to the IaaS provider. In IaaS clouds, latency is particularly relevant for cross-node communication, potentially across different data centers (e.g., between an
application server and a replicated database). Example offerings are Amazon
Elastic Compute Cloud (EC2) [Amab], Softlayer [Sof], Joyent [Joy], and Google
Compute Engine (GCE) [Gooa].
Platform-as-a-Service (PaaS). Consumers of PaaS clouds run applications on a
technology stack of services, programming languages, and application platforms
defined by the provider, including explicit support for developing, testing, deploying, and hosting the application. In addition to the infrastructure, a PaaS provider
also manages operating systems and networks. The role of latency in a PaaS is
critical: as there is no control over native computing and storage resources, data
management has to be consumed as a service either from the same provider or an
external DBaaS. Examples of PaaS vendors are Microsoft Azure [Azu], Amazon
Beanstalk [Aws], IBM Bluemix [Ibm], Google App Engine [App], and Heroku
[Clob].
Software-as-a-Service (SaaS). A SaaS provides a specific cloud-hosted application to users (e.g., email, word processors, spreadsheets, customer relationship
management, games, virtual desktops). The provider completely abstracts from
the cloud infrastructure and only allows customization and configuration of
the application. Almost all SaaS offerings are consumed as web applications
via HTTP, so that client-server latency is crucial for both initial loads and
performance of interactions. Examples include Microsoft Office 365 [Off],
Salesforce [Onl], and Slack [Sla].
Besides the above three models, other “XaaS” (Everything-as-a-Service) models
have been proposed, for example, Storage-as-a-Service, Humans-as-a-Service, and
Function-as-a-Service amongst many others [KLAR10, Din+13, TCB14, MB16,
Has+15]. Database-as-a-Service (DBaaS) and Backend-as-a-Service (BaaS) as
discussed in Sect. 4.5 cut across the three canonical levels of IaaS, PaaS, and SaaS
and can be employed in each of the models.
2.4.3 Deployment Models
Deployment models describe different options for delivering and hosting cloud
platforms.
Public Cloud. A public cloud is operated by a business, academic, or government
organization on its infrastructure and can be used by the general public.
Commercial cloud offerings such as Amazon EC2, Google App Engine, and
Salesforce fall in this category. In public clouds, latency to users and third-party
services is critical for performance.
Private Cloud. A private cloud provides exclusive use for one organization and
is hosted on-premises of the consumer. This implies that the hardware resources
are mostly static and in order to gain elasticity, public cloud resources may be
added on demand, e.g., during load spikes (cloud bursting [Guo+12]). Besides
commercial solutions such as VMWare vCloud [Cloc], various open-source
platforms for private PaaS and IaaS clouds have been developed, including
OpenStack [BWA13], Eucalyptus [Nur+09], and Cloud Foundry [Cloa]. As
private clouds usually cannot exploit a globally distributed set of data centers,
tackling wide-area latency to end users is a key challenge.
Hybrid Cloud. In a hybrid cloud (also called multi-cloud deployment), two or
more clouds are composed to combine their benefits. There are frameworks for
addressing multiple clouds through a common API, e.g., jclouds [Apaa] and
libCloud [Apab] as well as commercial providers for multi-cloud deployments,
scaling and bursting such as RightScale [Rig], Scalr [Sca], and Skytap [Sky]. Any
communication between different cloud platforms is highly latency-sensitive.
When offloading critical components like data storage to a different cloud,
incurred latency can be prohibitive and outweigh the advantages. On the other
hand, if data management makes use of the broader geographic reach of multiple
providers through caching or replication [WM13], latency can be reduced
substantially as we will show in the next chapters.
The NIST definition [MG09] also defines a community cloud as a cloud shared between organizations with common concerns. Though the model is not in common
use, the same latency challenges apply: composed backends and remote users are
subject to latency bottlenecks.
2.4.4 Latency in Cloud Architectures
In cloud-based applications, latency stems from various sources introduced by the
composition of different service and deployment models. We group these latency sources into three categories:
1. Round-trip times within a data center network or LAN are usually in the order
of single-digit milliseconds.
2. Latencies between two co-located data centers are in the order of 10 ms.
3. For hosts from two different geographical locations, latency often reaches 100 ms
and more.
Figure 2.3 illustrates the typical latency contributions of several communication
links within a distributed web application. In the example, the client is separated
from the backend by a high-latency wide area network (WAN) link. The application’s business logic is hosted on an IaaS platform and distributed across multiple
servers interconnected via local networks. The data tier consists of a database
service replicated across different availability zones. For a synchronously replicated
database system, the latency between two data centers therefore defines the response
time for database updates (for example in the Amazon RDS database service
[Ver+17]).
Most complex applications integrate heterogeneous services for different functions of the application. For example, an external DBaaS might be consumed from
the main application over a high-latency network, since it is shared between two
different applications or provides a level of scalability that a database hosted on
an IaaS could not provide. Parts of the application might also be developed with
a service model that fits the requirements better, for example by offloading user
authentication to microservices running on a PaaS. A BaaS could be integrated to
handle standard functions such as push notifications. High latency also occurs if
third-party services are integrated, for example, a social network in the frontend
or a SaaS for payments in the backend. Overall, the more providers and services
are involved in the application architecture, the higher the dependency on low
latency for performance. As almost all interactions between services revolve around
exchanging and loading data, the techniques proposed in this book apply to the
latency problems in the example.
For further details on cloud models, please refer to Murugesan and Bojanova
[MB16], who provide a detailed overview of cloud computing and its foundational
concepts and technologies. Bahga and Madisetti [BM13] review the programming
models and APIs of different cloud platforms.
Fig. 2.3 Potential sources of latency in distributed, cloud-based applications
In the following chapters, we will provide detailed background on network
performance and the state of the art in data management to highlight the different
opportunities for tackling latency across the application stack.
References
[AG17] Nick Antonopoulos and Lee Gillam, eds. Cloud Computing: Principles, Systems and
Applications (Computer Communications and Networks). 2nd ed. 2017. Springer, July
2017. ISBN: 9783319546445. URL: http://amazon.com/o/ASIN/3319546449/.
[Alo+04] Gustavo Alonso et al. “Web services”. In: Web Services. Springer, 2004, pp. 123–149.
[ALS10] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB - The Definitive Guide:
Time to Relax. O’Reilly, 2010. ISBN: 978-0-596-15589-6. URL: http://www.oreilly.de/
catalog/9780596155896/index.html.
[Amab] Amazon Web Services AWS – Server Hosting & Cloud Services. https://aws.amazon.
com/de/. (Accessed on 05/20/2017). 2017.
[Ang] Angular Framework. https://angular.io/. (Accessed on 05/26/2017). 2017.
[Apaa] Apache jclouds. https://jclouds.apache.org/. (Accessed on 06/05/2017). 2017.
[Apab] Apache Libcloud. http://libcloud.apache.org/index.html. (Accessed on 06/05/2017).
2017.
[APB09] Mark Allman, Vern Paxson, and Ethan Blanton. TCP congestion control. Tech. rep.
2009.
[App] App Engine (Google Cloud Platform). https://cloud.google.com/appengine/.
(Accessed on 05/20/2017). 2017.
[Arc] HTTP Archive. http://httparchive.org/trends.php. Accessed: 2018-07-14. 2018.
[Aws] AWS Elastic Beanstalk - PaaS Application Management. https://aws.amazon.com/de/
elasticbeanstalk/. (Accessed on 05/20/2017). 2017.
[Azu] Microsoft Azure: Cloud Computing Platform & Services. https://azure.microsoft.com/
en-us/. (Accessed on 05/20/2017). 2017.
[Bac] Backbone.js. http://backbonejs.org/. (Accessed on 05/26/2017). 2017.
[Bar+03] P. Barham et al. “Xen and the art of virtualization”. In: ACM SIGOPS Operating
Systems Review. Vol. 37. 2003, pp. 164–177. URL: http://dl.acm.org/citation.cfm?id=
945462%7C (visited on 10/09/2012).
[Bec00] Kent Beck. Extreme programming explained: embrace change. Addison-Wesley Professional, 2000.
[BM13] Arshdeep Bahga and Vijay Madisetti. Cloud Computing: A Hands-on Approach.
CreateSpace Independent Publishing Platform, 2013.
[Buy+09] R. Buyya et al. “Cloud computing and emerging IT platforms: Vision, hype, and
reality for delivering computing as the 5th utility”. In: Future Generation computer
systems 25.6 (2009), pp. 599–616. URL: http://www.sciencedirect.com/science/article/
pii/S0167739X08001957 (visited on 06/29/2012).
[BWA13] Meenakshi Bist, Manoj Wariya, and Amit Agarwal. “Comparing delta, open stack
and Xen Cloud Platforms: A survey on open source IaaS”. In: Advance Computing
Conference (IACC), 2013 IEEE 3rd International. IEEE. 2013, pp. 96–100.
[Cal+11] Brad Calder et al. “Windows Azure Storage: a highly available cloud storage service
with strong consistency”. In: Proceedings of the Twenty-Third ACM Symposium on
Operating Systems Principles. ACM. ACM, 2011, pp. 143–157. URL: http://dl.acm.
org/citation.cfm?id=2043571 (visited on 04/16/2014).
[CD13] Kristina Chodorow and Michael Dirolf. MongoDB - The Definitive Guide.
O’Reilly, 2013. ISBN: 978-1-449-38156-1. URL: http://www.oreilly.de/catalog/
9781449381561/index.html.
[Cha15] Lee Chao. Cloud Computing Networking: Theory, Practice, and Development.
Auerbach Publications, 2015. URL: https://www.amazon.com/Cloud-ComputingNetworking-Practice-Development-ebook/dp/B015PNEOGC?SubscriptionId=
0JYN1NVW651KCA56C102&tag=techkie-20&linkCode=xm2&camp=2025&
creative=165953&creativeASIN=B015PNEOGC.
[Cloa] Cloud Application Platform - Devops Platform | Cloud Foundry. https://www.
cloudfoundry.org/. (Accessed on 06/05/2017). 2017.
[Clob] Cloud Application Platform | Heroku. https://www.heroku.com/. (Accessed on
05/20/2017). 2017.
[Cloc] vCloud Suite, vSphere-Based Private Cloud: VMware. http://www.vmware.com/
products/vcloud-suite.html. (Accessed on 06/05/2017). 2017.
[Cro06] Douglas Crockford. “JSON: Javascript object notation”. In: URL http://www.json.org
(2006).
[Dep] Deployd: a toolkit for building realtime APIs. https://github.com/deployd/deployd.
(Accessed on 05/20/2017). 2017. URL: https://github.com/deployd/deployd (visited
on 02/19/2017).
[Din+13] Hoang T Dinh et al. “A survey of mobile cloud computing: architecture, applications,
and approaches”. In: Wireless communications and mobile computing 13.18 (2013),
pp. 1587–1611.
[Emb] Ember.js Framework. https://www.emberjs.com/. (Accessed on 05/26/2017). 2017.
[Feh+14] Christoph Fehling et al. Cloud Computing Patterns - Fundamentals to Design, Build,
and Manage Cloud Applications. Springer, 2014. ISBN: 978-3-7091-1567-1. DOI:
10.1007/978-3-7091-1568-8.
[For12] A Behrouz Forouzan. Data communications & networking. Tata McGraw-Hill Education, 2012.
[GJP11] K. Gilly, C. Juiz, and R. Puigjaner. “An up-to-date survey in web load balancing”. In:
World Wide Web 14.2 (2011), pp. 105–131. URL: http://www.springerlink.com/index/
P1080033328U8158.pdf (visited on 09/12/2012).
[Gooa] Google Cloud Computing, Hosting Services & APIs – Google Cloud Platform. https://
cloud.google.com/. (Accessed on 05/20/2017). 2017.
[Gos05] John Gossmann. Introduction to Model/View/ViewModel pattern for building
WPF apps. https://blogs.msdn.microsoft.com/johngossman/2005/10/08/introductionto-modelviewviewmodel-pattern-for-building-wpf-apps/. (Accessed on 05/26/2017).
Aug. 2005.
[Gra] GraphQL. https://facebook.github.io/graphql/. (Accessed on 05/25/2017). 2017.
[Gri13] Ilya Grigorik. High performance browser networking. English. [S.l.]: O’Reilly Media,
2013. ISBN: 1-4493-4476-3 978-1-4493-4476-4. URL: https://books.google.de/books?
id=tf-AAAAQBAJ.
[Guo+12] Tian Guo et al. “Seagull: Intelligent Cloud Bursting for Enterprise Applications”. In:
2012 USENIX Annual Technical Conference, Boston, MA, USA, June 13–15, 2012.
Ed. by Gernot Heiser and Wilson C. Hsieh. USENIX Association, 2012, pp. 361–366.
URL : https://www.usenix.org/conference/atc12/technical-sessions/presentation/guo.
[Has+15] Ibrahim Abaker Targio Hashem et al. “The rise of “big data” on cloud computing:
Review and open research issues”. In: Inf. Syst. 47 (2015), pp. 98–115. DOI:
10.1016/j.is.2014.07.006.
[HDF13] Kai Hwang, Jack Dongarra, and Geoffrey C Fox. Distributed and cloud computing:
from parallel processing to the internet of things. Morgan Kaufmann, 2013.
[Hil16] Tony Hillerson. Seven Mobile Apps in Seven Weeks: Native Apps, Multiple
Platforms. Pragmatic Bookshelf, 2016. URL: https://www.amazon.com/SevenMobile-Apps-Weeks-Platforms-ebook/dp/B01L9W8AQS?SubscriptionId=
0JYN1NVW651KCA56C102&tag=techkie-20&linkCode=xm2&camp=2025&
creative=165953&creativeASIN=B01L9W8AQS.
[Hoo] GitHub - hoodiehq/hoodie: A backend for Offline First applications. https://github.
com/hoodiehq/hoodie. (Accessed on 05/25/2017). 2017. URL: https://github.com/
hoodiehq/hoodie (visited on 02/17/2017).
[HS07] Marc Hadley and P Sandoz. “JSR 311: Java api for RESTful web services”. In:
Technical report, Java Community Process (2007).
[HS16] Stephan Hochhaus and Manuel Schoebel. Meteor in action. Manning Publ., 2016.
[HW03] Gregor Hohpe and Bobby Woolf. “Enterprise Integration Pattern”. In: Addison-Wesley
Signature Series (2003).
[Ibm] IBM Bluemix – Cloud-Infrastruktur, Plattformservices, Watson, & weitere
PaaS-Lösungen. https://www.ibm.com/cloud-computing/bluemix. (Accessed on
05/20/2017). 2017.
[Joy] Joyent | Triton. https://www.joyent.com/. (Accessed on 06/05/2017). 2017.
[Kam17] Poul-Henning Kamp. Varnish HTTP Cache. https://varnish-cache.org/. (Accessed on 04/30/2017). 2017.
[KLAR10] Heba Kurdi, Maozhen Li, and HS Al-Raweshidy. “Taxonomy of Grid Systems”.
In: Handbook of research on P2P and grid systems for service-oriented computing:
Models, Methodologies and Applications. IGI Global, 2010, pp. 20–43.
[Kle17] Martin Kleppmann. Designing Data-Intensive Applications. 1st edition. O’Reilly Media, Jan. 2017. ISBN: 978-1-4493-7332-0.
[KP+88] Glenn E. Krasner, Stephen T. Pope, et al. “A description of the model-view-controller user interface paradigm in the Smalltalk-80 system”. In: Journal of Object Oriented Programming 1.3 (1988), pp. 26–49.
[KR10] James F Kurose and Keith W Ross. Computer networking: a top-down approach. Vol.
5. Addison-Wesley Reading, 2010.
[Kri15] Michael Krigsman. Research: 25 percent of web projects fail. http://www.zdnet.com/article/research-25-percent-of-web-projects-fail/. (Accessed on 04/30/2017). 2015.
[Lak+16] Sarath Lakshman et al. “Nitro: A Fast, Scalable In-Memory Storage Engine for
NoSQL Global Secondary Index”. In: PVLDB 9.13 (2016), pp. 1413–1424. URL:
http://www.vldb.org/pvldb/vol9/p1413-lakshman.pdf.
[LS13] Wolfgang Lehner and Kai-Uwe Sattler. Web-Scale Data Management for the Cloud. New York: Springer, Apr. 2013. ISBN: 978-1-4614-6855-4.
[Mai90] David Maier. “Representing database programs as objects”. In: Advances in database
programming languages. ACM. 1990, pp. 377–386.
[MB16] San Murugesan and Irena Bojanova. Encyclopedia of Cloud Computing. John Wiley
& Sons, 2016.
[Mer14] Dirk Merkel. “Docker: lightweight linux containers for consistent development and
deployment”. In: Linux Journal 2014.239 (2014), p. 2.
[MG09] Peter Mell and Tim Grance. “The NIST definition of cloud computing”. In: National
Institute of Standards and Technology 53.6 (2009), p. 50.
[Mil68] Robert B Miller. “Response time in man-computer conversational transactions”. In:
Proceedings of the December 9–11, 1968, fall joint computer conference, part I. ACM.
1968, pp. 267–277.
[MP14] M Mikowski and J Powell. Single Page Applications. 2014.
[Mye85] Brad A. Myers. “The importance of percent-done progress indicators for computer-human interfaces”. In: ACM SIGCHI Bulletin. Vol. 16. 4. ACM. 1985, pp. 11–17.
[Nad+16] Irakli Nadareishvili et al. Microservice Architecture: Aligning Principles, Practices,
and Culture. “O’Reilly Media, Inc.”, 2016.
[New15] Sam Newman. Building Microservices: Designing Fine-Grained Systems. 1st edition. O’Reilly, 2015. ISBN: 9781491950357.
[Nie94] Jakob Nielsen. Usability engineering. Elsevier, 1994.
[Nur+09] Daniel Nurmi et al. “The eucalyptus open-source cloud-computing system”. In: Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing
and the Grid. IEEE Computer Society. 2009, pp. 124–131.
[Off] Office 365 for Business. https://products.office.com/en-us/business/office. (Accessed on 06/05/2017). 2017.
[Onl] Salesforce Online CRM. https://www.salesforce.com/en. (Accessed on 06/05/2017).
2017.
[Par] Parse Server. http://parseplatform.github.io/docs/parse-server/guide/. (Accessed on 07/28/2017). 2017.
[Qia+13] Lin Qiao et al. “On brewing fresh espresso: LinkedIn’s distributed data serving
platform”. In: Proceedings of the 2013 international conference on Management of
data. ACM, 2013, pp. 1135–1146. URL: http://dl.acm.org/citation.cfm?id=2465298
(visited on 09/28/2014).
[Rea] React - A JavaScript library for building user interfaces. https://facebook.github.io/
react/. (Accessed on 05/26/2017). 2017.
[Ree08] Will Reese. “Nginx: the high-performance web server and reverse proxy”. In: Linux
Journal 2008.173 (2008), p. 2.
[Rig] RightScale Cloud Management. http://www.rightscale.com/. (Accessed on
06/05/2017). 2017.
[SB02] Ken Schwaber and Mike Beedle. Agile software development with Scrum. Vol. 1.
Prentice Hall Upper Saddle River, 2002.
[Sca] Scalr: Enterprise-Grade Cloud Management Platform. https://www.scalr.com/.
(Accessed on 06/05/2017). 2017.
[Sky] Skytap. https://www.skytap.com/. (Accessed on 06/05/2017). 2017.
[Sla] Slack. https://slack.com/. (Accessed on 06/05/2017). 2017.
[Sof] SoftLayer | Cloud Servers, Storage, Big Data, & More IAAS Solutions. http://www.
softlayer.com/. (Accessed on 06/05/2017). 2017.
[TCB14] Adel Nadjaran Toosi, Rodrigo N Calheiros, and Rajkumar Buyya. “Interconnected
cloud computing environments: Challenges, taxonomy, and survey”. In: ACM Computing Surveys (CSUR) 47.1 (2014), p. 7.
[The] Django Web Framework. https://www.djangoproject.com/. (Accessed on 05/20/2017).
2017.
[TS07] Andrew S. Tanenbaum and Maarten van Steen. Distributed systems - principles and
paradigms, 2nd Edition. Pearson Education, 2007. ISBN: 978-0-13-239227-3.
[Ver+17] Alexandre Verbitski et al. “Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases”. In: Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL,
USA, May 14–19, 2017. Ed. by Semih Salihoglu et al. ACM, 2017, pp. 1041–1052.
DOI: 10.1145/3035918.3056101.
[VM14] Piet Van Mieghem. Performance Analysis of Complex Networks and Systems. Cambridge University Press, 2014.
[Vue] Vue.js. https://vuejs.org/. (Accessed on 05/26/2017). 2017.
[Wal14] Craig Walls. Spring in Action: Covers Spring 4. Manning Publications, 2014. ISBN: 161729120X.
[WGW+20] Wolfram Wingerath, Felix Gessert, Erik Witt, et al. “Speed Kit: A Polyglot & GDPR-Compliant Approach For Caching Personalized Content”. In: 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, Texas, April 20–24, 2020. 2020.
[Wil] WildFly Homepage. http://wildfly.org/. (Accessed on 05/20/2017). 2017.
[WJW15] Da Wang, Gauri Joshi, and Gregory Wornell. “Using straggler replication to reduce
latency in large-scale parallel computing”. In: ACM SIGMETRICS Performance
Evaluation Review 43.3 (2015), pp. 7–11.
[WM13] Zhe Wu and Harsha V. Madhyastha. “Understanding the latency benefits of multi-cloud webservice deployments”. In: Computer Communication Review 43.2 (2013), pp. 13–20. DOI: 10.1145/2479957.2479960.
[WP11] Erik Wilde and Cesare Pautasso. REST: from research to practice. Springer Science
& Business Media, 2011.
[YBDS08] Lamia Youseff, Maria Butrico, and Dilma Da Silva. “Toward a unified ontology
of cloud computing”. In: Grid Computing Environments Workshop, 2008. GCE’08.
IEEE. 2008, pp. 1–10.
Chapter 3
HTTP for Globally Distributed
Applications
For any distributed application, the network plays a significant role in performance. On the web, the central protocol is HTTP (Hypertext Transfer Protocol) [Fie+99], which determines how browsers communicate with web servers and serves as the basis for REST APIs (Representational State Transfer). For cloud services across
different deployment and service models, REST APIs are the default interface
for providing access to storage and compute resources, as well as high-level
services. Most DBaaS, BaaS, and NoSQL systems provide native REST APIs to
achieve a high degree of interoperability and to allow access from heterogeneous
environments. This chapter reviews relevant foundations of HTTP and networking
with respect to performance and latency, as well as their role in cloud data
management. In particular, we will highlight the challenges that the standardized behavior of the web caching infrastructure imposes on data-centric services.
3.1 HTTP and the REST Architectural Style
The REST architectural style was proposed by Fielding as an a-posteriori explanation for the success of the web [Fie00]. REST is a set of constraints that, when imposed on a protocol design, yield the beneficial system properties of scalability and simplicity that the designers of the HTTP standard intended for the web [Fie+99]. Most services in cloud computing environments are exposed as
REST/HTTP1 services, as they are simple to understand and consume in any
programming language and environment [DFR15b]. Another advantage of HTTP
is its support by mature and well-researched web infrastructure. REST and HTTP
1 In principle, the REST architectural style is independent of its underlying protocol. However, as HTTP dominates in practical implementations, we will refer to REST as its combination with HTTP [WP11].
are not only the default for web and mobile applications but also an alternative to
backend-side RPC-based (Remote Procedure Call) approaches (e.g., XML RPC or
Java RMI [Dow98]), binary wire protocols (e.g., PostgreSQL protocol [Pos]) and
web services (the SOAP and WS-* standards family [Alo+04]).
HTTP is an application-layer protocol on top of the Transmission Control
Protocol (TCP) [Pos81] and lays the foundation of the web. With REST, the key
abstractions of interactions are represented by HTTP resources identified by URLs.
In a DBaaS API, these resources could for example be queries, transactions, objects,
schemas, and settings. Clients interact with these resources through the uniform
interface of the HTTP methods GET, PUT, POST, and DELETE. Any interface is
thus represented as a set of resources that can be accessed through HTTP methods.
Methods have different semantics: GET requests are called safe, as they are free
of side-effects (nullipotent). PUT and DELETE requests are idempotent, while
POST requests may have side-effects that are non-idempotent. The actual data
(e.g., database objects) can take the form of any standard content type which can
dynamically be negotiated between the client and server through HTTP (content
negotiation). Many REST APIs have default representations in JSON, but other
formats (e.g., XML, text, images) are possible, too. This extensibility of REST APIs
allows services to present responses in a format that is appropriate for the respective
use case [RAR13].
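The uniform interface and its method semantics can be illustrated with a small sketch. The in-memory store, resource paths, and payloads below stand in for a real DBaaS REST API and are purely illustrative assumptions:

```python
# Sketch of the uniform-interface semantics: GET is safe (nullipotent),
# PUT and DELETE are idempotent, POST is not idempotent.

class ResourceStore:
    def __init__(self):
        self.objects = {}  # resource path -> representation
        self.next_id = 1

    def get(self, path):
        # Safe: reading never changes server state.
        return self.objects.get(path)

    def put(self, path, representation):
        # Idempotent: repeating the same PUT yields the same state.
        self.objects[path] = representation
        return path

    def post(self, collection, representation):
        # Not idempotent: each POST creates a new resource.
        path = f"{collection}/{self.next_id}"
        self.next_id += 1
        self.objects[path] = representation
        return path

    def delete(self, path):
        # Idempotent: deleting twice leaves the same (absent) state.
        self.objects.pop(path, None)

store = ResourceStore()
store.put("/db/objects/42", {"name": "Alice"})
store.put("/db/objects/42", {"name": "Alice"})       # no further effect
first = store.post("/db/objects", {"name": "Bob"})   # new resource
second = store.post("/db/objects", {"name": "Bob"})  # another new resource
```

Repeating the PUT leaves the store unchanged, while each POST creates a distinct resource, which is exactly why web APIs that swap these roles (see Sect. 3.1 below) break intermediaries.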
The integration and connection of resources are achieved through hypermedia, i.e., the mutual referencing of resources [Amu17]. These references are similar to links
on web pages. A resource for a query result could for instance have references to the
objects matching the query predicate. Hypermedia can render a REST interface self-descriptive. In that case, an initial URL to a root resource is sufficient to explore the complete interface by following references and interpreting self-describing standard
media types. HTTP is a request-response protocol, which means that the client has
to pose a request to receive a response. For the server to proactively push data, other
protocols are required.
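The hypermedia linking described above can be sketched with hypothetical representations of a query resource and its result objects; the link relation names, paths, and payloads are invented for illustration:

```python
# Hypothetical hypermedia representations: a query resource references the
# objects matching its predicate, so clients can navigate by following links.

resources = {
    "/db/queries/active-users": {
        "predicate": "status == 'active'",
        "_links": {"results": ["/db/objects/7", "/db/objects/12"]},
    },
    "/db/objects/7": {"name": "Alice", "_links": {"self": "/db/objects/7"}},
    "/db/objects/12": {"name": "Bob", "_links": {"self": "/db/objects/12"}},
}

def follow(path, rel):
    """Navigate from one resource to the resources it links to."""
    links = resources[path]["_links"][rel]
    targets = links if isinstance(links, list) else [links]
    return [resources[t] for t in targets]

matches = follow("/db/queries/active-users", "results")
names = [m["name"] for m in matches]
```

Starting from the query resource alone, a client discovers the matching objects without any out-of-band knowledge of the interface.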
The constraints of REST describe common patterns to achieve scalability
[Fie00]. In the context of cloud services, these constraints are:
Client-Server. There is a clear distinction between a client (e.g., a browser
or mobile device) and the server (e.g., a cloud service or web server) that
communicate with each other using a client-initiated request-response pattern
[Fie+99].
Statelessness. If servers are stateless, requests can be load-balanced, and servers
may be replicated for horizontal scalability.
Caching. Using caching, responses can be reused for future requests by serving
them from intermediate web caches.
Uniform Interface. All interactions are performed using four basic HTTP methods to create, read, update, and delete (CRUD) resources.
Layered System. Involving intermediaries (e.g., web caches, load balancers,
firewalls, and proxies) in the communication path results in a layered system.
In practice, many REST APIs do not adhere to all constraints and are often
referred to as web APIs, i.e., custom programming interfaces using HTTP as a
transport protocol [RAR13]. For example, the Parse BaaS uses POST methods
to perform idempotent operations and GET requests for operations with side-effects [Par]. As a consequence, such web APIs are potentially unscalable and
may be treated incorrectly by intermediaries. Unlike web services, REST does
not require interface descriptions and service discovery. However, the OpenAPI initiative is an attempt to standardize the description of REST APIs and to allow code generation for programming languages [Ope]. Richardson et al. [RAR13],
Allamaraju [All10], Amundsen [Amu17], and Webber et al. [WPR10] provide a
comprehensive treatment of REST and HTTP.
The challenge for data management is to devise a REST API that leverages HTTP
for scalability through statelessness and caching and that is generic enough to be
applicable to a broad spectrum of database systems. To this end, a resource structure
for different functional capabilities is required (e.g., queries and transactions)
as well as system-independent mechanisms for stateless request processing and
caching of reads and queries.
3.2 Latency on the Web: TCP, TLS and Network
Optimizations
For interoperability reasons, REST APIs are the predominant type of interface in cloud data management. HTTP, on the other hand, has to be used by any website. The performance and latency of HTTP communication are determined by the protocols that are involved in each HTTP request.
Fig. 3.1 Latency components across network protocols of an HTTP request against a TLS-secured URL
Figure 3.1 shows the latency components of a single HTTP request illustrated with exemplary delays:
1. First, the URL’s domain (e.g., example.com) is resolved to an IP address using a UDP-based DNS lookup. To this end, the client contacts a configured DNS resolver. If the DNS entry is uncached, the resolver will contact a root DNS server that redirects to a DNS server responsible for the top-level domain (e.g., for .com). That name server will in turn redirect to the authoritative name server registered by the owner of the domain. This name server then returns one or multiple IP addresses for the requested host name. Depending on the location
of the (potentially geo-redundant) DNS servers and the state of their caches, a
typical DNS query will return in 10–100 ms. Like in HTTP, DNS caching is
based on TTLs with its associated staleness problems [TW11].
2. Next, a TCP connection between the client and the server is established
using a three-way handshake. In the first round-trip, connection parameters are
negotiated (SYN, SYN-ACK packets). In the second round-trip, the client can
send the first portion of the payload. There is ongoing research on TCP fast open
[Che+14], a mechanism to avoid one round-trip by sending data in the first SYN
packet.
3. If the server supports and requires end-to-end encryption through HTTPS, i.e., HTTP over TLS (Transport Layer Security), a TLS handshake needs to be performed [Gri13]. This requires two additional round-trips during which the
server’s certificate is checked, session keys are exchanged, and a cipher suite
for encryption and signatures is negotiated. TLS protocol extensions have been
specified to allow data transmission during half-open TLS connections to reduce
TLS overhead to one round-trip (TLS false start). Alternatively, clients can reuse
previous session parameters negotiated with the same server, to abbreviate the
handshake (TLS session resumption).2
4. When the connection is established, the client sends an HTTP request that consists of an HTTP method, a URL, the protocol version as well as HTTP headers encoding additional information like the desired content type and supported compression algorithms. The server processes the request and either fully assembles the response or starts transmitting it as soon as data is available (chunked encoding). The delay until the client receives the first response bytes is referred to as time-to-first-byte (TTFB).
5. Even though the connection is fully established, the response cannot necessarily
be transmitted in a single round-trip, but requires multiple iterations for the content download. TCP employs a slow-start algorithm that continuously increases
the transmission rate until the full aggregate capacity of all involved hops is
saturated without packet loss and congestion [Mat+97]. Numerous congestion
control algorithms have been proposed, most of which rely on packet loss as
an indicator of network congestion [KHR02, WDM01]. For large responses,
multiple round-trips are therefore required to transfer data over a newly opened
connection, until TCP’s congestion window is sufficiently sized.3 Increasing
the initial TCP congestion window from 4 to 10 segments is ongoing work
[Chu+13] and allows for typically 10 · 1500 B = 15 KB of data transmitted with a single round-trip, given the maximum transmission unit of 1500 B of an Ethernet network.
2 Furthermore, the QUIC (Quick UDP Internet Connections) protocol has been proposed as a UDP-based alternative to HTTP that has no connection handshake overhead [Gri13]. A new TLS protocol version with no additional handshakes has also been proposed [Res17].
3 The relationship between latency and potential data rate is called the bandwidth-delay product [Gri13]. For a given round-trip latency (delay), the effective data rate (bandwidth) is computed as the maximum amount of data that can be transferred (product) divided by the delay. For example, if the current TCP congestion window is 16 KB and the latency 100 ms, the maximum data rate is 1.31 MBit/s.
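The bandwidth-delay arithmetic from the discussion above can be sketched in a few lines, using the same figures as in the text:

```python
# Back-of-the-envelope helpers: the data rate achievable when one congestion
# window is transmitted per round-trip, and the initial window of 10 segments
# with a 1500 B Ethernet MTU.

def data_rate_mbit(cwnd_bytes, rtt_seconds):
    """Maximum data rate (MBit/s) for one congestion window per round-trip."""
    return cwnd_bytes * 8 / rtt_seconds / 1e6

# A 16 KB congestion window at 100 ms round-trip latency:
rate = data_rate_mbit(16 * 1024, 0.100)  # about 1.31 MBit/s

# Initial congestion window of 10 segments of 1500 B each:
initial_window_bytes = 10 * 1500  # 15 KB per round-trip
```

This makes explicit why, for large responses over a fresh connection, several round-trips pass before the congestion window grows large enough to saturate the path.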
In the best case and with all optimizations applied, an HTTP request over a new connection can hence be performed with one DNS round-trip and two server round-trips. DNS requests are aggressively cached, as IPs for DNS names are considered stable. The DNS overhead is therefore often minimal and can additionally be tackled
by geo-replicated DNS servers that serve requests to nearby users (DNS anycast).
To minimize the impact of TCP and TLS handshakes, clients keep connections open
for reuse in future requests, which is an indispensable optimization, in particular for
request-heavy websites.
The current protocol version 2 of HTTP [IET15] maintains the semantics of
the original HTTP standard [KR01] but improves many networking inefficiencies.
Some optimizations are inherent, while others require active support by cloud services:
• Multiplexing all requests over one TCP connection avoids the overhead of
multiple connection handshakes and circumvents head-of-line blocking.4
• Header Compression applies compression to HTTP metadata to minimize the
impact of repetitive patterns (e.g., always requesting JSON as a format).
• If a server implements Server Push, resources can be sent to the client proactively whenever the server assumes that they will be requested. This requires explicit support by cloud services, as the semantics and usage patterns define which content should be pushed to reduce round-trips. However, inadequate use of pushed resources hurts performance, as the browser cache is rendered useless.
• By defining dependencies between resources, the server can actively prioritize
important requests.
As of 2017, still less than 20% of websites and APIs employ HTTP/2 [Usa].
When all above protocols are in optimal use, the remaining latency bottleneck is the
round-trip latency between API and browser clients and the server answering HTTP
requests.
In mobile networks, the impact of HTTP request latency is even more severe.
Additional latency is caused by the mobile network infrastructure. With the older
2G and 3G mobile network standards, latencies between 100 ms (HSPA) and 750 ms
(GPRS) are common [Gri13, Ch. 7]. With modern 4G LTE-Advanced (Long Term
Evolution) networks, the standards prescribe strict latency bounds for better user
experience. As mobile devices share radio frequencies for data transmission, access
has to be mediated and multiplexed. This process is performed by a radio resource
controller (RRC) located in the radio towers of the LTE cells that together comprise
the radio access network (RAN). At the physical level, several latency-critical steps
are involved in a request by a mobile device connected via a 4G network:
4 Head-of-line blocking occurs when a request is scheduled, but no open connection can be used, as responses have not yet been received.
1. When a mobile device sends or receives data and was previously idle, it negotiates physical transmission parameters with the RRC. The standard prescribes
that this control-plane latency must not exceed 100 ms [DPS13].
2. Any packet transferred from the mobile device to the radio tower must have a
user-plane latency of below 5 ms.
3. Next, the carrier transfers the packet from the radio tower to a packet gateway
connected to the public Internet. This core network latency is not bounded.
4. Starting from the packet gateway, normal Internet routing with variable latency
is performed.
Thus, in modern mobile networks, one-way latency will be at least 5–105 ms
higher than in conventional networks. The additional latency is incurred for
each HTTP request and each TCP/TLS connection handshake, making latency
particularly critical for mobile websites and apps.
In summary, to achieve low latency for REST and HTTP, many network
parameters have to be explicitly optimized at the level of protocol parameters,
operating systems, network stacks, and servers [Gri13]. In-depth engineering details
of TCP/IP, DNS, HTTP, TLS, and mobile networking are provided by Grigorik
[Gri13], Kurose and Ross [KR10], and Tanenbaum [TW11]. However, with all
techniques and best practices applied, physical latency from the client to the server
remains the main bottleneck, as well as the time-to-first-byte caused by processing
in the backend. Both latency contributions can be addressed through caching.
3.3 Web Caching for Scalability and Low Latency
HTTP allows resources to be declared cacheable. They are considered fresh
for a statically assigned lifetime called time-to-live (TTL). Any cache in the
request/response chain between client and server will serve a cached object without
contacting the origin server. The HTTP caching model’s update strategy is purely
expiration-based: once a TTL has been delivered, the respective resource cannot
be invalidated before the TTL has expired. In the literature, expiration-based
caching is also known as the lease model [How+88, Mog94, Vak06] and was proposed by Gray et al. [GC89] long before HTTP. In contrast, invalidation-based caches use out-of-band protocols to receive notifications about URLs that
should be purged from the cache (e.g., non-standardized HTTP methods or separate
purging protocols). This model is in wide use for many non-HTTP caches, too
[Car13, ERR11, Lwe10, Bla+10, Lab+09]. As the literature is lacking a survey of
web caching in the light of data management, we give a concise overview of web
cache types, scalability mechanisms, and consistency aspects of expiration-based
and invalidation-based HTTP caching.
Fig. 3.2 Different types of web caches distinguished by their location. Caches 1–3 are expiration-based, while caches 4–6 are invalidation-based
3.3.1 Types of Web Caches
The closer a web cache is to the network edge, the more the network latency
decreases. We distinguish between six types of web caches, based on their network
location as shown in Fig. 3.2 (cf. [Lab+09, Nag04]):
Client Cache. A cache can be directly embedded in the application as part of the
browser, mobile app, or an HTTP library [Fie+99]. Client caches have the lowest
latency, but are not shared between clients and rather limited in size.
Forward Proxy Cache. Forward proxy caches are placed in networks as shared
web caches for all clients in that network. Being very close to the application,
they achieve a substantial decrease in network latency. Forward proxy caches can
either be configured as explicit proxies by providing configuration information to
clients through protocols such as PAC and WPAD [Gou+02] or by transparently
intercepting outgoing, unencrypted TCP connections.
Web Proxy Cache. Internet Service Providers (ISPs) deploy web proxy caches in
their networks. Besides accelerating HTTP traffic for end users, this also reduces
transit fees at Internet exchange points. Like client and forward proxy caches,
web proxy caches are purely expiration-based.
Content Delivery Network (CDN) Cache. CDNs provide a distributed network
of web caches that can be controlled by the backend [PB07]. CDN caches are
designed to be scalable and multi-tenant and can store massive amounts of cached
data. Like reverse proxy caches and server caches, CDN caches are usually
invalidation-based.
Reverse Proxy Cache. Reverse proxy caches are placed in the server’s network
and accept incoming connections as a surrogate for the server [Kam17]. They
can be extended to perform application-specific logic, for example, to check
authentication information and to perform load balancing over backend servers.
Server Cache. Server caches offload the server and its database system
by caching intermediate data, query results, and shared data structures
[Fit04, Nis+13, Xu+14, Can+01b, Gar+08, Bro+13]. Server caches are not
based on HTTP, but explicitly orchestrated by the database system (e.g.,
DBCache [Bor+04]), a specialized middleware (e.g., Quaestor [Ges+17]), or
the application tier (e.g., Memcache [Fit04]).
The defining characteristic of all web caches is that they transparently interpret
HTTP caching metadata as read-through caches. This means that when a request
causes a cache miss, the request is forwarded to the next cache or the origin server
and then the response is cached according to the provided TTL. Web caches always
forward write requests, as these come in the form of opaque POST, PUT, and
DELETE requests whose semantics are implicit properties of a REST/HTTP API.
The effectiveness of web caching is measured by a cache hit ratio that captures the
percentage of all requests that were served from a cache and the byte hit ratio that
expresses the corresponding data volume.
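Both effectiveness metrics can be computed directly from a request log; the log entries below are invented for illustration:

```python
# Sketch of the two cache effectiveness metrics: the fraction of requests
# served from the cache (cache hit ratio) and the fraction of bytes served
# from the cache (byte hit ratio).

def hit_ratios(log):
    hits = [r for r in log if r["cache_hit"]]
    cache_hit_ratio = len(hits) / len(log)
    byte_hit_ratio = sum(r["bytes"] for r in hits) / sum(r["bytes"] for r in log)
    return cache_hit_ratio, byte_hit_ratio

log = [
    {"cache_hit": True, "bytes": 1000},
    {"cache_hit": True, "bytes": 500},
    {"cache_hit": False, "bytes": 8500},  # one large miss dominates the bytes
    {"cache_hit": True, "bytes": 2000},
]
cache_hit_ratio, byte_hit_ratio = hit_ratios(log)
```

The example also shows why the two metrics can diverge: three of four requests are hits, yet most of the transferred volume still comes from the origin.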
3.3.2 Scalability of Web Caching
To employ web caches for cloud data management, they have to support scalability.
It is widely unknown in the database community that web caches scale through
the same primary mechanisms as most NoSQL databases: replication and hash
sharding. Figure 3.3 gives an overview of these techniques in the context of web
caches. Load balancers that can work on different levels of the protocol stack
forward HTTP requests to web caches using a policy like round-robin or a uniform
distribution [GJP11]. In contrast to database systems, no replication protocols are
required, as each replica fetches missing resources on demand. Partitioning the
space of cached objects for a cluster of caches is achieved by hash sharding the
space of URLs. Requests can then be forwarded to URL partitions through the
Cache Array Routing Protocol (CARP) [Wan99]. Hierarchies of communicating
web caches (cache peering [KR01]) build on query-based protocols like the Internet Cache Protocol (ICP) [Wes97], the Hypertext Caching Protocol (HTCP) [VW99],
or Cache Digests [Fan+00]. The underlying idea of query-based protocols is that
checking another cache replica’s entries is more efficient than forwarding a request
to the origin server. Finally, global meshes of web caches in CDNs can rely on inter-cluster exchanges for geo-replication [PB08]. In practice, CDN providers exploit the
fact that a cache lookup of a URL maps well to a key-value interface. This allows
scaling cache clusters by deploying web caches as a proxy on top of a distributed
key-value store [Spa17].
Fig. 3.3 Scalability mechanisms of web caches: replication, sharding, query-based hierarchies,
and geo-replication
Web caching increases read scalability and fault tolerance, as objects can still be
retrieved from web caches if the backend is temporarily unavailable [RS03]. As web
caches only fetch content lazily, elasticity is easy to achieve: web cache replicas can
be added at any time to scale reads.
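The hash sharding of the URL space can be sketched as follows. The node names are illustrative, and the simple hash-modulo scheme below only approximates CARP, which combines URL and member hashes to avoid remapping the whole key space when nodes change:

```python
# Sketch of CARP-style request routing: the space of URLs is hash-partitioned
# across a cluster of cache nodes, so every URL is cached on exactly one shard.

import hashlib

CACHE_NODES = ["cache-1", "cache-2", "cache-3"]

def route(url):
    """Pick the cache node responsible for a URL by hashing the URL."""
    digest = hashlib.md5(url.encode()).hexdigest()
    return CACHE_NODES[int(digest, 16) % len(CACHE_NODES)]

# The same URL deterministically maps to the same node, so a cached copy
# is always found on the shard the load balancer forwards to:
node = route("/db/objects/42")
```

Because each node fetches missing resources lazily on demand, no replication protocol is needed when shards are added, mirroring the elasticity argument above.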
3.3.3 Expiration-Based Web Caching
HTTP defines a Cache-Control header that both clients and servers leverage to
control caching behavior. The server uses it to specify expiration, whereas the client
employs it for validation.
Expirations are provided as TTLs at the granularity of seconds in order to be
independent of clock synchronization. Additionally, an Age header indicates how much time has passed since the original request, to preserve correct expirations when caches communicate with each other. The actual expiration time texp is then computed from the local clock’s timestamp at the moment the response was received, nowres(), giving texp = nowres() + TTL − Age. The server can set separate expirations for shared web caches (s-maxage) and client caches (max-age). Furthermore, it can specify that responses should not be cached at all (no-cache and must-revalidate), should only be cached in client caches (private), or should not be persisted (no-store). By default, the
cache key that uniquely identifies a cached response consists of the URL and the
host. The Vary header allows the cache key to be extended with specified request headers, e.g., Accept-Language, in order to cache the same resource in various
representations.
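The expiration arithmetic can be sketched as follows, with timestamps as plain seconds and illustrative header values:

```python
# Sketch of expiration-based freshness bookkeeping:
# texp = now_res + TTL - Age, computed from Cache-Control: max-age and Age.

def expiration_time(now_res, ttl_seconds, age_seconds=0):
    """Local expiration timestamp for a response received at now_res."""
    return now_res + ttl_seconds - age_seconds

def is_fresh(now, t_exp):
    """A cached copy may be served without revalidation while now < texp."""
    return now < t_exp

# Response received at local time 1000 s with max-age=60 that has already
# spent 15 s in an intermediate cache (Age: 15):
t_exp = expiration_time(now_res=1000, ttl_seconds=60, age_seconds=15)
```

Subtracting the Age header is what keeps expirations correct across a chain of caches: the copy expires 60 s after the origin produced it, not 60 s after the local cache received it.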
Clients and web caches can revalidate objects by asking the origin server for
potential modifications of a resource based on a version number (called ETag
in HTTP) or a Last-Modified date (cache validators). The client thus has a
means to explicitly request a fresh object and to save transfer time, if resources
have not changed. Revalidations are performed through conditional requests based
on If-Modified-Since and If-None-Match headers. If the timestamp
or version does not match for the latest resource (e.g., a database object), the
server returns a full response. Otherwise, an empty response with a 304 Not
Modified status code is returned.
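The server side of such a revalidation can be sketched as follows; the stored object, its ETag value, and the timestamps are illustrative:

```python
# Sketch of conditional-request handling: if the client's cache validator
# still matches, an empty 304 Not Modified response suffices.

STORED = {
    "etag": '"v7"',
    "last_modified": 1700000000,  # seconds since epoch, illustrative
    "body": b'{"name": "Alice"}',
}

def handle_get(if_none_match=None, if_modified_since=None):
    if if_none_match is not None and if_none_match == STORED["etag"]:
        return 304, b""  # version still current: no body transferred
    if if_modified_since is not None and if_modified_since >= STORED["last_modified"]:
        return 304, b""  # not modified since the client's copy
    return 200, STORED["body"]  # full response otherwise

status, body = handle_get(if_none_match='"v7"')  # validator matches
```

A matching validator saves the transfer time of the body, while a stale validator (e.g., `"v6"`) triggers a full 200 response with the current representation.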
Figure 3.4 illustrates the steps a web cache performs when handling a request:
if the object of the requested URL was not previously cached, the web cache
forwards the request to the backend. If a cache hit occurs, the cache determines whether the local copy of the resource is still fresh by checking now() < texp.
If the object is still fresh, it is returned to the client without any communication
to the backend. If now() > texp and the cached resource has a cache validator,
the web cache revalidates the resource, otherwise, the request is forwarded. This
logic is performed for any cache in the chain from the client cache to reverse
proxy caches. In a revalidation, clients can furthermore bound the age of a response
(max-age and min-fresh), allow expired responses (max-stale) or explicitly
load cached versions (only-if-cached). CDNs and reverse proxies typically
ignore revalidation requests and simply serve the latest cached copy, in order to
secure the origin against revalidation attacks [PB08].
The consistency model of expiration-based caching is Δ-atomicity. The problem is that Δ is a high, fixed TTL in the order of hours to weeks [RS03], as accurate TTLs for dynamic data are impossible to determine. This makes the native caching
model of HTTP unsuitable for data management and is the reason why REST APIs
of DBaaS, BaaS, and NoSQL systems explicitly circumvent HTTP caching [Dep,
Hoo, Par, ALS10, Amaa, Dyn, Cal+11, Bie+15, Dat].
Fig. 3.4 Validation of resource freshness in expiration-based HTTP caching
3.3.4 Invalidation-Based Web Caching
CDNs and reverse proxy caches are invalidation-based HTTP caches. They extend
the expiration-based caching model and additionally expose (non-standardized)
interfaces for asynchronous cache invalidation. The backend has to explicitly send
an invalidation to every relevant invalidation-based cache. While CDN APIs forward
invalidations internally with efficient broadcasting protocols (e.g., bimodal multicast
[Spa17]), employing many reverse proxies can lead to a scalability problem, if many
invalidations occur. In general, an invalidation is required if a resource was updated
or deleted and invalidation-based caches have observed an expiration time greater
than the current time: ∃ t_exp : now() < t_exp. For DBaaS/BaaS systems this condition
is non-trivial to detect, since updates may affect otherwise unrelated query results
and objects.
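Under the assumptions above, the condition itself is straightforward to check; the sketch below takes `observed_expirations` as the expiration timestamps handed out to invalidation-based caches for the affected resource (a simplification, since in practice these timestamps have to be tracked or conservatively estimated).

```python
def invalidation_required(observed_expirations, now):
    """An update requires an invalidation iff some invalidation-based
    cache may still hold a fresh copy, i.e. there exists t_exp with
    now < t_exp among the expiration times the backend handed out."""
    return any(now < t_exp for t_exp in observed_expirations)
```

The hard part for DBaaS/BaaS systems is not this check but determining which cached query results an update affects in the first place.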
Besides their invalidation interfaces, CDNs (e.g., Akamai and Fastly [BPV08,
Spa17]) and reverse proxies (e.g., Varnish, Squid, Nginx, Apache Traffic Server
[Kam17, Ree08, Wes04]) often also provide further extensions to HTTP caching:
• Limited application logic can be executed in the cache. For example, the Varnish
Control Language (VCL) allows manipulating requests and responses, performing
health checks, and validating headers [Kam17].
• Prefetching mechanisms proactively populate the cache with resources that are
likely to be requested in the near future.
• Edge-side templating languages like ESI [Tsi+01] allow assembling responses
from cached data and backend requests.
• By assigning tags to cacheable responses, efficient bulk invalidations of related
resources can be performed (tag-based invalidation).
• Distributed Denial of Service (DDoS) attacks can automatically be detected and
mitigated before the backend is compromised [PB08].
• Updated resources can be proactively pushed (prewarming).
• Real-time access logs may be used by the application for analytics and accounting.
• Stale resources can be served while the backend is offline (stale-on-error)
or during revalidations (stale-while-revalidate) [IET15].
• Speed Kit5 [WGW+20] leverages Service Workers to control and maintain
the client cache, reroute requests to a CDN, and apply transparent image
optimization.
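Tag-based invalidation from the list above can be illustrated with a small in-memory sketch; the interface is our own and does not correspond to any particular CDN's API.

```python
from collections import defaultdict

class TaggedCache:
    """Sketch of tag-based bulk invalidation: responses are stored under
    a cache key and labeled with tags; purging a tag evicts every
    response that carries it."""

    def __init__(self):
        self.entries = {}               # cache key -> cached response
        self.by_tag = defaultdict(set)  # tag -> set of cache keys

    def put(self, key, response, tags=()):
        self.entries[key] = response
        for tag in tags:
            self.by_tag[tag].add(key)

    def invalidate_tag(self, tag):
        """Evict every response labeled with the given tag."""
        for key in self.by_tag.pop(tag, set()):
            self.entries.pop(key, None)
```

A backend that tags all responses derived from, say, a `users` table can then purge them with a single invalidation instead of enumerating individual URLs.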
For latency, an important characteristic of invalidation-based caches is their
ability to maintain long-lived backend connections that incoming requests can be
multiplexed over. This significantly reduces the overhead of connection handshakes
as they only have to be performed over low-latency links between clients and
CDN edge nodes. In many cases, cloud services have end-to-end encryption as
5 Speed Kit: https://speed-kit.com.
a requirement for authenticity, privacy, data integrity, and confidentiality. To this
end, TLS certificates are deployed to CDNs and reverse proxies to terminate TLS
connections on the network edge and to establish different connections to the
backend. Thus, for encrypted REST APIs and websites, only client, CDN, and
reverse proxy caches apply for HTTP caching, whereas forward and web proxy
caches only observe encrypted traffic.
Previous research on web caching has focused on cache replacement strategies
[PB03, Bre+99, Lab+09], CDN architectures [PB08, FFM04, Fre10], cache cooperation [KR01, Wes97, VW99, Fan+00], proxy and client extensions [Rab+03, Tsi+01,
BR02], and changes to the caching model itself [KW97, Wor94, KW98, Bhi+02].
Further treatments of expiration-based and invalidation-based web caching are
provided by Rabinovich and Spatscheck [RS03], Labrindis et al. [Lab+09], Nagaraj
[Nag04], Buyya et al. [BPV08], and Grigorik [Gri13].
3.3.5 Challenges of Web Caching for Data Management
Both expiration-based and invalidation-based caching are challenging for data management, as they interfere with the consistency mechanisms of database systems.
Figure 3.5 gives an example of how web caching affects consistency.
1. On the first request, the server has to set a TTL for the response. If the TTL is
too low, caching has no effect. If it is too high, clients will experience many
stale reads. Due to the dynamic nature of query results and objects in data
management, TTLs are not known in advance.
2. When a second client updates the previously read object before its TTL expired,
caches are in an inconsistent state. Even if the server could issue an invalidation
(which is usually impossible for query results), the invalidation is asynchronous
and only takes effect at some later point in time.
3. Reads that happen between the completed update and the initially provided
expiration time will cause stale reads at expiration-based caches.
In conclusion, web caching for data management is considerably restricted
because of several challenges:
• Expiration-based caching either degrades consistency (high TTLs) or causes
very high cache miss rates (low TTLs).
• Cache coherence for DBaaS and BaaS REST APIs is currently achieved by
marking all types of dynamic data as uncacheable.
• Currently, TTL estimation is a manual and error-prone process leading to low
caching efficiency as TTLs do not adapt to changing workloads and differences
between individual query responses.
• Cache invalidation requires detecting changes to files, objects, and query results
in real-time based on the updates performed against the data management API.
Fig. 3.5 Cache coherence problems of web caches for data management caused by access of two
different clients
• Fetching dynamic data (e.g., query results) via REST/HTTP requires contacting
a remote server, which involves the full end-to-end latency from the client to the
server.
• With standard HTTP caching, clients cannot control consistency requirements
on a per-user, per-session, or per-operation basis, as the server provides the HTTP
caching metadata used by intermediate caches.
3.4 The Client Perspective: Processing, Rendering, and
Caching for Mobile and Web Applications
Frontend performance is concerned with how fast data can be rendered and
computations performed on the client side. Notably, the frontend is often not
considered during the design of a data management solution. However, as the SDK
and API layer of a DBaaS/BaaS reach into the environment of the mobile device
and utilize its networking and caching capabilities, some aspects of browsers are
highly relevant for end-to-end performance. We will specifically examine frontend
performance for browsers. In native mobile apps, most principles apply too, but
applications can choose from different storage options like the file system and
embedded relational databases. Due to the absence of a browser cache, though, the
task of maintaining cache consistency with remote storage has to be handled by the
application.
As of 2018, an average website downloads 107 different HTTP resources with
a total size of over 3 MB of data to be transferred [Arc]. The web has evolved
through three major forms of websites. Hypertext documents are simple text-based
documents interconnected through links and formatted through basic markup for the
content’s structure. Web pages enrich hypertext documents through support for rich
media types such as images, audio, and video, as well as complex layout and styling
of the document’s appearance. Finally, web applications add behavior to websites
through JavaScript logic and the ability to programmatically request REST/HTTP
APIs (Ajax). Web applications are usually implemented with single-page application frameworks that help to structure the application through architectural patterns
and templating for rendering data into UI elements [Ang, Emb, Vue, Rea]. With
the growing prevalence and complexity of web applications, the impact of latency
increases.
3.4.1 Client-Side Rendering and Processing
The critical rendering path (CRP) describes the process that a browser performs in
order to render a website from HTML, JavaScript, and CSS resources [Fir16, Gri13].
The dependency graph between these critical resources, i.e., files required for the
initial paint, determines the length, size, and weight of the CRP. The length of the
CRP is the minimum number of network round-trips required to render the web
page. The size of the CRP is the number of critical resources that are loaded. The
weight (also called “critical bytes”) of the CRP is the combined size of all critical
resources measured in bytes.
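These three metrics can be computed mechanically once the critical resources and their discovery depth are known; the sketch below assumes each critical resource is given as a (round-trip depth, size in bytes) pair, which is a simplification of the actual dependency graph.

```python
def crp_metrics(resources):
    """Compute the CRP metrics for a list of critical resources,
    each given as (round_trip_depth, size_in_bytes)."""
    length = max(depth for depth, _ in resources)    # network round-trips
    size = len(resources)                            # number of resources
    weight = sum(nbytes for _, nbytes in resources)  # critical bytes
    return length, size, weight

# HTML fetched in round-trip 1; CSS and JS discovered from it in round-trip 2:
critical = [(1, 14_000), (2, 30_000), (2, 60_000)]
```

For this example the CRP has length 2, size 3, and a weight of 104,000 critical bytes.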
The execution of the CRP is illustrated in Fig. 3.6. After receiving the HTML
from the network, the browser starts parsing it into a Document Object Model
(DOM). If the HTML references CSS and JavaScript resources, the parser (or
rather its look-ahead heuristics) will trigger their background download as soon
as they are discovered. The CSS stylesheet is parsed into a CSS object model
(CSSOM). CSS is render-blocking, as rendering can only proceed when the CSSOM
is fully constructed and thus all styling information available. JavaScript can modify
and read from both the DOM and CSSOM. It is parser-blocking as the HTML
parser blocks until the discovered JavaScript is executed. Furthermore, JavaScript
execution blocks until the CSSOM is available, causing a chain of interdependencies.
Only when the DOM and the CSSOM are constructed, and JavaScript is executed,
Fig. 3.6 The critical rendering path as a model for frontend performance
the browser starts to combine styling and layout information into a render tree,
computes a layout, and paints the page on the screen.
The process of frontend performance optimization involves reducing the size,
length, and weight of the CRP. Typical steps are loading JavaScript asynchronously,
deferring its parsing, preconnecting and preloading critical resources, inlining
critical CSS, applying compression, minification, and concatenation, optimizing
JavaScript execution and CSS selector efficiency, and loading “responsive” images
based on screen size [Wag17]. HTTP/2 eliminates the necessity for many common
performance workarounds that negatively impact cacheability, for example, concatenation of resources [IET15].
End-user performance can be measured using different web performance metrics:
• Browsers implement events that indicate the completeness of the rendering
process. The DomContentLoaded event is fired once the DOM has been
constructed and no stylesheets block JavaScript execution.
• The first paint occurs when the browser renders the page for the first time.
Depending on the structure of the CRP this can, for example, be a blank page
with a background color or a visually complete page. The first paint metric can
be refined to the first meaningful paint [Sak17] which is defined through the paint
that produces the largest change in the visual layout.
• Once all resources of the website (in particular images, JavaScript and
stylesheets) have been downloaded and processed, the load event is fired.
The event indicates the completion of loading from an end user’s perspective.
However, any asynchronous requests triggered through JavaScript are not
captured in the load event. Therefore, the DomContentLoaded and load events
can be decreased by loading resources through code without actually improving
user-perceived performance.
• As none of the above metrics capture the rendering process itself, the speed index
metric was proposed as a means of quantifying visual completeness over time
[Mee12]. It is defined as ∫_0^∞ (1 − VC(t)) dt, where VC(t) ∈ [0, 1] is the visual
completeness as a function of time. Experimentally, the speed index is usually
calculated through video analysis of a browser's loading process. In contrast to
other metrics, the speed index also accounts for API requests performed by web
applications.
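In practice, the speed index integral is approximated from visual completeness values sampled from video frames. A step-wise approximation might look as follows; the sampling format is our own.

```python
def speed_index(samples):
    """Approximate the speed index, the integral of (1 - VC(t)) dt, from
    sampled visual completeness. `samples` is a list of (t_ms, vc) pairs
    with vc in [0, 1], the last sample marking full completeness (vc = 1),
    integrated step-wise between consecutive samples."""
    total = 0.0
    for (t0, vc), (t1, _) in zip(samples, samples[1:]):
        total += (1.0 - vc) * (t1 - t0)
    return total

# Page is blank until 500 ms, 80% visually complete until 1000 ms, then done:
samples = [(0, 0.0), (500, 0.8), (1000, 1.0)]
```

For these samples the speed index is 500 + 0.2 · 500 = 600 ms; a page that reaches high visual completeness early scores better even if the load event fires at the same time.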
Latency remains the major factor for frontend performance, once all common
frontend optimizations (e.g., inlined critical CSS) and network optimizations (e.g.,
gzip compression) have been applied. The length of the CRP determines how
many round-trips occur before the user is presented with the first rendered result.
In the ideal case, the length of the CRP can be reduced to a single round-trip
by only including asynchronous JavaScript and inlining CSS. In practice,
however, the length and size of the critical rendering path are usually much greater
[Wag17, Fir16]. The increasing predominance of web applications based on rich
client-side JavaScript frameworks that consume data via API requests extends the
impact of latency beyond the CRP. During navigation and rendering, the latency of
asynchronously fetched resources is crucial to display data quickly and to apply user
interactions without perceptible delay.
3.4.2 Client-Side Caching and Storage
In recent years, it became evident that moving more application logic into the client
also requires persistence options to maintain application state within and across
user sessions. Several client-side storage and caching APIs have been standardized
and implemented in browsers. A comprehensive overview of client-side storage
mechanisms is provided by Camden [Cam16]. In the following, we provide an
overview of storage technologies relevant for this book:
HTTP Browser Cache. The browser cache [IET15, Fie+99] works similarly to
other HTTP caches, except that it is exclusive to one user. Its main advantage
is that it transparently operates on any HTTP resource. On the other hand,
however, it cannot be programmatically controlled by the JavaScript application
and operates purely expiration-based. Also, cached data can be evicted at any
time, making it impossible to build application logic on the presence of cached
client-side data.
Cookies. Through the HTTP Cookie header, the server can store strings in the
client. Cookie values are automatically attached to each client request [IET15].
Cookies are very limited in control, size, and flexibility and therefore mainly
used for session state management and user tracking. Cookies frequently cause
performance problems as they can only be accessed synchronously and have to
be transferred with each request.
Web SQL. The goal of the WebSQL specification is to provide SQL-based access
to an embedded relational database (e.g., SQLite) [Cam16]. However, as browser
support is lacking, the development of WebSQL has mostly ceased in favor of the
IndexedDB API.
IndexedDB. The IndexedDB specification [AA17] describes a low-level database
API that offers key-value storage, cursors over indices, and transactions. Despite
its lack of a declarative query language, it can be used to implement an embedded
database system in the client. In contrast to the browser cache, storage is
persistent and controlled via an API. However, this implies that custom cache
coherence or replication is required if IndexedDB is used to store a subset of the
backend database.
Service Worker Cache. Service Workers are background processes that can
intercept, modify, and process HTTP requests and responses of a website
[Ama16]. This allows implementing advanced network behavior such as an
offline mode that continues serving responses even though the user lacks a mobile
network connection. The Service Worker cache is a persistent, asynchronous
map storing pairs of HTTP requests and responses. The default cache coherence
mechanism is to store data indefinitely. However, the JavaScript code of the Service Worker can modify this behavior and implement custom cache maintenance
strategies.
Local and Session Storage. The DOM storage APIs [Cam16] allow persisting
key-value pairs locally for a single session (SessionStorage) or across
sessions (LocalStorage). The API only allows blocking get and set
operations on keys and values. Due to its synchronous nature, the API is not
accessible in background JavaScript processes (e.g., Service Workers).
The central problem of client-side storage and caching abstractions is that they
have to be manually controlled by the application. Apart from first attempts, there is
moreover no coupling between the query languages and persistence APIs employed
in the client and those of the DBaaS/BaaS [ALS10, Go+15, Lak+16]. This forces application
developers to duplicate data-centric business logic and maintain cache consistency
manually. The error-prone and complex task of manual cache maintenance prevents
many applications from incorporating client-side storage into the application’s data
management. Client-side caching and storage standards potentially enable serving
web applications in the absence of network connectivity (offline mode). However,
this also requires new mechanisms for cache coherence of reads and query results
as well as synchronization and concurrency control for updates made while being
offline.
3.5 Challenges and Opportunities: Using Web Caching for
Cloud Data Management
The network and the protocols involved in communication with cloud services are
the fundamental cause of high latency. In this chapter, we discussed how most
aspects of networking can be optimized, leaving end-to-end latency resulting from
physical distance as the major remaining performance challenge. Even though
REST and HTTP are widely used for DBaaS, BaaS, and NoSQL systems, their
caching model is not easy to combine with the requirements of data management:
Expiration-based caches interfere with the consistency guarantees of database
systems, whereas invalidation-based caching requires non-trivial change detection
for dynamic data. Frontend performance is defined by the highly latency-dependent
critical rendering path. Modern browsers potentially allow latency reduction for
data-centric API requests through storage abstractions. However, cache coherence
needs to be solved in order to avoid sacrificing consistency for reduced latency.
While latency reduction through HTTP caching is a mostly open problem for
cloud data management, there are approaches (e.g., the Cache Sketch [Ges+15])
that reconcile expiration-based caching with data management in a transparent
fashion while preserving invariants such as consistency guarantees and correctness
of query results. At the time of writing, however, Baqend6 is the only commercial
implementation of such an approach.
In the upcoming chapters, we will address the latency, scalability, and consistency challenges across the data management stack to achieve better performance
for a wide spectrum of web and mobile applications.
References
[AA17] Ali Alabbas and Joshua Bell. Indexed Database API 2.0. https://w3c.github.io/
IndexedDB/. (Accessed on 07/14/2017). 2017.
[All10] Subbu Allamaraju. Restful web services cookbook: solutions for improving scalability
and simplicity. “O’Reilly Media, Inc.”, 2010.
[Alo+04] Gustavo Alonso et al. “Web services”. In: Web Services. Springer, 2004, pp. 123–149.
[ALS10] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB - The Definitive Guide:
Time to Relax. O’Reilly, 2010. ISBN: 978-0-596-15589-6. URL: http://www.oreilly.de/
catalog/9780596155896/index.html.
[Amaa] Amazon Simple Storage Service (S3). https://aws.amazon.com/documentation/s3/.
(Accessed on 07/28/2017). 2017. URL: https://aws.amazon.com/documentation/s3/ (visited
on 02/18/2017).
[Ama16] Sean Amarasinghe. Service worker development cookbook. English. OCLC:
958120287. 2016. ISBN: 978-1-78646-952-6. URL: http://lib.myilibrary.com?id=
952152 (visited on 01/28/2017).
6 Baqend: https://www.baqend.com.
[Amu17] Mike Amundsen. RESTful Web Clients: Enabling Reuse Through Hypermedia. 1st ed.
O’Reilly Media, Feb. 2017. ISBN: 9781491921906. URL: http://amazon.com/o/ASIN/
1491921900/.
[Ang] Angular Framework. https://angular.io/. (Accessed on 05/26/2017). 2017.
[Arc] HTTP Archive. http://httparchive.org/trends.php. Accessed: 2018-07-14. 2018.
[Bhi+02] Manish Bhide et al. “Adaptive push-pull: Disseminating dynamic web data”. In: IEEE
Transactions on Computers 51.6 (2002), pp. 652–668.
[Bie+15] Christopher D Bienko et al. IBM Cloudant: Database as a Service Advanced Topics.
IBM Redbooks, 2015.
[Bla+10] Roi Blanco et al. “Caching search engine results over incremental indices”. In:
Proceedings of the 33rd international ACM SIGIR conference on Research and
development in information retrieval. ACM, 2010, pp. 82–89. URL: http://dl.acm.org/
citation.cfm?id=1835466 (visited on 04/24/2015).
[Bor+04] C. Bornhövd et al. “Adaptive database caching with DBCache”. In: Data Engineering 27.2 (2004), pp. 11–18. URL: http://sipew.org/staff/bornhoevd/IEEEBull’04.pdf
(visited on 06/28/2012).
[BPV08] Rajkumar Buyya, Mukaddim Pathan, and Athena Vakali, eds. Content Delivery
Networks (Lecture Notes in Electrical Engineering). 2008th ed. Springer, Sept. 2008.
ISBN: 9783540778868. URL: http://amazon.com/o/ASIN/3540778861/.
[BR02] Laura Bright and Louiqa Raschid. “Using Latency-Recency Profiles for Data Delivery
on the Web”. In: VLDB 2002, Proceedings of 28th International Conference on Very
Large Data Bases, August 20–23, 2002, Hong Kong, China. Morgan Kaufmann, 2002,
pp. 550–561. URL: http://www.vldb.org/conf/2002/S16P01.pdf.
[Bre+99] Lee Breslau et al. “Web caching and Zipf-like distributions: Evidence and implications”. In: INFOCOM’99. Eighteenth Annual Joint Conference of the IEEE Computer
and Communications Societies. Proceedings. IEEE. Vol. 1. IEEE. IEEE, 1999,
pp. 126–134. URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=749260 (visited on 01/03/2015).
[Bro+13] Nathan Bronson et al. “TAO: Facebook’s Distributed Data Store for the Social Graph.”
In: USENIX Annual Technical Conference. 2013, pp. 49–60. URL: http://dl.frz.ir/
FREE/papers-we-love/datastores/tao-facebook-distributed-datastore.pdf (visited on
09/28/2014).
[Cal+11] Brad Calder et al. “Windows Azure Storage: a highly available cloud storage service
with strong consistency”. In: Proceedings of the Twenty-Third ACM Symposium on
Operating Systems Principles. ACM. ACM, 2011, pp. 143–157. URL: http://dl.acm.
org/citation.cfm?id=2043571 (visited on 04/16/2014).
[Cam16] Raymond Camden. Client-side data storage: keeping it local. First edition. OCLC:
ocn935079139. Beijing: O’Reilly, 2016. ISBN: 978-1-4919-3511-8.
[Can+01b] K. Selçuk Candan et al. “Enabling Dynamic Content Caching for Database-driven Web Sites”. In: SIGMOD. New York, NY, USA: ACM, 2001, pp. 532–
543. ISBN: 1-58113-332-4. DOI: 10.1145/375663.375736. URL: http://doi.acm.org/10.
1145/375663.375736 (visited on 10/04/2014).
[Car13] Josiah L. Carlson. Redis in Action. Greenwich, CT, USA: Manning Publications Co.,
2013. ISBN: 1617290858, 9781617290855.
[Che+14] Yuchung Cheng et al. Tcp fast open. Tech. rep. 2014.
[Chu+13] Jerry Chu et al. “Increasing TCP’s initial window”. In: (2013).
[Dat] Google Cloud Datastore. https://cloud.google.com/datastore/docs/concepts/overview.
(Accessed on 05/20/2017). 2017. URL: https://cloud.google.com/datastore/docs/
concepts/overview (visited on 02/18/2017).
[Dep] Deployd: a toolkit for building realtime APIs. https://github.com/deployd/deployd.
(Accessed on 05/20/2017). 2017. URL: https://github.com/deployd/deployd (visited
on 02/19/2017).
[DFR15b] Akon Dey, Alan Fekete, and Uwe Röhm. “REST+T: Scalable Transactions
over HTTP”. In: IEEE, Mar. 2015, pp. 36–41. ISBN: 978-1-4799-8218-9. DOI:
10.1109/IC2E.2015.11. URL: http://ieeexplore.ieee.org/document/7092896/ (visited
on 11/25/2016).
[Dow98] Troy Bryan Downing. Java RMI: remote method invocation. IDG Books Worldwide,
Inc., 1998.
[DPS13] Erik Dahlman, Stefan Parkvall, and Johan Skold. 4G: LTE/LTE-advanced for mobile
broadband. Academic press, 2013.
[Dyn] DynamoDB. http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/
Introduction.html. (Accessed on 05/20/2017). 2017. URL: http://docs.aws.amazon.com/
amazondynamodb/latest/developerguide/Introduction.html (visited on 01/13/2017).
[Emb] Ember.js Framework. https://www.emberjs.com/. (Accessed on 05/26/2017). 2017.
[ERR11] Mohamed El-Refaey and Bhaskar Prasad Rimal. “Grid, soa and cloud computing:
On-demand computing models”. In: Computational and Data Grids: Principles,
Applications and Design: Principles, Applications and Design (2011), p. 45.
[Fan+00] Li Fan et al. “Summary cache: a scalable wide-area web cache sharing protocol”.
In: IEEE/ACM TON 8.3 (2000), pp. 281–293. URL: http://dl.acm.org/citation.cfm?id=
343572 (visited on 10/04/2014).
[FFM04] Michael J. Freedman, Eric Freudenthal, and David Mazieres. “Democratizing Content
Publication with Coral.” In: NSDI. Vol. 4. 2004, pp. 18–18. URL: https://www.usenix.
org/legacy/events/nsdi04/tech/full_papers/freedman/freedman_html/ (visited on
09/28/2014).
[Fie+99] R. Fielding et al. “RFC 2616: Hypertext Transfer Protocol – HTTP/1.1, 1999”. In:
URL: http://www.rfc.net/rfc2616.html (1999).
[Fie00] R. T Fielding. “Architectural styles and the design of network-based software
architectures”. PhD thesis. Citeseer, 2000.
[Fir16] Maximiliano Firtman. High Performance Mobile Web: Best Practices for Optimizing
Mobile Web Apps. 1st ed. O’Reilly Media, Sept. 2016. ISBN: 9781491912553. URL:
http://amazon.com/o/ASIN/1491912553/.
[Fit04] Brad Fitzpatrick. “Distributed caching with Memcached”. In: Linux journal 2004.124
(2004), p. 5.
[Fre10] Michael J. Freedman. “Experiences with CoralCDN: A Five-Year Operational View”.
In: NSDI. 2010, pp. 95–110. URL: http://static.usenix.org/legacy/events/nsdi10/tech/
full_papers/freedman.pdf (visited on 01/03/2015).
[Gar+08] Charles Garrod et al. “Scalable query result caching for web applications”. In:
Proceedings of the VLDB Endowment 1.1 (2008), pp. 550–561. URL: http://dl.acm.
org/citation.cfm?id=1453917 (visited on 04/24/2015).
[GC89] Cary G. Gray and David R. Cheriton. “Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency”. In: Proceedings of the Twelfth ACM
Symposium on Operating System Principles, SOSP 1989, The Wigwam, Litchfield
Park, Arizona, USA, December 3–6, 1989. Ed. by Gregory R. Andrews. ACM, 1989,
pp. 202–210. DOI: 10.1145/74850.74870.
[Ges+15] Felix Gessert et al. “The Cache Sketch: Revisiting Expiration-based Caching in the
Age of Cloud Data Management”. In: Datenbanksysteme für Business, Technologie
und Web (BTW), 16. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme”. GI, 2015.
[Ges+17] Felix Gessert et al. “Quaestor: Query Web Caching for Database-as-a-Service
Providers”. In: Proceedings of the VLDB Endowment (2017).
[GJP11] K. Gilly, C. Juiz, and R. Puigjaner. “An up-to-date survey in web load balancing”. In:
World Wide Web 14.2 (2011), pp. 105–131. URL: http://www.springerlink.com/index/
P1080033328U8158.pdf (visited on 09/12/2012).
[Go+15] Younghwan Go et al. “Reliable, Consistent, and Efficient Data Sync for Mobile Apps”.
In: Proceedings of the 13th USENIX Conference on File and Storage Technologies,
FAST 2015, Santa Clara, CA, USA, February 16–19, 2015. Ed. by Jiri Schindler and
Erez Zadok. USENIX Association, 2015, pp. 359–372. URL: https://www.usenix.org/
conference/fast15/technical-sessions/presentation/go.
[Gou+02] D. Gourley et al. HTTP: The Definitive Guide. Definitive Guides. O’Reilly
Media, 2002. ISBN: 9781449379582. URL: https://books.google.de/books?id=
qEoOl9bcV_cC.
[Gri13] Ilya Grigorik. High performance browser networking. English. [S.l.]: O’Reilly Media,
2013. ISBN: 1-4493-4476-3 978-1-4493-4476-4. URL: https://books.google.de/books?
id=tf-AAAAQBAJ.
[Hoo] GitHub - hoodiehq/hoodie: A backend for Offline First applications. https://github.
com/hoodiehq/hoodie. (Accessed on 05/25/2017). 2017. URL: https://github.com/
hoodiehq/hoodie (visited on 02/17/2017).
[How+88] John H. Howard et al. “Scale and Performance in a Distributed File System”. In: ACM
Trans. Comput. Syst. 6.1 (1988), pp. 51–81. DOI: 10.1145/35037.35059.
[Kam17] Poul-Henning Kamp. Varnish HTTP Cache. https://varnish-cache.org/. (Accessed on
04/30/2017). 2017. URL: https://varnish-cache.org/ (visited on 01/26/2017).
[KHR02] Dina Katabi, Mark Handley, and Charles E. Rohrs. “Congestion control for high
bandwidth-delay product networks”. In: Proceedings of the ACM SIGCOMM 2002
Conference on Applications, Technologies, Architectures, and Protocols for Computer
Communication, August 19–23, 2002, Pittsburgh, PA, USA. Ed. by Matthew Mathis et
al. ACM, 2002, pp. 89–102. DOI: 10.1145/633025.633035.
[KR01] B. Krishnamurthy and J. Rexford. “Web Protocols and Practice, HTTP/1.1, Networking Protocols, Caching, and Traffic Measurement”. In: Recherche 67 (2001), p. 02.
URL: http://www.lavoisier.fr/livre/notice.asp?id=O3OWRLAROSSOWB (visited on
06/30/2012).
[KR10] James F Kurose and Keith W Ross. Computer networking: a top-down approach.
Vol. 5. Addison-Wesley Reading, 2010.
[KW97] Balachander Krishnamurthy and Craig E. Wills. “Study of Piggyback Cache Validation for Proxy Caches in the World Wide Web”. In: 1st USENIX Symposium on
Internet Technologies and Systems, USITS’97, Monterey, California, USA, December 8–11, 1997. USENIX, 1997. URL: http://www.usenix.org/publications/library/
proceedings/usits97/krishnamurthy.html.
[KW98] Balachander Krishnamurthy and Craig E. Wills. “Piggyback Server Invalidation for
Proxy Cache Coherency”. In: Computer Networks 30.1-7 (1998), pp. 185–193. DOI:
10.1016/S0169-7552(98)00033-6.
[Lab+09] Alexandros Labrinidis et al. “Caching and Materialization for Web Databases”.
In: Foundations and Trends in Databases 2.3 (2009), pp. 169–266. DOI:
10.1561/1900000005
[Lak+16] Sarath Lakshman et al. “Nitro: A Fast, Scalable In-Memory Storage Engine for
NoSQL Global Secondary Index”. In: PVLDB 9.13 (2016), pp. 1413–1424. URL:
http://www.vldb.org/pvldb/vol9/p1413-lakshman.pdf.
[Lwe10] Bernhard Löwenstein. Benchmarking of Middleware Systems: Evaluating and Comparing the Performance and Scalability of XVSM (MozartSpaces), JavaSpaces
(GigaSpaces XAP) and J2EE (JBoss AS). VDM Verlag, 2010.
[Mat+97] Matthew Mathis et al. “The macroscopic behavior of the TCP congestion avoidance
algorithm”. In: ACM SIGCOMM Computer Communication Review 27.3 (1997),
pp. 67–82. URL: http://dl.acm.org/citation.cfm?id=264023 (visited on 09/28/2014).
[Mee12] Patrick Meenan. Speed Index - WebPagetest Documentation. https://sites.google.
com/a/webpagetest.org/docs/using-webpagetest/metrics/speed-index. (Accessed on
07/16/2017). 2012.
[Mog94] Jeffrey C. Mogul. “Recovery in Spritely NFS”. In: Computing Systems 7.2
(1994), pp. 201–262. URL: http://www.usenix.org/publications/compsystems/1994/
spr_mogul.pdf.
[Nag04] S. V. Nagaraj. Web caching and its applications. Vol. 772. Springer, 2004.
URL: http://books.google.de/books?hl=de&lr=&id=UgFhOl2lF0oC&oi=fnd&pg=PR11&dq=web+caching+and+its+applications&ots=X0Ow-cvXMH&sig=eNu7MDyfbGLKMGxwv6MZpZlyo6c (visited on 06/28/2012).
[Nis+13] Rajesh Nishtala et al. “Scaling Memcache at Facebook”. In: NSDI. USENIX Association, 2013, pp. 385–398.
[Ope] Open API Initiative. https://www.openapis.org/. (Accessed on 07/28/2017). 2017.
[Par] Parse Server. http://parseplatform.github.io/docs/parse-server/guide/. (Accessed on
07/28/2017). 2017. URL: http://parseplatform.github.io/docs/parse-server/guide/ (visited on 02/19/2017).
[PB03] Stefan Podlipnig and László Böszörményi. “A survey of Web cache replacement strategies”. In: ACM Comput. Surv. 35.4 (2003), pp. 374–398. DOI:
10.1145/954339.954341.
[PB07] Al-Mukaddim Khan Pathan and Rajkumar Buyya. “A taxonomy and survey of
content delivery networks”. In: Grid Computing and Distributed Systems Laboratory,
University of Melbourne, Technical Report (2007), p. 4. URL: http://cloudbus.org/
reports/CDN-Taxonomy.pdf (visited on 09/28/2014).
[PB08] Mukaddim Pathan and Rajkumar Buyya. “A Taxonomy of CDNs”. English. In:
Content Delivery Networks. Ed. by Rajkumar Buyya, Mukaddim Pathan, and Athena
Vakali. Vol. 9. Lecture Notes Electrical Engineering. Springer Berlin Heidelberg,
2008, pp. 33–77. ISBN: 978-3-540-77886-8. URL: http://dx.doi.org/10.1007/978-3540-77887-5_2.
[Pos] PostgreSQL: Documentation: 9.6: High Availability, Load Balancing, and Replication. https://www.postgresql.org/docs/9.6/static/high-availability.html (Accessed on
07/28/2017). 2017. URL: https://www.postgresql.org/docs/9.6/static/high-availability.
html (visited on 02/04/2017).
[Pos81] Jon Postel. “Transmission control protocol”. In: (1981).
[Rab+03] Michael Rabinovich et al. “Moving Edge-Side Includes to the Real Edge the Clients”. In: 4th USENIX Symposium on Internet Technologies and Systems,
USITS’03, Seattle, Washington, USA, March 26–28, 2003. Ed. by Steven D. Gribble.
USENIX, 2003. URL: http://www.usenix.org/events/usits03/tech/rabinovich.html.
[RAR13] Leonard Richardson, Mike Amundsen, and Sam Ruby. RESTful Web APIs: Services
for a Changing World. “O’Reilly Media, Inc.”, 2013.
[Rea] React - A JavaScript library for building user interfaces. https://facebook.github.io/
react/. (Accessed on 05/26/2017). 2017.
[Ree08] Will Reese. “Nginx: the high-performance web server and reverse proxy”. In: Linux
Journal 2008.173 (2008), p. 2.
[Res17] E. Rescorla. The Transport Layer Security (TLS) Protocol Version 1.3 (Draft). https://
tools.ietf.org/html/draft-ietf-tls-tls13-21. (Accessed on 07/29/2017). 2017.
[RS03] M. Rabinovich and O. Spatscheck. “Web caching and replication”. In: SIGMOD
Record 32.4 (2003), p. 107. URL: http://www.sigmod-org/publications/sigmod.record/
0312/20.WebCachingReplication2.pdf (visited on 06/28/2012).
[Sak17] Kunihiko Sakamoto. Time to First Meaningful Paint: a layout-based approach. https://
docs.google.com/document/d/1BR94tJdZLsin5poeet0XoTW60M0SjvOJQttKTJK8HI/. (Accessed on 07/16/2017). 2017.
[Spa17] Bruce Spang. Building a Fast and Reliable Purging System https://www.fastly.
com/blog/building-fast-and-reliable-purging-system/ (Accessed on 07/30/2017). Feb
2017.
[Tsi+01] Mark Tsimelzon et al. “ESI language specification 1.0”. In: Akamai Technologies, Inc.
Cambridge, MA, USA, Oracle Corporation, Redwood City, CA, USA (2001), pp. 1–0.
[TW11] Andrew S. Tanenbaum and David Wetherall. Computer networks, 5th Edition. Pearson, 2011. ISBN: 0132553171. URL: http://www.worldcat.org/oclc/698581231.
[Usa] Usage Statistics of HTTP/2 for Websites, July 2017. https://w3techs.com/technologies/
details/ce-http2/all/all (Accessed on 07/29/2017). 2017.
[Vak06] Athena Vakali. Web Data Management Practices: Emerging Techniques and Technologies: Emerging Techniques and Technologies. IGI Global, 2006.
[Vue] Vue.js. https://vuejs.org/. (Accessed on 05/26/2017). 2017.
References
55
[VW99] Paul Vixie and Duane Wessels. Hyper Text Caching Protocol (HTCP/0.0). Tech. rep.
1999.
[Wag17] Jeremy Wagner. Web Performance in Action: Building Faster Web Pages.
Manning Publications, 2017. ISBN: 1617293776. URL: https://www.amazon.
com/Web-Performance-Action-Building-Faster/dp/1617293776?SubscriptionId=
0JYN1NVW651KCA56C102&tag=techkie-20&linkCode=xm2&camp=2025&
creative=165953&creativeASIN=1617293776.
[Wan99] J. Wang. “A survey of web caching schemes for the internet”. In: ACM SIGCOMM
Computer Communication Review 29.5 (1999), pp. 36–46. URL: http://dl.acm.org/
citation.cfm?id=505701 (visited on 06/28/2012).
[WDM01] Jörg Widmer, Robert Denda, and Martin Mauve. “A survey on TCP-friendly congestion control”. In: IEEE network 15.3 (2001), pp. 28–37.
[Wes04] Duane Wessels. Squid - the definitive guide: making the most of your internet.
O’Reilly, 2004. ISBN: 978-0-596-00162-9. URL: http://www.oreilly.de/catalog/squid/
index.html.
[Wes97] Duane Wessels. “Application of internet cache protocol (ICP), version 2”. In: (1997).
[WGW+20] Wolfram Wingerath, Felix Gessert, Erik Witt, et al. “Speed Kit: A Polyglot & GDPRCompliant Approach For Caching Personalized Content”. In: 36th IEEE International
Conference on Data Engineering, ICDE 2020, Dallas, Texas, April 20–24, 2020.
2020.
[Wor94] Kurt Jeffery Worrell. “Invalidation in Large Scale Network Object Caches”. In:
(1994).
[WP11] Erik Wilde and Cesare Pautasso. REST: from research to practice. Springer Science
& Business Media, 2011.
[WPR10] Jim Webber, Savas Parastatidis, and Ian Robinson. REST in practice: Hypermedia and
systems architecture. “O’Reilly Media, Inc.”, 2010.
[Xu+14] Yuehai Xu et al. “Characterizing Facebook’s Memcached Workload”. In: IEEE
Internet Computing 18.2 (2014), pp. 41–49.
[IET15] IETF. “RFC 7540 - Hypertext Transfer Protocol Version 2 (HTTP/2)”. In: (2015).
Chapter 4
Systems for Scalable Data Management
Irrespective of the server-side architecture, scalable data management is the primary
challenge for high performance. Business and presentation logic can be designed
to scale by virtue of stateless processing or by offloading the problem of state to
a shared data store. Therefore, the requirements of high availability and elastic
scalability ultimately fall to the database system.
Today, data is produced and consumed at a rapid pace. This has led to novel
approaches for scalable data management subsumed under the term “NoSQL”
database systems to handle the ever-increasing data volume and request loads.
However, the heterogeneity and diversity of the numerous existing systems impede
the well-informed selection of a data store appropriate for a given application
context. In this section, we will provide a high-level overview of the current NoSQL
landscape. In Chap. 8, we will furthermore survey commonly used techniques for
sharding, replication, storage management, and query processing in these systems to
derive a classification scheme for NoSQL databases. A straightforward and abstract
decision model for restricting the choice of appropriate NoSQL systems based on
application requirements concludes the book in Chap. 9.
4.1 NoSQL Database Systems
Traditional relational database management systems (RDBMSs) provide robust
mechanisms to store and query structured data under strong consistency and
transaction guarantees and have reached an unmatched level of reliability, stability,
and support through decades of development. In recent years, however, the amount
of useful data in some application areas has become so vast that it cannot be stored
or processed by traditional database solutions. User-generated content in social
networks and data retrieved from large sensor networks are only two examples
of this phenomenon commonly referred to as Big Data [Lan01]. A class of novel
data storage systems able to cope with the management of Big Data are subsumed
under the term NoSQL databases, many of which offer horizontal scalability and
higher availability than relational databases by sacrificing querying capabilities and
consistency guarantees. These trade-offs are pivotal for service-oriented computing
and “as-a-service” models since any stateful service can only be as scalable and
fault-tolerant as its underlying data store.
Please note that throughout this book, we address Big Data management, i.e.,
database and application techniques for dealing with data at high velocity, volume,
and variety (coined as the “three Vs” [ZS17]). We only cover Big Data analytics
where it directly concerns the design of our low-latency methodology for data
management, and we refer to our tutorials for further background on systems and
approaches for analytics [GR15, GR16, GWR17, WGR+18, WGR19].
There are dozens¹ of NoSQL database systems, and it is hard for practitioners and
researchers to keep track of where they excel, where they fail, or even where they
differ, as implementation details change quickly and feature sets evolve over time.
In this section, we therefore aim to provide an overview of the NoSQL landscape
by discussing employed concepts rather than system specificities and explore the
requirements typically posed to NoSQL database systems, the techniques used to
fulfill these requirements and the trade-offs that have to be made in the process.
Our focus lies on key-value, document, and wide-column stores, since these NoSQL
categories cover the most relevant techniques and design decisions in the space of
scalable data management and are well suited to the context of scalable cloud
data management.
In order to abstract from implementation details of individual NoSQL systems,
high-level classification criteria can be used to group similar data stores into
categories. As shown in Fig. 4.1, we will describe how NoSQL systems can be cat-
Fig. 4.1 The two high-level approaches of categorizing NoSQL systems according to data models
and consistency-availability trade-offs
1 An
extensive list of NoSQL database systems can be found at http://nosql-database.org/.
4.2 Data Models: Key-Value, Wide-Column and Document Stores
59
egorized by their data model (key-value stores, document stores, and wide-column
stores) and the safety-liveness trade-offs in their design (CAP and PACELC).
4.2 Data Models: Key-Value, Wide-Column and Document
Stores
The most commonly employed distinction between NoSQL databases is the way
they store and allow access to data. Each system covered in this overview can be
categorized as either a key-value store, document store, or wide-column store.
4.2.1 Key-Value Stores
A key-value store consists of a set of key-value pairs with unique keys. Due to
this simple structure, it only supports get and put operations. As the nature of the
stored value is transparent to the database, pure key-value stores do not support
operations beyond simple CRUD (Create, Read, Update, Delete). Key-value stores
are therefore often referred to as schemaless [SF12]: any assumptions about the
structure of stored data are implicitly encoded in the application logic (schema-on-read
[Kle17]) and not explicitly defined through a data definition language (schema-on-write).
The obvious advantages of this data model lie in its simplicity. The very simple
abstraction makes it easy to partition and query data so that the database system can
achieve low latency as well as high throughput. However, if an application demands
more complex operations, e.g., range queries, this data model is not powerful
enough. Figure 4.2 illustrates how user account data and settings might be stored in a
key-value store. Since queries more complex than simple lookups are not supported,
data has to be analyzed inefficiently in application code to extract information like
whether cookies are supported or not (cookies: false).
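To make this concrete, the following sketch (illustrative Python, not the API of any particular product) shows a minimal key-value store: the store itself only offers get/put/delete on opaque values, so extracting something like `cookies: false` requires decoding and filtering each value in application code.

```python
import json
from typing import Optional

class KeyValueStore:
    """Minimal in-memory key-value store: values are opaque to the database."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

store = KeyValueStore()
store.put("user:1", json.dumps({"name": "Alice", "cookies": True}).encode())
store.put("user:2", json.dumps({"name": "Bob", "cookies": False}).encode())

# The store cannot look inside values, so the application must fetch every
# candidate key, decode the value, and filter it itself:
no_cookies = [k for k in ("user:1", "user:2")
              if not json.loads(store.get(k))["cookies"]]
print(no_cookies)  # ['user:2']
```

Note that the application even has to know the candidate keys up front; a real key-value store offers no way to enumerate values by their content.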
Fig. 4.2 Key-value stores offer efficient storage and retrieval of arbitrary values
Fig. 4.3 Document stores are aware of the internal structure of the stored entity and thus can
support queries
4.2.2 Document Stores
A document store is a key-value store that restricts values to semi-structured formats
such as JSON documents like the one illustrated in Fig. 4.3. This restriction in
comparison to key-value stores brings great flexibility in accessing the data. It is not
only possible to fetch an entire document by its ID, but also to retrieve only parts
of a document, e.g., the age of a customer, and to execute queries like aggregations,
query-by-example or even full-text search.
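In contrast to the key-value sketch above, a document store can inspect the stored structure. The following toy Python sketch (a hypothetical API, not modeled on any specific system) supports both partial retrieval and query-by-example:

```python
class DocumentStore:
    """Toy document store: values are JSON-like dicts the store can inspect."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, doc):
        self._docs[doc_id] = doc

    def get(self, doc_id, field=None):
        """Fetch a whole document by ID, or just one of its fields."""
        doc = self._docs[doc_id]
        return doc if field is None else doc.get(field)

    def find(self, example):
        """Query-by-example: all documents whose fields match the example."""
        return [d for d in self._docs.values()
                if all(d.get(k) == v for k, v in example.items())]

store = DocumentStore()
store.put("customer:1", {"name": "Alice", "age": 32, "city": "Hamburg"})
store.put("customer:2", {"name": "Bob", "age": 47, "city": "Hamburg"})

print(store.get("customer:1", "age"))              # 32
print(store.find({"city": "Hamburg", "age": 47}))  # [{'name': 'Bob', ...}]
```

Because the store understands the documents, the filtering that a key-value store forces into application code can here be pushed down into the database.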
4.2.3 Wide-Column Stores
Wide-column stores inherit their name from the image that is often used to explain
the underlying data model: a relational table with many sparse columns. Technically,
however, a wide-column store is closer to a distributed multi-level² sorted map: the
first-level keys identify rows which themselves consist of key-value pairs. The
first-level keys are called row keys, the second-level keys are called column keys. This
storage scheme makes tables with arbitrarily many columns feasible because there
is no column key without a corresponding value. Hence, null values can be stored
without any space overhead. The set of all columns is partitioned into so-called
column families to co-locate columns on disk that are usually accessed together.
On disk, wide-column stores do not co-locate all data from each row, but instead,
values of the same column family and from the same row. Hence, an entity (a
row) cannot be retrieved by one single lookup as in a document store but has to be
joined from the columns of all column families. However, this storage layout usually
enables highly efficient data compression and makes retrieving only a portion of an
entity fast. All data is stored in lexicographic order of the keys, so that rows that are
accessed together are physically co-located, given a careful key design. As all rows
are distributed into contiguous ranges (so-called tablets) among different tablet
servers, row scans only involve few servers and thus are very efficient.
² In some systems (e.g., BigTable and HBase), multi-versioning is implemented by adding a
timestamp as a third-level key.
Fig. 4.4 Data in a wide-column store
Bigtable [Cha+08], which pioneered the wide-column model, was specifically
developed to store a large collection of web pages as illustrated in Fig. 4.4. Every
row in the table corresponds to a single web page. The row key is a concatenation
of the URL components in reversed order, and every column key is composed of
the column family name and a column qualifier, separated by a colon. There are two
column families: the “contents” column family with only one column holding the
actual web page and the “anchor” column family holding links to each web page,
each in a separate column. Every cell in the table (i.e., every value accessible by
the combination of row and column key) can be versioned by timestamps or version
numbers. It is important to note that much of the information of an entity lies in the
keys and not only in the values [Cha+08].
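To make the layout tangible, the following Python sketch models the webtable as a nested sorted map; the row keys, `family:qualifier` column keys, and integer timestamps are simplified stand-ins for Bigtable's actual encoding, and nothing here reflects a real client API.

```python
from bisect import bisect_left, bisect_right

# A wide-column store as a multi-level sorted map, after Bigtable's webtable
# example: each cell is addressed by (row key, "family:qualifier", timestamp).
table = {
    "com.cnn.www": {
        "contents:html": {3: "<html>v3</html>", 2: "<html>v2</html>"},
        "anchor:com.example.a": {1: "CNN"},
    },
    "com.cnn.www.sports": {
        "contents:html": {1: "<html>scores</html>"},
    },
}

def read_cell(row_key, column_key):
    """Return the latest version of a cell (highest timestamp)."""
    versions = table[row_key][column_key]
    return versions[max(versions)]

def scan_rows(start, end):
    """Row scan over a contiguous, lexicographically sorted key range."""
    keys = sorted(table)  # rows are kept in sorted key order
    return keys[bisect_left(keys, start):bisect_right(keys, end)]

print(read_cell("com.cnn.www", "contents:html"))  # '<html>v3</html>'
print(scan_rows("com.cnn", "com.cnn~"))           # both cnn rows
```

The reversed-domain row keys make both CNN pages lexicographic neighbors, so the range scan touches one contiguous key range, which is exactly the co-location property that careful key design buys in a real wide-column store.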
4.3 Pivotal Trade-Offs: Latency vs. Consistency vs.
Availability
Another defining property of a database apart from how data is stored and how
it can be accessed is the level of consistency that is provided. Some databases
are built to guarantee strong consistency and serializability (ACID³), while others
favor availability (BASE⁴). This trade-off is inherent to every distributed database
system and the huge number of different NoSQL systems shows that there is a wide
spectrum between the two paradigms. In the following, we explain the two theorems
CAP and PACELC according to which database systems can be categorized by their
respective positions in this spectrum.
³ ACID [HR83]: Atomicity, Consistency, Isolation, Durability.
⁴ BASE [Pri08]: Basically Available, Soft-state, Eventually consistent.
4.3.1 CAP
Like the famous FLP Theorem⁵ [FLP85], the CAP Theorem, presented by Eric
Brewer at PODC 2000 [Bre00] and later proven by Gilbert and Lynch [GL02],
is one of the most influential impossibility results in the field of distributed
computing. It places an upper bound on what can be accomplished by a distributed
system. Specifically, it states that a sequentially consistent read/write register⁶ that
eventually responds to every request, cannot be realized in an asynchronous system
that is prone to network partitions. In other words, the register can guarantee at most
two of the following three properties at the same time:
• Consistency (C). Reads and writes are always executed atomically and are
strictly consistent (linearizable [HW90]). Put differently, all clients have the same
view on the data at all times.
• Availability (A). Every non-failing node in the system can always accept read
and write requests from clients and will eventually return with a meaningful
response, i.e., not with an error message.
• Partition-tolerance (P). The system upholds the previously stated consistency
guarantees and availability in the presence of message loss between the
nodes or partial system failure.
Brewer argues that a system can be both available and consistent in normal operation, but in the presence of a network partition, this is not possible: if the system
continues to work in spite of the partition, there is some non-failing node that has
lost contact to the other nodes and thus has to decide to either continue processing
client requests to preserve availability (AP, eventually consistent systems) or to
reject client requests in order to uphold consistency guarantees (CP). The first option
violates consistency because it might lead to stale reads and conflicting writes,
while the second option obviously sacrifices availability. There are also systems
that usually are available and consistent but fail completely when there is a partition
(CA), for example, single-node systems. It has been shown that the CAP-theorem
holds for any consistency property that is at least as strong as causal consistency,
which also includes any recency bounds on the permissible staleness of
data (Δ-atomicity) [MAD+11]. Serializability as the correctness criterion of transactional
isolation does not require strong consistency. However, similar to consistency,
serializability cannot be achieved under network partitions either [DGMS85].
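The AP-versus-CP decision can be made concrete with a toy model (illustrative Python, not any real replication protocol): during a partition, a cut-off replica either keeps serving possibly stale reads (AP) or rejects them to preserve consistency (CP).

```python
class Unavailable(Exception):
    """Raised when a CP system rejects a request to preserve consistency."""

class ReplicatedRegister:
    """A toy two-replica register illustrating the CAP trade-off."""

    def __init__(self, mode: str):
        assert mode in ("AP", "CP")
        self.mode = mode
        self.values = [0, 0]      # one copy of the register per replica
        self.partitioned = False  # when True, replica 1 is cut off

    def write(self, value: int) -> None:
        self.values[0] = value    # replica 0 accepts the write
        if not self.partitioned:
            self.values[1] = value  # replication reaches replica 1

    def read(self, replica: int) -> int:
        if self.partitioned and replica == 1:
            if self.mode == "CP":
                # consistent but unavailable: refuse to answer
                raise Unavailable("replica cut off; read rejected")
            # AP: stay available and risk returning stale data
        return self.values[replica]

reg = ReplicatedRegister("AP")
reg.write(1)
reg.partitioned = True  # network partition occurs
reg.write(2)            # only replica 0 sees the new value
print(reg.read(1))      # 1 -- stale read: the AP system stays available
```

Running the same scenario in "CP" mode raises `Unavailable` on the cut-off replica instead of returning the stale value, which is precisely the choice the theorem forces.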
The classification of NoSQL systems as either AP, CP or CA vaguely reflects
the individual systems’ capabilities and hence is widely accepted as a means for
high-level comparisons. However, it is important to note that the CAP Theorem
actually does not state anything about normal operation; it merely expresses whether
a system favors availability or consistency in the face of a network partition.
⁵ The FLP theorem states that in a distributed system with asynchronous message delivery, no
algorithm can guarantee to reach a consensus between participating nodes if one or more of them
can fail by stopping.
⁶ A read/write register is a data structure with only two operations: setting a specific value (set)
and returning the latest value that was set (get).
In contrast to the FLP-Theorem, the CAP theorem assumes a failure model that
allows arbitrary messages to be dropped, reordered or delayed indefinitely. Under
the weaker assumption of reliable communication channels (i.e., messages always
arrive but asynchronously and possibly reordered) a CAP-system is in fact possible
using the Attiya, Bar-Noy, Dolev algorithm [ABN+95], as long as a majority of
nodes are up.⁷
4.3.2 PACELC
The shortcomings of the CAP Theorem were addressed by Abadi [Aba12] who
points out that the CAP Theorem fails to capture the trade-off between latency and
consistency during normal operation, even though it has proven to be much more
influential on the design of distributed systems than the availability-consistency
trade-off in failure scenarios. He formulates PACELC which unifies both trade-offs
and thus portrays the design space of distributed systems more accurately. From
PACELC, we learn that in case of a Partition, there is an Availability-Consistency
trade-off; Else, i.e., in normal operation, there is a Latency-Consistency trade-off.
This classification offers two possible choices for the partition scenario (A/C)
and also two for normal operation (L/C) and thus appears more fine-grained than
the CAP classification. However, many systems cannot be assigned exclusively to
one single PACELC class and one of the four PACELC classes, namely PC/EL, can
hardly be assigned to any system.
In summary, NoSQL database systems support applications in achieving horizontal scalability, high availability and backend performance through differentiated
trade-offs in functionality and consistency.
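To illustrate, the following Python sketch encodes the PACELC classes that Abadi himself assigns to some representative systems in [Aba12]; the assignments simplify reality, since many of these systems are configurable along exactly these trade-offs.

```python
# PACELC classes of representative systems, following Abadi's examples
# [Aba12]; real systems are tunable, so treat the table as a sketch.
PACELC = {
    "Dynamo":    ("PA", "EL"),  # available under partitions, low latency else
    "Cassandra": ("PA", "EL"),
    "Riak":      ("PA", "EL"),
    "MongoDB":   ("PA", "EC"),
    "PNUTS":     ("PC", "EL"),
    "BigTable":  ("PC", "EC"),  # consistent in both failure and normal cases
    "HBase":     ("PC", "EC"),
}

def classify(partition_choice, else_choice):
    """All systems in a given PACELC class, sorted by name."""
    return sorted(name for name, cls in PACELC.items()
                  if cls == (partition_choice, else_choice))

print(classify("PA", "EL"))  # ['Cassandra', 'Dynamo', 'Riak']
print(classify("PC", "EL"))  # ['PNUTS'] -- the rarely populated PC/EL class
```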
4.4 Relaxed Consistency Models
CAP and PACELC motivate that there is a broad spectrum of choices regarding
consistency guarantees and that the strongest guarantees are irreconcilable with high
availability. In the following, we examine different consistency models that fulfill
two requirements needed in modern application development. First, the models
must exhibit sufficient power to precisely express latency-consistency trade-offs
introduced by caching and replication. Second, the consistency models must have
the simplicity to allow easy reasoning about application behavior for developers and
system architects.
⁷ Therefore, consensus as used for coordination in many NoSQL systems either natively [Bak+11]
or through coordination services like Chubby and Zookeeper [Hun+10] is considered a “harder”
problem than strong consistency, as it cannot even be guaranteed in a system with reliable channels
[FLP85].
As summarized in Fig. 4.5, NoSQL systems exhibit various relaxed consistency
guarantees that are usually a consequence of replication and caching. Eventual
consistency is a commonly used term to distinguish between strongly consistent
(linearizable) systems and systems with relaxed guarantees. Eventual consistency
is slightly stronger than weak consistency, as it demands that in the absence of
failures, the system converges to a consistent state. The problem with eventual
consistency is that it purely represents a liveness guarantee, i.e., it asserts that
some property is eventually reached [Lyn96]. However, it lacks a safety guarantee:
eventual consistency does not prescribe which state the database converges to
[Bai15, p. 20]. For example, the database could eventually converge to a null value
for every data item and would still be eventually consistent. For this reason, more
specific relaxed consistency models provide a framework for reasoning about safety
guarantees that are weaker than strong, immediate consistency.
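The liveness-only nature of eventual consistency can be seen in a toy anti-entropy sketch (illustrative Python; last-writer-wins via timestamps is just one possible merge rule): the replicas provably converge, but nothing prevents them from converging to a null value.

```python
def anti_entropy(replicas):
    """One anti-entropy round: merge replica states by last-writer-wins.
    Each state maps key -> (timestamp, value). Eventual consistency only
    promises that replicas converge -- not *which* value they converge to."""
    merged = {}
    for state in replicas:
        for key, (ts, value) in state.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return [dict(merged) for _ in replicas]

r1 = {"x": (1, "a")}
r2 = {"x": (2, None)}  # a later write that happens to be a null value
r1, r2 = anti_entropy([r1, r2])
print(r1 == r2, r1["x"])  # True (2, None) -- converged, but to null
```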
The idea of relaxing correctness guarantees is widespread in the database
world. Even in single-node systems, providing ACID and in particular serializability incurs performance penalties through limited concurrency and contention,
especially on multi-core hardware [Gra+76]. As a consequence, weak isolation
models relax the permissible transaction schedules by allowing certain concurrency
anomalies that are not present under serializability. Bailis et al. [Bai+13b] surveyed
18 representative systems claiming to provide ACID or “NewSQL”⁸ guarantees. Of
these systems, only three provided serializability by default, and eight did not offer
serializable isolation at all.
Fig. 4.5 An overview of selected consistency models. Arrows indicate which models are subsumed by a stronger model
4.4.1 Strong Consistency Models
The strongest consistency guarantee in a concurrent system is linearizability (see
Definition 4.1) introduced by Herlihy and Wing [HW90]. A linearizable system
behaves analogously to a single-node system, i.e., each read and write appears to
be applied at one defined point in time between invocation and response. While
linearizability is the gold standard for correctness, it is not only subject to the CAP
theorem, but also hard to implement at scale [Lee+15, Ajo+15, Bal+15, DGMS85,
Kra+13, Ter+13, BK13, BT11, Wad+11].
Definition 4.1 An execution satisfies linearizability, if all operations are totally
ordered by their arrival time. Any read with an invocation time larger than the
response time of a preceding write is able to observe its effects. Concurrent
operations must guarantee sequential consistency, i.e., overlapping write operations
become visible to all reads in a defined global order.
Sequential consistency (see Definition 4.2) is a frequently used model in
operating system and hardware design that is slightly weaker than linearizability.
It does not guarantee any recency constraints, but it ensures that writes become
visible for each client in the same order. So in contrast to linearizability, the global
ordering of operations is not required to respect real-time ordering, only the local
real-time ordering for each client is preserved.
Definition 4.2 An execution satisfies sequential consistency, if there is a global
order of read and write operations that is consistent with the local order in which
they were submitted by each client.
Consistency in replicated systems is sometimes confused with consistency in
ACID transactions. With respect to ACID, consistency implies that no integrity
constraints are violated, e.g., foreign key constraints. In distributed, replicated systems, consistency is an ordering guarantee for reads and writes that are potentially
executed concurrently and on different copies of the data. The main correctness
criterion for transactional isolation is serializability, which does not require strong
consistency. If conflict serializability is combined with strong consistency, it is
referred to as strict (or strong) serializability (e.g., in Spanner [Coo13]) or commit
order-preserving conflict serializability (COCSR) [WV02]. Just as linearizability,
serializability is also provably irreconcilable with high availability [Bai+13c].
⁸ The term NewSQL was coined by relational database vendors seeking to provide similar
scalability and performance as NoSQL databases while maintaining well-known abstractions such
as SQL as a query language and ACID guarantees [Gro+13]. This is achieved by introducing
trade-offs that are mostly similar to those of NoSQL databases. Examples are H-Store [Kal+08],
VoltDB [SW13], Clustrix [Clu], NuoDB [Nuo], and Calvin [Tho+12], which are discussed in
Chap. 6.
4.4.2 Staleness-Based Consistency Models
To increase efficiency, staleness-based models allow stale reads, i.e., returning
outdated data. The two common measures for quantifying staleness are (wall-clock)
time and object versions. Version-based staleness is captured by k-atomicity (see
Definition 4.3) [AAB05], which bounds staleness by only allowing reads to return a
value written by one of the k preceding updates. Thus, k-atomicity with k = 1 is
equivalent to linearizability.
Definition 4.3 An execution satisfies k-atomicity, if any read returns one of the
versions written by the k preceding, completed writes that must have a global order
that is consistent with real-time order.
Δ-atomicity (see Definition 4.4) introduced by Golab et al. [GLS11] expresses
a time-based recency guarantee. Intuitively, Δ is the upper bound on staleness
observed for any read in the system, i.e., it never happens that the application reads
data that has been stale for longer than Δ time units.
Definition 4.4 An execution satisfies Δ-atomicity, if any read returns either the
latest preceding write or the value of a write that returned at most Δ time units ago.
Δ-atomicity is a variant of the influential atomic semantics definition introduced
by Lamport in the context of inter-process communication [Lam86b, Lam86a].
Atomicity and linearizability are equivalent [VV16], i.e., they demand that there is
a logical point of linearization between invocation and response for each operation
at which it appears to be applied instantaneously [HW90]. An execution is
Δ-atomic, if decreasing the start time of each read operation by Δ produces an
atomic execution. Lamport also introduced two relaxed properties of regular and
safe semantics that are still often used in the literature. In the absence of a concurrent
write, regular and safe reads behave exactly like atomic reads. However, during
concurrent writes, safe reads are allowed to return arbitrary values.⁹ A read under
regular semantics returns either the latest completed write or the result of any
concurrent write. The extension of safety and regularity to Δ-safety, Δ-regularity,
k-safety, and k-regularity is straightforward [AAB05, GLS11, Bai+14b].
Other time-based staleness models from the literature are very similar to
Δ-atomicity. Delta consistency by Singla et al. [SRH97], timed consistency by
Torres-Rojas et al. [TAR99], and bounded staleness by Mahajan et al. [Mah+11]
all express that a write should become visible before a defined maximum delay.
⁹ The usefulness of this property has been criticized for database systems, as no typical database
would return values that have never been written, even under concurrent writes [Ber14].
Fig. 4.6 An example execution of interleaved reads and writes from three clients that yields
different read results depending on the consistency model. Brackets indicate the time between
invocation and response of an operation
Δ-atomicity is hard to measure experimentally due to its dependency on a global
time. Golab et al. [Gol+14] proposed Γ-atomicity as a closely related alternative that
is easier to capture in benchmarks. The central difference is that the Γ parameter also
allows writes to be reordered with a tolerance of Γ time units, whereas Δ-atomicity
only considers earlier starting points for reads, while maintaining the order of writes.
With NoSQLMark, we proposed an experimental methodology to measure lower
and upper staleness bounds [Win+15].
For illustration of these models, please consider the example execution in
Fig. 4.6. The result x of the read operation performed by client C3 depends on the
consistency model:
• With atomicity (including k = 1 and Δ = 0) or linearizability, x can be either
2 or 3. x cannot be 4 since the later read of 3 by client C2 would then violate
linearizability.
• Under sequential consistency semantics, x can be 0 (the initial value), 1, 2, 3,
or 4. As C3 only performs a read, no local order has to be maintained. It can be
serialized to the other clients’ operations in any order.
• Given regular semantics, x can be either 2, 3, or 4.
• Under safe semantics, x can be any value.
• For Δ-atomicity with Δ = 1, x can be 2 or 3. With Δ = 2, x can be 1, 2, or 3: if
the beginning of the read were stretched by Δ = 2 time units to begin at time 1,
then 1, 2, and 3 would be reads satisfying atomicity.
• For k-atomicity with k = 2, x can be 1, 2, or 3: compared to atomicity, a lag of
one older object version is allowed.
Δ-atomicity and k-atomicity can be extended to the probabilistic guarantees
(Δ, p)-atomicity and (k, p)-atomicity (see Definition 4.5) [Bai+14b]. This allows
expressing the average time- or version-based lag as a distribution. For consistency
benchmarks and simulations, these values are preferable, as they express more
details than Δ-atomicity and k-atomicity, which are just bounded by the maximum
encountered values [BWT17, Ber14, Bai+14b].
Definition 4.5 An execution satisfies (Δ, p)-atomicity, if reads are Δ-atomic with
probability p. Similarly, an execution satisfies (k, p)-atomicity, if reads are
k-atomic with probability p.
4.4.3 Session-Based Consistency Models
Data-centric consistency models like linearizability and Δ-atomicity describe consistency from the provider’s perspective, i.e., in terms of synchronization schemes
to provide certain guarantees. Client-centric or session-based models take the
perspective of clients interacting with the database and describe guarantees an
application expects within a session.
Monotonic writes consistency (see Definition 4.6) guarantees that updates from
a client do not get overwritten or reordered. Systems that lack this guarantee make
it hard to reason about how updates behave, as they can be seen by other clients in a
different order [Vog09]. For example, in a social network without monotonic write
consistency, posts by a user could be observed in a different, potentially nonsensical
order by other users.
Definition 4.6 A session satisfies monotonic writes consistency, if the order of all
writes from that session is maintained for reads.
Monotonic reads consistency (see Definition 4.7) guarantees that if a client has
read version n of an object, it will later only see versions ≥ n [TS07]. For example,
on a content website, this would prevent a user from first seeing a revised edition of
an article and then, upon a later return to the page, reading the unrevised article.
Definition 4.7 A session satisfies monotonic reads consistency, if reads return
versions in a monotonically increasing order.
With read your writes consistency (see Definition 4.8), clients are able to observe
their own interactions. For example, in a web application with user-generated
content, a user could reload the page and still see the update they applied.
Definition 4.8 A session satisfies read your writes consistency, if reads return a
version that is equal to or higher than the latest version written in that session.
Combining the above three session guarantees yields the PRAM consistency
level (see Definition 4.9) [LS88a]. It prescribes that all clients observe the writes
of each process in the order in which they were issued, i.e., as if the writes passed
through a pipeline. However, in contrast to sequential consistency, there is no global
order across the writes of different processes.
Definition 4.9 If monotonic writes consistency, monotonic reads consistency, and
read your writes consistency are guaranteed, pipelined random access memory
(PRAM) consistency is satisfied.
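To make the client-centric perspective concrete, the following sketch (our own minimal model of a versioned, replicated store, not any product’s API) enforces read your writes and monotonic reads on the client by remembering, per key, the lowest version the session may still observe:

```python
class Session:
    """Client-side enforcement of monotonic reads and read your writes
    against replicas modeled as dicts mapping key -> (version, value)."""

    def __init__(self):
        self.min_version = {}  # per key: lowest version this session may see

    def write(self, replica, key, version, value):
        replica[key] = (version, value)
        # read your writes: later reads must return at least this version
        self.min_version[key] = max(self.min_version.get(key, 0), version)

    def read(self, replicas, key):
        for replica in replicas:
            version, value = replica.get(key, (0, None))
            if version >= self.min_version.get(key, 0):
                # monotonic reads: never accept an older version later on
                self.min_version[key] = version
                return value
        raise RuntimeError("no sufficiently fresh replica reachable")
```

A session that wrote version 2 of a key will skip a lagging replica that still serves version 1. Note that monotonic writes cannot be retrofitted purely on the read path: it additionally requires the store to apply the session’s writes in order.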
With writes follow reads consistency (see Definition 4.10), applications get the
guarantee that their writes will always be accompanied by the relevant information
that might have influenced the write. For example, writes follow reads (also called
session causality) prevents the anomaly of a user responding to a previous post or
comment on a website where other users would observe the response without seeing
the original post it is based on.
Definition 4.10 A session satisfies writes follow reads consistency, if its writes
are ordered after any other writes that were observed by previous reads in the
session.
Causal consistency (see Definition 4.11) [Ady99, Bai+13c] combines the previous session guarantees. It is based on the concept of potential causality introduced
through Lamport’s happened-before relation in the context of message passing
[Lam78]. An operation a causally depends on an operation b, if [HA90]:
1. a and b were issued by the same client and the database received b before a,
2. a is a read that observed the write b, or
3. a and b are connected transitively through condition 1. and/or 2.
In distributed systems, causality is often tracked using vector clocks [Fid87].
Causal consistency can be implemented through a middleware or directly in the
client by tracking causal dependencies and only revealing updates when their
causal dependencies are visible, too [Bai+13a, Ber+13]. Causal consistency is the
strongest guarantee that can be achieved with high availability in the CAP theorem’s
system model of unreliable channels and asynchronous messaging [MAD+11]. The
reason for causal consistency being compatible with high availability is that causal
consistency does not require convergence of replicas and does not imply staleness
bounds [GH02]. Replicas can be in completely different states, as long as they only
return writes where causal dependencies are met.
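The dependency tracking mentioned above can be sketched with vector clocks. The code below is a textbook-style simplification (not the exact protocol of COPS or of bolt-on causal consistency): it checks the happened-before relation and the standard causal delivery condition under which a replica may reveal a remote update.

```python
def happened_before(a, b):
    """True iff the event with vector clock a causally precedes
    the event with vector clock b (Lamport/Fidge partial order)."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def causally_ready(update_clock, local_clock, origin):
    """A replica may reveal an update from replica `origin` once it is
    the next update from that origin and every update the writer had
    already observed is locally visible, too."""
    return update_clock[origin] == local_clock[origin] + 1 and all(
        update_clock[i] <= local_clock[i]
        for i in range(len(update_clock)) if i != origin
    )
```

For example, happened_before([1, 0], [1, 1]) holds, while [1, 0] and [0, 1] are concurrent; an update with clock [1, 1] written at replica 1 is withheld from a replica whose local clock is [0, 0] until the update [1, 0] it depends on has arrived.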
Bailis et al. [Bai+13a] argued that potential causality leads to a high fan-out of
potentially relevant data. Instead, application-defined causality can help to minimize
the actual dependencies. In practice, however, potential causality can be determined
automatically through dependency tracking (e.g., in COPS [Llo+11]), while explicit
causality forces application developers to declare dependencies.
Causal consistency can be combined with a timing constraint demanding that the
global ordering respects causal consistency with tolerance Δ for each read, yielding
a model called timed causal consistency [TM05]. This model is weaker than Δ-atomicity: timed causal consistency with Δ = 0 yields causal consistency, while
Δ-atomicity with Δ = 0 yields linearizability.
Definition 4.11 If both PRAM and writes follow reads are guaranteed, causal
consistency is satisfied.
Besides the discussed consistency models, many different deviations have been
proposed and implemented in the literature. Viotti and Vukolic [VV16] give a
comprehensive survey and formal definitions of consistency models. In particular,
they review the overlapping definitions used in different lines of work across the
distributed systems, operating systems, and database research community.
While strong guarantees are a sensible default for application developers,
consistency guarantees are often relaxed in practice to shift the trade-off towards
non-functional availability and performance requirements.
4.5 Offloading Complexity to the Cloud: Database- and
Backend-as-a-Service
Cloud data management is the research field tackling the design, implementation,
evaluation and application implications of database systems in cloud environments
[GR15, GR16, GWR17, WGR+18, WGR19, WGW+20]. We group cloud data
management systems into two categories: Database-as-a-Service (DBaaS) and
Backend-as-a-Service (BaaS). In the DBaaS model, only data management is
covered. Therefore, application logic in a two- and three-tier architecture has to
employ an additional IaaS or PaaS cloud. BaaS combines a DBaaS with custom
application logic and standard APIs for web and app development. BaaS is a
form of serverless computing, an architectural approach that describes applications
which mostly rely on cloud services for both application logic and storage [Rob16].
Besides the BaaS model, serverless architectures can also make use of Function-as-a-Service (FaaS) providers, which offer scalable and stateless execution of business
logic functions in a highly elastic environment (e.g., AWS Lambda and Google
Cloud Functions). BaaS combines the concepts of DBaaS with a FaaS execution
layer for business logic.
4.5.1 Database-as-a-Service
Hacigumus et al. [HIM02] introduced DBaaS as an approach to run databases
without acquiring hardware or software. As the landscape of DBaaS systems
has become a highly heterogeneous ecosystem, we propose a two-dimensional
classification as shown in Fig. 4.7. The first dimension is the data model ranging
from structured relational systems over semi-structured or schema-free data to
completely unstructured data. The second dimension describes the deployment
model.
Cloud-deployed databases use an IaaS or PaaS cloud to provision an operating
system and the database software as an opaque application. Cloud providers usually
maintain a repository of pre-built machine images containing RDBMSs, NoSQL
databases, or analytics platforms that can be deployed as a virtual machine (VM)
[Bar+03]. While cloud-deployed systems allow for a high degree of customization,
Fig. 4.7 Classes of cloud databases and DBaaS systems according to their data model and
deployment model
maintenance (e.g., operating system and database updates), as well as operational
duties (e.g., scaling in and out) have to be implemented or performed manually.
In managed cloud databases, the service provider is responsible for configuration, scaling, provisioning, monitoring, backup, privacy, and access control
[Cur+11a]. Many commercial DBaaS providers offer standard database systems
(e.g., MongoDB, Redis, and MySQL) as a managed service. For example, MongoDB
Atlas provides a managed NoSQL database [Mon], Amazon Elastic MapReduce
[Amab] is an Analytics-as-a-Service based on managed Hadoop clusters, and Azure
SQL Server offers a managed RDBMS [Muk+11].
DBaaS providers can also specifically develop a proprietary database or
cloud infrastructure to achieve scalability and efficiency goals that are harder to
implement with standard database systems. A proprietary architecture enables co-design of the database or analytics system with the underlying cloud infrastructure.
For example, Amazon DynamoDB provides a large-scale, multi-tenant NoSQL
database loosely based on the Dynamo architecture [Dyn], and Google provides
machine learning (ML) APIs for a variety of classification and clustering tasks
[Goob].
We refer to Chap. 7 for a discussion of DBaaS deployments in the context of
polyglot persistence and to Lehner and Sattler [LS13] and Zhao et al. [Zha+14] for
a more comprehensive overview on DBaaS research.
Fig. 4.8 Architecture and usage of a Backend-as-a-Service
4.5.2 Backend-as-a-Service
Many data access and application patterns are very similar across different web
and mobile applications and can therefore be standardized. This was recognized by
the industry and led to BaaS systems that integrate DBaaS with application logic
and predefined building blocks, e.g., for push notifications, user login, static file
delivery, etc. BaaS is a rather recent trend and, similar to early cloud computing
and Big Data processing, progress is currently driven by industry projects, while
structured research has yet to be established [Use, Par, Dep].
Figure 4.8 gives an overview of a generic BaaS architecture as similarly found
in commercial services (e.g., Azure Mobile Services, Firebase, Kinvey, and Baqend
[Baq]) as well as open-source projects (e.g., Meteor [HS16], Deployd [Dep], Hoodie
[Hoo], Parse Server [Par], BaasBox [Bas], and Apache UserGrid [Use]).
The BaaS cloud infrastructure consists of three central components. The DBaaS
component is responsible for data storage and retrieval. Its abstraction level can
range from structured relational data over semi-structured JSON to opaque files. The
FaaS component is concerned with the execution of server-side business logic,
for example, to integrate third-party services and perform data validation. It can
either be invoked as an explicit API or be triggered by DBaaS operations. The
standard API component offers common application functionality in a convention-over-configuration style, i.e., it provides defaults for tasks such as user login, push
notifications, and messaging that are exposed for each tenant individually. The cloud
infrastructure is orchestrated by the BaaS provider to ensure isolated multi-tenancy,
scalability, availability, and monitoring.
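A minimal sketch of how the FaaS component’s triggers can work (our own toy dispatcher; hook names and the data model are illustrative, not any vendor’s API): business logic functions are registered per collection and run before the DBaaS component persists a write.

```python
hooks = {}

def before_write(collection):
    """Decorator registering a business logic function that runs
    before any object is written to the given collection."""
    def register(fn):
        hooks.setdefault(collection, []).append(fn)
        return fn
    return register

def save(database, collection, obj):
    for hook in hooks.get(collection, []):
        obj = hook(obj)  # hooks may validate, enrich, or reject the object
    database.setdefault(collection, []).append(obj)
    return obj

@before_write("users")
def validate_email(user):
    # example server-side data validation
    if "@" not in user.get("email", ""):
        raise ValueError("invalid email address")
    return user
```

In a real BaaS, the same registration mechanism would also cover explicit API invocations and calls out to third-party services.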
The BaaS is accessed through a REST API [Dep, Hoo, Par, Bas, Use] (and
sometimes WebSockets [HS16]) for use with different client technologies. To handle
not only native mobile applications but also websites, BaaS systems usually provide
file hosting to deliver website assets like HTML and script files to browsers.
The communication with the BaaS is performed through SDKs employed in the
frontend. The SDKs provide high-level abstractions to application developers, for
example, to integrate persistence with application data models [Tor+17].
A feature that has been gaining popularity in recent years is real-time queries
as offered by Firebase, Meteor, Baqend, and others [WGR20]. Real-time queries
are typically provided over a WebSocket connection, since they require a persistent
communication channel between client and server. While the ability to have informational updates pushed to the client is highly useful for developing collaborative
or other reactive applications, the major challenge for providing real-time queries
lies within scalability [Win19]: At the time of writing, Baqend’s real-time query
engine is the only one that scales with both write throughput and query concurrency.
For a detailed discussion of the current real-time database landscape, we refer to
[WRG19].
BaaS systems are thus confronted with even stronger latency challenges than a
DBaaS: all clients access the system via high-latency WAN connections, so that the
latency for retrieving objects, files, and query results determines application performance.
Similar to DBaaS systems, BaaS APIs usually provide persistence on top of one
single database technology, making it infeasible to achieve all potential functional
and non-functional application requirements. The problem is even more severe when
all tenants are co-located on a shared database cluster. In that case, one database
system configuration (e.g., the replication protocol) prescribes the guarantees for
each tenant [ADE12].
4.5.3 Multi-Tenancy
The goal of multi-tenancy in DBaaS/BaaS systems is to allow efficient resource
pooling across tenants so that only the capacity for the global average resource
consumption has to be provisioned and resources can be shared. There is an inherent
trade-off between higher isolation of tenants and efficiency of resource sharing
[ADE12]. As shown in Fig. 4.9, the boundary between tenant-specific resources and
shared provider resources can be drawn at different levels of the software stack
[MB16, p. 562]:
• With private operating system (OS) virtualization, each tenant is assigned to
one or multiple VMs that execute the database process. This model achieves
Fig. 4.9 Different approaches to multi-tenancy in DBaaS/BaaS systems. The dashed line indicates
the boundary between shared and tenant-specific resources
a high degree of isolation, similar to IaaS clouds. However, resource reuse is
limited as each tenant has the overhead of a full OS and database process.
• By allocating a private process to each tenant, the overhead of a private OS
can be mitigated. To this end, the provider orchestrates the OS to run multiple
isolated database processes. This is usually achieved using container technology
such as Docker [Mer14] that isolates processes within a shared OS.
• Efficiency can be further increased if tenants only possess a private schema
within a shared database process. The database system can thus share various
system resources (e.g., the buffer pool) between tenants to increase I/O efficiency.
• The shared schema model requires all tenants to use the same application that
dictates the common schema. The schema can be adapted to specific tenant
requirements by extending it with additional fields or tables [KL11]. A shared
schema is frequently used in SaaS applications such as Salesforce [Onl].
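The shared schema model can be illustrated with a discriminator column (a common textbook pattern; the table and column names here are our own): every tenant’s rows live in the same table, and the provider scopes each statement by tenant_id so that tenants never observe each other’s data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (tenant_id TEXT, name TEXT, balance REAL)")

def insert_account(tenant_id, name, balance):
    # the tenant_id discriminator is added transparently by the provider
    conn.execute("INSERT INTO accounts VALUES (?, ?, ?)",
                 (tenant_id, name, balance))

def accounts_of(tenant_id):
    # every query is rewritten to filter on the calling tenant
    cursor = conn.execute(
        "SELECT name, balance FROM accounts WHERE tenant_id = ?",
        (tenant_id,))
    return cursor.fetchall()
```

Because all tenants share one buffer pool, one schema, and one set of indexes, this model maximizes resource sharing, at the price of the weakest isolation among the four approaches.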
The major open challenge for multi-tenancy of NoSQL systems in cloud environments is database independence and the combination with multi-tenant FaaS code
execution. If a generic middleware can expose unmodified data stores as a scalable,
multi-tenant DBaaS/BaaS, the problems of database and service architectures are
decoupled, and polyglot persistence is enabled. In Chap. 7, we will go into more
detail on polyglot persistence in modern data management.
Most research efforts in the DBaaS community have been concerned with multi-tenancy and virtualization [Aul+11, Aul+08, Aul+09, KL11, SKM08, WB09, JA07],
database privacy and encryption [KJH15, Gen09, Pop+11, Pop14, Kar+16, PZ13,
Pop+14], workload management [Cun+07, Zha+14, ABC14, Bas12, Xio+11,
Ter+13, LBMAL14, Pad+07, Sak14], resource allocation [Mad+15, Sou+09],
automatic scaling [Kim+16, LBMAL14], and benchmarking [Dey+14, Coo+10,
Pat+11, BZS13, Ber+14, BT11, BK13, BT14, Ber15, Ber14]. However,
several DBaaS and BaaS challenges still require further research [Ges19]:
• Low latency access to DBaaS systems, to improve application performance and
allow distribution of application logic and data storage
• Unified REST/HTTP access to polyglot data stores with service level agreements for functional and non-functional guarantees
• Elastic scalability of read and query workloads for arbitrary database systems
• Generic, database-independent APIs and capabilities for fundamental data
management abstractions such as schema management, FaaS business logic, real-time queries, multi-tenancy, search, transactions, authentication, authorization,
user management, and file storage for single databases and across databases.
4.6 Summary
The challenges of building low-latency applications can be attributed to three
different layers in the application stack: the frontend, the backend, and the network.
In Chap. 2, we started with the technical foundations of scalable cloud-based web
applications and compared different architectural designs with respect to the way
that data is accessed and assembled. In Chap. 3, we then explored the network
performance of web applications which is determined by the design of the HTTP
protocol and the constraints of the predominant REST architectural style, and we
reviewed the mechanisms that HTTP provides for web caching and how they relate
to the infrastructure of the Internet. In this chapter, we turned to data management
and gave an overview of NoSQL systems’ data models, their different notions of
consistency, and their use for Database- and Backend-as-a-Service cloud service
models.
In the following chapters, we will continue with a focus on data management
by surveying today’s caching technology in Chap. 5, transactional semantics in
distributed systems in Chap. 6, and approaches for polyglot persistence in Chap. 7.
In Chaps. 8 and 9, we will conclude with a classification of today’s NoSQL
landscape and a projection of possible future developments, respectively.
References
[AAB05] Amitanand S. Aiyer, Lorenzo Alvisi, and Rida A. Bazzi. “On the Availability of Non-strict Quorum Systems”. In: Distributed Computing, 19th International Conference,
DISC 2005, Cracow, Poland, September 26–29, 2005, Proceedings. Ed. by Pierre
Fraigniaud. Vol. 3724. Lecture Notes in Computer Science. Springer, 2005, pp. 48–62.
DOI: 10.1007/11561927_6.
[Aba12] D. Abadi. “Consistency tradeoffs in modern distributed database system design: CAP
is only part of the story”. In: Computer 45.2 (2012), pp. 37–42. URL: http://ieeexplore.
ieee.org/xpls/abs_all.jsp?arnumber=6127847 (visited on 10/10/2012).
[ABC14] Ioannis Arapakis, Xiao Bai, and B. Barla Cambazoglu. “Impact of Response Latency
on User Behavior in Web Search”. In: Proceedings of the 37th International ACM
SIGIR Conference on Research & Development in Information Retrieval. SIGIR
’14. Gold Coast, Queensland, Australia: ACM, 2014, pp. 103–112. ISBN: 978-1-4503-2257-7. DOI: 10.1145/2600428.2609627. URL: http://doi.acm.org/10.1145/2600428.2609627.
[ABN+95] H. Attiya, A. Bar-Noy, et al. “Sharing memory robustly in message-passing systems”.
In: JACM 42.1 (1995).
[ADE12] Divyakant Agrawal, Sudipto Das, and Amr El Abbadi. Data Management in the
Cloud: Challenges and Opportunities. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2012. DOI: 10.2200/S00456ED1V01Y201211DTM032.
[Ady99] Atul Adya. “Weak consistency: a generalized theory and optimistic implementations
for distributed transactions”. PhD thesis. Massachusetts Institute of Technology, 1999.
URL: http://www.csd.uoc.gr/~hy460/pdf/adya99weak.pdf (visited on 01/03/2015).
[Ajo+15] Phillipe Ajoux et al. “Challenges to adopting stronger consistency at scale”. In: 15th
Workshop on Hot Topics in Operating Systems (HotOS XV). 2015. URL: https://
www.usenix.org/conference/hotos15/workshop-program/presentation/ajoux (visited
on 11/28/2016).
[Amab] Amazon Web Services AWS â Server Hosting & Cloud Services. https://aws.amazon.
com/de/. (Accessed on 05/20/2017). 2017.
[Aul+08] S. Aulbach et al. “Multi-tenant databases for software as a service: schema-mapping
techniques”. In: Proceedings of the 2008 ACM SIGMOD international conference on
Management of data. 2008, pp. 1195–1206. URL: http://dl.acm.org/citation.cfm?id=
1376736 (visited on 11/15/2012).
[Aul+09] Stefan Aulbach et al. “A comparison of flexible schemas for software as a service”.
In: Proceedings of the ACM SIGMOD International Conference on Management of
Data, SIGMOD 2009, Providence, Rhode Island, USA, June 29 - July 2, 2009. Ed. by
Ugur Çetintemel et al. ACM, 2009, pp. 881–888. DOI: 10.1145/1559845.1559941.
[Aul+11] Stefan Aulbach et al. “Extensibility and Data Sharing in evolving multitenant
databases”. In: Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11–16, 2011, Hannover, Germany. Ed. by Serge Abiteboul et
al. IEEE Computer Society, 2011, pp. 99–110. DOI: 10.1109/ICDE.2011.5767872.
[Bai+13a] Peter Bailis et al. “Bolt-on Causal Consistency”. In: Proceedings of the 2013 ACM
SIGMOD International Conference on Management of Data. SIGMOD ’13. New
York, New York, USA: ACM, 2013, pp. 761–772.
[Bai+13b] Peter Bailis et al. “HAT, not CAP: Highly available transactions”. In: Workshop on
Hot Topics in Operating Systems. 2013.
[Bai+13c] Peter Bailis et al. “Highly Available Transactions: Virtues and Limitations”. In:
Proceedings of the VLDB Endowment 7.3 (2013). 00001.
[Bai+14b] Peter Bailis et al. “Quantifying eventual consistency with PBS”. In: The VLDB Journal 23.2 (Apr. 2014), pp. 279–302. ISSN: 1066-8888, 0949-877X. DOI: 10.1007/s00778-013-0330-1. URL: http://link.springer.com/10.1007/s00778-013-0330-1 (visited on 01/03/2015).
[Bai15] Peter Bailis. “Coordination Avoidance in Distributed Databases”. PhD thesis. University of California, Berkeley, USA, 2015. URL: http://www.escholarship.org/uc/item/
8k8359g2.
[Bak+11] J. Baker et al. “Megastore: Providing scalable, highly available storage for interactive
services”. In: Proc. of CIDR. Vol. 11. 2011, pp. 223–234.
[Bal+15] Valter Balegas et al. “Putting consistency back into eventual consistency”.
en. In: ACM Press, 2015, pp. 1–16. ISBN: 978-1-4503-3238-5. DOI:
10.1145/2741948.2741972. URL: http://dl.acm.org/citation.cfm?doid=2741948.
2741972 (visited on 11/25/2016).
[Baq] News BaaS Benchmark. https://github.com/Baqend/news-benchmark (Accessed on
09/08/2018). 2018.
[Bar+03] P. Barham et al. “Xen and the art of virtualization”. In: ACM SIGOPS Operating
Systems Review. Vol. 37. 2003, pp. 164–177. URL: http://dl.acm.org/citation.cfm?id=
945462%7C (visited on 10/09/2012).
[Bas] The BaasBox server. https://github.com/baasbox/baasbox. (Accessed on 05/20/2017). 2017.
[Bas12] Salman A. Baset. “Cloud SLAs: present and future”. In: ACM SIGOPS Operating
Systems Review 46.2 (2012), pp. 57–66. URL: http://dl.acm.org/citation.cfm?id=
2331586 (visited on 01/03/2015).
[Ber+13] David Bermbach et al. “A Middleware Guaranteeing Client-Centric Consistency on
Top of Eventually Consistent Datastores”. In: 2013 IEEE International Conference
on Cloud Engineering, IC2E 2013, San Francisco, CA, USA, March 25–27, 2013.
IEEE Computer Society, 2013, pp. 114–123. DOI: 10.1109/IC2E.2013.32.
[Ber+14] David Bermbach et al. “Towards an Extensible Middleware for Database Benchmarking”. In: Performance Characterization and Benchmarking. Traditional to Big Data
- 6th TPC Technology Conference, TPCTC 2014, Hangzhou, China, September 1–
5, 2014. Revised Selected Papers. Ed. by Raghunath Nambiar and Meikel Poess.
Vol. 8904. Lecture Notes in Computer Science. Springer, 2014, pp. 82–96. DOI:
10.1007/978-3-319-15350-6_6.
[Ber14] David Bermbach. Benchmarking Eventually Consistent Distributed Storage Systems.
Karlsruhe: KIT Scientific Publishing, 2014. ISBN: 978-3-7315-0186-2.
[Ber15] David Bermbach. “An Introduction to Cloud Benchmarking”. In: 2015 IEEE International Conference on Cloud Engineering, IC2E 2015, Tempe, AZ, USA, March 9–13,
2015. IEEE Computer Society, 2015, p. 3. DOI: 10.1109/IC2E.2015.65.
[BK13] David Bermbach and Jörn Kuhlenkamp. “Consistency in Distributed Storage Systems
- An Overview of Models, Metrics and Measurement Approaches”. In: Networked
Systems - First International Conference, NETYS 2013, Marrakech, Morocco, May
2–4, 2013, Revised Selected Papers. Ed. by Vincent Gramoli and Rachid Guerraoui.
Vol. 7853. Lecture Notes in Computer Science. Springer, 2013, pp. 175–189. DOI:
10.1007/978-3-642-40148-0_13.
[Bre00] Eric A. Brewer. Towards Robust Distributed Systems. 2000.
[BT11] David Bermbach and Stefan Tai. “Eventual consistency: How soon is eventual?
An evaluation of Amazon S3’s consistency behavior”. In: Proceedings of the 6th
Workshop on Middleware for Service Oriented Computing, MW4SOC 2011, Lisbon,
Portugal, December 12–16, 2011. Ed. by Karl M. Göschka, Schahram Dustdar, and
Vladimir Tosic. ACM, 2011, p. 1. DOI: 10.1145/2093185.2093186.
[BT14] David Bermbach and Stefan Tai. “Benchmarking Eventual Consistency: Lessons
Learned from Long-Term Experimental Studies”. In: 2014 IEEE International
Conference on Cloud Engineering, Boston, MA, USA, March 11–14, 2014. IEEE
Computer Society, 2014, pp. 47–56. DOI: 10.1109/IC2E.2014.37.
[BWT17] David Bermbach, Erik Wittern, and Stefan Tai. Cloud Service Benchmarking - Measuring Quality of Cloud Services from a Client Perspective. Springer, 2017. ISBN: 978-3-319-55482-2. DOI: 10.1007/978-3-319-55483-9.
[BZS13] David Bermbach, Liang Zhao, and Sherif Sakr. “Towards Comprehensive Measurement of Consistency Guarantees for Cloud-Hosted Data Storage Services”. In:
Performance Characterization and Benchmarking - 5th TPC Technology Conference,
TPCTC 2013, Trento, Italy, August 26, 2013, Revised Selected Papers. Ed. by
Raghunath Nambiar and Meikel Poess. Vol. 8391. Lecture Notes in Computer Science.
Springer, 2013, pp. 32–47. DOI: 10.1007/978-3-319-04936-6_3.
[Cha+08] Fay Chang et al. “Bigtable: A distributed storage system for structured data”. In: ACM
Transactions on Computer Systems (TOCS) 26.2 (2008), p. 4.
[Clu] Clustrix: A New Approach to Scale-Out RDBMS. http://www.clustrix.com/wp-content/uploads/2017/01/Whitepaper-ANewApproachtoScaleOutRDBMS.pdf. (Accessed on 05/20/2017). 2017.
[Coo+10] Brian F. Cooper et al. “Benchmarking cloud serving systems with YCSB”. In:
Proceedings of the 1st ACM symposium on Cloud computing. ACM, 2010, pp. 143–
154. URL: http://dl.acm.org/citation.cfm?id=1807152 (visited on 11/26/2016).
[Coo13] Brian F. Cooper. “Spanner: Google’s globally-distributed database”. In: 6th Annual
International Systems and Storage Conference, SYSTOR ’13, Haifa, Israel, June 30 - July 02, 2013. Ed. by Ronen I. Kat, Mary Baker, and Sivan Toledo. ACM, 2013, p. 9.
DOI: 10.1145/2485732.2485756.
[Cun+07] Ítalo S. Cunha et al. “Self-Adaptive Capacity Management for Multi-Tier Virtualized
Environments”. In: Integrated Network Management, IM 2007. 10th IFIP/IEEE
International Symposium on Integrated Network Management, Munich, Germany, 21–
25 May 2007. IEEE, 2007, pp. 129–138. DOI: 10.1109/INM.2007.374777.
[Cur+11a] Carlo Curino et al. “Relational Cloud: A Database-as-a-Service for the Cloud”. In:
Proc. of CIDR. 2011. URL: http://dspace.mit.edu/handle/1721.1/62241 (visited on
04/15/2014).
[Dep] Deployd: a toolkit for building realtime APIs. https://github.com/deployd/deployd. (Accessed on 05/20/2017). 2017.
[Dey+14] Anamika Dey et al. “YCSB+T: Benchmarking web-scale transactional databases”. In:
Data Engineering Workshops (ICDEW), 2014 IEEE 30th International Conference
on. IEEE. 2014, pp. 223–230.
[DGMS85] Susan B Davidson, Hector Garcia-Molina, and Dale Skeen. “Consistency in a
partitioned network: a survey”. In: ACM Computing Surveys (CSUR) 17.3 (1985),
pp. 341–370.
[Dyn] DynamoDB. http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html. (Accessed on 05/20/2017). 2017.
[Fid87] Colin J Fidge. “Timestamps in message-passing systems that preserve the partial
ordering”. In: (1987).
[FLP85] M. J. Fischer, N. A. Lynch, and M. S. Paterson. “Impossibility of distributed consensus
with one faulty process”. In: Journal of the ACM (JACM) 32.2 (1985), pp. 374–382.
URL : http://dl.acm.org/citation.cfm?id=214121 (visited on 11/27/2012).
[Gen09] Craig Gentry. “A fully homomorphic encryption scheme”. PhD thesis. Stanford
University, 2009.
[Ges19] Felix Gessert. “Low Latency for Cloud Data Management”. PhD thesis. University
of Hamburg, Germany, 2019. URL: http://ediss.sub.uni-hamburg.de/volltexte/2019/
9541/.
[GH02] Rachid Guerraoui and Corine Hari. “On the consistency problem in mobile distributed
computing”. In: Proceedings of the 2002 Workshop on Principles of Mobile Computing, POMC 2002, October 30–31, 2002, Toulouse, France. ACM, 2002, pp. 51–57.
DOI: 10.1145/584490.584501.
[GL02] S. Gilbert and N. Lynch. “Brewer’s conjecture and the feasibility of consistent,
available, partition-tolerant web services”. In: ACM SIGACT News 33.2 (2002),
pp. 51–59.
[GLS11] Wojciech Golab, Xiaozhou Li, and Mehul A. Shah. “Analyzing consistency properties
for fun and profit”. In: ACM PODC. ACM, 2011, pp. 197–206. URL: http://dl.acm.org/
citation.cfm?id=1993834 (visited on 09/28/2014).
[Gol+14] Wojciech M. Golab et al. “Client-Centric Benchmarking of Eventual Consistency
for Cloud Storage Systems”. In: IEEE 34th International Conference on Distributed
Computing Systems, ICDCS 2014, Madrid, Spain, June 30 - July 3, 2014. IEEE
Computer Society, 2014, pp. 493–502. DOI: 10.1109/ICDCS.2014.57.
[Goob] Google Cloud Prediction API. https://cloud.google.com/prediction/docs/. (Accessed
on 06/18/2017). 2017.
[GR15] Felix Gessert and Norbert Ritter. “Skalierbare NoSQL- und Cloud-Datenbanken in Forschung und Praxis”. In: Datenbanksysteme für Business, Technologie und Web (BTW 2015) - Workshopband, 2.–3. März 2015, Hamburg, Germany. 2015, pp. 271–274.
[GR16] Felix Gessert and Norbert Ritter. “Scalable Data Management: NoSQL Data Stores in
Research and Practice”. In: 32nd IEEE International Conference on Data Engineering, ICDE 2016. 2016.
[Gra+76] Jim Gray et al. “Granularity of Locks and Degrees of Consistency in a Shared Data
Base”. In: Modelling in Data Base Management Systems, Proceeding of the IFIP
Working Conference on Modelling in Data Base Management Systems, Freudenstadt,
Germany, January 5–8, 1976. Ed. by G. M. Nijssen. North-Holland, 1976, pp. 365–
394.
[Gro+13] Katarina Grolinger et al. “Data management in cloud environments: NoSQL and
NewSQL data stores”. en. In: Journal of Cloud Computing: Advances, Systems and
Applications 2.1 (2013), p. 22. ISSN: 2192-113X. DOI: 10.1186/2192-113X-2-22.
URL: http://www.journalofcloudcomputing.com/content/2/1/22 (visited on 01/03/2015).
[GWR17] Felix Gessert, Wolfram Wingerath, and Norbert Ritter. “Scalable Data Management:
An In-Depth Tutorial on NoSQL Data Stores”. In: BTW (Workshops). Vol. P-266. LNI.
GI, 2017, pp. 399–402.
[HA90] Phillip W Hutto and Mustaque Ahamad. “Slow memory: Weakening consistency to
enhance concurrency in distributed shared memories”. In: Distributed Computing
Systems, 1990. Proceedings., 10th International Conference on. IEEE. 1990, pp. 302–
309.
[HIM02] H. Hacigumus, B. Iyer, and S. Mehrotra. “Providing database as a service”. In:
Data Engineering, 2002. Proceedings. 18th International Conference on. 2002,
pp. 29–38. URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=994695 (visited
on 10/16/2012).
[Hoo] GitHub - hoodiehq/hoodie: A backend for Offline First applications. https://github.com/hoodiehq/hoodie. (Accessed on 05/25/2017). 2017.
[HR83] Theo Haerder and Andreas Reuter. “Principles of transaction-oriented database
recovery”. In: ACM Comput. Surv. 15.4 (Dec. 1983), pp. 287–317.
[HS16] Stephan Hochhaus and Manuel Schoebel. Meteor in action. Manning Publ., 2016.
[Hun+10] Patrick Hunt et al. “ZooKeeper: Wait-free Coordination for Internet-scale Systems.”
In: USENIX Annual Technical Conference. Vol. 8. 2010, p. 9. URL: https://www.
usenix.org/event/usenix10/tech/full_papers/Hunt.pdf (visited on 01/03/2015).
[HW90] Maurice P Herlihy and Jeannette M Wing. “Linearizability: A correctness condition
for concurrent objects”. In: ACM Transactions on Programming Languages and
Systems (TOPLAS) 12.3 (1990), pp. 463–492.
[JA07] Dean Jacobs and Stefan Aulbach. “Ruminations on Multi-Tenant Data-bases”. In:
Datenbanksysteme in Business, Technologie und Web (BTW 2007), 12. Fachtagung
des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS), Proceedings,
7.–9. März 2007, Aachen, Germany. Ed. by Alfons Kemper et al. Vol. 103. LNI.
GI, 2007, pp. 514–521. URL: http://subs.emis.de/LNI/Proceedings/Proceedings103/
article1419.html.
[Kal+08] R. Kallman et al. “H-store: a high-performance, distributed main memory transaction
processing system”. In: Proceedings of the VLDB Endowment 1.2 (2008), pp. 1496–
1499.
[Kar+16] Nikolaos Karapanos et al. “Verena: End-to-End Integrity Protection for Web
Applications”. In: IEEE Symposium on Security and Privacy, SP 2016, San Jose,
CA, USA, May 22–26, 2016. IEEE Computer Society, 2016, pp. 895–913. DOI:
10.1109/SP.2016.58.
[Kim+16] In Kee Kim et al. “Empirical Evaluation of Workload Forecasting Techniques for
Predictive Cloud Resource Scaling”. In: 9th IEEE International Conference on Cloud
Computing CLOUD 2016, San Francisco, CA, USA, June 27 - July 2, 2016. IEEE
Computer Society, 2016, pp. 1–10. DOI: 10.1109/CLOUD.2016.0011.
80
4 Systems for Scalable Data Management
[KJH15] Jens Köhler, Konrad Jünemann, and Hannes Hartenstein. “Confidential database-as-a-service
approaches: taxonomy and survey”. In: Journal of Cloud Computing 4.1
(2015), p. 1. ISSN: 2192-113X. DOI: 10.1186/s13677-014-0025-1. URL: http://dx.doi.
org/10.1186/s13677-014-0025-1.
[KL11] Tim Kiefer and Wolfgang Lehner. “Private Table Database Virtualization for DBaaS”.
In: IEEE 4th International Conference on Utility and Cloud Computing, UCC 2011,
Melbourne, Australia, December 5–8, 2011. IEEE Computer Society, 2011, pp. 328–
329. DOI: 10.1109/UCC.2011.52.
[Kle17] Martin Kleppmann. Designing Data-Intensive Applications. 1st edition.
O’Reilly Media, Jan. 2017. ISBN: 978-1-4493-7332-0.
[Kra+13] Tim Kraska et al. “MDCC: Multi-data center consistency”. In: EuroSys. ACM,
2013, pp. 113–126. URL: http://dl.acm.org/citation.cfm?id=2465363 (visited on
04/15/2014).
[Lam78] Leslie Lamport. “Time, Clocks, and the Ordering of Events in a Distributed System”.
In: Commun. ACM 21.7 (1978), pp. 558–565. DOI: 10.1145/359545.359563.
[Lam86a] Leslie Lamport. “On Interprocess Communication. Part I: Basic Formalism”. In:
Distributed Computing 1.2 (1986), pp. 77–85. DOI: 10.1007/BF01786227.
[Lam86b] Leslie Lamport. “On Interprocess Communication. Part II: Algorithms”. In: Distributed Computing 1.2 (1986), pp. 86–101. DOI: 10.1007/BF01786228.
[Lan01] Douglas Laney. 3D Data Management: Controlling Data Volume, Velocity, and
Variety. Tech. rep. META Group, Feb 2001.
[LBMAL14] Tania Lorido-Botran, Jose Miguel-Alonso, and Jose A. Lozano. “A Review of Auto-scaling Techniques for Elastic Applications in Cloud Environments”. In:
Journal of Grid Computing 12.4 (2014), pp. 559–592. ISSN: 1570-7873. DOI:
10.1007/s10723-014-9314-7. URL: http://dx.doi.org/10.1007/s10723-014-9314-7.
[Lee+15] Collin Lee et al. “Implementing linearizability at large scale and low latency”. In:
Proceedings of the 25th Symposium on Operating Systems Principles, SOSP 2015,
Monterey, CA, USA, October 4–7, 2015. Ed. by Ethan L. Miller and Steven Hand.
ACM, 2015, pp. 71–86. DOI: 10.1145/2815400.2815416.
[Llo+11] Wyatt Lloyd et al. “Don’t settle for eventual: scalable causal consistency for wide-area storage with COPS”. In: Proceedings of the Twenty-Third ACM Symposium
on Operating Systems Principles. ACM, 2011, pp. 401–416. URL: http://dl.acm.org/
citation.cfm?id=2043593 (visited on 01/03/2015).
[LS13] Wolfgang Lehner and Kai-Uwe Sattler. Web-Scale Data Management for the Cloud.
New York: Springer, Apr. 2013. ISBN: 978-1-4614-6855-4.
[LS88a] Richard J Lipton and Jonathan S Sandberg. PRAM: A scalable shared memory.
Princeton University, Department of Computer Science, 1988.
[Lyn96] Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann, 1996. ISBN: 1-55860-348-4.
[MAD+11] Prince Mahajan, Lorenzo Alvisi, Mike Dahlin, et al. “Consistency, availability, and
convergence”. In: University of Texas at Austin Tech Report 11 (2011).
[Mad+15] Gabor Madl et al. “Account clustering in multi-tenant storage management environments”. In: 2015 IEEE International Conference on Big Data, Big Data 2015, Santa
Clara, CA, USA, October 29 - November 1, 2015. IEEE, 2015, pp. 1698–1707. DOI:
10.1109/BigData.2015.7363941.
[Mah+11] Prince Mahajan et al. “Depot: Cloud Storage with Minimal Trust”. In: ACM Trans.
Comput. Syst. 29.4 (2011), 12:1–12:38. DOI: 10.1145/2063509.2063512.
[MB16] San Murugesan and Irena Bojanova. Encyclopedia of Cloud Computing. John Wiley
& Sons, 2016.
[Mer14] Dirk Merkel. “Docker: lightweight linux containers for consistent development and
deployment”. In: Linux Journal 2014.239 (2014), p. 2.
[Mon] MongoDB. 2017. URL: https://www.mongodb.com/ (visited on 06/18/2017).
References
81
[Muk+11] Kunal Mukerjee et al. “SQL Azure as a Self-Managing Database Service: Lessons
Learned and Challenges Ahead.” In: IEEE Data Eng. Bull. 34.4 (2011), pp. 61–70.
URL: http://sites.computer.org/debull/A11dec/azure2.pdf (visited on 01/03/2015).
[Nuo] NuoDB: Emergent Architecture. 2017. URL: http://go.nuodb.com/rs/nuodb/
images/Greenbook_Final.pdf (visited on 02/18/2017).
[Onl] Salesforce Online CRM. 2017. URL: https://www.salesforce.com/en (visited on 06/05/2017).
[Pad+07] Pradeep Padala et al. “Adaptive control of virtualized resources in utility computing
environments”. In: Proceedings of the 2007 EuroSys Conference, Lisbon, Portugal,
March 21–23, 2007. Ed. by Paulo Ferreira, Thomas R. Gross, and Luís Veiga. ACM,
2007, pp. 289–302. DOI: 10.1145/1272996.1273026.
[Par] Parse Server. 2017. URL: http://parseplatform.github.io/docs/parse-server/guide/ (visited on 02/19/2017).
[Pat+11] Swapnil Patil et al. “YCSB++: benchmarking and performance debugging advanced
features in scalable table stores”. In: ACM Symposium on Cloud Computing
in conjunction with SOSP 2011, SOCC ’11, Cascais, Portugal, October 26–28,
2011. Ed. by Jeffrey S. Chase and Amr El Abbadi. ACM, 2011, p. 9. DOI:
10.1145/2038916.2038925.
[Pop+11] R. A. Popa et al. “CryptDB: protecting confidentiality with encrypted query processing”.
In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems
Principles. 2011, pp. 85–100. URL: http://dl.acm.org/citation.cfm?id=2043566
(visited on 11/16/2012).
[Pop+14] Raluca Ada Popa et al. “Building Web Applications on Top of Encrypted Data Using
Mylar”. In: Proceedings of the 11th USENIX Symposium on Networked Systems
Design and Implementation, NSDI 2014, Seattle, WA, USA, April 2–4, 2014. Ed. by
Ratul Mahajan and Ion Stoica. USENIX Association, 2014, pp. 157–172. URL: https://
www.usenix.org/conference/nsdi14/technical-sessions/presentation/popa.
[Pop14] Raluca Ada Popa. “Building practical systems that compute on encrypted data”. PhD
thesis. Massachusetts Institute of Technology, 2014.
[Pri08] Dan Pritchett. “BASE: An Acid Alternative”. In: Queue 6.3 (May 2008), pp. 48–55.
[PZ13] Raluca A. Popa and Nickolai Zeldovich. “Multi-Key Searchable Encryption”. In:
IACR Cryptology ePrint Archive 2013 (2013), p. 508. URL: http://eprint.iacr.org/2013/
508.
[Rob16] Mike Roberts. Serverless Architectures. 2016. URL: https://martinfowler.com/articles/
serverless.html (visited on 02/19/2017).
[Sak14] Sherif Sakr. “Cloud-hosted databases: technologies, challenges and opportunities”. In:
Cluster Computing 17.2 (2014), pp. 487–502. URL: http://link.springer.com/article/10.
1007/s10586-013-0290-7 (visited on 07/16/2014).
[SF12] Pramod J. Sadalage and Martin Fowler. NoSQL distilled: a brief guide to the emerging
world of polyglot persistence. Pearson Education, 2012.
[SKM08] Aameek Singh, Madhukar R. Korupolu, and Dushmanta Mohapatra. “Server-storage
virtualization: integration and load balancing in data centers”. In: Proceedings of the
ACM/IEEE Conference on High Performance Computing, SC 2008, November 15–21,
2008, Austin, Texas, USA. IEEE/ACM, 2008, p. 53. DOI: 10.1145/1413370.1413424.
[Sou+09] Gokul Soundararajan et al. “Dynamic Resource Allocation for Database Servers
Running on Virtual Storage”. In: 7th USENIX Conference on File and Storage
Technologies, February 24–27, 2009, San Francisco, CA, USA. Proceedings. Ed. by
Margo I. Seltzer and Richard Wheeler. USENIX, 2009, pp. 71–84. URL: http://www.
usenix.org/events/fast09/tech/full_papers/soundararajan/soundararajan.pdf.
[SRH97] Aman Singla, Umakishore Ramachandran, and Jessica K. Hodgins. “Temporal
Notions of Synchronization and Consistency in Beehive”. In: SPAA. 1997, pp. 211–
220. DOI: 10.1145/258492.258513.
[SW13] Michael Stonebraker and Ariel Weisberg. “The VoltDB Main Memory DBMS”. In:
IEEE Data Eng. Bull. 36.2 (2013), pp. 21–27. URL: http://sites.computer.org/debull/
A13june/VoltDB1.pdf.
[TAR99] Francisco J. Torres-Rojas, Mustaque Ahamad, and Michel Raynal. “Timed Consistency
for Shared Distributed Objects”. In: Proceedings of the Eighteenth Annual ACM
Symposium on Principles of Distributed Computing, PODC ’99, Atlanta, Georgia,
USA, May 3–6, 1999. Ed. by Brian A. Coan and Jennifer L. Welch. ACM, 1999,
pp. 163–172. DOI: 10.1145/301308.301350.
[Ter+13] Douglas B. Terry et al. “Consistency-based service level agreements for cloud
storage”. In: ACM SIGOPS 24th Symposium on Operating Systems Principles, SOSP
’13, Farmington, PA, USA, November 3–6, 2013. Ed. by Michael Kaminsky and Mike
Dahlin. ACM, 2013, pp. 309–324. DOI: 10.1145/2517349.2522731.
[Tho+12] Alexander Thomson et al. “Calvin: fast distributed transactions for partitioned
database systems”. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM. 2012, pp. 1–12.
[TM05] Francisco J. Torres-Rojas and Esteban Meneses. “Convergence Through a Weak
Consistency Model: Timed Causal Consistency”. In: CLEI Electron. J. 8.2 (2005).
URL: http://www.clei.org/cleiej/paper.php?id=110.
[Tor+17] Alexandre Torres et al. “Twenty years of object-relational mapping: A survey on
patterns, solutions, and their implications on application design”. In: Information and
Software Technology 82 (2017), pp. 1–18.
[TS07] Andrew S. Tanenbaum and Maarten van Steen. Distributed systems - principles and
paradigms, 2nd Edition. Pearson Education, 2007. ISBN: 978-0-13-239227-3.
[Use] Apache Usergrid. 2017. URL: https://usergrid.apache.org/ (visited on 02/19/2017).
[Vog09] Werner Vogels. “Eventually consistent”. In: Communications of the ACM 52.1 (2009),
pp. 40–44.
[VV16] Paolo Viotti and Marko Vukolić. “Consistency in Non-Transactional Distributed
Storage Systems”. In: ACM Computing Surveys 49.1 (June 2016), pp. 1–34.
ISSN: 0360-0300. DOI: 10.1145/2926965. URL: http://dl.acm.org/citation.cfm?doid=
2911992.2926965 (visited on 11/25/2016).
[Wad+11] Hiroshi Wada et al. “Data Consistency Properties and the Trade-offs in Commercial
Cloud Storage: the Consumers’ Perspective”. In: CIDR 2011, Fifth Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 9–12, 2011,
Online Proceedings. www.cidrdb.org, 2011, pp. 134–143. URL: http://www.cidrdb.
org/cidr2011/Papers/CIDR11_Paper15.pdf.
[WB09] Craig D. Weissman and Steve Bobrowski. “The design of the force.com multitenant
internet application development platform”. In: Proceedings of the ACM SIGMOD
International Conference on Management of Data, SIGMOD 2009, Providence,
Rhode Island, USA, June 29 - July 2, 2009. Ed. by Ugur Çetintemel et al. ACM,
2009, pp. 889–896. DOI: 10.1145/1559845.1559942.
[WGR+18] Wolfram Wingerath, Felix Gessert, Norbert Ritter et al. “Real-Time Data Management
for Big Data”. In: Proceedings of the 21st International Conference on Extending
Database Technology, EDBT 2018, Vienna, Austria, March 26–29, 2018. OpenProceedings.org, 2018.
[WGR19] Wolfram Wingerath, Felix Gessert, and Norbert Ritter. “NoSQL & Real-Time Data
Management in Research & Practice”. In: Datenbanksysteme für Business, Technologie und Web (BTW 2019), 18. Fachtagung des GI-Fachbereichs „Datenbanken und
Informationssysteme” (DBIS), 4.–8. März 2019, Rostock, Germany, Workshopband.
2019, pp. 267–270. URL: https://dl.gi.de/20.500.12116/21595.
[WGR20] Wolfram Wingerath, Felix Gessert, and Norbert Ritter. “InvaliDB: Scalable Push-Based
Real-Time Queries on Top of Pull-Based Databases”. In: 36th IEEE International
Conference on Data Engineering, ICDE 2020, Dallas, Texas, April 20–24,
2020. 2020.
[WGW+20] Wolfram Wingerath, Felix Gessert, Erik Witt, et al. “Speed Kit: A Polyglot & GDPR-Compliant
Approach For Caching Personalized Content”. In: 36th IEEE International
Conference on Data Engineering, ICDE 2020, Dallas, Texas, April 20–24, 2020.
2020.
[Win+15] Wolfram Wingerath et al. “Who Watches the Watchmen? On the Lack of Validation
in NoSQL Benchmarking”. In: Datenbanksysteme für Business, Technologie und
Web (BTW), 16. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme”. 2015.
[Win19] Wolfram Wingerath. “Scalable Push-Based Real-Time Queries on Top of Pull-Based
Databases”. PhD thesis. University of Hamburg, 2019. URL: https://invalidb.info/
thesis.
[WRG19] Wolfram Wingerath, Norbert Ritter, and Felix Gessert. Real-Time & Stream
Data Management: Push-Based Data in Research & Practice. Ed. by Susan
Evans. Springer International Publishing, 2019. ISBN: 978-3-030-10554-9. DOI:
10.1007/978-3-030-10555-6.
[WV02] G. Weikum and G. Vossen. Transactional information systems. Series in Data
Management Systems. Morgan Kaufmann Pub, 2002. ISBN: 9781558605084. URL:
http://books.google.de/books?hl=de&lr=&id=wV5Ran71zNoC&oi=fnd&pg=PP2&
dq=transactional+information+systems&ots=PgJAaN7R5X&sig=Iya4r9DiFhmb_
wWgOI5QMuxm6zU (visited on 06/28/2012).
[Xio+11] P. Xiong et al. “ActiveSLA: A profit-oriented admission control framework for
database-as-a-service providers”. In: Proceedings of the 2nd ACM Symposium on
Cloud Computing. ACM, 2011, p. 15. URL: http://dl.acm.org/citation.cfm?id=
2038931 (visited on 11/15/2012).
[Zha+14] Liang Zhao et al. Cloud Data Management. Springer, 2014.
[ZS17] Albert Y. Zomaya and Sherif Sakr, eds. Handbook of Big Data Technologies. Springer,
2017. ISBN: 978-3-319-49339-8. DOI: 10.1007/978-3-319-49340-4.
Chapter 5
Caching in Research and Industry
Caching technology can be categorized by several dimensions as illustrated in
Fig. 5.1. The first dimension is the location of the cache. In this book, we
focus on caches relevant for cloud and database applications, particularly server-side and database caching, reverse and forward proxy caching (mid-tier), and
client caching [Lab+09]. The second dimension is the granularity of cached data.
Examples are files, database records and pages, query results, and page fragments
[Ami+03a, Ami+03b, Ant+02, Can+01a, CZB99, CRS99, Dat+04, LC99, LN01].
The third dimension is the update strategy that determines the provided level of
consistency [Cat92, GS96, NWO88, LC97, CL98, Bor+03, Bor+04].
Besides these major dimensions, there are smaller distinctions. The cache
replacement strategy defines how the limited amount of storage is best allocated
to cached data [PB03, Dar+96]. The initialization strategy determines whether the
cache is filled on-demand or proactively1 [Alt+03, LGZ04, Luo+02, Bor+03, LR00,
LR01a]. The update processing strategy indicates whether changes to cached data
are replacements, incremental changes, or based on recomputation [Han87, BCL89,
AGK95, LR01a, IC98, BLT86].
Table 5.1 summarizes current research on caching according to the dimensions
location and updates. In the following, we will first discuss server-side and
client-side application caching as well as database caching and contrast both
to web caching approaches. After that, we will show the different methods for
cache coherence that can be grouped into expiration-based and invalidation-based
approaches. Finally, we will review work on caching query and search results and
discuss summary data structures for caching.
1 Proactive filling of the cache is also referred to as materialization [LR00].
Fig. 5.1 The three central dimensions of caching
Table 5.1 Selected related work on caching classified by location and update strategy

Client
  Expiration-based: Browser cache [IET15], ORMs [Rus03, Ady+07, Che+16], ODMs [Stö+15], User Profiles [BR02], Alex Protocol [GS96], CSI [Rab+03], Service Workers [Ama16]
  Invalidation-based: Avoidance-based Algorithms [ÖV11, FCL97, WN90]
  Hybrid: Client-Server Databases [KK94, ÖDV92, Cas+97], Oracle Result Cache [Ora], Speed Kit [WGW+20]

Mid-Tier
  Expiration-based: HTTP proxies [IET15], PCV [KW97], ESI [Tsi+01]
  Invalidation-based: PSI [KW98], CDNs [PB08, FFM04, Fre10]
  Hybrid: Leases [Vak06], Volume Leases [Yin+99, Yin+98], TTR [Bhi+02], Orestes/Baqend [Ges19]

Server and DB
  Expiration-based: Incremental TTLs [Ali+12]
  Invalidation-based: CachePortal [Can+01b], DCCP [KLM97], Reverse Proxies [Kam17], Ferdinand [Gar+08], Facebook Tao [Bro+13], Cache Hierarchies [Wor94], DBProxy [Ami+03a], DBCache [Bor+04], MTCache [LGZ04], WebView [LR01a]
  Hybrid: Memcache [Fit04, Nis+13, Xu+14], Redis [Car13], IMDGs [ERR11, Lwe10], CIP [Bla+10], Materialized Views [Lab+09]
5.1 Reducing Latency: Replication, Caching and Edge
Computing
There are three primary backend-focused technologies that are concerned with
lowering latency. Replication, caching, and edge computing follow the idea of
distributing data storage and processing for better scalability and reduced latency
towards dispersed clients.
5.1.1 Eager and Lazy Geo-Replication
To improve scalability and latency of reads, geo-replication distributes copies of
the primary database over different geographical sites. Eager geo-replication (e.g.,
in Google’s Megastore [Bak+11], Spanner [Cor+13, Cor+12], F1 [Shu+13], MDCC
[Kra+13], and Mencius [MJM08]) has the goal of achieving strong consistency
combined with geo-redundancy for failover. However, it comes at the cost of higher
write latencies that are usually between 100 ms [Cor+12] and 600 ms [Bak+11].
The second problem of eager geo-replication is that it requires extensive, databasespecific infrastructure which introduces system-specific trade-offs that cannot be
adapted at runtime. For example, it is not possible to relax consistency on a peroperation basis, as the guarantee is tied to the system-wide replication protocol
(typically variants of Paxos [Lam01]). Also, while some eagerly geo-replicated
systems support transactions, these suffer from high abort rates, as cross-site latency
in commit protocols increases the probability of deadlocks and conflicts [Shu+13].
Lazy geo-replication (e.g., in Dynamo [DeC+07], BigTable/HBase [Cha+08,
Hba], Cassandra [LM10], MongoDB [CD13], CouchDB [ALS10], Couchbase
[Lak+16], Espresso [Qia+13], PNUTS [Coo+08], Walter [Sov+11], Eiger [Llo+13],
and COPS [Llo+11]) on the other hand aims for high availability and low latency
at the expense of consistency. Typically, replicas are only allowed to serve reads,
in order to simplify the processing of concurrent updates. The problem of lazy
geo-replication is that consistency guarantees are lowered to a minimum (eventual
consistency) or cause a prohibitive overhead (e.g., causal consistency [Llo+11,
Llo+13]). Similar to eager geo-replication, system-specific infrastructure is required
to scale the database and lower latency. Therefore, providing low end-to-end latency
for web applications through a network of different replica sites is often both
financially and technically infeasible. Furthermore, geo-replication requires the
application tier to be co-located with each replica to make use of the distribution
for latency reduction. Geo-replication is nonetheless an indispensable technique for
providing resilience against disaster scenarios.
5.1.2 Caching
Caching has been studied in various fields for many years. It can be applied at
different locations (e.g., clients, proxies, servers, databases), granularities (e.g.,
files, records, pages, query results) and with different update strategies (e.g., expirations, leases, invalidations). Client-side caching approaches are usually designed
for application servers and therefore not compatible with REST/HTTP, browsers
and mobile devices [ÖV11, FCL97, WN90, KK94, Cas+97, Ora, Tor+17].
Mid-tier (proxy) caches provide weak guarantees in order not to create synchronous
dependencies on server-side queries and updates or only cache very specific types
of data [KW97, Tsi+01, KW98, PB08, FFM04, Fre10, Vak06, Yin+99, Yin+98,
Bhi+02]. The various approaches for server-side caching have the primary goal
of minimizing query latency by offloading the database for repeated queries
[Ali+12, Can+01b, KLM97, Kam17, Gar+08, Bro+13, Ami+03a, Bor+04, LGZ04,
LR01a, Bla+10, Lab+09].
Combining expiration-based and invalidation-based cache maintenance is an
open problem, as both mechanisms provide different consistency guarantees and
therefore would degrade to the weaker model when combined. In practice, most
caching approaches rely on the application to maintain cache coherence instead
of using declarative models that map consistency requirements to cache coherence
protocols [Rus03, Ady+07, Che+16, Stö+15, Ama16, Fit04, Nis+13, Xu+14]. Very
few caching approaches tackle end-to-end latency for the web at all or consider the
distributed nature of cloud services. Caching and replication approaches bear many
similarities, as caching is a form of lazy, on-demand replication [RS03].
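The contrast between the two maintenance strategies can be made concrete with a minimal sketch (all class and method names below are illustrative, not taken from any of the cited systems):

```python
import time

class ExpirationCache:
    """Expiration-based maintenance: every entry carries a TTL and is served
    until it expires, so readers may observe staleness bounded by the TTL."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiration timestamp)

    def put(self, key, value, now=None):
        now = time.time() if now is None else now
        self.store[key] = (value, now + self.ttl)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(key)
        if entry is None or entry[1] <= now:
            return None  # miss or expired: the caller revalidates at the origin
        return entry[0]

class InvalidationCache:
    """Invalidation-based maintenance: the origin pushes an explicit
    invalidation on every update, so known-stale entries are never served."""
    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

    def invalidate(self, key):  # called by the origin when the data changes
        self.store.pop(key, None)

    def get(self, key):
        return self.store.get(key)
```

The sketch also shows why naively combining the two degrades to the weaker model: an expiration-based cache has no channel through which the origin could deliver `invalidate` calls.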
5.1.3 Edge Computing
A cloudlet is a “data center in a box” [AG17, p. 7] that can be deployed in
proximity to mobile devices for reduced latency. The underlying idea is to enhance
the computing capacities of mobile devices by offloading computationally expensive
operations to cloudlets [Sat+09]. Typical applications for the concept of cloudlets
are virtual and augmented reality that require powerful resources for rendering and
low latency for interactivity. For data management, cloudlets are less useful as they
would have to replicate or cache data from the main data center and would therefore
have to act as a geo-replica.
Fog computing takes the idea of highly distributed cloud resources further
and suggests provisioning storage, compute, and network resources for Internet
of Things (IoT) applications in a large amount of interconnected “fog nodes”
[Bon+12]. By deploying fog nodes close to end users and IoT devices, better quality
of service for latency and bandwidth can potentially be achieved. Fog computing
targets applications such as smart grids, sensor networks, and autonomous driving
and is therefore orthogonal to web and mobile applications [SW14].
Edge computing refers to services and computations provided at the network
edge. Edge computing in CDNs has already been practiced for years through reverse
proxy caches that support restricted processing of incoming and outgoing requests
[Kam17, PB08]. Mobile edge computing enhances 3G, 4G, and 5G base stations to
provide services close to mobile devices (e.g., video transcoding) [AR17].
The problem of cloudlets, fog computing, and edge computing regarding low
latency for web applications is that they do not provide integration into data
management and shared application data but instead expose independent resources.
Therefore, data shipping is required to execute business logic on the edge which
shifts the latency problem to the communication path between edge nodes and cloud
data storage. To minimize end-to-end latency in edge computing, it is necessary to
perform data management operations on cached data, in particular authentication
and authorization [Ges19].
5.1.4 Challenges
In summary, the open challenges of replication, caching, and edge computing for
low latency cloud data management are:
• Eager geo-replication introduces high write and commit latency, while lazy
geo-replication does not allow fine-grained consistency choices.
• Replication requires extensive, database-specific infrastructure and cannot be
employed for polyglot persistence.
• Geo-replicated database systems assume the co-distribution of application logic
and do not have the abstractions and interfaces for direct DBaaS/BaaS access by
clients.
• Common caching approaches only improve backend performance instead of
end-to-end latency or suffer from the same limitations as geo-replication.
• Expiration-based caching is considered irreconcilable with non-trivial consistency requirements.
• Edge computing does not solve the distribution of data and hence does not
improve latency for stateful computations and business logic.
5.2 Server-Side, Client-Side and Web Caching
5.2.1 Server-Side Caching
Caching is often a primary concern in distributed backend applications. Numerous
caching systems have been developed to allow application-controlled storage and
queries of volatile data. Typically, they are employed as look-aside caches storing
hot data of the underlying database system, with the application being responsible
for keeping the data up-to-date. Among the most popular of these systems is
Memcache, an open-source, in-memory hash table with a binary access protocol
introduced by Fitzpatrick in 2004 [Fit04]. Memcache does not have any native
support for sharding, but there are client-side libraries for distribution of records
over instances using consistent hashing. Facebook, for example, uses this approach
for their high fan-out reads of pages [Nis+13, Xu+14] and their social media graph
[Bro+13]. The key-value store Redis is used for similar caching scenarios, enabling
more advanced access patterns with its support for data structures (e.g., hashes,
lists, sorted sets) instead of opaque data values [Car13]. In contrast to Memcache,
Redis additionally supports different levels of persistence and an optimistic batch
transaction model. Considerable research went into the optimization of these
caches in terms of hashing performance [FAK13], fair cache resource sharing
between clients [Pu+16], and optimal memory allocation [Cid16]. For the Java
programming language, a standard caching API has been defined and implemented
by various open-source and commercial caching projects [Luc14]. For server-side
caching with higher persistence guarantees, key-value stores such as Riak [Ria],
Voldemort [Aur+12], Aerospike [Aer], HyperDex [EWS12], and DynamoDB [Dyn]
are suitable.
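The consistent-hashing distribution performed by such client-side libraries can be sketched as follows (the node names, virtual-node count, and use of MD5 are assumptions for the example, not details of Memcache clients):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing over cache instances: each node is mapped to several
    points ("virtual nodes") on a ring, and a key is served by the first node
    clockwise from the key's hash. Adding or removing a node only remaps the
    keys between that node and its ring predecessors."""
    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (ring position, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # First ring position at or after the key's hash, wrapping around.
        pos = bisect.bisect(self.ring, (self._hash(key),))
        return self.ring[pos % len(self.ring)][1]
```

Because placement depends only on the hash function and the node set, every client library instance computes the same mapping without any coordination.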
In-memory data grids (IMDGs) [Raj+15, p. 247] are distributed object stores
used for state management and caching in Java and .Net applications. Industry
products include Oracle Coherence, VMware GemFire, Alachisoft NCache, GigaSpaces XAP, Hazelcast, ScaleOut StateServer, Terracotta, JBoss Infinispan, and IBM
eXtreme Scale [ERR11, Lwe10]. Compared to key-value caches, IMDGs offer
the advantage of tightly integrating into the application’s programming language
and its class and object models. In this respect, IMDGs are similar to objectoriented database management systems (OODBMSs), as they expose native data
types (e.g., maps and lists). Additionally, distributed coordination abstractions such
as semaphores, locks, and atomic references as well as execution of MapReduce jobs
are typically supported. IMDGs are also used in related research projects (e.g.,
CloudSim [KV14]), due to the simple abstractions for shared distributed state.
Server-side caching with key-value stores and IMDGs is a proven technique
for reducing backend processing latency by offloading persistent data stores in
I/O-bound applications. This comes at a cost, however: the application has to
maintain the caches using domain-specific logic. The complexities of maintaining
consistency and retrieving cached data are thus left to application developers.
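The look-aside pattern and its application-maintained coherence can be sketched like this (the `database` interface and all names are hypothetical):

```python
class LookAsideCache:
    """Look-aside (cache-aside) pattern: the application checks the cache
    first, falls back to the database on a miss, and is itself responsible
    for writing results back and for invalidating entries on updates."""
    def __init__(self, database):
        self.db = database   # any object with get(key) / put(key, value)
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.db.get(key)   # offload only misses to the database
        self.cache[key] = value    # populate hot data on demand
        return value

    def write(self, key, value):
        self.db.put(key, value)    # update the system of record first
        self.cache.pop(key, None)  # app-maintained coherence: invalidate
```

The `write` method is exactly the domain-specific logic mentioned above: nothing in the cache itself guarantees that updates and invalidations stay in sync.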
5.2.2 Client-Side Database Caching
Client-side database caching has been discussed for decades in the database
community [Lab+09, Bor+03, Luo+02, LGZ04]. In this case, the term “client”
does not refer to a browser or mobile device, but to a server node of a backend
application. In the context of distributed object databases, object-based, page-based,
and hybrid approaches have been studied [KK94, ÖV11, Cas+97]. Object-based
buffer management has the advantage of a lower granularity, allowing for higher
concurrency in the client for transactional workloads. Page-based buffers are more
efficient when queries tend to access all objects within a page and impose
less messaging overhead. This caching model is fundamentally different from
web caching, as the client buffer has to support the transaction model of the
database system. As the cache is retained across transaction boundaries (inter-transaction caching), the problem of transactional isolation is closely tied to that
of cache consistency [Car+91, BP95]: neither transactions from the same client nor
transactions from different clients are allowed to exhibit anomalies caused by stale
reads and concurrent buffer updates.
Cache consistency algorithms from the literature can be classified as avoidance-based or detection-based [ÖV11, FCL97, WN90]. The idea of avoidance-based
cache consistency is to prevent clients from reading stale data. This can be achieved
by having writing clients ensure that any updated objects are not concurrently
cached by any other client. Detection-based algorithms allow reading stale data,
but perform a validation at commit time to check for violations of consistency. The
second dimension of cache consistency algorithms is their approach to handling
writes. Writes can be synchronous, meaning that at the time a client issues a write,
the write request is sent to the server. The server can then, for example, propagate
a write lock to all clients holding a cached copy of the written object (Callback-Read Locking [FC92]). With asynchronous writes, clients still inform the server
about each write, but optimistically continue processing until informed by the server.
This can lead to higher abort rates [ÖVU98]. In the deferred scheme, clients batch
write requests and send them at the end of each transaction, thus reducing the write
message overhead. Avoidance-based deferred algorithms typically suffer from high
abort rates as well [FC92].
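As a rough illustration of a detection-based scheme with deferred writes, the following sketch validates read versions at commit time (the version-counter protocol shown is our simplification, not a faithful rendering of any cited algorithm):

```python
class DetectionBasedTxn:
    """Detection-based scheme with deferred writes: reads may hit stale
    cached data; at commit, the server checks that every version the
    transaction read is still current and aborts it otherwise."""
    def __init__(self, server_versions):
        self.server = server_versions   # key -> current version number
        self.read_set = {}              # key -> version observed by this txn
        self.write_set = {}             # deferred writes, sent at commit

    def read(self, key, cached_version):
        self.read_set[key] = cached_version  # possibly stale, not validated yet
        return cached_version

    def write(self, key, value):
        self.write_set[key] = value     # buffered instead of sent immediately

    def commit(self):
        # Validation: compare observed versions against current server state.
        for key, seen in self.read_set.items():
            if self.server.get(key) != seen:
                return False            # stale read detected -> abort
        for key in self.write_set:      # install deferred writes
            self.server[key] = self.server.get(key, 0) + 1
        return True
```

Deferring the write set until commit reduces message overhead, but as noted above, it also means conflicts surface only at validation time, which drives up abort rates under contention.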
There are commercial relational database systems that offer client-side caching,
for example, Oracle implements a client- and server-side result cache [Ora]. The
protocols and algorithms for client-side caching in databases serve the purpose of
reducing the load on the database system, thereby decreasing backend processing
latency. However, they are not applicable to end-to-end latency reduction in cloud
data management, as web and mobile clients cannot exploit this form of caching.
The overhead of locks distributed over potentially hundreds of thousands of clients
and the complexity of client-specific state in the database server impose a prohibitive
overhead.
Özsu et al. [ÖDV92] employ invalidations to minimize the probability of
stale reads. In their model, though, expiration-based caches can neither execute
custom consistency logic nor receive server-side invalidations. To resolve this issue,
Orestes [Ges19] and its commercial derivative Baqend2 introduce the Cache Sketch
approach [Ges+15] for informing clients about stale entries in their local caches.
A list of all stale cache entries is maintained at the server side and periodically
retrieved by clients in fixed intervals of Δ, thus providing a Δ-atomicity guarantee
2 Baqend: https://www.baqend.com/.
by default while allowing tighter staleness bounds as an opt-in feature: For strong
consistency, the client simply has to retrieve the current Cache Sketch before every
read. By enabling clients to avoid stale cache entries that have not yet expired, the
Cache Sketch approach effectively enables invalidation even in combination with
purely expiration-based caches. Orestes uses an optimistic transaction protocol for
distributed cache-aware transactions (DCAT) [Ges19, Sec. 4.8.2] which performs
conflict checks at commit time, making it a detection-based deferred consistency
scheme. Adya et al. [Ady+95] have proposed a similar scheme called Adaptive
Optimistic Concurrency Control (AOCC). It also relies on a backward-oriented
validation step [Agr+86, CO82, LW84], but serializes transactions in timestamp
order of the committing server. Effectively, AOCC performs timestamp ordering
[SS94] with a two-phase commit protocol [Lec09] and thus accepts a smaller class
of schedules than DCAT which is based on BOCC+ [KR81, Rah88]. Moreover,
instead of relying on version numbers like DCAT, AOCC servers maintain a set
of metadata items for each of the client’s cached data. Unlike DCAT, AOCC was
designed for the case of very few clients: the metadata maintained in each server
increases with both the number of cached records and the number of clients, making
it unable to scale for web scenarios with many clients.
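As an illustration of this mechanism, the following Python sketch models a client that consults a server-provided digest of stale keys before serving reads from its local cache. It is not the actual Orestes implementation: the digest is modeled as a plain set here (the real Cache Sketch uses a Bloom filter for compactness), and the callback names and refresh policy are our assumptions.

```python
import time

class CacheSketchClient:
    """Illustrative client that avoids stale entries in an expiration-based cache.

    The server-side digest is modeled as a plain set of stale keys; the actual
    Cache Sketch [Ges+15] uses a Bloom filter for compactness.
    """

    def __init__(self, fetch_sketch, fetch_from_server, refresh_interval=5.0):
        self.fetch_sketch = fetch_sketch            # returns the set of stale keys
        self.fetch_from_server = fetch_from_server  # revalidating read at the origin
        self.refresh_interval = refresh_interval    # the interval Delta
        self.cache = {}                             # local expiration-based cache
        self.sketch = set()
        self.last_refresh = float("-inf")

    def read(self, key, strong=False):
        now = time.monotonic()
        # For strong consistency, fetch the current sketch before every read;
        # otherwise refresh it at most every refresh_interval seconds, which
        # bounds staleness by that interval (Delta-atomicity).
        if strong or now - self.last_refresh >= self.refresh_interval:
            self.sketch = self.fetch_sketch()
            self.last_refresh = now
        if key in self.cache and key not in self.sketch:
            return self.cache[key]           # fresh cache hit
        value = self.fetch_from_server(key)  # entry stale or missing: revalidate
        self.cache[key] = value
        return value
```

Between two sketch refreshes, a client may thus serve a cached value that the server has already marked stale; passing `strong=True` trades one extra round-trip for strong consistency, mirroring the opt-in described above.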
5.2.3 Caching in Object-Relational and Object-Document
Mappers
Due to the reasons laid out above, most persistence frameworks today rely on
programmatic control of object-based caches with no support from the database
system. With the increasing adoption of scalable NoSQL systems, the landscape
of mappers bridging the impedance mismatch between the data model of the
database system and the application has grown [Stö+15]. In fact, many applications
do not use any native database system API, but instead rely on the convenience of
object mappers such as Hibernate, DataNucleus, Kundera, EclipseLink, Doctrine,
and Morphia [Tor+17]. In case of Java, the Java Persistence API (JPA) standard
[DeM09] is considered the state of the art, superseding the older Java Data Objects API
(JDO) [Rus03].
Both JPA and JDO, as well as Microsoft's equivalent technology Entity
Framework [Ady+07], support the notion of a first-level (L1) and a second-level
(L2) cache. The L1 cache is exclusive to a persistence context and ensures that
queries and lookups always resolve to the same object instances. The L2 cache is shared
across contexts to leverage access locality between different contexts, processes, or
even machines. The L2 interface is pluggable, so various options from in-process
storage to Memcache- or IMDG-backed distributed implementations are available.
Both L1 and L2 caches are write-through caches that directly reflect any updates
passing through them. However, if data is changed from different contexts or even
different clients, the L1 and L2 caches suffer from stale reads. The application has to
explicitly flush or bypass these caches in order to prevent violations of consistency.
5.2.4 Web Caching
In literature, web caches are either treated as a storage tier for immutable content or
as a means of content distribution for media that do not require freshness guarantees
[Hua+13, Fre10]. Web caches are further defined by their implementation of the
HTTP caching standards [IET15]. They can be employed in every location on the
end-to-end path from clients to server. The granularity is typically files, though this
is up to the application. Updates are purely expiration-based.
The applicability of web caching schemes is closely tied to web workloads
and their properties. Breslau et al. were the first to systematically analyze how
Zipf-distributed access patterns suit the limited storage capacities of
web caches [Bre+99, HL08, WF11]. Across six different traces, they found a
steep average exponent of 0.85. Zipf-distributed popularity is closely related to our
proposed capacity management scheme: even if only a small subset of “hot” queries
can be actively matched against update operations, this is sufficient to achieve high
cache hit rates.
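The following small simulation (ours, not taken from the cited traces) illustrates this point: it samples requests from a Zipf distribution with the exponent of 0.85 reported by Breslau et al. and measures the share of requests answered by a cache holding only the most popular objects. All parameter values are illustrative.

```python
import random

def zipf_hit_rate(num_objects=10_000, num_requests=100_000,
                  exponent=0.85, cached_fraction=0.01, seed=42):
    """Share of requests served by caching only the most popular objects
    when object popularity follows a Zipf distribution."""
    rng = random.Random(seed)
    # The object at popularity rank r is requested with probability ~ 1/r^exponent.
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_objects + 1)]
    requests = rng.choices(range(num_objects), weights=weights, k=num_requests)
    num_cached = int(num_objects * cached_fraction)  # cache the "hot" head only
    hits = sum(1 for obj in requests if obj < num_cached)
    return hits / num_requests
```

With these settings, caching the top 1% of objects already serves a substantial share of all requests, which is the effect exploited by the capacity management scheme described above.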
The literature on workload characterization presents mixed conclusions. Based
on an analysis of school logs, Gwertzman and Seltzer [GS96] and Bestavros [Bes95]
found that most popular files tend to remain unchanged. Labrinidis et al. [LR01b]
and Douglis et al. [Dou+97], however, concluded that there is a strong correlation
between update frequency and popularity of files. In another analysis of a more
diverse set of university and industry traces conducted by Breslau et al. [Bre+99],
the correlation between popularity and update rate was found to be present, but
weak.
Another question, particularly important for simulations, is how the arrival
processes of reads, writes, and queries can be modeled stochastically. Poisson
processes with exponentially distributed inter-reference times are most widely
used [Tot09, Wil+05, VM14]. However, homogeneous Poisson processes do not
capture any rate changes (e.g., increased popularity) or seasonality (e.g., massive
frequent changes upon deployments). Session-based models describe web traffic
as a combination of individual user sessions. Session inter-arrival times typically
follow a Poisson process, while inter-click times follow heavy-tailed distributions
like Weibull, LogNormal, and Pareto distributions [Den96, Gel00]. For all Poisson-like workloads, TTL estimators will exhibit high error rates due to the high variance
of the exponential distribution.
Many optimizations of web caches have been studied. This includes cache
replacement schemes [CI97], cooperative caching [RL04, RLZ06, TC03], and
bandwidth-efficient updates [Mog+97]. In the past twenty years, numerous cache
prefetching schemes have been proposed for browser, proxy, and CDN caches
[PM96, Bes96, KLM97, MC+98]. Today, these schemes are not widely used in
practice due to the overhead in the cache and excess network usage caused by wrong
prefetching decisions. For the Cache Sketch approach [Ges+15], the concrete workload and estimation accuracy only affect the false positive and cache hit rates, so that
correctness is guaranteed regardless of estimation errors. This is in stark contrast to
pure TTL-based cache coherence schemes [GS96, Lab+09, BR02, RS03, KR01]
which will exhibit high staleness rates, if workloads are inherently unpredictable.
5.3 Cache Coherence: Expiration-Based vs.
Invalidation-Based Caching
Cache coherence is a major concern for any caching approach. Similar to distributed
databases, there is an inherent trade-off in caching approaches between throughput
and latency on the one side and ease-of-use and provided correctness guarantees on
the other. Often in practice, developers even have to bypass caching manually in
order to achieve the desired consistency level [Nis+13, Ajo+15].
5.3.1 Expiration-Based Caching
In the literature, the idea of using an expiration-based caching model has previously
been explored in the context of file and search result caching [Dar+96, Ami+03a,
LGZ04, Bor+04, KFD00, KB96, How+88, Mog94]. Expiration-based caching (also
referred to as pull-based caching [Lab+09]) can be categorized into TTL-based,
lease-based, and piggybacking strategies. Expiration-based caching usually involves
asynchronous validation of cached entries, i.e., the freshness is validated when
cached data is expired. Synchronous validation (polling-every-time [Lab+09]) only
reduces bandwidth, but not latency, which makes it inapplicable for the goal of this
work.
5.3.2 Leases
The lease model is a concept from the distributed file systems literature [How+88,
Mog94] originally proposed by Gray et al. [GC89]. A lease grants access to a local
copy of an object until a defined expiration time [Vak06]. It is therefore similar to
a lock, but combined with a limited validity to mitigate the problems of client
failures and deadlocks. For the duration of the lease, the holder has to acknowledge
each server-side invalidation in order to maintain strong consistency. A lease
combines the concepts of expiration-based and invalidation-based cache coherence:
while the lease is still active, the client will receive invalidations; afterwards, the
client has to acquire a new lease which is accompanied by a renewed expiration
time [Vak06].
A central problem of leases is that long leases may incur high waiting times for
updates, if a client does not respond, whereas short leases imply a large control
message overhead and increase latency. A major refinement of the basic lease
scheme addressing this problem are volume leases proposed by Yin et al. [Yin+99,
Yin+98]. A volume groups related objects together and introduces a coarser level of
granularity. Clients need to have both an active volume and object lease in order to
perform an object read. By giving volume leases short expiration times and object
leases longer expiration times, writes experience shorter delays and the message
overhead for object lease renewals is reduced. By additionally incorporating access
metrics, adaptive leases introduced by Duvvuri et al. [DST03] can further optimize
the read-versus-write latency trade-off by dynamically calculating lease durations.
The lease model is not well-suited for client caches in the web. Especially with
mobile devices and high-traffic websites, leases on objects will usually expire, as
client connectivity is intermittent and potentially thousands of clients will hold
leases on the same object. The effect would therefore be similar to a TTL-based
model, where the server has to delay writes until the respective TTL is expired.
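The read rule of the volume lease scheme can be sketched as follows, assuming in-memory bookkeeping and hypothetical renewal callbacks; the actual protocol of Yin et al. additionally handles invalidation acknowledgments, which are omitted here.

```python
import time

class VolumeLeaseClient:
    """Read rule of the volume lease scheme: a cached object may only be read
    while BOTH its short volume lease and its long object lease are active."""

    def __init__(self, renew_volume, renew_object,
                 volume_ttl=5.0, object_ttl=60.0, clock=time.monotonic):
        self.renew_volume = renew_volume  # hypothetical server round-trip
        self.renew_object = renew_object  # returns the current object value
        self.volume_ttl = volume_ttl      # short: bounds the delay of writes
        self.object_ttl = object_ttl      # long: reduces renewal traffic
        self.clock = clock
        self.volume_expiry = {}           # volume id -> expiration time
        self.object_expiry = {}           # (volume id, key) -> expiration time
        self.cache = {}

    def read(self, volume, key):
        now = self.clock()
        if self.volume_expiry.get(volume, 0.0) <= now:
            self.renew_volume(volume)     # one renewal covers all its objects
            self.volume_expiry[volume] = now + self.volume_ttl
        if self.object_expiry.get((volume, key), 0.0) <= now:
            self.cache[(volume, key)] = self.renew_object(volume, key)
            self.object_expiry[(volume, key)] = now + self.object_ttl
        return self.cache[(volume, key)]
```

The asymmetry of the two TTLs captures the trade-off described above: frequent, cheap volume renewals amortize over all objects in the volume, while the long object leases rarely need to be refreshed.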
5.3.3 Piggybacking
Piggybacking schemes batch together validations or invalidations and transfer them
in bulk. Krishnamurthy et al. [KW97] proposed Piggyback Cache Validation (PCV).
PCV is designed for proxy caches to decrease staleness by proactively renewing
cached data. Each time a proxy cache processes a request for an origin server,
the local cache is checked for objects from that origin that are either expired or
will expire soon. The revalidation requests for these objects are then batched
and attached (piggybacked) with the original request to the origin server. With
sufficient traffic to frequently piggyback revalidations, this can reduce latency and
staleness as cached data is refreshed before it is requested by a client. Piggyback
Server Invalidation (PSI) [KW98] follows a similar idea: when the server receives a
revalidation request for a given version, the server additionally piggybacks a list of
resources that have been invalidated since that version. PCV and PSI can
be combined in a hybrid approach [KW99, CKR98]. The idea is to use PSI if little
time has passed since the last revalidation, and PCV otherwise, as the overhead of
invalidation messages is smaller if few objects have changed. However,
these piggybacking schemes only work for shared caches (proxy caches,
ISP caches, reverse proxy caches) and require modifications of the caching logic of
HTTP [FR14].
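The core of PCV, collecting expired or soon-to-expire entries for the contacted origin, might look as follows; the cache layout and field names are hypothetical.

```python
import time

def piggyback_validations(cache, origin, now=None, soon=30.0):
    """Collect cached entries of `origin` that are expired or will expire
    within `soon` seconds, so their revalidations can ride along with the
    next request sent to that origin (the core idea of PCV).

    `cache` maps (origin, path) to (validator, expiry_timestamp)."""
    now = time.time() if now is None else now
    return [
        {"path": path, "validator": validator}
        for (org, path), (validator, expiry) in cache.items()
        if org == origin and expiry <= now + soon
    ]
```

The returned batch would be attached to the outgoing request, so the revalidations incur no extra round-trip as long as there is sufficient traffic to the origin.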
5.3.4 Time-to-Live (TTL)
TTLs are usually assumed to be implicit, i.e., they are not explicitly defined by the
application as they are not known in advance [Lab+09]. HTTP adopted the TTL
model as it is the most scalable and simple approach to distribute cached data in the
web [Fie+99, IET15]. The core of every TTL scheme is the latency-recency trade-off. Cao et al. [BR02] propose to employ user profiles for browsers that express
the preference towards either higher recency or lower latency. Fixed TTL schemes
that neither vary in time nor between requested objects/queries lead to a high level
of staleness [Wor94]. This approach is often considered to be incompatible with
the modern web, since users expect maximum performance without noticeable
staleness. It therefore becomes the task of the application and the cloud services
to minimize and hide any occurring staleness.
A popular and widely used TTL estimation strategy is the Alex protocol [GS96]
(also referred to as Adaptive TTL [RS03, Wan99, KW97, CL98]) that originates
from the Alex FTP server [Cat92]. It calculates the TTL as a percentage (e.g., 20%)
of the time since the last modification, capped by an upper TTL bound. Simulations
have shown that for certain workloads this scheme can contain the staleness rate to
roughly 5% [GS96]. In an AT&T trace analyzed by Feldmann et al. [Fel+99], the Alex
protocol with a percentage of 20% achieved an overall staleness of only 0.22%. On
the other hand, 58.5% of all requests were revalidations on unchanged resources.
The Alex protocol has the downside of neither converging to the actual TTL nor
being able to give estimates for new queries.
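In code, this TTL computation reduces to a few lines; the parameter names and the default cap are our choices, and the 20% factor follows the example above.

```python
def alex_ttl(now, last_modified, factor=0.2, max_ttl=86_400.0):
    """Adaptive TTL of the Alex protocol: a fixed percentage of the time
    since the last modification, capped by an upper TTL bound."""
    age = max(0.0, now - last_modified)
    return min(factor * age, max_ttl)
```

The two shortcomings noted above are directly visible here: a freshly created or just-modified object always starts with a TTL of zero, and the estimate grows with age rather than converging to the object's actual update interval.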
Alici et al. proposed an adaptive TTL computation scheme for search results on
the web [Ali+12]. In their incremental TTL model, expired queries are compared
with their latest cached version. If the result has changed, the TTL is reset to a
minimum TTL, otherwise the TTL is augmented by an increment function (linear,
polynomial, exponential) that can either be configured manually or trained from
logs. Though the model is adaptive, it requires offline learning and assumes a central
cache co-located with the search index. If the time of an invalidation is known (e.g.,
in a database setting instead of a search engine application), TTLs can potentially
be computed more precisely than in their scheme, which only relies on subsequent
reads to detect staleness and freshness. With the notable exception of the Cache
Sketch, which merely performs less efficiently when over- or underestimating expiration
times, current TTL-based approaches exhibit potentially high levels of staleness in
the presence of unpredictable invalidations that are only bounded by the maximum
permissible TTLs.
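A minimal sketch of this incremental TTL model, assuming exponential doubling as the increment function (one of the variants named above); the bounds are illustrative.

```python
def incremental_ttl(previous_ttl, result_changed,
                    min_ttl=60.0, max_ttl=3600.0,
                    increment=lambda ttl: 2.0 * ttl):
    """Incremental TTL of Alici et al.: fall back to the minimum TTL when the
    refreshed result differs from the cached one, otherwise grow the TTL by
    the increment function, capped at a maximum."""
    if result_changed:
        return min_ttl
    return min(increment(previous_ttl), max_ttl)
```

A stable query result thus earns ever longer TTLs, while a single observed change resets the estimate to the conservative minimum.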
5.3.5 Invalidation-Based Caching
Arguably, invalidations are the most intuitive mechanism to deal with updates
of cached data. In this case, the server is responsible for detecting changes
and distributing invalidation messages to all caches that might have cached that
data. Invalidation-based caching can either be invalidation-only or update-based
[Lab+09]. In the invalidation-only scheme, stale content is only evicted from the
cache and reloaded upon the next cache miss. With the update-based approach,
new versions are proactively pushed to caches. Almost every CDN works with
the invalidation-only scheme in order to limit network overhead [PB08]. A notable
exception is the academic Coral CDN, which is mainly designed for static, non-changing content and hence supports the update-based model [FFM04, Fre10].
Candan et al. [Can+01b] first explored automated invalidation-based web
caching with the CachePortal system that detects changes of HTML pages by
analyzing corresponding SQL queries. CachePortal is a reverse proxy cache with
two major components. The sniffer is responsible for logging incoming HTTP
requests and relating them to SQL queries detected at the JDBC database driver level
to produce a query-to-URL mapping. The invalidator monitors update operations
and detects which queries are affected to purge the respective URLs. The authors
find the overhead of triggers or materialized views prohibitive and hence rely on a
different approach. For each incoming update, a polling query is constructed. The
polling query is either issued against the underlying relational database or an index
structure maintained by the invalidator itself. If a non-empty result is returned, the
update changes the result set of a query and a URL invalidation is triggered. The
number of polling queries is proportional to both the number of updates and the
number of cached queries. CachePortal therefore incurs a very high overhead for
caching on the database and the invalidator. Since the load on the invalidator cannot
be scaled horizontally, CachePortal is not suitable for large-scale web applications
with potentially many users and high write throughput. Furthermore, the approach
is strictly specific to a fixed set of technologies (JDBC, Oracle RDBMS, BEA
Weblogic application server) and only covers reverse proxy caching. Moreover,
the mapping from HTTP requests to queries breaks under concurrent access, as it is
based on observing queries within a time window. If multiple users request different
resources at the same time, the mapping is flawed.
The Quaestor architecture [Ges+17] exploits the existing infrastructure of the
web to accelerate delivery of dynamic content, specifically query results. To make
this feasible, it registers all cached query results in the distributed real-time query
engine InvaliDB [Win19, WGR20] which matches all incoming database writes
to all currently cached queries in order to discover result changes with minimal
latency. As soon as an invalidating change to one of the cached query results is
detected, InvaliDB notifies one of the Quaestor application servers which in turn
sends out invalidations to all affected invalidation-based caches (specifically the
CDN). To prevent clients from reading stale query results from expiration-based
caches (e.g. the browser cache within the user device), the Quaestor architecture
relies on the Cache Sketch approach [Ges+15]: Through this mechanism, clients
are periodically informed about stale expiration-based caches which can thus be
avoided (and thereby effectively be invalidated). Since InvaliDB is scalable with
both the number of concurrently registered queries and also with write throughput,
the Quaestor architecture is feasible for large-scale web applications with many
users and high data volumes. Baqend is the only commercial implementation at the
time of writing.
Dilley et al. [KLM97] proposed the invalidation-based protocol DOCP (Distributed Object Consistency Protocol). The protocol extends HTTP to let caches subscribe to invalidations. DOCP therefore presents an effort to standardize invalidation
messages, which are provided through custom and vendor-specific approaches in
practice (e.g., the HTTP PURGE method [Kam17]). The authors call the provided
consistency level delta-consistency which is similar to Δ-atomicity, i.e., all subscribed
caches will have received an invalidation of a written data item at most delta seconds
after the update has been processed. DOCP’s invalidation-only approach is less
powerful than InvaliDB’s update-based query subscription mechanism as InvaliDB
allows subscriptions to an arbitrary number of conditions and queries multiplexed
over a single Websocket connection to the origin.
Worrel [Wor94] studied hierarchical web caches to derive more efficient cache
coherence schemes. He designed an invalidation protocol specifically suited for
hierarchical topologies and compared it to fixed TTL schemes w.r.t. server load,
bandwidth usage, and staleness. He found the scheme to be superior in terms of
staleness and competitive to TTLs in server load and bandwidth usage. A particular
problem of deep hierarchies is the age penalty problem studied by Cohen et al.
[CK01]: older content in the upper levels of the hierarchy propagates downstream
and negatively impacts dependent caches.
An alternative to cache invalidation was proposed by the Akamai founders
Leighton and Lewin [LL00]. The idea is to include a hash value of the content in
the URL, so that upon changes the old version does not get invalidated, but instead
is superseded by a new URL containing the new fingerprint (cache busting). This
approach is widely used in practice through build tools such as Grunt, Gulp, and
Webpack. The downside is that this scheme only works for embedded content that
does not require stable URLs (e.g., images and scripts). In particular, it cannot be
applied to database objects, query results, and HTML pages. Furthermore, it only
allows for invalidation at application deployment time and not at runtime.
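The fingerprinting idea can be sketched in a few lines of Python; the `name.<hash>.ext` naming convention mirrors typical build-tool output, but the exact format and the use of SHA-256 are our assumptions.

```python
import hashlib

def fingerprint_url(path, content, digest_length=8):
    """Embed a content digest into the file name so that changed content
    yields a new URL instead of requiring an invalidation."""
    digest = hashlib.sha256(content).hexdigest()[:digest_length]
    if "." in path:
        stem, ext = path.rsplit(".", 1)
        return f"{stem}.{digest}.{ext}"
    return f"{path}.{digest}"
```

Because the URL changes with every content change, such assets can be served with effectively infinite TTLs; the price is that some stable entry point (e.g., the HTML page referencing the fingerprinted assets) must still be kept coherent by other means.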
Edge Side Includes (ESI) [Tsi+01] take the approach of Leighton and Lewin
a step further by shifting template-based page assembly to the edge, i.e., CDN
caches. ESI is a simple markup language that describes HTML pages through the
inclusion of referenced fragments that can be cached individually. Rabinovich et al.
[Rab+03] proposed to move ESI assembly to the client arguing that the rendering of
ESI on the edge adds to the presumed main bottleneck of last-mile latency [Nag04].
While ESI has not gained relevance for the browser, the underlying idea is now
widely used in practice [BPV08]. Every single-page application that relies on Ajax
and MVC frameworks for rendering employs the idea of assembling a website from
individual fragments, usually consumed from cloud-based REST APIs.
Bhide et al. [Bhi+02] also proposed a scheme to combine invalidation- and
expiration-based caching in proxies. They argue that web workloads are inherently
unpredictable for the server and therefore propose a Time-to-Refresh (TTR)
computed in the proxy to replace TTLs. TTRs are computed for each data item based
on previous changes and take a user-provided temporal coherency requirement into
account that expresses the tolerable staleness based on data values (e.g., the stock
price should never diverge by more than one dollar). TTRs therefore dynamically
reflect both the rate of change (as expressed in TTLs) and the desired level of
coherence. Bhide et al. present algorithms to mix the communication-intensive
expiration-based revalidations through TTRs with invalidations.
5.3.6 Browser Caching
Traditionally, browsers only supported transparent caching at the level of HTTP, as
specified in the standard [Fie+99]. The only recent additions to the original caching
model are means to specify that stale content may be served during revalidation or
unavailability of the backend [Not10], as well as an immutability flag to prevent
the browser from revalidating upon user-initiated page refreshes [McM17]. For
workloads of static content, Facebook reported that the browser cache served by
far the highest portion of traffic (65.5%) compared to the CDN (20.0%) and reverse
proxy caches (4.6%) [Hua+13].
Two extensions have been added to browsers in order to facilitate offline
website usage and application-level caching beyond HTTP caching. AppCache was
the attempt to let the server specify a list of cacheable resources in the cache
manifest. The approach suffered from various problems, the most severe being that
no resource-level cache coherence mechanism was included and displaying non-stale data required refreshing the manifest [Ama16]. To address these problems,
Service Workers were proposed. They introduce a JavaScript-based proxy interface
to intercept requests and programmatically define appropriate caching decisions
[Ama16]. While cache coherence is not in the scope of Service Workers and
has to rely on application-specific heuristics, there already are approaches for
transparent website acceleration based on Service Workers (e.g. Speed Kit3 [Win18]
[WGW+20]). A set of best practices for developing with Service Workers was
published by Google and termed Progressive Web Apps [Mal16].
To structure client-side data beyond a hash table from URLs to cached data
and enable processing of the data, three techniques have been proposed and partly
standardized [Cam16] (cf. Sect. 3.4.2). LocalStorage provides a simple key-value
interface to replace the use of inefficient cookies. Web SQL Database is an API
that exposes access to an embedded relational database, typically SQLite. The
specification is losing traction and will likely be dropped [Cam16, p. 63]. IndexedDB
is also based on an embedded relational database system. Data is grouped into
databases and object stores that present unordered collections of JSON documents.
By defining indexes on object stores, range queries and point lookups are possible
via an asynchronous API.
3 Speed Kit: https://speed-kit.com.
5.3.7 Web Performance
A central finding on the performance of modern web applications is that perceived speed
and page load times (cf. Sect. 3.4.1) are a direct result of physical network latency
[Gri13]. The HTTP/1.1 protocol that currently forms the basis of the web and REST
APIs suffers from inefficiencies that have partly been addressed by HTTP/2 [IET15].
Wang et al. [WKW16] explored the idea of offloading the client by preprocessing
data in proxies with higher processing power. Their system Shandian evaluates
websites in the proxy and returns them as a combination of HTML, CSS, and
JavaScript including the heap to continue the evaluation. For slow Android devices,
this scheme yielded a page load time improvement of about 50%. Shandian does,
however, require a modified browser which makes it inapplicable for broad usability
in the web. The usefulness of the offloading is also highly dependent on the
processing power of the mobile device, as the proxy-side evaluation blocks delivery
and introduces a trade-off between increased latency and reduced processing time.
Netravali et al. proposed Polaris [Net+16] as an approach to improve page load
times. The idea is to inject information about dependencies between resources into
HTML pages, as well as JavaScript-based scheduling logic that loads resources
according to the dependency graph. This optimization works well in practice,
because browsers rely on heuristics to prioritize fetching of resources. By an offline
analysis of a specific server-generated website, the server can determine actual
read/write and write/write dependencies between JavaScript and CSS ahead of time
and express them as a dependency graph. This allows parallelism where normally
the browser would block to guarantee side-effect free execution. Depending on
the client-server round-trip time and bandwidth, Polaris yields a page load time
improvement of roughly 30%. The limitations of the approach are that it does not
allow non-determinism and that dependency graphs have to be generated for every
client view. For personalized websites, this overhead can be prohibitive. Contradicting the current trend in web development towards single-page applications, the
approach furthermore assumes server-side rendering.
5.4 Query Caching
Query caching has been tackled from different angles in the context of distributed
database systems [Dar+96, Ami+03a, LGZ04, Bor+04, KFD00, KB96], mediators
[LC99, CRS99, Ada+96], data warehouses [Des+98, KP01, Lou+01], peer-to-peer systems [Gar+08, PH03, Kal+02], and web search results [BLV11, Cam+10,
Bla+10, Ali+11]. Most of this work is focused on the details of answering queries
based on previously cached results, while only few approaches also cover cache
coherence.
5.4.1 Peer-to-Peer Query Caching
Garrod et al. have proposed Ferdinand, a proxy-based caching architecture forming
a peer-to-peer distributed hash table (DHT) [Gar+08]. When clients query data,
the proxy checks a local, disk-based map from query strings to result sets. If the
result is not present, a lookup in another proxy is performed according to the DHT
scheme. The consistency management is based on a publish/subscribe invalidation
architecture. Each proxy subscribes to multicast groups corresponding to the locally
cached queries. A limiting assumption of Ferdinand is that updates and queries
follow a small set of fixed templates defined by the application. This is required to
map updates and queries to the same publish/subscribe topics, so that only relevant
updates will be received in each caching proxy. Peer-to-peer query caching has
also been employed for reducing traffic in file sharing protocols [PH03], as well
as for distributed OLAP queries [Kal+02]. IPFS [Ben14] also employs a peer-to-peer
approach with DHTs to cache file chunks across many users. Since the overhead of
metadata lookups is prohibitive for low latency, though, this scheme cannot be used
to accelerate the delivery of web content.
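The placement idea behind such a DHT of caching proxies can be approximated as follows; plain modulo hashing is used here as a stand-in for Ferdinand's actual DHT routing, so membership changes would remap more keys than a real consistent-hashing scheme would.

```python
import hashlib

def responsible_proxy(query, proxies):
    """Deterministically map a query string to the proxy responsible for
    caching its result, so that a local miss can be forwarded to a single
    well-known peer instead of flooding the network."""
    digest = int(hashlib.sha256(query.encode("utf-8")).hexdigest(), 16)
    return sorted(proxies)[digest % len(proxies)]
```

Since every proxy computes the same mapping, publish/subscribe invalidation topics can be partitioned along the same hash, which is what confines each update to the proxies that may actually cache affected queries.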
5.4.2 Mediators
In contrast to reverse proxies that can serve any web application, mediators are
typically designed to handle one specific use case, type of data, or class of data
sources. Work in this area is mostly concerned with constructing query plans using
semantic techniques to leverage both locally cached data from the mediator as well
as distributed data sources [LC99, CRS99, Ada+96].
5.4.3 Query Caching Proxies and Middlewares
DBProxy, DBCache, and MTCache [Ami+03a, LGZ04, Bor+04] rely on dedicated
database proxies to generate distributed query plans that can efficiently combine
cached data with the original database. However, these systems need built-in tools of
the database system for consistency management and are less motivated by latency
reduction than by reducing query processing overhead in the database similar to
materialized views [Shi11].
DBProxy [Ami+03a] is designed to cache SQL queries in a proxy, similar to a
reverse proxy cache or a CDN. DBProxy adapts the schema as new queries come
in and learns query templates by comparing queries to each other. When a query
is executed in the database, results are stored in DBProxy. To reuse cached data,
DBProxy performs a containment check that leverages the simplicity of templates
to lower the complexity of traditional query containment algorithms [Ami+03b].
DBProxy receives asynchronous updates from the database system and hence
offers Δ-atomicity by default. The authors describe monotonic reads and strong
consistency as two potential options for reducing staleness in DBProxy, but do not
evaluate or elaborate on the implications. DBProxy assumes that the application
runs as a Java-based program in the proxy and enhances the JDBC driver to inject
the caching logic. The authors do not discuss the impact of transactional queries on
correctness when they are invisible to the database system.
DBCache [Bor+04, Luo+02, Bor+03] and MTCache [LGZ04] are similar
approaches that employ nodes of relational database systems for caching (IBM
DB2 and Microsoft SQL Server, respectively). Both systems rewrite query plans
to exploit both local and remote data. In DBCache, the query plan is called a
Janus plan and consists of a probe query and a regular query. The probe query
performs an existence check to determine whether the local tables can be used
for the query. Afterwards, a regular query containing a clause for both local and
remote data is executed. Cache coherence is based on the DB2 replication interface
that asynchronously propagates all updates of a transaction. MTCache uses the
corresponding asynchronous replication mechanism from Microsoft SQL Server.
It maintains the cache as a set of materialized views and performs cost-based
optimization on query templates to decide between local and remote execution. Due
to their strong relation to database replication protocols, DBCache and MTCache
are effectively lazily populated read replicas.
Labrinidis et al. proposed WebViews as a technique for caching website fragments
[LR01a, LR00]. A WebView refers to HTML fragments generated by database
queries, e.g., a styled table of stock prices. Through a cost-based model, WebViews
are either materialized in the web servers, in the database, or not at all. The
authors found that materialization in the web servers is generally more effective
than materialization in the database by at least a factor of 10, since it incurs fewer
round-trips to the database.
5.4.4 Search Result Caching
According to Bortnikov et al. [BLV11], caching approaches for search results
can be classified into coupled and decoupled designs. In a decoupled design (e.g.,
[Cam+10]), the caches are independent from the search index (i.e., the database),
while a coupled design is more sophisticated and actively uses the index to
ensure cache coherence. Blanco et al. investigated query caching in the context
of incremental search indices at Yahoo and proposed a coupled design [Bla+10].
To achieve cache coherence, their cache invalidation predictor (CIP) generates
a synopsis of invalidated documents in the document ingestion pipeline. This
summary is checked before returning a cached search result in order to bypass the
cache when newly indexed document versions are available. Unlike evolving summary data
structures like the Cache Sketch [Ges+15], the synopses are immutable, created in
batch, and only used to predict likely invalidations of server-side caches.
Bortnikov et al. [BLV11] improved the basic CIP architecture using realistic
workloads, more efficient cache replacement algorithms and optimizations to deal
with less popular documents. Alici et al. [Ali+11] were able to achieve comparable
invalidation accuracy using a timestamp-based approach where an invalidation is
detected by having the cache distribute the timestamp metadata of a cached query to
all responsible search servers. Based on the respective timestamps, these servers
confirm freshness if they have neither indexed updated document versions nor new
documents that also match the search term. The broadcast is less expensive than reevaluating the query,
but not suitable for latency reduction in a web caching scenario.
5.4.5 Summary Data Structures for Caching
5.4.5.1 Bloom Filters for Caching
Summary Cache proposed by Fan et al. [Fan+00] is a system for web caching that
employs Bloom filters as metadata digests in cooperative web caches. As such,
it bears some resemblance to Orestes [Ges19] which uses the Cache Sketch as a
Bloom filter-based data structure for informing clients of possibly outdated caches.
Summary Cache, however, is fundamentally different as its summaries (“cache
digests”) are generated in intervals to communicate the set of locally available
cached data to cooperating web caches. In the context of Summary Cache, Counting
Bloom filters were introduced in the literature for the first time. Since each server
has to delete URLs from the Bloom filter when they are replaced from the cache,
a removal operation is necessary. In this setting, considerations about the optimal
Bloom filter size and invalidations are not required as the Bloom filter only serves
as a means of bandwidth reduction.
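The Counting Bloom filter introduced with Summary Cache replaces each bit with a small counter, so that evicted URLs can be deleted again. A minimal sketch (the SHA-256 hashing scheme and the default parameters are choices of this illustration, not of the paper):

```python
import hashlib

class CountingBloomFilter:
    """Bloom filter with counters instead of bits, so items can be removed."""

    def __init__(self, size=1024, num_hashes=4):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _positions(self, item):
        # Derive num_hashes positions by salting one cryptographic hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item):
        # Only call for items previously added, or counters may underflow.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def might_contain(self, item):
        # May return false positives, but never false negatives.
        return all(self.counters[pos] > 0 for pos in self._positions(item))
```

A cache server would `add` a URL when caching it and `remove` it upon eviction, then periodically transmit the filter to cooperating caches.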
Recently, cache fingerprinting has been proposed for improving HTTP/2 push
[ON16]. The idea is to construct a digest of the browser cache’s contents—similar
to Summary Cache—to efficiently identify resources that are already available in
the client and therefore do not have to be pushed by the server. Instead of Bloom
filters, Golomb-compressed sets (GCS) [PSS09] are used. GCS exploit the fact
that in a Bloom filter with only one hash function, the differences between values
follow a geometric distribution [MU05]. This pattern can be optimally compressed
using Golomb coding, yielding a smaller size than Bloom filters. The fingerprinting
scheme has not been standardized yet, but an experimental implementation is
available in the H2O web server [Hev].
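The gap-compression idea behind GCS can be illustrated with Rice coding, the power-of-two special case of Golomb coding. The following round-trip sketch encodes sorted hash values as coded gaps; it is an illustration of the coding principle, not the fingerprinting scheme itself:

```python
def golomb_rice_encode(values, m_bits):
    """Encode sorted values as Rice-coded gaps: unary quotient + fixed-width remainder."""
    bits, prev = [], 0
    for v in sorted(values):
        gap = v - prev
        prev = v
        q, r = gap >> m_bits, gap & ((1 << m_bits) - 1)
        bits.extend([1] * q + [0])                                  # unary quotient
        bits.extend((r >> i) & 1 for i in reversed(range(m_bits)))  # remainder, MSB first
    return bits

def golomb_rice_decode(bits, m_bits, count):
    """Decode 'count' values by accumulating the decoded gaps."""
    values, prev, i = [], 0, 0
    for _ in range(count):
        q = 0
        while bits[i] == 1:          # count unary ones
            q += 1
            i += 1
        i += 1                       # skip the terminating zero
        r = 0
        for _ in range(m_bits):      # read fixed-width remainder
            r = (r << 1) | bits[i]
            i += 1
        prev += (q << m_bits) | r
        values.append(prev)
    return values
```

In an actual GCS, the values would be the sorted outputs of a single hash function over the set members, and the Rice parameter would be chosen from the expected gap size.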
In NoSQL systems, Bloom filters are frequently used to accelerate storage
engines based on log-structured merge-trees (LSMs). Google’s BigTable [Cha+08]
pioneered the approach that has been adopted in various systems (e.g., Cassandra,
LevelDB, HyperDex, WiredTiger, RocksDB, TokuDB) [LM10, EWS12, GD11].
In BigTable, data is stored in immutable SSTables located on disk. In order to
avoid disk I/O for the lookup of a key for each SSTable, a Bloom filter is loaded
into memory. Only when the check is positive is the SSTable queried on disk. In
contrast to the Cache Sketch in Orestes, the Bloom filters in BigTable only need to
be constructed once, as BigTable’s on-disk data is immutable.
5.4.5.2 Alternatives to Bloom Filters
Space-efficient alternatives to Bloom filters have been proposed. While Golomb-coded sets [PSS09] achieve slightly smaller sizes, they are not suited for Orestes,
as fast O(1) lookups are not possible. Mitzenmacher [Mit02] proposed Compressed
Bloom Filters. They are based on the observation that a very sparse Bloom filter
can be efficiently compressed by an entropy encoding like Huffman and arithmetic
codes [Rom97]. However, due to the size of the uncompressed filter, memory
consumption is infeasible for the client. Using several Blocked Bloom filters with
additional compression would mitigate this problem, but increase the complexity
and latency of lookups [PSS09]. Cuckoo filters have been proposed by Fan et al.
[Fan+14] as a more space-efficient alternative to Counting Bloom filters. In contrast
to a Counting Bloom filter, however, the number of duplicate entries in a Cuckoo
filter is strictly bounded. Matrix filters proposed by Porat et al. [PPR05, Por09]
achieve the lower limit of required space for a given false positive rate. This
advantage is contrasted by linear lookup time and a complex initial construction
of the data structure. The original Bloom filters are already within a factor of 1.44
of the theoretical lower bound of required space [BM03] and offer O(1) inserts and
lookups. Kirsch et al. [KM06] showed that the use of a linear combination of two
independent hash functions reduces the amount of costly hash computations without
loss of uniformity. An overview of other Bloom filter variants and applications is
given by Broder and Mitzenmacher [BM03] and Tarkoma et al. [TRL12].
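The Kirsch–Mitzenmacher scheme derives all k index positions from just two base hashes as g_i(x) = h1(x) + i · h2(x) mod m. A sketch (splitting one SHA-256 digest into h1 and h2, and forcing h2 odd for power-of-two m, are assumptions of this illustration):

```python
import hashlib

def double_hash_positions(item, k, m):
    """Derive k Bloom filter positions from two hashes: g_i(x) = (h1 + i*h2) mod m."""
    digest = hashlib.sha256(item.encode()).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, hence coprime to power-of-two m
    return [(h1 + i * h2) % m for i in range(k)]
```

Only one cryptographic digest is computed per item, however large k is, which is the point of the optimization.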
Bloom filters, Golomb-compressed sets, and other summary data structures are
popular means for optimization in both networking and data management. But as is
often the case for optimizations, the application scenario and workload determine
which approach is considered the right one. For example, the choice between using
Golomb-compressed sets or original Bloom filters may depend on whether the
superior space-efficiency of the former is valued over the constant-time lookups
of the latter.
5.5 Eager vs. Lazy Geo Replication
5.5.1 Replication and Caching
The goal of replication is to increase read scalability and to decouple reads from
writes to offload the database and reduce latency. Replication can protect the system
against data loss. In case of geographically distributed replicas (geo-replication),
read latency for distributed access from clients is improved, too [Bak+11, Shu+13,
Kra+13, Llo+13, Cor+12, Cor+13, Sov+11, Cha+08, DeC+07, Coo+08, Ter+13]. In
this setting, a central constraint is that intra-data center latencies are small (<5 ms),
while inter-data center communication is expensive (50–150 ms) [Agr+13]. Caching
can be viewed as an alternative to replication. With caching, data is fetched and
stored on-demand, while with geo-replication the complete data set is synchronized
between multiple replica sites, incurring higher management overhead. However,
two different kinds of caches can be distinguished: caches that require expensive
updates (invalidation-based caches) and passive caches that do not incur any overhead on the server (expiration-based caches). If replicas are allowed to accept writes
(multi-master), considerable coordination is required to guarantee consistency.
Charron-Bost et al. [CBPS10, Chapter 12] and Özsu and Valduriez [ÖV11,
Chapter 13] provide a comprehensive review of replication techniques. We will
focus on a discussion of exemplary, influential geo-replicated systems and an outline
of how their trade-offs differ from one another.
5.5.2 Eager Geo-Replication
Through eager geo-replication as implemented in Megastore [Bak+11], Spanner
[Cor+13, Cor+12], and F1 [Shu+13] as well as in MDCC [Kra+13] and Mencius
[MJM08], applications achieve strong consistency at the cost of higher write
latencies (typically 100 ms [Cor+12] to 600 ms [Bak+11]).
5.5.2.1
Megastore
Baker et al. [Bak+11] came to the conclusion that the cost of strong consistency
and ACID transactions in highly distributed systems is often acceptable in order
to empower developers. Megastore’s data model is based on entity groups that
represent fine-grained, application-defined data partitions (e.g., a user’s message
inbox). Transactions are supported per co-located entity group, each of which is
mapped to a single row in BigTable that offers row-level atomicity. Transactions
spanning multiple entity groups are possible, but not encouraged, as they require
expensive 2PC [Lec09].
Megastore uses synchronous wide-area replication. The replication protocol is
based on Paxos consensus [Lam98] over positions in a shared write-ahead log.
Megastore uses the Multi-Paxos [Lam01] optimization to achieve best-case performance of one wide-area round-trip per write as opposed to two round-trips with regular Paxos. This
replication protocol has been improved by Kraska et al. [Kra+13] in MDCC (Multi-Data Center Consistency). They include two additional Paxos optimizations (fast
and generalized Paxos) and reduce conflicts by leveraging commutativity of certain
updates.
To allow consistent local read operations, Megastore tracks the replication status
of each entity group in a per-site coordinator. In order for the coordinator to reflect
the latest state of each entity group, the Paxos replication not only has to contact a
quorum as in the original protocol, but has to wait for acknowledgments from each
replica site. This implies that lower latency for consistent reads is achieved at the
expense of slower writes.
The authors report average read latencies of 100 ms and write latencies of 600 ms.
These numbers illustrate the considerable cost of employing synchronous wide-area
replication. The high latency of writes is critical, as Megastore employs a form of
optimistic concurrency for writes on the same entity group: if two writes happen
concurrently during replication, only one will succeed. This limits the throughput
to 1/l_w, where l_w is the write latency, i.e., about ten writes per second in the best
case. Megastore is also available as a DBaaS called Google Cloud Datastore in the
Google App Engine PaaS.
5.5.2.2 Spanner and F1
Spanner [Cor+13, Cor+12] evolved from the observation that Megastore’s
guarantees—though useful—come at a performance penalty that is prohibitive for
some applications. Spanner is a multi-version database system that unlike Megastore
provides efficient cross-shard ACID transactions. The authors argue: “We believe
it is better to have application programmers deal with performance problems due
to overuse of transactions as bottlenecks arise, rather than always coding around
the lack of transactions” [Cor+12, p. 4]. Spanner automatically groups data into
partitions (tablets) that are synchronously replicated across sites via Paxos and
stored in Colossus, the successor of GFS [GGL03]. Transactions in Spanner are
based on two-phase locking and 2PC executed over the leaders for each partition
involved in the transaction. Spanner serializes transactions according to their global
commit times.4 To make this feasible, Spanner introduces TrueTime, an API for
high precision timestamps with uncertainty bounds implemented using atomic
clocks and GPS. Each transaction is assigned a commit timestamp from TrueTime.
Using the uncertainty bounds, the leader can wait until the transaction is guaranteed
to be visible at all sites before releasing locks. This also enables efficient read-only
transactions that can read a consistent snapshot for a certain timestamp across all
data centers without any locking.
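The commit-wait rule can be sketched as follows. The uncertainty interval is simulated here from the local clock, whereas Spanner derives it from atomic clocks and GPS:

```python
import time

UNCERTAINTY_S = 0.005  # assumed clock uncertainty; Spanner reports a few milliseconds

def truetime_now():
    """Stand-in for the TrueTime API: an interval [earliest, latest] guaranteed
    to contain the true physical time (simulated from the local clock here)."""
    now = time.time()
    return now - UNCERTAINTY_S, now + UNCERTAINTY_S

def commit_wait(commit_ts):
    """Commit wait: block until 'earliest' has passed the commit timestamp, so
    the timestamp is guaranteed to lie in the past at every site before the
    leader releases its locks."""
    while truetime_now()[0] <= commit_ts:
        time.sleep(0.001)

# The leader picks the commit timestamp as TrueTime's 'latest' bound ...
_, commit_ts = truetime_now()
# ... and waits out the uncertainty before making the transaction visible.
commit_wait(commit_ts)
```

The wait costs roughly twice the clock uncertainty per transaction, which is why Spanner invests in tight clock synchronization.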
Mahmoud et al. [Mah+13] proposed an optimization for faster commits that
integrates local 2PC in data centers with a Paxos consensus as to whether the
transaction should commit (replicated commit protocol). This reduces commit
latency, but comes at the cost of high read latency, since every read needs to contact
a majority of data centers to only read committed data.
4 This is termed external consistency by the Spanner authors and known in the literature as strict serializability or commit order-preserving conflict serializability (COCSR) [WV02].
F1 [Shu+13] and its commercial version Cloud Spanner [Bre17] build on
Spanner to support SQL-based access for Google’s advertising business. To this
end, F1 introduces a hierarchical schema based on Protobuf, a rich data encoding
format similar to Avro and Thrift [Kle17]. To support both OLTP and OLAP queries,
it uses Spanner’s abstractions to provide consistent indexing. A lazy protocol
for schema changes allows non-blocking schema evolution [Rae+13]. Besides
pessimistic Spanner transactions, F1 supports optimistic transactions. Each row
bears a version timestamp that is used at commit time to perform a short-lived
pessimistic transaction to validate a transaction’s read set. Optimistic transactions in
F1 suffer from the abort rate problem [Gra+81], as the read phase is latency-bound
and the commit requires slow, distributed Spanner transactions.
According to the CAP theorem [Bre00], Spanner and F1 cannot be highly
available systems. Brewer [Bre17] argues that in practice, however, they behave
as highly available systems through engineering best practices. For example, Cloud
Spanner does not rely on the public Internet to perform geo-replication, but instead
transfers data over private, redundant networks owned and operated by Google.
CockroachDB [Coc] is an open-source, geo-replicated, relational database system based on the design of Spanner and F1. To support commodity hardware,
CockroachDB does not use TrueTime, but instead uses NTP synchronization with
hybrid logical clocks5 [Kul+14]. As a consequence, CockroachDB cannot provide
strict serializability for transactions, only serializability.6 The transaction protocol
is based on an underlying key-value store that is replicated using Raft consensus
[OO13] for groups of keys. Atomicity is achieved through a locking protocol on
per-record metadata, similar to Percolator [PD10]. Isolation is implemented as
multi-version timestamp ordering [WV02] per consensus group and 2PC across
groups. Read-write and write-write conflicts therefore cause transaction aborts, if
the operations are not ordered according to the transaction begin timestamps. Like
Spanner and F1, CockroachDB is prone to high read, write, and transaction latency
due to synchronous geo-replication and 2PC.
Summing up, strict serializability is an important property for applications.
Without this guarantee, blind writes (e.g., inserting a comment record) can be
delayed arbitrarily and may never become visible. Spanner and F1 achieve strict
serializability by delaying transaction commits and using high-precision clocks,
while CockroachDB sacrifices the guarantee for performance reasons.
5 Hybrid logical clocks combine the benefits of logical clocks [Lam78], which allow simple tracking of causality, with physical clocks that stay within a defined drift from real time.
6 In particular, for individual operations (transactions with a single read or write), the lack of strict serializability implies that linearizability is not guaranteed in CockroachDB.
5.5.3 Lazy Geo-Replication
With lazy geo-replication as in Dynamo [DeC+07], BigTable/HBase [Cha+08,
Hba], Cassandra [LM10], MongoDB [CD13], CouchDB [ALS10], Couchbase
[Lak+16], Espresso [Qia+13], PNUTS [Coo+08], Walter [Sov+11], Eiger [Llo+13],
and COPS [Llo+11] stale reads are allowed, but the system performs better and
remains available during partitions.
5.5.3.1 Eventually Consistent Geo-Replication
Most asynchronously replicated NoSQL systems support using their intra-data
center replication protocol for cross-data center replication. In contrast to systems
with transparent geo-replication, the application needs to be explicitly configured to
route read and write requests to the correct data center. MongoDB [CD13] allows
tagging shards with a zone parameter to allocate data to regions based on properties
(e.g., an “address” field in user documents). It also supports distributing replicas
within a replica set over multiple locations. However, this comes at a cost, as replicas
from another data center can be elected as masters upon network partitions and
transient failures. Couchbase [Lak+16] uses the asynchronous Memcache replication protocol for geo-replication. Most RDBMSs include only limited support for
geo-distributed deployments, mostly directly based on their asynchronous intra-data
center replication protocols (e.g., in MySQL, MySQL Cluster, and PostgreSQL
[Pos]).
CouchDB [ALS10] has a multi-master replication protocol that was designed
for heavily geo-distributed setups from device-embedded instances to multiple
data centers. As writes are allowed on each replica, conflicts are tracked using
hash histories [Agr+13], an alternative to vector clocks [DeC+07] for causality
tracking. Quorum systems such as Dynamo, Cassandra, and Riak [LM10, DeC+07]
require location-awareness for each key’s preference list, i.e., the information
on whether the responsible database nodes are local to the data center or connected through wide-area networks. Cassandra, for example, supports configuring
remote site behavior through topology strategies and per operation quorums.
These quorums define whether data is replicated purely asynchronously (e.g., for
analytics) or whether a remote cluster has to participate in the overall quorum
(“EACH_QUORUM”) [CH16]. Riak distinguishes between a source cluster for
operational workloads and sink clusters that do not participate in quorums and only
asynchronously receive writes from the source cluster.
BigTable and HBase [Cha+08, Hba] are synchronously replicated within a
data center at the file system level (GFS and HDFS [GGL03], respectively), but
offer asynchronous wide-area replication, mainly for purposes of disaster recovery.
LinkedIn’s Espresso is a document store that uses asynchronous master-slave
replication within a data center built on top of a change data capturing system called
Databus [Das+12]. Subscribers to this replication bus can be placed in remote data
centers.
5.5.3.2 PNUTS
Causal consistency is the strongest level of consistency achievable without inter-data
center coordination [Llo+11]. Yahoo’s PNUTS system [Coo+08] was influential
in this respect, as it combines stronger consistency with a geo-replicated design.
PNUTS leverages the observation that updates for a particular record tend to
originate from the same region. Therefore, the primary is chosen per record
(“record-level mastering”). Updates are propagated through an asynchronous pub/sub message broker that enforces a serial order for updates on the same key, which
guarantees causal consistency per key (termed “timeline consistency”). Reads can
be directed to any replica if timeline consistency is sufficient (“read-any”), or they
can explicitly request monotonic reads (“read-critical”) or strong consistency (“read-latest”). In each region, records are range-sharded and stored in MySQL. The
design of PNUTS presents a compromise between multi-master and master-slave
replication. It decouples failures of primaries for different records and achieves low
latency, if the primary only receives writes from nearby clients.
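The three read levels can be sketched over per-record versions as follows; the data structures are illustrative, not PNUTS’s implementation:

```python
class Replica:
    """Holds per-record (version, value) pairs; the message broker delivers
    updates for a key in serial order, keeping each record on a timeline."""
    def __init__(self):
        self.data = {}

    def apply(self, key, version, value):
        if version > self.data.get(key, (0, None))[0]:
            self.data[key] = (version, value)

def read(key, level, replicas, master, min_version=0):
    """read-any: any replica suffices; read-critical: only replicas at or beyond
    a version the client has already seen; read-latest: the record's master."""
    if level == "read-latest":
        return master.data.get(key, (0, None))[1]
    for replica in replicas:
        version, value = replica.data.get(key, (0, None))
        if level == "read-any" or version >= min_version:
            return value
    return None

# A record whose latest update has not yet reached one replica.
master, stale = Replica(), Replica()
master.apply("user:42", 1, "old")
stale.apply("user:42", 1, "old")
master.apply("user:42", 2, "new")   # not yet propagated to 'stale'
```

A read-any call may return the stale value, whereas read-critical with a previously observed version and read-latest both reach the fresh one.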
5.5.3.3 Eiger, COPS, and Walter
Eiger [Llo+13] and COPS [Llo+11] are two approaches for providing full causal
consistency for asynchronous replication. Eiger and COPS have strong similarities;
their major difference is that causality tracking in COPS is based on per-record
metadata, while Eiger tracks dependencies between operations. COPS introduces
the notion of causal+ consistency that combines causal consistency with guaranteed
convergence of writes. While COPS is not the first system to provide causal+
consistency for geo-replication, it is the first that is not based on unscalable use
of the database log like Bayou [Dem+94] and PRACTI [Bel+06]. The key idea
of COPS is to have clients attach metadata of causally relevant read operations to
each write operation. During replication at a remote site, a write is only applied
if all causal dependencies have also been applied already. To ensure convergence,
conflicting writes are resolved using a commutative and associative handler (e.g.,
last-writer-wins). COPS also introduces a two-phase commit algorithm for read-only transactions that only see causally consistent records.
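The dependency check during remote replication can be sketched as follows; this is a simplified model in which versions are plain integers and conflict resolution is last-writer-wins:

```python
class CopsSite:
    """Sketch of COPS-style causal replication: a remote write is applied only
    after all of its causal dependencies are locally visible."""
    def __init__(self):
        self.store = {}       # key -> (version, value)
        self.pending = []     # writes waiting for their dependencies

    def replicate(self, key, version, value, deps):
        """deps: list of (key, version) pairs the client had read before writing."""
        self.pending.append((key, version, value, deps))
        self._drain()

    def _satisfied(self, deps):
        return all(self.store.get(k, (0, None))[0] >= v for k, v in deps)

    def _drain(self):
        progress = True
        while progress:
            progress = False
            for entry in list(self.pending):
                key, version, value, deps = entry
                if self._satisfied(deps):
                    # Last-writer-wins keeps replicas convergent under conflicts.
                    if version > self.store.get(key, (0, None))[0]:
                        self.store[key] = (version, value)
                    self.pending.remove(entry)
                    progress = True
```

A comment replicated before the post it refers to stays pending until the post arrives, so no reader ever observes the comment without its cause.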
Walter [Sov+11] extends the COPS approach for causality tracking to transactions by introducing Parallel Snapshot Isolation as an isolation level that relaxes
snapshot isolation to allow different transaction orderings on different sites. Bailis
et al. have proposed bolt-on causal consistency [Bai+13a] that provides causal
consistency at the client side. The idea is similar to the concept behind COPS:
writes are only made visible for reads once their causal dependencies are available.
However, as this safety guarantee is not paired with a liveness guarantee, clients can
end up reading very stale data.
The main problem of all geo-replication schemes for causal consistency is
that they either track potential causality, which imposes a large overhead, or
burden developers with explicitly declaring causal relationships.
5.5.3.4 Pileus
Pileus [Ter+13] proposed by Terry et al. from Microsoft Research achieves low
latency, single round-trip writes, and bounded staleness. It is based on an SLA
concept, in which developers can annotate consistency levels and latency bounds
with utility values. For example, an application could specify that up to 5 min of
staleness are tolerable and then define the monetary value of requests that return in
200 ms, 400 ms, or 600 ms. Pileus has a key-value data model with CRUD-based
access. It employs a primary site for updates and all geo-replicated secondary sites
are asynchronously replicated.
Clients are responsible for selecting a replica by evaluating the SLA and
returning the sub-SLA (a combination of a consistency and latency requirement
with a utility) that has the highest utility multiplied by the probability of meeting
the consistency and latency requirement. Data is then read from the replica that
maximizes the selected sub-SLA. The decision whether a replica can satisfy
a consistency requirement is based on computing a minimum acceptable read
timestamp that indicates how far a replica is allowed to lag behind the primary
without violating the consistency level. To make this feasible, clients need to
frequently collect information about network latency and replication lag from
all replicas. Different consistency levels are supported (e.g., monotonic reads,
eventual/causal/strong consistency, Δ-atomicity). For Δ-atomicity, however, Pileus
assumes a relatively high Δ (typically minutes [Ter+13, p. 316]), as otherwise
polling from replicas becomes inefficient and strict clock synchronization would
be required.
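The sub-SLA selection can be sketched as follows. The dictionary layout and the binary can-serve check are simplifications of this illustration, since Pileus estimates actual probabilities from monitored latencies and replication lag:

```python
def choose_sub_sla(sub_slas, replicas):
    """Pick the sub-SLA with the highest expected utility, i.e., utility
    multiplied by the (here: 0-or-1) chance that some replica can meet both
    its consistency and its latency requirement."""
    def can_serve(replica, sla):
        if replica["latency_ms"] > sla["latency_ms"]:
            return False
        # Simplification: only the primary can serve strongly consistent reads.
        if sla["consistency"] == "strong" and not replica["is_primary"]:
            return False
        return True

    best_sla, best_utility = None, -1.0
    for sla in sub_slas:
        p = 1.0 if any(can_serve(r, sla) for r in replicas) else 0.0
        if sla["utility"] * p > best_utility:
            best_sla, best_utility = sla, sla["utility"] * p
    return best_sla
```

With a distant primary and a nearby secondary, a strong-consistency sub-SLA with a tight latency bound loses to a lower-utility eventual-consistency sub-SLA that the secondary can actually meet.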
In follow-up work, Pileus was extended with dynamic primary/secondary reconfiguration in order to maximize the global utility of SLAs in a system
called Tuba [AT14]. Here, a configuration service periodically collects observed
latencies and SLA violations from clients and selects a new configuration with
the best utility-to-cost ratio. Potential reconfigurations include adding or switching
a primary site and changing the replication factor and replica locations. Clients
need to always be aware of configuration changes in order not to perform strongly
consistent reads and writes on non-primaries. Compared to Pileus, Tuba increases
the probability of strongly consistent reads from 33% to 55%.
5.5.3.5 Tao
Tao is an example of a system that combines geo-replication with caching. Bronson
et al. [Bro+13] describe the system that stores Facebook’s multi-petabyte social
graph. The data is held in a sharded MySQL which is asynchronously replicated
across data centers. Caching is performed at two levels of cache tiers. The leader
cache tier is located in front of MySQL and is allowed to perform writes on it.
Multiple follower cache tiers serve requests from their nearest application servers
and forward requests to the leader if necessary. Each tier consists of many modified
Memcache [Fit04] servers with custom memory allocation and LRU cache eviction
schemes [Xu+14, Nis+13]. Tiers are sharded through consistent hashing to avoid
reshuffling of data in case of failures. To mitigate popularity-induced hotspots,
each shard inside a tier can be master-slave replicated. The tiers behave like
synchronous write-through caches, i.e., when a write request arrives at a follower
tier’s Memcache shard, it is forwarded to the respective leader shard. If the
current data center is the master for that data item, the write is performed on the
corresponding MySQL shard. Otherwise, it is forwarded to the leader tier of the
master data center. When the write is complete, invalidation messages are issued
to every cache holding that data item. Cache coherence is thus asynchronous, i.e.,
there are no consistency guarantees, but anecdotally the lag is in the order of one
second [Bro+13]. Tao handles roughly one billion reads per second with a read-heavy workload (over 99% reads).
Lu et al. [Lu+15] performed extensive consistency checking for Tao’s two-level caching architecture by sampling requests. They analyzed violations of
linearizability, read-your-writes consistency, and per-object sequential consistency.
For Facebook’s workload, the violations are reported to be very rare (e.g., 0.00151%
in case of linearizability). The authors attribute this to the fact that writes are very
rare and only 10–25% of all objects experience both reads and writes. The effects on
transactional isolation were not measured, as the distributed nature of transactions
made a tracing and checking approach impossible.
5.5.3.6 Tunable Consistency and the Latency-Consistency Trade-Off
The idea of exposing tunable consistency to developers is found in other systems.
In many applications some operations need to be performed with strong consistency
(e.g., password checking), while eventual consistency is acceptable for others (e.g.,
adding a product to the shopping cart). Both Twitter and Facebook have sub-systems
providing strong consistency for operations on critical data [Sch16, Lu+15]. In
Google’s Megastore [Bak+11], weakly consistent reads are allowed for performance
reasons despite strongly consistent updates. In Gemini [Li+12], red (strongly consistent) and blue (weakly consistent) operations are distinguished for geo-replicated
storage. Gemini maximizes the use of fast, locally executed blue operations by
determining when an operation is commutative to every potentially concurrent
operation.
Kraska et al. [Kra+09] proposed to attach SLAs to objects in order to include
the cost as an optimization factor for cloud-based storage systems (consistency
rationing). The two SLA classes A and C reflect data that is always handled with
strong or weak consistency, respectively, while class B is continuously optimized
according to a cost function. Florescu and Kossmann [FK09] argue that most cloud-based applications are not concerned with the concrete level of consistency, but with the
overall cost of the application.
Application complexity is usually increased by different consistency choices
[Li+14]. Guerraoui et al. [GPS16] proposed Correctables as a new programming
model that abstracts different consistency levels. The main idea is to provide a
Promise-based [LS88b] interface that can either directly execute an operation at the
desired consistency level or return multiple results with increasing consistency and
delay. For example, in a ticket checkout process, a potentially stale stock counter
could be returned first so that the checkout can proceed when the counter is sufficiently high, with the option to
abort shortly afterwards if the actual stock value is already zero. A similar scheme is
used in Meteor [HS16] to hide potentially slow write operations from users (latency
compensation).
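The progressive interface can be approximated with an asynchronous generator; the replica callables below are hypothetical stand-ins, not Correctables’ actual Promise-based API:

```python
import asyncio

async def correctable_read(key, read_fast, read_strong):
    """Progressive read: first yield a fast, possibly stale preliminary result,
    then the strongly consistent final result."""
    yield "preliminary", await read_fast(key)
    yield "final", await read_strong(key)

async def checkout():
    async def read_fast(key):    # nearby, eventually consistent replica
        return 3                 # stale stock counter
    async def read_strong(key):  # primary; the stock is actually depleted
        return 0
    results = []
    async for state, stock in correctable_read("ticket-stock", read_fast, read_strong):
        results.append((state, stock))
    return results
```

The application can act on the preliminary value immediately and compensate (here: abort the checkout) when the final value contradicts it.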
5.5.3.7 Geo-Replica Placement
In contrast to caching, the decision where to best replicate data involves intimate
knowledge of workloads and access patterns. Web caching is inherently more
adaptive than replication, as data is materialized on demand and as near to the
client as possible. Wu et al. [Wu+13] have proposed SPANStore to address replica placement in multi-cloud environments. SPANStore minimizes the cost of a data
storage deployment based on application requirements such as latency SLOs and
desired consistency levels. For each access set of an application’s workload, a
placement manager decides where to store data and from where to serve reads and
writes. To provide transparency to the application, a client library proxies access to
the different cloud data centers.
The problem of geo-replication was also studied for transactional workloads by
Sharov et al. [Sha+15]. They proposed a replication framework for transactional,
highly distributed workloads that minimizes latency through appropriate primary
and replica placement. Zakhary et al. [Zak+16] described a similar approach for
majority-based replication. They employ a cache-like “optimistic read” optimization: instead of always reading from a majority of replicas, a passive replica
(effectively a client cache) can be used and reads can be validated before transaction
commit.
5.5.3.8 Consistency
Consistency in replicated storage systems has been studied in both theory [GLS11]
and practice [Bai+12, Lu+15, Ber14]. An up-to-date and in-depth discussion of
consistency in distributed systems and databases is provided by Viotti and Vukolic
[VV16]. Their two main observations are that there is a complex relationship
between different consistency levels and that similar guarantees are often named
differently across research communities.
Lee et al. [Lee+15] proposed to decouple the problem of consistency from
database system design through a system called RIFL (Reusable Infrastructure
for Linearizability). RIFL builds on remote procedure calls (RPCs) with at-least-once semantics (i.e., invocations with retries) and enhances them to exactly-once
semantics, which are sufficient to guarantee linearizability. To this end, each request
is assigned a unique identifier and a persistent log guarantees that completed
requests will not be re-executed.7 The authors report a write overhead of their
implementation in RAMCloud [Ous+11] of only 4% compared to the base system
without RIFL. The exactly-once semantics also simplify the implementation of
transactions. Their approach builds on Sinfonia [Agu+07], an in-memory service
infrastructure that provides a mini-transaction primitive for atomic cross-node
memory access. A central limitation of RIFL is its assumption that clients are
reliable and do not lose their state upon crashes. In the web this assumption does
not hold.
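The exactly-once mechanism can be sketched as follows; the in-memory dictionary stands in for RIFL’s durable completion log:

```python
class RiflServer:
    """Sketch of RIFL-style exactly-once RPC: completed requests are recorded
    under a unique request id, so retries return the saved result instead of
    re-executing the operation."""
    def __init__(self):
        self.completion_log = {}   # request_id -> result (durable in the real system)

    def execute(self, request_id, operation):
        if request_id in self.completion_log:       # duplicate retry
            return self.completion_log[request_id]
        result = operation()                        # runs at most once
        self.completion_log[request_id] = result    # record before acknowledging
        return result
```

A retried increment under the same request id then returns the original result without mutating state a second time, which is exactly the property linearizability needs from retried RPCs.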
5.6 Summary
Web caching and geo-replication are both widely used for increasing scalability and
achieving low latency in globally distributed applications.
Invalidation-based caching can be used to provide rigorous consistency guarantees for accessing clients, but requires continuous change monitoring for all
cached resources and is therefore typically considered infeasible for complex access
patterns (e.g. for query caching). Expiration-based caching is often used instead, but
here staleness is only bounded by TTLs, so that high efficiency (large TTLs) has to
be weighed against data freshness (low TTLs). The advantage of geo-replication is that consistency and transactional isolation levels can be chosen through the replication protocol and tuned for the respective database system. It can furthermore provide protection against disaster scenarios and reduce latency for strongly consistent reads. As a major downside, however, geo-replication either
has to perform multiple synchronous wide-area round-trips for consistent updates
or it can only provide eventual consistency without recency guarantees.
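The TTL trade-off can be made concrete with a minimal sketch of an expiration-based cache. The names are hypothetical, and the time source is injected so that expiration can be simulated; larger TTLs raise the hit ratio but also the staleness bound.

```python
class ExpirationCache:
    """Minimal sketch of expiration-based caching: each entry is
    served until its TTL elapses, so a read may be stale for up to
    ttl time units after an update at the origin."""

    def __init__(self, clock):
        self.clock = clock   # injectable time source (e.g., time.monotonic)
        self.entries = {}    # key -> (value, absolute expiration time)

    def put(self, key, value, ttl):
        self.entries[key] = (value, self.clock() + ttl)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None      # miss: the client fetches from the origin
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[key]
            return None      # expired: treated as a miss
        return value         # served from cache, possibly stale

# Simulated clock: advance time manually to trigger expiration.
now = [0.0]
cache = ExpirationCache(clock=lambda: now[0])
cache.put("profile:42", {"name": "Alice"}, ttl=60)
assert cache.get("profile:42") == {"name": "Alice"}  # fresh hit
now[0] = 61.0
assert cache.get("profile:42") is None               # TTL elapsed: miss
```

Until the TTL elapses, the cache keeps answering even if the origin has since been updated — which is exactly the staleness bound discussed above.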
7 The idea of building distributed transactions on a shared log is also found in Calvin [Tho+12] (cf.
p. 141).
5 Caching in Research and Industry
References
[Ada+96] Sibel Adali et al. “Query Caching and Optimization in Distributed Mediator
Systems”. In: Proceedings of the 1996 ACM SIGMOD International Conference
on Management of Data, Montreal, Quebec, Canada, June 4–6, 1996. Ed. by
H. V. Jagadish and Inderpal Singh Mumick. ACM Press, 1996, pp. 137–148.
https://doi.org/10.1145/233269.233327.
[Ady+07] Atul Adya et al. “Anatomy of the ADO.NET Entity Framework”. In: Proceedings
of the 2007 ACM SIGMOD international conference on Management of data.
ACM. 2007, pp. 877–888.
[Ady+95] Atul Adya et al. “Efficient Optimistic Concurrency Control Using Loosely Synchronized Clocks”. In: Proceedings of the 1995 ACM SIGMOD International
Conference on Management of Data, San Jose, California, May 22–25, 1995.
Ed. by Michael J. Carey and Donovan A. Schneider. ACM Press, 1995, pp. 23–
34. https://doi.org/10.1145/223784.223787.
[Aer] Aerospike. 2018. URL: http://www.aerospike.com/ (visited on 05/11/2018).
[AG17] Nick Antonopoulos and Lee Gillam, eds. Cloud Computing: Principles,
Systems and Applications (Computer Communications and Networks). 2nd ed.
2017. Springer, July 2017. ISBN: 9783319546445. URL: http://amazon.com/o/
ASIN/3319546449/.
[AGK95] Brad Adelberg, Hector Garcia-Molina, and Ben Kao. “Applying Update
Streams in a Soft Real-Time Database System”. In: Proceedings of the
1995 ACM SIGMOD International Conference on Management of Data, San
Jose, California, May 22–25, 1995. Ed. by Michael J. Carey and Donovan A.
Schneider. ACM Press, 1995, pp. 245–256. https://doi.org/10.1145/223784.
223842.
[Agr+13] Divyakant Agrawal et al. “Managing Geo-replicated Data in Multi-datacenters”. In: Databases in Networked Information Systems - 8th International Workshop, DNIS 2013, Aizu-Wakamatsu, Japan, March 25–27, 2013.
Proceedings. Ed. by Aastha Madaan, Shinji Kikuchi, and Subhash Bhalla.
Vol. 7813. Lecture Notes in Computer Science. Springer, 2013, pp. 23–43.
https://doi.org/10.1007/978-3-642-37134-9_2.
[Agr+86] Divyakant Agrawal et al. “Distributed Multi-Version Optimistic Concurrency
Control for Relational Databases”. In: Spring COMPCON’86, Digest of Papers,
Thirty-First IEEE Computer Society International Conference, San Francisco,
California, USA, March 3–6, 1986. IEEE Computer Society, 1986, pp. 416–
421.
[Agu+07] Marcos K. Aguilera et al. “Sinfonia: a new paradigm for building scalable
distributed systems”. In: ACM SIGOPS Operating Systems Review. ACM,
2007, pp. 159–174. URL: http://dl.acm.org/citation.cfm?id=1294278 (visited on
01/03/2015).
[Ajo+15] Phillipe Ajoux et al. “Challenges to adopting stronger consistency at scale”. In: 15th Workshop on Hot Topics in Operating Systems (HotOS XV). 2015. URL: https://www.usenix.org/conference/hotos15/workshop-program/presentation/ajoux (visited on 11/28/2016).
[Ali+11] Sadiye Alici et al. “Timestamp-based result cache invalidation for web search
engines”. In: Proceedings of the 34th international ACM SIGIR conference on
Research and development in Information Retrieval. ACM, 2011, pp. 973–982.
URL : http://dl.acm.org/citation.cfm?id=2010046 (visited on 04/24/2015).
[Ali+12] Sadiye Alici et al. “Adaptive time-to-live strategies for query result caching
in web search engines”. In: European Conference on Information Retrieval.
Springer, 2012, pp. 401–412. URL: http://link.springer.com/chapter/10.1007/
978-3-642-28997-2_34 (visited on 11/26/2016).
[ALS10] J. Chris Anderson, Jan Lehnardt, and Noah Slater. CouchDB - The Definitive
Guide: Time to Relax. O’Reilly, 2010. ISBN: 978-0-596-15589-6. URL: http://
www.oreilly.de/catalog/9780596155896/index.html.
[Alt+03] Mehmet Altinel et al. “Cache Tables: Paving the Way for an Adaptive Database
Cache”. In: VLDB. 2003, pp. 718–729. URL: http://www.vldb.org/conf/2003/
papers/S22P01.pdf.
[Ama16] Sean Amarasinghe. Service worker development cookbook. English. OCLC:
958120287. 2016. ISBN: 978-1-78646-952-6. URL: http://lib.myilibrary.com?
id=952152 (visited on 01/28/2017).
[Ami+03a] K. Amiri et al. “DBProxy: A dynamic data cache for Web applications”. In:
Proceedings of the ICDE. 2003, pp. 821–831. URL: http://www-2.cs.cmu.edu/~
amiri/icde-indus.pdf (visited on 06/28/2012).
[Ami+03b] Khalil Amiri et al. “Scalable template-based query containment checking for
web semantic caches”. In: Proceedings of the 19th International Conference
on Data Engineering, March 5–8, 2003, Bangalore, India. Ed. by Umeshwar
Dayal, Krithi Ramamritham, and T. M. Vijayaraman. IEEE Computer Society,
2003, pp. 493–504. https://doi.org/10.1109/ICDE.2003.1260816.
[Ant+02] Jesse Anton et al. “Web caching for database applications with Oracle
Web Cache”. In: Proceedings of the 2002 ACM SIGMOD International
Conference on Management of Data, Madison, Wisconsin, June 3–6, 2002. Ed.
by Michael J. Franklin, Bongki Moon, and Anastassia Ailamaki. ACM, 2002,
pp. 594–599. https://doi.org/10.1145/564691.564762.
[AR17] Ejaz Ahmed and Mubashir Husain Rehmani. “Mobile Edge Computing:
Opportunities, solutions, and challenges”. In: Future Generation Comp. Syst.,
70 (2017), pp. 59–63. https://doi.org/10.1016/j.future.2016.09.015.
[AT14] Masoud Saeida Ardekani and Douglas B. Terry. “A Self-Configurable Geo-Replicated Cloud Storage System”. In: 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI ’14, Broomfield, CO, USA, October 6–8, 2014. Ed. by Jason Flinn and Hank Levy. USENIX Association, 2014, pp. 367–381. URL: https://www.usenix.org/conference/osdi14/technical-sessions/presentation/ardekani.
[Aur+12] Aditya Auradkar et al. “Data Infrastructure at LinkedIn”. In: IEEE 28th
International Conference on Data Engineering (ICDE 2012), Washington, DC,
USA (Arlington, Virginia), 1–5 April, 2012. Ed. by Anastasios Kementsietsidis
and Marcos Antonio Vaz Salles. IEEE Computer Society, 2012, pp. 1370–1381.
https://doi.org/10.1109/ICDE.2012.147.
[Bai+12] Peter Bailis et al. Probabilistically bounded staleness for practical partial
quorums. Tech. rep. 8. 2012, pp. 776–787. URL: http://dl.acm.org/citation.cfm?
id=2212359 (visited on 07/16/2014).
[Bai+13a] Peter Bailis et al. “Bolt-on Causal Consistency”. In: Proceedings of the
2013 ACM SIGMOD International Conference on Management of Data. SIGMOD’13. New York, New York, USA: ACM, 2013, pp. 761–772.
[Bak+11] J. Baker et al. “Megastore: Providing scalable, highly available storage for
interactive services”. In: Proc. of CIDR. Vol. 11. 2011, pp. 223–234.
[BCL89] José A. Blakeley, Neil Coburn, and Per-Åke Larson. “Updating Derived
Relations: Detecting Irrelevant and Autonomously Computable Updates”. ACM
Trans. Database Syst., 14.3 (1989), pp. 369–400. https://doi.org/10.1145/
68012.68015.
[Bel+06] Nalini Moti Belaramani et al. “PRACTI Replication”. In: 3rd Symposium
on Networked Systems Design and Implementation (NSDI 2006), May 8–10, 2006, San Jose, California, USA, Proceedings. Ed. by Larry L. Peterson and
Timothy Roscoe. USENIX, 2006. URL: http://www.usenix.org/events/nsdi06/
tech/belaramani.html.
[Ben14] Juan Benet. “IPFS - content addressed, versioned, P2P file system”. In: CoRR,
abs/1407.3561 (2014). arXiv: 1407.3561. URL: http://arxiv.org/abs/1407.3561.
[Ber14] David Bermbach. Benchmarking Eventually Consistent Distributed Storage Systems. eng. Karlsruhe, Baden: KIT Scientific Publishing, 2014. ISBN: 978-3-7315-0186-2.
[Bes95] Azer Bestavros. “Demand-based document dissemination to reduce traffic and
balance load in distributed information systems”. In: Proceedings of the Seventh
IEEE Symposium on Parallel and Distributed Processing, SPDP 1995, San
Antonio, Texas, USA, October 25–28, 1995, IEEE, 1995, pp. 338–345. https://
doi.org/10.1109/SPDP.1995.530703.
[Bes96] A. Bestavros. “Speculative data dissemination and service to reduce server load,
network traffic and service time in distributed information systems”. In: Proc.
Twelfth Int. Conf. Data Engineering. Feb. 1996, pp. 180–187. https://doi.org/
10.1109/ICDE.1996.492104.
[Bhi+02] Manish Bhide et al. “Adaptive push-pull: Disseminating dynamic web data”.
In: IEEE Transactions on Computers 51.6 (2002), pp. 652–668.
[Bla+10] Roi Blanco et al. “Caching search engine results over incremental indices”. In:
Proceedings of the 33rd international ACM SIGIR conference on Research and
development in information retrieval. ACM, 2010, pp. 82–89. URL: http://dl.
acm.org/citation.cfm?id=1835466 (visited on 04/24/2015).
[BLT86] José A. Blakeley, Per-Åke Larson, and Frank Wm. Tompa. “Efficiently
Updating Materialized Views”. In: Proceedings of the 1986 ACM SIGMOD
International Conference on Management of Data, Washington, D.C., May 28–
30, 1986. Ed. by Carlo Zaniolo. ACM Press, 1986, pp. 61–71. https://doi.org/
10.1145/16894.16861.
[BLV11] Edward Bortnikov, Ronny Lempel, and Kolman Vornovitsky. “Caching for
Realtime Search”. In: Advances in Information Retrieval - 33rd European
Conference on IR Research, ECIR 2011, Dublin, Ireland, April 18–21, 2011.
Proceedings. Ed. by Paul D. Clough et al. Vol. 6611. Lecture Notes in
Computer Science. Springer, 2011, pp. 104–116. https://doi.org/10.1007/978-3-642-20161-5_12.
[BM03] Andrei Broder and Michael Mitzenmacher. “Network Applications of Bloom
Filters: A Survey”. In: Internet Mathematics 1.4 (2003), pp. 485–509. URL:
http://projecteuclid.org/euclid.im/1109191032 (visited on 01/03/2015).
[Bon+12] Flavio Bonomi et al. “Fog computing and its role in the internet of things”.
In: Proceedings of the first edition of the MCC workshop on Mobile cloud
computing, MCC@SIGCOMM 2012, Helsinki, Finland, August 17, 2012. Ed.
by Mario Gerla and Dijiang Huang. ACM, 2012, pp. 13–16. https://doi.org/10.
1145/2342509.2342513.
[Bor+03] Christof Bornhövd et al. “DBCache: Middle-tier Database Caching for Highly
Scalable e-Business Architectures”. In: Proceedings of the 2003 ACM SIGMOD
International Conference on Management of Data, San Diego, California, USA,
June 9–12, 2003. Ed. by Alon Y. Halevy, Zachary G. Ives, and AnHai Doan.
ACM, 2003, p. 662. https://doi.org/10.1145/872757.872849.
[Bor+04] C. Bornhövd et al. “Adaptive database caching with DBCache”. In: Data
Engineering 27.2 (2004), pp. 11–18. URL: http://sipew.org/staff/bornhoevd/
IEEEBull’04.pdf (visited on 06/28/2012).
[BP95] Alexandros Biliris and Euthimios Panagos. “A High Performance Configurable
Storage Manager”. In: Proceedings of the Eleventh International Conference
on Data Engineering, March 6–10, 1995, Taipei, Taiwan. Ed. by Philip S. Yu
and Arbee L. P. Chen. IEEE Computer Society, 1995, pp. 35–43. https://doi.org/
10.1109/ICDE.1995.380412.
[BPV08] Rajkumar Buyya, Mukaddim Pathan, and Athena Vakali, eds. Content Delivery
Networks (Lecture Notes in Electrical Engineering). 2008th ed. Springer, Sept.
2008. ISBN: 9783540778868. URL: http://amazon.com/o/ASIN/3540778861/.
[BR02] Laura Bright and Louiqa Raschid. “Using Latency-Recency Profiles for Data
Delivery on the Web”. In: VLDB 2002, Proceedings of 28th International
Conference on Very Large Data Bases, August 20–23, 2002, Hong Kong, China.
Morgan Kaufmann, 2002, pp. 550–561. URL: http://www.vldb.org/conf/2002/
S16P01.pdf.
[Bre+99] Lee Breslau et al. “Web caching and Zipf-like distributions: Evidence and
implications”. In: INFOCOM’99. Eighteenth Annual Joint Conference of the
IEEE Computer and Communications Societies. Proceedings. IEEE. Vol. 1,
IEEE, IEEE, 1999, pp. 126–134. URL: http://ieeexplore.ieee.org/xpls/abs_all.
jsp?arnumber=749260 (visited on 01/03/2015).
[Bre00] Eric A. Brewer. Towards Robust Distributed Systems. 2000.
[Bre17] Eric Brewer. Spanner, TrueTime and the CAP Theorem. Tech. rep. 2017.
[Bro+13] Nathan Bronson et al. “TAO: Facebook’s Distributed Data Store for the Social Graph”. In: USENIX Annual Technical Conference. 2013, pp. 49–60. URL: http://dl.frz.ir/FREE/papers-we-love/datastores/tao-facebook-distributed-datastore.pdf (visited on 09/28/2014).
[Cam+10] Berkant Barla Cambazoglu et al. “A refreshing perspective of search engine
caching”. In: Proceedings of the 19th International Conference on World
Wide Web, WWW 2010, Raleigh, North Carolina, USA, April 26–30, 2010.
Ed. by Michael Rappa et al. ACM, 2010, pp. 181–190. https://doi.org/10.1145/
1772690.1772710.
[Cam16] Raymond Camden. Client-side data storage: keeping it local. First edition.
OCLC: ocn935079139. Beijing: O’Reilly, 2016. ISBN: 978-1-4919-3511-8.
[Can+01a] K. Selçuk Candan et al. “Enabling Dynamic Content Caching for Database-Driven Web Sites”. In: Proceedings of the 2001 ACM SIGMOD international
conference on Management of data, Santa Barbara, CA, USA, May 21–24,
2001. Ed. by Sharad Mehrotra and Timos K. Sellis. ACM, 2001, pp. 532–543.
https://doi.org/10.1145/375663.375736.
[Can+01b] K. Selçuk Candan et al. “Enabling Dynamic Content Caching for Databasedriven Web Sites”. In: SIGMOD. New York, NY, USA: ACM, 2001, pp. 532–
543. ISBN: 1-58113-332-4. https://doi.org/10.1145/375663.375736. URL: http://
doi.acm.org/10.1145/375663.375736 (visited on 10/04/2014).
[Car+91] Michael J. Carey et al. “Data caching tradeoffs in client-server DBMS
architectures”. In: Proceedings of the 1991 ACM SIGMOD International
Conference on Management of Data, Denver, Colorado, May 29–31, 1991. Ed.
by James Clifford and Roger King, ACM Press, 1991, pp. 357–366. https://doi.
org/10.1145/115790.115854.
[Car13] Josiah L. Carlson. Redis in Action. Greenwich, CT, USA: Manning Publications
Co., 2013. ISBN: 1617290858, 9781617290855.
[Cas+97] Miguel Castro et al. “HAC: hybrid adaptive caching for distributed storage
systems”. In: Proceedings of the Sixteenth ACM Symposium on Operating
System Principles, SOSP 1997, St. Malo, France, October 5–8, 1997. Ed. by
Michel Banâtre, Henry M. Levy, and William M. Waite. ACM, 1997, pp. 102–
115. https://doi.org/10.1145/268998.266666.
[Cat92] Vincent Cate. “Alex-a global filesystem”. In: Proceedings of the 1992 USENIX
File System Workshop. Citeseer, 1992, pp. 1–12.
[CBPS10] Bernadette Charron-Bost, Fernando Pedone, and André Schiper, eds. Replication: Theory and Practice. Vol. 5959. Lecture Notes in Computer Science.
Springer, 2010.
[CD13] Kristina Chodorow and Michael Dirolf. MongoDB - The Definitive Guide.
O’Reilly, 2013. ISBN: 978-1-449-38156-1. URL: http://www.oreilly.de/catalog/
9781449381561/index.html.
[CH16] Jeff Carpenter and Eben Hewitt. Cassandra: The Definitive Guide. “O’Reilly
Media, Inc.”, 2016.
[Cha+08] Fay Chang et al. “Bigtable: A distributed storage system for structured data”.
In: ACM Transactions on Computer Systems (TOCS) 26.2 (2008), p. 4.
[Che+16] Tse-Hsun Chen et al. “CacheOptimizer: helping developers configure caching
frameworks for hibernate-based database-centric web applications”. In: Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations
of Software Engineering, FSE 2016, Seattle, WA, USA, November 13–18, 2016.
Ed. by Thomas Zimmermann, Jane Cleland-Huang, and Zhendong Su, ACM,
2016, pp. 666–677. https://doi.org/10.1145/2950290.2950303.
[CI97] Pei Cao and Sandy Irani. “Cost-aware WWW Proxy Caching Algorithms”.
In: 1st USENIX Symposium on Internet Technologies and Systems, USITS’97,
Monterey, California, USA, December 8–11, 1997. USENIX, 1997. URL: http://
www.usenix.org/publications/library/proceedings/usits97/cao.html.
[Cid16] Asaf Cidon et al. “Cliffhanger: scaling performance cliffs in web memory
caches”. In: 13th USENIX Symposium on Networked Systems Design and
Implementation (NSDI 16). 2016, pp. 379–392.
[CK01] Edith Cohen and Haim Kaplan. “The Age Penalty and Its Effect on Cache
Performance”. In: 3rd USENIX Symposium on Internet Technologies and
Systems, USITS’01, San Francisco, California, USA, March 26–28, 2001. Ed.
by Tom Anderson. USENIX, 2001, pp. 73–84. URL: http://www.usenix.org/
events/usits01/cohen.html.
[CKR98] Edith Cohen, Balachander Krishnamurthy, and Jennifer Rexford. “Improving
End-to-End Performance of the Web Using Server Volumes and Proxy Filters”.
In: SIGCOMM. 1998, pp. 241–253. https://doi.org/10.1145/285237.285286.
[CL98] Pei Cao and Chengjie Liu. “Maintaining Strong Cache Consistency in the World
Wide Web”. In: IEEE Trans. Computers 47.4 (1998), pp. 445–457. https://doi.
org/10.1109/12.675713.
[CO82] Stefano Ceri and Susan S. Owicki. “On the Use of Optimistic Methods for
Concurrency Control in Distributed Databases”. In: Berkeley Workshop. 1982,
pp. 117–129.
[Coc] CockroachDB - the scalable, survivable, strongly-consistent SQL database.
https://github.com/cockroachdb/cockroach. 2017. URL: https://github.com/
cockroachdb/cockroach (visited on 02/17/2017).
[Coo+08] B. F. Cooper et al. “PNUTS: Yahoo!’s hosted data serving platform”. In:
PVLDB 1.2 (2008), pp. 1277–1288. URL: http://dl.acm.org/citation.cfm?id=
1454167 (visited on 09/12/2012).
[Cor+12] James C. Corbett et al. “Spanner: Google’s Globally-Distributed Database”. In: 10th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2012, Hollywood, CA, USA, October 8–10, 2012. Ed. by Chandu Thekkath and Amin Vahdat. USENIX Association, 2012, pp. 261–264. URL: https://www.usenix.org/conference/osdi12/technical-sessions/presentation/corbett.
[Cor+13] James C. Corbett et al. “Spanner: Google’s Globally Distributed Database”. In: ACM Trans. Comput. Syst. 31.3 (2013), 8:1–8:22. https://doi.org/10.1145/2491245.
[CRS99] Boris Chidlovskii, Claudia Roncancio, and Marie-Luise Schneider. “Semantic Cache Mechanism for Heterogeneous Web Querying”. In: Computer Networks, 31(11–16) (1999), pp. 1347–1360. https://doi.org/10.1016/S1389-1286(99)00035-3.
[CZB99] Pei Cao, Jin Zhang, and Kevin Beach. “Active Cache: caching dynamic contents
on the Web”. In: Distributed Systems Engineering 6.1 (1999), pp. 43–50. https://
doi.org/10.1088/0967-1846/6/1/305.
[Dar+96] Shaul Dar et al. “Semantic Data Caching and Replacement”. In: VLDB’96, Proceedings of 22th International Conference on Very Large Data Bases, September 3–6, 1996, Mumbai (Bombay), India. Ed. by T. M. Vijayaraman et al. Morgan Kaufmann, 1996, pp. 330–341. URL: http://www.vldb.org/conf/1996/P330.PDF.
[Das+12] Shirshanka Das et al. “All aboard the Databus!: Linkedin’s scalable consistent change data capture platform”. In: Proceedings of the Third ACM Symposium on Cloud Computing. ACM, 2012, p. 18. URL: http://dl.acm.org/citation.cfm?id=2391247 (visited on 11/26/2016).
[Dat+04] Anindya Datta et al. “Proxy-based acceleration of dynamically generated content on the world wide web: An approach and implementation”. In: ACM Trans. Database Syst. 29.2 (2004), pp. 403–443. https://doi.org/10.1145/1005566.1005571.
[DeC+07] G. DeCandia et al. “Dynamo: amazon’s highly available key-value store”. In: ACM SOSP. Vol. 14. 17. ACM. 2007, pp. 205–220. URL: http://dl.acm.org/citation.cfm?id=1294281 (visited on 09/12/2012).
[Dem+94] Alan J. Demers et al. “The Bayou Architecture: Support for Data Sharing Among Mobile Users”. In: First Workshop on Mobile Computing Systems and Applications, WMCSA 1994, Santa Cruz, CA, USA, December 8–9, 1994. IEEE Computer Society, 1994, pp. 2–7. https://doi.org/10.1109/WMCSA.1994.37.
[DeM09] Linda DeMichiel. “JSR 317: Java Persistence 2.0”. In: Java Community Process, Tech. Rep (2009).
[Den96] Shuang Deng. “Empirical model of WWW document arrivals at access link”. In: Communications, 1996. ICC’96, Conference Record, Converging Technologies for Tomorrow’s Applications. 1996 IEEE International Conference on. Vol. 3. IEEE. 1996, pp. 1797–1802.
[Des+98] Prasad Deshpande et al. “Caching Multidimensional Queries Using Chunks”. In: SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2–4, 1998, Seattle, Washington, USA. Ed. by Laura M. Haas and Ashutosh Tiwary. ACM Press, 1998, pp. 259–270. https://doi.org/10.1145/276304.276328.
[Dou+97] Fred Douglis et al. “Rate of Change and other Metrics: a Live Study of the World Wide Web”. In: 1st USENIX Symposium on Internet Technologies and Systems, USITS’97, Monterey, California, USA, December 8–11, 1997. USENIX, 1997. URL: http://www.usenix.org/publications/library/proceedings/usits97/douglis_rate.html.
[DST03] Venkata Duvvuri, Prashant J. Shenoy, and Renu Tewari. “Adaptive Leases: A Strong Consistency Mechanism for the World Wide Web”. In: IEEE Trans. Knowl. Data Eng. 15.5 (2003), pp. 1266–1276. https://doi.org/10.1109/TKDE.2003.1232277.
[Dyn] DynamoDB. 2017. URL: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html (visited on 05/20/2017).
[ERR11] Mohamed El-Refaey and Bhaskar Prasad Rimal. “Grid, soa and cloud computing: On-demand computing models”. In: Computational and Data Grids: Principles, Applications and Design (2011), p. 45.
[EWS12] Robert Escriva, Bernard Wong, and Emin Gün Sirer. “HyperDex: A distributed, searchable key-value store”. In: ACM SIGCOMM Computer Communication Review 42.4 (2012), pp. 25–36. URL: http://dl.acm.org/citation.cfm?id=2377681 (visited on 01/03/2015).
[FAK13] Bin Fan, David G. Andersen, and Michael Kaminsky. “MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing”. In: Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2013, Lombard, IL, USA, April 2–5, 2013. Ed. by Nick Feamster and Jeffrey C. Mogul. USENIX Association, 2013, pp. 371–384. URL: https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/fan.
[Fan+00] Li Fan et al. “Summary cache: a scalable wide-area web cache sharing
protocol”. In: IEEE/ACM TON 8.3 (2000), pp. 281–293. URL: http://dl.acm.
org/citation.cfm?id=343572 (visited on 10/04/2014).
[Fan+14] Bin Fan et al. “Cuckoo Filter: Practically Better Than Bloom”. en. In:
ACM Press, 2014, pp. 75–88. ISBN: 978-1-4503-3279-8. https://doi.org/10.
1145/2674005.2674994. URL: http://dl.acm.org/citation.cfm?doid=2674005.
2674994 (visited on 01/03/2015).
[FC92] Michael J. Franklin and Michael J. Carey. “Client-Server Caching Revisited”.
In: Distributed Object Management, Papers from the International Workshop
on Distributed Object Management (IWDOM), Edmonton, Alberta, Canada,
August 19–21, 1992. Ed. by M. Tamer Özsu, Umeshwar Dayal, and Patrick
Valduriez. Morgan Kaufmann, 1992, pp. 57–78.
[FCL97] Michael J. Franklin, Michael J. Carey, and Miron Livny. “Transactional ClientServer Cache Consistency: Alternatives and Performance”. In: ACM Trans.
Database Syst. 22.3 (1997), pp. 315–363. https://doi.org/10.1145/261124.
261125.
[Fel+99] Anja Feldmann et al. “Performance of Web Proxy Caching in Heterogeneous
Bandwidth Environments”. In: Proceedings IEEE INFOCOM ’99, The Conference on Computer Communications, Eighteenth Annual Joint Conference of the
IEEE Computer and Communications Societies, The Future Is Now, New York,
NY, USA, March 21–25, 1999. IEEE, 1999, pp. 107–116.
[FFM04] Michael J. Freedman, Eric Freudenthal, and David Mazieres. “Democratizing Content Publication with Coral.” In: NSDI. Vol. 4. 2004, pp. 18–18.
URL : https://www.usenix.org/legacy/events/nsdi04/tech/full_papers/freedman/
freedman_html/ (visited on 09/28/2014).
[Fie+99] R. Fielding et al. “RFC 2616: Hypertext Transfer Protocol - HTTP/1.1, 1999”. In: URL http://www.rfc.net/rfc2616.html (1999).
[Fit04] Brad Fitzpatrick. “Distributed caching with Memcached”. In: Linux journal
2004.124 (2004), p. 5.
[FK09] Daniela Florescu and Donald Kossmann. “Rethinking cost and performance of
database systems”. In: SIGMOD Record 38.1 (2009), pp. 43–48. https://doi.
org/10.1145/1558334.1558339.
[FR14] Roy Fielding and J Reschke. RFC 7234: Hypertext Transfer Protocol
(HTTP/1.1): Caching. Tech. rep. IETF, 2014.
[Fre10] Michael J. Freedman. “Experiences with CoralCDN: A Five-Year Operational
View.” In: NSDI. 2010, pp. 95–110. URL: http://static.usenix.org/legacy/events/
nsdi10/tech/full_papers/freedman.pdf (visited on 01/03/2015).
[Gar+08] Charles Garrod et al. “Scalable query result caching for web applications”. In:
Proceedings of the VLDB Endowment 1.1 (2008), pp. 550–561. URL: http://dl.
acm.org/citation.cfm?id=1453917 (visited on 04/24/2015).
[GC89] Cary G. Gray and David R. Cheriton. “Leases: An Efficient Fault-Tolerant
Mechanism for Distributed File Cache Consistency”. In: Proceedings of
the Twelfth ACM Symposium on Operating System Principles, SOSP 1989,
The Wigwam, Litchfield Park, Arizona, USA, December 3–6, 1989. Ed. by
Gregory R. Andrews. ACM, 1989, pp. 202–210. https://doi.org/10.1145/74850.
74870.
[GD11] Sanjay Ghemawat and Jeff Dean. LevelDB. http://leveldb.org, 2011. URL: http://
leveldb.org.
[Gel00] Erol Gelenbe. System performance evaluation: methodologies and applications.
CRC press, 2000.
[Ges+15] Felix Gessert et al. “The Cache Sketch: Revisiting Expiration-based Caching in the Age of Cloud Data Management”. In: Datenbanksysteme für Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme”. GI, 2015.
[Ges+17] Felix Gessert et al. “Quaestor: Query Web Caching for Database-as-a-Service
Providers”. In: Proceedings of the VLDB Endowment (2017).
[Ges19] Felix Gessert. “Low Latency for Cloud Data Management”. PhD thesis.
University of Hamburg, Germany, 2019. URL: http://ediss.sub.uni-hamburg.de/
volltexte/2019/9541/.
[GGL03] S. Ghemawat, H. Gobioff, and S. T. Leung. “The Google file system”. In: ACM
SIGOPS Operating Systems Review. Vol. 37. 2003, pp. 29–43. URL: http://dl.
acm.org/citation.cfm?id=945450 (visited on 09/12/2012).
[GLS11] Wojciech Golab, Xiaozhou Li, and Mehul A. Shah. “Analyzing consistency
properties for fun and profit”. In: ACM PODC. ACM, 2011, pp. 197–206. URL:
http://dl.acm.org/citation.cfm?id=1993834 (visited on 09/28/2014).
[GPS16] Rachid Guerraoui, Matej Pavlovic, and Dragos-Adrian Seredinschi. “Incremental Consistency Guarantees for Replicated Objects”. In: 12th USENIX
Symposium on Operating Systems Design and Implementation, OSDI 2016,
Savannah, GA, USA, November 2–4, 2016. Ed. by Kimberly Keeton and
Timothy Roscoe. USENIX Association, 2016, pp. 169–184. URL: https://www.
usenix.org/conference/osdi16/technical-sessions/presentation/guerraoui.
[Gra+81] Jim Gray et al. “A Straw Man Analysis of the Probability of Waiting and
Deadlock in a Database System”. In: Berkeley Workshop. 1981, p. 125.
[Gri13] Ilya Grigorik. High performance browser networking. English. [S.l.]: O’Reilly Media, 2013. ISBN: 978-1-4493-4476-4. URL: https://books.google.de/books?id=tf-AAAAQBAJ.
[GS96] James Gwertzman and Margo I Seltzer. “World Wide Web Cache Consistency.”
In: USENIX ATC. 1996, pp. 141–152.
[Han87] Eric N. Hanson. “A Performance Analysis of View Materialization Strategies”.
In: Proceedings of the Association for Computing Machinery Special Interest
Group on Management of Data 1987 Annual Conference, San Francisco,
California, May 27–29, 1987. Ed. by Umeshwar Dayal and Irving L. Traiger.
ACM Press, 1987, pp. 440–453. https://doi.org/10.1145/38713.38759.
[Hba] HBase. 2017. URL: http://hbase.apache.org/ (visited on 05/25/2017).
[Hev] H2O Server. 2016. URL: https://h2o.examp1e.net/configure/http2_directives.html (visited on 01/20/2017).
[HL08] R. T. Hurley and B. Y. Li. “A Performance Investigation of Web Caching Architectures”. In: Proceedings of the 2008 C3S2E Conference. C3S2E ’08. Montreal, Quebec, Canada: ACM, 2008, pp. 205–213. ISBN: 978-1-60558-101-9. https://doi.org/10.1145/1370256.1370291. URL: http://doi.acm.org/10.1145/1370256.1370291.
[How+88] John H. Howard et al. “Scale and Performance in a Distributed File System”.
In: ACM Trans. Comput. Syst. 6.1 (1988), pp. 51–81. https://doi.org/10.1145/
35037.35059.
[HS16] Stephan Hochhaus and Manuel Schoebel. Meteor in action. Manning Publ.,
2016.
[Hua+13] Qi Huang et al. “An analysis of Facebook photo caching”. In: SOSP.
2013, pp. 167–181. URL: http://dl.acm.org/citation.cfm?id=2522722 (visited on
09/28/2014).
[IC98] Arun Iyengar and Jim Challenger. Data Update Propagation: A Method for
Determining How Changes to Underlying Data Affect Cached Objects on the
Web. Tech. rep. Technical Report RC 21093 (94368), IBM Research Division,
Yorktown Heights, NY, 1998.
[Kal+02] Panos Kalnis et al. “An adaptive peer-to-peer network for distributed caching of OLAP results”. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, Wisconsin, June 3–6, 2002. Ed. by Michael J. Franklin, Bongki Moon, and Anastassia Ailamaki. ACM, 2002, pp. 25–36. https://doi.org/10.1145/564691.564695.
[Kam17] Poul-Henning Kamp. Varnish HTTP Cache. 2017. URL: https://varnish-cache.org/ (visited on 01/26/2017).
[KB96] Arthur M. Keller and Julie Basu. “A Predicate-based Caching Scheme for Client-Server Database Architectures”. In: VLDB J. 5.1 (1996), pp. 35–47. https://doi.org/10.1007/s007780050014.
[KFD00] Donald Kossmann, Michael J. Franklin, and Gerhard Drasch. “Cache investment: integrating query optimization and distributed data placement”. In: ACM Trans. Database Syst. 25.4 (2000), pp. 517–558. URL: http://portal.acm.org/citation.cfm?id=377674.377677.
[KK94] Alfons Kemper and Donald Kossmann. “Dual-Buffering Strategies in Object Bases”. In: VLDB’94, Proceedings of 20th International Conference on Very Large Data Bases, September 12–15, 1994, Santiago de Chile, Chile. Ed. by Jorge B. Bocca, Matthias Jarke, and Carlo Zaniolo. Morgan Kaufmann, 1994, pp. 427–438. URL: http://www.vldb.org/conf/1994/P427.PDF.
[Kle17] Martin Kleppmann. Designing Data-Intensive Applications. English. 1st edition. O’Reilly Media, Jan. 2017. ISBN: 978-1-4493-7332-0.
[KLM97] Tom M. Kroeger, Darrell D. E. Long, and Jeffrey C. Mogul. “Exploring the Bounds of Web Latency Reduction from Caching and Prefetching”. In: 1st USENIX Symposium on Internet Technologies and Systems, USITS’97, Monterey, California, USA, December 8–11, 1997. USENIX, 1997. URL: http://www.usenix.org/publications/library/proceedings/usits97/kroeger.html.
[KM06] Adam Kirsch and Michael Mitzenmacher. “Less hashing, same performance: Building a better Bloom filter”. In: Algorithms - ESA 2006. Springer, 2006, pp. 456–467. URL: http://link.springer.com/chapter/10.1007/11841036_42 (visited on 01/03/2015).
[KP01] Panos Kalnis and Dimitris Papadias. “Proxy-Server Architectures for OLAP”. In: Proceedings of the 2001 ACM SIGMOD international conference on Management of data, Santa Barbara, CA, USA, May 21–24, 2001. Ed. by Sharad Mehrotra and Timos K. Sellis. ACM, 2001, pp. 367–378. https://doi.org/10.1145/375663.375712.
[KR01] B. Krishnamurthy and J. Rexford. “Web Protocols and Practice, HTTP/1.1, Networking Protocols, Caching, and Traffic Measurement”. In: Recherche 67 (2001), p. 02. URL: http://www.lavoisier.fr/livre/notice.asp?id=O3OWRLAROSSOWB (visited on 06/30/2012).
[KR81] H. T. Kung and J. T. Robinson. “On optimistic methods for concurrency control”. In: ACM Transactions on Database Systems (TODS) 6.2 (1981), pp. 213–226. URL: http://dl.acm.org/citation.cfm?id=319567 (visited on 11/19/2012).
[Kra+09] Tim Kraska et al. “Consistency rationing in the cloud: pay only when it matters”. In: Proceedings of the VLDB Endowment 2.1 (2009), pp. 253–264. URL: http://dl.acm.org/citation.cfm?id=1687657 (visited on 11/28/2016).
[Kra+13] Tim Kraska et al. “MDCC: Multi-data center consistency”. In: EuroSys. ACM, 2013, pp. 113–126. URL: http://dl.acm.org/citation.cfm?id=2465363 (visited on 04/15/2014).
[Kul+14] S. Kulkarni et al. Logical physical clocks and consistent snapshots in globally distributed databases. 2014.
[KV14] Pradeeban Kathiravelu and Luís Veiga. “An Adaptive Distributed Simulator for Cloud and MapReduce Algorithms and Architectures”. In: Proceedings of the 7th IEEE/ACM International Conference on Utility and Cloud Computing, UCC 2014, London, United Kingdom, December 8–11, 2014. IEEE Computer Society, 2014, pp. 79–88. https://doi.org/10.1109/UCC.2014.16.
References
123
[KW97] Balachander Krishnamurthy and Craig E. Wills. “Study of Piggyback Cache
Validation for Proxy Caches in the World Wide Web”. In: 1st USENIX
Symposium on Internet Technologies and Systems, USITS’97, Monterey, California, USA, December 8–11, 1997. USENIX, 1997. URL: http://www.usenix.
org/publications/library/proceedings/usits97/krishnamurthy.html.
[KW98] Balachander Krishnamurthy and Craig E. Wills. “Piggyback Server Invalidation
for Proxy Cache Coherency”. In: Computer Networks 30.1-7 (1998), pp. 185–
193. https://doi.org/10.1016/S0169-7552(98)00033-6.
[KW99] Balachander Krishnamurthy and Craig E. Wills. “Proxy Cache Coherency and
Replacement - Towards a More Complete Picture”. In: Proceedings of the 19th
International Conference on Distributed Computing Systems, Austin, TX, USA,
May 31 - June 4, 1999. IEEE Computer Society, 1999, pp. 332–339. https://doi.
org/10.1109/ICDCS.1999.776535.
[Lab+09] Alexandros Labrinidis et al. “Caching and Materialization for Web Databases”.
In: Foundations and Trends in Databases 2.3 (2009), pp. 169–266. https://doi.
org/10.1561/1900000005.
[Lak+16] Sarath Lakshman et al. “Nitro: A fast, scalable in-memory storage engine for
nosql global secondary index”. In: PVLDB 9.13 (2016), pp. 1413–1424. URL:
http://www.vldb.org/pvldb/vol9/p1413-lakshman.pdf.
[Lam01] Leslie Lamport. “Paxos made simple”. In: ACM Sigact News 32.4 (2001),
pp. 18–25. URL: http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/past/03F/
notes/paxos-simple.pdf (visited on 07/16/2014).
[Lam78] Leslie Lamport. “Time, Clocks, and the Ordering of Events in a Distributed
System”. In: Commun. ACM 21.7 (1978), pp. 558–565. https://doi.org/10.1145/
359545.359563.
[Lam98] Leslie Lamport. “The part-time parliament”. In: ACM Transactions on
Computer Systems (TOCS) 16.2 (1998), pp. 133–169.
[LC97] Chengjie Liu and Pei Cao. “Maintaining Strong Cache Consistency in the
World-Wide Web”. In: Proceedings of the 17th International Conference
on Distributed Computing Systems, Baltimore, MD, USA, May 27–30, 1997.
IEEE Computer Society, 1997, pp. 12–21. https://doi.org/10.1109/ICDCS.1997.
597804.
[LC99] Dongwon Lee and Wesley W. Chu. “Semantic Caching via Query Matching for
Web Sources”. In: Proceedings of the 1999 ACM CIKM International Conference on Information and Knowledge Management, Kansas City, Missouri, USA,
November 2–6, 1999. ACM, 1999, pp. 77–85. https://doi.org/10.1145/319950.
319960.
[Lec09] Jens Lechtenbörger. “Two-Phase Commit Protocol”. English. In: Encyclopedia of Database Systems. Ed. by Ling Liu and M. Tamer Özsu. Springer
US, 2009, pp. 3209–3213. ISBN: 978-0-387-35544-3. https://doi.org/10.1007/
978-0-387-39940-9_2.
[Lee+15] Collin Lee et al. “Implementing linearizability at large scale and low latency”.
In: Proceedings of the 25th Symposium on Operating Systems Principles, SOSP
2015, Monterey, CA, USA, October 4–7, 2015. ACM, 2015, pp. 71–86. https://
doi.org/10.1145/2815400.2815416.
[LGZ04] Per-Åke Larson, Jonathan Goldstein, and Jingren Zhou. “MTCache: Transparent Mid-Tier Database Caching in SQL Server”. In: Proceedings of the 20th International Conference on Data Engineering, ICDE 2004, 30 March - 2 April 2004, Boston, MA, USA. Ed. by Z. Meral Özsoyoglu and Stanley B.
Zdonik. IEEE Computer Society, 2004, pp. 177–188. https://doi.org/10.1109/
ICDE.2004.1319994.
[Li+12] Cheng Li et al. “Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary”. In: 10th USENIX Symposium on Operating Systems
Design and Implementation, OSDI 2012, Hollywood, CA, USA, October 8–
10, 2012. Ed. by Chandu Thekkath and Amin Vahdat. USENIX Association, 2012, pp. 265–278. URL: https://www.usenix.org/conference/osdi12/technical-sessions/presentation/li.
[Li+14] Cheng Li et al. “Automating the Choice of Consistency Levels in Replicated Systems”. In: 2014 USENIX Annual Technical Conference, USENIX ATC ’14, Philadelphia, PA, USA, June 19–20, 2014. Ed. by Garth Gibson and Nickolai Zeldovich. USENIX Association, 2014, pp. 281–292. URL: https://www.usenix.org/conference/atc14/technical-sessions/presentation/li_cheng_2.
[LL00] F. Thomson Leighton and Daniel M. Lewin. Global hosting system. US Patent 6,108,703. 2000.
[Llo+11] Wyatt Lloyd et al. “Don’t settle for eventual: scalable causal consistency for wide-area storage with COPS”. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. ACM, 2011, pp. 401–416. URL: http://dl.acm.org/citation.cfm?id=2043593 (visited on 01/03/2015).
[Llo+13] Wyatt Lloyd et al. “Stronger semantics for low-latency geo-replicated storage”. In: Presented as part of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13). 2013, pp. 313–328.
[LM10] Avinash Lakshman and Prashant Malik. “Cassandra: a decentralized structured storage system”. In: ACM SIGOPS Operating Systems Review 44.2 (2010), pp. 35–40. URL: http://dl.acm.org/citation.cfm?id=1773922 (visited on 04/15/2014).
[LN01] Qiong Luo and Jeffrey F. Naughton. “Form-Based Proxy Caching for Database-Backed Web Sites”. In: VLDB 2001, Proceedings of 27th International Conference on Very Large Data Bases, September 11–14, 2001, Roma, Italy. Ed. by Peter M. G. Apers et al. Morgan Kaufmann, 2001, pp. 191–200. URL: http://www.vldb.org/conf/2001/P191.pdf.
[Lou+01] Thanasis Loukopoulos et al. “Active Caching of On-Line-Analytical-Processing Queries in WWW Proxies”. In: Proceedings of the 2001 International Conference on Parallel Processing, ICPP 2001, 3–7 September 2001, Valencia, Spain. Ed. by Lionel M. Ni and Mateo Valero. IEEE Computer Society, 2001, pp. 419–426. https://doi.org/10.1109/ICPP.2001.952088.
[LR00] Alexandros Labrinidis and Nick Roussopoulos. “WebView Materialization”. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, May 16–18, 2000, Dallas, Texas, USA. Ed. by Weidong Chen, Jeffrey F. Naughton, and Philip A. Bernstein. ACM, 2000, pp. 367–378. https://doi.org/10.1145/342009.335430.
[LR01a] Alexandros Labrinidis and Nick Roussopoulos. “Adaptive WebView Materialization”. In: WebDB. 2001, pp. 85–90.
[LR01b] Alexandros Labrinidis and Nick Roussopoulos. “Update Propagation Strategies for Improving the Quality of Data on the Web”. In: VLDB 2001, Proceedings of 27th International Conference on Very Large Data Bases, September 11–14, 2001, Roma, Italy. Ed. by Peter M. G. Apers et al. Morgan Kaufmann, 2001, pp. 391–400. URL: http://www.vldb.org/conf/2001/P391.pdf.
[LS88b] Barbara Liskov and Liuba Shrira. “Promises: Linguistic Support for Efficient Asynchronous Procedure Calls in Distributed Systems”. In: Proceedings of the ACM SIGPLAN’88 Conference on Programming Language Design and Implementation (PLDI), Atlanta, Georgia, USA, June 22–24, 1988. Ed. by Richard L. Wexelblat. ACM, 1988, pp. 260–267. https://doi.org/10.1145/53990.54016.
[Lu+15] Haonan Lu et al. “Existential consistency: measuring and understanding consistency at Facebook”. In: Proceedings of the 25th Symposium on Operating Systems Principles, SOSP 2015, Monterey, CA, USA, October 4–7, 2015. Ed. by Ethan L. Miller and Steven Hand. ACM, 2015, pp. 295–310. https://doi.org/10.1145/2815400.2815426.
[Luc14] Gregory Robert Luck. The Java Community Process(SM) Program - JSRs: Java
Specification Requests - detail JSR# 107. https://www.jcp.org/en/jsr/detail?id=
107, 2014. (Accessed on 04/30/2017).
[Luo+02] Qiong Luo et al. “Middle-tier database caching for e-business”. In: Proceedings
of the 2002 ACM SIGMOD International Conference on Management of Data,
Madison, Wisconsin, June 3–6, 2002. Ed. by Michael J. Franklin, Bongki Moon,
and Anastassia Ailamaki. ACM, 2002, pp. 600–611.
[LW84] Ming-Yee Lai and W. Kevin Wilkinson. “Distributed Transaction Management
in Jasmin”. In: Tenth International Conference on Very Large Data Bases,
August 27–31, 1984, Singapore, Proceedings. Ed. by Umeshwar Dayal, Gunter
Schlageter, and Lim Huat Seng. Morgan Kaufmann, 1984, pp. 466–470. URL:
http://www.vldb.org/conf/1984/P466.PDF.
[Lwe10] Bernhard Löwenstein. Benchmarking of Middleware Systems: Evaluating
and Comparing the Performance and Scalability of XVSM (MozartSpaces),
JavaSpaces (GigaSpaces XAP) and J2EE (JBoss AS). VDM Verlag, 2010.
[Mah+13] Hatem A. Mahmoud et al. “Low-Latency Multi-Datacenter Databases using
Replicated Commit”. In: PVLDB 6.9 (2013), pp. 661–672. URL: http://www.
vldb.org/pvldb/vol6/p661-mahmoud.pdf.
[Mal16] Ivano Malavolta. “Beyond native apps: web technologies to the rescue! (keynote)”. In: Proceedings of the 1st International Workshop on Mobile Development. ACM, 2016, pp. 1–2.
[MC+98] Evangelos P Markatos, Catherine E Chronaki, et al. “A top-10 approach to
prefetching on the web”. In: Proceedings of INET. Vol. 98. 1998, pp. 276–290.
[McM17] Patrick McManus. Using Immutable Caching To Speed Up The Web. https://hacks.mozilla.org/2017/01/using-immutable-caching-to-speed-up-the-web/. (Accessed on 04/30/2017). 2017. URL: https://hacks.mozilla.org/2017/01/using-immutable-caching-to-speed-up-the-web/ (visited on 01/28/2017).
[Mit02] M. Mitzenmacher. “Compressed bloom filters”. In: IEEE/ACM Transactions on
Networking (TON) 10.5 (2002), pp. 604–612. URL: http://dl.acm.org/citation.
cfm?id=581878 (visited on 11/15/2012).
[MJM08] Yanhua Mao, Flavio Paiva Junqueira, and Keith Marzullo. “Mencius: Building
Efficient Replicated State Machine for WANs”. In: 8th USENIX Symposium
on Operating Systems Design and Implementation, OSDI 2008, December 8–
10, 2008, San Diego, California, USA, Proceedings. Ed. by Richard Draves and
Robbert van Renesse. USENIX Association, 2008, pp. 369–384. URL: http://
www.usenix.org/events/osdi08/tech/full_papers/mao/mao.pdf.
[Mog+97] Jeffrey C. Mogul et al. “Potential benefits of delta encoding and data compression for HTTP”. In: Proceedings of the ACM SIGCOMM 1997 Conference
on Applications, Technologies, Architectures, and Protocols for Computer
Communication, September 14–18, 1997, Cannes, France. Ed. by Christophe
Diot et al. ACM, 1997, pp. 181–194. https://doi.org/10.1145/263105.263162.
[Mog94] Jeffrey C. Mogul. “Recovery in spritely NFS”. In: Computing Systems 7.2
(1994), pp. 201–262. URL: http://www.usenix.org/publications/compsystems/
1994/spr_mogul.pdf.
[MU05] Michael Mitzenmacher and Eli Upfal. Probability and computing - randomized
algorithms and probabilistic analysis. Cambridge University Press, 2005. ISBN:
978-0-521-83540-4.
[Nag04] S. V. Nagaraj. Web caching and its applications. Vol. 772. Springer, 2004.
URL : http://books.google.de/books?hl=de&lr=&id=UgFhOl2lF0oC&oi=fnd&
pg=PR11&dq=web+caching+and+its+applications&ots=X0Ow-cvXMH&
sig=eNu7MDyfbGLKMGxwv6MZpZlyo6c (visited on 06/28/2012).
[Net+16] Ravi Netravali et al. “Polaris: Faster page loads using fine-grained dependency
tracking”. In: 13th USENIX Symposium on Networked Systems Design and
Implementation (NSDI 16). USENIX Association, 2016.
[Nis+13] Rajesh Nishtala et al. “Scaling Memcache at Facebook”. In: NSDI. USENIX
Association, 2013, pp. 385–398.
[Not10] Mark Nottingham. “RFC 5861 - HTTP Cache-Control Extensions for Stale
Content”. In: (2010).
[NWO88] Michael N. Nelson, Brent B. Welch, and John K. Ousterhout. “Caching in
the Sprite Network File System”. In: ACM Trans. Comput. Syst. 6.1 (1988),
pp. 134–154. https://doi.org/10.1145/35037.42183.
[ON16] Kazuho Oku and Mark Nottingham. Cache Digests for HTTP/2. https://tools.
ietf.org/html/draft-ietf-httpbis-cache-digest-01. (Accessed on 06/05/2017).
2016. URL: https://tools.ietf.org/html/draft-ietf-httpbis-cache-digest-01 (visited on 01/20/2017).
[OO13] Diego Ongaro and John Ousterhout. “In search of an understandable consensus
algorithm”. In: Draft of October 7 (2013). URL: http://bestfuturepractice.org/
mirror/https/ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.
pdf (visited on 07/16/2014).
[Ora] Oracle Result Cache. https://docs.oracle.com/database/121/TGDBA/tune_
result_cache.htm#TGDBA616. (Accessed on 06/05/2017). 2017. URL: https://
docs.oracle.com/database/121/TGDBA/tune_result_cache.htm#TGDBA616
(visited on 01/20/2017).
[Ous+11] John K. Ousterhout et al. “The case for RAMCloud”. In: Commun. ACM 54.7
(2011), pp. 121–130. https://doi.org/10.1145/1965724.1965751.
[PB03] Stefan Podlipnig and László Böszörményi. “A survey of Web cache replacement strategies”. In: ACM Comput. Surv. 35.4 (2003), pp. 374–398. https://doi.
org/10.1145/954339.954341.
[PB08] Mukaddim Pathan and Rajkumar Buyya. “A Taxonomy of CDNs”. English. In:
Content Delivery Networks. Ed. by Rajkumar Buyya, Mukaddim Pathan, and
Athena Vakali. Vol. 9. Lecture Notes Electrical Engineering. Springer Berlin
Heidelberg, 2008, pp. 33–77. ISBN: 978-3-540-77886-8. http://dx.doi.org/10.
1007/978-3-540-77887-5_2.
[PD10] Daniel Peng and Frank Dabek. “Large-scale Incremental Processing Using
Distributed Transactions and Notifications”. In: OSDI. Vol. 10. 2010, pp. 1–
15. URL: https://www.usenix.org/legacy/events/osdi10/tech/full_papers/Peng.
pdf?origin=publication_detail (visited on 01/03/2015).
[PH03] Sunil Patro and Y. Charlie Hu. “Transparent Query Caching in Peer-to-Peer
Overlay Networks”. In: 17th International Parallel and Distributed Processing
Symposium (IPDPS 2003), 22–26 April 2003, Nice, France, CD-ROM/Abstracts
Proceedings. IEEE Computer Society, 2003, p. 32. https://doi.org/10.1109/
IPDPS.2003.1213112.
[PM96] Venkata N. Padmanabhan and Jeffrey C. Mogul. “Using predictive prefetching
to improve World Wide Web latency”. In: Computer Communication Review
26.3 (1996), pp. 22–36. https://doi.org/10.1145/235160.235164.
[Por09] Ely Porat. “An Optimal Bloom Filter Replacement Based on Matrix Solving”.
In: Computer Science - Theory and Applications, Fourth International Computer Science Symposium in Russia, CSR 2009, Novosibirsk, Russia, August
18–23, 2009. Proceedings. Ed. by Anna E. Frid et al. Vol. 5675. Lecture Notes
in Computer Science. Springer, 2009, pp. 263–273. https://doi.org/10.1007/
978-3-642-03351-3_25.
[Pos] PostgreSQL: Documentation: 9.6: High Availability, Load Balancing, and
Replication. https://www.postgresql.org/docs/9.6/static/high-availability.html.
(Accessed on 07/28/2017). 2017. URL: https://www.postgresql.org/docs/9.6/
static/high-availability.html (visited on 02/04/2017).
[PPR05] Anna Pagh, Rasmus Pagh, and S. Srinivasa Rao. “An optimal Bloom filter
replacement”. In: Proceedings of the Sixteenth Annual ACM-SIAM Symposium
on Discrete Algorithms, SODA 2005, Vancouver, British Columbia, Canada,
January 23–25, 2005. SIAM, 2005, pp. 823–829. URL: http://dl.acm.org/citation.cfm?id=1070432.1070548.
[PSS09] Felix Putze, Peter Sanders, and Johannes Singler. “Cache-, hash-, and space-efficient bloom filters”. In: ACM Journal of Experimental Algorithmics 14 (2009). https://doi.org/10.1145/1498698.1594230.
[Pu+16] Qifan Pu et al. “FairRide: near-optimal, fair cache sharing”. In: 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). 2016, pp. 393–406.
[Qia+13] Lin Qiao et al. “On brewing fresh espresso: LinkedIn’s distributed data serving platform”. In: Proceedings of the 2013 international conference on Management of data. ACM, 2013, pp. 1135–1146. URL: http://dl.acm.org/citation.cfm?id=2465298 (visited on 09/28/2014).
[Rab+03] Michael Rabinovich et al. “Moving Edge-Side Includes to the Real Edge - the Clients”. In: 4th USENIX Symposium on Internet Technologies and Systems, USITS’03, Seattle, Washington, USA, March 26–28, 2003. Ed. by Steven D. Gribble. USENIX, 2003. URL: http://www.usenix.org/events/usits03/tech/rabinovich.html.
[Rae+13] Ian Rae et al. “Online, asynchronous schema change in F1”. In: Proceedings of the VLDB Endowment 6.11 (2013), pp. 1045–1056. URL: http://dl.acm.org/citation.cfm?id=2536230 (visited on 01/03/2015).
[Rah88] Erhard Rahm. “Optimistische Synchronisationskonzepte in zentralisierten und verteilten Datenbanksystemen/Concepts for optimistic concurrency control in centralized and distributed database systems”. In: it - Information Technology 30.1 (1988), pp. 28–47.
[Raj+15] Pethuru Raj et al. High-Performance Big-Data Analytics - Computing Systems and Approaches. Computer Communications and Networks. Springer, 2015. ISBN: 978-3-319-20743-8. https://doi.org/10.1007/978-3-319-20744-5.
[Ria] Riak. http://basho.com/products/. (Accessed on 05/25/2017). 2017. URL: http://basho.com/products/ (visited on 01/13/2017).
[RL04] Lakshmish Ramaswamy and Ling Liu. “An Expiration Age-Based Document Placement Scheme for Cooperative Web Caching”. In: IEEE Trans. Knowl. Data Eng. 16.5 (2004), pp. 585–600. https://doi.org/10.1109/TKDE.2004.1277819.
[RLZ06] Lakshmish Ramaswamy, Ling Liu, and Jianjun Zhang. “Efficient Formation of Edge Cache Groups for Dynamic Content Delivery”. In: 26th IEEE International Conference on Distributed Computing Systems (ICDCS 2006), 4–7 July 2006, Lisboa, Portugal. IEEE Computer Society, 2006, p. 43. https://doi.org/10.1109/ICDCS.2006.33.
[Rom97] Steven Roman. Introduction to coding and information theory. Undergraduate texts in mathematics. Springer, 1997. ISBN: 978-0-387-94704-4.
[RS03] M. Rabinovich and O. Spatscheck. “Web caching and replication”. In: SIGMOD Record 32.4 (2003), p. 107. URL: http://www.sigmod.org/publications/sigmod-record/0312/20.WebCachingReplication2.pdf (visited on 06/28/2012).
[Rus03] C. Russell. “Java data objects (JDO) specification JSR-12”. In: Sun Microsystems (2003).
[Sat+09] Mahadev Satyanarayanan et al. “The Case for VM-Based Cloudlets in Mobile Computing”. In: IEEE Pervasive Computing 8.4 (2009), pp. 14–23. https://doi.org/10.1109/MPRV.2009.82.
[Sch16] Peter Schuller. “Manhattan, our real-time, multi-tenant distributed database for Twitter scale”. In: Twitter Blog (2016).
[Sha+15] Artyom Sharov et al. “Take me to your leader! Online Optimization of Distributed Storage Configurations”. In: PVLDB 8.12 (2015), pp. 1490–1501. URL: http://www.vldb.org/pvldb/vol8/p1490-shraer.pdf.
[Shi11] Rada Chirkova. “Materialized Views”. In: Foundations and Trends in Databases 4.4 (2011), pp. 295–405. ISSN: 1931-7883, 1931-7891. https://doi.org/10.1561/1900000020. URL: http://www.nowpublishers.com/product.aspx?product=DBS&doi=1900000020 (visited on 01/03/2015).
[Shu+13] Jeff Shute et al. “F1: A distributed SQL database that scales”. In: Proceedings of the VLDB Endowment 6.11 (2013), pp. 1068–1079.
[Sov+11] Yair Sovran et al. “Transactional storage for geo-replicated systems”. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. ACM, 2011, pp. 385–400.
[SS94] Mukesh Singhal and Niranjan G. Shivaratri. Advanced concepts in operating systems. McGraw-Hill, Inc., 1994.
[Stö+15] Uta Störl et al. “Schemaless NoSQL Data Stores - Object-NoSQL Mappers to the Rescue?” In: Datenbanksysteme für Business, Technologie und Web (BTW), 16. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS), 4.-6.3.2015 in Hamburg, Germany. Proceedings. Ed. by Thomas Seidl et al. Vol. 241. LNI. GI, 2015, pp. 579–599. URL: http://subs.emis.de/LNI/Proceedings/Proceedings241/article13.html (visited on 03/10/2015).
[SW14] Ivan Stojmenovic and Sheng Wen. “The Fog Computing Paradigm: Scenarios and Security Issues”. In: Proceedings of the 2014 Federated Conference on Computer Science and Information Systems, Warsaw, Poland, September 7–10, 2014. Ed. by Maria Ganzha, Leszek A. Maciaszek, and Marcin Paprzycki. 2014, pp. 1–8. https://doi.org/10.15439/2014F503.
[TC03] Xueyan Tang and Samuel T. Chanson. “Coordinated Management of Cascaded Caches for Efficient Content Distribution”. In: Proceedings of the 19th International Conference on Data Engineering, March 5–8, 2003, Bangalore, India. Ed. by Umeshwar Dayal, Krithi Ramamritham, and T. M. Vijayaraman. IEEE Computer Society, 2003, pp. 37–48. https://doi.org/10.1109/ICDE.2003.1260780.
[Ter+13] Douglas B. Terry et al. “Consistency-based service level agreements for cloud storage”. In: ACM SIGOPS 24th Symposium on Operating Systems Principles, SOSP ’13, Farmington, PA, USA, November 3–6, 2013. Ed. by Michael Kaminsky and Mike Dahlin. ACM, 2013, pp. 309–324. https://doi.org/10.1145/2517349.2522731.
[Tho+12] Alexander Thomson et al. “Calvin: fast distributed transactions for partitioned database systems”. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, 2012, pp. 1–12.
[Tor+17] Alexandre Torres et al. “Twenty years of object-relational mapping: A survey on patterns, solutions, and their implications on application design”. In: Information and Software Technology 82 (2017), pp. 1–18.
[Tot09] Alexander Totok. Modern Internet Services. Alexander Totok, 2009.
[TRL12] Sasu Tarkoma, Christian Esteve Rothenberg, and Eemil Lagerspetz. “Theory and Practice of Bloom Filters for Distributed Systems”. In: IEEE Communications Surveys & Tutorials 14.1 (2012), pp. 131–155. ISSN: 1553-877X. https://doi.org/10.1109/SURV.2011.031611.00024. URL: http://ieeexplore.ieee.org/document/5751342/ (visited on 11/25/2016).
[Tsi+01] Mark Tsimelzon et al. “ESI language specification 1.0”. In: Akamai Technologies, Inc., Cambridge, MA, USA; Oracle Corporation, Redwood City, CA, USA (2001).
[Vak06] Athena Vakali. Web Data Management Practices: Emerging Techniques and Technologies. IGI Global, 2006.
[VM14] Piet Van Mieghem. Performance analysis of complex networks and systems. Cambridge University Press, 2014. URL: http://books.google.de/books?hl=de&lr=&id=lc3aWG0rL_MC&oi=fnd&pg=PR11&dq=mieghem+performance&ots=ohyJ3Qz2Lz&sig=1MOrNY0vHG-D4pDsf_DygD_3vDY (visited on 10/03/2014).
[VV16] Paolo Viotti and Marko Vukolić. “Consistency in Non-Transactional Distributed Storage Systems”. en. In: ACM Computing Surveys 49.1 (June 2016),
pp. 1–34. ISSN: 03600300. https://doi.org/10.1145/2926965. URL: http://dl.
acm.org/citation.cfm?doid=2911992.2926965 (visited on 11/25/2016).
[Wan99] J. Wang. “A survey of web caching schemes for the internet”. In: ACM
SIGCOMM Computer Communication Review 29.5 (1999), pp. 36–46. URL:
http://dl.acm.org/citation.cfm?id=505701 (visited on 06/28/2012).
[WF11] Patrick Wendell and Michael J. Freedman. “Going viral: flash crowds in an open
CDN”. In: Proceedings of the 2011 ACM SIGCOMM conference on Internet
measurement conference. ACM, 2011, pp. 549–558. URL: http://dl.acm.org/
citation.cfm?id=2068867 (visited on 01/03/2015).
[WGR20] Wolfram Wingerath, Felix Gessert, and Norbert Ritter. “InvaliDB: Scalable
Push-Based Real-Time Queries on Top of Pull-Based Databases”. In: 36th
IEEE International Conference on Data Engineering, ICDE 2020, Dallas,
Texas, April 20–24, 2020, 2020.
[WGW+20] Wolfram Wingerath, Felix Gessert, Erik Witt, et al. “Speed Kit: A Polyglot &
GDPR-Compliant Approach For Caching Personalized Content”. In: 36th IEEE
International Conference on Data Engineering, ICDE 2020, Dallas, Texas,
April 20–24, 2020, 2020.
[Wil+05] Adepele Williams et al. “Web workload characterization: Ten years later”. In:
Web content delivery. Springer, 2005, pp. 3–21.
[Win18] Wolfram Wingerath. “Rethinking Web Performance with Service Workers: 30
Man-Years of Research in a 30-Minute Read”. In: Baqend Tech Blog (2018).
URL : https://medium.com/p/2638196fa60a.
[Win19] Wolfram Wingerath. “Scalable Push-Based Real-Time Queries on Top of Pull-Based Databases”. PhD thesis. University of Hamburg, 2019. URL: https://invalidb.info/thesis.
[WKW16] Xiao Sophia Wang, Arvind Krishnamurthy, and David Wetherall. “Speeding up web page loads with Shandian”. In: 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). 2016, pp. 109–122. URL: https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/wang (visited on 11/25/2016).
[WN90] W. Kevin Wilkinson and Marie-Anne Neimat. “Maintaining Consistency of
Client-Cached Data”. In: 16th International Conference on Very Large Data
Bases, August 13–16, 1990, Brisbane, Queensland, Australia, Proceedings. Ed.
by Dennis McLeod, Ron Sacks-Davis, and Hans-Jörg Schek. Morgan Kaufmann, 1990, pp. 122–133. URL: http://www.vldb.org/conf/1990/P122.PDF.
[Wor94] Kurt Jeffery Worrell. “Invalidation in Large Scale Network Object Caches”. In:
(1994).
[Wu+13] Zhe Wu et al. “SPANStore: cost-effective geo-replicated storage spanning
multiple cloud services”. In: ACM SIGOPS 24th Symposium on Operating
Systems Principles, SOSP ’13, Farmington, PA, USA, November 3–6, 2013. Ed.
by Michael Kaminsky and Mike Dahlin. ACM, 2013, pp. 292–308. https://doi.
org/10.1145/2517349.2522730.
[WV02] G. Weikum and G. Vossen. Transactional information systems. Series in Data Management Systems. Morgan Kaufmann Pub, 2002. ISBN: 9781558605084. URL: http://books.google.de/books?hl=de&lr=&id=wV5Ran71zNoC&oi=fnd&pg=PP2&dq=transactional+information+systems&ots=PgJAaN7R5X&sig=Iya4r9DiFhmb_wWgOI5QMuxm6zU (visited on 06/28/2012).
[Xu+14] Yuehai Xu et al. “Characterizing Facebook’s Memcached Workload”. In: IEEE
Internet Computing 18.2 (2014), pp. 41–49.
[Yin+98] Jian Yin et al. “Using Leases to Support Server-Driven Consistency in Large-Scale Systems”. In: Proceedings of the 18th International Conference on
Distributed Computing Systems, Amsterdam, The Netherlands, May 26–29,
1998. IEEE Computer Society, 1998, pp. 285–294. https://doi.org/10.1109/
ICDCS.1998.679726.
[Yin+99] Jian Yin et al. “Volume Leases for Consistency in Large-Scale Systems”. In:
IEEE Trans. Knowl. Data Eng. 11.4 (1999), pp. 563–576. https://doi.org/10.
1109/69.790806.
[Zak+16] Victor Zakhary et al. “DB-Risk: The Game of Global Database Placement”. In:
Proceedings of the 2016 International Conference on Management of Data,
SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01,
2016. Ed. by Fatma Özcan, Georgia Koutrika, and Sam Madden. ACM, 2016,
pp. 2185–2188. https://doi.org/10.1145/2882903.2899405.
[IET15] IETF. “RFC 7540 - Hypertext Transfer Protocol Version 2 (HTTP/2)”. In:
(2015).
[ÖV11] M.T. Özsu and P. Valduriez. Principles of distributed database systems.
Springer, 2011.
[ÖVU98] M Tamer Özsu, Kaladhar Voruganti, and Ronald C Unrau. “An Asynchronous
Avoidance-Based Cache Consistency Algorithm for Client Caching DBMSs.”
In: VLDB. Vol. 98. Citeseer. 1998, pp. 440–451.
[ÖDV92] M. Tamer Özsu, Umeshwar Dayal, and Patrick Valduriez. “An Introduction to
Distributed Object Management”. In: Distributed Object Management, Papers
from the International Workshop on Distributed Object Management (IWDOM),
Edmonton, Alberta, Canada, August 19–21, 1992. Ed. by M. Tamer Özsu,
Umeshwar Dayal, and Patrick Valduriez. Morgan Kaufmann, 1992, pp. 1–24.
Chapter 6
Transactional Semantics for Globally
Distributed Applications
In this chapter, we will review both concepts and systems for transaction processing in cloud data management and NoSQL databases. We will give a short discussion of each approach and summarize the differences among them.
6.1 Latency vs. Distributed Transaction Processing
Transactions are one of the central concepts in data management, as they solve
the problem of keeping data correct and consistent under highly concurrent access.
While the adoption of distributed NoSQL databases first led to a decline in the support of transactions, numerous systems have started to support transactions again, often with relaxed guarantees (e.g., Megastore [Bak+11], G-Store [DAEA10], ElasTras [DAEA13], Cloud SQL Server [Ber+11], Spanner [Cor+12], F1 [Shu+13], Percolator [PD10], Baqend [Ges19], MDCC [Kra+13], TAPIR [Zha+15b], CloudTPS [WPC12], Cherry Garcia [DFR15a], FaRM [Dra+15], Omid [Gó+14], RAMP [Bai+14c], Walter [Sov+11], Calvin [Tho+12], H-Store/VoltDB [Kal+08]). The core challenge is that serializability—like strong consistency—enforces a difficult trade-off between high availability and correctness in distributed systems [Bai+13c].
The gold standard for transactions is ACID [WV02, HR83]:
Atomicity. A transaction must either commit or abort as a complete unit.
Atomicity is implemented through recovery, rollbacks, and atomic commitment
protocols.
Consistency. A transaction takes the database from one consistent state to
another. Consistency is implemented through constraint checking and requires
transactions to be logically consistent in themselves.
Isolation. The concurrent and interleaved execution of operations leaves transactions isolated, so that they do not affect each other. Isolation is implemented
through concurrency control algorithms.
Durability. The effects of committed transactions are persistent even in the face
of failures. Durability is implemented through logging, recovery, and replication.
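Atomicity and consistency can be illustrated in a few lines of Python against SQLite (a minimal sketch; the `accounts` schema and the `transfer` helper are invented for the example): a transaction that violates a constraint aborts as a complete unit, so no partial effects become visible.

```python
import sqlite3

# In-memory database with a simple integrity constraint (hypothetical schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY,"
            " balance INTEGER CHECK (balance >= 0))")
con.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
con.commit()

def transfer(src, dst, amount):
    """Move money atomically: either both writes commit or neither does."""
    try:
        with con:  # opens a transaction; commits on success, rolls back on error
            con.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                        (amount, src))
            con.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                        (amount, dst))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint aborted the whole transaction

assert transfer("alice", "bob", 30) is True
assert transfer("alice", "bob", 1000) is False  # would drive alice negative -> abort
balances = dict(con.execute("SELECT id, balance FROM accounts"))
assert balances == {"alice": 70, "bob": 80}  # no partial effects of the abort
```

The second `transfer` call shows atomicity and consistency working together: the constraint violation on the debit causes the entire transaction, including any writes already executed, to be rolled back.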
A comprehensive overview of centralized and distributed transactions is given by
Agrawal et al. [ADE12], Weikum and Vossen [WV02], Özsu and Valduriez [ÖV11],
Bernstein and Newcomer [BN09], and Sippu and Soisalon-Soininen [SSS15].
6.1.1 Distributed Transaction Architectures
A transaction is a finite sequence of read and write operations. The interleaved operations of a set of transactions are called a history, and any prefix of a history is a schedule [WV02]. To provide isolation, concurrency control algorithms only
allow schedules that do not violate isolation. The strongest level of isolation is
serializability. However, many concurrency control protocols allow certain update
anomalies for performance reasons, forming different isolation levels of relaxed
transaction isolation.
Update anomalies describe undesired behavior caused by transaction interleaving [ALO00, Ady99]. A dirty write overwrites data from an uncommitted transaction. With a dirty read, stale data is exposed. A lost update describes a write that does not become visible because two transactions read the same object version before writing it back. A non-repeatable read occurs if data read by an in-flight transaction is concurrently overwritten. A phantom read describes a predicate-based read that becomes invalid because concurrent transactions write data that matches the query predicate. Read and write skew are two anomalies caused by transactions operating on different, isolated database snapshots.
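The lost update anomaly, for instance, is easy to reproduce with two unsynchronized read-modify-write cycles (a self-contained Python sketch; the in-memory dictionary stands in for a database object):

```python
# A shared object without any concurrency control (stand-in for a database record).
db = {"counter": 0}

def read(key):
    return db[key]

def write(key, value):
    db[key] = value

# Two transactions T1 and T2 both want to increment the counter.
# Interleaving r1(x) r2(x) w1(x) w2(x): both read the same object version.
v1 = read("counter")      # T1 reads 0
v2 = read("counter")      # T2 reads 0 (same version as T1)
write("counter", v1 + 1)  # T1 writes 1
write("counter", v2 + 1)  # T2 writes 1, silently overwriting T1's update

assert db["counter"] == 1  # lost update: any serial execution would yield 2
```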
The strongest isolation level of serializability can also be refined into different
classes of histories, depending on defined correctness criteria [WV02, p. 109]. In
practice, the most relevant class is conflict-serializability (CSR), and its subclass
commit order-preserving conflict serializability (COCSR). CSR and COCSR are
efficiently decidable and easy to reason about from a developer’s perspective.
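That CSR is efficiently decidable refers to the classic precedence-graph test: a history is conflict-serializable iff its precedence graph is acyclic [WV02]. A compact sketch (the operation encoding and function name are our own):

```python
def is_conflict_serializable(history):
    """history: list of (txn, op, item) tuples, e.g. ("T1", "r", "x").
    Builds the precedence graph and reports whether it is acyclic (CSR)."""
    edges = set()
    for i, (t1, op1, x1) in enumerate(history):
        for t2, op2, x2 in history[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))

    def reachable(src, dst, seen=()):
        # Depth-first search over the precedence edges.
        return any(a == src and (b == dst or
                                 (b not in seen and reachable(b, dst, seen + (b,))))
                   for a, b in edges)

    txns = {t for t, _, _ in history}
    return not any(reachable(t, t) for t in txns)  # no transaction on a cycle

# r1(x) w2(x) w1(x): edges T1->T2 and T2->T1 form a cycle, so not in CSR.
assert not is_conflict_serializable(
    [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")])
# r1(x) w1(x) r2(x) w2(x): all conflicts point T1->T2, acyclic, hence in CSR.
assert is_conflict_serializable(
    [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")])
```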
Figure 6.1 gives an overview of typical distributed transaction architectures as
originally described by Gray [GL06] and Liskov [Lis+99] and still used in most
systems [Bak+11, Cor+13, EWS13]. Distributed databases are partitioned into
shards, with each shard being replicated for fault tolerance. Therefore, an atomic
commitment protocol is required to enforce an all-or-nothing decision across
all shards. Common protocols such as two-phase commit (2PC) [Lec09], three-phase commit (3PC) [SS83], and Paxos Commit [GL06] have to make a trade-off
between availability and correctness: any correct atomic commitment protocol
blocks under some network partitions. The replication protocol is required to keep
replicas in sync, so that staleness does not interfere with the concurrency control
algorithm. Traditionally, the replication protocol has to ensure linearizability (e.g.,
through Paxos [Lam98], Virtual Synchrony [BJ87], and Viewstamped Replication
[OL88]) but it has been shown that an appropriate concurrency control scheme
can potentially tolerate weaker consistency of the underlying replication protocol
without compromising isolation [Zha+15b].
Fig. 6.1 Distributed transaction architecture consisting of an atomic commitment protocol,
concurrency control, and a replication protocol
6.1.1.1 Concurrency Control
Concurrency control schemes can be grouped into pessimistic and optimistic
approaches. Pessimistic schemes proactively prevent isolation violations during
transaction execution. Optimistic schemes do not interfere with the transaction
execution and validate the absence of violations at commit time. The major
concurrency control algorithms are:
Lock-based protocols. For operations that would create cyclic conflicts, mutual
exclusion can be achieved through locking. According to the two-phase locking
(2PL) theorem, any execution of transactions that use 2PL is serializable
[Esw+76]. The granularity and types of locks vary in different protocols, as
well as the specifics of 2PL [WV02, ÖV11, BN09]. All 2PL-based protocols
without preclaiming (acquiring locks at transaction begin) suffer from potential
deadlocks or external aborts1 [Gra+81, Gra+76]. Preclaiming, on the other hand,
is not applicable if accessed objects are unknown in advance but determined
through queries, reads, or user interactions.
1 Following the terminology of Bailis et al. [Bai+13c], we refer to external aborts as transaction
rollbacks caused by a system’s implementation (e.g., for deadlock prevention), whereas internal
aborts are triggered by the transaction itself (e.g., as a rollback operation).
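As an illustration of the locking discipline, strict 2PL can be modeled with a lock table in which all locks are held until commit. This is a minimal sketch, not a production protocol: a real lock manager would enqueue waiters and detect deadlocks instead of raising an error, and the class and method names here are invented.

```python
# Minimal strict-2PL sketch: shared (S) / exclusive (X) locks per object,
# all locks held until commit, so the shrinking phase collapses into commit.
class LockError(Exception):
    pass

class LockManager:
    def __init__(self):
        self.locks = {}  # object -> (mode, set of holding transactions)

    def acquire(self, txn, obj, mode):
        held = self.locks.get(obj)
        if held is None:
            self.locks[obj] = (mode, {txn})
            return
        held_mode, holders = held
        if held_mode == "S" and mode == "S":
            holders.add(txn)                 # shared locks are compatible
        elif holders == {txn}:               # sole holder may upgrade
            self.locks[obj] = ("X" if mode == "X" else held_mode, holders)
        else:
            raise LockError(f"{txn} must wait for {obj}")  # would block here

    def release_all(self, txn):              # called at commit/abort
        for obj in list(self.locks):
            mode, holders = self.locks[obj]
            holders.discard(txn)
            if not holders:
                del self.locks[obj]

lm = LockManager()
lm.acquire("T1", "x", "S")
lm.acquire("T2", "x", "S")      # compatible: both transactions read x
try:
    lm.acquire("T2", "x", "X")  # upgrade denied: T1 also holds x
except LockError:
    pass                        # a real scheduler would enqueue T2 here
lm.release_all("T1")            # T1 commits and releases its locks
lm.acquire("T2", "x", "X")      # now the exclusive lock succeeds
```

The blocked upgrade is exactly the situation that, with two transactions upgrading symmetrically, turns into a deadlock under 2PL without preclaiming.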
Non-Locking Pessimistic Protocols. Timestamp Ordering (TO) [Ber99]
enforces serializability by ordering conflicting operations by the begin timestamp
of transactions. The main downside of TO schedulers is that they produce
only a small subset of CSR schedules and therefore cause unnecessary aborts.
Serialization Graph Testing (SGT) [Cas81] is another non-locking pessimistic
scheme that constructs the conflict graph and prevents it from becoming cyclic.
The internal state of SGT can become very large as it is non-trivial to determine
when old transactions’ information can be safely discarded.
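The TO rule above can be sketched in a few lines (illustrative only; real TO schedulers additionally handle commit dependencies, recovery, and the garbage collection problem just mentioned). Each object remembers the largest read and write timestamps, and an operation arriving "too late" aborts its transaction:

```python
# Basic timestamp-ordering sketch: conflicting operations must arrive in
# the order of their transactions' begin timestamps, otherwise abort.
class Abort(Exception):
    pass

class TOScheduler:
    def __init__(self):
        self.max_read = {}   # object -> largest ts that has read it
        self.max_write = {}  # object -> largest ts that has written it

    def read(self, ts, obj):
        if ts < self.max_write.get(obj, 0):
            raise Abort(f"read({obj}) at ts={ts} arrives too late")
        self.max_read[obj] = max(self.max_read.get(obj, 0), ts)

    def write(self, ts, obj):
        if ts < self.max_read.get(obj, 0) or ts < self.max_write.get(obj, 0):
            raise Abort(f"write({obj}) at ts={ts} arrives too late")
        self.max_write[obj] = ts

sched = TOScheduler()
sched.read(ts=2, obj="x")       # T2 reads x
sched.write(ts=3, obj="x")      # T3 writes x
aborted = False
try:
    sched.write(ts=1, obj="x")  # T1 arrives late and is aborted, even
except Abort:                   # though a serializable order may exist
    aborted = True
```

The final write would be harmless in some conflict-serializable histories, which illustrates why TO schedulers produce only a small subset of CSR and cause unnecessary aborts.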
Multi-version Concurrency Control (MVCC). A straightforward improvement
of pessimistic protocols is to decouple concurrent reads by executing them on an
immutable snapshot. TO, SGT, and 2PL can easily be extended to incorporate
multi-versioning [WV02]. Due to reduced conflict rates, MVCC schedulers such
as Serializable Snapshot Isolation [CRF08, Fek+05, PG12] are popular among
RDBMSs.
Optimistic Concurrency Control (OCC). Optimistic schedulers operate across
three transaction phases. The principal idea is to allow all transactional operations
and to apply rollbacks at commit time when serializability would be violated
[KR81].
1. Read Phase. In the read phase, the transaction performs its operations,
including reads, writes, and queries. Writes are not applied to the database
but buffered until commit, typically in the client.
2. Validation Phase. The validation phase is executed as a critical section and
ensures that the transaction can safely commit. The type of validation depends
on the optimistic protocol. In Forward-Oriented Optimistic Concurrency
Control (FOCC) the committing transaction’s write set is validated against
the read set of all parallel transactions that are still in the read phase [Här84].
In Backward-Oriented Optimistic Concurrency Control (BOCC), the committing
transaction’s read set is validated against the write set of all transactions that
completed while the committing transaction was in the read phase. To resolve
a conflict, two strategies are possible:
• Kill/Broadcast-OCC: transactions that are running and preventing the
committing transaction from completing are aborted.
• Die: the committing transaction aborts.
In BOCC, only the Die strategy is applicable, as conflicting transactions
are already committed. FOCC permits both resolution strategies. However,
FOCC has two important drawbacks. First, it needs to consider reads of active
transactions, which prevents serving them from caches or replicas. Second,
the FOCC validation has to block concurrent reads and thus strongly limits
concurrency and performance.
3. Write Phase. If validation was successful, the transaction’s changes are
persisted to the database and made visible. Usually, this also includes writing
recovery information into logs to ensure durability.
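The BOCC check described above can be sketched as a set intersection (the function name is invented for illustration; a real validator must also track which transactions actually committed during the read phase and run inside the critical section):

```python
# BOCC validation sketch: the committing transaction's read set is checked
# against the write sets of transactions that committed during its read phase.
def bocc_validate(committing_read_set, overlapping_committed_write_sets):
    """Return True if the transaction may commit, False if it must die."""
    for write_set in overlapping_committed_write_sets:
        if committing_read_set & write_set:
            return False  # conflict: only the Die strategy is possible
    return True

# T read {a, b}; while it ran, one transaction committed writes to {c}
# and another to {b, d} -- the overlap on b forces T to abort (Die).
assert bocc_validate({"a", "b"}, [{"c"}]) is True
assert bocc_validate({"a", "b"}, [{"c"}, {"b", "d"}]) is False
```

A FOCC validator would invert the roles: the committing transaction's write set would be intersected with the read sets of transactions still in their read phase.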
The problem of pessimistic concurrency control is that preventing violations
of serializability requires transactional reads and writes to be forwarded to the
scheduler. In replicated or cached systems, this defeats the purpose of data
distribution. This also applies to MVCC as it requires local tracking of transaction-specific
versions which cannot be offloaded to replicas or caches without including
them in the concurrency control algorithm. Therefore, in highly distributed systems,
optimistic transactions are advantageous, as they allow combining client-local
processing of reads and writes with a global commit decision [Bak+11, DAEA10,
DAEA13, Cor+12, Shu+13, DFR15a, Dra+15]. Stonebraker et al. [Sto+07] identify
“locking-based concurrency control mechanisms” as a substantial performance
bottleneck and one of the relics of System R that hinder the progress of database
systems.
6.1.1.2 Impact of Latency on Transaction Success
Compared to pessimistic mechanisms, optimistic concurrency control offers the
advantage of never blocking running transactions due to lock conflicts. The
downside of optimistic transactions is that they can lead to transaction aborts since
this is the only way of handling cyclic read/write conflicts [KR81].
Locking strategies suffer from deadlocks. Let A be a random variable that
describes the outcome of a transaction. Gray et al. [Gra+81] showed that the
probability of aborts P (A = 1) increases with the second power of the number T of
parallel transactions and with the fourth power of transaction duration D [BN09]:
A(w) = \begin{cases} 0 & \text{if } w = \text{commit} \\ 1 & \text{if } w = \text{abort} \end{cases} \quad (6.1)

P(A = 1) \sim D^4 \quad \text{and} \quad P(A = 1) \sim T^2 \quad (6.2)
Deadlocks are resolved by rollbacks. Thus, the more high-latency reads are
involved in a pessimistic transaction, the higher the abort probability. In general,
optimistic transactions are superior for read-intensive workloads while pessimistic
transactions are more appropriate for write-intensive workloads [WV02].
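A quick numeric illustration of Eq. (6.2): under the quoted scaling laws, doubling transaction duration multiplies the deadlock-driven abort probability by 16, while doubling the number of parallel transactions multiplies it by 4 (illustrative arithmetic only; the absolute constants depend on the workload):

```python
# Relative abort-probability scaling from P(A=1) ~ D^4 and P(A=1) ~ T^2.
def relative_abort_factor(d_scale, t_scale):
    """Factor by which the abort probability grows when transaction
    duration is scaled by d_scale and parallelism by t_scale."""
    return d_scale ** 4 * t_scale ** 2

assert relative_abort_factor(2, 1) == 16   # doubling duration D
assert relative_abort_factor(1, 2) == 4    # doubling parallelism T
assert relative_abort_factor(2, 2) == 64   # doubling both
```

The fourth-power dependence on duration is what makes high-latency round-trips inside pessimistic transactions so damaging.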
In a simplified model, Franaszek et al. [FRT92] showed the quadratic effect of
optimistic transactions that states that the abort probability is k 2 /N, where k is the
number of objects accessed in transactions and N the size of the database [Tho98].
This model assumes preclaiming, an even access probability across all objects, and
that every read object is also written. In that case, if the first transaction accesses
n objects and the second m, the probability of accessing at least one object in both
transactions is:
N −n
m
P (n, m) = 1 − N
≈ 1 − (1 −
m
n m
nm
) ≈
N
N
(6.3)
Thus, if all transactions read and write k objects, the abort probability for two
concurrent transactions is P(k, k) = k²/N, the quadratic effect. However, this model
has many limitations, most importantly the assumption of preclaiming, the missing
distinction between reads and writes, and the discarded influence of latency.
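The approximation in Eq. (6.3) can be checked empirically with a small Monte Carlo experiment (a sketch under the same simplifying assumptions: uniform access probability and every read object also written):

```python
# Monte Carlo check of Eq. (6.3): probability that two transactions touching
# n and m uniformly random objects out of N overlap in at least one object.
import random

def overlap_probability(N, n, m, trials=50_000, seed=42):
    rng = random.Random(seed)  # fixed seed for a reproducible estimate
    hits = 0
    population = range(N)
    for _ in range(trials):
        a = set(rng.sample(population, n))
        b = rng.sample(population, m)
        if any(obj in a for obj in b):
            hits += 1
    return hits / trials

N, k = 10_000, 10
estimate = overlap_probability(N, k, k)
approximation = k * k / N   # the quadratic effect: k^2 / N = 0.01
assert abs(estimate - approximation) < 0.003
```

For k = 10 objects out of N = 10,000, the simulated conflict rate stays within a small tolerance of the predicted k²/N = 1%.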
6.1.1.3 Example of High-Latency Transactions
To illustrate the role of latency in transaction processing, we briefly discuss an
example application use case. In the web, high latency is ubiquitous, especially for
applications employing the DBaaS and BaaS model. Transactions requiring client-server round-trips are therefore usually avoided through heuristics, compensations,
and other non-transactional workarounds.
As an example, consider a checkout process in a booking system, e.g., for an
airline or a theatre. A transaction would proceed in two steps:
1. The available seats are read from the database and shipped over a high-latency
network to the end user.
2. The end user performs a selection of seats in the frontend and sends a booking
or reservation request (i.e., a write) to the system, back over the high-latency
network.
This use case is difficult to implement with lock-based concurrency control, as
applying read locks in step 1 would cause very high deadlock probabilities and block
resources in the database system. In practice, this use case is solved by decoupling
step 1 and step 2 into two unrelated transactions [SF12]. If step 2 cannot be applied
due to a violation of isolation (i.e., seats were concurrently booked) the transaction
is rolled back, and the user is presented with an error. This solution is effectively
an optimistic transaction implemented in the application layer. Even a database
system with native optimistic concurrency control could not prevent these errors.
Furthermore, for security reasons, a database transaction API cannot be exposed to
end users, but only to the server-side business logic tier.
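The decoupled two-step flow can be mimicked with a version check on write, i.e., an application-level optimistic transaction. This is a simplified sketch; the data model, helper names, and version scheme are invented for illustration:

```python
# Sketch of the decoupled booking flow: step 1 reads a seat with its
# version over the high-latency network; step 2 sends a booking request
# that only succeeds if the seat version is unchanged.
seats = {"12A": {"version": 7, "booked_by": None}}

def read_seat(seat_id):
    rec = seats[seat_id]
    return rec["version"], rec["booked_by"]

def try_book(seat_id, expected_version, user):
    rec = seats[seat_id]
    if rec["version"] != expected_version or rec["booked_by"] is not None:
        return False          # concurrent booking: roll back, show an error
    rec["version"] += 1
    rec["booked_by"] = user
    return True

# Both users read the seat over the high-latency network (step 1) ...
v_alice, _ = read_seat("12A")
v_bob, _ = read_seat("12A")
# ... then send their booking requests (step 2).
assert try_book("12A", v_alice, "alice") is True
assert try_book("12A", v_bob, "bob") is False   # Bob gets the error
```

Note that no locks are held across the user's think time, which is exactly why one of the two requests must fail at write time.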
6.1.1.4 Challenges
In summary, high-latency environments have a detrimental effect on transaction
abort rates in both pessimistic and optimistic concurrency control algorithms.
Providing the transaction logic in an application-independent and client-accessible
way would be preferable for modern web applications, but transaction APIs are traditionally designed for three-tier applications and do not support end users directly
executing transactions. However, this type of access simplifies the development of
data-driven web applications and is required for Backend-as-a-Service (BaaS)
architectures.
6.2 Entity Group Transactions
Approaches for distributed transactions can be distinguished by their scope and the
degree to which they exploit data locality. Megastore [Bak+11] made the concept of
entity groups popular that define a set of records that can be accessed in the same
transactional context. Megastore’s transaction protocol suffers from low throughput
per entity group as discussed in the previous chapter.
In G-Store [DAEA10], entity groups (termed key groups) are created dynamically by the system as opposed to statically through schema-based definitions as in
Megastore. Each group has a dedicated master that runs the transactions in order
to avoid cross-node coordination. Ownership of a group can be transferred to a
different master using a protocol similar to 2PC. G-Store assumes a stable mapping
of records to groups, as otherwise many migrations are required to run transactions.
The master uses optimistic concurrency to run transactions locally on a single group.
Microsoft’s Cloud SQL Server [Ber+11] is also based on entity groups, which are
defined through a partition key. Unlike the primary key, a partition key is not unique
and identifies a group of records that can be updated in single transactions. A similar
concept is employed in Cassandra, Twitter’s Manhattan, Amazon DynamoDB, and
Microsoft Azure Table Services [Cal+11, LM10] to enable local sorting or multi-record atomic updates. By introducing the partition key, the concurrency control
protocol of Microsoft SQL Server can remain unchanged and still serve multi-tenant
workloads, as long as the data per partition key does not exceed the limits of a single
database node.
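The partition-key idea can be sketched as follows (the helper names are hypothetical; a real system would additionally enforce the per-node size limit and run actual concurrency control inside each group):

```python
# Sketch: a partition key groups records; a multi-record update is applied
# in one local transaction only if all records share the same partition key.
from collections import defaultdict

store = defaultdict(dict)  # partition_key -> {primary_key: record}

def insert(partition_key, primary_key, record):
    store[partition_key][primary_key] = record

def update_group(partition_key, updates):
    """Apply several updates in one local transaction on a single group."""
    group = store[partition_key]
    if not all(pk in group for pk in updates):
        raise KeyError("all records must live in the same entity group")
    for pk, fields in updates.items():   # single node, hence atomic here
        group[pk].update(fields)

insert("tenant-1", "acct-1", {"balance": 100})
insert("tenant-1", "acct-2", {"balance": 50})
update_group("tenant-1", {"acct-1": {"balance": 80},
                          "acct-2": {"balance": 70}})
```

Because both accounts share the partition key, the transfer never leaves a single node's concurrency control scheme.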
ElasTras [DAEA13] is a DBaaS architecture that builds on entity groups and
optimistic concurrency per group managed by an owning transaction manager.
The central assumption is that either each tenant is so small that data fits into a
single partition or that larger databases can be split into independent entity groups.
ElasTras employs the mini-transactions concept by Aguilera et al. [Agu+07] to
support transactions across nodes for management operations like schema changes.
ElasTras supports elasticity through a live-migration protocol (Albatross [Das+11])
that iteratively copies entity groups to new nodes in a multi-step process. ElasTras’
largest practical downside is that it assumes completely static entity groups, which
is an unrealistic assumption and therefore prohibitive for many real-world applications
[Cor+12].
6.3 Multi-Shard Transactions
As reviewed in the previous chapter, Spanner [Cor+12], MDCC [Kra+13], CockroachDB [Coc], and F1 [Shu+13] implement transactions on top of eager geo-replication by trading correctness and fault tolerance against increased latency,
whereas Walter [Sov+11] relaxes isolation to increase the efficiency of geo-replication.
FaRMville is a multi-shard transaction approach proposed by Dragojevic et al. [Dra+15]. The design is based on DRAM and RDMA
(Remote Direct Memory Access) for very low latency. RAM is made persistent
through per-rack batteries for uninterrupted power supply. The transaction protocol
uses optimistic transactions over distributed shard servers. To this end, the write set
is locked by a coordinator executing the commit procedure. The versions of the read
set are then validated for freshness and changes are persisted to a transaction log
and each individual shard. Using the high-performance hardware setup, FaRMville
achieves 4.5 million TPC-C new order transactions per second.2
TAPIR (Transactional Application Protocol for Inconsistent Replication)
[Zha+15b] is based on the observation that replication and transaction protocols
typically do the same work twice when enforcing a strict temporal order. The
authors propose a consensus-based replication protocol that does not enforce
ordering unless explicitly necessary. TAPIR only uses a single consistent operation:
the prepare message of the 2PC protocol. All other operations are potentially
inconsistent. TAPIR achieves strict serializability using optimistic multi-version
timestamp ordering based on loosely synchronized clocks, where the validation
happens on read and write sets at commit time. The authors show that commit
latency can be reduced by 50% compared to consistent replication protocols.
TAPIR assigns transaction timestamps in clients, but assumes a low clock drift for
low abort rates. This makes the approach prohibitive for web-based use cases where
browsers and mobile devices can exhibit arbitrary clock drift [Aki15, Aki16].
Baqend [Ges19] bears some similarity to FaRMville, but follows a very different
design goal: while FaRMville optimizes intra-data center latency for transactions
executed from application servers, Baqend is designed for remote web clients executing the transactions to support the Backend-as-a-Service model. The motivating
idea is similar, though, as Baqend minimizes abort rates through caching and
FaRMville minimizes them by use of low-latency storage hardware within a data
center.
2 The achieved transaction throughput is above the highest-ranking TPC-C result at that time, but below the performance of the coordination-free approach by Bailis et al. [Bai+14a].
6.4 Client-Coordinated Transactions
Percolator [PD10], Omid [GÃ+14], Baqend [Ges19], and the Cherry Garcia library
[DFR15a] are approaches for extending NoSQL databases with ACID transactions
using client coordination. While Omid and Percolator only address BigTable-style
systems, Cherry Garcia and Baqend support heterogeneous data stores.
Google published the design of its real-time web crawler and search index Percolator [PD10]. Percolator is implemented as an external protocol on top of BigTable.
It uses several metadata columns to implement a locking protocol with snapshot
isolation guarantees. A client-coordinated 2PC enables multi-key transactions using
a timestamp service for transaction ordering. Percolator’s protocol is designed for
high write throughput instead of low latency reads in order to accommodate massive
incremental updates to the search index: latency is reported to be in the order of
minutes. The client coordination, multi-round-trip commits and writes, and the lack
of a deadlock detection protocol make it unsuitable for access across high-latency
WANs.
Omid [GÃ+14] provides snapshot isolation for transactions with a lock-free
middleware for multi-version concurrency control on top of a slightly modified
HBase. It relies on a central Transaction Status Oracle (SO) (similar to the earlier
ReTSO work [JRY11]) for assigning begin and commit timestamps to transactions
and to perform a snapshot isolation validation at commit time. Omid is designed
for application servers, where status information of the SO can be replicated into
the servers to avoid most of the round-trips. For a highly distributed web scenario,
however, relying on a single centralized SO limits scalability and incurs expensive
wide-area round-trips for distant application servers.
In his PhD thesis, Dey proposes the Cherry Garcia library [Dey15] for transactions across heterogeneous cloud data stores. The library requires the data store
to support strong consistency, multi-versioning, and compare-and-swap updates
(e.g., as in Microsoft Azure Storage [Cal+11]). Similar to Percolator [PD10] and
ReTSO [JRY11], the transaction protocol identifies read sets based on transaction
begin timestamps and write sets based on transaction commit timestamps, with the
metadata maintained in the respective data stores [DFR15a]. For the generation
of sequentially ordered transaction timestamps, Cherry Garcia either requires a
TrueTime-like API [Cor+12] with error bounds or a centralized timestamp oracle
[GÃ+14]. In the two-phase transaction commit of Cherry Garcia, the client checks
for any write-write and read-write conflicts and makes uncommitted data visible
to other transactions. Cherry Garcia is not well-suited for low-latency access, as a read
potentially requires multiple round-trips to determine the latest valid version,
thus increasing the probability of transaction aborts during
validation.
Neither Percolator, Omid, Cherry Garcia, nor Baqend modify the underlying
database system. However, the first three of these approaches assume that the
client coordinating the transaction is a server in a three-tier application. Unlike
Baqend, they are not suited for web and mobile clients participating in transactions,
since the latency overhead would be prohibitive for starting transactions, reading
and writing, as well as coordinating the commit. Baqend’s DCAT approach for
distributed transactions addresses this problem by caching reads, buffering writes,
and only contacting the server for commits. Also, DCAT does not burden the
primary database system with maintenance of transactional metadata, but instead
employs a fast transaction validation and commit using a coordination service.
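A minimal sketch of this commit pattern follows (the class and field names are invented, and DCAT's actual protocol additionally involves web caches and a coordination service): reads are served locally, writes are buffered, and only the commit ships observed read versions and buffered writes to the server.

```python
# Sketch of a client-coordinated optimistic commit: cached reads, buffered
# writes, and a single validation round-trip at commit time.
server = {"x": (1, "old-x"), "y": (3, "old-y")}   # key -> (version, value)

class ClientTransaction:
    def __init__(self):
        self.read_versions = {}   # key -> version observed during reads
        self.write_buffer = {}    # key -> new value, not yet shipped

    def read(self, key):
        version, value = server[key]      # stand-in for a cached read
        self.read_versions[key] = version
        return value

    def write(self, key, value):
        self.write_buffer[key] = value    # buffered, no round-trip

    def commit(self):
        # Single server round-trip: validate read versions, then apply.
        for key, version in self.read_versions.items():
            if server[key][0] != version:
                return False              # stale read -> abort
        for key, value in self.write_buffer.items():
            old_version = server.get(key, (0, None))[0]
            server[key] = (old_version + 1, value)
        return True

txn = ClientTransaction()
txn.read("x")
txn.write("y", "new-y")
assert txn.commit() is True
```

Only the commit call crosses the wide-area network, which is the key to keeping abort rates low for high-latency clients.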
RAMP (Read Atomic Multi-Partition) by Bailis et al. [Bai+14c] also realizes
client-coordinated transactions, but only offers a weak isolation level (read atomic)
in order to be always available, even under network partitions. A coordination-free
execution ensures that a transaction cannot be blocked by other transactions and
will commit, if the system partition of each accessed object can be reached [Bai15].
While they are highly scalable, minimize server communication, and are guaranteed
to commit, RAMP transactions do not prevent a number of anomalies whose
absence developers often take for granted (e.g., lost updates [Bai+14c, p. 9]).
6.5 Middleware-Coordinated Transactions
An alternative to embedding transaction processing in the database system or the
involved clients is to provide a transactional middleware that accepts transactions
from applications and executes them over non-transactional database systems.
CloudTPS [WPC12] is a transaction middleware for web applications. It supports
cross-shard transactions using a two-level architecture. In order to avoid a bottleneck
through a single coordinator, CloudTPS employs Local Transaction Managers
(LTMs) that manage mutually disjoint partitions of the underlying database. Isolation is implemented through timestamp ordering [WV02]. Each LTM executes a
sub-transaction of the global transaction and ensures that local commits are properly
ordered. A 2PC executed by a designated LTM over all other participating LTMs
ensures atomicity of the global commit. Transactions are executed non-interactively
in the middleware and have to be predefined at each LTM as a Java function. All
keys accessed in a transaction have to be declared at transaction begin, so that the
responsible LTMs are known in advance. As timestamp ordering is susceptible to
conflicts, transactions in CloudTPS have to be short-lived and only access a limited
set of keys (excluding range and predicate queries). Instead of persisting each write
to the underlying storage system, LTMs hold the data independently, distributed
through consistent hashing and replicated across multiple LTMs. Periodically, data
is persisted to the storage system.
Xie et al. [Xie+15] proposed a scheme to effectively combine pessimistic and
optimistic concurrency control algorithms. Their system Callas groups transactions
by performance characteristics and applies the most appropriate concurrency control
mechanism to each. This is enabled by a two-tiered protocol that applies locking
across groups and arbitrary schemes within a group of similar characteristics.
Deuteronomy [Lev+15] follows the idea of separating data storage (data component, DC) and transaction management (transaction component, TC) and relies on
heterogeneous database systems. The authors demonstrate that building on a high-performance key-value store, a throughput of over 6M operations per second can
be achieved on scale-up hardware with an appropriate TC. Scalability, however, is
limited to the threads of the underlying NUMA (Non-Uniform Memory Access)
machines. Therefore, Deuteronomy is not ideally suited for scale-out architectures.
Hekaton [Dia+13], the storage engine of Microsoft SQL Server [Gra97], is
another example for the wide-spread use of optimistic transactions in the industry.
The authors introduce a new multi-version, optimistic concurrency control scheme
for serializability that is optimized for OLTP workloads in main memory. Besides
the validation of the read set, Hekaton also validates commit dependencies introduced by concurrent operations during the validation phase. While this optimization
increases concurrency and hence throughput, it also introduces cascading aborts.
6.6 Deterministic Transactions
H-Store [Kal+08] and its commercial successor VoltDB [SW13] are horizontally
scalable main-memory RDBMSs. Sometimes, this new class of scale-out relational
databases is referred to as NewSQL [Gro+13]. Other examples of the NewSQL
movement are Clustrix [Clu], a MySQL-compatible, scalable RDBMS, and NuoDB
[Nuo], an RDBMS built on top of a distributed key-value store.
VoltDB is based on eager master-slave replication and shards data via
application-defined columns (similar to MongoDB). Transactions are defined at
deployment time as stored procedures written in Java or SQL. Each shard has a
Single Partition Initiator (SPI) that works off a transaction queue for that partition
in serial order. As data is held in memory, this lack of concurrency is considered
an optimization to avoid locking overhead [Har+08]. Single-shard transactions
are directly forwarded to SPIs and do not require additional concurrency control
as the execution is serial. Read-only transactions can directly read from any
replica without concurrency control (called one-shot). Multi-shard transactions
are sequenced through a Multi Partition Initiator (MPI) that creates a consensus
among SPIs for an interleaved transaction ordering. During execution, cross-shard communication is required to distribute intermediate results. Written data
is atomically committed through 2PC. VoltDB scales well for workloads with many
single-shard transactions. For multi-shard transactions serialized through the MPI,
however, the consensus overhead causes throughput to decrease with increasing
cluster size.
Calvin [Tho+12] is a transaction and replication service for enhancing available
database systems with ACID transactions. Transactions in Calvin have to be
run fully server-side (written in C++ or Python) and must not introduce non-determinism, similar to H-Store and VoltDB [Kal+08, SW13]. This permits Calvin
to schedule the order of transactions before their execution. Client-submitted
transactions are appended to a shared replicated log that is similar to the Tango
approach [Bal+13]. To achieve acceptable performance despite this centralized
component, requests are batched, persisted to a storage backend (e.g., Cassandra),
and the batch identifiers are replicated via Paxos. The scheduler relies on the
log order to create a deadlock-free, deterministic ordering of transactions using
two-phase locking. As each transaction’s read and write sets have to be declared
in advance, allocation of locks can be performed before the transaction begin
(preclaiming [WV02]). Transactions execute locally on each shard by exchanging
the read sets with other shards and only writing local records. While Calvin achieves
high throughput in TPC-C benchmarks, its model is strictly limited to deterministic,
non-interactive transactions on pre-defined read and write sets, which eliminates
most forms of queries. Furthermore, there is an inherent trade-off between commit
latency and throughput introduced by the batching interval of the shared log.
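The deadlock-free property of log-ordered lock scheduling can be illustrated with a small sketch (invented data and helper names; Calvin's actual scheduler is considerably more involved): because every shard grants locks strictly in log order over the declared read and write sets, a runnable transaction always exists.

```python
# Sketch of deterministic lock scheduling in the spirit of Calvin: lock
# requests are granted strictly in log order over preclaimed read/write
# sets, which makes the execution order deterministic and deadlock-free.
from collections import deque

log = [  # (txn id, declared read set, declared write set), in log order
    ("T1", {"a"}, {"b"}),
    ("T2", {"b"}, {"c"}),
    ("T3", {"c"}, {"a"}),
]

queues = {}  # object -> FIFO queue of transactions waiting for its lock
for txn, reads, writes in log:
    for obj in sorted(reads | writes):
        queues.setdefault(obj, deque()).append(txn)

def runnable(done):
    """Transactions that hold the head position of every queue they need."""
    return [txn for txn, reads, writes in log
            if txn not in done
            and all(queues[obj][0] == txn for obj in reads | writes)]

order = []
done = set()
while len(done) < len(log):
    txn = runnable(done)[0]        # always non-empty: no deadlock can form
    order.append(txn)
    done.add(txn)
    for obj, q in queues.items():  # release the finished txn's locks
        if q and q[0] == txn:
            q.popleft()

assert order == ["T1", "T2", "T3"]
```

The same three transactions scheduled without a global order could deadlock (T1 waits for T2, T2 for T3, T3 for T1); the shared log breaks the cycle by construction.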
6.7 Summary: Consistency vs. Latency in Distributed Applications
In this chapter, we discussed different seminal systems for distributed transaction
processing, some of which are transactional database systems and some of which
enable transactional guarantees on top of non-transactional database systems.
Table 6.1 summarizes the pivotal properties of the systems discussed in this chapter.
Many distributed data management systems employ optimistic concurrency
control to minimize abort rates (e.g. Megastore, G-Store, and MDCC), while some
use pessimistic protocols in favor of write-heavy workloads (e.g. Calvin) or a combination of both for flexibility (F1). A few systems contrastingly rely on deterministic
transactions (e.g. H-Store/VoltDB and Calvin) or custom concurrency protocols
(e.g. RAMP) to increase scalability and throughput at the expense of reduced
flexibility or consistency guarantees. Entity group transactions are lightweight and
build on well-known single-node concurrency control schemes, but they also limit
both scalability and the possible scope of transactions. A number of systems
therefore implement distributed multi-shard transactions which may also rely on
local commit procedures, but often employ client-coordinated transactions (Cherry
Garcia, RAMP) or variants of two-phase commit (e.g. Spanner, F1, Percolator,
MDCC, TAPIR, CloudTPS, Walter).
Table 6.1 Related transactional systems and their concurrency control protocols (OCC: optimistic
concurrency control, PCC: pessimistic concurrency control, TO: timestamp ordering, MVCC:
multi-version concurrency control), achieved isolation level (SR: serializability, SI: snapshot
isolation, RC: read committed), transaction granularity, and commit protocol

System                            Concurrency control  Isolation    Granularity   Commit protocol
Megastore [Bak+11]                OCC                  SR           Entity group  Local
G-Store [DAEA10]                  OCC                  SR           Entity group  Local
ElasTras [DAEA13]                 OCC                  SR           Entity group  Local
Cloud SQL Server [Ber+11]         PCC                  SR           Entity group  Local
Spanner [Cor+12]                  PCC                  SR/SI        Multi-shard   2PC
F1 [Shu+13]                       PCC or OCC           SR/SI        Multi-shard   2PC
Percolator [PD10]                 OCC                  SI           Multi-shard   2PC
MDCC [Kra+13]                     OCC                  RC           Multi-shard   2PC-like
TAPIR [Zha+15b]                   TO                   SR           Multi-shard   2PC-like
CloudTPS [WPC12]                  TO                   SR           Multi-shard   2PC
Cherry Garcia [DFR15a]            OCC                  SI           Multi-shard   Client-coord.
Omid [GÃ+14]                      MVCC                 SI           Multi-shard   Local
FaRMville [Dra+15]                OCC                  SR           Multi-shard   Local
RAMP [Bai+14c]                    Custom               Read-atomic  Multi-shard   Client-coord.
Walter [Sov+11]                   PCC                  Parallel SI  Multi-shard   2PC
H-Store/VoltDB [Kal+08]           Deterministic CC     SR           Multi-shard   Local
Calvin [Tho+12]                   Deterministic CC     SR           Multi-shard   Local
Orestes/Baqend with DCAT [Ges19]  OCC                  SR           Multi-shard   Custom
References
[ADE12] Divyakant Agrawal, Sudipto Das, and Amr El Abbadi. Data Management
in the Cloud: Challenges and Opportunities. Synthesis Lectures on Data
Management. Morgan & Claypool Publishers, 2012. https://doi.org/10.2200/
S00456ED1V01Y201211DTM032.
[Ady99] Atul Adya. “Weak consistency: a generalized theory and optimistic implementations
for distributed transactions”. PhD thesis. Massachusetts Institute of Technology, 1999.
URL : http://www.csd.uoc.gr/~hy460/pdf/adya99weak.pdf (visited on 01/03/2015).
[Agu+07] Marcos K. Aguilera et al. “Sinfonia: a new paradigm for building scalable distributed
systems”. In: ACM SIGOPS Operating Systems Review. Vol. 41. ACM, 2007, pp. 159–
174. URL: http://dl.acm.org/citation.cfm?id=1294278 (visited on 01/03/2015).
[Aki15] Tyler Akidau. “The world beyond batch: Streaming 101”. In: O’Reilly Media
(Aug. 2015). Accessed on 08/21/2017. URL: https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101.
[Aki16] Tyler Akidau. “The world beyond batch: Streaming 102”. In: O’Reilly Media
(Jan. 2016). Accessed on 08/21/2017. URL: https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102.
[ALO00] Atul Adya, Barbara Liskov, and Patrick E. O’Neil. “Generalized Isolation Level Definitions”. In: Proceedings of the 16th International Conference on Data Engineering,
San Diego, California, USA, February 28 - March 3, 2000. Ed. by David B. Lomet
and Gerhard Weikum. IEEE Computer Society, 2000, pp. 67–78. https://doi.org/10.
1109/ICDE.2000.839388.
[Bai+13c] Peter Bailis et al. “Highly Available Transactions: Virtues and Limitations”. In:
Proceedings of the VLDB Endowment 7.3 (2013).
[Bai+14a] Peter Bailis et al. “Coordination avoidance in database systems”. In: Proceedings of
the VLDB Endowment 8.3 (2014), pp. 185–196. URL: http://www.vldb.org/pvldb/vol8/
p185-bailis.pdf (visited on 01/03/2015).
[Bai+14c] Peter Bailis et al. “Scalable Atomic Visibility with RAMP Transactions”. In:
ACM SIGMOD Conference. 2014. URL: https://amplab.cs.berkeley.edu/wp-content/
uploads/2014/04/ramp-sigmod2014.pdf (visited on 09/28/2014).
[Bai15] Peter Bailis. “Coordination Avoidance in Distributed Databases”. PhD thesis. University of California, Berkeley, USA, 2015. URL: http://www.escholarship.org/uc/item/
8k8359g2.
[Bak+11] J. Baker et al. “Megastore: Providing scalable, highly available storage for interactive
services”. In: Proc. of CIDR. Vol. 11. 2011, pp. 223–234.
[Bal+13] Mahesh Balakrishnan et al. “Tango: distributed data structures over a shared log”.
In: ACM Press, 2013, pp. 325–340. ISBN: 978-1-4503-2388-8. https://doi.org/10.
1145/2517349.2522732. URL: http://dl.acm.org/citation.cfm?doid=2517349.2522732
(visited on 01/03/2015).
[Ber+11] Philip A. Bernstein et al. “Adapting Microsoft SQL server for cloud computing”. In: Data Engineering (ICDE), 2011 IEEE 27th International Conference on.
IEEE, 2011, pp. 1255–1263. URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?
arnumber=5767935 (visited on 05/05/2014).
[Ber99] Philip A. Bernstein. “Review - A Majority Consensus Approach to Concurrency
Control for Multiple Copy Databases”. In: ACM SIGMOD Digital Review 1 (1999).
URL: http://dblp.uni-trier.de/db/journals/dr/Bernstein99.html.
[BJ87] Ken Birman and Thomas Joseph. Exploiting virtual synchrony in distributed systems.
Vol. 21. 5. ACM, 1987. URL: http://dl.acm.org/citation.cfm?id=37515 (visited on
01/03/2015).
[BN09] Philip A. Bernstein and Eric Newcomer. Principles of Transaction Processing.
Morgan Kaufmann, 2009. ISBN: 1-55860-415-4.
[Cal+11] Brad Calder et al. “Windows Azure Storage: a highly available cloud storage service
with strong consistency”. In: Proceedings of the Twenty-Third ACM Symposium on
Operating Systems Principles. ACM, 2011, pp. 143–157. URL: http://dl.acm.
org/citation.cfm?id=2043571 (visited on 04/16/2014).
[Cas81] Marco A. Casanova. The Concurrency Control Problem for Database Systems. Vol.
116. Lecture Notes in Computer Science. Springer, 1981. ISBN: 3-540-10845-9.
https://doi.org/10.1007/3-540-10845-9.
[Clu] Clustrix: A New Approach to Scale-Out RDBMS. http://www.clustrix.com/wp-content/uploads/2017/01/Whitepaper-ANewApproachtoScaleOutRDBMS.pdf.
(Accessed on 05/20/2017). 2017. URL: http://www.clustrix.com/wp-content/
uploads/2017/01/Whitepaper-ANewApproachtoScaleOutRDBMS.pdf (visited on
02/18/2017).
[Coc] CockroachDB - the scalable, survivable, strongly-consistent SQL database. https://
github.com/cockroachdb/cockroach. (Accessed on 05/20/2017). 2017. URL: https://
github.com/cockroachdb/cockroach (visited on 02/17/2017).
[Cor+12] James C. Corbett et al. “Spanner: Google’s Globally-Distributed Database”. In: 10th
USENIX Symposium on Operating Systems Design and Implementation, OSDI 2012,
Hollywood, CA, USA, October 8–10, 2012. Ed. by Chandu Thekkath and Amin
Vahdat. USENIX Association, 2012, pp. 261–264. URL: https://www.usenix.org/
conference/osdi12/technical-sessions/presentation/corbett.
[Cor+13] James C. Corbett et al. “Spanner: Google’s Globally Distributed Database”. In: ACM
Trans. Comput. Syst. 31.3 (2013), 8:1–8:22. https://doi.org/10.1145/2491245.
[CRF08] Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. “Serializable Isolation for
Snapshot Databases”. In: Proceedings of the 2008 ACM SIGMOD International
Conference on Management of Data. SIGMOD ’08. Vancouver, Canada: ACM, 2008,
pp. 729–738. ISBN: 978-1-60558-102-6. https://doi.org/10.1145/1376616.1376690.
URL : http://doi.acm.org/10.1145/1376616.1376690.
[DAEA10] Sudipto Das, Divyakant Agrawal, and Amr El Abbadi. “G-store: a scalable data store
for transactional multi key access in the cloud”. In: Proceedings of the 1st ACM
symposium on Cloud computing. ACM, 2010, pp. 163–174.
[DAEA13] Sudipto Das, Divyakant Agrawal, and Amr El Abbadi. “ElasTraS: An elastic, scalable,
and self-managing transactional database for the cloud”. en. In: ACM Transactions on
Database Systems 38.1 (Apr. 2013), pp. 1–45. ISSN: 03625915. https://doi.org/10.
1145/2445583.2445588. URL: http://dl.acm.org/citation.cfm?doid=2445583.2445588
(visited on 11/25/2016).
[Das+11] Sudipto Das et al. “Albatross: lightweight elasticity in shared storage databases
for the cloud using live data migration”. In: Proceedings of the VLDB Endowment
4.8 (2011), pp. 494–505. URL: http://dl.acm.org/citation.cfm?id=2002977 (visited on
07/16/2014).
[Dey15] Akon Samir Dey. “Cherry Garcia: Transactions across Heterogeneous Data Stores”. 2015.
[DFR15a] A. Dey, A. Fekete, and U. Röhm. “Scalable distributed transactions across heterogeneous stores”. In: 2015 IEEE 31st International Conference on Data Engineering.
2015, pp. 125–136. https://doi.org/10.1109/ICDE.2015.7113278.
[Dia+13] Cristian Diaconu et al. “Hekaton: SQL server’s memory-optimized OLTP engine”.
In: Proceedings of the 2013 international conference on Management of data. ACM,
2013, pp. 1243–1254. URL: http://dl.acm.org/citation.cfm?id=2463710 (visited on
01/03/2015).
[Dra+15] Aleksandar Dragojević et al. “No compromises: distributed transactions with consistency, availability, and performance”. en. In: Proceedings of the 25th Symposium
on Operating Systems Principles. ACM Press, 2015, pp. 54–70. ISBN:
978-1-4503-3834-9. https://doi.org/10.1145/2815400.2815425. URL: http://dl.acm.
org/citation.cfm?doid=2815400.2815425 (visited on 11/25/2016).
[Esw+76] Kapali P. Eswaran et al. “The Notions of Consistency and Predicate Locks in a
Database System”. In: Commun. ACM 19.11 (1976), pp. 624–633. https://doi.org/10.
1145/360363.360369.
[EWS13] Robert Escriva, Bernard Wong, and Emin Gün Sirer. “Warp: Multi-key
transactions for key-value stores”. In: United Networks, LLC, Tech. Rep 5
(2013). URL: http://dl.frz.ir/FREE/papers-we-love/distributed_systems/warp-multikey-transactions-for-key-value-stores.pdf (visited on 01/03/2015).
[Fek+05] Alan Fekete et al. “Making snapshot isolation serializable”. In: ACM Transactions on
Database Systems (TODS) 30.2 (2005), pp. 492–528. URL: http://dl.acm.org/citation.
cfm?id=1071615 (visited on 01/03/2015).
[FRT92] Peter A. Franaszek, John T. Robinson, and Alexander Thomasian. “Concurrency
Control for High Contention Environments”. In: ACM Trans. Database Syst. 17.2
(1992), pp. 304–345. https://doi.org/10.1145/128903.128906.
[Ges19] Felix Gessert. “Low Latency for Cloud Data Management”. PhD thesis. University
of Hamburg, Germany, 2019. URL: http://ediss.sub.uni-hamburg.de/volltexte/2019/
9541/.
[GL06] J. Gray and L. Lamport. “Consensus on transaction commit”. In: ACM Transactions
on Database Systems (TODS) 31.1 (2006), pp. 133–160. URL: http://dl.acm.org/
citation.cfm?id=1132867 (visited on 11/28/2016).
[Gra+76] Jim Gray et al. “Granularity of Locks and Degrees of Consistency in a Shared Data
Base”. In: Modelling in Data Base Management Systems, Proceeding of the IFIP
Working Conference on Modelling in Data Base Management Systems, Freudenstadt,
Germany, January 5–8, 1976. Ed. by G. M. Nijssen. North-Holland, 1976, pp. 365–
394.
[Gra+81] Jim Gray et al. “A Straw Man Analysis of the Probability of Waiting and Deadlock in
a Database System”. In: Berkeley Workshop. 1981, p. 125.
[Gra97] Jim Gray. “Microsoft SQL Server”. In: 1997.
[Gro+13] Katarina Grolinger et al. “Data management in cloud environments: NoSQL and
NewSQL data stores”. en. In: Journal of Cloud Computing: Advances, Systems
and Applications 2.1 (2013), p. 22. ISSN: 2192-113X. https://doi.org/10.1186/2192113X-2-22. URL: http://www.journalofcloudcomputing.com/content/2/1/22 (visited
on 01/03/2015).
[Góm+14] Daniel Gómez Ferro et al. “Omid: Lock-free Transactional Support for Distributed
Data Stores”. In: ICDE. 2014.
[Har+08] S. Harizopoulos et al. “OLTP through the looking glass, and what we found there”.
In: Proceedings of the 2008 ACM SIGMOD international conference on Management
of data. 2008, pp. 981–992. URL: http://dl.acm.org/citation.cfm?id=1376713 (visited
on 07/05/2012).
[HR83] Theo Haerder and Andreas Reuter. “Principles of transaction-oriented database
recovery”. In: ACM Comput. Surv. 15.4 (Dec. 1983), pp. 287–317.
[Här84] Theo Härder. “Observations on optimistic concurrency control schemes”. In: Inf. Syst.
9.2 (1984), pp. 111–120. https://doi.org/10.1016/0306-4379(84)90020-6.
[JRY11] Flavio Junqueira, Benjamin Reed, and Maysam Yabandeh. “Lock-free transactional
support for large-scale storage systems”. In: IEEE/IFIP International Conference on
Dependable Systems and Networks Workshops (DSN-W 2011), Hong Kong, China,
June 27–30, 2011. IEEE, 2011, pp. 176–181. https://doi.org/10.1109/DSNW.2011.
5958809.
[Kal+08] R. Kallman et al. “H-store: a high-performance, distributed main memory transaction
processing system”. In: Proceedings of the VLDB Endowment 1.2 (2008), pp. 1496–
1499.
[KR81] H. T. Kung and J. T. Robinson. “On optimistic methods for concurrency control”.
In: ACM Transactions on Database Systems (TODS) 6.2 (1981), pp. 213–226. URL:
http://dl.acm.org/citation.cfm?id=319567 (visited on 11/19/2012).
[Kra+13] Tim Kraska et al. “MDCC: Multi-data center consistency”. In: EuroSys. ACM,
2013, pp. 113–126. URL: http://dl.acm.org/citation.cfm?id=2465363 (visited on
04/15/2014).
[Lam98] Leslie Lamport. “The part-time parliament”. In: ACM Transactions on Computer
Systems (TOCS) 16.2 (1998), pp. 133–169.
[Lec09] Jens Lechtenbörger. “Two-Phase Commit Protocol”. English. In: Encyclopedia of
Database Systems. Ed. by Ling Liu and M. Tamer Özsu. Springer US, 2009, pp.
3209–3213. ISBN: 978-0-387-35544-3. https://doi.org/10.1007/978-0-387-39940-9_
2. URL: http://dx.doi.org/10.1007/978-0-387-39940-9_2.
[Lev+15] Justin J. Levandoski et al. “High Performance Transactions in Deuteronomy”. In:
CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research,
Asilomar, CA, USA, January 4–7, 2015, Online Proceedings. www.cidrdb.org, 2015.
URL : http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper15.pdf.
[Lis+99] Barbara Liskov et al. “Providing Persistent Objects in Distributed Systems”. In:
ECOOP’99 - Object-Oriented Programming, 13th European Conference, Lisbon,
Portugal, June 14–18, 1999, Proceedings. Ed. by Rachid Guerraoui. Vol. 1628.
Lecture Notes in Computer Science. Springer, 1999, pp. 230–257. https://doi.org/10.
1007/3-540-48743-3_11.
[LM10] Avinash Lakshman and Prashant Malik. “Cassandra: a decentralized structured
storage system”. In: ACM SIGOPS Operating Systems Review 44.2 (2010), pp. 35–
40. URL: http://dl.acm.org/citation.cfm?id=1773922 (visited on 04/15/2014).
[Nuo] NuoDB: Emergent Architecture. http://go.nuodb.com/rs/nuodb/images/Greenbook_
Final.pdf. (Accessed on 04/30/2017). 2017. URL: http://go.nuodb.com/rs/nuodb/
images/Greenbook_Final.pdf (visited on 02/18/2017).
[OL88] Brian M. Oki and Barbara Liskov. “Viewstamped Replication: A General Primary
Copy”. In: Proceedings of the Seventh Annual ACM Symposium on Principles of
Distributed Computing, Toronto, Ontario, Canada, August 15–17, 1988. Ed. by Danny
Dolev. ACM, 1988, pp. 8–17. https://doi.org/10.1145/62546.62549.
[PD10] Daniel Peng and Frank Dabek. “Large-scale Incremental Processing Using Distributed Transactions and Notifications.” In: OSDI. Vol. 10. 2010, pp. 1–15.
URL : https://www.usenix.org/legacy/events/osdi10/tech/full_papers/Peng.pdf?origin=
publication_detail (visited on 01/03/2015).
[PG12] Dan RK Ports and Kevin Grittner. “Serializable snapshot isolation in PostgreSQL”.
In: Proceedings of the VLDB Endowment 5.12 (2012), pp. 1850–1861. URL: http://dl.
acm.org/citation.cfm?id=2367523 (visited on 01/03/2015).
[SF12] Pramod J. Sadalage and Martin Fowler. NoSQL distilled: a brief guide to the emerging
world of polyglot persistence. Pearson Education, 2012.
[Shu+13] Jeff Shute et al. “F1: A distributed SQL database that scales”. In: Proceedings of the
VLDB Endowment 6.11 (2013), pp. 1068–1079.
[Sov+11] Yair Sovran et al. “Transactional storage for geo-replicated systems”. In: Proceedings
of the Twenty-Third ACM Symposium on Operating Systems Principles. ACM, 2011,
pp. 385–400.
[SS83] Dale Skeen and Michael Stonebraker. “A Formal Model of Crash Recovery in a
Distributed System”. In: IEEE Trans. Software Eng. 9.3 (1983), pp. 219–228. https://
doi.org/10.1109/TSE.1983.236608.
[SSS15] S. Sippu and E. Soisalon-Soininen. Transaction Processing: Management of the
Logical Database and its Underlying Physical Structure. Data-Centric Systems and
Applications. Springer International Publishing, 2015. ISBN: 9783319122922. URL:
https://books.google.de/books?id=TN1sBgAAQBAJ.
[Sto+07] M. Stonebraker et al. “The end of an architectural era:(it’s time for a complete
rewrite)”. In: Proceedings of the 33rd international conference on Very large data
bases. 2007, pp. 1150–1160. URL: http://dl.acm.org/citation.cfm?id=1325981 (visited
on 07/05/2012).
[SW13] Michael Stonebraker and Ariel Weisberg. “The VoltDB Main Memory DBMS”. In:
IEEE Data Eng. Bull. 36.2 (2013), pp. 21–27. URL: http://sites.computer.org/debull/
A13june/VoltDB1.pdf.
[Tho+12] Alexander Thomson et al. “Calvin: fast distributed transactions for partitioned
database systems”. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, 2012, pp. 1–12.
[Tho98] A. Thomasian. “Concurrency control: methods, performance, and analysis”. In: ACM
Computing Surveys (CSUR) 30.1 (1998), pp. 70–119. URL: http://dl.acm.org/
citation.cfm?id=274443 (visited on 10/18/2012).
[WPC12] Zhou Wei, Guillaume Pierre, and Chi-Hung Chi. “CloudTPS: Scalable transactions
for Web applications in the cloud”. In: Services Computing, IEEE Transactions on 5.4
(2012), pp. 525–539.
[WV02] G. Weikum and G. Vossen. Transactional information systems. Series in Data
Management Systems. Morgan Kaufmann Pub, 2002. ISBN: 9781558605084. URL:
http://books.google.de/books?hl=de&lr=&id=wV5Ran71zNoC&oi=fnd&pg=PP2&
dq=transactional+information+systems&ots=PgJAaN7R5X&sig=Iya4r9DiFhmb_
wWgOI5QMuxm6zU (visited on 06/28/2012).
[Xie+15] Chao Xie et al. “High-performance ACID via modular concurrency control”. In:
Proceedings of the 25th Symposium on Operating Systems Principles, SOSP 2015,
Monterey, CA, USA, October 4–7, 2015. Ed. by Ethan L. Miller and Steven Hand.
ACM, 2015, pp. 279–294. https://doi.org/10.1145/2815400.2815430.
[Zha+15b] Irene Zhang et al. “Building consistent transactions with inconsistent replication”. In:
Proceedings of the 25th Symposium on Operating Systems Principles, SOSP 2015,
Monterey, CA, USA, October 4–7, 2015. Ed. by Ethan L. Miller and Steven Hand.
ACM, 2015, pp. 263–278. https://doi.org/10.1145/2815400.2815404.
[ÖV11] M.T. Özsu and P. Valduriez. Principles of distributed database systems. Springer,
2011.
Chapter 7
Polyglot Persistence in Data Management
As applications become more data-driven and highly distributed, providing low
response times to increasingly many users becomes more challenging within the
scope of a single database system. Not only is the variety of use cases increasing, but the requirements are also becoming more heterogeneous: horizontal scalability, schema flexibility, and high availability are primary concerns for modern applications. While RDBMSs cover many of the functional requirements (e.g., ACID
transactions and expressive queries), they cannot cover scalability, performance,
and fault tolerance in the same way that specialized data stores can. The explosive
growth of available systems through the Big Data and NoSQL movement sparked
the idea of employing particularly well-suited database systems for subproblems of
the overall application.
The architectural style polyglot persistence describes the usage of specialized
data stores for different requirements. The term was popularized by Fowler in 2011
and builds on the idea of polyglot programming [SF12]. The core idea is that
abandoning a “one size fits all” architecture can improve development productivity and time-to-market as well as performance. Polyglot persistence applies to single
applications as well as complete organizations.
Figure 7.1 shows an example of a polyglot persistence architecture for an e-commerce application, as often found in real-world applications [Kle17]. Data
is distributed to different database systems according to their associated requirements. For example, financial transactions are processed through a relational
database, to guarantee correctness. As product descriptions form a semi-structured
aggregate, they are well-suited for storage in a distributed document store that
can guarantee scalability of data volume and reads. The log-structured storage
management in wide-column stores is optimal for maintaining high write throughput
for application-generated event streams. Additionally, they provide interfaces to
apply complex data analysis through Big Data platforms such as Hadoop and Spark
[Whi15, Zah+10].

Fig. 7.1 Example of a polyglot persistence architecture with database systems for different requirements and types of data in an e-commerce scenario

The example illustrates that in polyglot persistence architectures, there is an inherent trade-off between the increased complexity of maintenance and development and the improved, problem-tailored storage of application data.
In a nutshell, polyglot persistence adopts the idea of applying the best persistence
technology for a given problem. In the following, we will present an overview of
different strategies for implementing polyglot persistence and the challenges they
entail.
7.1 Functional and Non-functional Requirements
The requirement for a fast time-to-market is supported by avoiding the impedance
mismatch [Mai90, Amb12] between the application’s data structures and the
persistent data model. For example, if a web application using a JSON-based REST
API can store native JSON documents in a document store, the development process
is considerably simplified compared to systems where the application’s data model
has to be mapped to a database system’s data model.
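As a minimal sketch of this difference (the product fields and identifiers are hypothetical), the JSON document received via the REST API can be persisted as-is in a document store, whereas a relational model forces the same aggregate to be flattened into several tables and re-joined on every read:

```python
import json

# Aggregate as handled by a JSON-based REST API; a document store
# persists it as-is, one document per product -- no mapping layer needed.
product = {
    "id": "p42",
    "name": "Laptop",
    "variants": [
        {"sku": "p42-s", "color": "silver"},
        {"sku": "p42-b", "color": "black"},
    ],
}
document = json.dumps(product)  # ready for insertion into a document store

# The relational mapping splits the same aggregate across two tables,
# which must be re-joined on every read (the impedance mismatch).
products_row = ("p42", "Laptop")
variant_rows = [
    ("p42-s", "p42", "silver"),
    ("p42-b", "p42", "black"),
]

# Round-trip check: the document preserves the nested structure.
assert json.loads(document)["variants"][1]["color"] == "black"
```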
Performance can be maximized if the persistence requirements allow for an
efficient partitioning and replication of data combined with suitable index structures
and storage management. If the application can tolerate relaxed guarantees for
consistency or transactional isolation, database systems can leverage this to optimize
throughput and latency.
Typical functional persistence requirements are:
• ACID transactions with different isolation levels
• Atomic, conditional, or set-oriented updates
• Query types: point lookups, scans, aggregations, selections, projections, joins,
subqueries, Map-Reduce, graph queries, batch analyses, searches, real-time
queries, dataflow graphs
• Partial or commutative update operations
• Data structures: graphs, lists, sets, maps, trees, documents, etc.
• Structured, semi-structured, or implicit schemas
• Semantic integrity constraints
Among the non-functional requirements are:
• Throughput for reads, writes, and queries
• Read and write latency
• High availability
• Scalability of data volume, reads, writes, and queries
• Consistency guarantees
• Durability
• Elastic scale-out and scale-in
The central challenge in polyglot persistence is determining whether a given
database system satisfies a set of application-provided requirements and access
patterns. While some performance metrics can be quantified with benchmarks such
as YCSB, TPC, and others [Dey+14, Coo+10, Pat+11, BZS13, Ber+14,
PF00, Wad+11, Fio+13, BT14, Ber15, Ber14], many non-functional requirements
such as consistency and scalability are currently not covered by benchmarks, or the actual behavior even diverges from the documented one [Win+15].
In a polyglot persistence architecture, the boundaries of the individual databases form the boundaries of transactions, queries, and update operations. Thus, if data is persisted
and modified in different databases, this entails consistency challenges. The application therefore has to explicitly control the synchronization of data across systems,
e.g., through ETL batch jobs, or has to maintain consistency at the application level,
e.g., through commutative data structures. Alternatively, data can be distributed in
disjoint partitions, which shifts the problem to cross-database queries, a well-studied
topic in data integration [Len02]. In contrast to data integration problems, however,
there is no autonomy of data sources. Instead, the application explicitly combines
and modifies the databases for polyglot persistence [SF12].
Fig. 7.2 Polyglot persistence requirements for a product catalog in an e-commerce application
7.1.1 Implementation of Polyglot Persistence
To manage the increased complexity introduced by polyglot persistence, different
architectures can be applied. We group them into the three architectural patterns
application-coordinated polyglot persistence, microservices, and polyglot database
services. As an example, consider the product catalog of the introductory e-commerce example (see Fig. 7.2). The product catalog should be able to answer simple filter queries (e.g., searching by keyword) as well as returning the top-k products according to access statistics. The functional requirement therefore is
that the access statistics have to support a high write throughput (incrementing on
each view) and top-k queries (1). The product catalog has to offer filter queries and
scalability of data volume (2). These requirements can, for example, be fulfilled
with the key-value store Redis and the document store MongoDB. With its sorted
set data structure, Redis supports a mapping from counters to primary keys of
products. Incrementing and performing top-k queries are efficiently supported
with logarithmic time complexity in memory. MongoDB supports storing product
information in nested documents and allows queries on the attributes of these
documents. Using hash partitioning, the documents can efficiently be distributed
over many nodes in a cluster to achieve scalability.
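The counter pattern described above can be sketched as follows. The in-memory class stands in for a Redis sorted set (product IDs as members, view counts as scores); the equivalent Redis commands are noted in the comments, and the product IDs are hypothetical.

```python
class ViewCounter:
    """In-memory stand-in for a Redis sorted set mapping product IDs to view counts."""

    def __init__(self):
        self.scores = {}  # member -> score, like a Redis sorted set

    def increment(self, product_id, by=1):
        # Redis equivalent: ZINCRBY product_views 1 <product_id>
        self.scores[product_id] = self.scores.get(product_id, 0) + by

    def top_k(self, k):
        # Redis equivalent: ZREVRANGE product_views 0 k-1 WITHSCORES
        return sorted(self.scores.items(), key=lambda kv: -kv[1])[:k]

counter = ViewCounter()
for pid in ["p1", "p2", "p1", "p3", "p1", "p2"]:  # simulated product views
    counter.increment(pid)
print(counter.top_k(2))  # [('p1', 3), ('p2', 2)]
```

In Redis itself, both operations run in logarithmic time on the in-memory sorted set, matching the throughput requirement stated above.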
With application-coordinated polyglot persistence (see Fig. 7.3), the application server’s data tier programmatically coordinates polyglot persistence. Typically,
the mapping of data to databases follows the application’s modularization. This
pattern simplifies development, as each module is specialized for the use of one
particular data store. Also, design decisions in data modeling as well as access
patterns are encapsulated in a single module (loose coupling). The separation can
also be relaxed: for the product catalog, it would not only be possible to model a
counter and separate product data. Instead, a product could also be modeled as an
entity containing a counter. The dependency between databases has to be considered
both at development time and during operation. For example, if the format of the primary key changes, the new key structure has to be implemented for both systems in
the code and in the database. Object-NoSQL mappers simplify the implementation
of application-coordinated polyglot persistence. However, the functional scope of these mappers is currently very limited [Tor+17, Stö+15, Wol+13].

Fig. 7.3 Architectural patterns for the implementation of polyglot persistence: application-coordinated polyglot persistence, microservices, and polyglot database services
A practical example of application-coordinated polyglot persistence is Twitter’s
storage of user feeds [Kri13]. For fast read access, the newest tweets for each user
are materialized in a Redis cluster. Upon publishing of a new tweet, the social graph
is queried from a graph store and distributed among the Redis-based feeds for each
relevant user (Write Fanout). As a persistent fallback for Redis, MySQL servers are
managed and partitioned by the application tier.
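The write fanout can be sketched as follows: a simplified in-memory model in which plain dicts stand in for the graph store and the Redis feed cluster; the user names, tweet IDs, and feed cap are illustrative.

```python
# Simplified sketch of write fanout: the social graph is queried once per
# tweet, and the tweet is pushed into each follower's materialized feed.
FEED_LIMIT = 800  # illustrative cap on the length of a materialized feed

social_graph = {"alice": ["bob", "carol"]}  # stand-in for the graph store
feeds = {"bob": [], "carol": []}            # stand-in for Redis-based feeds

def publish(author, tweet_id):
    for follower in social_graph.get(author, []):
        feed = feeds[follower]
        feed.insert(0, tweet_id)   # newest first (Redis: LPUSH)
        del feed[FEED_LIMIT:]      # bound the feed size (Redis: LTRIM)

publish("alice", "t1")
publish("alice", "t2")
print(feeds["bob"])  # ['t2', 't1'] -- a read is a single feed lookup
```

The design trades expensive writes (one push per follower) for cheap reads, which fits a workload where feed reads vastly outnumber tweet publications.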
To increase encapsulation of persistence decisions, microservice architectures
are useful [New15] (see Sect. 2.1). Microservices allow narrowing the choice of a
database system to one particular service and thus decouple the development and
operations of services [DB13]. Technologically, IaaS/PaaS, containers, and cluster
management frameworks provide sophisticated tooling for scaling and operating
microservice architectures. In the example, the product catalog could be split into
two microservices using MongoDB and Redis separately. The Redis-based service
would provide an API for querying popular products and incrementing counters,
whereas the MongoDB-based microservice would have a similar interface for
retrieving product information. The user-facing business logic (e.g., the frontend
in a two-tier architecture) simply has to invoke both microservices and combine the
result.
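The composition step can be sketched as follows; the two functions stand in for HTTP calls to the hypothetical Redis-based and MongoDB-based microservices:

```python
def popular_product_ids(k):
    # Stand-in for a call to the Redis-based counter service, e.g.
    # GET /popular?limit=k (endpoint name is hypothetical).
    return ["p42", "p7"][:k]

def product_details(product_ids):
    # Stand-in for a call to the MongoDB-based catalog service, e.g.
    # GET /products?ids=... (endpoint name is hypothetical).
    catalog = {"p42": {"name": "Laptop"}, "p7": {"name": "Mouse"}}
    return {pid: catalog[pid] for pid in product_ids}

def top_products_page(k):
    ids = popular_product_ids(k)   # service 1: ranking by view counts
    details = product_details(ids) # service 2: product information
    return [{"id": pid, **details[pid]} for pid in ids]

print(top_products_page(2))  # [{'id': 'p42', 'name': 'Laptop'}, {'id': 'p7', 'name': 'Mouse'}]
```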
In order to make polyglot persistence fully transparent for the application, polyglot database services need to abstract away the implementation details of the underlying
systems. The key idea is to hide the allocation of data and queries to databases
through a generic cloud service API. Some NoSQL databases and services use
this approach, for example, to integrate full-text search with structured storage
(e.g., in Riak [Ria] and Cassandra [LM10]), to store metadata consistently (e.g.,
in HBase [Hba] and BigTable [Cha+08]), or to cache objects (e.g., Facebook’s TAO
[Bro+13]). However, these approaches use a defined scheme for the allocation and
cannot adapt to varying application requirements. Polyglot database services can
also apply static rules for polyglot persistence: if the type of the data is known (for
example a user object or a file), a rule-based selection of a storage system can be
performed [SGR15].
In the example, the application could declare the throughput requirements of
the counter and the scalability requirement for the product catalog. The task of the
polyglot database service would then be to autonomously derive a suitable mapping
for queries and data. The core challenge here is to base the selection of systems on quantifiable metrics of the available databases and to apply transparent rewriting of operations. A weaker form of fully automated polyglot persistence is offered by database services with semi-automatic polyglot persistence. In this model, the application
can explicitly define which underlying system should be targeted, while reusing
high-level features such as schema modeling, transactions, and business logic across
systems through a unified API.
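A declaration of the example's requirements and the rule-based mapping a polyglot database service might derive from it could be sketched as follows; the requirement names and the mapping logic are purely illustrative:

```python
# Hypothetical requirement declaration for the e-commerce example:
# the application states *what* it needs, not *which* system to use.
requirements = {
    "counters": {"write_throughput": "high", "queries": ["top-k"]},
    "products": {"scalability": "data_volume", "queries": ["filter"]},
}

def choose_store(reqs):
    # Simplistic rule-based selection, as a polyglot database service
    # might perform it internally.
    if "top-k" in reqs.get("queries", []):
        return "redis"    # sorted sets: in-memory counters and top-k queries
    return "mongodb"      # documents: filter queries, hash partitioning

mapping = {name: choose_store(r) for name, r in requirements.items()}
print(mapping)  # {'counters': 'redis', 'products': 'mongodb'}
```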
7.2 Multi-Tenancy and Virtualization in Cloud-Based Deployments
The Database-as-a-Service (DBaaS) model promises to shift the problem of configuration, scaling, provisioning, monitoring, backup, privacy, and access control to a
service provider [Cur+11a]. Hacigumus et al. [HIM02] coined the term DBaaS and
argued that it provided a new paradigm for organizations to alleviate the need for
purchasing expensive hardware and software to build a scalable deployment. Lehner
and Sattler [LS13] and Zhao et al. [Zha+14] provide a comprehensive overview of
current research and challenges introduced by the DBaaS paradigm.
The DBaaS model emerged as a useful service category offered by PaaS and
IaaS providers and is therefore mainly rooted in industry. Table 7.1 summarizes
selected commercial systems and groups them by important properties such as data
model, sharding strategy, and query capabilities. All systems except Cloudant are
based on proprietary REST APIs and details about their internal architectures are not
published (with the exception of Baqend which is the commercial variant of Orestes
[Ges19]). Another observation is that fine-grained SLAs are not provided, due to the
difficulty of satisfying tenant-specific requirements on a multi-tenant infrastructure.
Table 7.1 Selected industry DBaaS systems and their main properties: data model, category according to the CAP theorem, support for queries and indexing, replication model, sharding strategy, transaction support, and service level agreements

System | Data model | CAP | Queries/indexing | Replication | Sharding | Transactions | SLAs
Cloudant [Bie+15] | Document store | AP | Incremental MR views | Lazy, local and geo | Hashing | No | No
DynamoDB [Dyn] | Wide-column | CP | Local and global index | Eager, local | Hashing | No | No
Azure Tables [Cal+11] | Wide-column | CP | By key, scans | Eager, local | Hashing | No | 99.9% uptime
Google Cloud Datastore [Dat, Bak+11] | Wide-column | CP | Local and global index | Eager, geo | Entity groups | Per group | No
S3, Azure Blobs, GCS [Amaa] | Blob store | AP | No | Lazy, local and geo | Hashing | No | 99.9% uptime (S3)
Baqend [Ges19] | Document | CP | Yes | Eager, local | Range | Yes | No
Most related work focuses on specific aspects of DBaaS models. Multi-tenancy
and virtualization are closely related, as resource sharing between tenants requires
some level of virtualization of underlying resources (the schema, database process,
operating system, computing hardware, and storage systems). The trade-off between
performance and isolation for multi-tenant systems has been studied extensively
[Aul+11, Aul+08, Aul+09, KL11, SKM08, WB09, JA07].
7.2.1 Database Privacy and Encryption
Since a DBaaS is hosted by a third party, security and privacy are particularly
important. Several researchers have proposed solutions to prevent attackers and
providers from analyzing data stored in a DBaaS system. A survey of the field is
provided by Köhler et al. [KJH15]. The ideal solution for DBaaS privacy is fully
homomorphic encryption, which enables arbitrary computations on encrypted data
stored in the database. Though Gentry [Gen09] proposed a scheme in 2009, the
performance overhead is still prohibitive for use in real-world applications.
The naive approach to ensure data confidentiality is to perform queries only in
the client, so that data can be fully encrypted. This approach is used in ZeroDB
[EW16]. The obvious limitation is that the client and network quickly become the
bottleneck: in ZeroDB, the query logic is executed in the client and each descent
in the B-tree requires one round-trip, leading to very high latency. MIT’s CryptDB
project [Pop+11, Pop14] is based on a layered encryption scheme, where different
encryption levels enable different query operators, e.g., homomorphic encryption
for sum-based aggregation and deterministic encryption for equality predicates.
CryptDB assumes a database proxy outside the threat model that is responsible for
rewriting queries with the appropriate keys before forwarding them to the database
holding the encrypted data. The MySQL-based prototype exhibited a processing
overhead of 26% compared to native access, but latency was increased by an order
of magnitude. The problem with CryptDB is that the vulnerability is merely moved into the proxy, which is co-located with the application servers and therefore typically cloud-hosted, too. Nonetheless, the first commercial DBMSs have implemented explicitly
declared encryption levels for queries on encrypted data, e.g., Microsoft SQL Server
supporting random and deterministic encryption [Alw].
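Why deterministic encryption enables equality predicates can be illustrated with a small sketch: identical plaintexts map to identical ciphertexts, so the server can index and match values without ever seeing them. Here, an HMAC stands in for a deterministic cipher, and the key and data are hypothetical.

```python
import hashlib
import hmac

KEY = b"proxy-held secret key"  # hypothetical key known only to the trusted proxy

def det_encrypt(value: str) -> str:
    # Deterministic scheme: the same plaintext always yields the same
    # ciphertext, so the server can build indexes and evaluate equality
    # predicates on ciphertexts alone. (HMAC serves as a stand-in for a
    # deterministic cipher; it is one-way, unlike a real DET layer.)
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()

# The proxy rewrites INSERTed values and WHERE clauses with ciphertexts:
stored = [det_encrypt(name) for name in ["alice", "bob", "alice"]]
query_token = det_encrypt("alice")
matches = [i for i, c in enumerate(stored) if c == query_token]
print(matches)  # rows 0 and 2 match without revealing "alice" to the server
```

The price of determinism is leakage of equality patterns: the server learns which rows share a value, which is why CryptDB reserves this layer for columns that are actually queried with equality predicates.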
The problem of vulnerable proxies in CryptDB was addressed in a follow-up
system called Mylar [Kar+16, PZ13, Pop+14]. Mylar implements multi-key keyword search on encrypted data with a middleware operating only on encrypted data
without access to keys. The browser is responsible for encrypting and decrypting
data based on user keys. Data is stored and encrypted using the key of the user
owning the record. The core idea of the encrypted keyword search is that clients
generate an encrypted token for search that works on any record irrespective of the
key it was encrypted with. When a user grants access to another user, a delta value
is constructed in a way that allows the server to transform tokens without leaking
data. The downside of Mylar is that it only enables keyword search. Performance is further limited, as the server has to scan every record for a token comparison; only per-record duplicates of keywords can be indexed. Nonetheless, Mylar is an
important step towards secure sharing of information between application users
and it is also notable for providing security against attacks on both the middleware and the database.
Relational Cloud is a visionary architecture for a secure, scalable, and multi-tenant DBaaS by Curino et al. [Cur+11a]. It proposes to use private database
virtualization for multi-tenancy and CryptDB for privacy. Access to the database
is handled through a JDBC driver which directs requests to load-balancing frontend
servers that partition data across backend servers to store the actual data in CryptDB.
The partitioning engine Schism [Cur+10] is based on workload graphs: whenever
two tuples are accessed within a transaction, the weight of their edge is increased.
By finding a partitioning of tuples with a minimal cut, cross-node transactions
are minimized. The partitioning rules are compacted and generalized by training
a decision tree that is used in frontend servers for routing. The consolidation engine
Kairos [Cur+11b] monitors workloads and outputs a mapping from virtual machines
to physical nodes in order to optimize combined resource requirements of multiple
tenants.
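Schism's workload-graph idea can be sketched in a few lines. This is a simplified illustration, not the actual implementation: the real system handles tuple replication and delegates the minimal cut to a general graph partitioner such as METIS, whereas here we merely build the co-access graph and score candidate partitionings by their cut weight.

```python
from collections import defaultdict
from itertools import combinations

def build_coaccess_graph(transactions):
    """Tuples are nodes; every co-access within a transaction adds edge weight."""
    edges = defaultdict(int)
    for txn in transactions:  # each txn is the set of tuple ids it accesses
        for a, b in combinations(sorted(txn), 2):
            edges[(a, b)] += 1
    return edges

def cut_weight(edges, partition):
    """Sum of edge weights crossing the partition (tuple id -> node id).
    Each crossing edge corresponds to likely cross-node transactions."""
    return sum(w for (a, b), w in edges.items()
               if partition[a] != partition[b])

txns = [{"t1", "t2"}, {"t1", "t2"}, {"t3", "t4"}, {"t2", "t3"}]
edges = build_coaccess_graph(txns)
# Keeping frequently co-accessed tuples together yields a cheaper cut:
good = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}  # only the t2-t3 edge crosses
bad  = {"t1": 0, "t2": 1, "t3": 0, "t4": 1}  # cuts the heavy t1-t2 edge
```

A partitioning with minimal cut weight minimizes the number of distributed transactions, which is exactly the objective described above.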
7.2.2 Service Level Agreements
Various approaches have been proposed for SLAs in cloud services and DBaaS
systems [Cun+07, Zha+14, ABC14, Bas12, Xio+11, Ter+13, LBMAL14, Pad+07,
Sak14]. Traditionally, this topic has been tackled in the context of workload
management for mainframe systems, to optimize simple performance metrics like
query response time [Cas+07, LS13]. Many approaches rely on the underlying
virtualization environment to enforce SLAs by means of live migration, e.g.,
Zephyr [Elm+11], Albatross [Das+11], Dolly [Cec+11], and Slacker [Bar+12].
Baset [Bas12] reviews SLAs of commercial cloud providers like AWS, Azure, and
Rackspace and concludes that performance-based SLAs are not guaranteed by any
provider. Furthermore, the burden of providing evidence of SLA violations rests on
the customer.
Xiong et al. have proposed ActiveSLA [Xio+11] as an admission control
framework for DBaaS systems. By predicting the probability of a query completing
before its deadline, a cost-based decision on allowing or rejecting the query can be
made using the SLA. Chi et al. [CMH11] have proposed a similar approach that uses
an SLA-based scheduler iCBS to minimize expected total costs. Sakr et al. [SL12]
presented the CloudDB AutoAdmin framework that monitors SLAs of cloud-hosted
databases and triggers application-defined rules upon violations to help developers
build on SLAs. Armbrust et al. [Arm+11] proposed the SQL extension PIQL
(Performance Insightful Query Language) that predicts SLA compliance using a
query planner which is aware of developer-provided hints. Instead of choosing the
fastest plan, the optimizer only outputs plans where the number of operations is
known in advance. Lang et al. [Lan+12] formulate the SLA problem for DBaaS
systems as an optimization task of mapping client workloads to available hardware
resources. In particular, they provide a way for DBaaS providers to choose the class
of hardware that best suits the performance SLOs of their tenants. The Polyglot
Persistence Mediator (PPM) in Orestes [SGR15] is a DBaaS approach to combine
service level agreements with schema design for database-driven applications.
Instead of focusing on a specific performance SLA as common in most related work,
the approach lets application developers express each functional and non-functional
data management requirement as schema annotations.
Problems closely related to SLAs are resource and storage allocation [Mad+15,
Sou+09], pricing models [LS13, p. 145], and workload characterization [GKA09,
Gul+12]. Since we focus on the perspective of application architects, though, we
consider low-level hardware and virtual machine allocation schemes to be out of
scope for this book.
7.3 Auto-Scaling and Elasticity
For providing elasticity, DBaaS systems have to automatically scale in and out
to accommodate the current and future mix of tenant workloads. The ability to
forecast workloads enables the most efficient forms of auto-scaling, as the service
does not have to react to overload situations and SLA violations, but can instead
proactively adjust its capacities. Kim et al. [Kim+16] and Lorido-Botran et al.
[LBMAL14] provide an overview of commonly employed workload predictors and
auto-scaling techniques from the literature. Related work on auto-scaling can be
grouped into approaches for threshold-based rules (e.g., [Has+12, Han+12, KF11,
MBS11, Gha+11, CS13]), reinforcement learning (e.g., [Dut+10, BHD13, Tes+06,
BRX13, XRB12]), queuing theory (e.g., [Urg+08, VPR07, ZCS07]), time series
analysis and prediction (e.g., [CDM11, GGW10, She+11, Fan+12, Isl+12, PN09]),
control theory (e.g., [Pad+09, Xu+07, Bod+09, ATE12, PH09]), and database live migration (e.g., [Elm+11, Das+11, Cec+11, Bar+12, DAEA13]). While auto-scaling
does not replace capacity planning, it significantly increases flexibility as the cloud
infrastructure can be adapted at runtime.
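The simplest of the families listed above, threshold-based rules, can be sketched as follows. The thresholds, cooldown flag, and function name are our own illustrative choices rather than parameters from any of the cited systems; commercial auto-scalers expose essentially these knobs.

```python
def scaling_decision(cpu_samples, servers, *, high=0.75, low=0.30,
                     cooldown_active=False, min_servers=1):
    """Reactive threshold rule: scale out above `high`, scale in below `low`.
    A cooldown prevents oscillation right after a previous scaling action."""
    avg = sum(cpu_samples) / len(cpu_samples)
    if cooldown_active:
        return servers            # wait out the previous action
    if avg > high:
        return servers + 1        # scale out under load
    if avg < low and servers > min_servers:
        return servers - 1        # scale in to save cost
    return servers
```

Such rules are purely reactive: they only respond once utilization has already crossed a bound, which is why the proactive, prediction-based approaches above can react to surges earlier.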
Complex, proactive models are usually stronger for sudden surges in demand,
but most of the algorithms proposed in the literature strongly depend on a certain
workload type. Marcus and Papaemmanouil [MP17] argue that scalability and
query planning decisions for cloud data management should not depend on humans
or simple rules but instead harness machine learning techniques, in particular
reinforcement learning. An example of a system that follows this idea is Quaestor
[Ges+17] which applies deep reinforcement learning to find suitable caching TTLs
[Sch+16].
7.4 Database Benchmarking
Different benchmarks have been proposed to evaluate latency, throughput, consistency, and other non-functional properties of distributed and cloud databases
[Dey+14, Coo+10, Pat+11, BZS13, Ber+14, BT11, BK13, BT14, Ber15, Ber14].
The Yahoo Cloud Serving Benchmark (YCSB) [Coo+10] was published in 2010
and is the de-facto standard for benchmarking NoSQL systems. YCSB is designed
to measure throughput and latency for CRUD and scan operations performed
against different data stores [Fri+14, Win+15]. The main shortcoming is the missing
distribution of workload generation to prevent clients from becoming the actual
bottleneck. The second problem is that YCSB’s thread-per-request model incurs
high overhead and increases latency [FWR17].
While YCSB’s generic workloads make it easily applicable to any data store,
its lack of application-specific workloads renders the results hard to interpret.
Particularly in contrast to the widely used TPC benchmarks [PF00] for RDBMSs,
YCSB neither covers queries nor transactions. BG [BG13] was proposed as an
alternative to YCSB that models interactions in a social network. BG not only
collects performance indicators, but also measures the conformance to application-specific SLAs and consistency. The Under Pressure Benchmark (UPB) [Fio+13] is
based on YCSB and quantifies the availability of replicated data stores by comparing
the performance during normal operation with the performance during node failures.
7.4.1 Consistency Benchmarking
As consistency is one of the central properties that many cloud data management
systems trade against other non-functional properties for performance reasons,
various benchmarks have been proposed to quantify eventual consistency and
staleness. Wada et al. [Wad+11] proposed a methodology to measure the staleness
of reads for cloud databases based on a single reader and writer. As reader and writer
rely on simple timestamps for consistency checks, the strategy is highly dependent
on clock synchronization and unsuitable for geo-replicated systems. Bermbach et
al. [BT11, BT14] extended the approach by supporting multiple, distributed readers
frequently polling the data store. This uncovered a pattern for the staleness windows
of Amazon S3. However, the scheme still assumes clock synchronization and
therefore might lead to questionable results [BZS13].
Golab and Rahman et al. [GLS11, Rah+12] argue that a consistency benchmark should not introduce a workload that stresses the system artificially, but
should rather extend existing workloads to also capture staleness information. The
authors propose an extension of YCSB that tracks timestamps and uses them
to compute an empirical value for the Δ-atomicity of the underlying data store by
finding the maximum time between two operations that yielded a stale result.
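The core of such a log-based staleness metric can be sketched as follows. This is a deliberately simplified illustration in the spirit of the Δ-atomicity extension described above; the actual algorithm in [GLS11, Rah+12] is more involved. Here, a read is stale if a newer version had already been written before the read, and the score is the largest such lag observed.

```python
def empirical_delta(writes, reads):
    """Empirical staleness bound from a timestamped operation log (simplified
    Delta-atomicity sketch). writes and reads are lists of (time, version)."""
    delta = 0.0
    for r_time, r_version in reads:
        for w_time, w_version in sorted(writes):
            # A newer version existed since w_time, yet the read returned an
            # older one -> the value was stale for at least r_time - w_time.
            if w_time <= r_time and w_version > r_version:
                delta = max(delta, r_time - w_time)
    return delta

writes = [(0.0, 1), (1.0, 2), (2.0, 3)]
reads = [(1.5, 1),   # version 2 existed since t=1.0 -> stale by >= 0.5
         (2.4, 3)]   # fresh read, contributes nothing
```

Because only timestamps recorded by the workload itself are needed, this style of measurement extends an existing benchmark run instead of imposing an artificial workload, which is exactly the authors' argument.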
YCSB++ [Pat+11] circumvents the problem of clock synchronization by relying
on a centralized Zookeeper instance for coordination of readers and writers to
measure consistency. As a consequence, YCSB++ can only provide a lower bound
for the inconsistency window. NoSQLMark [Win+15] is a database benchmarking
framework that provides both lower and upper bounds for measurements to make
results more meaningful. In addition, its implementation is validated using the tool
SickStore as a safeguard against implementation or other errors.
Bailis et al. proposed the Probabilistically Bounded Staleness (PBS) [Bai+12,
Bai+14b] prediction model to estimate the staleness of Dynamo-style systems based
on messaging latencies between nodes. PBS relies on a Monte Carlo simulation
sampling from latency distributions to calculate the probability of a stale read
for a given time after a write ((Δ, t)-atomicity). The YCSB wrapper for Monte
Carlo simulations (YMCA) [Ges19, Section 4.3.1] is an adaptation of this approach
for YCSB workloads and arbitrary topologies of database nodes and caches.
The YMCA allows studying staleness introduced not only by replication, but
also by invalidation-based and expiration-based caching. Furthermore, the YMCA
simulation frees the analysis from the trade-off between errors introduced by clock
drift and imprecision introduced by coordination delay, as exact simulation times
can be used.
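The Monte Carlo idea behind PBS can be conveyed with a stripped-down sketch. This is not the PBS model itself (PBS captures the full message pattern of Dynamo-style quorums, including write acknowledgments and read response times); the exponential latency distribution and all parameters here are illustrative assumptions. The structure is the same, though: sample per-replica propagation delays and count how often a read issued t seconds after a write sees only replicas the new version has not reached yet.

```python
import random

def stale_read_probability(t, n=3, r=1, trials=10_000, seed=42):
    """Estimate P(stale read) t seconds after a write via Monte Carlo
    sampling of per-replica propagation delays (illustrative sketch)."""
    rng = random.Random(seed)
    stale = 0
    for _ in range(trials):
        # Exponential propagation delay per replica, mean 0.1 s (assumption).
        arrival = [rng.expovariate(10.0) for _ in range(n)]
        contacted = rng.sample(range(n), r)  # replicas hit by the read
        if all(arrival[i] > t for i in contacted):
            stale += 1  # none of the contacted replicas had the new version
    return stale / trials
```

As expected, the estimated staleness probability drops sharply the longer the read is issued after the write, which is the trade-off curve PBS makes explicit.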
Any database system can potentially be provided in the form of a DBaaS. However, low-latency access, elastic scalability, polyglot persistence, cross-database
transactions, and efficient multi-tenancy play important roles for scalable web
applications and have only partly been addressed by related work so far.
7.5 REST APIs, Multi-Model Databases and
Backend-as-a-Service
Most cloud services, including DBaaS and Backend-as-a-Service (BaaS) systems,
use REST APIs to ensure interoperability and accessibility from heterogeneous
environments. Originally proposed as an architectural style by Fielding [Fie00],
REST now commonly refers to HTTP-based interfaces. HTTP [Fie+99] emerged
as the standard for distributing information on the Internet. Originally, it was
employed for static data, but now serves sophisticated use cases from web and
mobile applications to Internet of Things (IoT) applications. The growing adoption
of HTTP/2 [IET15] solving the connection multiplexing problem of HTTP/1.1
facilitates this movement. For web applications, REST and HTTP have largely
replaced RPC-based approaches (e.g., XML RPC or Java RMI [Dow98]), wire
protocols (e.g., PostgreSQL protocol [Pos]), and web services (specifically, SOAP
and WS-* standards family [Alo+04]).
Google’s GData [Gda] and Microsoft’s OData (Open Data Protocol) [Oda] are
two approaches for standardized REST/HTTP CRUD APIs that are used by some
of their respective cloud services. Many commercial DBaaS systems offer custom
REST APIs tailored for one particular database (e.g., DynamoDB, Cloudant). A
first theoretic attempt for a unified DBaaS REST API has been made by Haselman
et al. [HTV10] for RDBMSs. Dey [Dey15] proposed REST+T as a REST API
for transactions. In REST+T, each object is modeled as a state machine modified
through HTTP methods.
7.5.1 Backend-as-a-Service
According to Roberts [Rob16], serverless architectures are applications that depend
on cloud services for server-side logic and persistence. The two major categories
of serverless services are Function-as-a-Service (FaaS) and Backend-as-a-Service
(BaaS). Both approaches are rooted in commercial cloud platforms rather than
research efforts.
FaaS refers to stateless, event-triggered business logic executed on a third-party platform [Rob16]. Industry offerings include AWS Lambda, Microsoft Azure
Functions, and Google Cloud Functions. While FaaS offers a very simple and
scalable programming model, its applicability is limited by the lack of persistent
state. The major difference between FaaS and Platform-as-a-Service lies in the
ability of FaaS to seamlessly scale on a per-request basis, as no application server
infrastructure (e.g., Rails, Django, Java EE) is required. The term “BaaS” refers to
services that enable the development of rich client applications through database
access, authentication and authorization mechanisms, as well as SDKs for websites
and mobile apps. BaaS therefore is a natural extension of DBaaS towards scenarios
of direct client access from apps without intermediate application servers.
Many commercial BaaS platforms are available (e.g., Firebase, Kinvey, Azure Mobile Services, and Baqend). Most of these platforms are based on proprietary
software and unpublished architectures which hinders a comparison. However,
different open-source BaaS platforms have been developed. They typically consist
of an API server (e.g., Node.js or Java) for BaaS functionality and user-submitted
code and a NoSQL database system for persistence (e.g., MongoDB, Cassandra or
CouchDB).
Meteor [HS16] is a development framework and server for running real-time
web applications. It is based on MongoDB and directly exposes the MongoDB
query language to JavaScript clients for both ad-hoc and real-time queries. Node.js-based application servers run custom code and provide standard APIs, e.g., for user login.
Scalability is limited, as each application server subscribes to the MongoDB
replication log (oplog tailing),1 in order to match subscribed queries to updates.
Each server therefore has to sustain the aggregate throughput of a potentially
1 Historically, there is another approach called poll-and-diff that relies on periodic query execution for discovery of result changes. However, poll-and-diff does not scale with the number of real-time query subscribers.
sharded MongoDB cluster which is often infeasible in practice. As the replication
log furthermore only contains partial information on updates, the application servers
need to perform additional database queries to check for a match.
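The oplog-tailing scheme just described boils down to the following matching loop, sketched here for illustration (this is plain predicate evaluation with hypothetical client names, not Meteor's actual matching code):

```python
# Every application server receives every update from the replication log
# and checks it against each subscribed real-time query. The per-server
# cost thus grows with the cluster's total update throughput -- the
# scalability limit noted above.
subscriptions = {
    "client-1": lambda doc: doc.get("room") == "kitchen",
    "client-2": lambda doc: doc.get("temp", 0) > 25,
}

def on_oplog_entry(doc, subs):
    """Return the clients whose subscribed query matches this update."""
    return [client for client, pred in subs.items() if pred(doc)]
```

Since the oplog only carries partial update information, a real deployment additionally has to fetch the full document before this check can run, which is the extra query load mentioned above.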
Deployd [Dep], Hoodie [Hoo], and Parse Server [Par] are based on Node.js,
too. Deployd [Dep] is a simple API server for common app functionalities and a
simple, MongoDB-based CRUD persistence API. It is focused on simplicity and is
neither horizontally scalable nor multi-tenant. Hoodie [Hoo] is a BaaS that combines
CouchDB and a client-side CouchDB clone called PouchDB for offline-capable
apps with synchronization. Through CouchDB change feeds, clients can subscribe
to simple CRUD events with limited querying capabilities. Hoodie is focused on
offline-first applications and offers no support for data and request scalability.
Parse Server [Par] is an open-source implementation of the Parse platform that
was acquired by Facebook in 2013 and later discontinued [Lac16]. It has extensive
mobile SDKs that go beyond wrapping the REST API and also provide widgets and
tooling for building the frontend. Parse Server is based on Node.js and MongoDB
and supports file storage, a JSON CRUD API, user management, access control,
and real-time queries that are functionally similar to those provided by Meteor,
but without support for ordering [Wan16]. The real-time query architecture relies
on broadcasting every update to every server that holds WebSocket connections to
clients through a single Redis instance. This does not allow the system to scale
upon increasing update workloads beyond single-server capacity. Parse Server does
not expose many data management abstractions such as indexes, partial updates,
concurrency control, and schemas, making it unsuitable for performance-critical and
advanced applications. In particular, the latency of HTTP requests is not reduced
through caching. However, in order to prevent the browser from performing two
round-trips due to cross-origin pre-flight requests, REST semantics are violated and
every interaction is wrapped in an HTTP POST and GET request [Gri13].
BaasBox [Bas] and Apache Usergrid [Use] are open-source Java-based BaaS
platforms. BaasBox [Bas] is a simple single-server platform based on the multi-model database OrientDB [Tes13]. Its main capabilities are CRUD-based persistence and a simple social media API for app development. Apache Usergrid [Use]
is a scalable BaaS built on Cassandra and geared towards mobile applications.
Through a REST API and SDKs, it supports typical features such as user management, authorization, JSON and file storage, as well as custom business logic
expressed in Java. Multi-tenancy is achieved through a shared database model by
running private API servers for each tenant, while consolidating rows in a single
Cassandra cluster. Query support is limited due to Cassandra’s architecture, and there are neither consistency guarantees nor multi-key transactions.
Baqend2 is the commercial variant of Orestes [Ges19] and is designed to support
large-scale, low-latency web applications through web caching. While Baqend’s
caching acceleration works out-of-the-box for all applications and websites built
on the platform, it can also be applied to arbitrary legacy websites through the
2 Baqend: https://www.baqend.com.
performance plugin Speed Kit [WGW+20]. Baqend further provides execution of
custom code through user-defined Node modules and thereby offers FaaS features
in its BaaS model. Similar to Meteor, Baqend provides push-based real-time queries
on top of MongoDB. However, Baqend’s architecture decouples data storage from
stateless application logic to solve the scalability issue that Meteor is subject to. At
the time of writing, Baqend’s real-time query mechanism is the only3 one scalable
with respect to both update throughput and query concurrency. Baqend is also
the only BaaS to expose ACID transactions and explicit fine-grained control over
consistency levels.
7.5.2 Polyglot Persistence
The term polyglot persistence was introduced by Leberknight [Leb08] and later
popularized by Fowler [SF12]. Most web-scale architectures are heavily based on
polyglot persistence both within application components as well as across different
applications. Twitter uses Redis [San17] for storing tweets, a custom eventually
consistent wide-column store named Manhattan [Sch16] for user and analytics data,
Memcache [Fit04] for caching, a custom graph store called FlockDB as well as
MySQL, HDFS, and an object store [Has17]. Google, Facebook, and Amazon are
also recognized for their broad spectrum of employed database systems. While
there is no shortage of polyglot persistence architectures in practice, little research
has gone into addressing the problem of how to design, implement, and maintain
polyglot persistence architectures.
Object-relational (OR) and object-document (OD) mappers are important classes
of tools that limit vendor lock-in and minimize impedance mismatch [Mai90,
Amb12]. By abstracting from implementation details of database systems, they
facilitate polyglot persistence. Popular mappers are Hibernate, DataNucleus, Kundera, EclipseLink, OpenJPA, Entity Framework, Active Record, Spring Data, Core
Data, Doctrine, Django, and Morphia [Ire+09, Tor+17, DeM09]. Torres et al.
[Tor+17] provide a comprehensive overview of mappers and propose a catalog of
criteria to evaluate their capabilities (e.g., metadata extraction, foreign key support
and inheritance). Störl et al. [Stö+15] reviewed mappers specifically targeted to
NoSQL databases. The authors observed that, while basic CRUD functionality
works well across all analyzed mappers, query expressiveness vastly differs. This
is a consequence of providing high-level query languages in the mapper that
potentially cannot be mapped to the limited querying capabilities of the underlying
database system and therefore have to be emulated client-side. Also, the authors
observed that the overhead introduced by some mappers is significant, in particular
for updates and deletes. Wolf et al. [Wol+13] describe the steps required to adapt
traditional OR-mappers such as Hibernate to key-value stores. Their effort makes
3 For an in-depth discussion of the state of the art in real-time databases, we refer to [WRG19].
it obvious that there is a significant feature gap between state-of-the-art mapper
abstractions and capabilities found in low-level data stores.
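The basic abstraction these mappers provide can be conveyed with a toy object-document mapping. This is purely illustrative (the class and store names are made up); real mappers such as Morphia or Spring Data additionally handle identity management, lazy loading, schema evolution, and query translation, which is exactly where the feature gap discussed above opens up.

```python
from dataclasses import asdict, dataclass, fields

@dataclass
class User:
    id: str
    name: str
    age: int

class InMemoryDocumentStore:
    """Toy document store: objects are flattened to dicts on save and
    rehydrated into typed objects on load."""
    def __init__(self):
        self.collections = {}

    def save(self, obj):
        doc = asdict(obj)  # object -> document
        coll = self.collections.setdefault(type(obj).__name__, {})
        coll[doc["id"]] = doc

    def load(self, cls, obj_id):
        doc = self.collections[cls.__name__][obj_id]
        return cls(**{f.name: doc[f.name] for f in fields(cls)})  # rehydrate

store = InMemoryDocumentStore()
store.save(User(id="u1", name="Alice", age=30))
```

Even this minimal sketch shows why mappers reduce impedance mismatch: application code works with typed objects while the store only ever sees generic documents.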
Multi-model databases address the data model aspect of polyglot persistence by seeking to
provide multiple data models in a single data store. This imposes heterogeneous requirements on
a single database system and hence implies tremendous engineering challenges.
ArangoDB [Ara] and OrientDB [Tes13] are two examples of systems that provide
main APIs for storing and querying documents, but also support graph traversal and
key-value storage. While these systems simplify operations by integrating polyglot
capabilities into single systems, there are more sophisticated solutions available for
each of the supported data models. Several RDBMSs also incorporate non-relational data models such as XML and JSON [Cro06] as a data type with SQL
extensions to modify and query its contents. The major limitation of multi-model
approaches is that the data model is only one of many requirements that necessitate
polyglot persistence (e.g., scalability and latency). Many requirements are directly
tied to replication, sharding, and query processing architectures and therefore are
very difficult to consolidate in a single system.
7.6 Summary
Modern (web) applications are complex as they need to address an increasing
number of functional and non-functional requirements. Finding, deploying, and
operating the right system—or set of systems—for a given application scenario is
therefore becoming an ever greater challenge. Ideally, application developers would
be able to express requirements as SLAs in a declarative way and then let a polyglot
database service determine the optimal mapping to actual systems in transparent
fashion. But while there are already first approaches to adapt the choice of a database
system to the actual requirements and workloads of the application, the challenge
of automating polyglot persistence is mostly unsolved as of today.
References
[ABC14] Ioannis Arapakis, Xiao Bai, and B. Barla Cambazoglu. “Impact of Response Latency
on User Behavior in Web Search”. In: Proceedings of the 37th International ACM
SIGIR Conference on Research & Development in Information Retrieval. SIGIR
’14. Gold Coast, Queensland, Australia: ACM, 2014, pp. 103–112. ISBN: 978-1-4503-2257-7. https://doi.org/10.1145/2600428.2609627. URL: http://doi.acm.org/10.1145/
2600428.2609627.
[Alo+04] Gustavo Alonso et al. “Web services”. In: Web Services. Springer, 2004, pp. 123–149.
[Alw] Always Encrypted (Database Engine). https://msdn.microsoft.com/en-us/library/
mt163865.aspx. (Accessed on 05/20/2017). 2017. URL: https://msdn.microsoft.com/
en-us/library/mt163865.aspx.
[Amaa] Amazon Simple Storage Service (S3). //aws.amazon.com/documentation/s3/.
(Accessed on 07/28/2017). 2017. URL: //aws.amazon.com/documentation/s3/ (visited
on 02/18/2017).
[Amb12] Scott Ambler. Agile database techniques: Effective strategies for the agile software
developer. John Wiley & Sons, 2012.
[Ara] ArangoDB. https://www.arangodb.com/documentation/. (Accessed on 05/20/2017).
2017. URL: https://www.arangodb.com/documentation/ (visited on 02/18/2017).
[Arm+11] Michael Armbrust et al. “PIQL: Success-Tolerant Query Processing in the Cloud”.
In: PVLDB 5.3 (2011), pp. 181–192. URL: http://www.vldb.org/pvldb/vol5/p181_
michaelarmbrust_vldb2012.pdf.
[ATE12] Ahmed Ali-Eldin, Johan Tordsson, and Erik Elmroth. “An adaptive hybrid elasticity
controller for cloud infrastructures”. In: 2012 IEEE Network Operations and Management Symposium, NOMS 2012, Maui, HI, USA, April 16–20, 2012. Ed. by Filip De
Turck, Luciano Paschoal Gaspary, and Deep Medhi. IEEE, 2012, pp. 204–212. https://
doi.org/10.1109/NOMS.2012.6211900.
[Aul+08] S. Aulbach et al. “Multi-tenant databases for software as a service: schema-mapping
techniques”. In: Proceedings of the 2008 ACM SIGMOD international conference on
Management of data. 2008, pp. 1195–1206. URL: http://dl.acm.org/citation.cfm?id=
1376736 (visited on 11/15/2012).
[Aul+09] Stefan Aulbach et al. “A comparison of flexible schemas for software as a service”.
In: Proceedings of the ACM SIGMOD International Conference on Management of
Data, SIGMOD 2009, Providence, Rhode Island, USA, June 29 - July 2, 2009. Ed.
by Ugur Çetintemel et al. ACM, 2009, pp. 881–888. https://doi.org/10.1145/1559845.
1559941.
[Aul+11] Stefan Aulbach et al. “Extensibility and Data Sharing in evolving multitenant
databases”. In: Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11–16, 2011, Hannover, Germany. Ed. by Serge Abiteboul et
al. IEEE Computer Society, 2011, pp. 99–110. https://doi.org/10.1109/ICDE.2011.
5767872.
[Bai+12] Peter Bailis et al. Probabilistically bounded staleness for practical partial quorums.
Tech. rep. 8. 2012, pp. 776–787. URL: http://dl.acm.org/citation.cfm?id=2212359
(visited on 07/16/2014).
[Bai+14b] Peter Bailis et al. “Quantifying eventual consistency with PBS”. en. In: The VLDB
Journal 23.2 (Apr. 2014), pp. 279–302. ISSN: 1066-8888, 0949-877X. https://doi.org/10.1007/s00778-013-0330-1. URL: http://link.springer.com/10.1007/s00778-013-0330-1 (visited on 01/03/2015).
[Bak+11] J. Baker et al. “Megastore: Providing scalable, highly available storage for interactive
services”. In: Proc. of CIDR. Vol. 11. 2011, pp. 223–234.
[Bar+12] Sean Barker et al. “Cut me some slack: Latency-aware live migration for databases”.
In: Proceedings of the 15th international conference on extending database technology. ACM, 2012, pp. 432–443. URL: http://dl.acm.org/citation.cfm?id=2247647
(visited on 07/16/2014).
[Bas] The BaasBox server. https://github.com/baasbox/baasbox. (Accessed on 05/20/2017).
2017. URL: https://github.com/baasbox/baasbox (visited on 02/19/2017).
[Bas12] Salman A. Baset. “Cloud SLAs: present and future”. In: ACM SIGOPS Operating
Systems Review 46.2 (2012), pp. 57–66. URL: http://dl.acm.org/citation.cfm?id=
2331586 (visited on 01/03/2015).
[Ber+14] David Bermbach et al. “Towards an Extensible Middleware for Database Benchmarking”. In: Performance Characterization and Benchmarking. Traditional to Big Data
- 6th TPC Technology Conference, TPCTC 2014, Hangzhou, China, September 1–
5, 2014. Revised Selected Papers. Ed. by Raghunath Nambiar and Meikel Poess.
Vol. 8904. Lecture Notes in Computer Science. Springer, 2014, pp. 82–96. https://
doi.org/10.1007/978-3-319-15350-6_6.
[Ber14] David Bermbach. Benchmarking Eventually Consistent Distributed Storage Systems.
eng. Karlsruhe, Baden: KIT Scientific Publishing, 2014. ISBN: 978-3-7315-0186-2.
[Ber15] David Bermbach. “An Introduction to Cloud Benchmarking”. In: 2015 IEEE International Conference on Cloud Engineering, IC2E 2015, Tempe, AZ, USA, March 9–13,
2015. IEEE Computer Society, 2015, p. 3. https://doi.org/10.1109/IC2E.2015.65.
[BG13] Sumita Barahmand and Shahram Ghandeharizadeh. “BG: A Benchmark to Evaluate
Interactive Social Networking Actions”. In: CIDR 2013, Sixth Biennial Conference
on Innovative Data Systems Research, Asilomar, CA, USA, January 6–9, 2013, Online
Proceedings. www.cidrdb.org, 2013. URL: http://www.cidrdb.org/cidr2013/Papers/
CIDR13_Paper93.pdf.
[BHD13] Enda Barrett, Enda Howley, and Jim Duggan. “Applying reinforcement learning
towards automating resource allocation and application scalability in the cloud”. In:
Concurrency and Computation: Practice and Experience 25.12 (2013), pp. 1656–
1674. https://doi.org/10.1002/cpe.2864.
[Bie+15] Christopher D Bienko et al. IBM Cloudant: Database as a Service Advanced Topics.
IBM Redbooks, 2015.
[BK13] David Bermbach and Jörn Kuhlenkamp. “Consistency in Distributed Storage Systems
- An Overview of Models, Metrics and Measurement Approaches”. In: Networked
Systems - First International Conference, NETYS 2013, Marrakech, Morocco, May
2–4, 2013, Revised Selected Papers. Ed. by Vincent Gramoli and Rachid Guerraoui.
Vol. 7853. Lecture Notes in Computer Science. Springer, 2013, pp. 175–189. https://
doi.org/10.1007/978-3-642-40148-0_13.
[Bod+09] Peter Bodík et al. “Statistical Machine Learning Makes Automatic Control Practical
for Internet Datacenters”. In: Proceedings of the 2009 Conference on Hot Topics in
Cloud Computing. HotCloud’09. San Diego, California: USENIX Association, 2009.
URL: http://dl.acm.org/citation.cfm?id=1855533.1855545.
[Bro+13] Nathan Bronson et al. “TAO: Facebook’s Distributed Data Store for the Social Graph.”
In: USENIX Annual Technical Conference. 2013, pp. 49–60. URL: http://dl.frz.ir/
FREE/papers-we-love/datastores/tao-facebook-distributed-datastore.pdf (visited on
09/28/2014).
[BRX13] Xiangping Bu, Jia Rao, and Cheng-Zhong Xu. “Coordinated Self-Configuration of
Virtual Machines and Appliances Using a Model-Free Learning Approach”. In: IEEE
Trans. Parallel Distrib. Syst. 24.4 (2013), pp. 681–690. https://doi.org/10.1109/TPDS.
2012.174.
[BT11] David Bermbach and Stefan Tai. “Eventual consistency: How soon is eventual?
An evaluation of Amazon S3’s consistency behavior”. In: Proceedings of the 6th
Workshop on Middleware for Service Oriented Computing, MW4SOC 2011, Lisbon,
Portugal, December 12–16, 2011. Ed. by Karl M. Göschka, Schahram Dustdar, and
Vladimir Tosic. ACM, 2011, p. 1. https://doi.org/10.1145/2093185.2093186.
[BT14] David Bermbach and Stefan Tai. “Benchmarking Eventual Consistency: Lessons
Learned from Long-Term Experimental Studies”. In: 2014 IEEE International
Conference on Cloud Engineering, Boston, MA, USA, March 11–14, 2014. IEEE
Computer Society, 2014, pp. 47–56. https://doi.org/10.1109/IC2E.2014.37.
[BZS13] David Bermbach, Liang Zhao, and Sherif Sakr. “Towards Comprehensive Measurement of Consistency Guarantees for Cloud-Hosted Data Storage Services”. In:
Performance Characterization and Benchmarking - 5th TPC Technology Conference,
TPCTC 2013, Trento, Italy, August 26, 2013, Revised Selected Papers. Ed. by
Raghunath Nambiar and Meikel Poess. Vol. 8391. Lecture Notes in Computer Science.
Springer, 2013, pp. 32–47. https://doi.org/10.1007/978-3-319-04936-6_3.
[Cal+11] Brad Calder et al. “Windows Azure Storage: a highly available cloud storage service
with strong consistency”. In: Proceedings of the Twenty-Third ACM Symposium on
Operating Systems Principles. ACM. ACM, 2011, pp. 143–157. URL: http://dl.acm.
org/citation.cfm?id=2043571 (visited on 04/16/2014).
[Cas+07] Pierre Cassier et al. System Programmer’s Guide To–Workload Manager. IBM, 2007.
[CDM11] Eddy Caron, Frédéric Desprez, and Adrian Muresan. “Pattern Matching Based
Forecast of Non-periodic Repetitive Behavior for Cloud Clients”. In: J. Grid Comput.
9.1 (2011), pp. 49–64. https://doi.org/10.1007/s10723-010-9178-4.
[Cec+11] Emmanuel Cecchet et al. “Dolly: virtualization-driven database provisioning for the
cloud”. In: ACM SIGPLAN Notices. Vol. 46. ACM, 2011, pp. 51–62. URL: http://dl.
acm.org/citation.cfm?id=1952691 (visited on 07/16/2014).
[Cha+08] Fay Chang et al. “Bigtable: A distributed storage system for structured data”. In: ACM
Transactions on Computer Systems (TOCS) 26.2 (2008), p. 4.
[CMH11] Yun Chi, Hyun Jin Moon, and Hakan Hacigümüs. “iCBS: Incremental Costbased
Scheduling under Piecewise Linear SLAs”. In: PVLDB 4.9 (2011), pp. 563–574. URL:
http://www.vldb.org/pvldb/vol4/p563-chi.pdf.
[Coo+10] Brian F. Cooper et al. “Benchmarking cloud serving systems with YCSB”. In:
Proceedings of the 1st ACM symposium on Cloud computing. ACM, 2010, pp. 143–
154. URL: http://dl.acm.org/citation.cfm?id=1807152 (visited on 11/26/2016).
[Cro06] Douglas Crockford. “JSON: Javascript object notation”. In: URL http://www.json.org
(2006).
[CS13] Emiliano Casalicchio and Luca Silvestri. “Autonomic management of cloud-based
systems: the service provider perspective”. In: Computer and Information Sciences
III. Springer, 2013, pp. 39–47.
[Cun+07] Ítalo S. Cunha et al. “Self-Adaptive Capacity Management for Multi-Tier Virtualized
Environments”. In: Integrated Network Management, IM 2007. 10th IFIP/IEEE
International Symposium on Integrated Network Management, Munich, Germany, 21–
25 May 2007. IEEE, 2007, pp. 129–138. https://doi.org/10.1109/INM.2007.374777.
[Cur+10] Carlo Curino et al. “Schism: a workload-driven approach to database replication and
partitioning”. In: Proceedings of the VLDB Endowment 3.1-2 (2010), pp. 48–57. URL:
http://dl.acm.org/citation.cfm?id=1920853 (visited on 01/03/2015).
[Cur+11a] Carlo Curino et al. “Relational Cloud: A Database-as-a-Service for the Cloud”. In:
Proc. of CIDR. 2011. URL: http://dspace.mit.edu/handle/1721.1/62241 (visited on
04/15/2014).
[Cur+11b] Carlo Curino et al. “Workload-aware database monitoring and consolidation”. In:
Proceedings of the ACM SIGMOD International Conference on Management of Data,
SIGMOD 2011, Athens, Greece, June 12–16, 2011. Ed. by Timos K. Sellis et al. ACM,
2011, pp. 313–324. https://doi.org/10.1145/1989323.1989357.
[DAEA13] Sudipto Das, Divyakant Agrawal, and Amr El Abbadi. “ElasTraS: An elastic, scalable,
and self-managing transactional database for the cloud”. en. In: ACM Transactions on
Database Systems 38.1 (Apr. 2013), pp. 1–45. ISSN: 0362-5915. https://doi.org/10.
1145/2445583.2445588. URL: http://dl.acm.org/citation.cfm?doid=2445583.2445588
(visited on 11/25/2016).
[Das+11] Sudipto Das et al. “Albatross: lightweight elasticity in shared storage databases
for the cloud using live data migration”. In: Proceedings of the VLDB Endowment
4.8 (2011), pp. 494–505. URL: http://dl.acm.org/citation.cfm?id=2002977 (visited on
07/16/2014).
[Dat] Google Cloud Datastore. https://cloud.google.com/datastore/docs/concepts/overview.
(Accessed on 05/20/2017). 2017. URL: https://cloud.google.com/datastore/docs/
concepts/overview (visited on 02/18/2017).
[DB13] Regine Dörbecker and Tilo Böhmann. “The Concept and Effects of Service Modularity - A Literature Review”. In: 46th Hawaii International Conference on System
Sciences, HICSS 2013, Wailea, HI, USA, January 7–10, 2013. IEEE Computer
Society, 2013, pp. 1357–1366. https://doi.org/10.1109/HICSS.2013.22.
[DeM09] Linda DeMichiel. “JSR 317: Java Persistence 2.0”. In: Java Community Process, Tech.
Rep (2009).
168
7 Polyglot Persistence in Data Management
[Dep] Deployd: a toolkit for building realtime APIs. https://github.com/deployd/deployd.
(Accessed on 05/20/2017). 2017. URL: https://github.com/deployd/deployd (visited
on 02/19/2017).
[Dey+14] Anamika Dey et al. “YCSB+T: Benchmarking web-scale transactional databases”. In:
Data Engineering Workshops (ICDEW), 2014 IEEE 30th International Conference
on. IEEE. 2014, pp. 223–230.
[Dey15] Akon Samir Dey. “Cherry Garcia: Transactions across Heterogeneous Data Stores”.
In: (2015).
[Dow98] Troy Bryan Downing. Java RMI: remote method invocation. IDG Books Worldwide,
Inc., 1998.
[Dut+10] Xavier Dutreilh et al. “From Data Center Resource Allocation to Control Theory
and Back”. In: IEEE International Conference on Cloud Computing, CLOUD 2010,
Miami, FL, USA, 5–10 July, 2010. IEEE Computer Society, 2010, pp. 410–417. https://
doi.org/10.1109/CLOUD.2010.55.
[Dyn] DynamoDB. http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/
Introduction.html. (Accessed on 05/20/2017). 2017. URL: http://docs.aws.
amazon.com/amazondynamodb/latest/developerguide/Introduction.html (visited
on 01/13/2017).
[Elm+11] Aaron J. Elmore et al. “Zephyr: live migration in shared nothing databases for
elastic cloud platforms”. In: Proceedings of the 2011 ACM SIGMOD International
Conference on Management of data. ACM, 2011, pp. 301–312. URL: http://dl.acm.
org/citation.cfm?id=1989356 (visited on 07/16/2014).
[EW16] Michael Egorov and MacLane Wilkison. “ZeroDB white paper”. In: arXiv preprint
arXiv:1602.07168 (2016).
[Fan+12] Wei Fang et al. “RPPS: A Novel Resource Prediction and Provisioning Scheme
in Cloud Data Center”. In: 2012 IEEE Ninth International Conference on Services
Computing, Honolulu, HI, USA, June 24–29, 2012. Ed. by Louise E. Moser, Manish
Parashar, and Patrick C. K. Hung. IEEE Computer Society, 2012, pp. 609–616. https://
doi.org/10.1109/SCC.2012.47.
[Fie+99] R. Fielding et al. “RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1, 1999”. In:
URL http://www.rfc.net/rfc2616.html (1999).
[Fie00] R. T Fielding. “Architectural styles and the design of network-based software
architectures”. PhD thesis. Citeseer, 2000.
[Fio+13] Alessandro Gustavo Fior et al. “Under Pressure Benchmark for DDBMS Availability”. In: JIDM 4.3 (2013), pp. 266–278. URL: http://seer.lcc.ufmg.br/index.php/jidm/
article/view/249.
[Fit04] Brad Fitzpatrick. “Distributed caching with Memcached”. In: Linux journal 2004.124
(2004), p. 5.
[Fri+14] Steffen Friedrich et al. “NoSQL OLTP Benchmarking: A Survey”. In: 44. Jahrestagung der Gesellschaft für Informatik, Informatik 2014, Big Data Komplexität meistern, 22.–26. September 2014 in Stuttgart, Deutschland. Ed. by Erhard
Plödereder et al. Vol. 232. LNI. GI, 2014, pp. 693–704. ISBN: 978-3-88579-626-8.
[FWR17] Steffen Friedrich, Wolfram Wingerath, and Norbert Ritter. “Coordinated Omission in
NoSQL Database Benchmarking”. In: Datenbanksysteme für Business, Technologie
und Web (BTW 2017), 17. Fachtagung des GI-Fachbereichs, Datenbanken und
Informationssysteme” (DBIS), 6.–10. März 2017, Stuttgart, Germany, Workshopband.
Ed. by Bernhard Mitschang et al. Vol. P-266. LNI. GI, 2017, pp. 215–225.
[Gda] Google Data APIs. https://developers.google.com/gdata/. (Accessed on 05/26/2017).
2017. URL: https://developers.google.com/gdata/ (visited on 02/17/2017).
[Gen09] Craig Gentry. “A fully homomorphic encryption scheme”. PhD thesis. Stanford
University, 2009.
[Ges+17] Felix Gessert et al. “Quaestor: Query Web Caching for Database-as-a-Service
Providers”. In: Proceedings of the VLDB Endowment (2017).
[Ges19] Felix Gessert. “Low Latency for Cloud Data Management”. PhD thesis. University
of Hamburg, Germany, 2019. URL: http://ediss.sub.uni-hamburg.de/volltexte/2019/
9541/.
[GGW10] Zhenhuan Gong, Xiaohui Gu, and John Wilkes. “PRESS: PRedictive Elastic ReSource
Scaling for cloud systems”. In: Proceedings of the 6th International Conference on
Network and Service Management, CNSM 2010, Niagara Falls, Canada, October 25–
29, 2010. IEEE, 2010, pp. 9–16. https://doi.org/10.1109/CNSM.2010.5691343.
[Gha+11] Hamoun Ghanbari et al. “Exploring alternative approaches to implement an elasticity
policy”. In: Cloud Computing (CLOUD), 2011 IEEE International Conference on.
IEEE. 2011, pp. 716–723.
[GKA09] Ajay Gulati, Chethan Kumar, and Irfan Ahmad. “Storage workload characterization
and consolidation in virtualized environments”. In: Workshop on Virtualization Performance: Analysis, Characterization, and Tools (VPACT). Citeseer. 2009.
[GLS11] Wojciech Golab, Xiaozhou Li, and Mehul A. Shah. “Analyzing consistency properties
for fun and profit”. In: ACM PODC. ACM, 2011, pp. 197–206. URL: http://dl.acm.org/
citation.cfm?id=1993834 (visited on 09/28/2014).
[Gri13] Ilya Grigorik. High performance browser networking. English. [S.l.]: O’Reilly Media,
2013. ISBN: 1-4493-4476-3 978-1-4493-4476-4. URL: https://books.google.de/books?
id=tf-AAAAQBAJ.
[Gul+12] Ajay Gulati et al. “Workload dependent IO scheduling for fairness and efficiency in
shared storage systems”. In: 19th International Conference on High Performance
Computing, HiPC 2012, Pune, India, December 18–22, 2012. IEEE Computer
Society, 2012, pp. 1–10. https://doi.org/10.1109/HiPC.2012.6507480.
[Han+12] Rui Han et al. “Lightweight Resource Scaling for Cloud Applications”. In: 12th
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid
2012, Ottawa, Canada, May 13–16, 2012. IEEE Computer Society, 2012, pp. 644–
651. https://doi.org/10.1109/CCGrid.2012.52.
[Has+12] Masum Z. Hasan et al. “Integrated and autonomic cloud resource scaling”. In: 2012
IEEE Network Operations and Management Symposium, NOMS 2012, Maui, HI,
USA, April 16–20, 2012. Ed. by Filip De Turck, Luciano Paschoal Gaspary, and Deep
Medhi. IEEE, 2012, pp. 1327–1334. https://doi.org/10.1109/NOMS.2012.6212070.
[Has17] Mazdak Hashemi. The Infrastructure Behind Twitter: Scale. https://blog.twitter.
com/2017/the-infrastructure-behind-twitter-scale. (Accessed on 05/25/2017). 2017.
URL : https://blog.twitter.com/2017/the-infrastructure-behind-twitter-scale (visited on
02/18/2017).
[Hba] HBase. http://hbase.apache.org/. (Accessed on 05/25/2017). 2017. URL: http://hbase.
apache.org/ (visited on 07/16/2014).
[HIM02] H. Hacigumus, B. Iyer, and S. Mehrotra. “Providing database as a service”. In:
Data Engineering, 2002. Proceedings. 18th International Conference on. 2002,
pp. 29–38. URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=994695 (visited
on 10/16/2012).
[Hoo] GitHub - hoodiehq/hoodie: A backend for Offline First applications. https://github.
com/hoodiehq/hoodie. (Accessed on 05/25/2017). 2017. URL: https://github.com/
hoodiehq/hoodie (visited on 02/17/2017).
[HS16] Stephan Hochhaus and Manuel Schoebel. Meteor in action. Manning Publ., 2016.
[HTV10] T. Haselmann, G. Thies, and G. Vossen. “Looking into a REST-Based Universal
API for Database-as-a-Service Systems”. In: CEC. 2010, pp. 17–24. URL: http://
ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5708388 (visited on 10/15/2012).
[Ire+09] Christopher Ireland et al. “A Classification of Object-Relational Impedance Mismatch”. In: IEEE, 2009, pp. 36–43. ISBN: 978-1-4244-3467-1. https://doi.org/10.
1109/DBKDA.2009.11. URL: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?
arnumber=5071809 (visited on 01/03/2015).
[Isl+12] Sadeka Islam et al. “Empirical prediction models for adaptive resource provisioning
in the cloud”. In: Future Generation Comp. Syst. 28.1 (2012), pp. 155–162. https://
doi.org/10.1016/j.future.2011.05.027.
[JA07] Dean Jacobs and Stefan Aulbach. “Ruminations on Multi-Tenant Databases”. In:
Datenbanksysteme in Business, Technologie und Web (BTW 2007), 12. Fachtagung
des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS), Proceedings,
7.–9. März 2007, Aachen, Germany. Ed. by Alfons Kemper et al. Vol. 103. LNI.
GI, 2007, pp. 514–521. URL: http://subs.emis.de/LNI/Proceedings/Proceedings103/
article1419.html.
[Kar+16] Nikolaos Karapanos et al. “Verena: End-to-End Integrity Protection for Web Applications”. In: IEEE Symposium on Security and Privacy, SP 2016, San Jose, CA, USA,
May 22–26, 2016. IEEE Computer Society, 2016, pp. 895–913. https://doi.org/10.
1109/SP.2016.58.
[KF11] Pawel Koperek and Wlodzimierz Funika. “Dynamic Business Metrics-driven
Resource Provisioning in Cloud Environments”. In: Parallel Processing and Applied
Mathematics - 9th International Conference, PPAM 2011, Torun, Poland, September
11–14, 2011. Revised Selected Papers, Part II. Ed. by Roman Wyrzykowski et al.
Vol. 7204. Lecture Notes in Computer Science. Springer, 2011, pp. 171–180. https://
doi.org/10.1007/978-3-642-31500-8_18.
[Kim+16] In Kee Kim et al. “Empirical Evaluation of Workload Forecasting Techniques for
Predictive Cloud Resource Scaling”. In: 9th IEEE International Conference on Cloud
Computing, CLOUD 2016, San Francisco, CA, USA, June 27 - July 2, 2016. IEEE
Computer Society, 2016, pp. 1–10. https://doi.org/10.1109/CLOUD.2016.0011.
[KJH15] Jens Köhler, Konrad Jünemann, and Hannes Hartenstein. “Confidential database-as-a-service approaches: taxonomy and survey”. In: Journal of Cloud Computing 4.1
(2015), p. 1. ISSN: 2192-113X. https://doi.org/10.1186/s13677-014-0025-1.
[KL11] Tim Kiefer and Wolfgang Lehner. “Private Table Database Virtualization for DBaaS”.
In: IEEE 4th International Conference on Utility and Cloud Computing, UCC 2011,
Melbourne, Australia, December 5–8, 2011. IEEE Computer Society, 2011, pp. 328–
329. https://doi.org/10.1109/UCC.2011.52.
[Kle17] Martin Kleppmann. Designing Data-Intensive Applications. English. 1 edition.
O’Reilly Media, Jan. 2017. ISBN: 978-1-4493-7332-0.
[Kri13] Raffi Krikorian. Timelines at Scale. http://infoq.com/presentations/Twitter-Timeline-Scalability. (Accessed on 04/30/2017). 2013. URL: http://infoq.com/presentations/
Twitter-Timeline-Scalability.
[Lac16] Kevin Lacker. “Moving On”. In: Parse Blog (Jan. 2016). Accessed on 12/09/2017.
URL : http://blog.parseplatform.org/announcements/moving-on/.
[Lan+12] Willis Lang et al. “Towards Multi-tenant Performance SLOs”. In: IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA
(Arlington, Virginia), 1–5 April, 2012. Ed. by Anastasios Kementsietsidis and Marcos
Antonio Vaz Salles. IEEE Computer Society, 2012, pp. 702–713. https://doi.org/10.
1109/ICDE.2012.101.
[LBMAL14] Tania Lorido-Botran, Jose Miguel-Alonso, and Jose A. Lozano. “A Review of Auto-scaling Techniques for Elastic Applications in Cloud Environments”. English. In:
Journal of Grid Computing 12.4 (2014), pp. 559–592. ISSN: 1570-7873. https://doi.
org/10.1007/s10723-014-9314-7.
[Leb08] Scott Leberknight. Polyglot Persistence. http://www.sleberknight.com/blog/sleberkn/
entry/polyglot_persistence. (Accessed on 04/30/2017). 2008. URL: http://www.
sleberknight.com/blog/sleberkn/entry/polyglot_persistence.
[Len02] Maurizio Lenzerini. “Data Integration: A Theoretical Perspective”. In: Proceedings
of the Twenty-first ACM SIGACT-SIGMOD-SIGART Symposium on Principles of
Database Systems, June 3–5, Madison, Wisconsin, USA. Ed. by Lucian Popa, Serge
Abiteboul, and Phokion G. Kolaitis. ACM, 2002, pp. 233–246. https://doi.org/10.
1145/543613.543644.
[LM10] Avinash Lakshman and Prashant Malik. “Cassandra: a decentralized structured
storage system”. In: ACM SIGOPS Operating Systems Review 44.2 (2010), pp. 35–
40. URL: http://dl.acm.org/citation.cfm?id=1773922 (visited on 04/15/2014).
[LS13] Wolfgang Lehner and Kai-Uwe Sattler. Web-Scale Data Management for the Cloud.
English. 2013 edition. New York: Springer, Apr. 2013. ISBN: 978-1-4614-6855-4.
[Mad+15] Gabor Madl et al. “Account clustering in multi-tenant storage management environments”. In: 2015 IEEE International Conference on Big Data, Big Data 2015, Santa
Clara, CA, USA, October 29 - November 1, 2015. IEEE, 2015, pp. 1698–1707. https://
doi.org/10.1109/BigData.2015.7363941.
[Mai90] David Maier. “Representing database programs as objects”. In: Advances in database
programming languages. ACM. 1990, pp. 377–386.
[MBS11] Michael Maurer, Ivona Brandic, and Rizos Sakellariou. “Enacting SLAs in clouds
using rules”. In: European Conference on Parallel Processing. Springer. 2011,
pp. 455–466.
[MP17] Ryan Marcus and Olga Papaemmanouil. “Releasing Cloud Databases from the Chains
of Performance Prediction Models”. In: CIDR. 2017.
[New15] Sam Newman. Building microservices - designing fine-grained systems, 1st Edition. O’Reilly, 2015. ISBN: 9781491950357. URL: http://www.worldcat.org/oclc/
904463848.
[Oda] OData - open data protocol. http://www.odata.org/. (Accessed on 06/05/2017). 2017.
URL : http://www.odata.org/ (visited on 02/17/2017).
[Pad+07] Pradeep Padala et al. “Adaptive control of virtualized resources in utility computing
environments”. In: Proceedings of the 2007 EuroSys Conference, Lisbon, Portugal,
March 21–23, 2007. Ed. by Paulo Ferreira, Thomas R. Gross, and Luís Veiga. ACM,
2007, pp. 289–302. https://doi.org/10.1145/1272996.1273026.
[Pad+09] Pradeep Padala et al. “Automated control of multiple virtualized resources”. In:
Proceedings of the 4th ACM European conference on Computer systems. ACM, 2009,
pp. 13–26. URL: http://dl.acm.org/citation.cfm?id=1519068 (visited on 07/16/2014).
[Par] Parse Server. http://parseplatform.github.io/docs/parse-server/guide/. (Accessed on
07/28/2017). 2017. URL: http://parseplatform.github.io/docs/parse-server/guide/ (visited on 02/19/2017).
[Pat+11] Swapnil Patil et al. “YCSB++: benchmarking and performance debugging advanced
features in scalable table stores”. In: ACM Symposium on Cloud Computing in
conjunction with SOSP 2011, SOCC ’11, Cascais, Portugal, October 26–28, 2011.
Ed. by Jeffrey S. Chase and Amr El Abbadi. ACM, 2011, p. 9. https://doi.org/10.
1145/2038916.2038925.
[PF00] Meikel Pöss and Chris Floyd. “New TPC Benchmarks for Decision Support and Web
Commerce”. In: SIGMOD Record 29.4 (2000), pp. 64–71. https://doi.org/10.1145/
369275.369291.
[PH09] Sang-Min Park and Marty Humphrey. “Self-Tuning Virtual Machines for Predictable
eScience”. In: 9th IEEE/ACM International Symposium on Cluster Computing and
the Grid, CCGrid 2009, Shanghai, China, 18–21 May 2009. Ed. by Franck Cappello,
Cho-Li Wang, and Rajkumar Buyya. IEEE Computer Society, 2009, pp. 356–363.
https://doi.org/10.1109/CCGRID.2009.84.
[PN09] Radu Prodan and Vlad Nae. “Prediction-based real-time resource provisioning for
massively multiplayer online games”. In: Future Generation Comp. Syst. 25.7 (2009),
pp. 785–793. https://doi.org/10.1016/j.future.2008.11.002.
[Pop+11] R. A. Popa et al. “CryptDB: protecting confidentiality with encrypted query processing”. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems
Principles. 00095. 2011, pp. 85–100. URL: http://dl.acm.org/citation.cfm?id=2043566
(visited on 11/16/2012).
[Pop+14] Raluca Ada Popa et al. “Building Web Applications on Top of Encrypted Data Using
Mylar”. In: Proceedings of the 11th USENIX Symposium on Networked Systems
Design and Implementation, NSDI2014, Seattle, WA, USA, April 2–4, 2014. Ed. by
Ratul Mahajan and Ion Stoica. USENIX Association, 2014, pp. 157–172. URL: https://
www.usenix.org/conference/nsdi14/technical-sessions/presentation/popa.
[Pop14] Raluca Ada Popa. “Building practical systems that compute on encrypted data”. PhD
thesis. Massachusetts Institute of Technology, 2014.
[Pos] PostgreSQL: Documentation: 9.6: High Availability, Load Balancing, and Replication. https://www.postgresql.org/docs/9.6/static/high-availability.html. (Accessed on
07/28/2017). 2017. URL: https://www.postgresql.org/docs/9.6/static/high-availability.
html (visited on 02/04/2017).
[PZ13] Raluca A. Popa and Nickolai Zeldovich. “Multi-Key Searchable Encryption”. In:
IACR Cryptology ePrint Archive 2013 (2013), p. 508. URL: http://eprint.iacr.org/2013/
508.
[Rah+12] Muntasir Raihan Rahman et al. “Toward a Principled Framework for Benchmarking
Consistency”. In: CoRR abs/1211.4290 (2012). URL: http://arxiv.org/abs/1211.4290.
[Ria] Riak. http://basho.com/products/. (Accessed on 05/25/2017). 2017. URL: http://basho.
com/products/ (visited on 01/13/2017).
[Rob16] Mike Roberts. Serverless Architectures. https://martinfowler.com/articles/serverless.
html. (Accessed on 07/28/2017). 2016. URL: https://martinfowler.com/articles/
serverless.html (visited on 02/19/2017).
[Sak14] Sherif Sakr. “Cloud-hosted databases: technologies, challenges and opportunities”. In:
Cluster Computing 17.2 (2014), pp. 487–502. URL: http://link.springer.com/article/10.
1007/s10586-013-0290-7 (visited on 07/16/2014).
[San17] Salvatore Sanfilippo. Redis. http://redis.io/. (Accessed on 07/16/2017). 2017. URL:
http://redis.io/ (visited on 09/02/2015).
[Sch+16] Michael Schaarschmidt et al. “Learning Runtime Parameters in Computer Systems
with Delayed Experience Injection”. In: Deep Reinforcement Learning Workshop,
NIPS 2016. 2016.
[Sch16] Peter Schuller. “Manhattan, our real-time, multi-tenant distributed database for Twitter
scale”. In: Twitter Blog (2016).
[SF12] Pramod J. Sadalage and Martin Fowler. NoSQL distilled: a brief guide to the emerging
world of polyglot persistence. Pearson Education, 2012.
[SGR15] Michael Schaarschmidt, Felix Gessert, and Norbert Ritter. “Towards Automated Polyglot Persistence”. In: Datenbanksysteme für Business, Technologie und Web (BTW),
16. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme”. 2015.
[She+11] Zhiming Shen et al. “CloudScale: elastic resource scaling for multitenant cloud
systems”. In: ACM Symposium on Cloud Computing in conjunction with SOSP 2011,
SOCC ’11, Cascais, Portugal, October 26–28, 2011. Ed. by Jeffrey S. Chase and Amr
El Abbadi. ACM, 2011, p. 5. https://doi.org/10.1145/2038916.2038921.
[SKM08] Aameek Singh, Madhukar R. Korupolu, and Dushmanta Mohapatra. “Server-storage
virtualization: integration and load balancing in data centers”. In: Proceedings of the
ACM/IEEE Conference on High Performance Computing, SC 2008, November 15–21,
2008, Austin, Texas, USA. IEEE/ACM, 2008, p. 53. https://doi.org/10.1145/1413370.
1413424.
[SL12] Sherif Sakr and Anna Liu. “SLA-Based and Consumer-centric Dynamic Provisioning
for Cloud Databases”. In: 2012 IEEE Fifth International Conference on Cloud
Computing, Honolulu, HI, USA, June 24–29, 2012. Ed. by Rong Chang. IEEE
Computer Society, 2012, pp. 360–367. https://doi.org/10.1109/CLOUD.2012.11.
[Sou+09] Gokul Soundararajan et al. “Dynamic Resource Allocation for Database Servers
Running on Virtual Storage”. In: 7th USENIX Conference on File and Storage
Technologies, February 24–27, 2009, San Francisco, CA, USA. Proceedings. Ed. by
Margo I. Seltzer and Richard Wheeler. USENIX, 2009, pp. 71–84. URL: http://www.
usenix.org/events/fast09/tech/full_papers/soundararajan/soundararajan.pdf.
[Stö+15] Uta Störl et al. “Schemaless NoSQL Data Stores - Object-NoSQL Mappers to the
Rescue?” In: Datenbanksysteme für Business, Technologie und Web (BTW), 16.
Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme” (DBIS),
4.-6.3.2015 in Hamburg, Germany. Proceedings. Ed. by Thomas Seidl et al.
Vol. 241. LNI. GI, 2015, pp. 579–599. URL: http://subs.emis.de/LNI/Proceedings/
Proceedings241/article13.html (visited on 03/10/2015).
[Ter+13] Douglas B. Terry et al. “Consistency-based service level agreements for cloud
storage”. In: ACM SIGOPS 24th Symposium on Operating Systems Principles, SOSP
’13, Farmington, PA, USA, November 3–6, 2013. Ed. by Michael Kaminsky and Mike
Dahlin. ACM, 2013, pp. 309–324. https://doi.org/10.1145/2517349.2522731.
[Tes+06] G. Tesauro et al. “A Hybrid Reinforcement Learning Approach to Autonomic
Resource Allocation”. In: Proceedings of the 2006 IEEE International Conference on
Autonomic Computing. ICAC ’06. Washington, DC, USA: IEEE Computer Society,
2006, pp. 65–73. ISBN: 1-4244-0175-5. https://doi.org/10.1109/ICAC.2006.1662383.
[Tes13] Claudio Tesoriero. Getting Started with OrientDB. Packt Publishing Ltd, 2013.
[Tor+17] Alexandre Torres et al. “Twenty years of object-relational mapping: A survey on
patterns, solutions, and their implications on application design”. In: Information and
Software Technology 82 (2017), pp. 1–18.
[Urg+08] Bhuvan Urgaonkar et al. “Agile dynamic provisioning of multi-tier Internet applications”. In: TAAS 3.1 (2008), 1:1–1:39. https://doi.org/10.1145/1342171.1342172.
[Use] Apache Usergrid. https://usergrid.apache.org/. (Accessed on 07/16/2017). 2017. URL:
https://usergrid.apache.org/ (visited on 02/19/2017).
[VPR07] Daniel A. Villela, Prashant Pradhan, and Dan Rubenstein. “Provisioning servers in the
application tier for e-commerce systems”. In: ACM Trans. Internet Techn. 7.1 (2007),
p. 7. https://doi.org/10.1145/1189740.1189747.
[Wad+11] Hiroshi Wada et al. “Data Consistency Properties and the Tradeoffs in Commercial
Cloud Storage: the Consumers’ Perspective”. In: CIDR 2011, Fifth Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 9–12, 2011,
Online Proceedings. www.cidrdb.org, 2011, pp. 134–143. URL: http://www.cidrdb.
org/cidr2011/Papers/CIDR11_Paper15.pdf.
[Wan16] Mengyan Wang. “Parse LiveQuery Protocol Specification”. In: GitHub:
ParsePlatform/parse-server (Mar. 2016). Accessed on 12/14/2017. URL: https://
github.com/parse-community/parse-server/wiki/Parse-LiveQuery-Protocol-Specification.
[WB09] Craig D. Weissman and Steve Bobrowski. “The design of the force.com multitenant
internet application development platform”. In: Proceedings of the ACM SIGMOD
International Conference on Management of Data, SIGMOD 2009, Providence,
Rhode Island, USA, June 29 - July 2, 2009. Ed. by Ugur Çetintemel et al. ACM,
2009, pp. 889–896. https://doi.org/10.1145/1559845.1559942.
[WGW+20] Wolfram Wingerath, Felix Gessert, Erik Witt, et al. “Speed Kit: A Polyglot & GDPR-Compliant Approach For Caching Personalized Content”. In: 36th IEEE International
Conference on Data Engineering, ICDE 2020, Dallas, Texas, April 20–24, 2020.
2020.
[Whi15] Tom White. Hadoop - The Definitive Guide: Storage and Analysis at Internet Scale
(4. ed., revised & updated). O’Reilly, 2015. ISBN: 978-1-491-90163-2. URL: http://
www.oreilly.de/catalog/9781491901632/index.html.
[Win+15] Wolfram Wingerath et al. “Who Watches the Watchmen? On the Lack of Validation
in NoSQL Benchmarking”. In: Datenbanksysteme für Business, Technologie und
Web (BTW), 16. Fachtagung des GI-Fachbereichs “Datenbanken und Informationssysteme”. 2015.
[Wol+13] Florian Wolf et al. “Hibernating in the Cloud - Implementation and Evaluation of
Object-NoSQL-Mapping.” In: BTW. Citeseer, 2013, pp. 327–341.
[WRG19] Wolfram Wingerath, Norbert Ritter, and Felix Gessert. Real-Time & Stream Data
Management: Push-Based Data in Research & Practice. Ed. by Susan Evans. Springer
International Publishing, 2019. ISBN: 978-3-030-10554-9. https://doi.org/10.1007/
978-3-030-10555-6.
[Xio+11] P. Xiong et al. “ActiveSLA: A profit-oriented admission control framework for
database-as-a-service providers”. In: Proceedings of the 2nd ACM Symposium on
Cloud Computing. 00019. ACM, 2011, p. 15. URL: http://dl.acm.org/citation.cfm?id=
2038931 (visited on 11/15/2012).
[XRB12] Cheng-Zhong Xu, Jia Rao, and Xiangping Bu. “URL: A unified reinforcement
learning approach for autonomic cloud management”. In: J. Parallel Distrib. Comput.
72.2 (2012), pp. 95–105. https://doi.org/10.1016/j.jpdc.2011.10.003.
[Xu+07] Jing Xu et al. “On the Use of Fuzzy Modeling in Virtualized Data Center Management”. In: Fourth International Conference on Autonomic Computing (ICAC’07),
Jacksonville, Florida, USA, June 11–15, 2007. IEEE Computer Society, 2007, p. 25.
https://doi.org/10.1109/ICAC.2007.28.
[Zah+10] Matei Zaharia et al. “Spark: cluster computing with working sets”. In: Proceedings of
the 2nd USENIX conference on Hot topics in cloud computing. 2010, pp. 10–10. URL:
http://static.usenix.org/legacy/events/hotcloud1/tech/full_papers/Zaharia.pdf (visited
on 01/03/2015).
[ZCS07] Qi Zhang, Ludmila Cherkasova, and Evgenia Smirni. “A Regression-Based Analytic
Model for Dynamic Resource Provisioning of Multi-Tier Applications”. In: Fourth
International Conference on Autonomic Computing (ICAC’07), Jacksonville, Florida,
USA, June 11–15, 2007. IEEE Computer Society, 2007, p. 27. https://doi.org/10.1109/
ICAC.2007.1.
[Zha+14] Liang Zhao et al. Cloud Data Management. English. 2014 edition. Springer, 2014.
[IET15] IETF. “RFC 7540 - Hypertext Transfer Protocol Version 2 (HTTP/2)”. In: (2015).
Chapter 8
The NoSQL Toolbox: The NoSQL
Landscape in a Nutshell
In this chapter, we highlight the design space of distributed database systems,
dividing it along the four dimensions of sharding, replication, storage management,
and query processing. The goal is to provide a comprehensive set of data management
requirements that have to be considered when designing a flexible backend for globally
distributed web applications. To this end, we survey the implementation techniques
of representative systems and discuss how they relate to different functional and
non-functional properties (goals) of data management systems.
Every significantly successful database is designed for a particular class of
applications, or to achieve a specific combination of desirable system properties.
The simple reason why there are so many different database systems is that it is
not possible for any system to achieve all desirable properties at once. Traditional
relational databases such as PostgreSQL have been built to provide the full
functional package: a very flexible data model, sophisticated querying capabilities
including joins, global integrity constraints, and transactional guarantees. On the
other end of the design spectrum, there are key-value stores like Dynamo that scale
with data and request volume and offer high read and write throughput as well as
low latency, but barely any functionality apart from simple lookups.
In order to illustrate which techniques are suitable to achieve specific system
properties, we provide the NoSQL Toolbox (Fig. 8.1) that connects each technique
to the functional and non-functional properties it enables (positive edges only). In
the following, we will review each of the four major categories of techniques in
scalable data management: sharding, replication, storage management, and query
processing.
© Springer Nature Switzerland AG 2020
F. Gessert et al., Fast and Scalable Cloud Data Management,
https://doi.org/10.1007/978-3-030-43506-6_8
Fig. 8.1 The NoSQL Toolbox: It connects the techniques of NoSQL databases with the desired
functional and non-functional system properties they support
8.1 Sharding
Several distributed relational database systems such as Oracle RAC or IBM DB2
pureScale rely on a shared-disk architecture where all database nodes access the
same central data repository (e.g., a NAS or SAN). Thus, these systems provide
consistent data at all times, but are also inherently difficult to scale. In contrast,
the (NoSQL) database systems in the focus of this book are built upon a
shared-nothing architecture, meaning each system consists of many servers with
private memory and private disks that are connected through a network. Thus, high
scalability in throughput and data volume is achieved by sharding (partitioning)
data across different nodes (shards) in the system.
There are three basic distribution techniques: range partitioning, hash partitioning, and entity-group sharding.
8.1.1 Range Partitioning
To make efficient scans possible, data can be partitioned into ordered and contiguous
value ranges by range-sharding. However, this approach requires some coordina-
8.1 Sharding
177
tion through a master that manages assignments. To ensure elasticity, the system
has to be able to detect and resolve hotspots automatically by further splitting an
overburdened shard.
Range sharding is supported by wide-column stores like BigTable, HBase or
Hypertable [Wie15] and document stores, e.g., MongoDB, RethinkDB, Espresso
[Qia+13] and DocumentDB [STR+15].
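To make the routing concrete, consider a minimal sketch (the split points and shard numbering below are made up for illustration): locating a key, or the shards relevant to a scan, reduces to a binary search over the ordered split points.

```python
import bisect

# Hypothetical shard layout: split points are the exclusive upper bounds
# of each shard's contiguous key range.
split_points = ["g", "p"]  # shard 0: k < "g", shard 1: "g" <= k < "p", shard 2: k >= "p"

def shard_for(key):
    # Routing a single key is a binary search over the split points.
    return bisect.bisect_right(split_points, key)

def shards_for_scan(start, end):
    # A range scan only touches the shards whose ranges overlap [start, end).
    first = bisect.bisect_right(split_points, start)
    last = bisect.bisect_left(split_points, end)
    return list(range(first, last + 1))

print(shard_for("apple"))         # -> 0
print(shards_for_scan("f", "h"))  # -> [0, 1]: the scan crosses the "g" split
```

Resolving a hotspot then simply means inserting a new split point inside an overburdened range and moving one half of that shard's data, which is exactly the master's coordination task described above.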
8.1.2 Hash Partitioning
Another way to partition data over several machines is hash-sharding where every
data item is assigned to a shard server according to some hash value built from
the primary key. This approach does not require a coordinator and also guarantees
data to be evenly distributed across the shards, as long as the used hash function
produces an even distribution. The obvious disadvantage, though, is that it only
supports key lookups and renders efficient range scans impossible. Hash sharding is used in key-value
stores and is also available in some wide-column stores like Cassandra [LM10] or
Azure Tables [Cal+11].
The shard server that is responsible for a record can be determined as serverid =
hash(id) mod servers, for example. However, this hashing scheme requires all
records to be reassigned every time a new server joins or leaves, because it changes
with the number of shard servers (servers). Consequently, it is infeasible to use in
elastic systems like Dynamo, Riak, or Cassandra, which allow additional resources
to be added on-demand and again be removed when dispensable. For increased
flexibility, elastic systems typically use consistent hashing [Kar+97] where records
are not directly assigned to servers, but instead to logical partitions which are
then distributed across all shard servers. Thus, only a fraction of data has to be
reassigned upon changes in the system topology. For example, an elastic system can
be downsized by offloading all logical partitions residing on a particular server to
other servers and then shutting down the now idle machine. For details on how
consistent hashing is used in NoSQL systems, please refer to DeCandia et al.
[DeC+07].
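The difference between naive mod-N hashing and consistent hashing with logical partitions can be sketched as follows (the hash function, server names, and virtual-node count are illustrative assumptions):

```python
import hashlib
from bisect import bisect_right

def h(key: str) -> int:
    # Any uniform hash works; MD5 is used here only for illustration.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Sketch of consistent hashing: each server owns many logical
    partitions (virtual nodes) placed on a ring; a key belongs to the
    first ring point clockwise of its hash."""
    def __init__(self, servers, vnodes=100):
        self.ring = sorted((h(f"{s}#{v}"), s)
                           for s in servers for v in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def server_for(self, key: str) -> str:
        idx = bisect_right(self.points, h(key)) % len(self.ring)
        return self.ring[idx][1]

keys = [f"user:{i}" for i in range(10_000)]
before = HashRing(["s0", "s1", "s2"])
after = HashRing(["s0", "s1", "s2", "s3"])   # one server joins
moved = sum(before.server_for(k) != after.server_for(k) for k in keys)
# Only the keys now owned by s3 move (about 1/N of all keys); with
# serverid = hash(id) mod servers, roughly 3 out of 4 keys would move.
print(f"{moved / len(keys):.0%} of keys moved")
```

Shrinking the cluster works symmetrically: the departing server's logical partitions are reassigned, and only their keys change ownership.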
8.1.3 Entity-Group Sharding
A data partitioning scheme with the goal of enabling single-partition transactions
on co-located data is entity-group sharding. Partitions are called entity-groups
and either explicitly declared by the application (e.g., in G-Store [DAEA10]
and MegaStore [Bak+11]) or derived from transactions’ access patterns (e.g., in
Relational Cloud [Cur+11a] and Cloud SQL Server [Ber+11]). If a transaction
accesses data that spans more than one group, data ownership can be transferred
between entity-groups or the transaction manager has to fall back to more expensive
multi-node transaction protocols.
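A minimal sketch of this idea, with a hypothetical group key and partition count: records that share a group key always land on the same partition, so a transaction confined to one group needs no multi-node commit protocol.

```python
import zlib

NUM_PARTITIONS = 16   # illustrative

def partition_for(group_key: str) -> int:
    # A stable hash ensures a group always maps to the same partition.
    return zlib.crc32(group_key.encode()) % NUM_PARTITIONS

def is_single_partition(tx_keys) -> bool:
    """tx_keys: (group_key, record_id) pairs accessed by one transaction."""
    return len({partition_for(g) for g, _ in tx_keys}) == 1

# All of one user's records share a group key and thus a partition,
# so the transaction commits locally:
tx_local = [("user:alice", "profile"), ("user:alice", "order:17")]
print(is_single_partition(tx_local))   # True

# A transaction spanning two groups may hit several partitions and would
# have to fall back to a multi-node commit protocol:
tx_global = [("user:alice", "profile"), ("user:bob", "profile")]
needs_multi_node = not is_single_partition(tx_global)
```

Whether the group key is declared explicitly (G-Store, Megastore) or derived from access patterns (Relational Cloud, Cloud SQL Server), the routing principle is the same.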
8.2 Replication
In terms of CAP (cf. Sect. 4.3), conventional RDBMSs are often CA systems run
in single-server mode: the entire system becomes unavailable on machine failure.
System operators therefore secure data integrity and availability through expensive,
but reliable high-end hardware. In contrast, NoSQL systems like Dynamo, BigTable,
or Cassandra are designed for data and request volumes that cannot possibly
be handled by one single machine, and therefore run on clusters consisting of
potentially thousands of servers.1 Since failures are inevitable and will occur
frequently in any large-scale, distributed system, the software has to cope with them
on a daily basis [Ham07]. In 2009, Dean [Dea09] stated that a typical new cluster at
Google encounters thousands of hard drive failures, 1000 single-machine failures,
20 rack failures and several network partitions due to expected and unexpected
circumstances in its first year alone. Many more recent cases of network partitions
and outages in large cloud data centers have been reported [BK14]. Replication
allows the system to maintain availability and durability in the face of such errors.
But storing the same records on different machines (replica servers) in the cluster
introduces the problem of synchronization between them and thus a trade-off
between consistency on the one hand and latency and availability on the other.
Gray et al. [GHa+96] propose a two-tier classification of different replication
strategies according to when updates are propagated to replicas and where updates
are accepted. There are two possible choices on tier one (“when”): eager (synchronous) replication propagates incoming changes synchronously to all replicas
before a commit can be returned to the client, whereas lazy (asynchronous)
replication applies changes only at the receiving replica and passes them on
asynchronously. The great advantage of eager replication is consistency among
replicas, but it comes at the cost of higher write latency and impaired availability due
to the need to wait for other replicas [GHa+96]. Lazy replication is faster, since changes are acknowledged before all replicas have applied them; as a consequence, replicas can diverge and stale data might be served.
On the second tier (“where”), again, two different approaches are possible: either a
master-slave (primary copy) scheme is pursued where changes can only be accepted
by one replica (the master) or, in an update anywhere (multi-master) approach,
every replica can accept writes. In master-slave protocols, concurrency control is
not more complex than in a distributed system without replicas, but the entire replica set becomes unavailable as soon as the master fails. Multi-master protocols require
complex mechanisms for prevention or detection and reconciliation of conflicting
changes. Techniques typically used for these purposes are versioning, vector clocks, gossiping, and read repair (e.g., in Dynamo [DeC+07]), and convergent or commutative replicated data types [Sha+11] (e.g., in Riak).

1 Low-end hardware is used, because it is substantially more cost-efficient than high-end hardware [HB09, Section 3.1].
All four combinations of the two-tier classification are possible. Distributed
relational systems usually perform eager master-slave replication to maintain strong
consistency. Eager update anywhere replication, as featured for example in Google's Megastore [Bak+11], suffers from a heavy communication overhead generated by
synchronization and can cause distributed deadlocks which are expensive to detect.
NoSQL database systems typically rely on lazy replication, either in combination
with the master-slave approach (CP systems, e.g., HBase and MongoDB) or the
update anywhere approach (AP systems, e.g., Dynamo and Cassandra). Many
NoSQL systems leave the choice between latency and consistency to the client, i.e.,
for every request, the client decides whether to wait for a response from any replica
to achieve minimal latency or for a certainly consistent response (by a majority of
the replicas or the master) to prevent stale data.
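This per-request trade-off can be expressed through quorum arithmetic: with N replicas, a write quorum W and a read quorum R are guaranteed to intersect in at least one replica, so that reads observe the latest acknowledged write, iff R + W > N. A minimal sketch (parameter names follow the common Dynamo-style convention; this is not any particular system's API):

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True iff every R-replica read set intersects every W-replica write set."""
    return r + w > n

def min_read_quorum(n: int, w: int) -> int:
    """Smallest R that still guarantees overlap for a given write quorum W."""
    return n - w + 1

N = 3
print(quorums_overlap(N, r=2, w=2))   # True: consistent reads, higher latency
print(quorums_overlap(N, r=1, w=1))   # False: minimal latency, possibly stale
print(min_read_quorum(N, w=3))        # 1: if all replicas are written, any read suffices
```

Lowering R or W trades the overlap guarantee for latency and availability, which is precisely the knob that quorum-configurable systems expose per request.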
An aspect of replication that is not covered by the two-tier scheme is the distance
between replicas. The obvious advantage of placing replicas near one another is
low latency, but close proximity of replicas might also reduce the positive effects
on availability; for example, if two replicas of the same data item are placed in the
same rack, the data item is not available on rack failure in spite of replication. Beyond mere temporary unavailability, placing replicas nearby also bears the peril of losing all copies at once in a disaster scenario.
Geo-replication can protect the system against unavailability and data loss and
potentially improves read latency for distributed access from clients. Eager geo-replication, as implemented in Google's Megastore [Bak+11], Spanner [Cor+13], MDCC [Kra+13], and Mencius [MJM08], accepts higher write latency to achieve linearizability or other strong consistency models. In contrast, lazy geo-replication, as in Dynamo [DeC+07], PNUTS [Coo+08], Walter [Sov+11], COPS
[Llo+11], Cassandra [LM10], and BigTable [Cha+08] relaxes consistency in favor
of availability and latency. Charron-Bost et al. [CBPS10, Chapter 12] and Özsu
and Valduriez [ÖV11, Chapter 13] provide a comprehensive discussion of database
replication.
8.3 Storage Management
For best performance, database systems need to be optimized for the storage media
they employ to serve and persist data. These are typically main memory (RAM),
solid-state drives (SSDs), and spinning disk drives (HDDs) that can be used in any
combination. Unlike RDBMSs in enterprise setups, distributed NoSQL databases
avoid specialized shared-disk architectures in favor of shared-nothing clusters that
are based on commodity servers (employing commodity storage media).
Storage devices are typically visualized as a "storage pyramid" (see Fig. 8.2) [Hel07].

Fig. 8.2 The storage pyramid and its role in NoSQL systems

The huge variety of cost and performance characteristics of RAM, SSD, and HDD storage and the different strategies to leverage their strengths (storage
management) is one reason for the diversity of NoSQL databases. Storage management has a spatial dimension (where to store data) and a temporal dimension (when
to store data). Update-in-place and append-only I/O are two complementary spatial
techniques of organizing data; in-memory prescribes RAM as the location of data,
whereas logging is a temporal technique that decouples main memory and persistent
storage and thus provides control over when data is actually persisted. Besides the
major storage media, there is also a set of transparent caches (e.g., L1-L3 CPU caches and disk buffers, not shown in the figure) that are only implicitly leveraged
through well-engineered database algorithms that promote data locality.
Stonebraker et al. [Sto+07] have found that in typical RDBMSs, only 6.8% of
the execution time is spent on “useful work”, while the rest is spent on:
• buffer management (34.6%), i.e., caching to mitigate slower disk access
• latching (14.2%), to protect shared data structures from race conditions caused
by multi-threading
• locking (16.3%), to guarantee logical isolation of transactions
• logging (11.9%), to ensure durability in the face of failures
• hand-coded optimizations (16.2%)
This motivates that large performance improvements can be expected if RAM is
used as primary storage (cf. in-memory databases [Zha+15a]). The downsides are high storage costs and a lack of durability—a small power outage can destroy the
database state. This can be solved in two ways: the state can be replicated over n
in-memory server nodes protecting against n − 1 single-node failures (e.g., HStore,
VoltDB [Kal+08, SW13]) or by logging to durable storage (e.g., Redis or SAP Hana
[Car13, Pla13]). Through logging, a random write access pattern can be transformed
to a sequential one comprised of received operations and their associated properties
(e.g., redo information). In most NoSQL systems, the commit rule for logging is
respected, which demands every write operation that is confirmed as successful to be
logged and the log to be flushed to persistent storage. In order to avoid the rotational
latency of HDDs incurred by logging each operation individually, log flushes can be
batched together (group commit) which slightly increases the latency of individual
writes, but drastically improves overall throughput.
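A hedged sketch of group commit (the file name, batch size, and flush policy are illustrative; real systems typically also flush on a timer):

```python
import os
import threading

class GroupCommitLog:
    """Sketch of group commit: instead of syncing the write-ahead log once
    per operation, pending entries are flushed to disk in one batch, so a
    single expensive device sync is amortized over many writes."""

    def __init__(self, path, batch_size=64):
        self.file = open(path, "ab")       # append-only, sequential I/O
        self.batch_size = batch_size
        self.buffer = []
        self.lock = threading.Lock()

    def append(self, record: bytes):
        with self.lock:
            self.buffer.append(record)
            if len(self.buffer) >= self.batch_size:
                self._flush_locked()

    def _flush_locked(self):
        self.file.write(b"".join(r + b"\n" for r in self.buffer))
        self.file.flush()
        os.fsync(self.file.fileno())       # one sync for the whole batch
        self.buffer.clear()

log = GroupCommitLog("wal.log", batch_size=4)
for i in range(4):
    log.append(f"op-{i}".encode())         # the 4th append triggers one flush
```

Each individual write waits slightly longer for its batch to fill, but the log absorbs far more operations per second than one fsync per write would allow.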
SSDs and more generally all storage devices based on NAND flash memory
differ substantially from HDDs in various aspects: “(1) asymmetric speed of read
and write operations, (2) no in-place overwrite—the whole block must be erased
before overwriting any page in that block, and (3) limited program/erase cycles”
[MKC+12]. Thus, a database system’s storage management must not treat SSDs
and HDDs as slightly slower, persistent RAM, since random writes to an SSD are
roughly an order of magnitude slower than sequential writes. Random reads, on the
other hand, can be performed without any performance penalties. There are some
database systems (e.g., Oracle Exadata, Aerospike) that are explicitly engineered for
these performance characteristics of SSDs. In HDDs, both random reads and writes
are 10–100 times slower than sequential access. Logging hence suits the strengths
of SSDs and HDDs which both offer a significantly higher throughput for sequential
writes.
For in-memory databases, an update-in-place access pattern is ideal: it simplifies the implementation, and random writes to RAM are essentially as fast as sequential ones, with small differences being hidden by pipelining and the
CPU-cache hierarchy. However, RDBMSs and many NoSQL systems employ an
update-in-place update pattern for persistent storage, too. To mitigate the slow
random access to persistent storage, main memory is usually used as a cache and
complemented by logging to guarantee durability. In RDBMSs, this is achieved
through a complex buffer pool which not only employs cache-replace algorithms
appropriate for typical SQL-based access patterns, but also ensures ACID semantics.
NoSQL databases have simpler buffer pools that profit from simpler queries and
the lack of ACID transactions. The alternative to the buffer pool model is to leave
caching to the OS through virtual memory (e.g., employed in MongoDB’s MMAP
storage engine). This simplifies the database architecture, but has the downside of
giving less control over which data items or pages reside in memory and when they
get evicted. Also read-ahead (speculative reads) and write-behind (write buffering)
transparently performed by the operating system lack sophistication as they are
based on file system logics instead of database queries.
Append-only storage (also referred to as log-structuring) tries to maximize
throughput by writing sequentially. Although log-structured file systems have a long
research history, append-only I/O has only recently been popularized for databases
by BigTable’s use of Log-Structured Merge (LSM) trees [Cha+08] consisting of
an in-memory cache, a persistent log, and immutable, periodically written storage
files. LSM trees and variants like Sorted Array Merge Trees (SAMT) and Cache-Oblivious Look-ahead Arrays (COLA) have been applied in many NoSQL systems
(e.g., Cassandra, CouchDB, LevelDB, Bitcask, RethinkDB, WiredTiger, RocksDB,
InfluxDB, TokuDB) [Kle17]. Designing a database to achieve maximum write performance by always writing to a log is rather simple; the difficulty lies in
providing fast random and sequential reads. This requires an appropriate index
structure that is either actively maintained as a copy-on-write (COW) data structure
(e.g., CouchDB’s COW B-trees) or only periodically persisted as an immutable data
structure (e.g., in BigTable-style systems). An issue of all log-structured storage
approaches is costly garbage collection (compaction) to reclaim space of updated or
deleted items.
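The append-only read and write paths described above can be condensed into a toy LSM sketch (no persistence and no compaction; the data structures and limits are illustrative):

```python
import bisect

class TinyLSM:
    """Minimal LSM-tree sketch: writes go to an in-memory memtable; when
    full, it is frozen into an immutable, sorted run (an 'SSTable'). Reads
    check the memtable first, then the runs from newest to oldest.
    Compaction (merging runs, dropping overwritten values) is omitted."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []                    # newest run last
        self.limit = memtable_limit

    def put(self, key, value):            # always an in-memory write
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flushing a sorted run is a purely sequential disk write.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):   # newest version wins
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None

db = TinyLSM()
db.put("a", 1); db.put("b", 2)            # memtable flushed to a run
db.put("a", 3)                            # newer version of "a" in memtable
print(db.get("a"), db.get("b"))           # 3 2
```

The read path makes the trade-off visible: a get may have to consult several runs, which is exactly why real systems add Bloom filters, block caches, and compaction.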
In virtualized environments like Infrastructure-as-a-Service clouds, many of the
discussed characteristics of the underlying storage layer are hidden. In the future,
the availability of storage class memory combining speed of main memory with
persistence will also require novel approaches for storage management [Nan+16].
8.4 Query Processing
The querying capabilities of a NoSQL database mainly follow from its distribution
model, consistency guarantees, and data model. Primary key lookup, i.e., retrieving data items by a unique ID, is supported by every NoSQL system, since it is
compatible with range- as well as hash-partitioning. Filter queries return all items
(or projections) that meet a predicate specified over the properties of data items
from a single table. In their simplest form, they can be performed as filtered full-table scans. For hash-partitioned databases, this implies a scatter-gather pattern
where each partition performs the predicated scan and results are merged. For range-partitioned systems, any conditions on the range attribute can be exploited to select
partitions.
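The scatter-gather pattern for filter queries can be sketched as follows (the data, the partitioning, and the use of threads as stand-ins for partition servers are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical hash-partitioned table: each inner list is one partition.
partitions = [
    [{"name": "alice", "age": 34}, {"name": "carol", "age": 29}],
    [{"name": "bob", "age": 41}],
    [{"name": "dave", "age": 34}],
]

def filter_query(predicate):
    # Scatter: every partition evaluates the predicate locally ...
    with ThreadPoolExecutor() as pool:
        scans = pool.map(lambda p: [r for r in p if predicate(r)], partitions)
    # ... gather: the coordinator merges the partial results.
    return [r for part in scans for r in part]

print(filter_query(lambda r: r["age"] == 34))
# [{'name': 'alice', 'age': 34}, {'name': 'dave', 'age': 34}]
```

Every partition does work regardless of selectivity, which is why the O(n) cost of this pattern motivates the secondary indexes discussed next.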
To circumvent the inefficiencies of O(n) scans, secondary indexes can be
employed. These can either be local secondary indexes that are managed in each
partition or global secondary indexes that index data over all partitions [Bak+11].
As the global index itself has to be distributed over partitions, consistent secondary
index maintenance would necessitate slow and potentially unavailable commit
protocols. Therefore, in practice, most systems only offer eventual consistency for
these indexes (e.g., Megastore, Google AppEngine Datastore, DynamoDB) or do
not support them at all (e.g., HBase, Azure Tables). When executing global queries
over local secondary indexes, the query can only be targeted to a subset of partitions,
if the query predicate and the partitioning rules intersect. Otherwise, results have
to be assembled through scatter-gather. For example, a user table with range-partitioning over an age field can service queries that have an equality condition
on age from one partition, whereas queries over names need to be evaluated at
each partition. A special case of global secondary indexing is full-text search, where
selected fields or complete data items are fed into either a database-internal inverted
index (e.g., MongoDB) or to an external search platform such as ElasticSearch or
Solr (Riak Search, DataStax Cassandra).
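The partition-pruning rule from the example above (range partitioning on an age field) can be sketched like this (the boundaries are illustrative):

```python
import bisect

AGE_BOUNDARIES = [30, 60]      # partitions: [0, 30), [30, 60), [60, inf)
ALL_PARTITIONS = [0, 1, 2]

def partitions_for_query(condition: dict) -> list[int]:
    if "age" in condition:
        # Predicate intersects the partitioning rule: route to one partition.
        return [bisect.bisect_right(AGE_BOUNDARIES, condition["age"])]
    # E.g., a predicate on name: no pruning possible, scatter to all.
    return ALL_PARTITIONS

print(partitions_for_query({"age": 34}))       # [1]  (single partition)
print(partitions_for_query({"name": "bob"}))   # [0, 1, 2]  (scatter-gather)
```

The same routing decision applies to local secondary indexes: they only avoid scatter-gather when the query predicate and the partitioning rules intersect.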
Query planning is the task of optimizing a query plan to minimize execution
costs [Hel07]. For aggregations and joins, query planning is essential as these
queries are very inefficient and hard to implement in application code. The wealth of
literature and results on relational query processing is largely disregarded in current
NoSQL systems for two reasons. First, the key-value and wide-column model are
centered around CRUD and scan operations on primary keys which leave little
room for query optimization. Second, most work on distributed query processing
focuses on OLAP (online analytical processing) workloads that favor throughput
over latency whereas single-node query optimization is not easily applicable for
partitioned and replicated databases [Kos00, ESW78, ÖV11]. However, it remains
an open research challenge to generalize the large body of applicable query
optimization techniques, especially in the context of document databases.2
In-database analytics can be performed either natively (e.g., in MongoDB,
Riak, CouchDB) or through external analytics platforms such as Hadoop, Spark
and Flink (e.g., in Cassandra and HBase). The prevalent native batch analytics
abstraction exposed by NoSQL systems is MapReduce3 [DG04]. Due to I/O,
communication overhead, and limited execution plan optimization, these batch- and
micro-batch-oriented approaches have high response times. Materialized views are
an alternative with lower query response times. They are declared at design time
and continuously updated on change operations (e.g., in CouchDB and Cassandra).
However, similar to global secondary indexing, view consistency is usually relaxed
in favor of fast, highly available writes, when the system is distributed [Lab+09].
As only few database systems come with built-in support for ingesting and
querying unbounded streams of data, near-real-time analytics pipelines commonly
implement either the Lambda Architecture [MW15] or the Kappa Architecture
[Kre14]: the former complements a batch processing framework like Hadoop
MapReduce with a stream processor such as Storm [Boy+14] and the latter
exclusively relies on stream processing and forgoes batch processing altogether.
8.5 Summary: System Studies
To conclude this chapter, we provide a qualitative comparison of a selection of the most prominent key-value, document, and wide-column stores. We present
the results in strongly condensed comparisons and refer to the documentation of the
individual systems and our tutorials [GR15, GR16, GWR17, WGR+18, WGR19]
for in-detail information. The proposed NoSQL Toolbox (see Fig. 8.1, p. 176)
is a means of abstraction that can be used to classify database systems along
three dimensions: functional requirements, non-functional requirements, and the
techniques used to implement them. We argue that this classification characterizes many database systems well and thus can be used to meaningfully contrast different database systems: Figure 8.3 shows a direct comparison of MongoDB, Redis, HBase, Riak, Cassandra, and MySQL in their respective default configurations. A more verbose comparison of central system properties is presented in Table 8.1.

Fig. 8.3 A direct comparison of functional requirements, non-functional requirements, and techniques among MongoDB, Redis, HBase, Riak, Cassandra, and MySQL according to the proposed NoSQL Toolbox

2 Currently, only RethinkDB can perform general θ-joins. MongoDB supports left-outer equi-joins in its aggregation framework, and CouchDB allows joins for pre-declared MapReduce views.
3 An alternative to MapReduce are generalized data processing pipelines, where the database tries to optimize the flow of data and the locality of computation based on a more declarative query language (e.g., MongoDB's aggregation framework [Mon]).
The methodology used to identify the specific system properties consists of an
in-depth analysis of publicly available documentation and literature on the systems
[Mon, CD13, Car13, San17, Hba, Ria, CH16, LM10, Mys]. Furthermore, some
properties had to be evaluated by researching the open-source code bases, personal communication with the developers, as well as a meta-analysis of reports and benchmarks by practitioners.

Table 8.1 A qualitative comparison of MongoDB, HBase, Cassandra, Riak, and Redis

Model | MongoDB: Document | HBase: Wide-column | Cassandra: Wide-column | Riak: Key-value | Redis: Key-value
CAP | MongoDB: CP | HBase: CP | Cassandra: AP | Riak: AP | Redis: CP
Scan performance | MongoDB: High (with appropriate shard key) | HBase: High (only on row key) | Cassandra: High (using compound index) | Riak: N/A | Redis: High (depends on data structure)
Disk latency per get by row key | MongoDB: ~Several disk seeks | HBase: ~Several disk seeks | Cassandra: ~Several disk seeks | Riak: ~One disk seek | Redis: In-memory
Write performance | MongoDB: High (append-only I/O) | HBase: High (append-only I/O) | Cassandra: High (append-only I/O) | Riak: High (append-only I/O) | Redis: Very high, in-memory
Network latency | MongoDB: Configurable: nearest slave, master (read preference) | HBase: Designated region server | Cassandra: Configurable: R replicas contacted | Riak: Configurable: R replicas contacted | Redis: Designated master
Durability | MongoDB: Configurable: none, WAL, replicated (write concern) | HBase: WAL, row-level versioning | Cassandra: WAL, W replicas written | Riak: Configurable: writes, durable writes, W replicas written | Redis: Configurable: none, periodic logging, WAL
Replication | MongoDB: Master-slave, synchronicity configurable | HBase: File-system-level (HDFS) | Cassandra: Consistent hashing | Riak: Consistent hashing | Redis: Asynchronous master-slave
Sharding | MongoDB: Hash- or range-based on attribute(s) | HBase: Range-based (row key) | Cassandra: Consistent hashing | Riak: Consistent hashing | Redis: Only in Redis Cluster: hashing
Consistency | MongoDB: Linearizable (master writes with quorum reads) or eventual (else) | HBase: Linearizable | Cassandra: Eventual, optional linearizable updates (lightweight transactions) | Riak: Eventual, client-side conflict resolution | Redis: Master reads: linearizable, slave reads: eventual
Atomicity | MongoDB: Single document | HBase: Single row, or explicit locking | Cassandra: Single column (multi-column updates may cause dirty writes) | Riak: Single key/value pair | Redis: Optimistic multi-key transactions, atomic Lua scripts
Conditional updates | MongoDB: Yes (mastered) | HBase: Yes (mastered) | Cassandra: Yes (Paxos-coordinated) | Riak: No | Redis: Yes (mastered)
Interface | MongoDB: Binary TCP | HBase: Thrift | Cassandra: Thrift or TCP/CQL | Riak: REST or TCP/Protobuf | Redis: TCP/Plain-Text
Special data types | MongoDB: Objects, arrays, sets, counters, files | HBase: Counters | Cassandra: Counters | Riak: CRDTs for counters, flags, registers, maps | Redis: Sets, hashes, counters, sorted sets, lists, HyperLogLogs, bit vectors
Queries | MongoDB: Query by example (filter, sort, project), range queries, MapReduce, aggregation, limited joins | HBase: Get by row key, scans over row key ranges, project CFs/columns | Cassandra: Get by partition key and filter/sort over cluster key, FT-search | Riak: Get by ID or local secondary index, materialized views, MapReduce, FT-search | Redis: Data structure operations
Secondary indexing | MongoDB: Hash, B-Tree, geospatial indexes | HBase: None | Cassandra: Local sorted index, global secondary hash index, search index (Solr) | Riak: Local secondary indexes, search index (Solr) | Redis: Not explicit
License | MongoDB: GPL 3.0 | HBase: Apache 2 | Cassandra: Apache 2 | Riak: Apache 2 | Redis: BSD
The comparison elucidates how SQL and NoSQL databases are designed to
fulfill very different needs: RDBMSs provide a broad set of functionalities whereas
NoSQL databases excel on the non-functional side through scalability, availability,
low latency, and high throughput. However, there are also large differences among
the NoSQL databases. Riak and Cassandra, for example, can be configured to
fulfill many non-functional requirements, but are only eventually consistent and
do not feature many functional capabilities apart from data analytics and, in case
of Cassandra, conditional updates. MongoDB and HBase, on the other hand, offer
stronger consistency and more sophisticated functional capabilities such as scan
queries and—only in MongoDB—filter queries, but do not maintain read and
write availability during partitions and tend to display higher read latencies. As
the only non-partitioned system in this comparison apart from MySQL, Redis
shows a special set of trade-offs centered around the ability to maintain extremely
high throughput at low latency using in-memory data structures and asynchronous
master-slave replication.
This diversity illustrates that for enabling low latency cloud data management,
no single database technology can cover all use cases. Therefore, latency reductions
have to operate across different database systems and requirements.
References
[Bak+11] J. Baker et al. “Megastore: Providing scalable, highly available storage for
interactive services”. In: Proc. of CIDR. Vol. 11. 2011, pp. 223–234.
[Ber+11] Philip A. Bernstein et al. "Adapting Microsoft SQL server for cloud computing". In: Data Engineering (ICDE), 2011 IEEE 27th International Conference on. IEEE, 2011, pp. 1255–1263. URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5767935 (visited on 05/05/2014).
[BK14] Peter Bailis and Kyle Kingsbury. “The network is reliable”. In: Queue 12.7
(2014), p. 20. URL: http://dl.acm.org/citation.cfm?id=2655736 (visited on
01/03/2015).
[Boy+14] Oscar Boykin et al. “Summingbird: A Framework for Integrating Batch and
Online MapReduce Computations”. In: VLDB 7.13 (2014).
[Cal+11] Brad Calder et al. "Windows Azure Storage: a highly available cloud storage service with strong consistency". In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. ACM, 2011, pp. 143–157. URL: http://dl.acm.org/citation.cfm?id=2043571 (visited on 04/16/2014).
[Car13] Josiah L. Carlson. Redis in Action. Greenwich, CT, USA: Manning Publications
Co., 2013. ISBN: 1617290858, 9781617290855.
[CBPS10] Bernadette Charron-Bost, Fernando Pedone, and André Schiper, eds. Replication: Theory and Practice. Vol. 5959. Lecture Notes in Computer Science.
Springer, 2010.
[CD13] Kristina Chodorow and Michael Dirolf. MongoDB - The Definitive Guide.
O’Reilly, 2013. ISBN: 978-1-449-38156-1. URL: http://www.oreilly.de/catalog/
9781449381561/index.html.
188
8 The NoSQL Toolbox: The NoSQL Landscape in a Nutshell
[CH16] Jeff Carpenter and Eben Hewitt. Cassandra: The Definitive Guide. “O’Reilly
Media, Inc.”, 2016.
[Cha+08] Fay Chang et al. “Bigtable: A distributed storage system for structured data”.
In: ACM Transactions on Computer Systems (TOCS) 26.2 (2008), p. 4.
[Coo+08] B. F. Cooper et al. “PNUTS: Yahoo!’s hosted data serving platform”. In: PVLDB
1.2 (2008), pp. 1277–1288. URL: http://dl.acm.org/citation.cfm?id=1454167
(visited on 09/12/2012).
[Cor+13] James C. Corbett et al. “Spanner: Google’s Globally Distributed Database”. In:
ACM Trans. Comput. Syst. 31.3 (2013), 8:1–8:22. DOI: 10.1145/2491245.
[Cur+11a] Carlo Curino et al. “Relational Cloud: A Database-as-a-Service for the Cloud”.
In: Proc. of CIDR. 2011. URL: http://dspace.mit.edu/handle/1721.1/62241 (visited on 04/15/2014).
[DAEA10] Sudipto Das, Divyakant Agrawal, and Amr El Abbadi. “G-store: a scalable data
store for transactional multi key access in the cloud”. In: Proceedings of the 1st
ACM symposium on Cloud computing. ACM. 2010, pp. 163–174.
[Dea09] Jeff Dean. Designs, lessons and advice from building large distributed systems.
Keynote talk at LADIS 2009. 2009.
[DeC+07] G. DeCandia et al. “Dynamo: amazon’s highly available key-value store”. In:
ACM SOSP. Vol. 14. 17. ACM. 2007, pp. 205–220. URL: http://dl.acm.org/
citation.cfm?id=1294281 (visited on 09/12/2012).
[DG04] Jeffrey Dean and Sanjay Ghemawat. “MapReduce: Simplified Data Processing
on Large Clusters”. In: Proceedings of the 6th Conference on Symposium
on Operating Systems Design & Implementation - Volume 6. OSDI’04. San
Francisco, CA: USENIX Association, 2004, pp. 10–10. URL: http://dl.acm.org/
citation.cfm?id=1251254.1251264.
[ESW78] Robert S. Epstein, Michael Stonebraker, and Eugene Wong. “Distributed Query
Processing in a Relational Data Base System”. In: Proceedings of the 1978 ACM
SIGMOD International Conference on Management of Data, Austin, Texas,
USA, May 31 - June 2, 1978. Ed. by Eugene I. Lowenthal and Nell B. Dale.
ACM, 1978, pp. 169–180. DOI: 10.1145/509252.509292.
[GHa+96] Jim Gray, Pat Helland, et al. "The dangers of replication and a solution". In: SIGMOD Rec. 25.2 (June 1996), pp. 173–182.
[GR15] Felix Gessert and Norbert Ritter. "Skalierbare NoSQL- und Cloud-Datenbanken in Forschung und Praxis". In: Datenbanksysteme für Business, Technologie und Web (BTW 2015) - Workshopband, 2.-3. März 2015, Hamburg, Germany. 2015, pp. 271–274.
[GR16] Felix Gessert and Norbert Ritter. “Scalable Data Management: NoSQL Data
Stores in Research and Practice”. In: 32nd IEEE International Conference on
Data Engineering, ICDE 2016. 2016.
[GWR17] Felix Gessert, Wolfram Wingerath, and Norbert Ritter. “Scalable Data Management: An In-Depth Tutorial on NoSQL Data Stores”. In: BTW (Workshops).
Vol. P-266. LNI. GI, 2017, pp. 399–402.
[Ham07] James Hamilton. “On designing and deploying internet-scale services”. In: 21st
LISA. USENIX Association, 2007.
[HB09] Urs Hoelzle and Luiz Andre Barroso. The Datacenter As a Computer: An
Introduction to the Design of Warehouse-Scale Machines. Morgan and Claypool
Publishers, 2009.
[Hba] HBase. http://hbase.apache.org/. (Accessed on 05/25/2017). 2017.
[Hel07] Joseph Hellerstein. "Architecture of a Database System". In: Foundations
and Trends in Databases 1.2 (Nov. 2007), pp. 141–259. ISSN: 1931-7883,
1931-7891. DOI: 10.1561/1900000002. URL: http://www.nowpublishers.com/
product.aspx?product=DBS&doi=1900000002 (visited on 01/03/2015).
References
189
[Kal+08] R. Kallman et al. “H-store: a high-performance, distributed main memory
transaction processing system”. In: Proceedings of the VLDB Endowment 1.2
(2008), pp. 1496–1499.
[Kar+97] David R. Karger et al. “Consistent hashing and random trees: distributed
caching protocols for relieving hot spots on the World Wide Web”. In:
ACM Symposium on Theory of Computing. 1997, pp. 654–663. DOI:
10.1145/258533.258660.
Chapter 9
Summary and Future Trends
In this book, we highlighted core performance challenges across the web application stack: performance depends on frontend rendering, on networking and caching infrastructures, and on data storage and business logic in the backend.
First, we discussed the requirements of web applications that include high
availability, elastic scalability, quick page loads, engaging user experience, and a fast
time-to-market. We showed that these requirements are difficult to achieve in cloud-based two- and three-tier architectures and that latency poses a pivotal challenge
in heterogeneous cloud environments. Since the conflict between latency and
correctness becomes clearly evident in NoSQL database systems and their various
levels of relaxed consistency guarantees, we discussed in detail how the combination
of data storage systems in polyglot persistence architectures complicates data
management. We further explored how latency becomes a critical problem in the
context of distributed transactions as it is directly related to abort rates and thus
can be the limiting factor for transaction throughput in distributed settings. Since
traditional database systems lack Database- and Backend-as-a-Service (DBaaS/BaaS) interfaces
for direct access from other cloud services or client devices and can therefore not be
employed efficiently in web applications, we covered today’s DBaaS and BaaS
systems in the later chapters. We then turned to the underlying database systems
themselves and classified the different technological means for addressing varying
functional and non-functional requirements in order to facilitate an informed choice
regarding the right technology for a given set of requirements. To conclude this
book, we now provide a concise discussion of how to find the right database
system for a given application scenario.
9.1 From Abstract Requirements to Concrete Systems
Choosing a database system always means choosing one set of desirable properties
over another. To break down the complexity of this choice, we present a binary
decision tree in Fig. 9.1 that maps trade-off decisions to example applications and
potentially suitable database systems. The leaf nodes cover applications ranging
from simple caching (left) to Big Data analytics (right). Naturally, this view on
the problem space is not complete, but it provides a first orientation towards a
solution for a particular data management problem.
The first split in the tree is along the access pattern of applications: they either
rely on fast lookups only (left half) or require more complex querying capabilities
(right half). The fast-lookup applications can be distinguished further by the data
volume they process: if the main memory of a single machine can hold all the
data, a single-node system like Redis or Memcache is probably the best choice,
depending on whether functionality (Redis) or simplicity (Memcache) is favored.
If the data volume is or might grow beyond RAM capacity or is even unbounded,
a multi-node system that scales horizontally might be more appropriate. The most
important decision in this case is whether to favor availability (AP) or consistency
(CP) as described by the CAP theorem. Systems like Cassandra and Riak can deliver
an always-on experience, while systems like HBase, MongoDB, and DynamoDB
deliver strong consistency.
The right half of the tree covers applications requiring more complex queries than
simple lookups. Here, too, we first distinguish the systems by the data volume they
have to handle, according to whether single-node systems are feasible (HDD-size) or
distribution is required (unbounded volume).

Fig. 9.1 A decision tree for mapping requirements to (NoSQL) database system candidates

For common OLTP (online transaction
processing) workloads on moderately large data volumes, traditional RDBMSs or
graph databases like Neo4J are optimal, because they offer ACID semantics. If,
however, availability is essential, distributed systems like MongoDB, CouchDB, or
DocumentDB are preferable.
If data volume exceeds the limits of a single machine, the choice depends on the
prevalent query pattern: when complex queries have to be optimized for latency, as
for example in social networking applications, MongoDB is very attractive, because
it facilitates expressive ad-hoc queries. HBase and Cassandra are also useful in such
a scenario but excel at throughput-optimized Big Data analytics when combined
with Hadoop.
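The branching logic described above can be sketched as a small lookup function. The code below is our own simplified rendering of the decision tree in Fig. 9.1; the function name, parameter names, and candidate lists are illustrative assumptions drawn from this section, not an exhaustive or authoritative taxonomy:

```python
def candidate_systems(fast_lookup, fits_in_ram=False, fits_on_hdd=False,
                      favor_availability=False, latency_critical=False):
    """Map the trade-off decisions of Fig. 9.1 to example system candidates.

    Each branch corresponds to one split discussed in Sect. 9.1; the
    returned lists name example systems, not definitive recommendations.
    """
    if fast_lookup:
        if fits_in_ram:
            # Single-node in-memory stores: functionality (Redis)
            # vs. simplicity (Memcache).
            return ["Redis", "Memcache"]
        # Horizontally scaling stores: availability (AP) vs. consistency (CP).
        if favor_availability:
            return ["Cassandra", "Riak"]
        return ["HBase", "MongoDB", "DynamoDB"]
    # Complex-query branch: single node feasible (HDD-size) or distributed?
    if fits_on_hdd:
        if favor_availability:
            return ["MongoDB", "CouchDB", "DocumentDB"]
        # Classic OLTP with ACID semantics on a single node.
        return ["RDBMS", "Neo4J"]
    # Distributed complex queries: latency- vs. throughput-optimized.
    if latency_critical:
        return ["MongoDB"]  # expressive ad-hoc queries
    return ["HBase", "Cassandra"]  # Big Data analytics with Hadoop
```

For instance, a latency-sensitive social networking workload on an unbounded data volume maps to `candidate_systems(fast_lookup=False, latency_critical=True)`, which yields `["MongoDB"]`.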
In summary, we are convinced that the proposed top-down model provides effective
decision support for filtering the vast number of NoSQL database systems based on
central requirements. The NoSQL Toolbox furthermore provides a mapping from
functional and non-functional requirements to common implementation techniques
in order to categorize the constantly evolving NoSQL space. In the following, we
will conceive a DBaaS/BaaS middleware architecture that is designed to cover as
large a subset of the decision tree as possible within a coherent REST/HTTP API.
9.2 Future Prospects
Today, the landscape of cloud data management is still undergoing massive changes,
and the coming years will decide on central paradigm shifts.
One of the pivotal questions is whether the trend towards a fragmented and
highly specialized ecosystem of database systems continues (polyglot persistence)
or whether middlewares will become capable of abstracting away database systems
(polystores). Potentially, even a new generation of one-size-fits-all databases could
consolidate the recent advances in a single system: similar to programming languages
and operating systems, the current heterogeneity of implementations might
soon be replaced by the prevalence of a small number of core systems. As the
trend towards novel machine learning techniques continues, these algorithms could
quickly become first-class citizens in query languages and database interfaces. For
cloud data management, it remains to be seen whether proprietary systems by large
cloud vendors (e.g., Google’s Spanner) have an inherent advantage in economies of
scale that allows them to outperform even the best on-premise database systems.
This book has aimed to bring structure to the large number of systems and approaches
in modern cloud data management. However, standardization (e.g., of query languages
for NoSQL data models) as well as comprehensive taxonomies (e.g., for SLAs
or consistency models) still need to be addressed by cloud data management research
in the coming years. We sincerely believe that these are the most exciting days to
engage in cloud data management, as the progress in both research and commercial
products may heavily influence computer science research as a whole for decades.