Scalability, Availability & Stability Patterns
twitter: @jboner email: jonas.boner@jayway.com
Jonas Bonér
Jayway Stockholm
Outline
Introduction
Scalability Patterns
Managing Overload
Scale up vs Scale out?
General recommendations
Immutability as the default
Referential Transparency (FP)
Laziness
Think about your data: different data need different guarantees
Scalability Trade-offs
Trade-offs
Performance vs Scalability
Latency vs Throughput
Availability vs Consistency
Performance vs Scalability
How do I know if I have a performance problem?
If your system is slow for a single user
How do I know if I have a scalability problem?
If your system is fast for a single user but slow under heavy load
Latency vs Throughput
You should strive for maximal throughput with acceptable latency
Availability vs Consistency
Brewer's CAP theorem
You can only pick two of:
  Consistency
  Availability
  Partition tolerance
at a given point in time
Centralized system
In a centralized system (RDBMS etc.) we don't have network partitions, i.e. no P in CAP
So you get both:
  Availability
  Consistency
Atomic, Consistent, Isolated, Durable (ACID)
Distributed system
In a distributed system we (will) have network partitions, i.e. the P in CAP
So you get to pick only one of:
  Availability
  Consistency
CAP in practice:
...there are only two types of systems:
  1. CA == CP (they are equivalent)
  2. AP
...there is only one choice to make: in case of a network partition, what do you sacrifice?
  1. C: Consistency
  2. A: Availability
Basically Available, Soft state, Eventually consistent (BASE)
Eventual Consistency
...is an interesting trade-off
But let's get back to that later
Availability Patterns
Availability Patterns
Fail-over
Replication
  Master-Slave
  Tree replication
  Master-Master
  Buddy Replication
What do we mean by Availability?
Fail-over
Copyright Michael Nygard
But fail-over is not always this simple
Copyright Michael Nygard
Fail-back
Copyright Michael Nygard
Network fail-over
Replication
Active replication - Push
Passive replication - Pull
  Data not available? Read from peer, then store it locally
  Works well with timeout-based caches
Master-Slave replication
Tree replication
Master-Master replication
Buddy replication
Master-Slave Replication
Tree Replication
Master-Master Replication
Buddy Replication
Scalability Patterns: State
Partitioning
HTTP Caching
RDBMS Sharding
NOSQL
Distributed Caching
Data Grids
Concurrency
Partitioning
HTTP Caching
Reverse Proxy
Varnish, Squid, rack-cache, Pound, Nginx, Apache mod_proxy
CDN, Akamai
Generate Static Content
Precompute content:
  Homegrown + cron or Quartz
  Spring Batch
  Gearman
  Hadoop
  Google Data Protocol
  Amazon Elastic MapReduce
HTTP Caching
First request
Copyright Ryan Tomayko
Subsequent request
Copyright Ryan Tomayko
Service of Record (SoR)
Relational Databases (RDBMS)
NOSQL Databases
How to scale out RDBMS?
Sharding:
  Partitioning
  Replication

Sharding: Partitioning
Sharding: Replication
ORM + rich domain model anti-pattern
Attempt: read an object from the DB
Result: you sit with your whole database in your lap
Think about your data
Think again:
  When do you need ACID?
  When is Eventually Consistent a better fit?
  Different kinds of data have different needs
When is a RDBMS good enough?
Scaling reads to a RDBMS is hard
Scaling writes to a RDBMS is impossible
Do we really need a RDBMS?
Sometimes...
But many times we don't
NOSQL (Not Only SQL)
NOSQL
Key-Value databases
Column databases
Document databases
Graph databases
Datastructure databases
Who's ACID?
  Relational DBs (MySQL, Oracle, Postgres)
  Object DBs (Gemstone, db4o)
  Clustering products (Coherence, Terracotta)
  Most caching products (ehcache)
Who's BASE?
Distributed databases:
  Cassandra
  Riak
  Voldemort
  Dynomite
  SimpleDB
  etc.
NOSQL in the wild
Google: Bigtable
Amazon: Dynamo
Amazon: SimpleDB
Yahoo: HBase
Microsoft: Dynomite
Facebook: Cassandra
LinkedIn: Voldemort
But first some background...
Chord & Pastry
Distributed Hash Tables (DHT)
Scalable, partitioned, fault-tolerant, decentralized, peer-to-peer
Popularized:
  Node ring
  Consistent Hashing
Node ring with Consistent Hashing
Find data in log(N) jumps
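A minimal single-process sketch of consistent hashing on a node ring, in Java (illustrative, not from the talk; it shows ring placement and clockwise lookup, not Chord's finger-table routing):

import java.util.SortedMap;
import java.util.TreeMap;

// Toy consistent-hash ring: nodes and keys hash onto the same ring;
// a key belongs to the first node clockwise from its hash position.
class ConsistentHashRing {
    private final SortedMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodesPerNode;

    ConsistentHashRing(int virtualNodesPerNode) {
        this.virtualNodesPerNode = virtualNodesPerNode;
    }

    void addNode(String node) {
        // virtual nodes smooth the key distribution when nodes join/leave
        for (int i = 0; i < virtualNodesPerNode; i++)
            ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodesPerNode; i++)
            ring.remove(hash(node + "#" + i));
    }

    String nodeFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return ring.get(tail.isEmpty() ? ring.firstKey() : tail.firstKey());
    }

    private int hash(String s) {
        return s.hashCode() & 0x7fffffff; // toy hash; real systems use MD5/murmur
    }
}

The payoff: when a node joins or leaves, only the keys between it and its predecessor on the ring move.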
Bigtable
How can we build a DB on top of Google File System?
Paper: Bigtable: A distributed storage system for structured data, 2006
Rich data-model, structured storage
Clones:
  HBase
  Hypertable
  Neptune
Dynamo
How can we build a distributed hash table for the data center?
Paper: Dynamo: Amazon's highly available key-value store, 2007
Focus: partitioning, replication and availability
Eventually Consistent
Clones:
  Voldemort
  Dynomite
Types of NOSQL stores
Key-Value databases (Voldemort, Dynomite)
Column databases (Cassandra, Vertica)
Document databases (MongoDB, CouchDB)
Graph databases (Neo4J, AllegroGraph)
Datastructure databases (Redis, Hazelcast)
Distributed Caching
Write-through
Write-behind
Eviction Policies
Replication
Peer-To-Peer (P2P)
Write-through
Write-behind
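A minimal sketch contrasting the two write strategies (illustrative; Store is a hypothetical backing-store interface, not a product API):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class Cache {
    interface Store { void write(String key, String value); } // hypothetical backend

    private final ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final Store store;

    Cache(Store store) { this.store = store; }

    // Write-through: update cache and backing store synchronously, as one operation
    void putThrough(String key, String value) {
        map.put(key, value);
        store.write(key, value);
    }

    // Write-behind: update the cache now, flush to the backing store asynchronously
    void putBehind(String key, String value) {
        map.put(key, value);
        writer.execute(() -> store.write(key, value));
    }

    String get(String key) { return map.get(key); }
}

Write-through keeps cache and store consistent at the cost of write latency; write-behind hides store latency but can lose buffered writes on a crash.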
Eviction policies
TTL (time to live)
Bounded FIFO (first in first out)
Bounded LIFO (last in first out)
Explicit cache invalidation
Peer-To-Peer
Decentralized
No special or blessed nodes
Nodes can join and leave as they please
Distributed Caching
Products
EHCache
JBoss Cache
OSCache
memcached
memcached
Very fast
Simple
Key-Value (string -> binary)
Clients for most languages
Distributed
Not replicated, so 1/N chance of local access in a cluster
Data Grids / Clustering
Parallel data storage:
  Data replication
  Data partitioning
  Continuous availability
  Data invalidation
  Fail-over
  C + A in CAP
Data Grids/Clustering
Products
Coherence
Terracotta
GigaSpaces
GemStone
Hazelcast
Infinispan
Concurrency
Shared-State Concurrency
Message-Passing Concurrency
Dataflow Concurrency
Software Transactional Memory
Shared-State Concurrency
Everyone can access anything anytime
Totally indeterministic
Introduce determinism at well-defined places...
...using locks
Problems with locks:
  Locks do not compose
  Taking too few locks
  Taking too many locks
  Taking the wrong locks
  Taking locks in the wrong order
  Error recovery is hard
Please use java.util.concurrent.* (small example below):
  ConcurrentHashMap
  BlockingQueue
  ConcurrentQueue
  ExecutorService
  ReentrantReadWriteLock
  CountDownLatch
  ParallelArray
  and much much more...
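A small example of letting java.util.concurrent do the locking for you (a sketch in modern Java; the scenario is made up):

import java.util.concurrent.*;

public class HitCounter {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, Integer> hits = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 1000; i++) {
            // atomic read-modify-write; no explicit lock anywhere
            pool.execute(() -> hits.merge("/index.html", 1, Integer::sum));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(hits.get("/index.html")); // 1000
    }
}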
Message-Passing Concurrency
Actors
Originates in a 1973 paper by Carl Hewitt
Implemented in Erlang, Occam, Oz
Encapsulates state and behavior
Closer to the definition of OO than classes
Share NOTHING
Isolated, lightweight processes
Communicate through messages
Asynchronous and non-blocking
No shared state; hence, nothing to synchronize
Each actor has a mailbox (message queue) - see the sketch below
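A minimal mailbox-based actor in plain Java, to make the idea concrete (an illustrative sketch, not Akka's API):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class CounterActor implements Runnable {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private int count = 0; // state touched only by the actor's own thread: nothing to synchronize

    void send(String msg) { mailbox.offer(msg); } // asynchronous, non-blocking send

    @Override public void run() {
        try {
            while (true) {
                String msg = mailbox.take(); // process one message at a time
                if (msg.equals("stop")) return;
                if (msg.equals("inc")) count++;
                if (msg.equals("get")) System.out.println("count = " + count);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

// Usage:
// CounterActor a = new CounterActor();
// new Thread(a).start();
// a.send("inc"); a.send("get"); a.send("stop");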
Easier to reason about
Raised abstraction level
Easier to avoid:
  Race conditions
  Deadlocks
  Starvation
  Livelocks
Actor libs for the JVM
Akka (Java/Scala)
scalaz actors (Scala)
Lift Actors (Scala)
Scala Actors (Scala)
Kilim (Java)
Jetlang (Java)
Actors Guild (Java)
Actorom (Java)
FunctionalJava (Java)
GPars (Groovy)
Dataflow Concurrency
Declarative
No observable non-determinism
Data-driven - threads block until data is available
On-demand, lazy
No difference between concurrent & sequential code
Limitations: can't have side-effects
Software Transactional Memory
STM: overview
See the memory (heap and stack) as a transactional dataset
Similar to a database:
  begin
  commit
  abort/rollback
Transactions are retried automatically upon collision
Rolls back the memory on abort
STM: overview
Transactions can nest
Transactions compose (yippee!!)

atomic {
  ...
  atomic {
    ...
  }
}
STM: restrictions
All operations in the scope of a transaction:
  Need to be idempotent
STM libs for the JVM
Akka (Java/Scala)
Multiverse (Java)
Clojure STM (Clojure)
CCSTM (Scala)
Deuce STM (Java)
Scalability Patterns: Behavior
Event-Driven Architecture
Compute Grids
Load-balancing
Parallel Computing
Event-Driven Architecture
"Four years from now, mere mortals will begin to adopt an event-driven architecture (EDA) for the sort of complex event processing that has been attempted only by software gurus [until now]"
-- Roy Schulte (Gartner), 2003
Domain Events
Event Sourcing
Command and Query Responsibility Segregation (CQRS) pattern
Event Stream Processing
Messaging
Enterprise Service Bus
Actors
Enterprise Integration Architecture (EIA)
Domain Events
"It's really become clear to me in the last couple of years that we need a new building block and that is the Domain Events"
-- Eric Evans, 2009
"Domain Events represent the state of entities at a given time when an important event occurred and decouple subsystems with event streams. Domain Events give us clearer, more expressive models in those cases."
-- Eric Evans, 2009
"State transitions are an important part of our problem space and should be modeled within our domain."
-- Greg Young, 2008
Event Sourcing
Every state change is materialized in an Event
All Events are sent to an EventProcessor
EventProcessor stores all events in an Event Log
System can be reset and the Event Log replayed
No need for ORM, just persist the Events
Many different EventListeners can be added to the EventProcessor (or listen directly on the Event Log) - sketch below
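A minimal event-sourcing sketch (illustrative; the class and method names are hypothetical):

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class EventProcessor {
    private final List<Object> eventLog = new ArrayList<>();        // the Event Log
    private final List<Consumer<Object>> listeners = new ArrayList<>();

    void record(Object event) {            // every state change arrives as an Event
        eventLog.add(event);               // persist the Event (here: in memory)
        listeners.forEach(l -> l.accept(event));
    }

    void subscribe(Consumer<Object> listener) { listeners.add(listener); }

    void replay(Consumer<Object> listener) {  // reset the system, replay the log
        eventLog.forEach(listener);
    }
}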
Command and Query Responsibility Segregation (CQRS) pattern
"A single model cannot be appropriate for reporting, searching and transactional behavior."
-- Greg Young, 2008
Bidirectional

Unidirectional
CQRS
in a nutshell
All state changes are represented by Domain Events
Aggregate roots receive Commands and publish Events
Reporting (query database) is updated as a result of the published Events
All Queries from Presentation go directly to Reporting and the Domain is not involved
CQRS
Copyright by Axon Framework
CQRS: Benefits
Fully encapsulated domain that only exposes behavior
Queries do not use the domain model
No object-relational impedance mismatch
Bullet-proof auditing and historical tracing
Easy integration with external systems
Performance and scalability
Event Stream Processing
select * from Withdrawal(amount >= 200).win:length(5)
Event Stream Processing
Products
Esper (Open Source)
StreamBase
RuleCast
Messaging
Publish-Subscribe
Point-to-Point
Store-forward
Request-Reply
Publish-Subscribe
Point-to-Point
Store-Forward
Durability, event log, auditing etc.
Request-Reply
E.g. AMQP's replyTo header
Messaging
Standards:
  AMQP
  JMS
Products:
  RabbitMQ (AMQP)
  ActiveMQ (JMS)
  Tibco
  MQSeries
  etc.
ESB
ESB products
ServiceMix (Open Source)
Mule (Open Source)
Open ESB (Open Source)
Sonic ESB
WebSphere ESB
Oracle ESB
Tibco
BizTalk Server
Actors
Fire-forget: async send
Fire-And-Receive-Eventually: async send + wait on Future for reply
Enterprise Integration Patterns
Apache Camel (example route below):
  More than 80 endpoints
  XML (Spring) DSL
  Scala DSL
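A small Camel route in the Java DSL (a sketch; the endpoint URIs are made up for illustration):

import org.apache.camel.builder.RouteBuilder;

public class OrderRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("file:inbox")           // poll files dropped into inbox/
            .to("jms:queue:orders"); // forward each file as a JMS message
    }
}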
Compute Grids
Parallel execution
Divide and conquer:
  1. Split up job in independent tasks
  2. Execute tasks in parallel
  3. Aggregate and return result
MapReduce - Master/Worker
Parallel execution:
  Automatic provisioning
  Load balancing
  Fail-over
  Topology resolution
Compute Grids
Products
Platform
DataSynapse
Google MapReduce
Hadoop
GigaSpaces
GridGain
Load balancing
Random allocation
Round robin allocation
Weighted allocation
Dynamic load balancing:
  Least connections
  Least server CPU
  etc.
DNS Round Robin (simplest)
  Ask DNS for the IP for a host; get a new IP every time
Reverse Proxy (better)
Hardware Load Balancing
Load balancing products
Reverse Proxies:
  Apache mod_proxy (OSS)
  HAProxy (OSS)
  Squid (OSS)
  Nginx (OSS)
Hardware Load Balancers:
  BIG-IP
  Cisco
Parallel Computing
SPMD Pattern
Master/Worker Pattern
Loop Parallelism Pattern
Fork/Join Pattern
MapReduce Pattern

UE: Unit of Execution
  Process
  Thread
  Coroutine
  Actor
SPMD Pattern
Single Program Multiple Data
Very generic pattern, used in many other patterns
Use a single program for all the UEs
Use the UE's ID to select different pathways through the program (sketch below), e.g.:
  Branching on ID
  Using the ID in a loop index to split loops
Keep interactions between UEs explicit
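A minimal SPMD sketch: every UE runs the same program and uses its ID in the loop index to split the iteration space (illustrative; sums 0..99 across four threads):

public class SpmdSum {
    public static void main(String[] args) throws InterruptedException {
        final int numUEs = 4, n = 100;
        final long[] partial = new long[numUEs]; // one slot per UE, no sharing
        Thread[] ues = new Thread[numUEs];
        for (int id = 0; id < numUEs; id++) {
            final int myId = id;
            ues[myId] = new Thread(() -> {
                // the UE's ID selects its pathway: elements myId, myId+numUEs, ...
                for (int i = myId; i < n; i += numUEs) partial[myId] += i;
            });
            ues[myId].start();
        }
        long sum = 0;
        for (int id = 0; id < numUEs; id++) { // explicit interaction: join, then combine
            ues[id].join();
            sum += partial[id];
        }
        System.out.println(sum); // 4950
    }
}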
Master/Worker
Good scalability
Automatic load-balancing
How to detect termination?
  Bag of tasks is empty
  Poison pill (see sketch below)
What if we bottleneck on a single queue?
  Use multiple work queues
  Work stealing
What about fault tolerance?
  Use an in-progress queue
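A minimal Master/Worker sketch with poison-pill termination (illustrative; single process, one shared work queue):

import java.util.concurrent.*;

public class MasterWorker {
    static final Runnable POISON = () -> {}; // sentinel marking the end of work

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Runnable> bag = new LinkedBlockingQueue<>(); // the bag of tasks
        int workers = 4;
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        Runnable task = bag.take();  // block until a task is available
                        if (task == POISON) return;  // terminate on the poison pill
                        task.run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        for (int i = 0; i < 100; i++) {              // master: produce the tasks
            final int n = i;
            bag.put(() -> System.out.println("task " + n));
        }
        for (int w = 0; w < workers; w++) bag.put(POISON); // one pill per worker
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}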
Loop Parallelism
Workflow:
  1. Find the loops that are bottlenecks
  2. Eliminate coupling between loop iterations
  3. Parallelize the loop
If too few iterations to pull their weight:
  Merge loops
  Coalesce nested loops
OpenMP: omp parallel for
What if task creation can't be handled by:
  parallelizing loops (Loop Parallelism)
  putting them on work queues (Master/Worker)?

Enter Fork/Join
Fork/Join
Use when the relationship between tasks is simple
Good for recursive data processing
Can use work-stealing
1. Fork: tasks are dynamically created
2. Join: tasks are later terminated and data aggregated
Fork/Join
Direct task/UE mapping:
  1-1 mapping between Task/UE
  Problem: dynamic UE creation is expensive
Indirect task/UE mapping:
  Pool the UEs
  Control (constrain) the resource allocation
  Automatic load balancing
Fork/Join
Java 7 ParallelArray (Fork/Join DSL)
ParallelArray<Student> students = new ParallelArray<Student>(fjPool, data);
double bestGpa = students.withFilter(isSenior)
                         .withMapping(selectGpa)
                         .max();
MapReduce
Originates in a Google paper from 2004
Used internally @ Google
Variation of Fork/Join
Work divided upfront, not dynamically
Usually distributed
Normally used for massive data crunching
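A toy word count in the MapReduce style: a map step emits words, a shuffle/reduce step groups and counts them (a single-process sketch using Java streams, not Hadoop's API):

import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCount {
    public static void main(String[] args) {
        String[] docs = { "the quick brown fox", "the lazy dog", "the fox" };
        Map<String, Long> counts = Arrays.stream(docs).parallel()
            .flatMap(doc -> Arrays.stream(doc.split("\\s+")))               // map: doc -> words
            .collect(Collectors.groupingBy(w -> w, Collectors.counting())); // shuffle + reduce
        System.out.println(counts); // {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
    }
}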
MapReduce
Products
Hadoop (OSS), used @ Yahoo
Amazon Elastic MapReduce
Many NOSQL DBs utilize it for searching/querying
Parallel Computing
products
MPI
OpenMP
JSR166 Fork/Join
java.util.concurrent
  ExecutorService, BlockingQueue, etc.
ProActive Parallel Suite
CommonJ WorkManager (JEE)
Stability Patterns
Timeouts
Circuit Breaker
Let-it-crash
Fail fast
Bulkheads
Steady State
Throttling
Timeouts
Always use timeouts (if possible):
  Thread.wait(timeout)
  reentrantLock.tryLock(timeout, timeUnit)
  blockingQueue.poll(timeout, timeUnit) / offer(..)
  futureTask.get(timeout, timeUnit)
  socket.setSoTimeout(timeout)
  etc.
Circuit Breaker
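A minimal circuit-breaker sketch (illustrative; it follows the common CLOSED/OPEN/HALF_OPEN model, and the thresholds are made up):

import java.util.function.Supplier;

class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;
    private final int failureThreshold = 5;       // trip after 5 consecutive failures
    private final long retryAfterMillis = 10_000; // probe again after 10 s

    synchronized <T> T call(Supplier<T> action) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt < retryAfterMillis)
                throw new IllegalStateException("circuit open - failing fast");
            state = State.HALF_OPEN; // window elapsed: let one probe call through
        }
        try {
            T result = action.get();
            failures = 0;
            state = State.CLOSED;    // success closes the circuit
            return result;
        } catch (RuntimeException e) {
            failures++;
            if (state == State.HALF_OPEN || failures >= failureThreshold) {
                state = State.OPEN;  // trip: subsequent calls fail fast
                openedAt = System.currentTimeMillis();
            }
            throw e;
        }
    }
}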
Let it crash
Embrace failure as a natural state in the life-cycle of the application
Instead of trying to prevent it, manage it:
  Process supervision
  Supervisor hierarchies (from Erlang)
Restart Strategy
OneForOne
Restart Strategy
AllForOne
Supervisor Hierarchies
Fail fast
Avoid slow responses
Separate:
  SystemError - resources not available
  ApplicationError - bad user input, etc.
Verify resource availability before starting an expensive task
Validate input immediately
Bulkheads
Partition and tolerate failure in one part
Redundancy
Applies to threads as well:
  One pool for admin tasks, to be able to perform tasks even though all other threads are blocked
Steady State
Clean up after yourself
Logging:
  RollingFileAppender (log4j)
  logrotate (Unix)
  Scribe - server for aggregating streaming log data
Always put logs on a separate disk
Throttling
Maintain a steady pace
Count requests
If limit reached, back off (drop, raise error)
Queue requests
Used in, for example, Staged Event-Driven Architecture (SEDA)
Upcoming seminars
June 7 - Akka
? - Event-Driven Architecture
?
Thanks for listening
Extra material
Client-side consistency
Strong consistency
Weak consistency
Eventually consistent
Never consistent
Client-side Eventual Consistency levels
Causal consistency
Read-your-writes consistency (important)
Session consistency
Monotonic read consistency (important)
Monotonic write consistency
Server-side consistency
N = the number of nodes that store replicas of the data
W = the number of replicas that need to acknowledge the receipt of the update before the update completes
R = the number of replicas that are contacted when a data object is accessed through a read operation
Server-side consistency
W + R > N   =>  strong consistency
W + R <= N  =>  eventual consistency

For example, with N = 3 replicas, W = 2 and R = 2 gives W + R = 4 > 3: every read quorum overlaps every write quorum, so reads are strongly consistent. With W = 1 and R = 1, W + R = 2 <= 3, so a read may hit a replica that has not yet seen the latest write: eventual consistency.