
CORE SYSTEM DESIGN PRINCIPLES

Availability
What do we mean by Availability?

Availability refers to the ability of a system to provide its services to clients even
in the presence of failures. This is often measured in terms of the percentage of time
that the system is up and running, also known as its uptime.

It is usually expressed as a percentage, indicating the system's uptime over a specific period.

The formal definition of availability is:

Availability = Uptime / (Uptime + Downtime)

Uptime: The period during which a system is functional and accessible.

Downtime: The period during which a system is unavailable due to failures, maintenance, or other issues.
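As a quick illustration, the formula above can be computed directly. This is a minimal sketch; the function name and the choice of hours as the unit are illustrative assumptions:

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Availability = Uptime / (Uptime + Downtime), as a fraction of total time."""
    return uptime_hours / (uptime_hours + downtime_hours)

# A service down for 8.77 hours in a 8766-hour year is ~99.9% available.
print(f"{availability(8766 - 8.77, 8.77):.3%}")
```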

The Nines of availability

Availability is often quantified by uptime (or downtime) as a percentage of time the service is available. It is generally measured in the number of 9s.

Availability = Uptime / (Uptime + Downtime)

If availability is 99.00%, it is said to have "2 nines" of availability; if it is 99.9%, it is called "3 nines", and so on.

Availability (Percent)        Downtime (Year)      Downtime (Month)     Downtime (Week)
90%          (one nine)       36.53 days           72 hours             16.8 hours
99%          (two nines)      3.65 days            7.20 hours           1.68 hours
99.9%        (three nines)    8.77 hours           43.8 minutes         10.1 minutes
99.99%       (four nines)     52.6 minutes         4.32 minutes         1.01 minutes
99.999%      (five nines)     5.25 minutes         25.9 seconds         6.05 seconds
99.9999%     (six nines)      31.56 seconds        2.59 seconds         604.8 milliseconds
99.99999%    (seven nines)    3.15 seconds         263 milliseconds     60.5 milliseconds
99.999999%   (eight nines)    315.6 milliseconds   26.3 milliseconds    6 milliseconds
99.9999999%  (nine nines)     31.6 milliseconds    2.6 milliseconds     0.6 milliseconds
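The downtime budgets in the table can be derived with a short helper. This is a sketch; the function name is an illustrative assumption, and a year is taken as 365.25 days:

```python
def downtime_per_year_hours(availability_percent: float) -> float:
    """Allowed downtime per year (365.25 days) for a given availability level."""
    year_hours = 365.25 * 24
    return (1 - availability_percent / 100) * year_hours

# "Three nines" allows roughly 8.77 hours of downtime per year.
print(round(downtime_per_year_hours(99.9), 2))
```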

Availability in Sequence vs Parallel

If a service consists of multiple components prone to failure, the service's overall availability depends on whether the components are in sequence or in parallel.

Sequence

 Overall availability decreases when two components are in sequence.

 Availability(Total) = Availability(Foo) × Availability(Bar)

 For example, if both Foo and Bar each had 99.9% availability, their total availability in sequence would be 99.8%.

Parallel

 Overall availability increases when two components are in parallel.

 Availability(Total) = 1 − (1 − Availability(Foo)) × (1 − Availability(Bar))

 For example, if both Foo and Bar each had 99.9% availability, their total availability in parallel would be 99.9999%.
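Both formulas can be checked with a few lines of code. This is a minimal sketch; the function names are illustrative assumptions:

```python
from math import prod

def sequence_availability(*components: float) -> float:
    """In sequence, total availability is the product of component availabilities."""
    return prod(components)

def parallel_availability(*components: float) -> float:
    """In parallel, the system fails only if every component fails."""
    return 1 - prod(1 - a for a in components)

print(f"{sequence_availability(0.999, 0.999):.4%}")  # 99.8001%
print(f"{parallel_availability(0.999, 0.999):.4%}")  # 99.9999%
```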

Availability vs. Reliability
If a system is reliable, it is available. However, if it is available, it is not necessarily
reliable. In other words, high reliability contributes to high availability, but it is
possible to achieve high availability even with an unreliable system.

High availability vs. Fault Tolerance

Both high availability and fault tolerance apply to methods for providing high
uptime levels. However, they accomplish the objective differently.

A fault-tolerant system has no service interruption but comes at a significantly higher cost, while a highly available system has minimal service interruption. Fault tolerance requires full hardware redundancy: if the main system fails, another system must take over with no loss in uptime.

Availability Patterns

There are two complementary patterns to support high availability: fail-over and
replication.

1. Failover

Failover is an availability pattern used to ensure that a system can continue to function in the event of a failure. It involves having a backup component or system that can take over when a failure occurs.

In a failover system, there is a primary component that is responsible for handling requests, and a secondary (or backup) component that is on standby. The primary component is monitored for failures, and if it fails, the secondary component is activated to take over its duties. This allows the system to continue functioning with minimal disruption.

Failover can be implemented in various ways, such as active-passive, active-active, and hot-standby.

Active-Passive Redundancy

An active-passive architectural pattern consists of at least two nodes. The passive server (failover) acts as a backup that remains on standby and takes over in the event the active server gets disconnected for whatever reason. The primary active server hosts production, test, and development applications.

The secondary passive server essentially remains dormant during normal operation. A major disadvantage of this model is that there is no guarantee that the production application will function as expected on the passive server. The model is also considered a relatively wasteful approach because expensive hardware is left unused.
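The promotion logic behind active-passive failover can be sketched in a few lines. This is a toy model, not a production health checker; the class name and the three-missed-heartbeats threshold are illustrative assumptions:

```python
class FailoverPair:
    """Route traffic to the active node; promote the standby after repeated missed heartbeats."""

    def __init__(self, active: str, standby: str, threshold: int = 3):
        self.active = active
        self.standby = standby
        self.threshold = threshold
        self.missed = 0

    def heartbeat_ok(self) -> None:
        self.missed = 0  # a healthy heartbeat resets the failure counter

    def heartbeat_missed(self) -> None:
        self.missed += 1
        if self.missed >= self.threshold:
            # Failover: the passive node takes over as the new active node.
            self.active, self.standby = self.standby, self.active
            self.missed = 0

pair = FailoverPair("primary", "backup")
for _ in range(3):
    pair.heartbeat_missed()
print(pair.active)  # backup
```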

Active-Active Redundancy

The active-active model also contains at least two nodes; however, in this
architectural pattern, multiple nodes are actively running the same services
simultaneously. In order to fully utilize all the active nodes, an active-active cluster
uses load balancing to distribute workloads across the nodes in order to prevent any
single node from being overloaded. The distributed workload subsequently leads to a
marked improvement in response times and throughput.

The load balancer uses a set of algorithms to assign clients to the nodes; connections are typically based on performance metrics and health checks. To guarantee seamless operability, all the nodes in the cluster must be configured for redundancy. A potential drawback of active-active redundancy is that if one of the nodes fails, client sessions might be dropped, forcing users to log back into the system. However, this can be mitigated by ensuring that the configuration settings of each node are virtually identical.

2. Replication
Replication is an availability pattern that involves having multiple copies of the
same data stored in different locations. In the event of a failure, the data can be
retrieved from a different location. There are two main types of replication:
Master-Master replication and Master-Slave replication.

Master-slave replication

The master serves reads and writes, replicating writes to one or more slaves,
which serve only reads. Slaves can also replicate to additional slaves in a tree-like
fashion. If the master goes offline, the system can continue to operate in read-only
mode until a slave is promoted to a master or a new master is provisioned.
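The read/write routing that master-slave replication implies can be sketched as follows. This is an illustrative toy, not a real database driver; the class and node names are assumptions:

```python
import itertools

class MasterSlaveRouter:
    """Send writes to the master; spread reads round-robin over the slaves."""

    def __init__(self, master: str, slaves: list[str]):
        self.master = master
        self._read_cycle = itertools.cycle(slaves if slaves else [master])

    def route(self, operation: str) -> str:
        if operation == "write":
            return self.master          # all writes must go to the master
        return next(self._read_cycle)   # reads are served by replicas

router = MasterSlaveRouter("db-master", ["db-replica-1", "db-replica-2"])
print(router.route("write"))  # db-master
print(router.route("read"))   # db-replica-1
print(router.route("read"))   # db-replica-2
```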

Master-master replication

Both masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.

Disadvantages: master-master replication

 You'll need a load balancer or you'll need to make changes to your application logic to determine where to write.

 Most master-master systems are either loosely consistent (violating ACID) or have increased write latency due to synchronization.

 Conflict resolution comes more into play as more write nodes are added and as latency increases.

 Additional logic is needed to promote a slave to a master.

Disadvantages: replication

 There is a potential for loss of data if the master fails before any newly written data can be replicated to other nodes.

 Writes are replayed to the read replicas. If there are a lot of writes, the read replicas can get bogged down with replaying writes and can't do as many reads.

 The more read slaves, the more you have to replicate, which leads to greater replication lag.

 On some systems, writing to the master can spawn multiple threads to write in parallel, whereas read replicas only support writing sequentially with a single thread.

 Replication adds more hardware and additional complexity.

Consistency
What do we mean by consistency?

Consistency refers to the system's ability to ensure that all users see the same
data, regardless of where or when they access it. In a consistent system, any update
to the data is immediately visible to all users, and there are no conflicting or
outdated versions of the data.

Consistency Patterns in Distributed Systems

Before we talk about consistency patterns, we should know what a distributed system is. Simply put, a distributed system is a system that consists of more than one component, and each component is responsible for one part of the application.

A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another. The components interact with one another in order to achieve a common goal.

Example of a Distributed System

Imagine we have an e-commerce application where we are selling books. This application may consist of multiple different components. For example, one server might be responsible for accounts, another for payments, one for storing orders, one for loyalty points and related functionality, and another for maintaining the book inventory, and so on.

Now, if a user buys a book, there might be different services involved in
placing the order; order service for storing the order, payment service for handling
the payments, and inventory service for keeping the stock of that ordered book up to
date. This is an example of a distributed system, an application that consists of
multiple different components, each of which is responsible for a different part of
the application.

Why is Consistency Important?

When working with distributed systems, we need to think about managing the data across different servers. Taking the above e-commerce example, the inventory service must have up-to-date stock information for the ordered items when a user places an order. Now imagine two different users looking at the same book. One customer places a successful order, and before the inventory service can update the stock, the second customer also places an order for the same book. Because the inventory wasn't updated, we have the wrong stock information when the second order is placed, i.e., the ordered book may or may not actually be in stock. This is where different consistency patterns come into play: they help ensure that the data is consistent across the application.

Consistency Patterns
Consistency patterns refer to the ways in which data is stored and managed in
a distributed system and how that data is made available to users and applications.

There are three main types of consistency patterns:

1. Strong consistency
2. Weak consistency
3. Eventual Consistency

Each of these patterns has its own advantages and disadvantages, and the choice of
which pattern to use will depend on the specific requirements of the application or
system.

1. Strong Consistency
After an update is made to the data, it will be immediately visible to any
subsequent read operations. The data is replicated in a synchronous manner,
ensuring that all copies of the data are updated at the same time.

In a strongly consistent system, any updates to some data are immediately propagated to all locations. This ensures that all locations have the same version of the data, but it comes at the cost of reduced availability and higher latency.

An example of strong consistency is a financial system where users can transfer money between accounts. The system is designed for high data integrity, so updates to the data are immediately propagated to all locations. This ensures that all users and applications are working with the same, accurate data. For instance, when a user initiates a transfer of funds from one account to another, the system immediately updates the balance of both accounts, and all other system components are immediately aware of the change. This ensures that all users can see the updated balance of both accounts and prevents any discrepancies.

2. Weak Consistency
After an update is made to the data, it is not guaranteed that any subsequent
read operation will immediately reflect the changes made. The read may or may not
see the recent write.

In a weakly consistent system, updates to the data may not be immediately propagated. This can lead to inconsistencies and conflicts between different versions of the data, but it also allows for high availability and low latency.

An example of weak consistency is a gaming platform where users can
play online multiplayer games. When a user plays a game, their actions are
immediately visible to other players in the same data center, but if there was a lag or
temporary connection loss, the actions may not be seen by some of the users and
the game will continue. This can lead to inconsistencies between different versions of
the game state, but it also allows for a high level of availability and low latency.

3. Eventual Consistency
Eventual consistency is a form of Weak Consistency. After an update is made
to the data, it will be eventually visible to any subsequent read operations. The data
is replicated in an asynchronous manner, ensuring that all copies of the data are
eventually updated.

In an eventually consistent system, data is typically stored in multiple locations, and updates to that data are eventually propagated to all locations. This means that the system is highly available and has low latency, but there may be inconsistencies and conflicts between different versions of the data.

An example of eventual consistency is a social media platform where users can post updates, comments, and messages. The platform is designed for high availability and low latency, so the data is stored in multiple data centers around the world. When a user posts an update, the update is immediately visible to other users in the same data center, but it may take some time for the update to propagate to other data centers. This means that some users may see the update while others may not, depending on which data center they are connected to. This can lead to inconsistencies between different versions of the data, but it also allows for a high level of availability and low latency.

CAP Theorem
The CAP theorem states that, in a distributed system (a collection of interconnected nodes that share data), you can only have two of the following three guarantees across a write/read pair: Consistency, Availability, and Partition Tolerance; one of them must be sacrificed. However, as you will see below, you don't have as many options here as you might think.

Consistency

Consistency means that all clients see the same data at the same time, no
matter which node they connect to. For this to happen, whenever data is written to
one node, it must be instantly forwarded or replicated across all the nodes in the
system before the write is deemed "successful".

Availability

Availability means that any client making a request for data gets a response,
even if one or more nodes are down.

Partition tolerance

Partition tolerance means the system continues to work despite message loss or partial failure. A system that is partition-tolerant can sustain any amount of network failure that doesn't result in a failure of the entire network. Data is sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.

Consistency-Availability Tradeoff

We live in a physical world and can't guarantee the stability of a network, so distributed databases must choose Partition Tolerance (P). This implies a tradeoff between Consistency (C) and Availability (A).

CP - Consistency/Partition Tolerance

Wait for a response from the partitioned node, which could result in a timeout error. The system can also choose to return an error, depending on the scenario you desire.

Choose Consistency over Availability when your business requirements dictate atomic reads and writes.

AP - Availability/Partition Tolerance

Return the most recent version of the data you have, which could be stale.
This system state will also accept writes that can be processed later when the
partition is resolved.

Choose Availability over Consistency when your business requirements allow for some flexibility around when the data in the system synchronizes. Availability is also a compelling option when the system needs to continue to function in spite of external errors (shopping carts, etc.).

The decision between Consistency and Availability is a software trade-off. You can choose what to do in the face of a network partition; the control is in your hands. Network outages, both temporary and permanent, are a fact of life and occur whether you want them to or not; this exists outside of your software.

Building distributed systems provides many advantages but also adds complexity. Understanding the trade-offs available to you in the face of network errors, and choosing the right path, is vital to the success of your application. Failing to get this right from the beginning could doom your application to failure before your first deployment.

CA database

A CA database delivers consistency and availability across all nodes. It can't do this if there is a partition between any two nodes in the system, and therefore can't deliver fault tolerance.

Example: PostgreSQL, MariaDB.

CP database

A CP database delivers consistency and partition tolerance at the expense of availability. When a partition occurs between any two nodes, the system has to shut down the non-consistent node until the partition is resolved.

Example: MongoDB, Apache HBase.

AP database

An AP database delivers availability and partition tolerance at the expense of consistency. When a partition occurs, all nodes remain available, but those at the wrong end of a partition might return an older version of data than others. When the partition is resolved, AP databases typically re-sync the nodes to repair all inconsistencies in the system.

Example: Apache Cassandra, CouchDB.

PACELC Theorem
The PACELC theorem is an extension of the CAP theorem. The CAP theorem
states that in the case of network partitioning (P) in a distributed system, one has to
choose between Availability (A) and Consistency (C).

PACELC extends the CAP theorem by introducing latency (L) as an additional attribute of a distributed system. The theorem states that else (E), i.e., even when the system is running normally in the absence of partitions, one has to choose between latency (L) and consistency (C).

The PACELC theorem was first described by Daniel J. Abadi.

The PACELC theorem was developed to address a key limitation of the CAP theorem: it makes no provision for performance or latency.

For example, according to the CAP theorem, a database can be considered available even if a query returns a response after 30 days. Obviously, such latency would be unacceptable for any real-world application.

Consistent Hashing
Let's first understand the problem we're trying to solve.

Why do we need this?

In traditional hashing-based distribution methods, we use a hash function to hash our partition keys (i.e., request ID or IP). We then take the result modulo the total number of nodes (servers or databases). This gives us the node where we want to route our request.

Hash(key1) → H1 mod N = Node0
Hash(key2) → H2 mod N = Node1
Hash(key3) → H3 mod N = Node2
...
Hash(keyn) → Hn mod N = Node(n-1)

Where,

key: Request ID or IP.

H: Hash function result.

N: Total number of nodes.

Node: The node where the request will be routed.

The problem with this is that if we add or remove a node, N changes, and our mapping strategy breaks because the same requests now map to different servers. As a consequence, the majority of requests need to be redistributed, which is very inefficient.

We want to uniformly distribute requests among different nodes such that we
should be able to add or remove nodes with minimal effort. Hence, we need a
distribution scheme that does not depend directly on the number of nodes (or
servers), so that, when adding or removing nodes, the number of keys that need to
be relocated is minimized.

Consistent hashing solves this horizontal scalability problem by ensuring that every time we scale up or down, we do not have to re-arrange all the keys or touch all the servers.

Now that we understand the problem, let's discuss consistent hashing in detail.

How does it work

Consistent hashing is a distributed hashing scheme that operates independently of the number of nodes in a distributed hash table by assigning them a position on an abstract circle, or hash ring.

This allows servers and objects to scale without affecting the overall system.

Using consistent hashing, only K/N of the data, on average, needs to be redistributed.

R = K/N

Where,

R: Data that would require re-distribution.

K: Number of partition keys.

N: Number of nodes.

The output of the hash function is a range, let's say 0...m-1, which we can represent on our hash ring. We hash the requests and distribute them on the ring depending on the output. Similarly, we also hash the nodes and distribute them on the same ring.

Hash(key1) = P1
Hash(key2) = P2
Hash(key3) = P3
...
Hash(keyn) = Pm-1

Where,

key: Request/Node ID or IP.

P: Position on the hash ring.

m: Total range of the hash ring.

Now, when a request comes in, we can simply route it to the closest node in a clockwise (or counterclockwise) manner.

This means that if a new node is added or removed, we can use the nearest
node and only a fraction of the requests need to be re-routed.
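The clockwise lookup described above can be sketched with the standard `bisect` module. MD5 is used here as an arbitrary hash, and the class and node names are illustrative assumptions:

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes: list[str]):
        # Sorted (position, node) pairs form the ring.
        self._ring = sorted((ring_hash(n), n) for n in nodes)
        self._positions = [p for p, _ in self._ring]

    def get_node(self, key: str) -> str:
        # First node clockwise from the key's position, wrapping past the end.
        i = bisect.bisect_right(self._positions, ring_hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.get_node("request-42"))  # stable across runs for the same key
```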

In theory, consistent hashing should distribute the load evenly; however, this doesn't happen in practice. Usually, the load distribution is uneven, and one server may end up handling the majority of the requests, becoming a hotspot and essentially a bottleneck for the system.

We can fix this by adding extra nodes but that can be expensive.

Let's see how we can address these issues.

Virtual Nodes

In order to ensure a more evenly distributed load, we can introduce the idea
of a virtual node, sometimes also referred to as a V Node.

Instead of assigning a single position to a node, the hash range is divided into
multiple smaller ranges, and each physical node is assigned several of these smaller
ranges.

Each of these subranges is considered a VNode. Hence, virtual nodes are basically existing physical nodes mapped multiple times across the hash ring to minimize changes to a node's assigned range.

For this, we can use k hash functions.

Hash1(key1) = P1
Hash2(key2) = P2
Hash3(key3) = P3
...
Hashk(keyn) = Pm-1

Where,

key: Request/Node ID or IP.

k: Number of hash functions.

P: Position on the hash ring.

m: Total range of the hash ring.

VNodes help spread the load more evenly across the physical nodes in the cluster by dividing the hash range into smaller subranges. This speeds up the re-balancing process after adding or removing nodes and also reduces the probability of hotspots.
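A virtual-node ring can be sketched by hashing each physical node several times; here a "#i" suffix stands in for k separate hash functions, an equivalent and common trick. Names and the vnode count are illustrative assumptions:

```python
import bisect
import hashlib
from collections import Counter

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class VNodeRing:
    def __init__(self, nodes: list[str], vnodes: int = 100):
        # Each physical node owns `vnodes` positions on the ring.
        self._ring = sorted(
            (ring_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [p for p, _ in self._ring]

    def get_node(self, key: str) -> str:
        i = bisect.bisect_right(self._positions, ring_hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = VNodeRing(["node-a", "node-b", "node-c"])
counts = Counter(ring.get_node(f"key-{i}") for i in range(3000))
print(counts)  # roughly even split across the three nodes
```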

Data Replication

To ensure high availability and durability, consistent hashing replicates each data item on N nodes in the system, where N is the replication factor.

The replication factor is the number of nodes that will receive the copy of the
same data. In eventually consistent systems, this is done asynchronously.
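Replica placement can be sketched by walking the ring clockwise from the key's position and collecting the next N distinct physical nodes. This is a simplification; real systems such as Cassandra add rack and datacenter awareness on top. Names are illustrative assumptions:

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def replicas(key: str, nodes: list[str], replication_factor: int = 3) -> list[str]:
    ring = sorted((ring_hash(n), n) for n in nodes)
    positions = [p for p, _ in ring]
    i = bisect.bisect_right(positions, ring_hash(key))
    chosen: list[str] = []
    while len(chosen) < min(replication_factor, len(nodes)):
        node = ring[i % len(ring)][1]
        if node not in chosen:   # skip duplicates when wrapping around the ring
            chosen.append(node)
        i += 1
    return chosen

print(replicas("order-123", ["n1", "n2", "n3", "n4"]))  # 3 distinct nodes
```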

Advantages

Let's look at some advantages of consistent hashing:

1. Makes rapid scaling up and down more predictable.
2. Facilitates partitioning and replication across nodes.
3. Enables scalability and availability.
4. Reduces hotspots.
3. Enables scalability and availability.
4. Reduces hotspots.

Disadvantages

Below are some disadvantages of consistent hashing:

1. Increases complexity.
2. Cascading failures.
3. Load distribution can still be uneven.
4. Key management can be expensive when nodes transiently fail.

Examples

Let's look at some examples where consistent hashing is used:

1. Data partitioning in Apache Cassandra.
2. Load distribution across multiple storage hosts in Amazon DynamoDB.
