01-07-2025
DSC 403C-5: NoSQL
Unit - 1
Dr. Kavitha R
Mission Vision Core Values
Mission: Christ University is a nurturing ground for an individual's holistic development to make effective contribution to the society in a dynamic environment
Vision: Excellence and Service
Core Values: Faith in God | Moral Uprightness | Love of Fellow Beings | Social Responsibility | Pursuit of Excellence
Unit - 1
Introduction
Overview, History of NoSQL databases, Definition of four types of NoSQL database,
value of relational databases, getting at persistent data, concurrency, Integration,
Impedance mismatch, Application and integration databases, attack of clusters,
emergence of NoSQL, Key points, comparison of relational databases to new NoSQL
stores, Replication and Sharding, MapReduce on databases. Distribution models,
single server, Master Slave Replication, Peer to Peer Replication, Combining Sharding
and Replication.
Distribution Models
Introduction
NoSQL → ability to run a database on a large cluster
Data volume increases → scaling up is expensive → scale out (cluster of servers)
Distribution model →
to handle large quantities of data
to process greater read/write traffic
availability during network slowdown or breakage
2 pathways to distribution:
Replication and Sharding
Introduction
Replication
copying of same data in multiple nodes
Master-Slave, Peer-to-Peer
Sharding
different data on different nodes
Orthogonal techniques
can use either or both
Single Server
Simplest and no distribution
Run the db on a single machine (handles all r/w)
Preferred where possible, as it eliminates distribution complexity and is easy to manage
NoSQL can also run this way, though it is designed to work on clusters
Graph db
if data usage is mostly about processing aggregates (a single-server document db works well)
In early stages with limited no. of users and data
total dataset is small enough to fit in the memory
frequent reads and few writes (eg., product catalog)
Sharding
✓ Database partitioning technique used to split large datasets across multiple servers
(called shards) so that each shard holds a subset of the data.
✓ Horizontal Partitioning of Data across Multiple Machines
✓ Each shard contains unique rows of information that you can store separately
✓ The partitioned data chunks are called logical shards.
✓ The machine that stores the logical shard is called a physical shard
✓ A physical shard can contain multiple logical shards.
✓ A shard key determines how to partition the dataset
Sharding
[Diagram: unsharded db vs sharded db]
Sharding
Methods of database sharding
To determine the correct node for a particular piece of data
✓ Range based Sharding
✓ Hashed Sharding
✓ Directory Sharding
✓ Geo based Sharding
✓ Entity based Sharding
Sharding
Range based sharding (dynamic sharding)
splits database rows based on a range of values
then, a shard key is assigned to the respective range
e.g., customer names partitioned by first letter
how are reads/writes performed? (each request is routed to the shard whose range covers its key)
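A minimal sketch of range-based routing; the range boundaries and shard names below are illustrative assumptions, not from the slides:

```python
# Range-based sharding: each shard owns a contiguous range of key values
# (here, the first letter of the customer name).
RANGES = [
    ("A", "H", "shard_1"),   # names starting A-H
    ("I", "P", "shard_2"),   # names starting I-P
    ("Q", "Z", "shard_3"),   # names starting Q-Z
]

def shard_for(name: str) -> str:
    """Return the shard whose range contains the name's first letter."""
    first = name[0].upper()
    for lo, hi, shard in RANGES:
        if lo <= first <= hi:
            return shard
    raise ValueError(f"no shard range covers {name!r}")

# A write goes to the shard its key maps to; a read for a known key
# consults the same mapping, so only one shard is touched.
print(shard_for("Alice"))   # shard_1
print(shard_for("Meena"))   # shard_2
```

Range queries (e.g., all names starting A–H) hit a single shard, which is the main advantage of this scheme over hashing.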
Sharding
Hashed Sharding
assigns the shard key to each row of the database by using a mathematical formula
called a hash function
hash function takes the information from the row and produces a hash value
hash value used as the shard key
hash(user_id) % number_of_shards
Ex: user_id = 72, no. of shards = 10 → shard 2 (72 mod 10)
Even distribution of data if the hash function is chosen properly
Does not partition the db by any meaningful context (related rows may land on different shards)
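The slide's formula can be sketched directly; note that for small Python ints `hash(n)` is `n` itself, so the slide's example works out exactly:

```python
NUM_SHARDS = 10  # matches the slide's example

def shard_for(user_id: int) -> int:
    # hash(user_id) % number_of_shards, as on the slide.
    # Python's hash() of a small int is the int itself, so for
    # user_id = 72 this reduces to 72 % 10 = 2.
    return hash(user_id) % NUM_SHARDS

print(shard_for(72))  # 2, matching the slide
```

A real system would use a stable hash of the key (e.g., MD5) rather than a language-level `hash()`, which for strings varies between process runs.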
Sharding
Directory Sharding
uses a lookup table to match database information to the corresponding physical shard
can use any logic
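A directory shard is just an explicit lookup table; the "region" attribute and shard names below are hypothetical, chosen to show that the mapping can follow any logic:

```python
# Directory sharding: a lookup table ("directory") maps a key attribute
# to a physical shard. Unlike range or hash sharding, the mapping is
# arbitrary and can be changed by editing the table.
DIRECTORY = {
    "north": "shard_1",
    "south": "shard_2",
    "east":  "shard_3",
    "west":  "shard_3",   # any logic: two regions may share a shard
}

def shard_for(region: str) -> str:
    return DIRECTORY[region]
```

The flexibility comes at a cost: every request must consult the directory, which itself must be kept highly available.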
Sharding
Geo based Sharding
Data is partitioned based on geographical attributes like country or region.
Customers from India → Shard A
Customers from US → Shard B
Customers from UK → Shard C
Data stored close to local users (reduced latency)
May cause uneven data load across shards
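The slide's mapping as a sketch; the ISO-style country codes are an assumption:

```python
# Geo-based sharding: partition by a geographical attribute so that
# data lives near the users who access it.
GEO_SHARDS = {
    "IN": "shard_A",  # customers from India
    "US": "shard_B",  # customers from the US
    "UK": "shard_C",  # customers from the UK
}

def shard_for(country_code: str) -> str:
    return GEO_SHARDS[country_code]
```

The uneven-load caveat is visible here: if most customers are in one country, its shard carries most of the traffic.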
Sharding
Entity based sharding
Each entity type or feature is stored in a different shard.
Products → Shard A
Users → Shard B
Orders → Shard C
Clear separation
But complex queries span multiple entities
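A sketch of the slide's layout, showing why cross-entity queries are the weak point (entity and shard names follow the slide):

```python
# Entity-based sharding: each entity type lives on its own shard.
ENTITY_SHARDS = {
    "product": "shard_A",
    "user":    "shard_B",
    "order":   "shard_C",
}

def shard_for(entity_type: str) -> str:
    return ENTITY_SHARDS[entity_type]

# A query joining users with their orders must touch two shards,
# so the application (or a query router) must fan out and merge.
shards_touched = sorted({shard_for("user"), shard_for("order")})
print(shards_touched)  # ['shard_B', 'shard_C']
```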
Replication
Copying data from a primary server to one or more replicas
High availability
Fault tolerance
2 ways
Master – Slave Replication
Peer-to-Peer Replication
Replication
Master Slave Replication
One node acts as Master ie., Primary
responsible for write / update operations
Other nodes are slaves ie., secondary
sync from primary and handle read operations
if the primary fails, one of the slaves will take over
Can scale horizontally to handle more read requests with more slaves
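The routing rule above can be sketched as a few lines of Python; the node names are hypothetical, and real systems automate the failover step shown here:

```python
import random

# Master-slave replication: one master accepts writes, slaves serve reads.
MASTER = "node_0"
SLAVES = ["node_1", "node_2", "node_3"]

def route(operation: str) -> str:
    """Writes always go to the master; reads are spread across slaves."""
    if operation == "write":
        return MASTER             # single write path
    return random.choice(SLAVES)  # adding slaves scales read capacity

def promote_slave() -> str:
    """If the master fails, appoint a slave as the new master."""
    global MASTER
    MASTER = SLAVES.pop(0)
    return MASTER
```

Because slaves sync asynchronously, a read routed to a lagging slave may miss a very recent write, which is the inconsistency risk the next slide describes.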
Replication
Master Slave Replication
Mostly helpful for scaling read-intensive workloads, not write-intensive ones
Read resilience: slaves can handle read requests even if the master fails
speeds up recovery, as a slave can be appointed as the new master immediately
the new master can be appointed manually or automatically
There may be chances of data inconsistency
if a client reads from a slave that has not yet received a very recent write
Replication
Master Slave Replication
Advantages:
Read Scalability
Data Availability
Disadvantages:
Slave Lag → Data Inconsistency
Write bottleneck
suited only to read-intensive applications
Replication
Peer to Peer Replication (Multi Master)
Master-Slave → resilience against failure of a slave, but not of the master
the master is still a bottleneck and a single point of failure for writes
Peer-to-peer replication addresses this challenge:
all replicas have equal weight
all can accept reads and writes
changes made on one peer are replicated to all peers
loss of any node does not prevent access to the data store
Replication
Peer to Peer Replication
Can ride over node failures without losing access to data (no single point of failure)
Can easily add nodes to improve performance
The complication is maintaining consistency
read-write or write-write conflicts
Latency: changes take time to propagate across all peers
more complex to set up and maintain
Use when you need high availability, fault tolerance, and load balancing for both reads and writes
Replication
Master-Slave vs Peer to Peer Replication
Feature               Master-Slave             Peer-to-Peer
Write Access          Only master              All peers
Read Scalability      Yes                      Yes
Write Scalability     No                       Yes
Conflict Resolution   Not required             Required
Complexity            Lower                    Higher
Fault Tolerance       Master is single point   Fully distributed
Combining Sharding and Replication
[Diagram: combination of sharding and replication]
Combining Sharding and Replication
Multiple masters but each data item has a single master
Depending on configuration, a node can be:
master for some data
slave for some data
master for some data and slave for other data
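A sketch of such a layout (node and shard names are illustrative): each shard has exactly one master, and the same node plays both roles for different shards:

```python
# Combined sharding + replication: every shard has a single master,
# and each node is master for one shard while serving as a slave
# replica for the others.
CLUSTER = {
    "shard_1": {"master": "node_A", "slaves": ["node_B", "node_C"]},
    "shard_2": {"master": "node_B", "slaves": ["node_C", "node_A"]},
    "shard_3": {"master": "node_C", "slaves": ["node_A", "node_B"]},
}

def write_node(shard: str) -> str:
    """Writes for a data item go to that item's shard master."""
    return CLUSTER[shard]["master"]

def read_nodes(shard: str) -> list:
    """Reads can be served by the master or any replica of the shard."""
    return [CLUSTER[shard]["master"], *CLUSTER[shard]["slaves"]]

print(write_node("shard_2"))  # node_B
# node_A is master for shard_1 but a slave for shard_2 and shard_3
```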
Thank You….. !