NoSql 2024 Assign2

NoSQL databases are non-relational and designed for large scale data storage needs. They avoid join operations and typically scale horizontally. NoSQL databases have dynamic schemas, support unstructured data, and focus on eventual consistency over ACID properties. There are several categories of NoSQL databases including key-value stores, column-oriented databases, document databases, and graph databases. The CAP theorem states that a distributed system can only strongly support two of the properties of consistency, availability, and partition tolerance at the same time.

NoSQL

• NoSQL is a class of non-relational database management systems that
differ from traditional relational database management systems in some
significant ways.

• It is designed for distributed data stores with very large-scale
data storage needs.

• For example, Google or Facebook collect terabytes of data
every day for their users.

• This type of data storage may not require a fixed schema; such systems
avoid join operations and typically scale horizontally.
RDBMS

• Structured and organized data

• Structured query language (SQL)

• Data and its relationships are stored in separate tables.

• Data Manipulation Language, Data Definition Language

• Tight Consistency
Impedance Mismatch
• The impedance mismatch is the difference between the relational model and
the in-memory data structures of an application: a rich in-memory structure
often has to be split across many tables to be saved, and reassembled when read.
The emergence of NoSQL

• Stands for Not Only SQL

• No predefined schema

• Key-Value pair storage, Column Store, Document Store, Graph databases

• Eventual consistency rather than ACID properties

• Unstructured and unpredictable data

• CAP Theorem
SQL vs NoSQL

1. SQL databases are relational, NoSQL databases are non-relational.
2. SQL databases use structured query language and have
a predefined schema. NoSQL databases have dynamic
schemas for unstructured data.
3. SQL databases are vertically scalable, NoSQL databases
are horizontally scalable.
4. SQL databases are table based, while NoSQL databases
are document, key-value, graph or wide-column stores.
5. SQL databases are better for multi-row transactions,
NoSQL are better for unstructured data like documents
or JSON.
NoSQL Categories

• There are four general types (most common categories) of NoSQL

databases.

• Each of these categories has its own specific attributes and limitations.

• There is no single solution that is better than all the others; however,
there are some databases that are better suited to solving specific problems.
• Key-value stores
• Column-oriented
• Document oriented
• Graph database
Key-value stores
• Key-value stores are the most basic type of NoSQL database.

• Designed to handle huge amounts of data (based on Amazon's Dynamo paper).

• Key-value stores allow developers to store schema-less data.

• In key-value storage, the database stores data as a hash table where each
key is unique and the value can be a string, JSON, etc.

• For example, a key-value pair might consist of a key like "Name" that is
associated with a value like "Robin".

• Key-value stores would work well for shopping cart contents.

• Examples of key-value store databases: Redis, Dynamo, Riak, etc.


Column-oriented databases

• Most databases have a row as the unit of storage which, in
particular, helps write performance.

• However, there are many scenarios where writes are rare, but we
often need to read a few columns of many rows at once.

• In this situation, it's better to store groups of columns for all rows
as the basic storage unit, which is why these databases are
called column stores.

• Examples of column-oriented databases: BigTable, HBase,
Cassandra, etc.
To get a particular customer’s name from Figure 2.5 we could do something like
get('1234', 'name').
Document Oriented databases

• A collection of documents

• Data in this model is stored inside documents.

• A document is a key value collection where the key allows access to its value.

• Documents are not typically forced to have a schema and therefore are

flexible and easy to change.

• Documents can contain many different key-value pairs, or key-array pairs, or

even nested documents.

• Examples of document-oriented databases: MongoDB, CouchDB, etc.


Graph databases

• A graph database stores data in a graph.


• A graph database is a collection of nodes and edges
• Each node represents an entity (such as a student or business) and
each edge represents a connection or relationship between two
nodes.
• Every node and edge are defined by a unique identifier.
• Each node knows its adjacent nodes.
• Examples of graph databases: OrientDB, Neo4J, Titan, etc.
• Graph databases are an odd fish in the NoSQL pond.

• Most NoSQL databases were inspired by the need to run on clusters,
which led to aggregate-oriented data models of large records with simple
connections.

• Graph databases are motivated by a different frustration with relational
databases and thus have an opposite model: small records with complex
interconnections, something like the figure below.
• We refer to a graph data structure of nodes connected by edges.

• In Figure we have a web of information whose nodes are very small (nothing
more than a name) but there is a rich structure of interconnections between
them.

• With this structure, we can ask questions such as "find the books in the
Databases category that are written by someone whom a friend of mine
likes."
Brewer’s CAP Theorem
• The theorem states that within a large-scale distributed data system, there
are three requirements that have a relationship of sliding dependency:
Consistency, Availability, and Partition Tolerance.

• Consistency : All database clients will read the same value for the same
query, even given concurrent updates.

• Availability : All database clients will always be able to read and write
data.

• Partition Tolerance : The database can be split across multiple machines; it
can continue functioning in the face of network segmentation breaks.

• The system is allowed to lose arbitrarily many messages sent from
one node to another.
• Brewer’s theorem is that in any given system, we can strongly support only
two of the three.

• We have to choose between them because of this sliding mutual
dependency.

• The more consistency we demand from our system, for example, the less
partition-tolerant we are likely to be able to make it, unless we make some
concessions around availability.
• In distributed systems, however, it is very likely that we will have network
partitioning, and that at some point, machines will fail and cause others to
become unreachable.

• Packet loss, too, is nearly inevitable.

• This leads us to the conclusion that a distributed system must do its best to
continue operating in the face of network partitions (to be Partition-
Tolerant), leaving us with only two real options to choose from: Availability
and Consistency.
• Figure: the CAP theorem indicates that we can realize only two of these properties at once.
• The figure shows the general focus of some of the different databases.

Figure 1-2. Where different databases appear on the CAP continuum


• In this depiction, relational databases are on the line between Consistency
and Availability.

• Graph databases such as Neo4J and the set of databases derived at least in
part from the design of Google’s Bigtable database (such as MongoDB,
HBase, Hypertable, and Redis) all are focused slightly less on Availability and
more on ensuring Consistency and Partition Tolerance.

• Finally, the databases Cassandra, Project Voldemort, CouchDB, and Riak
are more focused on Availability and Partition Tolerance.

• However, this does not mean that they dismiss Consistency as unimportant.
• According to the Bigtable paper, the average percentage of server hours that
“some data” was unavailable is 0.0047%.

• There are many use cases where "eventual consistency" is tolerable, and
where "eventual" is a matter of milliseconds.
CA :
• Single-site cluster; all nodes are always in contact. When a
partition occurs, the system blocks.

CP :
• Some data may not be accessible, but the rest is still consistent/accurate.

AP :
• System is still available under partitioning, but some of the data returned
may be inaccurate.
How to Choose the Right NoSQL
Database for Your Application?
• https://www.dataversity.net/choose-right-
nosql-database-application/
Aggregate data models

• A data model is the model through which we perceive and manipulate our
data.

• For people using a database, the data model describes how we interact with
the data in the database.

• Data model : the model by which the database organizes data.


• The dominant data model of the last couple of decades is the relational data
model, which is best visualized as a set of tables.

• Each table has rows, with each row representing some entity of interest.

• We describe this entity through columns, each having a single value.

• A column may refer to another row in a different table, which constitutes a
relationship between those entities.
Aggregates

• The relational model takes the information that we want to store and divides
it into tuples (rows).

• A tuple is a limited data structure: it captures a set of values, so we cannot
nest one tuple within another to get nested records, nor can we put a list of
values or tuples within another.

• Aggregate orientation takes a different approach.

• We often want to operate on data in units that have a more complex structure
than a set of tuples.

• Key-value, document, and column-family databases all make use of this more
complex record.
• However, there is no common term for this complex record; here we use the
term "aggregate."

• An aggregate is a collection of related objects that we wish to treat as a unit.

• In particular, it is a unit for data manipulation and management of
consistency.

• Aggregates are also often easier for application programmers to work with,
since they often manipulate data through aggregate structures.
Example of Relations and Aggregates
• Consider an example of building an e-commerce website;

• we are going to be selling items directly to customers over the web, and we
will have to store information about users, our product catalog, orders,
shipping addresses, billing addresses, and payment data.

• We can use this scenario to model the data using a relational data store as
well as NoSQL data stores and talk about their pros and cons.
For a relational database, we might start with the data model shown in Figure 2.1.
Figure 2.2 presents some sample data for this model.
Figure 2.3 shows how the model might look when we think in more aggregate-oriented terms.
• The sample data is shown in JSON format, a common representation for
data in NoSQL land.

// in customers
{
"id":1,
"name":"Martin",
"billingAddress":[{"city":"Chicago"}]
}

// in orders
{
"id":99,
"customerId":1,
"orderItems":[
  { "productId":27,
    "price": 32.45,
    "productName": "NoSQL Distilled"
  }
],
"shippingAddress":[{"city":"Chicago"}],
"orderPayment":[
  { "ccinfo":"1000-1000-1000-1000",
    "txnId":"abelif879rft",
    "billingAddress": {"city": "Chicago"}
  }
]
}
• In this model, we have two main aggregates: customer and order.

• The black-diamond composition marker in UML shows how data fits into the
aggregation structure.

• The customer contains a list of billing addresses; the order contains a list of
order items, a shipping address, and payments.
• A single logical address record appears three times in the example data, but
instead of using IDs it’s treated as a value and copied each time.

• With aggregates, we can copy the whole address structure into the
aggregate as we need to.

• Indeed we could draw our aggregate boundaries differently, putting all the
orders for a customer into the customer aggregate (Figure 2.4).
• Using the above data model, an example Customer and Order would look
like this:
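A sketch of that single-aggregate form, reusing the sample data above (the
embedded orders array is assumed from Figure 2.4):

// in customers, with orders embedded
{
"id":1,
"name":"Martin",
"billingAddress":[{"city":"Chicago"}],
"orders":[
  {
  "id":99,
  "orderItems":[
    { "productId":27,
      "price": 32.45,
      "productName": "NoSQL Distilled"
    }
  ],
  "shippingAddress":[{"city":"Chicago"}],
  "orderPayment":[
    { "ccinfo":"1000-1000-1000-1000",
      "txnId":"abelif879rft",
      "billingAddress": {"city": "Chicago"}
    }
  ]
  }
]
}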
• Like most things in modeling, there’s no universal answer for how to draw our
aggregate boundaries.

• It depends entirely on how we tend to manipulate our data.

• If we tend to access a customer together with all of that customer's orders at
once, then we would prefer a single aggregate.

• However, if we tend to focus on accessing a single order at a time, then we
should prefer having separate aggregates for each order.
Distribution Models

• NoSQL databases have the ability to run on a large cluster.

• As data volumes increase, it becomes more difficult and expensive to scale
up, that is, buy a bigger server to run the database on.

• A more appealing option is to scale out: run the database on a cluster of
servers.

• Depending on our distribution model, we can get a data store that will give us
– the ability to handle larger quantities of data,
– the ability to process greater read or write traffic, or
– more availability in the face of network slowdowns or breakages.
• Broadly, there are two paths to data distribution: replication and sharding.

• Replication takes the same data and copies it over multiple nodes.

• Sharding puts different data on different nodes.

• Replication and sharding are orthogonal techniques: we can use either or
both of them.

• Replication comes in two forms: master-slave and peer-to-peer.

Sharding
• Often, a busy data store is busy because different people are accessing
different parts of the dataset.

• In these circumstances we can support horizontal scalability by putting
different parts of the data onto different servers, a technique that's called
sharding.

• In the ideal case, we have different users all talking to different server nodes.

• Each user only has to talk to one server, so gets rapid responses from that
server.

• The load is balanced out nicely between servers—for example, if we have ten
servers, each one only has to handle 10% of the load.
• We have to ensure that data that's accessed together is clumped together
on the same node and that these clumps are arranged on the nodes to
provide the best data access.

• The first part of this problem is how to clump the data up so that one user
mostly gets her data from a single server. This is where aggregate
orientation comes in really handy.
• When it comes to arranging the data on the nodes, there are several factors
that can help improve performance.

• If we know that most accesses of certain aggregates are based on a physical


location, we can place the data close to where it’s being accessed.

• If we have orders for someone who lives in Boston, we can place that data in
our eastern US data center.
• Sharding does little to improve resilience when used alone.

• Although the data is on different nodes, a node failure makes that shard’s
data unavailable just as surely as it does for a single-server solution.

• The resilience benefit it does provide is that only the users of the data on
that shard will suffer; however, it’s not good to have a database with part of
its data missing.
Master-Slave Replication

• With master-slave distribution, we replicate data across multiple nodes.

• One node is designated as the master, or primary.

• This master is the authoritative source for the data and is usually
responsible for processing any updates to that data.

• The other nodes are slaves, or secondaries.

• A replication process synchronizes the slaves with the master.

• Master-slave replication is most helpful for scaling when we have a
read-intensive dataset.

• We can scale horizontally to handle more read requests by adding more slave
nodes and ensuring that all read requests are routed to the slaves.

• We are still, however, limited by the ability of the master to process updates
and its ability to pass those updates on.

• Consequently it isn't such a good scheme for datasets with heavy write traffic.
• A second advantage of master-slave replication is read resilience: Should the
master fail, the slaves can still handle read requests.

• Again, this is useful if most of our data access is reads.

• The failure of the master does eliminate the ability to handle writes until
either the master is restored or a new master is appointed.
• Masters can be appointed manually or automatically.

• Manual appointing typically means that when we configure our cluster, we
configure one node as the master.

• With automatic appointment, we create a cluster of nodes and they select
one of themselves to be the master.

• Apart from simpler configuration, automatic appointment means that the
cluster can automatically appoint a new master when a master fails,
reducing downtime.
Peer-to-Peer Replication
• Master-slave replication helps with read scalability but doesn’t help with
scalability of writes.

• It provides resilience against failure of a slave, but not of a master.

• Essentially, the master is still a bottleneck and a single point of failure.

• Peer-to-peer replication (see Figure 4.3) attacks these problems by not
having a master.

• All the replicas have equal weight, they can all accept writes, and the loss
of any of them doesn't prevent access to the data store.
• With a peer-to-peer replication cluster, we can ride over node failures
without losing access to data.

• Furthermore, we can easily add nodes to improve our performance.

• There’s much to like here—but there are complications.

• The biggest complication is, again, consistency.

• When we can write to two different places, we run the risk that two people
will attempt to update the same record at the same time—a write-write
conflict.
• Inconsistencies on read lead to problems but at least they are relatively
transient. Inconsistent writes are forever.
Combining Sharding and Replication
• Replication and sharding are strategies that can be combined.

• If we use both master-slave replication and sharding (see Figure 4.4), this
means that we have multiple masters, but each data item only has a single
master.

• Depending on our configuration, we may choose a node to be a master for
some data and a slave for others, or we may dedicate nodes for master or
slave duties.
• Using peer-to-peer replication and sharding is a common strategy for column
family databases.

• In a scenario like this we might have tens or hundreds of nodes in a cluster
with data sharded over them.

• A good starting point for peer-to-peer replication is to have a replication
factor of 3, so each shard is present on three nodes.

• Should a node fail, the shards on that node will be built on the other
nodes (see Figure 4.5).
• MongoDB is an open-source document database and a leading NoSQL
database.

• MongoDB is written in C++.

• MongoDB is a cross-platform, document-oriented database that provides
high performance, high availability, and easy scalability.

• MongoDB works on the concepts of collections and documents.


Schema-less
Extensive driver support
Auto-sharding
Replication and High availability
Document oriented storage
Flexibility
Performance
Scalability
Database

Database is a physical container for collections.

Each database gets its own set of files on the file system.

A single MongoDB server typically has multiple databases.


Collection

• Collection is a group of MongoDB documents.

• It is the equivalent of an RDBMS table.

• A collection exists within a single database.

• Collections do not enforce a schema. Documents within a


collection can have different fields.

• Typically, all documents in a collection are of similar or related


purpose.
Document

• A document is a set of key-value pairs.

• Documents have dynamic schema.

• Dynamic schema means that documents in the same


collection do not need to have the same set of fields or
structure, and common fields in a collection's documents may
hold different types of data.
Install MongoDB community server
• https://www.mongodb.com/try/download/co
mmunity (windows)
• Create directory /data/db
• Check the environment variable
• Check if MongoDB Compass is installed
• Install mongo shell
Install the MongoDB PHP Library
• Installing the Extension
• Installing the Library
– Using Composer
Install the MongoDB PHP Library..
Installing the Extension
https://pecl.php.net/package/mongodb
After downloading the appropriate archive for
your PHP environment, extract the
php_mongodb.dll file to PHP’s extension
directory (XAMPP can be used) and add the
following line to your php.ini file:
extension=php_mongodb.dll
Installing the Library

Using Composer
The preferred method of installing the
MongoDB PHP Library is with Composer
(https://getcomposer.org/) by running the
following command from your project root:

composer require mongodb/mongodb
(run on cmd)
<?php
require 'vendor/autoload.php';        // Composer autoloader

$client = new MongoDB\Client;         // connects to mongodb://127.0.0.1:27017 by default
$companydb = $client->companydb;      // selects the companydb database
$result1 = $companydb->createCollection('empcollection');
var_dump($result1);
?>
The following example shows the document structure of a blog site, which is simply
a comma-separated set of key-value pairs.
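A sketch of one such blog-post document (all values are illustrative):

{
   _id: ObjectId("507f1f77bcf86cd799439011"),   // assigned by MongoDB if omitted
   title: "MongoDB Overview",
   description: "MongoDB is a NoSQL database",
   by: "tutorials point",
   url: "http://www.tutorialspoint.com",
   tags: ["mongodb", "database", "NoSQL"],
   likes: 100,
   comments: [
      {
         user: "user1",
         message: "My first comment",
         dateCreated: new Date(2013, 11, 10),
         like: 0
      }
   ]
}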
• _id is a 12 bytes hexadecimal number which assures the uniqueness of
every document.

• We can provide _id while inserting the document.

• If we don’t provide then MongoDB provides a unique id for every


document.

• Of these 12 bytes, the first 4 bytes are the current timestamp, the next 3 bytes
the machine id, the next 2 bytes the process id of the MongoDB server, and the
remaining 3 bytes a simple incremental value.
MongoDB Help
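A couple of quick sketches of the shell's built-in help:

> db.help()          // lists methods available on the db object
> db.mycol.help()    // lists methods available on a collection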
MongoDB Statistics

• db.stats() : To get stats about the MongoDB server.

• This shows the database name, number of collections and
documents in the database.
Example

• Suppose a client needs a database design for his blog/website; let us see the
differences between RDBMS and MongoDB schema design. The website has
the following requirements.

• Every post has a unique title, description and url.

• Every post can have one or more tags.

• Every post has the name of its publisher and total number of likes.

• Every post has comments given by users along with their name, message,
date-time and likes.

• On each post, there can be zero or more comments.


Example

In an RDBMS schema, the design for the above requirements will have a
minimum of three tables.
While in MongoDB, the design will have one collection, post, with a structure
like the blog-post document sketched earlier: each post embeds its tags and
comments directly.
So while showing the data, in RDBMS we need to join three
tables, while in MongoDB the data will be shown from one collection
only.
The use Command

MongoDB's use DATABASE_NAME command selects a database; the database
itself is created when you first store data in it. In MongoDB, the default
database is test. If you don't create any database, collections are stored in
the test database.
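For example (the database name mydb is illustrative):

> use mydb
switched to db mydb
> db
mydb
> show dbs    // mydb is not listed until it contains at least one document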
The dropDatabase() Method

db.dropDatabase() drops the currently selected database. If no database is
selected, it drops the default test database.
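A minimal sketch:

> use mydb
switched to db mydb
> db.dropDatabase()
{ "dropped" : "mydb", "ok" : 1 }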
The createCollection() Method

The options parameter is optional.

In MongoDB, we don't need to create collections explicitly. MongoDB creates
a collection automatically when we insert a document.

>db.mitcollection.insert({"name" : "ICT"})

>show collections
mycol
mycollection
system.indexes
mitcollection
>
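A collection can also be created explicitly, optionally passing options such as
capped, size and max (the values below are illustrative):

> db.createCollection("mycollection")
{ "ok" : 1 }
> db.createCollection("mycappedcol", { capped: true, size: 6142800, max: 10000 })
{ "ok" : 1 }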
The drop() Method
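db.collection.drop() deletes a collection, returning true on success. A sketch:

> db.mycollection.drop()
true
> show collections    // mycollection is no longer listed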
MongoDB - Datatypes

• String

• Integer

• Boolean

• Double

• Arrays

• Timestamp

• Object.

And more
The insert() Method
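A minimal sketch (newer shells prefer insertOne()/insertMany() over the legacy
insert(); the sample values follow the examples used later):

> db.mycol.insertOne({
     title: "MongoDB Overview",
     by: "tutorials point",
     tags: ["mongodb", "database", "NoSQL"],
     likes: 100
  })
> db.mycol.insertMany([
     { title: "NoSQL Overview", by: "tutorials point", likes: 10 },
     { title: "Neo4j Overview", by: "Neo4j", likes: 750 }
  ])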
MongoDB - Query Document

The find() Method


RDBMS Where Clause Equivalents in MongoDB
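Sketches of find() with the common where-clause equivalents (collection and
field names assumed from the insert example above):

> db.mycol.find()                                // select * from mycol
> db.mycol.find().pretty()                       // same, formatted output
> db.mycol.find({ "by": "tutorials point" })     // where by = 'tutorials point'
> db.mycol.find({ "likes": { $lt: 50 } })        // where likes < 50
> db.mycol.find({ "likes": { $lte: 50 } })       // where likes <= 50
> db.mycol.find({ "likes": { $gt: 50 } })        // where likes > 50
> db.mycol.find({ "likes": { $gte: 50 } })       // where likes >= 50
> db.mycol.find({ "likes": { $ne: 50 } })        // where likes != 50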
AND in MongoDB
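In find(), conditions separated by commas form an implicit AND; the $and
operator makes it explicit. A sketch:

> db.mycol.find({ $and: [ { "by": "tutorials point" }, { "title": "MongoDB Overview" } ] })
// where by = 'tutorials point' AND title = 'MongoDB Overview'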
OR in MongoDB
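For OR, conditions are passed as an array to the $or operator. A sketch:

> db.mycol.find({ $or: [ { "by": "tutorials point" }, { "title": "MongoDB Overview" } ] })
// where by = 'tutorials point' OR title = 'MongoDB Overview'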
Using AND and OR Together

The following example will show the documents that have likes greater than 10
and whose by is 'tutorials point' or whose title is 'MongoDB Overview'.

The equivalent SQL where clause is 'where likes > 10 AND (by = 'tutorials point' OR
title = 'MongoDB Overview')'
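A sketch of that combined query:

> db.mycol.find({
     "likes": { $gt: 10 },
     $or: [ { "by": "tutorials point" }, { "title": "MongoDB Overview" } ]
  })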
Update Document

MongoDB's update() and save() methods are used to update documents in a
collection.
The update() method updates the values in the existing document, while the
save() method replaces the existing document with the document passed to
save().
db.post.updateOne(
    { _id: ObjectId('65a9ea971220ffbb76de9657') },
    { $set: { title: 'Developer\'s hub',
              topic: ['MongoDB Atlas', 'MongoDB Compass'] } },
    { upsert: true }
)

db.post.updateOne(
    { _id: ObjectId('65a9ea971220ffbb76de9657') },
    { $push: { tags: 'not structured' } }
)
• The updateOne() method accepts a filter document,
an update document, and an optional options
object. MongoDB provides update operators and
options to help you update documents.
• The $set operator replaces the value of a field with
the specified value
• The upsert option creates a new document if no
documents match the filtered criteria
• The $push operator adds a new value to
the tags array field
db.post.updateMany({}, { $set: { application: ["Serverless dev", "Edge Computing", "AI", "IOT"] } })
Delete Document
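Sketches of the delete methods (filters reuse earlier sample fields):

> db.mycol.deleteOne({ "title": "MongoDB Overview" })   // first matching document
> db.mycol.deleteMany({ "likes": { $lt: 10 } })         // all matching documents
> db.mycol.deleteMany({})                               // every document in the collection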
Projection

In MongoDB, projection means selecting only the necessary data rather than
all of the data in a document. If a document has 5 fields and
you need to show only 3, then select only 3 fields from them.

The find() Method


MongoDB's find() method, explained under MongoDB Query Document, accepts a
second, optional parameter: the list of fields that you want to retrieve. When
you execute the find() method, it displays all fields of a document. To limit
this, you need to set a list of fields with value 1 or 0. 1 is used to show a
field, while 0 is used to hide it.
Please note the _id field is always displayed while executing find(); if you
don't want this field, then you need to set it to 0.
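A sketch showing only the title field:

> db.mycol.find({}, { "title": 1, "_id": 0 })
// title is shown (1); _id must be suppressed explicitly (0)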
Querying on Array Elements

• { <field>: { $elemMatch: { <query1>, <query2>, ... } } }


• The $elemMatch operator matches documents that contain an array field
with at least one element that matches all the specified query criteria
db.post.find({ tags: 'mongodb',
               comments: { $elemMatch: { like: { $gte: 0 } } } })

db.post.find({ tags: 'mongodb',
               comments: { $elemMatch: { like: { $gte: 0 } } } },
             { title: 1, 'comments.user': 1 })
Limit Records
If you don't specify the number argument in the limit() method, then it will
display all documents from the collection.
Please note, the default value of the argument in the skip() method is 0.
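Sketches of limit() and skip():

> db.mycol.find({}, { "title": 1, "_id": 0 }).limit(2)            // first two documents
> db.mycol.find({}, { "title": 1, "_id": 0 }).limit(1).skip(1)    // only the second document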
Sort Records

The sort() Method


To sort documents in MongoDB, you need to use the sort() method. The method
accepts a document containing a list of fields along with their sorting order.

To specify sorting order, 1 and -1 are used: 1 for ascending order and -1 for
descending order.
Please note, if you don't specify the sorting preference, the sort() method
displays the documents in ascending order.
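A sketch sorting titles in descending order:

> db.mycol.find({}, { "title": 1, "_id": 0 }).sort({ "title": -1 })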
Count Documents
• db.collection.countDocuments( <query>, <options> )

• db.post.countDocuments({ likes: { $gt: 25 } })
Aggregation Pipeline
• Aggregation: Collection and summary of data
• Stage: One of the built-in methods that can be
completed on the data, but does not
permanently alter it
• Aggregation pipeline: A series of stages
completed on the data in order
db.collection.aggregate([
    {
        $stage1: {
            { expression1 },
            { expression2 }...
        }
    },
    {
        $stage2: {
            { expression1 }...
        }
    }
])
• $match and $group aggregation
• The $match stage filters for documents that
match specified conditions
• The $group stage groups documents by a
group key
{
    $match: {
        "field_name": "value"
    }
}

{
    $group: {
        _id: <expression>,   // group key
        <field>: { <accumulator> : <expression> }
    }
}

db.post.aggregate([
    { $match: { by: 'tutorials point' } },
    { $group: { _id: null, "totallikes": { $sum: "$likes" } } }
])
// result: { _id: null, totallikes: 440 }

db.post.aggregate([
    { $match: { by: 'tutorials point' } },
    { $group: { _id: "$title", "totallikes": { $sum: "$likes" } } }
])
Sort and limit
• The $sort stage sorts all input documents and
returns them to the pipeline in sorted order.
Use 1 to represent ascending order, and -1 to
represent descending order.
• The $limit stage returns only a specified
number of records.
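A sketch combining the two stages on the post collection used earlier:

db.post.aggregate([
    { $sort: { likes: -1 } },   // most-liked posts first
    { $limit: 3 }               // keep only the top three
])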
$merge
• Merges the output with a specified collection
• The $merge stage provides more flexibility. It
can merge the results of the aggregation with
an existing collection.
• It allows specifying how the merging should
occur, with options like overwriting existing
documents, merging them, or even keeping
the existing ones if there's a conflict
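A sketch that writes grouped totals into a separate collection (the target
name likes_by_author is assumed):

db.post.aggregate([
    { $group: { _id: "$by", totallikes: { $sum: "$likes" } } },
    { $merge: {
        into: "likes_by_author",    // target collection
        whenMatched: "replace",     // overwrite documents that already exist
        whenNotMatched: "insert"    // add documents that don't
    } }
])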
$lookup and $map
• The $lookup stage adds a new array field to
each input document.
• The $map operator applies an expression to each item in an
array and returns an array with the applied
results
db.posts.insertMany([
    { _id: 1, title: "The Joy of MongoDB", description: "Introduction to MongoDB",
      url: "http://example.com/mongodb", likes: 100, post_by: "Author1" },
    { _id: 2, title: "Aggregation Framework", description: "Deep Dive into Aggregation",
      url: "http://example.com/aggregation", likes: 150, post_by: "Author2" },
    { _id: 3, title: "Sharding Strategies", description: "How to shard effectively",
      url: "http://example.com/sharding", likes: 75, post_by: "Author3" }
]);

db.comments.insertMany([
    // Comments for post with _id: 1
    { comment_id: 1, post_id: 1, by_user: "User1", message: "Great post!",
      date_time: new Date(), likes: 5 },
    { comment_id: 2, post_id: 1, by_user: "User2", message: "Very informative!",
      date_time: new Date(), likes: 3 },
    // Comments for post with _id: 2
    { comment_id: 3, post_id: 2, by_user: "User3", message: "I love the Aggregation Framework!",
      date_time: new Date(), likes: 8 },
    { comment_id: 4, post_id: 2, by_user: "User4", message: "Can't wait to try this out.",
      date_time: new Date(), likes: 4 },
    // Comments for post with _id: 3
    { comment_id: 5, post_id: 3, by_user: "User5", message: "Sharding is complex, but this helps.",
      date_time: new Date(), likes: 2 }
]);
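With the posts and comments collections above, a $lookup sketch that attaches
each post's comments:

db.posts.aggregate([
    { $lookup: {
        from: "comments",           // collection to join
        localField: "_id",          // field from posts
        foreignField: "post_id",    // matching field from comments
        as: "post_comments"         // name of the new array field
    } }
])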
db.accounts.insertMany([
    { accountId: 1, accountType: "Savings", customerName: "John Doe", amount: 1000,
      previousTransactionDetails: [
        { transactionId: "T1001", amount: 100, type: "Deposit" },
        { transactionId: "T1002", amount: 50, type: "Withdrawal" } ] },
    { accountId: 2, accountType: "Checking", customerName: "Jane Smith", amount: 1500,
      previousTransactionDetails: [
        { transactionId: "T2001", amount: 200, type: "Deposit" },
        { transactionId: "T2002", amount: 100, type: "Withdrawal" } ] },
    { accountId: 3, accountType: "Savings", customerName: "Alice Johnson", amount: 500,
      previousTransactionDetails: [
        { transactionId: "T3001", amount: 500, type: "Deposit" } ] },
    { accountId: 4, accountType: "Checking", customerName: "Bob Brown", amount: 800,
      previousTransactionDetails: [
        { transactionId: "T4001", amount: 300, type: "Deposit" },
        { transactionId: "T4002", amount: 100, type: "Withdrawal" } ] },
    { accountId: 5, accountType: "Savings", customerName: "Charlie Davis", amount: 1200,
      previousTransactionDetails: [
        { transactionId: "T5001", amount: 1000, type: "Deposit" },
        { transactionId: "T5002", amount: 200, type: "Withdrawal" } ] }
]);
MongoDB - Replication

• Replication is the process of synchronizing data across multiple servers.

• Replication provides redundancy and increases data availability with


multiple copies of data on different database servers.

• Replication protects a database from the loss of a single server.

• Replication also allows you to recover from hardware failure and service
interruptions.
Why Replication?

• To keep your data safe


• High (24*7) availability of data
• Disaster recovery
• No downtime for maintenance (like backups, index rebuilds, compaction)
• Read scaling (extra copies to read from)
How Replication Works in MongoDB

MongoDB achieves replication by the use of replica set.

A replica set is a group of mongod instances that host the same data set.

In a replica set, one node is the primary node, which receives all write operations.

All other instances, the secondaries, apply operations from the primary so
that they have the same data set.

A replica set can have only one primary node.


• A replica set is a group of two or more nodes (generally a minimum of 3 nodes
is required).

• In a replica set, one node is the primary node and the remaining nodes are
secondaries.

• All data replicates from the primary to the secondary nodes.

• At the time of automatic failover or maintenance, an election is held and a
new primary node is elected.

• After the recovery of a failed node, it rejoins the replica set and works as
a secondary node.
A typical diagram of MongoDB replication is shown in which the client application always
interacts with the primary node, and the primary node then replicates the data to the
secondary nodes.
Replica Set: Adding the First Member using rs.initiate()

Step 1) Ensure that all mongod.exe instances which will be added to the
replica set are installed on different servers.

Step 2) Ensure that all mongo.exe instances can connect to each other. From
ServerA, issue the below 2 commands

mongo --host ServerB --port 27017
mongo --host ServerC --port 27017

mongod --port 27017 --dbpath /data/rs1 --replSet rs0 --bind_ip localhost


mongod --port 27018 --dbpath /data/rs2 --replSet rs0 --bind_ip localhost
mongod --port 27019 --dbpath /data/rs3 --replSet rs0 --bind_ip localhost
Similarly, do the same thing from the remaining servers.
Replica Set: Adding the First Member using rs.initiate()

Step 3) Start the first mongod.exe instance with the replSet option.
This option provides a grouping for all servers which will be part of this replica
set.
mongod --replSet "Replica1"

Where "Replica1" is the name of your replica set. You can choose any
meaningful name for your replica set.

Step 4) Now that the first server is added to the replica set, the next step is to
initiate the replica set by issuing the following command

rs.initiate()

Step 5) Verify the replica set by issuing the command rs.conf() to ensure the
replica set is set up properly
Replica Set: Adding a Secondary using rs.add()

The secondary servers can be added to the replica set by just using the
rs.add command.

This command takes in the name of the secondary servers and adds the
servers to the replication set.

Suppose you have ServerA, ServerB, and ServerC, which are required to
be part of your replica set, and ServerA is defined as the primary server in
the replica set.

To add ServerB and ServerC to the replica set issue the commands
rs.add("ServerB")
rs.add("ServerC")

Eg: rs.add("localhost:27018");rs.add("localhost:27019");
MongoDB - Sharding

Sharding is the process of storing data records across multiple machines and
it is MongoDB's approach to meeting the demands of data growth.

As the size of the data increases, a single machine may not be sufficient to
store the data nor provide an acceptable read and write throughput.

Sharding solves the problem with horizontal scaling.

With sharding, you add more machines to support data growth and the
demands of read and write operations.
Sharding in MongoDB
Three main components are

Shards :
A shard is a MongoDB instance which holds a subset of the data.
Shards provide high availability and data consistency.
In production environments, all shards need to be part of replica sets.

Config Servers :
This is a mongodb instance which holds metadata about the cluster, basically
information about the various mongodb instances which will hold the shard
data.
Query Routers :

This is a mongodb instance which is responsible for redirecting the commands
sent by the client to the right servers.
A sharded cluster can contain more than one query router to divide the
client request load.
A client sends requests to one query router. Generally, a sharded cluster
has many query routers.
Key-Value Databases
• A key-value store is a simple hash table, primarily used
when all access to the database is via primary key.
• Key-value stores are the simplest NoSQL data stores to
use from an API perspective. The client can either get the
value for the key, put a value for a key, or delete a key
from the data store. The value is a blob that the data
store just stores, without caring or knowing what’s inside;
it’s the responsibility of the application to understand
what was stored.
• Since key-value stores always use primary-key access,
they generally have great performance and can be easily
scaled.
Some of the popular key-value
databases
• Riak [Riak],
• Redis (often referred to as Data Structure server) [Redis],
• Memcached DB and its flavors [Memcached],
• Berkeley DB [Berkeley DB],
• HamsterDB (especially suited for embedded use)
[HamsterDB],
• Amazon DynamoDB [Amazon’s Dynamo] (not open-
source), and
• Project Voldemort [Project Voldemort] (an open-source
implementation of Amazon DynamoDB).
Key-Value Store Features
• consistency,
• transactions,
• query features,
• structure of the data, and
• scaling
REmote DIctionary Server (REDIS)
Redis - Quick Guide

https://www.tutorialspoint.com/redi
s/redis_quick_guide.htm
