Big Data and Analytics
Seema Acharya
Subhashini Chellappan
Big Data and Analytics by Seema Acharya and Subhashini Chellappan
Copyright 2015, WILEY INDIA PVT. LTD.
Chapter 6
Introduction to MongoDB
Learning Objectives and Learning Outcomes
Topic: Introduction to MongoDB
Learning Objectives
1. To study the features of MongoDB.
2. To learn how to perform CRUD operations.
3. To study aggregation.
4. To study the MapReduce framework.
5. To import from and export to CSV format.
Learning Outcomes
a) To comprehend the reasons behind the popularity of NoSQL databases.
b) To be able to perform CRUD operations.
c) To comprehend the MapReduce framework.
d) To understand aggregation.
e) To be able to successfully import from and export to CSV.
Session Plan
Lecture time 90 to 120 minutes
Q/A 15 minutes
Agenda
What is MongoDB?
Why MongoDB?
Using JSON
Creating or Generating a Unique Key
Support for Dynamic Queries
Storing Binary Data
Replication
Sharding
Terms used in RDBMS and MongoDB
Data Types in MongoDB
CRUD (Insert(), Update(), Save(), Remove(), Find())
MapReduce Functions
Aggregation
Java Scripting
MongoImport
MongoExport
MongoDB: An Introduction
What is MongoDB?
MongoDB is:
1. Cross-platform.
2. Open source.
3. Non-relational.
4. Distributed.
5. NoSQL.
6. Document-oriented data store.
Why MongoDB?
• Open Source
• Distributed
• Fast In-Place Updates
• Replication
• Full Index Support
• Rich Query Language
• Easy Scalability
• Auto sharding
JSON
JSON (JavaScript Object Notation)
Sample JSON Document
{
"FirstName": "John",
"LastName": "Mathews",
"ContactNo": ["+123 4567 8900", "+123 4444 5555"]
}
Unique Identifier
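Each JSON document should have a unique identifier, stored in the _id key.
An automatically generated _id is a 12-byte value laid out as follows:
Bytes 0 to 3     Timestamp
Bytes 4 to 6     Machine ID
Bytes 7 to 8     Process ID
Bytes 9 to 11    Counter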
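The 12-byte layout can be made concrete with a small sketch. Assuming the layout on this slide (4-byte timestamp, 3-byte machine ID, 2-byte process ID, 3-byte counter), the plain JavaScript function below splits an identifier written as a 24-character hexadecimal string into those four parts. The function name decodeObjectId and the sample value are illustrative and not part of MongoDB; recent server releases fill the middle five bytes with a random value instead of machine and process IDs.

```javascript
// Hypothetical helper: splits a 24-hex-character ObjectId string into the
// four fields of the layout shown on the slide (4 + 3 + 2 + 3 bytes).
function decodeObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  return {
    // Bytes 0 to 3: seconds since the Unix epoch
    timestamp: new Date(parseInt(hex.slice(0, 8), 16) * 1000),
    // Bytes 4 to 6: machine identifier (kept as hex text)
    machineId: hex.slice(8, 14),
    // Bytes 7 to 8: process identifier
    processId: parseInt(hex.slice(14, 18), 16),
    // Bytes 9 to 11: counter
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

// Sample value chosen for illustration only.
const parts = decodeObjectId("507f1f77bcf86cd799439011");
console.log(parts.timestamp.toISOString(), parts.machineId,
            parts.processId, parts.counter);
```

Because the timestamp occupies the leading bytes, identifiers generated later sort after earlier ones when compared as hexadecimal strings.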
Support for Dynamic Queries
MongoDB has extensive support for dynamic queries.
This is in keeping with traditional RDBMS wherein we have static data and
dynamic queries.
Storing Binary Data
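A single document can hold binary data up to the BSON document size limit
(4 MB in early releases, 16 MB in current ones).
For larger files, MongoDB provides GridFS, which splits a file into chunks and
stores each chunk as a separate document.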
Replication in MongoDB
Figure: The client application sends its writes and reads to the primary. The
primary replicates the data to three secondaries.
Sharding in MongoDB
Figure: Collection 1, a 1 TB logical database, is partitioned across four
shards (Shard 1 to Shard 4) of 256 GB each.
Terms Used in RDBMS and MongoDB
RDBMS          MongoDB
Database       Database
Table          Collection
Record         Document
Columns        Fields / key-value pairs
Index          Index
Joins          Embedded documents
Primary key    Primary key (_id is the identifier)
Data Types in MongoDB
String: Must be UTF-8 valid. Most commonly used data type.
Integer: Can be 32-bit or 64-bit (depends on the server).
Boolean: To store a true/false value.
Double: To store floating point (real) values.
Min/Max keys: To compare a value against the lowest or highest BSON elements.
Arrays: To store arrays, lists, or multiple values in one key.
Timestamp: To record when a document has been modified or added.
Null: To store a NULL value. A NULL is a missing or unknown value.
Date: To store the current date or time in Unix time format. One can create a
Date object and pass the day, month, and year to it.
Object ID: To store the document’s id.
Binary data: To store binary data (images, binaries, etc.).
Code: To store JavaScript code in the document.
Regular expression: To store a regular expression.
CRUD in MongoDB
Collections
To create a collection by the name “Person”. Listing the collections before
and after the call (show collections) confirms that “Person” has been added.
db.createCollection("Person");
Collections
To drop a collection by the name “food”.
db.food.drop();
Insert Method
Create a collection by the name “Students” and store the following data in it.
db.Students.insert({_id: 1, StudName: "Michelle Jacintha", Grade: "VII", Hobbies: "Internet Surfing"});
Update Method
Insert the document for “Aryan David” into the Students collection only if it
does not already exist in the collection. If it is already present, update the
document with new values instead. This is “update else insert” (upsert): if a
document matches the query, it is updated; if none matches, a new document is
inserted. When no matching document exists yet, the statement below inserts it
with Hobbies set to “Skating”. Running it again with Hobbies set to “Chess”
updates the existing document, changing his Hobbies from “Skating” to “Chess”.
db.Students.update({_id: 3, StudName: "Aryan David", Grade: "VII"}, {$set: {Hobbies: "Skating"}}, {upsert: true});
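The “update else insert” behaviour can be illustrated without a server. The sketch below is plain JavaScript working on an in-memory array, not MongoDB code; the helper name upsert and its equality-only matching are simplifying assumptions made for illustration.

```javascript
// In-memory illustration of "update else insert"; not MongoDB code.
// Hypothetical helper: matches on strict equality of every query field.
function upsert(collection, query, setFields) {
  const matches = (doc) =>
    Object.keys(query).every((key) => doc[key] === query[key]);
  const existing = collection.find(matches);
  if (existing) {
    Object.assign(existing, setFields); // a match exists: update it in place
    return "updated";
  }
  // No match: insert a document built from the query fields plus the new values
  collection.push({ ...query, ...setFields });
  return "inserted";
}

const students = [];
const aryan = { _id: 3, StudName: "Aryan David", Grade: "VII" };
console.log(upsert(students, aryan, { Hobbies: "Skating" })); // first call inserts
console.log(upsert(students, aryan, { Hobbies: "Chess" }));   // second call updates
console.log(students);
```

After both calls the array still holds a single document for “Aryan David”, now with Hobbies set to “Chess”.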
Find Method
To search for documents from the “Students” collection based on certain
search criteria.
db.Students.find({StudName:"Aryan David"});
Find Method
To display only the StudName and Grade from all the documents of the
Students collection. The identifier _id should be suppressed and NOT
displayed.
db.Students.find({},{StudName:1,Grade:1,_id:0});
Find Method
To find those documents where the Grade is set to ‘VII’
db.Students.find({Grade:{$eq:'VII'}}).pretty();
Find Method
To find those documents from the Students collection where Hobbies is set to
either ‘Chess’ or ‘Skating’.
db.Students.find({Hobbies: {$in: ['Chess', 'Skating']}}).pretty();
Find Method
To find documents from the Students collection where the StudName begins
with “M”.
db.Students.find({StudName:/^M/}).pretty();
Find Method
To find documents from the Students collection where the StudName has an “e”
in any position.
db.Students.find({StudName:/e/}).pretty();
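The two patterns used in these queries, /^M/ and /e/, are ordinary regular expressions, so their effect can be checked in plain JavaScript. The sketch below applies them to the two student names that appear in this chapter; it runs against an in-memory array rather than a MongoDB collection.

```javascript
// The two student names used elsewhere in this chapter.
const names = ["Michelle Jacintha", "Aryan David"];

// ^ anchors the pattern at the start of the string: names that begin with "M".
const beginsWithM = names.filter((name) => /^M/.test(name));

// Without an anchor the pattern may match at any position: names containing "e".
const containsE = names.filter((name) => /e/.test(name));

console.log(beginsWithM);
console.log(containsE);
```

Both patterns are case-sensitive, so /e/ does not match an uppercase “E”.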
Find Method
To find the number of documents in the Students collection.
db.Students.count();
Find Method
To sort the documents from the Students collection in the descending order of
StudName.
db.Students.find().sort({StudName:-1}).pretty();
Aggregate Function
Customers collection:
{ CustID: "C123", AccBal: 500, AccType: "S" }
{ CustID: "C123", AccBal: 900, AccType: "S" }
{ CustID: "C111", AccBal: 1200, AccType: "S" }
{ CustID: "C123", AccBal: 1500, AccType: "C" }
After $match:
{ CustID: "C123", AccBal: 500, AccType: "S" }
{ CustID: "C123", AccBal: 900, AccType: "S" }
{ CustID: "C111", AccBal: 1200, AccType: "S" }
After $group:
{ _id: "C123", TotAccBal: 1400 }
{ _id: "C111", TotAccBal: 1200 }
Aggregate Function
To filter on AccType “S”, group the result on “CustID”, compute the sum of
“AccBal” for each group, and then keep only those groups whose “TotAccBal” is
greater than 1200, use the syntax below:
db.Customers.aggregate( { $match : {AccType : "S" } },
{ $group : { _id : "$CustID",TotAccBal : { $sum : "$AccBal" } } },
{ $match : {TotAccBal : { $gt : 1200 } }});
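To see what each stage of this pipeline contributes, the same three steps can be traced in plain JavaScript over an in-memory copy of the four Customers documents. This is only an illustration of the data flow; it is not how the server executes an aggregation, and the variable names are chosen for this sketch.

```javascript
// In-memory copy of the Customers documents from the slide.
const customers = [
  { CustID: "C123", AccBal: 500, AccType: "S" },
  { CustID: "C123", AccBal: 900, AccType: "S" },
  { CustID: "C111", AccBal: 1200, AccType: "S" },
  { CustID: "C123", AccBal: 1500, AccType: "C" },
];

// Stage 1, like { $match: { AccType: "S" } }: keep only accounts of type "S".
const matched = customers.filter((doc) => doc.AccType === "S");

// Stage 2, like $group with $sum: one total per CustID.
const totals = new Map();
for (const doc of matched) {
  totals.set(doc.CustID, (totals.get(doc.CustID) || 0) + doc.AccBal);
}
const grouped = [...totals].map(([id, sum]) => ({ _id: id, TotAccBal: sum }));

// Stage 3, like { $match: { TotAccBal: { $gt: 1200 } } }: filter the groups.
const result = grouped.filter((doc) => doc.TotAccBal > 1200);

console.log(grouped);
console.log(result);
```

Only “C123” survives the last stage: its total is 1400, while the total for “C111” is exactly 1200 and therefore not greater than 1200.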
MapReduce Framework
Customers collection:
{ CustID: "C123", AccBal: 500, AccType: "S" }
{ CustID: "C123", AccBal: 900, AccType: "S" }
{ CustID: "C111", AccBal: 1200, AccType: "S" }
{ CustID: "C123", AccBal: 1500, AccType: "C" }
After query:
{ CustID: "C123", AccBal: 500, AccType: "S" }
{ CustID: "C123", AccBal: 900, AccType: "S" }
{ CustID: "C111", AccBal: 1200, AccType: "S" }
After map:
{ "C123": [500, 900] }
{ "C111": 1200 }
Output (Customer_Totals collection):
{ _id: "C123", value: 1400 }
{ _id: "C111", value: 1200 }
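The query, map and output steps of this example can be traced in plain JavaScript. The sketch below is an in-memory imitation of the flow, not the MongoDB implementation: the names runMapReduce, mapFn and reduceFn are hypothetical, and the map function receives the document and an emit callback as arguments, whereas in the mongo shell the document is available as this and emit is provided by the server.

```javascript
// In-memory copy of the Customers documents from the slide.
const accounts = [
  { CustID: "C123", AccBal: 500, AccType: "S" },
  { CustID: "C123", AccBal: 900, AccType: "S" },
  { CustID: "C111", AccBal: 1200, AccType: "S" },
  { CustID: "C123", AccBal: 1500, AccType: "C" },
];

// map: emit one (key, value) pair per document.
function mapFn(doc, emit) {
  emit(doc.CustID, doc.AccBal);
}

// reduce: collapse all values emitted for one key into a single value.
function reduceFn(key, values) {
  return values.reduce((sum, value) => sum + value, 0);
}

// Hypothetical driver: filter (query), collect emitted values per key, reduce.
function runMapReduce(docs, query, map, reduce) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs.filter(query).forEach((doc) => map(doc, emit));
  return [...emitted].map(([key, values]) => ({
    _id: key,
    // A key with a single emitted value is passed through without reduce.
    value: values.length > 1 ? reduce(key, values) : values[0],
  }));
}

const customerTotals = runMapReduce(
  accounts, (doc) => doc.AccType === "S", mapFn, reduceFn);
console.log(customerTotals);
```

The result matches the Customer_Totals documents in the figure: 1400 for “C123” and 1200 for “C111”.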
JavaScript Programming
To compute the factorial of a given positive number. The user is required to create
a function by the name “factorial” and insert it into the “system.js” collection.
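A minimal sketch of such a function is given here in plain JavaScript so that it can be tested on its own. The input check and the iterative loop are choices made for this illustration. The commented lines at the end show how the function might be stored from the legacy mongo shell; they are an assumption about that shell, need a running server, and are therefore not executed.

```javascript
// Computes n! for a non-negative integer n.
function factorial(n) {
  if (!Number.isInteger(n) || n < 0) {
    throw new RangeError("n must be a non-negative integer");
  }
  let result = 1;
  for (let i = 2; i <= n; i++) {
    result *= i; // multiply by 2, 3, ..., n
  }
  return result;
}

console.log(factorial(5)); // prints 120

// Assumed legacy mongo shell usage (not executed in this sketch):
//   db.system.js.save({ _id: "factorial", value: factorial });
//   db.loadServerScripts();
//   factorial(5);
```

Results above 2^53 - 1 lose precision with JavaScript numbers, so this version is only exact for small inputs.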
MongoImport
Import data from a CSV file
Given a CSV file “sample.txt” in the D: drive, import the file into the MongoDB
collection, “SampleJSON”. The collection is in the database “test”.
mongoimport --db test --collection SampleJSON --type csv --headerline --file d:\sample.txt
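The --headerline option tells the import to take the field names from the first line of the file. The plain JavaScript sketch below illustrates that idea only: the helper csvToDocuments and the sample rows are hypothetical, quoted fields are not handled, and every value is kept as a string, so it is far simpler than the real tool.

```javascript
// Hypothetical helper: turns CSV text with a header line into documents.
// Simplifications: no quoted fields, no type conversion.
function csvToDocuments(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  // The first line supplies the field names (the role of --headerline).
  const fields = lines[0].split(",").map((field) => field.trim());
  return lines.slice(1).map((line) => {
    const values = line.split(",");
    const doc = {};
    fields.forEach((field, i) => {
      doc[field] = (values[i] || "").trim();
    });
    return doc;
  });
}

// Sample content invented for illustration; the real sample.txt is not shown.
const sample = "_id,FName,LName\n1,Samuel,Jones\n2,Virat,Kumar\n";
console.log(csvToDocuments(sample));
```

Each data line becomes one document whose keys are the names from the header line.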
MongoExport
Export data to a CSV file
This command used at the command prompt exports MongoDB JSON documents
from “Customers” collection in the “test” database into a CSV file “Output.txt”
in the D: drive.
mongoexport --db test --collection Customers --csv --fieldFile d:\fields.txt --out d:\output.txt
Answer a few quick questions …
Crossword
Answer Me
What is MongoDB?
Comment on Auto-sharding in MongoDB.
What are collections and documents?
What is JSON?
Explain your understanding of Update In-Place.
Summary please…
Ask a few participants of the learning program to summarize the lecture.
References …
Further Readings
http://www.mongodb.org/
https://university.mongodb.com/
http://www.tutorialspoint.com/mongodb/
Thank you