3 Module NOSQL Preparation

1) Explain, with a neat diagram, partitioning and combining in MapReduce.

A) Diagram - 4 Marks

Combining reduces data before sending it across the network. - 3 Marks

Sol:
❖ In the simplest form, we think of a map-reduce job as having a single reduce function.
❖ The outputs from all the map tasks running on the various nodes are concatenated together and sent into the reduce.
❖ While this will work, there are things we can do to increase the parallelism and to reduce the data transfer.
❖ The first thing we can do is to increase parallelism by partitioning the output of the mappers.
❖ Each reduce function operates on the results of a single key.
❖ A combiner function is, in essence, a reducer function; indeed, in many cases the same function can be used for combining as the final reduction.
❖ The reduce function needs a special shape for this to work: its output must match its input. We call such a function a combinable reducer.
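The combinable-reducer idea can be sketched with a hypothetical word-count job: because the reduce function's output, (word, count) pairs, matches its input, the very same function can run as a combiner on each map node before the shuffle.

```python
from collections import defaultdict

# Hypothetical word-count job: the reduce function sums counts per key.
# Its input and output are both (word, count) pairs, so the same function
# can run as a combiner on each map node before the network shuffle.
def combinable_reduce(key, counts):
    return (key, sum(counts))

def map_phase(lines):
    # Emit (word, 1) for every word in the input split.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def combine_locally(pairs):
    # Combiner: shrinks a node's map output before it crosses the network.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return [combinable_reduce(k, v) for k, v in grouped.items()]

# One node's map output is combined locally...
node_output = combine_locally(map_phase(["nosql nosql store", "nosql store"]))
# ...and the final reduce applies the SAME function to the combined pairs.
final = dict(combine_locally(node_output))
print(final)  # {'nosql': 3, 'store': 2}
```

All names here are illustrative; the point is that combining the already-combined pairs gives the same answer as reducing the raw pairs, which is exactly what makes the reducer combinable.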
2) Explain basic MapReduce, with a neat diagram // Explain Mappers and Reducers with examples.
A)​
MapReduce is a programming model used for processing large datasets in a distributed and
parallel manner across many computers.

MapReduce operates in two main phases, Map and Reduce, with a Shuffle and Sort step between them. Each phase involves specific tasks:

Map Phase

● The input data is divided into chunks (splits).
● Each chunk is processed by a Mapper function, which transforms the input into intermediate key-value pairs.

Shuffle and Sort

● The intermediate key-value pairs are shuffled and grouped by key.
● Keys are sent to the appropriate Reducer based on a partitioning function.

Reduce Phase

● Each group of key-value pairs is processed by a Reducer function.
● The Reducer aggregates the values for each key to produce the final result.
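The three phases can be traced end to end in a minimal sketch, using a hypothetical orders dataset of (product, quantity) records:

```python
from itertools import groupby

# Hypothetical orders dataset; each record is (product, quantity).
orders = [("beer", 2), ("peanuts", 1), ("beer", 3), ("peanuts", 4)]

# Map phase: each record is transformed into an intermediate key-value pair.
def mapper(record):
    product, quantity = record
    return (product, quantity)

intermediate = [mapper(r) for r in orders]

# Shuffle and sort: pairs are grouped by key so each reducer sees one key's values.
intermediate.sort(key=lambda kv: kv[0])
grouped = {k: [v for _, v in g] for k, g in groupby(intermediate, key=lambda kv: kv[0])}

# Reduce phase: aggregate the values for each key.
result = {product: sum(quantities) for product, quantities in grouped.items()}
print(result)  # {'beer': 5, 'peanuts': 5}
```

In a real framework the sort and grouping happen across the network between nodes; here they are simulated in-process to keep the phases visible.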

3) Explain a two-stage MapReduce example, with a neat diagram.

A)​
1.​ As map-reduce calculations get more complex, it’s useful to break them down into
stages using a pipes-and-filters approach, with the output of one stage serving as input
to the next, rather like the pipelines in UNIX.
2.​ A first stage (Figure 7.9) would read the original order records and output a series of
key-value pairs for the sales of each product per month.
3.​ The second-stage mappers (Figure 7.10) process this output depending on the year. A
2011 record populates the current year quantity while a 2010 record populates a prior
year quantity.
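The two stages above can be sketched as a pipeline in plain Python. The records, products, and years below are made-up illustrations of the Figure 7.9/7.10 flow, not the book's actual data:

```python
from collections import defaultdict

# Hypothetical order records: (product, year, month, quantity).
orders = [
    ("puerh", 2011, 4, 6),
    ("puerh", 2010, 4, 3),
    ("dragonwell", 2011, 4, 2),
]

# Stage 1: reduce the raw orders to sales of each product per month.
stage1 = defaultdict(int)
for product, year, month, qty in orders:
    stage1[(product, year, month)] += qty

# Stage 2 mappers: route each stage-1 record by year, so a 2011 quantity
# populates "current" while a 2010 quantity populates "prior".
stage2 = defaultdict(lambda: {"current": 0, "prior": 0})
for (product, year, month), qty in stage1.items():
    slot = "current" if year == 2011 else "prior"
    stage2[(product, month)][slot] += qty

print(dict(stage2))
# {('puerh', 4): {'current': 6, 'prior': 3}, ('dragonwell', 4): {'current': 2, 'prior': 0}}
```

The output of stage 1 is exactly the input of stage 2, which is the pipes-and-filters property that makes the stages composable.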

4) How are calculations composed in MapReduce? Explain with a neat diagram.
A)​

MapReduce is designed to process large datasets by dividing the work into smaller tasks.
However, it imposes some constraints:

1.​ In the Map Phase: You can only process one piece of data (record) at a time.
2.​ In the Reduce Phase: You can only process one group of data (key) at a time.

This means you must think differently when solving problems, especially for tasks like
calculating averages, which aren’t straightforward in this model.
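The averages problem mentioned above is the standard illustration: averages cannot be combined directly (the average of two partial averages is generally wrong), so the usual trick is to carry (sum, count) pairs through the computation and divide only at the end. A minimal sketch:

```python
# Averages are not directly combinable: the average of averages is wrong.
# The standard trick is to carry (sum, count) pairs through the reduce
# and only divide at the very end.
partials = [(10, 2), (30, 4)]  # hypothetical (sum, count) from two map tasks

def combine(a, b):
    # Combining (sum, count) pairs is associative, so it can run anywhere.
    return (a[0] + b[0], a[1] + b[1])

total_sum, total_count = (0, 0)
for p in partials:
    total_sum, total_count = combine((total_sum, total_count), p)

average = total_sum / total_count  # 40 / 6 ≈ 6.67

# Naively averaging the two partial averages gives the wrong answer:
wrong = ((10 / 2) + (30 / 4)) / 2  # 6.25, not 6.67
```

Because combine is associative and commutative, it can run as a combiner on any node, which is how such calculations are composed within MapReduce's one-record, one-key constraints.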
5) What is a key-value store? Explain the single-bucket approach and list popular key-value databases.

A)​ Key-value stores are the simplest NoSQL data stores to use from an API perspective.
The client can either get the value for the key, put a value for a key, or delete a key from
the data store.
Single Bucket:

Bucket Organization in Key-Value Stores:

● Single Bucket Approach: All data (e.g., session data, shopping carts) can be stored within a single bucket under one key-value pair, creating a unified object. However, this can risk key conflicts due to different data types being stored under the same bucket.
● Separate Buckets for Data Types: By appending object names to keys, or by creating specific buckets for each data type (e.g., sessionID_userProfile), it is possible to avoid key conflicts and access only the necessary object types without needing extensive key design changes.
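The key-conflict risk and the sessionID_userProfile key design can be seen in a toy sketch, with a plain dict standing in for a store's bucket:

```python
# A plain dict stands in for a key-value store bucket in this sketch.
bucket = {}

session_id = "sess42"

# Single-bucket approach: different object types share one keyspace,
# so a session's cart and profile stored under the same raw key collide.
bucket[session_id] = {"cart": ["milk"]}
bucket[session_id] = {"name": "John"}   # overwrites the cart data!

# Appending the object type to the key avoids the conflict
# (the sessionID_userProfile key design mentioned above):
bucket[session_id + "_cart"] = {"cart": ["milk"]}
bucket[session_id + "_userProfile"] = {"name": "John"}

print(sorted(bucket))  # ['sess42', 'sess42_cart', 'sess42_userProfile']
```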

Popular Key-Value Databases:

1. Riak: Uses a "bucket" structure for segmenting keys, aiding organization.
2. Redis: Often referred to as a data structure server; supports complex structures like lists, sets, and hashes, enabling more versatile use.
3. Memcached, Berkeley DB, HamsterDB, Amazon DynamoDB, Project Voldemort.
6) Give a brief description of the features of key value stores.

A)​ Key-Value Store Features

Key-value stores are a type of NoSQL database that store data in a simple
format: a unique key associated with a value. Think of it as a dictionary where
each key points to a specific value.

Let’s explore the features with respect to the mentioned points:

1. Consistency

● Explanation: Consistency refers to whether data remains the same across all replicas in a distributed database.
●​ In key-value stores:
○​ They usually follow the CAP theorem, where they can trade off
between consistency, availability, and partition tolerance.
○​ Some systems prioritize eventual consistency: changes to a value
may take time to propagate to all replicas but will eventually become
consistent.
○​ Others may enforce strong consistency, ensuring all clients see
the same data at any given time.
●​ Example: If you update a key’s value in a distributed system, not all nodes
may show the update immediately if eventual consistency is used.
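The eventual-consistency example above can be simulated with two in-memory "replicas" where replication is deliberately delayed; this is a toy illustration, not how any particular store implements it:

```python
# Toy illustration of eventual consistency: a write lands on one replica
# first, so reads from the other replica are stale until propagation.
replicas = {"node_a": {}, "node_b": {}}
pending = []  # writes waiting to be replicated

def put(key, value):
    replicas["node_a"][key] = value   # acknowledged immediately
    pending.append((key, value))      # replication happens later

def propagate():
    # In a real store this runs asynchronously in the background.
    while pending:
        key, value = pending.pop(0)
        replicas["node_b"][key] = value

put("user123", "v2")
stale = replicas["node_b"].get("user123")   # None: node_b hasn't caught up
propagate()
fresh = replicas["node_b"].get("user123")   # "v2": replicas now agree
print(stale, fresh)  # None v2
```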

2. Transactions

● Explanation: Transactions involve ensuring that a group of operations is completed successfully or not at all (atomicity).
●​ In key-value stores:
○​ Many do not natively support ACID transactions (Atomicity,
Consistency, Isolation, Durability) like relational databases.
○​ Some advanced key-value stores (e.g., Redis) provide limited
transaction-like mechanisms (like multi-operations or optimistic
locking).
○​ They are generally designed for speed and scalability rather than
transactional integrity.
●​ Use Case: A key-value store may not handle banking transactions well
because it lacks strong transaction guarantees.
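The optimistic-locking style mentioned above can be sketched as a compare-and-set in plain Python (names and the scenario are hypothetical; Redis exposes a similar pattern through WATCH/MULTI rather than this exact API):

```python
# Sketch of optimistic locking in a key-value store: a compare-and-set
# succeeds only if the value hasn't changed underneath us.
store = {"balance": 100}

def compare_and_set(key, expected, new):
    if store.get(key) != expected:
        return False   # someone else modified the value; caller must retry
    store[key] = new
    return True

current = store["balance"]
store["balance"] = 120               # a concurrent writer sneaks in
ok = compare_and_set("balance", current, current - 30)
print(ok, store["balance"])  # False 120 -- our stale update was rejected
```

This rejects-and-retry behavior is far weaker than full ACID, which is why the banking use case above is a poor fit.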

3. Query Features

● Explanation: Query features determine how you retrieve and manipulate data in the store.
●​ In key-value stores:
○​ Queries are very simple—data is accessed using the key.
○​ There are no complex querying capabilities (e.g., SQL JOINs or
WHERE clauses).
○​ Some systems provide additional features like range queries or
secondary indexing, but these are not standard.
●​ Example:
○​ To retrieve data: GET key1
○​ To update data: SET key1 value1
○​ Advanced querying like "find all users older than 30" is not directly
supported.
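The contrast between key lookup and the unsupported "users older than 30" query can be sketched with a dict standing in for the store:

```python
# Key-value access is by key only; anything like "find all users older
# than 30" has to be done client-side by scanning every value.
store = {
    "user1": {"name": "John", "age": 30},
    "user2": {"name": "Mary", "age": 35},
}

value = store.get("user1")                    # GET key1: one lookup, fast
store["user1"] = {"name": "John", "age": 31}  # SET key1 value1

# No WHERE clause: the "query" below touches every record in the store.
over_30 = [k for k, v in store.items() if v["age"] > 30]
print(over_30)  # ['user1', 'user2']
```

The scan works here only because the whole store fits in one process; at scale this is exactly the operation key-value stores do not support efficiently without secondary indexes.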

4. Structure of the Data

● Explanation: This refers to how data is organized and stored.
●​ In key-value stores:
○​ Data is stored as a simple key-value pair.
○​ The value can be of any type—string, number, JSON, or even a
binary object (like an image).
○​ They are schema-less, meaning no predefined structure is required
for values.
● Example:
○ Key: user123
○ Value: { "name": "John", "age": 30, "city": "New York" }
5. Scaling

● Explanation: Scaling determines how well the database handles an increase in data or traffic.
●​ In key-value stores:
○​ They are designed to scale horizontally (by adding more servers to
the cluster).
○​ Distributed systems partition the data using techniques like
consistent hashing to balance the load across servers.
○​ Scaling is easy because there is no need to maintain relationships
between data (unlike relational databases).
●​ Example: When a shopping website grows and needs to handle millions of
users, a key-value store can distribute data across multiple servers
seamlessly.
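The consistent-hashing technique mentioned above can be sketched minimally: servers are placed on a hash ring and each key goes to the first server clockwise from its own hash. This is a bare illustration (real implementations add virtual nodes for balance):

```python
import hashlib
from bisect import bisect

# Minimal consistent-hashing sketch: servers sit on a hash ring and each
# key is stored on the first server clockwise from the key's hash.
def ring_hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

servers = ["server-a", "server-b", "server-c"]
ring = sorted((ring_hash(s), s) for s in servers)

def server_for(key):
    points = [p for p, _ in ring]
    idx = bisect(points, ring_hash(key)) % len(ring)
    return ring[idx][1]

# Keys spread across the cluster deterministically; adding or removing a
# server moves only the keys in one arc of the ring, not the whole dataset.
placement = {k: server_for(k) for k in ["user123", "cart9", "sess42"]}
print(placement)
```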

7) Explain suitable use cases of key-value stores.

A)
1. Storing Session Information: every web session is unique, so the whole session can be stored as a single value under the session ID and read or written in one operation.
2. User Profiles, Preferences: a user's profile and preferences can be fetched in a single read using the user ID as the key.
3. Shopping Cart Data: the cart is stored against the user ID, so it remains available across browsers, machines, and sessions.
