MapReduce Paradigm
MapReduce is a programming paradigm that was designed to allow parallel distributed
processing of large sets of data, converting them to sets of tuples, and then combining and
reducing those tuples into smaller sets of tuples.
MapReduce was designed to take big data and, by means of parallel distributed computing, turn it into regular, manageable-sized data.
Parallel distributed processing refers to a powerful framework where mass volumes of data
are processed very quickly by distributing processing tasks across clusters of commodity
servers. With respect to MapReduce, tuples refer to key-value pairs by which data is
grouped, sorted, and processed.
In the map task, you divide your data into key-value pairs, transform it, and filter it. Then you
assign the data to nodes for processing, as in the sketch below.
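To make this concrete, here is a minimal sketch of what a user-defined map function might look like, assuming a classic word-count job in Python; the function name, record format, and cleanup rules are illustrative, not part of any particular MapReduce library:

```python
def map_wordcount(record_id, line):
    """Map task: turn one input record into intermediate key-value pairs.

    record_id -- the input key (for example, a line offset); unused here
    line      -- the input value (a line of text)
    Yields (word, 1) pairs: the word is the key, 1 is the value.
    """
    for word in line.lower().split():
        word = word.strip(".,;:!?")  # simple filtering step
        if word:                     # drop tokens that were all punctuation
            yield (word, 1)
```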
Map the data.
The incoming data must first be converted into key-value pairs and divided into fragments, which
are then assigned to map tasks. Each computing cluster (a group of nodes that are connected to
one another and perform a shared computing task) is assigned a number of map tasks, which are
subsequently distributed among its nodes.
As the key-value pairs are processed, intermediate key-value pairs are generated. The
intermediate key-value pairs are sorted by key, and the sorted list is divided into a new set
of fragments. The number of these new fragments always equals the number of reduce tasks.
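A common convention for producing exactly one fragment per reduce task is to hash each intermediate key modulo the number of reducers. The sketch below assumes that convention; hash partitioning is what Hadoop does by default, but this function is illustrative, not its actual code:

```python
def partition(intermediate_pairs, num_reduce_tasks):
    """Split sorted intermediate (key, value) pairs into one fragment
    per reduce task, using hash(key) mod R as the partitioning rule."""
    fragments = [[] for _ in range(num_reduce_tasks)]
    for key, value in sorted(intermediate_pairs):  # sort by key first
        fragments[hash(key) % num_reduce_tasks].append((key, value))
    return fragments  # len(fragments) == num_reduce_tasks
```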
In the reduce task, you aggregate that data into smaller datasets. Data from the reduce
step is emitted in a standard key-value format, where the key acts as the record identifier
and the value is the data identified by that key. The cluster's computing nodes
process the map and reduce tasks that are defined by the user.
Reduce the data.
Every reduce task has a fragment assigned to it. The reduce task processes its fragment
and produces an output, which is also a key-value pair. Reduce tasks are likewise distributed
among the different nodes of the cluster. After the task is completed, the final output is
written to a file system.
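Continuing the hypothetical word-count example from the map sketch above, a reduce function receives one key together with all of its intermediate values and aggregates them into a single output pair:

```python
def reduce_wordcount(word, counts):
    """Reduce task: aggregate all intermediate values for one key.

    word   -- the intermediate key (a word)
    counts -- an iterable of the 1s emitted by the map tasks
    Returns a single output pair: (word, total occurrences).
    """
    return (word, sum(counts))
```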
In short, you can quickly and efficiently boil down and begin to make sense of a huge volume,
velocity, and variety of data by using map and reduce tasks to tag your data by (key, value) pairs,
and then reduce those pairs into smaller sets of data through aggregation operations —
operations that combine multiple values from a dataset into a single value.
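To tie the pieces together, here is a hedged, single-process sketch of the whole map-shuffle-reduce cycle on a toy input; a real framework performs the same steps across many nodes. It reuses the hypothetical map_wordcount and reduce_wordcount functions sketched earlier:

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """A toy, single-process MapReduce: map, group by key, reduce."""
    groups = defaultdict(list)
    for record_id, value in records:
        for key, val in map_fn(record_id, value):  # map phase
            groups[key].append(val)                # shuffle: group by key
    # reduce phase: one call per key, output sorted by key
    return [reduce_fn(k, vs) for k, vs in sorted(groups.items())]

lines = enumerate(["the quick brown fox", "the lazy dog", "the fox"])
print(run_mapreduce(lines, map_wordcount, reduce_wordcount))
# [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```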
[Figure: A diagram of the MapReduce architecture]
As a concrete application of the paradigm, consider a MapReduce-based k-nearest-neighbor
(k-NN) search over a large collection of indexed descriptors. Its search phase is geared
toward throughput, since it very efficiently processes large batches of queries, typically
10^4 to 10^7 query descriptors.
The search also requires a preliminary step, the creation of a lookup table, in which all
query descriptors of a batch are grouped according to their closest representative, found
by traversing the index tree.
This lookup table is written to the local disk of every node that will perform the search.
Each map task receives (i) a block of data from one of the previously created index files
and (ii) the file containing the lookup table.
The mapper processes only those descriptors in its assigned chunk of data that are relevant
to the queries.
Distance calculations are performed for the descriptors and queries assigned to the same
cluster identifier.
k-NN results are eventually emitted by the mappers and then aggregated by reducers to
create the final result for the query batch.
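The sources do not include code for this search phase, but a rough Python sketch of the idea might look like the following; the names, the data layouts, and the use of plain Euclidean distance are all assumptions made for illustration:

```python
import heapq
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search_mapper(index_block, lookup_table):
    """index_block  -- list of (cluster_id, descriptor_id, vector) records
                       read from one block of an index file
    lookup_table    -- {cluster_id: [(query_id, query_vector), ...]},
                       the precomputed grouping of the query batch
    Emits (query_id, (distance, descriptor_id)) pairs, computing distances
    only where descriptor and query share a cluster identifier."""
    for cluster_id, desc_id, vec in index_block:
        for query_id, qvec in lookup_table.get(cluster_id, []):
            yield (query_id, (euclidean(qvec, vec), desc_id))

def search_reducer(query_id, candidates, k):
    """Merge every candidate (distance, descriptor_id) pair emitted for
    one query into its final k nearest neighbors."""
    return (query_id, heapq.nsmallest(k, candidates))
```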
If your data doesn't lend itself to being tagged and processed via keys, values, and
aggregation, then MapReduce generally isn't a good fit for your needs.
If you're using MapReduce as part of a Hadoop solution, the final output is written to
the Hadoop Distributed File System (HDFS). HDFS is a file system that spans clusters of
commodity servers used to store big data. HDFS makes big data handling and storage
financially feasible by distributing storage tasks across clusters of cheap commodity servers.
The map and reduce functions themselves do not address the parallelization and execution of
MapReduce jobs. That is the responsibility of the MapReduce framework, which automatically
takes care of distributing the input data as well as scheduling and managing the map and
reduce tasks.
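To make that division of labor concrete, here is a minimal sketch of the kind of work the framework automates, using Python's standard process pool to run map tasks over input fragments in parallel; the fragment layout and task function are illustrative, not Hadoop's actual machinery:

```python
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor

def run_map_task(fragment):
    """One map task: runs on a worker process over one input fragment."""
    pairs = []
    for record_id, line in fragment:
        for word in line.split():
            pairs.append((word, 1))
    return pairs

if __name__ == "__main__":
    fragments = [
        [(0, "the quick brown fox")],
        [(1, "the lazy dog"), (2, "the fox")],
    ]
    # The "framework" part: distribute fragments to workers, collect
    # results, then group by key, which is what MapReduce automates
    # at cluster scale.
    with ProcessPoolExecutor() as pool:
        groups = defaultdict(list)
        for pairs in pool.map(run_map_task, fragments):
            for key, value in pairs:
                groups[key].append(value)
    print({k: sum(v) for k, v in sorted(groups.items())})
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

At cluster scale, the framework additionally handles data locality, retries, and fault tolerance, none of which appear in this toy version.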