Map Reduce

The MapReduce framework, originally based on a Google paper, enables parallel data processing in Hadoop clusters, allowing users to perform tasks without necessarily writing Java code. A typical MapReduce job involves dividing data into input splits, processing them with mappers to produce key-value pairs, and optionally reducing the output for final results. Additionally, MapReduce Streaming allows the use of various programming languages, such as Python, for job implementation while leveraging Unix/Linux input and output streams.


The MapReduce framework

Dr. Garrett Dancik

Map Reduce background
• Based on a Google paper:
• https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf

• MapReduce provides the original processing engine for
Hadoop, which allows for parallel processing of data in a
cluster

• Hadoop is built on Java, but you do not need to write Java
programs to use MapReduce, though Java programs will
be faster.
Map Reduce Overview and Word Count Example

Image source: https://www.edupristine.com/blog/hadoop-mapreduce-framework

Steps of a MapReduce Job
1. Hadoop divides the data into input splits, and creates
one map task for each split.
2. Each mapper reads each record (each line) of its input
split, and outputs zero or more key-value pairs.
3. Output from the mappers is transferred to reducers as
input, such that the input to each reducer is sorted by
key.
4. The reducer processes the data for each key, and
outputs the result. The reducer is optional.
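
The four steps above can be sketched in plain Python, outside of Hadoop. This is only an in-memory illustration under stated assumptions: the function names (make_splits, run_job, length_mapper, sum_reducer), the fixed split size, and the sample job (counting words by their length) are hypothetical and are not part of Hadoop or of the original slides.

```python
from itertools import groupby
from operator import itemgetter


def make_splits(records, split_size):
    """Step 1: divide the input records into input splits."""
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]


def run_job(records, mapper, reducer=None, split_size=2):
    """Simulate a job in memory; real Hadoop runs map tasks in parallel."""
    map_output = []
    # Step 2: one map task per split; each task reads its split record by record.
    for split in make_splits(records, split_size):
        for record in split:
            map_output.extend(mapper(record))

    if reducer is None:
        # The reducer is optional: a map-only job returns the mapper output.
        return map_output

    # Step 3: sort by key so that all values for a key arrive together.
    map_output.sort(key=itemgetter(0))

    # Step 4: call the reducer once per key with all of that key's values.
    return [
        reducer(key, [value for _, value in group])
        for key, group in groupby(map_output, key=itemgetter(0))
    ]


def length_mapper(record):
    """Hypothetical mapper: emit (word length, 1) for each word in a line."""
    return [(len(word), 1) for word in record.split()]


def sum_reducer(key, values):
    """Hypothetical reducer: add up the values for one key."""
    return (key, sum(values))


records = ["a bb cc", "ddd a", "bb"]
print(run_job(records, length_mapper, sum_reducer))  # with a reducer
print(run_job(records, length_mapper))               # map-only job
```

Passing reducer=None mirrors the note in step 4: the job then stops after the map phase and the key-value pairs are returned unsorted.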
MapReduce data flow with a single reduce task
Here, mappers process data on 3
nodes in parallel. Results are sent
across the cluster to one or more
reducers.

An optional combiner function can
be specified to process the output
from each map task before being
sent to the reducer.

Image from White, T. (2015). Hadoop: The definitive guide. Sebastopol, CA: O'Reilly.
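
To illustrate the effect of a combiner, the sketch below sums word counts locally within each map task before anything is "sent" to the reducer. It is a plain-Python simulation, not Hadoop code; the names map_task, combine, and reduce_all and the two sample splits are assumptions made for this example. Summing works as a combiner because addition gives the same total regardless of how the values are grouped.

```python
from collections import Counter


def map_task(split):
    """Emit (word, 1) for every word in every line of one input split."""
    return [(word, 1) for line in split for word in line.split()]


def combine(pairs):
    """Combiner: sum the counts per key within a single map task's output."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def reduce_all(pairs):
    """Reducer side: sum the counts per key across all map tasks."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Two hypothetical input splits, one line each.
splits = [["well that is that"], ["that is well"]]

without_combiner = [pair for split in splits for pair in map_task(split)]
with_combiner = [pair for split in splits for pair in combine(map_task(split))]

# Fewer pairs cross the cluster when the combiner runs first...
print(len(without_combiner), len(with_combiner))
# ...but the final result is the same.
print(reduce_all(without_combiner) == reduce_all(with_combiner))
```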
Pseudocode for word count mapper and reducer

Mapper example
input value: well that is that
output (key, value):
well, 1
that, 1
is, 1
that, 1

Reducer example
input (key, values) -> output (key, value)
is, (1) -> is, 1
that, (1,1) -> that, 2
well, (1) -> well, 1

Dean, J., Ghemawat, S. (2004). MapReduce: Simplified Data Processing on Large Clusters.
OSDI'04: Sixth Symposium on Operating System Design and Implementation, pp. 137-150.
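
As a rough Python rendering of the word count mapper and reducer, the sketch below reproduces the worked example above. The function names map_fn and reduce_fn and the document name "doc1" are hypothetical; the grouping loop stands in for the work the framework does between the two phases.

```python
def map_fn(key, value):
    """key: document name (unused here); value: document contents."""
    for word in value.split():
        yield (word, 1)


def reduce_fn(key, values):
    """key: a word; values: every count emitted for that word."""
    return (key, sum(values))


# Map phase on the example input.
pairs = list(map_fn("doc1", "well that is that"))

# Group the values by key, as the framework would before calling the reducer.
grouped = {}
for word, count in pairs:
    grouped.setdefault(word, []).append(count)

# Reduce phase, visiting the keys in sorted order.
results = [reduce_fn(word, grouped[word]) for word in sorted(grouped)]

print(pairs)
print(results)
```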
MapReduce Streaming
• MapReduce jobs were originally written in Java
• MapReduce in Java is the most efficient
• For word count example, see:
https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
• We will use Hadoop MapReduce Streaming
• Uses Unix/Linux input and output streams as interface with Hadoop
• Map input data is passed over standard input to the map function
• Map output is a tab-separated key,value pair…
• …which is passed to the reducer over standard input
• The reduce function reads lines from standard input, sorted by key
• Any language can be used (we will use Python)
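
A minimal sketch of a streaming word count in Python follows, assuming the interface described above: the mapper reads lines and writes tab-separated key, value pairs, and the reducer reads those lines sorted by key. The function names and the simulate helper are hypothetical. In a real job the two functions would live in separate scripts (such as mapper.py and reducer.py), each called with sys.stdin and sys.stdout; here both are kept in one block and connected through in-memory streams so the example can run without Hadoop.

```python
import io


def mapper(stdin, stdout):
    """Write one 'word<TAB>1' line for every word read from stdin."""
    for line in stdin:
        for word in line.split():
            stdout.write("{}\t1\n".format(word))


def reducer(stdin, stdout):
    """Sum counts per word; relies on the input lines being sorted by key."""
    current_key, total = None, 0
    for line in stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            # A new key starts: flush the total for the previous key.
            if current_key is not None:
                stdout.write("{}\t{}\n".format(current_key, total))
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        stdout.write("{}\t{}\n".format(current_key, total))


def simulate(text):
    """Stand-in for Hadoop: run the mapper, sort its output, run the reducer."""
    map_out = io.StringIO()
    mapper(io.StringIO(text), map_out)
    sorted_lines = sorted(map_out.getvalue().splitlines(keepends=True))
    reduce_out = io.StringIO()
    reducer(io.StringIO("".join(sorted_lines)), reduce_out)
    return reduce_out.getvalue()


print(simulate("well that is that\n"), end="")
```

Because the reducer only sees a stream of lines, it has to detect key changes itself; this is why the sorted-by-key guarantee matters. The sort in simulate orders whole lines, which is enough for this illustration but is not necessarily identical to Hadoop's key ordering.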
Running a Map Reduce Streaming Job
hadoop jar /usr/jars/hadoop-streaming-2.6.0-cdh5.7.0.jar \
-mapper "python3.4 $PWD/mapper.py" \
-reducer "python3.4 $PWD/reducer.py" \
-input "inputFiles" \
-output "outputDirectory"

More background and additional options:

https://hadoop.apache.org/docs/r1.2.1/streaming.html

• You may monitor jobs from localhost:8088 (for some links, you
will need to replace quickstart.cloudera with localhost). Some
links also require mapping of additional ports.

• The mapred terminal command may also be useful:

• mapred job -list (list jobs)
• mapred job -kill jobID (kill job with jobID)
