Mapreduce Final

MapReduce is a programming model that simplifies the creation of parallel programs by allowing users to define key-value pairs and mapper/reducer functions while Hadoop manages the logistics. The document illustrates the WordCount application as a primary example, detailing the mapping and reducing processes to count word frequencies across multiple files. Additionally, it discusses the limitations of MapReduce for frequently changing data and dependent tasks, along with examples of data joining and vector multiplication tasks.

Uploaded by

Muneeba Kaleem


MapReduce: Simple Programming for Big Results
• Explain how MapReduce simplifies creating parallel programs
• Design a WordCount application using the MapReduce programming model
MapReduce = Programming Model for Hadoop Ecosystem
[Figure: Hadoop ecosystem stack, showing Hive, Pig, Giraph, Spark, Storm, Flink, MapReduce, HBase, Cassandra, MongoDB, Zookeeper, YARN, and HDFS]
Based on Functional Programming
• Map = apply operation f(x) = y to all elements
• Reduce = summarize operation on elements
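As a minimal sketch of these two ideas in Python: the built-in `map` applies a function to every element, and `functools.reduce` folds the results into a single summary value. The function `f` (squaring) and the sample list are illustrative assumptions, not taken from the slides.

```python
from functools import reduce

def f(x):
    # Hypothetical per-element operation: square the input.
    return x * x

elements = [1, 2, 3, 4]

# Map: apply f(x) = y to all elements, each one independently of the others.
mapped = list(map(f, elements))

# Reduce: summarize the mapped elements with one combining operation (here, addition).
total = reduce(lambda acc, y: acc + y, mapped, 0)
```

Because each call to `f` is independent, the map step is the part that can be spread across many machines.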
Example MapReduce Application: WordCount
[Figure: File 1, File 2, ..., File N -> WordCount (Map -> Shuffle and Sort -> Reduce) -> Result File]
Represents a large number of applications.
Sort and Shuffle
(You, http://you1.fake)
(apple, http://apple1.fake)
(apple, http://apple2.fake)
(is, http://apple2.fake)
(is, http://apple2.fake)
(rose, http://apple2.fake)
(red, http://apple2.fake)
Reduce Results for “apple”
Key: apple
Value: (apple -> http://apple1.fake, http://apple2.fake)
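The grouping that produces this result can be sketched in a few lines of Python. This is a single-process illustration, not how Hadoop implements the step; the function name `shuffle_and_sort` is hypothetical, and the pairs are the ones listed on the slide.

```python
from collections import defaultdict

# Intermediate <key, value> pairs as listed on the slide: (word, URL).
pairs = [
    ("You", "http://you1.fake"),
    ("apple", "http://apple1.fake"),
    ("apple", "http://apple2.fake"),
    ("is", "http://apple2.fake"),
    ("is", "http://apple2.fake"),
    ("rose", "http://apple2.fake"),
    ("red", "http://apple2.fake"),
]

def shuffle_and_sort(pairs):
    """Group values by key and return the groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

grouped = shuffle_and_sort(pairs)
print(grouped["apple"])  # ['http://apple1.fake', 'http://apple2.fake']
```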
Map -> Shuffle and Sort -> Reduce
• Map: parallelization over the input
• Shuffle and Sort: parallelization over intermediate data
• Reduce: parallelization over data groups
MapReduce is bad for:
• Frequently changing data
• Dependent tasks
• Interactive analysis
MapReduce
• Simplified parallel programming
• Applications with independent data-parallel tasks
The framework:
• User defines:
a. <key, value>
b. mapper & reducer functions
• Hadoop handles the logistics
Map/Reduce flow
• map() reads data and outputs <key,value>
• Hadoop distributes map() to data
• Hadoop groups <key,value> data
• Hadoop distributes groups to reducers()
[Figure: data partitions D1, D2, ..., Dn each feed a map(); the grouped output goes to reduce() tasks that produce outputs O1, ..., Om]
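The flow can be imitated in a single Python process to see how the pieces fit. This is a sketch under stated assumptions: `run_mapreduce` is a hypothetical helper, not a Hadoop API, and the sample job (total word length per first letter) is invented for illustration.

```python
from collections import defaultdict

def run_mapreduce(partitions, mapper, reducer):
    """Simulate the flow in one process: map each partition, group by key, reduce each group."""
    # 1. map() reads each data partition (D1..Dn) and outputs <key, value> pairs.
    intermediate = []
    for partition in partitions:
        for record in partition:
            intermediate.extend(mapper(record))

    # 2. The framework groups values by key (shuffle and sort).
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # 3. Each group goes to one reduce() call, which produces the output.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}

# Hypothetical job: total length of words, keyed by first letter.
partitions = [["apple", "avocado"], ["banana"], ["blueberry", "cherry"]]
result = run_mapreduce(
    partitions,
    mapper=lambda word: [(word[0], len(word))],
    reducer=lambda key, values: sum(values),
)
print(result)  # {'a': 12, 'b': 15, 'c': 6}
```

The user supplies only `mapper` and `reducer`; everything inside `run_mapreduce` stands in for the logistics that Hadoop handles.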
The paradigmatic example:
• Count word frequencies
Wordcount task:
• How would you count all the
words in Star Wars?
Wordcount serial code:
• In a nutshell:
1. Get word
2. Look up word in table
3. Add 1 to count
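The three steps map directly onto a short serial program. A minimal sketch, assuming words are separated by whitespace (the sample sentence is illustrative):

```python
def count_words(text):
    """Serial word count: one pass over the text, one in-memory table."""
    table = {}
    for word in text.split():        # 1. Get word
        count = table.get(word, 0)   # 2. Look up word in table
        table[word] = count + 1      # 3. Add 1 to count
    return table

counts = count_words("a long time ago in a galaxy far far away")
print(counts["far"])  # 2
```

This works while the text and the table fit on one machine, which is exactly the limit the MapReduce version removes.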
• Result Table:
Word    Count
a       1000
far     2000
Jedi    5000
Luke    9000
…
Wordcount task:
• How would you count all the words
in all the Star Wars scripts and …
Wordcount Map/Reduce:
The Mapper:
Loop until done:
    Get word
    Emit <word> <1>
What One Mapper Does
line = A long time ago in a galaxy far far …
keys = A long time ago in a galaxy far far
Emit <key, value> to the reducers:
A 1, long 1, time 1, ago 1, in 1, a 1, galaxy 1, far 1, far 1, ...
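A mapper of this kind might look as follows in Python. The function name `wordcount_mapper` is hypothetical, and splitting on whitespace is an assumption about how words are delimited.

```python
def wordcount_mapper(line):
    """Emit a <word, 1> pair for every word in the line."""
    for word in line.split():
        yield (word, 1)

emitted = list(wordcount_mapper("A long time ago in a galaxy far far"))
print(emitted[-2:])  # [('far', 1), ('far', 1)]
```

Note that the mapper does no counting of its own: "far" is emitted twice with the value 1, and adding the values is left to the reducer.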
Wordcount Map/Reduce:
The Reducer:
Loop over key-values:
    Get next <word> <value>
    If <word> is same as previous word
        add <value> to count
    else
        emit <previous word> <count> (if there was a previous word)
        set count to <value>
After the loop, emit the last <word> <count>
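The reducer loop can be written compactly in Python with `itertools.groupby`, which handles the "same as previous word" comparison. This is a sketch that assumes the pairs arrive sorted by key, as they do after shuffle and sort; the name `wordcount_reducer` and the sample pairs are illustrative.

```python
from itertools import groupby
from operator import itemgetter

def wordcount_reducer(sorted_pairs):
    """Sum the values of consecutive pairs that share a key.

    Assumes the pairs arrive sorted by key, so equal keys are adjacent.
    """
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield (word, sum(value for _, value in group))

pairs = [("a", 1), ("ago", 1), ("far", 1), ("far", 1), ("galaxy", 1)]
totals = list(wordcount_reducer(pairs))
print(totals)  # [('a', 1), ('ago', 1), ('far', 2), ('galaxy', 1)]
```

Using `groupby` also avoids two easy mistakes in a hand-written loop: losing the first value of each new word, and forgetting to emit the final word.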
map() output
"A long time ago" -> map() -> A 1, long 1, time 1, ago 1
"in a galaxy far" -> map() -> in 1, a 1, galaxy 1, far 1
"far away" -> map() -> far 1, away 1
Hadoop shuffles, groups, and distributes
Map output (as above) is regrouped by key:
Group 1: A 1, a 1, far 1, far 1, ago 1
Group 2: galaxy 1, in 1, long 1, time 1
Group 3: away 1, …
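One way to picture the "distributes" step is a partition function that sends every occurrence of a key to the same reducer. The sketch below is an assumption for illustration only: it uses `zlib.crc32` as a stand-in hash because it is stable across runs, and the names `partition` and `distribute` are hypothetical. Hadoop's own partitioner is not shown on the slides, and the groups it produces will generally differ from the ones above.

```python
import zlib

def partition(key, num_reducers):
    """Pick a reducer for a key; equal keys always map to the same reducer."""
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def distribute(pairs, num_reducers):
    """Split <key, value> pairs into one sorted bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return [sorted(bucket) for bucket in buckets]

map_output = [("A", 1), ("long", 1), ("time", 1), ("ago", 1),
              ("in", 1), ("a", 1), ("galaxy", 1), ("far", 1),
              ("far", 1), ("away", 1)]
buckets = distribute(map_output, num_reducers=3)
```

Whatever hash is used, the property that matters is that both `("far", 1)` pairs land in the same bucket, so a single reducer sees all values for "far".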
reduce() aggregates
Group 1: A 1, a 1, far 1, far 1, ago 1 -> reduce() -> A 1, a 1, far 2, …
Group 2: galaxy 1, in 1, long 1, time 1 -> reduce() -> galaxy 1, …
Group 3: away 1, …
Example:
Joining Data
• Task: combine datasets by key
– A standard data management function
– In pseudo SQL
SELECT * FROM A, B WHERE A.key = B.key
– Joins can be inner, left or right outer
• Task: given two wordcount datasets, combine by word
File A: <word, total-count>
able, 5
actor, 18
burger, 25
...
File B: <date word, day-count>
Jan-16 able, 2
Feb-22 actor, 15
May-03 actor, 3
Jul-04 burger, 20
...
• Result wanted:
File AjoinB: <word date, day-count total-count>
able Jan-16, 2 5
actor Feb-22, 15 18
actor May-03, 3 18
burger Jul-04, 20 25
...
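A common way to do this in MapReduce is a reduce-side join: the mapper re-keys every record by word and tags it with its source, and the reducer pairs the records that share a word. The sketch below is one possible design, not necessarily the one the course intends. The function names are hypothetical, and telling File A from File B by the space in the key is an assumption made to keep the example short.

```python
from collections import defaultdict

file_a = ["able,5", "actor,18", "burger,25"]
file_b = ["Jan-16 able,2", "Feb-22 actor,15", "May-03 actor,3", "Jul-04 burger,20"]

def join_mapper(line):
    """Re-key every record by word and tag it with its source file."""
    key, count = [part.strip() for part in line.split(",")]
    if " " in key:                          # File B: "<date> <word>"
        date, word = key.split()
        return (word, ("B", date, int(count)))
    return (key, ("A", None, int(count)))   # File A: "<word>"

def join_reducer(word, values):
    """Pair each day-count from B with the total-count from A (inner join)."""
    totals = [count for tag, _, count in values if tag == "A"]
    days = [(date, count) for tag, date, count in values if tag == "B"]
    return [(word, date, day_count, total)
            for total in totals for date, day_count in days]

# Stand-in for shuffle and sort: group the tagged values by word.
groups = defaultdict(list)
for line in file_a + file_b:
    word, value = join_mapper(line)
    groups[word].append(value)

joined = [row for word in sorted(groups) for row in join_reducer(word, groups[word])]
for row in joined:
    print(row)
```

Each output row is `(word, date, day-count, total-count)`, matching the layout of File AjoinB. A word that appears in only one file produces no rows, which is what makes this an inner join.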
Example:
Vector Multiplication
• Task: multiply 2 arrays of N numbers
– A basic mathematical operation
– Let’s assume N is very large

A       B
5       2.7     # 1st of A & B
4       1.9     # 2nd of A & B
-3.2    -1.3    # 3rd …
...     ...
-2      1       # Nth of A & B

$$A \times B = \sum_{i=1}^{N} a_i b_i = (5 \times 2.7) + (4 \times 1.9) + (-3.2 \times -1.3) + \cdots + (-2 \times 1)$$
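For reference, the serial version of this operation is a one-line sum over index-aligned pairs. A minimal sketch, using only the four elements visible on the slide (the elided middle of the arrays is left out, so the result is not the slide's full total):

```python
def dot_product(a, b):
    """Serial version: multiply elements with the same index, then sum the products."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    return sum(x * y for x, y in zip(a, b))

# Only the elements visible on the slide; the elided entries are omitted.
A = [5, 4, -3.2, -2]
B = [2.7, 1.9, -1.3, 1]
result = dot_product(A, B)
```

`zip` pairs elements by position, which is exactly the information that gets lost once the arrays are split into partitions.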
Vector Multiplication
• Recall: data partitioned in HDFS
• Main design consideration: need elements with same index together
Let <key, value> = <index, number>
• Problem: array partitions don’t have an index
[Figure: arrays A and B stored as partitions of bare numbers (5, 4, -3.2, ..., -2 and 2.7, 1.9, -1.3, ..., 1)]
Vector Multiplication
• map() can access environment information, i.e. info outside the map/reduce <key, value> data
• Example: os.getenv('map_input_file')
[Figure: arrays A and B in partitions, with environment information passed alongside the data into map()]
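A mapper could use this to find out which file, and therefore which array, a record came from. The sketch below is hedged: `map_input_file` is the variable name given on the slide, while `mapreduce_map_input_file` is assumed here as a newer spelling that may or may not apply to a given Hadoop version. The helper names are hypothetical.

```python
import os

def current_input_file():
    """Return the input file name exposed by the framework, or None if unset."""
    # 'map_input_file' is the name on the slide; 'mapreduce_map_input_file' is
    # an assumed newer spelling and may differ by Hadoop version.
    for name in ("mapreduce_map_input_file", "map_input_file"):
        value = os.getenv(name)
        if value:
            return value
    return None

def tag_with_source(line):
    """Prefix a record with the file it came from, e.g. to tell array A from B."""
    source = os.path.basename(current_input_file() or "unknown")
    return (source, line.strip())
```

Outside a Hadoop job neither variable is set, so `tag_with_source` falls back to the label "unknown" rather than failing.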
Vector Multiplication
• Let’s assume: each line already has <index, number>
A          B
1, 5       1, 2.7
2, 4       2, 1.9
3, -3.2    3, -1.3
...        ...
N, -2      N, 1
Note: mapper only needs to pass data (identity function)
Vector Multiplication
• Shuffle & group indices: <index, num> pairs from A and B with the same index end up together
A, B grouped <index, num>:
1, 5
1, 2.7
3, -1.3
3, -3.2
2, 1.9
2, 4
…
Vector Multiplication
• What should reducers do?
Reducer:
- get pairs of <index, number>
- multiply & add
A, B grouped <index, num>             subtotals
1, 5;  1, 2.7;  3, -1.3;  3, -3.2     17.66
2, 1.9;  2, 4                         7.6
...
(Still need to get the total sum, but the data should be largely reduced)
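Putting the pieces together, the whole job can be sketched in Python as an identity mapper, a grouping step, and reducers that multiply and add. This is a single-process illustration under assumptions: the function names are hypothetical, only the first three indices from the slides are used, and the split of indices between the two reducers mirrors the grouping shown above.

```python
from collections import defaultdict

def identity_mapper(line):
    """Parse '<index>, <number>' and pass it through unchanged."""
    index, number = line.split(",")
    return (int(index), float(number))

def multiply_reducer(groups):
    """Multiply the two numbers for each index and add the products into a subtotal."""
    subtotal = 0.0
    for index, numbers in groups.items():
        a, b = numbers               # exactly one value from A and one from B
        subtotal += a * b
    return subtotal

lines_a = ["1, 5", "2, 4", "3, -3.2"]
lines_b = ["1, 2.7", "2, 1.9", "3, -1.3"]

# Shuffle and group: values with the same index end up together.
grouped = defaultdict(list)
for line in lines_a + lines_b:
    index, number = identity_mapper(line)
    grouped[index].append(number)

# Two hypothetical reducers, split as on the slide: indices {1, 3} and {2}.
subtotals = [
    multiply_reducer({i: grouped[i] for i in (1, 3)}),
    multiply_reducer({i: grouped[i] for i in (2,)}),
]
total = sum(subtotals)               # final pass over the few subtotals
```

The subtotals come out at approximately 17.66 and 7.6, matching the slide. The final `sum` runs over one number per reducer rather than N products, which is the sense in which the data is "largely reduced".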
