Map-Reduce 1

The document explains the Map Reduce computing paradigm, illustrating its application through examples such as calculating maximum temperature from weather data and counting words in a document. It details the roles of the Map and Reduce functions, the process of shuffling and sorting, and the use of combiners to optimize data processing. Additionally, it provides various examples of Map Reduce applications, including distributed grep and inverted index computation.

“Map Reduce” Computing Paradigm

pm jat @ daiict
MR Programming – Example First [3]
Here is an example from the book "Hadoop: The Definitive Guide" [3]
Weather Dataset (raw NCDC data)
• Consider a huge log file containing weather records of a city.
• Each row has many values, but we will use only two – month and temperature.
• Computing goal: find the maximum temperature for each month.
• In SQL terms, we are basically attempting to compute the following on a plain text data file (not a DB table):
select month, max(temperature)
from "temp_data.csv" group by month

7-Aug-25 map-reduce computing paradigm 2


Example #1: Max Temperature
• Input: a temperature data file (without delimiters).

• Output: month-wise maximum temperature.

• In SQL terms, we compute the following on a plain text data file (not a DB table):
select month, max(temperature)
from "temp_data.csv" group by month



Map Task
• Data File
– Partitioned into chunks, and replicated
• Multiple instances of the "map" function run on machines, typically those that hold the data chunks
• They run in parallel on multiple machines
• Their outputs are stored "locally"



Shuffle [and Sort]
• The Map-Reduce system does this itself
• Keys are sorted



Reduce Task
• "Reducers" pull data from all "mappers" as (key, value-list) pairs and "aggregate" them as specified in the "reduce" function
• A "reducer" takes the values for a certain set of keys only
• Multiple reducers run in parallel, working on different sets of keys
• They produce their outputs on the DFS



Example #1: Max Temperature
• Here is a pair of Map and Reduce functions to process the said file and compute the monthly maximum temperature!
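The slide's code image is not reproduced here; as a stand-in, here is a minimal plain-Python sketch that simulates the map, shuffle/group, and reduce steps in one process. The record layout (a month token and an integer temperature per line) is an assumption for illustration.

```python
from collections import defaultdict

# Hypothetical records: "<month> <temperature>" per line
records = ["Jan 12", "Feb 9", "Jan 17", "Feb 21", "Jan 5"]

def map_fn(rec_no, record):
    # emit (month, temperature) for each record
    month, temp = record.split()
    yield month, int(temp)

def reduce_fn(month, temps):
    # emit (month, max temperature)
    yield month, max(temps)

# Simulate the MR runtime: map, shuffle/group by key, reduce
groups = defaultdict(list)
for rec_no, record in enumerate(records):
    for k, v in map_fn(rec_no, record):
        groups[k].append(v)

result = dict(kv for k in sorted(groups) for kv in reduce_fn(k, groups[k]))
print(result)  # {'Feb': 21, 'Jan': 17}
```

On a real cluster the `groups` dictionary is replaced by the framework's shuffle, but the two functions keep exactly this shape.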



…. take away
• Map as a Task
– Input: a list of <K1, V1> pairs; say a list of (RecNo, Record)
– Output: a list of <K2, V2> pairs; say a list of (Month, Temp)
– Execution happens for each file split in parallel; these executions are called mappers
– Typically the MR master node invokes M map tasks, one for each split
• Map Function
– Input: a <K1, V1> pair; say <RecNo, Record>
– Output: a <K2, V2> pair; say <Month, Temp>
– The map function is executed for all records from all input file splits
• Note: execution of the map function should not require any movement of data. Execution happens on the relevant chunk servers only.



…. take away
• Reduce as a Task
– Input: a list of <K2, List(V2)> pairs; say a list of (Month, List(Temps))
all values of a key (from the outputs of all mappers) are taken by one reduce task
– Output: a list of <K3, V3> pairs; say a list of (Month, Max-Temp)
Mostly K2 and K3 are the same, and V3 is some aggregation of List(V2)
– Execution happens on a number of machines in parallel, called reducers; say R in number
• Reduce Function
– Runs on every reducer for every distinct key of the mapper outputs
– Input: a <K2, List(V2)> pair; say <Month, List(Temps)>
– Output: a <K3, V3> pair; say <Month, Max-Temp>
– The reduce function is iteratively called for every distinct key of the map output
"Shuffle and Sort" the Map output
• "Shuffling" is the process of making mapper outputs available to the reducers
• This is done as follows:
– Partition (determine which reducer an output should go to) – "shuffling"
Done by applying a hash function on K2, say "encode(K2) MOD R"
– Sort by key (K2)
– Group by key (K2) ==> <K2, List(V2)>
• As a result, the input for each "reducer" is prepared in the form of <K2, List(V2)> for all distinct keys of the map output
• Data actually moves here for computation, from "mappers" to "reducers", in "pull" mode!
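The partition-sort-group steps above can be sketched in plain Python; md5 stands in for the slide's encode() hash, and R = 2 reducers is an arbitrary choice:

```python
import hashlib

R = 2  # number of reducers (illustrative)

def partition(key):
    # deterministic stand-in for "encode(K2) MOD R"
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % R

# mapper output: (K2, V2) pairs
map_output = [("Jan", 12), ("Feb", 9), ("Jan", 17), ("Feb", 21)]

# route each pair to its reducer, then sort and group by key
reducers = {r: {} for r in range(R)}
for k, v in sorted(map_output):
    reducers[partition(k)].setdefault(k, []).append(v)

for r in range(R):
    print(r, reducers[r])
```

All values of one key land on exactly one reducer, already grouped as <K2, List(V2)>.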



MR data flow [2]

Figure Source: [2]



Data Flow in a Map Reduce Computation

Shuffle and Sort

http://infolab.stanford.edu/~ullman/mmds/ch2.pdf
Hope this makes sense now!



Example: “Lab01 code”


“Map Reduce” Computing Paradigm
• In many cases, a single pair of map-reduce functions can perform a meaningful computation.
• However, a "pipeline" of map-reduce pairs can be used for performing complex tasks!
– The output of one MR job becomes the input to another MR job, and so forth!



MR Examples* #: Word Count
• Word count over a huge document corpus
– A huge set of documents stored on HDFS
• We want to compute word frequency, aggregated over all documents



MR Examples* #: Word Count
• In pseudo code!

MAP function
input: <key, value>
<DocID, Document>
output: <key, value>
<word, 1>

Reduce function
input: <key, value-list>
<word, <1,1,1,1,1,1>>
output: <key, value>
<word, sum(1’s)>
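The pseudo code above can be sketched as runnable plain Python, with a dictionary simulating the shuffle/group step:

```python
from collections import defaultdict

# made-up corpus: DocID -> Document
docs = {"d1": "to be or not to be", "d2": "to do is to be"}

def map_fn(doc_id, document):
    for word in document.split():
        yield word, 1            # <word, 1>

def reduce_fn(word, ones):
    yield word, sum(ones)        # <word, sum(1's)>

groups = defaultdict(list)       # simulated shuffle and sort
for doc_id, doc in docs.items():
    for k, v in map_fn(doc_id, doc):
        groups[k].append(v)

counts = dict(kv for k in groups for kv in reduce_fn(k, groups[k]))
print(counts["to"])  # 4
```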
Do it Yourself!

• MR Colab:
https://colab.research.google.com/drive/1ma8h2dZMzUUobXsJ-_gTNvyaj25n6boY



3 Mappers and 3 Reducers
Aggregate Operations using MR
• Mapper output – Key: grouping attribute, Value: aggregating attribute value
• Reducer output – Key: grouping attribute, Value: aggregated value
• For example, if we want to compute the state-wise number of employees, the output key of the map function would be "State"
• For state- and gender-wise counts, the key shall be composite: (State, Gender)
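A composite key can simply be a tuple. A minimal sketch of state- and gender-wise employee counts (record fields are made up for illustration):

```python
from collections import defaultdict

employees = [
    {"name": "A", "state": "GJ", "gender": "F"},
    {"name": "B", "state": "GJ", "gender": "M"},
    {"name": "C", "state": "MH", "gender": "F"},
    {"name": "D", "state": "GJ", "gender": "F"},
]

def map_fn(rec_no, emp):
    # composite key: (state, gender); value: 1
    yield (emp["state"], emp["gender"]), 1

groups = defaultdict(list)
for i, e in enumerate(employees):
    for k, v in map_fn(i, e):
        groups[k].append(v)

counts = {k: sum(vs) for k, vs in groups.items()}
print(counts[("GJ", "F")])  # 2
```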



Confirm
• Appreciate the importance of keys and values as inputs and outputs of the map and reduce functions!
• For queries like select DNO, AVG(Salary) from "employee.csv", can you draw a rule to decide the key?



“Selection and Projection” using Map Reduce

• Have the selection condition applied in the Map function

• It can be a map-only job.
• The value here is a tuple.
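A sketch of selection and projection as a map-only job, assuming a hypothetical CSV layout (emp-no, name, dno, salary) and the selection condition salary > 40000:

```python
# Hypothetical CSV rows: emp_no, name, dno, salary
rows = [
    "1,Asha,10,52000",
    "2,Ravi,20,31000",
    "3,Meena,10,47000",
]

def map_fn(rec_no, line):
    emp_no, name, dno, salary = line.split(",")
    if int(salary) > 40000:          # selection condition
        yield emp_no, (name, dno)    # projection: keep only some columns

# map-only job: no shuffle, no reduce; mapper output is the result
output = [kv for i, line in enumerate(rows) for kv in map_fn(i, line)]
print(output)
```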



MR Example #: lines, words, and characters count

• Suppose we need to count the number of lines, words, and characters in a text file.
• Try defining the “Map” and “Reduce” functions for this task!



MR Example #: “line count”
• Suppose we need to count the number of lines, words, and characters in a file.
• The following Map-Reduce shall do the job!
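The slide's code image is not reproduced here; a plain-Python sketch of such a Map-Reduce pair could look like this:

```python
from collections import defaultdict

lines = ["hello world", "map reduce is fun"]

def map_fn(line_no, line):
    # one mapper output triple per input line
    yield "#lines", 1
    yield "#words", len(line.split())
    yield "#chars", len(line)

def reduce_fn(key, values):
    yield key, sum(values)

groups = defaultdict(list)       # simulated shuffle and sort
for i, line in enumerate(lines):
    for k, v in map_fn(i, line):
        groups[k].append(v)

totals = dict(kv for k in groups for kv in reduce_fn(k, groups[k]))
print(totals)  # {'#lines': 2, '#words': 6, '#chars': 28}
```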



MR Example #3: “line count”
• Here is sample output of map function
#chars, 80
#words, 12
#lines, 1
#chars, 44
#words, 5
#lines, 1



MR Example #: “line count”
• Here is sample shuffled input to reduce function and its output

#chars, [80,44,67,108,..]
#words, [12, 5, 8, 9, … ]
#lines, [1, 1, 1, 1, …]

#chars, 23456
#words, 8653
#lines, 563



Exercise ##
• Suppose we have an inventory data file with attribute values (Item ID, Description, Cost, Price, Stock, Category). Compute:
– Total cost of inventory
– Category-wise cost of inventory
– Category-wise count of items costlier than 1000
– List of items having the "Cleaner" keyword in their description



Combine Function - Motivation
• On shuffle, the number of values for a key could be large, or too large.
• That causes problems:
– Large volumes of data to shuffle
– More values to transfer over the network
– Memory concerns at the reducers
– Increased load on the reducers (reducers are much fewer in number than mappers in an MR job)



Combine Function - Motivation
• The solution is to do all possible aggregation at the mapper itself!
• For "aggregation" at the mapper level, we define a "combine" function, also referred to as a "combiner".
• This comes as the third function in Map-Reduce programming.
• The function is expressed as: combine(k, List(v1)) → (k, v2)
• v2 is some aggregated value over List(v1)



Word Count with combiner



Combiner in SUM
• Combiner for SUM: select dno, sum(salary) from employee group by dno;
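A plain-Python sketch of SUM with a combiner: each mapper's output is partially summed before the shuffle, so only one pair per key per mapper crosses the network. The splits and salaries are made-up data.

```python
from collections import defaultdict

# two mappers, each over its own split of (dno, salary) pairs
splits = [
    [(10, 5000), (20, 3000), (10, 4000)],
    [(10, 2000), (20, 1000)],
]

def combine(pairs):
    # partial SUM at the mapper: same shape as the reducer output
    partial = defaultdict(int)
    for dno, sal in pairs:
        partial[dno] += sal
    return list(partial.items())

# each mapper now ships one pair per key instead of one per record
shuffled = defaultdict(list)
for split in splits:
    for dno, partial_sum in combine(split):
        shuffled[dno].append(partial_sum)

# reducer: sum the partial sums
totals = {dno: sum(vs) for dno, vs in shuffled.items()}
print(totals)  # {10: 11000, 20: 4000}
```

SUM works as its own combiner because it is commutative and associative, as the next slides note.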



Map-Reduce Combiners
• In many cases, the combiner is the same as the reduce function.
• This, however, works only when the reduce function is commutative and associative. SUM is, whereas AVERAGE is not.
• However, a trick can be used in the combiner for AVERAGE: output sum and count at the combiner!
• Example next:



Combiner in Average
• Combiner for AVG: select dno, avg(salary) from employee group by dno;
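A plain-Python sketch of the (sum, count) trick for AVERAGE: the combiner emits partial (sum, count) pairs, and the reducer divides the total sum by the total count. The data is made up.

```python
from collections import defaultdict

# two mappers, each over its own split of (dno, salary) pairs
splits = [
    [(10, 5000), (20, 3000), (10, 4000)],
    [(10, 2000), (20, 1000)],
]

def combine(pairs):
    # AVG is not associative, so the combiner emits (sum, count) instead
    partial = defaultdict(lambda: [0, 0])
    for dno, sal in pairs:
        partial[dno][0] += sal   # running sum
        partial[dno][1] += 1     # running count
    return [(dno, tuple(sc)) for dno, sc in partial.items()]

shuffled = defaultdict(list)
for split in splits:
    for dno, sum_count in combine(split):
        shuffled[dno].append(sum_count)

# reducer: total-sum / total-count per department
avgs = {dno: sum(s for s, c in scs) / sum(c for s, c in scs)
        for dno, scs in shuffled.items()}
print(avgs[20])  # 2000.0
```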



Examples from Original Article[1]
• Word Count
• Distributed GREP
• Count of URL Access Frequency
• Reverse Web-Link Graph from HTML pages
• Term Vector per Host
– Vector of (word, frequency) pairs for each host
• Inverted Index
– List of Document IDs for each “word”
• Distributed Sort



MR Examples* #2: Distributed "Grep"
• "grep" is a powerful Unix command with a rich set of options
• Primarily, it finds lines in files on the file system that match the specified "expressions"
• The map function primarily does the "necessary matching"
• There is nothing for the reducer to do

map(line-no, line)
    if regex.search(pattern, line)
        write(line-no, line)

reduce(line-no, line)
    write(line-no, line)
    -- Nothing requires to be done at the reduce end.
    -- Can be defined as a map-only job.
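The grep pseudo code can be sketched as runnable Python using the standard re module (the pattern and input lines are illustrative):

```python
import re

lines = ["error: disk full", "all good", "error: timeout"]
pattern = re.compile(r"error")

def map_fn(line_no, line):
    # emit only matching lines; nothing is left for a reducer to do
    if pattern.search(line):
        yield line_no, line

# map-only job: mapper output is the final result
matches = [kv for i, line in enumerate(lines) for kv in map_fn(i, line)]
print(matches)  # [(0, 'error: disk full'), (2, 'error: timeout')]
```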
MR Examples #2: Count of "URL Access Frequency"
• Input: "Web Access Log" generated by a web server

map(recno, record)
    url = extract_url(record)
    write(url, 1)

reduce(url, values)
    count = 0
    for each v in values
        count += v
    write(url, count)
Sample Web Access Log: https://drive.google.com/file/d/1ZT6IpAS1ephI_GapXEOFk7Sqa3fIL8qK
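A runnable sketch of the URL-access-frequency job; the log format and extract_url() here are simplified assumptions, not the actual web-server log layout:

```python
from collections import defaultdict

# hypothetical minimal access-log records: "<method> <url> <status>"
log = [
    "GET /index.html 200",
    "GET /about.html 200",
    "GET /index.html 304",
]

def extract_url(record):
    return record.split()[1]

def map_fn(recno, record):
    yield extract_url(record), 1

def reduce_fn(url, values):
    count = 0
    for v in values:
        count += v
    yield url, count

groups = defaultdict(list)       # simulated shuffle and sort
for i, rec in enumerate(log):
    for k, v in map_fn(i, rec):
        groups[k].append(v)

freq = dict(kv for k in groups for kv in reduce_fn(k, groups[k]))
print(freq)  # {'/index.html': 2, '/about.html': 1}
```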
MR Examples* #3: Reverse Web-Link Graph
• A Reverse Web-Link Graph (also known as a reverse link graph or backlink graph) is a
graph structure that shows, for each web page, which other pages link to it.
• Input: “Web Page URL”, Web Page itself ( .html )
• Required Output:
URL, List of URLs referring to it



MR Examples* #3: Reverse Web-Link Graph
• The map function outputs (target, source) pairs for each given page URL
• The reduce function concatenates the list of all source URLs associated with a given target URL and emits the pair

Input: page-URL, page-itself (say .html)

map(page-url, page)
    expr = xpath-expr-for-hrefs
    targets = page.XPath(expr)
    source = page-url
    for each target in targets
        write(target, source)

reduce(target, source-list)
    write(target, source-list)
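A runnable sketch of the reverse web-link graph; a regular expression stands in for the slide's XPath step, and the pages are made up:

```python
import re
from collections import defaultdict

# page-url -> page content
pages = {
    "a.html": '<a href="b.html"></a> <a href="c.html"></a>',
    "b.html": '<a href="c.html"></a>',
}

def map_fn(page_url, page):
    # regex stands in for the XPath href extraction
    for target in re.findall(r'href="([^"]+)"', page):
        yield target, page_url          # (target, source)

def reduce_fn(target, sources):
    yield target, sources               # target -> list of referrers

groups = defaultdict(list)              # simulated shuffle and sort
for url, page in pages.items():
    for k, v in map_fn(url, page):
        groups[k].append(v)

backlinks = dict(kv for k in groups for kv in reduce_fn(k, groups[k]))
print(backlinks["c.html"])  # ['a.html', 'b.html']
```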



MR Examples* #4: Term-Vector per Host
• A "term" refers to a "word" in a document.
• "Term vectors" are often used to "summarize" or "represent" a document in the NLP, IR, and Text Mining areas.
• A term vector is a "vector" of ("term", "frequency measure") pairs/tuples.
• The "frequency measure" is often TF (Term Frequency) or TF-IDF (Term Frequency, Inverse Document Frequency).
• Google used Map-Reduce for doing this and related tasks!
• Input: "Web Page URL", the Web Page itself (.html)
• Required Output: Host-Name, Term-Vector



MR Examples* #4: Term-Vector per Host
• The map function emits a (hostname, term vector) pair for each input document (the hostname is extracted from the URL of the document).
• The reduce function receives (host-name, term vectors of all documents under that host). It combines the term vectors together, discarding infrequent terms, and then emits a final (host-name, term vector) pair.

Input: document-URL, web-document (say .html)

map(url, document)
    //compute term-vector locally
    host = host_name(url)
    write(host, term-vector)

reduce(host-name, list-term-vectors)
    combined-term-vector (ctv) = []
    for each vector in list-term-vectors
        merge(ctv, vector)
    ctv = trim(sort(ctv)) //keep only frequent terms
    write(host-name, ctv)



MR Examples* #5: “Compute Inverted Index”
• What is an "Inverted Index"?
• A term popular in "Information Retrieval": it is a data structure that contains a map from a "term" to "document(s)".
• It is used for finding the documents that contain a given term!
• Input: "Web Page URL", the Web Page itself (.html)
• Required Output: term, list of document-URLs



MR Examples* #5: "Compute Inverted Index"
• The map function parses each document and emits a sequence of (word, document ID) pairs.
• The reduce function outputs the document list (maybe sorted) for each "term"

map(doc-url, document)
    words = {} //set of words
    for each word in document
        words.add(word)
    for each word in words
        write(word, doc-url)

reduce(word, list-doc-urls)
    write(word, list-doc-urls)
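A runnable sketch of the inverted-index job on made-up documents:

```python
from collections import defaultdict

docs = {"d1.html": "big data map reduce", "d2.html": "map reduce basics"}

def map_fn(doc_url, document):
    for word in set(document.split()):   # distinct words per document
        yield word, doc_url

def reduce_fn(word, doc_urls):
    yield word, sorted(doc_urls)         # sorted posting list per term

groups = defaultdict(list)               # simulated shuffle and sort
for url, doc in docs.items():
    for k, v in map_fn(url, doc):
        groups[k].append(v)

index = dict(kv for k in groups for kv in reduce_fn(k, groups[k]))
print(index["map"])  # ['d1.html', 'd2.html']
```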



MR Examples* #6: Distributed Sort

• Suppose a data file needs to be sorted on emp-no

map(rec-no, row)
    write(emp-no, row)

• The reduce function shall receive the pairs sorted on emp-no. It can just output them as such!
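A sketch of distributed sort: the map emits (emp-no, row), and sorted() here stands in for the sorting that the MR shuffle phase performs between map and reduce:

```python
# rows keyed by emp-no (made-up data)
rows = [(3, "Meena"), (1, "Asha"), (2, "Ravi")]

def map_fn(_, row):
    emp_no = row[0]
    yield emp_no, row           # sort key becomes the MR key

map_output = [kv for r in rows for kv in map_fn(None, r)]

# the shuffle phase delivers keys to the reducer in sorted order;
# the reducer just emits the rows as they arrive
sorted_rows = [row for _, row in sorted(map_output)]
print(sorted_rows)  # [(1, 'Asha'), (2, 'Ravi'), (3, 'Meena')]
```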
Programming Map-Reduce in Python
• A library is available: https://mrjob.readthedocs.io/en/latest/
• Guide/Documentation is available at
https://mrjob.readthedocs.io/en/latest/guides.html
• How do you create MR programs here?
– Create a class that extends the MRJob class from the library
– Override at least the "mapper" and "reducer" methods
– Here is an example!
– Do you get what it is doing?



MR Job (mrjob python library) functions
• Other methods that the MRJob class provides and you can override:
mapper_init(self) # executed once per mapper, before the map calls
mapper(self, key, value) # map function
mapper_final(self) # executed once per mapper, after the map calls
reducer(self, key, values) # reduce function
combiner(self, key, values) # combine function
reducer_init, reducer_final, combiner_init, combiner_final

https://mrjob.readthedocs.io/en/latest/job.html
“Google Colab” for practice
• Here is a sample MR program:
https://colab.research.google.com/drive/1ma8h2dZMzUUobXsJ-_gTNvyaj25n6boY
• You can copy this to your account and experiment with it!

• Here are some datasets that are used in my exercises! I may be adding more.
https://drive.google.com/drive/folders/1Q0sy0NlD2nkjmzxuYURQoFt5XRZpcScs



Further reading
• Chapter 2 at http://mmds.org
• Articles "The Google file system" [1] and "MapReduce: Simplified data processing on large clusters" [2]
• (Book) Radtka, Zachary, and Donald Miner. Hadoop with Python. O'Reilly Media, 2015.



References
[1] Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. "The Google file system." Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles. 2003.
[2] Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: Simplified data processing on large clusters." OSDI 2004. https://www.usenix.org/legacy/event/osdi04/tech/full_papers/dean/dean_html/
[3] White, Tom. "Hadoop: The Definitive Guide", 4th ed., O'Reilly Media, Inc., 2015.

