BDA Lab
Aim: To implement matrix multiplication using Map Reduce.
Theory:
Matrix Multiplication with One MapReduce Step:
There often is more than one way to use MapReduce to solve a problem. You may wish to use
only a single MapReduce pass to perform matrix multiplication P = MN. 5 It is possible to do so
if we put more work into the two functions. Start by using the Map function to create the sets
of matrix elements that are needed to compute each element of the answer P. Notice that an
element of M or N contributes to many elements of the result, so one input element will be
turned into many key-value pairs. The keys will be pairs (i, k), where i is a row of M and k is a
column of N. Here is a synopsis of the Map and Reduce functions.
The Map Function: For each element mij of M, produce all the key-value pairs (i, k), (M, j, mij )
for k = 1, 2, . . ., up to the number of columns of N. Similarly, for each element njk of N, produce
all the key-value pairs (i, k), (N, j, njk) fori = 1, 2, . . ., up to the number of rows of M. As before,
M and N are really bits to tell which of the two matrices a value comes from.
The Reduce Function: Each key (i, k) will have an associated list with all the values (M, j, mij )
and (N, j, njk), for all possible values of j. The Reduce function needs to connect the two values
on the list that have the same value of j, for each j. An easy way to do this step is to sort by j the
values that begin with M and sort by j the values that begin with N, in separate lists. The jth
values on each list must have their third components, mij and njk extracted and multiplied.
Then, these products are summed and the result is paired with (i, k) in the output of the Reduce
function
You may notice that if a row of the matrix M or a column of the matrix N is so large that it will
not fit in main memory, then the Reduce tasks will be forced to use an external sort to order
the values associated with a given key (i, k). However, in that case, the matrices themselves are
so large, perhaps 1020 elements, that it is unlikely we would attempt this calculation if the
matrices were dense. If they are sparse, then we would expect many fewer values to be
associated with any one key, and it would be feasible to do the sum of products in main
memory.
PseudoCode:
map(key, value):
// value is ("A", i, j, a_ij) or ("B", j, k, b_jk) if
value[0] == "A":
i = value[1] j=
value[2] a_ij =
value[3] for k = 1
to p: emit((i, k),
(A, j, a_ij)) else: j
= value[1] k=
value[2] b_jk =
value[3] for i = 1 to
m: emit((i, k), (B,
j, b_jk))
reduce(key, values):
// key is (i, k)
// values is a list of ("A", j, a_ij) and ("B", j, b_jk)
hash_A = {j: a_ij for (x, j, a_ij) in values if x == A} hash_B
= {j: b_jk for (x, j, b_jk) in values if x == B} result = 0 for
j = 1 to n:
result += hash_A[j] * hash_B[j] emit(key, result)
https://lendap.wordpress.com/2015/02/16/matrix-multiplication-with-mapreduce/
SOURCE CODE:
1. https://github.com/marufaytekin/MatrixMultiply
2. https://adhoop.wordpress.com/2012/03/31/matrix-multiplication-using-mapreduce-
1step-solution/
[training@localhost ~]$ export JAVA_HOME=/usr/java/jdk1.6.0_25
[training@localhost ~]$ mkdir -p build
[training@localhost ~]$ javac -cp /usr/lib/hadoop/*:/usr/lib/hadoop/hadoop-core.jar
Matrix.java -d build
[training@localhost ~]$ jar -cvf Matrix.jar -C build/ .added manifest
[training@localhost ~]$ hadoop jar Matrix.jar Matrix matrixInput matrixOutput216/03/01
19:21:00 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments.
Applications should implement Tool for the same.
[training@localhost ~]$ hadoopfs -cat /user/training/matrixOutput2/*
Input:
A,0,1,1.0
A,0,2,2.0
A,0,3,3.0
A,0,4,4.0
A,1,0,5.0
A,1,1,6.0
A,1,2,7.0
A,1,3,8.0
A,1,4,9.0
B,0,1,1.0
B,0,2,2.0
B,1,0,3.0
B,1,1,4.0
B,1,2,5.0
B,2,0,6.0
B,2,1,7.0
B,2,2,8.0
B,3,0,9.0
B,3,1,10.0
B,3,2,11.0
B,4,0,12.0
B,4,1,13.0
B,4,2,14.0
Output:
0,0,90.0
0,1,100.0
0,2,110.0
1,0,240.0
1,1,275.0
1,2,310.0
Conclusion: Thus, we have implemented one-step matrix multiplication using MapReduce.