Distributed Computing Lab Workbook V1.0
Team DC
K L UNIVERSITY | DISTRIBUTED COMPUTING – 17CS3115
17CS3115 DISTRIBUTED COMPUTING
LABORATORY WORKBOOK
STUDENT NAME
REG. NO
YEAR
SEMESTER
SECTION
FACULTY
Table of Contents
Organization of the STUDENT LAB WORKBOOK
LAB-1: IMPLEMENTATION OF BASIC HADOOP COMMANDS
DISTRIBUTED COMPUTING CONTINUOUS PROJECT EVALUATION-I
LAB-2: IMPLEMENTATION OF WORD COUNT AND FREQUENCY OF A WORD IN A STRING IN JAVA
LAB-3: IMPLEMENTATION OF WORD COUNT AND LENGTH OF WORD USING MAPREDUCE
LAB-4: IMPLEMENTATION OF HADOOP MAPREDUCE PROGRAM TO FIND NUMBER OF PRODUCTS IN EACH COUNTRY
DISTRIBUTED COMPUTING CONTINUOUS PROJECT EVALUATION-II
LAB-6: IMPLEMENTATION OF MPI PROGRAM TO CALCULATE SUM OF NUMBERS 1 TO 1000
LAB-7: IMPLEMENTATION OF ALGORITHMS ON LEADER ELECTION
LAB-8: IMPLEMENTATION OF MUTUAL EXCLUSION - BAKERY ALGORITHM AND TEST & SET REGISTER
LAB-9: IMPLEMENTATION OF MUTUAL EXCLUSION - LAMPORT’S ALGORITHM AND RICART-AGRAWALA ALGORITHM IN JAVA
LAB-10: IMPLEMENTATION OF JAVA PROGRAM ON SYNCHRONIZATION
LAB-11: IMPLEMENTATION OF JAVA PROGRAM ON VECTOR CLOCKS
LAB-12: IMPLEMENTATION OF JAVA PROGRAM FOR THE FAULT FREE SYSTEM
LAB-13: IMPLEMENTATION OF BASIC QUERIES IN HIVE
LAB-14: IMPLEMENTATION OF ADVANCED QUERIES IN HIVE
LAB-15: IMPLEMENTATION OF BASIC QUERIES IN HBASE
Organization of the STUDENT LAB WORKBOOK
The laboratory framework includes a creative element but shifts the time-intensive
aspects outside of the two-hour closed laboratory period. Within this structure, each
laboratory includes three parts: Prelab, In-lab, and Post-lab.
a. Pre-Lab
The Prelab exercise is a homework assignment that links the lecture with the
laboratory period and typically takes 2 hours to complete. The goal is for students to
synthesize the information they learn in lecture with material from their textbook to
produce a working piece of software. Students attending a two-hour closed laboratory
are expected to make a good-faith effort to complete the Prelab exercise before
coming to the lab. Their work need not be perfect, but their effort must be real
(roughly 80 percent correct).
b. In-Lab
The In-lab section takes place during the actual laboratory period. The first hour of
the laboratory period can be used to resolve any problems the students might have
experienced in completing the Prelab exercises. The intent is to give constructive
feedback so that students leave the lab with working Prelab software - a significant
accomplishment on their part. During the second hour, students complete the In-lab
exercise to reinforce the concepts learned in the Prelab. Students leave the lab
having received feedback on their Prelab and In-lab work.
c. Post-Lab
The last phase of each laboratory is a homework assignment that is done following
the laboratory period. In the Post-lab, students analyse the efficiency or utility of a
given system call. Each Post-lab exercise should take roughly 120 minutes to
complete.
Note:
Additional files are required for completion of lab tasks. These files can be found in
the GitHub repository below:
http://github.com/dc-klu/labs
Sl. No | Date | Experiment Name | Pre-Lab (5M) | In-Lab: Logic (10M), Execution (10M), Result (10M), Analysis (5M) | Post-Lab (5M) | Viva Voce (5M) | Total (50M) | Faculty Signature
LAB-1: IMPLEMENTATION OF BASIC HADOOP COMMANDS
Prerequisite:
Linux Commands
File Systems
Pre-Lab:
1. What are distributed systems?
4. What is Hadoop?
In-Lab:
To complete this lab, you will need the following:
Post-Lab:
1) What is a cluster?
2) What are the types of clusters?
DISTRIBUTED COMPUTING CONTINUOUS PROJECT EVALUATION-I
TASK – I
Objective:
Set up a single-node Hadoop cluster on your laptop.
Description:
A Hadoop cluster is a special type of computational cluster designed specifically for storing
and analyzing huge amounts of unstructured data in a distributed computing environment.
Software:
Sample Output:
LAB-2: IMPLEMENTATION OF WORD COUNT AND FREQUENCY OF A WORD IN A STRING IN JAVA
Prerequisite:
Java
Data Structures
Pre-lab:
Implement the HashSet and Map data structures in Java.
HashSet
Input Format
In the first line, there will be an integer N denoting the number of inputs. Each of the next N lines will
contain a string.
Sample Input    Sample Output
One             Five
Two             One
Three           Four
Four            Two
Five            Three
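As a starting point, here is a minimal sketch of the HashSet part, assuming the exercise means storing the input strings in a java.util.HashSet and printing them back. The class name HashSetDemo and the helper collect are illustrative names, not part of the workbook. Note that a HashSet prints in hash-table order, so the output order can differ from the input order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashSetDemo {
    // Collects the inputs into a HashSet; add() silently ignores duplicates.
    static Set<String> collect(List<String> inputs) {
        Set<String> seen = new HashSet<>();
        for (String s : inputs) {
            seen.add(s);
        }
        return seen;
    }

    public static void main(String[] args) {
        Set<String> set = collect(Arrays.asList("One", "Two", "Three", "Four", "Five", "One"));
        // Iteration order depends on hash codes, not on insertion order.
        for (String s : set) {
            System.out.println(s);
        }
    }
}
```

In the lab program the strings would be read from standard input after N; the hard-coded list here only keeps the sketch short.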
Map Interface
Input Format
In the first line, there will be an integer N denoting the number of inputs. Each of the next N lines will
contain an integer key followed by a string.
Sample Input    Sample Output
1 One           1: One
2 Two           2: Two
3 Three         3: Three
4 Four          4: Four
5 Five          5: Five
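One possible shape for the Map part, assuming each input line holds an integer key followed by its value and that entries are printed as "key: value". MapDemo, parse and format are hypothetical names; a LinkedHashMap is chosen here only so that entries come back in insertion order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MapDemo {
    // Parses lines such as "1 One" into key 1 and value "One".
    static Map<Integer, String> parse(List<String> lines) {
        Map<Integer, String> map = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+", 2);
            map.put(Integer.parseInt(parts[0]), parts[1]);
        }
        return map;
    }

    // Renders each entry as "key: value".
    static List<String> format(Map<Integer, String> map) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> e : map.entrySet()) {
            out.add(e.getKey() + ": " + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String row : format(parse(Arrays.asList("1 One", "2 Two", "3 Three")))) {
            System.out.println(row);
        }
    }
}
```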
In-Lab:
1) Write a Java program to print the number of occurrences of a given word in a text file.
Input Format
The required input file can be downloaded from the GitHub Repo,
Sample Input (file contents):
Maven
Cat
Cat
Maven
Cat
Sample Output:
Enter the search word: Maven
The word Maven occurred 2 times
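A minimal sketch of the counting step, assuming words are separated by whitespace and matching is case-sensitive. WordOccurrences and count are illustrative names. For brevity the file path and search word are taken from the command line, an assumption that replaces the interactive prompt.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class WordOccurrences {
    // Counts whitespace-separated tokens that equal the search word exactly.
    static int count(String text, String word) {
        int occurrences = 0;
        for (String token : text.trim().split("\\s+")) {
            if (token.equals(word)) {
                occurrences++;
            }
        }
        return occurrences;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the input file, args[1]: search word (both supplied by the caller)
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println("The word " + args[1] + " occurred " + count(text, args[1]) + " times");
    }
}
```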
2) Write a Java program to find the frequency of all the words in a given string.
Input Format
A String of size N will be given as input
Output Format
Sample Input:
DBMS, DC, AI, DC, CN, OS, DC, DC, AI, AI, OS, DBMS, CN
Sample Output:
Output = DBMS, DC, AI, CN, OS
Values = [2, 4, 3, 2, 2]
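The sample above can be reproduced with a map that remembers first-seen order. This is a sketch, assuming the words are comma-separated as in the sample input; WordFrequency and count are illustrative names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WordFrequency {
    // Returns word -> count, keeping the order in which words first appear.
    static Map<String, Integer> count(String input) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String piece : input.split(",")) {
            String word = piece.trim();
            if (!word.isEmpty()) {
                freq.merge(word, 1, Integer::sum);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = count("DBMS, DC, AI, DC, CN, OS, DC, DC, AI, AI, OS, DBMS, CN");
        System.out.println("Output = " + String.join(", ", freq.keySet()));
        System.out.println("Values = " + freq.values());
    }
}
```

A plain HashMap would give the same counts, but the words would no longer be listed in order of first appearance.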
Post-lab:
2. What are the various Hadoop daemons and explain their functionality?
3. What is YARN?
4. What are the data accessing and data storage components of Hadoop?
5. What are the data integration, data management and data monitoring components of
Hadoop?
LAB-3: IMPLEMENTATION OF WORD COUNT AND LENGTH OF WORD USING MAPREDUCE
Prerequisite:
Hadoop Packages
MapReduce
Pre-Lab:
Write the Functionality / Description of the following interfaces/classes
Package Functionality
org.apache.hadoop.io.IntWritable
org.apache.hadoop.io.LongWritable
org.apache.hadoop.io.Text
org.apache.hadoop.mapred.MapReduceBase
org.apache.hadoop.mapred.Mapper
org.apache.hadoop.mapred.OutputCollector
org.apache.hadoop.mapred.Reporter
java.util.StringTokenizer
setJarByClass
setMapperClass
setReducerClass
setNumReduceTasks
setOutputKeyClass
setOutputValueClass
addInputPath
setOutputPath
setMapOutputKeyClass
setMapOutputValueClass
setInputFormat
setOutputFormat
runJob
In-Lab:
1) Given a repository of text files, find the frequency of each word using MapReduce.
Input Format
The required input file can be downloaded from the GitHub Repo,
Output Format
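Before writing the job against the Hadoop API, the three phases of a word count can be modelled in plain Java. The sketch below is not Hadoop code: WordCountSketch and its methods are hypothetical names, and the in-memory TreeMap in shuffle only stands in for the grouping that the framework performs between the Mapper and the Reducer.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class WordCountSketch {
    // Map phase: emit (word, 1) for every token of one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```

In the real job, map and reduce become the bodies of the Mapper and Reducer classes, and the shuffle step disappears because Hadoop provides it.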
2) Given a repository of text files, find the number of words of each word length.
Input Format
The required input file can be downloaded from the GitHub Repo,
Output Format
The length, delimited by a tab space, followed by the total number of words of the given length
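The only change from a word count is the key: the mapper emits the word length instead of the word. A plain-Java sketch of the combined effect, assuming whitespace-separated words (WordLengthCount is an illustrative name and no Hadoop classes are used):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordLengthCount {
    // Returns length -> number of words of that length, sorted by length.
    static Map<Integer, Integer> count(List<String> lines) {
        Map<Integer, Integer> byLength = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                // Mapper would emit (word.length(), 1); reducer would sum the ones.
                byLength.merge(word.length(), 1, Integer::sum);
            }
        }
        return byLength;
    }

    public static void main(String[] args) {
        for (Map.Entry<Integer, Integer> e : count(Arrays.asList("to be or not", "hello")).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```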
Post-Lab:
1) Given a repository of weather records, find the annual global minima and maxima by
year.
*The size of the repository is huge.
On the lab’s single-node cluster, the MapReduce job takes 10 minutes.
Run the MapReduce job on your own single-node cluster and benchmark it.
Do a detailed analysis and prepare a document with screenshots of the output.
Input Format
The required input file can be downloaded from the GitHub Repo,
Output Format
The global maximum and global minimum should be delimited by a tab space; each year’s record
should be displayed on a new line.
LAB-4: IMPLEMENTATION OF HADOOP MAPREDUCE PROGRAM TO FIND NUMBER OF PRODUCTS IN EACH COUNTRY
Prerequisite:
MapReduce
Graphs
Pre-Lab:
1. Implement a Hadoop program to calculate the sum of n numbers using MapReduce.
Input Format
In the first line, there will be an integer N denoting the number of inputs. Each of the next N
lines will contain the strings for which the sum is to be calculated.
Input
The required input file can be downloaded from the GitHub Repository.
Ans.
In-Lab:
1. Find the number of products sold in each country using MapReduce. The input data is
SalesJan2009.csv, which contains sales-related information such as product name, price, payment
mode, city and country of the client.
Input
The required input file can be downloaded from the GitHub Repository
Output
Post-Lab:
1. XYZ.com is an online music website where users listen to various tracks, and data is
collected as shown below. Write a MapReduce program to get the following stats.
The data arrives in log files and looks as shown below.
UserId|TrackId|Shared|Radio|Skip
111115|222|0|1|0
111113|225|1|0|0
111117|223|0|1|1
111115|225|1|0|0
DISTRIBUTED COMPUTING CONTINUOUS PROJECT EVALUATION-II
TASK – II
Objective:
Set up a multi-node Hadoop cluster on your laptops.
This task is to be done in a group of 3.
Description:
A Hadoop cluster is a special type of computational cluster designed specifically for storing
and analyzing huge amounts of unstructured data in a distributed computing environment.
In-Lab:
1) Compute the total and average salary of organization ABC, grouped by gender
Sample Input:
1 Bob M 25000
2 Mary F 30000
3 Jack M 27000
4 Rose F 15000
Sample Output:
F Total:45000 Average:22500
M Total:52000 Average:26000
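The grouping logic can be checked in plain Java before it is split into a mapper (emit gender and salary) and a reducer (sum and average per gender). This is a sketch, assuming whitespace-separated fields in the order id, name, gender, salary as in the sample; SalaryByGender and summarize are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SalaryByGender {
    static List<String> summarize(List<String> records) {
        // gender -> {running total, record count}
        Map<String, long[]> acc = new TreeMap<>();
        for (String record : records) {
            String[] fields = record.trim().split("\\s+"); // id, name, gender, salary
            long[] a = acc.computeIfAbsent(fields[2], k -> new long[2]);
            a[0] += Long.parseLong(fields[3]);
            a[1]++;
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, long[]> e : acc.entrySet()) {
            long total = e.getValue()[0];
            long average = total / e.getValue()[1]; // integer division, as in the sample output
            out.add(e.getKey() + " Total:" + total + " Average:" + average);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("1 Bob M 25000", "2 Mary F 30000", "3 Jack M 27000", "4 Rose F 15000");
        for (String line : summarize(input)) {
            System.out.println(line);
        }
    }
}
```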
2) The weather forecast department wants to know the year with the maximum temperature to
date in Brooklyn
Sample Input:
2000 40
2001 36
2002 39
2003 44
Sample Output:
2003 44
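The reduction here is a running maximum over (year, temperature) pairs. A plain-Java sketch, assuming one "year temperature" pair per line as in the sample; MaxTemperature and hottest are illustrative names, and ties keep the earliest year.

```java
import java.util.Arrays;
import java.util.List;

class MaxTemperature {
    // Returns "year temperature" for the record with the highest temperature.
    static String hottest(List<String> lines) {
        String bestYear = null;
        int best = Integer.MIN_VALUE;
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            int temperature = Integer.parseInt(fields[1]);
            if (temperature > best) {
                best = temperature;
                bestYear = fields[0];
            }
        }
        return bestYear + " " + best;
    }

    public static void main(String[] args) {
        System.out.println(hottest(Arrays.asList("2000 40", "2001 36", "2002 39", "2003 44"))); // prints 2003 44
    }
}
```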
Post-Lab:
Input Format
The required input file can be downloaded from the GitHub Repo,
Output Format
The length, delimited by a tab space, followed by the total number of words of the given length
N File
N,0,0,5
N,0,1,6
N,1,0,7
N,1,1,8
LAB-6: IMPLEMENTATION OF MPI PROGRAM TO CALCULATE SUM OF NUMBERS 1 TO 1000
Pre-Lab:
MPI_Comm_size
MPI_Comm_rank
MPI_Send
MPI_Recv
MPI_Finalize
MPI_Get_processor_name
6) Write an MPI program to print “hello world” that runs on four processes.
In-Lab:
1) Write an MPI program that calculates the sum of the numbers from 1 to 1000 in parallel,
executing on all the cluster nodes and providing the result at the end on only one node.
Note that the print statement for the sum is executed only on the node that is ranked
zero (0); otherwise the statement would be printed as many times as there are nodes
in the cluster.
Output Format
The length, delimited by a tab space, followed by the total number of words of the given length
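The partition-and-combine structure of the sum exercise can be tried out without an MPI installation. The sketch below uses plain Java threads as stand-ins for MPI ranks (an assumption made purely for illustration; it does not call any MPI library): each rank adds up its own share of 1..n, and only the main thread, playing rank 0, combines the partial sums and prints the result. ParallelSumSketch and its methods are hypothetical names.

```java
class ParallelSumSketch {
    // Share of one rank out of `size` workers: rank+1, rank+1+size, ... up to n.
    static long partialSum(int rank, int size, int n) {
        long sum = 0;
        for (int i = rank + 1; i <= n; i += size) {
            sum += i;
        }
        return sum;
    }

    // Starts one worker per rank, then combines the partial sums in the caller.
    static long run(int size, int n) {
        long[] partial = new long[size];
        Thread[] workers = new Thread[size];
        for (int r = 0; r < size; r++) {
            final int rank = r;
            workers[r] = new Thread(() -> partial[rank] = partialSum(rank, size, n));
            workers[r].start();
        }
        long total = 0;
        for (int r = 0; r < size; r++) {
            try {
                workers[r].join(); // wait for this rank before reading its result
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for rank " + r, e);
            }
            total += partial[r];
        }
        return total;
    }

    public static void main(String[] args) {
        // Only the combining thread prints, mirroring the rank-0 rule in the exercise.
        System.out.println("Sum = " + run(4, 1000)); // prints Sum = 500500
    }
}
```

In an MPI program the combining loop is typically replaced by a single reduction call to rank 0, and the print is guarded by a rank check.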
2) Write an MPI program that creates an array on each process, initializes it on process 0 and,
once the array has been initialized on process 0, sends it out to every other process.
Post-Lab:
1. The following program demonstrates the use of the send/receive functions, in which the sender is
initialized as node two (2) and the receiver is assigned as node four (4). The program
must run on at least five (5) nodes; otherwise the sender and receiver should be
initialized to suitable ranks.
LAB-7: IMPLEMENTATION OF ALGORITHMS ON LEADER ELECTION
Pre-Lab:
In-Lab:
b. Ring Algorithm
Post-Lab:
LAB-8: IMPLEMENTATION OF MUTUAL EXCLUSION - BAKERY ALGORITHM AND TEST & SET REGISTER
Pre-Lab:
2. What is the difference between mutual exclusion in distributed systems and mutual
exclusion in single computer system?
6. What are the powerful operations that are supported by shared variables?
8. Compare all 3 mutual exclusion algorithms and explain how they differ from one
another.
In-Lab:
1) Implement the Bakery algorithm for mutual exclusion among n processors.
Consider processors wishing to enter the critical section as customers in a bakery. Each
customer arriving at the bakery gets a number, and the one with the smallest number is the
next to be served. The number of a customer who is not standing in line is set to 0 (which
does not count as the smallest ticket).
Using the above description, implement the Bakery algorithm in Java.
Input Format
The required input file can be downloaded from the GitHub Repo,
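The ticket scheme described above can be sketched as a small lock class. This is one possible rendering, not the workbook's reference solution: BakeryLock, demo and the use of AtomicIntegerArray (chosen so that reads and writes of tickets are visible across threads) are assumptions of this sketch.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

class BakeryLock {
    private final int n;
    private final AtomicIntegerArray choosing; // 1 while a thread is picking its ticket
    private final AtomicIntegerArray number;   // ticket per thread, 0 = not in line

    BakeryLock(int n) {
        this.n = n;
        this.choosing = new AtomicIntegerArray(n);
        this.number = new AtomicIntegerArray(n);
    }

    void lock(int i) {
        choosing.set(i, 1);
        int max = 0;
        for (int j = 0; j < n; j++) {
            max = Math.max(max, number.get(j));
        }
        number.set(i, max + 1); // take a ticket larger than any seen
        choosing.set(i, 0);
        for (int j = 0; j < n; j++) {
            if (j == i) continue;
            while (choosing.get(j) == 1) Thread.yield(); // wait until j has its ticket
            // Wait while j is in line with a smaller ticket (ties broken by thread id).
            while (number.get(j) != 0
                    && (number.get(j) < number.get(i)
                        || (number.get(j) == number.get(i) && j < i))) {
                Thread.yield();
            }
        }
    }

    void unlock(int i) {
        number.set(i, 0); // leave the line
    }

    // Runs `threads` workers that each increment a shared counter `iterations` times.
    static int demo(int threads, int iterations) {
        BakeryLock lock = new BakeryLock(threads);
        int[] counter = new int[1];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int k = 0; k < iterations; k++) {
                    lock.lock(id);
                    counter[0]++; // critical section
                    lock.unlock(id);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(3, 500)); // prints 1500
    }
}
```

The counter reaches threads × iterations only if no two threads are ever inside the critical section at once, which makes it a convenient check while debugging.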
Post-Lab:
1. What are the problems with the Centralized, Distributed and Token Ring algorithms?
2. What are the finite properties of Critical Section Execution or reliable delivery of
messages?
3. What is the time complexity and the delay before entry (in message times) of the Centralized,
Distributed and Token Ring algorithms?
8. Mutual exclusion means "not sharing a resource", but what if there is more than one
instance of a resource and different processes use these instances concurrently? Does
mutual exclusion still hold with respect to this resource?
LAB-9: IMPLEMENTATION OF MUTUAL EXCLUSION - LAMPORT’S ALGORITHM AND RICART-AGRAWALA ALGORITHM IN JAVA
Pre-Lab:
1. What is the difference between Mutual Exclusion and Hold and Wait?
In-Lab:
1. Implement Lamport’s mutual exclusion algorithm in Java using queues.
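A compact, single-process sketch of the per-site state in Lamport's mutual exclusion algorithm. Messages are modelled as direct method calls rather than network traffic, and every name here (LamportMutexSketch, Request, onRequest, onReply, onRelease) is hypothetical. Each site keeps a request queue ordered by (timestamp, site id) and may enter the critical section when its own request is at the head and it has heard a later-stamped message from every other site.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class LamportMutexSketch {
    static final class Request {
        final int timestamp;
        final int site;
        Request(int timestamp, int site) {
            this.timestamp = timestamp;
            this.site = site;
        }
    }

    // Total order on requests: smaller timestamp first, site id breaks ties.
    static final Comparator<Request> ORDER =
            Comparator.<Request>comparingInt(r -> r.timestamp).thenComparingInt(r -> r.site);

    final int id;
    int clock = 0;
    final int[] lastHeard; // latest timestamp received from each site
    final PriorityQueue<Request> queue = new PriorityQueue<>(ORDER);
    Request mine;

    LamportMutexSketch(int id, int sites) {
        this.id = id;
        this.lastHeard = new int[sites];
    }

    // Creates this site's request; the caller delivers it to every other site.
    Request request() {
        clock++;
        mine = new Request(clock, id);
        queue.add(mine);
        return mine;
    }

    // Handles a REQUEST from another site and returns the timestamp of the REPLY.
    int onRequest(Request r) {
        clock = Math.max(clock, r.timestamp) + 1;
        lastHeard[r.site] = Math.max(lastHeard[r.site], r.timestamp);
        queue.add(r);
        return clock;
    }

    void onReply(int from, int timestamp) {
        clock = Math.max(clock, timestamp) + 1;
        lastHeard[from] = Math.max(lastHeard[from], timestamp);
    }

    boolean canEnter() {
        if (mine == null || queue.peek() != mine) return false;
        for (int j = 0; j < lastHeard.length; j++) {
            if (j != id && lastHeard[j] <= mine.timestamp) return false;
        }
        return true;
    }

    // Leaves the critical section; the caller then calls onRelease(id) on the others.
    void release() {
        queue.remove(mine);
        mine = null;
    }

    void onRelease(int site) {
        queue.removeIf(r -> r.site == site);
    }
}
```

With two sites that both request at timestamp 1, site 0 wins the tie on site id; site 1 can enter only after site 0's release removes the earlier request from its queue.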
2. Two sites in a distributed system have requested access to the critical section.
Implement the Ricart-Agrawala algorithm in Java using timestamps.
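The heart of the Ricart-Agrawala algorithm is the decision a site makes when a request arrives: reply at once, or defer the reply until it leaves the critical section. The sketch below isolates that rule; the class and method names are illustrative, and message passing is left out.

```java
class RicartAgrawalaSketch {
    // Request (tsA, idA) has priority over (tsB, idB) if it has the smaller
    // timestamp, with the smaller site id breaking ties.
    static boolean hasPriority(int tsA, int idA, int tsB, int idB) {
        return tsA < tsB || (tsA == tsB && idA < idB);
    }

    // Decides whether the receiving site defers its REPLY to an incoming request.
    static boolean shouldDefer(boolean inCriticalSection, boolean requesting,
                               int myTs, int myId, int reqTs, int reqId) {
        if (inCriticalSection) {
            return true; // reply only after leaving the critical section
        }
        // A requesting site defers only if its own request comes first.
        return requesting && hasPriority(myTs, myId, reqTs, reqId);
    }

    public static void main(String[] args) {
        // Two sites request concurrently: site 1 at timestamp 3, site 2 at timestamp 5.
        System.out.println("Site 1 defers reply to site 2: " + shouldDefer(false, true, 3, 1, 5, 2));
        System.out.println("Site 2 defers reply to site 1: " + shouldDefer(false, true, 5, 2, 3, 1));
    }
}
```

Here site 1 holds the earlier timestamp, so it defers its reply and enters first, while site 2 replies immediately and waits.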
Post-Lab:
LAB-10: IMPLEMENTATION OF JAVA PROGRAM ON SYNCHRONIZATION
Pre-Lab:
1) What is the purpose of Synchronization in Distributed Systems?
In-Lab:
1. Implement Cristian's algorithm for clock synchronization.
Constraints
Write two Java programs, one for server-side clock synchronization and the other for
client-side clock synchronization.
Run the server first, then the client.
Output:
Server
Enter the port no
2222
Server Started
Exiting
Client
Enter the port no
2222
Enter the host name
localhost
Client Started
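Apart from the socket code, the client side of Cristian's algorithm comes down to one adjustment: the client adds half of the measured round-trip time to the time reported by the server. A sketch of that step only, with hypothetical names and all times in milliseconds:

```java
class CristianSketch {
    // t0: client clock when the request was sent
    // t1: client clock when the reply arrived
    // serverTime: time reported by the server in its reply
    static long adjustedTime(long t0, long t1, long serverTime) {
        long roundTrip = t1 - t0;
        // Assumes the reply spent half of the round trip on the network.
        return serverTime + roundTrip / 2;
    }

    public static void main(String[] args) {
        System.out.println(adjustedTime(1_000, 1_040, 5_000)); // prints 5020
    }
}
```

In the full client, t0 and t1 would be taken from the local clock around the socket request, and serverTime would be read from the server's response.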
Post-Lab:
1) Consider 3 processes having 5 events each, where each event starts at zero. Calculate
the timestamp of each event using Lamport’s algorithm, and implement a Java program that
displays the result as a graph.
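The timestamp rules behind this exercise fit in a few lines. The sketch below covers the logical clock only, not the graph display; LamportClock and its method names are illustrative.

```java
class LamportClock {
    private int time = 0; // every process starts at zero

    // Internal event: advance the clock by one.
    int tick() {
        return ++time;
    }

    // Send event: advance the clock and stamp the outgoing message with it.
    int send() {
        return ++time;
    }

    // Receive event: jump past the message's timestamp if it is ahead of us.
    int receive(int messageTime) {
        time = Math.max(time, messageTime) + 1;
        return time;
    }

    public static void main(String[] args) {
        LamportClock p1 = new LamportClock();
        LamportClock p2 = new LamportClock();
        p1.tick();                          // p1: 1
        int stamp = p1.send();              // p1: 2
        System.out.println(p2.receive(stamp)); // prints 3
    }
}
```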
2) A client’s clock reads 5:40:00 and the server’s clock reads 5:30:00. They are
synchronized using Cristian’s algorithm; assume message delays are negligible. What is the
time at the client after the synchronization?
3) A client’s clock reads 1:30:00 and the server’s clock reads 1:20:00. They are
synchronized using Berkeley’s algorithm; assume message delays are negligible. What is the
time at the client after the synchronization?
LAB-11: IMPLEMENTATION OF JAVA PROGRAM ON VECTOR CLOCKS
Pre-lab:
1. How do vector clocks work?
In-Lab:
1) Implement a program for generating a partial ordering of events in a distributed system
and detecting causality violations.
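A minimal vector clock sketch that can serve as a base for this exercise. It shows the update rules and the two comparisons that define the partial order (happened-before and concurrent). Detecting causality violations on delivery is left to the lab program, and all names here are illustrative.

```java
import java.util.Arrays;

class VectorClock {
    final int[] v;  // one counter per process
    final int self; // index of the owning process

    VectorClock(int processes, int self) {
        this.v = new int[processes];
        this.self = self;
    }

    // Internal event: increment our own component.
    void tick() {
        v[self]++;
    }

    // Send event: tick, then attach a copy of the vector to the message.
    int[] send() {
        tick();
        return v.clone();
    }

    // Receive event: component-wise maximum, then tick.
    void receive(int[] stamp) {
        for (int i = 0; i < v.length; i++) {
            v[i] = Math.max(v[i], stamp[i]);
        }
        v[self]++;
    }

    // a happened before b if a <= b in every component and a < b in at least one.
    static boolean happenedBefore(int[] a, int[] b) {
        boolean strictlyLess = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
            if (a[i] < b[i]) strictlyLess = true;
        }
        return strictlyLess;
    }

    // Two distinct stamps are concurrent if neither happened before the other.
    static boolean concurrent(int[] a, int[] b) {
        return !Arrays.equals(a, b) && !happenedBefore(a, b) && !happenedBefore(b, a);
    }
}
```

Because some pairs of stamps are concurrent, the resulting order is partial rather than total, unlike the order given by scalar Lamport timestamps with tie-breaking.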
Post-Lab:
1) Why is Network Time Protocol important?
LAB-12: IMPLEMENTATION OF JAVA PROGRAM FOR THE FAULT FREE SYSTEM
Pre-Lab:
1) Describe the kinds of problems that occur in distributed systems. Also mention the types of
failures and the types of faults.
2) What is fault tolerance? Mention two main reasons for the occurrence of a fault.
In-Lab:
1) Under the Byzantine failure model, develop a Java program for the fault-free system which
executes a terminating program by sending out a finite sequence of integers
0,1,2,0,1,2,0,1,2. Specify a fault statement in the program (for example, using loops) where the
system changes the 2’s to 9’s.
Sample output: 0 1 2 0 1 2 0 1 2 0 1 9 0 1 9 0 1 9
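One way to read the sample output is as a fault-free run followed by a faulty run of the same sequence. The sketch below follows that reading (an assumption), with the fault modelled as a flag that replaces every 2 with 9. FaultInjectionSketch and emit are illustrative names.

```java
class FaultInjectionSketch {
    // Emits 0 1 2 repeated `rounds` times; a faulty system sends 9 instead of 2.
    static String emit(int rounds, boolean faulty) {
        StringBuilder out = new StringBuilder();
        for (int r = 0; r < rounds; r++) {
            for (int value = 0; value <= 2; value++) {
                int sent = (faulty && value == 2) ? 9 : value; // the fault action
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(sent);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Fault-free sequence first, then the same sequence under the fault.
        System.out.println(emit(3, false) + " " + emit(3, true));
    }
}
```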
end
Due to a malfunction, the following fault action is activated in the soda machine:
F:
do
balance = 5 ---> balance := 50
end
Modify the program for the soda machine such that it tolerates the specific failure
and customers are not affected at all.
Post-Lab:
1) What is the importance of implementing a fault-tolerant system? Are there any
limitations?
2) What do you mean by fault coverage? Compute the fault coverage of a system where 7
faults remained undetected out of the total 23 faults in the system.
LAB-13: IMPLEMENTATION OF BASIC QUERIES IN HIVE
Prerequisite:
SQL
Hadoop
Pre-lab:
1) What is Hive?
4) What are the operations with which Hive interacts with the Hadoop framework?
In-Lab:
To complete this lab, you will need the following:
You are working for a government agency as a Data Scientist. You have been supplied a dataset of
consumer complaints received by financial institutions in 2013-15.
Your task is to upload the data into a database (which was done in the previous step) and perform
some preliminary analysis.
Write the queries and perform the operations in the Hive CLI (Command Line Interface).
7. Find out how many complaints were received and sent on the same day
9. Extract the complaints received in the states of New York and California
10. Extract all rows with the word “Credit” in the Product field
11. Extract all rows with the word “Late” in the Issue field
Post-Lab:
2. By default, which execution engine is used by Hive? What are the other execution
engines available?
LAB-14: IMPLEMENTATION OF ADVANCED QUERIES IN HIVE
Prerequisite:
SQL
Hive
Hadoop
Pre-lab:
You are an analytics consultant helping a console games company conduct market research.
Download the CSV files from the GitHub Repo.
ConsoleGames.csv - a historic list of all console games released between 1980 and 2015
ConsoleDates.csv - a historic list of all console platforms (such as Wii, PlayStation, Xbox) and
information about them
You have been tasked to upload the dataset into a Hive database and perform the following
analytics:
1. Calculate what % of Global Sales were made in North America
2. Extract a view of the console game titles ordered by platform name in Ascending order
and Year of release in descending order
3. For each game title extract the first four letters of the publisher's name
4. Display all console platforms which were released either just before Black Friday or just
before Christmas (in any year)
5. Order the platforms by their longevity in ascending order (i.e. the platform which was
available for the longest at the bottom)
In-Lab:
Lab Instructions
Create files/directories wherever required to execute the following commands
Make a document containing screenshots of the output and upload it to the
LMS.
Step Action
You are a Data Analyst helping a veterinary clinic make sense of their data. Their data is
dispersed across multiple CSV files and they need you to first upload all of them to a database
and then perform the following analytics:
1. Extract information on pets’ names and owner names side-by-side
2. Find out which pets from this clinic had procedures performed
5. Extract a table of individual costs (procedure prices) incurred by owners of pets from the
clinic in question (this table should have owner and procedure price side-by-side)
Post-lab:
Download the required files from the GitHub Repo.
You have been tasked to upload the dataset into a Hive database and perform the following
analytics.
From the dataset given in the GitHub Repo, write queries for the following.
1. Identify which numbers made calls as well as sent SMS messages, where the number of calls
made is more than 10 and the number of messages is more than 5.
2. What is the length of the average SMS conversation? A length of 2 means that there were 2 SMS
messages exchanged between the two parties.
4. What is the frequency of 'SMS', 'calls' by each day of week? What about by hour of day?
5. Find which users sent at least 10 more messages in the month of Jan than in the month of
Feb.
LAB-15: IMPLEMENTATION OF BASIC QUERIES IN HBASE
Prerequisite:
NoSQL
Hadoop
Pre-lab:
1. What is HBase?
In-Lab:
To complete this lab, you will need the following:
2 User Login
Login to user ‘klu’ with the credentials ‘klu123’
3 Terminal
Open a GNOME Terminal window by searching in the start menu, or use the
shortcut ‘ctrl+alt+t’ (on Ubuntu distros)
4 Login to hadoop
klu@klu:~$ su - hadoop
Enter the password ‘hadoop’ for logging in
7 Start HBase
hadoop@klu:~$ hbase shell
Walmart stores all data on customers, purchases and stock details in a single-node database. Analytics
on that data takes a long time and there is a high risk of losing data. So, the CTO of Walmart implemented
HBase, a NoSQL database which is built on top of the Hadoop ecosystem.
5. Create a relation for the list of products that are available in the store. (Hint: a relation consisting of
product name, cost, no-of-units)
Post-Lab:
1. What is the difference between Hive and HBase?