Distributed Computing Lab Workbook V1.0

Uploaded by

ch.Bhanu rekha
Copyright © All Rights Reserved

LAB WORKBOOK

17CS3115 DISTRIBUTED COMPUTING

Team DC
K L UNIVERSITY | DISTRIBUTED COMPUTING – 17CS3115

LABORATORY WORKBOOK

STUDENT NAME
REG. NO
YEAR
SEMESTER
SECTION
FACULTY


Table of Contents
Organization of the STUDENT LAB WORKBOOK
LAB-1: IMPLEMENTATION OF BASIC HADOOP COMMANDS
DISTRIBUTED COMPUTING CONTINUOUS PROJECT EVALUATION-I
LAB-2: IMPLEMENTATION OF WORD COUNT AND FREQUENCY OF A WORD IN A STRING IN JAVA
LAB-3: IMPLEMENTATION OF WORD COUNT AND LENGTH OF WORD USING MAPREDUCE
LAB-4: IMPLEMENTATION OF HADOOP MAPREDUCE PROGRAM TO FIND NUMBER OF PRODUCTS IN EACH COUNTRY
DISTRIBUTED COMPUTING CONTINUOUS PROJECT EVALUATION-II
LAB-6: IMPLEMENTATION OF MPI PROGRAM TO CALCULATE SUM OF NUMBERS 1 TO 1000
LAB-7: IMPLEMENTATION OF ALGORITHMS ON LEADER ELECTION
LAB-8: IMPLEMENTATION OF MUTUAL EXCLUSION - BAKERY ALGORITHM AND TEST & SET REGISTER
LAB-9: IMPLEMENTATION OF MUTUAL EXCLUSION - LAMPORT’S ALGORITHM AND RICART-AGRAWALA ALGORITHM IN JAVA
LAB-10: IMPLEMENTATION OF JAVA PROGRAM ON SYNCHRONIZATION
LAB-11: IMPLEMENTATION OF JAVA PROGRAM ON VECTOR CLOCKS
LAB-12: IMPLEMENTATION OF JAVA PROGRAM FOR THE FAULT FREE SYSTEM
LAB-13: IMPLEMENTATION OF BASIC QUERIES IN HIVE
LAB-14: IMPLEMENTATION OF ADVANCED QUERIES IN HIVE
LAB-15: IMPLEMENTATION OF BASIC QUERIES IN HBASE


Organization of the STUDENT LAB WORKBOOK

The laboratory framework includes a creative element but shifts the time-intensive
aspects outside of the two-hour closed laboratory period. Within this structure, each
laboratory includes three parts: Pre-Lab, In-Lab, and Post-Lab.
a. Pre-Lab
The Pre-Lab exercise is a homework assignment that links the lecture with the
laboratory period and typically takes 2 hours to complete. The goal is for students to
synthesize the information they learn in lecture with material from their textbook to
produce a working piece of software. Students attending a two-hour closed laboratory
are expected to make a good-faith effort to complete the Pre-Lab exercise before
coming to the lab. Their work need not be perfect, but their effort must be real
(roughly 80 percent correct).
b. In-Lab
The In-Lab section takes place during the actual laboratory period. The first hour of
the laboratory period can be used to resolve any problems the students might have
experienced in completing the Pre-Lab exercises. The intent is to give constructive
feedback so that students leave the lab with working Pre-Lab software, a significant
accomplishment on their part. During the second hour, students complete the In-Lab
exercise to reinforce the concepts learned in the Pre-Lab. Students leave the lab
having received feedback on their Pre-Lab and In-Lab work.
c. Post-Lab
The last phase of each laboratory is a homework assignment that is done after
the laboratory period. In the Post-Lab, students analyse the efficiency or utility of a
given system call. Each Post-Lab exercise should take roughly 120 minutes to
complete.
Note:
Additional files are required to complete the lab tasks. These files can be found in
the GitHub repository below:
http://github.com/dc-klu/labs


2019-20 EVEN SEMESTER LAB CONTINUOUS EVALUATION

Sl. No | Date | Experiment Name | Pre-Lab (5M) | In-Lab: Logic (10M), Execution (10M), Result (10M), Analysis (5M) | Post-Lab (5M) | Viva Voce (5M) | Total (50M) | Faculty Signature


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


SUBJECT CODE: 17CS3115
DISTRIBUTED COMPUTING WORKBOOK

LAB-1: IMPLEMENTATION OF BASIC HADOOP COMMANDS.

Date of the Session: ___/___/___ Time of the Session:_____to______

Prerequisite:
• Linux Commands
• File Systems

Pre-Lab:
1. What are distributed Systems?

2. What is distributed Computing?

3. Applications of distributed computing.


4. What is Hadoop?

5. Advantages of Distributed Computing?


In-Lab:
To complete this lab, you will need the following:

• A Hadoop environment, preferably installed in the Linux Subsystem.

Lab Instructions
• Create files/directories wherever required to execute the following commands.
• Make a document containing screenshots of the output and upload it to the LMS.

Step Action

1 Boot up the Linux Subsystem
• Make sure you choose the Linux Subsystem while booting, rather than Windows.
2 User Login
• Log in as user ‘klu’ with the password ‘klu123’.
3 Terminal
• Open a GNOME Terminal window by searching in the start menu, or use the
shortcut ‘ctrl+alt+t’.
4 Log in to hadoop
• klu@klu:~$ su - hadoop
• Enter the password ‘hadoop’ to log in.
5 Start the Hadoop components (daemons) on the Linux Subsystem.
• The following command starts all components. Please note that it can take a few
minutes to run:
• hadoop@klu:~$ start-all.sh
6 Check the running Hadoop services
• hadoop@klu:~$ jps
7 Exploring Hadoop Distributed File System (HDFS)
You can use the command-line approach and invoke the FileSystem (fs) shell using the
format:
• hadoop fs <args> or hdfs dfs <args>
To list the contents of the HDFS home directory
• hadoop fs -ls
Or
• hadoop fs -ls /usr/hadoop
To do a recursive listing, use -ls -R rather than just -ls (the older -lsr form is deprecated)
• hadoop fs -ls -R /usr
To view the contents of a file
• hadoop fs -cat <file path>
To create the directory ‘Lab1’ you can issue the following command:
• hadoop fs -mkdir Lab1
To copy a file in HDFS
• hadoop fs -cp <path of src file> <path of dest file>
To copy a directory in HDFS (-cp copies directories recursively, so no -r flag is needed)
• hadoop fs -cp <path of src directory> <path of dest directory>
To remove a file in HDFS
• hadoop fs -rm <path of file>

To remove a directory in HDFS
• hadoop fs -rm -r <path of directory>
To copy files between your regular Linux filesystem and HDFS you can use the put and
get commands.
• hadoop fs -put <local system file path of file> <hdfs directory path>
• hadoop fs -get <hdfs file path> <local system directory path>
Or
• hadoop fs -copyFromLocal <local system file path of file> <hdfs directory path>
• hadoop fs -copyToLocal <hdfs file path> <local system directory path>
To move files between your regular Linux filesystem and HDFS you can use the
moveFromLocal and moveToLocal commands (in Hadoop 2.x, -moveToLocal only reports
that it is not implemented yet).
• hadoop fs -moveFromLocal <local system file path of file> <hdfs directory path>
• hadoop fs -moveToLocal <hdfs file path> <local system directory path>
To change the replication factor of a file in HDFS
• hadoop fs -setrep <number> <path to file>
To check the block size and replication factor of a file
• hadoop fsck <path to the file> -files -blocks
To change the block size
• hadoop fs -D dfs.blocksize=<block size in bytes> -put <file path> <hdfs directory path>
To get safe mode status
• hadoop dfsadmin -safemode get
To enter safe mode
• hadoop dfsadmin -safemode enter
To exit safe mode
• hadoop dfsadmin -safemode leave
To display the size of a file in bytes
• hadoop fs -du <file path>
8 To stop the Hadoop services
• hadoop@klu:~$ stop-all.sh


Writing space for the Problem:(For Student’s use only)


Post-Lab:

1) What is a cluster?

2) Types of cluster?

3) Difference between parallel and distributed computing?


4) Use cases of cluster.

5) Advantages and disadvantages of a cluster.

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


DISTRIBUTED COMPUTING CONTINUOUS PROJECT EVALUATION-I

TASK – I
Objective:
Set up a single-node Hadoop cluster on your laptop.
Description:
A Hadoop cluster is a special type of computational cluster designed specifically for storing
and analyzing huge amounts of unstructured data in a distributed computing environment.

• In this task you should set up a single-node Hadoop cluster.
• The single-node cluster should be able to:
o Start all the daemons, i.e. NameNode, DataNode, NodeManager,
ResourceManager, and Secondary NameNode
o Run a sample MapReduce job on a sample input file
Software and Hardware Recommendations:
Hardware:

If setting up on a virtual machine:

• Allocate a minimum of 8 GB of RAM
• Allocate 2 CPU cores
• Allocate 20 GB of storage

Software:

• Linux distro (Ubuntu or CentOS recommended)
• JDK version 7 or 8 (higher versions not recommended)
• Hadoop version 2.x.x

Sample Output:


LAB-2: IMPLEMENTATION OF WORD COUNT AND FREQUENCY OF A WORD IN A STRING IN JAVA

Date of the Session: ___/___/___ Time of the Session:_____to______

Prerequisite:
• Java
• Data Structures

Pre-lab:
Implement the HashSet and Map data structures in java.
HashSet
Input Format

In the first line, there will be an integer N denoting number of inputs. Each of the next N lines will
contain strings.

Sample Input    Sample Output

5
One             Five
Two             One
Three           Four
Four            Two
Five            Three
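
The HashSet exercise can be sketched as follows. This is a minimal illustration rather than a reference solution: the class name HashSetDemo and the helper toSet are hypothetical, and the sample strings are hard-coded instead of being read from standard input. Because HashSet does not guarantee iteration order, the printed order may differ from the sample output.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name; a sketch of the HashSet exercise, not a reference solution.
class HashSetDemo {
    // Copies the strings into a HashSet, which silently drops duplicates.
    static Set<String> toSet(List<String> items) {
        return new HashSet<>(items);
    }

    public static void main(String[] args) {
        // The five sample strings from the table above (N = 5), hard-coded for brevity.
        List<String> input = Arrays.asList("One", "Two", "Three", "Four", "Five");
        for (String s : toSet(input)) {
            System.out.println(s); // iteration order is not guaranteed by HashSet
        }
    }
}
```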


Writing space for the Problem:(For Student’s use only)


Map Interface
Input Format

In the first line, there will be an integer N denoting the number of inputs. Each of the next N
lines will contain an integer key and a string value.

Sample Input    Sample Output

5
1 One           1: One
2 Two           2: Two
3 Three         3: Three
4 Four          4: Four
5 Five          5: Five
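
A minimal sketch of the Map exercise, under the assumption that entries should be printed in the order they were inserted. The class name MapDemo and the helper format are hypothetical, and the sample pairs are hard-coded rather than read from standard input.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class name; a sketch of the Map exercise, not a reference solution.
class MapDemo {
    // Formats each entry as "key: value", one per line, in the map's iteration order.
    static String format(Map<Integer, String> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, String> e : map.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A LinkedHashMap keeps insertion order; a plain HashMap would not promise it.
        Map<Integer, String> map = new LinkedHashMap<>();
        map.put(1, "One");
        map.put(2, "Two");
        map.put(3, "Three");
        map.put(4, "Four");
        map.put(5, "Five");
        System.out.print(format(map));
    }
}
```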


Writing space for the Problem:(For Student’s use only)


In-Lab:

1) Write a Java program to print the number of occurrences of a given word in a text file.
Input Format

The required input file can be downloaded from the GitHub repository.

Sample Input    Sample Output

Mate            Enter the search word: Maven
Maven           The word Maven occurred 2 times
Cat
Cat
Maven
Cat
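
One way to approach the word search is sketched below. The class WordSearch and the method countOccurrences are hypothetical names, matching is assumed to be case-sensitive and whole-word, and the sample lines are hard-coded; a full solution would read the lines from the input file (for example with java.nio.file.Files.readAllLines) and the search word from the user.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical class name; a sketch of the word search, not a reference solution.
class WordSearch {
    // Counts whole-word, case-sensitive matches of 'word' across all lines.
    static int countOccurrences(List<String> lines, String word) {
        int count = 0;
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (token.equals(word)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample file contents from the table above, one word per line.
        List<String> lines = Arrays.asList("Mate", "Maven", "Cat", "Cat", "Maven", "Cat");
        String word = "Maven";
        System.out.println("The word " + word + " occurred "
                + countOccurrences(lines, word) + " times");
    }
}
```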


Writing space of the Problem:(For Student’s use only)


2) Write a Java program to find the frequency of all the words in a given string.
Input Format
A string of size N will be given as input
Output Format
The distinct words in order of first appearance, followed by the list of their frequencies

Sample Input:
DBMS, DC, AI, DC, CN, OS, DC, DC, AI, AI, OS, DBMS, CN
Sample Output:
Output = DBMS, DC, AI, CN, OS
Values = [2, 4, 3, 2, 2]
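
The frequency count can be sketched with a map that remembers insertion order. The names WordFrequency and frequencies are hypothetical, and the input is assumed to be comma-separated as in the sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class name; a sketch of the frequency count, not a reference solution.
class WordFrequency {
    // Splits on commas, trims each token, and counts it; first-seen order is preserved.
    static Map<String, Integer> frequencies(String text) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String token : text.split(",")) {
            String word = token.trim();
            if (!word.isEmpty()) {
                freq.merge(word, 1, Integer::sum);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        String input = "DBMS, DC, AI, DC, CN, OS, DC, DC, AI, AI, OS, DBMS, CN";
        Map<String, Integer> freq = frequencies(input);
        System.out.println("Output = " + String.join(", ", freq.keySet()));
        System.out.println("Values = " + freq.values());
    }
}
```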


Writing space of the Problem:(For Student’s use only)


Post-lab:

1. How is HDFS different from other file systems?

2. What are the various Hadoop daemons and explain their functionality?

3. What is YARN?

4. What are the data accessing and data storage components of Hadoop?


5. What are the data integration, data management and data monitoring components of
Hadoop?

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


LAB-3: IMPLEMENTATION OF WORD COUNT AND LENGTH OF WORD USING MAPREDUCE

Date of the Session: ___/___/___ Time of the Session:_____to______

Prerequisite:
• Hadoop Packages
• MapReduce

Pre-Lab:
Write the functionality / description of the following interfaces/classes

Class / Interface                          Functionality

org.apache.hadoop.io.IntWritable

org.apache.hadoop.io.LongWritable

org.apache.hadoop.io.Text

org.apache.hadoop.mapred.MapReduceBase

org.apache.hadoop.mapred.Mapper

org.apache.hadoop.mapred.OutputCollector

org.apache.hadoop.mapred.Reporter

java.util.StringTokenizer

Writing space of the Problem:(For Student’s use only)


Write the functionality of the following methods

Method Functionality and Syntax

setJarByClass

setMapperClass

setReducerClass

setNumReduceTasks

setOutputKeyClass

setOutputValueClass

addInputPath


setOutputPath

setMapOutputKeyClass

setMapOutputValueClass

setInputFormat

setOutputFormat

runJob


In-Lab:

1) Given a repository of text files, use MapReduce to find the frequency of each word.
Input Format

The required input file can be downloaded from the GitHub repository.

Output Format

Each word, followed by a tab and the frequency of that word

Sample Input            Sample Output

Mate Maven Aubrie       Cat     6
Maven Sense Dome        Maven   5
Cat Mate Maven          Mate    2
Cat Cat Cat             Dome    2
Maven Cat Sense         Sense   2
Cat Dome Maven          Aubrie  1
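
The map, shuffle, and reduce phases of a word count can be illustrated without a cluster. The sketch below is a plain-Java simulation and uses no Hadoop classes; the names WordCountSim, mapLine, shuffle, reduce, and run are hypothetical. A real job would express the same logic through Hadoop's Mapper and Reducer types.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class name; an in-memory simulation, not a Hadoop job.
class WordCountSim {
    // Map phase: emit (word, 1) for every whitespace-separated token in the line.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String tok : line.trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                out.add(new SimpleEntry<String, Integer>(tok, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key, with keys kept sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases over all input lines and returns word -> frequency.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(mapLine(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Mate Maven Aubrie", "Maven Sense Dome");
        for (Map.Entry<String, Integer> e : run(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```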


Writing space of the Problem:(For Student’s use only)


2) Given a repository of text files, find the number of words of each word length.

Input Format
The required input file can be downloaded from the GitHub repository.
Output Format

The word length, followed by a tab and the total number of words of that length

Sample Input              Sample Output

Man can dust dawn sun     Length  Words
Santa Impossible draw     3       3
                          4       3
                          5       1
                          10      1
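
The same grouping idea applies when the key is the word length instead of the word. The following plain-Java sketch (hypothetical names WordLengthCount and countByLength, no Hadoop classes) folds the map and reduce steps into a single pass for brevity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class name; an in-memory sketch, not a Hadoop job.
class WordLengthCount {
    // Key each token by its length and count how many tokens share that length.
    static Map<Integer, Integer> countByLength(List<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String tok : line.trim().split("\\s+")) {
                if (!tok.isEmpty()) {
                    counts.merge(tok.length(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Man can dust dawn sun", "Santa Impossible draw");
        System.out.println("Length\tWords");
        for (Map.Entry<Integer, Integer> e : countByLength(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```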


Post-Lab:
1) Given a repository of weather records, find the global minimum and maximum for each
year.
* The repository is very large.
• On the lab's single-node cluster, the MapReduce job takes about 10 minutes.
Run the MapReduce job on your own single-node cluster and benchmark it.
Do a detailed analysis and make a document containing screenshots of the output.
Input Format
The required input file can be downloaded from the GitHub repository.
Output Format
The global maximum and global minimum should be separated by a tab, with each year's
record displayed on a new line.


Writing space of the Problem:(For Student’s use only)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


LAB-4: IMPLEMENTATION OF HADOOP MAPREDUCE PROGRAM TO FIND NUMBER OF PRODUCTS IN EACH COUNTRY

Date of the Session: ___/___/___ Time of the Session:_____to______

Prerequisite:
• MapReduce
• Graphs
Pre-Lab:

1. Implement a Hadoop program to calculate the sum of n numbers using MapReduce.

Input Format

In the first line, there will be an integer N denoting the number of inputs. Each of the next N
lines will contain the values for which the sum is to be calculated.

Input

The required input file can be downloaded from the GitHub Repository.

Ans.


Writing space of the Problem:(For Student’s use only)


In-Lab:

1. Find out Number of Products Sold in Each Country using MapReduce. The input data used
is SalesJan2009.csv. It contains Sales related information like Product name, price, payment
mode, city, country of client etc.

Input

The required input file can be downloaded from the GitHub Repository

Output


Writing space of the Problem:(For Student’s use only)


Post-Lab:

1. XYZ.com is an online music website where users listen to various tracks. Usage data is
collected as shown below. Write a MapReduce program to compute the following statistics:

Number of unique listeners

Number of times the track was shared with others

Number of times the track was listened to on the radio

Number of times the track was listened to in total

Number of times the track was skipped on the radio

The data arrives in log files in the format shown below.

UserId|TrackId|Shared|Radio|Skip

111115|222|0|1|0

111113|225|1|0|0

111117|223|0|1|1

111115|225|1|0|0


Writing space of the Problem:(For Student’s use only)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


DISTRIBUTED COMPUTING CONTINUOUS PROJECT EVALUATION- II

TASK – II
Objective:
Set up a multi-node Hadoop cluster on your laptops.
This task is to be done in a group of 3.
Description:
A Hadoop cluster is a special type of computational cluster designed specifically for storing
and analyzing huge amounts of unstructured data in a distributed computing environment.

• In this task you should set up a multi-node Hadoop cluster.
• The multi-node cluster should be able to:
o Start all the daemons, i.e. NameNode, DataNode, NodeManager, ResourceManager,
and Secondary NameNode
o Run a sample MapReduce job on a sample input file


LAB-5: IMPLEMENTATION OF HADOOP MAPREDUCE PROGRAM TO FIND THE TOTAL AND AVERAGE OF SALARY AND MAXIMUM TEMPERATURE OF THE YEAR.

Date of the Session: ___/___/___ Time of the Session:_____to______


Pre-Lab:
1) Discuss the main components of MapReduce job

2) Compare Spark and MapReduce

3) How would you split data into Hadoop?

4) How many daemon processes run on a Hadoop system?


In-Lab:
1) Compute the total and average salary of organization ABC, grouped by gender.

Sample Input:
1 Bob M 25000
2 Mary F 30000
3 Jack M 27000
4 Rose F 15000

Sample Output:
F Total:45000 Average:22500
M Total:52000 Average:26000
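
The grouping step can be sketched in plain Java before moving it into a MapReduce job: the mapper would emit (gender, salary) and the reducer would total and average the values per key. The names SalaryStats, aggregate, and report are hypothetical, records are assumed to be whitespace-separated as in the sample, and the average uses integer division.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class name; an in-memory sketch, not a Hadoop job.
class SalaryStats {
    // Returns gender -> {total salary, number of employees}.
    static Map<String, long[]> aggregate(List<String> records) {
        Map<String, long[]> acc = new TreeMap<>();
        for (String rec : records) {
            String[] f = rec.trim().split("\\s+"); // fields: id, name, gender, salary
            long[] a = acc.computeIfAbsent(f[2], k -> new long[2]);
            a[0] += Long.parseLong(f[3]);
            a[1]++;
        }
        return acc;
    }

    // Formats one line per gender in the style of the sample output.
    static String report(List<String> records) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, long[]> e : aggregate(records).entrySet()) {
            long total = e.getValue()[0];
            long count = e.getValue()[1];
            sb.append(e.getKey()).append(" Total:").append(total)
              .append(" Average:").append(total / count).append('\n'); // integer division
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "1 Bob M 25000", "2 Mary F 30000", "3 Jack M 27000", "4 Rose F 15000");
        System.out.print(report(records));
    }
}
```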


Writing space of the Problem:(For Student’s use only)


2) The weather forecast department wants to know the year with maximum temperature till
date in Brooklyn

Sample Input:
2000 40
2001 36
2002 39
2003 44

Sample Output:
2003 44


Writing space of the Problem:(For Student’s use only)


Post-Lab:

1) Implement a program to perform matrix multiplication using MapReduce.

Input Format
The required input file can be downloaded from the GitHub repository.
Output Format
Each line holds the row index, column index, and value of one element of the product
matrix, separated by commas

Sample Input    Sample Output

M file
M,0,0,1         0,0,19.0
M,0,1,2         0,1,22.0
M,1,0,3         1,0,43.0
M,1,1,4         1,1,50.0

N file
N,0,0,5
N,0,1,6
N,1,0,7
N,1,1,8
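
The core of the computation is a join on the shared index followed by a sum per output cell. The plain-Java sketch below (hypothetical names MatrixMultiplySim and multiply, no Hadoop classes) pairs every M(i,j) with every N(j,k), emits the product under the key "i,k", and sums the products per key, which is what the reducer would do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class name; an in-memory sketch, not a Hadoop job.
class MatrixMultiplySim {
    // Records look like "M,i,j,value" and "N,j,k,value"; the result maps "i,k" to the cell value.
    static Map<String, Double> multiply(List<String> mRecords, List<String> nRecords) {
        Map<String, Double> result = new TreeMap<>();
        for (String m : mRecords) {
            String[] a = m.split(",");
            for (String n : nRecords) {
                String[] b = n.split(",");
                if (a[2].equals(b[1])) { // join on the shared index j
                    String key = a[1] + "," + b[2];
                    double product = Double.parseDouble(a[3]) * Double.parseDouble(b[3]);
                    result.merge(key, product, Double::sum); // reduce: sum per output cell
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> m = Arrays.asList("M,0,0,1", "M,0,1,2", "M,1,0,3", "M,1,1,4");
        List<String> n = Arrays.asList("N,0,0,5", "N,0,1,6", "N,1,0,7", "N,1,1,8");
        for (Map.Entry<String, Double> e : multiply(m, n).entrySet()) {
            System.out.println(e.getKey() + "," + e.getValue());
        }
    }
}
```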


Writing space of the Problem:(For Student’s use only)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


LAB-6: IMPLEMENTATION OF MPI PROGRAM TO CALCULATE SUM OF NUMBERS 1 TO 1000.

Date of the Session: ___/___/___ Time of the Session:_____to______

Pre-Lab:

1) What is Message Passing Interface (MPI)?

2) List a few MPI datatypes.


3) What are synchronous and asynchronous systems?

4) Write the difference between the Broadcast and Flooding algorithms.


5) Write the functionality of the following methods


Functions Functionality and syntax
MPI_Init

MPI_Comm_size

MPI_Comm_rank

MPI_Send

MPI_Recv

MPI_Finalize

MPI_Get_processor_name


6) Write an MPI program to print “hello world” that runs on four processes.


In-Lab:
1) Write an MPI program that calculates the sum of the numbers from 1 to 1000 in parallel,
executing on all the cluster nodes and providing the result at the end on only
one node. Note that the print statement for the sum should be executed only on the
node with rank zero (0); otherwise the statement would be printed as many times as there
are nodes in the cluster.
Output Format
The sum of the numbers from 1 to 1000, printed once by the rank 0 process
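
The decomposition behind this exercise can be tried out in Java before writing the MPI version. The sketch below is not MPI: it is a thread-based simulation in which each thread plays the role of one rank, sums its own block of the range, and the main thread plays rank 0 and combines the partial sums. The names ParallelSumSim, partialSum, and parallelSum are hypothetical.

```java
// Hypothetical class name; a thread-based simulation of the rank decomposition, not MPI.
class ParallelSumSim {
    // Sum of the block of 1..n owned by 'rank' when the range is split across 'size' workers.
    static long partialSum(int rank, int size, int n) {
        int chunk = (n + size - 1) / size; // ceiling of n / size
        int start = rank * chunk + 1;
        int end = Math.min(n, start + chunk - 1);
        long sum = 0;
        for (int i = start; i <= end; i++) {
            sum += i;
        }
        return sum;
    }

    // Runs one thread per simulated rank and combines the partial sums in the caller.
    static long parallelSum(int size, int n) {
        long[] partial = new long[size];
        Thread[] workers = new Thread[size];
        for (int r = 0; r < size; r++) {
            final int rank = r;
            workers[r] = new Thread(() -> {
                partial[rank] = partialSum(rank, size, n);
            });
            workers[r].start();
        }
        long total = 0;
        for (int r = 0; r < size; r++) {
            try {
                workers[r].join(); // after join, partial[r] is visible to this thread
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            total += partial[r];
        }
        return total;
    }

    public static void main(String[] args) {
        // Only the combining thread prints, mirroring "print on rank 0 only".
        System.out.println("Sum = " + parallelSum(4, 1000));
    }
}
```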


Writing space of the Problem:(For Student’s use only)


2) Write an MPI program that creates an array on each process, initializes it on process 0,
and then sends the initialized array from process 0 to every other process.


Writing space of the Problem:(For Student’s use only)


Post-Lab:

1. The following program demonstrates the use of the send/receive functions, with the sender
set to node two (2) and the receiver set to node four (4). The program must therefore be
run on at least five (5) nodes; otherwise the sender and receiver should be set to
suitable ranks.


Writing space of the Problem:(For Student’s use only)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


LAB-7: IMPLEMENTATION OF ALGORITHMS ON LEADER ELECTION

Date of the Session: ___/___/___ Time of the Session:_____to______

Pre-Lab:

1. What do you understand by leader election in Distributed Systems?

2. What is the need of a leader in Distributed Systems?

3. Write Pseudo code for Bully Election Algorithm


4. Write Pseudo code for Ring Election Algorithm


In-Lab:

1. Implement the following algorithms

a. Bully Algorithm
Input Format
The required input file can be downloaded from the GitHub repository.
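
The outcome of a bully election can be simulated in a few lines. In the sketch below (hypothetical names BullyElectionSim and elect), process ids are array indices and message passing is reduced to array lookups. As a simplification it follows one responding higher process at a time instead of letting every higher process start its own election; the elected coordinator is the same, namely the highest-numbered live process.

```java
// Hypothetical class name; a single-threaded simulation of the election outcome.
class BullyElectionSim {
    // alive[i] tells whether the process with id i is up.
    // Returns the id of the coordinator elected when 'initiator' starts an election.
    static int elect(boolean[] alive, int initiator) {
        if (!alive[initiator]) {
            throw new IllegalArgumentException("initiator must be alive");
        }
        int candidate = initiator;
        boolean takenOver = true;
        while (takenOver) {
            takenOver = false;
            // ELECTION is sent to every process with a higher id than the candidate.
            for (int id = candidate + 1; id < alive.length; id++) {
                if (alive[id]) { // an OK reply: the higher process takes over
                    candidate = id;
                    takenOver = true;
                    break;
                }
            }
        }
        return candidate; // nobody higher answered, so the candidate announces itself
    }

    public static void main(String[] args) {
        boolean[] alive = {true, true, true, false, true, false};
        System.out.println("Coordinator: " + elect(alive, 1));
    }
}
```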


Writing space of the Problem:(For Student’s use only)


b. Ring Algorithm
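
A ring election can be simulated in the same style. In this sketch (hypothetical names RingElectionSim and elect) the ELECTION message is a list of ids that travels once around the ring; each live process appends its id, crashed processes are skipped, and the highest collected id is taken as coordinator when the message returns to the initiator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class name; a single-threaded simulation of one election round.
class RingElectionSim {
    // alive[i] tells whether process i is up; processes are arranged 0 -> 1 -> ... -> n-1 -> 0.
    static int elect(boolean[] alive, int initiator) {
        if (!alive[initiator]) {
            throw new IllegalArgumentException("initiator must be alive");
        }
        int n = alive.length;
        List<Integer> message = new ArrayList<>();
        message.add(initiator);
        int current = (initiator + 1) % n;
        while (current != initiator) {
            if (alive[current]) {
                message.add(current); // append own id and forward the message
            }
            current = (current + 1) % n; // a crashed process is simply skipped
        }
        return Collections.max(message); // highest id seen becomes coordinator
    }

    public static void main(String[] args) {
        boolean[] alive = {true, false, true, true, false};
        System.out.println("Coordinator: " + elect(alive, 2));
    }
}
```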


Writing space of the Problem:(For Student’s use only)


Post-Lab:

1. What is Apache Zookeeper?

2. What is the role of Apache Zookeeper in Distributed Systems?

3. What are the functionalities of Apache Zookeeper?


4. Draw the architecture of Apache Zookeeper?

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


LAB-8: IMPLEMENTATION OF MUTUAL EXCLUSION - BAKERY ALGORITHM AND TEST & SET REGISTER

Date of the Session:___/___/___ Time of the Session:_____to______

Pre-Lab:

1. Is mutual exclusion possible in a distributed system? Explain why.

2. What is the difference between mutual exclusion in distributed systems and mutual
exclusion in a single-computer system?

3. What are the requirements for mutual exclusion algorithms?

4. What is the difference between mutual exclusion and synchronization?


5. How is a race condition prevented in mutual exclusion in distributed systems?

6. What are the powerful operations that are supported by shared variables?

7. Advantages and disadvantages of centralized algorithm


8. Compare the three mutual exclusion algorithms and explain how they differ from one
another.

9. Explain how mutual exclusion creates problems in distributed systems.

10. What is Starvation?


In-Lab:
1) Implement the Bakery algorithm for mutual exclusion among n processors.
Consider processors wishing to enter the critical section as customers in a bakery. Each
customer arriving at the bakery gets a number, and the one with the smallest number is the
next to be served. The number of a customer who is not standing in line is set to 0 (which
does not count as the smallest ticket).
Using the above description, implement the Bakery algorithm in Java.

Input Format
The required input file can be downloaded from the GitHub repository.
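
The ticket scheme described above can be sketched as a lock class. This is one possible rendering, not a reference solution: the names BakeryLock, lock, unlock, and demo are hypothetical, and AtomicIntegerArray is used only so that reads and writes of the shared arrays are visible across threads. The demo method lets several threads increment a shared counter inside the critical section.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical class name; a sketch of Lamport's bakery algorithm for n threads.
class BakeryLock {
    private final int n;
    private final AtomicIntegerArray choosing; // 1 while thread i is picking a ticket
    private final AtomicIntegerArray ticket;   // 0 means "not standing in line"

    BakeryLock(int n) {
        this.n = n;
        this.choosing = new AtomicIntegerArray(n);
        this.ticket = new AtomicIntegerArray(n);
    }

    void lock(int i) {
        choosing.set(i, 1);
        int max = 0;
        for (int j = 0; j < n; j++) {
            max = Math.max(max, ticket.get(j));
        }
        ticket.set(i, max + 1); // take the next number
        choosing.set(i, 0);
        for (int j = 0; j < n; j++) {
            if (j == i) continue;
            while (choosing.get(j) == 1) Thread.yield(); // wait until j has its ticket
            while (true) {
                int tj = ticket.get(j);
                int ti = ticket.get(i);
                // proceed if j is not in line, or j is behind us (ties broken by id)
                if (tj == 0 || tj > ti || (tj == ti && j > i)) break;
                Thread.yield();
            }
        }
    }

    void unlock(int i) {
        ticket.set(i, 0); // leave the line
    }

    // Runs 'threads' workers, each incrementing a shared counter 'iterations' times.
    static int demo(int threads, int iterations) {
        BakeryLock lock = new BakeryLock(threads);
        int[] counter = new int[1];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int k = 0; k < iterations; k++) {
                    lock.lock(id);
                    counter[0]++; // critical section
                    lock.unlock(id);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("Final counter: " + demo(4, 500));
    }
}
```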


Writing space of the Problem:(For Student’s use only)


Post-Lab:
1. What are the problems for Centralized, Distributed and Token Ring Algorithm?

2. What are the finite properties of Critical Section Execution or reliable delivery of
messages?

3. What is the Time complexity, Delay before entry (in message times) in Centralized,
Distributed and Token Ring Algorithm?

4. Explain each algorithm in mutual exclusion with a simple real-time example?


5. What are the characteristics of mutual exclusion using centralized approach?

6. What is mutual exclusion with busy waiting?

7. What is the difference between synchronization and mutual exclusion?

8. Mutual exclusion means "not sharing a resource", but what if there is more than one
instance of a resource and different processes use these instances concurrently? Does
mutual exclusion still hold with respect to this resource?


Writing space of the Problem:(For Student’s use only)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


LAB-9: IMPLEMENTATION OF MUTUAL EXCLUSION - LAMPORT’S ALGORITHM AND RICART-AGRAWALA ALGORITHM IN JAVA.

Date of the Session: ___/___/___ Time of the Session: _____to______

Pre-Lab:
1. What is the difference between Mutual Exclusion and Hold and Wait?

2. What is meant by Lamport’s timestamp in mutual exclusion?

3. What is the difference between Lamport’s algorithm and the Ricart-Agrawala algorithm?

4. How can spooling help in preventing mutual exclusion?


5. What are the different non-token-based algorithms?


In-Lab:
1. Implement Lamport’s algorithm in Java using queues.
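
The role of the queue can be illustrated on its own. In Lamport's algorithm every site keeps a request queue ordered by (timestamp, site id), and a site may enter the critical section only when its own request is at the head. The sketch below (hypothetical names LamportQueueSim, Request, and serviceOrder) shows only that ordering; the REQUEST, REPLY, and RELEASE message exchange is deliberately left out, so it is a starting point and not a full implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical class name; shows only the request-queue ordering, not the messaging.
class LamportQueueSim {
    static final class Request implements Comparable<Request> {
        final int timestamp;
        final int siteId;

        Request(int timestamp, int siteId) {
            this.timestamp = timestamp;
            this.siteId = siteId;
        }

        @Override
        public int compareTo(Request other) {
            // Smaller timestamp first; equal timestamps are ordered by site id.
            if (timestamp != other.timestamp) {
                return Integer.compare(timestamp, other.timestamp);
            }
            return Integer.compare(siteId, other.siteId);
        }
    }

    // Each request is {timestamp, siteId}; returns the order in which sites are served.
    static List<Integer> serviceOrder(int[][] requests) {
        PriorityQueue<Request> queue = new PriorityQueue<>();
        for (int[] r : requests) {
            queue.add(new Request(r[0], r[1]));
        }
        List<Integer> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().siteId); // head enters; its release removes it
        }
        return order;
    }

    public static void main(String[] args) {
        int[][] requests = {{3, 1}, {1, 2}, {3, 0}};
        System.out.println("Service order: " + serviceOrder(requests));
    }
}
```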


Writing space of the Problem:(For Student’s use only)


2. Two sites in the distributed system have requested access to the critical section.
Implement the Ricart-Agrawala algorithm in Java using timestamps.
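
For two sites that request concurrently, the decision rule is the heart of the algorithm: a requesting site defers its REPLY to an incoming REQUEST exactly when its own request has priority, meaning a smaller timestamp or an equal timestamp and a smaller site id. The sketch below (hypothetical names RicartAgrawalaSim, hasPriority, defersReply, and entryOrder) applies that rule without any real messaging.

```java
import java.util.Arrays;

// Hypothetical class name; models the reply/defer decision for two requesting sites.
class RicartAgrawalaSim {
    // True when request (ts1, id1) has priority over request (ts2, id2).
    static boolean hasPriority(int ts1, int id1, int ts2, int id2) {
        return ts1 < ts2 || (ts1 == ts2 && id1 < id2);
    }

    // A site that is itself requesting defers its REPLY when its own request has priority.
    static boolean defersReply(int ownTs, int ownId, int reqTs, int reqId) {
        return hasPriority(ownTs, ownId, reqTs, reqId);
    }

    // Order in which two concurrently requesting sites A and B enter the critical section.
    static int[] entryOrder(int tsA, int idA, int tsB, int idB) {
        // If A defers its reply to B, B cannot enter until A has left, so A goes first.
        return defersReply(tsA, idA, tsB, idB)
                ? new int[]{idA, idB}
                : new int[]{idB, idA};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(entryOrder(5, 1, 3, 2)));
    }
}
```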


Writing space of the Problem:(For Student’s use only)


Post-Lab:

1. What is Maekawa’s algorithm? Mention its principles.


Writing space of the Problem:(For Student’s use only)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


LAB-10: IMPLEMENTATION OF JAVA PROGRAM ON SYNCHRONIZATION.

Date of the Session: ___/___/___ Time of the Session: _____to______

Pre-Lab:
1) What is the purpose of Synchronization in Distributed Systems?

2) Distinguish between external and internal synchronization.

3) What are the algorithms used for clock synchronization?

4) In what aspects does synchronization take place?


5) How is the procedure-call mode similar to Cristian’s algorithm?


In-Lab:
1. Implement Cristian's algorithm for clock synchronization.
Constraints
Write two Java programs, one for server-side clock synchronization and the other for
client-side clock synchronization.
Run the server first, then the client.
Output:
Server
Enter the port no
2222
Server Started
Exiting
Client
Enter the port no
2222
Enter the host name
localhost
Client Started
Enter Exit to stop
Client Time: ----------
Server Time: ----------
Client Time after reset: -------
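
The adjustment the client applies once the server's reply arrives can be sketched separately from the socket code. In Cristian's algorithm the client estimates the one-way delay as half of the measured round-trip time and sets its clock to the server time plus that estimate. The names CristianSync and synchronizedTime are hypothetical, and all times are assumed to be in milliseconds.

```java
// Hypothetical class name; covers only the clock adjustment, not the client/server sockets.
class CristianSync {
    // t0: client clock when the request was sent
    // t1: client clock when the reply arrived
    // serverTime: the time carried in the server's reply
    static long synchronizedTime(long t0, long t1, long serverTime) {
        long roundTrip = t1 - t0;
        return serverTime + roundTrip / 2; // assume the reply took half the round trip
    }

    public static void main(String[] args) {
        long t0 = 1000;
        long t1 = 1020;
        long serverTime = 5000;
        System.out.println("Client Time after reset: " + synchronizedTime(t0, t1, serverTime));
    }
}
```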


Writing space of the Problem:(For Student’s use only)


Post-Lab:
1) Consider 3 processes with 5 events each, where each process's clock starts at zero.
Implement a Java program that calculates the timestamp of each event using Lamport's
algorithm and displays the result as a graph.
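
The two update rules of a Lamport clock are enough to timestamp any event sequence: a local or send event increments the clock, and a receive event sets it to one more than the larger of the local clock and the timestamp carried by the message. The sketch below (hypothetical names LamportClockSim, tick, and receive) shows the rules on a tiny scenario and leaves the graph display out.

```java
// Hypothetical class name; shows only the clock update rules, not the graph display.
class LamportClockSim {
    // Local event or send event: advance the clock by one.
    static int tick(int clock) {
        return clock + 1;
    }

    // Receive event: jump past both the local clock and the message timestamp.
    static int receive(int clock, int messageTimestamp) {
        return Math.max(clock, messageTimestamp) + 1;
    }

    public static void main(String[] args) {
        int p0 = 0;
        int p1 = 0;
        p0 = tick(p0);              // P0: local event
        p0 = tick(p0);              // P0: sends a message stamped with its clock
        int message = p0;
        p1 = tick(p1);              // P1: local event
        p1 = receive(p1, message);  // P1: receives the message from P0
        System.out.println("P0 clock = " + p0 + ", P1 clock = " + p1);
    }
}
```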


Writing space of the Problem:(For Student’s use only)


2) A client’s clock reads 5:40:00 and the server’s clock reads 5:30:00. They synchronize
using Cristian’s algorithm; assume message delays are negligible. What is the time at the
client after the synchronization?

3) A client’s clock reads 1:30:00 and the server’s clock reads 1:20:00. They synchronize
using Berkeley’s algorithm; assume message delays are negligible. What is the time at the
client after the synchronization?
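
The averaging step of Berkeley's algorithm can be sketched as follows. This is a minimal sketch: the clock values are hypothetical and deliberately differ from the question above, message delays are ignored, and the class name `BerkeleySketch` and method `offsets` are illustrative. The full algorithm also compensates for round-trip delay and may discard outlying clocks, which is omitted here.

```java
import java.util.Arrays;

// Hypothetical sketch of the averaging step in Berkeley's algorithm.
class BerkeleySketch {

    // clocks: current readings in seconds, with the master at index 0.
    // Returns the correction each node should add to its own clock so
    // that all of them agree on the (integer) average.
    static long[] offsets(long[] clocks) {
        long sum = 0;
        for (long c : clocks) {
            sum += c;
        }
        long average = sum / clocks.length;   // integer division
        long[] result = new long[clocks.length];
        for (int i = 0; i < clocks.length; i++) {
            result[i] = average - clocks[i];
        }
        return result;
    }

    public static void main(String[] args) {
        long[] clocks = {3600, 3660, 3570};   // illustrative values only
        System.out.println(Arrays.toString(offsets(clocks)));
    }
}
```

Unlike Cristian's algorithm, the master's own clock is also corrected, because it takes part in the average.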


Writing space of the Problem:(For Student’s use only)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


LAB-11: IMPLEMENTATION OF JAVA PROGRAM ON VECTOR CLOCKS.

Date of the Session: ___/___/___ Time of the Session: _____to______

Pre-lab:
1. How do vector clocks work?

2. Compare vector clocks and Lamport’s logical clocks.

3. Are the following pairs of timestamps concurrent or causally related?


(2,3,4) and (1,4,4)
(1,2,1) and (0,1,1)
(3,4,1) and (0,0,2)


In-Lab:
1) Implement a program that generates a partial ordering of events in a distributed system
and detects causality violations.
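
The comparison at the heart of this task can be sketched as follows: given two vector timestamps, decide whether one happened before the other or whether they are concurrent. This is a minimal sketch that covers only the comparison step, assumes both vectors have the same length, and uses hypothetical names (`VectorClockSketch`, `Order`, `compare`) and illustrative timestamps.

```java
// Hypothetical sketch: ordering two vector timestamps of equal length.
class VectorClockSketch {
    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    static Order compare(int[] a, int[] b) {
        boolean aSmaller = false;   // some component of a is less than b's
        boolean bSmaller = false;   // some component of b is less than a's
        for (int i = 0; i < a.length; i++) {
            if (a[i] < b[i]) aSmaller = true;
            if (a[i] > b[i]) bSmaller = true;
        }
        if (aSmaller && bSmaller) return Order.CONCURRENT;
        if (aSmaller) return Order.BEFORE;   // a happened before b
        if (bSmaller) return Order.AFTER;    // b happened before a
        return Order.EQUAL;
    }

    public static void main(String[] args) {
        // Illustrative timestamps only.
        System.out.println(compare(new int[]{1, 0, 0}, new int[]{2, 1, 0}));
        System.out.println(compare(new int[]{2, 0, 0}, new int[]{0, 0, 1}));
    }
}
```

A causality check could build on this by flagging any message delivered after another message whose timestamp compares as AFTER it.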


Writing space of the Problem:(For Student’s use only)


Post-Lab:
1) Why is the Network Time Protocol important?

2) How does the Network Time Protocol work?

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


LAB-12: IMPLEMENTATION OF JAVA PROGRAM FOR THE FAULT FREE SYSTEM.

Date of the Session: ___/___/___ Time of the Session: _____to______

Pre-Lab:
1) Describe the kinds of problems that occur in a distributed system. Also mention the
types of failures and the types of faults.

2) What is fault tolerance? Mention two main reasons for the occurrence of a fault.

3) How do we achieve fault tolerance? Describe in brief.


4) Mention and describe different fault tolerance techniques.

5) Mention the phases of fault tolerance.


In-Lab:
1) Under the Byzantine failure model, develop a Java program in which a fault-free system
runs a terminating program that sends out the finite sequence of integers
0,1,2,0,1,2,0,1,2. Add a fault statement to the program (for example, inside a loop) under
which the system changes the 2s to 9s.

Sample output: 0 1 2 0 1 2 0 1 2 0 1 9 0 1 9 0 1 9
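
One possible way to produce the sample output is sketched below: the same loop is run once without the fault and once with it. This is a minimal sketch, not the expected solution. The class name `FaultSequenceSketch`, the method `emit`, and the use of a boolean flag to switch the fault on are assumptions.

```java
// Hypothetical sketch: a sender that emits 0,1,2 repeatedly, with an
// optional fault action that corrupts every 2 into a 9.
class FaultSequenceSketch {

    // count: how many integers to emit
    // faulty: whether the fault action is active
    static String emit(int count, boolean faulty) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < count; i++) {
            int value = i % 3;
            if (faulty && value == 2) {
                value = 9;   // fault action: 2 is changed to 9
            }
            if (i > 0) {
                out.append(' ');
            }
            out.append(value);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Fault-free run followed by the faulty run.
        System.out.println(emit(9, false) + " " + emit(9, true));
    }
}
```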


Writing space of the Problem:(For Student’s use only)


2) Given the pseudo code for a soda machine:

program soda machine
define balance: integer

do
    if rupee --> balance := balance - 25;
    else if dollars --> balance := balance - 10;
    else if euros --> balance := balance - 5;
    else if balance <= 0 --> dispense a can of soda;
                             balance := 50;
end

Due to a malfunction, the following fault action is activated in the soda machine:

F:
do
    balance = 5 --> balance := 50
end

Modify the program for the soda machine such that it tolerates this specific failure
and customers are not affected at all.


Writing space of the Problem:(For Student’s use only)


Post-Lab:
1) What is the importance of implementing a fault-tolerant system? Are there any
limitations?

2) What do you mean by fault coverage? Compute the fault coverage of a system where 7
faults remained undetected out of a total of 23 faults in the system.

3) What is meant by a self-stabilizing protocol? Mention its significance in practical problems.


Writing space of the Problem:(For Student’s use only)

(For Evaluator’s use only)

Comment of the Evaluator (if Any) Evaluator’s Observation


Marks Secured: _______ out of ________

Full Name of the Evaluator:

Signature of the Evaluator Date of Evaluation:


LAB-13: IMPLEMENTATION OF BASIC QUERIES IN HIVE.

Date of the Session: ___/___/___ Time of the Session:_____to______

Prerequisite:
• SQL
• Hadoop

Pre-lab:
1) What is Hive?

2) Why does Hive need Hadoop? Justify your answer.

3) How is HiveQL different from SQL?

4) Through which operations does Hive interact with the Hadoop framework?


5) List all Hive data types.

6) Mention the applications that are supported by Hive.


In-Lab:
To complete this lab, you will need the following:

• A Hadoop environment, preferably installed in the Linux Subsystem.

Lab Instructions
• Create files/directories wherever required to execute the following commands.
• Make a document containing the screenshots of the output and upload it to the LMS.

Step Action

1 Boot up the Linux Subsystem
• Make sure you choose the Linux Subsystem rather than Windows while booting.
2 User Login
• Log in as user ‘klu’ with the credentials ‘klu123’
3 Terminal
• Open a GNOME Terminal window by searching in the start menu, or use the
shortcut ‘Ctrl+Alt+T’ (in Ubuntu distros)
4 Log in to hadoop
• klu@klu:~$ su - hadoop
• Enter the password ‘hadoop’ to log in
5 Start the Hadoop components (daemons) on the Linux Subsystem
• Start all components with the following command. Please note that it will take a
few minutes to run:
• hadoop@klu:~$ start-all.sh
6 Download the files required for this lab
• hadoop@klu:~$ wget <link>
The link contains a CSV file and a README file (compressed in .tar.gz); the README file
contains the schema and other instructions.
7 Check the contents
• hadoop@klu:~$ head <filename>.csv
8 Create a directory in HDFS
• hadoop@klu:~$ hdfs dfs -mkdir <directory path and name>
9 Copy the CSV to HDFS
• hadoop@klu:~$ hdfs dfs -put <csv file path> <hdfs path>
10 Start the Hive service
• hadoop@klu:~$ hive
11 Create a database in Hive named with your name_KL ID
• hive> CREATE DATABASE <name_KL ID>;
• Ex: For the name John Doe and ID no. 1876
hive> CREATE DATABASE John_1876;
12 Select that database
• hive> use John_1876;
13 Create a table in Hive
• hive> CREATE TABLE <table name> (<schema>) row format delimited fields
terminated by ',';
14 Load the data from HDFS into the table
• hive> load data inpath '<hdfs csv filepath>' into table <table name>;
15 Verify the data
• hive> select * from <table name>;
16 Exit from the CLI
• hive> quit;

You are working for a government agency as a Data Scientist. You have been supplied with a dataset
of consumer complaints received by financial institutions in 2013-15.

Your task is to upload the data into a database (which was done in the previous steps) and perform
some preliminary analysis.

Write the queries and perform the operations in the Hive CLI (Command Line Interface).

1. List all the databases in hive

2. List all the tables in that database

3. Count number of rows in the table


4. Display all the columns and datatypes

5. Insert a new entry in the table

6. Display distinct records in the table

7. Find out how many complaints were received and sent on the same day

8. Extract the complaints received in the state of New York


9. Extract the complaints received in the states of New York and California

10. Extract all rows with the word “Credit” in the Product field

11. Extract all rows with the word “Late” in the Issue field


Post-Lab:

1. Draw the hive 2 architecture and explain each component.

2. By default, which execution engine is used by hive? What are the other execution
engines available?


LAB-14: IMPLEMENTATION OF ADVANCED QUERIES IN HIVE

Date of the Session: ___/___/___ Time of the Session: _____to______

Prerequisite:
• SQL
• Hive
• Hadoop

Pre-lab:
You are an analytics consultant helping a console games company conduct market research.
Download the CSV files from the GitHub Repo.
ConsoleGames.csv - a historic list of all console games released between 1980 and 2015
ConsoleDates.csv - a historic list of all console platforms (such as Wii, PlayStation, Xbox) and
information about them
You have been tasked with uploading the dataset into a Hive database and performing the following
analytics:
1. Calculate what % of Global Sales were made in North America

2. Extract a view of the console game titles ordered by platform name in Ascending order
and Year of release in descending order


3. For each game title extract the first four letters of the publisher's name

4. Display all console platforms which were released either just before Black Friday or just
before Christmas (in any year)

5. Order the platforms by their longevity in ascending order (i.e. the platform which was
available for the longest at the bottom)


In-Lab:
Lab Instructions
• Create files/directories wherever required to execute the following commands.
• Make a document containing the screenshots of the output and upload it to the LMS.

Step Action

1 Boot up the Linux Subsystem
• Make sure you choose the Linux Subsystem rather than Windows while booting.
2 User Login
• Log in as user ‘klu’ with the credentials ‘klu123’
3 Terminal
• Open a GNOME Terminal window by searching in the start menu, or use the
shortcut ‘Ctrl+Alt+T’ (in Ubuntu distros)
4 Log in to hadoop
• klu@klu:~$ su - hadoop
• Enter the password ‘hadoop’ to log in
5 Start the Hadoop components (daemons) on the Linux Subsystem
• Start all components with the following command. Please note that it will take a
few minutes to run:
• hadoop@klu:~$ start-all.sh
6 Download the files required for this lab
• hadoop@klu:~$ wget <link>
The link contains a CSV file and a README file (compressed in .tar.gz); the README file
contains the schema and other instructions.
7 Check the contents
• hadoop@klu:~$ head <filename>.csv
8 Create a directory in HDFS
• hadoop@klu:~$ hdfs dfs -mkdir <directory path and name>
9 Copy the CSV to HDFS
• hadoop@klu:~$ hdfs dfs -put <csv file path> <hdfs path>
10 Start the Hive service
• hadoop@klu:~$ hive
11 Create a database in Hive named with your name_KL ID
• hive> CREATE DATABASE <name_KL ID>;
• Ex: For the name John Doe and ID no. 1876
hive> CREATE DATABASE John_1876;
12 Select that database
• hive> use John_1876;
13 Create a table in Hive
• hive> CREATE TABLE <table name> (<schema>) row format delimited fields
terminated by ',';
14 Load the data from HDFS into the table
• hive> load data inpath '<hdfs csv file path>' into table <table name>;
15 Verify the data
• hive> select * from <table name>;
16 Exit from the CLI
• hive> quit;

You are a Data Analyst helping a veterinary clinic make sense of its data. The data is
dispersed across multiple CSV files, and the clinic needs you to first upload all of them to a
database and then perform the following analytics:
1. Extract information on pets’ names and owner names side-by-side

2. Find out which pets from this clinic had procedures performed

3. Match up all procedures performed to their descriptions


4. Same as above, but only for pets from the clinic in question

5. Extract a table of individual costs (procedure prices) incurred by owners of pets from the
clinic in question (this table should have owner and procedure price side-by-side)


Post-lab:
Download the required files from the GitHub Repo.
You have been tasked with uploading the dataset into a Hive database and performing the following
analytics.
From the dataset given in the GitHub Repo, write a query for each of the following.
1. Identify which numbers both made calls and sent SMS messages, where the number of calls
made is more than 10 and the number of messages sent is more than 5.

2. What is the average length of an SMS conversation? A length of 2 means that 2 SMS
messages were exchanged between the two parties.

3. What is the nth longest SMS conversation length?


4. What is the frequency of 'SMS', 'calls' by each day of week? What about by hour of day?

5. Find which users sent at least 10 more messages in the month of Jan than in the month of
Feb.


LAB-15: IMPLEMENTATION OF BASIC QUERIES IN H-BASE

Date of the Session: ___/___/___ Time of the Session: _____to______

Prerequisite:
• NoSQL
• Hadoop

Pre-lab:
1. What is HBase?

2. What are the features of HBase?

3. Explain the components of the HBase architecture.


4. Where can we use HBase?

5. Name three disadvantages HBase has compared to an RDBMS.


In-Lab:
To complete this lab, you will need the following:

• A Hadoop environment, preferably installed in the Linux Subsystem.
• HBase installed and configured.

Lab Instructions
• Create files/directories wherever required to execute the following commands.
• Make a document containing the screenshots of the output and upload it to the LMS.

Step Action

1 Boot up the Linux Subsystem
• Make sure you choose the Linux Subsystem rather than Windows while booting.

2 User Login
• Log in as user ‘klu’ with the credentials ‘klu123’

3 Terminal
• Open a GNOME Terminal window by searching in the start menu, or use the
shortcut ‘Ctrl+Alt+T’ (in Ubuntu distros)

4 Log in to hadoop
• klu@klu:~$ su - hadoop
• Enter the password ‘hadoop’ to log in

5 Start the Hadoop components (daemons) on the Linux Subsystem
• Start all components with the following command. Please note that it will take a
few minutes to run:
• hadoop@klu:~$ start-all.sh

6 Download the files required for this lab
• hadoop@klu:~$ wget <link>
The link contains a CSV file and a README file (compressed in .tar.gz); the README file
contains the schema and other instructions.

7 Start HBase
• hadoop@klu:~$ hbase shell

8 Check the status of the HBase server
• hbase> status


Walmart stores all its customer, purchase, and stock data in a single-node database. Analytics
on that data takes a long time and there is a high risk of losing data. So, the CTO of Walmart
implemented HBase, a NoSQL database built on top of the Hadoop ecosystem.

Write queries for the following operations:

1. Create a relation consisting of customer name, id, bill

2. Insert customer data into the relation (minimum 10 fields)

3. Retrieve the data of the first row using the get command.


4. Display all the data stored in the database.

5. Create a relation for the list of products that are available in the store. (Hint: a relation
consisting of product name, cost, no-of-units)

6. Get the total number of products available in the store.

7. Use the disable and enable statements on the created tables.


Post-Lab:
1. What is the difference between Hive and HBase?

2. What are the data manipulation commands of HBase?

3. Which command is used to show the current HBase user?

4. What filters are available in Apache HBase?
