SCHOOL OF INFORMATION AND
COMMUNICATION TECHNOLOGY
BIG DATA ANALYTICS LAB
AI 381
NAME- ANUSHKA SRIVASTAVA
ROLL NO- 215/UAI/031
BRANCH- B.TECH AI
SEM-5th
INDEX
S.No Program Date Signature
1. Installation of VMWare to set up the Hadoop environment and its ecosystems.
2. Perform setting up and installing Hadoop in its three operating modes: i. Standalone, ii. Pseudo-distributed, iii. Fully distributed.
3. Use web-based tools to monitor your Hadoop setup.
4. Implementing the basic commands of the LINUX Operating System - File/Directory creation, deletion, and update operations.
5. Implement the following file management tasks in Hadoop: i. Adding files and directories, ii. Retrieving files, iii. Deleting files.
6. Run a basic word count MapReduce program to understand the MapReduce paradigm.
7. Write a MapReduce program that mines weather data.
8. Matrix multiplication with Hadoop MapReduce.
1. Installation of VMWare to set up the Hadoop environment and its ecosystems.
Steps-
Step 1: Install VMware Player before downloading the Hadoop image.
Step 2: Download the "Cloudera Setup File" from the Cloudera download page and extract the zipped file onto your hard drive. Scroll down and select Accept.
Step 3: Start VMware Player and click "Open a Virtual Machine". Browse to the extracted folder.
Login credentials: Machine login credentials are: Username - admin, Password - admin
Cloudera Manager credentials are: Username - admin, Password - admin
Step 4: Checking your Hadoop cluster
● Type sudo jps to see if all nodes are running (if you see an error like the one in the screenshot below, wait for some time and then try again; your daemons have not started yet).
● Type sudo su hdfs
● Execute your command, e.g. hadoop dfs -ls /
Screenshot
2. Perform setting up and installing Hadoop in its three operating modes: 1. Standalone, 2. Pseudo-distributed, 3. Fully distributed.
1) Standalone-
ALGORITHM
● Command for installing ssh is "sudo apt-get install ssh".
● Command for key generation is ssh-keygen -t rsa -P "".
● Store the key into rsa.pub by using the command cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
● Extract Java by using the command tar xvfz jdk-8u60-linux-i586.tar.gz
● Extract Eclipse by using the command tar xvfz eclipse-jee-mars-R-linux-gtk.tar.gz
● Extract Hadoop by using the command tar xvfz hadoop-2.7.1.tar.gz
2) Pseudo-distributed-
ALGORITHM
● To install pseudo-distributed mode, we need to configure the Hadoop configuration files residing in the directory /home/lendi/hadoop-2.7.1/etc/hadoop.
● First configure the hadoop-env.sh file by changing the Java path.
● Configure core-site.xml, which contains a property tag holding a name and a value: set the name to fs.defaultFS and the value to hdfs://localhost:9000 (see the snippet after this list).
● Configure hdfs-site.xml.
● Configure yarn-site.xml.
● Configure mapred-site.xml; before configuring it, copy mapred-site.xml.template to mapred-site.xml.
● Now format the NameNode by using the command hdfs namenode -format.
● Type the commands start-dfs.sh and start-yarn.sh to start the daemons (NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager).
● Run jps, which lists all running daemons. Create a directory in Hadoop by using the command hdfs dfs -mkdir /csedir, enter some data into lendi.txt using the command nano lendi.txt, copy it from the local directory into Hadoop using the command hdfs dfs -copyFromLocal lendi.txt /csedir/, and run the sample wordcount jar file to check whether pseudo-distributed mode is working or not.
● Display the contents of the file by using the command hdfs dfs -cat /newdir/part-r-00000
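For reference, the core-site.xml property named in the list above sits in the file like this (only the property block inside the existing configuration element is added):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>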
3) Fully distributed-
ALGORITHM
● Stop all single-node clusters: $ stop-all.sh
● Decide one host as the NameNode (Master) and the remaining ones as DataNodes (Slaves).
● Copy the public key to all three hosts to get passwordless SSH access: $ ssh-copy-id -i $HOME/.ssh/id_rsa.pub lendi@l5sys24
● Configure all configuration files to name the Master and Slave nodes: $ cd $HADOOP_HOME/etc/hadoop, $ nano core-site.xml, $ nano hdfs-site.xml
● Add the hostnames to the file slaves and save it: $ nano slaves
● Configure yarn-site.xml: $ nano yarn-site.xml
● On the Master node, format the NameNode and start the daemons: $ hdfs namenode -format, $ start-dfs.sh, $ start-yarn.sh
● The daemons now start on the Master and Slave nodes.
3. Use web-based tools to monitor your Hadoop setup.
Introduction-
A Hadoop setup can be managed by different web-based tools, which make it easy for the user to identify the running daemons. A few of the tools used in the real world are-
● Apache Ambari
● Hortonworks
● Apache Spark
4. Implementing the basic commands of the LINUX Operating System - File/Directory creation, deletion, and update operations.
File Operations-
● Creating a File:
touch filename.txt
● Editing a File:
nano filename.txt
● Renaming or Moving a File (update):
mv filename.txt newname.txt
● Deleting a File:
rm filename.txt
Directory Operations-
● Creating a Directory:
mkdir directoryname
● Changing Directory:
cd directoryname
● Deleting a Directory (must be empty; use rm -r for non-empty directories):
rmdir directoryname
5. Implement the following file management tasks in Hadoop: 1. Adding files and directories, 2. Retrieving files, 3. Deleting files.
1) Adding files and directories-
Before we run Hadoop programs on data stored in HDFS, we'll need to put the data into HDFS first by creating a directory and putting a file in it. HDFS has a default working directory of /user/$USER, where $USER is our login user name. This directory isn't automatically created for us, though, so we create it with the mkdir command. For the purpose of illustration, we use chuck; substitute your own user name in the example commands.
hadoop fs -mkdir /user/chuck
hadoop fs -put example.txt /user/chuck
2) Retrieving files-
The Hadoop command get copies files from HDFS back to the local filesystem, while cat displays a file's contents. To retrieve example.txt, we can run the following commands:
hadoop fs -get example.txt .
hadoop fs -cat example.txt
3) Deleting files-
hadoop fs -rm example.txt
The command for creating a directory in HDFS is
"hdfs dfs -mkdir /lendicse".
Adding a directory is done through the command
"hdfs dfs -put lendi_english/ /lendicse".
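These shell commands have programmatic equivalents in the HDFS Java API. The following is a minimal sketch of the three tasks, assuming a reachable cluster; the class name and paths are illustrative, not part of the original lab:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileTasks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);       // handle to HDFS (fs.defaultFS)

        // 1. Adding files and directories
        fs.mkdirs(new Path("/user/chuck"));
        fs.copyFromLocalFile(new Path("example.txt"), new Path("/user/chuck/example.txt"));

        // 2. Retrieving files (HDFS -> local filesystem)
        fs.copyToLocalFile(new Path("/user/chuck/example.txt"), new Path("example.copy.txt"));

        // 3. Deleting files (second argument: delete recursively)
        fs.delete(new Path("/user/chuck/example.txt"), false);

        fs.close();
    }
}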
OUTPUT-
6. Run a basic word count MapReduce program to understand the MapReduce paradigm.
Prerequisites-
● Java Installation - Check whether Java is installed using the following command: java -version
● Hadoop Installation - Check whether Hadoop is installed using the following command: hadoop version
Steps-
Step-1 Write a Mapper
● A Mapper overrides the "map" function of the class "org.apache.hadoop.mapreduce.Mapper", which provides <key,value> pairs as the input. A Mapper implementation may output <key,value> pairs using the provided Context.
● The input value of the WordCount map task is a line of text from the input data file, and the key is the line number: <line_number, line_of_text>. The map task outputs <word, one> for each word in the line of text.
Pseudo-code
void Map (key, value){
    for each word x in value:
        output.collect(x, 1);
}
Step-2 Write a Reducer
A Reducer collects the intermediate <key,value> output from multiple map tasks and assembles a single result. Here, the WordCount program sums up the occurrences of each word into pairs of the form <word, occurrence>.
Pseudo-code
void Reduce (keyword, <list of value>){
    for each x in <list of value>:
        sum += x;
    final_output.collect(keyword, sum);
}
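Putting Step-1 and Step-2 together with a driver gives the standard WordCount implementation; the sketch below is essentially the canonical example from the Hadoop documentation (file and path names are illustrative):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: for each line, emit <word, 1> for every word in the line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sum the 1s for each word, emitting <word, occurrence>.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: configure and submit the job.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, the job can be run with, for example, hadoop jar wc.jar WordCount /csedir /newdir; the counts then appear in /newdir/part-r-00000, which can be displayed as in Program 2.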
OUTPUT-
7. Write a MapReduce program that mines weather data.
Steps-
Step-1 Write a Mapper
● A Mapper overrides the "map" function of the class "org.apache.hadoop.mapreduce.Mapper", which provides <key,value> pairs as the input. A Mapper implementation may output <key,value> pairs using the provided Context.
● The input value of the weather-mining map task is a line of text from the weather data file, and the key is the line number: <line_number, line_of_text>. The map task outputs <max_temp, one> and <min_temp, one> for each temperature reading in the line, as the pseudo-code below shows.
Pseudo-code
void Map (key, value){
    for each max_temp x in value:
        output.collect(x, 1);
}
void Map (key, value){
    for each min_temp x in value:
        output.collect(x, 1);
}
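A Java rendering of the first Map above, assuming each input line has the form "<year> <max_temp> <min_temp>" (this record layout and the class name are assumptions for illustration; real archives such as NCDC records need more parsing). Swapping fields[1] for fields[2] gives the min_temp variant:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mirrors the pseudo-code above: emits <max_temp, 1> for each record.
public class MaxTempMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumed line layout: "<year> <max_temp> <min_temp>"
        String[] fields = value.toString().trim().split("\\s+");
        if (fields.length >= 3) {
            context.write(new Text(fields[1]), ONE); // fields[1] = max_temp
        }
    }
}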
Step-2 Write a Reducer
A Reducer collects the intermediate output from multiple map tasks and assembles a single result. Here, the program sums up the occurrences of each temperature reading into pairs of the form <temperature, occurrence>.
Pseudo-code
void Reduce (max_temp, <list of value>){
    for each x in <list of value>:
        sum += x;
    final_output.collect(max_temp, sum);
}
void Reduce (min_temp, <list of value>){
    for each x in <list of value>:
        sum += x;
    final_output.collect(min_temp, sum);
}
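The matching Java reducer, continuing the illustrative sketch above:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the 1s emitted by the mapper, yielding <temperature, occurrence>.
public class MaxTempReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text temp, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(temp, new IntWritable(sum));
    }
}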
Step-3 Write a Driver
The Driver program configures and runs the MapReduce job. We use the main program to perform basic configuration such as:
● Job Name: the name of this job
● Executable (Jar) Class: the main executable class; here, the job's driver class
● Mapper Class: the class which overrides the "map" function; here, Map
● Reducer Class: the class which overrides the "reduce" function; here, Reduce
● Output Key: the type of the output key; here, Text
● Output Value: the type of the output value; here, IntWritable
● File Input Path
● File Output Path
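A driver following this checklist, wired to the illustrative MaxTempMapper and MaxTempReducer sketched above (the class names remain assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WeatherDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "weather mining"); // Job Name
        job.setJarByClass(WeatherDriver.class);      // Executable (Jar) Class
        job.setMapperClass(MaxTempMapper.class);     // Mapper Class
        job.setReducerClass(MaxTempReducer.class);   // Reducer Class
        job.setOutputKeyClass(Text.class);           // Output Key type
        job.setOutputValueClass(IntWritable.class);  // Output Value type
        FileInputFormat.addInputPath(job, new Path(args[0]));    // File Input Path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // File Output Path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}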
OUTPUT-
8. Matrix multiplication with Hadoop MapReduce.
Steps-
Here A is an I x K matrix and B is a K x J matrix; they are processed in blocks of size IB x KB and KB x JB respectively, and R is the number of reducers.

setup():
    var NIB = (I-1)/IB + 1
    var NKB = (K-1)/KB + 1
    var NJB = (J-1)/JB + 1

map(key, value):
    if from matrix A with key = (i,k) and value = a(i,k):
        for 0 <= jb < NJB:
            emit (i/IB, k/KB, jb, 0), (i mod IB, k mod KB, a(i,k))
    if from matrix B with key = (k,j) and value = b(k,j):
        for 0 <= ib < NIB:
            emit (ib, k/KB, j/JB, 1), (k mod KB, j mod JB, b(k,j))

Intermediate keys (ib, kb, jb, m) sort in increasing order, first by ib, then by kb, then by jb, then by m. Note that m = 0 for A data and m = 1 for B data.

The partitioner maps intermediate key (ib, kb, jb, m) to a reducer r as follows:
    r = ((ib*JB + jb)*KB + kb) mod R

These definitions for the sorting order and partitioner guarantee that each reducer R[ib,kb,jb] receives the data it needs for blocks A[ib,kb] and B[kb,jb], with the data for the A block immediately preceding the data for the B block.

var A = new matrix of dimension IB x KB
var B = new matrix of dimension KB x JB
var sib = -1
var skb = -1

reduce(key, valueList):
    if key is (ib, kb, jb, 0):
        sib = ib
        skb = kb
        zero matrix A
        for each value = (i, k, v) in valueList:
            A(i,k) = v
    if key is (ib, kb, jb, 1):
        if ib != sib or kb != skb: return  // A[ib,kb] must be zero!
        zero matrix B
        for each value = (k, j, v) in valueList:
            B(k,j) = v
        ibase = ib*IB
        jbase = jb*JB
        for 0 <= i < row dimension of A:
            for 0 <= j < column dimension of B:
                sum = 0
                for 0 <= k < column dimension of A (= row dimension of B):
                    sum += A(i,k) * B(k,j)
                if sum != 0:
                    emit (ibase+i, jbase+j), sum
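For contrast, the sketch below is a minimal non-blocked Java variant (one reduce call per output cell of C = A x B). It is far less efficient than the blocked algorithm above but shows the same map/shuffle/reduce structure. The input line format ("A,i,k,value" and "B,k,j,value") and the dimensions I, J, K passed through the Configuration are assumptions for illustration:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MatrixMultiply {

    // Mapper: replicate each element to every output cell (i,j) whose dot product needs it.
    public static class MatMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            Configuration conf = context.getConfiguration();
            int I = conf.getInt("I", 0); // rows of A
            int J = conf.getInt("J", 0); // columns of B
            String[] t = value.toString().split(","); // "A,i,k,v" or "B,k,j,v"
            if (t[0].equals("A")) {
                for (int j = 0; j < J; j++) {   // A(i,k) is needed by cells (i, 0..J-1)
                    context.write(new Text(t[1] + "," + j), new Text("A," + t[2] + "," + t[3]));
                }
            } else {
                for (int i = 0; i < I; i++) {   // B(k,j) is needed by cells (0..I-1, j)
                    context.write(new Text(i + "," + t[2]), new Text("B," + t[1] + "," + t[3]));
                }
            }
        }
    }

    // Reducer: key is one output cell (i,j); pair A(i,k) with B(k,j) by k and sum the products.
    public static class MatReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            int K = context.getConfiguration().getInt("K", 0); // columns of A = rows of B
            double[] a = new double[K];
            double[] b = new double[K];
            for (Text val : values) {
                String[] t = val.toString().split(",");
                int k = Integer.parseInt(t[1]);
                if (t[0].equals("A")) a[k] = Double.parseDouble(t[2]);
                else                  b[k] = Double.parseDouble(t[2]);
            }
            double sum = 0;
            for (int k = 0; k < K; k++) sum += a[k] * b[k];
            if (sum != 0) context.write(key, new Text(Double.toString(sum)));
        }
    }
}

A driver sets conf.setInt("I", ...), conf.setInt("J", ...), and conf.setInt("K", ...) before Job.getInstance and otherwise mirrors the WordCount driver in Program 6.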
OUTPUT-