Big Data Hadoop Lab File

The document outlines the installation processes for VirtualBox and Hadoop as part of a Big Data and Hadoop lab course. It details the benefits of virtualization with VirtualBox, including running multiple operating systems and easier software installations, followed by step-by-step instructions for installing both VirtualBox and Hadoop. Additionally, it explains the components of Hadoop and its advantages, such as fault tolerance and cost-effectiveness.

Uploaded by

jayshukla700

BIG DATA AND HADOOP (BTCS-702) 19100BTIT06586

Shri Vaishnav Vidhyapeeth Vishwavidhyalaya

Shri Vaishnav Institute of Information Technology

Department of Information Technology

Year (2022-23)
Subject Name with Code:-
BIG DATA AND HADOOP LAB
(BTCS-702)
4th Year (Semester VII)

SUBMITTED TO:- Prof. Anand Gadwal
SUBMITTED BY:- JAY

PARTH SHARMA 1

Experiment-01
Objective:- Installation process of VirtualBox.
Introduction of VirtualBox:-
VirtualBox is a powerful x86 and AMD64/Intel64 virtualization product for
enterprise as well as home use. Not only is VirtualBox an extremely feature
rich, high performance product for enterprise customers, it is also the only
professional solution that is freely available as Open Source Software under the
terms of the GNU General Public License (GPL) version 2. VirtualBox is a
general-purpose full virtualizer for x86 hardware, targeted at server, desktop and
embedded use.
Why is Virtualization useful:-

The techniques and features that Oracle VM VirtualBox provides are useful in
the following scenarios:-

1. Running multiple operating systems simultaneously:- Oracle VM
VirtualBox enables you to run more than one OS at a time. This way, you
can run software written for one OS on another, such as Windows
software on Linux or a Mac, without having to reboot to use it. Since you
can configure what kinds of virtual hardware should be presented to each
such OS, you can install an old OS such as DOS or OS/2 even if your real
computer's hardware is no longer supported by that OS.
2. Easier software installations:- Software vendors can use virtual
machines to ship entire software configurations. For example, installing a
complete mail server solution on a real machine can be a tedious task.
With Oracle VM VirtualBox, such a complex setup, often called an
appliance, can be packed into a virtual machine. Installing and running a
mail server becomes as easy as importing such an appliance into Oracle
VM VirtualBox.
3. Testing and disaster recovery:- Once installed, a virtual machine and its
virtual hard disks can be considered a container that can be arbitrarily
frozen, copied, backed up, and transported between hosts. On top of that,
with the use of another Oracle VM VirtualBox feature called snapshots,
one can save a particular state of a virtual machine and revert back to that
state, if necessary. This way, one can freely experiment with a computing
environment. If something goes wrong, such as problems after installing
software or infecting the guest with a virus, you can easily switch back to a
previous snapshot and avoid the need of frequent backups and restores. Any
number of snapshots can be created, allowing you to travel back and forward
in virtual machine time. You can delete snapshots while a VM is running to
reclaim disk space.
4. Infrastructure consolidation:- Virtualization can significantly reduce
hardware and electricity costs. Most of the time, computers today only
use a fraction of their potential power and run with low average system
loads. A lot of hardware resources as well as electricity is thereby wasted.
So, instead of running many such physical computers that are only
partially used, one can pack many virtual machines onto a few powerful
hosts and balance the loads between them.

Steps of Installation of VirtualBox:-

Step 1:- To download VirtualBox, open the following link:-
https://www.virtualbox.org/wiki/Downloads
Now, depending on your OS, select which version to install. In our case, it
will be the first one (Windows host).

Step 2:- Once the option is selected, click on “Next”.

Step 3:- The installer asks where to install the application. We can
leave the default location and click on “Next”.

Step 4:- Once the options are selected as shown in the following screenshot,
click on Next.

Step 5:- A dialog box will come up asking whether to proceed with the
installation. Click “Yes”.

Step 6:- In the next step, click on “Install”.

Step 7:- Tick the start VirtualBox check box and click on “Finish”.

Step 8:- The VirtualBox application will now open as shown in the following
screenshot. Now, we are ready to install the virtual machines.

Creating an Ubuntu virtual machine in VirtualBox on Windows 10:-

Step 1:- Download the ISO file that is required to install Ubuntu in
VirtualBox. Click on the first link.

Step 2:- Then click on the download tab to download the latest version that is
available.

Step 3:- After that, a thank-you page will appear and the Ubuntu ISO file
will be downloaded to our system.

Step 4:- Next, open VirtualBox, click on the “New” button, and provide
the name of the machine. Then click on “Next”.

Step 5:- Then set the memory size and click on “Next”.

Step 6:- Then select the file location and size of the virtual hard disk and click on “Create”.

Step 7:- The new machine, Ubuntu 20.04, will now be visible.

Step 8:- Then select the machine, open Settings, and select the Advanced tab
under General. Set Shared Clipboard and Drag'n'Drop to Bidirectional, which
allows us to share files between our host machine (Windows) and our virtual
machine (Ubuntu).

Step 9:- Then select the ISO file from its location and click on “OK”.

That’s how a virtual machine is created in VirtualBox.
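The same machine can also be created from the command line with VirtualBox's VBoxManage tool. The sketch below is not part of the original lab procedure: it only builds the command lines that roughly mirror Steps 4 to 9 and prints them, without running anything. The machine name, memory size, disk size, and file names are assumed values chosen for illustration, and the exact flags should be checked against the VBoxManage reference for the installed VirtualBox version.

```python
def build_vm_commands(name, memory_mb, disk_path, disk_mb, iso_path):
    """Return VBoxManage command lines that roughly mirror Steps 4 to 9."""
    return [
        # Step 4: create and register a new 64-bit Ubuntu machine.
        ["VBoxManage", "createvm", "--name", name,
         "--ostype", "Ubuntu_64", "--register"],
        # Step 5: set the memory size (in MB).
        ["VBoxManage", "modifyvm", name, "--memory", str(memory_mb)],
        # Step 6: create the virtual hard disk (size in MB) and attach it.
        ["VBoxManage", "createmedium", "disk",
         "--filename", disk_path, "--size", str(disk_mb)],
        ["VBoxManage", "storagectl", name, "--name", "SATA", "--add", "sata"],
        ["VBoxManage", "storageattach", name, "--storagectl", "SATA",
         "--port", "0", "--device", "0", "--type", "hdd",
         "--medium", disk_path],
        # Step 9: attach the Ubuntu ISO as a virtual DVD drive.
        ["VBoxManage", "storagectl", name, "--name", "IDE", "--add", "ide"],
        ["VBoxManage", "storageattach", name, "--storagectl", "IDE",
         "--port", "0", "--device", "0", "--type", "dvddrive",
         "--medium", iso_path],
    ]


# All values below are hypothetical examples, not taken from the lab file.
commands = build_vm_commands(
    name="ubuntu-20.04",
    memory_mb=4096,
    disk_path="ubuntu-20.04.vdi",
    disk_mb=25000,
    iso_path="ubuntu-20.04.iso",
)
for command in commands:
    print(" ".join(command))
```

Printing the commands first makes it easy to review them before pasting them into a command prompt.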

Experiment-02
Objective:- Installation process of Hadoop.
Introduction of Hadoop:-
Hadoop is an open source framework by Apache that is used to efficiently store
and process large datasets ranging in size from gigabytes to petabytes of data.
Instead of using one large computer to store and process the data, Hadoop
allows clustering multiple computers to analyse massive datasets in parallel
more quickly. Hadoop consists of four main modules:

● Hadoop Distributed File System (HDFS) – A distributed file system
that runs on standard or low-end hardware. HDFS provides better
data throughput than traditional file systems, in addition to high fault
tolerance and native support of large datasets.

● Yet Another Resource Negotiator (YARN) – Manages and monitors
cluster nodes and resource usage. It schedules jobs and tasks.

● MapReduce – A framework that helps programs do the parallel
computation on data. The map task takes input data and converts it
into a dataset that can be computed in key value pairs. The output of
the map task is consumed by reduce tasks to aggregate output and
provide the desired result.

● Hadoop Common – Provides common Java libraries that can be used
across all modules.
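To make the map and reduce tasks more concrete, here is a minimal single-machine sketch of a word count in plain Python. It does not use Hadoop's API; the function names and the two sample lines are assumptions made for illustration. It only imitates the three phases: map emits key value pairs, the framework groups them by key, and reduce aggregates each group.

```python
from collections import defaultdict


def map_task(line):
    """Map: turn one line of input into (word, 1) key value pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_task(line)]
    return dict(reduce_task(key, values) for key, values in shuffle(pairs).items())


# Hypothetical sample input.
counts = word_count(["big data and hadoop", "hadoop stores big data"])
print(counts)
```

In a real cluster the map tasks run in parallel on different nodes, and the grouping step moves data across the network. Here everything runs in one process.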

Why is Hadoop useful :-

● Open Source:- Hadoop is open source, which means it is free to use. The
source code is available online for anyone to study or modify as per
industry requirements.
● Fault Tolerance:- Hadoop runs on inexpensive machines that can crash at
any moment. Data is therefore replicated across several DataNodes in the
Hadoop cluster, which keeps the data available even if one of the machines
crashes.
● High Availability:- Because of this replication, if any DataNode goes
down, the same data can be retrieved from another node where the data is
replicated.
● Cost Effective:- Hadoop runs on low-cost commodity hardware.
● Easy to use:- Developers need not worry about distributing the processing
work, since it is managed by Hadoop itself.
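The link between replication and availability can be illustrated with a small simulation. This is a simplified sketch under stated assumptions: the node names and block names are made up, and replicas are placed round-robin, which is not HDFS's actual placement policy. A replication factor of 3 is used because it is the HDFS default.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    placement = {}
    for index, block in enumerate(block_ids):
        placement[block] = [
            nodes[(index + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    ]


# Hypothetical four-node cluster holding three blocks.
nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(["blk_0", "blk_1", "blk_2"], nodes, replication=3)
survivors = readable_blocks(placement, failed_nodes={"node2"})
print(placement)
print(survivors)
```

With three replicas per block, losing a single node leaves every block readable. A block becomes unreadable only when all nodes holding its replicas are down at the same time.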

Steps of Installation of Hadoop:-

Step 1:- Check the Java version by running this command on the command prompt:
java -version
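Later steps in this experiment refer to JDK 1.8, so it helps to know how to read the version string. The sketch below extracts the major Java version from the first line that java -version prints (the command writes to standard error). The sample strings are illustrative, not output captured for this lab.

```python
import re


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_341).
    if first == "1" and second is not None:
        return int(second)
    return int(first)


# Illustrative sample strings.
print(java_major_version('java version "1.8.0_341"'))
print(java_major_version('openjdk version "11.0.19" 2023-04-18'))
```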

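Step 2:- Download Hadoop from this link:-

https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.4/hadoop-3.2.4-src.tar.gz
And extract it to a folder. Note that the archive with the -src suffix contains
the Hadoop source code; the pre-built binary release is the archive named
hadoop-3.2.4.tar.gz in the same directory.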

Step 3:- Set up the system environment variables:-

Open the Control Panel to edit the system environment variables.
Create a new user variable. Put the variable name as HADOOP_HOME and the
variable value as the path of the folder where you extracted Hadoop (the folder
that contains bin, not the bin folder itself).
Similarly, create a new user variable with the variable name JAVA_HOME and the
variable value as the path of the Java installation directory (again the folder
that contains bin).

Add the Hadoop bin directory and the Java bin directory to the Path system
variable: edit Path under system variables, click on New, and add the bin
directory paths of Hadoop and Java.
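After editing Path, a quick check can confirm that both bin directories are being picked up. The sketch below uses Python's standard shutil.which to look for the java and hadoop commands on the search path. The helper name missing_commands is an assumption made for illustration. Open a new command prompt before running it, because existing windows keep the old environment.

```python
import shutil


def missing_commands(commands=("java", "hadoop"), path=None):
    """Return the commands that cannot be found on the given search path.

    With path=None, shutil.which falls back to the PATH environment variable.
    """
    return [
        command for command in commands
        if shutil.which(command, path=path) is None
    ]


missing = missing_commands()
if missing:
    print("Not found on Path:", ", ".join(missing))
else:
    print("java and hadoop are both on Path")
```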

Step 4:- Configurations:-

Edit the following files, located in the etc\hadoop folder of the directory
where we installed Hadoop: core-site.xml, hadoop-env.cmd, hdfs-site.xml,
mapred-site.xml, and yarn-site.xml.

1. Edit the file core-site.xml in the Hadoop directory. Copy this XML property
into the configuration element of the file.
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

2. Edit mapred-site.xml and copy this property into the configuration.

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
3. Create a folder named ‘data’ in the Hadoop directory. Then create a folder
named ‘datanode’ and a folder named ‘namenode’ inside this data directory.

4. Edit the file hdfs-site.xml and add the properties below to the configuration.
Note: The values of dfs.namenode.name.dir and dfs.datanode.data.dir should be
the paths of the namenode and datanode folders you just created.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>C:\Users\hp\Downloads\hadoop-3.1.0\hadoop-3.1.0\data\namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>C:\Users\hp\Downloads\hadoop-3.1.0\hadoop-3.1.0\data\datanode</value>
  </property>
</configuration>
5. Edit the file yarn-site.xml and add the properties below to the configuration.
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
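A malformed tag in any of these files is a common cause of startup failures, so it can be worth checking the edited files before starting Hadoop. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name read_site_properties is an assumption made for illustration, and the sample text simply repeats the core-site.xml property from sub-step 1.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dictionary."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "configuration":
        raise ValueError("root element must be <configuration>")
    properties = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if not name:
            raise ValueError("<property> without a <name>")
        properties[name.strip()] = value.strip()
    return properties


# Sample content repeating the core-site.xml property from sub-step 1.
CORE_SITE = """
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
"""

print(read_site_properties(CORE_SITE))
```

To check a real file, read its contents with open(...).read() and pass the text to the function. A parse error points to a missing or unbalanced tag.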

Step 6:- Edit hadoop-env.cmd and replace %JAVA_HOME% with the path of
the Java folder where JDK 1.8 is installed.

Step 7:- Hadoop needs Windows-specific files (such as winutils.exe) that do not
come with the default download of Hadoop. To include those files, replace the
bin folder in the Hadoop directory with the bin folder provided in this GitHub
repository:
https://github.com/s911415/apache-hadoop-3.1.0-winutils
Download it as a zip file, extract it, and copy its bin folder over the bin
folder in the Hadoop directory.

Step 8:- Check whether Hadoop is successfully installed by running this
command on cmd:-
hadoop version

Since the command does not throw an error and shows the Hadoop version,
Hadoop is successfully installed on the system.
