0961771922 KINJAL BANSAL
Experiment 1
Aim: Install Apache Hadoop.
Theory:
Apache Hadoop is a powerful framework designed for the distributed storage and processing of large
datasets using clusters of commodity hardware. Its integration with Ubuntu provides a stable and efficient
environment for big data processing. Here’s how Hadoop relates to Ubuntu:
1. Installation – Apache Hadoop can be installed on Ubuntu by downloading the official Hadoop
binaries or using package managers like apt. Ubuntu’s robust package management system
simplifies the setup of Hadoop clusters.
2. Compatibility – Hadoop runs smoothly on Ubuntu servers and desktops without major
compatibility issues, allowing users to leverage Ubuntu’s stability and efficiency for big data
processing.
3. Resource Management – Ubuntu offers various tools for managing system resources, which is
crucial when running Hadoop clusters. Proper resource management ensures optimal performance
and efficient utilization of cluster resources.
4. Security – Ubuntu provides strong security features, including firewall configurations, user
permissions, and encryption, which help secure Hadoop clusters and protect stored and processed
data.
5. Maintenance – With its regular updates and long-term support (LTS) releases, Ubuntu ensures
the stability and security of Hadoop clusters over extended periods. Updates can be easily applied
to both Ubuntu and Hadoop components for seamless operation.
6. Community Support – Both Hadoop and Ubuntu have active communities that offer extensive
documentation, troubleshooting resources, and support, making it easier for users to resolve
issues and stay updated with new developments.
By using Ubuntu as the operating system for Hadoop, users can take advantage of its reliability, security,
and ease of maintenance to build scalable and efficient big data processing systems.
Steps:
1. Install Hadoop in the virtual machine.
2. Unzip the Hadoop archive.
3. Set up the environment variables.
4. Download the Java SE Development Kit (JDK).
5. Configure Hadoop.
6. Check the Hadoop version (sample terminal commands for steps 1-6 are sketched below).
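The following is a minimal sketch of the terminal commands behind steps 1-6, assuming a single-node setup on Ubuntu. The Hadoop version (3.3.6), the install path /usr/local/hadoop and the JAVA_HOME path are only illustrative and should be adapted to the actual download.
# Steps 1-2: download and unzip Hadoop (version and URL shown as an example; use the current release from hadoop.apache.org)
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz
sudo mv hadoop-3.3.6 /usr/local/hadoop
# Step 3: environment variables, appended to ~/.bashrc (paths are assumptions)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Step 4: install the JDK (Java 8 shown; Hadoop 3.x also runs on Java 11)
sudo apt update
sudo apt install openjdk-8-jdk
# Step 5: configure Hadoop by editing the files under $HADOOP_HOME/etc/hadoop/
# (core-site.xml, hdfs-site.xml, and JAVA_HOME in hadoop-env.sh)
# Step 6: verify the installation
hadoop version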
Learning Outcome
Experiment 2
Aim: Develop a MapReduce program to calculate the frequency of a given word in a given file.
Theory:
This experiment focuses on developing a MapReduce program using Apache Hadoop to efficiently
compute the frequency of a specified word within a given text file. MapReduce is a programming model
designed for processing large datasets in a distributed manner. By utilizing Hadoop's parallel processing
capabilities, this approach enables efficient text analysis and provides insights into word frequency.
Word frequency analysis is a key task in natural language processing (NLP), involving the determination
of how often each word appears in a document or corpus. Apache Hadoop offers a scalable framework for
distributed computing, making it well-suited for parallel processing tasks like MapReduce. This
experiment employs the MapReduce paradigm to distribute the workload across multiple nodes in a
Hadoop cluster, significantly enhancing the speed and efficiency of text data analysis.
The primary objective of this experiment is to develop and implement a MapReduce program that
calculates the frequency of a given word in a text file. By distributing computations across multiple
nodes, this experiment highlights the scalability and efficiency of Hadoop in handling large-scale text
processing tasks.
Ultimately, the experiment showcases the effectiveness of Apache Hadoop and the MapReduce
framework in processing big data. By leveraging distributed computing, Hadoop enables efficient and
scalable word frequency analysis, making it applicable to various domains requiring large-scale data
processing.
Steps:
Check Hadoop version
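For example, to confirm that the installation from Experiment 1 is on the PATH:
hadoop version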
Create folder named wordcount
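For example (the location is arbitrary; any working directory will do):
mkdir ~/wordcount
cd ~/wordcount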
Install JDK
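On Ubuntu this can be done through apt; Java 8 is shown as an example (Java 11 also works with Hadoop 3.x):
sudo apt install openjdk-8-jdk
javac -version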
Code of the WordCount.java file:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
import java.util.StringTokenizer;

public class WordCount {

    // Mapper: emits (word, 1) for every token in each input line
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts emitted for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // combiner does a local sum on each mapper's output
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input path passed on the command line
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not already exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Compile the Java program
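One way to compile against the Hadoop libraries, assuming WordCount.java sits in the current folder (the classes/ output directory is just a chosen name):
mkdir -p classes
javac -classpath "$(hadoop classpath)" -d classes WordCount.java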
Create the JAR file
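For example (wordcount.jar is an arbitrary name):
jar cf wordcount.jar -C classes .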
Copy the input file to Hadoop's HDFS:
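Assuming HDFS is running and the text to analyse is in a local file named input.txt (both the file name and the HDFS path are assumptions for this example):
hdfs dfs -mkdir -p /wordcount/input
hdfs dfs -put input.txt /wordcount/input/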
Run the MapReduce Job
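A sketch of the submission command, reusing the jar name and HDFS paths assumed above; the output directory must not already exist:
hadoop jar wordcount.jar WordCount /wordcount/input /wordcount/output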
Check the output directory, then retrieve and display the word count:
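For example (part-r-00000 is the standard name of the first reducer's output file):
hdfs dfs -ls /wordcount/output
hdfs dfs -cat /wordcount/output/part-r-00000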
Learning Outcome