Processing Big Data with Hadoop MapReduce Technology
By
Dr. Aditya Bhardwaj
aditya.bhardwaj@bennett.edu.in
Big Data Analytics and Business Intelligence (CSET 371)
Session 1. History and Introduction to Hadoop
1.1.1 Introduction to Hadoop
1.1.2 History of Hadoop
1.1.3 How Does Hadoop Work?
1.1.1 Introduction to Hadoop
• Hadoop is an open-source tool to handle, process, and
analyze Big Data.
• The Hadoop platform provides an improved programming
model that is used to create and run distributed
systems quickly and efficiently.
• It is licensed under the Apache License, so it is also called
Apache Hadoop.
Who Uses Hadoop?
1.1.2 History of Hadoop
• In 2003, Google developed a distributed file system (DFS) called the Google
File System (GFS) to provide efficient, reliable access to data using a large
cluster with a master-slave architecture, where data is stored and processed
on distributed data-node servers.
• Although GFS was a great breakthrough, the file system remained
proprietary.
• In 2005, Doug Cutting, an employee at Yahoo, developed an open-source
implementation of GFS for a search-engine project, and that
open-source project was named Apache Hadoop.
• With open source we have the freedom to run the program, the
freedom to modify the source code, and the freedom to
redistribute exact copies of it.
Keyword ‘Hadoop’ Search on Google
Inventor of Hadoop
• Doug named it Hadoop after his son's toy elephant.
How Big Data Is Used In Amazon Recommendation
Systems
https://youtu.be/S4RL6prqtGQ
While Developing Hadoop, Two Major Concerns for Doug Cutting's Team
How to store large files, from terabytes to petabytes in size,
across different machines, i.e., a storage framework was
required?
How to facilitate the processing of large amounts of data
in structured and unstructured formats, i.e., a parallel-
processing framework was required?
A smaller concern: how to provide fault tolerance in the
Hadoop architecture.
Functional Architecture of Hadoop
• The core components of Hadoop are HDFS (storage) and MapReduce (processing).
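On a running single-node cluster, these components show up as Java daemons. A minimal sketch, assuming a classic Hadoop 1.x setup in which MapReduce runs as JobTracker/TaskTracker (process IDs omitted from the output and annotations added):

  $ jps                  # lists the running Hadoop Java daemons
    NameNode             # HDFS master
    SecondaryNameNode    # checkpoints the NameNode metadata
    DataNode             # HDFS slave, stores the actual blocks
    JobTracker           # MapReduce master (Hadoop 1.x)
    TaskTracker          # MapReduce slave (Hadoop 1.x)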
How Does Hadoop Work?
The working of Hadoop can be described in five steps:
Step-1: Input data is broken into small chunks called blocks,
64 MB or 128 MB in size, and these blocks are stored in a
distributed fashion on different nodes in the cluster using the
HDFS mechanism, thus enabling highly efficient parallel
processing. Each block is replicated according to the
replication factor (3 by default), which is configured in
hdfs-site.xml (example commands for checking these settings
follow Step-2).
Step-2: Once all the blocks of the data are stored on
DataNodes, the user can process the data.
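A minimal sketch of inspecting the Step-1 settings from the command line; the exact configuration key names vary slightly across Hadoop versions, and /user/demo/input.txt is a hypothetical file path:

  $ hdfs getconf -confKey dfs.blocksize      # block size from hdfs-site.xml, e.g. 134217728 (128 MB)
  $ hdfs getconf -confKey dfs.replication    # replication factor, 3 by default
  $ hdfs fsck /user/demo/input.txt -files -blocks -locations
                                             # lists each block of the file and the DataNodes holding its replicas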
How Does Hadoop Work? (contd.)
Step-3: The JobTracker receives requests for MapReduce
execution from the client.
Step-4: A TaskTracker runs on each DataNode and actually
executes the map and reduce tasks on the data stored
on that node.
Step-5: Once all the nodes have processed their data, the output is
written back to HDFS (a job-submission sketch follows Step-5).
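As a minimal sketch of Steps 3-5, a job can be submitted with the example JAR that ships with Hadoop; the JAR path and the /input and /output directories are assumptions that depend on the installation and version:

  $ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        wordcount /input /output        # the MapReduce master schedules map and reduce tasks on the workers
  $ hdfs dfs -ls /output                # Step-5: results are written back to HDFS
  $ hdfs dfs -cat /output/part-r-00000  # view the reducer output (the exact part-file name may vary)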
Session 2. Understanding HDFS
Syllabus covered in this session:
2.1.1 Introduction to HDFS
2.1.2 HDFS Architecture (Using Read and Write Operations)
2.1.3 Hadoop Distribution and Basic Commands
2.1.4 HDFS Command Line and Web Interface
2.1.1 High-Level Architecture of a Hadoop Multi-Node Cluster
Why a Distributed File System (DFS) Performs Well
To read 1 TB of data:
                        1 machine      10 machines
I/O channels            4              4
Speed per channel       100 MB/s       100 MB/s
Time to read 1 TB       45 minutes     4.5 minutes
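The timings follow from simple arithmetic: 1 TB is roughly 1,000,000 MB, and one machine reads through 4 channels x 100 MB/s = 400 MB/s, so it needs about 1,000,000 / 400 ≈ 2,500 seconds ≈ 42 minutes (rounded to 45 in the table). Ten machines reading their shares in parallel cut this to about one tenth, roughly 4.5 minutes.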
2.1.1 Introduction to HDFS
A file in HDFS is split into large blocks of 64 MB or
128 MB by default, and each block of the file is
independently replicated on multiple DataNodes.
HDFS provides fault tolerance by replicating the data on
three nodes: two on the same rack and one on a different
rack. The NameNode implements this functionality and
actively tracks the replicas of each block. The default
replication factor is 3.
HDFS has a master-slave architecture, which comprises
a NameNode and a number of DataNodes.
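The state of this master-slave layout can be checked on a live cluster; a minimal sketch (the command may require HDFS superuser privileges):

  $ hdfs dfsadmin -report    # prints overall cluster capacity and lists every live DataNode with its usage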
HDFS Components:
There are two major components of Hadoop HDFS: the NameNode and
the DataNode.
i. NameNode
It is also known as the master node.
The NameNode does not store the actual data or dataset. It stores
metadata, i.e., the number of blocks, their locations, the rack and
DataNode on which each block is stored, and other details. The
namespace it manages consists of files and directories.
Tasks of the HDFS NameNode
Manages the file-system namespace.
Regulates clients' access to files.
Executes file-system operations such as naming,
opening, and closing files and directories.
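The metadata that the NameNode keeps for a file can be queried without reading the file's contents; a minimal sketch, where /user/demo/input.txt is a hypothetical path and the format specifiers assume a reasonably recent Hadoop release:

  $ hdfs dfs -stat "%n  size=%b bytes  replication=%r  blocksize=%o" /user/demo/input.txt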
2.1.1 HDFS (contd..)
ii. DataNode
It is also known as the slave node.
• The HDFS DataNode is responsible for storing the actual data in HDFS.
• DataNodes perform read and write operations as requested by
the clients.
2.1.2 HDFS Architecture (Read and Write Operations)
This figure demonstrates how a file stored on DataNodes is read in the HDFS architecture.
2.1.2 HDFS Architecture (contd..)
• This figure demonstrates how a file is written to DataNodes in the HDFS architecture.
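A minimal sketch of the two paths from the client's side (the file names below are hypothetical): for a write, the client asks the NameNode for target DataNodes and streams the blocks to them; for a read, the client asks the NameNode for block locations and pulls the data directly from the DataNodes.

  $ hdfs dfs -put localdata.txt /user/demo/data.txt   # write path: NameNode picks DataNodes, client streams blocks to them
  $ hdfs dfs -get /user/demo/data.txt copy.txt        # read path: NameNode returns block locations, client reads from the DataNodes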
2.1.6 HDFS Basic Commands
Command                            Description
hdfs version                       Prints the HDFS/Hadoop version.
hdfs dfs -mkdir <path>             Creates directories.
hdfs dfs -ls <path>                Displays a list of the contents of the
                                   directory specified by <path>, showing
                                   the name, permissions, owner, size,
                                   and modification date of each entry.
hdfs dfs -put <localSrc> <dest>    Copies the file or directory from the
                                   local file system to the destination
                                   within HDFS.
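A short usage sketch of the commands above; the directory and file names are hypothetical:

  $ hdfs version                              # print the Hadoop/HDFS version
  $ hdfs dfs -mkdir -p /user/demo             # create a directory in HDFS (-p also creates missing parents)
  $ hdfs dfs -put localdata.txt /user/demo/   # copy a local file into HDFS
  $ hdfs dfs -ls /user/demo                   # list the directory contents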
2.1.6 HDFS Basic Commands
Command                                     Description
hdfs dfs -copyFromLocal <localSrc> <dest>   Similar to the put command, but the
                                            source is restricted to a local file
                                            reference.
hdfs dfs -cat <file>                        Displays the contents of the file on
                                            the console (stdout).
hdfs dfs -mv <source> <destination>         Moves files from the source to the
                                            destination within HDFS.
hdfs dfs -chmod <mode> <path>               Changes the file permissions.
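A short usage sketch of these commands; the file names and the permission mode are hypothetical:

  $ hdfs dfs -copyFromLocal notes.txt /user/demo/notes.txt   # like put, but the source must be local
  $ hdfs dfs -cat /user/demo/notes.txt                       # print the file to stdout
  $ hdfs dfs -mv /user/demo/notes.txt /user/demo/old.txt     # move/rename within HDFS
  $ hdfs dfs -chmod 644 /user/demo/old.txt                   # change the file's permissions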
2.1.7 HDFS Command Line and Web Interface
Step-1. Install and set up a Hadoop cluster.
Step-2. The fs.default.name property in the core-site.xml
file is set to hdfs://localhost/, which is used to set the default
Hadoop distributed file system.
User applications access the HDFS file system through
the HDFS client. It is a library that exposes the interface
of the HDFS file system and hides almost all of the
complexity of the underlying HDFS implementation.
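A minimal sketch of checking the configured default file system and reaching the web interface; the port is an assumption that depends on the Hadoop version (the classic NameNode web UI listens on 50070, Hadoop 3.x on 9870), and fs.defaultFS is the newer name for the deprecated fs.default.name key:

  $ hdfs getconf -confKey fs.defaultFS   # prints e.g. hdfs://localhost/
  $ hdfs dfs -ls hdfs://localhost/       # same listing as "hdfs dfs -ls /" once the default FS is set
  # NameNode web UI (browse the file system, inspect DataNodes):
  #   http://localhost:50070  (Hadoop 1.x/2.x)   or   http://localhost:9870  (Hadoop 3.x)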
Thanks Note