Lesson 1
Hadoop
“Big Data Analytics”, Ch.02 L01: Introduction To Hadoop
2019
Raj Kamal and Preeti Saxena, © McGraw-Hill Higher Edu. India
Big Data Programming Model
• Pieces of code, as well as the data, are distributed to the computing nodes
• Distributed data storage systems do not use the concept of joins
• Hadoop provides that model
Big Data Distributed Computing Model in Hadoop
• A distributed model that requires no sharing between data nodes (a shared-nothing model)
• The multiple tasks of an application are also distributed: they run on the machines associated with the data nodes and execute at the same time, in parallel
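The shared-nothing, parallel-task model can be sketched in plain Java (no Hadoop classes). This is a minimal illustration under stated assumptions: each element of `partitions` stands in for the data held by one hypothetical data node, one task per partition runs on a thread pool, the tasks share no mutable state, and only their partial results are combined at the end. The class and method names are illustrative, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative shared-nothing computation: each task reads only its own partition.
final class SharedNothingSum {

    // Runs one task per partition in parallel and combines the partial sums.
    static long parallelSum(List<long[]> partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (long[] partition : partitions) {
                // The task captures only its own partition; no state is shared between tasks.
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    for (long v : partition) {
                        sum += v;
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get(); // combine step: only results cross task boundaries
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<long[]> partitions = List.of(new long[] {1, 2, 3}, new long[] {4, 5}, new long[] {6});
        System.out.println(parallelSum(partitions)); // prints 21
    }
}
```

Because no task reads another task's partition, adding partitions (and nodes) adds parallelism without adding coordination beyond the final combine.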
Big Data Storage Model in Hadoop
• Data is partitioned into data blocks, which are written at one set of nodes
• The blocks are replicated at multiple nodes to guard against network faults and node failures (when a fault occurs, a node holding a replica makes the data available)
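A minimal sketch of block replication and why it keeps data available after a fault. The round-robin placement here is an assumption made for simplicity; HDFS itself uses a rack-aware placement policy, and its default replication factor is 3. Node names, the block count and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative replica placement and availability check (not the HDFS placement policy).
final class BlockReplication {

    // Places each block on `replication` distinct nodes, rotating round-robin over the node list.
    static Map<Integer, List<String>> placeReplicas(int blocks, List<String> nodes, int replication) {
        if (replication < 1 || replication > nodes.size()) {
            throw new IllegalArgumentException("replication must be between 1 and the node count");
        }
        Map<Integer, List<String>> placement = new TreeMap<>();
        for (int b = 0; b < blocks; b++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                replicas.add(nodes.get((b + r) % nodes.size()));
            }
            placement.put(b, replicas);
        }
        return placement;
    }

    // True if every block still has at least one replica on a node that has not failed.
    static boolean allBlocksAvailable(Map<Integer, List<String>> placement, Set<String> failed) {
        for (List<String> replicas : placement.values()) {
            if (failed.containsAll(replicas)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node1", "node2", "node3", "node4");
        Map<Integer, List<String>> placement = placeReplicas(4, nodes, 3);
        System.out.println(placement);
        System.out.println(allBlocksAvailable(placement, Set.of("node2")));                   // prints true
        System.out.println(allBlocksAvailable(placement, Set.of("node1", "node2", "node3"))); // prints false
    }
}
```

With three replicas per block, any single failed node leaves every block readable; data becomes unavailable only when all nodes holding some block's replicas fail together.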
Big Data Computing Model
• Fault tolerant due to replication
• Follows the CAP theorem: of the three properties (consistency, availability and partition tolerance), a distributed system can guarantee at most two at the same time
Hadoop
• Hadoop consists of two components: one stores data in blocks across the cluster, and the other performs computations at each node of the cluster in parallel with the others
• The Hadoop system uses the Big Data programming and storage models
Hadoop
• Jobs or tasks are assigned to and scheduled on the same servers that hold the data
• The system delivers faster results from Big Data, including unstructured data
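Scheduling tasks on the servers that hold the data (data locality) can be illustrated with a simplified assignment routine. This is a sketch, not Hadoop's scheduler: it assumes each free node runs one task, prefers a node that stores a replica of the task's block, and otherwise falls back to any free node, which would then read the block over the network. Hadoop's real schedulers also take rack locality into account. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative locality-aware assignment: prefer a node that already stores the task's block.
final class LocalityScheduler {

    // Maps each block's task to a node; each node runs at most one task (a simplifying assumption).
    static Map<String, String> assign(Map<String, List<String>> blockLocations, List<String> freeNodes) {
        List<String> available = new ArrayList<>(freeNodes);
        Map<String, String> assignment = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : blockLocations.entrySet()) {
            if (available.isEmpty()) {
                break; // remaining tasks wait until a node becomes free
            }
            String chosen = null;
            for (String node : entry.getValue()) {
                if (available.contains(node)) {
                    chosen = node; // data-local: the task runs where its block already is
                    break;
                }
            }
            if (chosen == null) {
                chosen = available.get(0); // no local node free: fall back to a remote read
            }
            available.remove(chosen);
            assignment.put(entry.getKey(), chosen);
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, List<String>> locations = new LinkedHashMap<>();
        locations.put("blk0", List.of("nodeA", "nodeB"));
        locations.put("blk1", List.of("nodeB", "nodeC"));
        locations.put("blk2", List.of("nodeA"));
        System.out.println(assign(locations, List.of("nodeA", "nodeB", "nodeC")));
        // prints {blk0=nodeA, blk1=nodeB, blk2=nodeC}
    }
}
```

Here `blk0` and `blk1` run data-local, while `blk2` has to read its block remotely because its only replica holder, `nodeA`, is already busy.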
Hadoop Infrastructure
• Execution of instructions involves two interrelated entities, such as a query and the database
• Cloud for clusters
• A cluster consists of a set of computers or PCs
Hadoop Platform
• Provides a low-cost, open-source Big Data platform that uses cloud services
Hadoop
• Processing terabytes of data can take just a few minutes
• Hadoop enables distributed processing of large datasets (above 10 million bytes) across clusters of computers using a programming model called MapReduce
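The MapReduce model splits a job into a map phase, a shuffle (group-by-key) phase and a reduce phase. The following in-memory word count simulates those phases with the Java standard library only; it does not use Hadoop's `Mapper`/`Reducer` API, and treating each input line as one input split is an assumption for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of the MapReduce phases for word counting (no Hadoop classes used).
final class WordCountSimulation {

    // Map phase: one input line -> list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, with keys sorted as in Hadoop's shuffle.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map over every line, then shuffle, then reduce once per distinct word.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        shuffle(emitted).forEach((word, ones) -> counts.put(word, reduce(ones)));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("Big data needs Hadoop", "Hadoop stores big data")));
        // prints {big=2, data=2, hadoop=2, needs=1, stores=1}
    }
}
```

In a real cluster, the map calls would run in parallel on the nodes holding each split, and the shuffle would move each key's values over the network to the node running that key's reduce.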
Hadoop System Characteristics
• Scalable
• Self-manageable
• Self-healing
• Distributed file system
Scalability
• The system can be scaled up (enhanced) by adding storage and processing units as requirements grow
Self-Manageability
• The system itself creates, uses and schedules storage and processing resources, and reduces or increases them as needed
Self-Healing
• Faults are taken care of by the system itself
• Keeps the system functioning and its resources available
• Software detects and handles failures at the task level, and lets task execution continue despite communication failures
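Task-level failure handling can be sketched as rescheduling a failed task attempt on another node. In this illustrative version, the `runsOk` predicate is a hypothetical stand-in for running the task and detecting its success or failure, and attempts go to nodes in list order. Hadoop MapReduce also retries failed task attempts up to a configurable limit (for map tasks, `mapreduce.map.maxattempts`, which defaults to 4), but its actual rescheduling logic is more involved.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Illustrative task-level self-healing: a failed attempt is rescheduled on a different node.
final class TaskRetry {

    // Tries the task on successive nodes, up to maxAttempts; returns the node where it succeeded.
    // `runsOk` is a hypothetical stand-in for executing the task on a node and reporting success.
    static Optional<String> runWithRetry(List<String> nodes, int maxAttempts, Predicate<String> runsOk) {
        int attempts = Math.min(maxAttempts, nodes.size());
        for (int i = 0; i < attempts; i++) {
            String node = nodes.get(i);
            if (runsOk.test(node)) {
                return Optional.of(node);
            }
            // Failure detected: discard this attempt and move the task to the next node.
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node1", "node2", "node3");
        // Simulate a communication failure that makes node1 unreachable.
        Optional<String> ranOn = runWithRetry(nodes, 4, node -> !node.equals("node1"));
        System.out.println("task completed on " + ranOn.orElse("no node")); // prints "task completed on node2"
    }
}
```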
Hadoop Hardware Need
• The hardware scales up from a single server to thousands of machines that make up the clusters
• Each cluster stores a large number of data blocks in racks. The default data block size is 64 MB (in Hadoop 1.x; Hadoop 2.x and later default to 128 MB)
• IBM BigInsights, built on Hadoop, deploys a default block size of 128 MB
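The block size determines how many blocks a file occupies: the count is the ceiling of the file size divided by the block size. The sketch below works this out for a hypothetical 1000 MB file under the 64 MB and 128 MB defaults, and assumes a replication factor of 3 for the raw-storage figure. In HDFS the last block of a file only occupies its actual length, so raw storage is the file size times the replication factor rather than the block count times the block size.

```java
// Illustrative arithmetic: blocks per file and raw storage under replication.
final class BlockMath {

    static final long MB = 1024L * 1024L;

    // Number of blocks = ceiling(fileBytes / blockBytes), using integer arithmetic.
    static long blocksFor(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Raw bytes stored across the cluster when every block is replicated `replication` times.
    static long rawStorage(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long file = 1000 * MB;
        System.out.println(blocksFor(file, 64 * MB));  // prints 16 (15 full blocks + one 40 MB block)
        System.out.println(blocksFor(file, 128 * MB)); // prints 8 (7 full blocks + one 104 MB block)
        System.out.println(rawStorage(file, 3) / MB);  // prints 3000
    }
}
```

Larger blocks mean fewer blocks per file, which reduces the metadata the cluster must track per file, at the cost of coarser units of parallel work.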
Big Data Analytics Applications
• Software applications that leverage large-scale data
• The applications analyze Big Data using massively parallel processing frameworks
• Hadoop provides that framework
Hadoop Framework
• Provides the computing features of a distributed, flexible, scalable and fault-tolerant computing system with high computing power
• Provides an efficient platform for the distributed storage and processing of a large amount of data
Hadoop Big Data Storage and Cluster Computing
• Manages both large-sized structured and unstructured data in different formats, such as XML, JSON and text, efficiently and effectively
• Performs better with clusters of many servers when the focus is on horizontal scalability
Figure 2.1 Core components of Hadoop
Hadoop
• Open-source framework
• Java and Linux based: Hadoop uses Java interfaces
• Linux is the base, but Hadoop supports its own set of shell commands
Figure 2.2 Hadoop main components and ecosystem components
Summary
We learnt
• Hadoop's distributed model, with pieces of code as well as the data at the computing nodes, which requires no sharing between data nodes
• How Hadoop distributes multiple tasks, which run on the associated machines and execute at the same time, in parallel
Summary
We learnt
• Partitionability
• Replication of data
• Java- and Linux-based Hadoop, and Hadoop shell commands
• Hadoop core components and ecosystem tools
End of Lesson 1 on Hadoop