Distributed Storage & Horizontal Scalability: Increasing the Number of Systems Operating in Parallel

This document provides an overview of distributed storage and processing using Hadoop. It describes key components: HDFS for distributed storage, MapReduce for parallel processing, Hive and Pig for analytics, Sqoop and Flume for data import/export, and HBase as the ecosystem's native NoSQL database. It also covers Oozie, the workflow scheduler for Hadoop processes.

Distributed storage & horizontal scalability

Scaling out: increasing the number of systems and operating them in parallel.

Vertical scalability

Scaling up: increasing the disk size and RAM of a single system.


HDFS -> used for storage -> a distributed file system -> built from one NameNode [master] and multiple DataNodes [slaves], scaled to the data size -> HDFS fault tolerance depends on two factors: the replication factor and the block size -> the block size defaults to 64 MB (in classic Hadoop 1.x) -> total blocks required = ceil(file size / block size) -> each block is then distributed according to the replication factor, i.e. the same block is replicated across N DataNodes, where N is the replication factor. A worked example follows below.
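As a worked example with hypothetical numbers, take a 200 MB file stored with the default 64 MB block size and a replication factor of 3. A minimal Java sketch of the arithmetic:

public class HdfsBlockMath {
    public static void main(String[] args) {
        long fileSizeMb = 200;     // hypothetical file size
        long blockSizeMb = 64;     // classic default HDFS block size
        int replication = 3;       // common HDFS replication factor

        // Blocks are whole units, so round up: ceil(200 / 64) = 4 blocks
        // (three full 64 MB blocks plus one 8 MB block).
        long blocks = (fileSizeMb + blockSizeMb - 1) / blockSizeMb;

        // Each block is copied to N DataNodes: 4 * 3 = 12 stored block copies.
        long storedCopies = blocks * replication;

        System.out.println("blocks = " + blocks + ", stored copies = " + storedCopies);
    }
}

Note the ceiling: plain integer division of 200/64 would report 3 blocks and silently drop the 8 MB remainder.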
MAP REDUCE -> native support for Java -> the framework used for processing data -> a Mapper & Reducer combination -> the Mapper performs parallel processing of the instructions input to the MapReduce framework -> the framework distributes the instruction set among the DataNodes for parallel processing -> the Reducer collects the results of the parallel processing from the different DataNodes and aggregates (merges) them. A minimal sketch follows below.
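A minimal word-count sketch in Java (the canonical MapReduce example) showing the Mapper/Reducer split; input and output paths are supplied as command-line arguments:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: runs in parallel on each DataNode's split of the input,
    // emitting (word, 1) for every token it sees.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: receives all counts for one word, merged from every DataNode,
    // and aggregates them into a single total.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // pre-aggregate on each node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The combiner reuses the reducer class to pre-aggregate counts on each DataNode before the shuffle, which cuts the volume of data sent across the network to the reducers.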

HIVE -> SQL-style query support for analytics (HiveQL queries are compiled into MapReduce jobs); see the JDBC sketch below
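A minimal sketch of issuing a Hive query from Java over JDBC, assuming a HiveServer2 instance on localhost:10000 and a hypothetical words table (host, port, credentials, and table name are all placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // Register the HiveServer2 JDBC driver (ships with Hive).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Hypothetical connection details; adjust to your cluster.
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
        Statement stmt = conn.createStatement();

        // Plain SQL: Hive compiles this into a distributed job behind the scenes.
        ResultSet rs = stmt.executeQuery(
                "SELECT word, COUNT(*) AS cnt FROM words GROUP BY word");
        while (rs.next()) {
            System.out.println(rs.getString("word") + "\t" + rs.getLong("cnt"));
        }
        conn.close();
    }
}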

PIG -> a dataflow scripting language for analytics with support for user-defined functions (UDFs); see the UDF sketch below
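A minimal sketch of a Pig user-defined function in Java using the standard EvalFunc API; the class name UpperCase and its use case are hypothetical. In a Pig script it would be registered with REGISTER and then invoked like a built-in function:

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical UDF: upper-cases its first argument.
public class UpperCase extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null; // Pig treats null as missing data
        }
        return input.get(0).toString().toUpperCase();
    }
}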

SQOOP -> for importing/exporting data between DBMS/RDBMS systems and HDFS

FLUME -> for importing streaming data (e.g. log events) into HDFS

HBASE -> a NoSQL database -> column-oriented storage -> the database that runs natively on top of HDFS in the Hadoop stack; see the client sketch below
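A minimal sketch of the HBase Java client API writing and reading one cell; the users table, its info column family, and the row key are hypothetical, and the table is assumed to already exist:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {

            // Write one cell: row key "u1", column family "info", qualifier "name".
            Put put = new Put(Bytes.toBytes("u1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                          Bytes.toBytes("Asha"));
            table.put(put);

            // Read the same cell back by row key.
            Result result = table.get(new Get(Bytes.toBytes("u1")));
            byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(name));
        }
    }
}

Rows are addressed by a byte[] row key and every cell lives under a column family, which matches the column-oriented storage model noted above.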

APACHE OOZIE -> a workflow scheduler that controls and chains Hadoop processes (e.g. MapReduce, Hive, Pig, and Sqoop jobs)

Overview of the Hadoop ecosystem
