HBase Architecture and Its Important Components

Apache HBase is an open-source, distributed NoSQL database designed for fast random access to large volumes of structured data, running on top of the Hadoop Distributed File System (HDFS). Its architecture includes components such as HMaster, HRegionServers, HRegions, and ZooKeeper, which work together to manage data storage and retrieval efficiently. HBase is particularly suitable for applications requiring quick access to massive datasets, such as in the telecom and banking industries.

Uploaded by

suj37874

What is Apache HBase?

HBase is an open-source, distributed, scalable NoSQL database written in Java.

HBase runs on top of the Hadoop Distributed File System (HDFS) and provides random read
and write access. Its data model is similar to Google’s Bigtable and is designed to provide
fast random access to huge volumes of structured data. HBase takes advantage of the
fault tolerance provided by HDFS and is designed to store large numbers of sparse
datasets in a fault-tolerant way.
HBase is a good choice for applications that require fast, random access to
huge amounts of data. It achieves low latency and high throughput by providing
fast read and write access to large datasets.
You can store data in HDFS either directly or through HBase. Through HBase, a client can
read the data kept in HDFS or access it randomly.

HBase Architecture and its Important Components

Below is the detailed architecture of HBase with its components:

HBase Architecture Diagram

HBase architecture consists mainly of five components

• HMaster
• HRegionServer
• HRegions
• ZooKeeper
• HDFS

HMaster
HMaster is the implementation of the Master server in the HBase architecture. It
acts as a monitoring agent for all Region Server instances in the cluster
and as the interface for all metadata changes. In a distributed cluster
environment, the Master typically runs on the NameNode host. The Master runs several
background threads.
The following are important roles performed by HMaster in HBase.

• Plays a vital role in cluster performance and in maintaining the nodes in the cluster.
• Provides administrative functions and distributes services to the different region
servers.
• Assigns regions to region servers.
• Controls load balancing and failover to spread the load across the nodes in
the cluster.
• Takes responsibility for schema changes and other metadata operations requested
by a client.
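
The region assignment and failover roles listed above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not HBase's actual balancer: the function names `assign_regions` and `reassign_after_failure`, the region and server names, and the "least-loaded server" policy are all hypothetical and chosen only for illustration.

```python
def assign_regions(regions, servers):
    """Assign each region to the server that currently hosts the fewest regions."""
    assignment = {server: [] for server in servers}
    for region in regions:
        # Ties are broken by the order of `servers` (an arbitrary choice).
        target = min(servers, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment


def reassign_after_failure(assignment, failed_server):
    """Move the regions of a failed server onto the surviving servers (mutates `assignment`)."""
    orphaned = assignment.pop(failed_server)
    survivors = list(assignment)
    for region in orphaned:
        target = min(survivors, key=lambda s: len(assignment[s]))
        assignment[target].append(region)
    return assignment


# Hypothetical cluster: six regions spread over three region servers.
plan = assign_regions(["r1", "r2", "r3", "r4", "r5", "r6"], ["rs1", "rs2", "rs3"])
```

After `reassign_after_failure(plan, "rs2")`, the two regions that lived on `rs2` are spread over `rs1` and `rs3`, so every region is still hosted by some live server.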

The methods exposed by the HMaster interface are primarily metadata-oriented.

• Table (createTable, removeTable, enable, disable)
• ColumnFamily (addColumn, modifyColumn)
• Region (move, assign)

The client communicates bidirectionally with both HMaster and ZooKeeper. For
read and write operations, it contacts the HRegionServers directly. HMaster assigns
regions to region servers and, in turn, checks the health status of those region servers.

The architecture as a whole contains multiple region servers. Each region server holds
an HLog (the write-ahead log), which stores all of its log files.

HBase Region Servers

When an HBase Region Server receives read and write requests from a client, it
assigns each request to the specific region where the relevant column family resides.
The client can contact HRegionServers directly; it does not need
HMaster's permission to communicate with them. The client needs HMaster's help only
for operations related to metadata and schema changes.

HRegionServer is the Region Server implementation. It is responsible for serving and
managing the regions, that is, the data present in a distributed cluster. The region
servers run on the DataNodes of the Hadoop cluster.

HMaster can be in contact with multiple HRegionServers, each of which performs the
following functions.

• Hosting and managing regions
• Splitting regions automatically
• Handling read and write requests
• Communicating with the client directly

HBase Regions
HRegions are the basic building blocks of an HBase cluster. Tables are distributed
across regions, and each region is made up of column families. A region contains
multiple stores, one for each column family. Each store consists mainly of two
components: the Memstore and HFiles.

ZooKeeper
In HBase, ZooKeeper is a centralized monitoring server that maintains configuration
information and provides distributed synchronization. Distributed synchronization means
coordinating access to the distributed applications running across the cluster, with
ZooKeeper providing coordination services between nodes. If a client wants to
communicate with regions, it has to approach ZooKeeper first.

ZooKeeper is an open-source project, and it provides many important services.

Services provided by ZooKeeper

• Maintains configuration information
• Provides distributed synchronization
• Establishes client communication with region servers
• Provides ephemeral nodes that represent the different region servers
• Lets master servers use the ephemeral nodes to discover available servers in
the cluster
• Tracks server failures and network partitions

The Master and the HBase slave nodes (region servers) register themselves with
ZooKeeper. The client needs access to the ZooKeeper (ZK) quorum configuration to connect
with the master and region servers.

When nodes in the HBase cluster fail, the ZK quorum triggers error
messages, and repair of the failed nodes begins.
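
The idea of ephemeral nodes can be sketched with an in-memory stand-in for ZooKeeper. The sketch assumes a simplified model: the class `EphemeralRegistry`, its methods, the session ids, and the znode paths are hypothetical names for illustration and are not the ZooKeeper client API. The point it shows is that a node tied to a session disappears when that session ends, which is how a failed region server drops out of the list of available servers.

```python
class EphemeralRegistry:
    """Toy model of ephemeral nodes: each node is owned by a client session."""

    def __init__(self):
        self._nodes = {}  # znode path -> id of the session that created it

    def register(self, path, session_id):
        self._nodes[path] = session_id

    def expire_session(self, session_id):
        """Delete every node owned by the session; return the removed paths."""
        removed = sorted(p for p, s in self._nodes.items() if s == session_id)
        for path in removed:
            del self._nodes[path]
        return removed

    def live_servers(self, prefix="/hbase/rs/"):
        """List the server names that still have a node under the prefix."""
        return sorted(p[len(prefix):] for p in self._nodes if p.startswith(prefix))


registry = EphemeralRegistry()
registry.register("/hbase/rs/rs1", session_id="session-1")
registry.register("/hbase/rs/rs2", session_id="session-2")
before = registry.live_servers()
lost = registry.expire_session("session-2")  # rs2 crashes, its session expires
after = registry.live_servers()
```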

HDFS
HDFS is the Hadoop Distributed File System. As the name implies, it provides a distributed
environment for storage, and it is a file system designed to run on
commodity hardware. It stores each file in multiple blocks and, to maintain fault
tolerance, replicates the blocks across the Hadoop cluster.

HDFS provides a high degree of fault tolerance and runs on cheap commodity
hardware. Adding nodes to the cluster and performing processing and storage on
cheap commodity hardware gives the client better results than the
existing setup.

Here, the data stored in each block is replicated to 3 nodes, so if any
node goes down there is no loss of data; this provides a proper backup and recovery
mechanism.
HDFS works with the HBase components and stores large amounts of data in a
distributed manner.
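
The effect of keeping 3 replicas of each block can be checked with a short sketch. It assumes a simple round-robin placement, which is not how HDFS actually chooses nodes; the function names, block ids, and node names are hypothetical. What it illustrates is only the property described above: with every block on 3 distinct nodes, losing any single node leaves each block readable.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Place each block on `replication` distinct nodes, round-robin (an assumed policy)."""
    if len(nodes) < replication:
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a surviving node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(["blk_0", "blk_1", "blk_2"], nodes)
survives_any_single_failure = all(readable_after_failure(placement, n) for n in nodes)
```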

HBase meta table

The META table is a special HBase catalog table that maintains a list of all regions in
the HBase storage system. The .META. table stores its entries in the form of keys and
values. The key represents the start key of the HBase region and its id. The value
contains the path to the region server.
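
A lookup against such a catalog can be sketched as a binary search over region start keys. This is a minimal illustration, assuming the catalog is a sorted list of (start key, region server) pairs; the `META` contents, the host names, and the function `locate_region` are hypothetical and do not reflect the real table layout.

```python
import bisect

# Hypothetical catalog: each entry is (region start key, region server location).
# The first region starts at the empty key, so it covers all keys below "g".
META = [
    ("", "rs1.example.com"),
    ("g", "rs2.example.com"),
    ("p", "rs3.example.com"),
]


def locate_region(meta, row_key):
    """Return the catalog entry whose key range contains row_key."""
    start_keys = [start for start, _ in meta]
    # The owning region is the last one whose start key is <= row_key.
    index = bisect.bisect_right(start_keys, row_key) - 1
    return meta[index]


first = locate_region(META, "apple")
middle = locate_region(META, "g")
last = locate_region(META, "zebra")
```

A row key equal to a start key belongs to the region that starts there, which is why the sketch uses `bisect_right` rather than `bisect_left`.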

HBase Data Model

The HBase data model is a set of components that consists of tables, rows, column
families, cells, columns, and versions. HBase tables contain column families and rows,
with one element defined as the primary key. A column in an HBase table represents
an attribute of an object.

The HBase data model consists of the following elements:

• A set of tables
• Each table has column families and rows
• Each table must have an element defined as the primary key
• The row key acts as the primary key in HBase
• Any access to HBase tables uses this primary key
• Each column in HBase denotes an attribute of the corresponding object

HBase Use Cases

The following are examples of HBase use cases, with an explanation of the solution
HBase provides to each technical problem.

Problem Statement: The telecom industry faces the following technical challenges:
• Storing billions of CDR (call detail record) log records generated by the telecom domain
• Providing real-time access to CDR logs and customer billing information
• Providing a cost-effective solution compared to traditional database systems
Solution: HBase is used to store billions of rows of detailed call records. If 20 TB of
data is added per month to an existing RDBMS database, performance will deteriorate.
HBase is well suited to handling this volume of data, and it performs fast querying and
displays records.

Problem Statement: The banking industry generates millions of records on a daily basis.
In addition, it needs an analytics solution that can detect fraud in money transactions.
Solution: To store, process, and update vast volumes of data and to perform analytics, an
ideal solution is HBase integrated with several Hadoop ecosystem components.

Apart from that, HBase can be used

• Whenever there is a need for write-heavy applications.
• For performing online log analytics and generating compliance reports.

Storage Mechanism in HBase

HBase is a column-oriented database, and data is stored in tables. The tables are sorted
by RowId. As shown below, each RowId identifies a row that is a collection of the several
column families present in the table.

The column families present in the schema are key-value pairs. Looking more closely,
each column family has multiple columns. The column values
are stored on disk. Each cell of the table has its own metadata, such as a timestamp and
other information.

Storage Mechanism in HBase

In HBase, the following key terms represent the table schema:

• Table: Collection of rows.
• Row: Collection of column families.
• Column Family: Collection of columns.
• Column: Collection of key-value pairs.
• Namespace: Logical grouping of tables.
• Cell: A {row, column, version} tuple exactly specifies a cell in HBase.

Column-oriented vs Row-oriented storages

Column-oriented and row-oriented stores differ in their storage mechanism.
Traditional relational models store data in a row-based format, that is, as
rows of data. Column-oriented stores keep data tables in terms of columns and
column families.
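
The difference between the two layouts can be made concrete with a small sketch. The records and field names below are made up for illustration, and `to_columns` is a hypothetical helper; the sketch only shows the same data arranged row by row and column by column, and why an aggregate over one attribute touches a single list in the column layout.

```python
# Row-oriented layout: each record keeps all of its attributes together.
rows = [
    {"id": 1, "name": "Ann", "balance": 120},
    {"id": 2, "name": "Bob", "balance": 75},
]


def to_columns(records):
    """Regroup row-oriented records so that each attribute's values sit together."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: one list of values per attribute.
columns = to_columns(rows)

# An analytical query such as a sum reads only the "balance" column.
total_balance = sum(columns["balance"])
```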

The following table gives some key differences between these two kinds of storage.

Column-oriented Database | Row-oriented Database
Used for processing and analytics, such as Online Analytical Processing (OLAP) and its applications. | Used for Online Transactional Processing (OLTP), such as in the banking and finance domains.
Can store very large amounts of data, on the order of petabytes. | Designed for a small number of rows and columns.

HBase Data Model

HBase is a column-oriented database. A column-oriented database stores data in cells
grouped into columns, not rows.
Source: tibco.com

1. Table & 2. Row

An HBase table consists of multiple rows. Each row has a row key and columns with
values assigned to them. HBase sorts rows alphabetically by row key.

The main goal is to store data so that related rows are close together. A common
row-key pattern is the domain of a site. For example, if our row keys are
domains, we should store them in reverse, i.e. org.apache.www, org.apache.mail, or
org.apache.jira. This way, all Apache domains are close to each other in the HBase
table.
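
The reversed-domain pattern is easy to try out. The sketch below uses a hypothetical helper, `reverse_domain`, and a made-up list of host names; sorting the resulting keys the way HBase sorts row keys shows the Apache hosts ending up next to each other.

```python
def reverse_domain(domain):
    """Turn 'www.apache.org' into 'org.apache.www' for use as a row key."""
    return ".".join(reversed(domain.lower().split(".")))


domains = ["www.apache.org", "mail.google.com", "jira.apache.org", "mail.apache.org"]

# Sorting mimics the alphabetical ordering of rows by row key.
row_keys = sorted(reverse_domain(d) for d in domains)
```

With the original, unreversed names, the same sort would place `mail.apache.org` next to `mail.google.com` instead of next to the other Apache hosts.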

3. Column

An HBase column consists of a column family and a column qualifier separated by the :
(colon) character.

A. Column family

Column families physically house a set of columns and their values. Each column
family has a set of storage properties, such as how its data is compressed, whether its
values should be cached, how its row keys are encoded, and more. Each row in an
HBase table has the same column families.

B. Column qualifier

A column qualifier is added to the column family to provide an index for
that piece of data.

Example: if the column family is content, then the column qualifier can be content:html
or content:pdf.
Column families are fixed at table creation, but column qualifiers are mutable
and can vary widely between rows.
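
The split between fixed column families and free-form qualifiers can be sketched in a few lines. This is an in-memory toy, not the HBase client API: `split_column`, `SketchTable`, and the sample rows are hypothetical names used for illustration. The table rejects a family it was not created with but accepts any qualifier inside a known family.

```python
def split_column(column):
    """Split 'family:qualifier' at the first colon."""
    family, separator, qualifier = column.partition(":")
    if not separator:
        raise ValueError("expected 'family:qualifier', got %r" % column)
    return family, qualifier


class SketchTable:
    """Toy table: families are fixed at creation, qualifiers vary per row."""

    def __init__(self, families):
        self.families = frozenset(families)
        self.rows = {}  # row key -> family -> qualifier -> value

    def put(self, row_key, column, value):
        family, qualifier = split_column(column)
        if family not in self.families:
            raise KeyError("unknown column family: " + family)
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value


table = SketchTable(["content"])
table.put("org.apache.www", "content:html", "<html>...</html>")
table.put("org.apache.mail", "content:pdf", "%PDF...")
```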

4. The cell

A cell is essentially a combination of a row, a column family, and a column qualifier.
It contains a value and a timestamp that represents the version of the value.

5. Timestamp

A timestamp is an identifier for a given version of a value and is written alongside each
value. By default, the timestamp represents the time on the RegionServer when the data
was written. However, we can specify a different timestamp value when inserting data
into a cell.
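
Versioning by timestamp can be sketched as a cell that keeps one value per timestamp. The class `VersionedCell` and its methods are hypothetical and only model the behaviour described above: a write uses the current time in milliseconds unless a timestamp is given, and a read returns the newest versions first.

```python
import time


class VersionedCell:
    """Toy cell that stores one value per timestamp."""

    def __init__(self):
        self._versions = {}  # timestamp (ms) -> value

    def put(self, value, timestamp=None):
        # Default to the local clock, standing in for the RegionServer's time.
        ts = int(time.time() * 1000) if timestamp is None else timestamp
        self._versions[ts] = value
        return ts

    def get(self, max_versions=1):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        newest_first = sorted(self._versions.items(), reverse=True)
        return newest_first[:max_versions]


cell = VersionedCell()
cell.put("draft", timestamp=100)
cell.put("final", timestamp=200)
latest = cell.get()
history = cell.get(max_versions=2)
```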

HBase Read and Write Data Explained

The read and write operations from the client to the HFile are shown in the diagram below.

Step 1) The client wants to write data; it first communicates with the Region Server
and then with the regions.
Step 2) The region contacts the Memstore associated with the column family to store the data.

Step 3) Data is first stored in the Memstore, where it is sorted, and after that it is
flushed into an HFile. The main reason for using the Memstore is so that data is stored
in the distributed file system sorted by row key. The Memstore is held in the Region
Server's main memory, while HFiles are written to HDFS.

Step 4) The client wants to read data from the regions.

Step 5) The client's read request can be served directly from the Memstore.

Step 6) The client approaches the HFiles to get the data. The data is fetched and returned to
the client.
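
The steps above can be sketched as a single in-memory store. This is a simplified model under stated assumptions: the class `SketchStore`, the flush threshold counted in entries, and the linear scan of flushed files are all hypothetical, and the write-ahead log is left out. It shows writes landing in the Memstore, a flush producing a file sorted by row key, and reads checking the Memstore before the flushed files.

```python
class SketchStore:
    """Toy store for one column family: a Memstore plus a list of flushed files."""

    def __init__(self, flush_threshold=3):
        self.memstore = {}   # row key -> value, held in memory
        self.hfiles = []     # each file: list of (row key, value) sorted by row key
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the Memstore contents out as a new file, sorted by row key."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, row_key):
        # Check the Memstore first, then the flushed files from newest to oldest.
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row_key:
                    return value
        return None


store = SketchStore(flush_threshold=2)
store.put("row-b", "1")
store.put("row-a", "2")   # second entry reaches the threshold and triggers a flush
store.put("row-b", "3")   # newer value stays in the Memstore
```

Here `store.get("row-b")` returns the newer value from the Memstore, while `store.get("row-a")` is answered from the flushed file.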

The Memstore holds in-memory modifications to the store. The hierarchy of objects in
HBase regions is shown from top to bottom in the table below.

Table | An HBase table present in the HBase cluster
Region | HRegions for the presented tables
Store | One store per ColumnFamily for each region of the table
Memstore | One Memstore for each store for each region of the table; sorts data before flushing it into HFiles; write and read performance increases because of the sorting
StoreFile | StoreFiles for each store for each region of the table
Block | Blocks present inside StoreFiles

HBase vs. HDFS

HBase runs on top of HDFS and Hadoop. The key differences between HDFS and
HBase are in terms of data operations and processing.

HBase | HDFS
Low-latency operations | High-latency operations
Random reads and writes | Write once, read many times
Accessed through shell commands, the Java client API, REST, Avro, or Thrift | Primarily accessed through MR (MapReduce) jobs
Both storage and processing can be performed | Used only for storage

Some typical IT industry applications use HBase operations along with Hadoop.
Examples include stock exchange data and online banking data operations and
processing, for which HBase is a well-suited solution.
