
HDFS Blocks

The document discusses the differences between HDFS and network attached storage. HDFS is the primary storage system for Hadoop that stores very large files across a cluster of commodity hardware. In contrast, NAS provides file-level data storage on dedicated hardware. HDFS distributes blocks across all machines in a cluster, while NAS stores data separately on its own hardware. HDFS is designed to work with MapReduce to move computation to data, which NAS does not support as well.

Uploaded by

sharan kommi
Copyright © All Rights Reserved

Previous year's last question

HDFS blocks are large compared to disk blocks in order to minimize the cost of seeks. If a file were split into many small blocks, a large share of the total read time would be spent seeking (the time spent locating the data on disk) rather than transferring data. Having many small blocks is also a burden on the NameNode (master), because the NameNode stores the metadata and would have to keep track of every one of those blocks.
If the block is large enough, the time it takes to transfer the data from the disk can be significantly longer than the time to seek to the start of the block. Thus, transferring a large file made of multiple blocks operates at close to the disk transfer rate.
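To make the seek-versus-transfer argument concrete, here is a small Python calculation of the fraction of a block's read time spent seeking. The 10 ms seek time and 100 MB/s transfer rate are illustrative assumptions, not figures from the original notes; 128 MB is the default HDFS block size (`dfs.blocksize`) in Hadoop 2.x and later.

```python
# Illustrative figures (assumptions, not measurements): a disk seek of
# about 10 ms and a sustained transfer rate of 100 MB/s.
SEEK_TIME_S = 0.010
TRANSFER_RATE_MB_S = 100.0


def seek_fraction(block_size_mb: float,
                  seek_time_s: float = SEEK_TIME_S,
                  rate_mb_s: float = TRANSFER_RATE_MB_S) -> float:
    """Fraction of the time to read one block that is spent seeking."""
    transfer_time_s = block_size_mb / rate_mb_s
    return seek_time_s / (seek_time_s + transfer_time_s)


if __name__ == "__main__":
    # From a 4 KB disk-sized block up to a 128 MB HDFS-sized block.
    for size_mb in (4 / 1024, 1.0, 64.0, 128.0):
        print(f"{size_mb:10.4f} MB block: "
              f"{seek_fraction(size_mb):6.1%} of read time is seeking")
```

Under these assumptions a 4 KB block spends more than 99% of its read time seeking, while a 128 MB block spends less than 1%, so large blocks let reads run at close to the disk transfer rate.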
In MapReduce, each block is by default processed by its own Mapper (map task). So, with small blocks there would be a very large number of Mappers, each processing only a little data, which is inefficient.
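A rough Python sketch of how block size drives both the number of blocks the NameNode must track and the number of map tasks. It assumes one input split (and so one map task) per block, which is the usual default; the metadata count is simplified to one entry per block and ignores replicas.

```python
import math


def block_count(file_size_mb: float, block_size_mb: float) -> int:
    """Number of HDFS blocks a file occupies (the last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)


def estimate_load(file_size_mb: float, block_size_mb: float) -> dict:
    """Rough load estimate, assuming split size == block size."""
    blocks = block_count(file_size_mb, block_size_mb)
    return {
        # Simplified: one NameNode metadata entry per block, replicas not counted.
        "blocks_tracked_by_namenode": blocks,
        # Assumption: one map task per input split, one split per block.
        "map_tasks": blocks,
    }


if __name__ == "__main__":
    one_gb = 1024.0  # file size in MB
    for block_mb in (4 / 1024, 1.0, 128.0):
        print(f"block {block_mb:9.4f} MB -> {estimate_load(one_gb, block_mb)}")
```

For a 1 GB file this gives 262,144 blocks (and Mappers) with 4 KB blocks, but only 8 with 128 MB blocks.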

Difference between HDFS and network-attached storage (NAS)

1) HDFS is the primary storage system of Hadoop.
HDFS is designed to store very large files on a cluster of commodity hardware.
Network-attached storage (NAS) is a file-level computer data storage server.
NAS provides data access to a heterogeneous group of clients.

2) HDFS distributes blocks across all the machines in a Hadoop cluster.
NAS stores data on dedicated hardware.

3) HDFS is designed to work with the MapReduce framework.
In the MapReduce framework, computation moves to the data instead of data moving to the computation.
NAS is not suitable for MapReduce, as it stores data separately from the computation.
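The idea of moving computation to the data can be sketched as a greedy scheduler that prefers a node already holding a replica of the block. The block names, node names and slot counts below are hypothetical, and real Hadoop (YARN) scheduling is considerably more involved; this is only an illustration of the principle.

```python
from typing import Dict, List

# Hypothetical cluster state: which DataNodes hold a replica of each block.
BLOCK_LOCATIONS: Dict[str, List[str]] = {
    "blk_1": ["node-a", "node-b", "node-c"],
    "blk_2": ["node-b", "node-c", "node-d"],
    "blk_3": ["node-a", "node-d", "node-e"],
}


def schedule_map_tasks(block_locations: Dict[str, List[str]],
                       free_slots: Dict[str, int]) -> Dict[str, str]:
    """Assign one map task per block, preferring a node that stores the block.

    Simplified greedy sketch; note that it decrements free_slots in place.
    """
    assignment: Dict[str, str] = {}
    for block, holders in block_locations.items():
        local = [n for n in holders if free_slots.get(n, 0) > 0]
        if local:
            node = local[0]  # data-local: computation moves to the data
        else:
            candidates = [n for n, s in free_slots.items() if s > 0]
            if not candidates:
                raise RuntimeError("no free slots left in the cluster")
            node = candidates[0]  # fallback: block must be read over the network
        free_slots[node] -= 1
        assignment[block] = node
    return assignment


if __name__ == "__main__":
    slots = {"node-a": 1, "node-b": 1, "node-c": 1, "node-d": 0, "node-e": 0}
    plan = schedule_map_tasks(BLOCK_LOCATIONS, slots)
    for block, node in plan.items():
        kind = "local" if node in BLOCK_LOCATIONS[block] else "remote"
        print(block, "->", node, f"({kind})")
```

Here `blk_1` and `blk_2` run on nodes that already store them, while `blk_3` falls back to a remote read because its holders are busy. With NAS, the storage is separate from the compute nodes, so every task would be in the "remote" case.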
September 20, 2018 at 4:03 pm #5730

DataFlair Team
1) NAS stands for network-attached storage, a file-level computer data storage server connected to a computer network that provides network access to a heterogeneous group of clients.
HDFS stands for Hadoop Distributed File System, a Java-based file system that provides scalable and reliable data storage and is designed to span large clusters of commodity hardware.
2) In HDFS, data blocks are distributed across the local drives of all machines in a cluster, whereas in NAS, data is stored on a dedicated server.

3) HDFS runs on commodity hardware, which is cost-effective, whereas NAS is a high-end storage device, which is expensive.

4) HDFS includes features such as rack awareness and data locality, which make it more scalable and efficient than NAS.
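Rack awareness can be illustrated with the default replica placement that the HDFS architecture documentation describes for a replication factor of three: one replica on the writer's node, the other two on two different nodes of a single remote rack. The Python sketch below is a simplification under stated assumptions: the topology and node names are hypothetical, and it ignores node load, free space and failures. When the writer is not a DataNode it simply picks a random node, whereas the real policy prefers a node on the writer's rack.

```python
import random
from typing import Dict, List, Optional


def place_replicas(writer: Optional[str],
                   topology: Dict[str, List[str]],
                   rng: random.Random) -> List[str]:
    """Pick 3 DataNodes for a new block using a simplified rack-aware rule.

    1st replica: the writer's node if it is a DataNode, else a random node
    2nd replica: a node on a different (remote) rack
    3rd replica: a different node on that same remote rack
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    first = writer if writer in rack_of else rng.choice(sorted(rack_of))
    remote_racks = [r for r in topology
                    if r != rack_of[first] and len(topology[r]) >= 2]
    if not remote_racks:
        raise ValueError("need another rack with at least two nodes")
    remote = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote], 2)  # two distinct nodes
    return [first, second, third]


if __name__ == "__main__":
    cluster = {"rack1": ["dn1", "dn2", "dn3"], "rack2": ["dn4", "dn5", "dn6"]}
    print(place_replicas("dn2", cluster, random.Random(42)))
```

With this layout a block survives the loss of an entire rack, and only one copy has to cross between racks while it is being written.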
