
Big Data Technologies on MapReduce and Hadoop

Big Data technologies like MapReduce and Hadoop are foundational for processing and analyzing vast amounts of data. Here’s an overview of both:
Hadoop
Hadoop is an open-source framework that allows for the distributed storage and processing of large datasets across clusters of computers. Its main components include:
1. Hadoop Distributed File System (HDFS): A distributed file system that stores data across multiple machines, providing high-throughput access to application data (a short usage sketch follows this list).
2. YARN (Yet Another Resource Negotiator): The resource management layer of Hadoop, which schedules and allocates resources across the various applications running on the cluster.
3. Hadoop Common: The libraries and utilities that support the other Hadoop modules.
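
To make HDFS concrete, here is a minimal sketch using the Java FileSystem API to write and then read a small file. The NameNode address hdfs://namenode:9000 and the path /user/demo/hello.txt are hypothetical placeholders; in a real deployment the address normally comes from core-site.xml.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; usually picked up from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        FileSystem fs = FileSystem.get(conf);

        // Write a small file. HDFS splits large files into blocks and
        // replicates each block across multiple DataNodes.
        Path path = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("Hello, HDFS!\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back; the client fetches blocks from whichever
        // DataNodes hold replicas.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }

        fs.close();
    }
}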
MapReduce
MapReduce is a programming model and processing engine used within the Hadoop ecosystem to handle large-scale data processing tasks. It consists of two main functions:
1. Map: This phase takes input data, processes it, and transforms it into a set of key-value pairs. Each mapper operates in parallel across the distributed data.
2. Reduce: The reducer takes the output of the mappers and merges those key-value pairs into a smaller set of results. This phase also runs in parallel (the word-count sketch after this list shows both phases).
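
The canonical word-count job illustrates both phases: the mapper emits a (word, 1) pair for every word it sees, and the reducer sums the counts per word. This is a minimal sketch against the standard org.apache.hadoop.mapreduce API; the input and output paths are taken from the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every word in the input split.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts collected for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a JAR, a job like this would be submitted with something along the lines of hadoop jar wordcount.jar WordCount /input /output; the framework handles splitting the input, shuffling the key-value pairs between phases, and retrying failed tasks.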
Key Benefits
• Scalability: Both technologies can scale horizontally, meaning you can add more nodes to handle increased data loads.
• Fault Tolerance: Hadoop is designed to handle hardware failures gracefully, redistributing tasks and data as necessary.
• Flexibility: They can process a variety of data types, including structured, semi-structured, and unstructured data.
Use Cases
• Data Warehousing: Storing and processing large datasets for business intelligence.
• Log Analysis: Processing logs from various systems to gain insights.
• Machine Learning: Training models on large datasets using frameworks like Apache Mahout or Spark.
Ecosystem
Hadoop has a rich ecosystem of tools that enhance its capabilities, including:
• Apache Hive: Data warehouse software that provides SQL-like querying over data stored in HDFS.
• Apache Pig: A high-level platform for creating data-flow programs that run on Hadoop.
• Apache HBase: A NoSQL database that runs on top of HDFS for real-time read/write access to large datasets (see the client sketch after this list).
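
As one concrete taste of the ecosystem, here is a minimal sketch of the HBase Java client performing a single real-time write and read. The table name users and column family info are hypothetical and would need to exist in the cluster beforehand; connection settings are read from hbase-site.xml on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        // Reads the ZooKeeper quorum and other settings from hbase-site.xml.
        Configuration conf = HBaseConfiguration.create();

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) {

            // Real-time write: one row keyed by a user id (hypothetical schema).
            Put put = new Put(Bytes.toBytes("user-001"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                          Bytes.toBytes("Alice"));
            table.put(put);

            // Real-time read of the same row.
            Get get = new Get(Bytes.toBytes("user-001"));
            Result result = table.get(get);
            byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println("name = " + Bytes.toString(name));
        }
    }
}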
Conclusion
MapReduce and Hadoop are integral to the Big Data landscape, enabling organizations to efficiently process and analyze massive datasets. Their open-source nature and extensibility through various tools make them popular choices for data-driven projects.
