Concepts and Terminology
UNIT 4
Big Data Analysis Techniques
Big Data analysis involves processing and interpreting massive datasets to extract
meaningful insights. Various techniques are used depending on the nature of the data
and the objectives of the analysis. The major Big Data analysis techniques include
Quantitative Analysis, Qualitative Analysis, Statistical Analysis, Semantic Analysis, and
Visual Analysis.
Quantitative Analysis
1. Introduction to Quantitative Analysis
Quantitative analysis is a data-driven approach that focuses on numerical data,
mathematical calculations, and statistical techniques to extract meaningful insights from
large datasets. It is widely used in Big Data Analytics to identify patterns, correlations,
and trends, making data-driven decision-making more efficient and reliable.
Quantitative analysis is objective and measurable, making it useful in fields such as
finance, healthcare, business intelligence, and scientific research.
2. Key Features of Quantitative Analysis
Uses Numerical Data: Involves structured datasets such as sales numbers,
financial records, and performance metrics.
Objective and Repeatable: Results can be tested and verified multiple times.
Statistical and Mathematical Methods: Uses probability, regression, hypothesis
testing, and machine learning models.
Predictive Capabilities: Helps in forecasting future trends using historical data.
Automation and Scalability: Can be applied to large datasets using
computational algorithms and machine learning models.
3. Techniques Used in Quantitative Analysis
A. Descriptive Analytics
Focuses on summarizing historical data to understand trends and patterns.
Uses mean, median, mode, variance, and standard deviation to describe
datasets.
Example: A company analyzing monthly sales revenue to track performance over
time.
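As a rough illustration of these summary measures, here is a minimal pandas sketch of the monthly-revenue example; all figures are invented for the demonstration.

```python
import pandas as pd

# Hypothetical monthly sales revenue (in thousands)
revenue = pd.Series([120, 135, 128, 150, 142, 160],
                    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"])

print("Mean:              ", revenue.mean())
print("Median:            ", revenue.median())
print("Variance:          ", revenue.var())   # sample variance
print("Standard deviation:", revenue.std())   # sample standard deviation
print(revenue.describe())  # count, mean, std, min, quartiles, max in one call
```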
B. Predictive Analytics
Uses historical data and statistical models to predict future outcomes.
Techniques include Regression Analysis, Time Series Forecasting, and Machine
Learning Models.
Example: Weather forecasting models that predict temperature based on past
climate data.
C. Prescriptive Analytics
Provides actionable recommendations based on data-driven insights.
Uses advanced optimization algorithms and decision models to suggest the best
course of action.
Example: An e-commerce company adjusting prices dynamically based on
customer demand predictions.
D. Regression Analysis
Examines the relationship between dependent and independent variables.
Used to predict how one factor affects another (e.g., sales vs. advertising spend).
Example: Analyzing the effect of marketing expenditure on sales revenue.
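A minimal scikit-learn sketch of the advertising-versus-sales example above; the numbers are fabricated and the library choice is just one common option.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

ad_spend = np.array([[10], [20], [30], [40], [50]])  # independent variable
sales = np.array([25, 45, 62, 85, 101])              # dependent variable

model = LinearRegression().fit(ad_spend, sales)
print("Slope (sales gained per unit of ad spend):", model.coef_[0])
print("Intercept:", model.intercept_)
print("Predicted sales at spend = 60:", model.predict([[60]])[0])
```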
E. Probability and Statistical Inference
Uses probability theory to predict outcomes and assess uncertainty.
Hypothesis testing, confidence intervals, and Bayesian statistics are key
methods.
Example: A pharmaceutical company testing the effectiveness of a new drug.
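A hedged SciPy sketch of such a test: a two-sample t-test comparing a hypothetical new drug against a control group (all values invented).

```python
from scipy import stats

control = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2]   # e.g., recovery time in days
new_drug = [4.2, 4.5, 4.1, 4.4, 4.3, 4.6]

t_stat, p_value = stats.ttest_ind(new_drug, control)
print("t-statistic:", t_stat)
print("p-value:", p_value)
# A p-value below 0.05 is conventionally taken as statistically significant
if p_value < 0.05:
    print("Reject the null hypothesis: the drug appears more effective.")
```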
4. Applications of Quantitative Analysis in Big Data
A. Finance
Stock market prediction using quantitative trading models.
Fraud detection by analyzing transaction patterns.
B. Healthcare
Predicting disease outbreaks based on historical patient records.
Analyzing drug effectiveness using statistical experiments.
C. Business Intelligence
Customer segmentation for targeted marketing campaigns.
Sales forecasting for inventory management.
D. Social Media Analytics
Analyzing engagement metrics such as likes, shares, and comments.
Sentiment analysis using numerical sentiment scores.
E. Manufacturing and Supply Chain
Optimizing logistics using demand forecasting models.
Predicting machine failures using sensor data.
5. Tools and Technologies Used
Python (Pandas, NumPy, Scikit-learn) – For data analysis and machine learning.
R – Statistical computing and visualization.
SQL – Querying large structured datasets.
Power BI & Tableau – Data visualization tools for business intelligence.
Hadoop & Spark – Big Data frameworks for large-scale data processing.
6. Challenges in Quantitative Analysis
Data Quality Issues: Inaccurate or incomplete data can affect results.
Scalability: Processing extremely large datasets requires high-performance
computing.
Interpretability: Complex models may provide insights, but understanding their
decision-making process can be challenging.
Bias in Data: Historical data may contain biases that impact predictions.
Qualitative Analysis
1. Introduction to Qualitative Analysis
Qualitative analysis in Big Data refers to the process of analyzing non-numerical data,
such as text, images, videos, and social media interactions, to extract meaningful
insights. Unlike quantitative analysis, which focuses on numerical data, qualitative
analysis is more subjective, interpretive, and exploratory. It is widely used in fields like
marketing, social sciences, healthcare, and customer experience research.
2. Key Features of Qualitative Analysis
Focuses on Unstructured Data: Analyzes text, speech, images, videos, and social
media content.
Subjective and Contextual: Interpretation depends on human understanding and
cultural factors.
Exploratory Approach: Often used to uncover hidden patterns, themes, and
sentiments.
Uses Natural Language Processing (NLP): AI-driven techniques help analyze text
and speech data.
Case-Specific Insights: More useful for understanding customer behavior, brand
perception, and emotions.
3. Techniques Used in Qualitative Analysis
A. Sentiment Analysis (Opinion Mining)
Identifies emotions and opinions in text data (e.g., positive, negative, neutral
sentiments).
Uses Natural Language Processing (NLP) and Machine Learning (ML) algorithms.
Example: Analyzing Twitter comments to gauge customer satisfaction with a
product.
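A small sketch using VADER (one of the tools listed later in this section); the example tweets are invented.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
tweets = [
    "I absolutely love this product!",
    "Worst purchase ever, totally disappointed.",
    "The package arrived on Tuesday.",
]
for tweet in tweets:
    scores = analyzer.polarity_scores(tweet)
    # 'compound' ranges from -1 (most negative) to +1 (most positive)
    print(f"{scores['compound']:+.2f}  {tweet}")
```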
B. Thematic Analysis
Identifies common themes and patterns in text data.
Often used in research, interviews, and social media analysis.
Example: Analyzing customer feedback to find recurring complaints about a
service.
C. Content Analysis
Systematically categorizes and interprets textual, visual, or audio data.
Uses coding techniques to classify words, phrases, and patterns.
Example: Studying political speeches to identify recurring themes in a leader’s
communication.
D. Discourse Analysis
Examines language, communication styles, and contextual meanings.
Often used in media, linguistics, and social science research.
Example: Analyzing newspaper articles to understand media bias in reporting.
E. Social Media Analytics
Examines social media interactions (likes, shares, comments, hashtags) to
understand trends.
Uses text mining and NLP to process large-scale social data.
Example: Analyzing viral trends on Instagram to understand audience
engagement.
F. Image and Video Analysis
Uses AI and computer vision to analyze visual content.
Identifies objects, scenes, emotions, and actions in images/videos.
Example: Facial recognition software identifying emotions in customer reaction
videos.
4. Applications of Qualitative Analysis in Big Data
A. Marketing and Brand Analysis
Understanding consumer perception through social media and customer reviews.
Analyzing brand sentiment to improve marketing strategies.
B. Healthcare and Patient Feedback
Studying doctor-patient conversations to improve healthcare services.
Analyzing social media discussions about diseases to track outbreaks.
C. Business Intelligence
Evaluating employee feedback and workplace sentiment to enhance HR policies.
Understanding competitor strategies by analyzing news and social media content.
D. Political and Media Analysis
Identifying political sentiment before elections.
Analyzing news bias and misinformation in digital media.
E. Customer Support Optimization
Analyzing chatbot and call center interactions to improve customer service.
Understanding customer emotions to personalize responses.
5. Tools and Technologies Used
Natural Language Processing (NLP) Tools: NLTK, SpaCy, BERT, GPT models.
Sentiment Analysis Tools: VADER, TextBlob, IBM Watson.
Social Media Analytics Tools: Hootsuite, Brandwatch, Sprout Social.
Computer Vision Tools: OpenCV, TensorFlow, AWS Rekognition.
Data Visualization Tools: Tableau, Power BI, Python (Matplotlib, Seaborn).
6. Challenges in Qualitative Analysis
Subjectivity in Interpretation: Results can vary based on human biases.
Complexity of Unstructured Data: Requires advanced AI and NLP models.
Scalability Issues: Analyzing large-scale text and media data can be
computationally expensive.
Contextual Understanding: Words and images may have different meanings
based on cultural or situational factors.
Statistical Analysis
1. Introduction to Statistical Analysis
Statistical analysis in Big Data involves applying mathematical techniques to analyze and
interpret large datasets. It helps in identifying patterns, relationships, trends, and outliers
in the data. Statistical analysis is widely used in various fields such as finance, healthcare,
business intelligence, social sciences, and artificial intelligence.
Unlike qualitative analysis, which focuses on non-numeric data, statistical analysis deals
with numerical and structured data to derive insights using probability, distributions,
and inferential techniques.
2. Types of Statistical Analysis
A. Descriptive Statistical Analysis
Summarizes and describes features of a dataset.
Uses measures like mean, median, mode, standard deviation, variance, range,
and frequency distributions.
Example: Calculating the average income of employees in a company.
B. Inferential Statistical Analysis
Makes predictions or inferences about a larger population based on a sample.
Uses techniques like hypothesis testing, confidence intervals, and regression
analysis.
Example: Predicting election results based on exit poll data from a sample of
voters.
C. Predictive Statistical Analysis
Uses historical data to predict future trends.
Involves techniques like regression models, time-series forecasting, and machine
learning algorithms.
Example: Predicting stock market trends based on past data.
D. Prescriptive Statistical Analysis
Suggests the best course of action based on the analyzed data.
Uses decision trees, optimization algorithms, and simulation techniques.
Example: Recommending the best marketing strategy based on customer
purchase patterns.
E. Exploratory Data Analysis (EDA)
Helps in identifying patterns, trends, and relationships in data before applying
complex models.
Uses data visualization, scatter plots, and correlation analysis.
Example: Analyzing customer transaction data to find seasonal purchasing trends.
F. Bayesian Statistical Analysis
Uses Bayes' theorem to update probabilities as new data becomes available.
Example: Spam filters in email services use Bayesian probability to classify
messages as spam or not spam.
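A worked numeric sketch of the spam example; the probabilities are made up purely to show how Bayes' theorem combines a prior with new evidence.

```python
p_spam = 0.40        # P(spam): prior fraction of messages that are spam
p_word_spam = 0.60   # P("offer" appears | spam)
p_word_ham = 0.05    # P("offer" appears | not spam)

# Law of total probability: P("offer")
p_word = p_word_spam * p_spam + p_word_ham * (1 - p_spam)

# Bayes' theorem: P(spam | "offer")
p_spam_given_word = p_word_spam * p_spam / p_word
print(f"P(spam | 'offer') = {p_spam_given_word:.3f}")  # ~0.889
```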
3. Key Statistical Techniques in Big Data Analysis
A. Measures of Central Tendency
Mean (Average): The sum of all values divided by the number of values.
Median: The middle value in a sorted dataset.
Mode: The most frequently occurring value in a dataset.
Example: The mean age of customers visiting a shopping mall.
B. Measures of Dispersion (Variability in Data)
Range: The difference between the highest and lowest values.
Variance: Measures the spread of data points from the mean.
Standard Deviation: The square root of variance, showing how much values
deviate from the mean.
Example: Analyzing the variation in exam scores of students in a university.
C. Correlation Analysis
Determines the strength and direction of the relationship between two variables.
Positive correlation: When one variable increases, the other also increases.
Negative correlation: When one variable increases, the other decreases.
No correlation: No relationship between variables.
Example: Correlation between temperature and ice cream sales.
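The temperature-and-ice-cream example, sketched with pandas on fabricated values:

```python
import pandas as pd

df = pd.DataFrame({
    "temperature": [18, 22, 25, 28, 31, 34],
    "ice_cream_sales": [120, 150, 180, 210, 260, 300],
})
r = df["temperature"].corr(df["ice_cream_sales"])  # Pearson's r by default
print(f"Pearson correlation: {r:.3f}")  # close to +1: strong positive link
```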
D. Regression Analysis
Predicts the relationship between dependent and independent variables.
Linear Regression: Predicts outcomes using a straight-line relationship.
Multiple Regression: Uses multiple independent variables to predict the
outcome.
Logistic Regression: Used for classification problems (e.g., predicting whether a
customer will buy a product or not).
Example: Predicting house prices based on area, number of bedrooms, and
location.
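A short scikit-learn sketch of the classification case: whether a customer buys (1) or not (0) given how many pages they viewed; the data is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

pages_viewed = np.array([[1], [2], [3], [5], [7], [8], [10], [12]])
purchased = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(pages_viewed, purchased)
print("P(buy | 6 pages):", clf.predict_proba([[6]])[0, 1])
print("Predicted class: ", clf.predict([[6]])[0])
```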
E. Hypothesis Testing
Determines if an assumption about a dataset is statistically significant.
Null Hypothesis (H₀): No effect or relationship exists.
Alternative Hypothesis (H₁): There is a significant effect or relationship.
Uses tests like T-test, Chi-square test, and ANOVA (Analysis of Variance).
Example: Testing if a new drug is more effective than an existing one.
F. Time Series Analysis
Analyzes data collected over time to identify trends and seasonality.
Techniques include moving averages, ARIMA models, and exponential
smoothing.
Example: Forecasting sales of an online store based on past sales trends.
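A minimal pandas sketch of one such technique, a three-month moving average over invented sales figures:

```python
import pandas as pd

sales = pd.Series([200, 220, 210, 250, 240, 280, 270, 310],
                  index=pd.date_range("2024-01-01", periods=8, freq="MS"))

smoothed = sales.rolling(window=3).mean()  # 3-month moving average
print(pd.DataFrame({"sales": sales, "3-month avg": smoothed}))
```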
G. Outlier Detection
Identifies data points that are significantly different from the rest of the dataset.
Uses techniques like Z-score, IQR (Interquartile Range), and Boxplots.
Example: Detecting fraudulent transactions in banking.
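Both rules sketched in NumPy on a toy set of transaction amounts (the 500 is planted as the anomaly):

```python
import numpy as np

amounts = np.array([25, 30, 28, 32, 27, 29, 31, 500])

# Z-score rule: flag points far from the mean in standard-deviation units
# (a cutoff of 3 is common for large samples; 2.5 suits this tiny one)
z = (amounts - amounts.mean()) / amounts.std()
print("Z-score outliers:", amounts[np.abs(z) > 2.5])

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(amounts, [25, 75])
iqr = q3 - q1
mask = (amounts < q1 - 1.5 * iqr) | (amounts > q3 + 1.5 * iqr)
print("IQR outliers:", amounts[mask])
```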
4. Applications of Statistical Analysis in Big Data
A. Business and Finance
Forecasting stock market trends.
Risk assessment in investment and insurance.
Customer segmentation for targeted marketing.
B. Healthcare and Medicine
Analyzing patient records for disease predictions.
Clinical trial analysis for drug effectiveness.
Epidemic and pandemic outbreak prediction.
C. Social Media and Marketing
Sentiment analysis for brand perception.
Analyzing consumer behavior and preferences.
Predicting viral trends and user engagement.
D. Supply Chain and Logistics
Demand forecasting for inventory management.
Route optimization for delivery services.
Supplier risk analysis for business continuity.
E. Government and Policy Making
Census data analysis for urban planning.
Crime rate prediction for law enforcement.
Economic forecasting for policy decisions.
5. Tools and Technologies for Statistical Analysis
Programming Languages: Python (NumPy, Pandas, Statsmodels, SciPy), R, SAS
Data Visualization: Tableau, Power BI, Matplotlib, Seaborn
Machine Learning Frameworks: TensorFlow, Scikit-Learn
Big Data Platforms: Apache Spark, Hadoop, Google BigQuery
6. Challenges in Statistical Analysis
Data Quality Issues: Missing values, inconsistencies, and errors in large datasets.
Scalability: Handling massive datasets efficiently.
Computational Complexity: Processing time-consuming models.
Interpretability: Understanding and explaining complex statistical results.
Bias and Sampling Errors: Incorrect inferences due to biased or unrepresentative
samples.
Semantic Analysis
1. Introduction to Semantic Analysis
Semantic analysis is a Natural Language Processing (NLP) technique that helps computers
understand the meaning, intent, and context of words, phrases, and sentences in textual
data. It goes beyond basic keyword-based analysis to determine the true meaning of a
text based on linguistic structure, relationships, and context.
Big data systems use semantic analysis to extract insights from unstructured data sources
such as social media, emails, blogs, customer reviews, and research papers. It is widely
used in search engines, chatbots, sentiment analysis, machine translation, and
knowledge graphs.
2. Key Features of Semantic Analysis
Contextual Understanding: Determines meaning based on the relationship
between words.
Disambiguation: Differentiates between multiple meanings of a word (e.g.,
"bank" as a financial institution vs. a riverbank).
Named Entity Recognition (NER): Identifies names of people, places, companies,
etc.
Sentiment Detection: Understands emotional tone behind words.
Topic Modeling: Identifies main topics in large datasets.
3. Types of Semantic Analysis
A. Lexical Semantics
Focuses on individual words and their meanings.
Examines synonyms, antonyms, homonyms, hypernyms (broader categories), and
hyponyms (specific subcategories).
Example: Understanding that "big" and "large" have similar meanings in a given
context.
B. Compositional Semantics
Focuses on sentence-level meaning by analyzing grammatical structure and
relationships between words.
Example: The phrase "The cat sat on the mat" conveys a different meaning from
"The mat sat on the cat".
4. Techniques Used in Semantic Analysis
A. Named Entity Recognition (NER)
Identifies important names, locations, dates, and organizations in text.
Example: "Apple is launching a new iPhone in California" → (Apple: Company,
iPhone: Product, California: Location).
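A hedged spaCy sketch of the same sentence; it assumes the small English model has been installed (python -m spacy download en_core_web_sm).

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is launching a new iPhone in California.")

for ent in doc.ents:
    print(ent.text, "->", ent.label_)  # e.g., Apple -> ORG, California -> GPE
```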
B. Word Sense Disambiguation (WSD)
Differentiates between multiple meanings of a word based on context.
Example: "I went to the bank to deposit money" vs. "The boat reached the bank
of the river."
C. Relationship Extraction
Identifies relationships between different entities in a text.
Example: "Elon Musk is the CEO of Tesla." (Identifies CEO as a relationship
between Elon Musk and Tesla).
D. Sentiment Analysis
Determines whether a piece of text conveys positive, negative, or neutral
emotions.
Example: "I love this movie" (Positive) vs. "This product is terrible" (Negative).
E. Latent Semantic Analysis (LSA)
Identifies hidden relationships between words in a large dataset using
mathematical techniques like Singular Value Decomposition (SVD).
Example: Analyzing customer reviews to detect frequently occurring topics.
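A compact scikit-learn sketch of LSA: TF-IDF vectors reduced with truncated SVD to surface latent topics in a toy review corpus (the reviews are invented).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

reviews = [
    "battery life is great, the battery lasts all day",
    "terrible battery, the battery drains fast",
    "camera quality is great, photos look sharp",
    "the camera takes blurry photos",
]
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(reviews)

svd = TruncatedSVD(n_components=2, random_state=0)  # 2 latent "topics"
svd.fit(X)

terms = tfidf.get_feature_names_out()
for i, component in enumerate(svd.components_):
    top = component.argsort()[-3:][::-1]  # three strongest terms per topic
    print(f"Topic {i}:", [terms[j] for j in top])
```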
5. Applications of Semantic Analysis in Big Data
A. Search Engines (Google, Bing, etc.)
Helps improve search accuracy by understanding intent behind queries.
Example: Searching “best laptops for students” returns laptops suited to
student needs rather than all laptops.
B. Chatbots and Virtual Assistants (Siri, Alexa, etc.)
Understands and responds to human queries with context-aware answers.
Example: A chatbot understanding "I need a flight to New York next Monday" and
booking accordingly.
C. Sentiment Analysis for Business and Marketing
Analyzes customer reviews, social media comments, and feedback to determine
public opinion.
Example: Tracking Twitter reactions to a product launch.
D. Fraud Detection and Cybersecurity
Identifies suspicious patterns and phishing attempts by analyzing emails and
messages.
Example: Detecting spam emails offering fake discounts.
E. Healthcare and Medical Research
Extracts relevant medical information from research papers and patient records.
Example: Identifying symptoms and disease relationships from doctor’s notes.
6. Tools and Technologies Used
NLP Libraries: SpaCy, NLTK, BERT, Word2Vec
Sentiment Analysis Tools: VADER, TextBlob, IBM Watson
Search Engines: Elasticsearch, Apache Solr
Big Data Platforms: Apache Hadoop, Spark NLP
AI-based Chatbots: Google Dialogflow, Microsoft Bot Framework
7. Challenges in Semantic Analysis
Ambiguity: Words with multiple meanings can lead to misinterpretation.
Context Sensitivity: Cultural and regional variations in language.
Large-Scale Processing: Analyzing massive datasets requires high computational
power.
Evolving Language Trends: Slang, emojis, and new words require constant
updates.
Visual Analysis
1. Introduction to Visual Analysis
Visual Analysis is the process of extracting, interpreting, and analyzing information from
images, videos, graphs, and other visual data formats. Unlike traditional data analysis,
which focuses on numerical or textual data, visual analysis helps in identifying patterns,
trends, and insights through graphical representation.
In the context of Big Data, where massive volumes of images, videos, and infographics
are generated daily, visual analysis techniques play a crucial role in areas like computer
vision, medical imaging, surveillance, social media monitoring, and business
intelligence.
2. Importance of Visual Analysis in Big Data
Better Understanding of Complex Data: Converts large datasets into easy-to-
understand visual representations.
Pattern Recognition: Identifies hidden trends that may not be visible in raw data.
Real-Time Decision Making: Helps organizations make quick and informed
decisions based on live visual data.
Enhanced User Experience: Provides interactive dashboards and reports for
better insights.
3. Types of Visual Analysis
A. Image and Video Analysis
Focuses on extracting information from images and videos using computer vision
techniques.
Example: Facial recognition in security systems, medical imaging (X-rays, MRI
scans).
B. Graphical Data Visualization
Represents structured and unstructured data visually using graphs, charts, maps,
and dashboards.
Example: Stock market trends, customer analytics, heatmaps in business
intelligence.
C. Interactive Visualization
Allows users to explore and manipulate data visually in real-time dashboards.
Example: Google Analytics, Tableau, Microsoft Power BI.
D. 3D and Augmented Reality (AR) Visualization
Used in gaming, simulations, architecture, and scientific research.
Example: 3D medical scans, AR in retail and e-commerce.
4. Techniques Used in Visual Analysis
A. Image Processing and Computer Vision
Feature Extraction: Identifies edges, colors, shapes, and textures in images.
Object Detection: Recognizes objects in images/videos (e.g., self-driving cars
detecting pedestrians).
Facial Recognition: Identifies individuals in photos or surveillance footage.
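A minimal OpenCV sketch of feature extraction via Canny edge detection; "photo.jpg" is a placeholder path, not a file from this course.

```python
import cv2

img = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)  # load as grayscale
if img is None:
    raise FileNotFoundError("photo.jpg not found")

edges = cv2.Canny(img, 100, 200)  # lower/upper hysteresis thresholds
cv2.imwrite("photo_edges.jpg", edges)
print("Edge pixels found:", int((edges > 0).sum()))
```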
B. Data Visualization Techniques
Charts and Graphs: Line charts, bar charts, histograms, scatter plots.
Heatmaps: Shows intensity variations across a geographical area or dataset.
Network Graphs: Displays relationships between entities (e.g., social network
connections).
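Two of these chart types sketched with matplotlib on made-up values:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.bar(["Q1", "Q2", "Q3", "Q4"], [120, 150, 90, 180])  # quarterly sales
ax1.set_title("Bar chart")

rng = np.random.default_rng(0)
im = ax2.imshow(rng.random((5, 5)), cmap="hot")  # intensity grid
ax2.set_title("Heatmap")
fig.colorbar(im, ax=ax2)

plt.tight_layout()
plt.show()
```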
C. Machine Learning & AI in Visual Analysis
Deep Learning Models: Convolutional Neural Networks (CNNs) for image
recognition.
Natural Language Processing (NLP) + Visual Data: Combining text analysis with
images (e.g., image captions).
Anomaly Detection: Detecting fraud or unusual patterns in visual data (e.g.,
security surveillance).
D. Augmented Reality (AR) and Virtual Reality (VR)
AR Applications: Virtual try-on in e-commerce, AR navigation in maps.
VR Simulations: Used in medical training, flight simulations, and immersive data
exploration.
5. Applications of Visual Analysis in Big Data
A. Healthcare and Medical Imaging
MRI and X-ray Analysis: AI-assisted diagnosis of diseases from medical scans.
Microscopic Image Analysis: Identifying bacteria, viruses, or abnormalities in
biological samples.
B. Business Intelligence and Market Analytics
Customer Behavior Tracking: Heatmaps and dashboards in e-commerce (e.g.,
Amazon, Flipkart).
Sales Forecasting: Interactive charts showing sales trends over time.
C. Social Media and Sentiment Analysis
Trend Analysis: Identifying viral content from images, memes, and videos.
Fake News Detection: Analyzing manipulated images or deepfake videos.
D. Security and Surveillance
Facial Recognition Systems: Used in airports, public places, and smart homes.
Anomaly Detection: Identifying suspicious activities from CCTV footage.
E. Agriculture and Remote Sensing
Satellite Image Analysis: Monitoring crop health, deforestation, and climate
change.
Drone-Based Analysis: Assessing soil conditions and farm productivity.
6. Tools and Technologies Used
Image and Video Processing: OpenCV, TensorFlow, PyTorch
Data Visualization Tools: Tableau, Power BI, Google Data Studio
AI-Based Analysis: Google Vision API, IBM Watson Visual Recognition
Geospatial Analysis: ArcGIS, Google Earth Engine
7. Challenges in Visual Analysis
Handling Large-Scale Data: Processing high-resolution images and videos
requires powerful computational resources.
Data Privacy Issues: Facial recognition and surveillance raise ethical concerns.
Complexity in Interpretation: Requires expertise to analyze and interpret visual
data accurately.
Real-Time Processing Needs: AI-driven applications must process data instantly
(e.g., self-driving cars).
Introduction to Hadoop
Hadoop is an open-source framework developed by Apache for storing and processing
massive amounts of data in a distributed and fault-tolerant manner. It is designed to
handle Big Data efficiently by breaking down large datasets and processing them in
parallel across multiple nodes in a cluster. Hadoop is widely used in industries such as e-
commerce, finance, healthcare, and social media for large-scale data analysis.
Key Features of Hadoop
Scalability – Can expand by adding more nodes without major reconfiguration.
Fault Tolerance – Data blocks are replicated across multiple nodes, so a node failure rarely causes data loss.
Cost-Effective – Runs on commodity hardware, reducing infrastructure costs.
Flexibility – Handles structured, semi-structured, and unstructured data.
High Availability – Even if some nodes fail, data processing continues seamlessly.
Core Components of Hadoop
1. Hadoop Distributed File System (HDFS)
o A distributed storage system that splits large files into smaller blocks and
stores them across multiple machines.
o Uses a master-slave architecture, where the NameNode manages metadata
and DataNodes store actual data.
o Ensures fault tolerance through data replication across different nodes.
2. MapReduce
o A programming model for processing large-scale data in parallel.
o Works in two phases:
Map Phase – Breaks data into key-value pairs and distributes it for
processing.
Reduce Phase – Aggregates and summarizes the results.
o Efficient for batch processing but slower than newer in-memory engines
such as Spark.
3. YARN (Yet Another Resource Negotiator)
o Manages system resources and job scheduling.
o Allows multiple applications (like Spark, Hive, etc.) to run on the same
Hadoop cluster.
4. Hadoop Common
o Provides essential libraries and utilities required for all other Hadoop
modules.
Hadoop Ecosystem and Tools
Hadoop is not just a single framework; it includes a variety of tools that enhance its
functionality:
Hive – Provides an SQL-like interface to query large datasets stored in Hadoop.
Pig – A scripting language that simplifies complex data transformation tasks.
HBase – A NoSQL database that supports real-time data access.
Spark – An in-memory processing framework that is much faster than MapReduce.
Sqoop – Helps transfer data between Hadoop and relational databases.
Flume – Collects and moves large amounts of log data into Hadoop.
Applications of Hadoop
Hadoop is widely used across various industries for data-driven decision-making:
Social Media – Platforms like Facebook and Twitter use Hadoop to analyze user
behavior.
E-Commerce – Helps track customer preferences and improve product
recommendations.
Finance – Used for fraud detection, risk analysis, and real-time transaction
monitoring.
Healthcare – Assists in processing patient records, genomic data, and medical
imaging.
Smart Cities – Analyzes sensor data for traffic management and energy optimization.
Advantages of Hadoop
Handles Large-Scale Data – Can process petabytes of data efficiently.
Cost-Effective – Runs on low-cost hardware, reducing IT expenses.
Open Source – Constantly evolving with community support.
Supports Multiple Data Types – Works with structured, semi-structured, and
unstructured data.
Parallel Processing – Divides workload across multiple nodes, ensuring faster
execution.
Challenges of Hadoop
Complex Setup and Maintenance – Requires expertise for installation and
configuration.
Security Issues – As a distributed system, data security and access control need to be
managed carefully.
High Resource Consumption – Running Hadoop clusters demands significant
computational power and storage.
Not Suitable for Real-Time Processing – MapReduce is batch-oriented and slower
than in-memory engines such as Spark.
MapReduce: A Distributed Data Processing Model
MapReduce is a programming model and processing framework in Hadoop that enables
parallel computation of large datasets across a distributed cluster of computers. It
follows a divide-and-conquer approach where data is processed in two main stages:
Map and Reduce.
How MapReduce Works
1. Map Phase
o The input dataset is split into smaller chunks and distributed across multiple
nodes.
o Each node processes its assigned data and converts it into intermediate key-
value pairs.
2. Shuffle & Sort Phase
o The intermediate results are shuffled and sorted to group similar keys
together.
3. Reduce Phase
o The grouped key-value pairs are processed, aggregated, or summarized to
produce the final output.
Example of MapReduce
If we need to count the number of occurrences of words in a document:
Map Function – Reads the text and outputs (word, 1) pairs.
Shuffle & Sort – Groups similar words together.
Reduce Function – Sums up the counts for each word to get the final result.
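The same word count sketched in the Hadoop Streaming style, where the mapper and reducer are plain scripts reading standard input; this is an illustration, not a tuned production job.

```python
# mapper.py -- emit (word, 1) for every word read from stdin
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
# reducer.py -- Hadoop Streaming sorts mapper output by key, so equal
# words arrive consecutively and can be summed with one running counter
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.strip().split("\t", 1)
    if word == current_word:
        count += int(value)
    else:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")
```

The two scripts would be passed to the hadoop-streaming JAR via its -mapper and -reducer options; the exact invocation depends on the cluster's Hadoop version.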
Advantages of MapReduce
Enables processing of large-scale data in a distributed manner.
Provides fault tolerance by re-executing failed tasks and relying on HDFS data replication.
Works well for batch processing tasks like log analysis and ETL processing.
Limitations of MapReduce
Not suitable for real-time analytics due to batch-oriented processing.
High disk I/O overhead, since intermediate results are written to and read from disk between the map and reduce phases.
Complex to develop and maintain compared to modern data frameworks like Spark.