Big Data Management
Submitted to:
Dr. Ankush Maind
Assistant Professor
LMTSM
We use the MapReduce function to calculate the total number of flights cancelled at each
airport over June 2003-2004, representing the data as key/value pairs.
MapReduce is a processing technique and programming model for distributed computing,
commonly implemented in Java. The MapReduce algorithm contains two important tasks, namely
Map and Reduce. Map takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (key/value pairs). The Reduce task then takes
the output from a map as its input and combines those data tuples into a smaller set of tuples.
As the name MapReduce implies, the reduce task is always performed after the map job.
The key represents the airport name/code, and the value represents the number of flights
cancelled in that month.
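For instance, with three hypothetical cancelled-flight records, the map step might emit
("ATL", 1), ("ATL", 1) and ("ORD", 1); the shuffle step groups these into ("ATL", [1, 1]) and
("ORD", [1]); and the reduce step sums each group to give ("ATL", 2) and ("ORD", 1).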
The input data is taken from the Excel/CSV file attached to this mail, and a snapshot of the
output produced after applying the MapReduce function is shown below.
The Algorithm:
The MapReduce paradigm is generally based on sending the computation to where the data
resides. A MapReduce program executes in three stages, namely the map stage, the shuffle
stage, and the reduce stage.
Map stage − The map or mapper's job is to process the input data. Generally, the input data is in
the form of a file or directory and is stored in the Hadoop Distributed File System (HDFS). The
input file is passed to the mapper function line by line. The mapper processes the data and
creates several small chunks of data.
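As a concrete illustration, here is a minimal Mapper sketch for our cancellation count. The
column layout assumed below (airport code in the first field, a 0/1 cancellation flag in the
second) is hypothetical; the indices would need to be adjusted to match the actual attached file.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Minimal Mapper sketch: emits (airportCode, cancelledFlag) for every input line.
// Assumed CSV layout: airport code in column 0, 0/1 cancellation flag in column 1.
public class CancellationMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Text airport = new Text();
    private final IntWritable cancelled = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length < 2) {
            return; // skip malformed lines
        }
        try {
            airport.set(fields[0].trim());                      // e.g. "ATL"
            cancelled.set(Integer.parseInt(fields[1].trim()));  // 1 if cancelled, else 0
            context.write(airport, cancelled);
        } catch (NumberFormatException e) {
            // header row or bad record: ignore
        }
    }
}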
Reduce stage − This stage is the combination of the shuffle stage and the reduce stage proper.
The Reducer's job is to process the data that comes from the mapper. After processing, it
produces a new set of output, which is stored in HDFS.
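A matching Reducer sketch simply sums the 0/1 flags emitted by the mapper, producing the total
number of cancelled flights per airport:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Minimal Reducer sketch: sums the flags grouped under each airport code.
public class CancellationReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text airport, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        total.set(sum);
        context.write(airport, total); // (airport code, total cancellations)
    }
}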
During a MapReduce job, Hadoop sends the Map and Reduce tasks to the appropriate servers
in the cluster.
The framework manages all the details of data-passing such as issuing tasks, verifying task
completion, and copying data around the cluster between the nodes.
Most of the computing takes place on nodes with data on local disks, which reduces the network
traffic.
After completion of the given tasks, the cluster collects and reduces the data to form an
appropriate result, and sends it back to the Hadoop server.
MapReduce Algorithm
Inputs and Outputs (Java Perspective)
The MapReduce framework operates on <key, value> pairs, that is, the framework views the
input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the
output of the job, conceivably of different types.
The key and value classes have to be serializable by the framework and hence need to
implement the Writable interface. Additionally, the key classes have to implement the
WritableComparable interface to facilitate sorting by the framework. The input and output types
of a MapReduce job are: (Input) <k1, v1> → map → <k2, v2> → reduce → <k3, v3> (Output).
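A minimal driver sketch, assuming the two classes sketched earlier, shows how these types line
up: <k1, v1> is <LongWritable, Text> (line offset and line), while <k2, v2> and <k3, v3> are both
<Text, IntWritable> (airport code and count).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Minimal driver sketch wiring the mapper and reducer together.
public class CancellationCount {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "flight cancellation count");
        job.setJarByClass(CancellationCount.class);
        job.setMapperClass(CancellationMapper.class);
        job.setCombinerClass(CancellationReducer.class); // pre-aggregate on each node
        job.setReducerClass(CancellationReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input CSV in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Reusing the reducer as a combiner is safe here because summing counts is associative and
commutative, so partial sums computed on each node yield the same final totals.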
Terminology
PayLoad − Applications implement the Map and the Reduce functions, and form the core of the
job.
Mapper − Mapper maps the input key/value pairs to a set of intermediate key/value pairs.
NameNode − Node that manages the Hadoop Distributed File System (HDFS).
DataNode − Node where the data resides in advance, before any processing takes place.
MasterNode − Node where the JobTracker runs and which accepts job requests from clients.
JobTracker − Schedules jobs and tracks the assigned jobs on the TaskTrackers.
★ OUTPUT -
As we can see, Atlanta (ATL) airport has the maximum number of flights cancelled
in a year.
Thus, the authorities can use this information, together with other factors, to reduce the
inefficiencies that lead to flight cancellations and to offer proper reconsideration or
compensation/refunds, so that in the future passengers are not subjected to these problems.