Machine Learning for Flight Scheduling
Supervisor:
Reza Malekian
Examiner:
Johan Holmgren
Efficient flight schedules with utilizing Machine Learning
prediction algorithms
Contact information
Author:
Mashhood Vandehzad
E-mail: vandehzad@gmail.com
Supervisor:
Reza Malekian
E-mail: reza.malekian@mau.se
Malmö University, Department of Computer Science
Examiner:
Johan Holmgren
E-mail: johan.holmgren@mau.se
Malmö University, Department of Computer Science
Abstract
As data becomes ever more pervasive in everyday life, businesses increasingly
rely on data, and Big Data in particular, in their decision-making and
analytical processes to increase the efficiency of their products. The
software applications used in the airline industry are among the most complex
and sophisticated, and data analysis techniques can make many of their
decision-making processes easier and faster. Flight delays are one of the
most important subjects of investigation in this field because they impose
substantial overhead costs on airline companies on the one hand and on
airports on the other. The aim of this study is to apply different machine
learning algorithms to real-world data in order to predict flight delays from
all causes, such as weather, passenger delays, maintenance and airport
congestion, and thereby create more efficient flight schedules. We use Python
as the programming language to create an artifact for our prediction
purposes. We analyse the different algorithms from the perspective of
accuracy and propose a combined method to optimize our prediction results.
Acknowledgement
The fulfillment of this research study would not have been possible without
the constant guidance and help of certain people.
I would also like to thank Nils Genell, Paria Aghaeifar and Niklas Nordin
from Aviolinx for the opportunity to carry out this research study with
them.
Last but certainly not least, I would like to thank my parents, my family
and my friends for their unconditional support and for inspiring and
motivating me along my path.
Contents
1 Introduction
1.1 Research Questions
1.2 Goals
1.3 Expected Results
1.4 Motivation
1.5 Outline
2 Research Methodology
2.1 Research Method
3 Literature Review
3.1 Search Keywords
3.2 Data
3.3 Machine Learning (ML)
3.4 Airport Slots
3.5 Slot Message
3.6 Airport Slot Allocation
3.7 Statistics
4 Method
4.1 Description
4.2 Datasets
4.3 Libraries
4.4 Algorithms
4.5 Models
List of Acronyms
AI Artificial Intelligence
CM Combined Method
GFR Grandfather Rights
LR Linear Regression
ML Machine Learning
PA Prediction Accuracy
PE Prediction Error
RBF Radial Basis Function
SVM Support Vector Machine
SVR Support Vector Regression
Chapter 1
Introduction
We live in a world where massive amounts of data are generated every day
and data collections are growing exponentially. Converting raw data into
meaningful, usable information can be a sophisticated process in which
technology plays a very important role. Today we can apply machine learning
techniques to different types of data in order to create new knowledge from
data sources that were not being used before.
Because we want to give machines the learning ability of the human brain,
and because these techniques are becoming an inseparable part of the latest
information systems developed worldwide, the implementation of machine
learning techniques is one of the most actively researched areas today,
especially in software-intensive companies. One of the main uses of machine
learning is prediction. Predicting future events helps people prepare to
react more effectively and efficiently and makes decision making easier.
Examples of software systems that can benefit from prediction methods are
stock market prediction systems and airline flight scheduling systems.
This thesis project was carried out in collaboration with Aviolinx, a
company in Malmö, Sweden, that develops software for the
airline industry. Currently they have not utilized any machine learning tech-
1.2 Goals
The aim of this project is to study how effectively machine learning
techniques can contribute to airline industry scheduling systems by
forecasting flight delays. Airport coordinators encounter problems with
airline companies every day because of flight delays and the continuous need
to change and revise flight Gantt schedules. We therefore propose to develop
an artifact that uses machine learning to predict future delays. This
artifact may reduce the schedule revisions caused by flight delays and
consequently reduce costs. Since we evaluate four different algorithms, we
will analyse and compare the precision of each implemented algorithm using
graphs, charts and tables, so that the reader
of this research study can see which algorithms are most suitable for use
in flight delay prediction systems.
1.4 Motivation
The most important motivation for this project is to help airline industry
software manufacturers use machine learning [1] algorithms inside their
software products and so give those products prediction capabilities. Their
customers, the airline companies, would then have near-future forecasts and
could use this knowledge to prevent delays from different causes in their
schedules. Delays impose a variety of costs on airline companies, so even a
small reduction in delays can have a large effect on company revenue.
1.5 Outline
This master thesis consists of six chapters. The first chapter introduces
the thesis project, the process of completing it, and the goals and
motivation behind it. The second chapter explains the research methodology
chosen for this thesis and how each stage of the study is carried out, and
elaborates on our research questions, the goals for building the artifact
and the results we expect to achieve. The third chapter is a literature
review divided into two main subsections: the first covers machine learning
and the algorithms relevant to this thesis, and the second covers some of
the related terms and conditions of the aviation and airline industry. The
fourth chapter describes the design and implementation of our application
and explains how it performs, using graphs so that readers can visualize our
methods and results. The fifth chapter analyses our results rigorously and
presents different comparisons of them. Finally, the sixth chapter discusses
the conclusions and the future work that could optimize and improve this
project.
Chapter 2
Research Methodology
academic work on how flight delays could affect airline industry revenues and
costs.
2.1.3 Evaluation
As proposed by Hevner [2] for information systems research, “The utility,
quality, and efficacy of a design artifact must be rigorously demonstrated
via well-executed evaluation methods”, and the accuracy of an IT artifact
can be evaluated as a quality attribute. Since design is an iterative and
progressive process, feedback from the evaluation phase plays a key role in
making the final artifact satisfactory for the problems it was meant to
solve [2].
Therefore, once the prototype is developed, we need to test it rigorously in
order to analyze the prediction results. In this phase of the project we
feed the available flight delay data to our application and compare the
outcome of each algorithm, showing its predictive strength in bar charts of
percentage accuracy. Because our access to real flight delay data from
Aviolinx is limited, we assess the artifact with the limited amount of data
available, which relates to two different airline companies, and present the
results to the reader in a clear and coherent way.
Chapter 3
Literature Review
In the first part of our literature review we will explain data, machine
learning and the algorithms utilized in our research study.
3.2 Data
Data sets grow exponentially in size every day because many different
methods are now available for collecting different types of data. Mobile
devices, aerial sensors, software logs, cameras, microphones and many other
means of gathering data have resulted in the collection of massive amounts
of data. “There are 2.5 quintillion bytes of data created every day, and
this number keeps increasing exponentially. The world’s technological
capacity to store information has roughly doubled about every 3 years since
the 1980s” [7]. Large amounts of data in different environments, for
instance in financial or medical areas, are created at high cost and
unfortunately deleted because the technologies required to store them are
lacking. These are
valuable data that could be used to produce new features for new software.
Finding new solutions for storing huge amounts of usable data has become one
of the biggest and most costly challenges for companies. However, advances
in technology now let us use new architectures and mechanisms to store and
access valuable data for software optimization purposes [7].
All existing machine learning algorithms, as shown in figure 3.1 [10],
derive from two main strategies: supervised and unsupervised. The supervised
strategy is used when the training set comprises both the data and the true
output of the process that uses that data. An example is giving a student a
set of problems together with their solutions so that the student can solve
similar problems in that area in the future. The unsupervised strategy, by
contrast, is used when the training set comprises the data but not the
solutions, and the computer must resolve the problem by itself. An example
is giving a student a set of patterns and asking them to reveal the
underlying relations that generated those patterns.
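The difference between the two strategies can be illustrated with a minimal
sketch. It is not part of our prototype: the helper names fit_line and
two_groups and all numbers are hypothetical and chosen only for illustration.
The supervised part fits a least-squares line to inputs that come with known
answers, while the unsupervised part receives inputs only and has to discover
two groups by itself.

```python
def fit_line(xs, ys):
    """Least-squares line through labelled points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def two_groups(points, iterations=10):
    """Split unlabelled 1-D points into two groups around two moving centres."""
    centres = [min(points), max(points)]
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            nearest = 0 if abs(p - centres[0]) <= abs(p - centres[1]) else 1
            groups[nearest].append(p)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return groups


# Supervised: every input x comes with its known answer y (here y = 2x + 1).
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction = slope * 5 + intercept  # answer predicted for the unseen input x = 5

# Unsupervised: only the inputs are given; the grouping has to be discovered.
low, high = two_groups([1.0, 1.2, 0.8, 9.0, 9.5, 10.1])
print(prediction, low, high)
```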
functions called kernels [11]. When SVM is used for classification it is
called SVC, and when it is used for regression it is called SVR [12]. In our
prototype we used three types of SVR kernels, linear and non-linear: SVR
RBF, which is a non-linear kernel; SVR Lin, which is a linear kernel; and
SVR Poly, which is a non-linear polynomial kernel.
In the second part of our literature review we will explain some of the terms
and conditions used in the airline industry.
agree on one slot for one season. An example of a slot message is shown in
figure 3.2.
3.7 Statistics
A 2010 study shows that the total cost related to flight delays for the US
airline industry in 2007 was $32.9 billion. This amount consists of $8.3
billion in increased expenses for crew, fuel and maintenance, $16.7 billion
in lost passenger time, $3.9 billion in lost passenger demand as a result of
delays, and $4 billion in indirect effects of flight delays on the US
economy in the form of reduced US GDP. These statistics illustrate the
significance of the flight delay problem and the need for more efficient
investigations and methods for reducing flight delays [13].
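As a quick check of these figures, the short sketch below adds up the four
components reported above and expresses each one as a share of the total.
The dictionary labels are our own shorthand and are not taken from [13].

```python
# Cost components for 2007 in billions of US dollars, as reported above [13].
components = {
    "airline expenses (crew, fuel, maintenance)": 8.3,
    "passenger time lost": 16.7,
    "lost passenger demand": 3.9,
    "indirect effect on US GDP": 4.0,
}

# Rounding to one decimal removes floating-point noise from the sum.
total = round(sum(components.values()), 1)

# Share of each component in the total, in percent.
shares = {name: round(100 * value / total, 1) for name, value in components.items()}

print(total)  # 32.9
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The components add up to the reported total, and lost passenger time alone
accounts for roughly half of it.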
Chapter 4
Method
The prototype developed for this research study predicts flight delays using
machine learning. With this artifact we can convert raw flight delay data
into usable information that can add efficiency to a flight scheduling
software system, which answers our RQ1 on how to use flight delay data. As
discussed with Aviolinx staff, they can use the artifact in two ways: they
can integrate it into their software in the future, or they can run it as a
stand-alone reporting system. Either way, they can include the predictions
in slot messages, giving the airport coordinator and the airline company an
idea of the upcoming month so that they can schedule more accurately.
4.1 Description
This prototype is written in the Python programming language [14]. It uses
different Python libraries to predict delays with four different algorithms
and then plots the models as graphs for visualization purposes. We
implemented our code in Google Colaboratory (also known as Colab), a cloud
service based on Jupyter notebooks. It let us concentrate on the purpose of
our study without worrying about configuration, and it provides GPU access
and easy code sharing [15].
4.2 Datasets
We received our real-world flight delay datasets from Aviolinx. Obtaining
them took a long time and was subject to many limitations because of
Aviolinx regulations. The two datasets used in our project belong to two
different airline companies that use the Aviolinx software product. After
requesting the real-world datasets, we were ultimately authorized to use six
years of flight delay data, from 2014 to 2019, from two airline companies on
condition of anonymity, and we complied. Since the data is real and we are
not allowed to name these companies, we refer to them as “Company A” and
“Company B”. Our datasets contain flight delay data aggregated over all
causes for each month, starting in January 2014, which was the earliest data
we could access. We therefore have all the data from January 2014 to
December 2019 with which to evaluate our artifact. For example, if the data
shows 12.5 minutes for December 2016, the aggregated delay from all causes,
such as passenger delays, aircraft maintenance, airport congestion and
weather, was 12.5 minutes for that airline company in December 2016.
4.3 Libraries
The libraries used in our code are pandas [16], NumPy [17], scikit-learn
(sklearn) [4], Matplotlib [18] and Google Colab [15]. We implemented our
machine learning algorithms using three different Support Vector Regression
(SVR) methods and Linear Regression from the scikit-learn library in order
to train on our dataset and predict future delays. The main reason for using
four algorithms was to be able to compare their results. We used the Google
Colab library [15] to load our dataset. Finally, once the models are fitted,
we use the Matplotlib library [18] to plot them as graphs.
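The following is a minimal sketch of how four such models can be created and
fitted with scikit-learn. It is not the prototype code. It rests on several
assumptions: the delay values are synthetic stand-ins for the confidential
datasets, the month index is used as the only input feature, and all models
keep the library's default hyperparameters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

# Assumption: the month index (0..59) is the only feature.
months = np.arange(60).reshape(-1, 1)

# Synthetic monthly delays in minutes with a yearly pattern (NOT the thesis data).
delays = 12.0 + 2.0 * np.sin(2 * np.pi * np.arange(60) / 12)

# Three SVR kernels and a linear regression, all with default hyperparameters.
models = {
    "SVR RBF": SVR(kernel="rbf"),
    "SVR Lin": SVR(kernel="linear"),
    "SVR Poly": SVR(kernel="poly"),
    "Linear Regression": LinearRegression(),
}

# Fit every model on the 60 known months and predict the following month.
next_month = np.array([[60]])
predictions = {}
for name, model in models.items():
    model.fit(months, delays)
    predictions[name] = float(model.predict(next_month)[0])

for name, value in predictions.items():
    print(f"{name}: {value:.2f} minutes")
```

In practice the kernel parameters would need tuning, and the fitted curves
can be drawn with Matplotlib by plotting model.predict(months) against the
month index.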
4.4 Algorithms
In this artifact we have applied three types of support vector regression
kernels, linear and non-linear, and a linear regression algorithm to our
data in order to compare their accuracy. The algorithms are as follows:
After loading the dataset into our application, we create the models and
train them on the flight delay data with the algorithms mentioned above.
Once the models are trained, we plot them in a graph and show the results.
For prediction, we load all the data from the first month of 2014 onwards to
predict the flight delays of 2019. For example, to predict the first month
of 2019 we use the data for the preceding 60 months, which gives us 60 data
points.
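The size of the training window described above can be sketched as follows.
The helper name training_window is hypothetical; the sketch only counts
months and contains no delay values.

```python
# All months covered by the datasets: January 2014 to December 2019.
all_months = [(year, month) for year in range(2014, 2020) for month in range(1, 13)]


def training_window(target_year, target_month):
    """Return every month strictly before the month to be predicted."""
    return [m for m in all_months if m < (target_year, target_month)]


jan_window = training_window(2019, 1)  # January 2014 to December 2018
sep_window = training_window(2019, 9)  # January 2014 to August 2019

print(len(all_months), len(jan_window), len(sep_window))  # 72 60 68
```

Each later month of 2019 is therefore predicted from a slightly longer
history than the month before it.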
4.5 Models
illustrated in figure 4.1, our models are trained with all the data from
January 2014 until August 2019 and predict September 2019 for Company A. The
predicted delay is 12.43 minutes with SVR RBF, 13.55 minutes with SVR Lin,
14.19 minutes with SVR Poly and 13.95 minutes with Linear Regression, while
the real value is 13.1 minutes.
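To make the comparison for this single month concrete, the sketch below
computes the absolute difference between each prediction and the real value.
The absolute difference is used here only as a simple illustration; it is
not necessarily the error measure used elsewhere in this study.

```python
# Predictions for September 2019, Company A, in minutes (values reported above).
predictions = {
    "SVR RBF": 12.43,
    "SVR Lin": 13.55,
    "SVR Poly": 14.19,
    "Linear Regression": 13.95,
}
actual = 13.1

# Absolute difference between each prediction and the real value.
abs_errors = {name: round(abs(value - actual), 2) for name, value in predictions.items()}

# The algorithm whose prediction lies closest to the real value.
closest = min(abs_errors, key=abs_errors.get)

print(abs_errors)
print(closest)  # SVR Lin
```

For this one month SVR Lin comes closest, at 0.45 minutes from the real
value. A single month says little about overall performance, which is why
all twelve months of 2019 are compared.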
Equation 1:
We will demonstrate and analyse all the prediction results for each month of
2019 related to both companies in the next chapter. Also we will compare
the prediction capability of each algorithm and the combined method.
Chapter 5
to the real values of 2019 with the blue color. As illustrated in this
figure, SVR RBF has the highest number of closest predictions, with 6 out of
12, which is 50% of the predictions for 2019.
are 0% and 8.3% respectively, which makes the SVR RBF predictions the
closest.
the testing purposes. We used the Avg column of figures 5.4 and 5.5 to
create the prediction accuracy chart in figure 5.6.
As illustrated in figure 5.6, the prediction accuracy for companies A and B
respectively is 88% and 87% with SVR RBF, 79% and 85% with SVR Lin, 75% and
83% with SVR Poly, and 77% and 85% with Linear Regression, which makes SVR
RBF the most precise prediction algorithm of all.
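The ranking implied by these percentages can be reproduced with a short
sketch. Averaging the two companies with equal weight is our assumption for
this illustration and is not a step taken from the analysis itself.

```python
# Prediction accuracy in percent for (Company A, Company B), as reported above.
accuracy = {
    "SVR RBF": (88, 87),
    "SVR Lin": (79, 85),
    "SVR Poly": (75, 83),
    "Linear Regression": (77, 85),
}

# Assumption: both companies count equally when the algorithms are ranked.
mean_accuracy = {name: sum(pair) / len(pair) for name, pair in accuracy.items()}

# Algorithms ordered from most to least accurate on average.
ranking = sorted(mean_accuracy, key=mean_accuracy.get, reverse=True)

print(mean_accuracy)
print(ranking)  # ['SVR RBF', 'SVR Lin', 'Linear Regression', 'SVR Poly']
```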
5.3 are SVR RBF and SVR Lin. By applying the weighted average method
explained in the previous chapter (Equation 1) to the results of these two
algorithms, we obtain new results for each month. We tested many values to
find the most suitable weights for each of the selected algorithms and so
arrive at Equation 3, which optimizes our results.
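The idea of searching for suitable weights can be sketched as follows. This
is only an illustration under stated assumptions: the weight w and the
coarse grid of candidate values are hypothetical, the search uses the
September 2019 values for Company A alone, and the weights finally adopted
in our proposed method are not reproduced here.

```python
# September 2019, Company A: the two selected predictions and the real value.
svr_rbf = 12.43
svr_lin = 13.55
actual = 13.1


def combine(w):
    """Weighted average of the two predictions; w is the weight given to SVR RBF."""
    return w * svr_rbf + (1 - w) * svr_lin


# Hypothetical grid of candidate weights: 0.0, 0.1, ..., 1.0.
candidates = [i / 10 for i in range(11)]

# Keep the weight whose combined prediction is closest to the real value.
best_w = min(candidates, key=lambda w: abs(combine(w) - actual))
best_prediction = round(combine(best_w), 3)

print(best_w, best_prediction)  # 0.4 13.102
```

Fitting a weight to a single month in this way would overfit. A realistic
search would score each candidate weight over many months and both
companies.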
Our proposed method to calculate values for each month is Equation 3 as
follows:
Equation 3:
After calculating each month's value with Equation 3, we calculate the
prediction error (PE) and prediction accuracy (PA) of each value against the
real values of 2019 for both companies using Equation 2. We then calculate
the average accuracy percentage of the new method. The new results are shown
in figure 5.7, in which the CM column is calculated with Equation 4 as
follows:
Equation 4:
5.5 Challenges
The most important challenge we encountered in developing this artifact was
the limited access to the data we needed to test it. Under the regulations
between Aviolinx and the customers to whom the data belongs, we were
ultimately authorized to use six years of flight delay data, from 2014 to
2019, from two airline companies on condition of anonymity, and we complied.
5.6 Contribution
This research study evaluates and investigates the capability of the
presented algorithms and the proposed method to predict flight delays, with
a specific focus on slot scheduling efficiency. From the business
perspective, the contribution of this study is that it can be used to add
machine learning prediction capability, based on the investigated methods,
to the slot scheduling systems of aviation software products in order to
reduce delays through insight into future delays. From the research
community perspective, the analysis, the comparison results and the methods
presented can support further investigation aimed at developing more
accurate flight delay prediction methods for use in slot scheduling systems.
Chapter 6
the prediction results from this research paper and the methods proposed,
within slot messages, airport coordinators and airline companies would be
able to plan their schedules in a more competent and effective way.
Future investigations are necessary with larger amounts of data, more test
cases and more assessments of the methods proposed in this research study.
The correlations in the data from larger datasets may result in different
weights in our CM. All in all, the more development and optimization is
accomplished in this area, the more economically lucrative airline
information systems become.
References
[2] Alan R. Hevner, Salvatore T. March, Jinsoo Park, and Sudha Ram. Design
science in information systems research. MIS Quarterly, pages 75–105, 2004.
[8] P. P. Shinde and S. Shah. A review of machine learning and deep learning
applications. In 2018 Fourth International Conference on Computing
Communication Control and Automation (ICCUBEA), pages 1–6, 2018.
[9] S. Angra and S. Ahuja. Machine learning and its applications: A review.
In 2017 International Conference on Big Data Analytics and Computa-
tional Intelligence (ICBDAC), pages 57–60, 2017.
[12] J. Wang, X. Chen, and S. Guo. Bus travel time prediction model with
support vector regression. In 2009 12th International IEEE Conference
on Intelligent Transportation Systems, pages 1–6, 2009.
[13] Michael Ball, Cynthia Barnhart, Martin Dresner, Mark Hansen, Kevin
Neels, Amedeo Odoni, Everett Peterson, Lance Sherry, Antonio Trani,
Bo Zou, Rodrigo Britto, Doug Fearing, Prem Swaroop, Nitish Uman,
Vikrant Vaze, and Augusto Voltes. Total delay impact study: A
comprehensive assessment of the costs and impacts of flight delay in the
United States. October 2010.
[19] Xin Yan and Xiao Gang Su. Linear Regression Analysis: Theory and
Computing. World Scientific Publishing Co., Inc., USA, 2009.