Civil airline fare prediction with a multi-attribute dual-stage attention mechanism
https://doi.org/10.1007/s10489-021-02602-0
Abstract
Airfare price prediction is one of the core facilities of the decision support system in civil aviation, and it involves factors such as
the departure time, the number of days of advance purchase, and the airline. The traditional airfare price prediction system is limited
by the nonlinear interrelationship of multiple factors and fails to deal with the impact of different time steps, resulting in
low prediction accuracy. To address these challenges, this paper proposes a novel civil airline fare prediction system with
a Multi-Attribute Dual-stage Attention (MADA) mechanism integrating different types of data extracted from the same
dimension. In this method, the Seq2Seq model is used to add attention mechanisms to both the encoder and the decoder.
The encoder attention mechanism extracts multi-attribute data from time series, which are optimized and filtered by the
temporal attention mechanism in the decoder to capture the complex time dependence of the ticket price sequence. Extensive
experiments with actual civil aviation data sets were performed, and the results suggested that MADA outperforms airfare
prediction models based on the Auto-Regressive Integrated Moving Average (ARIMA), random forest, or deep learning
models in terms of the MSE, RMSE, and MAE indicators. Moreover, extensive experimental results demonstrate that the predictions
of the proposed MADA model on different routes are at least 2.3% better than those of the other compared models.
Keywords Civil airline fare prediction · Time series · Attention mechanism · LSTM
1 Introduction

Air travel is becoming more and more popular in China, and numerous online booking channels for aircraft tickets are now available. It is well-recognized that airlines make decisions about aircraft ticket prices based on the time of purchase. Airlines nowadays use complex strategies to dynamically allocate ticket prices, and these strategies take into account a variety of financial, marketing, commercial, and social factors. Because of the high complexity of the pricing model and the dynamic price changes, it is tricky for customers to buy tickets at the lowest price. Therefore, several applications have been developed recently to predict the ticket price, thereby guiding customers to buy tickets at the most appropriate time. Specifically, Hopper [23] is a relatively mature airfare forecast app, producing an accuracy of 95%, and 60% of its push messages tell its consumers that it is not the optimal time to order tickets yet.

Ticket price forecasts are of great reference value for the aviation industry. Ticket prices are determined by various factors, such as the airline and the days of early purchase, as well as the departure time and airport. Airlines can adjust their ticket prices based on these factors to get the expected income for an effective pricing strategy [10, 19, 30].

However, several factors can limit the accuracy of air ticket price forecasts. First of all, the price of air tickets is a random walk time series, which is affected by the purchase time and other related factors. Secondly, with the ARIMA model, only simple non-stationarity type relationships can be acquired, whereas conventional time series predictions are non-linear and non-stationary; the time series data used for prediction is generally required to be regressive and periodic, which is not the case with air ticket price forecasts. Finally, ticket prices are affected by many uncertain factors, such as the long-term impact from governmental regulations, the short-term impact from the market and the weather, as well as some unexpected or international events. One example of such events is the novel coronavirus outbreak, which led the entire international airline industry to experience a downturn.

A linear quantile model [18] was proposed to predict ticket prices in 2014. The model integrates four LR models to obtain the best fitting effect, mainly to provide passengers with unbiased information about whether to purchase tickets or wait longer for better prices. Besides, Tziridis et al. [28] used eight machine learning models to predict ticket prices, including ANNs, RF, SVM, and LR, and compared their results. Bagging Regression Tree was found to be the best model in their comparison, as it is stable and unaffected by various input feature sets. Moreover, deep learning has also demonstrated great promise and made significant progress in computer vision and natural language processing. Neural networks [1, 13, 37], instead of traditional methods, have become one of the latest trends to predict airfare ticket prices.

This paper proposes a novel strategy for predicting air ticket prices based on the multi-attribute dual-stage attention (MADA) mechanism to address this problem. Besides, the Seq2Seq neural network is adopted to encode and decode the input multi-dimensional fare-related attributes. Moreover, dual-stage attention mechanisms [31] are employed to extract effective information variables. The mean square error loss function is used to train on the real data to obtain the trend of fare changes.

Our main contributions are threefold.

1. An improved multi-attribute dual-stage attention mechanism model is proposed. The first attention mechanism is performed by the encoder on the input time series, which selects important weight information for the decoder layer. Subsequently, the decoder layer uses such weight information in its temporal attention mechanism to produce the final prediction outputs.
2. Various major models for airfare prediction were compared on real data sets. The results showed that the MADA model outperformed the others in the MSE, RMSE, and MAE indicators.
3. Finally, the accuracy of civil aviation ticket price prediction was compared among different prediction models, and the influence extents of different data attributes when the proposed model had different numbers of hidden layers were also analyzed. In terms of the RMSE, MSE, and MAE indicators, the MADA model outperformed the variant and benchmark models.

The rest of the paper is organized as follows. Section 2 introduces the related previous works about civil aviation fare prediction, and Section 3 introduces the relevant models, including a detailed description of model data preprocessing and the network models. Section 4 describes the algorithm in this paper. Subsequently, the experimental evaluations, including data sets, evaluation indexes, comparisons of experimental results, and ablation analysis, are illustrated in Section 5. Finally, in Section 6, the work in this paper and the direction for future research are summarized.

2 Related work

Airfare prediction is essentially a time series forecast. The Auto-Regressive Integrated Moving Average (ARIMA) method is a traditional method for time series prediction, where AR and MA eliminate positive and negative correlations, respectively. AR and MA offset each other, and they usually contain two elements that can avoid overfitting. Gordiievych et al. [12] proposed to use the ARIMA model to build a system that helps customers make purchase decisions by predicting the price of air tickets. Besides, another idea, Facebook-prophet [35], is similar to STL (Seasonal-Trend decomposition procedure based on Loess) decomposition, which can divide the time-series signal into seasonal, trend, and residue components. STL decomposition, as the name suggests, is better suited to dealing with seasonal time series data than traditional time series models in terms of control and interpretability.

In Random Forest (RF), multiple decision trees are integrated by ensemble learning. XGBoost (Extreme Gradient Boosting Decision Tree) [5] is a machine learning algorithm with higher robustness and efficiency, which can be applied to detection problems and time series prediction. To aid the consumer decision-making process, Wohlfarth et al. [7] integrated clustering in the early phases and used a variety of the latest supervised learning algorithms (classification tree (CART) and RF). They then utilize CART to understand meaningful rules and RF to provide information about each feature's relevance. To compare relative values with the total average price, Ren et al. [32] proposed utilizing LR, Naive Bayes, Softmax regression, and SVMs to develop a prediction model and categorize the ticket price into five bins (60% to 80%, 80% to 100%, 100% to 120%, and so on). The models were built using over 9,000 data points, comprising six features (such as the start of the departure week, the date of the price quote, the number of stops in the itinerary, etc.). Using the LR model, the authors reported the best training error rate of around 22.9%. Instead, the prices were classified as "higher" or "lower" than the average using an SVM classification model.

XGBoost has been used to predict crude oil prices [39] and housing prices [29], and it is more effective compared with the traditional ARIMA. As the XGBoost and deep learning techniques continue to develop, research about flight delays [13] has also integrated relevant random tree models with deep learning. The flight delay experiment shows that the LSTM cell is an effective structure to handle time sequences and that the random forest-based method can obtain good classification accuracy (90.2% for the binary classification) and overcome the overfitting problem. As the deep learning technique continues to develop, RNN (LSTM, GRU) time series analysis and CNN+RNN+attention prediction have been proposed as two prediction methods. Specifically, CNN captures short-term local dependence, while RNN captures long-term macro dependence.
Table 1 Summary of airline fare prediction models and related time series forecasting models

| Category | Study | Task | Method | Result |
| Machine learning | Tim et al. [18] | Predict the lowest ticket price before departure | Linear quantile mixed regression model | Short-term performance is reasonable, but long-term performance is inefficient. |
| Machine learning | Tziridis et al. [28] | Find the optimal fare prediction model from the regression algorithms. | Eight regression machine learning models | Bagging Regression: 87.42% accuracy; Random Forest Regression Tree: 85.91%. |
| Machine learning | Gordiievych et al. [12] | Predict whether the price of a ticket drops in the future. | ARIMA | Not given |
| Machine learning | Wohlfarth et al. [7] | Predict the best time to buy tickets | CART and RF | CART and RF should be used for preregistered purchase periods to give a first coarse advice to the customer. |
| Machine learning | Ren et al. [32] | Predict the lowest ticket price before departure | Ensemble model that uses LR, Naive Bayes, Softmax Regression, and SVM. | The training error of Naive Bayes and Softmax Regression reduced to 24.88% and 20.22%; SVM is also reduced by approximately 1%. |
| Machine learning | GuanGui et al. [13] | Flight delay prediction | Ensemble model combining relevant random tree models with deep learning | LSTM is capable of handling the obtained aviation sequence data; RF (90.2% for the binary classification) can overcome the overfitting problem. |
| Deep learning | Guokun et al. [22] | Multivariate time series prediction | Deep learning: CNN and RNN to extract short-term local dependency patterns among variables. | The results show that three of the four experimental data sets have the best performance. |
| Deep learning | Shih et al. [33] | Multivariate time series prediction | Deep learning: attention mechanism for selecting important time series and multivariate forecasting using frequency domain information. | The proposed TPA-LSTM performs best in experiments. |
| Deep learning | Deng et al. [9] | Prediction of flight passenger load factors. | Deep learning: RNN using a multi-granularity temporal attention mechanism (MTA-RNN). | The proposed MTA-RNN performs best in experiments. |
| Deep learning | YaoQin et al. [31] | Stock time series prediction | Deep learning: dual-stage attention-based RNN. | The DA-RNN model has the best performance on the SML 2010 and NASDAQ 100 data sets compared with other models. |
| Deep learning | TongChen et al. [6] | Forecast sales volume in a real-life commercial scenario. | Deep learning: dual-stage attention-based RNN; trend alignment with dual-attention, multi-task RNNs for sales prediction. | The results show that the TADA prediction result is the best. |
The dual-attention structure has proved to produce more evident effects than a single-layer encoder with attention. TADA splits the influencing factors into internal and exterior features, and then uses the dual attention mechanism to forecast future price trends.

At present, although deep learning has received a lot of attention for stock forecasting [4, 16, 24, 27], it has received very little attention for forecasting civil aviation ticket prices. The RNN is the most widely used deep learning model for prediction, and it has gained a lot of traction. Recent years have seen a surge in approaches that use neural network structures to make the prediction results more accurate [17, 21, 38].

A summary of the discussed ticket price prediction models and related time series forecasting models is shown in Table 1.

Table 2 Input attributes

| Feature attribute | Description |
| Airln cd | Airline |
| AirCrft Typ | Aircraft type |
| Dpt AirPt Cd | Departure airfield |
| Arrv Airpt Cd | Arrival airport |
| Air route | Route |
| Flt nbr | Flight no. |
| Flt Schd Dpt Tm | Flight take-off time |
| Weekday | Weekly attributes |
| Holiday | Holiday |
| Pax Qty y | Number of flights |
| Fare | Air ticket price |
Note that only airfare information of the first eight months is displayed. Figure 1 shows the dynamic fluctuation of actual ticket prices in the first eight months of a certain line segment. Figures 2, 3 and 4 show the price fluctuation trends of three quarters in a year, respectively. Particularly, Fig. 2 stands out because of the phenomenal growth around mid-February, which is because mid-February coincides with the traditional Chinese Spring Festival in that year. In the second quarter (Fig. 3), the air ticket prices exhibit periodic changes; however, as can be seen in the third quarter (Fig. 4), the variation in air tickets becomes relatively stable. Therefore, it is fair to deduce that, generally speaking, the airfare of this particular flight segment shows periodic fluctuations. At the beginning of the year, the ticket prices vary significantly because of the holidays, and then they exhibit periodic changes. After the middle of the year, which is the off-season period, the ticket prices of this segment reveal a stable increase.

In one execution session of the model, the original data is cleaned to remove duplicate or missing values, and a table with information such as departure time, airline, flight number, route, and passenger number is created. Subsequently, the non-numerical values are encoded by One-Hot and vectorized to serve as the inputs into the neural network. After that, the data is divided into training and test data in an 8:2 ratio.

It is worth noting that the dimensionality of the selected feature attributes is reduced. First, the properties that are relevant to flight fares are visualized (Fig. 5) with the feature selection module in the Scikit-Learn machine learning toolbox. From Fig. 5, it is obvious that the airline (Airln cd) is the most significant factor affecting ticket fares, followed by the departure airport (Dpt AirPt Cd) and the route (Air route). By reducing the relatively non-critical dimensions, the model achieves lower overhead and better generalization capability.

Fig. 5 Important factors affecting fares
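To make the preprocessing steps concrete, the sketch below is an illustrative example rather than the authors' code: it One-Hot encodes the non-numerical attributes, scales the numerical ones with MinMax (to the [-1, 1] range used later for Algorithm 1), and performs the 8:2 split. The column names and values are hypothetical stand-ins modeled on the fields listed in Table 2.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical sample records with column names modeled on Table 2
df = pd.DataFrame({
    "Airln_cd": ["MU", "CZ", "MU", "CA"],          # airline
    "Dpt_AirPt_Cd": ["KMG", "KMG", "PEK", "KMG"],  # departure airport
    "Weekday": [1, 5, 6, 3],
    "Pax_Qty": [132, 150, 98, 171],
    "Fare": [560.0, 610.0, 480.0, 720.0],
})

# Non-numerical attributes -> One-Hot vectors
X = pd.get_dummies(df.drop(columns=["Fare"]),
                   columns=["Airln_cd", "Dpt_AirPt_Cd"])

# Numerical attributes and the target fare -> MinMax range [-1, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
X[["Weekday", "Pax_Qty"]] = scaler.fit_transform(X[["Weekday", "Pax_Qty"]])
y = MinMaxScaler(feature_range=(-1, 1)).fit_transform(df[["Fare"]])

# 8:2 split into training and test data (no shuffling for time series)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False)
print(X_train.shape, X_test.shape)
```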
3.2 The deep learning model for fare prediction

The structure of the MADA mechanism model proposed in this paper is shown in Fig. 6.

(1) Network model: To perform multi-dimensional airfare prediction, a multi-attribute dual-stage attention mechanism model is proposed in this paper. Ticket price forecasts need multidimensional data, which are represented as x_1, x_2, \ldots, x_n. The input is X = (x_1, x_2, \ldots, x_n)^{\top} = (x^1, x^2, \ldots, x^p) \in R^{n \times p}, where p represents the window size. For the time period t, X = (x_1, x_2, \ldots, x_n)^{\top} = (x^1, x^2, \ldots, x^p) \in R^{n \times p} represents the processing result with multiple attributes. After that, X is input into the LSTM layer to obtain the feature vector, which is integrated with the feature weight a_t at time t to obtain the output Z_t in the encoder layer.

Next, the input for the decoder layer of the LSTM network is the time series Z_t = (Z_1, Z_2, Z_3, \ldots, Z_n) \in R^{n \times p} of time t in the encoder layer. The decoding results are integrated with the feature weight l_t at time t to get the context vector C_{t-1}. Finally, the final predicted value \hat{Y}_{t-1} is obtained from the final output layer of the LSTM.

Following the steps above, the processed data can be input into the LSTM network for the relevant training. The model employs a supervised learning methodology, with the multi-dimensional data (airline, flight number, departure airport, arrival airport, flight path, and flight number) representing X and the ticket price data representing Y. In this way, model training is enabled, and the MADA model can remember the law of changes of the relevant data. During the model training, various parameters need to be adjusted to optimize the model, so that both the training data and the test data can achieve the best possible results.

(2) Loss function: The mean square error MSELoss is used as the loss function. Set the vectors s and y as the predicted and actual values, respectively. MSELoss calculates the error (scalar e) and the gradient of e with respect to s:

e = \mathrm{MSELoss}(s, y) = \frac{1}{n} \sum_{t=1}^{n} (s_t - y_t)^2

The solution is:

\frac{\partial e}{\partial s} = \frac{2}{n} \big( (s_1 - y_1), (s_2 - y_2), \ldots, (s_n - y_n) \big), \qquad \frac{\partial e}{\partial y} = -\frac{\partial e}{\partial s}

The mean square error loss is calculated from the distance between the target and the calculated values, and the gradient of each step is obtained by backward propagation. After multiple iterations, the minimum loss is obtained.
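As a quick sanity check on the loss and its gradient (an illustrative snippet, not part of the paper's implementation), PyTorch's autograd reproduces the analytic derivative de/ds = (2/n)(s - y) given above:

```python
import torch

n = 5
s = torch.randn(n, requires_grad=True)   # predicted values
y = torch.randn(n)                       # actual values

e = torch.nn.functional.mse_loss(s, y)   # e = (1/n) * sum((s_t - y_t)^2)
e.backward()                             # gradient via backward propagation

analytic = 2.0 / n * (s.detach() - y)    # de/ds from the text
print(torch.allclose(s.grad, analytic))  # True
```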
(3) Activation function: The Leaky ReLU [26] and Softmax are used as the activation functions. Particularly, the Leaky ReLU can extract the feature information hidden in the data and map it to the corresponding ranges. The equation of the Leaky ReLU function is:

y_i = \begin{cases} x_i, & \text{if } x_i \ge 0 \\ x_i / a_i, & \text{if } x_i < 0 \end{cases} \qquad (1)

Compared with the ReLU [11], a general activation function, the Leaky ReLU is used in this paper because it can reasonably divide the negative values. Empirically, the Leaky ReLU is more efficient than the ReLU.

On the other hand, Softmax can convert all the input values into values within the range of 0-1. Its equation is:

y_i = S(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}, \quad i = 1, \ldots, C \qquad (2)

Here z is the output of the previous layer, which serves as the input of Softmax. The predicted object's dimension is C, and y_i is the probability that it belongs to the i-th category.
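For illustration, both activations are available directly in PyTorch; the snippet below uses arbitrary input values, with PyTorch's default negative slope standing in for the 1/a_i factor in Eq. (1):

```python
import torch
import torch.nn.functional as F

z = torch.tensor([-2.0, -0.5, 0.0, 1.5])

# Leaky ReLU, Eq. (1): negative inputs are scaled by a small slope
# (0.01 is PyTorch's default) instead of being zeroed out.
print(F.leaky_relu(z, negative_slope=0.01))

# Softmax, Eq. (2): maps the C inputs to probabilities in (0, 1) that sum to 1.
print(F.softmax(z, dim=0), F.softmax(z, dim=0).sum())
```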
(4) Optimizer: Adam (Adaptive Moment Estimation) [20] is a first-order optimization algorithm that can be used instead of the conventional stochastic gradient descent. Furthermore, iterations based on the training data can be used to adjust the neural network weights. Adam is essentially RMSprop with a momentum term, which uses the first-order and second-order moment estimations of the gradient to realize dynamic adjustments of the learning rate for each parameter. It is especially beneficial because, after bias correction, the learning rate at each iteration has a defined range, resulting in reasonably stable parameters. Its equations are as follows:

m_t = \mu \, m_{t-1} + (1 - \mu) \, g_t \qquad (3)

n_t = \nu \, n_{t-1} + (1 - \nu) \, g_t^2 \qquad (4)

\hat{m}_t = \frac{m_t}{1 - \mu^t} \qquad (5)

\hat{n}_t = \frac{n_t}{1 - \nu^t} \qquad (6)

\Delta\theta_t = -\frac{\hat{m}_t}{\sqrt{\hat{n}_t} + \epsilon} \, \eta \qquad (7)

The meanings of the letters in the above formulas are as follows: \mu, \nu \in [0, 1] represent the exponential decay rates for the moment estimates; m_0 initializes the first-order moment vector, n_0 the second-order moment vector, and \theta_0 the parameter vector; t is the timestep; and \eta is the step size.

Among them, (3) and (4) are the first-order and second-order moment estimations of the gradient, which can be considered the expected estimations of E|g_t| and E|g_t^2|, respectively. Besides, (5) and (6) are two correction equations for (3) and (4), so that they can be approximated as unbiased estimates of the expectations. The direct moment estimations of the gradients, based on these equations, do not require additional memory and can be dynamically adjusted according to the gradients. The front part of the last equation is a dynamic constraint on the learning rate \eta so that it stays within a precise range.
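The update rules (3)-(7) translate directly into code. The sketch below is an illustrative re-implementation of a single Adam step (the hyperparameter values are common defaults, not taken from the paper); in practice one would simply call torch.optim.Adam.

```python
import torch

def adam_step(theta, grad, m, n, t, lr=1e-3, mu=0.9, v=0.999, eps=1e-8):
    """One Adam update following Eqs. (3)-(7): biased moment estimates,
    bias correction, then the parameter update."""
    m = mu * m + (1 - mu) * grad          # Eq. (3): first-moment estimate
    n = v * n + (1 - v) * grad ** 2       # Eq. (4): second-moment estimate
    m_hat = m / (1 - mu ** t)             # Eq. (5): bias-corrected first moment
    n_hat = n / (1 - v ** t)              # Eq. (6): bias-corrected second moment
    theta = theta - lr * m_hat / (n_hat.sqrt() + eps)   # Eq. (7)
    return theta, m, n

theta = torch.zeros(3)                    # illustrative parameter vector
m, n = torch.zeros(3), torch.zeros(3)     # moment vectors initialized to zero
grad = torch.tensor([0.1, -0.2, 0.3])     # a made-up gradient
theta, m, n = adam_step(theta, grad, m, n, t=1)
print(theta)
```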
3.3 Dual-stage attention mechanism

Aside from the Seq2Seq model, the model also integrates the attention mechanism into the time series dimension in its encoder and decoder layers.

Within the time series X = (x^1, x^2, \ldots, x^p) = (x_1, x_2, \ldots, x_n)^{\top} \in R^{n \times p}, deterministic attention is used to extract the input time dimensions. The previous hidden states, namely h_{t-1} and v_{t-1}, serve as the attention input in the LSTM of the encoder layer. The equations are as follows:

b_t^p = U_e^{\top} \tanh\!\big( W_e [h_{t-1}; v_{t-1}] + B_e x^p \big) \qquad (9)

and

\alpha_t^p = \frac{\exp(b_t^p)}{\sum_{i=1}^{n} \exp(b_t^i)} \qquad (10)

\beta_t^i = U_d^{\top} \tanh\!\big( W_d [d_{t-1}; v_{t-1}] + B_d h_i \big) \qquad (13)

and

l_t^i = \frac{\exp(\beta_t^i)}{\sum_{j=1}^{T} \exp(\beta_t^j)} \qquad (14)

Each hidden state h_i in the encoder layer is used as an input to the decoder layer and is combined with its corresponding attention weight to obtain a weighted average context vector C_t. The total hidden input is [h_1, h_2, h_3, \ldots, h_T].

C_t = \sum_{i=1}^{T} l_t^i h_i \qquad (15)

C_t is the context vector at the different times. Once the context vectors are obtained, they are combined with the input target (y_1, y_2, \ldots, y_{T-1}), which gives us:

\tilde{y}_{t-1} = \tilde{w}^{\top} [y_{t-1}; C_{t-1}] + \tilde{b} \qquad (16)

Then \tilde{y}_{t-1} and the input d_{t-1} in the decoder layer are concatenated, and the concatenation result is input into the LSTM network to obtain d_t. Subsequently, d_t is concatenated with C_t in the fully connected neural network (FC), and \tilde{Y}_T is the final prediction result from training.
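To show how Eqs. (9), (10) and (13)-(16) fit together in one encoder-decoder pass, the following condensed PyTorch sketch implements the two attention stages; the hidden sizes, output head, and variable names are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualStageAttention(nn.Module):
    """A condensed sketch of the dual-stage attention of Eqs. (9)-(16)."""

    def __init__(self, n_series, window, enc_hidden=64, dec_hidden=64):
        super().__init__()
        self.n, self.p = n_series, window
        self.enc_cell = nn.LSTMCell(n_series, enc_hidden)
        self.dec_cell = nn.LSTMCell(1, dec_hidden)
        # Input attention, Eq. (9): U_e^T tanh(W_e [h; v] + B_e x^p)
        self.We = nn.Linear(2 * enc_hidden, window)
        self.Be = nn.Linear(window, window, bias=False)
        self.Ue = nn.Linear(window, 1, bias=False)
        # Temporal attention, Eq. (13): U_d^T tanh(W_d [d; v] + B_d h_i)
        self.Wd = nn.Linear(2 * dec_hidden, enc_hidden)
        self.Bd = nn.Linear(enc_hidden, enc_hidden, bias=False)
        self.Ud = nn.Linear(enc_hidden, 1, bias=False)
        self.w_tilde = nn.Linear(1 + enc_hidden, 1)         # Eq. (16)
        self.out = nn.Linear(dec_hidden + enc_hidden, 1)    # final FC layer

    def forward(self, x, y_hist):
        # x: (batch, window, n_series); y_hist: (batch, window - 1) past fares
        b = x.size(0)
        h = x.new_zeros(b, self.enc_cell.hidden_size)
        v = x.new_zeros(b, self.enc_cell.hidden_size)
        enc_states = []
        for t in range(self.p):
            # Eqs. (9)-(10): one attention score per driving series
            hv = torch.cat([h, v], dim=1).unsqueeze(1).expand(b, self.n, -1)
            scores = self.Ue(torch.tanh(
                self.We(hv) + self.Be(x.permute(0, 2, 1)))).squeeze(-1)
            alpha = F.softmax(scores, dim=1)
            h, v = self.enc_cell(alpha * x[:, t, :], (h, v))
            enc_states.append(h)
        H = torch.stack(enc_states, dim=1)        # (batch, window, enc_hidden)

        d = x.new_zeros(b, self.dec_cell.hidden_size)
        vd = x.new_zeros(b, self.dec_cell.hidden_size)
        for t in range(self.p - 1):               # requires window >= 2
            # Eqs. (13)-(15): temporal attention over the encoder states
            dv = torch.cat([d, vd], dim=1).unsqueeze(1).expand(b, self.p, -1)
            beta = self.Ud(torch.tanh(self.Wd(dv) + self.Bd(H))).squeeze(-1)
            l = F.softmax(beta, dim=1)
            C = (l.unsqueeze(-1) * H).sum(dim=1)
            # Eq. (16): merge the previous fare with the context vector
            y_tilde = self.w_tilde(torch.cat([y_hist[:, t:t + 1], C], dim=1))
            d, vd = self.dec_cell(y_tilde, (d, vd))
        return self.out(torch.cat([d, C], dim=1)).squeeze(-1)


model = DualStageAttention(n_series=10, window=15)
x = torch.randn(4, 15, 10)        # 4 samples, 15-day window, 10 attributes
y_hist = torch.randn(4, 14)       # past fares within the window
print(model(x, y_hist).shape)     # torch.Size([4])
```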
4 Algorithms

In Algorithm 1, the multi-attribute content X from the original user is entered to determine the data type. Before executing Algorithm 1, the attributes are preprocessed based on their types. For numerical data, MinMax is used to normalize them within the range of [-1, 1]; when the data is non-numerical, it is encoded as a One-Hot number. Finally, after such preprocessing, the values are concatenated to obtain the output Z̃, which serves as the input for Algorithm 2.

In Algorithm 2, after inputting Z̃, d_{t-1} and v_{t-1} are fed into the decoder layer to get the final prediction result \tilde{Y}_T. MSELoss then calculates the difference between the real and expected values, and the learning parameters are updated by back propagation to gradually improve the model's capability of generalization.
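As a rough illustration of the training procedure sketched for Algorithm 2, the loop below trains a small placeholder LSTM forecaster with MSELoss and Adam on random stand-in tensors; the placeholder network merely takes the place of the full MADA model, and all sizes are assumptions.

```python
import torch
import torch.nn as nn


class TinyFareModel(nn.Module):
    """Placeholder forecaster: an LSTM over the attribute windows followed
    by a linear head, standing in for the full MADA network."""

    def __init__(self, n_series=10, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_series, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, z):
        out, _ = self.lstm(z)
        return self.head(out[:, -1, :]).squeeze(-1)


model = TinyFareModel()
criterion = nn.MSELoss()                                   # loss, Section 3.2(2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, Section 3.2(4)

Z_tilde = torch.randn(256, 15, 10)   # stand-in for the preprocessed windows Z̃
y = torch.randn(256)                 # stand-in target fares

for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(model(Z_tilde), y)   # error between real and expected values
    loss.backward()                       # back propagation of the gradient
    optimizer.step()                      # update the learning parameters
```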
5 Experiments

In this section, we conduct experiments on real civil aviation datasets to showcase the advantage of MADA in the task of airline fare prediction.
5.1 Datasets

For the experimental evaluations, our model was implemented in Python 3.7.6 and run on Ubuntu 18.04 with a 2.5 GHz Intel Core i7 CPU and 8 GB memory. The data set was a two-year anonymous airfare record from a real airline, and the training set contained more than 1.7 million data pieces, including essential attributes such as airline, flight number, departure time, and passenger volume. The data set contains more than 200 routes, and the paper selects one of the representative routes for discussion.

5.2 Evaluation metrics

The mean square error (MSE), root mean square error (RMSE), and mean absolute error (MAE) were applied as the assessment metrics. Their calculation equations are:

MSE = \frac{1}{m} \sum_{i=1}^{m} \big( y_{test}^{(i)} - \hat{y}_{test}^{(i)} \big)^2

RMSE = \sqrt{ \frac{1}{m} \sum_{i=1}^{m} \big( y_{test}^{(i)} - \hat{y}_{test}^{(i)} \big)^2 }

MAE = \frac{1}{m} \sum_{i=1}^{m} \big| y_{test}^{(i)} - \hat{y}_{test}^{(i)} \big|
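For completeness, the three metrics can be computed in a few lines of NumPy; this is an illustrative snippet with made-up values, not the paper's evaluation script.

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    return np.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([0.52, 0.61, 0.48, 0.72])   # scaled fares (illustrative)
y_pred = np.array([0.50, 0.65, 0.47, 0.70])
print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```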
ARIMA, RF, XGBoost, CNN-LSTM, CNN-LSTM+Attention, and other benchmark models were used in this paper for comparison with the MADA prediction model. ARIMA, RF, and XGBoost were executed without any further configurations.

LSTM-CNN [15] This approach first uses the LSTM model, followed by CNN for parameter classification. Initially, this model was used to predict gold prices.

CNN-LSTM [14] This model uses CNN first, and then connects to LSTM. These two sub-modules are combined to form a CNN-LSTM model, which is frequently used for time series forecasting.

CNN-LSTM+Attn [25] In this model, CNN is used to extract multidimensional attributes, and the results are entered into LSTM; the final results are then output through the attention mechanism. The model was originally used to predict PM2.5 concentration.

Seq2Seq [34] Seq2Seq was first proposed to deal with language translation problems. This paper extends its application to time series prediction problems.

Seq2Seq+Attention [3] Seq2Seq with attention was first proposed to deal with language translation problems when the input sequence is longer. This paper extends its application to time series prediction problems.
Overall, MADA outperformed the previous models, implying that it can better predict price fluctuation trends. The experiment predominantly applied the PyTorch deep learning framework for extensive comparisons in terms of airfare forecasts and analyses. With repeated parameter adjustments, an optimal model was trained with Adam with unequal step sizes until the model parameters converged. During the experiments, the fare of one of the routes is used to compare the prediction results produced by the involved models effectively. Table 3 shows the data comparison results when the time window is T=30 for a 7-day forecast (n=7 days).

5.4.1 Performance comparison

The performance comparison results can be seen in Table 3, in which the indicators for comparison are MSE, RMSE, and MAE. Judging from the experimental results, the predictions produced by the MADA model were significantly more preferable, with much lower MSE, RMSE, and MAE than those obtained from traditional machine learning. Particularly, it is noteworthy that the MADA model integrated multi-dimensional data prediction, and it exploited more relevant input attributes, suggesting more accurate data prediction.

Moreover, the MADA mechanism is more effective in terms of extracting time series information than common deep learning models. However, it took more time to train a MADA model, and the prediction results still need much tuning for the greatest accuracy. In short, the MADA model proposed in this paper is more preferable than other methods in predicting air ticket prices.

5.4.2 Effectiveness comparison

To compare the prediction effects, data from a certain flight was used for forecasts. After data preprocessing, models such as CNN-LSTM, Seq2Seq, and MADA were used for prediction. Figure 7 shows the fluctuation trends of MSE, RMSE, and MAE predicted by the three models.

The X-axis in Fig. 7 indicates that different time windows were used, and the data in the table was used to predict the next day's prices under different time windows, which were 10, 15, 20, 25, and 30 days. Under different time windows, other models, such as CNN-LSTM+Attn, all became unstable in their performance. However, MADA revealed better stability and more preferable performance under different time windows.

Meanwhile, based on the experimental results, if the model is to predict the next day's fare, the data from the past 15 days would be necessary for the best training.

In Fig. 8, the predictions from CNN-LSTM, CNN-LSTM+Attn, Seq2Seq, and Seq2Seq+Attn were compared. Figure 9 depicts the MADA model's visualization results. Among them, the effects of the training set and the test set on the final predictions are shown in the upper parts of the graphs, while the effect of visualizing the upper half of the test set on the final predictions is shown in the lower parts. It can be concluded from the graphs that the MADA model produces more accurate predictions.

Fig. 8 Visual comparison of different models, where T represents the time width, and D represents the number of days predicted in the future

Fig. 9 Visual comparison of different models, where T represents the time width, and D represents the number of days predicted in the future
5.4.3 Ablation study

The ablation study involved experiments on the proposed model MADA with different structures. The original model was altered as follows.

MADA nAttn Encoders and decoders with 16, 32, 64, 100, or 128 hidden layers were used to build the model.

MADA sAttn Seq2Seq with 16, 32, 64, 100, or 128 hidden layers was used to build the model, followed by the addition of the temporal attention mechanism to the decoder layer.

MADA Seq2Seq with 16, 32, 64, 100, or 128 hidden layers was used. Subsequently, the temporal attention mechanism was added to the encoder and decoder layers to build the model.

The experimental results are shown in Fig. 10, in which the MSE, RMSE, and MAE of the MADA nAttn model were negatively correlated with the number of hidden layers; that is, the prediction results from the MADA nAttn model became more favorable as the number of hidden layers rose. However, after producing the optimal results when there were 64 hidden layers, MADA nAttn would lead to less satisfactory results when the number of model layers continued to increase.
This is because the model does not perform well on the data set, and the hidden information within the data cannot be learned by a single attention mechanism, and such inaccessibility was exacerbated by a growing number of hidden layers. On the other hand, the MADA model proposed in this paper cannot learn the hidden information in the data with a small number of hidden layers (no larger than 32). However, as the number of hidden layers grew, the MADA model outperformed the other deep learning models.

Besides, the importance of the multidimensional attributes has been studied. Different attributes were input into the MADA model for training, and the prediction conditions were set as the time window T = 15 and n = 1. Here, the following different variants were defined.

MADA nEx There were no weekend or holiday attributes in the multi-attribute data.

Table 4 Comparison of the variant MADA models

| Model | RMSE | MSE | MAE |
| MADA nEx | 0.00976 | 0.00010 | 0.00753 |
| MADA nAttn allData | 0.02422 | 0.00059 | 0.01768 |
| MADA sAttn allData | 0.01771 | 0.00031 | 0.0146 |
| MADA | 0.00610 | 0.00004 | 0.00463 |

Bold entries signify the model proposed in this article.
6 Conclusion

Currently, the prediction of civil aviation ticket prices remains rather inaccurate and unreliable. To solve this problem, a prediction method based on MADA is proposed. Judging from the experimental results, the MADA-based method can produce more accurate prediction results for civil aviation ticket prices than the traditional methods. Moreover, with multidimensional training models, the prediction results will be more accurate. Combined with the dual-stage attention mechanism, the implicit information of the time series can be extracted to the utmost extent.

Although MADA has shown a certain effect in the experimental results, there are still some problems with the current research. First of all, ticket prices change with other uncontrollable attributes; for example, weather conditions will also affect the change in ticket prices. Secondly, although this paper does a lot of research on airline fare prediction, optimal purchase time prediction has not been studied. The prediction of the best time to buy air tickets may become a research direction in this field next. In addition, as far as airlines are concerned, there are also issues such as demand prediction and price discrimination that require further in-depth research.

In the future, more accurate prediction methods should be explored to optimize the current imperfections. The prediction of civil aviation ticket prices can be realized by deep learning methods, so that ticket buyers can choose a more reasonable period to purchase. At the same time, the company can also increase its corresponding revenue through predictive models. There is a tradeoff between money saving by customers and increasing revenue by companies. Therefore, there is a need for a prediction model that can predict the optimal ticket prices that bring mutual benefit to both customers and airlines.

References
1. Abdella JA, Zaki N, Shuaib K, Khan F (2019) Airline ticket price and demand prediction: a survey. Journal of King Saud University - Computer and Information Sciences. https://doi.org/10.1016/j.jksuci.2019.02.001. https://www.sciencedirect.com/science/article/pii/S131915781830884X
2. Asteriou D, Hall SG (2016) ARIMA models and the Box-Jenkins methodology
3. Bahdanau D, Cho KH, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, pp 1–15. arXiv:1409.0473
4. Bao W, Yue J, Rao Y (2017) A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLOS ONE 12(7). https://doi.org/10.1371/journal.pone.0180944
5. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: The 22nd ACM SIGKDD International Conference
6. Chen T, Yin H, Chen H, Wu L, Wang H, Zhou X, Li X (2018) TADA: trend alignment with dual-attention multi-task recurrent neural networks for sales prediction. In: Proceedings - IEEE International Conference on Data Mining, ICDM 2018, pp 49–58. https://doi.org/10.1109/ICDM.2018.00020
7. Clémençon S, Casellato X, Roueff F, Wohlfarth T (2012) A data-mining approach to travel price forecasting. In: ICMLA
8. Dal Molin Ribeiro MH, Coelho LDS (2020) Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series. Applied Soft Computing 86. https://doi.org/10.1016/j.asoc.2019.105837
9. Yujing D, Zhihao W, Youfang L (2020) Flight passenger load factors prediction based on RNN using multi-granularity time attention. Computer Engineering 46(01):300–307. https://doi.org/10.19678/j.issn.1000-3428.0053569
10. Ding J (2018) Research on ticket pricing strategy of Shandong Airlines. PhD thesis
11. Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Gordon G, Dunson D, Dudík M (eds) Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, PMLR, Fort Lauderdale, FL, USA. Proceedings of Machine Learning Research, vol 15, pp 315–323. http://proceedings.mlr.press/v15/glorot11a.html
12. Gordiievych A, Shubin I (2015) Forecasting of airfare prices using time series. In: 2015 Information Technologies in Innovation Business Conference, ITIB 2015 - Proceedings, pp 68–71. https://doi.org/10.1109/ITIB.2015.7355055
13. Gui G, Liu F, Sun J, Yang J, Zhou Z, Zhao D (2020) Flight delay prediction based on aviation big data and machine learning. IEEE Trans Veh Technol 69(1):140–150. https://doi.org/10.1109/TVT.2019.2954094
14. Guo X, Zhao Q, Zheng D, Ning Y, Gao Y (2020) A short-term load forecasting model of multi-scale CNN-LSTM hybrid neural network considering the real-time electricity price. Energy Reports 6:1046–1053
15. He Z, Zhou J, Dai HN, Wang H (2019) Gold price forecast based on LSTM-CNN model. In: Proceedings - IEEE 17th International Conference on Dependable, Autonomic and Secure Computing, IEEE 17th International Conference on Pervasive Intelligence and Computing, IEEE 5th International Conference on Cloud and Big Data Computing, 4th Cyber Science, pp 1046–1053. https://doi.org/10.1109/DASC/PiCom/CBDCom/CyberSciTech.2019.00188
16. Hoseinzade E, Haratizadeh S (2019) CNNpred: CNN-based stock market prediction using a diverse set of variables. Expert Syst Appl 129:273–285
17. Huang B, Liang Y, Qiu X (2021) Wind power forecasting using attention-based recurrent neural networks: a comparative study. IEEE Access 9:40432–40444. https://doi.org/10.1109/ACCESS.2021.3065502
18. Janssen T (2014) A linear quantile mixed regression model for prediction of airline ticket prices
19. Juan S (2017) Analysis of the rule of civil aviation passenger reservation and parallel flights management. PhD thesis
20. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
21. Lago J, De Ridder F, De Schutter B (2018) Forecasting spot electricity prices: deep learning approaches and empirical comparison of traditional algorithms. Applied Energy 221:386–405. https://doi.org/10.1016/j.apenergy.2018.02.069
22. Lai G, Chang WC, Yang Y, Liu H (2018) Modeling long- and short-term temporal patterns with deep neural networks. In: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2018, pp 95–104. https://doi.org/10.1145/3209978.3210006
23. Lalonde F (2020) Hopper - book flights & hotels on mobile. https://www.hopper.com/
24. Li M (2019) The study of stock market prediction based on deep learning networks. PhD thesis
25. Li S, Xie G, Ren J, Guo L, Yang Y, Xu X (2020) Urban PM2.5 concentration prediction via attention-based CNN-LSTM. Applied Sciences 10(6):1953
26. Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models. In: Proc. ICML, Citeseer, vol 30, p 3
27. Pang X, Zhou Y, Wang P, Lin W, Chang V (2018) An innovative neural network approach for stock market prediction. The Journal of Supercomputing (1):1–21
28. Papadakis M (2012) Predicting airfare prices using machine learning. Stanford Assignment
29. Peng Z, Huang Q, Han Y (2019) Model research on forecast of second-hand house price in Chengdu based on XGBoost algorithm. In: 2019 IEEE 11th International Conference on Advanced Infocomm Technology, ICAIT 2019, pp 168–172. https://doi.org/10.1109/ICAIT.2019.8935894
30. Qiang Z (2015) Research on the several issues about pricing model in airline revenue management. PhD thesis
31. Qin Y, Song D, Cheng H, Cheng W, Jiang G, Cottrell GW (2017) A dual-stage attention-based recurrent neural network for time series prediction. In: IJCAI International Joint Conference on Artificial Intelligence, pp 2627–2633. https://doi.org/10.24963/ijcai.2017/366. arXiv:1704.02971
32. Ren R, Yang Y, Yuan S (2014) Prediction of airline ticket price. Stanford University
33. Shih SY, Sun FK, Lee HY (2019) Temporal pattern attention for multivariate time series forecasting. Machine Learning 108(8-9):1421–1441. https://doi.org/10.1007/s10994-019-05815-0. arXiv:1809.04206
34. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems 4(January):3104–3112. arXiv:1409.3215
35. Taylor SJ, Letham B (2018) Forecasting at scale. Am Stat 72(1):37–45. https://doi.org/10.1080/00031305.2017.1380080
36. Wang J, Sun X, Cheng Q, Cui Q (2021) An innovative random forest-based nonlinear ensemble paradigm of improved feature extraction and deep learning for carbon price forecasting. Science of the Total Environment 762. https://doi.org/10.1016/j.scitotenv.2020.143099
37. Wang T, Pouyanfar S, Tian H, Tao Y, Alonso M, Luis S, Chen SC (2019) A framework for airfare price prediction: a machine learning approach. In: 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI), pp 200–207. https://doi.org/10.1109/IRI.2019.00041
38. Yang Z, Yan W, Huang X, Mei L (2020) Adaptive temporal-frequency network for time-series forecasting. IEEE Trans Knowl Data Eng, pp 1–1. https://doi.org/10.1109/TKDE.2020.3003420
39. Zhou Y, Li T, Shi J, Qian Z (2019) A CEEMDAN and XGBOOST-based approach to forecast crude oil prices. Complexity 2019. https://doi.org/10.1155/2019/4392785

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Affiliations
Zhichao Zhao1,2 · Jinguo You1,2 · Guoyu Gan1,2 · Xiaowu Li1,2 · Jiaman Ding1,2
Zhichao Zhao
zhaozhichao_study@stu.kust.edu.cn
Jinguo You (corresponding author)
jgyou@kust.edu.cn
Guoyu Gan
20182204189@stu.kust.edu.cn
Xiaowu Li
lxwlxw66@126.com
Jiaman Ding
tjom2008@163.com
1 Faculty of Information Engineering and Automation,
Kunming University of Science and Technology,
Kunming 650500, China
2 Yunnan Key Laboratory of Artificial Intelligence,
Kunming University of Science and Technology,
Kunming 650500, China