[go: up one dir, main page]

0% found this document useful (0 votes)
64 views11 pages

Forecasting Stock Market Indices Using Padding-Bas

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
64 views11 pages

Forecasting Stock Market Indices Using Padding-Bas

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 11

This article has been accepted for publication in a future issue of this journal, but has not been

fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2021.3086537, IEEE Access

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2021.DOI

Forecasting Stock Market Indices Using


Padding-based Fourier Transform
Denoising and Time Series Deep
Learning Models
DONGHWAN SONG1 , ADRIAN M. CHUNG BAEK2 , AND NAMHUN KIM2*
1
Department of System Design and Control Engineering, Ulsan National Institute of Science and Technology, Ulsan, South Korea
2
Department of Mechanical Engineering, Ulsan National Institute of Science and Technology, Ulsan, South Korea
Corresponding author: Namhun Kim (e-mail: nhkim@unist.ac.kr).
This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the
Korea government (MSIT) (No.2020-0-01336, Artificial Intelligence Graduate School Program [UNIST]).

ABSTRACT Approaches for predicting financial markets, including conventional statistical methods and
recent deep learning methods, have been investigated in many studies. However, financial time series data
(e.g., daily stock market index) contain noises that prevent stable predictive model learning. Using these
noised data in predictions results in performance deterioration and time lag. This study proposes padding-
based Fourier transform denoising (P-FTD) that eliminates the noise waveform in the frequency domain
of financial time series data and solves the problem of data divergence at both ends when restoring to the
original time series. Experiments were conducted to predict the closing prices of S&P500, SSE, and KOSPI
by applying data, from which noise was removed by P-FTD, to different deep learning models based on time
series. Results show that the combination of the deep learning models and the proposed denoising technique
not only outperforms the basic models in terms of predictive performance but also mitigates the time lag
problem.

INDEX TERMS Deep learning, denoising framework, Fourier transform, stock index prediction, time
series

I. INTRODUCTION model. These econometric models are employed under the


For decades, stock market prediction has been receiving assumption that the time series data are linear. Because of
steady attention from researchers and investors as an attrac- linearity assumption, conventional approaches demonstrate
tive field. Nonetheless, accurately predicting the future of limited performances when predicting the financial time se-
the stock market has remained an open question because ries, which are mostly nonlinear and nonstationary. Accord-
stock markets are dynamic and possess several unpredictable ingly, machine learning models, such as the support vector
factors. According to the Efficient Market Theory proposed machine and artificial neural network (ANN) models, have
by Fama [1], financial markets are unpredictable because all been applied to forecast the value or direction of the stock
new information is already reflected on the price. Contrary market index to overcome the shortcomings of linear models
to this view, numerous studies have predicted stock markets [10]- [12]. Recently, deep learning models, including long
by taking diverse approaches, starting from conventional short-term memory (LSTM) [13]- [18], and their variants
statistical models to machine learning and deep learning [19]- [25] have been popularly proposed for stock prediction.
models in accordance with advancements in computational A major problem in stock prediction using deep learning
performance [2]- [7]. methods is that the financial time series contains considerable
Banerjee [8] forecasted the stock index for a day using noise [26]. When predicting using noise-included data, learn-
an autoregressive integrated moving average, whereas Liu ing becomes unstable because of the unwanted fitting data
and Hung [9] studied the stock index volatility through generated despite using machine learning and deep learning
a generalized autoregressive conditional heteroskedasticity models [27]. This could result in overfitting or underfitting

VOLUME 0, 2021 1

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2021.3086537, IEEE Access

Author et al.: Preparation of Papers for IEEE Access (May 2021)

problems [28], [29]. Moreover, a time lag occurs, wherein performance by conducting experiments to predict the stock
the predicted value lags in the direction and size of the actual indices of the next day using different time series-based deep
data values in the past. learning models and comparing to models from previous
In view of these limitations, studies to eliminate noises in research. The remainder of this paper is organized as follows:
financial data have emerged. Raudys et al. [30] compared Section 2 explains the methods used herein, including the
different moving averages and verified the exponential mov- proposed methodology; Section 3 presents the details of the
ing average to achieve the best smoothing performance. Babu experimental procedures; Section 4 summarizes the exper-
and Reddy [31] proposed moving average (MA) to smoothen imental results and discussion; and Section 5 provides the
time series data and explore their trend component before conclusions of this study.
applying the ANN model. However, noise removal using MA
techniques is accompanied by the time lag problem caused II. RELATED WORK
by the reflection of the historical data. The smoothness of A. FOURIER TRANSFORM
the MA approach is inversely proportional to the time lag FT is a mathematical tool used to convert a finite sequence of
[30]. Several studies using denoising techniques other than waveform data in the time domain into equally spaced data
MA have been conducted to reduce the effect of noise in in the frequency domain [43]. The original data are restored
the financial time series data and enhance the capability of through an additional Fourier analysis using FT samples as
prediction models. Lu [32] proposed a denoising technique the coefficients of complex sinusoids at the corresponding FT
based on independent component analysis integrated with a frequencies. This process is known as the inverse FT. There-
backpropagation neural network for stock price prediction. fore, a classical FT and its inverse are said to be in a one-
Awajan [33] used empirical mode decomposition (EMD) to-one relationship between the time [x(t)] and frequency
along with a bagging method to predict the daily stock market [X(ω)] domains.
prices of six countries. Discrete FT (DFT) is the most common type of Fourier
In recent years, denoising algorithms based on transform analysis applied to a discrete complex-valued series. DFT
methods have gained preference for outperforming many tra- breaks down a waveform in a time domain into a series of
ditional methods such as the MA filter and simple nonlinear sinusoidal terms, each with a unique magnitude, frequency,
noise reduction [34]. For example, Yu et al. [35] proposed and phase. The DFT process converts the time-based wave-
a hybrid model comprising of empirical wavelet transform form expressed in complex functions into clearer sinusoidal
(EWT) and optimized extreme learning machine (ELM) to functions, which when combined, can exactly replicate the
present a stable and precise prediction of financial time original waveform. DFT transforming a sequence of N com-
series. Chan Phooi M’ng and Mehralizadeh [34] proposed plex numbers {xn } into {Xk } is presented below:
Wavelet-PCA denoising (WPCA), a hybrid model using
N −1
wavelet transform (WT) and principal component analysis. X
Xk = xn · e−i2πkn/N (1)
They applied their denoising method with ANN to analyze
n=0
and forecast the financial future markets. Meanwhile, Li and
Tam [36] combined the real-time wavelet denoising (RTWD) Its inverse is given by
with LSTM, a time series-based learning model, to predict N −1
East Asian stock market indexes. Bao et al. [37] combined 1 X
xn = Xk · ei2πkn/N (2)
wavelet transforms with stacked autoencoders (WSAEs) and N
k=0
LSTM to predict the closing price of six different mar-
ket indices with corresponding index futures. Their model where log expression e±i2πkn/N can be expressed as a
outperformed other models in the predictive performances. combination of sines and cosines according to Euler’s for-
However, these wavelet denoising methods have limitations mula
in terms of retrieving weak signals with magnitudes close to
the noise [38]. Fourier transform (FT) is another method that eiω = cos ω + i sin ω (3)
can be applied to various fields for denoising different types Hence, equations (1) and (2) can be expressed as equations
of discrete and continuous data including image data [39]- (4) and (5), respectively:
[41]. FT has also been shown to be effective in denoising time
N −1
series data. Chen and Chen [42] proposed a fuzzy time series X 2πkn 2πkn
forecasting model by combining the entropy discretization Xk = xn · [cos − i · sin ] (4)
n=0
N N
technique and fast FT (FFT) algorithm. Despite FT methods
being strong denoising methods, they have diverging issues N −1
1 X 2πkn 2πkn
when applied to financial time series due to information lost xn = Xk · [cos + i · sin ] (5)
N N N
in the removal process. k=0

This study proposes a denoising technique using FFT with DFT shows advantages in many fields, but computing it
padding to remove the noises of financial time series data directly is often computationally too expensive to be prac-
without leading to divergence and time lag. We verify the tical. FFT was introduced by Cooley and Tukey [44] in
2 VOLUME 0, 2021

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2021.3086537, IEEE Access

Author et al.: Preparation of Papers for IEEE Access (May 2021)

1965. It is an optimized approach for implementing FT. the memory cell to influence the other units. The calculations
The computation complexity of FT can be reduced from for each gate and cell state are expressed as follows:
O(N 2 ) to O(N log N ). FFT is widely used in many practical
applications because it is believed to be an effective method it = σ(Wxi xt + Whi ht−1 + bi ) (8)
for disturbed signal denoising.
ft = σ(Wxf xt + Whf ht−1 + bf ) (9)
B. RECURRENT NEURAL NETWORK ot = σ(Wxo xt + Who ht−1 + bo ) (10)
A recurrent neural network (RNN) is a type of ANN with a ct = ft ct−1 + it c̃t (11)
“memory” that captures past information, making it suitable
for arbitrarily long sequence data [45]. This “memory” is c̃t = tanh(Wco xt + Wco ht−1 + bc ) (12)
known as the hidden state, which is the main and most ht = ot tanh(ct ) (13)
important feature of an RNN. An RNN comprises an input
layer, a hidden layer, and an output layer. The hidden states where ct is the memory cell; c̃t is the internal hidden state;
ht and the output of a single hidden layer ot given an input σ(·) is a sigmoid function; tanh(·) is a hyperbolic tangent
sequence at time step t xt can be derived as follows: function; and is the elementwise vector product.

ht = H(Wxh xt + Whh hh−1 + bh ) (6) D. GATED RECURRENT UNIT


The gated recurrent unit (GRU) was introduced by Cho et
ot = O(Who ht + bo ) (7) al. [47] to deal with the vanishing gradient problem of an
RNN. GRU is a variation of LSTM because it shows a
where W denotes the connection weights between the lay- similar structure to that of a long short-term memory with
ers shared across all steps and b represents the bias vectors. the forget gate. In GRU, the memory cell and the hidden
Equations (6) and (7) and Fig. 1(a) show that the hidden state are combined into a vector h̃t , while the input and forget
state ht is calculated based on the previous hidden state ht−1 gates are combined into a gate controller known as the reset
and the input at the current step xt . H(·) and O(·) are the gate zt . Despite the simplified parameters, GRU has shown a
activation functions, which, in most cases, are expressed as performance comparable with that of LSTM [48], [49].
tanh. The gated unit has several variations; the most general
form is depicted in Fig. 1(c). The cell states and the output
C. LONG SHORT-TERM MEMORY of each layer can be calculated as follows:
Hochreiter and Schmidhuber [46] proposed an RNN varia-
tion, called LSTM, which was designed to circumvent the rt = σ(Wxr xt + Whr ht−1 + br ) (14)
long-term dependency problem. A memory block comprising
zt = σ(Wxz xt + Whz ht−1 + bz ) (15)
multiple gates and a memory cell is the most important
feature of an LSTM system. Fig. 1(b) represents a memory h̃t = tanh(Wxg xt + Whg (rt ht−1 + bz ) (16)
block of the LSTM model.
ht = (zt ht + (1 − zt )h̃t ) (17)
LSTM has three main gates: input gate(it ), forget gate(ft ),
and output gate(ot ). The input gate controls the input signal Equations (14) to (17) and Fig. 1(c) show that the update
that alters the memory cell state. The forget gate regulates gate rt , which is also the gate controller, controls both the
the amount of the previous cell state (ht−1 ) that can pass forget and input gates. Although the reset gate zt seems like
through. The output gate decides whether to allow the state of the update gate, its weights and usage are different.

FIGURE 1. Cell diagram of each time series-based deep learning model.

VOLUME 0, 2021 3

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2021.3086537, IEEE Access

Author et al.: Preparation of Papers for IEEE Access (May 2021)

III. PROPOSED PADDING-BASED FT DENOISING


METHOD s
Pn r Pn
1 (Xk − X1 ) 1 Xk
In the financial market, it is commonly accepted that the σ1 = , where X1 = (18)
n n
volatility resulting from short-term traders, who tend to exe-
cute relatively high-frequency trades throughout the day with s
PN
s
PN
small assets, can affect the daily price of the stock but has N −n (Xk − X2 ) N −n Xk
minimal effects on its momentum or major trend. In this σ2 = , where X2 =
n n
study, such volatility is considered noise. Because of the (19)
frequent buying and selling of the short-term traders, this Xt−1 = Xt + N1 , t = 1,...,1 − m (20)
noise corresponds to a waveform with low amplitude and
high frequency in the frequency domain and must be removed Xt+1 = Xt + N2 , t = N , ..., N + m (21)
for stable learning of predictive models. Unfortunately, it is
difficult to effectively remove such noise in the time domain. where, N1 and N2 are samples from the normal distribu-
Hence, using filtering approaches, such as MA, only leads to tion of (0,σ1 ) and that of (0, σ2 ), respectively. These samples,
issues such as time lagging and unstable learning. FFT was which reflected the recent volatility of the original time
used in this study to remove the noise in financial time series series, were attached to both ends of the data. This process
data by separating the data into different frequency domain was repeated as many as m times (i.e., size of the padding
waveforms and eliminating the low-amplitude and high- area). Fig. 2(b) displays the resulting time series with padding
frequency waveform. Both ends of the time series diverge regions.
when the original time series is restored with the removed Fig. 2(c) illustrates the result of decomposing the padded
noise waveform, thereby resulting in a large error from the time series data (padded as above in the time domain) into
original data. different N + 2m frequencies using FFT. The decomposed
frequencies displayed a bilateral symmetry based on the
In this study, we proposed a method for removing noise
center. The x-axis in Fig. 2(c) represents the decomposed
from a frequency domain and restoring it into smoothed data
frequencies, in which the absolute value increased from the
without significant loss of original information by applying
center to both ends. The y-axis represents the amplitude
the padding technique to FT. Fig. 2 presents the overall
of each signal waveform, which also corresponded to the
process of the proposed method.
Fourier coefficient. A large value means a waveform that sig-
Fig. 2(a) shows the original time series data X(t) with small nificantly affects the original data. In other words, waveforms
and highly volatile noises. Here, σ1 and σ2 , which represents with low amplitude and high frequency are considered noise
the recent n sample volatility at both ends of X(t), are derived and should be removed to smoothen the original data because
as follows: they generate considerable variation in a short period.

FIGURE 2. Illustration of the steps involved in padding-based Fourier transform denoising.

4 VOLUME 0, 2021

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2021.3086537, IEEE Access

Author et al.: Preparation of Papers for IEEE Access (May 2021)

Therefore, the Fourier coefficient for a waveform, in which


the frequency does not exceed ε, is set to 0 to remove the
noise (Fig. 2(d) and (22)).
(
Ak if fk < |ε|
Ak = , k = 1 − m, ..., N + m, (22)
0 if fk ≥ |ε|
where Ak is the frequency amplitude at k and ε is the fre-
quency threshold value. After removing the noise waveform
of less than ε, N + 2m waveforms with different frequencies
were multiplied with the corresponding Fourier coefficient
and restored to the original time series through inverse FFT.
In this process, each frequency has a different periodic
function form. Moreover, divergence occurs at both ends of
the time series as the amplitude of the frequency waveform
below the threshold is converted into zero to eliminate the
noise and then merged into the original data (Fig. 2(e)).
The padded 2m data added to prevent the formation of the
diverging region are removed from the restored time series.
Fig. 2(f) shows the smoothed time series graph obtained from
the P-FTD process on the original time series data.
The noise-removal process in time series through the
padding-based FT denoising is summarized as follows:

Step 1 (A → B): The padding area is created by adding


randomly sampled values from the normal
FIGURE 3. Experimental framework of the proposed model.
distributions (0,σ1 ) and (0,σ2 ) to the data at
both ends and repeating this m times.
Step 2 (B → C): FFT is used to transform the padded
from January 2, 2001 to April 17, 2020. The input features
time series data from the time domain to
comprised the daily open (stock price at the start of each
the frequency domain and split them into
trading day), high (highest price of each trading day), low
N + 2m waveforms.
(lowest price of each trading day), close (stock price at the
Step 3 (C → D): Among the decomposed waveforms, the
end of each trading day), and volume (number of shares
amplitude values of the waveforms with a
traded each day) of S&P500, SSE, and KOSPI. An output
frequency value higher than the threshold
feature was set to the closing price of the next day. All data
(ε) are converted into 0 to remove noise.
can be obtained from Yahoo Finance and investing.com.
Step 4 (D → E): The remaining waveforms are recombined
with the data from the frequency domain to Fig. 4 and Table 1 summarize how the extracted data
the time domain using the inverse FFT. were divided into three subsets (i.e., training, validation, and
Step 5 (E → F): The padding areas containing the diverg- testing sets with proportions of 70% training, 10% validation,
ing part are removed to restore the denoised and 20% testing, respectively). The blue color in Fig. 4
time series data with the same length as that indicates the training period, the yellow color indicates the
of the original time series. validation period, and the red color indicates the testing
period.
IV. EXPERIMENT
This section presents the details of the experiment used to B. PREPROCESSING
verify the performance of the stock index prediction through Min-max normalization, also known as min-max scaling, was
P-FTD. Fig. 3 shows the overall research process to which performed on the input data before being feeding them into
the proposed model was applied. The specific data, pre- the model for more stable learning of the prediction model.
processing, models, hyperparameters, and evaluation metrics Large values of the input overwhelm the weight adjustment
involved in the experiment are explained herein. during network training even if errors occur owing to other
factors [50]. Moreover, input features can be dominated by a
A. DATASET particular input feature when the magnitudes of each variable
The research datasets used in this study comprised represen- are different. This scaling aims to ensure that larger input
tative stock market indices from the United States, China, and features do not overwhelm smaller input features by rescaling
Korea. The data comprised of the daily prices and volume the size of each variable between 0 and 1. The min–max
VOLUME 0, 2021 5

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2021.3086537, IEEE Access

Author et al.: Preparation of Papers for IEEE Access (May 2021)

normalization is calculated as follows:


x − xmin
x0 = (23)
xmax − xmin
where x is the original value and xmax and xmin are the
maximum and minimum values of each feature, respectively.

C. MODELS AND HYPERPARAMETERS


In this study, we conducted experiments by integrating the
P-FTD process into three different time series-based deep
learning models: RNN, LSTM, and GRU. The P-FTD-
integrated models (i.e., P-FTD_RNN, P-FTD_LSTM, and P-
FTD_GRU) are categorized as Group 1, and the basic models
(i.e., RNN, LSTM, and GRU) as Group 2.
Hyperparameters of P-FTD (i.e., volatility period, padding
area size, and threshold) and the predictive models (i.e.,
time step, learning rate, hidden dimensions, and early stop)
were optimized using a grid search algorithm. The predictive
models were generated using the hyperparameters obtained
through this process and their performances were then evalu-
ated by feeding the testing data. The value and search space
of the hyperparameters are summarized in Tables 2 and 3.
Volatility period N, padding area size m, and threshold
value ε for the noise removal were set to 40, 40, and 0.2, re-
spectively. These values can be adjusted by the experimenter
considering each P-FTD hyperparameter characteristic. As
the volatility period N increases, a padding area reflecting
the recent volatility trend of a longer period is obtained,
whereas as N decreases, a padding area that is significantly
affected by the recent volatility is obtained. The divergence
region can be more stably removed as the padding area size
m increases; however, a random sample has a greater effect
on the original time series. The smaller the threshold value ε,

TABLE 2. Hyperparameters of P-FTD

Hyperparameter Value Search space


FIGURE 4. Daily closing prices from January 02, 2001 to April 17, 2020 Volatility period(N ) 40 10, 20, 30, 40
for each stock market index.
Padding area size(m) 40 10, 20, 30, 40
Threshold(ε) 0.2 0.1, 0.2, 0.3

TABLE 3. Hyperparameters of the time series-based deep learning


TABLE 1. Summary of the training and testing dataset after batch models
preprocessing.

Hyperparameter Value Search space


Training Validation
Testing period Timestep 20 10, 20, 60, 120, 240
Dataset period period
(sample)
(sample) (sample) Learning rate 0.001 0.01, 0.001, 0.0001
2001/01/02- 2014/07/16- 2016/06/15-
S&P500 2014/07/15 2016/06/14 2020/04/17 Input dimension 5 -
(3383) (483) (967) Hidden dimension 10 10, 20, 30
2001/01/02- 2014/07/24- 2016/06/21- Output dimension 1 -
SSE 2014/07/23 2016/06/20 2020/04/17
(3261) (465) (933) Loss function MSE -
2001/01/02- 2014/07/01- 2016/06/07- Optimizer Adam -
KOSPI 2014/06/30 2016/06/03 2020/04/17
(3345) (475) (950) Early stopping 20 10, 20, 30

6 VOLUME 0, 2021

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2021.3086537, IEEE Access

Author et al.: Preparation of Papers for IEEE Access (May 2021)

the more numbers of high-frequency waveforms are removed time series stabilized without divergence when the data were
in the P-FTD process and a smoother time series is restored. denoised using padding techniques and restored to their time
Timestep, learning rate, input dimension, hidden dimen- domain form.
sion, and output dimension were set to 20 (trading days in Figs. 6 to 8 show the actual and forecasted values of the
a month), 0.001, 5 (open, high, low, close, volume), 10, and S&P500, SSE, and KOSPI indices in the last two months
1, respectively. The other parameters, i.e., the optimization from February 21, 2020 to April 17, 2020. The forecasting
algorithm and the loss function, were represented by the performances of the Group 2 models were compared with
Adam optimizer and the mean squared error, respectively. those of the Group 1 models, which are P-FTD-integrated
Early stopping was employed to prevent the model from models.
overfitting and terminate model learning when the loss of Table 4 presents the evaluation results for each model.
the validation data did not decrease by more than 20 epochs. Among the basic models belonging to Group 2, GRU showed
These values of the hyperparameters were fixed in every the best performance in predicting S&P500 and SSE, and
model to validate the denoising effects of P-FTD on each LSTM in predicting KOSPI. P-FTD_LSTM exhibited the
model. best performance among the denoised models in Group 1.
A comparison of the best performing models in each group
D. EVALUATION METRICS illustrated that P-FTD_LSTM outperformed GRU. S&P500
The forecasting performance of each model was evaluated had 49.04% (MAE), 51.03% (RMSE), and 49.56% (MAPE);
based on the following metrics: MAE, RMSE, MAPE, and SSE had 55.31% (MAE), 57.60% (RMSE), and 55.17%
hit ratio. Equations (24) to (28) define these metrics as (MAPE); and KOSPI had 47.16% (MAE), 49.58% (RMSE),
follows: and 47.76% (MAPE).
Table 4 and Figs. 6 to 8 show a slight difference in
n
1X the predicting performances within each group but a large
M AE = |yi − fi | (24)
n i=1 difference in the performances between the groups.
v In addition to the abovementioned results, two interesting
u n
u1 X aspects were also confirmed from the experimental results.
RM SE = t (yi − fi )2 (25) First, the time lag improved. In Figs. 6 to 8, a time lag is
n i=1
observed in Group 2 when predicting the next day’s index
n with the noised data. This result can be attributed to the
1 X yi − fi
M AP E(%) = × 100 (26) characteristics of the predictive models that learn weights
n i=1 yi by minimizing errors and the fact that following the value
Here, yi is the actual value, while fi is the forecast of the previous day tends to give a lower prediction error
value. Variable n represents the number of test samples. A more than actually predicting the value of the next day.
lower value of these indicators denotes a smaller difference By contrast, predictive models integrated with the P-FTD
between the actual and forecasted values, showing better showed an improved time lag even at the transition points.
network performance. To confirm the time lag improvement, a comparison of the
The performance of predicting the next-day direction is RSME results of each model for the three indexes according
evaluated using the hit ratio equation: to the time delay was performed (Table 5). The actual index
value was compared with the predicted value of the same
n
1X day in Lag 0, one day after in Lag 1, and two days after
Hit ratio = Di (i = 1, 2, ..., n) (27) in Lag 2. In the table, Group 1 models showed the lowest
n i=1
RMSE values in Lag 0, whereas Group 2 models showed the
where Di is the directional match result for the i-th trading lowest RMSE values in Lag 1. The result indicates that P-
day. FTD models, unlike basic models optimized to the value of
( the day before prediction and predict the following day, make
1, (yt+1 − yt )(ft+1 − ft ) > 0, predictions independent of the values of the previous day.
Di = (28)
0, Otherwise. The second aspect is the formation of a robust model as
a response to minor changes. Models without the P-FTD
V. EXPERIMENTAL RESULTS AND DISCUSSION process respond significantly to minor changes in the original
In this section, we describe experiments, in which P-FTD data and reduce the prediction performances. The P-FTD
was applied on different deep learning models, conducted to models responded more stably to small volatilities and made
verify the denoising effects in stock market index prediction. predictions based on the major trends of the financial time
Fig. 5 shows April 17, 2019 through April 17, 2020 S&P500 series data because the noises differing from the major trends
data, P-FTD-denoised data and noise data. The denoised were eliminated.
data exhibited a smoothed and stable time series even in the In addition to the next-day stock index prediction, the
volatile sections of the original data, where the trend was coincidence of the directionality of the stock index was
unstable. Furthermore, the index values at both ends of the further verified using the hit ratio, the trials percentage when
VOLUME 0, 2021 7

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2021.3086537, IEEE Access

Author et al.: Preparation of Papers for IEEE Access (May 2021)

TABLE 4. Summary of the forecasting evaluation results for each model.

S&P500 SSE KOSPI


MAE RMSE MAPE MAE RMSE MAPE MAE RMSE MAPE
P-FTD_RNN
12.922 19.577 0.469 10.031 13.684 0.338 9.915 12.857 0.449
(Group 1)
P-FTD_LSTM
9.252 15.855 0.344 9.667 13.091 0.325 7.795 10.669 0.362
(Group 1)
P-FTD_GRU
11.359 18.313 0.424 10.903 14.395 0.365 8.594 11.572 0.394
(Group 1)
RNN
19.423 32.874 0.726 22.084 31.047 0.741 15.589 21.732 0.728
(Group 2)
LSTM
18.157 32.508 0.694 22.055 31.158 0.739 14.752 21.159 0.693
(Group 2)
GRU
18.156 32.378 0.682 21.636 30.882 0.725 15.482 21.519 0.725
(Group 2)

the predicted direction is correct. The average hit ratios for


Groups 1 and 2 were 72.15% and 49.43%, respectively, and
the best hit ratios were 73.58% and 50.70%, respectively. The
average hit ratio for Group 1 increased by 45.96% compared
with that of Group 2. Fig. 9 also shows that all models in
Group 1 were more efficient in predicting the direction of
the daily closing price of all stock indices than the models in
Group 2. The accurate prediction performance of the models
with the P-FTD-denoised data is beneficial to investors and
can be a favorable candidate for predicting the direction of
next day’s closing price.
FIGURE 6. Comparison of the predicted values before and after applying
To further illustrate the denoising effects of P-FTD on P-FTD for each S&P500 model.
the time-based deep learning models, the performance of
the proposed models was compared with that of the afore-
mentioned studies (Table 6). These studies present different
denoising methods combined with deep learning models
to predict the next day’s stock index using S&P500, SSE,
and KOSPI data. As the evaluation metric to compare the
prediction performances of the different studies, the error
ratio between predicted and actual value MAPE was selected
because other evaluation metrics mentioned in Section IV.D
are significantly affected by the period and scale of data
observed. The table reveals that time series based deep learn-
ing models integrated with P-FTD outperform other deep
FIGURE 7. Comparison of the predicted values before and after applying
P-FTD for each SSE model.

FIGURE 8. Comparison of the predicted values before and after applying


P-FTD for each KOSPI model.

FIGURE 5. S&P500 P-FTD-denoised data and their noise from April 17,
2019 to April 17, 2020.

8 VOLUME 0, 2021

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2021.3086537, IEEE Access

Author et al.: Preparation of Papers for IEEE Access (May 2021)

TABLE 5. Comparison of RMSE according to time lag

S&P500 SSE KOSPI


Lag 0 Lag 1 Lag 2 Lag 0 Lag 1 Lag 2 Lag 0 Lag 1 Lag 2
P-FTD_RNN
19.577 25.241 39.319 13.684 25.289 41.959 12.857 16.849 27.598
(Group 1)
P-FTD_LSTM
15.855 23.355 37.128 13.091 25.343 41.238 10.669 12.986 23.990
(Group 1)
P-FTD_GRU
18.313 23.812 34.548 14.395 22.591 37.322 11.572 16.426 29.209
(Group 1)
RNN
32.874 8.524 31.288 31.047 11.234 34.124 21.732 7.493 22.792
(Group 2)
LSTM
32.508 10.130 30.976 31.158 6.922 33.074 21.159 4.042 20.561
(Group 2)
GRU
32.378 8.248 30.279 30.880 7.822 32.355 21.519 6.559 20.650
(Group 2)

learning models combined with different denoising methods. TABLE 6. Performance comparison of previous research works.
The prediction performance of P-FTD_LSTM was higher by
75.3% than RTWD-LSTM, which was the best-performing Denoising Prediction
Index data MAPE(%) Ref.
model model
model among those in the literature for predicting the KOSPI
WT LSTM 1.5
index. [37]
WSAEs LSTM 1.1
S&P500
EWT LSTM 0.7583
VI. CONCLUSIONS [35]
In this study, we proposed a denoising method based on EWT ELM 0.6830
FFT with a padding technique to remove noises, which Proposed models 0.344 - 0.469 -
result in unstable learning of predictive models and time RTWD LSTM 1.3824 [36]
lag, in the stock market index. The proposed method also SSE EWT LSTM 1.2860
[35]
presents a solution to the divergence problem of the time EWT ELM 1.4086
series data when transformed from the frequency domain to
Proposed models 0.325 - 0.338 -
the time domain using FFT. The divergence, which occurs
RTWD LSTM 0.6346 [36]
at both ends of the time series data owing to the original
information being removed along with the noise waveforms, KOSPI WT NN 2.805
[34]
was prevented by first adding the padding areas containing WPCA NN 2.243
the diverging components and then removing them from Proposed models 0.362 - 0.449 -
the restored time series. The performance of the proposed
denoising technique was verified through experiments that

FIGURE 9. Comparison of the hit ratios before and after denoising for each model.

VOLUME 0, 2021 9

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2021.3086537, IEEE Access

Author et al.: Preparation of Papers for IEEE Access (May 2021)

predicted the next day’s stock index by applying it to the [4] B. Krollner, B. J. Vanstone, and G. R. Finnie, “Financial time series
major indices of different countries and diverse time series- forecasting with machine learning techniques: a survey,” in Proc. Eur.
Symp. Artif. Neural Netw., Comput. Mach. Learn., Apr. 2010.
based deep learning models. [5] X. Zhang, X. Zhang, S. Qu, J. Huang, B. Fang, and P. Yu, “Stock market
The daily price and the trading volume data of S&P500, prediction via multi-source multiple instance learning,” IEEE Access, vol.
SSE, and KOSPI, which are the representative indices of the 6, pp. 50720-50728, 2018.
[6] E. Chong, C. Han, F. C. Park, “Deep learning networks for stock market
United States, China, and Korea, respectively, were used in analysis and prediction: Methodology, data representations, and case stud-
this experiment. As predictive models, the RNN, LSTM, and ies,” Expert Syst. Appl., vol. 83, pp. 187-205, Oct. 2017.
GRU deep learning models were used, compared, and ana- [7] M. Nabipour, P. Nayyeri, H. Jabani, S. S., and A. Mosavi, "Predicting stock
market trends using machine learning and deep learning algorithms via
lyzed to investigate the noise-reduction performance through continuous and binary data; a comparative analysis," IEEE Access, vol. 8,
P-FTD. According to the results, GRU and LSTM showed pp. 150199–150212, 2020.
the best performance among the basic models, whereas P- [8] D. Banerjee, “Forecasting of Indian stock market using time-series
ARIMA model,” in Proc. IEEE 2nd Int. Conf. Bus. Inf. Manag., Jan. 2014,
FTD_LSTM showed the best performance among the de- pp. 131-135.
noised models. Comparing the best-performing models, the [9] H. C. Liu and J. C. Hung, “Forecasting S&P-100 stock index volatility:
MAE, RMSE, and MAPE values of P-FTD_LSTM decreased The role of volatility asymmetry and distributional assumption in GARCH
models,” Expert Syst. Appl., vol. 37, no. 7, pp. 4928-4934, Jul. 2010.
for all indexes. The performances of the basic and denoised [10] S. Pyo, J. Lee, M. Cha, and H. Jang, “Predictability of machine learning
models for predicting the directionality of the stock index techniques to forecast the trends of market index prices: Hypothesis testing
were also compared using the hit ratio. The average hit ratio for the Korean stock markets,” PloS one, vol. 12, no. 11, Nov. 2017.
[11] A. Dingli and K. S. Fournier, "Financial time series forecasting-a machine
for the denoised models was greater than that for the basic learning approach," Mach. Learn. Appl., vol. 4, pp. 3, 2017.
models by 45.96%. These results demonstrate that the pre- [12] V. K. S. Reddy, "Stock market prediction using machine learning," Int.
diction after removing the noise from the input data using the Res. J. Eng. Techno., vol. 4, no. 10, pp. 1033-1035, Oct. 2018.
[13] D. M. Nelson, A. C. Pereira, and R. A. de Oliveira, “Stock market’s price
P-FTD shows better performance in all indices and models movement prediction with LSTM neural networks,” in IEEE Int. Joint
compared with that performed without noise removal. The Conf. Neural Netw., May 2017, pp. 1419-1426.
results also show an improvement in the time lag, in which [14] T. Fischer and C. Krauss, “Deep learning with long short-term memory
networks for financial market predictions,” Eur. J. Oper. Res., vol. 270, no.
the predicted values follow preceding values when predicted 2, pp. 654-669, Oct. 2018.
using noised data, and a robust model that responds more [15] Y. Baek and H. Y. Kim, “ModAugNet: A new forecasting framework for
stably to noises. In addition, the performance of the proposed stock market index value with an overfitting prevention LSTM module and
a prediction LSTM module,” Expert Syst. Appl., no.113, pp. 457-480, Dec.
models with that of other previous studies were compared 2018.
and verified their superiority. [16] M. Nikou, G. Mansourfar, and J. Bagherzadeh, "Stock price prediction
Although only the financial time series data were treated in using DEEP learning algorithm and its comparison with machine learning
this study, the proposed denoising technique could be used to algorithms," Intell. Syst. Accounting, Finance Manage., vol. 26, no. 4, pp.
164–174, Oct. 2019.
filter other types of time series data. Moreover, it has strength [17] M. Nabipour, P. Nayyeri, H. Jabani, A. Mosavi, and E. Salwana, "Deep
in that users can adjust the range of noise values. learning for Stock Market Prediction," Entropy, vol. 22, no. 8, p. 840, Aug.
This study has a limitation. The padding technique was 2020.
[18] S. Mehtab and J. Sen, "A time series analysis-based stock price
used to prevent only the divergence that occurs when restor- prediction using machine learning and deep learning models," 2020,
ing the original data after removing the waveforms with arXiv:2004.11697. [Online]. Available: https://arxiv.org/abs/2004.11697
low amplitude and high frequency. The padding values were [19] M. Qiu and Y. Song, “Predicting the direction of stock market index
movement using an optimized artificial neural network model,” PloS one,
randomly sampled from a normal distribution considering vol. 11, no. 5, May 2016.
the volatility of the original data. The number of padded [20] D. L. Minh, A. Sadeghi-Niaraki, H. D. Huy, K. Min, and H. Moon, “Deep
values that is smaller than that of the original data has minor learning approach for short-term stock trends prediction based on two-
stream gated recurrent unit network,” IEEE Access, vol. 6, pp. 55392-
effects on the restored data after the P-FTD processing. 55404, 2018.
However, if the number of padded values increases compared [21] T. Kim and H. Y. Kim, “Forecasting stock prices with a feature fusion
with the number of original data, the restored data can be LSTM-CNN model using different representations of the same data,” PloS
one, vol. 14, no. 2, Feb. 2019.
affected by another type of noise generated by the padded [22] M. Wen, P. Li, L. Zhang, and Y. Chen, “Stock market trend prediction
values. Therefore, additional studies on methods of finding using high-order information of time series,” IEEE Access, vol. 7, pp.
padding values that minimize the impact when the original 28299-28308, 2019.
[23] S. Mehtab, J. Sen, and A. Dutta, "Stock price prediction using machine
data are small are needed. In addition, the correlation among learning LSTM-based deep learning models," 2020, arXiv:2009.10819.
individual parameters and the denoising performance in the [Online]. Available: https://arxiv.org/abs/2009.10819
P-FTD process must be investigated. [24] H. Liu and Z. Long, “An improved deep learning model for predicting
stock market price time series,” Digit. Signal Process. A Rev. J., vol. 102,
p. 102741, 2020.
REFERENCES [25] Y. Zhang, B. Yan, and M. Aasma, "A novel deep learning framework:
[1] E. F. Fama, “Efficient capital markets: II,” J. Finance, vol. 46, no. 5, pp. Prediction and analysis of financial time series using CEEMD and LSTM,"
1575-1617, Dec. 1991. Expert Syst. Appl., vol. 159, pp. 113609, Nov. 2020.
[2] P. R. Junior, F. L. R. Salomon, and E. de Oliveira Pamplona E, “ARIMA: [26] C. J. Lu, T. S. Lee, and C. C. Chiu, “Financial time series forecasting using
An applied time series forecasting model for the Bovespa stock index,” independent component analysis and support vector regression,” Decision
Appl. Math., vol. 5, no. 21, pp. 3383, 2014. Support Syst., vol. 47, no. 2, pp. 115-125, 2019.
[3] S. M. Idrees, M. A. Alam, and P. Agarwal, “A prediction approach for [27] H. Hassani, A. Dionisio, and M. Ghodsi, “The effect of noise reduction
stock market volatility based on time series data,” IEEE Access, vol. 7, in measuring the linear and nonlinear dependency of financial markets,”
pp.17287-17298, 2019. Nonlinear Anal., Real World Appl., vol. 11, no. 1, pp. 492-502, Feb. 2010.

10 VOLUME 0, 2021

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2021.3086537, IEEE Access

Author et al.: Preparation of Papers for IEEE Access (May 2021)

[28] J. H. Wang, J. H. Jiang, and R. Q. Yu, “Robust back propagation algorithm DONGHWAN SONG received the B.S. degree
as a chemometric tool to prevent the overfitting to outliers,” Chemometrics from the Department of Mechanical Advanced
Intel. Lab. Syst., vol. 34, no. 1, pp. 109-115, Aug. 1996. Materials Engineering, Ulsan National Institute of
[29] W. Zhao, D. Chen, and S. Hu, “Detection of outlier and a robust BP Science and Technology, Ulsan, South Korea, in
algorithm against outlier,” Comput. Chem. Eng., vol. 28, no. 8, pp. 1403- 2015, where he is currently pursuing the Ph.D.
1408, Jul. 2004. degree with the Department of System Design
[30] A. Raudys, V. Lenčiauskas, E. Malčius, “Moving averages for financial and Control Engineering. He has been involved
data smoothing,” in Proc. Int. Conf. Inf. Softw. Techno., Oct. 2013, pp. 34-
in several research projects such as SVM-based
45.
automatic product quality inspection system using
[31] C. N. Babu and B. E. Reddy, “A moving-average filter based hybrid
ARIMA–ANN model for forecasting time series data,” Appl. Soft Comput, thermal image data and modeling of financial be-
vol. 23, pp. 27-38, Oct. 2014. havior of agent using deep learning. His current research interests include
[32] C. J. Lu, “Integrating independent component analysis-based denoising Time series analysis based on machine learning and deep learning in the
scheme with neural network for stock price prediction,” Expert Syst. Appl., field of finance and economic.
vol. 37, no. 10, pp. 7056-7064, Oct. 2010.
[33] A. M. Awajan, M. T. Ismail, and S. Al Wadi, "Improving forecasting
ADRIAN MATIAS CHUNG BAEK received the
accuracy for stock market data using EMD-HW bagging," Plos one, vol.
M.S. degree in Mechanical Engineering from Ul-
13, no. 3, Jul. 2018.
[34] J. Chan Phooi M’ng and M. Mehralizadeh, "Forecasting East Asian indices san National Institute of Science and Technology,
futures via a novel hybrid of wavelet-PCA denoising and artificial neural South Korea, in 2020. He is currently pursuing the
network models," Plos one, vol. 11, no. 6, Jun. 2016. Ph.D. degree with the Department of Mechanical
[35] H. Yu, L. J. Ming, R. Sumei, and Z. Shuping, "A hybrid model for financial Engineering, Ulsan National Institute of Science
time series forecasting-Integration of EWT, ARIMA with the improved and Technology. His research interests include
ABC optimized ELM," IEEE Access, vol. 8, pp. 84501–84518, 2020. artificial intelligence, in particular, deep learning
[36] Z. Li and V. Tam, "Combining the real-time wavelet denoising and long- and reinforcement learning and their application
short-term-memory neural network for predicting stock indexes," in IEEE in 3D printing.
Sym. Ser. Comp. Intel., 2017, pp. 1-8.
[37] W. Bao, J. Yue, and Y. Rao, “A deep learning framework for financial time
series using stacked autoencoders and long-short term memory,” PloS one, NAMHUN KIM earned his B.S. degree in 1998
vol. 12, no. 7, Jul 2017. and M.S. degree in 2000 from KAIST. After that,
[38] M. Srivastava, C. L. Anderson, and J. H. Freed, “A new wavelet denoising he worked as a senior researcher in Samsung
method for selecting decomposition levels and noise thresholds,” IEEE Corning, Co., LTD for five years. Then, he re-
access, vol. 4, pp. 3862-3877, 2016. ceived his Ph.D. in Industrial and Manufacturing
[39] L. Chiron, M. A. van Agthoven, B. Kieffer, C. Rolando, and M.-A. Engineering from Penn State University, Univer-
Delsuc, "Efficient denoising algorithms for large experimental datasets sity Park, PA, USA in 2010. He worked as a
and their applications in Fourier transform ion cyclotron resonance mass research associate in Penn State University from
spectrometry," Proc. Nat. Acad. Sci., vol. 111, no. 4, pp. 1385–1390, Jan. January to June 2010. He joined UNIST in 2010
2014. and is currently working as an associate professor
[40] J. Wang, Y. Guo, Y. Ying, Y. Liu, and Q. Peng, “Fast non-local algo- in the Department of System Design and Control Engineering, acting as the
rithm for image denoising,” in Proc. IEEE Int. Conf. Image Process., pp.
director of 3D printing research center at UNIST, Korea. His interest is in
1429–1432, Oct. 2006.
manufacturing technologies with emphasis on additive manufacturing (3D
[41] A. Mustafi and S. K. Ghorai, “A novel blind source separation technique
using fractional Fourier transform for denoising medical images,” Optik, printing), manufacturing system modeling and agent-based simulation.
vol. 124, no. 3, pp. 265-271, Feb. 2013.
[42] M.-Y. Chen and B.-T. Chen, “Online fuzzy time series analysis based on
entropy discretization and a Fast Fourier Transform,” Appl. Soft Comput.,
vol. 14, pp. 156-166, Jan. 2014.
[43] R. N. Bracewell, The Fourier Transform and its Applications. New York:
McGraw-Hill, 1986.
[44] J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation
of complex Fourier series,” Math. Comput., vol. 19, no. 9, pp. 297-301,
Apr. 1965.
[45] A. M. Logar, E. M. Corwin, and W. J. Oldham. "A comparison of recurrent
neural network learning algorithms," In Proc. IEEE Int. Conf. Neural Netw.
Mach. Learn., pp. 1129-1134, Mar. 1993.
[46] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural
Comput., vol. 9, no. 8, pp. 1735-1780, Nov. 1997.
[47] K. Cho, B. Van Merriënboer, D. Bahdanau, and Y. Bengio, “On the prop-
erties of neural machine translation: Encoder-decoder approaches,” 2014,
arXiv:1409.1259. [Online]. Available: https://arxiv.org/abs/1409.1259
[48] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmid-
huber. "LSTM: A search space odyssey," IEEE Trans. Neural Netw. Mach.
Learn., vol. 28, no. 10, pp. 2222-2232, Oct. 2017.
[49] R. Jozefowicz, W. Zaremba, and I. Sutskever, "An empirical exploration of
recurrent network architectures," in Proc. Int. Conf. Mach. Learn., 2015,
pp. 2342-2350.
[50] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, New York:
John Wiley & Sons, 2012.

VOLUME 0, 2021 11

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

You might also like