2310.16855v1
2310.16855v1
2310.16855v1
Algorithms
Ryan Chipwanya
Academy of Computer Science and Software Engineering
University of Johannesburg
Johannesburg, South Africa
219027938@student.uj.ac.za
Abstract—The stock market has been established since the 13th above 80 percent accuracy in forecasting stocks [1,2]. Despite
arXiv:2310.16855v1 [q-fin.ST] 24 Oct 2023
century, but in the current epoch of time, it is substantially more the fact that studies have continued to discuss the difficulties in
practicable to anticipate the stock market than it was at any other precisely predicting future stock prices. ML Algorithms have
point in time due to the tools and data that are available for both
traditional and algorithmic trading. There are many different made it possible to estimate future prices or directional biases
machine learning models that can do time-series forecasting in for stocks and other asset classes with significant outcomes
the context of machine learning. These models can be used to [1,2].
anticipate the future prices of assets and/or the directional bias The results of this study provide valuable insights into
of assets. In this study, we examine and contrast the effectiveness the investigation of a variety of ML and DL approaches to
of three different machine learning algorithms—namely, logistic
regression, decision tree, and random forest—to forecast the stock market forecasting. In this work, we investigate how
movement of the assets traded on the Japanese stock market. In well ML Algorithms like Logistic Regression, Decision Trees,
addition, the models are compared to a feed forward deep neural and Random Forest perform when it comes to forecasting the
network, and it is found that all of the models consistently reach directional bias in assets that are listed on the Japanese Stock
above 50% in directional bias forecasting for the stock market. Exchange, and we compare their results to those of the most
The results of our study contribute to a better understanding
of the complexity involved in stock market forecasting and give advanced DL models available. It is believed that this research
insight on the possible role that machine learning could play in will help to a greater understanding of the significance of ML
this context. and DL algorithms in the process of predicting stock market
Index Terms—Machine Learning, Stock Market, Prediction, prices.
Classification
II. M ETHOD
I. I NTRODUCTION A. ML Models
Forecasting the stock market and carrying out algorithmic In order to evaluate the efficacy of stock market forecasting,
trading both benefit significantly from the application of we will set up the task as a binary classification one, in which
machine learning (ML). A skill that can be acquired, predicting we will predict whether the movement of the stock for the
the stock market entails learning and utilising information day will be ”up” = 1 or ”down” = 0. Thawornwong and
and resources on both fundamental and technical analytical Enke’s research demonstrated that directional predictions for
techniques in order to estimate the future price of an asset stocks perform better than exact numerical predictions [7]. In
[1]. This talent can be acquired via practise. addition, we create three machine learning models, which are
Traditional trading methodologies, on the other hand, intro- referred to as a Logistic Regressor (LR), a Decision Tree (DT),
duce an increased probability of errors in accurate predictions. and a Random Forest (RF). We will utilise binary entropy and
These errors can be caused by human emotions (such as entropy for the Logistic Regressor as well as accuracy as a
fear and greed) that drive impulsive trading behaviour in metric to evaluate the performance of the LR and DT + RF
high-volume market conditions, unprovoked fundamental news models, respectively, in order to calculate the impurity.
events, and a lack of necessary skills to adequately forecast
assets [4]. B. The Data set
The availability of essential data, including as news, prices, The Kaggle JPX Tokyo Stock Exchange Prediction Compe-
and indicators for critical analysis and forecasting [5,6], has tition, which was organised by the Japanese Exchange Group
made it significantly simpler throughout the course of human [8] will serve as the data set that we will be utilising in this
history to foresee the behaviour of various markets, and this study. In addition, for the purpose of assessment, we will apply
trend continues into the present day. a filter to the data set, extract the price data for Sony, and then
Studies have continued to address the problems in exactly remove the columns labelled ”Date,” ”Open,” ”High,” ”Low,”
predicting future stock prices; yet, they have also proceeded to and ”Close,” as well as the ”Volume” column. In addition to
reach promising findings in asset forecasting, evaluated using this, we are going to make two more columns and label them
ML and Deep Learning (DL) algorithms, proving to attain ”Next” and ”Target.” The ”Next” column displays the ”Close”
price for the following trading day, while the ”Target” column a comparison of the various models’ performance as well as
is used to classify whether the movement will be an increase their overall cost.
or a decrease, shown by the numbers 1 and 0 accordingly. In addition, in order to evaluate how well our ML models
The ”Target” column is the result of applying a comparison perform in comparison to those of our DL model, a Feed
operator to the daily data, which compares the prices at the forward Neural Network (FNN) was developed throughout the
end of the current day to those of the following day. ”Date”, course of the research using the TensorFlow library and given
”Open”, ”High”, ”Low”, ”Close”, and ”Volume” will be the the same task as the ML models. In order to fine-tune the FNN
features that are retrieved and used for training in order to architecture’s predictive capabilities, some hyper parameters
properly anticipate this time-series forecasting domain. were adjusted throughout the configuration process. An input
layer that consisted of five neurons was included in the model.
These neurons corresponded to the features that were selected
for the task. These features included ’Close,’ ’Volume,’ ’Open,’
’High,’ and ’Low.’ After that, two hidden layers with a total
of 128 and 64 neurons were added to the model in order to
give it the ability to recognise detailed patterns in the financial
data. In the output layer, which was designed specifically
for binary classification, a single neuron equipped with the
sigmoid activation function was used to determine whether
or not the stock price will go up. Model optimisation was
accomplished through the use of the Adam optimizer, and the
binary cross-entropy loss function was selected as the optimal
option given the characteristics of the binary classification
challenge. Table 4 displays the architecture of the FNN. The
FNN was put through a strenuous training routine that lasted
for ten epochs and included a batch size of 32 in addition
to validation monitoring. The performance of the model was
evaluated using a specific test data set, and the results showed
that the model had a test loss of roughly 0.68 and a test
accuracy of 59%. The F1 score, which exhibited a balanced
performance with a score of 0.74, demonstrated that the model
was proficient in both precision and recall. However, it is
important to note that the confusion matrix highlighted a
substantial class imbalance. This indicated that the model
consistently identified all occurrences as the positive class,
which highlights the necessity for additional analysis and
future model changes. In contrast, the accuracy and F1 score
have similar performance to our baseline and state-of-the-art
models, which have proven that adopting a DL or ML model
does not make a substantial difference in the results.
A. Evaluation
Fig. 1. The proposed framework In the course of our empirical research, we predicted the
direction of the bias by applying three different machine
learning models: logistic regression (LR), decision tree (DT),
III. R ESULTS and random forest (RF). Our LR model scored a remarkable
accuracy of 55%, which indicates that it is able to accurately
The outcomes of each machine learning model are sum- estimate directional trends in slightly more than half of the
marised in the tables that follow. The model setup for the situations. The LR model was surpassed by the baseline DT
logistic regression model can be seen in Table 1. This con- model, which had an accuracy rate of 59%. This finding
figuration exhibits the implemented functions, the training demonstrates that it has more advanced skills for recognising
information, and the model assessment metrics. Table 2 pro- data patterns than LR does. In conclusion, the high-performing
vides an overview of the DT and RF classifiers, including RF classifier had an accuracy of 63%, demonstrating the
the parameters, types of sampling, and model evaluation benefits of using ensemble methods to improve prediction
metrics that are specific to each classifier. Table 3 provides accuracy.
TABLE I TABLE III
E XPERIMENTAL S ETUP FOR L OGISTIC R EGRESSION M ODEL E XPERIMENTAL R ESULTS VALIDATION ACCURACY & F1 S CORE