2310.16855v1

Stock Market Directional Bias Prediction Using ML Algorithms
Ryan Chipwanya
Academy of Computer Science and Software Engineering
University of Johannesburg
Johannesburg, South Africa
219027938@student.uj.ac.za
arXiv:2310.16855v1 [q-fin.ST] 24 Oct 2023

Abstract: Stock markets have existed since the 13th century, but
anticipating their behaviour is substantially more practicable today than
at any earlier point in time because of the tools and data now available
for both traditional and algorithmic trading. Many machine learning models
can perform time-series forecasting, and these models can be used to
anticipate the future prices of assets and/or their directional bias. In
this study, we examine and contrast the effectiveness of three machine
learning algorithms (logistic regression, decision tree, and random
forest) in forecasting the movement of assets traded on the Japanese stock
market. In addition, the models are compared to a feed forward deep neural
network, and all of the models are found to consistently reach above 50%
accuracy in directional bias forecasting. The results of our study
contribute to a better understanding of the complexity involved in stock
market forecasting and give insight into the possible role that machine
learning could play in this context.

Index Terms: Machine Learning, Stock Market, Prediction, Classification

I. INTRODUCTION

Forecasting the stock market and carrying out algorithmic trading both
benefit significantly from the application of machine learning (ML).
Predicting the stock market is a skill that can be acquired through
practice; it entails learning and utilising information and resources on
both fundamental and technical analysis techniques in order to estimate
the future price of an asset [1].

Traditional trading methodologies, on the other hand, introduce an
increased probability of prediction errors. These errors can be caused by
human emotions (such as fear and greed) that drive impulsive trading
behaviour in high-volume market conditions, by unprovoked fundamental news
events, and by a lack of the skills necessary to adequately forecast
assets [4].

The availability of essential data, including news, prices, and indicators
for critical analysis and forecasting [5,6], has made it progressively
simpler over time to foresee the behaviour of various markets, and this
trend continues into the present day.

Studies continue to discuss the difficulty of precisely predicting future
stock prices, yet they have also reported promising findings in asset
forecasting with ML and Deep Learning (DL) algorithms, attaining above 80
percent accuracy in forecasting stocks [1,2]. Despite these difficulties,
ML algorithms have made it possible to estimate future prices or
directional biases for stocks and other asset classes with significant
outcomes [1,2].

The results of this study provide valuable insights into the investigation
of a variety of ML and DL approaches to stock market forecasting. In this
work, we investigate how well ML algorithms such as Logistic Regression,
Decision Trees, and Random Forest perform when forecasting the directional
bias of assets listed on the Japanese Stock Exchange, and we compare their
results to those of a feed forward DL model. We believe that this research
will contribute to a greater understanding of the significance of ML and
DL algorithms in the process of predicting stock market prices.

II. METHOD

A. ML Models

In order to evaluate the efficacy of stock market forecasting, we set up
the task as binary classification, in which we predict whether the
movement of the stock for the day will be "up" = 1 or "down" = 0.
Thawornwong and Enke's research demonstrated that directional predictions
for stocks perform better than exact numerical predictions [7]. We create
three machine learning models: a Logistic Regressor (LR), a Decision Tree
(DT), and a Random Forest (RF). We use binary entropy as the cost for the
Logistic Regressor and entropy as the impurity measure for the DT and RF
models, and we use accuracy as the metric to evaluate the performance of
all three models.

B. The Data set

The data set used in this study comes from the Kaggle JPX Tokyo Stock
Exchange Prediction Competition, which was organised by the Japan Exchange
Group [8]. For the purpose of assessment, we filter the data set, extract
the price data for Sony, and retain the columns labelled "Date," "Open,"
"High," "Low," "Close," and "Volume." In addition, we create two more
columns and label them "Next" and "Target." The "Next" column holds the
"Close" price for the following trading day, while the "Target" column
classifies whether the movement will be an increase or a decrease, shown
by the numbers 1 and 0 respectively. The "Target" column is the result of
applying a comparison operator to the daily data, which compares the
closing price of the current day to that of the following day. "Date,"
"Open," "High," "Low," "Close," and "Volume" are the features that are
retrieved and used for training in this time-series forecasting task.

Fig. 1. The proposed framework

III. RESULTS

The outcomes of each machine learning model are summarised in the tables
that follow. The setup for the logistic regression model can be seen in
Table I, which lists the implemented functions, the training information,
and the model assessment metrics. Table II provides an overview of the DT
and RF classifiers, including the parameters, types of sampling, and model
evaluation metrics that are specific to each classifier. Table III
provides a comparison of the various models' performance in terms of
validation accuracy and F1 score.

In addition, in order to evaluate how well our ML models perform in
comparison to a DL model, a Feed forward Neural Network (FNN) was
developed using the TensorFlow library and given the same task as the ML
models. Some hyperparameters were adjusted during configuration in order
to fine-tune the predictive capabilities of the FNN architecture. The
model has an input layer of five neurons, corresponding to the features
selected for the task: "Close," "Volume," "Open," "High," and "Low." Two
hidden layers with 128 and 64 neurons, respectively, were then added to
give the model the ability to recognise detailed patterns in the financial
data. In the output layer, which was designed for binary classification, a
single neuron with the sigmoid activation function determines whether or
not the stock price will go up. The model was optimised with the Adam
optimizer, and the binary cross-entropy loss function was selected given
the characteristics of the binary classification task. Table IV displays
the architecture of the FNN.

The FNN was trained for ten epochs with a batch size of 32 and validation
monitoring. The performance of the model was evaluated on a held-out test
data set, and the results showed a test loss of roughly 0.68 and a test
accuracy of 59%. The F1 score was 0.74. However, it is important to note
that the confusion matrix revealed that the model classified every
instance as the positive class, so the F1 score should be interpreted with
caution; this highlights the need for additional analysis and future model
changes. Nevertheless, the accuracy and F1 score are similar to those of
our baseline and state-of-the-art models, which suggests that adopting a
DL model rather than an ML model does not make a substantial difference in
the results.

A. Evaluation

In the course of our empirical research, we predicted the directional bias
by applying three different machine learning models: logistic regression
(LR), decision tree (DT), and random forest (RF). Our LR model scored an
accuracy of 55%, which indicates that it correctly estimates the
directional trend in slightly more than half of the cases. The LR model
was surpassed by the baseline DT model, which had an accuracy of 59%. This
finding suggests that the DT is better at recognising patterns in the data
than LR. Finally, the RF classifier had the highest accuracy, 63%,
demonstrating the benefit of using ensemble methods to improve prediction
accuracy.
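The logistic regression setup listed in Table I (a sigmoid function, an entropy cost, and gradient descent with a learning rate of 0.01 over 1000 epochs) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names follow Table I, while the toy data, the explicit bias column, and the 0.5 decision threshold are hypothetical choices made only for this example.

```python
import math


def sigmoid(X, theta):
    """Return the sigmoid of the dot product x . theta for each row x of X."""
    out = []
    for row in X:
        z = sum(x_j * t_j for x_j, t_j in zip(row, theta))
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out


def entropy(X, y, theta):
    """Mean binary cross-entropy of the predicted probabilities against y."""
    eps = 1e-12  # small constant that guards against log(0)
    total = 0.0
    for p, label in zip(sigmoid(X, theta), y):
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(y)


def logistic_regression(X, y, alpha, epochs):
    """Batch gradient descent; returns the parameters and the cost history."""
    n_samples, n_features = len(X), len(X[0])
    theta = [0.0] * n_features
    cost_history = []
    for _ in range(epochs):
        probs = sigmoid(X, theta)
        # Gradient of the mean cross-entropy: X^T (p - y) / n
        grad = [
            sum((p - label) * row[j] for p, label, row in zip(probs, y, X))
            / n_samples
            for j in range(n_features)
        ]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
        cost_history.append(entropy(X, y, theta))
    return theta, cost_history


# Hypothetical toy data: a bias column plus one feature.
# The label is 1 when the feature is positive, 0 otherwise.
X_toy = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y_toy = [0, 0, 1, 1]

theta, cost_history = logistic_regression(X_toy, y_toy, alpha=0.01, epochs=1000)

# Assumed decision rule: predict "up" (1) when the probability is at least 0.5.
predictions = [1 if p >= 0.5 else 0 for p in sigmoid(X_toy, theta)]
```

On real price data the feature columns would normally be scaled before training, since raw "Volume" values are orders of magnitude larger than prices and would otherwise dominate the gradient.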
TABLE I
EXPERIMENTAL SETUP FOR LOGISTIC REGRESSION MODEL

LR Model                  Description
Functions                 • sigmoid(X, theta): Computes the sigmoid function
                            for logistic regression.
                          • entropy(X, y, theta)
                          • logistic_regression(X, y, alpha, epochs): Trains
                            the logistic regression model using gradient
                            descent. It takes the learning rate (α) and the
                            number of training epochs (epochs) as
                            hyperparameters.
Training Details          • Trained the logistic regression model using the
                            training data.
                          • Learning Rate (α): 0.01
                          • Number of Training Epochs (epochs): 1000
                          • Model's parameters (θ) and cost history are saved.
Model Evaluation
  Model Testing           • Made predictions on the validation set using the
                            trained logistic regression model.
  Performance Metric      • Evaluated the model's performance using accuracy
                            as the evaluation metric.
                          • The accuracy score was calculated by comparing the
                            model's predictions to the true target values.
Experimental Configuration
  Learning Rate (α)       0.01
  Number of Training      1000
  Epochs (epochs)

TABLE II
EXPERIMENTAL SETUP DETAILS FOR DT & RF

Experimental Component    Description
DT & RF Parameters        • Number of Trees (Estimators): 250 &
                          • Maximum Features: 5
                          • Maximum Depth: 100
                          • Minimum Samples Split: 100
Sampling                  Bootstrap sampling to create multiple subsets of the
                          training data, used to train individual decision
                          trees.
Error Estimation          The Out-of-Bag (OOB) error is calculated by
                          measuring mis-classifications on data points not
                          included in the bootstrap sample.
Model Evaluation          The validation accuracy of the model by comparing
                          its predictions to the true target values.

TABLE III
EXPERIMENTAL RESULTS: VALIDATION ACCURACY & F1 SCORE

Model    Accuracy    F1 Score
LR       0.55        0.71
DT       0.59        0.74
RF       0.63        0.74

TABLE IV
EXPERIMENTAL SETUP FOR FEED FORWARD NEURAL NETWORK MODEL

Feed forward Neural       Description
Network Model
Architecture              • Input Layer: 5 input features (Close, Volume,
                            Open, High, Low).
                          • Hidden Layers: Two hidden layers with 128 and 64
                            neurons, respectively.
                          • Output Layer: 1 neuron with a sigmoid activation
                            function.
Training Details          • Trained the feed forward neural network model
                            using the training data.
                          • Optimizer: Adam optimizer.
                          • Loss Function: Binary cross-entropy loss.
                          • Number of Training Epochs: 10
                          • Batch Size: 32
                          • Validation Split: 20% of the training data.
Model Evaluation
  Model Testing           • Made predictions on the test set using the trained
                            feed forward neural network model.
  Performance Metrics     • Evaluated the model's performance using the
                            following metrics:
                            - Test Loss: Binary cross-entropy loss on the test
                              set.
                            - Test Accuracy: Accuracy of the model on the test
                              set.
                            - F1 Score: F1 score for binary classification on
                              the test set.
                            - Confusion Matrix: Confusion matrix on the test
                              set.
Experimental Configuration
  Learning Rate           Adam optimizer default (adaptive learning rate)
  Number of Training      10
  Epochs

IV. DISCUSSION
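As a quick consistency check on the reported FNN metrics, the sketch below computes accuracy and F1 for a classifier that labels every day as "up". It assumes a hypothetical test set of 100 days, of which 59 are "up", chosen only to mirror the reported 59% test accuracy; the actual test set size and class counts are not stated in this paper. In that setting recall is 1 and precision equals the share p of positive days, so F1 = 2p / (1 + p).

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical test set: 59 "up" days (1) and 41 "down" days (0).
y_true = [1] * 59 + [0] * 41
# A classifier that predicts the positive class for every day.
y_pred = [1] * len(y_true)

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```

With p = 0.59 the F1 score comes out at about 0.74, so a high F1 alongside near-chance accuracy can arise purely from always predicting the positive class. The confusion matrix, or precision and recall reported separately, gives a clearer picture.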
A. Summary of key findings

In this study we applied the machine learning methods LR, DT, and RF to
predict the directional bias of assets listed on the Japanese Stock
Exchange [8]. Our primary objective was to comparatively evaluate the
performance of the models in the context of stock market prediction, and
we achieved plausible results.

Our LR model, which we label our naive model, achieved an accuracy of 55%.
This is a reasonable naive performance and provides a point of comparison
for more technically complex ML algorithms such as our DT and RF models.
The DT classifier, labelled as our baseline model, achieved a validation
accuracy of 59%, an increase of four percentage points over the LR model.
Lastly, our state-of-the-art (SOTA) RF model achieved an accuracy of 63%,
displaying the added advantage of an ensemble of DTs.

In terms of accuracy, our models align with the existing literature on ML
algorithms for stock market prediction. In particular, Zhong and Enke's
study suggested that ML models for binary stock market classification may
generally achieve around 60% accuracy, which is within the margin of our
baseline and SOTA models [9]. Furthermore, DL models such as the
Artificial Neural Network and Deep Neural Network examined in the same
study by Zhong and Enke achieved results of 58.6% and 59.9%, in line with
our DT and RF classifiers.

V. CONCLUSION

In conclusion, the findings of our research show that machine learning
algorithms are capable of predicting, with better-than-chance accuracy,
the direction in which assets traded on the Japanese stock market will
tend to move. According to our investigation, a number of models,
including the Logistic Regressor, Decision Tree, Random Forest Classifier,
and the Deep Learning Feed forward Neural Network (FNN) model, routinely
attain accuracy rates that are higher than 50%. These findings have
important repercussions for the study of financial data and the
development of investment strategies.

However, it is essential to recognise the limitations of our study,
specifically the lack of advanced metrics and trading performance data.
This absence raises doubts regarding the profitability and return on
investment (ROI) of the models in question. Due to the unpredictability of
the market, the financial risks connected with adopting predictions
generated by machine learning for trading portfolios remain substantial.
In order to improve on these findings, future research should investigate
the inclusion of technical analysis indicators and the use of advanced
deep learning models suitable for time-series forecasting, such as Long
Short-Term Memory (LSTM) and Recurrent Neural Networks (RNN). For
instance, the paper by Zaini et al. found that using technical indicators
with an LR model achieved a classification accuracy of 86% [10]. It is
therefore possible that these additions will improve the accuracy of
directional bias predictions in stock markets.

Our research contributes to the understanding of the complexity of
accurately predicting the behaviour of the stock market and paves the way
for additional investigation of methods, tools, and indicators that have
the potential to enhance the effectiveness of machine learning models. The
application of machine learning to stock market forecasting has the
potential to have far-reaching repercussions, not only for investors but
also for the financial industry as a whole, as the field of financial
technology continues to advance. This research provides a basis for future
studies that aim to address the obstacles and uncertainties involved in
financial prediction using machine learning models.

REFERENCES

[1] I. Parmar et al., Stock Market Prediction Using Machine Learning, 2018
    First International Conference on Secure Cyber Computing and
    Communication (ICSCCC), Jalandhar, India, 2018, pp. 574-576, doi:
    10.1109/ICSCCC.2018.8703332.
[2] P. S and V. P. R, Stock Price Prediction using Machine Learning and
    Deep Learning, 2021 IEEE Mysore Sub Section International Conference
    (MysuruCon), Hassan, India, 2021, pp. 660-664, doi:
    10.1109/MysuruCon52639.2021.9641664.
[3] L. Mathanprasad and M. Gunasekaran, Analysing the Trend of Stock
    Market and Evaluate the performance of Market Prediction using Machine
    Learning Approach, 2022 International Conference on Advances in
    Computing, Communication and Applied Informatics (ACCAI), Chennai,
    India, 2022, pp. 1-9, doi: 10.1109/ACCAI53970.2022.9752616.
[4] A. W. Lo, D. V. Repin, and B. N. Steenbarger, Fear and Greed in
    Financial Markets: A Clinical Study of Day-Traders, The American
    Economic Review, vol. 95, no. 2, 2005, pp. 352–359,
    http://www.jstor.org/stable/4132846.
[5] I. Hwang, A Brief History of the Stock Market, June 15, 2023, Accessed
    September 28, 2023,
    https://www.sofi.com/learn/content/history-of-the-stock-market/.
[6] Y. Han, Y. Liu, G. Zhou, and Y. Zhu, Technical Analysis in the Stock
    Market: A Review, SSRN, May 21, 2021, Available at SSRN:
    https://ssrn.com/abstract=3850494 or
    http://dx.doi.org/10.2139/ssrn.3850494.
[7] S. Thawornwong and D. Enke, The adaptive selection of financial and
    economic variables for use with artificial neural networks,
    Neurocomputing, vol. 56, 2004, pp. 205–232.
[8] A. Sugiyama, C. Hio(Alpaca), E. Kaji, n-onishi, s-meitoma - JPX, S.
    Takato, and T. Kitayama(Alpaca), JPX Tokyo Stock Exchange Prediction,
    Kaggle, 2022,
    https://kaggle.com/competitions/jpx-tokyo-stock-exchange-prediction.
[9] X. Zhong and D. Enke, Predicting the daily return direction of the
    stock market using hybrid machine learning algorithms, Financial
    Innovation, vol. 5, 2019, p. 24, doi: 10.1186/s40854-019-0138-0.
[10] Jamili Zaini, Bahtiar, Rosnalini Mansor, Norhayati Yusof, and Beh Hui
    Sang, Classify Stock Market Movement Based on Technical Analysis
    Indicators Using Logistic Regression, Journal of Advanced Research in
    Business and Management Studies, vol. 14, no. 1, 2020, pp. 35-41.
[11] S. Laher, A. Paskaramoorthy, and T. L. Van Zyl, Deep learning for
    financial time series forecast fusion and optimal portfolio
    rebalancing, 2021 IEEE 24th International Conference on Information
    Fusion (FUSION), 2021, pp. 1-8.
[12] T. L. van Zyl, M. Woolway, and A. Paskaramoorthy, Parden: Surrogate
    assisted hyper-parameter optimisation for portfolio selection, 8th
    International Conference on Soft Computing & Machine Intelligence,
    2021.
