
Machine Learning


PERFORMANCE EVALUATION OF MACHINE LEARNING ALGORITHMS FOR

THE PREDICTION OF NIGERIA BREWERIES PLC STOCK MARKET PRICE

ABSTRACT
Predicting the future has always been an innovative task, but it becomes more fascinating when it
involves money and risk, as in predicting the Stock Market Price (SMP). However, the SMP is
affected by different factors such as the economic standard of a nation, the supply-demand
relationship of the stock, and the operating standards of the company. Therefore, this research aims
to design a predictive model to compare the performance of Support Vector Machine (SVM),
Linear Regression, and Random Forest (RF) algorithms for the prediction of SMP. A primary
dataset was gathered for Nigeria Breweries Plc via investing.com, comprising five (5)
years of data from October 2016 to November 2021. The features of the dataset include daily open
price, daily close price, daily highest price, daily lowest price, and daily trading volume. The
dataset was divided chronologically into two sets (70% and 30%). Python Anaconda and Python
data science libraries were used to build and train the models. The models were trained and tested to
evaluate the performance of the algorithms based on their Root Mean Square Error (RMSE) and
R-Squared (R2). RF achieved the lowest RMSE of 0.48 and the highest R2 of 0.95.
The RF algorithm predicts the stock market price with a substantial result that meets
the standard of the pure and applied sciences field. Therefore, RF is suitable for the prediction
of stock market price (SMP).
*Keywords*: Stock Market Price, Nigeria Breweries Plc., Support Vector Machine (SVM),
Linear Regression, and Random Forest (RF).
1.0 INTRODUCTION
Stock market price prediction is one of the most challenging tasks in the financial
industry. There are numerous causes for the limitations in accurately predicting the stock
market price, among which are market volatility and a variety of dependent and independent
variables that affect the value of a given stock in the market. These variables make it extremely
difficult for any stock market analyst to predict the rise and fall of the market price with great
precision. Hiransha et al. (2018) affirm that the stock market is one of the most
captivating inventions, and financial analysts have pointed out its impact on the nation's economy.
Moreover, accurate predictions of future stock market prices can yield considerable profits.
However, the stock market is affected by different physical and rational factors, such as the
economic standard of a nation, the supply-demand relationship of the stock, the operating standards
of the company, and investor sentiment. An additional problem with predicting the stock
market price using machine learning is the time factor: it could lead to poor prediction
accuracy because machine learning algorithms depend on historical data.
Therefore, this study sought to design a predictive model to compare the performance of Support
Vector Machine (SVM), Linear Regression, and Random Forest (RF) algorithms for the
prediction of Stock Market Price (SMP).
2.0 RELATED RESEARCH

Conventionally, only historical data were used for forecasting share prices. However,
analysts now admit that relying purely on historical data is not ideal because several factors are
key to determining the stock price. Singh et al. (2019) applied different approaches to predict
stock prices, but a high rate of accuracy was not achieved even after analyzing the major factors
affecting the stock price. The writers evaluated the main techniques, such as SVM,
Regression, and Random Forest, and also examined hybrid approaches that join two or more
techniques. The writers believe some models perform better with historical data than with
sentiment data. However, fusion algorithms yielded results with higher accuracy.
Agarwal and Sagar (2019) analyzed the behavior of the stock market to determine the
suitable techniques among several traditional machine learning algorithms, which included Random
Forest (RF), Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbor (KNN), and
Softmax Regression, for stock market prediction. The writers conducted a comparative study of
these approaches; several technical indicators were applied to data gathered from
different sources, including Yahoo and NSE-India. The accuracy of each model was
measured, and it was observed that RF gave the best results for large datasets,
while Naive Bayes showed the highest accuracy for small datasets. Another observation
was that as the number of technical indicators was reduced, the accuracy of the models
decreased. Cao et al. (2019) proposed two hybrid predictive models that combine empirical mode
decomposition (EMD) with Long Short-Term Memory (LSTM). A financial time series is a type of
non-linear and non-stationary signal, which can be decomposed into several intrinsic mode
functions of different time scales using the original EMD and complete ensemble empirical mode
decomposition with adaptive noise (CEEMDAN). To confirm the influence of
historical data on the predictive outcome, LSTM prediction models are developed for each
series of features obtained from the EMD and CEEMDAN decomposition. The final result of the
prediction was obtained by reconstructing each series of predictions. The predictive performance
of the proposed models was confirmed by a linear regression analysis of major stock market indices.
Compared to a single LSTM model, a support vector machine (SVM), a multi-layer perceptron
(MLP), and other hybrid models, the test results show that their proposed models perform better
in one-step-ahead forecasting of financial time series. Sezer et al. (2020) categorized
various studies according to their target prediction areas, such as indices, forex, and asset
predictions, and also grouped them by their choice of deep learning (DL) model, such as
Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), and Long Short-Term Memory
(LSTM). Yan and Ouyang (2018) proposed a time series forecasting model to capture complex features
such as the irregularity and sequential nature of financial time series. LSTM is then used to
predict the daily closing price of the Shanghai Composite Index, and its predictive capability is
compared with machine learning models such as a multi-layer perceptron, a support vector machine,
and K-nearest neighbors. Their results show that LSTM gives a better predictive effect and shows
excellent results in both static predictions and dynamic trend forecasts for financial time series,
demonstrating its effectiveness and efficiency in predicting financial time series. At the same
time, wavelet decomposition and reconstruction of the financial time series can improve the
generalization performance of the LSTM prediction model and the accuracy of long-term dynamic
trend prediction.
Weng et al. (2018) focused on short-term stock price prediction using ensemble methods built on
four well-known machine learning models. The study used five different datasets, obtained
from three open-source APIs and an R package named TTR. The machine
learning models they used were Neural Network Regression Ensemble, Random Forest with
unpruned regression trees as base learners, AdaBoost with unpruned regression trees as base
learners, and Support Vector Regression Ensemble. The work is an in-depth study of ensemble
methods for short-term stock price prediction. Drawing on background knowledge, the authors used
eight technical indicators and then assessed the five datasets. The primary contribution of the
paper is an environment for investors, developed in R, which does not require users to
input their own data but calls APIs to fetch the data from online platforms. From the research
view, they only evaluated price prediction for 1 to 10 days ahead and did not
evaluate terms longer than two trading weeks or shorter than 1 day. The primary
limitation of their research was that they analyzed only 20 US-based stocks; the model
might not generalize to other stock markets, or might need further validation to confirm whether it
suffers from over-fitting. Fischer & Krauss (2018) applied LSTM networks to
financial market prediction. The dataset they used is the S&P500 index constituents from Thomson
Reuters. They obtained all month-end constituent lists for the S&P500 from Dec 1989 to Sep
2015, then consolidated the lists into a binary matrix to eliminate survivor bias. The writers also
used RMSprop as an optimizer, known as a mini-batch version of Rprop. The major advantage
of this work is that the writers used the latest deep learning techniques to perform predictions.
However, they relied on the LSTM technique alone and brought in little background knowledge from
the financial domain. Moreover, the LSTM outperformed the standard DNN and logistic regression
algorithms, although the authors did not discuss the effort required to train an LSTM with
long-time dependencies.
3.0 METHODOLOGY
The Anaconda Navigator distribution of Python, which bundles multiple data science packages, was
adopted for this study, together with up-to-date libraries such as NumPy, Pandas, Matplotlib,
Seaborn, scikit-learn (including its StandardScaler), Keras, etc. The models used in this study
were Support Vector Machine (SVM), Linear Regression (LR), and Random Forest (RF).
Dataset Acquisition
A market dataset was gathered for Nigeria Breweries Plc via ng.investing.com, comprising
five (5) years of raw data from October 2016 to November 2021. The dataset
features include daily open price, daily close price, daily highest price, daily lowest price, and
daily trading volume. It contains 1,259 rows and seven (7) columns.
Data Preprocessing
The data pre-processing for this study involved converting the date column from a string
format to a date-time data type, and a filter method was used to reduce noisy data and
improve accuracy. Rouf et al. (2021) state that the output depends on
the pre-processing of the data: textual data must be transformed into a structured format that
can be used in a machine learning model.
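
The date conversion described above can be sketched with the Python standard library alone. This is a minimal illustration, not the study's code: the column names, the "Nov 30, 2021" date format, and the sample rows are all assumptions, and because the filter method is not specified in the study, dropping records with an empty field is used here only as a stand-in.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the style of an exported price-history file.
# Column names, date format, and values are assumptions for illustration.
RAW_CSV = """Date,Open,High,Low,Close,Volume
"Nov 30, 2021",48.00,48.50,47.50,48.10,1200000
"Nov 29, 2021",47.90,48.20,47.00,48.00,
"Nov 26, 2021",47.50,48.00,47.10,47.90,950000
"""


def load_rows(text, date_format="%b %d, %Y"):
    """Parse CSV text, convert date strings to datetime, and sort by date."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # Stand-in filter (assumption): skip records with any empty field.
        if any((value or "").strip() == "" for value in record.values()):
            continue
        rows.append({
            "date": datetime.strptime(record["Date"], date_format),
            "open": float(record["Open"]),
            "high": float(record["High"]),
            "low": float(record["Low"]),
            "close": float(record["Close"]),
            "volume": float(record["Volume"]),
        })
    # Oldest first, so later steps can split the data chronologically.
    rows.sort(key=lambda row: row["date"])
    return rows


rows = load_rows(RAW_CSV)
for row in rows:
    print(row["date"].date(), row["close"], row["volume"])
```

Sorting by the parsed date matters here because exported price histories are often listed newest first, while a chronological train/test split needs the oldest observations first.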
Algorithm
The machine learning approach relies mostly on algorithms: sets of rules that, when followed, lead
to a desired output. The algorithm for this study is given below.
Step 1: Start.
Step 2: Import libraries to be used and dataset.
Step 3: Data Cleaning.
Step 4: Formation of new/clean dataset.
Step 5: Visualization of data.
Step 6: Train and test data split (70% to 30%).
Step 7: Apply the algorithms to the feature subset.
Step 8: Training and testing the model.
Step 9: Model Evaluation (calculate model accuracy).
Step 10: Report.
Step 11: Stop.
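
Step 6 can be illustrated with a short sketch of a chronological 70%/30% split. The function name `chronological_split` is hypothetical, and the sketch assumes the rows are already sorted from oldest to newest; it is not the study's implementation.

```python
def chronological_split(rows, train_fraction=0.7):
    """Split time-ordered rows into an earlier training set and a later test set.

    No shuffling is done, so every test observation comes after every
    training observation in time.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Stand-in for the 1,259 daily observations reported for the dataset.
observations = list(range(1259))
train, test = chronological_split(observations)
print(len(train), len(test))  # 881 378
```

Keeping the split in time order avoids training on observations that come after the ones being predicted, which a random shuffle would allow.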
Flowchart
The flowchart consists of shapes connected by arrows. Each shape
represents a step in the process, and the arrows show the order in which the steps occur in the study.
The flowchart for this study is shown in Fig 1 below.
Start
  -> Import Libraries
  -> Data Preprocessing (Data Cleaning, New Dataset, Data Scaling)
  -> Visualization of the new dataset
  -> Train Test split (70% to 30%)
  -> Training and testing the selected models
  -> Model Evaluation
  -> Stop
Figure 1: Depicting the Flowchart
Exploratory Data Analysis (EDA)
EDA was carried out mainly to gain a detailed understanding of the dataset; it supports
data cleaning and visualization. The dataset employed for this study contains 1,259 rows and 7
columns, and no missing data were detected. Line graphs and histograms were used
to show the major information in the dataset. Fig 2 below shows the stock price plotted against
the year as a line graph, from which it was deduced that there was a rise in the
stock price in 2017 and a dramatic fall in 2020. Similarly, Fig 3 below shows a line graph of the
mean volume of stock plotted against the year, from which it was deduced that there was a huge rise
in the volume of stock sold in late 2017 and a dramatic fall in the volume of stock sold in early 2020.

Fig 2: Depicting Stock Price Plotted Against Year

Fig 3: Depicting Volume of the Stock Price Plotted Against Year
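
The yearly mean volume plotted in Fig 3 is a grouped average. A minimal sketch of that aggregation follows; the function name `mean_volume_by_year` is hypothetical and the sample values are made up for illustration, not taken from the dataset.

```python
from collections import defaultdict
from datetime import date


def mean_volume_by_year(records):
    """Return {year: mean traded volume} for (date, volume) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # year -> [sum of volume, count]
    for day, volume in records:
        totals[day.year][0] += volume
        totals[day.year][1] += 1
    return {year: total / count for year, (total, count) in sorted(totals.items())}


# Illustrative values only (assumption), chosen to be easy to check by hand.
sample = [
    (date(2017, 11, 1), 4_000_000),
    (date(2017, 12, 1), 6_000_000),
    (date(2020, 3, 2), 500_000),
    (date(2020, 4, 1), 1_500_000),
]
yearly = mean_volume_by_year(sample)
print(yearly)  # {2017: 5000000.0, 2020: 1000000.0}
```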

4.0 RESULTS
RMSE and R2 were the only metrics used for evaluating the performance of the three models
used for this study.
TABLE I: DEPICTING THE MODELS PERFORMANCE EVALUATION RESULTS
MODEL    RMSE     R2
SVM      0.915    -0.39
RF       0.484    0.95
LR       0.985    0.94
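
For reference, the two metrics in Table I can be computed as in the sketch below, using the standard definitions RMSE = sqrt(mean((y - y_hat)^2)) and R2 = 1 - SS_res / SS_tot. The sample values are made up for illustration and are not the study's data.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (assumption), not taken from the study.
actual = [3.0, 5.0, 7.0, 9.0]
close_fit = [2.0, 5.0, 8.0, 9.0]
poor_fit = [10.0, 10.0, 10.0, 10.0]

print(round(rmse(actual, close_fit), 4))       # 0.7071
print(round(r_squared(actual, close_fit), 4))  # 0.9
print(round(r_squared(actual, poor_fit), 4))   # -3.2
```

The last line shows that R2 is negative whenever a model's squared error is larger than that of always predicting the mean of the test values, which is how a negative R2 such as the one reported for SVM can arise.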

5.0 DISCUSSION
The study was implemented on an Intel Core i7 2.7GHz processor with 16GB RAM, 500GB, and
Windows 10 (a 32-bit operating system); the algorithms were trained and evaluated
in Python under Anaconda Navigator. After evaluating the performance of the algorithms, the results
obtained show that RF performed better than LR and SVM: Random Forest had an RMSE of 0.48
and R2 of 0.95, Linear Regression had an RMSE of 0.98 and R2 of 0.94, and Support Vector
Machine had an RMSE of 0.91 and R2 of -0.39 (Table I). There
is no fixed threshold for Root Mean Square Error, but a lower value indicates a more
accurate model, while an R-Squared (R2) value > 0.7 is considered a substantial result.
Therefore, the study shows that the Random Forest algorithm performs better, with an acceptable
result.
6.0 CONCLUSION
This study presented a performance evaluation of machine learning models for the prediction
of stock market price. Three models were evaluated, and the result shows that RF performs better,
with an RMSE of 0.48 and R2 of 0.95, which is acceptable in pure science as a
substantial result. It is recommended that researchers evaluate the performance
of Deep Hybrid Learning using the dataset selected for this study. Also, predictive models such
as ARIMA and LSTM should be adopted.

REFERENCES
Cao, J., Li, Z., & Li, J. (2019). Financial time series forecasting model based on CEEMDAN and
LSTM. Physica A: Statistical Mechanics and its Applications, 519, 127-139.
Fischer, T., & Krauss, C. (2018). Deep learning with long short-term memory networks for
financial market predictions. European Journal of Operational Research, 270(2), 654-669.
Hiransha, M., Gopalakrishnan, E. A., Menon, V. K., & Soman, K. P. (2018). NSE stock market
prediction using deep-learning models. Procedia Computer Science, 132, 1351-1362.
Illowsky, B., & Dean, S. (2018). Introductory statistics.
Moore, D. S., Notz, W. I., & Fligner, M. A. (2015). The basic practice of statistics. Macmillan
Higher Education.
Rouf, N., Malik, M. B., Arif, T., Sharma, S., Singh, S., Aich, S., & Kim, H. C. (2021). Stock
market prediction using machine learning techniques: a decade survey on methodologies,
recent developments, and future directions. Electronics, 10(21), 2717.
Sezer, O. B., Gudelek, M. U., & Ozbayoglu, A. M. (2020). Financial time series forecasting with
deep learning: A systematic literature review: 2005–2019. Applied Soft Computing, 90,
106181.
Singh, S., Madan, T. K., Kumar, J., & Singh, A. (2019). Stock market forecasting using
machine learning: Today and tomorrow. In 2019 2nd International Conference on
Intelligent Computing, Instrumentation and Control Technologies (ICICICT) (Vol. 1, pp.
738-745).
Soni, P., Tewari, Y., & Krishnan, D. (2022). Machine Learning approaches in stock price
prediction: A systematic review. In Journal of Physics: Conference Series (Vol. 2161, No.
1, p. 012065). IOP Publishing.
Weng, B., Lu, L., Wang, X., Megahed, F. M., & Martinez, W. (2018). Predicting short-term
stock prices using ensemble methods and online data sources. Expert Systems with
Applications, 112, 258-273.
