Stock Price Prediction Using Reinforcement Learning

Poonam Rani, Jyoti Shokeen, Anshul Singh, Anmol Singh, Sharlin Kumar, and Naman Raghuvanshi
Abstract With the availability of new data sources and advances in marketing and financial instruments, stock market returns are a major research area. Stocks have a huge influence on today's economy, so a better predictive model is extremely important for stock prediction. The aim of this paper is to investigate the positive effect of reinforcement learning on stock price prediction techniques. Q-learning has been shown to be highly effective in various domains, such as cloud scheduling and game automation. This paper demonstrates how the Q-learning technique is helpful in stock price prediction. The findings are very positive, with excellent predictive accuracy and remarkable speed.

Keywords Reinforcement Learning · Stock Price · Stock Market Prediction · Q-learning
S. Kumar
e-mail: sharlink.co.17@nsit.net.in

N. Raghuvanshi
e-mail: namanr.co.17@nsit.net.in
1 Introduction

Today, stock exchanges are among the most profitable venues for stock trading in the business world. As the industry becomes more prevalent, there is an overwhelming need for better and faster prediction models, since stocks directly impact the future of businesses as well as the future of investors. The stock exchange can be thought of as a game in which you have to sell, buy, and hold stocks at the right time by analyzing the financial market, and in some cases by trusting your gut. Still, no one can predict the future, which makes this game extremely hard and risky. How accurately you can predict a company's future decides your victory; here, victory can mean millions of dollars earned in no time, but a loss can equally mean millions of dollars lost in no time. This is why there is an escalating need for an efficient stock price prediction model, and consequently stock prediction is a hot topic in the research and development sector.
Many stock prediction models, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) networks, are based on neural networks [2]. Reinforcement learning is a machine learning technique that deals with how an agent performs actions in an environment to maximize the cumulative reward [11]. The technique functions on a reward-and-punishment policy: the model is penalized each time an action does not work toward the solution, and is rewarded when an action turns into a victory. Reinforcement learning offers simple solutions to several complex problems that other supervised and unsupervised machine learning algorithms cannot handle as easily.

In the past decade, stock traders had to rely on various software intelligence systems to reach trading decisions. Lately, with the evolution of artificial intelligence, this field has completely changed and has experienced a huge reform. Apart from reinforcement learning, there are numerous machine learning algorithms that can be used efficiently in stock prediction, such as CNNs, RNNs, and LSTMs with sliding windows. We employ reinforcement learning for stock price prediction in this paper because reinforcement learning allows the use of market signals to create trading policies.
2 Related Works
Parmar et al. [4] also worked in this direction of stock market prediction, predicting future values of financial stocks. They proposed a model using linear regression and LSTM. Linear regression is used to predict continuous values by reducing the error function through gradient descent, while LSTM is used for prediction on large amounts of data. However, an LSTM needs a huge amount of historical data for training to reach good accuracy, and compared to Q-learning, LSTM requires more memory for the training dataset. Q-learning also provides a sense of random, human-like actions, which is not possible in LSTMs: an LSTM can only predict stock prices, not take actions such as buy, sell, or hold according to its predictions.

In paper [7], the authors used a Support Vector Machine (SVM) as the classifier to decide which action to perform. They believe that SVM is one of the most suitable algorithms for time series prediction. Li et al. [3] employed deep reinforcement learning for a stock transaction strategy. They claimed that their algorithm is more intelligent than traditional algorithms because of its fast adaptation and response to changes. However, their approach is not feasible for large datasets.
3 Reinforcement Learning

Reinforcement learning is the area of machine learning concerned with how algorithms decide to take actions in an environment so as to augment the cumulative reward. Unlike supervised algorithms, reinforcement learning does not need a labeled input/output dataset; instead, it focuses on finding the balance between exploration and exploitation. Reinforcement learning works much like the human brain. We take an action based on past experience and intuition, assess the result, reward ourselves if it turns out to be profitable, and learn that the action is viable. In case of a loss, we penalize ourselves and try a different way to solve the problem. This is how reinforcement learning works: it grants a reward if the algorithm's action results in a win and a penalty if it loses, and it learns each time it makes a prediction. Figure 1 portrays the functioning of reinforcement learning.

Fig. 1 Reinforcement learning

Q-learning is an off-policy reinforcement learning approach that aims to find the best action in the given current state. Q-learning is treated as off-policy because it does not follow a fixed policy: the Q-learning function learns from actions outside the current policy, such as taking a random action. "Q" stands for quality.
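To make the off-policy update concrete, the sketch below implements the standard tabular Q-learning rule; the table sizes, learning rate alpha, and epsilon value here are illustrative assumptions, not values taken from this paper.

```python
import random
import numpy as np

n_states, n_actions = 10, 3          # hypothetical problem sizes
Q = np.zeros((n_states, n_actions))  # Q-table: quality of each (state, action)
alpha, gamma, epsilon = 0.1, 0.95, 1.0

def choose_action(s):
    """Epsilon-greedy behaviour policy: explores at random with probability
    epsilon, otherwise exploits the current Q estimates."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return int(np.argmax(Q[s]))

def q_update(s, a, r, s_next):
    """Off-policy Bellman update: the target maximizes over next actions,
    regardless of which action the behaviour policy will actually take."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```

Because the update bootstraps from max over Q[s_next] rather than from the action actually taken next, the learned values are independent of the exploratory behaviour, which is exactly what "off-policy" means here.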
4 Stock Price Prediction

In the financial world, forecasting stock prices is an essential goal. A fairly accurate forecast has the potential to yield high financial benefits and protect against market risks, and an efficient stock price prediction system can result in a huge amount of future profit. The theory of an efficient market implies that stock prices represent all the information currently available, and any price adjustments that are not based on newly released information are therefore potentially unpredictable. Others disagree, and those with this perspective have countless techniques and technologies that allegedly allow them to gain information on future prices. However, due to the uncertainty and unpredictable nature of the markets and the many undecidable, non-stationary stochastic variables involved, forecasting stock prices is not a simple task. Nowadays, social network analysis is useful in predicting stock prices [6, 8]. Once stock prices are predicted, users can employ recommender systems to decide whether to buy or sell stocks [5, 9, 10].
The historical trends of financial time series have been analyzed by scholars from different areas, and different methods for forecasting stock prices have been proposed. Most of these methods involve carefully selecting input variables to achieve promising results, developing a predictive model with skilled financial expertise, and introducing different statistical methods for arbitrage analysis, making it impossible for individuals outside the financial sector to use these methods to forecast stock prices.
5 Methodology
The method we use in this paper for stock price prediction is Q-learning. We first create an environment and an agent. In reinforcement learning, the environment refers to the task, i.e., stock price prediction, and the agent refers to the algorithm used to solve that particular task. The driver program simply initializes the required environment and agent, which are given as input to the algorithm that returns predictions as values. This part of the algorithm is responsible for computing the gradient-descent updates and eventually talks to the environment. The environment takes the action as input and performs it accordingly, i.e., it buys the stock, moves the pointer, and at the same time updates the reward, the next state, and the portfolio values.
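As a rough illustration of such an environment, the sketch below steps through a single stock's price series. The class name, the state layout, and the one-share-per-step trading rule are assumptions made for brevity, not the authors' exact implementation.

```python
import numpy as np

class StockEnv:
    """Hypothetical trading environment: holds a price series, a day pointer,
    shares owned, and cash; step() applies buy/sell/hold and reports reward."""

    BUY, SELL, HOLD = 0, 1, 2

    def __init__(self, prices, cash=20000.0):
        self.prices = prices      # array of daily prices for one stock
        self.t = 0                # pointer into the price series
        self.shares = 0
        self.cash = cash

    def portfolio_value(self):
        return self.shares * self.prices[self.t] + self.cash

    def step(self, action):
        """Apply one action, move the pointer, and return the transition."""
        old_value = self.portfolio_value()
        price = self.prices[self.t]
        if action == self.BUY and self.cash >= price:
            self.shares += 1
            self.cash -= price
        elif action == self.SELL and self.shares > 0:
            self.shares -= 1
            self.cash += price
        self.t += 1                               # move the pointer forward
        reward = self.portfolio_value() - old_value
        done = self.t == len(self.prices) - 1     # stop at the series end
        next_state = np.array([self.prices[self.t], self.shares, self.cash])
        return next_state, reward, done
```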
Unlike the typical Q(s, a), we use only the state s and ignore the action a in this stock problem:

Q(s, :) = W^T s + b    (1)

where Q(s, :) is the vector of Q values at state s, W is the matrix of weights, s is the state, and b is the bias.
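A direct transcription of Eq. (1), with illustrative dimensions: a single linear model maps the state vector to a vector of Q values, one entry per action.

```python
import numpy as np

state_dim, n_actions = 3, 3                        # illustrative sizes
W = np.random.randn(state_dim, n_actions) * 0.01   # weight matrix
b = np.zeros(n_actions)                            # bias vector

def q_values(s):
    """Eq. (1): Q(s, :) = W^T s + b, one Q value per action at once."""
    return W.T @ s + b
```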
This part of the algorithm contains three main functions: the __init__ function, the get_action function, and the train function. The __init__ function initializes the model used for training. The get_action function takes the state as input and decides which action to take, i.e., whether to buy stock, sell stock, or hold, using reinforcement learning techniques such as epsilon-greedy exploration. Finally, the train function takes a tuple of data consisting of the current state, action, reward, next state, and done flag. It calculates the input and target values for our model, where the input is the state and the target is calculated as follows:

target = r + γ max Q(s', :)    (2)

with target = r when the done flag is set, where r is the reward, γ is the discount factor, and s' is the next state.
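Putting the three functions together, here is a minimal sketch of such an agent. The gradient-step details are an assumption (the paper does not spell out the exact update), and the default hyperparameters are taken from Tables 1 and 2 where available.

```python
import random
import numpy as np

class Agent:
    """Hypothetical agent following the paper's three-function layout:
    __init__ sets up the linear model, get_action is epsilon-greedy,
    and train fits Q(s, :) toward the bootstrapped target of Eq. (2)."""

    def __init__(self, state_dim, n_actions, gamma=0.94,
                 epsilon=1.0, epsilon_min=0.01, epsilon_decay=0.996, lr=0.01):
        self.W = np.random.randn(state_dim, n_actions) * 0.01
        self.b = np.zeros(n_actions)
        self.gamma, self.lr = gamma, lr
        self.epsilon, self.epsilon_min = epsilon, epsilon_min
        self.epsilon_decay = epsilon_decay
        self.n_actions = n_actions

    def q(self, s):
        return self.W.T @ s + self.b            # Eq. (1)

    def get_action(self, s):
        if random.random() < self.epsilon:      # explore
            return random.randrange(self.n_actions)
        return int(np.argmax(self.q(s)))        # exploit

    def train(self, s, a, r, s_next, done):
        target = r if done else r + self.gamma * np.max(self.q(s_next))
        error = self.q(s)[a] - target           # TD error for the taken action
        self.W[:, a] -= self.lr * error * s     # one gradient step on squared error
        self.b[a] -= self.lr * error
        if self.epsilon > self.epsilon_min:     # anneal exploration over time
            self.epsilon *= self.epsilon_decay
```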
6 Experiments and Results

To make this a real-time project, it is necessary to use recent data. The Alpha Vantage API, a free API that provides real-time stock data, is used to obtain real-time data for making good predictions about a stock. We take the stock prices of three companies, Apple, Microsoft, and International Business Machines (IBM), for the period January 2012 to November 2020. The dataset is divided in a 60:40 ratio into a training set and a testing set, respectively.
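A sketch of pulling daily closing prices from the public Alpha Vantage REST endpoint and making the 60:40 split; YOUR_API_KEY is a placeholder, and the endpoint and field names follow Alpha Vantage's published documentation rather than code from the paper.

```python
import requests

URL = "https://www.alphavantage.co/query"
params = {
    "function": "TIME_SERIES_DAILY",
    "symbol": "IBM",               # also run for AAPL and MSFT
    "outputsize": "full",          # full history, filtered to 2012-2020 below
    "apikey": "YOUR_API_KEY",      # placeholder: register for a free key
}
series = requests.get(URL, params=params).json()["Time Series (Daily)"]

# ISO date strings sort chronologically, so string comparison suffices.
closes = [float(v["4. close"]) for d, v in sorted(series.items())
          if "2012-01" <= d <= "2020-11-30"]

split = int(0.6 * len(closes))     # 60:40 train/test split
train, test = closes[:split], closes[split:]
```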
The first step in every reinforcement technique is to decide on an action at the beginning. It is assumed that before making any decision, the algorithm must know the answers to questions such as:

• Do I even have enough cash to buy?
• Considering the current state of my portfolio and the existing price of the shares, which action should I take?
Table 1 Initial configurations set in the proposed model

Parameter            Value
Episodes             200
Initial investment   20000
ε                    1.0

Table 2 Parameters chosen for experiments

Parameter            Value
Exploration rate     1.0
Epsilon decay        0.996
Discount factor      0.94
The reward is computed as the difference between the portfolio values of recent time steps and previous time steps. The algorithm computes the portfolio value as follows:

V = S · P + C    (3)

where V is the portfolio value, S is the vector of shares owned, P is the vector of share prices, and C is the cash.
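A worked instance of the portfolio-value formula with made-up numbers for the three stocks:

```python
import numpy as np

S = np.array([10, 5, 8])             # shares owned (e.g., AAPL, MSFT, IBM)
P = np.array([120.0, 210.0, 130.0])  # current share prices (made-up values)
C = 1500.0                           # cash on hand

V = S @ P + C                        # 10*120 + 5*210 + 8*130 + 1500 = 4790.0
reward = V - 4650.0                  # reward = change from a hypothetical earlier value
```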
Epsilon decay governs how the value of ε decreases so that the agent can learn and then act optimally over its lifetime. We experimented with different values of epsilon_decay and γ to find their best values; a sketch of the schedule follows.
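Concretely, with the settings from Tables 1 and 2 the decay schedule looks like this; the per-episode placement of the decay step is an assumption, since the paper does not state where it is applied.

```python
eps, eps_min, eps_decay = 1.0, 0.01, 0.996   # values from Tables 1 and 2

for episode in range(200):                   # 200 episodes, as in Table 1
    # ... run one trading episode with epsilon-greedy actions ...
    if eps > eps_min:
        eps *= eps_decay                     # after 200 episodes: 0.996**200 ≈ 0.45
```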
Table 1 depicts the initial configurations set in the proposed model. Table 2 defines the parameters tuned for the experiments, and Table 3 shows the performance of the model based on the selected parameters. Figure 2 depicts the rewards with respect to epsilon_decay for γ = 0.95, ε = 1.0, and ε_min = 0.01; the average reward is 32697.13, and the best value of epsilon_decay in terms of reward is 0.996. Figure 3 depicts the rewards with respect to γ with exploration rate ε = 1.0, ε_min = 0.01, and ε_decay = 0.996; the average reward is 81588.52, and the best value of γ in terms of reward is 0.94.
Fig. 2 Test results of rewards versus epsilon_decay

Fig. 3 Test results of rewards versus γ
7 Conclusion

In this paper, we used reinforcement learning for stock price prediction. We also evaluated its performance, which allows this model to serve as a basis for further study in this domain. Reinforcement learning is better suited than other learning algorithms because it learns from the current situation and makes faster adaptive changes.
References

1. C. Jin, Z. Allen-Zhu, S. Bubeck, M.I. Jordan, Is Q-learning provably efficient? in Advances in Neural Information Processing Systems (2018), pp. 4863–4873
2. C.K.S. Leung, R.K. MacKinnon, Y. Wang, A machine learning approach for stock price prediction, in Proceedings of the 18th International Database Engineering & Applications Symposium (2014), pp. 274–277
4. I. Parmar et al., Stock market prediction using machine learning, in 2018 First International Conference on Secure Cyber Computing and Communication (ICSCCC) (IEEE, 2018), pp. 574–576
5. P. Rani, J. Shokeen, D. Mullick, Recommendations using modified k-means clustering and voting theory. Int. J. Comput. Sci. Mobile Comput. 6(6), 143–148 (2017)
6. P. Rani, D.K. Tayal, M. Bhatia, SNA using user experience, in 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon) (IEEE, 2019), pp. 125–128
7. V.K.S. Reddy, Stock market prediction using machine learning. Int. Res. J. Eng. Technol. 5(10) (2018)
8. J. Shokeen, C. Rana, Social recommender systems: techniques, domains, metrics, datasets and future scope. J. Intell. Inform. Syst. 54, 633–667 (2019). https://doi.org/10.1007/s10844-019-00578-5
9. J. Shokeen, C. Rana, A study on features of social recommender systems. Artif. Intell. Rev. 53(2), 965–988 (2020)
10. J. Shokeen, C. Rana, P. Rani, A trust-based approach to extract social relationships for recommendation, in Data Analytics and Management (Springer, 2020), pp. 51–58
11. R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (MIT Press, 2018)