


Stock Price Prediction Using Reinforcement Learning

Poonam Rani, Jyoti Shokeen, Anshul Singh, Anmol Singh, Sharlin Kumar, and Naman Raghuvanshi

Abstract With the availability of new data sources and advances in marketing and financial instruments, stock market returns are a major research area. Stocks have a huge influence on today's economy, so a better predictive model is extremely important for stock prediction. The aim of this paper is to investigate the positive effect of reinforcement learning on stock price prediction techniques. Q-learning has been shown to be highly effective in various domains, such as cloud scheduling and game automation. This paper demonstrates how the Q-learning technique is helpful in stock price prediction. The findings are very positive, with excellent predictive accuracy and high speed.

Keywords Reinforcement Learning · Stock Price · Stock Market Prediction · Q-learning

P. Rani (B) · A. Singh · A. Singh · S. Kumar · N. Raghuvanshi
Department of Computer Engineering, Netaji Subhas University of Technology, Dwarka, New Delhi, India
e-mail: poonamrani2017.nsit@gmail.com

A. Singh
e-mail: anshuls1.co.17@nsit.net.in

A. Singh
e-mail: anmols.co.17@nsit.net.in

S. Kumar
e-mail: sharlink.co.17@nsit.net.in

N. Raghuvanshi
e-mail: namanr.co.17@nsit.net.in

J. Shokeen
Department of Computer Science and Engineering, UIET, Maharshi Dayanand University, Rohtak, Haryana, India
e-mail: jyotishokeen.rs.uiet@mdurohtak.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. Khanna et al. (eds.), International Conference on Innovative Computing and Communications, Advances in Intelligent Systems and Computing 1388, https://doi.org/10.1007/978-981-16-2597-8_6


1 Introduction

Today, stock exchanges are one of the most profitable fields of stock trading in the business world. As the industry becomes more prevalent, there is an overwhelming need for better and faster prediction models, since stocks directly impact the future of businesses as well as the future of investors. The stock exchange can be thought of as a game in which you have to sell, buy, and hold stocks at the right time by analyzing the financial market, and in some cases by trusting your gut. Still, no one can predict the future, which makes this game extremely hard and risky. How accurately you can predict the future of a company decides your victory; in this case, victory could mean millions of dollars earned in no time, but a loss could equally mean millions of dollars lost in no time. This is why there is an escalating need for an efficient stock price prediction model. Consequently, stock prediction is a hot topic in the research and development sector.

Many stock prediction models, including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM), are based on neural networks [2]. Reinforcement learning is a machine learning technique that deals with how an agent performs actions to maximize cumulative reward in an environment [11]. This technique functions on a reward-and-punishment policy: the model is penalized each time its action does not work toward the solution, and it is rewarded if the action leads to victory. Reinforcement learning offers simple solutions to several complex problems that other supervised and unsupervised machine learning algorithms struggle with.

In the past decade, stock traders had to rely on different software intelligence systems to reach trading decisions. Lately, with the evolution of artificial intelligence, this field has completely changed and has experienced a huge reform. Apart from reinforcement learning, there are numerous machine learning algorithms that can be used efficiently in stock prediction, such as CNNs, RNNs, and LSTMs with sliding windows. We employ reinforcement learning for stock price prediction in this paper because it allows the use of market signals to create profitable trading strategies in a trading context.

The subsequent sections are organized as follows: Sect. 2 introduces some recent works related to this area in the literature. Sections 3 and 4 introduce the reinforcement learning approach and stock price prediction, respectively. Section 5 defines the methodology used in the paper. Section 6 discusses the experimental work and the results. Lastly, Sect. 7 concludes the paper.

2 Related Works

Parmar et al. [4] also worked in this direction of stock market prediction to predict future values of financial stocks. They used linear regression and LSTM to propose their model. Linear regression is used for predicting continuous values by reducing the error function via gradient descent. They used LSTM for prediction on a large amount of data. However, LSTM needs a huge amount of historical data for training in order to reach good accuracy. Compared to Q-learning, LSTM requires more memory for the training dataset. Also, Q-learning allows random, human-like actions to be taken, which is not possible in LSTMs. LSTMs can only predict stock prices but cannot take actions such as buy, sell, or hold according to the predictions.

In paper [7], the authors used a Support Vector Machine (SVM) as the classifier to decide which action to perform. They believe that SVM is one of the most suitable algorithms for time series prediction. Li et al. [3] employed deep reinforcement learning for stock transaction strategy. They claimed that their algorithm is more intelligent than traditional algorithms because of its fast adaptation and response to changes. However, their approach is not feasible for large datasets.

3 Reinforcement Learning

Reinforcement learning is an area of machine learning concerned with how algorithms decide to take actions in an environment to augment the cumulative reward. Unlike supervised algorithms, reinforcement learning does not need a labeled input/output dataset. Instead, it focuses on finding the balance between exploration and exploitation. Reinforcement learning works similarly to the human brain. We take action based on past experience and intuition. We assess the result, reward ourselves if it turns out to be profitable, and learn that this is a viable action. But in case of loss, we penalize ourselves and try a different way to solve the problem. This is how reinforcement learning works: it grants rewards if the algorithm's action results in a win and penalizes it if it loses. It learns each time it makes a prediction. Figure 1 portrays the functioning of reinforcement learning.

Q-learning is an off-policy reinforcement learning approach that aims to find the best action in the given current state. Q-learning is treated as off-policy because it does not need a fixed policy: the Q-learning function learns from actions that are outside the current policy, such as taking a random action. "Q" stands for quality in Q-learning. Quality indicates how productive a given action is in gaining some future reward. The goal of Q-learning is to maximize the total reward [1]. Li et al. [3] applied deep reinforcement learning in stock forecasting.
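
For concreteness, the classic tabular form of this idea updates each Q value toward the observed reward plus the discounted value of the best next action. The sketch below is a minimal, generic illustration of that update rule; the state encoding, learning rate, and toy usage are assumptions made for illustration, not the exact setup used in this paper.

```python
import numpy as np

def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.94):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
    return Q

# Toy usage: 5 discretised states, 3 actions (buy, sell, hold).
Q = np.zeros((5, 3))
Q = q_learning_update(Q, state=0, action=1, reward=2.5, next_state=3)
```
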
Fig. 1 Reinforcement learning

4 Stock Price Prediction

In the financial world, forecasting stock prices is an essential goal. A fairly accurate forecast has the potential to yield high financial benefits and protect against market risks. An efficient stock price prediction system can result in a huge amount of profit in the future. The efficient market theory implies that stock prices reflect all the information currently available, and any price adjustments that are not based on newly released information are therefore potentially unpredictable. Others disagree, and those with this perspective have countless techniques and technologies that allegedly allow them to gain information on future prices. However, due to the uncertainty and unpredictable nature of the markets and the many undecidable, non-stationary stochastic variables involved, forecasting stock prices is not a simple task. Nowadays, social network analysis is useful in predicting stock prices [6, 8]. After predicting the stock prices, users can rely on recommender systems to decide whether to buy or sell stocks [5, 9, 10].

The historical trends of financial time series have been analyzed by scholars from different areas, and different methods for forecasting stock prices have been proposed. Most of these methods involve careful selection of input variables to achieve promising results, developing a predictive model with skilled financial expertise, and introducing different statistical methods for arbitrage analysis, making it difficult for individuals outside the financial sector to use these methods to forecast stock prices.

5 Methodology

The method that we use in this paper for stock price prediction is Q-learning. We first create an environment and an agent. In reinforcement learning, the environment refers to the task, i.e., stock price prediction, and the agent refers to the algorithm used to solve that particular task. Hence, the driver program simply initializes the required environment and agent, which are passed to the algorithm that returns predictions as values. This part of the algorithm is responsible for the gradient descent updates and ultimately determines the accuracy of the algorithm.

We incorporate two additional functions, i.e., the reset function and the step function. The reset function's task is to bring the pointer back to zero, i.e., the start of time, where the cash in hand is at its maximum and the investment is zero. The step function takes an action as input and performs it accordingly, i.e., it buys the stock, moves the pointer, and at the same time updates the reward, next state, and portfolio values.
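
A minimal sketch of such an environment is shown below. The class and attribute names, the single-share actions, and the state encoding are illustrative assumptions; the paper does not publish its exact implementation.

```python
import numpy as np

class StockEnv:
    """Toy trading environment with reset() and step() as described above."""
    def __init__(self, prices, initial_cash=20000):
        self.prices = np.asarray(prices, dtype=float)  # price series of one stock
        self.initial_cash = initial_cash
        self.reset()

    def reset(self):
        # Bring the pointer back to the start: full cash, no shares held.
        self.t = 0
        self.cash = self.initial_cash
        self.shares = 0
        return self._state()

    def _state(self):
        return np.array([self.prices[self.t], self.shares, self.cash])

    def step(self, action):
        # action: 0 = hold, 1 = buy one share, 2 = sell one share
        price = self.prices[self.t]
        prev_value = self.shares * price + self.cash
        if action == 1 and self.cash >= price:
            self.shares += 1
            self.cash -= price
        elif action == 2 and self.shares > 0:
            self.shares -= 1
            self.cash += price
        self.t += 1                                   # move the pointer forward in time
        done = self.t == len(self.prices) - 1
        value = self.shares * self.prices[self.t] + self.cash  # portfolio value
        reward = value - prev_value                   # change in portfolio value
        return self._state(), reward, done, {"portfolio_value": value}
```
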
Unlike the typical Q(s, a), we use only the state s and ignore the action a in this stock problem:

Q(s, :) = W^T s + b    (1)

where Q(s, :) is the vector of Q values at state s, W is the matrix of weights, s is the state, and b is the bias.
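
Equation (1) can be realized as a small linear model trained by gradient descent. The sketch below is one possible way to write it; the class name, learning rate, and weight initialization are assumptions for illustration.

```python
import numpy as np

class LinearQModel:
    """Q(s, :) = W^T s + b — one Q value per action, linear in the state."""
    def __init__(self, state_dim, n_actions, lr=0.01):
        self.W = np.random.randn(state_dim, n_actions) / np.sqrt(state_dim)
        self.b = np.zeros(n_actions)
        self.lr = lr

    def predict(self, state):
        return state @ self.W + self.b   # vector of Q values for all actions

    def sgd_step(self, state, target):
        # One gradient-descent step on the squared error ||Q(s, :) - target||^2.
        error = self.predict(state) - target
        self.W -= self.lr * np.outer(state, error)
        self.b -= self.lr * error
```
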
This part of the algorithm contains mainly three functions with different tasks, i.e., the _init_ function, the get_action function, and the train function. The _init_ function is used to initialize the model used for training. The get_action function takes the state as input and decides which action to take, i.e., whether to buy stock, sell stocks, or hold, using reinforcement learning techniques such as epsilon-greedy exploration. Finally, the train function takes a tuple of data including the current state, action, reward, next state, and done flag. It calculates the input and target values that are fed to our model, where the input is the state and the target is calculated as follows:

target = r + γ · max Q(s', :)    (2)

where γ is the discount factor, which is used to balance immediate and potential future rewards, and max Q(s', :) is the maximum Q value among all possible actions given the next state s'.
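
Combining the pieces above, an agent with _init_, get_action, and train could be sketched as follows. The epsilon-greedy parameters shown match Tables 1 and 2, but the class structure itself is an assumption rather than the authors' published code.

```python
import numpy as np

class QAgent:
    """Agent with _init_, get_action (epsilon-greedy), and train, as described above."""
    def __init__(self, model, n_actions=3, gamma=0.94,
                 epsilon=1.0, epsilon_min=0.01, epsilon_decay=0.996):
        self.model = model                # e.g. the LinearQModel sketched earlier
        self.n_actions = n_actions
        self.gamma = gamma
        self.epsilon = epsilon
        self.epsilon_min = epsilon_min
        self.epsilon_decay = epsilon_decay

    def get_action(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise act greedily.
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        return int(np.argmax(self.model.predict(state)))

    def train(self, state, action, reward, next_state, done):
        # Eq. (2): target = r + gamma * max Q(s', :); no bootstrap on the final step.
        target_value = reward
        if not done:
            target_value += self.gamma * np.max(self.model.predict(next_state))
        target = self.model.predict(state).copy()
        target[action] = target_value     # only the taken action is updated
        self.model.sgd_step(state, target)
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay
```
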
6 Experimental Work

In order to make this a real-time project, it is necessary to take recent data. The Alpha Vantage API is used to get real-time data to make good predictions about a stock. Alpha Vantage is a free API that provides real-time stock data. We take the stock prices of three companies, Apple, Microsoft, and International Business Machines (IBM), for the period January 2012–November 2020. The dataset is divided into a 60:40 ratio of training set and testing set, respectively.
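
As a rough sketch, daily closing prices can be pulled from the Alpha Vantage public REST endpoint along the following lines. The API key is a placeholder, the JSON field names should be checked against the current Alpha Vantage documentation, and the paper itself does not publish its data-loading code.

```python
import requests

API_KEY = "YOUR_ALPHA_VANTAGE_KEY"  # placeholder; a free key is available from alphavantage.co

def load_daily_closes(symbol):
    """Fetch the full daily time series for one symbol and return sorted (date, close) pairs."""
    url = "https://www.alphavantage.co/query"
    params = {"function": "TIME_SERIES_DAILY", "symbol": symbol,
              "outputsize": "full", "apikey": API_KEY}
    series = requests.get(url, params=params).json()["Time Series (Daily)"]
    return sorted((date, float(values["4. close"])) for date, values in series.items())

prices = {sym: load_daily_closes(sym) for sym in ["AAPL", "MSFT", "IBM"]}
split = int(0.6 * len(prices["AAPL"]))               # 60:40 train/test split
train, test = prices["AAPL"][:split], prices["AAPL"][split:]
```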

The first step of every reinforcement learning technique is to decide on an action at the beginning. It is assumed that before making any decision, the algorithm must know the answers to some questions, such as:

• Do I even have enough cash to buy?
• Considering the current state of my portfolio and the existing price of the shares in the market, is it worth selling them?


After answering all these questions, the next step is to decide the action to be taken. There are three actions: buy, sell, or hold. Reinforcement learning algorithms evaluate the action and make the next decision accordingly. The reward is the difference between the portfolio values of the current time step and the previous time step. The algorithm computes the portfolio value as follows:

value = S^T P + C    (3)

where S is the vector of shares owned, P is the vector of share prices, and C is the cash.

Epsilon decay governs how ε decreases so that the agent learns to act optimally over its lifetime. We experimented with different values of epsilon_decay and γ to find their best values. Table 1 depicts the initial configuration of the proposed model. Table 2 defines the parameters tuned for the experiments, and Table 3 shows the performance of the model based on the selected parameters. Figure 2 depicts the rewards with respect to epsilon_decay for γ = 0.95, ε = 1.0, and ε_min = 0.01; the average reward is 32697.13 and the best value of epsilon_decay in terms of reward is 0.996. Figure 3 depicts the rewards with respect to γ with exploration rate ε = 1.0, ε_min = 0.01, and ε_decay = 0.996; the average reward is 81588.52 and the best value of γ in terms of reward is 0.94.

Table 1 Initial model configurations
Parameter              Value
Episodes               200
Initial investment     20000
ε                      1.0

Table 2 Parameters chosen for experiments
Parameter              Value
Exploration rate       1.0
Epsilon decay          0.996
Discount factor        0.94

Table 3 Performance results
Performance parameter  Reward
Average reward         38356.61
Minimum reward         23091.67
Maximum reward         54582.06
166 is 81588.52 and the best value for γ in terms of reward is 0.94.
Fig. 2 Test results of rewards versus epsilon_decay

Fig. 3 Test results of rewards versus γ

7 Conclusion

In this paper, we used reinforcement learning for stock price prediction. We also evaluated its performance, which allows us to use this model as a base for further study in this domain. Reinforcement learning compares favorably with other learning algorithms because it learns from current situations and makes faster adaptive changes.

References

1. C. Jin, Z. Allen-Zhu, S. Bubeck, M.I. Jordan, Is Q-learning provably efficient? in Advances in Neural Information Processing Systems (2018), pp. 4863–4873
2. C.K.S. Leung, R.K. MacKinnon, Y. Wang, A machine learning approach for stock price prediction, in Proceedings of the 18th International Database Engineering & Applications Symposium (2014), pp. 274–277
3. Y. Li, P. Ni, V. Chang, Application of deep reinforcement learning in stock trading strategies and stock forecasting. Computing 1–18 (2019)
4. I. Parmar, N. Agarwal, S. Saxena, R. Arora, S. Gupta, H. Dhiman, L. Chouhan, Stock market prediction using machine learning, in 2018 First International Conference on Secure Cyber Computing and Communication (ICSCCC) (IEEE, 2018), pp. 574–576
5. P. Rani, J. Shokeen, D. Mullick, Recommendations using modified k-means clustering and voting theory. Int. J. Comput. Sci. Mobile Comput. 6(6), 143–148 (2017)
6. P. Rani, D.K. Tayal, M. Bhatia, SNA using user experience, in 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon) (IEEE, 2019), pp. 125–128
7. V.K.S. Reddy, Stock market prediction using machine learning. Int. Res. J. Eng. Technol. 5(10) (2018)
8. J. Shokeen, C. Rana, Social recommender systems: techniques, domains, metrics, datasets and future scope. J. Intell. Inform. Syst. 54, 633–667 (2019). https://doi.org/10.1007/s10844-019-00578-5
9. J. Shokeen, C. Rana, A study on features of social recommender systems. Artif. Intell. Rev. 53(2), 965–988 (2020)
10. J. Shokeen, C. Rana, P. Rani, A trust-based approach to extract social relationships for recommendation, in Data Analytics and Management (Springer, 2020), pp. 51–58
11. R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (MIT Press, 2018)

