Algorithmic Trading Using Double Deep Q-Networks and Sentiment Analysis
Figure 1. Train–test split of normalised Tesla closing price, 2014–2020. Source: [30].
Figure 2. Cosine similarity scores for TSLA sentiment, 2014–2020. Source: [30].
Figure 3. Training procedure schema. Source: own.
Figure 4. Performance of the DDQN in the 2019 testing period compared to the buy-and-hold strategy for varying look-back window lengths.
Figure 5. Performance of the DDQN with sentiment analysis in the 2019 testing period compared to the buy-and-hold strategy for varying look-back window lengths.
Figure A1. Segment of Tesla stock OHLCV data. Source: [30].
Figure A2. Segment of the 2014 10-K raw .txt file. Source: [38].
Figure A3. Segment of the 2017 10-Q raw .txt file. Source: [37].
Figure A4. Segment of the sentiment space constructed using the Loughran–McDonald sentiment word lists. Source: [33].
Abstract
1. Introduction
2. Literature Review
3. Notation and Background
3.1. Experience Replay
3.2. Target Networks
3.3. Double Deep Q-Networks
4. Materials and Methods
4.1. Data
4.1.1. Train–Test Split
4.1.2. Financial Statements
4.1.3. Loughran–McDonald Sentiment Word Lists
4.1.4. Cosine Similarity
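The similarity scores in Figure 2 use the standard cosine measure; for two sentiment vectors A and B (here, vectors built from the Loughran–McDonald word lists), it is

```latex
\cos(\theta)
  = \frac{\mathbf{A}\cdot\mathbf{B}}{\lVert\mathbf{A}\rVert\,\lVert\mathbf{B}\rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\;\sqrt{\sum_{i=1}^{n} B_i^{2}}}
```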
4.2. Experiment
4.3. Environment
- reset: Called when the environment is initialised or reset; it returns the environment to its initial state at the beginning of each episode.
- step: Executes one time step in the environment. It takes an action as a parameter, updates the agent's position, calculates the corresponding reward, transitions the environment to its next state and returns four values (see the sketch after this list):
  - obs: the new state constructed by the get_observation method.
  - reward: a numerical reward for the action taken.
  - done: a Boolean indicating whether the environment has reached a terminal state.
  - info: a dictionary that can contain any auxiliary diagnostic information.
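As an illustration only (not the authors' exact implementation), a Gym-style [34] trading environment exposing these two methods might look like the following; the 0/1/2 action encoding and the price-window observation are assumptions, and the real state also includes technical indicators and, in the sentiment-augmented environment, sentiment scores.

```python
import numpy as np
import gym
from gym import spaces

class TradingEnv(gym.Env):
    """Minimal Gym-style trading environment (illustrative sketch)."""

    def __init__(self, prices, lookback=10):
        super().__init__()
        self.prices = np.asarray(prices, dtype=np.float32)
        self.lookback = lookback
        self.action_space = spaces.Discrete(3)   # 0 = hold, 1 = buy, 2 = sell (assumed)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(lookback,), dtype=np.float32)
        self.reset()

    def get_observation(self):
        # A look-back window of prices ending at the current step.
        return self.prices[self.t - self.lookback:self.t]

    def reset(self):
        self.t = self.lookback
        self.position = 0            # -1 short, 0 flat, +1 long
        self.entry_price = 0.0
        return self.get_observation()

    def step(self, action):
        price, reward = self.prices[self.t], 0.0
        if action == 1:                          # buy signal
            if self.position == 0:
                self.position, self.entry_price = 1, price
            elif self.position == -1:            # close short: percentage PnL
                reward = (self.entry_price - price) / self.entry_price
                self.position = 0
        elif action == 2:                        # sell signal
            if self.position == 0:
                self.position, self.entry_price = -1, price
            elif self.position == 1:             # close long: percentage PnL
                reward = (price - self.entry_price) / self.entry_price
                self.position = 0
        self.t += 1
        done = self.t >= len(self.prices)
        return self.get_observation(), reward, done, {}
```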
4.3.1. Constructing State
4.3.2. Update Positions
4.4. Reward Function Design
Percentage PnL
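As a hedged sketch (the authors' exact definition may differ), a percentage-PnL reward credits the agent, on closing a position, with the percentage profit or loss relative to the entry price:

```latex
r_t = d \cdot \frac{p_t - p_{\mathrm{entry}}}{p_{\mathrm{entry}}},
\qquad
d = \begin{cases} +1 & \text{long position} \\ -1 & \text{short position} \end{cases}
```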
4.5. Agent
4.6. Algorithm
Algorithm 1: Double Deep Q-Learning (van Hasselt et al., 2016) [7].
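A minimal sketch of the core update in Algorithm 1, following van Hasselt et al. [7]: the online network selects the greedy next action, while the target network evaluates it. Function names and array handling here are ours, not the paper's.

```python
import numpy as np

def ddqn_targets(batch, online_q, target_q, gamma=0.95):
    """Compute double-DQN regression targets for one minibatch.

    batch: (states, actions, rewards, next_states, dones) arrays.
    online_q / target_q: callables mapping a batch of states to
    Q-value arrays of shape (batch, n_actions). Illustrative names.
    """
    states, actions, rewards, next_states, dones = batch
    # Action selection uses the online network...
    best_next = np.argmax(online_q(next_states), axis=1)
    # ...but evaluation uses the target network (the double-Q decoupling).
    next_vals = target_q(next_states)[np.arange(len(actions)), best_next]
    targets = online_q(states).copy()
    targets[np.arange(len(actions)), actions] = (
        rewards + gamma * (1.0 - dones) * next_vals)
    return targets  # fit the online network on (states, targets) with MSE
```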
4.7. Hyperparameters
4.8. Training Procedure
5. Results
5.1. Evaluation Metrics
- Cumulative return: The first metric is a measure of the total profit or loss that our model is able to achieve over the duration of the testing period. It provides an indication of the model’s ability to maximise returns and, thus, the effectiveness of its trading decisions.
- Benchmark comparison: The second metric involves benchmarking the model’s learned trading strategy against a buy-and-hold strategy, where one holds onto a stock throughout a given period. Using this as a benchmark, we can assess whether our DRL agent’s trading strategy can lead to alpha (excess return) over simply investing in the market and holding.
- Calmar ratio: The third metric is the Calmar ratio, which is used to determine the financial efficiency of our trading system. It is calculated as the ratio of the annualised return to the maximum drawdown during the testing period [36]. This metric allows us to compare the risk-adjusted returns of our model's strategy with those of the buy-and-hold strategy.
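As an illustration, all three metrics can be computed from a series of daily portfolio values (a sketch assuming a 252-day trading year; variable names are ours):

```python
import numpy as np

def cumulative_return(values):
    """Total return over the testing period."""
    v = np.asarray(values, dtype=float)
    return v[-1] / v[0] - 1.0

def max_drawdown(values):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    v = np.asarray(values, dtype=float)
    peaks = np.maximum.accumulate(v)
    return np.max((peaks - v) / peaks)

def calmar_ratio(values, periods_per_year=252):
    """Annualised return divided by maximum drawdown [36]."""
    v = np.asarray(values, dtype=float)
    years = len(v) / periods_per_year
    annualised = (v[-1] / v[0]) ** (1.0 / years) - 1.0
    return annualised / max_drawdown(v)

# Benchmark comparison: alpha = strategy cumulative return minus the
# buy-and-hold cumulative return on the same price series and period.
```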
5.2. Conventional Environment
5.3. Environment Augmented with Sentiment Analysis
6. Discussion
- The agent’s policy was evaluated on the 2019 Tesla stock data. An essential next step would be to evaluate the policy’s generalisation capability with a diverse set of stocks in the technology sector (e.g., Apple, Microsoft and Amazon), extending the target from trading a single asset to selecting from multiple assets.
- Additionally, exploring the application of this approach in other markets, such as the foreign exchange market, could further validate the robustness and versatility of the model.
- While our agent processed sentiment from Tesla’s 10-K and 10-Q reports, incorporating sentiment derived from news articles or social media platforms like Twitter could provide the agent with a broader perspective.
- While our current agent works with discretised actions, focusing primarily on deciding optimal times to open or close positions, a more granular approach could use a continuous action space. This would allow the agent to factor in considerations like the proportion of capital to invest each time it opens a position.
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Data Analysis and Pre-Processing
Appendix A.1. Tesla Stock Data
Appendix A.2. Technical Indicators
Indicator | Description | Formula |
---|---|---|
SMA (simple moving average) | Average of prices over $n$ periods | $\mathrm{SMA}_t = \frac{1}{n}\sum_{i=0}^{n-1} p_{t-i}$ |
RSI (relative strength index) | Speed and change of price movements | $\mathrm{RSI} = 100 - \frac{100}{1 + RS}$, where $RS$ is the ratio of average gain to average loss |
MOM (momentum) | Rate of rise or fall in prices | $\mathrm{MOM}_t = p_t - p_{t-n}$ |
BOP (balance of power) | Volume to price change | $\mathrm{BOP} = \frac{\text{Close} - \text{Open}}{\text{High} - \text{Low}}$ |
AROONOSC (Aroon oscillator) | Trend-direction change identification | $\mathrm{AROONOSC} = \text{Aroon Up} - \text{Aroon Down}$ |
EMA (exponential moving average) | Weighted moving average | $\mathrm{EMA}_t = \alpha p_t + (1-\alpha)\,\mathrm{EMA}_{t-1}$, with $\alpha = \frac{2}{n+1}$ |
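A sketch of how such indicators can be derived from the OHLCV frame with pandas (column names are assumptions; production code would typically use a library such as TA-Lib):

```python
import pandas as pd

def add_indicators(df: pd.DataFrame, n: int = 14) -> pd.DataFrame:
    """Append SMA, EMA, MOM, BOP and RSI columns to an OHLCV DataFrame."""
    out = df.copy()
    out["SMA"] = out["Close"].rolling(n).mean()
    out["EMA"] = out["Close"].ewm(span=n, adjust=False).mean()
    out["MOM"] = out["Close"].diff(n)
    out["BOP"] = (out["Close"] - out["Open"]) / (out["High"] - out["Low"])
    # RSI from average gains/losses over the same window.
    delta = out["Close"].diff()
    gain = delta.clip(lower=0).rolling(n).mean()
    loss = (-delta.clip(upper=0)).rolling(n).mean()
    out["RSI"] = 100 - 100 / (1 + gain / loss)
    return out
```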
Appendix A.3. Financial Statements
Appendix A.3.1. Raw 10-K Segment
Appendix A.3.2. Raw 10-Q Segment
- Isolate the "document" section from the full text.
- Isolate the document according to its document "type", extracting the "10-K" segment of the text.
- Parse through the 10-K segment, removing all HTML tags.
- Normalise the text, converting all characters to lowercase.
- Remove any uniform resource locators (URLs) from the text.
- Lemmatise all words within the text.
- Remove all stop words from the text.

A compact sketch of this pipeline is given below.
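The sketch assumes BeautifulSoup and NLTK as plausible tooling (the authors' exact libraries are not stated); the document/type isolation steps depend on the EDGAR file layout and are elided.

```python
import re
from bs4 import BeautifulSoup
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

def clean_filing(raw_text: str) -> str:
    """Apply the cleaning steps described above to one filing segment."""
    text = BeautifulSoup(raw_text, "html.parser").get_text()  # strip HTML tags
    text = text.lower()                                       # normalise case
    text = re.sub(r"https?://\S+", " ", text)                 # remove URLs
    lemmatizer = WordNetLemmatizer()
    stops = set(stopwords.words("english"))
    tokens = [lemmatizer.lemmatize(w)                         # lemmatise words
              for w in re.findall(r"[a-z]+", text)
              if w not in stops]                              # drop stop words
    return " ".join(tokens)
```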
Document Type | Raw File Size (MB) | Cleaned File Size (MB) |
---|---|---|
10-K | 26.7 | 0.361 |
10-Q | 10.3 | 0.212 |
Appendix A.4. Sentiment Word List
Appendix A.5. DDQN Agent Training Hyperparameters: Descriptions
Hyperparameter | Description |
---|---|
γ (discount factor) | The discount factor used in the DDQL update. It determines how much future rewards contribute to the expected cumulative reward. We used a higher gamma value to give more importance to future rewards. |
ε (exploration probability) | The probability of taking a random action. Initially, this was set to a high value to encourage exploration, and then it decayed over time to encourage the exploitation of the learned policy. |
ε_min | The minimum epsilon value. This ensures that the agent continues to explore throughout its training. |
ε decay rate | The rate at which epsilon is decayed. Faster decay results in faster exploitation of the learned policy. |
replay_buffer.max_size | The maximum number of experiences stored in the replay buffer. Having more experiences to learn from generally results in better performance. |
batch_size | The number of experiences sampled from memory to train the network. Larger batch sizes generally lead to faster, more stable training, but they also use more memory. We found a batch size of 64 to effectively balance learning efficiency with computational cost. |
n_episodes | The number of training episodes. Each episode is a complete run, from start to finish. We found 50 to be the smallest number of episodes needed for the algorithm to converge to a good policy. |
time_skip | The number of time steps to skip between states. This reduces the computational load and avoids redundancy of information between states. |
target_network update frequency | The frequency at which the target network weights are updated. |
learning rate | The step size at each iteration while moving towards a minimum of the loss function. A smaller rate ensures more careful convergence, but at the cost of longer training times. |
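The exploration schedule these rows describe can be sketched as follows, using the values listed in the hyperparameter tables (ε = 1.0, decay 0.995, ε_min = 0.01, 50 episodes); the multiplicative, per-episode decay form is our assumption.

```python
def epsilon_schedule(n_episodes=50, epsilon=1.0,
                     epsilon_min=0.01, epsilon_decay=0.995):
    """Yield the exploration probability used in each episode."""
    for _ in range(n_episodes):
        yield epsilon
        epsilon = max(epsilon_min, epsilon * epsilon_decay)

# After 50 episodes: 0.995**50 ≈ 0.78, so exploration remains substantial
# at the end of training under a per-episode decay.
```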
References
- Chan, E. Quantitative Trading: How to Build Your Own Algorithmic Trading Business; Wiley Trading; Wiley: Hoboken, NJ, USA, 2009; ISBN 9780470466261. [Google Scholar]
- Chan, E. Algorithmic Trading: Winning Strategies and Their Rationale; Wiley Trading; Wiley: Hoboken, NJ, USA, 2013; ISBN 9781118460146. [Google Scholar]
- Zimmermann, H. Intraday Trading with Neural Networks and Deep Reinforcement Learning; Imperial College London: London, UK, 2021. [Google Scholar]
- Maven. Machine Learning in Algorithmic Trading, Maven Securities. 2023. Available online: https://www.mavensecurities.com/machine-learning-in-algorithmic-trading/ (accessed on 17 May 2024).
- Spooner, T. Algorithmic Trading and Reinforcement Learning: Robust Methodologies for AI in Finance. Ph.D. Thesis, The University of Liverpool Repository, Liverpool, UK, 2021. Available online: https://livrepository.liverpool.ac.uk/3130139/ (accessed on 17 May 2024).
- Bellman, R. A Markovian Decision Process. J. Math. Mech. 1957, 6, 679–684. [Google Scholar] [CrossRef]
- van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double Q-learning. arXiv 2016, arXiv:1509.06461. [Google Scholar] [CrossRef]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with deep reinforcement learning. arXiv 2015, arXiv:1312.5602. [Google Scholar]
- Zejnullahu, F.; Moser, M.; Osterrieder, J. Applications of reinforcement learning in Finance—Trading with a double deep Q-Network. arXiv 2022, arXiv:2206.14267. [Google Scholar]
- Aldridge, I. High-Frequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems; John Wiley & Sons: Hoboken, NJ, USA, 2013; Volume 604. [Google Scholar]
- Savcenko, K. The ‘A’ Factor: The Role of Algorithmic Trading during an Energy Crisis; S&P Global Commodity Insights: London, UK, 2022; Available online: https://www.spglobal.com/commodityinsights/en/market-insights/blogs/electric-power/110322-algorithm-trading-europe-energy-crisis (accessed on 25 July 2024).
- Fischer, T.G. Reinforcement Learning in Financial Markets—A Survey; FAU Discussion Papers in Economics; No. 12/2018; Friedrich-Alexander-Universität Erlangen-Nürnberg, Institute for Economics: Erlangen, Germany, 2018. [Google Scholar]
- Neuneier, R. Optimal asset allocation using adaptive dynamic programming. In Advances in Neural Information Processing Systems; 1996; pp. 952–958. Available online: https://proceedings.neurips.cc/paper/1995/hash/3a15c7d0bbe60300a39f76f8a5ba6896-Abstract.html (accessed on 1 August 2024).
- Jin, O.; El-Saawy, H. Portfolio Management Using Reinforcement Learning; Working Paper; Stanford University: Stanford, CA, USA, 2016. [Google Scholar]
- Liang, Z.; Chen, H.; Zhu, J.; Jiang, K.; Li, Y. Adversarial deep reinforcement learning in portfolio management. arXiv 2018, arXiv:1808.09940. [Google Scholar]
- Ding, X.; Zhang, Y.; Liu, T.; Duan, J. Deep learning for event-driven stock prediction. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015. [Google Scholar]
- Zhu, K.; Liu, R. The selection of reinforcement learning state and value function applied to portfolio optimization. J. Fuzhou Univ. (Nat. Sci. Ed.) 2020, 48, 146–151. [Google Scholar]
- Dai, S.X.; Zhang, S.L. An application of reinforcement learning based approach to stock trading. Bus. Manag. 2021, 3, 23–27. [Google Scholar] [CrossRef]
- Ning, B.; Lin, F.H.T.; Jaimungal, S. Double deep Q-learning for optimal execution. arXiv 2018, arXiv:1812.06600. [Google Scholar] [CrossRef]
- Starke, T. Trading with Deep Reinforcement Learning. Machine Learning Trading, YouTube, 2020. Available online: https://www.youtube.com/watch?v=H-c49jQxGbs (accessed on 2 February 2024).
- Nevmyvaka, Y.; Feng, Y.; Kearns, M. Reinforcement learning for optimized trade execution. In Proceedings of the 23rd International Conference on Machine Learning, ACM, Pittsburgh, PA, USA, 25–29 June 2006; pp. 673–680. [Google Scholar]
- Bertoluzzo, F.; Corazza, M. Testing different reinforcement learning configurations for financial trading: Introduction and applications. Procedia Econ. Financ. 2012, 3, 68–77. [Google Scholar] [CrossRef]
- Eilers, D.; Dunis, C.L.; von Mettenheim, H.-J.; Breitner, M.H. Intelligent trading of seasonal effects: A decision support algorithm based on reinforcement learning. Decis. Support Syst. 2014, 64, 100–108. [Google Scholar] [CrossRef]
- Sherstov, A.A.; Stone, P. Three automated stock-trading agents: A comparative study. In Proceedings of the International Workshop on Agent-Mediated Electronic Commerce; Springer: Berlin/Heidelberg, Germany, 2004; pp. 173–187. [Google Scholar]
- Kaur, S. Algorithmic Trading Using Sentiment Analysis and Reinforcement Learning; Working Paper; Stanford University: Stanford, CA, USA, 2017. [Google Scholar]
- Rong, Z.H. Deep reinforcement learning stock algorithm trading system application. J. Comput. Knowl. Technol. 2020, 16, 75–76. [Google Scholar]
- Li, Y.; Zhou, P.; Li, F.; Yang, X. An improved reinforcement learning model based on sentiment analysis. arXiv 2021, arXiv:2111.15354. [Google Scholar]
- Pardo, R. The Evaluation and Optimization of Trading Strategies; John Wiley & Sons: Hoboken, NJ, USA, 2011. [Google Scholar]
- Hu, G. Advancing algorithmic trading: A multi-technique enhancement of deep Q-network models. arXiv 2023, arXiv:2311.05743. [Google Scholar]
- Tesla, Inc. (TSLA). Stock Historical Prices Data—Yahoo Finance—finance.yahoo.com. Available online: https://finance.yahoo.com/quote/TSLA/history?p=TSLA (accessed on 2 April 2024).
- SEC.gov—EDGAR Full Text Search—sec.gov. Available online: https://www.sec.gov/edgar/search/#/q=(Annual%2520report)&dateRange=all&ciks=0001318605&entityName=Tesla%252C%2520Inc.%2520(TSLA)%2520(CIK%25200001318605) (accessed on 2 May 2024).
- Loughran–McDonald Master Dictionary w/ Sentiment Word Lists; Software Repository for Accounting and Finance, University of Notre Dame. Available online: https://sraf.nd.edu/loughranmcdonald-master-dictionary/ (accessed on 2 May 2024).
- Loughran-McDonald Master Dictionary w/Sentiment Word Lists//Software Repository for Accounting and Finance//University of Notre Dame—sraf.nd.edu. Available online: https://sraf.nd.edu/loughranmcdonald-master-dictionary/ (accessed on 2 May 2024).
- Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. OpenAI Gym. arXiv 2016, arXiv:1606.01540. [Google Scholar]
- Carapuço, J.M.B. Reinforcement Learning Applied to Forex Trading, Scribd. 2017. Available online: https://www.scribd.com/document/449849827/Corrected-Thesis-JoaoMaria67923 (accessed on 12 February 2024).
- Young, T.W. Calmar ratio: A smoother tool. Futures 1991, 20, 40. [Google Scholar]
- Edgar Filing Documents for 0001564590-17-015705. Available online: https://www.sec.gov/Archives/edgar/data/1318605/000156459017015705/0001564590-17-015705-index.htm (accessed on 2 May 2024).
- Edgar Filing Documents for 0001564590-15-001031. Available online: https://www.sec.gov/Archives/edgar/data/1318605/000156459015001031/0001564590-15-001031-index.htm (accessed on 2 May 2024).
- Murphy, J.J. Technical Analysis of the Financial Markets: A Comprehensive Guide to Trading Methods and Applications; Penguin Publishing Group: New York, NY, USA, 1999; 228p, Available online: https://www.google.com/books/edition/_/5zhXEqdr_IcC?hl=en&gbpv=0 (accessed on 24 July 2024).
- Wilder, J.W., Jr. New Concepts in Technical Trading Systems; Trend Research: Edmonton, AB, Canada, 1978; 6p, Available online: https://archive.org/details/newconceptsintec00wild/page/n151/mode/2up (accessed on 24 July 2024).
- Jahn, M. What Is the Haurlan Index? Investopedia. 2022. Available online: https://www.investopedia.com/terms/h/haurlanindex.asp#:~:text=The%20Haurlan%20Index%20was%20developed,the%20New%20York%20Stock%20Exchange (accessed on 24 July 2024).
- Ushman, D. What Is the SMA Indicator (Simple Moving Average). TrendSpider Learning Center, 2023. Available online: https://trendspider.com/learning-center/what-is-the-sma-indicator-simple-moving-average/ (accessed on 24 July 2024).
- Livshin, I. Balance Of Power. Tech. Anal. Stock. Commod. 2001, 19, 18–32. Available online: https://c.mql5.com/forextsd/forum/90/balance_of_market_power.pdf (accessed on 24 July 2024).
- Mitchell, C. Aroon Oscillator: Definition, Calculation Formula, Trade Signals, Investopedia. 2022. Available online: https://www.investopedia.com/terms/a/aroonoscillator.asp (accessed on 24 July 2024).
Action Signal | Current Open Position | Action Description |
---|---|---|
Hold | None | Nothing |
Hold | Short | Hold |
Hold | Long | Hold |
Buy | None | Open Long |
Buy | Short | Close Short |
Buy | Long | Hold |
Sell | None | Open Short |
Sell | Short | Hold |
Sell | Long | Close Long |
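A direct transcription of this table into an update rule (the 0/1/2 signal encoding is our assumption):

```python
def update_position(position, action):
    """Map (current open position, action signal) to the next position.

    position: None, "long" or "short"; action: 0=hold, 1=buy, 2=sell.
    Returns (next_position, trade), where trade names the transition.
    """
    if action == 1:                      # buy signal
        if position is None:
            return "long", "open long"
        if position == "short":
            return None, "close short"
    elif action == 2:                    # sell signal
        if position is None:
            return "short", "open short"
        if position == "long":
            return None, "close long"
    return position, "hold"              # hold signal, or no change applies
```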
Layer (Type) | Units | Activation |
---|---|---|
LSTM (Input) | 64 | - |
LSTM | 32 | - |
Dense | 32 | ReLU |
Dense (Output) | action_space | Linear |
Property | Value |
---|---|
Loss Function | Mean Squared Error |
Optimiser | Adam |
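Read together, the two tables above suggest the following Keras reconstruction of the Q-network; the look-back window length and per-step feature count are assumptions, not values taken from the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_q_network(lookback=10, n_features=8, n_actions=3,
                    learning_rate=0.001):
    """LSTM Q-network matching the layer table above (sketch)."""
    model = keras.Sequential([
        layers.LSTM(64, return_sequences=True,
                    input_shape=(lookback, n_features)),
        layers.LSTM(32),
        layers.Dense(32, activation="relu"),
        layers.Dense(n_actions, activation="linear"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate),
                  loss="mse")
    return model
```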
Parameter | Symbol | Value |
---|---|---|
Episodes | M | 50 |
Discount Factor | γ | 0.95 |
Exploration Probability | ε | 1.0 |
Minimum Exploration Probability | ε_min | 0.01 |
Exploration Decay Rate | ε_decay | 0.995 |
Learning Rate | α | 0.001 |
Target Network Update Frequency | C | 10 |
Parameter | Value |
---|---|
Replay Capacity | 1000 |
Batch Size | 64 |
Time Skip | 5 |
Lookback Window | Cumulative Reward (Environment 1) | Cumulative Reward (Environment 2) | % Change |
---|---|---|---|
10 | 0.8 | 0.8 | 0.0 |
15 | 0.47 | 0.67 | 42.55 |
20 | 0.4 | 0.67 | 67.5 |
25 | 1.0 | 1.7 | 70 |