A CEP-driven framework for real-time news impact prediction on financial markets

  • Special Issue Paper
Service Oriented Computing and Applications

Abstract

Predicting the impact of news on financial markets in real time is a challenging task for finance experts with limited IT expertise. Many practitioners build machine learning models trained on large numbers of low-level features extracted from multiple event-based streams (news and financial market data), which often leads to poor outcomes. State-of-the-art solutions either ignore domain-specific context or are tailored to a single type of static dataset rather than to real-time data streams. In most cases, the domain expert has to carry out data collection, data cleaning and aggregation, and machine learning manually, step by step and with the assistance of IT experts, which is time-consuming and complicated. To address these limitations, we propose a technique that pre-processes real-time data in accordance with domain-specific event patterns to generate better-quality datasets. The technique is supported by a systematic framework comprising a data model that captures domain-specific event patterns, an SOA-based architecture, and processes that integrate sentiment analysis, complex event processing (CEP) and automated machine learning (AutoML), facilitating event pattern detection and continual learning with sliding time windows. Adopting an SOA-based architecture provides flexibility in selecting components and allows them to be integrated seamlessly. The solution allows domain experts to define domain-specific event patterns via a user-friendly interface and to prepare better-quality datasets by pre-processing real-time data streams accordingly in the complex event processing component, with the aim of obtaining meaningful prediction results from the downstream AutoML component. The AutoML component lets the domain expert conduct prediction tasks with minimal machine learning skills.
A prototype was implemented to evaluate the feasibility and functionality of the framework on a real-life price movement prediction scenario involving 3 years of news and financial market data. The results demonstrate that finance experts are able to complete news impact prediction tasks in real time without the intervention of IT experts, saving a large amount of time compared with traditional machine learning processes.
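
The abstract does not give the framework's data model or pattern language, so the following is only a minimal Python sketch of the general idea it describes: a domain-specific event pattern is evaluated over a sliding time window on two event streams (news sentiment and prices), and only the matches are turned into rows for a downstream learner. Every name here (`NewsEvent`, `PriceEvent`, `DatasetRow`, `build_rows`), the pattern itself ("mean sentiment of recent news is strong enough"), and the labelling rule are assumptions made for illustration, not the authors' prototype.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Deque, List, Sequence


@dataclass(frozen=True)
class NewsEvent:
    timestamp: datetime
    sentiment: float  # assumed score in [-1, 1] from some sentiment service


@dataclass(frozen=True)
class PriceEvent:
    timestamp: datetime
    price: float  # assumed to be strictly positive


@dataclass(frozen=True)
class DatasetRow:
    timestamp: datetime
    mean_sentiment: float
    news_count: int
    price_change: float  # relative change over the look-ahead horizon
    label: str           # "up" or "down_or_flat"


def build_rows(
    news: Sequence[NewsEvent],
    prices: Sequence[PriceEvent],
    window: timedelta,
    horizon: timedelta,
    min_abs_sentiment: float,
) -> List[DatasetRow]:
    """Emit one row per price event whose preceding window matches the pattern."""
    news = sorted(news, key=lambda e: e.timestamp)
    prices = sorted(prices, key=lambda e: e.timestamp)
    recent: Deque[NewsEvent] = deque()  # news inside the current window
    rows: List[DatasetRow] = []
    next_news = 0  # index of the next news event not yet admitted
    future = 0     # index of the price used to measure the outcome

    for current in prices:
        # Admit news published up to and including the current price tick.
        while next_news < len(news) and news[next_news].timestamp <= current.timestamp:
            recent.append(news[next_news])
            next_news += 1
        # Evict news that has slid out of the window (t - window, t].
        while recent and recent[0].timestamp <= current.timestamp - window:
            recent.popleft()
        if not recent:
            continue
        mean = sum(e.sentiment for e in recent) / len(recent)
        if abs(mean) < min_abs_sentiment:
            continue  # hypothetical pattern not matched
        # Outcome: first price at or after the look-ahead horizon.
        target = current.timestamp + horizon
        while future < len(prices) and prices[future].timestamp < target:
            future += 1
        if future == len(prices):
            break  # no later price is available for this or any later tick
        change = (prices[future].price - current.price) / current.price
        label = "up" if change > 0 else "down_or_flat"
        rows.append(DatasetRow(current.timestamp, mean, len(recent), change, label))
    return rows
```

In a setup like this, the rows returned by `build_rows` would be what gets handed to an AutoML tool, and re-running the function as new events arrive is one simple way to picture continual learning over sliding windows.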

Acknowledgements

We would like to acknowledge that the datasets used in this paper are sourced from Yahoo Finance, Binance and Omenics.

Funding

This research was supported by Fujian Provincial Natural Science Foundation of China (Grant No. 2022J05291) and Xiamen Scientific Research Funding for Overseas Chinese Scholars.

Author information

Corresponding author

Correspondence to Weisi Chen.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, W., El Majzoub, A., Al-Qudah, I. et al. A CEP-driven framework for real-time news impact prediction on financial markets. SOCA 17, 129–144 (2023). https://doi.org/10.1007/s11761-023-00358-8

  • DOI: https://doi.org/10.1007/s11761-023-00358-8
