A CEP-driven framework for real-time news impact prediction on financial markets

Weisi Chen¹ (ORCID: orcid.org/0000-0001-8131-392X),
Ahmad El Majzoub²,
Islam Al-Qudah³,⁴ &
Fethi A. Rabhi²

Abstract

Real-time prediction of news impact on financial markets is a challenging task for finance experts with limited IT expertise. Many practitioners build machine learning models trained on many low-level features extracted from multiple event-based streams (news and financial market data), which often leads to poor outcomes. State-of-the-art solutions either ignore domain-specific contexts or are customised to a single type of static dataset rather than real-time data streams. In most cases, the domain expert has to carry out data collection, data cleaning and aggregation, and machine learning manually, step by step and with the assistance of IT experts, which is time-consuming and complicated. To address these limitations, we propose a technique that pre-processes real-time data in accordance with domain-specific event patterns to generate better-quality datasets. This technique is supported by a systematic framework featuring a data model that captures domain-specific event patterns, together with an SOA-based architecture and processes that integrate sentiment analysis, complex event processing (CEP) and automated machine learning (AutoML), facilitating event pattern detection and continual learning with sliding time windows. The SOA-based architecture keeps the selection of components flexible and their integration seamless. The solution allows domain experts to define domain-specific event patterns via a user-friendly interface and to prepare better-quality datasets by pre-processing real-time data streams accordingly in the CEP component, so that the downstream AutoML component can generate meaningful prediction results. The AutoML component requires only minimal machine learning skills from the domain expert to conduct the prediction tasks.
A prototype was implemented to evaluate the framework's feasibility and functionality on a real-life price movement prediction scenario involving 3 years of news and financial market data. The results demonstrate that finance experts are able to complete news impact prediction tasks in real time without the intervention of IT experts, which saves a large amount of time compared with traditional machine learning processes.
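
To make the idea of pre-processing streams according to a domain-specific event pattern more concrete, the following is a minimal Python sketch. It is not the authors' implementation. The class names, the `match_news_impact` function, the sentiment range of [-1, 1], the 0.5 threshold, the one-hour window and the "BTC" symbol are all assumptions introduced for illustration. The sketch matches one hypothetical pattern ("a news item with strong sentiment, followed by price observations within a time window") and turns each match into a labelled row that a downstream AutoML step could consume.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class NewsEvent:
    timestamp: datetime
    asset: str
    sentiment: float  # assumed range [-1, 1], from an upstream sentiment step


@dataclass(frozen=True)
class PriceEvent:
    timestamp: datetime
    asset: str
    price: float


@dataclass(frozen=True)
class PatternMatch:
    asset: str
    news_time: datetime
    sentiment: float
    price_change: float  # relative change between first and last price in the window
    label: str           # "up", "down" or "flat"


def match_news_impact(
    news: List[NewsEvent],
    prices: List[PriceEvent],
    window: timedelta,
    min_abs_sentiment: float = 0.5,  # hypothetical threshold
) -> List[PatternMatch]:
    """Return one labelled row per strong-sentiment news item that is
    followed by at least two price observations inside the time window."""
    matches = []
    for item in news:
        # Pattern condition 1: the news must carry strong sentiment.
        if abs(item.sentiment) < min_abs_sentiment:
            continue
        window_end = item.timestamp + window
        # Pattern condition 2: prices for the same asset inside the window.
        in_window = sorted(
            (p for p in prices
             if p.asset == item.asset
             and item.timestamp <= p.timestamp <= window_end),
            key=lambda p: p.timestamp,
        )
        if len(in_window) < 2:
            continue
        first, last = in_window[0].price, in_window[-1].price
        change = (last - first) / first
        label = "up" if change > 0 else "down" if change < 0 else "flat"
        matches.append(
            PatternMatch(item.asset, item.timestamp, item.sentiment, change, label)
        )
    return matches


# Usage with made-up events.
t0 = datetime(2022, 1, 1, 12, 0)
news = [NewsEvent(t0, "BTC", 0.8), NewsEvent(t0, "BTC", 0.1)]
prices = [
    PriceEvent(t0 + timedelta(minutes=1), "BTC", 100.0),
    PriceEvent(t0 + timedelta(minutes=30), "BTC", 110.0),
    PriceEvent(t0 + timedelta(hours=2), "BTC", 50.0),  # outside the window
]
rows = match_news_impact(news, prices, window=timedelta(hours=1))
for row in rows:
    print(row)
```

The sketch works on in-memory lists for readability. A CEP engine would instead evaluate such a pattern incrementally as events arrive, and the resulting rows would be appended to a sliding training window for continual learning.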

Acknowledgements

We would like to acknowledge that the datasets used in this paper are sourced from Yahoo Finance, Binance and Omenics.

Funding

This research was supported by the Fujian Provincial Natural Science Foundation of China (Grant No. 2022J05291) and the Xiamen Scientific Research Funding for Overseas Chinese Scholars.

Author information

Authors and Affiliations

Xiamen University of Technology, 600 Ligong Rd, Jimei District, Xiamen, 361024, Fujian, China
Weisi Chen
The University of New South Wales, Sydney, NSW, 2052, Australia
Ahmad El Majzoub & Fethi A. Rabhi
University of Sharjah, University City Rd, University City, Sharjah, United Arab Emirates
Islam Al-Qudah
Higher Colleges of Technology, Abu Dhabi, United Arab Emirates
Islam Al-Qudah

Corresponding author

Correspondence to Weisi Chen.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, W., El Majzoub, A., Al-Qudah, I. et al. A CEP-driven framework for real-time news impact prediction on financial markets. SOCA 17, 129–144 (2023). https://doi.org/10.1007/s11761-023-00358-8

Received: 02 August 2022
Revised: 21 January 2023
Accepted: 13 February 2023
Published: 01 March 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s11761-023-00358-8
