A Novel Evaluation Framework for Medical LLMs: Combining Fuzzy Logic and MCDM for Medical Relation and Clinical Concept Extraction

  • Original Paper
  • Journal of Medical Systems

Abstract

Artificial intelligence (AI) has become a crucial element of modern technology, especially in the healthcare sector. This is apparent in the continuous development of large language models (LLMs), which are used in various domains, including medicine. However, using these LLMs in the medical domain requires an evaluation platform to determine their suitability and guide future development efforts. To that end, this study develops a comprehensive Multi-Criteria Decision Making (MCDM) approach specifically designed to evaluate medical LLMs. The success of AI, particularly LLMs, in the healthcare domain depends on efficacy, safety, and ethical compliance, so a robust evaluation framework is essential for their integration into medical contexts. This study proposes using the Fuzzy-Weighted Zero-InConsistency (FWZIC) method, extended to the p, q-quasirung orthopair fuzzy set (p, q-QROFS), for weighting the evaluation criteria. This extension enables the handling of uncertainties inherent in medical decision-making processes. By incorporating fuzzy logic principles, the approach accommodates the imprecise and multifaceted nature of real-world medical data and criteria. The MultiAttributive Ideal-Real Comparative Analysis (MAIRCA) method is employed to assess the medical LLMs used in the case study. The results show that the “Medical Relation Extraction” criterion, with its sub-levels, carried more importance (0.504) than “Clinical Concept Extraction” (0.495). Among the six LLM alternatives evaluated, (\(A4\)) “GatorTron S 10B” ranked 1st, whereas (\(A1\)) “GatorTron 90B” ranked 6th. The implications of this study extend beyond academic discourse, directly impacting healthcare practices and patient outcomes. The proposed framework can help healthcare professionals make more informed decisions regarding the adoption and utilization of LLMs in medical settings.
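
To make the ranking step more concrete, the following is a minimal sketch of the classical (crisp) MAIRCA procedure, not the authors' implementation. It assumes criterion weights are already available; the paper derives them with FWZIC under p, q-QROFS, which is not reproduced here. The two weights are the main-criterion values quoted in the abstract, while the function name `mairca_rank`, the three-row score matrix, and the treatment of both criteria as benefit criteria are hypothetical choices made purely for illustration.

```python
def mairca_rank(matrix, weights, benefit):
    """Classical MAIRCA: return (total gap per alternative, indices best-first)."""
    m = len(matrix)      # number of alternatives
    n = len(weights)     # number of criteria
    p_a = 1.0 / m        # equal prior preference for every alternative

    gaps = []
    for row in matrix:
        total_gap = 0.0
        for j in range(n):
            column = [r[j] for r in matrix]
            lo, hi = min(column), max(column)
            # Min-max normalisation; direction depends on benefit vs. cost.
            if hi == lo:
                norm = 1.0
            elif benefit[j]:
                norm = (row[j] - lo) / (hi - lo)
            else:
                norm = (hi - row[j]) / (hi - lo)
            theoretical = p_a * weights[j]    # theoretical evaluation t_p
            real = theoretical * norm         # real evaluation t_r
            total_gap += theoretical - real   # gap g = t_p - t_r
        gaps.append(total_gap)

    # The smallest total gap is closest to the ideal, so it ranks first.
    ranking = sorted(range(m), key=lambda i: gaps[i])
    return gaps, ranking


# Hypothetical scores (rows: alternatives; columns: relation extraction,
# concept extraction). These are NOT the paper's data.
scores = [
    [0.90, 0.80],
    [0.85, 0.95],
    [0.70, 0.75],
]
weights = [0.504, 0.495]   # main-criterion weights quoted in the abstract
benefit = [True, True]     # assumption: higher scores are better

gaps, ranking = mairca_rank(scores, weights, benefit)
print(ranking)
```

With these made-up scores, the second row has the smallest total gap and is therefore ranked first. In MAIRCA a lower gap means the alternative's real evaluation is closer to its theoretical (ideal) evaluation.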

Data Availability

No datasets were generated or analysed during the current study.

Acknowledgements

This work was supported by Tenaga Nasional Berhad (TNB) and UNITEN through the BOLD Refresh Postdoctoral Fellowships under the project code of J510050002-IC-6 BOLDREFRESH2025-Centre of Excellence.

Author information

Contributions

A.H. Alamoodi, Omar Zughoul, and Dianese David: Writing - Original Draft Preparation; Salem Garfan and O.S. Albahri: Conceptualization, Methodology; Dragan Pamucar: Conceptualization; A.S. Albahri: Project Administration; Salman Yussof: Manuscript Revision; Iman Mohamad Sharaf: Writing - Review & Editing.

Corresponding author

Correspondence to A. H. Alamoodi.

Ethics declarations

Ethical Approval

No ethical approval is required for this study.

Ethics approval and consent to participate

All authors of the manuscript consent to participate.

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 28 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Alamoodi, A.H., Zughoul, O., David, D. et al. A Novel Evaluation Framework for Medical LLMs: Combining Fuzzy Logic and MCDM for Medical Relation and Clinical Concept Extraction. J Med Syst 48, 81 (2024). https://doi.org/10.1007/s10916-024-02090-y

  • DOI: https://doi.org/10.1007/s10916-024-02090-y
