A Novel Evaluation Framework for Medical LLMs: Combining Fuzzy Logic and MCDM for Medical Relation and Clinical Concept Extraction

A. H. Alamoodi^1,2,3,
Omar Zughoul^4,
Dianese David^5,
Salem Garfan^5,
Dragan Pamucar^6,7,8,
O. S. Albahri^9,10,
A. S. Albahri^11,12,
Salman Yussof^1,13 &
Iman Mohamad Sharaf^14


Abstract

Artificial intelligence (AI) has become a crucial element of modern technology, especially in healthcare, as the continuous development of large language models (LLMs), now used in many domains including medicine, makes apparent. However, using LLMs in the medical domain requires an evaluation platform that can determine their suitability and guide future development. To address this need, this study develops a comprehensive Multi-Criteria Decision Making (MCDM) approach designed specifically to evaluate medical LLMs. The success of AI, and of LLMs in particular, in healthcare depends on their efficacy, safety, and ethical compliance, so a robust evaluation framework is essential before they are integrated into medical contexts. This study proposes the Fuzzy-Weighted Zero-InConsistency (FWZIC) method, extended to the p, q-quasirung orthopair fuzzy set (p, q-QROFS), for weighting the evaluation criteria. This extension handles the uncertainty inherent in medical decision-making, and by incorporating fuzzy logic principles the approach accommodates the imprecise and multifaceted nature of real-world medical data and criteria. The Multi-Attributive Ideal-Real Comparative Analysis (MAIRCA) method is then used to assess the medical LLMs in the study's case study. The results show that the "Medical Relation Extraction" criterion, together with its sub-criteria, received a slightly higher weight (0.504) than "Clinical Concept Extraction" (0.495). Among the six LLMs evaluated, "GatorTron S 10B" (A4) ranked first, whereas "GatorTron 90B" (A1) ranked sixth. The implications of this study extend beyond academic discourse to healthcare practice and patient outcomes.
The proposed framework can help healthcare professionals make more informed decisions regarding the adoption and use of LLMs in medical settings.
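The abstract describes two stages: FWZIC to weight the criteria and MAIRCA to rank the alternatives. The following is a minimal sketch of both stages in crisp (non-fuzzy) form, assuming the commonly described steps: for FWZIC, each expert's importance scores are divided by that expert's total and the ratios are averaged per criterion; for MAIRCA, every alternative gets an equal a priori preference, real ratings are linearly normalised against the best and worst values of each criterion, and alternatives are ranked by their total gap to the theoretical ideal. It does not implement the p, q-QROFS extension used in the study, and the function names, expert scores, and alternative scores are hypothetical illustrations, not the study's data or code.

```python
from typing import List, Sequence


def fwzic_weights(ratings: Sequence[Sequence[float]]) -> List[float]:
    """Crisp FWZIC-style criterion weights (no fuzzification step).

    ratings[e][j] is expert e's importance score for criterion j,
    e.g. on a 1-5 Likert scale. Each expert's row is divided by its
    own sum, and criterion j's weight is the mean of those ratios
    across experts, renormalised so the weights sum to 1.
    """
    n_crit = len(ratings[0])
    ratios = []
    for row in ratings:
        total = sum(row)
        if total <= 0:
            raise ValueError("each expert needs at least one positive score")
        ratios.append([score / total for score in row])
    means = [sum(r[j] for r in ratios) / len(ratios) for j in range(n_crit)]
    scale = sum(means)  # already 1 up to rounding; kept to mirror the formula
    return [m / scale for m in means]


def mairca_gaps(matrix: Sequence[Sequence[float]],
                weights: Sequence[float],
                benefit: Sequence[bool]) -> List[float]:
    """Total gap Q_i per alternative; a smaller gap is closer to the ideal."""
    p_a = 1.0 / len(matrix)  # equal a priori preference for each alternative
    columns = list(zip(*matrix))
    gaps = []
    for row in matrix:
        q = 0.0
        for j, w in enumerate(weights):
            lo, hi = min(columns[j]), max(columns[j])
            t_p = p_a * w  # theoretical (ideal) rating
            if hi == lo:
                t_r = t_p  # criterion does not discriminate: no gap
            elif benefit[j]:
                t_r = t_p * (row[j] - lo) / (hi - lo)  # benefit criterion
            else:
                t_r = t_p * (row[j] - hi) / (lo - hi)  # cost criterion
            q += t_p - t_r  # gap between theoretical and real rating
        gaps.append(q)
    return gaps


def ranks_from_gaps(gaps: Sequence[float]) -> List[int]:
    """1-based ranks, where rank 1 has the smallest total gap."""
    order = sorted(range(len(gaps)), key=lambda i: gaps[i])
    ranks = [0] * len(gaps)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


if __name__ == "__main__":
    # Hypothetical importance scores from three experts for two criteria.
    expert_scores = [[5, 4], [4, 4], [5, 5]]
    w = fwzic_weights(expert_scores)
    # Hypothetical benefit-type scores for three alternatives.
    scores = [[0.80, 0.90], [0.90, 0.85], [0.70, 0.80]]
    q = mairca_gaps(scores, w, [True, True])
    print("weights:", [round(x, 4) for x in w])
    print("gaps:", [round(x, 4) for x in q])
    print("ranks:", ranks_from_gaps(q))
```

With these illustrative inputs the first criterion receives a weight of 14/27 (about 0.519) and the ranks are [2, 1, 3], so the second alternative comes first. Because no pairwise comparisons are made, the weighting step cannot produce an inconsistency ratio, which is the "zero-inconsistency" property the method's name refers to. A fuzzy variant would replace the crisp scores with p, q-QROF numbers and apply a defuzzification or score function before ranking; that step is not shown here.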


Data Availability

No datasets were generated or analysed during the current study.


Acknowledgements

This work was supported by Tenaga Nasional Berhad (TNB) and UNITEN through the BOLD Refresh Postdoctoral Fellowships under the project code of J510050002-IC-6 BOLDREFRESH2025-Centre of Excellence.

Author information

Authors and Affiliations

Institute of Informatics and Computing in Energy, Universiti Tenaga Nasional, Kajang, Malaysia
A. H. Alamoodi & Salman Yussof
Applied Science Research Center, Applied Science Private University, Amman, Jordan
A. H. Alamoodi
MEU Research Unit, Middle East University, Amman, Jordan
A. H. Alamoodi
Information Systems and Computer Science Department, Ahmed bin Mohammed Military College, Al-Shahaniya, Qatar
Omar Zughoul
Faculty of Computing and Meta-Technology (FKMT), Universiti Pendidikan Sultan Idris (UPSI), Perak, Malaysia
Dianese David & Salem Garfan
Széchenyi István University, Győr, Hungary
Dragan Pamucar
Department of Industrial Engineering & Management, Yuan Ze University, Taoyuan, 320315, Taiwan
Dragan Pamucar
Department of Mechanics and Mathematics, Western Caspian University, Baku, Azerbaijan
Dragan Pamucar
Australian Technical and Management College, Melbourne, Australia
O. S. Albahri
Computer Techniques Engineering Department, Mazaya University College, Nasiriyah, Iraq
O. S. Albahri
Technical College, Imam Ja’afar Al-Sadiq University, Baghdad, Iraq
A. S. Albahri
Iraqi Commission for Computers and Informatics (ICCI), Baghdad, Iraq
A. S. Albahri
Department of Computing, College of Computing and Informatics, Universiti Tenaga Nasional, Kajang, Malaysia
Salman Yussof
Department of Basic Sciences, Higher Technological Institute, Tenth of Ramadan City, Egypt
Iman Mohamad Sharaf


Contributions

A.H. Alamoodi, Omar Zughoul, and Dianese David: Writing (original draft preparation); Salem Garfan and O.S. Albahri: Conceptualization, Methodology; Dragan Pamucar: Conceptualization; A.S. Albahri: Project Administration; Salman Yussof: Manuscript Revision; Iman Mohamad Sharaf: Writing (review and editing).

Corresponding author

Correspondence to A. H. Alamoodi.

Ethics declarations

Ethical Approval

No ethical approval is required for this study.

Ethics approval and consent to participate

All authors of the manuscript consent to participate.

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 28 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Alamoodi, A.H., Zughoul, O., David, D. et al. A Novel Evaluation Framework for Medical LLMs: Combining Fuzzy Logic and MCDM for Medical Relation and Clinical Concept Extraction. J Med Syst 48, 81 (2024). https://doi.org/10.1007/s10916-024-02090-y


Received: 31 May 2024
Accepted: 22 July 2024
Published: 31 August 2024
DOI: https://doi.org/10.1007/s10916-024-02090-y
