Abstract
A medical knowledge graph (MKG) provides ideal technical support for integrating multi-source heterogeneous data and enhancing graph-based services. These multi-source data are usually huge, heterogeneous, and difficult to manage. To ensure that the resulting MKG is of high quality, constructing it from these data requires many medical experts to annotate the data based on their expertise. However, given such a large volume of data, manual annotation is a labor-intensive and costly task. In addition, medical data are generated rapidly, so they must be managed and annotated efficiently to keep pace with data accumulation. Prior research lacked efficient data management for massive medical data, and few studies have focused on constructing large-scale, high-quality MKGs.
We propose a Medical Knowledge Graph Builder (MKGB) based on Data Lake and active learning to solve these problems. MKGB consists of four modules: a data acquisition module, a data management framework module based on Data Lake, an active learning module that reduces labor cost, and an MKG construction module. Building on the efficient management of extensive medical data provided by the Data Lake-based framework, MKGB applies active learning based on a doctor-in-the-loop idea to reduce the labor cost of the annotation process while preserving annotation quality, which enables the construction of a large-scale, high-quality MKG. We demonstrate that our approach significantly reduces the cost of manual annotation and generates a more reliable MKG.
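The abstract does not state which query strategy MKGB's active learning module uses. As a rough illustration of how a doctor-in-the-loop cycle can cut annotation cost, the sketch below assumes least-confidence uncertainty sampling over a model that returns per-token tag probabilities (a common setup for sequence labeling such as medical named entity recognition). Only the sentences the model is least sure about are sent to a doctor; the rest stay unlabeled. All names here (`select_for_annotation`, `active_learning_loop`, `train`, `ask_doctor`) are hypothetical and are not taken from MKGB.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types: a per-token distribution over tags, e.g. {"B-Disease": 0.7, "O": 0.3},
# and a predictor that maps a tokenized sentence to one such distribution per token.
TokenDist = Dict[str, float]
Predictor = Callable[[Sequence[str]], List[TokenDist]]
Example = Tuple[List[str], List[str]]  # (tokens, tags)


def least_confidence(dists: List[TokenDist]) -> float:
    """Sentence uncertainty: 1 minus the mean probability of the top tag per token.

    A higher score means the model is less sure, so the sentence is a better use of a doctor's time.
    """
    if not dists:
        return 0.0
    top = [max(d.values()) for d in dists]
    return 1.0 - sum(top) / len(top)


def select_for_annotation(pool: List[List[str]], predict: Predictor, k: int) -> List[int]:
    """Return indices of the k most uncertain unlabeled sentences (ties broken by index)."""
    scored = [(least_confidence(predict(sent)), i) for i, sent in enumerate(pool)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # most uncertain first
    return [i for _, i in scored[:k]]


def active_learning_loop(
    pool: List[List[str]],
    train: Callable[[List[Example]], Predictor],
    ask_doctor: Callable[[List[str]], List[str]],
    seed: List[Example],
    k: int,
    rounds: int,
) -> List[Example]:
    """Hypothetical doctor-in-the-loop cycle: retrain, query the k most uncertain, annotate, repeat."""
    labeled = list(seed)
    unlabeled = list(pool)
    for _ in range(rounds):
        if not unlabeled:
            break
        predict = train(labeled)  # retrain on everything labeled so far
        chosen = set(select_for_annotation(unlabeled, predict, k))
        for i in sorted(chosen):
            labeled.append((unlabeled[i], ask_doctor(unlabeled[i])))  # expert supplies gold tags
        unlabeled = [s for i, s in enumerate(unlabeled) if i not in chosen]
    return labeled
```

With a toy predictor that is confident only on the token "fever", the pool `[["fever"], ["cough"], ["fever", "cough"]]` with `k=2` sends `["cough"]` and `["fever", "cough"]` to the doctor first, while the already well-predicted `["fever"]` waits. The per-round budget `k` is the knob that trades annotation cost against how quickly the model improves.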
Acknowledgements
This work was supported by National Key R&D Program of China (2020AAA0109603), and Institute of Precision Medicine, Tsinghua University.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ren, P., Hou, W., Sheng, M., Li, X., Li, C., Zhang, Y. (2021). MKGB: A Medical Knowledge Graph Construction Framework Based on Data Lake and Active Learning. In: Siuly, S., Wang, H., Chen, L., Guo, Y., Xing, C. (eds) Health Information Science. HIS 2021. Lecture Notes in Computer Science, vol. 13079. Springer, Cham. https://doi.org/10.1007/978-3-030-90885-0_22
DOI: https://doi.org/10.1007/978-3-030-90885-0_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90884-3
Online ISBN: 978-3-030-90885-0
eBook Packages: Computer Science, Computer Science (R0)