MKGB: A Medical Knowledge Graph Construction Framework Based on Data Lake and Active Learning

  • Conference paper
Health Information Science (HIS 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13079)

Abstract

Medical knowledge graphs (MKGs) provide ideal technical support for integrating multi-source heterogeneous data and enhancing graph-based services. These multi-source data are usually huge, heterogeneous, and difficult to manage. To ensure a high-quality MKG, constructing one from these data requires many medical experts to annotate the data based on their expertise. At this scale, however, manual annotation carries a high labor cost. In addition, medical data are generated rapidly, so they must be managed and annotated efficiently to keep pace with data accumulation. Prior research lacked efficient data management for massive medical data, and few studies focused on constructing large-scale, high-quality MKGs.

We propose the Medical Knowledge Graph Builder (MKGB), based on Data Lake and active learning, to solve the problems mentioned above. MKGB has four modules: a data acquisition module, a Data Lake-based data management framework module, an active learning module for reducing labor cost, and an MKG construction module. Building on the efficient management of extensive medical data in the Data Lake-based framework, MKGB uses active learning with a doctor-in-the-loop approach to reduce the labor cost of the annotation process while ensuring annotation quality, enabling the construction of large-scale, high-quality MKGs. We demonstrate that our approach significantly reduces the cost of manual annotation and generates a more reliable MKG.
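The doctor-in-the-loop idea can be illustrated with a minimal sketch of one active learning round. The abstract does not say which query strategy MKGB uses, so the sketch assumes least-confidence uncertainty sampling, a common choice; the function names, the `predict` and `ask_expert` callbacks, and the batch size are all hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a model returns a label -> probability map per text.
Probs = Dict[str, float]


def least_confidence(probs: Probs) -> float:
    """Uncertainty score: 1 minus the probability of the most likely label."""
    return 1.0 - max(probs.values())


def select_for_annotation(
    pool: List[str], predict: Callable[[str], Probs], batch_size: int
) -> List[str]:
    """Pick the batch_size texts the model is least sure about."""
    ranked = sorted(
        pool, key=lambda text: least_confidence(predict(text)), reverse=True
    )
    return ranked[:batch_size]


def active_learning_round(
    pool: List[str],
    labeled: List[Tuple[str, str]],
    predict: Callable[[str], Probs],
    ask_expert: Callable[[str], str],
    batch_size: int,
) -> List[str]:
    """Send the most uncertain texts to the expert and return the rest.

    Assumption: ask_expert stands in for a doctor labeling one text.
    Retraining the model on `labeled` would follow each round.
    """
    queries = select_for_annotation(pool, predict, batch_size)
    for text in queries:
        labeled.append((text, ask_expert(text)))
    chosen = set(queries)
    return [text for text in pool if text not in chosen]


if __name__ == "__main__":
    # Toy stand-ins for a trained model and a medical expert.
    toy_probs = {
        "aspirin 100 mg daily": {"DRUG": 0.95, "OTHER": 0.05},
        "pt c/o SOB": {"SYMPTOM": 0.55, "OTHER": 0.45},
        "follow-up in 2 weeks": {"OTHER": 0.80, "SYMPTOM": 0.20},
    }
    labeled: List[Tuple[str, str]] = []
    remaining = active_learning_round(
        list(toy_probs), labeled, toy_probs.__getitem__, lambda t: "SYMPTOM", 1
    )
    print(labeled, remaining)
```

Only the texts the model is least certain about reach the expert, which is how a loop of this kind is meant to cut annotation cost; the remaining pool is re-scored after the model is retrained on the new labels.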

Acknowledgements

This work was supported by the National Key R&D Program of China (2020AAA0109603) and the Institute of Precision Medicine, Tsinghua University.

Author information

Corresponding author

Correspondence to Wei Hou.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Ren, P., Hou, W., Sheng, M., Li, X., Li, C., Zhang, Y. (2021). MKGB: A Medical Knowledge Graph Construction Framework Based on Data Lake and Active Learning. In: Siuly, S., Wang, H., Chen, L., Guo, Y., Xing, C. (eds) Health Information Science. HIS 2021. Lecture Notes in Computer Science, vol 13079. Springer, Cham. https://doi.org/10.1007/978-3-030-90885-0_22

  • DOI: https://doi.org/10.1007/978-3-030-90885-0_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-90884-3

  • Online ISBN: 978-3-030-90885-0

  • eBook Packages: Computer Science, Computer Science (R0)
