Abstract
Chemical-induced hematotoxicity is an important concern in drug discovery because it can be fatal. Identifying chemicals that can cause hematotoxicity is therefore valuable. In the present study, we focused on in silico prediction of chemical-induced hematotoxicity with machine learning (ML) and deep learning (DL) methods. We collected a data set containing 632 hematotoxic chemicals and 1525 approved drugs without hematotoxicity. Computational models were built using several machine learning and deep learning algorithms available on the Online Chemical Modeling Environment (OCHEM). A consensus model was developed from the three best individual models. It yielded a prediction accuracy of 0.83 and a balanced accuracy of 0.77 on external validation. The consensus model and the best individual model, developed with the random forest regression and classification algorithm (RFR) and QNPR descriptors, were made available at https://ochem.eu/article/135149. The relationship between eight commonly used molecular properties and chemical-induced hematotoxicity was also investigated. Several of these properties showed a clear differentiating effect between hematotoxic and non-hematotoxic chemicals. In addition, 12 structural alerts responsible for chemical hematotoxicity were identified by frequency analysis of substructures from the Klekota–Roth fingerprint. These results should provide meaningful knowledge and useful tools for hematotoxicity evaluation in drug discovery and environmental risk assessment.
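The difference between the reported accuracy (0.83) and balanced accuracy (0.77) is expected for an imbalanced data set such as this one (632 hematotoxic versus 1525 non-hematotoxic compounds). The following minimal sketch illustrates the idea. It assumes the consensus is an unweighted average of the predicted probabilities of three models with a 0.5 decision threshold, which may differ from how OCHEM actually combines models. All function names and the toy probabilities are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def consensus_probability(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted mean of the per-compound probabilities from each model."""
    n_models = len(model_probs)
    return [sum(per_compound) / n_models for per_compound in zip(*model_probs)]


def to_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a compound hematotoxic (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (toxic recall) and specificity (non-toxic recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy external set (hypothetical): 3 hematotoxic (1) and 7 non-hematotoxic (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical probabilities from three individual models.
model_a = [0.9, 0.6, 0.3, 0.2, 0.1, 0.4, 0.7, 0.2, 0.3, 0.1]
model_b = [0.8, 0.4, 0.2, 0.3, 0.2, 0.3, 0.4, 0.1, 0.2, 0.2]
model_c = [0.7, 0.8, 0.4, 0.1, 0.3, 0.2, 0.6, 0.3, 0.1, 0.3]

consensus = consensus_probability([model_a, model_b, model_c])
y_pred = to_labels(consensus)

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f}, balanced accuracy={bal_acc:.2f}")
```

In this toy case one of three toxic compounds is missed and one of seven non-toxic compounds is misclassified, so plain accuracy (0.80) is higher than balanced accuracy (about 0.76). The majority class dominates the former, while the latter weights both classes equally.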
Acknowledgements
This work was supported by the National Natural Science Foundation of China (grant 81803433). The authors gratefully acknowledge the encouragement and support from Miss Chaoyue Yang.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hua, Y., Shi, Y., Cui, X. et al. In silico prediction of chemical-induced hematotoxicity with machine learning and deep learning methods. Mol Divers 25, 1585–1596 (2021). https://doi.org/10.1007/s11030-021-10255-x
DOI: https://doi.org/10.1007/s11030-021-10255-x