
DeepM6ASeq-EL: prediction of human N6-methyladenosine (m6A) sites with LSTM and ensemble learning

  • Research Article
  • Frontiers of Computer Science

Abstract

N6-methyladenosine (m6A) is a prevalent RNA methylation modification that plays a vital role in various biological processes, such as metabolism, mRNA processing, synthesis, and transport. Recent studies have suggested that m6A modification is associated with common diseases such as cancer, tumours, and obesity. Accurate prediction of methylation sites in RNA sequences has therefore become a critical problem in bioinformatics. However, traditional high-throughput sequencing and wet-bench experimental techniques are costly and time-consuming, and they can identify sites inaccurately. Nevertheless, these experimental methods have produced many large databases of m6A sites. Building on these databases and on existing deep learning methods, we developed an m6A site predictor named DeepM6ASeq-EL, which integrates an ensemble of five LSTM and CNN classifiers combined by hard voting. Compared with the state-of-the-art prediction method WHISTLE (average AUC: 0.948 and 0.880), DeepM6ASeq-EL achieved lower performance in m6A site prediction (average AUC: 0.861 for the full transcript models and 0.809 for the mature messenger RNA models) when tested on six independent datasets.
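The two evaluation ideas named in the abstract, combining five binary classifiers by hard voting and scoring predictions with AUC, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `hard_vote` and `roc_auc` are hypothetical, each base classifier is assumed to emit a 0/1 label per candidate site, and AUC is computed with the standard rank (Mann-Whitney) formulation. Because hard voting yields labels rather than scores, the abstract does not say how a continuous score for AUC is derived; the scores in the usage example are arbitrary placeholders.

```python
from typing import Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Majority (hard) vote over binary labels.

    predictions[i][j] is the 0/1 label that classifier i assigns to site j.
    With an odd number of classifiers (e.g. five), ties cannot occur.
    """
    n_classifiers = len(predictions)
    if n_classifiers == 0:
        raise ValueError("need at least one classifier")
    n_sites = len(predictions[0])
    if any(len(p) != n_sites for p in predictions):
        raise ValueError("all classifiers must label the same sites")
    # A site is called m6A (1) when more than half of the classifiers say so.
    return [
        int(sum(p[j] for p in predictions) * 2 > n_classifiers)
        for j in range(n_sites)
    ]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random positive outranks a random negative.

    Tied scores count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Five hypothetical classifiers labelling four candidate sites.
    votes = [
        [1, 0, 1, 1],
        [1, 0, 0, 1],
        [0, 0, 1, 1],
        [1, 1, 0, 1],
        [1, 0, 1, 0],
    ]
    print(hard_vote(votes))  # [1, 0, 1, 1]
    # Placeholder scores; not derived from the paper.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

Using an odd ensemble size is what makes the strict "more than half" rule tie-free; with an even number of classifiers a tie-breaking policy would have to be chosen explicitly.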



Acknowledgements

The work was supported by the National Natural Science Foundation of China (Grant Nos. 61922020, 61771331, 91935302).

Author information

Correspondence to Quan Zou.

Additional information

Juntao Chen is a senior student at the University of Electronic Science and Technology of China, China. His current research focuses on sequence classification with deep learning.

Quan Zou received his BS, MS and PhD degrees in computer science from Harbin Institute of Technology, China, in 2004, 2007 and 2009, respectively. He is currently a professor at the Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, China. His research interests are bioinformatics, machine learning and parallel computing. His related work has been published in Science, Briefings in Bioinformatics, Bioinformatics, IEEE/ACM Transactions on Computational Biology and Bioinformatics, and other journals. He is the editor-in-chief of Current Bioinformatics and an associate editor of IEEE Access, Frontiers in Genetics and Frontiers in Plant Science. He was named a Clarivate Analytics Highly Cited Researcher in 2018 and 2019.

Jing Li is a graduate student at Tianjin University, China. Her research interest is protein classification.


About this article


Cite this article

Chen, J., Zou, Q. & Li, J. DeepM6ASeq-EL: prediction of human N6-methyladenosine (m6A) sites with LSTM and ensemble learning. Front. Comput. Sci. 16, 162302 (2022). https://doi.org/10.1007/s11704-020-0180-0


  • DOI: https://doi.org/10.1007/s11704-020-0180-0
