DeepM6ASeq-EL: prediction of human N6-methyladenosine (m6A) sites with LSTM and ensemble learning

Juntao Chen¹, Quan Zou¹,² & Jing Li³

Abstract

N6-methyladenosine (m⁶A) is a prevalent methylation modification that plays a vital role in various biological processes, such as metabolism, mRNA processing, synthesis, and transport. Recent studies have suggested that m⁶A modification is related to common diseases such as cancer, tumours, and obesity. Accurate prediction of methylation sites in RNA sequences has therefore emerged as a critical issue in bioinformatics. Traditional high-throughput sequencing and wet-bench experimental techniques are costly, time-consuming, and can identify sites inaccurately. Nevertheless, researchers using these experimental methods have produced many large databases of m⁶A sites. With the support of these databases and existing deep learning methods, we developed an m⁶A site predictor named DeepM6ASeq-EL, which integrates an ensemble of five LSTM and CNN classifiers combined by hard voting. Compared with the state-of-the-art prediction method WHISTLE (average AUC 0.948 and 0.880), DeepM6ASeq-EL had lower accuracy in m⁶A site prediction (average AUC 0.861 for the full transcript models and 0.809 for the mature messenger RNA models) when tested on six independent datasets.
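
The hard-voting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each of the five classifiers has already produced a binary label per candidate site (1 for m⁶A, 0 for non-m⁶A); the function name `hard_vote` and the example labels are hypothetical and are not taken from the DeepM6ASeq-EL code, which may differ in detail.

```python
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Majority vote over binary labels.

    predictions[i][j] is the 0/1 label that classifier i assigns to sample j.
    A sample is labelled 1 only if more than half of the classifiers vote 1.
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same samples")

    n_models = len(predictions)
    # zip(*predictions) regroups the labels sample by sample.
    return [int(2 * sum(votes) > n_models) for votes in zip(*predictions)]


# Hypothetical labels from five classifiers for four candidate sites.
five_models = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
ensemble_labels = hard_vote(five_models)
```

With an odd number of binary classifiers, as here, a tie cannot occur, so no tie-breaking rule is needed.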

Acknowledgements

The work was supported by the National Natural Science Foundation of China (Grant Nos. 61922020, 61771331, 91935302).

Author information

Authors and Affiliations

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610051, China
Juntao Chen & Quan Zou
Hainan Key Laboratory for Computational Science and Application, Hainan Normal University, Haikou, 571158, China
Quan Zou
College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
Jing Li

Corresponding author

Correspondence to Quan Zou.

Additional information

Juntao Chen is a senior student at the University of Electronic Science and Technology of China, China. His current research focuses on sequence classification with deep learning.

Quan Zou received his BS, MS and PhD degrees in computer science from Harbin Institute of Technology, China, in 2004, 2007 and 2009, respectively. He is currently a professor in the Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, China. His research is in the areas of bioinformatics, machine learning and parallel computing. Several of his related works have been published in Science, Briefings in Bioinformatics, Bioinformatics, IEEE/ACM Transactions on Computational Biology and Bioinformatics, etc. He is the editor-in-chief of Current Bioinformatics and an associate editor of IEEE Access, Frontiers in Genetics and Frontiers in Plant Science. He was selected as one of the Clarivate Analytics Highly Cited Researchers in 2018 and 2019.

Jing Li is a graduate student at Tianjin University, China. Her research interest is protein classification.
