Abstract
N6-methyladenosine (m6A) is a prevalent methylation modification that plays a vital role in various biological processes, such as metabolism and mRNA processing, synthesis, and transport. Recent studies have suggested that m6A modification is related to common diseases such as cancer, tumours, and obesity. Accurate prediction of methylation sites in RNA sequences has therefore emerged as a critical issue in bioinformatics. Traditional high-throughput sequencing and wet-bench experimental techniques are costly and time-consuming, and they can identify sites inaccurately. Nevertheless, researchers using these experimental methods have produced many large databases of m6A sites. Building on these databases and existing deep learning methods, we developed an m6A site predictor named DeepM6ASeq-EL, which integrates an ensemble of five LSTM and CNN classifiers combined by hard voting. Compared to the state-of-the-art prediction method WHISTLE (average AUC 0.948 and 0.880), DeepM6ASeq-EL achieved a lower average AUC in m6A site prediction (0.861 for the full transcript models and 0.809 for the mature messenger RNA models) when tested on six independent datasets.
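The hard-voting strategy named in the abstract can be illustrated with a minimal sketch. It is not the DeepM6ASeq-EL implementation: the function name `hard_vote`, the 0.5 decision threshold, and the example probabilities are all assumptions made for illustration. Each of the five classifiers is assumed to output a probability that a candidate site is methylated; each probability is thresholded into a vote, and the majority label wins.

```python
import numpy as np


def hard_vote(probabilities, threshold=0.5):
    """Combine binary classifiers by majority (hard) voting.

    probabilities: array-like of shape (n_classifiers, n_samples), where
    each row holds one classifier's predicted probability of the positive
    class (hypothetical layout, chosen for this sketch).
    Returns an integer array of shape (n_samples,) with labels 0 or 1.
    """
    probs = np.asarray(probabilities, dtype=float)
    if probs.ndim != 2:
        raise ValueError("expected shape (n_classifiers, n_samples)")
    # Each classifier casts one vote per sample.
    votes = (probs >= threshold).astype(int)
    positive_votes = votes.sum(axis=0)
    # A sample is positive when strictly more than half the classifiers agree.
    return (2 * positive_votes > probs.shape[0]).astype(int)


# Made-up outputs from five classifiers on three candidate sites.
example_probs = np.array([
    [0.90, 0.20, 0.60],
    [0.80, 0.40, 0.40],
    [0.70, 0.60, 0.70],
    [0.30, 0.10, 0.20],
    [0.60, 0.30, 0.55],
])
example_labels = hard_vote(example_probs)
print(example_labels)
```

With an odd number of classifiers, as here, a binary majority vote cannot tie, which is one practical reason to prefer an odd ensemble size.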
Acknowledgements
The work was supported by the National Natural Science Foundation of China (Grant Nos. 61922020, 61771331, 91935302).
Author information
Juntao Chen is a senior student at the University of Electronic Science and Technology of China, China. His current research focuses on sequence classification with deep learning.
Quan Zou received his BS, MS and PhD degrees in computer science from Harbin Institute of Technology, China, in 2004, 2007 and 2009, respectively. He is currently a professor in the Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, China. His research is in the areas of bioinformatics, machine learning and parallel computing. Several related works have been published in Science, Briefings in Bioinformatics, Bioinformatics, IEEE/ACM Transactions on Computational Biology and Bioinformatics, etc. He is the editor-in-chief of Current Bioinformatics and an associate editor of IEEE Access, Frontiers in Genetics and Frontiers in Plant Science. He was selected as one of the Clarivate Analytics Highly Cited Researchers in 2018 and 2019.
Jing Li is a graduate student at Tianjin University, China. Her research interest is protein classification.
Cite this article
Chen, J., Zou, Q. & Li, J. DeepM6ASeq-EL: prediction of human N6-methyladenosine (m6A) sites with LSTM and ensemble learning. Front. Comput. Sci. 16, 162302 (2022). https://doi.org/10.1007/s11704-020-0180-0