Abstract
Artificial intelligence (AI) and machine learning (ML) models are being deployed in many domains of society and have recently reached the field of drug discovery. Given the increasing prevalence of antimicrobial resistance, as well as the challenges intrinsic to antibiotic development, there is an urgent need to accelerate the design of new antimicrobial therapies. Antimicrobial peptides (AMPs) are therapeutic agents for treating bacterial infections, but their translation into the clinic has been slow owing to toxicity, poor stability, limited cellular penetration and high cost, among other issues. Recent advances in AI and ML have led to breakthroughs in our abilities to predict biomolecular properties and structures and to generate new molecules. The ML-based modelling of peptides may overcome some of the disadvantages associated with traditional drug discovery and aid the rapid development and translation of AMPs. Here, we provide an introduction to this emerging field and survey ML approaches that can be used to address issues currently hindering AMP development. We also outline important limitations that can be addressed for the broader adoption of AMPs in clinical practice, as well as new opportunities in data-driven peptide design.
Key points
- Machine learning (ML) can aid antimicrobial peptide (AMP) design and discovery. It can be applied to improve drug efficacy, predict medicinal chemistry and reduce the overall time and cost of drug development.
- ML can be used for the prediction of therapeutic properties (such as antimicrobial efficacy, and absorption, distribution, metabolism, excretion and toxicity (ADMET)) and macromolecular structures.
- Deep generative models are promising approaches to designing new AMPs.
- Important limitations in AMP development include lack of selectivity, undesirable physicochemical and medicinal chemistry properties, unspecific or unknown mechanisms of action, high cost of peptide synthesis, and generation of industrial waste. ML can help to overcome these limitations by applying relevant models trained on high-quality datasets.
References
Fjell, C. D., Hiss, J. A., Hancock, R. E. W. & Schneider, G. Designing antimicrobial peptides: form follows function. Nat. Rev. Drug. Discov. 11, 37–51 (2012).
Yan, J. et al. Recent progress in the discovery and design of antimicrobial peptides using traditional machine learning and deep learning. Antibiotics 11, 1451 (2022).
Silva, O. N. et al. Repurposing a peptide toxin from wasp venom into antiinfectives with dual antimicrobial and immunomodulatory properties. PNAS 117, 26936–26945 (2020).
Magana, M. et al. The value of antimicrobial peptides in the age of resistance. Lancet Infect. Dis. 20, e216–e230 (2020).
Bahar, A. & Ren, D. Antimicrobial peptides. Pharmaceuticals 6, 1543–1575 (2013).
Chen, C. H. & Lu, T. K. Development and challenges of antimicrobial peptides for therapeutic applications. Antibiotics 9, 24 (2020).
Dijksteel, G. S., Ulrich, M. M. W., Middelkoop, E. & Boekema, B. K. H. L. Review: lessons learned from clinical trials using antimicrobial peptides (AMPs). Front. Microbiol. 12, 616979 (2021).
Centers for Disease Control and Prevention (U.S.); National Center for Emerging Zoonotic and Infectious Diseases (U.S.), Division of Healthcare Quality Promotion, Antibiotic Resistance Coordination and Strategy Unit. Antibiotic Resistance Threats in the United States, 2019 CDC https://doi.org/10.15620/cdc:82532 (2019).
Murray, C. J. et al. Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. Lancet 399, 629–655 (2022).
Santos-Júnior, C. D. et al. Computational exploration of the global microbiome for antibiotic discovery. Preprint at bioRxiv https://doi.org/10.1101/2023.08.31.555663 (2023).
Torres, M. D. T. et al. Human gut metagenomic mining reveals an untapped source of peptide antibiotics. Preprint at bioRxiv https://doi.org/10.1101/2023.08.31.555711 (2023).
Maasch, J. R. M. A., Torres, M. D. T., Melo, M. C. R. & de la Fuente-Nunez, C. Molecular de-extinction of ancient antimicrobial peptides enabled by machine learning. Cell Host Microbe 31, 1230–1274.e6 (2023). This study reports the use of machine learning (ML) to mine the proteomes of the archaic humans Neanderthals and Denisovans, leading to the discovery of the first antibiotics in extinct organisms (including Neanderthalin-1) and launching the field of molecular de-extinction.
Wong, F., de la Fuente-Nunez, C. & Collins, J. J. Leveraging artificial intelligence in the fight against infectious diseases. Science 381, 164–170 (2023). This review summarizes state-of-the-art artificial intelligence (AI)/ML approaches to addressing infectious diseases through the lens of biotechnology and medicine.
Ma, Y. et al. Identification of antimicrobial peptides from the human gut microbiome using deep learning. Nat. Biotechnol. 40, 921–931 (2022). This study reports the use of multiple language processing neural network models to identify 181 antimicrobial peptides (AMPs) with antimicrobial activity from the human gut microbiome, three of which were validated in vivo in a mouse model of bacterial lung infection.
Huang, J. et al. Identification of potent antimicrobial peptides via a machine-learning pipeline that mines the entire space of peptide sequences. Nat. Biomed. Eng. 7, 797–810 (2023). This study applies a cascading pipeline consisting of multiple ML modules to identify 54 AMPs with antimicrobial activity from combinatorial peptide space.
Wan, F., Torres, M. D. T., Peng, J. & de la Fuente-Nunez, C. Molecular de-extinction of antibiotics enabled by deep learning. Preprint at bioRxiv https://doi.org/10.1101/2023.10.01.560353 (2023).
Torres, M. D. T. & de la Fuente-Nunez, C. Toward computer-made artificial antibiotics. Curr. Opin. Microbiol. 51, 30–38 (2019). This review outlines the emerging field of antibiotic discovery enabled by computers.
Chen, C. H., Bepler, T., Pepper, K., Fu, D. & Lu, T. K. Synthetic molecular evolution of antimicrobial peptides. Curr. Opin. Biotechnol. 75, 102718 (2022).
Palmer, N., Maasch, J. R. M. A., Torres, M. D. T. & de la Fuente-Nunez, C. Molecular dynamics for antimicrobial peptide discovery. Infect. Immun. 89, e00703-20 (2021).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 (Long and Short Papers) (eds Burstein, J. et al.) 4171–4186 (Association for Computational Linguistics, 2019).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84–90 (2017).
Valeri, J. A. et al. Sequence-to-function deep learning frameworks for engineered riboregulators. Nat. Commun. 11, 5058 (2020).
Angenent-Mari, N. M., Garruss, A. S., Soenksen, L. R., Church, G. & Collins, J. J. A deep learning approach to programmable RNA switches. Nat. Commun. 11, 5057 (2020).
Lee, J. et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 1234–1240 (2020).
Gu, Y. et al. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans. Comput. Healthc. 3, 1–23 (2022).
Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702.e13 (2020).
Jin, W. et al. Deep learning identifies synergistic drug combinations for treating COVID-19. Proc. Natl Acad. Sci. USA 118, e2015070118 (2021).
Wong, F., Omori, S., Donghia, N. M., Zheng, E. J. & Collins, J. J. Discovering small-molecule senolytics with deep neural networks. Nat. Aging 3, 734–750 (2023).
Soenksen, L. R. et al. Using deep learning for dermatologist-level detection of suspicious pigmented skin lesions from wide-field images. Sci. Transl. Med. 13, eabb3652 (2021).
Liu, G. et al. Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii. Nat. Chem. Biol. 19, 1342–1350 (2023).
Zheng, E. J. et al. Discovery of antibiotics that selectively kill metabolically dormant bacteria. Cell Chem. Biol. https://doi.org/10.1016/j.chembiol.2023.10.026 (2023).
Wong, F. et al. Discovery of a structural class of antibiotics with explainable deep learning. Nature 626, 177–185 (2021).
Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).
Kim, H. K. et al. Deep learning improves prediction of CRISPR–Cpf1 guide RNA activity. Nat. Biotechnol. 36, 239–241 (2018).
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15, 1053–1058 (2018).
Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).
Wu, R. et al. High-resolution de novo structure prediction from primary sequence. Preprint at bioRxiv https://doi.org/10.1101/2022.07.21.500999 (2022).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Vamathevan, J. et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug. Discov. 18, 463–477 (2019).
Li, S. et al. MONN: a multi-objective neural network for predicting compound-protein interactions and affinities. Cell Syst. 10, 308–322.e11 (2020).
Ge, Y. et al. An integrative drug repositioning framework discovered a potential therapeutic agent targeting COVID-19. Signal. Transduct. Target. Ther. 6, 165 (2021).
Zhavoronkov, A. et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37, 1038–1040 (2019).
Rajkomar, A. et al. Scalable and accurate deep learning with electronic health records. NPJ Digit. Med. 1, 18 (2018).
Shen, D., Wu, G. & Suk, H.-I. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19, 221–248 (2017).
Melo, M. C. R., Maasch, J. R. M. A. & de la Fuente-Nunez, C. Accelerating antibiotic discovery through artificial intelligence. Commun. Biol. 4, 1050 (2021).
Das, P. et al. Accelerated antimicrobial discovery via deep generative models and molecular dynamics simulations. Nat. Biomed. Eng. 5, 613–623 (2021). This study reports the use of a deep generative autoencoder to generate AMPs that were synthesized and tested for antimicrobial activity in vitro and for toxicity in mice.
Torres, M. D. T. et al. Mining for encrypted peptide antibiotics in the human proteome. Nat. Biomed. Eng. 6, 67–75 (2022). This article reports the exploration of the human proteome as a source of antibiotics, leading to the discovery of thousands of previously unrecognized antimicrobial sequences, and providing a new framework for antibiotic discovery by mining entire proteomes.
Porto, W. F. et al. In silico optimization of a guava antimicrobial peptide enables combinatorial exploration for peptide design. Nat. Commun. 9, 1490 (2018). This article describes an antibiotic molecule designed by a computer, called guavanin 2, which displays anti-infective properties in vivo.
Xu, J. et al. Comprehensive assessment of machine learning-based methods for predicting antimicrobial peptides. Brief Bioinform. 22, bbab083 (2021).
Osorio, D., Rondón-Villarreal, P. & Torres, R. Peptides: a package for data mining of antimicrobial peptides. R J. 7, 4–14 (2015).
van Westen, G. J. et al. Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): comparative study of 13 amino acid descriptor sets. J. Cheminform 5, 41 (2013).
Müller, A. T., Gabernet, G., Hiss, J. A. & Schneider, G. modlAMP: Python for antimicrobial peptides. Bioinformatics 33, 2753–2755 (2017).
Romero‐Molina, S., Ruiz‐Blanco, Y. B., Green, J. R. & Sanchez‐Garcia, E. ProtDCal‐Suite: a web server for the numerical codification and functional analysis of proteins. Protein Sci. 28, 1734–1743 (2019).
Barigye, S. J., Gómez‐Ganau, S., Serrano‐Candelas, E. & Gozalbes, R. PeptiDesCalculator: software for computation of peptide descriptors. Definition, implementation and case studies for 9 bioactivity endpoints. Proteins 89, 174–184 (2021).
Chen, Z. et al. iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences. Bioinformatics 34, 2499–2502 (2018).
Saeys, Y., Inza, I. & Larranaga, P. A review of feature selection techniques in bioinformatics. Bioinformatics 23, 2507–2517 (2007).
Chen, X. et al. Sequence-based peptide identification, generation, and property prediction with deep learning: a review. Mol. Syst. Des. Eng. 6, 406–428 (2021).
Kawashima, S. AAindex: amino acid index database. Nucleic Acids Res. 28, 374–374 (2000).
ElAbd, H. et al. Amino acid encoding for deep learning applications. BMC Bioinform. 21, 235 (2020).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Chung, J., Gülçehre, Ç., Cho, K. & Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. Preprint at arXiv.org/abs/1412.3555 (2014).
Wan, F., Kontogiorgos-Heintz, D. & de la Fuente-Nunez, C. Deep generative models for peptide design. Digit. Discov. 1, 195–208 (2022).
Yan, K., Lv, H., Guo, Y., Peng, W. & Liu, B. sAMPpred-GAT: prediction of antimicrobial peptide by graph attention network and predicted peptide structure. Bioinformatics 39, btac715 (2023).
Ganea, O. et al. GeoMol: torsional geometric generation of molecular 3D conformer ensembles. Adv. Neur. Inf. Process Syst. 34, 13757–13769 (2021).
Jin, W., Wohlwend, J., Barzilay, R. & Jaakkola, T. S. Iterative refinement graph neural network for antibody sequence-structure co-design. In Proc. 10th International Conference on Learning Representations, ICLR 2022 (OpenReview.net, 2022).
Maturana, D. & Scherer, S. VoxNet: a 3D convolutional neural network for real-time object recognition. In Proc. 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 922–928 (IEEE, 2015).
Lecun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
Jiménez, J., Doerr, S., Martínez-Rosell, G., Rose, A. S. & De Fabritiis, G. DeepSite: protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics 33, 3036–3042 (2017).
Jones, D. et al. Improved protein–ligand binding affinity prediction with structure-based deep fusion inference. J. Chem. Inf. Model. 61, 1583–1592 (2021).
Bengio, Y., Courville, A. & Vincent, P. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828 (2013).
Grill, J.-B. et al. Bootstrap your own latent — a new approach to self-supervised learning. Adv. Neur. Inf. Process Syst. 33, 21271–21284 (2020).
Mikolov, T., Chen, K., Corrado, G. & Dean, J. Efficient estimation of word representations in vector space. In Proc. 1st International Conference on Learning Representations, ICLR 2013, Workshop Track (eds Bengio, Y. & LeCun, Y.) (OpenReview.net, 2013).
Brown, T. B. et al. Language models are few-shot learners. Preprint at https://doi.org/10.48550/arXiv.2005.14165 (2020).
Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 (2023).
Alley, E. C., Khimulya, G., Biswas, S., AlQuraishi, M. & Church, G. M. Unified rational protein engineering with sequence-based deep representation learning. Nat. Methods 16, 1315–1322 (2019).
Rong, Y. et al. Self-supervised graph transformer on large-scale molecular data. In NIPS'20: Proc. 34th International Conference on Neural Information Processing Systems (eds Larochelle, H. et al.) 12559–12571 (Curran Assoc., 2020).
Zang, X., Zhao, X. & Tang, B. Hierarchical molecular graph self-supervised learning for property prediction. Commun. Chem. 6, 34 (2023).
Geourjon, C. & Deléage, G. SOPM: a self-optimized method for protein secondary structure prediction. Protein Eng. 7, 157–164 (1994).
Cao, X. et al. PSSP-MVIRT: peptide secondary structure prediction based on a multi-view deep learning architecture. Brief Bioinform. 22, bbab203 (2021).
Peri, S., Steen, H. & Pandey, A. GPMAW – a software tool for analyzing proteins and peptides. Trends Biochem. Sci. 26, 687–689 (2001).
Pereira, J. et al. High‐accuracy protein structure prediction in CASP14. Protein 89, 1687–1699 (2021).
Robin, X. et al. Continuous Automated Model EvaluatiOn (CAMEO) — perspectives on the future of fully automated evaluation of structure prediction methods. Proteins 89, 1977–1986 (2021).
Vaswani, A. et al. Attention is all you need. In NIPS'17: Proc. 31st International Conference on Neural Information Processing Systems (eds Guyon, I. et al.) 6000–6010 (Curran Assoc., 2017).
Berman, H. M. The protein data bank. Nucleic Acids Res. 28, 235–242 (2000).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
McDonald, E. F., Jones, T., Plate, L., Meiler, J. & Gulsevin, A. Benchmarking AlphaFold2 on peptide structure prediction. Structure 31, 111–119.e2 (2023).
Lamiable, A. et al. PEP-FOLD3: faster de novo structure prediction for linear peptides in solution and in complex. Nucleic Acids Res. 44, W449–W454 (2016).
Timmons, P. B. & Hewage, C. M. APPTEST is a novel protocol for the automatic prediction of peptide tertiary structures. Brief Bioinform. 22, bbab308 (2021).
Boaro, A. et al. Structure-function-guided design of synthetic peptides with anti-infective activity derived from wasp venom. Cell Rep. Phys. Sci. 4, 101459 (2023).
Torres, M. D. T. et al. Structure-function-guided exploration of the antimicrobial peptide polybia-CP identifies activity determinants and generates synthetic therapeutic candidates. Commun. Biol. 1, 221 (2018).
Wong, F. et al. Benchmarking‐enabled molecular docking predictions for antibiotic discovery. Mol. Syst. Biol. 18, e11081 (2022).
Luo, S., Shi, C., Xu, M. & Tang, J. Predicting molecular conformation via dynamic graph score matching. Adv. Neur. Inf. Process Syst. 34, 19784–19795 (2021).
Hoogeboom, E. et al. Equivariant diffusion for molecule generation in 3D. Proc. Mach. Learn. Res. 162, 8867–8887 (PMLR, 2022).
Xu, M. et al. GeoDiff: a geometric diffusion model for molecular conformation generation. In Proc. 10th International Conference on Learning Representations, ICLR 2022 (OpenReview.net, 2022).
Mansimov, E., Mahmood, O., Kang, S. & Cho, K. Molecular geometry prediction using a deep generative graph neural network. Sci. Rep. 9, 20381 (2019).
Gogineni, T. et al. TorsionNet: a reinforcement learning approach to sequential conformer search. In NIPS'20: Proc. 34th International Conference on Neural Information Processing Systems (eds Larochelle, H. et al.) 20142–20153 (ACM, 2020).
Janson, G., Valdes-Garcia, G., Heo, L. & Feig, M. Direct generation of protein conformational ensembles via machine learning. Nat. Commun. 14, 774 (2023).
Pirtskhalava, M. et al. DBAASP v3: database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics. Nucleic Acids Res. 49, D288–D297 (2021).
García-Jacas, C. R., Pinacho-Castellanos, S. A., García-González, L. A. & Brizuela, C. A. Do deep learning models make a difference in the identification of antimicrobial peptides? Brief Bioinform. 23, bbac094 (2022).
Sidorczuk, K. et al. Benchmarks in antimicrobial peptide prediction are biased due to the selection of negative data. Brief Bioinform. 23, bbac343 (2022).
Waghu, F. H., Barai, R. S., Gurung, P. & Idicula-Thomas, S. CAMPR3: a database on sequences, structures and signatures of antimicrobial peptides: Table 1. Nucleic Acids Res. 44, D1094–D1097 (2016).
Zhao, X., Wu, H., Lu, H., Li, G. & Huang, Q. LAMP: a database linking antimicrobial peptides. PLoS ONE 8, e66557 (2013).
Witten, J. & Witten, Z. Deep learning regression model for antimicrobial peptide design. Preprint at bioRxiv https://doi.org/10.1101/692681 (2019).
Wang, G., Li, X. & Wang, Z. APD3: the antimicrobial peptide database as a tool for research and education. Nucleic Acids Res. 44, D1087–D1093 (2016).
Meher, P. K., Sahu, T. K., Saini, V. & Rao, A. R. Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou’s general PseAAC. Sci. Rep. 7, 42362 (2017).
Xiao, X., Wang, P., Lin, W.-Z., Jia, J.-H. & Chou, K.-C. iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types. Anal. Biochem. 436, 168–177 (2013).
Fingerhut, L. C. H. W., Miller, D. J., Strugnell, J. M., Daly, N. L. & Cooke, I. R. ampir: an R package for fast genome-wide prediction of antimicrobial peptides. Bioinformatics 36, 5262–5263 (2021).
Santos-Júnior, C. D., Pan, S., Zhao, X.-M. & Coelho, L. P. Macrel: antimicrobial peptide screening in genomes and metagenomes. PeerJ 8, e10555 (2020).
Burdukiewicz, M. et al. Proteomic screening for prediction and design of antimicrobial peptides with AmpGram. Int. J. Mol. Sci. 21, 4310 (2020).
Lawrence, T. J. et al. amPEPpy 1.0: a portable and accurate antimicrobial peptide prediction tool. Bioinformatics 37, 2058–2060 (2021).
Bhadra, P., Yan, J., Li, J., Fong, S. & Siu, S. W. I. AmPEP: sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest. Sci. Rep. 8, 1697 (2018).
Pane, K. et al. Antimicrobial potency of cationic antimicrobial peptides can be predicted from their amino acid composition: application to the detection of ‘cryptic’ antimicrobial peptides. J. Theor. Biol. 419, 254–265 (2017).
Yan, J. et al. Deep-AmPEP30: improve short antimicrobial peptides prediction with deep learning. Mol. Ther. Nucleic Acids 20, 882–894 (2020).
Veltri, D., Kamath, U. & Shehu, A. Deep learning improves antimicrobial peptide recognition. Bioinformatics 34, 2740–2747 (2018).
Xiao, X., Shao, Y.-T., Cheng, X. & Stamatovic, B. iAMP-CA2L: a new CNN-BiLSTM-SVM classifier based on cellular automata image for identifying antimicrobial peptides and their functional types. Brief. Bioinform. 22, bbab209 (2021).
Robles-Loaiza, A. A. et al. Traditional and computational screening of non-toxic peptides and approaches to improving selectivity. Pharmaceuticals 15, 323 (2022).
Plisson, F., Ramírez-Sánchez, O. & Martínez-Hernández, C. Machine learning-guided discovery and design of non-hemolytic peptides. Sci. Rep. 10, 16581 (2020).
Chaudhary, K. et al. A web server and mobile app for computing hemolytic potency of peptides. Sci. Rep. 6, 22843 (2016).
Win, T. S. et al. HemoPred: a web server for predicting the hemolytic activity of peptides. Future Med. Chem. 9, 275–291 (2017).
Hasan, M. M. et al. HLPpred-Fuse: improved and robust prediction of hemolytic peptide and its activity by fusing multiple feature representation. Bioinformatics 36, 3350–3356 (2020).
Gautam, A. et al. Hemolytik: a database of experimentally determined hemolytic and non-hemolytic peptides. Nucleic Acids Res. 42, D444–D449 (2014).
Zakharova, E., Orsi, M., Capecchi, A. & Reymond, J. Machine learning guided discovery of non‐hemolytic membrane disruptive anticancer peptides. ChemMedChem 17, e202200291 (2022).
Timmons, P. B. & Hewage, C. M. HAPPENN is a novel tool for hemolytic activity prediction for therapeutic peptides which employs neural networks. Sci. Rep. 10, 10869 (2020).
Capecchi, A. et al. Machine learning designs non-hemolytic antimicrobial peptides. Chem. Sci. 12, 9221–9232 (2021).
Salem, M., Keshavarzi Arshadi, A. & Yuan, J. S. AMPDeep: hemolytic activity prediction of antimicrobial peptides using transfer learning. BMC Bioinform. 23, 389 (2022).
Gupta, S. et al. In silico approach for predicting toxicity of peptides and proteins. PLoS ONE 8, e73957 (2013).
Sharma, N., Naorem, L. D., Jain, S. & Raghava, G. P. S. ToxinPred2: an improved method for predicting toxicity of proteins. Brief. Bioinform. 23, bbac174 (2022).
Naamati, G., Askenazi, M. & Linial, M. ClanTox: a classifier of short animal toxins. Nucleic Acids Res. 37, W363–W368 (2009).
Wei, L., Ye, X., Sakurai, T., Mu, Z. & Wei, L. ToxIBTL: prediction of peptide toxicity based on information bottleneck and transfer learning. Bioinformatics 38, 1514–1524 (2022).
Wei, L., Ye, X., Xue, Y., Sakurai, T. & Wei, L. ATSE: a peptide toxicity predictor by exploiting structural and evolutionary information based on graph neural network and attention mechanism. Brief. Bioinform. 22, bbab041 (2021).
Zhang, J., Zhang, Z., Pu, L., Tang, J. & Guo, F. AIEpred: an ensemble predictive model of classifier chain to identify anti-inflammatory peptides. IEEE/ACM Trans. Comput. Biol. Bioinform. 18, 1831–1840 (2021).
Khatun, M. S., Hasan, M. M. & Kurata, H. PreAIP: computational prediction of anti-inflammatory peptides by integrating multiple complementary features. Front. Genet. 10, 219 (2019).
Manavalan, B., Shin, T. H., Kim, M. O. & Lee, G. AIPpred: sequence-based prediction of anti-inflammatory peptides using random forest. Front. Pharmacol. 9, 276 (2018).
Gupta, S., Sharma, A. K., Shastri, V., Madhu, M. K. & Sharma, V. K. Prediction of anti-inflammatory proteins/peptides: an insilico approach. J. Transl. Med. 15, 7 (2017).
Gupta, S., Madhu, M. K., Sharma, A. K. & Sharma, V. K. ProInflam: a webserver for the prediction of proinflammatory antigenicity of peptides and proteins. J. Transl. Med. 14, 178 (2016).
Manavalan, B., Shin, T. H., Kim, M. O. & Lee, G. PIP-EL: a new ensemble learning method for improved proinflammatory peptide predictions. Front. Immunol. 9, 1783 (2018).
Khatun, M. S., Hasan, M. M., Shoombuatong, W. & Kurata, H. ProIn-Fuse: improved and robust prediction of proinflammatory peptides by fusing of multiple feature representations. J. Comput. Aided Mol. Des. 34, 1229–1236 (2020).
Boeckmann, B. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31, 365–370 (2003).
Sharma, A., Singla, D., Rashid, M. & Raghava, G. P. S. Designing of peptides with desired half-life in intestine-like environment. BMC Bioinform. 15, 282 (2014).
Yin, S., Ding, F. & Dokholyan, N. V. Eris: an automated estimator of protein stability. Nat. Methods 4, 466–467 (2007).
Persikov, A. V., Ramshaw, J. A. M. & Brodsky, B. Prediction of collagen stability from amino acid sequence. J. Biol. Chem. 280, 19343–19349 (2005).
Wang, F. et al. Advancing oral delivery of biologics: machine learning predicts peptide stability in the gastrointestinal tract. Int. J. Pharm. 634, 122643 (2023).
Mathur, D., Singh, S., Mehta, A., Agrawal, P. & Raghava, G. P. S. In silico approaches for predicting the half-life of natural and modified peptides in blood. PLoS ONE 13, e0196829 (2018).
Cardoso, M. H. et al. Non-lytic antibacterial peptides that translocate through bacterial membranes to act on intracellular targets. Int. J. Mol. Sci. 20, 4877 (2019).
Ho, Y.-H., Shah, P., Chen, Y.-W. & Chen, C.-S. Systematic analysis of intracellular-targeting antimicrobial peptides, bactenecin 7, hybrid of pleurocidin and dermaseptin, proline–arginine-rich peptide, and lactoferricin b, by using Escherichia coli proteome microarrays. Mol. Cell. Proteom. 15, 1837–1847 (2016).
Schissel, C. K. et al. Deep learning to design nuclear-targeting abiotic miniproteins. Nat. Chem. 13, 992–1000 (2021).
Fu, X., Cai, L., Zeng, X. & Zou, Q. StackCPPred: a stacking and pairwise energy content-based prediction of cell-penetrating peptides and their uptake efficiency. Bioinformatics 36, 3028–3034 (2020).
Nasiri, F., Atanaki, F. F., Behrouzi, S., Kavousi, K. & Bagheri, M. CpACpP: in silico cell-penetrating anticancer peptide prediction using a novel bioinformatics framework. ACS Omega 6, 19846–19859 (2021).
Wolfe, J. M. et al. Machine learning to predict cell-penetrating peptides for antisense delivery. ACS Cent. Sci. 4, 512–520 (2018).
Kumar, V. et al. Prediction of cell-penetrating potential of modified peptides containing natural and chemically modified residues. Front. Microbiol. 9, 725 (2018).
Manavalan, B., Subramaniyam, S., Shin, T. H., Kim, M. O. & Lee, G. Machine-learning-based prediction of cell-penetrating peptides and their uptake efficiency with improved accuracy. J. Proteome Res. 17, 2715–2726 (2018).
Sanders, W. S., Johnston, C. I., Bridges, S. M., Burgess, S. C. & Willeford, K. O. Prediction of cell penetrating peptides by support vector machines. PLoS Comput. Biol. 7, e1002101 (2011).
Qiang, X. et al. CPPred-FL: a sequence-based predictor for large-scale identification of cell-penetrating peptides by feature representation learning. Brief. Bioinform 21, 11–23 (2018).
Lei, Y. et al. A deep-learning framework for multi-level peptide–protein interaction prediction. Nat. Commun. 12, 5465 (2021).
Cunningham, J. M., Koytiger, G., Sorger, P. K. & AlQuraishi, M. Biophysical prediction of protein–peptide interactions and signaling networks using machine learning. Nat. Methods 17, 175–183 (2020).
Li, Z., Miao, Q., Yan, F., Meng, Y. & Zhou, P. Machine learning in quantitative protein–peptide affinity prediction: implications for therapeutic peptide design. Curr. Drug. Metab. 20, 170–176 (2019).
Trisciuzzi, D., Siragusa, L., Baroni, M., Cruciani, G. & Nicolotti, O. An integrated machine learning model to spot peptide binding pockets in 3D protein screening. J. Chem. Inf. Model. 62, 6812–6824 (2022).
Wang, R., Jin, J., Zou, Q., Nakai, K. & Wei, L. Predicting protein–peptide binding residues via interpretable deep learning. Bioinformatics 38, 3351–3360 (2022).
Müller, R., Kornblith, S. & Hinton, G. When does label smoothing help? In Proc. 33rd International Conference on Neural Information Processing Systems (eds Wallach, H. M. et al.) 4694–4703 (ACM, 2019).
Imani, E. & White, M. Improving regression performance with distributional losses. In Proc. 35th International Conference on Machine Learning, Vol. 80 (eds Dy, J. G. & Krause, A.) 2162–2171 (PMLR, 2018).
Bekker, J. & Davis, J. Learning from positive and unlabeled data: a survey. Mach. Learn. 109, 719–760 (2020).
Yoshida, M. et al. Using evolutionary algorithms and machine learning to explore sequence space for the discovery of antimicrobial peptides. Chem 4, 533–543 (2018).
Boone, K., Wisdom, C., Camarda, K., Spencer, P. & Tamerler, C. Combining genetic algorithm with machine learning strategies for designing potent antimicrobial peptides. BMC Bioinforma. 22, 239 (2021).
Rezende, D. J., Mohamed, S. & Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. In Proc. 31st International Conference on Machine Learning, Vol. 32 (eds Xing, E. P. & Jebara, T.) 1278–1286 (PMLR, 2014).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. In Proc. 2nd International Conference on Learning Representations, ICLR 2014, Conference Track (eds Bengio, Y. & LeCun, Y.) (OpenReview.net, 2014).
Rezende, D. & Mohamed, S. Variational inference with normalizing flows. In Proc. 32nd International Conference on Machine Learning, Vol. 37 (eds Bach, F. & Blei, D.) 1530–1538 (PMLR, 2015).
Goodfellow, I. et al. Generative adversarial networks. Commun. ACM 63, 139–144 (2020).
Song, Y. et al. Score-based generative modeling through stochastic differential equations. In 9th International Conference on Learning Representations, ICLR 2021 (OpenReview.net, 2021).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. In Proc. 34th Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems 33 (eds Larochelle, H. et al.) (NeurIPS, 2020).
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N. & Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proc. 32nd International Conference on Machine Learning, Vol. 37 (eds Bach, F. & Blei, D.) 2256–2265 (PMLR, 2015).
Müller, A. T., Hiss, J. A. & Schneider, G. Recurrent neural network model for constructive peptide design. J. Chem. Inf. Model. 58, 472–479 (2018).
Nagarajan, D. et al. Computational antimicrobial peptide design and evaluation against multidrug-resistant clinical isolates of bacteria. J. Biol. Chem. 293, 3492–3509 (2018).
Wang, C., Garlick, S. & Zloh, M. Deep learning for novel antimicrobial peptide design. Biomolecules 11, 471 (2021).
Dean, S. N. & Walper, S. A. Variational autoencoder for generation of antimicrobial peptides. ACS Omega 5, 20746–20754 (2020).
Dean, S. N., Alvarez, J. A. E., Zabetakis, D., Walper, S. A. & Malanoski, A. P. PepVAE: variational autoencoder framework for antimicrobial peptide generation and activity prediction. Front. Microbiol. https://doi.org/10.3389/fmicb.2021.725727 (2021).
UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).
Arjovsky, M. & Bottou, L. Towards principled methods for training generative adversarial networks. In Proc. 5th International Conference on Learning Representations, ICLR 2017, Conference Track (OpenReview.net, 2017).
Tucs, A. et al. Generating ampicillin-level antimicrobial peptides with activity-aware generative adversarial networks. ACS Omega 5, 22847–22851 (2020).
Van Oort, C. M., Ferrell, J. B., Remington, J. M., Wshah, S. & Li, J. AMPGAN v2: machine learning-guided design of antimicrobial peptides. J. Chem. Inf. Model. 61, 2198–2207 (2021).
Cao, Q. et al. Designing antimicrobial peptides using deep learning and molecular dynamic simulations. Brief. Bioinform. 24, bbad058 (2023).
Ferrell, J. B. et al. A generative approach toward precision antimicrobial peptide design. Preprint at bioRxiv https://doi.org/10.1101/2020.10.02.324087 (2021).
Shi, C. et al. GraphAF: a flow-based autoregressive model for molecular graph generation. In Proc. 8th International Conference on Learning Representations, ICLR 2020 (OpenReview.net, 2020).
Anand, N. & Achim, T. Protein structure and sequence generation with equivariant denoising diffusion probabilistic models. Preprint at https://doi.org/10.48550/arXiv.2205.15019 (2022).
Coin, I., Beyermann, M. & Bienert, M. Solid-phase peptide synthesis: from standard procedures to the synthesis of difficult sequences. Nat. Protoc. 2, 3247–3256 (2007).
Mueller, L. K., Baumruck, A. C., Zhdanova, H. & Tietze, A. A. Challenges and perspectives in chemical synthesis of highly hydrophobic peptides. Front. Bioeng. Biotechnol. 8, 162 (2020).
Isidro-Llobet, A. et al. Sustainability challenges in peptide synthesis and purification: from R&D to production. J. Org. Chem. 84, 4615–4628 (2019).
Conchillo-Solé, O. et al. AGGRESCAN: a server for the prediction and evaluation of ‘hot spots’ of aggregation in polypeptides. BMC Bioinform. 8, 65 (2007).
Fernandez-Escamilla, A.-M., Rousseau, F., Schymkowitz, J. & Serrano, L. Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat. Biotechnol. 22, 1302–1306 (2004).
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B. & Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Proc. 31st International Conference on Neural Information Processing Systems, 6629–6640 (Curran Assoc., 2017).
Preuer, K., Renz, P., Unterthiner, T., Hochreiter, S. & Klambauer, G. Fréchet ChemNet distance: a metric for generative models for molecules in drug discovery. J. Chem. Inf. Model. 58, 1736–1741 (2018).
Ribeiro, M. T., Singh, S. & Guestrin, C. ‘Why should I trust you?’: explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144 (ACM, 2016).
Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. In Proc. 34th International Conference on Machine Learning, Vol. 70, 3145–3153 (JMLR.org, 2017).
Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In Proc. 34th International Conference on Machine Learning, Vol. 70, 3319–3328 (JMLR.org, 2017).
Jiménez-Luna, J., Grisoni, F. & Schneider, G. Drug discovery with explainable artificial intelligence. Nat. Mach. Intell. 2, 573–584 (2020).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Proc. 31st International Conference on Neural Information Processing Systems, 4768–4777 (Curran Assoc., 2017).
Yuan, H., Yu, H., Gui, S. & Ji, S. Explainability in graph neural networks: a taxonomic survey. IEEE Trans. Pattern Anal. Mach. Intell. 45, 5782–5799 (2023).
Farahani, A., Voghoei, S., Rasheed, K. & Arabnia, H. R. A brief review of domain adaptation. In Advances in Data Science and Information Engineering. Transactions on Computational Science and Computational Intelligence (eds Stahlbock, R. et al.) https://doi.org/10.1007/978-3-030-71704-9_65 (Springer, 2021).
Reffuveille, F., de la Fuente-Núñez, C., Mansour, S. & Hancock, R. E. W. A broad-spectrum antibiofilm peptide enhances antibiotic action against bacterial biofilms. Antimicrob. Agents Chemother. 58, 5363–5371 (2014).
Karniadakis, G. E. et al. Physics-informed machine learning. Nat. Rev. Phys. 3, 422–440 (2021).
Doerr, S. et al. TorchMD: a deep learning framework for molecular simulations. J. Chem. Theory Comput. 17, 2355–2363 (2021).
Husic, B. E. et al. Coarse graining molecular dynamics with graph neural networks. J. Chem. Phys. 153, 194101 (2020).
Omar, S. I., Keasar, C., Ben-Sasson, A. J. & Haber, E. Protein design using physics informed neural networks. Biomolecules 13, 457 (2023).
Ren, P. et al. A comprehensive survey of neural architecture search. ACM Comput. Surv. 54, 1–34 (2022).
He, X., Zhao, K. & Chu, X. AutoML: a survey of the state-of-the-art. Knowl. Based Syst. 212, 106622 (2021).
Valeri, J. A. et al. BioAutoMATED: an end-to-end automated machine learning tool for explanation and design of biological sequences. Cell Syst. 14, 525–542 (2023).
Ferrazzano, L. et al. Sustainability in peptide chemistry: current synthesis and purification technologies and future challenges. Green Chem. 24, 975–1020 (2022).
Acknowledgements
C.d.l.F.-N. holds a Presidential Professorship at the University of Pennsylvania, is a recipient of the Langer Prize by the AIChE Foundation, and acknowledges funding from the IADR Innovation in Oral Care Award, the Procter & Gamble Company, United Therapeutics, a BBRF Young Investigator Grant, the Nemirovsky Prize, Penn Health-Tech Accelerator Award, the Dean’s Innovation Fund from the Perelman School of Medicine at the University of Pennsylvania, the National Institute of General Medical Sciences of the US National Institutes of Health (NIH) under award number R35GM138201, and the Defense Threat Reduction Agency (DTRA; HDTRA11810041, HDTRA1-21-1-0014, and HDTRA1-23-1-0001). We thank K. Pepper for editing the manuscript and de la Fuente Lab members for discussions. F. Wong was supported by the National Institute of Allergy and Infectious Diseases of the NIH under award no. K25AI168451. J.J.C. was supported by the Defense Threat Reduction Agency (grant no. HDTRA12210032), the NIH (grant no. R01-AI146194), and the Broad Institute of MIT and Harvard. This work is part of the Antibiotics-AI Project, which is directed by J.J.C. and supported by the Audacious Project, Flu Lab, LLC, the Sea Grape Foundation, Rosamund Zander and Hansjorg Wyss for the Wyss Foundation, and an anonymous donor.
Author information
Contributions
F. Wan and F. Wong researched and wrote the first manuscript draft. J.J.C. and C.d.l.F.-N. supervised the work. All authors contributed to writing and editing the manuscript.
Ethics declarations
Competing interests
J.J.C. is scientific co-founder and scientific advisory board chair of EnBiotix, an antibiotic drug discovery company, and Phare Bio, a non-profit venture focused on antibiotic drug development. C.d.l.F.-N. provides consulting services to Invaio Sciences and is a member of the Scientific Advisory Boards of Nowture S.L. and Phare Bio. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Reviews Bioengineering thanks Carlos Brizuela, Jun Wang and Monique van Hoek for their contribution to the peer review of this work.
About this article
Cite this article
Wan, F., Wong, F., Collins, J.J. et al. Machine learning for antimicrobial peptide identification and design. Nat Rev Bioeng 2, 392–407 (2024). https://doi.org/10.1038/s44222-024-00152-x
DOI: https://doi.org/10.1038/s44222-024-00152-x