Molecular graph convolutions: moving beyond fingerprints

Steven Kearnes ORCID: orcid.org/0000-0003-4579-4388¹,
Kevin McCloskey²,
Marc Berndl²,
Vijay Pande¹ &
Patrick Riley²

Abstract

Molecular “fingerprints” encoding structural information are the workhorse of cheminformatics and machine learning in drug discovery applications. However, fingerprint representations necessarily emphasize particular aspects of the molecular structure while ignoring others, rather than allowing the model to make data-driven decisions. We describe molecular graph convolutions, a machine learning architecture for learning from undirected graphs, specifically small molecules. Graph convolutions use a simple encoding of the molecular graph—atoms, bonds, distances, etc.—which allows the model to take greater advantage of information in the graph structure. Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.
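To make the idea of a "simple encoding of the molecular graph" concrete, the following is a minimal sketch of one way to represent a small molecule as atom features, a bond matrix, and pairwise graph distances. It is an illustration only, not the featurization used in the paper: the toy molecule (the heavy atoms of ethanol), the three-element vocabulary, and all function names are assumptions introduced here.

```python
from collections import deque

# Hypothetical toy molecule: ethanol, heavy atoms only (C-C-O).
ATOMS = ["C", "C", "O"]
BONDS = [(0, 1, 1.0), (1, 2, 1.0)]  # (atom i, atom j, bond order)
ELEMENTS = ["C", "N", "O"]  # illustrative element vocabulary (an assumption)


def atom_features(atoms, elements=ELEMENTS):
    """Return a one-hot element vector for each atom."""
    return [[1.0 if a == e else 0.0 for e in elements] for a in atoms]


def adjacency(n_atoms, bonds):
    """Return a symmetric matrix of bond orders (the graph is undirected)."""
    adj = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i, j, order in bonds:
        adj[i][j] = order
        adj[j][i] = order
    return adj


def graph_distances(adj):
    """Return shortest path lengths in bonds for every atom pair (BFS).

    Unreachable pairs keep the value -1.
    """
    n = len(adj)
    dist = [[-1] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] > 0 and dist[start][v] < 0:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


def encode(atoms, bonds):
    """Bundle atom features, bond matrix, and graph distances."""
    adj = adjacency(len(atoms), bonds)
    return {
        "atoms": atom_features(atoms),
        "bonds": adj,
        "distances": graph_distances(adj),
    }


encoding = encode(ATOMS, BONDS)
```

Unlike a fixed fingerprint, an encoding of this kind does not decide in advance which substructures matter; a model that consumes it is free to learn which atom and pair features are relevant.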

Acknowledgments

We thank Bharath Ramsundar, Brian Goldman, and Robert McGibbon for helpful discussion. We also acknowledge Manjunath Kudlur, Derek Murray, and Rajat Monga for assistance with TensorFlow. S.K. was supported by internships at Google Inc. and Vertex Pharmaceuticals Inc. Additionally, we acknowledge use of the Stanford BioX3 cluster supported by NIH S10 Shared Instrumentation Grant 1S10RR02664701. S.K. and V.P. also acknowledge support from NIH 5U19AI109662-02.

Author information

Authors and Affiliations

Stanford University, 318 Campus Dr. S296, Stanford, CA, 94305, USA
Steven Kearnes & Vijay Pande
Google Inc., 1600 Amphitheatre Pkwy, Mountain View, CA, 94043, USA
Kevin McCloskey, Marc Berndl & Patrick Riley

Corresponding author

Correspondence to Steven Kearnes.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1207 KB)

About this article

Cite this article

Kearnes, S., McCloskey, K., Berndl, M. et al. Molecular graph convolutions: moving beyond fingerprints. J Comput Aided Mol Des 30, 595–608 (2016). https://doi.org/10.1007/s10822-016-9938-8

Received: 04 March 2016
Accepted: 11 August 2016
Published: 24 August 2016
Issue Date: August 2016
DOI: https://doi.org/10.1007/s10822-016-9938-8
