Symmetry-Corrected RMSD Calculations in Python
Abstract
Root mean square displacement (RMSD) calculations play a fundamental role in the comparison of different conformers of the same ligand. This is particularly important in the evaluation of protein-ligand docking, where different ligand poses are generated by docking software and their quality is usually assessed by RMSD calculations. Unfortunately, many RMSD calculation tools do not take the symmetry of the molecule into account, are difficult to integrate seamlessly into cheminformatics and machine learning pipelines (which are often written in Python), or are shipped within large code bases. Here we present a new open-source RMSD calculation tool written in Python, designed to be extremely lightweight and easy to integrate into existing software.
Keywords: RMSD, Symmetry, Software, Python
© The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and
the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material
in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material
is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the
permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Meli and Biggin J Cheminform (2020) 12:49 Page 2 of 7
Standard RMSD
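Let us call A and B the N × 3 matrices of atomic coordinates of two conformers A and B of the same molecule. The standard RMSD is simply defined as
$$
\mathrm{RMSD}_{\mathrm{standard}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \sum_{j=0}^{2} \left(A_{ij} - B_{ij}\right)^2}\,,
$$
where i runs over the N atoms, j runs over the three Cartesian coordinates, and atom i of A is paired with atom i of B.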
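The definition translates directly into NumPy. The following is a minimal sketch, not spyrmsd's implementation; the function name rmsd_standard is hypothetical, and it assumes the rows of A and B are already in corresponding atom order and that the two conformers are compared in place, without any superposition.

```python
import numpy as np


def rmsd_standard(A, B):
    """Standard (not symmetry-corrected) RMSD between two N x 3 coordinate arrays.

    Atom i of A is paired with atom i of B; no superposition is performed.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.shape != B.shape or A.ndim != 2 or A.shape[1] != 3:
        raise ValueError("A and B must both have shape (N, 3)")
    n_atoms = A.shape[0]
    # Sum of squared differences over atoms i and coordinates j, divided by N
    return float(np.sqrt(np.sum((A - B) ** 2) / n_atoms))


# Two conformers of a hypothetical three-atom fragment: only the last atom
# moves, by 3 Å along x
A = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
B = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
print(rmsd_standard(A, B))  # sqrt(9 / 3), about 1.732
```

Because atoms are paired strictly by index, this value depends on the atom ordering in the two inputs, which is exactly the limitation that a symmetry correction addresses.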
Speed
By design, spyrmsd is written fully in Python and leverages fast libraries that are easy to install (using the pip or conda package managers). This means that there is some overhead compared to the most efficient implementations in compiled languages.
Figure 6 shows a speed comparison between spyrmsd and obrms for 100 randomly selected systems. Error bars are obtained by repeating the measurements 25 times. spyrmsd is usually comparable to obrms or up to an order of magnitude slower. This is expected, since Python comes with some overhead compared to compiled code. The difference between the graph-tool and networkx backends is more difficult to elucidate: graph-tool seems to be generally slightly faster, but networkx clearly shows more variation from system to system (see Fig. 7). Benchmarking was performed on an Apple MacBook Pro (macOS 10.15) with a 2.6 GHz 6-Core Intel Core i7 processor and 32 GB of 2400 MHz DDR4 memory (Additional file 2).
Fig. 4 Comparison between obrms and spyrmsd for symmetry-corrected RMSD calculations. The mean squared error is 3.28 × 10⁻¹² while the Pearson’s correlation coefficient is 1.00. The maximum absolute error is 5.00 × 10⁻⁶ amongst all 40439 systems tested.
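The three agreement statistics quoted in the Fig. 4 caption (mean squared error, Pearson’s correlation coefficient and maximum absolute error) can be computed from paired RMSD values as in the minimal sketch that follows. The helper name agreement_metrics and the example values are hypothetical; the sketch assumes the results of the two tools are already matched system by system, and it is not the script used to produce the figure.

```python
import numpy as np


def agreement_metrics(rmsd_a, rmsd_b):
    """Agreement between RMSD values computed by two tools for the same systems.

    Returns (mean squared error, Pearson's r, maximum absolute error).
    """
    a = np.asarray(rmsd_a, dtype=float)
    b = np.asarray(rmsd_b, dtype=float)
    if a.shape != b.shape or a.size < 2:
        raise ValueError("need two equally long sequences with at least two values")
    diff = a - b
    mse = float(np.mean(diff**2))
    max_abs_error = float(np.max(np.abs(diff)))
    # np.corrcoef returns the 2 x 2 correlation matrix; the off-diagonal entry is r
    pearson_r = float(np.corrcoef(a, b)[0, 1])
    return mse, pearson_r, max_abs_error


# Hypothetical RMSD values (in Å) from two tools; they differ only for the last system
mse, r, max_err = agreement_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.5])
print(mse, max_err)  # 0.0625 0.5
```

Reporting the maximum absolute error alongside the MSE matters because a single badly disagreeing system can hide behind a tiny average over tens of thousands of systems.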
Discussion
Despite being somewhat slower than other state-of-the-art tools for RMSD calculation, we believe that spyrmsd could be extremely useful to the community: it is a lightweight tool with focussed functionality, it is easy to use and to integrate into existing Python codebases and pipelines, and it is easy to install via popular package managers.
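The idea behind a symmetry-corrected RMSD can be illustrated with a minimal sketch that uses the kind of inputs this paper refers to: coordinates, atomic numbers, and an adjacency matrix built from bond connectivity. It evaluates the standard RMSD for every atom permutation that preserves atomic numbers and bonds (an isomorphism of the molecular graph onto itself) and keeps the minimum. All function names are hypothetical and this is not spyrmsd's API. In particular, the isomorphism search here is a brute-force enumeration of permutations, swapped in for the networkx or graph-tool backends, and is only feasible for very small molecules. The two conformers are assumed to be already in the same frame (no superposition).

```python
import itertools

import numpy as np


def adjacency_from_bonds(n_atoms, bonds):
    """Build a symmetric 0/1 adjacency matrix from a list of (i, j) bonds."""
    adj = np.zeros((n_atoms, n_atoms), dtype=int)
    for i, j in bonds:
        adj[i, j] = adj[j, i] = 1
    return adj


def graph_isomorphisms(atomic_numbers, adj):
    """Yield atom permutations that preserve both atomic numbers and bonds.

    Brute force over all N! permutations: only practical for a handful of atoms.
    """
    n = len(atomic_numbers)
    for perm in itertools.permutations(range(n)):
        p = list(perm)
        same_elements = all(atomic_numbers[p[i]] == atomic_numbers[i] for i in range(n))
        # adj[np.ix_(p, p)][i, j] == adj[p[i], p[j]], so bonds must map onto bonds
        if same_elements and np.array_equal(adj[np.ix_(p, p)], adj):
            yield p


def rmsd_symmetry_corrected(A, B, atomic_numbers, adj):
    """Minimum standard RMSD over all symmetry-equivalent atom orderings of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    best = np.inf
    for p in graph_isomorphisms(atomic_numbers, adj):
        # Atom i of A is compared with atom p[i] of B
        rmsd = np.sqrt(np.sum((A - B[p]) ** 2) / len(A))
        best = min(best, rmsd)
    return float(best)


# Hypothetical O=C=O fragment: both conformers have the same geometry,
# but the two equivalent oxygen atoms are listed in opposite order in B
atomic_numbers = [8, 6, 8]
adj = adjacency_from_bonds(3, [(0, 1), (1, 2)])
A = np.array([[-1.2, 0.0, 0.0], [0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])
B = np.array([[1.2, 0.0, 0.0], [0.0, 0.0, 0.0], [-1.2, 0.0, 0.0]])

print(rmsd_symmetry_corrected(A, B, atomic_numbers, adj))  # 0.0
```

Pairing atoms strictly by index (the identity permutation) would give an RMSD of about 1.96 Å for these two identical geometries. Allowing the swap of the equivalent oxygens recovers the expected value of zero. Since the identity is always among the permutations tried, the symmetry-corrected value can never exceed the standard one.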
Easy installation
spyrmsd is available on the Python Package Index (PyPI) [24] and via the conda package manager [25] on the conda-forge channel [26]. This provides easy cross-platform installation of spyrmsd and all its dependencies to work as a library (with networkx). On macOS and Linux, users can get some speed improvement by installing graph-tool, which is also available via the conda package manager.
In order to use spyrmsd as a standalone tool, users will have to install either OpenBabel or RDKit with their preferred installation method.
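The installation notes distinguish a minimal library setup (networkx), an optional and faster backend (graph-tool), and the extra toolkits needed for standalone use (OpenBabel or RDKit). One way to check what a given environment provides is sketched here with importlib.util.find_spec, which locates a module without importing it. The helper names and the preference order are illustrative assumptions and do not describe how spyrmsd itself selects a backend. The import names graph_tool, networkx, openbabel and rdkit are the usual top-level modules of those packages.

```python
import importlib.util

# Display name -> top-level import name (assumed; not taken from spyrmsd)
GRAPH_BACKENDS = {"graph-tool": "graph_tool", "networkx": "networkx"}
IO_TOOLKITS = {"OpenBabel": "openbabel", "RDKit": "rdkit"}


def installed(packages):
    """Return the display names of packages whose top-level module can be found."""
    return [name for name, module in packages.items()
            if importlib.util.find_spec(module) is not None]


def preferred_backend():
    """Prefer graph-tool when present, otherwise networkx; None if neither is found."""
    found = installed(GRAPH_BACKENDS)
    for name in ("graph-tool", "networkx"):
        if name in found:
            return name
    return None


if __name__ == "__main__":
    print("Graph backends:", installed(GRAPH_BACKENDS) or "none")
    print("Preferred backend:", preferred_backend())
    print("Standalone I/O toolkits:", installed(IO_TOOLKITS) or "none (library use only)")
```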
Fig. 6 RMSD calculation time (including input) for 100 randomly selected systems. Error bars indicate the standard deviation over 25 repeats. spyrmsd is comparable to obrms or up to an order of magnitude slower.
Fig. 7 RMSD calculation time (without input) for 100 randomly selected systems. Error bars indicate the standard deviation over 25 repeats.
networkx shows a large variability between systems, while graph-tool is more consistent
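Error bars such as those in Figs. 6 and 7 (mean time and standard deviation over 25 repeats) can be obtained with the standard library alone. The following is a minimal sketch, not the benchmarking script from Additional file 2; time_with_error_bars and toy_workload are hypothetical names. The workload stands in for one RMSD calculation and could either include reading the input files (as in Fig. 6) or exclude it (as in Fig. 7).

```python
import statistics
import timeit


def time_with_error_bars(func, repeats=25):
    """Call `func` once per repeat and return the mean and standard deviation (s)."""
    if repeats < 2:
        raise ValueError("at least two repeats are needed for a standard deviation")
    # number=1: each of the `repeats` measurements times a single call
    times = timeit.repeat(func, number=1, repeat=repeats)
    return statistics.mean(times), statistics.stdev(times)


def toy_workload():
    # Hypothetical stand-in for one RMSD calculation (with or without file input)
    return sum(i * i for i in range(10_000))


mean_s, std_s = time_with_error_bars(toy_workload)
print(f"{mean_s * 1e3:.3f} ms ± {std_s * 1e3:.3f} ms over 25 repeats")
```

Timing single calls (number=1) mirrors a per-system measurement, at the cost of more timer noise for very fast calls. Batching several calls per repeat would reduce that noise but would report an average per batch instead.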
numbers are usually readily available in most Python libraries dealing with molecular file formats, while the adjacency matrix of a molecule is easy to build from bond connectivity.
We believe that the simple API will favour the integration of spyrmsd in many existing libraries, bringing symmetry-corrected RMSD calculations to widely used packages.
Software best practices
The development of spyrmsd is based on modern software engineering best practices. The code is version-controlled using git [27] and it is freely available on GitHub (https://github.com/RMeli/spyrmsd) [28], released under the open-source and permissive MIT license.
The code is extensively tested using pytest [29]. Tests are run automatically every time a new version of the code is pushed to GitHub, thanks to Travis-CI bindings for continuous integration [30]. The code coverage of the test suite is reported on Codecov [31], which provides easy-to-read reports. A code coverage of 100% is targeted, so that all lines of code are executed at least once during tests.
The code is compatible with Python 3.6 or above. Static analysis tools are constantly applied to the code in order to catch errors that would otherwise be missed or discovered only during execution. We use mypy to perform static checks [32] and flake8 to detect style and formatting issues. Such tools help maintain correctness and stability for future developments, as well as a clean codebase.
Finally, the code is documented using Python docstrings and the documentation is built automatically using sphinx [33]. This will likely make it easier to fully understand the codebase, thus facilitating the adoption of spyrmsd by other libraries.
Conclusion
spyrmsd provides robust symmetry-corrected RMSD calculations with a clean and simple API that is easy to integrate into existing Python libraries and pipelines. We
believe that such a tool could be useful to the wider community of molecular modellers and cheminformaticians. Future development of the software will focus on improved automatic bond perception (to automatically build molecular adjacency matrices) and speed.
Availability and requirements
• Project name: spyrmsd
• Code: https://github.com/RMeli/pyrmsd
• Docs: https://spyrmsd.readthedocs.io/
• Operating systems: Linux, macOS, Windows
• Programming language: Python
• Other requirements: Python 3.6 or higher
• License: MIT
Supplementary information
Supplementary information accompanies this paper at https://doi.org/10.1186/s13321-020-00455-2.
Additional file 1. Comparison of correctness between spyrmsd and obrms.
Additional file 2. Speed comparison between spyrmsd (with networkx or graph-tool) and obrms.
Acknowledgements
This work was supported by funding from the Biotechnology and Biological Sciences Research Council (BBSRC) [BB/MO11224/1], the National Productivity Investment Fund (NPIF) [BB/S50760X/1] and Evotec (UK) via the Interdisciplinary Biosciences DTP at the University of Oxford. The authors acknowledge fruitful interactions with Dr. Irfan Alibay.
Authors’ contributions
RM wrote spyrmsd. RM and PCB prepared the manuscript. Both authors read and approved the final manuscript.
Data availability statement
spyrmsd is available for download on PyPI and conda-forge, while the source code is available on GitHub under the MIT license: https://github.com/RMeli/pyrmsd. The code used to produce the figures is included in the supplementary information. The results of docking are available on Zenodo (https://doi.org/10.5281/zenodo.3747315).
Competing interests
The authors declare that they have no competing interests.
Received: 15 April 2020 Accepted: 21 August 2020
References
1. Mukherjee S, Balius TE, Rizzo RC (2010) Docking validation resources: protein family and ligand flexibility experiments. J Chem Inf Model 50:1986–2000
2. Leung S, Bodkin M, von Delft F, Brennan P, Morris G (2019) SuCOS is better than RMSD for evaluating fragment elaboration and docking poses. ChemRxiv. https://doi.org/10.26434/chemrxiv.8100203.v1
3. O’Boyle NM, Vandermeersch T, Flynn CJ, Maguire AR, Hutchison GR (2011) Confab: systematic generation of diverse low-energy conformers. J Cheminf 3(1):8. https://doi.org/10.1186/1758-2946-3-8
4. van der Walt S, Colbert SC, Varoquaux G (2011) The NumPy array: a structure for efficient numerical computation. Comput Sci Eng 13:98
5. Theobald DL (2005) Rapid calculation of RMSDs using a quaternion-based characteristic polynomial. Acta Cryst A 61:478–480
6. Roman S (2007) Advanced linear algebra. Springer, Berlin
7. Quarteroni A, Saleri F (2007) Numerical mathematics. Springer, Berlin
8. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, van der Walt SJ, Brett M, Wilson J, Jarrod Millman K, Mayorov N, Nelson ARJ, Jones E, Kern R, Larson E, Carey CJ, Polat I, Feng Y, Moore EW (2019) SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17:261–272
9. Kuhn HW (1955) The Hungarian method for the assignment problem. Nav Res Logist Q 2:83–97
10. Munkres J (1957) Algorithms for the assignment and transportation problems. J Soc Indus Appl Math 5:32–38
11. Ignazio J, Cavalier TM (1994) Linear programming. Prentice-Hall, New York
12. Allen WJ, Rizzo RC (2014) Implementation of the Hungarian algorithm to account for ligand symmetry and similarity in structure-based design. J Chem Inf Model 54:518–529
13. Bell EW, Zhang Y (2019) DockRMSD: an open-source tool for atom mapping and RMSD calculation of symmetric molecules through graph isomorphism. J Cheminf 11:9
14. Hagberg AA, Schult DA, Swart PJ (2008) Exploring network structure, dynamics, and function using NetworkX. In: Proceedings of the 7th Python in Science Conference, pp 11–15
15. graph-tool: Efficient network analysis. https://graph-tool.skewed.de/
16. Cordella LP, Foggia P, Sansone C, Vento M (2004) A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans Pattern Anal Mach Intell 26:1367–1372
17. O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011) Open Babel: an open chemical toolbox. J Cheminf 33:121
18. O’Boyle NM, Morley C, Hutchison GR (2008) Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chem Cent J 2:5
19. RDKit: Open-source cheminformatics software. http://www.rdkit.org/
20. Wang R, Fang X, Lu Y, Wang S (2004) The PDBbind database: collection of binding affinities for protein-ligand complexes with known three-dimensional structures. J Med Chem 47:2977–2980
21. Liu Z, Li Y, Han L, Li J, Liu J, Zhao Z, Nie W, Liu Y, Wang R (2014) PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics 31:405–412
22. Koes DR, Baumgartner MP, Camacho CJ (2013) Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise. J Chem Inf Model 58:1893–1904
23. RCSB Protein Data Bank. https://www.rcsb.org/
24. PyPI: Find, install and publish Python packages with the Python Package Index. https://pypi.org/
25. conda: Package, dependency and environment management for any language. https://conda.io/en/latest/
26. conda-forge: A community-led collection of recipes, build infrastructure and distributions for the conda package manager. https://conda-forge.org/
27. Chacon S, Straub B (2014) Pro Git. Apress
28. GitHub. https://github.com/
29. Krekel H, Oliveira B, Pfannschmidt R, Bruynooghe F, Laugher B, Bruhin F (2014) pytest. https://github.com/pytest-dev/pytest
30. Travis CI. https://travis-ci.org/
31. Codecov. https://codecov.io/
32. mypy: Optional static typing for Python. http://mypy-lang.org/
33. Sphinx: Python documentation generator. https://www.sphinx-doc.org/
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.