The Open Applied Informatics Journal

2007, 1: 28-32
Published online 2007 November 8. DOI: 10.2174/1874136300701010028
Publisher ID: TOAINFOJ-1-28

Nomen Est Omen: Quantitative Prediction of Molecular Properties Directly from IUPAC Names

Michael Thormann, David Vidal, Michael Almstetter and Miquel Pons

Origenis GmbH, Am Klopferspitz 19a, 82152 Martinsried, Germany.

ABSTRACT

The International Union of Pure and Applied Chemistry (IUPAC) was formed in 1919 by chemists from industry and academia [1]. Over nearly nine decades, the Union has succeeded in fostering worldwide communication in the chemical sciences and in uniting chemistry (academic, industrial and government) in a common language. As one result, IUPAC names now serve as a commonly agreed text representation of chemical structures in patents, publications and databases. In public databases of chemical compounds, such as PubChem with more than 12 million entries, chemical structures are identified by default by their IUPAC names [2]. We report a very fast linguistic method that extracts the implicit information contained in IUPAC names to statistically predict pharmacologically relevant properties. The method provides an efficient annotation tool for assessing how likely a given compound is to be a drug candidate, and it renders the entire chemical literature a searchable database for virtual screening experiments and data mining.