An Evaluation of Text Retrieval Methods for Similarity Search of Multi-dimensional NMR-Spectra

Alexander Hinneburg¹,
Andrea Porzel² &
Karina Wolfram²

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4414)

Included in the following conference series:

International Conference on Bioinformatics Research and Development


Abstract

Searching and mining nuclear magnetic resonance (NMR) spectra of naturally occurring substances is an important task in the investigation of new, potentially useful chemical compounds. Multi-dimensional NMR-spectra are relational objects like documents, but consist of continuous multi-dimensional points called peaks instead of words. We develop several mappings from continuous NMR-spectra to discrete text-like data. With the help of those mappings, any text retrieval method can be applied. We evaluate the performance of two retrieval methods, namely the standard vector space model and probabilistic latent semantic indexing (PLSI). PLSI learns hidden topics in the data, which in the case of 2D-NMR data is interesting in its own right. Additionally, we develop and evaluate a simple direct similarity function, which can detect duplicates of NMR-spectra. Our experiments show that the vector space model as well as PLSI, which are both designed for text data created by humans, can effectively handle the mapped NMR-data originating from natural products. Additionally, PLSI is able to find meaningful "topics" in the NMR-data.
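
The idea of mapping peaks to text-like data and then applying the vector space model can be illustrated with a minimal sketch. The abstract does not specify the mappings, so everything concrete here is an assumption made for illustration: the fixed grid over the two chemical-shift axes, the cell widths `cell_c` and `cell_h`, the token format, the smoothed TF-IDF weighting, and all function names. It is not the authors' method.

```python
import math
from collections import Counter


def peaks_to_words(peaks, cell_c=4.0, cell_h=0.2):
    """Map each 2D peak (13C ppm, 1H ppm) to a discrete grid-cell token.

    Assumption: a fixed, equal-width grid; the cell widths are arbitrary.
    """
    return [f"c{int(c // cell_c)}_h{int(h // cell_h)}" for c, h in peaks]


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of token lists."""
    n = len(docs)
    # Document frequency: in how many spectra does each token occur?
    df = Counter(w for d in docs for w in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        # Smoothed idf, so that very common tokens keep a nonzero weight.
        vectors.append(
            {w: tf[w] * (math.log((1 + n) / (1 + df[w])) + 1.0) for w in tf}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_by_similarity(query_name, spectra):
    """Rank all other spectra by cosine similarity to the query spectrum."""
    names = list(spectra)
    docs = [peaks_to_words(spectra[name]) for name in names]
    vecs = dict(zip(names, tfidf_vectors(docs)))
    query = vecs[query_name]
    scores = [(name, cosine(query, vecs[name])) for name in names if name != query_name]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Invented toy spectra, not taken from the paper or any database.
toy_spectra = {
    "A": [(20.1, 1.05), (55.3, 3.71), (128.4, 7.22)],
    "B": [(20.9, 1.10), (55.9, 3.75), (134.0, 7.30)],
    "C": [(70.2, 4.10), (105.0, 5.50), (170.0, 9.75)],
}

for name, score in rank_by_similarity("A", toy_spectra):
    print(name, round(score, 3))
```

In this toy data, spectra A and B share two grid cells while C shares none with A, so B ranks first for query A. A hard grid like this one assigns two nearly identical peaks to different tokens whenever they fall on opposite sides of a cell boundary, which is a weakness of this particular simple choice.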

Author information

Authors and Affiliations

Institute of Computer Science, Martin-Luther-University of Halle-Wittenberg, Germany
Alexander Hinneburg
Leibniz Institute of Plant Biochemistry (IPB), Germany
Andrea Porzel & Karina Wolfram

Editor information

Sepp Hochreiter, Roland Wagner

Cite this paper

Hinneburg, A., Porzel, A., Wolfram, K. (2007). An Evaluation of Text Retrieval Methods for Similarity Search of Multi-dimensional NMR-Spectra. In: Hochreiter, S., Wagner, R. (eds) Bioinformatics Research and Development. BIRD 2007. Lecture Notes in Computer Science, vol 4414. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71233-6_33

DOI: https://doi.org/10.1007/978-3-540-71233-6_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71232-9
Online ISBN: 978-3-540-71233-6
eBook Packages: Computer Science, Computer Science (R0)
