An Evaluation of Text Retrieval Methods for Similarity Search of multi-dimensional NMR-Spectra

  • Searching and mining nuclear magnetic resonance (NMR) spectra of naturally occurring substances is an important task in the investigation of new, potentially useful chemical compounds. Multi-dimensional NMR spectra are relational objects like documents, but consist of continuous multi-dimensional points called peaks instead of words. We develop several mappings from continuous NMR spectra to discrete text-like data. With the help of these mappings, any text retrieval method can be applied. We evaluate the performance of two retrieval methods, namely the standard vector space model and probabilistic latent semantic indexing (PLSI). PLSI learns hidden topics in the data, which in the case of 2D NMR data is interesting in its own right. Additionally, we develop and evaluate a simple direct similarity function that can detect duplicates of NMR spectra. Our experiments show that the vector space model as well as PLSI, both of which were designed for text data created by humans, can effectively handle the mapped NMR data originating from natural products. Additionally, PLSI is able to find meaningful "topics" in the NMR data.
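
The idea of mapping peaks to words and then applying the vector space model can be illustrated with a minimal sketch. The abstract does not specify the mappings used in the paper, so the fixed-grid discretization below, the step sizes, the token format, and the peak lists are all illustrative assumptions rather than the authors' method.

```python
import math
from collections import Counter


def peaks_to_words(peaks, h_step=0.2, c_step=2.0):
    """Map each 2D peak (1H shift, 13C shift, both in ppm) to a grid-cell token.

    The fixed grid and its step sizes are assumptions made for illustration.
    """
    words = []
    for h_ppm, c_ppm in peaks:
        h_bin = math.floor(h_ppm / h_step)
        c_bin = math.floor(c_ppm / c_step)
        words.append(f"H{h_bin}_C{c_bin}")
    return words


def cosine_similarity(words_a, words_b):
    """Cosine similarity of raw term-frequency vectors (vector space model)."""
    tf_a, tf_b = Counter(words_a), Counter(words_b)
    dot = sum(tf_a[w] * tf_b[w] for w in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical peak lists, not taken from the paper's data.
spectrum_a = [(7.25, 128.4), (3.71, 55.2), (1.22, 14.1)]
spectrum_b = [(7.31, 128.9), (3.65, 55.9), (2.05, 30.3)]
spectrum_c = [(5.47, 101.3), (4.10, 70.5)]

sim_ab = cosine_similarity(peaks_to_words(spectrum_a), peaks_to_words(spectrum_b))
sim_ac = cosine_similarity(peaks_to_words(spectrum_a), peaks_to_words(spectrum_c))
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

In this sketch, two nearby peaks that fall on opposite sides of a cell border receive different tokens, which is one reason more than one mapping may be worth comparing.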

Metadata
Author:Alexander Hinneburg, Andrea Porzel, Karina Wolfram
URN:https://nbn-resolving.org/urn:nbn:de:gbv:hil2-opus-367
Document Type:Conference Proceeding
Language:English
Date of Publication (online):2011/04/21
Contributing Corporation:Institute of Computer Science, Martin-Luther-University of Halle-Wittenberg
Release Date:2011/04/21
Source:LWA 2006: Lernen - Wissensentdeckung - Adaptivität, Hildesheim, October 9-11, 2006
Contributor:Althoff, Klaus-Dieter
Institutes:Fachbereich IV / Informatik
DDC classes:000 General works, computer science, information science / 000 General works, science / 004 Computer science
Licence (German):Deutsches Urheberrecht (German copyright law)