Automatic Genre Classification in Web Pages Applied to Web Comments

  • Automatic Web comment detection could significantly facilitate information retrieval systems, e.g., a focused Web crawler. In this paper, we propose a text genre classifier for Web text segments as an intermediate step for Web comment detection in Web pages. Different feature types and classifiers are analyzed for this purpose. We compare the two-level approach to state-of-the-art techniques operating on the whole Web page text and show that accuracy can be improved significantly. Finally, we illustrate the applicability for information retrieval systems by evaluating our approach on Web pages obtained by a Web crawler.

Download full text files

  • Main Conference Proceedings of the 12th Konvens 2014

Metadata
Author:Melanie Neunerdt, Michael Reyer, Rudolf Mathar
URN:https://nbn-resolving.org/urn:nbn:de:gbv:hil2-opus-2758
Parent Title (English):Proceedings of the 12th edition of the KONVENS conference
Document Type:Conference Proceeding
Language:English
Date of Publication (online):2014/10/23
Release Date:2014/10/23
Tag:Information Extraction; Information Retrieval
GND Keyword:Information Retrieval
First Page:145
Last Page:151
Contributor:Faaß, Gertrud
Institutes:Fachbereich III / Information Science and Language Technology
DDC classes:400 Language / 400 Language, Linguistics
Collections:KONVENS 2014 / Proceedings of the 12th KONVENS 2014
Licence (German):Creative Commons - Attribution 3.0