Improving the Performance of Standard Part-of-Speech Taggers for Computer-Mediated Communication

  • We assess the performance of off-the-shelf POS taggers when applied to two types of Internet texts in German, and investigate easy-to-implement methods to improve tagger performance. Our main findings are that extending a standard training set with small amounts of manually annotated data for Internet texts leads to a substantial improvement of tagger performance, which can be further improved by using a previously proposed method to automatically acquire training data. As a prerequisite for the evaluation, we create a manually annotated corpus of Internet forum and chat texts.

Author:Andrea Horbach, Diana Steffen, Stefan Thater, Manfred Pinkal
Document Type:Conference Proceeding
Date of Publication (online):2014/10/23
Release Date:2014/10/23
Tag:Part-of-speech annotation
morphology; phonetics; phonology; segmentation; tagging
GND Keyword:Morphologie; Phonetik; Phonologie; Segmentierung
Source:Proceedings of the 12th edition of the KONVENS conference Vol. 1. - Hildesheim
Institutes:Fachbereich III / Informationswissenschaft und Sprachtechnologie
Collections:KONVENS 2014 / Proceedings of the 12th KONVENS 2014
Access Rights:Freely accessible
Licence (German):Creative Commons - Attribution
