Improving the Performance of Standard Part-of-Speech Taggers for Computer-Mediated Communication
- We assess the performance of off-the-shelf POS taggers when applied to two types of Internet texts in German, and investigate easy-to-implement methods to improve tagger performance. Our main findings are that extending a standard training set with small amounts of manually annotated Internet text leads to a substantial improvement in tagger performance, which can be improved further by using a previously proposed method to automatically acquire training data. As a prerequisite for the evaluation, we create a manually annotated corpus of Internet forum and chat texts.
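The abstract's central idea, adding a small amount of annotated in-domain data to a standard training set, can be illustrated with a toy experiment. The sketch below does not reproduce the paper's taggers or its automatic data-acquisition method. It assumes a deliberately simple stand-in, a most-frequent-tag unigram tagger with an `NN` fallback for unknown words. The sentences, the STTS-style tags, and all function names (`train_unigram`, `tag_tokens`, `accuracy`) are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical illustration: a sentence is a list of (token, tag) pairs with STTS-style tags.
Sentence = list[tuple[str, str]]


def train_unigram(corpus: list[Sentence]) -> dict[str, str]:
    """Map each lowercased word form to its most frequent tag in the corpus."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_tokens(model: dict[str, str], tokens: list[str], default: str = "NN") -> list[str]:
    """Tag tokens; word forms unseen in training fall back to a default tag."""
    return [model.get(token.lower(), default) for token in tokens]


def accuracy(model: dict[str, str], gold: list[Sentence]) -> float:
    """Token-level tagging accuracy against a gold-standard corpus."""
    correct = total = 0
    for sentence in gold:
        predicted = tag_tokens(model, [word for word, _ in sentence])
        for (_, gold_tag), predicted_tag in zip(sentence, predicted):
            correct += gold_tag == predicted_tag
            total += 1
    return correct / total


# Hypothetical toy data: standard (newspaper-style) text vs. chat-style text.
standard = [
    [("Ich", "PPER"), ("habe", "VAFIN"), ("das", "ART"), ("Buch", "NN"),
     ("nicht", "PTKNEG"), ("gelesen", "VVPP"), (".", "$.")],
    [("Er", "PPER"), ("ist", "VAFIN"), ("jetzt", "ADV"), ("hier", "ADV"), (".", "$.")],
]
chat_train = [
    [("hab", "VAFIN"), ("ich", "PPER"), ("nich", "PTKNEG"), ("gelesen", "VVPP")],
]
chat_test = [
    [("ich", "PPER"), ("hab", "VAFIN"), ("das", "ART"), ("buch", "NN"),
     ("nich", "PTKNEG"), ("gelesen", "VVPP")],
]

baseline = train_unigram(standard)                 # standard training set only
augmented = train_unigram(standard + chat_train)   # extended with a little chat data
print(f"standard only:   {accuracy(baseline, chat_test):.2f}")   # 0.67
print(f"standard + chat: {accuracy(augmented, chat_test):.2f}")  # 1.00
```

A real experiment would use a trained statistical tagger and a larger held-out test set. The unigram model is chosen here only because it makes the effect of chat spellings such as *hab* and *nich*, which are absent from standard text, easy to see.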
Author: | Andrea Horbach, Diana Steffen, Stefan Thater, Manfred Pinkal |
---|---|
URN: | https://nbn-resolving.org/urn:nbn:de:gbv:hil2-opus-2792 |
Parent Title (English): | Proceedings of the 12th edition of the KONVENS conference |
Document Type: | Conference Proceeding |
Language: | English |
Date of Publication (online): | 2014/10/23 |
Release Date: | 2014/10/23 |
Tag: | Part-of-speech annotation; morphology; phonetics; phonology; segmentation; tagging |
GND Keyword: | Morphology; Phonetics; Phonology; Segmentation |
First Page: | 171 |
Last Page: | 177 |
Institutes: | Fachbereich III / Informationswissenschaft und Sprachtechnologie |
DDC classes: | 400 Language / 400 Language, Linguistics |
Collections: | KONVENS 2014 / Proceedings of the 12th KONVENS 2014 |