Unsupervised Text Segmentation for Automated Error Reduction

  • Challenging the assumption that traditional whitespace/punctuation-based tokenisation is the best solution for any NLP application, I propose an alternative approach to segmenting text into processable units. The proposed approach is nearly knowledge-free, in that it does not rely on language-dependent, man-made resources. The text segmentation approach is applied to the task of automated error reduction in texts with high noise. The results are compared to conventional tokenisation.

Metadata
Author:Lenz Furrer
URN:https://nbn-resolving.org/urn:nbn:de:gbv:hil2-opus-2804
Parent Title (German):Proceedings of the 12th edition of the KONVENS conference
Document Type:Conference Proceeding
Language:English
Date of Publication (online):2014/10/23
Release Date:2014/10/23
Tag:Machine Learning; Statistical Methods
GND Keyword:Maschinelles Lernen
First Page:178
Last Page:185
Contributor:Faaß, Gertrud
Institutes:Fachbereich III / Informationswissenschaft und Sprachtechnologie
DDC classes:400 Language / 400 Language, Linguistics
Collections:KONVENS 2014 / Proceedings of the 12th KONVENS 2014
Licence (German):Creative Commons - Attribution 3.0