Unsupervised Text Segmentation for Automated Error Reduction
- Challenging the assumption that traditional whitespace/punctuation-based tokenisation is the best solution for any NLP application, I propose an alternative approach to segmenting text into processable units. The proposed approach is nearly knowledge-free, in that it does not rely on language-dependent, manually created resources. The text segmentation approach is applied to the task of automated error reduction in highly noisy texts. The results are compared with those obtained using conventional tokenisation.
Author: | Lenz Furrer |
---|---|
URN: | https://nbn-resolving.org/urn:nbn:de:gbv:hil2-opus-2804 |
Parent Title (German): | Proceedings of the 12th edition of the KONVENS conference |
Document Type: | Conference Proceeding |
Language: | English |
Date of Publication (online): | 2014/10/23 |
Release Date: | 2014/10/23 |
Tag: | Machine Learning; Statistical Methods |
GND Keyword: | Maschinelles Lernen |
First Page: | 178 |
Last Page: | 185 |
Contributor: | Faaß, Gertrud |
Institutes: | Fachbereich III / Informationswissenschaft und Sprachtechnologie |
DDC classes: | 400 Language / 400 Language, Linguistics |
Collections: | KONVENS 2014 / Proceedings of the 12th KONVENS 2014 |