Corpus-Based Linguistic Typology: A Comprehensive Approach
- This paper takes a holistic view of the field of corpus-based linguistic typology and presents an overview of current advances at Leipzig University. Our goal is to use automatically created text data for a large variety of languages in quantitative typological investigations. Our approaches draw on text corpora created for several hundred languages and apply mathematically well-founded methods to cross-language quantitative studies (Cysouw, 2005). These analyses include the measurement of textual characteristics; the basic requirements for using these measurements as parameters are also discussed. The measured values are then used in typological studies: quantitative methods detect correlations of measured corpus properties with one another or with classical typological parameters. Our work can be regarded as an automatic, language-independent process chain that allows extensive investigations of the world's languages.
Author: | Dirk Goldhahn, Uwe Quasthoff, Gerhard Heyer |
---|---|
URN: | https://nbn-resolving.org/urn:nbn:de:gbv:hil2-opus-2846 |
Parent Title (English): | Proceedings of the 12th edition of the KONVENS conference |
Document Type: | Conference Proceeding |
Language: | English |
Date of Publication (online): | 2014/10/23 |
Release Date: | 2014/10/23 |
Tag: | machine translation; multilingual systems |
GND Keyword: | Maschinelle Übersetzung (machine translation) |
First Page: | 215 |
Last Page: | 221 |
Contributor: | Faaß, Gertrud |
Institutes: | Fachbereich III / Informationswissenschaft und Sprachtechnologie |
DDC classes: | 400 Language / 400 Language, Linguistics |
Collections: | KONVENS 2014 / Proceedings of the 12th KONVENS 2014 |