Mapping German Tweets to Geographic Regions
- We present a first attempt at classifying German tweets by region using only the text of the tweets. German Twitter users are largely unwilling to share geolocation data. Here, we introduce a two-step process. First, we identify regionally salient tweets by comparing them to an "average" German tweet based on lexical features. Then, regionally salient tweets are assigned to one of 7 dialectal regions. We achieve an accuracy (on regional tweets) of up to 50% on a balanced corpus, much improved from the baseline. Finally, we show several directions in which this work can be extended and improved.
Author: | Tatjana Scheffler, Johannes Gontrum, Matthias Wegel, Steve Wendler |
---|---|
URN: | https://nbn-resolving.org/urn:nbn:de:gbv:hil2-opus-3236 |
Parent Title (English): | Workshop proceedings of the 12th edition of the KONVENS conference |
Document Type: | Conference Proceeding |
Language: | English |
Date of Publication (online): | 2014/11/27 |
Release Date: | 2014/11/27 |
Tag: | Soziale Medien; corpus linguistics; dialect; social media |
GND Keyword: | Dialekt; Korpus <Linguistik> |
First Page: | 26 |
Last Page: | 33 |
Institutes: | Fachbereich III / Informationswissenschaft und Sprachtechnologie |
DDC classes: | 400 Language / 400 Language, Linguistics |
Collections: | KONVENS 2014 / Workshop Proceedings of the 12th KONVENS 2014 |