Mapping German Tweets to Geographic Regions

We present a first attempt at classifying German tweets by region using only the text of the tweets, since German Twitter users are largely unwilling to share geolocation data. We introduce a two-step process. First, we identify regionally salient tweets by comparing them, on the basis of lexical features, to an "average" German tweet. Then, each regionally salient tweet is assigned to one of 7 dialectal regions. We achieve an accuracy (on regional tweets) of up to 50% on a balanced corpus, a substantial improvement over the baseline. Finally, we show several directions in which this work can be extended and improved.
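The two-step process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy training texts, the two region labels (the paper uses 7), the add-one smoothed log-ratio score, the `threshold` value, and all function names are assumptions chosen only to make the idea concrete.

```python
import math
from collections import Counter

# Hypothetical toy data: region label -> example texts (not taken from the paper).
TRAINING = {
    "north": ["moin moin alles klar", "moin wir schnacken heute"],
    "south": ["servus gruess gott", "servus heute gibt es semmeln"],
}

def tokenize(text):
    return text.lower().split()

def train(training):
    """Count tokens per region; the pooled counts stand in for the 'average' tweet."""
    region_counts = {
        region: Counter(tok for text in texts for tok in tokenize(text))
        for region, texts in training.items()
    }
    background = Counter()
    for counts in region_counts.values():
        background.update(counts)
    return region_counts, background

def log_ratio(token, counts, background):
    """Add-one smoothed log ratio of regional vs. pooled relative frequency."""
    vocab_size = len(background)
    p_region = (counts[token] + 1) / (sum(counts.values()) + vocab_size)
    p_pooled = (background[token] + 1) / (sum(background.values()) + vocab_size)
    return math.log(p_region / p_pooled)

def score_regions(text, region_counts, background):
    # Tokens never seen in training carry no regional evidence and are skipped.
    tokens = [tok for tok in tokenize(text) if tok in background]
    return {
        region: sum(log_ratio(tok, counts, background) for tok in tokens)
        for region, counts in region_counts.items()
    }

def classify(text, region_counts, background, threshold=0.2):
    """Step 1: return None if no region scores above the threshold (not salient).
    Step 2: otherwise return the best-scoring region."""
    scores = score_regions(text, region_counts, background)
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None

REGION_COUNTS, BACKGROUND = train(TRAINING)
```

Calling `classify("moin heute", REGION_COUNTS, BACKGROUND)` selects the region whose vocabulary deviates most from the pooled counts, while a tweet made up only of region-neutral or unseen words is filtered out in the first step and yields `None`.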

Author:Tatjana Scheffler, Johannes Gontrum, Matthias Wegel, Steve Wendler
Parent Title (English):Workshop proceedings of the 12th edition of the KONVENS conference
Document Type:Conference Proceeding
Date of Publication (online):2014/11/27
Release Date:2014/11/27
Tag:corpus linguistics; dialect; social media
GND Keyword:Dialekt; Korpus <Linguistik>
First Page:26
Last Page:33
Institutes:Fachbereich III / Informationswissenschaft und Sprachtechnologie
DDC classes:400 Language / 400 Language, Linguistics
Collections:KONVENS 2014 / Workshop Proceedings of the 12th KONVENS 2014
Licence:Creative Commons - Attribution 3.0