Mining corpora of computer-mediated communication: Analysis of linguistic features in Wikipedia talk pages using machine learning methods

Machine learning methods offer great potential for automatically investigating large amounts of data in the humanities. Our contribution to the workshop reports on ongoing work in the BMBF project KobRA (http://www.kobra.tu-dortmund.de), where we apply machine learning methods to the analysis of large corpora in language-focused research on computer-mediated communication (CMC). At the workshop, we will discuss first results from training a Support Vector Machine (SVM) to classify selected linguistic features in talk pages of the German Wikipedia corpus in DeReKo, provided by the IDS Mannheim. We will investigate different representations of the data in order to integrate complex syntactic and semantic information into the SVM. The results are intended to foster both corpus-based research on CMC and the annotation of linguistic features in CMC corpora.

Author: Michael Beißwenger, Harald Lüngen, Eliza Margaretha, Christian Pölitz
Parent Title (English): Workshop proceedings of the 12th edition of the KONVENS conference
Document Type: Conference Proceeding
Date of Publication (online): 2014/11/25
Release Date: 2014/11/25
Tag: IBK; Internet-basierte Kommunikation; Soziale Medien; cmc; corpus linguistics; social media
GND Keyword: Computerunterstützte Kommunikation; Korpus <Linguistik>; Onlinecommunity
First Page: 42
Last Page: 47
Institutes: Fachbereich III / Informationswissenschaft und Sprachtechnologie
DDC classes: 400 Language / 400 Language, Linguistics
Collections: KONVENS 2014 / Workshop Proceedings of the 12th KONVENS 2014
Licence (German): Creative Commons - Attribution 3.0