We present a first attempt at classifying German tweets by region using only the text of the tweets. German Twitter users are largely unwilling to share geolocation data. Here, we introduce a two-step process: first, we identify regionally salient tweets by comparing them to an "average" German tweet based on lexical features; then, regionally salient tweets are assigned to one of seven dialectal regions. We achieve an accuracy on regional tweets of up to 50% on a balanced corpus, a considerable improvement over the baseline. Finally, we outline several directions in which this work can be extended and improved.
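As a hedged illustration of such a two-step pipeline, the sketch below first measures how far a tweet's lexical profile is from an "average" German tweet and only then assigns a region. The character-n-gram features, similarity threshold and classifier are assumptions for illustration, not the authors' actual implementation.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def fit_pipeline(all_tweets, regional_tweets, regional_labels):
    """Step 1 model: lexical profile of the 'average' German tweet.
    Step 2 model: classifier over the seven dialectal regions."""
    vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
    x_all = vec.fit_transform(all_tweets)
    average_profile = np.asarray(x_all.mean(axis=0))
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.transform(regional_tweets), regional_labels)
    return vec, average_profile, clf

def classify(tweet, vec, average_profile, clf, salience_threshold=0.7):
    x = vec.transform([tweet])
    # Step 1: tweets whose lexical profile is close to the average are not
    # regionally salient and receive no region label.
    if cosine_similarity(x, average_profile)[0, 0] >= salience_threshold:
        return None
    # Step 2: assign one of the seven dialectal regions.
    return clf.predict(x)[0]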
In this paper, we describe the system we developed for the GErman SenTiment AnaLysis shared Task (GESTALT), participating in Maintask 2: Subjective Phrase and Aspect Extraction from Product Reviews. We present a tool which identifies subjective phrases and aspect phrases in German product reviews. For the recognition of subjective phrases, we pursue a lexicon-based approach. For the extraction of aspect phrases from the reviews, we consider two possible ways: besides the subjectivity and aspect look-up, we also implemented a method to establish which subjective phrase belongs to which aspect. The system achieves better results for the recognition of aspect phrases than for the identification of subjective phrases.
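A minimal sketch of how a lexicon-based look-up of this kind could work; the lexicon entries, the token-level matching and the label scheme are illustrative assumptions, not the system described in the paper.

# Hypothetical lexicons for illustration only.
SUBJECTIVITY_LEXICON = {"großartig": "positive", "enttäuschend": "negative", "kaputt": "negative"}
ASPECT_LEXICON = {"akku", "display", "versand"}

def annotate(tokens):
    """Return (index, token, label) triples for subjective and aspect tokens."""
    spans = []
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in SUBJECTIVITY_LEXICON:
            spans.append((i, tok, "subjective/" + SUBJECTIVITY_LEXICON[low]))
        elif low in ASPECT_LEXICON:
            spans.append((i, tok, "aspect"))
    return spans

# annotate("Der Akku ist enttäuschend".split())
# -> [(1, 'Akku', 'aspect'), (3, 'enttäuschend', 'subjective/negative')]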
We report on the two systems we built for Task 1 of the German Sentiment Analysis Shared Task, the task on Source, Subjective Expression and Target Extraction from Political Speeches (STEPS). The first system is a rule-based system relying on a predicate lexicon specifying extraction rules for verbs, nouns and adjectives, while the second is a translation-based system that has been obtained with the help of the (English) MPQA corpus.
We present the German Sentiment Analysis Shared Task (GESTALT) which consists of two main tasks: Source, Subjective Expression and Target Extraction from Political Speeches (STEPS) and Subjective Phrase and Aspect Extraction from Product Reviews (StAR). Both tasks focused on fine-grained sentiment analysis, extracting aspects and targets with their associated subjective expressions in the German language. STEPS focused on political discussions from a corpus of speeches in the Swiss parliament. StAR fostered the analysis of product reviews as they are available from the website Amazon.de. Each shared task led to one participating submission, providing baselines for future editions of this task and highlighting specific challenges. The shared task homepage can be found at https://sites.google.com/site/iggsasharedtask/.
This paper describes the process followed in creating a tool aimed at helping learners produce collocations in Spanish. First we present the Diccionario de colocaciones del español (DiCE), an online collocation dictionary, which represents the first stage of this process. The following section focuses on the potential user of a collocation learning tool: we examine the usability problems DiCE presents in this respect, and explore the actual learner needs through a learner corpus study of collocation errors. Next, we review how collocation production problems of English language learners can be solved using a variety of electronic tools devised for that language. Finally, taking all the above into account, we present a new tool aimed at assisting learners of Spanish in writing texts, with particular attention being paid to the use of collocations in this language.
Virtual textual communication relies on digital media as carrier and mediator. SMS language is part of this type of communication and exhibits specific particularities. An SMS text is characterized by an unpredictable use of white-spaces and special characters and by the lack of any writing standard, while at the same time staying close to orality. This paper presents the database of the alpes4science project, from the collection to the processing of the SMS corpus. We then present some of the most common SMS tokenization problems and related work on SMS normalization.
This software demonstration paper presents a project on the interactive visualization of social media data. The data presentation fuses German Twitter data and a social relation network extracted from German online news. Such fusion allows for comparative analysis of the two types of media. Our system will additionally enable users to explore relationships between named entities, and to investigate events as they develop over time. Cooperative tagging of relationships is enabled through the active involvement of users. The system is available online for a broad user audience.
Machine learning methods offer a great potential to automatically investigate large amounts of data in the humanities. Our contribution to the workshop reports on ongoing work in the BMBF project KobRA (http://www.kobra.tu-dortmund.de), where we apply machine learning methods to the analysis of big corpora in language-focused research of computer-mediated communication (CMC). At the workshop, we will discuss first results from training a Support Vector Machine (SVM) for the classification of selected linguistic features in talk pages of the German Wikipedia corpus in DeReKo provided by the IDS Mannheim. We will investigate different representations of the data to integrate complex syntactic and semantic information for the SVM. The results are intended to foster both corpus-based research on CMC and the annotation of linguistic features in CMC corpora.
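As a rough illustration of such a setup, the sketch below combines two text representations in a single SVM pipeline. The concrete feature extractors are assumptions standing in for the syntactic and semantic representations investigated in the project.

from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def build_classifier():
    # Two complementary views of each talk-page segment; in the project these
    # would be replaced by richer syntactic and semantic representations.
    features = FeatureUnion([
        ("words", TfidfVectorizer(ngram_range=(1, 2))),
        ("chars", TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))),
    ])
    return Pipeline([("features", features), ("svm", LinearSVC())])

# Usage (hypothetical data): segments is a list of talk-page snippets, labels
# marks whether a selected CMC feature occurs in each snippet.
# scores = cross_val_score(build_classifier(), segments, labels, cv=5)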
Political debates bearing ideological references have long existed in our society; in the last few years, however, the explosion of the internet and of social media as means of communication has boosted the production of ideological texts to unprecedented levels. This creates the need for automated processing of such texts if we are interested in understanding the ideological references they contain. In this work, we propose a set of linguistic rules based on certain criteria that identify a text as bearing ideology. We codify and implement these rules as part of a Natural Language Processing system that we also present. We evaluate the system by using it to determine whether ideology is present in tweets published by French politicians, and we discuss its performance.
In this paper, we propose an integrated web strategy for mixed sociolinguistic research methodologies in the context of social media corpora. After stating the particular challenges of building corpora of private, non-public computer-mediated communication, we present our solution to these problems: a Facebook web application for the acquisition of such data and the corresponding metadata. Finally, we discuss positive and negative implications of this method.
For a fistful of blogs: Discovery and comparative benchmarking of republishable German content
(2014)
We introduce two corpora gathered on the web and related to computer-mediated communication: blog posts and blog comments. In order to build these corpora, we addressed the following issues: website discovery and crawling, content extraction constraints, and text quality assessment. The blogs were manually classified as to their license and content type. Our results show that it is possible to find blogs in German under a Creative Commons license, and that text extraction and linguistic annotation can be performed efficiently enough to allow for a comparison with more traditional text types such as newspaper corpora and subtitles. The comparison gives insights into the distributional properties of the processed web texts at the token and type level. For example, quantitative analysis reveals that blog posts are close to written language, while comments are slightly closer to spoken language.
The workshops hosted at this iteration of KONVENS also reflect the interaction of, and common themes shared between, Computational Linguistics and Information Science: a focus on evaluation, represented by shared tasks on Named Entity Recognition (GermEval) and on Sentiment Analysis (GESTALT); a growing interest in the processing of non-canonical text such as that found in social media (NLP4CMC) or patent documents (IPaMin); and multi-disciplinary research which combines Information Science, Computer Aided Language Learning, Natural Language Processing, and E-Lexicography with the objective of creating language learning and training systems that provide intelligent feedback based on rich knowledge (ISCALPEL).
This paper presents Atomic, an open-source platform-independent desktop application for multi-level corpus annotation. Atomic aims at providing the linguistic community with a user-friendly annotation tool and sustainable platform through its focus on extensibility, a generic data model, and compatibility with existing linguistic formats. It is implemented on top of the Eclipse Rich Client Platform, a pluggable Java-based framework for creating client applications. Atomic - as a set of plug-ins for this framework - integrates with the platform and allows other researchers to develop and integrate further extensions to the software as needed. The generic graph-based meta model Salt serves as Atomic’s domain model and allows for unlimited annotation levels and types. Salt is also used as an intermediate model in the Pepper framework for conversion of linguistic data, which is fully integrated into Atomic, making the latter compatible with a wide range of linguistic formats. Atomic provides tools for both less experienced and expert annotators: graphical, mouse-driven editors and a command-line data manipulation language for rapid annotation.
We discovered several recurring errors in the current version of the Europarl Corpus originating both from the web site of the European Parliament and the corpus compilation based thereon. The most frequent error was incompletely extracted metadata leaving non-textual fragments within the textual parts of the corpus files. This is, on average, the case for every second speaker change. We not only cleaned the Europarl Corpus by correcting several kinds of errors, but also aligned the speakers’ contributions of all available languages and compiled everything into a new XML-structured corpus. This facilitates a more sophisticated selection of data, e.g. querying the corpus for speeches by speakers of a particular political group or in particular language combinations.
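The following sketch illustrates the kind of query such an XML-structured corpus makes possible. The element and attribute names (speech, group, language) and the file name are assumptions, since the exact schema is not given here.

from lxml import etree

def speeches_by_group(xml_path, political_group, language="de"):
    """Return the text of all speeches by members of one political group."""
    tree = etree.parse(xml_path)
    query = f'//speech[@group="{political_group}" and @language="{language}"]'
    return [" ".join(node.itertext()).strip() for node in tree.xpath(query)]

# Example with a hypothetical file and group label:
# texts = speeches_by_group("europarl_aligned.xml", "PPE-DE", language="de")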
The paper by Beatrix Kreß provides a contrastive study of face work in German and Russian online communication. She analyses users' comments in online newspapers and comes to the conclusion that Russian debates tend to have a more direct style, whereas German users more frequently apply humour to mitigate FTAs.
The difference between experts and laypeople is also the subject of the paper by Gesa Linnemann, Benjamin Brummernhenrich and Regina Jucks. In an experiment in pedagogical psychology, they examine efficient knowledge acquisition in e-learning contexts. In the experiment, tutors applied various strategies to criticise the learners' results, with different intensity levels of face threat. If mitigating strategies were used, the learners considered the tutors to be more credible.
Martina Schrader-Kniffki analyses how such status attributions are developed in the French forum 'Français notre belle langue'. Users of this community discuss language-related topics, usually at the level of laypeople in linguistics. However, the self-presentation of the participants plays an important role in the discussion, which is often the result of intentionally subjectified speech acts. In this way, the users develop evidentially constructed knowledge.
The construction of a shared enemy is also the subject of the paper by Bettina Kluge. She examines the phenomenon of the troll, a user who joins a constructive debate with the intention of systematically destroying it by making hurtful or meaningless contributions or ones that detract from the subject. The paper looks at how a community cooperatively construes a user as a troll and how it deals with this disruptive behaviour.
The author dedicates her paper to collective attacks against absent third parties. The users, who do not know each other, construct a shared concept of the enemy which they then make fun of, attacking it collectively in the form of so-called 'flaming'. Even if the person being attacked is unaware of it, this FTA has the effect of enhancing the shared face of the group of attackers.
The author examines comments on recipes in a French and Italian cooking community. She shows how users avoid open criticism and use various diluting strategies to ensure that the potential FTA is mitigated. These include, for example, messages formulated in the first person to deflect the criticism from the addressed person, and praise aimed at balancing the criticism and putting it into perspective.
The author shows, on the basis of Watts' model, that online communication on interaction platforms tends to be marked. Due to the conditions of the medium, utterances can be misunderstood or ambiguous. This often leads to a discussion about how a post is to be interpreted and about a third party who may also be reading - a potential FTA. To counteract this, verbal, paraverbal and non-verbal strategies aim at marking posts through multiple codes, by means of the user profile, avatar, signature, etc. - options that many platforms provide, thereby supporting such behaviour.
The author examines interactions in a forum community. Her paper focuses primarily on the negotiation of status, which is measured for example by the length of membership and the activity of the users in the communities. Using the example of the community 'The Student Room', she shows that newcomers first have to earn the right to perform certain verbal actions.
The authors discuss how mutual criticism is expressed in the CouchSurfing community. As this community is based on mutual trust and the willingness to provide overnight accommodation in one's own home, user ratings that contain criticism and negative judgement have to be formulated in a way that avoids further conflicts and maintains a good host image. This is why many negative evaluations contain mitigating strategies that anticipate future interactions in the community and that can be judged as face work.
In this paper, the author reflects on the terms self, identity and face. She gives (psychological) definitions of the terms self and identity and differentiates between the two before detailing the concept of face. The author exemplifies the use of face in a qualitative analysis of the Spanish online forum "Crepúsculo" (Twilight).
Face Work and Social Media
(2014)
On platforms such as Facebook and Twitter, on message boards, in blogs and commentaries, in short: in the Social Media, users interact as if they knew each other personally. Malicious verbal behaviour is found next to clapping and kissing emoticons, both indicative of users' relational work strategies. This book presents seventeen papers on face work in Social Media - theoretical reflections as well as corpus-based studies - thus opening the way to rethink linguistic pragmatics in computer-mediated communication.
Weblogs are among the most popular forms of communication in Web 2.0. The few formal constraints and the ease of publishing content lead to an extraordinary heterogeneity within this form of communication. This contribution examines whether the different manifestations of weblogs can be described as text types. To this end, personal blogs from a French weblog platform are categorised according to their respective text function. The results are illustrated and discussed on the basis of individual text examples. It is examined whether linguistic conventions exist that point to the emergence of a new text type.
This contribution examines relationship-oriented online communication using the example of public chatrooms. It pursues the question of how relationship management works in a medium that is restricted to a single, writing-based channel. Within a face-theoretical model, relationship-oriented actions are specified and tied to concrete interaction patterns. An empirical analysis of French- and German-language chat conversations shows in what form these interaction patterns appear in chat and how, despite the restrictions of the medium, they enable successful relational work in chatrooms.
This contribution investigates the communicative construction of right-wing and far-right political identity on the internet. The study is based on readers' postings published on krone.de following the Freedom Party's 'Exil-Jude' remark during the 2009 Vorarlberg state election. In a first step, specific categorisation processes are identified at the local level which assign a particular identity to the self, the interlocutor and others. In a second step, typical argumentation schemata are worked out in which right-wing and far-right patterns of thought find expression.
First-contact texts from online dating platforms are a suitable object for linguistic research on persuasion. They are intended not only to convince argumentatively, but also to evoke emotions. On the basis of exemplary short analyses, this contribution presents persuasive emotionalisation strategies such as identity construction, the creation of illusions and the establishment of commitment.
In his influential 1956 essay "La nature des pronoms", Émile Benveniste describes the third person as a "non-personne", since it names the objects of conversation. This view is still widespread today. This contribution shows, by contrast, that the third person plays a very important role in the conversational constellation: like the second person, it holds a potential speaker role. This applies in particular to communication in internet forums, where present third parties are talked about. The contribution demonstrates that this highly face-threatening talk about present third parties is an important reason for the frequently noted aggressiveness of forum communication.
Remediatisierungen auf Videoplattformen am Beispiel der Thilo-Sarrazin-Kontroverse auf YouTube
(2012)
Exchanging, sharing, copying, linking and rating content are new socio-cultural options for self-presentation in the diverse network of video platforms, including for online participation in societal debates, which are mirrored and reinterpreted in remediations. Using the example of the Thilo Sarrazin controversy on YouTube, patterns of self-staging in video clips are identified as remediatised discourse strategies.
The term 'neography', used for unconventional spellings on the internet, suggests that these spellings are entirely new. This contribution aims to show that many neographic strategies and procedures, such as the use of logograms, syllabograms, consonant spelling and economising spellings, have parallels in traditions that have in some cases existed for a long time.
This contribution describes how a group of boys stages their trip to a big city on the social network SchülerVZ. In doing so, they use various functions of the network, such as 'Funksprüche' or the posting of pictures. The main focus is on the linguistic realisation of this staging.
"On the internet, nobody knows you're a dog", says the famous cartoon published by the New Yorker magazine in 1993. Today, the numerous communication platforms of Web 2.0 allow every internet user to consume online content in their own way and to create it themselves. Since then, the World Wide Web has no longer been as anonymous as the quote about the unrecognised dog suggests: the users' communication about this content is highly differentiated, particularly in interpersonal terms. Identity, self-presentation, attributions by others and communicative conventions play a major role in the virtual network of people. For linguistics, the question arises as to which changes in human language behaviour can be observed. The interplay of offline and online communication suggests that traditions and innovations from both domains intertwine.
Since the Disability Equality Act (Behindertengleichstellungsgesetz) came into force in 2002, the topic of inclusion has become increasingly important in our society. The legal changes also manifest themselves in schools: children with disabilities are entitled to mainstream schooling, barrier-free school design and the removal of communication barriers. In this study, language plays a particularly important role. It serves to convey and exchange information and enables participation. On the other hand, language can also be an obstacle, namely when the content conveyed is not understood. In the school context, Easy Language (Leichte Sprache) can be used to compensate for disadvantages faced by hearing-impaired pupils. Possible deficits in language and reading competence can be offset, for example, by the use of simple and clear language. Hearing-impaired people are one of the target groups of Easy Language; they are the focus of this study. The research question is whether and to what extent the use of Easy Language in examination tasks can compensate for such disadvantages. The study examines a selected corpus of texts from the school subject mathematics, 9th grade Hauptschule, from 2006, from the state education centres for the hearing impaired (Landesbildungszentren für Hörgeschädigte, LBZH) in Lower Saxony. The analysis is intended to shed light on which rules the teachers of the state education centres in Lower Saxony currently follow in order to optimise examination tasks and thereby compensate for disadvantages. The study also formulates concrete recommendations for further optimisation.