Workshop Proceedings of the 12th KONVENS 2014
Refine
Year of publication
- 2014 (26)
Document Type
Has Fulltext
- yes (26)
Is part of the Bibliography
- no (26)
Keywords
- Computerlinguistik (13)
- NER (13)
- Named entity recognition (13)
- Korpus <Linguistik> (11)
- corpus linguistics (10)
- Computerunterstützte Kommunikation (9)
- Onlinecommunity (6)
- Soziale Medien (6)
- social media (6)
- cmc (5)
Institute
We present a first attempt at classifying German tweets by region using only the text of the tweets. German Twitter users are largely unwilling to share geolocation data. Here, we introduce a two-step process. First, we identify regionally salient tweets by comparing them to an "average" German tweet based on lexical features. Then, regionally salient tweets are assigned to one of 7 dialectal regions. We achieve an accuracy (on regional tweets) of up to 50% on a balanced corpus, much improved from the baseline. Finally, we show several directions in which this work can be extended and improved.
In this paper, we describe our system developed for the GErman SenTiment AnaLysis shared Task (GESTALT) for participation in the Maintask 2: Subjective Phrase and Aspect Extraction from Product Reviews. We present a tool, which identifies subjective and aspect phrases in German product reviews. For the recognition of subjective phrases, we pursue a lexicon-based approach. For the extraction of aspect phrases from the reviews, we consider two possible ways: Besides the subjectivity and aspect look-up, we also implemented a method to establish which subjective phrase belongs to which aspect. The system achieves better results for the recognition of aspect phrases than for the subjective identification.
We report on the two systems we built for Task 1 of the German Sentiment Analysis Shared Task, the task on Source, Subjective Expression and Target Extraction from Political Speeches (STEPS). The first system is a rule-based system relying on a predicate lexicon specifying extraction rules for verbs, nouns and adjectives, while the second is a translation-based system that has been obtained with the help of the (English) MPQA corpus.
We present the German Sentiment Analysis Shared Task (GESTALT) which consists of two main tasks: Source, Subjective Expression and Target Extraction from Political Speeches (STEPS) and Subjective Phrase and Aspect Extraction from Product Reviews (StAR). Both tasks focused on fine-grained sentiment analysis, extracting aspects and targets with their associated subjective expressions in the German language. STEPS focused on political discussions from a corpus of speeches in the Swiss parliament. StAR fostered the analysis of product reviews as they are available from the website Amazon.de. Each shared task led to one participating submission, providing baselines for future editions of this task and highlighting specific challenges. The shared task homepage can be found at https://sites.google.com/site/iggsasharedtask/.
In this paper, we present our Named Entity Recognition (NER) system for German – NERU (Named Entity Rules), which heavily relies on handcrafted rules as well as information gained from a cascade of existing external NER tools. The system combines large gazetteer lists, information obtained by comparison of different automatic translations and POS taggers. With NERU, we were able to achieve a score of 73.26% on the development set provided by the GermEval 2014 Named Entity Recognition Shared Task for German.
This paper presents a Named Entity Recognition system for German based on Conditional Random Fields. The model also includes language-independant features and features computed form large coverage lexical resources. Along side the results themselves, we show that by adding linguistic resources to a probabilistic model, the results improve significantly.
In the latest decades, machine learning approaches have been intensively experimented for natural language processing. Most of the time, systems rely on using statistics within the system, by analyzing texts at the token level and, for labelling tasks, categorizing each among possible classes. One may notice that previous symbolic approaches (e.g. transducers) where designed to delimit pieces of text. Our research team developped mXS, a system that aims at combining both approaches. It locates boundaries of entities by using sequential pattern mining and machine learning. This system, intially developped for French, has been adapted to German.
In this paper, we investigate a semi- supervised learning approach based on neu- ral networks for nested named entity recog- nition on the GermEval 2014 dataset. The dataset consists of triples of a word, a named entity associated with that word in the first-level and one in the second-level. Additionally, the tag distribution is highly skewed, that is, the number of occurrences of certain types of tags is too small. Hence, we present a unified neural network archi- tecture to deal with named entities in both levels simultaneously and to improve gen- eralization performance on the classes that have a small number of labelled examples.
In this paper we present Nessy (Named Entity Searching System) and its application to German in the context of the GermEval 2014 Named Entity Recognition Shared Task (Benikova et al., 2014a). We tackle the challenge by using a combination of machine learning (Naive Bayes classification) and rule-based methods. Altogether, Nessy achieves an F-score of 58.78% on the final test set.
This paper presents the BECREATIVE Named Entity Recognition system and its participation at the GermEval 2014 Named Entity Recognition Shared Task (Benikova et al., 2014a). BECREATIVE uses a hybrid approach of two commonly used procedural methods, namely list-based lookups and machine learning (Naive Bayes Classification), which centers around the classifier. BECREATIVE currently reaches an F-score of 37.34 on the strict evaluation setting applied on the development set provided by GermEval.