
On the effect of word frequency on distributional similarity

  • The dependency of word similarity in vector space models on the frequency of words has been noted in a few studies but has received very little attention. We study the influence of word frequency in a set of 10 000 randomly selected word pairs for a number of different combinations of feature weighting schemes and similarity measures. We find that, for all methods except the one that uses singular value decomposition to reduce the dimensionality of the feature space, the similarity of word pairs is determined to a large extent by the frequency of the words. In a binary classification task on pairs of synonyms and pairs of unrelated words, we find that the results for all similarity measures can be improved when we correct for the frequency bias.
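
The kind of experiment the abstract describes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic count matrix, the choice of PPMI weighting with cosine similarity as one possible combination, and in particular the correction step, which simply subtracts a least-squares linear fit of similarity on log word frequencies. The helper names `ppmi`, `cosine` and `frequency_residuals` are hypothetical.

```python
import numpy as np


def ppmi(counts):
    """Positive pointwise mutual information weighting of a word-by-feature count matrix."""
    total = counts.sum()
    p_wf = counts / total
    p_w = p_wf.sum(axis=1, keepdims=True)
    p_f = p_wf.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wf / (p_w * p_f))
    pmi[~np.isfinite(pmi)] = 0.0  # cells with zero counts get weight 0
    return np.maximum(pmi, 0.0)


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def frequency_residuals(sims, freq_a, freq_b):
    """Illustrative correction (an assumption, not the paper's method):
    remove a linear fit of similarity on the log frequencies of the
    rarer and the more frequent word of each pair."""
    X = np.column_stack([
        np.ones(len(sims)),
        np.log1p(np.minimum(freq_a, freq_b)),
        np.log1p(np.maximum(freq_a, freq_b)),
    ])
    coef, *_ = np.linalg.lstsq(X, sims, rcond=None)
    return sims - X @ coef


# Synthetic data: 50 words with very different frequencies, 200 context features.
rng = np.random.default_rng(0)
n_words, n_features = 50, 200
rates = rng.uniform(0.05, 5.0, size=(n_words, 1))
feature_probs = rng.dirichlet(np.ones(n_features))
counts = rng.poisson(rates * feature_probs * n_features).astype(float)
word_freq = counts.sum(axis=1)

weighted = ppmi(counts)

# Randomly selected word pairs (pairs of a word with itself are dropped).
pairs = rng.choice(n_words, size=(300, 2))
pairs = pairs[pairs[:, 0] != pairs[:, 1]]
sims = np.array([cosine(weighted[a], weighted[b]) for a, b in pairs])
fa, fb = word_freq[pairs[:, 0]], word_freq[pairs[:, 1]]

log_min_freq = np.log1p(np.minimum(fa, fb))
corrected = frequency_residuals(sims, fa, fb)

print("correlation with log frequency, raw:      ", np.corrcoef(sims, log_min_freq)[0, 1])
print("correlation with log frequency, corrected:", np.corrcoef(corrected, log_min_freq)[0, 1])
```

By construction, the residuals are linearly uncorrelated with the two log-frequency terms, so the second printed value is close to zero. Whether such a corrected score separates synonyms from unrelated pairs better is an empirical question that this toy data cannot answer.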

Download full text files

  • Main Conference Proceedings of the 12th Konvens 2014

Metadata
Author: Christian Wartena
URN: https://nbn-resolving.org/urn:nbn:de:gbv:hil2-opus-2634
Parent Title (English): Proceedings of the 12th edition of the KONVENS conference
Document Type: Conference Proceeding
Language: English
Date of Publication (online): 2014/10/22
Release Date: 2014/10/22
Tag: Machine Learning; Statistical Methods
GND Keyword: Maschinelles Lernen (machine learning)
First Page: 1
Last Page: 10
PPN: Link to catalogue
Contributor: Faaß, Gertrud
Institutes: Fachbereich III / Informationswissenschaft und Sprachtechnologie
DDC classes: 400 Language / 400 Language, Linguistics
Collections: KONVENS 2014 / Proceedings of the 12th KONVENS 2014
Licence (German): Creative Commons - Attribution (Namensnennung) 3.0