Refine
Document Type
Language
- English (2) (remove)
Has Fulltext
- yes (2)
Is part of the Bibliography
- no (2) (remove)
Keywords
Institute
Digital libraries allow us to organize a vast amount of publications in a structured way and to extract information of user’s interest. In order to support customized use of digital libraries, we develop novel methods and techniques in the Knowledge Discovery in Scientific Literature (KDSL) research program of our graduate school. It comprises several sub-projects to handle specific problems in their own fields. The sub-projects are tightly connected by sharing expertise to arrive at an integrated system. To make consistent progress towards enriching digital libraries to aid users by automatic search and analysis engines, all methods developed in the program are applied to the same set of freely available scientific articles.
Evaluation metrics for rule learning typically, in one way or another, trade off consistency and coverage. In this work, we investigate this tradeoff for three different families of rule learning heuristics, all of them featuring a parameter that implements this trade-off in different guises. These heuristics are the m-estimate, the F-measure, and the Klösgen measures. The main goals of this work are to extend our understanding of these heuristics by visualizing their behavior via isometrics in coverage space, and to determine optimal parameter settings for them. Interestingly, even though the heuristics use quite different ways for implementing this trade-off, their optimal settings realize quite similar evaluation functions. Our empirical results on a large number of datasets demonstrate that, even though we do not use any form of pruning, the quality of the rules learned with these settings outperforms standard rule learning heuristics and approaches the performance of Ripper, a state-of-the-art rule learning system that uses extensive pruning and optimization phases.