004 Informatik
Automated machine learning represents the next generation of machine learning: it aims to efficiently identify model hyperparameters and configurations that ensure decent generalization beyond the training data. With a proper setup in place, practitioners and academics can save considerable resources. Beyond naive approaches such as random sampling or grid search, sequential model-based optimization has been at the forefront of methods that optimize the black-box function representing the generalization surface, for example the validation loss.
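As a rough illustration of such a sequential model-based optimization loop, the sketch below fits a Gaussian process surrogate to the observed (configuration, loss) pairs and picks the next configuration by expected improvement. The one-dimensional objective `validation_loss` and the search space are hypothetical stand-ins, not the response surfaces studied in this thesis.

```python
# A minimal sketch of a sequential model-based optimization (SMBO) loop.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def validation_loss(x):
    # Hypothetical black-box response over a single hyperparameter in [0, 1].
    return np.sin(5 * x) * x + 0.1 * np.random.randn()

def expected_improvement(mu, sigma, best):
    # Standard expected-improvement acquisition for minimization.
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(3, 1))                    # initial random configurations
y = np.array([validation_loss(x[0]) for x in X])

for _ in range(20):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)   # surrogate model
    candidates = rng.uniform(0, 1, size=(256, 1))
    mu, sigma = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.vstack([X, x_next])                         # evaluate and record the new trial
    y = np.append(y, validation_loss(x_next[0]))

print("best loss:", y.min(), "at x =", X[np.argmin(y)][0])
```

Meta-knowledge of the kind discussed next can be injected into exactly this loop, for instance by warm-starting the initial configurations or the surrogate.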
With an abundance of data and algorithm evaluations available, transfer learning techniques and meta-knowledge can be utilized to further expedite hyperparameter optimization. In this thesis, we cover four ways in which meta-knowledge can be leveraged to improve hyperparameter optimization.
In the first part, we present two large-scale meta-datasets, i.e. collections of hyperparameters and their respective responses for a machine learning algorithm trained on several datasets. We describe the implementation details and provide descriptive analytics that highlight the heterogeneity of the resulting response surfaces. The two meta-datasets serve as benchmarks on which the methods developed in this thesis are empirically evaluated.
In the second part, we introduce the first work that automates the process of learning meta-features, i.e. dataset characteristics, directly from the dataset distribution. Previously, meta-features required expert domain knowledge and considerable engineering to properly represent datasets as entities for a meta-learning task. Building on this work, we integrate the meta-feature extractor as a module of the machine learning algorithm and optimize it jointly for the meta-learning task, further promoting the benefits of differentiable meta-features. Finally, we carry the concept of meta-feature learning over to the setting where the underlying dataset is not available. Specifically, we design a deep Gaussian kernel that allows for a richer representation of the attributes via non-linear transformations. The resulting surrogate is conditioned on landmark meta-features extracted from the history of task-specific evaluations.
In the third part, we formulate the problem of hyperparameter optimization as a Markov Decision Process. As such, we introduce the first paper on hyperparameter optimization in a reinforcement learning framework and define a novel transferable policy that acts as an acquisition function for hyperparameter optimization. Furthermore, we study the impact of planning in hyperparameter optimization through a novel non-myopic acquisition function.
Finally, we present hyperparameter optimization (HPO) in a zero-shot setting. In contrast to sequential model-based optimization, the fastest way to perform HPO is to learn a zero-shot approach that identifies the best configuration in a single trial. Our Zap-HPO approach outperforms the state of the art in algorithm selection for deep learning pipelines comprising a machine learning algorithm and its associated hyperparameters, given simple meta-features.
Road accidents are one of the leading causes of death worldwide, particularly among young people. The police and local authorities therefore strive to reduce the risk of accidents through appropriate road safety measures. In order to plan these measures, the relevant types of accidents, i.e., accidents with certain features, must first be recognized. However, the variety of accident features and the number of resulting feature combinations make it impossible to monitor all accident types manually.
In this thesis, methods are proposed to automatically identify interesting accident types. Here, it is investigated whether combinations of features occur together and how the temporal pattern of the combined occurrence behaves. The change mining approach can then be used to determine whether structural changes in frequency occur during the period under consideration. For example, a feature combination that suddenly appears more frequently or exhibits a change in seasonality should be prioritized for further investigation so that appropriate road safety measures may be initiated for that combination.
The implemented strategic, multi-stage data mining framework, based on frequent itemset mining, time series clustering, forecasting methods, and a scoring process, is able to detect interesting feature combinations. These are then presented on a map in a web interface suitable for the respective audience in order to support the strategic planning of road safety measures. The framework is applied to several accident datasets from different countries to determine suitable default parameter values for the respective data analysis methods and to carefully align the methods. It is shown that the parameter selection depends only slightly on the database to be analyzed.
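As a minimal sketch of the framework's first stage, the snippet below counts accident feature combinations across records and keeps those exceeding a support threshold; the per-period counts of such frequent combinations are what the subsequent clustering and change-detection stages operate on. The feature names, records, and threshold are illustrative only.

```python
# Frequent accident feature combinations via simple itemset counting.
from collections import Counter
from itertools import combinations

accidents = [
    # (period, set of accident features) -- toy records
    ("2021-01", frozenset({"wet_road", "night", "bicycle"})),
    ("2021-01", frozenset({"wet_road", "night"})),
    ("2021-02", frozenset({"wet_road", "bicycle"})),
]

min_support = 2
counts = Counter()
for _, features in accidents:
    for size in (1, 2, 3):                       # combinations of up to three features
        for combo in combinations(sorted(features), size):
            counts[combo] += 1

frequent = {combo: n for combo, n in counts.items() if n >= min_support}
print(frequent)                                  # e.g. {('wet_road',): 3, ('night',): 2, ...}
```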
For operational planning, it is necessary to consider small geographic areas and identify the features that have the greatest impact on accident occurrence there. Therefore, the developed operational framework analyzes and predicts the course of accident time series, taking into account the associated feature-specific time series. On the one hand, this makes it possible to increase the forecast performance, and, on the other hand, to determine which accident features have a significant influence on the course of the accident numbers over time. The insights gained can be used as a basis for short-term measures.
Recent decades have seen exponential growth in data acquisition, driven by advances in edge device technology. Factory controllers, smart home appliances, mobile devices, medical equipment, and automotive sensors are a few examples of edge devices capable of collecting data. Traditionally, these devices were limited to data collection and transfer, while decision-making capabilities were missing. With advances in microcontroller and processor technology, however, edge devices can now perform complex tasks. This opens avenues for pushing the training of machine learning models to the edge devices, also known as learning-at-the-edge. Furthermore, these devices operate in a distributed environment constrained by high latency, slow connectivity, privacy requirements, and sometimes time-critical applications. Traditional distributed machine learning methods are designed to operate in a centralized manner, assuming data is stored in cloud storage. The operating environment of edge devices makes transferring data to cloud storage infeasible, rendering centralized approaches impractical for training machine learning models on edge devices.
Decentralized machine learning techniques are designed to enable learning-at-the-edge without requiring data to leave the edge device. The main principle in decentralized learning is to build consensus on a global model among distributed devices while keeping the communication requirements as low as possible. The consensus-building process requires averaging local models to reach a global model agreed upon by all workers. Exact averaging schemes reach global consensus quickly but are communication-inefficient. Decentralized approaches instead employ inexact averaging schemes that reduce communication by restricting it to the immediate neighborhood. However, inexact averaging introduces variance in each worker's local values, requiring extra iterations to reach a global solution.
This thesis addresses the problem of learning at the edge, which is generally referred to as decentralized machine learning or edge machine learning. More specifically, we focus on the Decentralized Parallel Stochastic Gradient Descent (DPSGD) learning algorithm, which can be formulated as a consensus-building process among distributed workers or as a fast linear iteration for decentralized model averaging. The consensus-building process in decentralized learning depends on the efficacy of the inexact averaging scheme, which is governed by two main factors, i.e., convergence time and communication. A good solution should therefore keep communication as low as possible without sacrificing convergence time. An inexact averaging solution consists of a connectivity structure (topology) between workers and a weight for each link. We formulate an optimization problem whose objective is to find an inexact averaging solution that achieves fast consensus among distributed workers while keeping the communication cost low. Since direct optimization of the objective function is infeasible, a local search algorithm guided by the objective function is proposed. Extensive empirical evaluations on image classification tasks show that the inexact averaging solutions constructed through the proposed method outperform state-of-the-art solutions.
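The fast-linear-iteration view mentioned above can be illustrated with a small sketch: each worker repeatedly mixes its local value with those of its neighbours through a doubly stochastic matrix W, and the iterates contract towards the exact average. The ring topology and Metropolis weights below are illustrative defaults, not the optimized topology and link weights constructed by the proposed local search.

```python
# Inexact decentralized averaging as the fast linear iteration x_{k+1} = W x_k.
import numpy as np

n = 8
values = np.random.default_rng(1).normal(size=n)   # local model parameters (scalar toy case)
target = values.mean()                             # the exact-averaging consensus value

# Metropolis weights on a ring: each worker communicates with two neighbours.
W = np.zeros((n, n))
for i in range(n):
    for j in ((i - 1) % n, (i + 1) % n):
        W[i, j] = 1 / 3
    W[i, i] = 1 - W[i].sum()                       # self weight keeps rows summing to one

x = values.copy()
for _ in range(50):
    x = W @ x                                      # one inexact-averaging round
print("consensus error:", np.abs(x - target).max())
```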
Next, we investigate the problem of learning in a decentralized network of edge devices in which the devices of a subset are close to each other but further apart from devices outside the subset. Closeness specifically refers to geographical proximity or fast communication links.
We propose a hierarchical two-layer sparse communication topology that localizes dense communication within subgroups of workers and builds consensus through a sparse inter-subgroup communication scheme. We also provide empirical evidence that the proposed solution scales better on machine learning tasks than competing methods.
Finally, we address scalability issues of a pairwise ranking algorithm that forms an important class of problems in online recommender systems. Existing solutions based on parallel stochastic gradient descent define a static model parameter partitioning scheme, creating an imbalance in the work distribution among distributed workers. We propose a dynamic block partitioning and exchange strategy for the model parameters that results in balanced work among distributed workers. Empirical evidence on publicly available benchmark datasets indicates that the proposed method scales better than static block-based methods and outperforms competing state-of-the-art methods.
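As a loose illustration of why dynamic partitioning helps, the sketch below reassigns parameter blocks to workers each epoch with a greedy longest-processing-time heuristic, so blocks that attract more pairwise-ranking updates do not pile up on a single worker. The block names, work estimates, and the heuristic itself are illustrative assumptions, not the exchange strategy proposed in the thesis.

```python
# Greedy re-balancing of parameter blocks across workers between epochs.
import heapq

def assign_blocks(block_work, n_workers):
    """Assign the heaviest remaining block to the currently least-loaded worker."""
    heap = [(0.0, w) for w in range(n_workers)]        # (load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    for block, work in sorted(block_work.items(), key=lambda kv: -kv[1]):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(block)
        heapq.heappush(heap, (load + work, worker))
    return assignment

# Work estimates change between epochs, so the assignment is recomputed.
epoch_1 = {"B0": 120, "B1": 40, "B2": 90, "B3": 10}
epoch_2 = {"B0": 20, "B1": 150, "B2": 30, "B3": 95}
print(assign_blocks(epoch_1, n_workers=2))
print(assign_blocks(epoch_2, n_workers=2))
```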
Finding an available parking spot in city centers can be a cumbersome task for individual drivers and also negatively affects general traffic flow and CO2 emissions.
In the context of smart cities and the Internet of Things, this problem can be mitigated by using available data to monitor and predict parking occupancy in order to guide users to an available parking location near their destination.
This goal gives rise to multiple challenges, of which we introduce selected ones and propose novel solutions based on machine learning.
The focus of this work is to enable the usage of readily available and inexpensive data sources like parking meter transactions, as opposed to expensive technology like in-ground sensors or cameras, whose costs prevent widespread coverage. Our proposed data sources do not directly monitor the actual parking availability but still provide enough signal for our algorithms to infer the real parking situation with high accuracy.
As part of this work we developed a parking availability prediction system based on parking meter transactions that was deployed to 33 German cities.
A main contribution of our work is a novel way to generate labels from the parking transactions and the use of semi-supervised learning, more specifically positive-unlabeled learning, to leverage the sparse signal and require as little data as possible.
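The following sketch illustrates the positive-unlabeled setting in this spirit: an active parking transaction yields a positive (occupied) observation, while everything else remains unlabeled rather than truly free, and a standard classifier is calibrated following the classic Elkan-and-Noto recipe. Features, data, and the calibration choice are illustrative assumptions, not the deployed system.

```python
# Positive-unlabeled learning with Elkan & Noto calibration on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))                            # e.g. time of day, meter activity, weekday
y_true = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)     # latent "occupied" state (unobserved)
s = (y_true == 1) & (rng.random(n) < 0.3)              # only 30% of positives carry a label

# Step 1: classify labeled vs. unlabeled examples.
g = LogisticRegression(max_iter=1000).fit(X, s.astype(int))

# Step 2: estimate c = P(labeled | positive) from the labeled positives.
c = g.predict_proba(X[s])[:, 1].mean()

# Step 3: corrected probability of the true (occupied) class.
p_true = np.clip(g.predict_proba(X)[:, 1] / c, 0, 1)
print("estimated occupancy rate:", p_true.mean(), "true rate:", y_true.mean())
```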
Additionally, we utilize and design novel methodologies in the area of transfer learning to learn simultaneously from different cities, which leads to the previously seldom explored setting of combining transfer learning with positive-unlabeled learning. We therefore introduce a novel algorithm to tackle this problem type.
We hope that our work enables the deployment of smart parking systems at lower costs and therefore leads towards the goal of smart parking guidance in smart cities.
Low-Code-Symposium
(2022)
For the nationwide Digital Day on 18 June 2021, the Zentrum für Digitalen Wandel, in cooperation with the Hi-X-DigiHub, Digital City GmbH, and COMPRA GmbH, hosted a digital symposium on low-code software development. The importance of low-code technology is growing because it allows software development to be efficient and innovative as well as accessible to non-computer scientists. The symposium centered on three talks by practitioners from the low-code software industry and a concluding panel discussion that examined the various approaches and potentials of low-code and no-code software development. This report summarizes the contents of the talks and the panel discussion.
On 20 November 2020, the Zentrum für Digitalen Wandel issued a cross-disciplinary invitation to a three-hour online workshop exploring the question of how the importance of competencies can be expected to change in the course of the digital transformation. Four areas of competence were distinguished: 1. language competence and learning competence, 2. social competence and entrepreneurial initiative, 3. STEM competence, 4. cultural awareness and cultural expression. In addition, the consequences for the participants' own organizations were reflected upon. This report documents the results of the participants from the four faculties and the research administration of the Stiftung Universität Hildesheim.
The core aim of this article is to present and discuss the state of research on digitalization in residential care homes and boarding schools. The starting point is the BMBF project "DigiPäd 24/7", which analyzes the potentials and requirements of digitalization for pedagogical practice in residential care homes and boarding schools. Based on the findings, recommendations are derived on how a sustainable integration of digital media and media education in these institutions can be achieved. The results of the literature review were discussed in a full-day workshop with researchers and are presented in this article.
In distributional semantics, unsupervised learning has been widely used for a large number of tasks. Supervised learning, on the other hand, has received far less coverage.
In this dissertation, we investigate the supervised learning approach for semantic relatedness tasks in distributional semantics. The investigation mainly considers semantic similarity and semantic classification tasks. Existing and newly constructed datasets are used as input for the experiments. The new datasets are constructed from thesauri such as Eurovoc, a multilingual thesaurus maintained by the Publications Office of the European Union. The meaning of the words in the datasets is represented using a distributional semantic approach.
The distributional semantic approach collects co-occurrence information from large texts and represents words as high-dimensional vectors. English words are represented using the ukWaC corpus, while German words are represented using the deWaC corpus. After each word is represented by a high-dimensional vector, different supervised machine learning methods are applied to the selected tasks. Their outputs are evaluated by comparing task performance and accuracy with the results of state-of-the-art unsupervised machine learning methods. In addition, multi-relational matrix factorization is introduced as a supervised learning method in distributional semantics. This dissertation shows that multi-relational matrix factorization is a good alternative for integrating different sources of information about words in distributional semantics.
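A minimal sketch of the underlying representation: words become co-occurrence count vectors over a small context window, and relatedness can be read off, for example, via cosine similarity before any supervised model is trained on top. The toy corpus stands in for ukWaC and deWaC.

```python
# Co-occurrence vectors from a toy corpus and cosine similarity between words.
from collections import Counter, defaultdict
import math

corpus = "the cat sat on the mat the dog sat on the rug".split()
window = 2
vectors = defaultdict(Counter)
for i, word in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            vectors[word][corpus[j]] += 1          # count context words in the window

def cosine(u, v):
    shared = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in shared)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

print(cosine(vectors["cat"], vectors["dog"]))      # high similarity: shared contexts
```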
The dissertation also introduces some new applications. One application analyzes the website text of a German company and provides information about the company through a concept cloud visualization. The other applications are the automatic recognition and disambiguation of Library of Congress Subject Headings and the automatic identification of synonym relations in the thesaurus of the Dutch Parliament.
This book represents, as it were, the harvest of a year-long university-wide engagement with the book «Erfindet euch neu! Eine Liebeserklärung an die vernetzte Generation» by Michel Serres. The occasion was winning the competition «Eine Uni – ein Buch», organized by the Stifterverband and the Klaus Tschira Stiftung in cooperation with DIE ZEIT.
After a brief introduction to the competition entry of the Stiftung Universität Hildesheim, the volume offers reflections on language and literature in the digital age as well as two extensive engagements, from a sociological and a political science perspective, with the French philosopher's book about the little "thumblings". Considerable space is then devoted to the manifold answers given by students, teachers, and staff members in individual interviews to the three guiding questions: 1. What do we understand by knowledge? 2. What is our digital perception? 3. How does our age tick? These voices are complemented by various screenshots from the digital reading groups and reading forums. An essay on social reading and writing under the conditions of the digital transformation rounds off the volume.