UM Data Science Research Seminar
The UM Data Science Research Seminar Series consists of monthly sessions organised by the Institute of Data Science, on behalf of the UM Data Science Community, in collaboration with different departments across UM. The aim is to bring together data scientists from Maastricht University to discuss breakthroughs and research topics related to Data Science.
This session is organised in collaboration with the Department of Data Analytics and Digitalisation (DAD).
Time: 12:00 - 12:30
Title: Towards building a Mechanical Pollster: what Convenient Digital Samples and Machine Learning can contribute to Public Opinion Research.
Speakers: Roberto Cerina and Raymond Duch
Abstract: Online digital samples, such as those derived from the Twitter streaming API, have the potential to disrupt public opinion research as we know it, by enabling real-time, granular measurement of global attitudes from unobtrusively measured opinions and behaviours. We baptise the machine which takes digital samples as inputs to generate real-time, representative estimates of public opinion and behaviour as the ‘Mechanical Pollster’. We identify three broad challenges which have thus far hindered the birth of this disruptive AI: i) the difficulty in converting unstructured online data to structured survey-like objects; ii) the adverse selection effects into digital samples, which tend to be correlated with most behavioural outcomes of interest; iii) the practical computational challenges in repeatedly fitting complex models to these data to update predictions as new data points are observed. In this paper, we first propose a novel way to structure digital samples from the Twitter streaming API with the help of Amazon Mechanical Turk workers; we then test the degree to which digital samples can rival traditional Random Digit Dial polls in estimating support for US Presidential Election candidates under relatively small sample sizes; finally, we compare the estimates obtained by fitting computationally costly parametric models with those from craftier non-parametric models from the machine-learning literature, to explore whether the goal of real-time updating of global opinion estimates is realistic.
Time: 12:30 - 13:00
Title: Anomaly Detection by Robust Feature Reconstruction
Speaker: Ron Triepels
Abstract: Autoencoders have become increasingly popular in anomaly detection problems over the years. Nevertheless, it remains a challenge to properly train autoencoders for anomaly detection. A key contributing factor to this problem is that there is usually no clean dataset available from which the normal case can be learned. Instead, autoencoders must be trained on a contaminated dataset containing an unknown number of anomalies that potentially harm the training process. In this paper, we address this problem by studying the impact of the loss function on the robustness of an autoencoder. It is common practice to train an autoencoder by minimizing a loss function (e.g. squared error loss) under the assumption that all features are equally important to be reconstructed well. We relax this assumption and introduce a new loss function that adapts its robustness to anomalies based on the characteristics of the data and on a per-feature basis. Experimental results show that an autoencoder can be trained robustly with this loss function even when there are many anomalies in the training set.
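To give a rough sense of why the choice of loss matters on contaminated data, the sketch below compares squared error with a robust alternative on synthetic reconstruction residuals. It is not the loss function presented in the talk, which the abstract does not specify. It uses a standard Huber loss, and the per-feature threshold (a multiple of each feature's median absolute deviation), the function names, and the synthetic data are all assumptions made for illustration. No autoencoder is trained here; the residuals stand in for "input minus reconstruction".

```python
import numpy as np


def squared_error(residuals):
    """Mean squared error, computed separately for each feature (column)."""
    return np.mean(residuals ** 2, axis=0)


def per_feature_huber(residuals, k=1.345):
    """Huber loss with a separate threshold per feature (illustrative only).

    The threshold delta_j = k * 1.4826 * MAD_j is an assumption of this
    sketch, not the method from the talk. Residuals within delta_j are
    penalised quadratically, larger ones only linearly.
    """
    med = np.median(residuals, axis=0)
    mad = np.median(np.abs(residuals - med), axis=0)
    delta = k * 1.4826 * mad + 1e-12  # small constant avoids a zero threshold
    abs_r = np.abs(residuals)
    quadratic = 0.5 * residuals ** 2
    linear = delta * (abs_r - 0.5 * delta)
    return np.mean(np.where(abs_r <= delta, quadratic, linear), axis=0)


# Synthetic residuals: two features, 1000 samples, unit-variance noise.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(1000, 2))

# Contaminate 5% of the rows with large errors in feature 0 only.
contaminated = clean.copy()
contaminated[:50, 0] += 20.0

# How much does each loss grow, per feature, once anomalies are present?
mse_ratio = squared_error(contaminated) / squared_error(clean)
huber_ratio = per_feature_huber(contaminated) / per_feature_huber(clean)

print("squared error growth per feature:", mse_ratio)
print("Huber loss growth per feature:   ", huber_ratio)
```

In this toy setting the squared error for the contaminated feature grows far more than the Huber loss does, while the untouched feature is unaffected under both. A loss that grows less under contamination gives anomalies less influence on the gradients during training, which is the general intuition behind robust losses.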