UM Data Science Research Seminar
The UM Data Science Research Seminar Series consists of monthly sessions organised by the Institute of Data Science on behalf of the UM Data Science Community, in collaboration with departments across UM. The aim is to bring together data scientists from Maastricht University to discuss breakthroughs and research topics related to Data Science.
This session is organised in collaboration with the Department of Methodology & Statistics and takes place on 17th October, 2019 from 12.00 to 13.00.
The event is free and open to everyone. Lunch will be provided. Please register by 14th October, 2019.
12.00 - 12.30
Talk by Alberto Cassese
Title: Bayesian statistical methods for integrating high-dimensional data
Abstract: Nowadays, researchers collect multiple (high-dimensional) sets of measurements on the same subjects (e.g. DNA and mRNA). Joint integrative analysis of such multi-source datasets is of great interest, since it allows new scientific questions to be answered. In this context, the number of variables (“P”) is much larger than the sample size (“n”), the so-called “small-n-large-P” setting, which leads to mathematical and computational issues. Bayesian variable selection (BVS) is a general shrinkage method capable of handling these analyses in a multivariate fashion. My presentation will first introduce the general framework of Bayesian variable selection. I will then focus on an example: a Bayesian hierarchical mixture regression model for studying the association between a multivariate response, measured as counts on a set of features, and a set of covariates. The data are RNA‐Seq and DNA methylation measurements on breast cancer patients at different stages of the disease. We account for the heterogeneity and over‐dispersion of count data (here, RNA‐Seq data) by considering a mixture of negative binomial distributions, and we incorporate the covariates (here, methylation data) into the model via a linear modeling construction on the mean components. I will conclude by showing the results of the analysis on the breast cancer dataset.
12.30 - 13.00
Talk by Sophie Vanbelle
Title: Statistical models for accelerometers: from reliability to daily physical activity patterns
Abstract: Accelerometers make it possible to record the type of physical activity (e.g. stepping, standing, sedentary, sleeping) and its intensity continuously over long time periods and at high temporal frequency. This generates a huge number of observations per subject. For example, recording information every second for 24 hours produces 86,400 observations per subject. This poses a statistical challenge that can be handled differently depending on the study purpose. When the aim is to model overall 24-hour activity patterns and compare these patterns between several groups (e.g. males and females), the amount of data can be reduced by summarizing information over shorter time periods (e.g. one minute or even one hour) without losing information about the structure of the raw data.
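As a rough illustration of the data reduction described above, the sketch below collapses one day of per-second records into per-minute counts. The data are simulated and the names (`summarize_per_minute`, `stepping`) are hypothetical; this is not the speaker's code, only a minimal example of turning 86,400 binary observations into 1,440 counts that are each bounded between 0 and 60.

```python
import random

SECONDS_PER_DAY = 24 * 60 * 60   # 86400 per-second observations per subject
SECONDS_PER_MINUTE = 60


def summarize_per_minute(per_second):
    """Return, for each one-minute window, the number of active seconds."""
    if len(per_second) % SECONDS_PER_MINUTE != 0:
        raise ValueError("input length must be a whole number of minutes")
    return [
        sum(per_second[start:start + SECONDS_PER_MINUTE])
        for start in range(0, len(per_second), SECONDS_PER_MINUTE)
    ]


# Simulated indicator (assumption): 1 if the subject is stepping in that second.
rng = random.Random(0)
stepping = [1 if rng.random() < 0.1 else 0 for _ in range(SECONDS_PER_DAY)]

# 1440 counts, each between 0 and 60: a bounded outcome per time interval.
minute_counts = summarize_per_minute(stepping)
```

Choosing a longer window (e.g. 3,600 seconds for one hour) only changes the window constant; the resulting count is then bounded by the window length.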
To model the daily pattern of this new bounded outcome, a random effects zero-inflated beta-binomial model offers several advantages. For example, considering a beta-binomial model accounts for the bounded nature of the outcome and for correlation between observations within each time interval. The correlation between time intervals is then taken into account with random effects.
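For readers unfamiliar with this model class, one common way to write a random effects zero-inflated beta-binomial model is sketched below. The notation is an assumption made for illustration and is not taken from the talk: $Y_{it}$ is the count for subject $i$ in time interval $t$, $m$ is the number of raw observations per interval, $\pi$ is the zero-inflation probability, $\mu_{it}$ is the mean proportion, $\rho$ is the within-interval correlation, $x_{it}$ are covariates with coefficients $\gamma$, and $b_i$ is a subject-level random effect.

```latex
\begin{align*}
P(Y_{it} = y) &=
  \begin{cases}
    \pi + (1-\pi)\,\mathrm{BB}(0 \mid m, \mu_{it}, \rho), & y = 0,\\[2pt]
    (1-\pi)\,\mathrm{BB}(y \mid m, \mu_{it}, \rho), & y = 1, \dots, m,
  \end{cases}\\[4pt]
\mathrm{BB}(y \mid m, \mu, \rho) &=
  \binom{m}{y}\,
  \frac{B\!\left(y + \alpha,\; m - y + \beta\right)}{B(\alpha, \beta)},
  \qquad
  \alpha = \mu\,\frac{1-\rho}{\rho},\quad
  \beta = (1-\mu)\,\frac{1-\rho}{\rho},\\[4pt]
\operatorname{logit}(\mu_{it}) &= x_{it}^{\top}\gamma + b_i,
  \qquad b_i \sim \mathcal{N}(0, \sigma_b^2).
\end{align*}
```

In this parameterization $\rho = 1/(\alpha + \beta + 1)$ captures the correlation between observations within a time interval, while the random effect $b_i$ induces correlation between intervals of the same subject.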
These models can be extended to study the reliability and validity of accelerometers in real time. The daily activity patterns of males and females in the Maastricht Study will be compared using the proposed method, which can be easily implemented in standard Bayesian software (e.g. JAGS). The validity of the MOX® will be studied in patients in cardiopulmonary rehabilitation.