Institute of Data Science lands two major research projects on FAIR data
The Institute of Data Science (IDS) at Maastricht University is the only European research group invited to take part in an international research project led by the National Center for Advancing Translational Sciences (NCATS) in the USA. “This will allow us to collaborate with the world leaders in translational research”, says Michel Dumontier, distinguished university professor of Data Science and director of the IDS. Another research project headed by the IDS is set to receive funding through the Dutch National Research Agenda (NWA). “We want to make it possible to access medical data for research purposes in a responsible manner, in line with the FAIR principles: Findable, Accessible, Interoperable, Reusable.”
In September 2016 the Dutch Ministry of Education, Culture and Science unveiled its plans to invest €30 million in the NWA. Some €20 million of this is earmarked for the ‘startimpuls’ programme, which funds research on selected themes. Seven of the 54 proposals submitted under the theme Responsible Value Creation with Big Data have been selected for funding, including the IDS’s ‘Analyzing partitioned FAIR health data’. The IDS will collaborate on the project with the Maastricht University Medical Centre (MUMC+), the research group Ethics, Legal & Social Impact (ELSI) of Maastricht University, the Maastro Clinic and Statistics Netherlands (CBS).
The aim of the project is to synthesise the medical data stored separately by different institutions, such as schools, hospitals and insurance companies. Because such data are privacy-sensitive, this must be done in a responsible manner. “In addition to leading the study, the IDS offers expertise in machine learning and research data management, with a view to applying the FAIR principles to these partitioned datasets. The ELSI research group is contributing expertise on the social, legal and ethical aspects of big data in health. The Maastro Clinic brings the Personal Health Train concept, which gives access to heterogeneous data sources while ensuring maximum privacy protection. CBS is providing socioeconomic data that will help us to identify the determinants of health. And finally, the MUMC+ is contributing the health data from the Maastricht Study, particularly on type 2 diabetes”, explains Dumontier. “Our main motivation is to understand the relationship between diabetes, lifestyle, socioeconomic factors and healthcare utilisation. We can then use this knowledge to inform guidelines with a major impact on public health.” The NWA grant covers a period of three years.
From basic research to medical application
Unlocking data to advance medical research is also one of the drivers behind the NCATS Biomedical Data Translator project. NCATS is a national medical institute in the US that funds projects seeking to translate basic research into effective treatments for disease. In the Translator programme, experts from 16 universities and research institutes are working on a computational tool to improve the connections between different types of biomedical data and provide insight into disease biology and treatment. The partners include Harvard, Johns Hopkins University, MIT’s Broad Institute and the University of Montreal; Maastricht University is the only European participant. The IDS is collaborating with a team from Columbia University on the development of a computational infrastructure to address the problems associated with taking multiple prescription drugs.
“Our work focuses on establishing key infrastructure to enable scientists to answer sophisticated research questions using public biomedical data and web services”, says Dumontier. “The problem is that data and software rarely follow the FAIR principles, and they require substantial amounts of preparation before you can even begin to answer any question. We’re working with our partners to identify the minimal interoperability requirements so that we can maximise reuse. For instance, we’ve established standards and infrastructure for describing and accessing datasets and web services, which can then be incorporated into programs to generate new insights into the side effects of drugs or the basis for genetic disease.”