24 May
13:00 - 17:30

FAIR Personal, Open and Distributed Data on the Web

The purpose of this symposium is to bring together a diverse set of stakeholders to explore the progress, opportunities, and challenges that lie in the discovery and reuse of personal, open, and distributed data on the Web in a manner that is scalable, responsible, and secure.

This event is brought to you by the Institute of Data Science at Maastricht University (IDS@UM), the Internet Technology and Data Science Lab at Ghent University, and industry partner OntoForce.


Abstracts and Bios

Accelerating biomedical discovery with an Internet of FAIR data and services

Michel Dumontier

With its focus on improving the health and well-being of people, biomedicine has always been a fertile, if challenging, domain for computational discovery science. Indeed, the existence of millions of scientific articles, thousands of databases, and hundreds of ontologies offers exciting opportunities to reuse our collective knowledge, were we not stymied by incompatible formats, overlapping and incomplete vocabularies, unclear licensing, and heterogeneous access points. In this talk, I will discuss our work to create computational standards, platforms, and methods to wrangle knowledge into simple but effective representations based on semantic web technologies that are maximally FAIR (Findable, Accessible, Interoperable, and Reusable), and to further use these for biomedical knowledge discovery. Only with additional crucial developments, however, will this emerging Internet of FAIR data and services enable automated scientific discovery on a global scale.

Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research focuses on the development of computational methods for scalable and responsible discovery science. Previously at Stanford University, Dr. Dumontier now leads the interfaculty Institute of Data Science at Maastricht University to develop sociotechnological systems for accelerating scientific discovery, improving human health and well-being, and empowering communities with ethical data-driven decision making. He is a principal investigator in the Dutch National Research Agenda, the European Open Science Cloud, the NCATS Biomedical Data Translator, and the NIH Data Commons Pilots. He is the editor-in-chief for the journal Data Science and an associate editor for the journal Semantic Web. He is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.

Research challenges for decentralized data

Ruben Verborgh

People have lost control over their personal data, which has become centralized in the hands of a few companies. The Solid project aims to give people back that control by separating data storage from data usage. While this leads to desirable ecosystem properties for both data producers and data consumers, it also comes with considerable challenges regarding scalability and interoperability. In this talk, I will introduce the Solid ecosystem and outline current work and the research challenges ahead.

Ruben Verborgh is a professor of Semantic Web technology at IDLab, Ghent University – imec, and a research affiliate at the Decentralized Information Group of CSAIL at MIT. Additionally, he acts as a technology advocate for Inrupt and the Solid ecosystem of apps that let you keep your own data. He aims to build a more intelligent generation of clients for a decentralized Web at the intersection of Linked Data and hypermedia-driven Web APIs. Through the creation of Linked Data Fragments, he introduced a new paradigm for query execution at Web-scale. He has co-authored two books on Linked Data, and contributed to more than 250 publications for international conferences and journals on Web-related topics.

Data2Services: enabling automated conversion of data to services

Alex Malic, Vincent Emonet

Current database silos do not allow researchers to easily integrate and link data from multiple sources, which is required to answer even simple biomedical questions. RDF is a good solution, but tools to build and maintain large and complex knowledge graphs are lacking. Using RDF as a common data model, we propose the Data2Services framework to integrate heterogeneous data sources under a single data representation and to deploy various services for accessing this data depending on its model. The framework is composed of small individual modules, designed for scalability and reusability, that can be run by providing a few parameters and are deployed as Docker containers for ease of use.

Vincent Emonet is a developer working on Linked Data for biomedical research at Maastricht University. He works on the Data2Services project, a framework to generate services from various data sources using a shared data model. Prior to that, he took part in the Bio2RDF project in Quebec, which aims to convert biological databases to RDF. He also worked on BioPortal and AgroPortal, two repositories for biomedical and agronomical ontologies, in Montpellier, France.

OptimAL: Optimal combination of humans and machine using active learning towards drug data quality assessment

Remzi Celebi, Amrapali Zaveri

Therapeutic intent, the reason behind the choice of a therapy and the context in which a given approach should be used, is an important aspect of medical practice. There is a need to capture and structure therapeutic intent for computational reuse, thus enabling more sophisticated decision-support tools and a possible mechanism for computer-aided drug repurposing.
Automated methods such as NLP or text mining fail to capture relationships among diseases, symptoms, and other contextual information relevant to therapeutic intent. Human curation is therefore required to manually label these concepts in the text, so that these methods can be trained to improve their accuracy. However, acquiring labeled data becomes expensive at a large scale. Machine learning (ML) methods, such as active learning algorithms, can limit the amount of human-labeled data required while still labeling data accurately.
Existing active learning algorithms help choose the data used to train the ML model so that it can perform better than traditional methods with substantially less training data. Humans provide the labels that train the model (labeling faster), the model decides which labels it needs in order to improve (labeling smarter), and humans again provide those labels. It’s a virtuous circle that improves model accuracy faster than brute-force supervised learning approaches, saving both time and money. The question that has not yet been answered, however, is what the optimal balance is between the ML algorithm's accuracy and the amount of human-labeled data required to achieve it. To address this, we propose OptimAL, which aims to combine machine learning with human curation iteratively, using active learning methods, in order to i) create high-quality datasets and ii) build and test increasingly accurate and comprehensive predictive models.
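The label, retrain, and query cycle described above can be illustrated with a minimal sketch. It is not the OptimAL implementation: the nearest-centroid model, the one-dimensional toy data, the `oracle` function standing in for a human curator, and all other names are assumptions made purely for illustration.

```python
from statistics import mean

def train(labeled):
    """Fit a toy nearest-centroid model: one mean value per class (0 and 1)."""
    return {c: mean(x for x, y in labeled if y == c) for c in (0, 1)}

def predict(model, x):
    """Assign x to the class whose centroid is closest."""
    return 0 if abs(x - model[0]) <= abs(x - model[1]) else 1

def margin(model, x):
    """A small margin means the model is unsure which class x belongs to."""
    return abs(abs(x - model[0]) - abs(x - model[1]))

def active_learning(pool, oracle, seeds, rounds):
    """Pool-based uncertainty sampling.

    `seeds` must contain at least one example of each class, and `oracle`
    is a hypothetical stand-in for the human who supplies labels.
    """
    labeled = [(x, oracle(x)) for x in seeds]
    unlabeled = [x for x in pool if x not in seeds]
    model = train(labeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        # The model picks the item it is least certain about ...
        query = min(unlabeled, key=lambda x: margin(model, x))
        unlabeled.remove(query)
        # ... a human labels it, and the model is retrained.
        labeled.append((query, oracle(query)))
        model = train(labeled)
    return model, labeled

# Toy setup: integers 0..100 in steps of 5; the "true" class flips at 60.
pool = list(range(0, 101, 5))
oracle = lambda x: int(x >= 60)
model, labeled = active_learning(pool, oracle, seeds=[0, 100], rounds=4)
accuracy = mean(predict(model, x) == oracle(x) for x in pool)
```

In this toy setting the queries concentrate around the class boundary, which is the sense in which the model "labels smarter": only a handful of human labels are requested out of the whole pool.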

Remzi Celebi has been a postdoctoral researcher at the Institute of Data Science (IDS) at Maastricht University since August 2018. He completed his PhD in Computer Engineering at Ege University, Turkey. His research interests include machine learning, knowledge graphs and drug repurposing. He is currently working on graph embedding methods and deep learning methods to extract hidden knowledge from big knowledge graphs and unstructured data.

Analyzing partitioned FAIR health data responsibly

Chang Sun

One of the unique characteristics of Big Data in health is that it is extremely partitioned across different entities. Citizens, hospitals, insurers, municipalities, schools, etc. all have a partition of the data and nobody has the complete set. Sharing across these entities is not easy due to administrative, political, legal-ethical and technical challenges. To tackle this challenge, we developed a scalable technical and governance infrastructure that can combine access-restricted data from multiple entities in a privacy-preserving manner. In addition, I will present the FAIRHealth project, a collaboration with Statistics Netherlands and the Maastricht Study, as a use case.

Chang Sun is a PhD candidate who joined IDS in October 2017. She is also a co-founder and board member of the Youth Data Science Community@UM. After graduating from the Artificial Intelligence Master Program, she started her PhD at IDS to continue her research in the data science domain. Her focus is on privacy-preserving data mining and distributed machine learning to solve the problem of analyzing sensitive data across multiple independent data parties. Currently, she is working on the FAIR-Health project, which analyzes FAIR health data responsibly in collaboration with CBS (Statistics Netherlands) and the Maastricht Study.

Automating data reuse for Mobility as a Service

Pieter Colpaert

Mathematicians have introduced various algorithms to calculate the best route home or a shortest path. In each of these algorithms, the documented method is to (1) manually collect the data on one machine, (2) fit the data to the right internal model, and only then (3) evaluate the query. To innovate in route planning today, we need to automate the data integration itself. Open Data describes exactly this challenge: publishing data for maximum reuse. In this presentation, I will introduce the Linked Connections, Routable Tiles and Tree Ontology building blocks. With these, a reuser can exploit the HTTP protocol to discover data, keep it in cache, and evaluate questions over multiple data sources on the client side. Thanks to a high cache hit rate on the server, data publishers achieve good cost-efficiency.
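As an illustration of the three steps above (collect, fit to a model, evaluate), the following minimal sketch scans a departure-time-ordered list of connections to find an earliest arrival, in the style of a connection scan. It is an assumption-laden toy rather than the speaker's implementation: the `Connection` type, the stop names and the in-memory list are hypothetical, and a real client would fetch such connections over HTTP instead.

```python
from typing import NamedTuple, Optional

class Connection(NamedTuple):
    """One vehicle hop between two stops (times in minutes after midnight)."""
    dep_stop: str
    arr_stop: str
    dep_time: int
    arr_time: int

def earliest_arrival(connections, source: str, target: str,
                     start_time: int) -> Optional[int]:
    """Scan connections in departure order and track the earliest known
    arrival time per stop. Assumes arr_time >= dep_time for every connection."""
    earliest = {source: start_time}
    for c in sorted(connections, key=lambda c: c.dep_time):
        reached_at = earliest.get(c.dep_stop)
        # A connection is usable only if we are at its departure stop in time.
        if reached_at is not None and reached_at <= c.dep_time:
            if c.arr_time < earliest.get(c.arr_stop, float("inf")):
                earliest[c.arr_stop] = c.arr_time
    return earliest.get(target)  # None if the target cannot be reached

# (1) "Collected" data: a hypothetical timetable with three connections.
timetable = [
    Connection("A", "C", 485, 530),  # slow direct service
    Connection("A", "B", 480, 490),
    Connection("B", "C", 495, 510),  # faster with a transfer at B
]

# (2) The model is the sorted list; (3) the query is a single scan.
via_transfer = earliest_arrival(timetable, "A", "C", start_time=480)
too_late = earliest_arrival(timetable, "A", "C", start_time=500)
```

Because the query is one pass over an ordered list, the data can be split into pages ordered by departure time and cached independently, which is what makes client-side evaluation over HTTP plausible.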

I am Pieter Colpaert, a postdoctoral researcher at IDLab – Ghent University – imec, born in the year the Web was invented. I have held a master's degree in engineering from Ghent University since 2012 and a PhD from the same university since 2017. Before starting my research at IDLab, I ran a digital signage start-up called FlatTurtle, visualizing datasets in and around office buildings. Back in 2012, I co-founded a not-for-profit organisation to promote Open Knowledge in Belgium. I have mainly been active in the transport domain, with, for example, the iRail project. However, this also involved projects in the e-gov domain, smart cities, administrative simplification and Open Data in general. My team’s research focuses on Linked Open Data interfaces, in order to publish data for maximum reuse. As of January 2017, I’m the first “chief technology” of Smart Flanders, in which my task is to align the open data strategies among various stakeholders in Flanders.

Knowledge Graph construction and validation

Anastasia Dimou

The real power of the Semantic Web will be realized once a significant number of software agents that require information derived from diverse sources become available. However, software agents still have limited ability to interact with heterogeneous data. Intelligent software agents that function with Semantic Web technologies do not have enough data to work with, and human agents do not want to put in the effort to provide Knowledge Graphs until there are software agents that use them. This talk will discuss a set of complementary techniques that contribute to generating high-quality Knowledge Graphs with the least effort.

Anastasia Dimou is a senior/post-doctoral researcher at imec and IDLab, Ghent University. Anastasia joined the IDLab research group in February 2013. Her research interests include Knowledge Graph Generation and Publication, Data Quality and Integration, Knowledge Representation and Management. As part of her research, she investigated a uniform language to describe rules for generating high-quality Linked Data from multiple heterogeneous data sources. Anastasia currently conducts research on automated Knowledge Graph generation and publication workflows, data validation and quality assessment, query answering, and knowledge integration from big stream data. Her research activities are applied in different domains, such as IoT, manufacturing, media and advertisement, and have led to the development of the RML.io tool chain. She is involved in different national, bilateral, and EU projects, has authored several peer-reviewed publications presented at prominent conferences and journals such as ESWC, ISWC, JWS and SWJ, has participated in several PCs, and has co-organized tutorials and workshops.

How to measure the value of making data FAIR?

Filip Pattyn

Generating data that is more FAIR, or converting existing data into more FAIR data, comes at an effort or cost. The goal of the FAIR movement is to ease the (re-)use of data and thus reduce the 'effort' needed to make that data actionable. The key is to find a balance between the cost of making data more FAIR and the return generated by reducing the cost for data consumers. In this talk, I will explain a metric for calculating the effort or cost of making data more FAIR and demonstrate it with examples.

