Automatic Semantic Enhancement of the Data2Services Pipeline

Semantically enhanced data plays a crucial role in information management. The richness of such data, compared with other popular formats, can benefit users at all levels of experience. In my internship project, I will enhance the semantic mappings that the Data2Services framework produces between the source data and the target data model.


Data now comes in many different formats and structures (e.g. CSV, XML, RDB). The size and amount of available data, together with its diversity, have led to a growing number of tools for integrating it. As a result, data processing is becoming increasingly difficult. Semantically enhanced data can give the user richer information than other available formats, so a tool that integrates such data should be accessible to experts and non-experts alike. The Data2Services framework automatically produces "semantic" mappings between the source data and the target data model. However, the framework cannot tap into existing terminologies and ontologies to annotate data with specific concepts and relations. To address this limitation, we will use (semi-)automated methods that combine i) NLP-based concept recognition, ii) profiling of the value space, range and lexical datatypes, and iii) machine learning methods for concept assignment.
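
To make step ii) more concrete, the following is a minimal sketch of how the values of a single source column might be profiled. It is an illustration only, not part of Data2Services: the names LEXICAL_PATTERNS, lexical_type and profile_column are hypothetical, and the regular expressions cover just a handful of lexical datatypes chosen for the example.

```python
import re
from collections import Counter

# Hypothetical lexical patterns, checked in order (an assumption for this
# sketch; a real profiler would cover many more datatypes).
LEXICAL_PATTERNS = [
    ("boolean", re.compile(r"^(true|false)$", re.IGNORECASE)),
    ("integer", re.compile(r"^[+-]?\d+$")),
    ("decimal", re.compile(r"^[+-]?\d*\.\d+$")),
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
]


def lexical_type(value):
    """Return the name of the first pattern the value matches, else 'string'."""
    text = value.strip()
    for name, pattern in LEXICAL_PATTERNS:
        if pattern.match(text):
            return name
    return "string"


def profile_column(values):
    """Summarise the value space of one column given its raw string values."""
    # Ignore empty cells so they do not distort the counts.
    non_empty = [v.strip() for v in values if v and v.strip()]
    type_counts = Counter(lexical_type(v) for v in non_empty)
    # The numeric range is computed over integer and decimal values only.
    numeric = [
        float(v) for v in non_empty
        if lexical_type(v) in ("integer", "decimal")
    ]
    return {
        "count": len(non_empty),
        "distinct": len(set(non_empty)),
        "dominant_type": type_counts.most_common(1)[0][0] if type_counts else None,
        "type_counts": dict(type_counts),
        "range": (min(numeric), max(numeric)) if numeric else None,
    }


if __name__ == "__main__":
    print(profile_column(["12", "7", "3.5", "", "7"]))
```

A profile of this kind could serve as one input to concept assignment. For instance, a column whose dominant type is "date" is unlikely to denote a gene identifier. How the profile is actually combined with concept recognition and machine learning is left open here.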