The aim of this project is to develop an automated pipeline for translating the information content of case law records in European and Dutch national law databases into the semantically rich data publishing format RDF (Resource Description Framework), thereby enabling: 1) advanced querying and analysis of the information through a host of RDF tools and services, and 2) linking of semantically related information from other application domains.
There has been significant effort in recent years to publish the metadata and information content of court decisions in online, public databases such as EUR-LEX (http://eur-lex.europa.eu). This has enabled data scientists and empirical legal researchers to investigate how court decisions evolve over time, what factors influence these decisions and whether the law is being applied consistently.
However, current case law databases use disparate terminology, data formats and data access methodologies. National databases also remain isolated from each other as well as from European and international level databases, which makes it difficult to answer case law research questions on a global scale. In addition, because some databases still use legacy data formats, researchers are limited to data analytics tools that are outdated and less powerful than those available for modern formats.
We propose to develop an automated pipeline to convert case law data into the semantically rich RDF (Resource Description Framework) format. RDF is the W3C (World Wide Web Consortium) recommendation for representing linked data on the Web. Representing the information in RDF makes the meaning of the data interpretable by computers, because the terminology is defined in standardized vocabularies called “ontologies” that are also published on the Web. RDF also addresses the problem of unlinked data, because disparate terminology across databases is mapped or unified using ontologies. This enables the automatic integration of case law data across databases, as well as the use of open source, actively supported RDF software to conduct advanced querying, visualisation and analytics on the data, regardless of its original source.
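The mapping idea described above can be illustrated with a minimal sketch. It is not the project's pipeline: the database names, field names, base URI and ECLI values below are invented placeholders, and the only real vocabulary used is Dublin Core Terms (`dcterms:date`, `dcterms:creator`). The sketch shows how two records with different field names might be converted into RDF triples that share the same predicates, serialized in the N-Triples syntax using only the Python standard library.

```python
from urllib.parse import quote

DCTERMS = "http://purl.org/dc/terms/"   # Dublin Core Terms namespace
BASE = "http://example.org/case/"       # hypothetical base URI for case records

# Hypothetical field names for two source databases, each mapped to a
# shared predicate. These names are assumptions made for illustration.
FIELD_MAPPINGS = {
    "eu_db": {"decision_date": DCTERMS + "date",
              "court_name": DCTERMS + "creator"},
    "nl_db": {"datum_uitspraak": DCTERMS + "date",
              "instantie": DCTERMS + "creator"},
}


def record_to_triples(source, record):
    """Convert one record into (subject, predicate, literal) triples."""
    # The case identifier is percent-encoded so it is safe inside a URI.
    subject = BASE + quote(record["ecli"], safe="")
    mapping = FIELD_MAPPINGS[source]
    return [(subject, mapping[field], value)
            for field, value in record.items() if field in mapping]


def escape_literal(value):
    """Escape characters that may not appear raw in an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(triples):
    """Serialize triples with plain string literals as N-Triples lines."""
    return "\n".join(f'<{s}> <{p}> "{escape_literal(o)}" .'
                     for s, p, o in triples)


# Placeholder records; the identifiers and values are made up.
eu_record = {"ecli": "ECLI:EU:C:1900:1",
             "decision_date": "1900-01-01",
             "court_name": "Court of Justice"}
nl_record = {"ecli": "ECLI:NL:HR:1900:1",
             "datum_uitspraak": "1900-02-02",
             "instantie": "Hoge Raad"}

triples = (record_to_triples("eu_db", eu_record)
           + record_to_triples("nl_db", nl_record))
print(to_ntriples(triples))
```

After conversion, both records expose their decision date under the same predicate, so a single query over `dcterms:date` would cover data from either source. A real pipeline would likely use an RDF library and a richer legal ontology rather than hand-built strings.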