Limburgish on the digital map

For years, Limburgish has suffered from a severe shortage of digital resources and technical systems to support and study the language and its dialects and to make them accessible. This shortage hinders not only scientific research but also the development of digital applications such as speech recognition, machine translation and other AI-based technologies.

A new project, carried out by Andreas Simons under the supervision of Leonie Cornips (Chair of Language Culture in Limburg) and funded by the Hoes veur ’t Limburgs, now aims to change this.

Why a Limburgish Corpus?

Modern language technologies and linguistic research depend on so-called ‘corpora’: large databases of source material such as books, poetry, internet blogs and conversations. Limburgish, however, is among the most poorly documented Germanic languages. As a result, the language is barely accessible to researchers, developers, educators and governments. This lack of digital availability creates a vicious circle in which the visibility, use and prestige of Limburgish continue to decline.

Despite these challenges, interest in Limburgish is growing. Young people increasingly use the language on social media, and a range of resources, such as local literature, dialect dictionaries and theatre scripts, already exists. What is missing is a central, publicly accessible repository for this material.

Digital infrastructure in the Hoes veur ’t Limburgs

Within one year, a digital infrastructure (the digital resources and technical systems needed to store, manage and provide access to the language) will be set up to collect, manage and expand a Limburgish Corpus. By the end of the project, a basic version of the corpus will be available for further scientific research and industry applications. An edited version of the corpus will be made publicly available so that researchers, students and developers can work with the data. This should create a snowball effect for further research and establish Limburgish as a language that can be studied.

The infrastructure will also be easy to expand, so that future projects can add to the corpus and develop it further. This paves the way for training language models and other applications, similar to initiatives for other minority languages.


Picture by: Laura Knipsael
