Limburgish on the digital map
For years, Limburgish has suffered from a severe shortage of the digital resources and technical systems needed to support and study the language and its dialects and to make them accessible. This gap hinders not only scientific research but also the development of digital applications such as speech recognition, machine translation and other AI-based technologies.
A new project, carried out by Andreas Simons under the direction of Leonie Cornips (Chair of Language Culture in Limburg) and funded by the Hoes veur ’t Limburgs, now aims to change this.
Why a Limburgish Corpus?
Modern language technologies and linguistic research depend on so-called ‘corpora’: large databases of source material such as books, poetry, internet blogs and conversations. However, Limburgish is among the most poorly documented Germanic languages, which makes it hard for researchers, developers, educators and governments to work with. This lack of digital availability creates a vicious circle in which the visibility, use and prestige of Limburgish continue to decline.
Despite these challenges, there is growing interest in Limburgish. Young people increasingly use the language on social media, and various resources such as local literature, dialect dictionaries and theatre scripts exist. What is missing is a central and publicly accessible repository for this material.
Digital infrastructure in the Hoes veur ’t Limburgs
Within one year, the project will set up a digital infrastructure, meaning the digital resources and technical systems needed to store, manage and provide access to language data, in order to collect and compile a Limburgish Corpus. At the end of the project, a basic version of the corpus will be available for further scientific research and industry applications. An edited version will be made publicly available so that researchers, students and developers can work with the data. This should create a snowball effect for further research and position Limburgish as a ‘studyable’ language.
The infrastructure will also be easily expandable so that future projects can add to and further develop the corpus. This paves the way for training language models and other applications, similar to initiatives for other minority languages.
Picture by: Laura Knipsael