Data science: changing the world one bit at a time
The amount of data produced by scientists increases by one third every year, according to the European Commission. How can they find their way around this mountain of data? This is the key question intriguing the new distinguished university professor of Data Science, Michel Dumontier. The 41-year-old Canadian researcher is relocating to Maastricht from the prestigious Stanford University, where he focused on drug discovery and precision medicine.
When Dumontier was invited to establish an interdisciplinary institute for data science in Maastricht, he didn’t have to think twice. “If you want to change the world you have to collaborate. What Maastricht has in mind is unique. Nowhere else in the world will you find an interfaculty institute for data science like the one planned here.”
Rewind a decade or so and a data scientist was typically a computer programmer with a highly technical background. With all the software that has since been developed, Dumontier says, it’s now someone who knows how to ask the right questions. “If you know what problem you want to solve, you can figure out what data you need.” Dumontier himself never studied computer science; he is a self-taught programmer who originally trained as a biochemist.
Given his first computer at the age of five, he was immediately hooked. His mother had to coax him out from behind the screen to play outside from time to time. But his father was just as enthusiastic as he was; as he saw it, computers were the future. Dumontier was constantly playing video games with his younger brother. Still, when it was time to choose a study programme he opted for biochemistry, having always been interested in the natural sciences. It was only during his PhD that he decided definitively to focus on programming.
“Initially it was a very frustrating experience, but at some point I got the hang of it. I learned how to interact with the computer; how to get it to do the things I wanted it to do. Once I learned how to translate my thoughts into code, I realised that almost anything I could think of, I could do on the computer. That was an enormous ‘aha’ moment. It’s a kind of method-driven creative engagement. Once I discovered that, I never looked back at being a biochemist. Biological research is actually really difficult. You do an experiment one day and get certain results, the next day you do the same experiment and the results are totally different. And you’re not sure why: is it something I did, or was the result wrong in the first place? If a computer program doesn’t work, it can only be your own fault. And you can fix it.”
He calls this the ‘empowering aspect’ of data science, and it is something he aims to impart to his students. “It’s my hope that when we give undergraduates and even high school students some data and methodological training and then have them analyse a problem from their environment, they can come up with creative solutions and recommendations for policymakers and managers. Once you’ve learned the basic tools of data science, you can start thinking about all sorts of problems, from the impact of immigration to the consequences of ageing here in the region.”
At Stanford, Dumontier developed a way of making connections between biomedical data, enabling predictive analyses to be made from the mountain of information. “We’ve built a network, which we call a knowledge graph, made up of data from individual molecules, chemicals, genes and proteins all the way up to geographic information about populations from all over the world. We use a set of methods that not only allows us to answer a question, but also to find connections that would otherwise remain hidden. For example, we use this drug for that disease, but maybe we could use it for another disease; it’s just that nobody’s ever explored it.”
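To make the idea of a knowledge graph more concrete, here is a minimal sketch in Python. It is not Dumontier's actual system: the drug, protein and disease names, the three relation types and the function names are all invented for illustration. It only shows how facts stored in one uniform shape can be chained to surface a drug-disease link that nobody has recorded directly.

```python
from collections import defaultdict

# Hypothetical toy facts, stored uniformly as (subject, relation, object).
# None of these names refer to real drugs, proteins or diseases.
TRIPLES = [
    ("drug_A", "targets", "protein_X"),
    ("drug_A", "treats", "disease_1"),
    ("protein_X", "associated_with", "disease_1"),
    ("protein_X", "associated_with", "disease_2"),
    ("drug_B", "targets", "protein_Y"),
    ("drug_B", "treats", "disease_2"),
    ("protein_Y", "associated_with", "disease_2"),
]


def build_index(triples):
    """Map each (subject, relation) pair to the set of objects it points to."""
    index = defaultdict(set)
    for subject, relation, obj in triples:
        index[(subject, relation)].add(obj)
    return index


def candidate_new_uses(triples):
    """List (drug, disease) pairs linked via a shared protein but not yet
    recorded as a known treatment."""
    index = build_index(triples)
    drugs = sorted({s for s, r, _ in triples if r == "targets"})
    candidates = []
    for drug in drugs:
        known = index[(drug, "treats")]
        linked = set()
        # Follow drug -> protein -> disease paths through the graph.
        for protein in index[(drug, "targets")]:
            linked |= index[(protein, "associated_with")]
        for disease in sorted(linked - known):
            candidates.append((drug, disease))
    return candidates


print(candidate_new_uses(TRIPLES))
```

In this toy graph the only suggestion is that drug_A might be worth exploring for disease_2, because both are connected to protein_X. A real knowledge graph would hold vastly more entities and would need statistical methods to rank such candidates, which this sketch leaves out.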
What makes their method so special? “In principle nothing. There are many different ways to connect data, but it’s about adopting a standard way of storing and processing data.” Having a single standard allows data to be used and reused by others – and the more accessible the data are, the better scientists can validate their research and the greater the chance of finding solutions to important problems. “It’s similar to the reason the internet is so successful. All you needed were two technical tools: TCP/IP, a way of transporting information around a network, and HTML, a means of putting information on a page so that it can be rendered by a browser. Now that this is standard, you only need one browser to view the web of data. But imagine nobody had that standard; you’d need a million browsers to be able to read all those different pages. So these tools give us a uniform way of sharing information and making it accessible. My group focuses on what the minimum standards are to make scientific data accessible in a uniform way. That’s the technology we’ve developed and want to bring to the attention of others.”
This is what will happen in the Institute for Data Science, which currently goes by the working name IDS@UM. “It’s an opportunity to see if our methods can be applied to these data, the problems here, and not only to the biomedical data I used at Stanford. The institute will be a unifying space. We’ll be working closely with researchers from the DKE, but also with data scientists from all disciplines, from economics and humanities to health and life sciences, so that we can share different methods and applications.
“We want to give a home to all those data scientists who are now working alone at UM, and create a good academic environment: what is the latest and greatest development? How well does it work? Could econometric methods also be used for human health, and vice versa? Can a discovery in one domain also work in another?” Thinking small is not part of Dumontier’s repertoire. He hopes they will ultimately be joined by universities worldwide. “If you want to change the world, you can’t do it alone. Wouldn’t it be great if a doctor could give you the right treatment from day one, tailored to your individual characteristics – don’t we all want that?”
Privacy is naturally a big issue, especially in medical science. “People worry that the information they provide could be used against them. But the solution is not to not share information. I think the solution is to have rules and laws that punish those who abuse the privilege of having access to your information. One of the reasons medical research is so slow is because medical data are not easily accessible.
“Let’s not put up barriers to accessing data. There are far more people who would do good things with data than people who would do bad things with them. So let’s not punish the people who try to do good.”
Dumontier and his partner, an internist, are already house-hunting in Maastricht. He met her at Stanford where she was doing an internship in biomedical informatics. “We have this shared interest in informatics.” Moving from the States to Europe was no problem. His father was a salesman, and for his work the family moved around a lot. “I was born in Winnipeg, but we lived in Woodstock Ontario, Montreal and Thunder Bay, where I went to high school. I’m at home wherever I am, and that’s now Maastricht – one of the nicest cities I’ve seen, by the way.”
Barely in his 40s, with a high-flying career – does he spend all his time working? “Not at all, I stick to the standard working day. People tend to think the more hours you work the more productive you are. For me it’s the opposite. If you take more time to think and reflect, you’re more effective in the things you do.”
What does he do to switch off? “Two things: first, I have a rabbit. She came with me from Canada to America, and she’ll move to Maastricht too. She’s probably the most widely travelled rabbit in the world”, he laughs. “She’s seven years old. I had another rabbit too, but he passed away recently. So we’re looking forward to getting some new rabbits here. Watching rabbits just doing their thing is very Zen. You get to know them. They’re intelligent individuals and they have huge personalities. It helps me clear my head and just relax for a bit.
“The other thing I do is play video games. I can spend hours doing that. People have different hobbies. Some people run; video games are my thing. It’s an opportunity to think completely differently. It’s not addictive, it’s more therapeutic. I’m very good at it and that’s also rewarding – you need something to do that you’re good at.”
Michel Dumontier (1975) was associate professor of Medicine (Biomedical Informatics) at Stanford University from 2013 to 2017. He obtained his PhD in biochemistry at the University of Toronto in 2004. From 2006 to 2013 he was assistant and associate professor at Carleton University in Ottawa. He holds various extracurricular positions, including chair of the AMIA Knowledge Representation and Semantics Working Group since 2016 and scientific director of Bio2RDF (an open-source project that creates and connects data for the life sciences) since 2010. He was appointed university professor of Data Science at Maastricht University on 1 January 2017.