An open book that doesn’t make sense
DKE’s Jerry Spanakis on responsible machine learning models, what algorithms can and can’t ‘understand’ and why that matters to all of us.
This article is meaningful. Not in the sense that it’s particularly relevant: that’s for you to decide... But there is an intended meaning which has been translated into text, which has been translated into a sequence of 0s and 1s, which has been translated back into text, and which you are now translating into meaning.
So far, so good, but interpreting the infinity of content on the internet isn’t that straightforward. Jerry Spanakis thinks about how to structure the unstructured: he is using algorithms to find ways to make sense of all the data at our disposal, to extract useful information and to add semantics. Ultimately, his goal is to make those insights useful to people.
Algorithms don’t quite get it
“You need to keep in mind that, for example, Twitter is not very representative of the actual population. Information and data online are a distortion of the real world – we need to critically question that data when determining the potential scope and limitations of our research.”
Spanakis adds another disclaimer: “To say that an algorithm ‘understands’ language is an overstatement. The progress machine translation has made is incredible; just compare Google Translate now to what it was ten years ago!” Yet, he warns Greekly: “We should be wary of hubris. Algorithms are still quite bad with things like context and common sense.”
The main challenge he has faced in social media analysis is its constant evolution, moving from text to images to videos and, more recently, to videos that vanish after a set time. “It becomes a lot trickier with things like TikTok, for example, which I am now trying to familiarise myself with. Thank goodness I have my students to update me on the latest trends and developments.”
Helping those sold to
Sparked by his experiences around a delayed flight, Spanakis tries to take care of the less tech-savvy. “The airline did not respond to my compensation claim at all until I pressured them on social media. When they denied I was eligible, I took my case to the regulatory authorities. Only then, after six months, did I get my money back. I’m looking for ways to make these complicated processes accessible to all consumers.”
One consumer protection measure that is very hard to police requires social media influencers to declare when they are being paid to advertise products. So Spanakis is collaborating with the Law faculty. “You can train algorithms to detect logos or even voice modulations, but for now my focus is on building algorithms that analyse the captions of pictures and videos, and especially the comments underneath.”
A computer scientist and engineer by training, Spanakis is acutely aware that technology doesn’t exist in a vacuum. An example with dire real-world consequences is hate speech. “There’s actually very little moderation. It’s challenging to detect – an algorithm assigns an objective value to statements made in vastly different cultural contexts. There are many false positives because algorithms struggle with irony, for example. Also, these companies are profit-driven and it might not be in their interest to moderate incendiary content that drives engagement.”
Objectively spotting nonsense
A prominent recent example of social media moderation was Twitter’s flagging of Donald Trump’s tweets disputing the legitimacy of the US election results. “I imagine there was a great amount of manual work involved. Algorithms can filter things, but you still need humans to check them – and given the sheer volume of information, that’s not practical.”
“For content to be successful, to go viral, it needs to appeal to people. But in order to reach them in the first place, it needs to be promoted by the algorithm of the big social media platforms – and we don’t know how these algorithms work.” The algorithms promote content likely to command attention, the basic currency of social media, but that is an economic, not a civic imperative.
Spanakis worries about the societal impact of how those news feeds are curated – and tries to do his part: “Within ten hours, our algorithm can determine with 80% accuracy whether content is fake, just by the pattern of its spread. On Twitter, for example, reputable news stories tend to spread over a longer period and are shared by people with more followers who also follow more people. Fake news tends to spread in bursts.”
A crucial blow in the fight against fake news? Not quite. “You have to keep in mind the sheer numbers involved and how many people can be reached and, more importantly, influenced within those ten hours.” Spanakis is now studying the corona-related news of the past year, examining to what extent a text’s emotional slant influences people’s reactions.
While people have always been susceptible to manipulation, digital technology provides hitherto unimaginable tools for bad faith actors. Research and development of viable safeguards will be crucial to the functioning of civic society – and common sense at large.
Jerry Spanakis is assistant professor at the Department of Data Science and Knowledge Engineering (Faculty of Science and Engineering) and the Maastricht Law+Tech Lab (Faculty of Law).