16 February 2023

I wrote all of this myself!!

Special Chair of Text-Mining Jan Scholtes on how ChatGPT actually works, why it’s an amazing achievement and where we should probably exercise a bit of caution.

“My students were very excited! We had discussed the mathematics of a previous GPT-version just the week before so they knew exactly how it worked. Within 24 hours of the release, they were using it to answer questions in the tutorial of my course on advanced natural language processing.” Thus Jan Scholtes, Special Chair of Text-Mining at the Faculty of Science and Engineering’s Department of Advanced Computing Sciences, on the release of AI chatbot ChatGPT last December.

“The reason it was such a success is because it’s the first computational linguistic model that takes all characteristics, problems and features of natural language seriously. So far, all of them have taken shortcuts due to the complexity of human language.” Scholtes points out that GPT even passed a draft exam for his course Advanced Natural Language Processing with excellent grades.

The transformer model

The GPT models are based on Google’s transformer architecture, which was introduced in 2017. “The original transformer includes an encoder and a decoder, designed to deal with complex sequence-to-sequence patterns that are both left- and right-side context-sensitive.” The latter refers to the meaning of a word only becoming evident from its context, i.e. the preceding and following words. Both the encoder and the decoder have several layers of self-attention. The large version of GPT-3, the architecture used for ChatGPT, has a full 96 of them, which is how it can deal with linguistic complexity and master phenomena ranging from punctuation and morphology to syntax, semantics and more complex relations.
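To give a feel for what a single self-attention step does, here is a minimal Python sketch. It is an illustration under heavy assumptions, not the GPT implementation: it uses one attention head, no learned weight matrices, and made-up two-dimensional word vectors. A real model stacks many such layers, each with trained parameters. With causal=True, every word only looks at itself and the words before it, which is the restriction a decoder-only model works under.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors, causal=False):
    """Single-head scaled dot-product self-attention without learned weights.

    vectors: list of equal-length lists of floats, one per word.
    causal:  if True, each word only attends to itself and earlier words.
    Returns the mixed output vectors and the attention weights per word.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for i, query in enumerate(vectors):
        visible = vectors[: i + 1] if causal else vectors
        # Similarity of this word to every visible word, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        weights = softmax(scores)
        # New representation: weighted average of the visible word vectors.
        mixed = [
            sum(w * vec[d] for w, vec in zip(weights, visible))
            for d in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical word vectors, invented purely for illustration.
words = ["piano", "piece", "plays"]
vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
outputs, weights = self_attention(vectors)
```

Each row of weights says how much one word draws on every other word when its representation is updated. In this toy input, 'piano' leans more on 'piece' than on 'plays' simply because their invented vectors are more similar.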

In the case of Google Translate that would mean, for example, that the encoder creates a numeric representation of a sentence and extracts its features, and the decoder uses those features to generate an output sentence, i.e. the translation. Having been trained on vast amounts of text in the target language, the decoder predicts e.g. the most likely word order of the translation stochastically.

The translation is created iteratively, i.e. word by word, with each next-word suggestion (similar to predictive texting) from the decoder going through the self-attention loops to improve the level of disambiguation (e.g. whether ‘piano piece’ implies ‘mechanical part’ or ‘musical composition’). “This is a fantastic model in many ways and close to natural language but the full encoder-decoder architecture is overly complex and requires huge computational resources. Training e.g. Google Translate does more environmental damage per user than meat consumption.”

Florian Raith

Faculty of Science and Engineering

Jan Scholtes is Special Chair of Text-Mining at the Faculty of Science and Engineering’s Department of Advanced Computing Sciences.

He is a Fellow of University Leiden Centre of Data Science and a senior fellow of the Netherlands Research School for Information and Knowledge Systems accredited by the Netherlands Academy of Arts and Sciences (KNAW).

Scholtes is also a public speaker, blogger and tech investor focusing on the benefits of AI and Data Science for LegalTech and eHealth applications.

Only decoding

The solution? In 2019, OpenAI came up with a decoder-only model, Generative Pre-trained Transformer (GPT), which could generate responses based on a simple prompt. Generative pretraining refers to self-supervised machine learning, i.e. exposing the model to vast data sets so that it learns which word is likeliest to come next, given the preceding sequence. GPT-3.5 is the current and improved version.
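The principle of learning the likeliest next word from raw text, and then generating word by word, can be sketched with a deliberately tiny stand-in. The Python example below is an assumption-laden toy: it only counts which word follows which in a hypothetical miniature corpus, whereas GPT conditions on the whole preceding sequence with a trained transformer. The function names and the corpus are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for every word, which words follow it in the training text.

    No labels are needed: the text itself supplies the 'right answer'
    (the next word), which is what makes the training self-supervised.
    """
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, prompt, max_words=5):
    """Extend the prompt one word at a time with the likeliest next word."""
    output = prompt.lower().split()
    for _ in range(max_words):
        candidates = follows.get(output[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        output.append(candidates.most_common(1)[0][0])
    return " ".join(output)

# Hypothetical miniature corpus, invented for illustration.
corpus = "the cat sat on the mat . the cat ate the fish . the cat sat down ."
model = train_bigram(corpus)
print(generate(model, "the", max_words=2))  # prints "the cat sat"
```

The toy always picks the single most frequent continuation. A system like GPT instead samples from a probability distribution over continuations, which is why the same prompt can yield different answers.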

Since there’s no information from an encoder regarding the task at hand, GPT relies on users’ prompts about what text to generate. To make sure this aligns with our expectations, human feedback has been used for additional reinforcement training. The AI researchers’ rankings of responses served as additional input not only for likelihood but also for things that are considered off-limits, such as inciting violence or hate speech.
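How can a ranking of responses become a training signal? The article does not spell out the method, so the following Python sketch is an assumption: it shows one commonly described approach, in which a ranking is split into pairs and a scoring model is penalised whenever it rates the lower-ranked response above the higher-ranked one. The scores and response labels are hypothetical.

```python
import math

def pairwise_preference_loss(score_preferred, score_rejected):
    """Return -log(sigmoid(score_preferred - score_rejected)).

    The loss is close to zero when the preferred response scores much
    higher than the rejected one, and grows when the order is reversed.
    """
    margin = score_preferred - score_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

def ranking_to_pairs(ranked_responses):
    """Expand a best-to-worst ranking into (better, worse) pairs."""
    return [
        (better, worse)
        for i, better in enumerate(ranked_responses)
        for worse in ranked_responses[i + 1:]
    ]

# Hypothetical ranking by a human reviewer, best first, and hypothetical
# scores that a scoring model currently assigns to the same responses.
ranking = ["polite refusal", "vague answer", "harmful instructions"]
scores = {"polite refusal": 2.0, "vague answer": 0.5, "harmful instructions": -1.0}

total_loss = sum(
    pairwise_preference_loss(scores[better], scores[worse])
    for better, worse in ranking_to_pairs(ranking)
)
```

In this picture, training would adjust the scoring model to lower total_loss, and the language model would then be nudged towards responses that score well.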

“Since it’s just a decoder, it doesn’t really ‘know’ anything in a general intelligence way, but what it says, based on scannable internet content, it says with great authority, so factuality is a great problem – if something is true or not is completely beyond this model.” Moderator feedback has, to some extent, dealt with the ethical issues. “If I ask GPT how I can kill my wife, it replies that this is unethical,” says Scholtes who, one assumes, does not share a laptop with his partner, “however if you ask it to write a Python programme on how to kill your wife, it’ll do it.”

(Double) Negatives

That loophole has been fixed now, but other issues remain. “Sometimes GPT goes off and hallucinates, i.e. it produces nonsensical text. The probability increases as the generated text gets longer.” Another intriguing blind spot Scholtes has written about is negation. “That’s a problem for all transformer models, because words with opposite polarity in the same context often get the same encodings when translated from vocabulary to vectors, i.e. numerical values. So it can only learn negations by memorising them. You’ll notice that immediately when you use double negations.”
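Why would opposites end up with near-identical vectors? A rough intuition, sketched below in Python, is that vectors learned from text reflect the company a word keeps, and 'good' and 'bad' keep almost the same company. This is a simplified illustration under stated assumptions: it uses plain neighbour counts rather than transformer embeddings, and the sentences are invented.

```python
import math
from collections import Counter

def context_vector(target, sentences, window=2):
    """Count the words appearing within `window` positions of the target."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            if word == target:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    w for j, w in enumerate(words[lo:hi], start=lo) if j != i
                )
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

# Hypothetical sentences, invented for illustration.
sentences = [
    "the film was good overall",
    "the film was bad overall",
    "the food was good today",
    "the food was bad today",
    "the piano needs tuning today",
]
good = context_vector("good", sentences)
bad = context_vector("bad", sentences)
piano = context_vector("piano", sentences)

similarity_opposites = cosine(good, bad)
similarity_unrelated = cosine(good, piano)
```

In this toy corpus the two opposites share exactly the same neighbours, so their vectors point in the same direction, while 'good' and 'piano' share none. Nothing in the counts tells the model that one word reverses the meaning of the other.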

In GPT’s impressive qualities lies also its peril. “It’s an amazing breakthrough that we can now generate language that’s no longer distinguishable from humans, but the very authentic authoritative language is also a problem because the model is unpredictable and not controllable when it comes to factuality. It generates content based on your prompt and on stochastic probability – it’s a bit like a friend telling you what you want to hear.”

Public misconceptions don’t help. “The problem is that we don’t understand exactly how these models work and what they are suitable for.” The ELIZA effect is what computer scientists term our tendency to assign human traits to computer programs. In this case, that means assuming that GPT’s iterative generation of text is analogous to human consciousness. It’s important to point out that GPT isn’t sentient, and neither is it intended to be.

Already integral to our reality

“GPT excels at standard legal or clerical documents as well as marketing texts. The majority of what’s written on the internet, especially free content, is already generated by an older version of GPT.” The model is, however, dangerously unsuitable for generating e.g. medical advice. “Google decided not to use LaMDA, their equivalent of GPT, because there is no way to control for factuality. A decoder-only model will always have that problem.”

If in doubt, GPT-2’s output is clearly identifiable. “OpenAI made it open source, so we can recognise its digital fingerprint. GPT-3 isn’t open source, so the only way to detect its texts would be if OpenAI made a kind of fingerprint detector – but then Google could more easily ignore GPT output in search engine optimisation, which is already a large part of OpenAI’s business model. This will be an interesting problem in the future.”

The successor model, GPT-4, will be a thousand times bigger – and the problems and possibilities will grow with it.