Large Language Models and Education

Large Language Models (LLMs) are deep-learning algorithms that process textual data. Depending on the application and model, they can generate, modify, summarise and translate text, and more. 

The information on this page has been compiled from internal resources such as the faculty fact sheets and external literature regarding LLMs and education. 


LLMs are probabilistic. They build a probability model of words and word sequences from large bodies of training data harvested from web pages, e-books, e-journals, etc. In the case of generative models such as ChatGPT, word choice is based on conditional probability: the model generates the word most likely to follow the preceding words, sentences and paragraphs.
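
To make this concrete, below is a minimal sketch (in Python) of next-word selection based on conditional probability. The context, candidate words and probabilities are invented purely for illustration and do not come from any real model:

    import random

    # Invented conditional probabilities for the next word, given the
    # context "the student wrote" (illustration only, not real data).
    next_word_probs = {
        "an": 0.40,
        "the": 0.25,
        "a": 0.20,
        "code": 0.10,
        "quickly": 0.05,
    }

    # A generative model samples from this distribution rather than always
    # picking the single most likely word, which is why the same prompt
    # can yield different answers on different runs.
    words = list(next_word_probs)
    weights = list(next_word_probs.values())
    next_word = random.choices(words, weights=weights, k=1)[0]
    print("the student wrote", next_word)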

Some popular applications of LLMs include:

  • Chatbots and AI assistants
  • Search engines’ summarised answers at the top of search results
  • Support in writing any text, software, and code
  • Translation engines


ChatGPT and Opportunities for PBL Education

ChatGPT (Generative Pre-trained Transformer) is the most prominent example of an LLM that is impacting education globally. ChatGPT 3.5 is free to use (ChatGPT 4 requires a paid subscription) and generates answers to questions based on vast amounts of training data, with relatively high accuracy and a readability that makes its output hard to distinguish from human-written text. It can produce summaries, answers to essay questions, papers, programming code, etc.

ChatGPT cannot reason or produce original knowledge. However, it can combine information from different sources in such a confident and convincing way that it gives a semblance of understanding and reasoning.

LLMs such as ChatGPT create immediate challenges regarding the quality of assessment and trigger debates about the function of education and the way we learn. However, we also see opportunities to meaningfully integrate LLMs into learning activities, allowing students (and staff) to gain inspiration, receive feedback and encourage reflection, implicitly activating critical thinking and digital literacy skills. Read more about risks and precautions in relation to LLMs like ChatGPT.

Options for Educational Design and Teaching & Learning Activities

Educational design in PBL can meaningfully integrate AI tools to support digital/AI literacy and the development of critical thinking competences. Maastricht University's problem-based learning model is based on four key learning principles: constructive, contextual, collaborative and self-directed learning. PBL hence offers a flexible basis for diverse and creative learning formats.

Learning can be stimulated when students are challenged to evaluate LLM/chatbot responses and improve them. In addition, students can gain insight into and experience with AI functionalities whilst critically approaching AI output. Chatbot models can furthermore be used in tutor group meetings for brainstorming and drafting initial essays.

Tips for integrating ChatGPT into your educational design

  • Provide students with ChatGPT-produced written content and ask them to improve it in terms of both content and academic standards.
  • Have students critically relate different sources instead of merely summarising them (which can be done by ChatGPT).
  • Have students provide key quotes from their readings and explain why they capture the gist of what they read.
  • Educate students on the use of AI tools like ChatGPT: what is it? What can it produce? What are pitfalls and disadvantages?
  • Have ChatGPT answer the learning goals, then have students discuss these answers based on their readings.

Options for Assessment

LLMs such as ChatGPT challenge the way universities can measure to what extent students have successfully mastered the intended learning outcomes. Especially for written assessments taking place in an uncontrolled environment, it becomes impossible to ascertain the author/owner of the final product. To prevent plagiarism/fraud, assignments should be designed to be too complicated for these models to complete.

Also, at least one of the assessment moments within a module should take place in a controlled environment. An assessment design that connects the student's identity to their product improves the ability to judge whether the student masters the course learning outcomes.

Let UM’s vision on assessment be a source of inspiration when designing course assessment, promoting ‘assessment for learning’ through, e.g., structural feedback, formative assessment, programmatic assessment, and multiple low-stakes assessment moments.

Tips for designing PBL-proof assessments that diminish the impact of external sources

  • Add non-written components (e.g. a presentation, poster or debate) to written assignments produced at home; for example, ask students to visualise their line of argumentation through a mind map, diagram, vlog or another format.
  • Have students integrate content from the module into their written assignments (with concrete references).
  • Have students provide reasons for why they disagree/agree with a particular case, primary source, image, etc., using content from the module.
  • Let students peer-review written assignments based on module content and mark the review's quality.
  • Have students use ChatGPT to formulate an answer to a question, and then have them critically assess this answer with arguments based on the module's content (with concrete references).
  • Require an analysis that draws on the in-class discussion.
  • Require the analysis of specifics from images, audio, or videos.
  • Require more extended text output (too long for the limited prompt windows of automated systems).
  • Require the analysis of recent events and contexts that are not in the system’s data (yet).
  • Design assignments that articulate nuanced relationships between ideas.
  • Organise more onsite exams within a controlled environment.
  • Ask students to affirm that their submissions are their work.

Tips to use AI tools like ChatGPT within your assessment practice

  • Have students use ChatGPT to formulate an answer to a question, and then have them critically assess this answer with arguments based on the learning materials used in the course (with specific references).
  • Ask students to mark text generated by ChatGPT or another AI tool in their assignments or exams.
  • Ensure that the intended learning outcomes are still assessed by checking the work produced by the students and not by the AI tool. For example, ask students to critically reflect on the answers generated by the AI tool, or to compare the AI-generated answer with their own answer (or with specific sources).

Tips when assessing assignments

  • Look for unusual language or formatting: texts (partly) produced by ChatGPT may be atypical of student work, with unusual or repeated phrases, or with differences in style, syntax, spelling and punctuation between sections of the same text; pay extra attention to the content and reasoning of such sections/texts.
  • Several AI text detection tools are currently available, but they are still beta versions and provide only probabilistic results; it is not yet known how reliable their checks are. We therefore advise you to use them cautiously and only to confirm several other findings (e.g., the writing style mentioned above, unsupported claims, no actual argumentation, meagre referencing, fake sources).
  • Always check for plagiarism in written assignments: texts produced by ChatGPT may contain excerpts from other sources without proper attribution, which plagiarism detection software such as Ouriginal can detect.
  • Always forward your suspicion to the Board of Examiners, who will investigate it and, if confirmed, sanction the student.
  • The Board of Examiners will sanction AI-generated exam work by considering it plagiarism (commissioned work) and/or fraud (an action that makes it impossible to evaluate the student’s knowledge).

Flaws of ChatGPT

ChatGPT has flaws that should be considered when using it in the classroom. For more elaboration on these and other risks, see Precautions & Risks.

  • Biased output resulting from flawed or biased training data.
  • The generated text sounds plausible, but the user must verify its factual accuracy.
  • LLMs can “hallucinate” by making up information.
  • Output can be mediocre for specific and complicated topics.
  • LLMs only cite sources when given a specific command, and can make up sources.
  • The model struggles to distinguish classic articles from other articles.

From ChatGPT 3.5 to 4 and Beyond

Given the rapid technological developments in AI, new versions of ChatGPT and the availability of competing LLMs (e.g., Bard, ERNIE) can be expected in quick succession. New versions may offer new functionalities or eradicate flaws of previous versions.

ChatGPT 4, launched on 14 March 2023, includes, among others, the following changes/upgrades:

  • ChatGPT 4 can process more words and produce eight times more text (up to 25,000 words at once).
  • Multimodal input and output (not yet available at the moment of writing): ChatGPT 4 accepts and understands visual information and can connect that to output; e.g., a photo of a sketch of a website can, with the right prompt, be turned into an actual functioning website.
  • Improved accuracy and reliability.
  • More effective filters for toxic, hateful content.

Prevention


Updated Rules

Maastricht University has checked and, where necessary, will update fraud and plagiarism sections in the Education and Exam regulations (Onderwijs- en Examenregeling, OER) and Exam rules and guidelines (Regels en Richtlijnen, RR) to cover the impact of new AI functionalities on assessment.

On a course level, to prevent any misunderstandings, it could make sense to explicitly mention that students are not allowed to use ChatGPT or similar text- or image-producing AI tools in assignments, presentations or exams (unless it is part of your educational design).

The following sentence could be used: “Unless otherwise mentioned, AI-generated text used to answer exam questions/assignments can be seen as commissioned work that represents plagiarism and fraud and will be sanctioned under the Rules and Regulations.”

Safeguarding UM's Culture of Integrity

Maastricht University highly values its culture of academic integrity, with attention in each educational programme to ethical behaviour, social etiquette and professional attitude. Given the rapid development of AI tools and their impact on education, it is worth (re-)emphasising what is expected of students regarding academic integrity, and the rules and regulations concerning fraud in the academic context.

Tips to safeguard a culture of integrity

  • Trust your students and trust your culture of academic integrity
  • Accept an inevitable loss of control over how students learn
  • Make students aware of (academic) integrity policies
  • Encourage intrinsic motivation
  • Provide information on rules and regulations in course books and Canvas
  • In tutorials, talk with students about instructions, rules and expectations
  • Ask students to affirm that their submissions are their own work

Deterrence: Tools for Detecting AI-generated Text

At the moment, there is no reliable AI text detector, and detecting a rewritten AI-generated text (a hybrid text) is also very difficult. Still, a couple of promising stand-alone tools can give a semi-reliable indication of whether a text is AI-generated, and a suspected text can be submitted to them.

AI-generated text detection technology will, at some point, be integrated into Maastricht University’s plagiarism service. The current service, Ouriginal, is unsuited for such integration and will be replaced by a more potent tool. Furthermore, automated watermarking of LLM-generated output would allow stand-alone tools and plagiarism applications to detect such output faster; this feature is not bulletproof yet and is currently under development by OpenAI.
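
To illustrate why such checks are only probabilistic: many detectors rely on statistical signals, such as how predictable a text is to a language model (its “perplexity”). The sketch below assumes the open-source GPT-2 model and the Hugging Face transformers library; it is a crude heuristic with an invented threshold, not a reliable detector:

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def perplexity(text: str) -> float:
        # Exponentiated mean cross-entropy of the text under GPT-2:
        # lower values mean the text is more predictable to the model.
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss
        return torch.exp(loss).item()

    # Highly predictable text is one weak signal (among many) that a text
    # might be machine-generated; the threshold of 30 here is invented.
    sample = "Artificial intelligence is transforming higher education."
    print("possibly AI-generated" if perplexity(sample) < 30 else "inconclusive")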

Precautions & Risks

Several risks are involved in using Large Language Models such as ChatGPT in an educational setting. Students and educators should be aware of these risks before relying on this technology for their work.

Risks include but are not limited to:

  • Lack of transparency
    Large Language Models are incredibly complex. They are black-box models, meaning that their internal workings are neither transparent nor easily understood by humans. This complexity allows for accurate outputs but comes at the price of intelligibility, which makes it harder to identify errors, biases, and discriminatory elements in the model.

  • Hallucination
    LLMs can “hallucinate”, i.e. generate erroneous output. Hallucination happens because LLMs do not understand the text’s context or meaning; they simply produce text output based on the likelihood of word sequences. LLMs are language models, not knowledge models. This means they can generate incorrect information, especially when dealing with more specialised and uncommon topics that are not well represented in the training data. Hallucination poses a significant risk in an educational setting where factual accuracy is crucial. When using LLMs, it is important to be aware that their output, as confident and seemingly correct as it might sound, could contain factual errors.

  • Biased or discriminatory output
    The output of LLMs can not only be erroneous, as in the case of hallucination, but can also contain biased and discriminatory elements. Despite developers’ efforts to mitigate such biases, the complexity and lack of transparency of these models make it difficult to trace and filter out all discriminatory elements. It is critical that students and educators are made aware of these potential biases and always handle an LLM’s output with a critical mindset.

  • Privacy concerns
    LLMs often require access to users’ personal data and can capture private information. For example, OpenAI utilises users’ conversation history to further train and improve the model. Privacy leaks are not uncommon with these systems: in March 2023, OpenAI had to temporarily shut down ChatGPT to fix a bug that exposed users’ private chat titles to other users, raising concerns about OpenAI’s access to chat content and its management of personal user data. In general, it is good practice not to share or discuss private information with these models.

  • Environmental concerns
    Finally, we want to draw attention to the environmental impact of LLMs and other AI models. These models require extensive computing power, consume significant energy, and contribute to carbon emissions. Given the global climate crisis, they are hence not sustainable.

Future Developments in AI

Given the rapid development and fierce competition in the field of AI, it is to be expected that the following AI functionalities will impact education at large:

  • Speech to speech – clone your voice and make videos without having to record your voice
  • Text to speech – e.g., Realistic AI voice with synthesia.io
  • Text to video – e.g., Meta's Make-A-Video based on a prompt
  • Text to audio – e.g., AudioGen, turns text prompts into audio
  • Photo to video – e.g., MyHeritage, brings pictures to life
  • Text to music – e.g., MusicLM, turns description into music
  • Text to image – e.g. Midjourney or DALL-E, turns a prompt into an image
  • Brain signal directly translated into output (text, video, audio) without a mediating platform

Resources

If you know of other internal articles or blog posts to be added to this list, please contact us!

Disclaimer

AI text generators such as ChatGPT are constantly evolving, which means that a fix today may not be a fix tomorrow. Therefore, our list of suggestions will be updated periodically.