Evaluating Automatically Generated Contextualised Programming Exercises
Abstract
Introductory programming courses often require students to solve many small programming exercises as part of their learning. Researchers have previously suggested that the context used in the problem description for these exercises is likely to impact student engagement and motivation. Furthermore, supplying programming exercises that use a broad range of contexts, or even allowing students to select contexts to personalize their own exercises, may support the interests of a diverse student population. Unfortunately, it is time-consuming for instructors to create large numbers of programming exercises that cover a wide range of contextualized problems. However, recent work has shown that large language models may be able to automate the mass production of programming exercises, reducing the burden on instructors. In this research, we explore the potential of OpenAI’s GPT-4 to create high-quality and novel programming exercises that incorporate a variety of contexts. Through prompt engineering, we compare different prompting strategies for generating large numbers of programming exercises with varied contextualized problem descriptions, and we then evaluate the quality of the generated exercises.
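To make the generation setup concrete, the sketch below shows one minimal way a contextualized exercise could be requested from GPT-4 via the OpenAI Python client. It is an illustrative assumption only: the prompt wording, the generate_exercise helper, the model parameters, and the use of the chat completions API are not taken from this paper, which compares its own prompting strategies.

```python
# Illustrative sketch only: a minimal prompt-based exercise generator.
# The prompt template, helper name, and parameters are assumptions,
# not the prompts or pipeline evaluated in the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_exercise(context: str, topic: str) -> str:
    """Ask the model for one contextualised introductory programming exercise."""
    prompt = (
        f"Write an introductory Python programming exercise on {topic}. "
        f"Set the problem description in the context of {context}. "
        "Include a problem statement, a sample solution, and example test cases."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    # Example: the same topic framed in a student-chosen context.
    print(generate_exercise(context="cooking and recipes", topic="loops"))
```

In a setup like this, varying only the context argument while holding the topic fixed is one simple way to produce many thematically different exercises for the same learning objective.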