Evaluating Copilot on CS1 Code Writing Problems with Suppressed Specifications
Abstract
Code-writing problems in introductory programming (CS1) courses typically ask students to write simple functions or programs based on detailed natural-language specifications. Large language models (LLMs), accessible to students via tools such as GitHub Copilot, can leverage these details to generate solutions that are often correct. CS1 instructors who are unwilling or unable to prohibit such usage must consider variants of traditional code-writing problems that align with their learning objectives but are more difficult for LLMs to solve. Since LLMs are sensitive to the level of detail in their prompts, it is natural to consider variants where details are progressively trimmed from the specifications of traditional code-writing problems, and the resulting ambiguities are clarified via examples. We consider an extreme variant, where all natural language is suppressed except for meaningful names of functions and their arguments. We evaluate the performance of Copilot on suppressed-specification versions of 153 such problems drawn from the CodeCheck repository. If Copilot initially fails to generate a correct solution, we augment the suppressed specification with as few clarifying examples as possible until a correct solution is obtained. Copilot solves 134 of the problems (87%) with just 0.7 examples on average, requiring no examples at all in 78 instances. Thus, modifying traditional code-writing problems by merely trimming specification details is unlikely to thwart sophisticated LLMs such as GitHub Copilot.
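To make the setup concrete, the sketch below illustrates what a suppressed-specification prompt might look like; count_vowels is a hypothetical problem invented for illustration, not one drawn from the CodeCheck repository, and the completion shown is merely a plausible one, not output reported in the paper.

    # Suppressed specification: all natural language is removed, leaving only
    # the meaningful function and argument names. The two commented examples
    # are the kind of clarifying hints appended after a failed attempt.
    #
    # Prompt given to Copilot:
    #     def count_vowels(word):
    #         # count_vowels("banana") == 3
    #         # count_vowels("sky") == 0

    # A plausible completion of that prompt:
    def count_vowels(word):
        # Count the characters of word (case-insensitively) that are vowels.
        return sum(1 for ch in word.lower() if ch in "aeiou")

    assert count_vowels("banana") == 3
    assert count_vowels("sky") == 0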