Validating AI-Generated Code with Live Programming

Authors: Kasra Ferdowsi*, Ruanqianqian (Lisa) Huang*, Michael B. James, Nadia Polikarpova, Sorin Lerner

Published in CHI, May 2024

AI-powered programming assistants are increasingly popular, with GitHub Copilot alone used by over a million developers worldwide. These tools are far from perfect, however, producing code suggestions that may be incorrect in subtle ways. As a result, developers face a new challenge: validating AI’s suggestions. This paper explores whether Live Programming (LP), a continuous display of a program’s runtime values, can help address this challenge. To answer this question, we built a Python editor that combines an AI-powered programming assistant with an existing LP environment. Using this environment in a between-subjects study with 17 participants, we found that by lowering the cost of validation by execution, LP can mitigate over- and under-reliance on AI-generated programs and reduce the cognitive load of validation for certain types of tasks.

* = Equal Contribution
