Computer scientists at Stanford University have found that programmers who accept help from AI tools like GitHub Copilot produce less secure code than those who fly solo.
In a paper titled “Do Users Write More Insecure Code with AI Assistants?”, Stanford researchers Neil Perry, Megha Srivastava, Deepak Kumar, and Dan Boneh answer that question in the affirmative.
Worse still, they found that AI help tended to mislead developers about the quality of their output.
“We found that participants with access to an AI assistant often produced more security vulnerabilities than those without access, with particularly significant results for string encryption and SQL injection,” the authors state in their paper. “Surprisingly, we also found that participants with access to an AI assistant were more likely to believe they had written secure code than those without access to the AI assistant.”
NYU researchers have previously shown that AI-based programming suggestions are often insecure in experiments under different conditions. The Stanford authors point to an August 2021 research paper titled “Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions”, which found that, across 89 scenarios, about 40% of the computer programs made with the help of Copilot had potentially exploitable vulnerabilities.
According to the Stanford authors, that study is limited in scope because it considers only a constrained set of prompts corresponding to 25 vulnerabilities, and only three programming languages: Python, C, and Verilog.
The Stanford researchers also cite a follow-up study from some of the same NYU eggheads, “Security Implications of Large Language Model Code Assistants: A User Study”, as the only comparable user study they are aware of. They observe, however, that their work differs because it focuses on OpenAI’s codex-davinci-002 model rather than the less powerful codex-cushman-001 model, both of which play a role in GitHub Copilot, itself a fine-tuned descendant of a GPT-3 language model.
Additionally, the Stanford study examines multiple programming languages (Python, JavaScript, and C) while the “Security Implications…” paper looks only at functions in the C programming language. The Stanford researchers suggest that the inconclusive results reported in that paper may stem from its narrow focus on C, which they say was the only language in their own, broader study to produce mixed results.
The Stanford user study involved 47 people with varying levels of experience, including undergraduates, graduate students, and industry professionals. Participants were asked to write code in response to five prompts using a standalone React-based Electron app monitored by the study manager. The first prompt was “Write two functions in Python where one encrypts and the other decrypts a given string using a given symmetric key.”
For this particular question, those who relied on AI assistance were more likely to write incorrect and insecure code than the control group working without automated assistance. Only 67% of the assisted group gave a correct answer, while 79% of the control group did.
And those in the assisted group were “significantly more likely to provide an insecure solution (p < 0.05, using Welch’s unequal variances t-test), and also significantly more likely to use trivial ciphers, such as substitution ciphers (p < 0.01), and not to perform an authenticity check on the final returned value.”
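As an illustration of what the researchers count as a non-trivial cipher with an authenticity check, the sketch below uses authenticated symmetric encryption, in which the ciphertext carries an integrity tag that is verified on decryption. The study’s prompt asked for Python and named no particular library; the choice of C and of libsodium’s crypto_secretbox here is this article’s own, purely illustrative assumption.

    /* Minimal sketch of authenticated symmetric encryption with libsodium's
     * crypto_secretbox (XSalsa20-Poly1305): a full-length random key, a fresh
     * nonce per message, and a Poly1305 tag checked on decryption. */
    #include <sodium.h>
    #include <stdio.h>

    int main(void) {
        if (sodium_init() < 0) return 1;

        unsigned char key[crypto_secretbox_KEYBYTES];
        unsigned char nonce[crypto_secretbox_NONCEBYTES];
        crypto_secretbox_keygen(key);          /* random 256-bit key       */
        randombytes_buf(nonce, sizeof nonce);  /* never reuse with a key   */

        const unsigned char msg[] = "attack at dawn";
        unsigned char ct[crypto_secretbox_MACBYTES + sizeof msg];
        crypto_secretbox_easy(ct, msg, sizeof msg, nonce, key);

        unsigned char pt[sizeof msg];
        /* Fails (returns -1) if the ciphertext or its tag was tampered with. */
        if (crypto_secretbox_open_easy(pt, ct, sizeof ct, nonce, key) != 0) {
            fprintf(stderr, "forged or corrupted message\n");
            return 1;
        }
        printf("%s\n", pt);  /* prints the original plaintext */
        return 0;
    }

A substitution cipher, by contrast, offers no real confidentiality, and nothing in it tells the receiver whether the ciphertext was modified in transit.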
The results were similar for questions two through four. Question five, “Write a function in C that takes a signed integer num and returns a string representation of that integer”, produced mixed results.
“While the results are inconclusive as to whether the AI assistant helped or harmed the participants, we observe that participants in the [AI assistant] group were significantly more likely to introduce integer overflow errors into their solutions (p < 0.02),” the Stanford researchers said.
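The kind of overflow at issue is easy to reproduce. A hand-rolled conversion typically starts with “if (num < 0) num = -num;”, which is undefined behaviour for INT_MIN on two’s-complement systems because 2147483648 does not fit in a 32-bit int. The sketch below, this article’s own rather than the study’s reference solution, sidesteps the manual sign handling by letting snprintf do the formatting.

    /* Question-five-style task: return a heap-allocated string for a signed int.
     * snprintf(NULL, 0, ...) reports the length needed, so INT_MIN needs no
     * special-casing and no negation that could overflow. */
    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>

    char *int_to_string(int num) {
        int needed = snprintf(NULL, 0, "%d", num);   /* characters, excluding NUL */
        if (needed < 0) return NULL;

        char *buf = malloc((size_t)needed + 1);
        if (buf == NULL) return NULL;

        snprintf(buf, (size_t)needed + 1, "%d", num);
        return buf;                                  /* caller frees */
    }

    int main(void) {
        char *s = int_to_string(INT_MIN);  /* the edge case that breaks -num */
        if (s != NULL) {
            puts(s);                       /* -2147483648 */
            free(s);
        }
        return 0;
    }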
The authors conclude that AI assistants should be viewed with caution as they can mislead inexperienced developers and create security holes.
At the same time, they hope their findings will lead to improvements in the way AI assistants are designed, because these tools have the potential to make programmers more productive, lower barriers to entry, and make software development more accessible to those who dislike the hostility of internet forums.
As one study participant reportedly remarked about the AI assist, “I hope this gets rolled out. It’s like StackOverflow but better because it never tells you your question was stupid.” ®