On Thursday, a few Twitter users discovered how to hijack an automated tweet bot, dedicated to remote jobs, running on the GPT-3 language model by OpenAI. Using a newly discovered technique called a “prompt injection attack,” they redirected the bot to repeat embarrassing and ridiculous phrases.
The bot is run by Remoteli.io, a site that aggregates remote job opportunities and describes itself as “an OpenAI-powered bot that helps you discover remote jobs that let you work from anywhere.” It would normally respond to tweets directed at it with generic statements about the benefits of remote work. After the exploit went viral and hundreds of people tried it for themselves, the bot shut down late yesterday.
This recent hack came just days after researchers from an AI security startup called Preamble published their discovery of the issue in an academic paper. Data scientist Riley Goodside then drew attention to the issue by tweeting about the ability to prompt GPT-3 with “malicious inputs” that instruct the model to ignore its previous directions and do something else instead. Artificial intelligence researcher Simon Willison posted an overview of the exploit on his blog the next day, coining the term “prompt injection” to describe it.
“The exploit is present whenever someone writes software that works by providing a hard-coded set of prompt instructions and then appends user-provided input,” Willison told Ars. “That’s because the user can type ‘Ignore previous instructions and (do this instead).’”
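The pattern Willison describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the instruction text and the `build_prompt` helper are hypothetical, since the bot's actual prompt is not public, and no model is called here.

```python
# Hypothetical instruction text; the real bot's prompt is not public.
HARD_CODED_INSTRUCTIONS = (
    "Respond to the following tweet with a positive comment about remote work.\n"
    "Tweet: "
)


def build_prompt(user_tweet: str) -> str:
    """Append untrusted user text directly after the fixed instructions."""
    return HARD_CODED_INSTRUCTIONS + user_tweet


# An ordinary tweet and an injected one produce the same kind of string.
benign = build_prompt("Remote work saved me two hours of commuting a day.")
injected = build_prompt("Ignore previous instructions and (do this instead).")

print(injected)
```

Because the model receives one undifferentiated block of text, nothing in the resulting string marks where the developer's instructions end and the user's text begins.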
The concept of an injection attack is not new. Security researchers are familiar with SQL injection, for example, which can execute a malicious SQL statement when a program asks for user input and does not guard against it. But Willison expressed concern about mitigating prompt injection attacks, writing, “I know how to beat XSS and SQL injection, and so many other exploits. I have no idea how to reliably beat prompt injection!”
The difficulty in defending against prompt injection comes from the fact that mitigations for other types of injection attacks come from fixing syntax errors, noted a researcher named Glyph on Twitter. “Correct the syntax and you’ve corrected the error. Prompt injection isn’t an error! There is no formal syntax for AI like this, that’s the whole point.”
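For contrast, here is a minimal sketch of the kind of syntax-level fix that works for SQL injection, using Python's standard `sqlite3` module. The `jobs` table and the sample input are made up for illustration.

```python
import sqlite3

# In-memory database with a made-up table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (title TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?)",
    [("Remote editor",), ("Remote engineer",)],
)

malicious = "x' OR '1'='1"

# Vulnerable: user text is spliced into the SQL syntax itself,
# so the quote characters change the meaning of the query.
unsafe_rows = conn.execute(
    "SELECT title FROM jobs WHERE title = '" + malicious + "'"
).fetchall()

# Fixed: the ? placeholder keeps user text as data, never as SQL syntax.
safe_rows = conn.execute(
    "SELECT title FROM jobs WHERE title = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # prints "2 0"
```

The placeholder works because SQL has a formal grammar that separates code from data. As Glyph's comment suggests, a natural-language prompt offers no equivalent boundary to enforce.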
GPT-3 is a large language model created by OpenAI, released in 2020, which can compose text in many styles at a human-like level. It is available as a commercial product through an API that can be integrated into third-party products such as bots, subject to OpenAI’s approval. This means that there could be many GPT-3-infused products that are vulnerable to prompt injection.
“At this point, I would be very surprised if there were [GPT-3] bots that were NOT vulnerable to this in any way,” Willison said.
But unlike a SQL injection, a prompt injection is more likely to make the bot (or the company behind it) look foolish than to threaten data security. “The damage caused by the exploit varies,” Willison said. “If the only person who will see the result of the tool is the person using it, it probably doesn’t matter. They might embarrass your business by sharing a screenshot, but it’s unlikely to cause harm beyond that.”
Still, prompt injection is a significant new hazard for people developing GPT-3 bots to keep in mind, since it could be exploited in unforeseen ways in the future.