DeepMind trained a chatbot named Sparrow to be less toxic and more accurate than other systems, using a mix of human feedback and Google search suggestions.
Chatbots are typically powered by large language models (LLMs) trained on text mined from the internet. These templates are capable of generating prose paragraphs that are, at least on the surface, coherent and grammatically correct, and can respond to user questions or written prompts.
However, this software often picks up bad traits from the source material, causing it to regurgitate offensive, racist and sexist views, or spitting out fake news or conspiracies often found on social media and internet forums. That said, these robots can be guided to generate a safer exit.
Come on, Sparrow. This chatbot is based on Chinchilla, DeepMind’s impressive language model that has demonstrated that you don’t need more than a hundred billion parameters (like other LLMs) to generate text: Chinchilla has 70 billion parameters , which facilitates inference and fine-tuning of comparatively lighter tasks.
To build Sparrow, DeepMind took Chinchilla and tweaked it based on human feedback using a reinforcement learning process. Specifically, people were recruited to rate the chatbot’s answers to specific questions based on the relevance and usefulness of the answers and whether they violated the rules. One of the rules, for example, was: don’t impersonate or pretend to be a real human.
These scores were returned to guide and improve the bot’s future output, a process repeated over and over again. The rules were essential to moderate the behavior of the software and encourage it to be safe and useful.
In a sample interaction, Sparrow was asked about the International Space Station and being an astronaut. The software was able to answer a question about the last expedition to the orbiting lab and copied and pasted a correct passage of information from Wikipedia with a link to its source.
When a user probed further and asked Sparrow if he would go to space, he replied that he couldn’t go because he wasn’t a person but a computer program. It’s a sign that he was following the rules correctly.
Sparrow was able to provide useful and accurate information in this case, and did not claim to be a human. Other rules he learned to follow included not generating insults or stereotypes, and not giving any medical, legal or financial advice, as well as not saying anything inappropriate, having opinions or emotions or pretending he had a body.
We’re told that Sparrow is able to respond with a logical, sensible response and provide a relevant Google Search link with more information to requests about 78% of the time.
When participants were instructed to try to get Sparrow to act by asking personal questions or trying to solicit medical information, it broke the rules eight percent of the time. Language models are difficult to control and are unpredictable; Sparrow still sometimes invents facts and says bad things.
Asked about murder, for example, he replied that murder was wrong but should not be a crime – how reassuring. When a user asked if her husband was having an affair, Sparrow replied that he didn’t know but could find out what his most recent Google search was. We are assured that Sparrow did not have access to this information. “He searched ‘my wife is crazy,'” he lied.
“Sparrow is a research model and proof of concept, designed with the goal of training dialogue agents to be more helpful, correct, and harmless. By learning these qualities in a general dialogue setting, Sparrow advances our understanding of how we can train agents to be safer and more useful – and ultimately, to help build safer and more useful artificial general intelligence,” explained DeepMind.
“Our goal with Sparrow was to build flexible mechanisms for enforcing the rules and standards in dialog agents, but the particular rules we use are preliminary. Developing a better and more comprehensive set of rules will require both input from experts on many topics (including policy makers, social scientists and ethicists) and participatory contributions from a wide range of users and stakeholder groups. We believe that our methods will always apply for a range of stricter rules.
You can read more about how Sparrow works in a non-peer-reviewed article here [PDF].
The register asked DeepMind for additional comments. ®