Meta’s chatbot Cicero can probably beat you at Diplomacy • The Register

Meta researchers have developed an artificial intelligence system called Cicero that can play the classic strategy game Diplomacy at a level comparable to most human players.

It’s a significant achievement in the field of natural language processing, and one that could help people forget about last week’s debut of Galactica, a large language model from Meta boffins that was trained on scientific papers, presented falsehoods as facts, and was taken offline after three days of criticism from the scientific community.

Developed in the 1950s and currently published by Hasbro, Diplomacy focuses on communication and negotiation between players, who take on the roles of seven European powers at the turn of the 20th century. Some players consider it a perfect way to lose friends.

The game simulates the conquest of territories on a map of Europe. Rather than taking turns, players write down their moves in advance and execute them simultaneously. To avoid making moves that are blocked by an opponent’s counter-move, players communicate with each other privately. They discuss potential coordinated actions, then write down their moves, keeping or breaking the commitments they made to other players.
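To make the simultaneous-move mechanic concrete, here is a deliberately simplified Python sketch of how a batch of moves might be resolved once everyone’s orders are revealed. The `resolve_moves` function and its order format are hypothetical, and the rule it applies (the strictly strongest move into a province wins, ties bounce) covers only the basic standoff case; holding units, cut support, and convoys from the real rulebook are ignored.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical, heavily simplified order: (unit_name, target_province, support_count).
Order = Tuple[str, str, int]


def resolve_moves(orders: List[Order]) -> Dict[str, bool]:
    """Resolve simultaneously revealed move orders; map each unit to success or failure.

    Simplified rule: among units moving into the same province, the one with
    strictly the highest strength (1 + number of supports) succeeds and ties
    all bounce. Holding units, cut support and convoys are ignored.
    """
    # Group every move by the province it targets.
    by_target: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for unit, target, supports in orders:
        by_target[target].append((unit, 1 + supports))

    result: Dict[str, bool] = {}
    for contenders in by_target.values():
        top = max(strength for _, strength in contenders)
        winners = [unit for unit, strength in contenders if strength == top]
        for unit, _ in contenders:
            # A move succeeds only if it is the unique strongest one.
            result[unit] = len(winners) == 1 and unit == winners[0]
    return result
```

Two unsupported armies ordered into Munich both bounce, which is exactly the kind of blocked move that negotiating before each turn is meant to avoid; with one supporting unit, the supported army gets through.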

Diplomacy’s emphasis on communication, trust, and betrayal makes it a different challenge from more rule- and resource-focused games like chess and Go. Cicero is essentially a chatbot that can negotiate with other Diplomacy players in order to make effective in-game moves.

Screenshot of Cicero’s dialogue

“Diplomacy has for decades been considered a grand, near-impossible challenge in AI, as it requires players to master the art of understanding the motivations and perspectives of others, make complex plans and adjust their strategies, then use natural language to reach agreements with other people, convince them to form partnerships and alliances, and more,” Meta explained in a blog post.

“Cicero is so effective at using natural language to negotiate with people in Diplomacy that they often preferred to work with Cicero over other human participants.”

Cicero is based on a 2.7-billion-parameter, BART-like language model pre-trained on text sourced from the internet and fine-tuned on a dataset of more than 40,000 Diplomacy games played online at webDiplomacy.net. These games contained more than 12 million messages exchanged between players.

The AI agent’s dialogue output is tied to its strategic reasoning module, which generates “intents” representing a possible set of moves by the players involved.

“To generate dialogue intents and choose final actions to play each turn, Cicero runs a strategic reasoning module that predicts other players’ policies (i.e., a probability distribution over actions) for the current turn based on the state of the board and the shared dialogue, then chooses a policy for itself for the current turn that responds optimally to the other players’ predicted policies,” Meta researchers explain in a research paper published in Science.
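The best-response step in that description can be illustrated with a toy Python sketch. It assumes a single opponent with a predicted policy over two actions and an invented payoff table; `PAYOFF`, `expected_values`, and `best_response` are all hypothetical names. Cicero itself reasons about seven powers and picks a policy rather than one fixed action, so this shows only the underlying idea of responding to predicted probabilities.

```python
from typing import Dict

# Hypothetical toy payoffs: PAYOFF[my_action][their_action] -> value for me.
# The numbers are invented for illustration and do not come from Cicero.
PAYOFF: Dict[str, Dict[str, float]] = {
    "attack_munich": {"defend_munich": -1.0, "move_elsewhere": 2.0},
    "support_ally": {"defend_munich": 1.0, "move_elsewhere": 0.5},
    "hold": {"defend_munich": 0.0, "move_elsewhere": 0.0},
}


def expected_values(payoff: Dict[str, Dict[str, float]],
                    opponent_policy: Dict[str, float]) -> Dict[str, float]:
    """Expected payoff of each of my actions against a predicted opponent policy."""
    return {
        mine: sum(prob * row[theirs] for theirs, prob in opponent_policy.items())
        for mine, row in payoff.items()
    }


def best_response(payoff: Dict[str, Dict[str, float]],
                  opponent_policy: Dict[str, float]) -> str:
    """Return the action with the highest expected payoff (a pure best response)."""
    values = expected_values(payoff, opponent_policy)
    return max(values, key=values.get)
```

Against a prediction that the opponent defends Munich 70% of the time, supporting an ally scores best; if the prediction shifts to 90% “move elsewhere”, attacking Munich becomes the best response. The point is that the chosen move depends on the predicted policy, which is why Cicero updates its predictions from the dialogue.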

Whereas AI agents for games like chess can be trained through self-play using reinforcement learning, modeling the cooperative game of Diplomacy required a different technique. According to Meta, the classic approach would involve supervised learning, in which an agent is trained on labeled data from previous Diplomacy games. But supervised learning alone produced a gullible AI agent that lying players could easily manipulate.

Cicero therefore includes an iterative planning algorithm called piKL, which refines an initial prediction of other players’ policies and planned moves based on the dialogue between the bot and those players. The algorithm tries to improve the predicted move sets for other players by checking whether different choices would produce better results for them.
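As we understand Meta’s published description, piKL is built around KL regularization: the planner trades off expected value against staying close to an “anchor” policy that imitates human play, which keeps its predictions and choices human-like. The Python sketch below shows only that one ingredient, the closed-form solution of maximizing E_π[Q] − λ·KL(π ‖ anchor), and is not Meta’s implementation; `kl_regularized_policy`, the action names, the Q-values, and the λ values are all hypothetical.

```python
import math
from typing import Dict


def kl_regularized_policy(anchor: Dict[str, float],
                          q_values: Dict[str, float],
                          lam: float) -> Dict[str, float]:
    """Return the policy pi maximizing E_pi[Q] - lam * KL(pi || anchor).

    The closed-form solution is pi(a) proportional to anchor(a) * exp(Q(a) / lam).
    This is a single illustrative step, not Meta's piKL implementation.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Subtract the largest Q-value before exponentiating, for numerical stability.
    q_max = max(q_values[a] for a in anchor)
    weights = {a: p * math.exp((q_values[a] - q_max) / lam) for a, p in anchor.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


# Hypothetical human-like anchor policy and action values.
anchor = {"support_ally": 0.6, "attack_munich": 0.3, "hold": 0.1}
q = {"support_ally": 0.0, "attack_munich": 1.0, "hold": -0.5}
```

With a large λ (say 100) the result stays close to the human-like anchor; with a small λ (say 0.05) it collapses onto the highest-value action. An iterative planner such as piKL repeats updates of this kind over many rounds as its predictions of the other players’ policies change, which this single step does not attempt to reproduce.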

In a report, Andrew Goff, a three-time Diplomacy world champion, praised Cicero’s dispassionate approach to the game. “A lot of human players will soften their approach or they’ll start to be driven by revenge, but Cicero never does that,” Goff said. “He just plays the situation as he sees it. So he’s ruthless in executing his strategy, but he’s not ruthless in a way that annoys other players.”

Cicero played 40 Diplomacy games anonymously in a “blitz” league on webDiplomacy.net between August 19 and October 13, 2022, and finished in the top 10% of participants who played more than one game. Among the 19 participants who played five or more games, Cicero finished second. Across the 40 games, Cicero’s average score was 25.8%, more than double the 12.4% average of its 82 opponents.
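The “more than double” comparison follows directly from the two reported averages:

```latex
\frac{25.8\%}{12.4\%} \approx 2.08
```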

Although Cicero still makes a few mistakes, Meta boffins anticipate that their research will prove useful for other applications such as chatbots capable of holding long conversations or video game characters who understand players’ motivations and can interact more effectively.

Cicero’s code was released under an open source license in the hope that the AI developer community can improve it further. ®
