Capture the Flag is a fairly simple competitive mode implemented in many popular shooters. Each team has a flag located in its base, and the goal is to capture the opponent's flag and successfully bring it back to your own. However, what is easily understood by people is not so easily taught to machines. To play capture the flag, non-player characters (bots) have traditionally been programmed with heuristics and simple algorithms that allow limited freedom of choice and leave them significantly inferior to humans. But artificial intelligence and machine learning promise to completely reverse this situation.
In a paper published this week in the journal Science, roughly a year after the work first appeared as a preprint, researchers from DeepMind, Alphabet's London-based subsidiary, describe a system that can not only learn to play capture the flag on id Software's Quake III Arena maps, but also develop entirely new team strategies that are in no way inferior to a human's.

"No one told the AI how to play this game; it only knew whether it had defeated its opponent or not. The beauty of this approach is that you never know what behaviors will emerge as the agents train," says Max Jaderberg, a researcher at DeepMind who previously worked on AlphaStar, the machine learning system that recently defeated professional human players in StarCraft II. He went on to explain that their new work rests on two key methods: first, reinforcement learning, which uses a kind of reward system to push software agents toward their goals, with the reward signal operating regardless of whether the AI's team won or not; and second, training the agents in groups, which forced the AI to master team interaction from the very beginning.
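The core idea of reward-driven learning can be illustrated with a deliberately tiny sketch (a hypothetical toy problem, not DeepMind's actual setup): the agent is never told which action is "correct", only whether it was rewarded, yet a better policy still emerges.

```python
import random

def train_bandit(win_probs, episodes=5000, lr=0.1, seed=0):
    """Learn action values for a toy 2-armed bandit from reward alone."""
    rng = random.Random(seed)
    q = [0.0] * len(win_probs)          # estimated value of each action
    for _ in range(episodes):
        # epsilon-greedy exploration: mostly exploit, sometimes explore
        a = rng.randrange(len(q)) if rng.random() < 0.1 else q.index(max(q))
        reward = 1.0 if rng.random() < win_probs[a] else 0.0
        q[a] += lr * (reward - q[a])    # nudge the estimate toward the reward
    return q

q = train_bandit([0.2, 0.8])   # action 1 is rewarded far more often
print("best action:", q.index(max(q)))
```

The agent converges on the more rewarding action without ever being told why it is better, which is the same principle the FTW agents exploit at a vastly larger scale.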
"From a research standpoint, the algorithmic approach here is a real novelty, and it is genuinely impressive," added Max. "The way we trained our AI is a good example of how to scale up and implement some classic evolutionary ideas."

The defiantly named For The Win (FTW) DeepMind agents learn directly from screen pixels using a convolutional neural network, a set of mathematical functions (neurons) arranged in layers and modeled on the human visual cortex. The visual data is then fed into two long short-term memory (LSTM) networks capable of capturing long-term dependencies. One handles operational data with a fast response time, while the other works slowly, analyzing and forming strategy. Both are coupled to a variational memory, which they share to predict changes in the game world and to act through an emulated game controller.
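The two-timescale idea can be sketched structurally (this is an illustrative toy, not DeepMind's code): a fast core updates every step, a slow core updates only every k steps, and both read and write a shared memory. Here each "core" is just an exponential moving average to keep the sketch self-contained; the real agents use convolutional encoders and LSTMs.

```python
class TwoTimescaleCore:
    """Toy sketch of a fast/slow recurrent pair with shared memory."""

    def __init__(self, size=4, slow_period=10):
        self.fast = [0.0] * size        # fast state: tactical, per-step
        self.slow = [0.0] * size        # slow state: strategic
        self.memory = [0.0] * size      # shared memory both cores use
        self.slow_period = slow_period
        self.t = 0

    def step(self, observation):
        self.t += 1
        # fast core reacts immediately to the new observation plus memory
        self.fast = [0.5 * f + 0.5 * (o + m)
                     for f, o, m in zip(self.fast, observation, self.memory)]
        # slow core updates only every slow_period steps, from the fast state
        if self.t % self.slow_period == 0:
            self.slow = [0.9 * s + 0.1 * f
                         for s, f in zip(self.slow, self.fast)]
        # both states are written back into the shared memory
        self.memory = [(f + s) / 2 for f, s in zip(self.fast, self.slow)]
        return self.fast, self.slow

core = TwoTimescaleCore()
for _ in range(20):
    core.step([1.0, 0.0, 0.0, 0.0])  # constant dummy observation
```

The fast state tracks the input closely while the slow state changes only rarely, mirroring the tactical/strategic division described above.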

In total, DeepMind trained 30 agents. The scientists gave them a range of teammates and opponents to play with, and the game maps were chosen randomly so that the AI could not simply memorize them. Each agent had its own reward signal, allowing it to form its own internal goals, such as capturing the flag. Individually, each AI played about 450,000 capture the flag games, which is equivalent to roughly four years of gaming experience.
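The population dynamics described above can be sketched with a toy loop (hypothetical numbers and update rules, not DeepMind's implementation): agents with different hidden parameters play randomly sampled matches, and periodically the weakest agents copy and perturb the parameters of the strongest, in the spirit of population-based training.

```python
import random

def train_population(n_agents=30, rounds=200, seed=0):
    """Toy population-based training: random matches + exploit/explore."""
    rng = random.Random(seed)
    skill = [rng.random() for _ in range(n_agents)]    # hidden parameter
    score = [0] * n_agents
    for r in range(1, rounds + 1):
        a, b = rng.sample(range(n_agents), 2)          # random matchmaking
        # the higher-skill agent wins more often
        p_a = skill[a] / (skill[a] + skill[b] + 1e-9)
        winner = a if rng.random() < p_a else b
        score[winner] += 1
        if r % 50 == 0:                                # periodic PBT step
            ranked = sorted(range(n_agents), key=lambda i: score[i])
            for weak, strong in zip(ranked[:5], ranked[-5:]):
                # weak agents copy a strong agent's parameter, then perturb it
                skill[weak] = max(0.0, min(1.0,
                    skill[strong] + rng.uniform(-0.05, 0.05)))
    return skill

skills = train_population()
print(f"mean skill after training: {sum(skills) / len(skills):.2f}")
```

Because losers inherit (slightly perturbed) copies of winners' parameters, the population as a whole drifts toward stronger play, which is the effect group training exploits.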
Fully trained FTW agents apply strategies that generalize to any map, team roster, and team size. They learned human-like behaviors such as following teammates, camping in the enemy base, and defending their own base from attackers, and they gradually abandoned less advantageous habits, such as shadowing an ally too closely.

So what were the results? In a 40-player tournament in which humans and agents were randomly matched both with and against each other, the FTW agents' win rate vastly exceeded that of the human players. The AI's Elo rating, which corresponds to its probability of winning, was 1600, compared with 1300 for "strong" human players and 1050 for the "average" human player.
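The Elo model makes these numbers concrete: a rating gap of d points translates into a win probability of 1 / (1 + 10^(-d/400)) for the stronger side. A short sketch using the article's reported ratings:

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Ratings reported in the article: FTW agents 1600,
# strong humans 1300, average humans 1050.
print(f"vs strong humans:  {elo_win_probability(1600, 1300):.2f}")
print(f"vs average humans: {elo_win_probability(1600, 1050):.2f}")
```

Under this model, the 300-point gap over strong players corresponds to winning roughly 85% of games against them.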

This is not surprising, since the AI's reaction time is significantly shorter than a human's, which gave it a substantial advantage in the initial experiments. But even when the agents' accuracy was reduced and their reaction time increased by a built-in delay of 257 milliseconds, the AI still outperformed the humans: advanced and regular players won only 21% and 12% of the games, respectively.

Moreover, after publishing the study, the scientists decided to test the agents on full-fledged Quake III Arena maps with complex level architecture and additional objects, such as Future Crossings and Ironwood, where the AI began to successfully challenge human superiority in test matches. When the researchers studied the agents' neural-network activation patterns, that is, the responses of the neurons that map input information to outputs, they found clusters representing rooms, the state of the flags, the visibility of teammates and opponents, the presence of agents in the enemy base or their own, and other significant aspects of the gameplay. The trained agents even contained neurons that directly encoded specific situations, such as when the flag is taken by an agent or when an ally is holding it.
"I think one of the things to take away is that these multi-agent teams are exceptionally powerful, and our research shows that," Jaderberg says. "It's something we've been learning to do better and better over the last few years: how to solve the reinforcement learning problem. And reinforcement learning has really shown itself brilliantly."
Thore Graepel, a professor of computer science at University College London and a scientist at DeepMind, is confident that their work highlights the potential of multi-agent learning to advance AI in the future. It can also serve as a basis for research into human-machine interactions and systems that complement each other or work together.
"Our results show that multi-agent reinforcement learning can successfully master a complex game, to the extent that human players even come to believe the computer players are better teammates. The study also provides a very interesting in-depth analysis of how the trained agents behave and work together," says Graepel. "What makes these results so exciting is that these agents perceive their environment in first person, [i.e.] the same way as a human player. To learn how to play tactically and cooperate with their teammates, these agents had to rely on performance feedback, without any teacher or coach showing them what to do."
Source: 3dnews.ru
