OpenAI teaches teamwork to AI in a game of hide-and-seek

The good old game of hide-and-seek turns out to be a great test for artificial intelligence (AI) bots, revealing how they make decisions and interact, both with each other and with the various objects around them.

In a new paper, researchers from OpenAI, the artificial intelligence research organization known for defeating world champions at the computer game Dota 2, describe how AI-controlled agents were trained to become ever more sophisticated at seeking out and hiding from each other in a virtual environment. The study showed that a team of bots learns faster and more efficiently than any single agent acting without allies.

The scientists used the long-established method of reinforcement learning, in which an AI is placed in an unfamiliar environment with certain ways of interacting with it, plus a system of rewards and penalties for the outcomes of its actions. The method is effective because an AI can act in a virtual environment at tremendous speed, running through millions of attempts far faster than any human could, which lets it find effective strategies by trial and error. But the approach also has limitations: building the environment and running the many training cycles demands enormous computing resources, and the process needs an accurate way to score the AI's actions against its goal. Moreover, the skills an agent acquires this way are limited to the task as described, and once the AI learns to cope with it, no further improvement occurs.
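For intuition, here is a minimal sketch of that reward-and-penalty loop: tabular Q-learning on a toy one-dimensional corridor. It is an illustrative example only, not OpenAI's code, and every name in it is ours.

```python
import numpy as np

# A minimal sketch of the reward/penalty loop described above: tabular
# Q-learning on a toy 1-D corridor, not OpenAI's actual environment.
# The agent starts in cell 0 and earns a reward for reaching the goal cell.

N_STATES = 6          # cells 0..5; cell 5 is the goal
ACTIONS = [-1, +1]    # step left or step right
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

q = np.zeros((N_STATES, len(ACTIONS)))
rng = np.random.default_rng(0)

for episode in range(2000):
    s = 0
    while s != N_STATES - 1:
        # Epsilon-greedy: mostly exploit the current estimate, sometimes explore.
        a = rng.integers(len(ACTIONS)) if rng.random() < EPS else int(q[s].argmax())
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else -0.01   # reward at the goal, small step penalty
        # Standard Q-learning update: nudge the estimate toward
        # reward + discounted value of the best next action.
        q[s, a] += ALPHA * (r + GAMMA * q[s2].max() - q[s, a])
        s = s2

print(q.argmax(axis=1))  # learned policy: move right (action 1) in every non-goal cell
```

After a few thousand episodes of trial and error, the table converges on "always move right", the shortest path to the reward, which is exactly the kind of brute-force search through experience the paragraph above describes.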

To teach the AI to play hide-and-seek, the scientists used an approach known as undirected exploration, in which agents have free rein to develop their own understanding of the game world and to work out winning strategies. This is similar to the multi-agent learning approach taken by DeepMind researchers, who trained multiple AI systems to play Capture the Flag in Quake III Arena. As in that case, the AI agents were not taught the rules of the game in advance, yet over time they picked up the basic strategies and even managed to surprise the researchers with non-trivial solutions.
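The arms-race dynamic of that kind of competitive self-play can be illustrated with a toy zero-sum game. The sketch below pits two REINFORCE learners against each other at matching pennies; it is our own illustration, not code from the paper.

```python
import numpy as np

# Toy self-play on matching pennies (a zero-sum game), standing in for the
# hide-and-seek arms race: each player's policy is the other's moving target.
# Purely illustrative; not OpenAI's training code.

rng = np.random.default_rng(0)
theta1 = np.zeros(2)  # logits over {heads, tails} for the "matcher"
theta2 = np.zeros(2)  # logits for the "mismatcher"
LR = 0.05

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(5000):
    p1, p2 = softmax(theta1), softmax(theta2)
    a1 = int(rng.choice(2, p=p1))
    a2 = int(rng.choice(2, p=p2))
    r1 = 1.0 if a1 == a2 else -1.0    # the matcher wins when the coins match
    r2 = -r1                          # strictly competitive: rewards sum to zero
    # REINFORCE update: raise the logit of the action just taken,
    # scaled by the reward it earned against the current opponent.
    g1 = -p1; g1[a1] += 1.0           # gradient of log p(a1)
    g2 = -p2; g2[a2] += 1.0
    theta1 += LR * r1 * g1
    theta2 += LR * r2 * g2

print(softmax(theta1), softmax(theta2))  # both hover near the 50/50 equilibrium
```

Because any gain by one side is a loss for the other, neither policy can settle: each improvement forces the opponent to adapt, the same pressure that drove the hide-and-seek agents through their successive strategies.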

When playing hide-and-seek, the agents whose job it was to hide had to stay out of the direct line of sight of their opponents, and they were given a small head start while the team of seeking agents was immobilized. "Line of sight" in this context means a 135-degree cone in front of an individual bot. Agents could not stray far outside the playground and had to move around randomly generated rooms, with the ability to use external objects (boxes, movable walls, special ramps) scattered around, which could serve both to build shelters and to break into them.
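For illustration, the 135-degree visibility rule can be written as a simple angular test; the geometry below is our own reconstruction and, unlike the real environment, ignores occlusion by walls and boxes.

```python
import math

# Sketch of a 135-degree vision-cone test: a target is "seen" when the angle
# between the seeker's facing direction and the direction to the target is at
# most half the field of view. Wall/box occlusion is deliberately omitted.

HALF_FOV = math.radians(135.0 / 2.0)

def in_line_of_sight(seeker_xy, facing_angle, target_xy):
    dx = target_xy[0] - seeker_xy[0]
    dy = target_xy[1] - seeker_xy[1]
    angle_to_target = math.atan2(dy, dx)
    # Wrap the angular difference into [-pi, pi] before comparing.
    diff = (angle_to_target - facing_angle + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= HALF_FOV

# A seeker at the origin facing along +x sees a target 45 degrees off-axis...
print(in_line_of_sight((0, 0), 0.0, (1, 1)))   # True  (45 < 67.5)
# ...but not one directly behind it.
print(in_line_of_sight((0, 0), 0.0, (-1, 0)))  # False (180 > 67.5)
```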

Over the course of a long learning process, the AI agents worked through six distinct strategies, each of which moved them to the next stage of the game. In the beginning, the seeker and hider teams simply ran around and chased each other, but after about 25 million matches the hiders learned to block the passages with boxes and build shelters out of them. After another 75 million matches, the seekers discovered how to use the ramps to get into those hideouts. After another 10 million rounds, the hiders learned to drag the ramps to the edge of the playground and lock them in place so their opponents could not use them.

After 388 million matches, the seekers learned to bring a box up to a locked ramp, use the ramp to climb on top of the box, and then ride it straight into the opponents' shelters built from the movable walls. And finally, after 458 million matches, the hiding team concluded that it needed to lock down all the objects on the map before building a shelter, which, apparently, sealed its final victory.

What is particularly impressive is that after 22 million matches the agents learned to coordinate their actions, and the effectiveness of their teamwork only grew from there: for example, each agent would bring its own box or wall when building cover and take on its own share of objects to lock, making the game harder for its opponents.

The scientists also noted an important point about how the amount of training data processed per optimization step (the batch size) affects the learning rate. The default model required 132.3 million matches over 34 hours of training to reach the phase where the hiding team learned to lock the ramps, while scaling up noticeably cut the training time. For example, increasing the number of model parameters (the weights the neural network adjusts over the course of training) from 0.5 million to 5.8 million improved sample efficiency by a factor of 2.2, and doubling the batch size from 64k to 128k samples cut the training time almost 1.5-fold.
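Taken at face value, those figures imply the following back-of-the-envelope savings; the 2.2x and roughly 1.5x factors come from the study, while the derived numbers are ours.

```python
# Back-of-the-envelope arithmetic for the scaling figures quoted above.

baseline_episodes = 132.3e6   # episodes for the default model to reach ramp locking
baseline_hours = 34.0         # wall-clock training time for that run

# 2.2x better sample efficiency => the same phase in ~2.2x fewer episodes.
episodes_bigger_model = baseline_episodes / 2.2
print(f"{episodes_bigger_model / 1e6:.1f}M episodes")  # ~60.1M

# ~1.5x faster with the doubled batch size => roughly 1/1.5 of the wall-clock time.
hours_bigger_batch = baseline_hours / 1.5
print(f"{hours_bigger_batch:.1f} hours")               # ~22.7 hours
```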

At the end of their work, the researchers tested how learning acquired in the game can help agents cope with similar tasks outside it. There were five tests in total:

- object counting: understanding that an object continues to exist even when it is out of sight and not in use;
- lock and return: remembering one's original position and returning to it after completing a side task;
- sequential locking: four boxes are randomly placed in three rooms without doors but with ramps leading inside, and the agents have to find and lock them all;
- box placement: putting boxes down on predetermined target sites;
- shelter construction: building cover around an object in the form of a cylinder.

In three of the five tasks, the bots pre-trained in the game learned faster and scored better than AI that learned to solve the problems from scratch. They did somewhat better at completing a task and returning to the starting position, at sequentially locking boxes in closed rooms, and at placing boxes on the given sites, but performed slightly worse at counting objects and at building cover around another object.

The researchers attribute the mixed results to how the AI acquires and remembers particular skills. "We believe the tasks where in-game pre-training performed best involve reusing previously learned skills in a familiar way, whereas performing the remaining tasks better than an AI trained from scratch would require using those skills in a new way, which is much more difficult," the co-authors write. "This result highlights the need to develop techniques for effectively reusing learned skills when transferring them from one environment to another."

The work is genuinely impressive, since the prospects for this learning method extend far beyond any game. The researchers say their work is a significant step toward creating AI with "physically grounded" and "human-like" behavior that could one day diagnose diseases, predict the structures of complex protein molecules, and analyze CT scans.

The video accompanying the original article shows the whole learning process clearly: how the AI learned teamwork and how its strategies became ever more cunning and complex.



Source: 3dnews.ru
