How to create a game AI: a guide for beginners

I came across an interesting piece about artificial intelligence in games. It explains the basics of AI with simple examples, and it also covers many useful tools and methods for conveniently developing and designing it, along with how, where, and when to use them.

Most of the examples are written in pseudocode, so deep programming knowledge is not required. There are about 35 pages of text with pictures and gifs under the cut, so get ready.

UPD. I'm sorry, but PatientZero has already published his own translation of this article on Habr. You can read his version here, but for some reason it passed me by (I used the search, but something went wrong). And since I'm writing for a blog dedicated to gamedev, I decided to keep my own version of the translation for subscribers (some points are presented differently, and some are deliberately omitted on the advice of the developers).

What is AI?

Game AI is focused on what actions an object should perform based on the conditions it finds itself in. This is usually referred to as managing "intelligent agents", where an agent is a player character, a vehicle, a bot, and sometimes something more abstract: a whole group of entities or even a civilization. In each case it is a thing that must sense its environment, make decisions based on it, and act on them. This is called the Sense/Think/Act cycle:

  • Sense: The agent finds or receives information about things in its environment that may affect its behavior (threats nearby, items to collect, points of interest to explore).
  • Think: The agent decides how to react (considers if it's safe enough to pick up items or if it should fight/hide first).
  • Act: The agent performs actions to implement the previous decision (starts moving towards the enemy or object).
  • …now the situation has changed due to the actions of the characters, so the cycle is repeated with new data.

Outside of games, AI tends to focus on the Sense part of the loop. For example, autonomous cars take pictures of the road, combine them with radar and lidar data, and interpret them. This is usually done by machine learning, which processes the incoming data and makes sense of it, extracting semantic information like "there is another car 20 yards ahead of you". These are the so-called classification problems.

Games don't need a complex system for extracting information, since most of the data is already an integral part of the simulation. There is no need to run image recognition algorithms to determine if there is an enemy ahead - the game already knows and can feed that information straight into the decision-making process. Therefore, the Sense part of the cycle is often much simpler than Think and Act.

Game AI Limitations

Game AI has a number of restrictions that must be observed:

  • AI does not need to be pre-trained the way a machine learning algorithm is. It makes no sense to write a neural network during development to observe tens of thousands of players and learn the best way to play against them. Why? Because the game hasn't been released, and there are no players yet.
  • The game should be entertaining and challenging, so agents shouldn't always find the optimal approach against humans.
  • Agents need to look realistic so that players feel like they are playing against real people. The AlphaGo program surpassed humans, but the moves it chose were far outside the traditional understanding of the game. If a game is imitating a human opponent, that feeling should not be there. The algorithm needs to be tweaked so that it makes plausible decisions, not ideal ones.
  • AI must work in real time. This means that the algorithm cannot monopolize the CPU usage for a long time to make decisions. Even 10 milliseconds is too long, because most games need between 16 and 33 milliseconds to do all the processing and move on to the next graphics frame.
  • Ideally, at least part of the system is data-driven so that "non-coders" can make changes, and so that adjustments happen faster.

Let's look at AI approaches that span the entire Sense/Think/Act cycle.

Making Basic Decisions

Let's start with the simplest game - Pong. Goal: move the platform (paddle) so that the ball bounces off it instead of flying past. It's like tennis, where you lose if you fail to hit the ball. Here the AI has a relatively easy task - deciding which direction to move the platform.


Conditional Operators

For AI in Pong, the most obvious solution is to always try to position the platform under the ball.

A simple algorithm for this, written in pseudocode:

every frame/update while the game is running:
    if the ball is to the left of the paddle:
        move paddle left
    else if the ball is to the right of the paddle:
        move paddle right

If the platform moves at the same speed as the ball, then this is an ideal algorithm for AI in Pong. There is no need to complicate anything if there isn't much data or many possible actions for the agent.

This approach is so simple that the entire Sense/Think/Act loop is barely visible. But it is there:

  • The Sense part is in the two if statements. The game knows where the ball is and where the platform is, so the AI queries it for this information.
  • The Think part is also in the two if statements. They embody two decisions, which in this case are mutually exclusive. As a result, one of three actions is selected - move the platform to the left, move it to the right, or do nothing if it is already correctly positioned.
  • The Act part is in the Move Paddle Left and Move Paddle Right statements. Depending on the design of the game, they can move the platform instantly or at a certain speed.

Such approaches are called reactive - there is a simple set of rules (in this case, if statements in the code) that react to the current state of the world and act.
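
To make the three stages explicit, here is a minimal runnable sketch of the same reactive rule in Python (the world dictionary, the positions, and the movement step are invented for illustration):

# A minimal sketch of the reactive Pong rule, split into Sense/Think/Act.
# The world state and step size are invented for illustration.

def sense(world):
    # Read the facts the agent cares about straight from the game state.
    return world["ball_x"], world["paddle_x"]

def think(ball_x, paddle_x):
    # Decide on one of three actions: move left, move right, or stay put.
    if ball_x < paddle_x:
        return "left"
    elif ball_x > paddle_x:
        return "right"
    return "stay"

def act(world, decision, speed=5):
    # Apply the decision by nudging the paddle.
    if decision == "left":
        world["paddle_x"] -= speed
    elif decision == "right":
        world["paddle_x"] += speed

world = {"ball_x": 40, "paddle_x": 100}
for _ in range(20):                      # stand-in for the per-frame update loop
    ball_x, paddle_x = sense(world)
    act(world, think(ball_x, paddle_x))
print(world["paddle_x"])                 # the paddle has moved under the ball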

Decision Tree

The Pong game example is effectively equivalent to a formal AI concept called a decision tree. The algorithm goes through it to reach the "leaf" - the decision on what action to take.

Let's make a block diagram of the decision tree for the algorithm of our platform:

[Image: decision tree diagram for the paddle algorithm]

Each part of the tree is called a node - AI uses graph theory to describe such structures. There are two types of nodes:

  • Decision Nodes: A choice between two alternatives based on a test of some condition, where each alternative is represented as a separate node.
  • End Nodes: An action to be taken representing the final decision.

The algorithm starts from the first node (the "root" of the tree). It either decides which child node to go to, or performs an action stored in the node and exits.

What is the advantage if the decision tree does the same job as the if-statements in the previous section? There is a general system here, where each decision has exactly one condition and two possible outcomes. This allows the developer to build the AI from data representing the decisions in the tree, without having to hardcode it. Let's put it in the form of a table:

[Image: the decision tree written out as a data table]

On the code side, you'd get a system for reading these rows. Create a node for each of them, connect the decision logic based on the second column and the child nodes based on the third and fourth columns. You still need to program the conditions and actions, but now the structure lives in the data: you add extra decisions and actions and then tune the entire AI by simply editing the text file with the tree definition. Next, you hand the file to the game designer, who can change the behavior without recompiling the game or touching the code.
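
As a rough illustration - the row format, condition names, and node ids below are assumptions, since the original table is an image - the loader and evaluator might look like this:

# Sketch of a data-driven decision tree. The row format is hypothetical:
# node_id, condition_name, node_if_true, node_if_false.
rows = [
    ("root",        "ball_left_of_paddle",  "act_left",  "check_right"),
    ("check_right", "ball_right_of_paddle", "act_right", "act_stay"),
]
actions = {"act_left": "move left", "act_right": "move right", "act_stay": "do nothing"}

# The conditions still have to be programmed once; the tree structure is data.
conditions = {
    "ball_left_of_paddle":  lambda w: w["ball_x"] < w["paddle_x"],
    "ball_right_of_paddle": lambda w: w["ball_x"] > w["paddle_x"],
}

nodes = {node_id: (cond, yes, no) for node_id, cond, yes, no in rows}

def decide(world, node_id="root"):
    # Walk from the root until we land on an action (a leaf).
    while node_id in nodes:
        cond, yes, no = nodes[node_id]
        node_id = yes if conditions[cond](world) else no
    return actions[node_id]

print(decide({"ball_x": 3, "paddle_x": 7}))   # -> move left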

Decision trees are very useful when they can be built automatically from a large set of examples (for example, using the ID3 algorithm). This makes them an efficient and high-performance tool for classifying situations based on data. However, that goes beyond a simple system for agents to select actions.

Scenarios

We just looked at a decision tree system that used pre-built conditions and actions. The person designing the AI can arrange the tree however they want, but they still have to rely on the coder who programmed it all. What if we could give the designer tools to create their own conditions or actions?

So that the programmer doesn't have to write code for the Is Ball Left Of Paddle and Is Ball Right Of Paddle conditions, he can build a system in which the designer writes the conditions that check these values. Then the decision tree data might look like this:

[Image: decision tree data with scripted conditions]

This is essentially the same as in the first table, but the decisions have their own code inside them, a bit like the conditional part of an if-statement. On the code side, this would be read in the second column for the decision nodes, but instead of looking up a specific condition to execute (Is Ball Left Of Paddle), it evaluates the conditional expression and returns true or false accordingly. This is usually done with a scripting language such as Lua or AngelScript. With these, the developer can expose objects in the game (the ball and the paddle) and create variables that are available in the script (ball.position). A scripting language is also simpler than C++. It does not require a full compilation step, so it is ideal for quickly adjusting game logic and allows "non-coders" to create the necessary functions themselves.

In the example shown, the scripting language is only used to evaluate the conditional expression, but it can also be used for actions. For example, the Move Paddle Right data could become a script statement (something like paddle.position.x += 10), so that the action is also defined in the script, without having to program Move Paddle Right by hand.

You can go even further and write the entire decision tree in a scripting language. This will be code in the form of hardcoded conditional statements, but they will be located in external script files, that is, they can be changed without recompiling the entire program. It is often possible to modify the script file while playing to quickly test different AI responses.

Event response

The examples above are perfect for Pong. They continuously run a Sense/Think/Act cycle and act based on the last state of the world. But in more complex games, you need to react to individual events, and not evaluate everything at once. Pong in this case is already a bad example. Let's choose another.

Imagine a shooter where the enemies are motionless until they find the player, after which they act depending on their “specialization”: someone will run to “rush”, someone will attack from afar. It's still a basic react system - "if a player is seen, then do something" - but it can be logically divided into a Player Seen event (the player is seen) and a reaction (select a response and do it).

This brings us back to the Sense/Think/Act cycle. We can code the Sense part, which checks every frame whether the AI sees the player. If not, nothing happens, but if it does, the Player Seen event is fired. The code will have a separate section that says "when the Player Seen event occurs, do X", where X is the response that handles the Think and Act parts. Thus, you set up reactions to the Player Seen event: for the "rushing" character - ChargeAndAttack, and for the sniper - HideAndSnipe. These links can be created in a data file for quick editing without the need to recompile. And here, too, you can use a scripting language.
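
A rough sketch of such an event hookup (the handler registry, the agent data, and the role check are illustrative assumptions):

# Sketch of wiring reactions to a "PlayerSeen" event.
handlers = {}

def on(event_name, handler):
    # Register a response for an event; in practice this link could come from a data file.
    handlers.setdefault(event_name, []).append(handler)

def fire(event_name, agent):
    for handler in handlers.get(event_name, []):
        handler(agent)

def charge_and_attack(agent):
    print(agent["name"], "charges the player")

def hide_and_snipe(agent):
    print(agent["name"], "finds cover and snipes")

def player_seen_response(agent):
    # The Think/Act part: pick a reaction based on the agent's specialization.
    if agent["role"] == "rusher":
        charge_and_attack(agent)
    else:
        hide_and_snipe(agent)

on("PlayerSeen", player_seen_response)

# The Sense part, run a few times per second, fires the event when its check passes.
grunt = {"name": "Grunt", "role": "rusher"}
if True:  # stand-in for "agent can see the player"
    fire("PlayerSeen", grunt)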

Making difficult decisions

Although simple reaction systems are very powerful, there are many situations where they are not enough. Sometimes you need to make different decisions based on what the agent is doing at the moment, but it's hard to represent this as a condition. Sometimes there are too many conditions to effectively represent them in a decision tree or script. Sometimes you need to evaluate in advance how the situation will change before deciding on the next step. More sophisticated approaches are needed to solve these problems.

Finite state machine

A finite state machine, or FSM, is a way of saying that our agent is currently in one of several possible states and can move from one state to another. There is a finite number of such states - hence the name. The best real-life example is a traffic light. There are different sequences of lights in different places, but the principle is the same: each state represents something (stop, go, etc.). A traffic light is in only one state at any given time, and it moves from one to another based on simple rules.

It's a similar story with NPCs in games. For example, let's take a guard with the following states:

  • Patrolling.
  • Attacking.
  • Fleeing.

And the following conditions for changing his state:

  • If the guard sees an enemy, he attacks.
  • If the guard is attacking but no longer sees the enemy, he returns to patrolling.
  • If the guard is attacking but is badly wounded, he flees.

You can also write if-statements with the guard's state variable and various checks: is there an enemy nearby, what is the NPC's health level, etc. Let's add a few more states:

  • Idling - between patrols.
  • Searching - when a spotted enemy has disappeared.
  • Finding Help - when an enemy is spotted but is too strong to fight alone.

The choice for each of them is limited - for example, the guard will not go looking for a hidden enemy if he has low health.

In the end, a huge list of "if X, then Y" conditions can become too unwieldy, so we should formalize a method that lets us keep the states and the transitions between them in mind. To do this, we take all the states into account, and under each state we list all the transitions to other states, along with the conditions necessary for them.

[Image: state transition table for the guard]

This is a state transition table - a comprehensive way to represent the FSM. Let's draw a diagram and get a complete overview of how the NPC's behavior changes.

[Image: FSM diagram of the guard's states]

The diagram reflects the essence of decision making for this agent based on the current situation. Each arrow shows a transition between states if the condition next to it is true.

Each update we check the agent's current state, look through the list of transitions, and if the conditions for a transition are met, the agent takes the new state. For example, each frame it checks whether the 10-second timer has expired, and if so, the guard goes from the Idling state to Patrolling. In the same way, the Attacking state checks the agent's health - if it is low, the agent goes into the Fleeing state.
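
A compact sketch of that update step (the state names come from the example above; the timer and health checks are simplified stand-ins):

# Minimal FSM sketch: each state lists (condition, next_state) pairs.
transitions = {
    "Idling":     [(lambda a: a["idle_time"] >= 10,     "Patrolling"),
                   (lambda a: a["enemy_visible"],        "Attacking")],
    "Patrolling": [(lambda a: a["enemy_visible"],        "Attacking")],
    "Attacking":  [(lambda a: a["health"] < 25,          "Fleeing"),
                   (lambda a: not a["enemy_visible"],    "Patrolling")],
    "Fleeing":    [],
}

def update(agent):
    # Check the current state's transitions; take the first one whose condition holds.
    for condition, next_state in transitions[agent["state"]]:
        if condition(agent):
            agent["state"] = next_state
            break

guard = {"state": "Idling", "idle_time": 11, "enemy_visible": False, "health": 100}
update(guard)
print(guard["state"])   # -> Patrolling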

This is the handling of state transitions, but what about the behavior associated with the states themselves? As far as implementing the actual behavior for a particular state, there are usually two types of "hook" where we assign actions to the FSM:

  • Actions that we periodically perform for the current state.
  • The actions we take when we move from one state to another.

Examples of the first type: the Patrolling state will move the agent along the patrol route every frame, and the Attacking state will try to launch an attack each frame, or transition to a state where attacking is possible.

For the second type, consider the transition "if the enemy is visible and the enemy is too strong, then go to the Finding Help state". The agent must choose where to go for help and store this information so that the Finding Help state knows where to go. Once help is found, the agent goes back to the Attacking state. At this point, it will want to tell its ally about the threat, so a NotifyFriendOfThreat action may occur.

Again, we can look at this system through the lens of the Sense/Think/Act cycle. Sense is embodied in the data used by the transition logic. Think - transitions available in each state. And Act is carried out by actions performed periodically within a state or on transitions between states.

Sometimes it can be costly to continuously poll for transition conditions. For example, if each agent performs complex calculations every frame to determine if it sees enemies and to understand if it is possible to transition from Patrolling to Attacking state, this will take a lot of CPU time.

Important changes in the state of the world can be treated as events that are processed as they occur. Instead of the FSM checking the "can my agent see the player?" transition condition every frame, a separate system can perform the check less often (say, 5 times per second) and issue a Player Seen event when the check passes.

This is passed to the FSM, which now has a "Player Seen event received" transition condition and reacts accordingly. The resulting behavior is the same, apart from an almost imperceptible delay before responding. But performance has improved as a result of moving the Sense part into a separate part of the program.

Hierarchical finite state machine

However, working with large FSMs is not always convenient. If we want to expand the attack state by replacing it with separate MeleeAttacking (melee) and RangedAttacking (ranged), we will have to change the transitions from all other states that lead to the Attacking state (current and future).

Surely you noticed that in our example there are a lot of duplicated transitions. Most transitions in the Idling state are identical to transitions in the Patrolling state. It would be nice not to repeat ourselves, especially if we add more similar states. It makes sense to group Idling and Patrolling under the general label "non-combat", where there is only one common set of transitions to combat states. If we represent this label as a state, then Idling and Patrolling become substates. An example of using a separate transition table for a new non-combat substate:

Main states:
[Image: transition table for the main states]

Non-combat substate:
[Image: transition table for the non-combat substates]

And in chart form:

[Image: hierarchical FSM diagram]

It's the same system, but with a new non-combat state that includes Idling and Patrolling. With each state containing an FSM of substates (and those substates in turn containing their own FSMs - and so on, as deep as you need), we get a Hierarchical Finite State Machine, or HFSM. By grouping the non-combat states, we cut out a bunch of redundant transitions. We can do the same for any new states with shared transitions. For example, if in the future we expand the Attacking state into MeleeAttacking and RangedAttacking states, they will be substates that transition between each other based on the distance to the enemy and ammo availability. As a result, complex behaviors and sub-behaviors can be represented with a minimum of duplicated transitions.

Behavior Tree

With an HFSM, complex combinations of behaviors are created in a simple way. However, there is a slight difficulty: decision making in the form of transition rules is closely tied to the current state. In many games that is exactly what you need, and careful use of the state hierarchy can reduce the number of repeated transitions. But sometimes you need rules that work no matter what state you're in, or that apply in almost any state. For example, if an agent's health drops to 25%, you'll want them to run away regardless of whether they're in combat, idling, or talking - and you'd have to add this condition to every state. If your designer later wants to change the low-health threshold from 25% to 10%, you'll have to do it all over again.

Ideally, this situation needs a system in which the decisions of "what state to be in" are outside the states themselves, in order to make changes in only one place and not touch the transition conditions. This is where behavior trees come in.

There are several ways to implement them, but the essence is roughly the same for all of them and is similar to a decision tree: the algorithm starts from a "root" node, and the tree contains nodes representing either decisions or actions. There are a few key differences though:

  • Nodes now return one of three values: Succeeded (if the job is done), Failed (if it can't be done), or Running (if it is still being performed and there is no final result yet).
  • There are no more decision nodes choosing between two alternatives. Instead, there are Decorator nodes, each with a single child node: if the Decorator succeeds, it executes its only child.
  • Nodes that perform actions return a Running value to indicate that the action is in progress (a minimal sketch of these node types follows below).
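
A minimal sketch of such nodes (the Selector composite and the tiny example tree are illustrative additions, not part of the original example):

# A minimal behavior tree sketch with the three return values, a decorator-style
# Condition node, and a Selector composite. Node names and checks are illustrative.
SUCCESS, FAILURE, RUNNING = "Succeeded", "Failed", "Running"

class Action:
    def __init__(self, fn): self.fn = fn
    def tick(self, agent):  return self.fn(agent)

class Condition:                     # decorator: runs its only child if the check passes
    def __init__(self, check, child): self.check, self.child = check, child
    def tick(self, agent):
        return self.child.tick(agent) if self.check(agent) else FAILURE

class Selector:                      # composite: tries children in order until one doesn't fail
    def __init__(self, *children): self.children = children
    def tick(self, agent):
        for child in self.children:
            result = child.tick(agent)
            if result != FAILURE:
                return result
        return FAILURE

tree = Selector(
    Condition(lambda a: a["enemy_visible"] and a["health"] < 25,
              Action(lambda a: print("flee") or RUNNING)),
    Condition(lambda a: a["enemy_visible"],
              Action(lambda a: print("attack") or RUNNING)),
    Action(lambda a: print("patrol") or RUNNING),
)
tree.tick({"enemy_visible": True, "health": 10})   # -> flee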

This small set of nodes can be combined to create a large number of complex behaviors. Let's represent the guardian's HFSM from the previous example as a tree of behavior:

[Image: the guard's behavior represented as a behavior tree]

With this structure, there is no need for an explicit transition from the Idling/Patrolling states to the Attacking state or any of the others. If the enemy is visible and the character's health is low, execution will stop at the Fleeing node regardless of which node was running before - Patrolling, Idling, Attacking, or any other.


Behavior trees are complex - there are many ways to compose them, and finding the right combination of decorators and composite nodes can be tricky. There are also questions about how often to check the tree - do we want to traverse it every frame, or only when one of the conditions has changed? How do we store state related to nodes - how do we know we've been in the Idling state for 10 seconds, or which nodes were executed last time so we can process a sequence properly?

That is why there are many implementations. For example, some systems have replaced standalone decorator nodes with inline decorators that re-evaluate the tree when their conditions change, help join nodes together, and provide periodic updates.

Utility-based system

Some games have many different mechanics. It would be nice to get all the benefits of simple, general transition rules, but not necessarily in the form of a complete behavior tree. Instead of having a clear set of choices or a tree of possible actions, it is easier to examine all the actions and choose the most suitable one at the given moment.

A utility-based system does just that. It is a system where the agent has many actions available and chooses which one to perform based on the relative utility of each, where utility is an arbitrary measure of how important or desirable it is for the agent to perform that action.

Since an action's utility is calculated from the current state and environment, the agent can check and choose the most appropriate state at any time. This is similar to an FSM, except that the transitions are determined by an evaluation of every potential state, including the current one. Note that we choose the most useful action to move to (or stay with, if we've already taken it). For more variety, this can be a weighted but random selection from a small list of top-scoring actions.

The system assigns an arbitrary range of utility values - for example, from 0 (completely undesirable) to 100 (completely desirable). Each action has a number of parameters that affect how this value is calculated. Returning to our guard example:

[Image: utility scores for the guard's actions]

Transitions between actions are ambiguous - any state can follow any other. Action priorities are found in the returned utility values. If an enemy is visible and that enemy is strong and the character's health is low, then both Fleeing and FindingHelp will return high non-zero values. In this case, FindingHelp will always be higher. Likewise, non-combat actions never return more than 50, so they will always be below combat. You need to take this into account when creating actions and calculating their usefulness.

In our example, the actions return either a fixed constant value or one of two fixed values. A more realistic system would return an estimate from a continuous range of values. For example, the Fleeing action returns higher utility values when the agent's health is low, and the Attacking action returns lower utility values when the enemy is too strong. Because of this, the Fleeing action takes precedence over Attacking in any situation where the agent feels it does not have enough health to defeat the enemy. This allows you to change the priorities of actions based on any number of criteria, which makes this approach more flexible and variable than a behavior tree or FSM.
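
As a rough sketch - the scoring formulas, thresholds, and the 0-100 scale here are invented for illustration - a utility pass over the guard's actions might look like this:

# Utility sketch: each action scores itself from the current situation,
# and the agent picks the highest-scoring one. The formulas are illustrative.
def score_fleeing(a):
    # More attractive the lower our health is, but only if an enemy is around.
    return (100 - a["health"]) if a["enemy_visible"] else 0

def score_attacking(a):
    # Less attractive the stronger the enemy is relative to us.
    return max(0, 80 - a["enemy_strength"]) if a["enemy_visible"] else 0

def score_patrolling(a):
    return 30    # non-combat actions stay below combat ones by design

actions = {"Fleeing": score_fleeing, "Attacking": score_attacking, "Patrolling": score_patrolling}

def choose(agent):
    return max(actions, key=lambda name: actions[name](agent))

print(choose({"health": 20, "enemy_visible": True, "enemy_strength": 70}))  # -> Fleeing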

Each action can have many inputs to its utility calculation. They can be written in a scripting language or as a series of mathematical formulas. The Sims, which models a character's daily routine, adds an extra layer of calculation: the agent receives a set of "motivations" that affect the utility ratings. If a character is hungry, they become even hungrier over time, and the utility of the EatFood action increases until the character performs it, reducing the hunger level and returning the EatFood utility to zero.

The idea of ​​choosing actions based on a rating system is quite simple, so the utility-based system can be used as part of AI decision processes, and not as a complete replacement for them. The decision tree may ask for the utility score of the two child nodes and choose the higher one. Similarly, a behavior tree can have a Utility node to evaluate the usefulness of actions to decide which child to execute.

Movement and navigation

In the previous examples, we had a platform that we moved left or right, and a guard that patrolled or attacked. But how exactly do we handle moving an agent over a period of time? How do we set speed, how do we avoid obstacles, and how do we plan a route if getting to a destination is more difficult than just moving in a straight line? Let's take a look at this.

Steering

To start with, we will assume that each agent has a velocity value, which includes both how fast it is moving and in what direction. It can be measured in meters per second, kilometers per hour, pixels per second, and so on. Recalling the Sense/Think/Act cycle, we can imagine that the Think part selects a velocity, and the Act part applies that velocity to the agent. Games usually have a physics system that does this for you, reading each object's velocity value and adjusting its position. So you can leave the AI with one task: deciding what velocity the agent should have. If you know where the agent should be, then you need to move it in the right direction at a set speed. A very trivial equation:

desired_travel = destination_position - agent_position

Imagine a 2D world. The agent is at (-2,-2), the destination is somewhere in the northeast at (30, 20), and the required path for the agent to get there is (32, 22). Let's say these positions are measured in meters - if we take the agent's speed as 5 meters per second, then we will scale our displacement vector and get the speed approximately (4.12, 2.83). With these parameters, the agent would arrive at the destination in almost 8 seconds.

You can recalculate these values at any time. If the agent were halfway to the target, the displacement would be half the length, but since the agent's maximum speed is 5 m/s (we decided this above), the speed would stay the same. This also works for moving targets, allowing the agent to make small corrections as it moves.

But we want more variability - for example, slowly building up speed to simulate a character going from standing to running. The same can be done at the end, before stopping. These features are known as steering behaviors, each of which has a specific name: Seek, Flee, Arrival, and so on. The idea is that acceleration forces can be applied to an agent's velocity, based on comparing the agent's position and current velocity with the destination, in order to produce different ways of moving towards it.
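
As a sketch, Seek and a simplified Arrival might look like this (here they return a desired velocity directly rather than an acceleration force; the slowing radius is an arbitrary choice):

import math

def seek(agent_pos, target_pos, max_speed):
    # Head straight for the target at full speed.
    dx, dy = target_pos[0] - agent_pos[0], target_pos[1] - agent_pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return (0.0, 0.0)
    return (dx / dist * max_speed, dy / dist * max_speed)

def arrival(agent_pos, target_pos, max_speed, slow_radius=5.0):
    # Like seek, but scale the speed down inside the slowing radius.
    dx, dy = target_pos[0] - agent_pos[0], target_pos[1] - agent_pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return (0.0, 0.0)
    speed = max_speed * min(1.0, dist / slow_radius)
    return (dx / dist * speed, dy / dist * speed)

print(seek((-2, -2), (30, 20), 5))      # roughly (4.12, 2.83), as in the example above
print(arrival((29, 19), (30, 20), 5))   # much slower: we're close to the destination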

Each behavior has a slightly different purpose. Seek and Arrival are ways of moving an agent to a destination. Obstacle Avoidance and Separation adjust the agent's movement to avoid obstacles on the way to the goal. Alignment and Cohesion keep agents moving together. Any number of different steering behaviors can be summed to produce a single movement vector that takes all the factors into account. An agent might use the Arrival, Separation, and Obstacle Avoidance behaviors together to keep away from walls and other agents. This approach works well in open locations without too much clutter.

In more difficult conditions, adding the behaviors together works worse - for example, an agent can get stuck in a wall due to a conflict between Arrival and Obstacle Avoidance. Therefore, you need to consider options that are more sophisticated than simply summing all the values. One way is this: instead of adding up the results of each behavior, you can consider movement in several different directions and choose the best option.

However, in a complex environment with dead ends and a choice of which way to go, we need something even more advanced.

Pathfinding

Steering behaviors are great for simple movement in open areas (football field or arena), where getting from A to B is a straight path with small deviations past obstacles. For complex routes, we need pathfinding, which is a way of exploring the world and deciding on a route through it.

The simplest approach is to lay a grid over the world, then look at each square next to the agent and evaluate which of them we are allowed to move into. If one of them is the destination, follow the route back from each square to its previous one until you reach the start - that's the route. Otherwise, repeat the process with the neighboring squares until you find the destination or run out of squares (meaning there is no possible route). This is what is formally known as Breadth-First Search, or BFS. At each step it looks in all directions (hence "breadth"). The search space is like a wavefront that moves until it reaches the place you are looking for - the search area expands at each step until it hits the end point, after which you can trace the path back to the beginning.

[Image: breadth-first search spreading across the grid]

As a result, you get a list of squares that make up the desired route. This is the path (hence pathfinding) - a list of places the agent will visit on the way to the destination.

Given that we know the position of every square in the world, we can use steering behaviors to move along the path - from node 1 to node 2, then from node 2 to node 3, and so on. The simplest option is to head towards the center of the next square, but even better is to aim for the middle of the edge between the current square and the next one. This lets the agent cut corners on sharp turns.

The BFS algorithm also has drawbacks - it explores as many squares in the "wrong" direction as in the "correct" one. This is where a more complex algorithm called A* (A star) comes in. It works the same way, but instead of blindly examining neighboring squares (then neighbors of neighbors, then neighbors of neighbors of neighbors, and so on), it gathers the nodes into a list and sorts them so that the next node it examines is always the one that leads to the shortest path. Nodes are sorted based on a heuristic that takes into account two things - the "cost" of the hypothetical route to the square in question (including any travel costs) and an estimate of how far that square is from the destination (biasing the search in the right direction).
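
For reference, here is a compact sketch of A* over a small grid, using Manhattan distance as the heuristic (the map itself is made up):

import heapq

def a_star(grid, start, goal):
    # grid[y][x] == 1 means blocked. Returns the list of squares from start to goal.
    def h(p):
        # Heuristic: estimated distance from p to the goal (Manhattan distance here).
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start)]           # (estimated total cost, cost so far, square)
    came_from, best_cost = {}, {start: 0}
    while open_set:
        _, cost, node = heapq.heappop(open_set)
        if node == goal:
            path = [node]                        # walk back through came_from to rebuild the route
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return path[::-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dx, node[1] + dy)
            if 0 <= nxt[0] < len(grid[0]) and 0 <= nxt[1] < len(grid) and not grid[nxt[1]][nxt[0]]:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = node
                    heapq.heappush(open_set, (new_cost + h(nxt), new_cost, nxt))
    return None   # no route exists

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(a_star(grid, (0, 0), (0, 2)))   # route around the wall in the middle row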

[Image: A* exploring the grid towards the goal]

This example shows that the agent explores one square at a time, each time selecting the neighboring one that is the most promising. The resulting path is the same as with BFS, but fewer squares were considered in the process - and this makes a big difference to game performance.

Movement without a grid

But most games aren't laid out on a grid, and it's often impossible to do so without sacrificing realism. Compromises are needed. What size should the squares be? Too big, and they won't correctly represent small corridors or turns; too small, and there will be too many squares to search through, which will end up taking a lot of time.

The first thing to understand is that a grid gives us a graph of connected nodes. The A* and BFS algorithms actually work on graphs and don't care about our grid at all. We could put nodes anywhere in the game world: as long as there is a connection between any two connected nodes, as well as between the start and end points and at least one of the nodes, the algorithm will work just as well as before. This is often referred to as a waypoint system, since each node represents a significant position in the world that can be part of any number of hypothetical paths.

[Image: Example 1 - a node in every square. The search starts from the node where the agent is located and ends at the node of the target square.]

[Image: Example 2 - a smaller set of nodes (waypoints). The search starts in the square with the agent, passes through the required number of waypoints, and then continues to the destination.]

This is quite a flexible and powerful system. But some care is needed in deciding where and how to place waypoints, otherwise agents may simply not see the nearest one and be unable to start a path. It would be easier if we could place waypoints automatically based on the geometry of the world.

This is where the navigation mesh, or navmesh, comes in. This is usually a 2D mesh of triangles overlaid on the world geometry wherever the agent is allowed to walk. Each of the triangles in the mesh becomes a node in the graph and has up to three adjacent triangles that become its neighboring nodes.

The picture below is an example from the Unity engine - it analyzed the geometry of the world and created a navmesh (light blue in the screenshot). Each polygon in the navmesh is an area where an agent can stand, or move from one polygon to another. In this example, the polygons are smaller than the floors they sit on - this is done to account for the size of the agent, which extends beyond its nominal position.

[Image: Unity navmesh overlaid on level geometry]

We can search for a route through this mesh, again using the A* algorithm. This gives us an almost perfect route through the world that takes all the geometry into account and does not require extra nodes or manually created waypoints.

Pathfinding is too broad a topic for one section of an article. If you want to study it in more detail, Amit Patel's website will help.

Planning

We've seen with pathfinding that sometimes it's not enough to just pick a direction and move - we have to choose a route and make a few turns to get to our destination. We can generalize this idea: reaching a goal is not just the next step but a whole sequence, where sometimes you need to look several steps ahead to find out what the first one should be. This is called planning. Pathfinding can be thought of as one particular application of planning. In terms of our Sense/Think/Act cycle, this is where the Think part plans several Act parts for the future.

Let's take the card game Magic: The Gathering as an example. We go first, with the following cards in hand:

  • Swamp - gives 1 black mana (a land card).
  • Forest - gives 1 green mana (a land card).
  • Fugitive Wizard - requires 1 blue mana to summon.
  • Elvish Mystic - requires 1 green mana to summon.

We ignore the remaining three cards to keep things simple. According to the rules, a player may play 1 land card per turn, can "tap" that card to extract mana from it, and can then cast spells (including summoning a creature) according to the amount of mana. In this situation, a human player knows to play Forest, tap it for 1 green mana, and then summon Elvish Mystic. But how does the game AI figure this out?

Simple planning

A trivial approach is to try each action in turn until no suitable ones are left. Looking at the cards, the AI sees that it can play Swamp. And it plays it. Are there any other actions left this turn? It cannot summon either Elvish Mystic or Fugitive Wizard, as they require green and blue mana respectively, while Swamp only grants black mana. And it can no longer play Forest, because it has already played Swamp. So the game AI followed the rules, but did it badly. This can be improved.

Planning can find a list of actions that bring the game into the desired state. Just as every square in a path has neighbors (in pathfinding), every action in the plan also has neighbors or successors. We can look for these actions and subsequent actions until we reach the desired state.

In our example, the desired result is "summon creature if possible". At the beginning of the turn, we see only two possible actions allowed by the rules of the game:

1. Play Swamp (result: Swamp in play)
2. Play Forest (result: Forest in play)

Each action taken can open up further actions and close off others, again depending on the rules of the game. Imagine playing Swamp - this removes Swamp as a next step (we already played it), and it also removes Forest (because by the rules you can only play one land card per turn). After that, the AI adds "get 1 black mana" as the next step, because there are no other options. If it goes ahead and taps the Swamp, it gets 1 black mana and can't do anything with it.

1. Play Swamp (result: Swamp in play)
   1.1 Tap Swamp (result: Swamp is tapped, +1 black mana)
       No actions available - END
2. Play Forest (result: Forest in play)

That list of actions came up short; we reached a dead end. We repeat the process for the next option. We play Forest, which unlocks the "gain 1 green mana" action, which in turn unlocks a third action - summoning Elvish Mystic.

1. Play Swamp (result: Swamp in play)
   1.1 Tap Swamp (result: Swamp is tapped, +1 black mana)
       No actions available - END
2. Play Forest (result: Forest in play)
   2.1 Tap Forest (result: Forest is tapped, +1 green mana)
       2.1.1 Summon Elvish Mystic (result: Elvish Mystic in play, -1 green mana)
             No actions available - END

Finally, we have examined all possible actions and found a plan that summons a creature.

This is a very simplified example. It is usually better to choose the best possible plan rather than just any plan that meets some criteria. As a rule, you can evaluate potential plans based on the end result or the cumulative benefit of carrying them out. You could award yourself 1 point for playing a land card and 3 points for summoning a creature. Playing Swamp would be a 1-point plan. And playing Forest → tapping the Forest → summoning Elvish Mystic immediately gives 4 points.
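
A toy planner for this example might look like the sketch below. The rules model is a heavy simplification invented for illustration; only the scoring (1 point per land, 3 per creature) comes from the text above:

# Enumerate every legal sequence of actions under a simplified rules model
# and keep the highest-scoring plan.
def legal_actions(s):
    acts = []
    if not s["land_played"]:
        acts += [("play", land) for land in ("Swamp", "Forest") if land in s["hand"]]
    for land in ("Swamp", "Forest"):
        if land in s["in_play"] and land not in s["tapped"]:
            acts.append(("tap", land))
    if "Elvish Mystic" in s["hand"] and s["mana"].get("green", 0) >= 1:
        acts.append(("summon", "Elvish Mystic"))
    return acts

def apply_action(s, action):
    s = {**s, "hand": list(s["hand"]), "in_play": list(s["in_play"]),
         "tapped": list(s["tapped"]), "mana": dict(s["mana"])}
    kind, card = action
    if kind == "play":
        s["hand"].remove(card); s["in_play"].append(card); s["land_played"] = True
    elif kind == "tap":
        s["tapped"].append(card)
        colour = "green" if card == "Forest" else "black"
        s["mana"][colour] = s["mana"].get(colour, 0) + 1
    elif kind == "summon":
        s["hand"].remove(card); s["in_play"].append(card); s["mana"]["green"] -= 1
    return s

def score(plan):
    return sum(1 if kind == "play" else 3 if kind == "summon" else 0 for kind, _ in plan)

def best_plan(s, plan=()):
    # Explore every continuation and return the highest-scoring complete plan.
    candidates = [plan] + [best_plan(apply_action(s, a), plan + (a,)) for a in legal_actions(s)]
    return max(candidates, key=score)

start = {"hand": ["Swamp", "Forest", "Elvish Mystic", "Fugitive Wizard"],
         "in_play": [], "tapped": [], "mana": {}, "land_played": False}
print(best_plan(start))   # play Forest -> tap Forest -> summon Elvish Mystic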

This is how planning works in Magic: The Gathering, but the same logic applies in other situations. For example, moving a pawn to make room for a bishop in chess. Or taking cover behind a wall in order to shoot safely, XCOM-style. In general, you get the idea.

Improved Planning

Sometimes there are too many potential actions to consider every possible option. Returning to the example from Magic: The Gathering: let's say that in the game and in your hand you have several cards of land and creatures - the number of possible combinations of moves can be in the tens. There are several solutions to the problem.

The first way is backwards chaining. Instead of going through all the combinations, it is better to start from the final result and try to find a direct route to it. Instead of going from the root of the tree to a specific leaf, we move in the opposite direction - from the leaf to the root. This method is easier and faster.

If the enemy has 1 health, you can find a plan to "deal 1 or more damage". To achieve this, a number of conditions must be met:

1. Damage can be dealt by a spell - it must be in hand.
2. Casting a spell requires mana.
3. To gain mana, you need to play a land card.
4. To play a land card, you must have it in your hand.

Another way is best-first search. Instead of iterating through all the paths, we choose the most suitable one. Most often, this method gives the optimal plan without extra search costs. A* is a form of best-first search - by exploring the most promising routes from the start, it can find the best path without having to check the other options.

An interesting and increasingly popular variant of best-first search is Monte Carlo Tree Search. Instead of guessing which plans are better than others as each subsequent action is chosen, the algorithm picks random successors at each step until it reaches the end (when the plan resulted in victory or defeat). The final result is then used to raise or lower the "weight" of the previous options. By repeating this process many times, the algorithm gives a good estimate of which next move is best, even if the situation changes (for example, if the opponent takes action to thwart the player).

No discussion of planning in games would be complete without Goal-Oriented Action Planning, or GOAP. This is a widely used and discussed method, but apart from a few distinguishing details, it is essentially the backwards chaining we talked about earlier. If the goal is "destroy the player" and the player is behind cover, the plan might be: destroy the player with a grenade → obtain a grenade → throw it.

There are usually multiple targets, each with a different priority. If the highest priority target cannot be completed (no combination of actions creates a "kill the player" plan because the player is not visible), the AI will fall back to lower priority targets.

Training and adaptation

We have already said that game AI usually doesn't use machine learning, because it isn't well suited to controlling agents in real time. But that doesn't mean we can't borrow something from this area. We might want an enemy in a shooter that can learn - for example, the best positions on the map. Or an opponent in a fighting game that blocks the player's frequently used combos, motivating them to use other ones. So machine learning can be quite useful in such situations.

Statistics and probabilities

Before we get into complex examples, let's see how far we can go with a few simple measurements, using them to make decisions. Take a real-time strategy game, for example - how do we determine whether a player will launch an attack in the first few minutes of the game, and what kind of defense to prepare against it? We can look at a player's past behaviour to figure out what their future reaction might be. To begin with, we don't have that data, but we can collect it - every time the AI plays against a human, it can record the time of the first attack. After several sessions, we will have an average of the time after which the player attacks.

Averages also have a problem: if a player rushed 20 times and played slowly 20 times, the resulting value will be somewhere in the middle and will not give us anything useful. One solution is to limit the input data - you can take into account only the last 20 games.

A similar approach is used when estimating the likelihood of certain actions, assuming that a player's past preferences will carry over into the future. If a player has attacked us five times with fireballs, twice with lightning and once in melee, it's obvious that he prefers fireballs. Extrapolating, we get the probability of each weapon being used: fireball = 62.5%, lightning = 25% and melee = 12.5%. Our game AI should prepare for fire defense.
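
A quick sketch of both measurements (the timing numbers are made up; the attack counts are the ones from the example above):

from collections import deque, Counter

# Rolling average of rush timings: only the last 20 games count.
recent_attack_times = deque(maxlen=20)
for t in [95, 110, 240, 100, 90]:          # seconds until the first attack, per game
    recent_attack_times.append(t)
expected = sum(recent_attack_times) / len(recent_attack_times)

# Tally of attack types seen so far, turned into probabilities.
attacks = Counter({"fireball": 5, "lightning": 2, "melee": 1})
total = sum(attacks.values())
probabilities = {kind: count / total for kind, count in attacks.items()}

print(round(expected, 1))     # average time of the recent rushes
print(probabilities)          # {'fireball': 0.625, 'lightning': 0.25, 'melee': 0.125}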

Another interesting method is to use a Naive Bayes Classifier to examine large amounts of input data and classify the situation so that the AI reacts in the right way. Bayesian classifiers are best known for their use in email spam filters: they examine the words, compare them to where those words have appeared before (in spam or not), and draw conclusions about incoming emails. We can do the same even with less input data. Based on all the useful information the AI sees (such as what enemy units are created, what spells they use, or what technologies they've researched) and the final outcome (war or peace, rush or defend, etc.), we choose the desired AI behavior.

All these training methods are fine, but it is usually best to apply them to data gathered during playtesting. The AI will then learn to adapt to the different strategies your playtesters used. An AI that keeps adapting to the player after release can become too predictable or too hard to beat.

Value-Based Adaptation

Given the content of our game world and the rules, we can change the set of values that influence decision making, rather than just using the input data. We do it like this:

  • Let the AI collect data about the state of the world and key events during the game (as above).
  • Change a few important values based on this data.
  • Implement decisions based on processing or evaluating these values.

For example, an agent has several rooms to choose from on a first-person shooter map. Each room has its own value, which determines how desirable it is to visit. The AI randomly chooses which room to go to based on these values. The agent then remembers which room it was killed in and lowers its value (the probability that it will return there). Similarly for the reverse situation - if the agent destroys many opponents in a room, that room's value increases.
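
A minimal sketch of this kind of value adjustment (the starting values, the ±10 step, and the weighted random pick are arbitrary choices):

import random

# Desirability values per room; dying there lowers the value, getting kills raises it.
room_values = {"red": 50.0, "green": 50.0, "blue": 50.0}

def pick_room():
    # Weighted random choice: higher-valued rooms are chosen more often.
    rooms = list(room_values)
    return random.choices(rooms, weights=[room_values[r] for r in rooms])[0]

def on_agent_killed(room):
    room_values[room] = max(1.0, room_values[room] - 10.0)

def on_agent_got_kill(room):
    room_values[room] = min(100.0, room_values[room] + 10.0)

on_agent_killed("blue")
on_agent_got_kill("green")
print(room_values)     # blue drops to 40, green rises to 60
print(pick_room())     # over time, green gets picked most often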

Markov model

What if we used the collected data to make predictions? If we record every room we see the player in over a certain period of time, we can predict which room the player might go to next. By tracking and recording the player's movements through the rooms (the values), we can predict them.

Let's take three rooms: red, green and blue. As well as observations that we recorded while watching the game session:

[Image: table of observed room visits]

The number of observations for each room is almost equal - we still don't know where to set up a good ambush. Collecting statistics is also complicated by player respawns, which are spread evenly across the map. But data about the next room they enter after appearing on the map is already useful.

We can see that the green room suits the players - most people move into it from the red room, and 50% of them stay there on the next observation. The blue room, on the contrary, is not popular; players hardly ever go there, and when they do, they don't linger.

But the data tells us something more important - when a player is in the blue room, the next room we're most likely to see them in is red, not green. Even though the green room is more popular than the red one, the situation changes if the player is in the blue room. The next state (i.e. the room the player goes to) depends on the previous state (i.e. the room the player is currently in). By studying these dependencies, we can make more accurate predictions than if we simply counted the observations independently of each other.

Predicting a future state from past state data is called a Markov model, and such examples (with rooms) are called Markov chains. Since the chains represent the probability of changes between successive states, they are visually displayed as an FSM with a probability next to each transition. Previously, we used an FSM to represent the behavioral state an agent was in, but the concept extends to any state, whether it relates to the agent or not. In this case, the states represent the room the agent occupies:

[Image: Markov chain of room transitions with probabilities]

This is a simple way to represent the relative likelihood of state changes, giving the AI some ability to predict the next state. And you can look several steps ahead.

If the player is in the green room, there is a 50% chance that they will still be there the next time we observe them. But what is the probability that they will be there even after the observation after that? There is not only the chance that the player stayed in the green room across two observations, but also the chance that they left and came back. Here is a new table with the new data:

[Image: table of room probabilities after two observations]

It shows that the chance of seeing the player in the green room after two observations is 51% - 21% that he goes to the red room and comes back, 5% that he visits the blue room in between, and 25% that he never leaves the green room.

The table is just a visual tool - the procedure only requires multiplying the probabilities at each step. This means you can look far into the future, with one caveat: we assume that the chance of entering a room depends entirely on the current room. This is called the Markov property - the future state depends only on the present one. But it is not 100% accurate. Players can change their minds based on other factors such as health or ammo. Since we do not record these values, our forecasts will be less accurate.
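
A small sketch of that multiplication (the transition probabilities are invented, since the original tables are images; the propagation step is what matters):

# Markov chain sketch: P(next room | current room). The numbers are illustrative.
transitions = {
    "red":   {"red": 0.2, "green": 0.7, "blue": 0.1},
    "green": {"red": 0.3, "green": 0.5, "blue": 0.2},
    "blue":  {"red": 0.6, "green": 0.2, "blue": 0.2},
}

def predict(room, steps):
    # Start certain of the current room, then push the probability mass forward.
    probs = {room: 1.0}
    for _ in range(steps):
        nxt = {}
        for state, p in probs.items():
            for target, t in transitions[state].items():
                nxt[target] = nxt.get(target, 0.0) + p * t
        probs = nxt
    return probs

print(predict("green", 1))   # one observation ahead
print(predict("green", 2))   # two ahead: stay-and-stay, plus leave-and-return paths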

N Grams

What about the fighting game example and predicting the player's combo moves? The same thing! But instead of a single state or event, we will examine the entire sequences that make up a combo.

One way to do this is to store each input (such as Kick, Punch or Block) in a buffer and record the entire buffer as an event. Say the player repeatedly presses Kick, Kick, Punch to use the SuperDeathFist attack; the AI system stores all inputs in a buffer and remembers the last three used at each step.

[Image: table of the last three inputs recorded at each step]
(The lines in bold are when the player launches the SuperDeathFist attack.)

The AI will see all the occasions when the player chose Kick followed by another Kick, and then notice that the next input is always Punch. This allows the agent to predict the SuperDeathFist combo and block it if possible.

These sequences of events are called N-grams, where N is the number of elements stored. In the previous example, it was a 3-gram (trigram), which means that the first two entries are used to predict the third. Accordingly, in the 5-gram, the first four entries predict the fifth, and so on.
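
A tiny sketch of a trigram predictor (the input history here is made up; most_common picks the likeliest follower):

from collections import defaultdict, Counter

# Trigram sketch: use the last two inputs to predict the most likely next one.
history = ["Kick", "Kick", "Punch", "Block", "Kick", "Kick", "Punch", "Kick", "Kick", "Punch"]

counts = defaultdict(Counter)
for a, b, c in zip(history, history[1:], history[2:]):
    counts[(a, b)][c] += 1          # context of two inputs -> what followed it

def predict(last_two):
    followers = counts.get(tuple(last_two))
    return followers.most_common(1)[0][0] if followers else None

print(predict(["Kick", "Kick"]))    # -> Punch: time to block the SuperDeathFist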

The developer needs to choose the size of the N-grams carefully. A smaller N requires less memory but also stores a shorter history. For example, a 2-gram (bigram) will record Kick, Kick or Kick, Punch, but cannot store Kick, Kick, Punch, so the AI will not react to the SuperDeathFist combo.

On the other hand, larger numbers require more memory, and it will be harder for the AI to learn, as there will be many more possible options. If you had three possible inputs of Kick, Punch, or Block and used a 10-gram, there would be about 60,000 different options.

The bigram model is a simple Markov chain - each past state/current state pair is a bigram, and you can predict the second state from the first. 3-grams and larger N-grams can also be thought of as Markov chains, where all the elements except the last one together form the first state and the last element forms the second. The fighting game example shows the chance of transitioning from the Kick-and-Kick state to the Kick-and-Punch state. By treating multiple entries of input history as one unit, we essentially turn the input sequence into part of a single state. This gives us the Markov property, letting us use Markov chains to predict the next input and guess which combo move will come next.

Conclusion

We have covered the most common tools and approaches in game AI development, and looked at the situations where they need to be applied and where they are particularly useful.

This should be enough to understand basic things in game AI. But, of course, these are not all methods. Less popular, but no less effective, include:

  • optimization algorithms including hill climbing, gradient descent and genetic algorithms
  • adversarial search/scheduling algorithms (minimax and alpha-beta pruning)
  • classification methods (perceptrons, neural networks and support vector machines)
  • systems for processing perception and memory of agents
  • architectural approaches to AI (hybrid systems, subsumption architectures and other ways of layering AI systems)
  • animation tools (motion planning and coordination)
  • performance factors (level of detail, anytime algorithms, and timeslicing)

Related Resources:

1. GameDev.net has a section with articles and tutorials on AI, as well as a forum.
2. AiGameDev.com contains many presentations and articles on a wide range of topics related to the development of game AI.
3. The GDC Vault includes topics from the GDC AI Summit, many of which are available for free.
4. Useful materials can also be found on the AI Game Programmers Guild website.
5. Tommy Thompson, an AI researcher and game developer, makes the AI and Games YouTube series explaining and examining AI in commercial games.

Related books:

1. The Game AI Pro book series is a collection of short articles explaining how to implement specific features or how to solve specific problems.

Game AI Pro: Collected Wisdom of Game AI Professionals
Game AI Pro 2: Collected Wisdom of Game AI Professionals
Game AI Pro 3: Collected Wisdom of Game AI Professionals

2. The AI Game Programming Wisdom series is the predecessor of the Game AI Pro series. It contains older methods, but almost all of them are still relevant today.

AI Game Programming Wisdom 1
AI Game Programming Wisdom 2
AI Game Programming Wisdom 3
AI Game Programming Wisdom 4

3. Artificial Intelligence: A Modern Approach is one of the basic texts for anyone who wants to understand the general field of artificial intelligence. This book is not about game development - it teaches the basics of AI.

Source: habr.com
