How Yandex.Taxi searches for cars when there are none

A good taxi service should be safe, reliable and fast. The user will not go into details: what matters to them is pressing the “Order” button and getting a car as soon as possible to take them from point A to point B. If there are no cars nearby, the service must say so immediately so that the client does not have false expectations. But if the “No cars” message is shown too often, a person will quite naturally stop using the service and go to a competitor.

In this article, I want to talk about how we used machine learning to solve the problem of finding cars in low-density areas (in other words, where at first glance there are no cars), and what came of it.

Background

To call a taxi, the user performs a few simple actions, but what happens inside the service?

Each stage of an order pairs a user action with work on the Yandex.Taxi backend:

  • Pin. The user selects a starting point. We launch a simplified search for candidates: a search on the pin. Based on the drivers found, we predict the arrival time (ETA on the pin). The surge multiplier is also calculated at this point.
  • Offer. The user selects a destination, fare and requirements. We build a route and calculate prices for all fares, taking the surge multiplier into account.
  • Order. The user presses the "Call a Taxi" button. We launch a full-fledged search for a car, choose the most suitable driver and offer him the order.

We have already written about Pin ETA, price calculation and choosing the most suitable driver. This story is about finding drivers. When an order is created, the search happens twice: on the pin and on the order. The search on the order has two stages: recruiting candidates and ranking them. First, the free drivers closest to point A along the road graph are selected as candidates. Then bonuses and filters are applied. The remaining candidates are ranked, and the winner receives the order offer. If he accepts, he is assigned to the order and drives to the pickup point. If he declines, the offer goes to the next candidate. If there are no more candidates, the search is restarted. This continues for no more than three minutes, after which the order is canceled: it "burns out".
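The two-stage search on the order can be sketched as follows. This is a simplified sketch, not Yandex.Taxi's implementation. The road-graph lookup, filters and ranking are passed in as hypothetical callables, and bonuses are assumed to be folded into the ranking score. `RESTART_DELAY_S` is an invented parameter. The clock is injected so that the three-minute burn-out can be simulated.

```python
from dataclasses import dataclass
from typing import Callable, Optional

SEARCH_TIMEOUT_S = 180.0  # the order burns out after three minutes (from the text)
RESTART_DELAY_S = 10.0    # hypothetical pause before the search is restarted


@dataclass
class Driver:
    driver_id: str
    road_distance_m: float  # distance to point A along the road graph
    free: bool = True


def search_on_order(
    find_nearby: Callable[[], list[Driver]],   # hypothetical road-graph lookup
    passes_filters: Callable[[Driver], bool],  # hypothetical filters
    score: Callable[[Driver], float],          # ranking score, bonuses assumed included
    accepts_offer: Callable[[Driver], bool],   # the driver's answer to the offer
    now: Callable[[], float],
    sleep: Callable[[float], None],
) -> Optional[Driver]:
    """Return the assigned driver, or None if the order burns out."""
    start = now()
    while now() - start < SEARCH_TIMEOUT_S:
        # Stage 1: recruit free candidates and apply filters.
        candidates = [d for d in find_nearby() if d.free and passes_filters(d)]
        # Stage 2: rank the candidates, best first.
        candidates.sort(key=score, reverse=True)
        # Offer the order to candidates one by one until someone accepts.
        for driver in candidates:
            if now() - start >= SEARCH_TIMEOUT_S:
                return None
            if accepts_offer(driver):
                return driver  # assigned; drives to the pickup point
        # No candidates left: wait a little and restart the search.
        sleep(RESTART_DELAY_S)
    return None  # the order burns out
```

With a fake clock, a search that never finds anyone keeps restarting until the 180-second limit and then returns `None`.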

Searching on a pin is similar to searching on an order, except that no order is created and the search is performed only once. It also uses simplified settings for the number of candidates and the search radius. These simplifications are necessary because there are an order of magnitude more pins than orders, and search is a fairly expensive operation. The key point for our story: if the preliminary search on the pin found no suitable candidates, we did not let the user place an order. At least, that is how it used to be.

Here is what the user saw in the application:

[Screenshot of the app]

Search for cars without cars

One day we came up with a hypothesis: perhaps in some cases an order can still be fulfilled even if there were no cars on the pin. After all, some time passes between the pin and the order, and the search on the order is more thorough and is sometimes repeated several times; free drivers may appear during this time. We also knew that the opposite happens: drivers found on the pin are not guaranteed to be found on the order. Sometimes they disappear, or all of them decline the order.

To test this hypothesis, we launched an experiment: for a test group of users, we stopped checking for cars during the search on the pin, i.e. they could place an “order without cars”. The result was quite unexpected: when there was no car on the pin, one was found later, during the search on the order, in 29% of cases! Moreover, orders without cars differed little from regular orders in cancellation rate, ratings and other quality metrics. Orders without cars made up 5% of all orders but just over 1% of all successful trips.

To understand where the drivers for these orders come from, let's look at their statuses during the search on the pin:

[Chart: statuses of these drivers during the search on the pin]

  • Available: was available but for some reason did not make it into the candidate list, for example, because he was too far away;
  • On order: was busy but managed to finish the trip or became available for a chained order;
  • Busy: had switched off the ability to take orders but then returned to the line;
  • Not available: was offline but then came online.

Let's add reliability

Additional orders are great, but a 29% success rate means that 71% of the time the user waited a long time and ended up going nowhere. This is not terrible in terms of system efficiency, but in practice the user is given false hope and wastes time, then gets frustrated and may stop using the service. To solve this problem, we learned to predict the probability that a car will be found on the order.

The scheme is as follows:

  • The user puts a pin.
  • The search on the pin runs.
  • If no cars are found, we predict how likely they are to appear.
  • Depending on that probability, we either do not let the user order, or let them order but warn that car density in this area is currently low.
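The decision at the last step can be sketched as a small function with three outcomes. This is a minimal sketch under assumptions. The outcome names and the `threshold` parameter are hypothetical, the model is a stand-in callable that returns the probability of finding a car on the order, and the code reflects one reading of the scheme above: the warning accompanies an allowed order when the pin search was empty.

```python
from enum import Enum
from typing import Callable


class PinOutcome(Enum):
    ALLOW = "allow"                            # cars were found on the pin
    ALLOW_WITH_WARNING = "allow_with_warning"  # no cars, but the model is optimistic
    NO_CARS = "no_cars"                        # no cars and the model is pessimistic


def decide_on_pin(
    cars_found_on_pin: bool,
    predict_probability: Callable[[], float],  # hypothetical model call
    threshold: float,                          # hypothetical operating point
) -> PinOutcome:
    if cars_found_on_pin:
        return PinOutcome.ALLOW
    # The model is consulted only when the pin search came back empty.
    if predict_probability() >= threshold:
        return PinOutcome.ALLOW_WITH_WARNING
    return PinOutcome.NO_CARS
```

The prediction is passed as a callable so that the model is invoked only for empty pins, which are a small fraction of all pins.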

In the application it looked like this:

[Screenshot of the app]

The model lets us create new orders more selectively, without giving people false hope. In other words, we can tune the balance between reliability and the number of orders without cars by choosing a point on the model's precision-recall curve. The reliability of the service affects people's willingness to keep using the product, so in the end it all comes down to the number of trips.

A little about precision-recall

One of the basic tasks in machine learning is classification: assigning an object to one of two classes. The output of a machine learning algorithm is often a numerical score of membership in one of the classes, for example, a probability estimate. However, the actions we take are usually binary: if there is a car, we let the user order; if not, we don't. For definiteness, let's call an algorithm that produces a numerical score a model, and a rule that assigns an object to one of two classes (1 or -1) a classifier. To build a classifier from a model's score, we need to choose a threshold for that score. How exactly depends on the task.

Suppose we are building a test (a classifier) for some rare and dangerous disease. Based on the test result, we either send the patient for a more detailed examination or say: “Healthy, go home.” For us, sending a sick person home is much worse than examining a healthy one in vain. That is, we want the test to fire for as many of the truly sick people as possible. This value is called recall: $\text{recall} = \frac{TP}{TP + FN}$, where $TP$ (true positives) is the number of sick people the test flags and $FN$ (false negatives) is the number of sick people it sends home. An ideal classifier has a recall of 100%. A degenerate alternative is to send everyone for examination: recall is then also 100%.

It can also be the other way around. For example, we are building a testing system for students, and it has a cheating detector. If the check misses some cases of cheating, that is unpleasant but not critical. On the other hand, unfairly accusing students of something they did not do is extremely bad. That is, we want as many of the classifier's positive answers as possible to be correct, perhaps at the cost of having fewer of them. So we need to maximize precision: $\text{precision} = \frac{TP}{TP + FP}$, where $FP$ (false positives) is the number of honest students the detector flags. If the classifier fires on every object, precision equals the share of the positive class in the sample.
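Both formulas can be computed directly from true labels and classifier answers. This is a minimal sketch using the 1 / -1 class labels from the text. One convention is an assumption: when the classifier never fires, precision is reported as 1.0 (no false alarms), matching the intuition that an order that is never created cannot fail.

```python
def precision_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Precision and recall for labels in {1, -1}, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # correctly flagged
    fp = sum(1 for t, p in pairs if t != 1 and p == 1)   # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p != 1)   # missed positives
    precision = tp / (tp + fp) if tp + fp else 1.0  # assumed convention: nothing flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Flagging every object reproduces the degenerate cases above: recall becomes 1.0 and precision falls to the share of positives in the sample (3 out of 5 for the labels `[1, 1, -1, -1, 1]`).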

If the algorithm outputs a numerical probability, different thresholds give different precision-recall pairs.
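Sweeping the threshold over the model's scores traces these pairs out. This is a minimal sketch, assuming the labels record whether a car was eventually found on the order and that an order is allowed when its score is at or above the threshold. Each distinct score is tried as a threshold.

```python
def pr_curve(scores: list[float], car_found: list[bool]) -> list[tuple[float, float, float]]:
    """Return (threshold, precision, recall) for each distinct score, highest first."""
    total_found = sum(car_found)
    points = []
    for thr in sorted(set(scores), reverse=True):
        # Orders the model would allow at this threshold and their outcomes.
        allowed = [found for s, found in zip(scores, car_found) if s >= thr]
        tp = sum(allowed)                   # allowed orders that did find a car
        precision = tp / len(allowed)       # never empty: thr is one of the scores
        recall = tp / total_found if total_found else 0.0
        points.append((thr, precision, recall))
    return points
```

At the lowest threshold every order is allowed, so recall is 1.0 and precision equals the base rate of orders that found a car.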

In our problem, the situation is as follows. Recall is the share of orders that would find a car which we actually let users place; precision is the reliability of these orders, i.e. the share of allowed orders in which a car is found. This is what the precision-recall curve of our model looks like:
[Figure: precision-recall curve of the model]
There are two extreme cases: allow no one to order, or allow everyone. If we allow no one, recall is 0: we create no orders, but none of them fail either. If we allow everyone, recall is 100% (we get all possible orders) and precision is 29%, i.e. 71% of orders turn out badly.
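Working between these extremes means picking a point on the curve. One common rule is to take the highest recall whose precision still meets a reliability target. This is a sketch of that rule, not necessarily how the threshold was chosen in production; `min_precision` is a hypothetical business parameter. Input points are (threshold, precision, recall) tuples.

```python
from typing import Optional


def pick_threshold(
    points: list[tuple[float, float, float]],  # (threshold, precision, recall)
    min_precision: float,                      # hypothetical reliability target
) -> Optional[float]:
    """Threshold with the highest recall among points meeting min_precision."""
    feasible = [(recall, thr) for thr, precision, recall in points
                if precision >= min_precision]
    if not feasible:
        return None  # the target is unreachable: allow no orders without cars
    # On equal recall, max() prefers the larger (more conservative) threshold.
    return max(feasible)[1]
```

Raising `min_precision` moves the operating point toward "allow no one"; lowering it to the base rate (29% here) moves it toward "allow everyone".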

As features, we used various parameters of the pickup point:

  • Time and place.
  • System state (the number of busy cars of all fares and of pins nearby).
  • Search parameters (radius, number of candidates, restrictions).

More about features

Conceptually, we want to distinguish between two situations:

  • “Backwoods”: there are no cars here at this time.
  • “Unlucky”: there are cars, but the search found no suitable ones.

One example of “Unlucky” is high demand in the city center on a Friday evening. There are many orders and many users, and not enough drivers for everyone. The search on the pin may find no suitable drivers, yet they appear literally within seconds, because at that time there are many drivers in the area and their statuses change constantly.

That is why various indicators of the system state around point A turned out to be good features:

  • The total number of cars.
  • The number of cars on an order.
  • The number of cars unavailable for orders because they are in the "Busy" status.
  • The number of users.

After all, the more cars there are around, the more likely it is that one of them will become available.
In fact, what matters to us is not only that cars are found but that trips are completed successfully. So we could have predicted the probability of a successful trip instead. We decided against it, because this value depends heavily on the specific user and driver.
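Neighborhood counts like these can be computed from a snapshot of car positions and statuses. This is a sketch under stated assumptions. The status strings, the 2 km radius and the use of straight-line (haversine) distance are all hypothetical simplifications; the real search works on the road graph, and this is not the production feature pipeline.

```python
import math
from collections import Counter
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0


@dataclass
class CarState:
    lat: float
    lon: float
    status: str  # hypothetical labels: "free", "on_order", "busy"


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def neighborhood_features(
    pin_lat: float,
    pin_lon: float,
    cars: list[CarState],
    user_pins: list[tuple[float, float]],
    radius_m: float = 2000.0,  # hypothetical neighborhood radius
) -> dict[str, int]:
    """Count cars by status and user pins within radius_m of point A."""
    nearby = [c for c in cars if haversine_m(pin_lat, pin_lon, c.lat, c.lon) <= radius_m]
    by_status = Counter(c.status for c in nearby)
    users = sum(1 for lat, lon in user_pins
                if haversine_m(pin_lat, pin_lon, lat, lon) <= radius_m)
    return {
        "cars_total": len(nearby),
        "cars_on_order": by_status["on_order"],
        "cars_busy": by_status["busy"],
        "users": users,
    }
```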

We used CatBoost as the learning algorithm. The model was trained on data from the experiment. After launch, we had to keep collecting training data by occasionally letting a small number of users place an order against the model's decision.
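Letting some users order against the model's decision keeps the training data from drifting: without it, pins the model rejects would never produce an observed outcome. This is a minimal sketch of such an override, assuming a fixed `explore_rate`; the names and the uniform-random override are illustrative, not the mechanism Yandex.Taxi actually used.

```python
import random


def allow_order(
    p_found: float,       # model's probability that a car will be found
    threshold: float,     # hypothetical operating point
    explore_rate: float,  # hypothetical share of rejections to override
    rng: random.Random,
) -> tuple[bool, bool]:
    """Return (allowed, is_exploration)."""
    if p_found >= threshold:
        return True, False
    # Occasionally override a rejection so that low-probability pins
    # still produce labeled outcomes for future training.
    if rng.random() < explore_rate:
        return True, True
    return False, False
```

Logging the `is_exploration` flag with each order keeps overridden orders distinguishable in the training set.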

Results

The results of the experiment were as expected: the model significantly increases the number of successful trips thanks to orders without cars, without sacrificing reliability.

At the moment, the mechanism is running in all cities and countries, and about 1% of successful trips happen thanks to it. In some cities with a low density of cars, the share of such trips reaches 15%.

Source: habr.com