How to solve NP-hard problems with parameterized algorithms

Research work is perhaps the most interesting part of our studies: it is a chance to try yourself out in your chosen field while still at university. For example, students from the Software Engineering and Machine Learning tracks often go off to do research at a company (mainly JetBrains or Yandex, but not only there).

In this post I will talk about my project in the field of Computer Science. As part of the work, I studied and applied in practice approaches to solving one of the most famous NP-hard problems: the vertex cover problem.

These days an interesting approach to NP-hard problems is developing very quickly: parameterized algorithms. I will try to bring you up to speed, describe a few simple parameterized algorithms, and explain one powerful method that helped me a lot. I presented my results at the PACE Challenge competition: on the open tests my solution takes third place, and the final results will be known on July 1.


About Me

My name is Vasily Alferov, and I am currently finishing my third year at the Higher School of Economics in St. Petersburg. I have been interested in algorithms since my school days, when I studied at Moscow School 179 and successfully competed in computer science olympiads.

A finite number of parameterized algorithms walk into a bar...

The example is taken from the book Parameterized Algorithms.

Imagine that you are a bouncer at a bar in a small town. Every Friday, half the town comes to your bar to relax, which gives you a lot of trouble: you have to throw rowdy customers out of the bar to prevent fights. Eventually you get fed up with it and decide to take preventive measures.

Since your town is small, you know exactly which pairs of patrons are likely to get into a fight if they end up in the bar together. You have a list of the n people who will come to the bar tonight. You decide to keep some of the townspeople out of the bar so that nobody gets into a fight. At the same time, your bosses do not want to lose profit and will be unhappy if you turn away more than k people.

Unfortunately, the problem you are facing is a classic NP-hard problem. You may know it as Vertex Cover, the vertex cover problem. For such problems there are, in general, no known algorithms that run in acceptable time. To be precise, the unproven and rather strong ETH (Exponential Time Hypothesis) implies that this problem cannot be solved in time 2^o(n), that is, nothing noticeably better than exhaustive search can be devised. For example, suppose n = 1000 people come to your bar. Then exhaustive search considers 2^1000 options, which is about 10^301: insanely many. Luckily, your management has given you the bound k = 10, so the number of combinations you need to try is much smaller: the number of ten-element subsets is C(1000, 10) ≈ 2.6 · 10^23. This is already better, but it still could not be enumerated in a day even on a powerful cluster.
With this configuration of strained relations between the bar's patrons, Bob, Daniel, and Fedor must be kept out to rule out the possibility of a fight. There is no solution that keeps out only two people.

Does this mean it's time to give up and let everyone in? Let's consider other options. Well, for example, you could turn away only those who are likely to fight with a very large number of people. If someone might fight with at least k + 1 other people, then you definitely should not let him in: otherwise you would have to keep out all k + 1 of the townspeople he might fight with, which would certainly upset management.

Suppose you have thrown out everyone you could according to this principle. Then everyone who is left can fight with at most k people. By throwing out k of them, you can prevent at most k^2 conflicts. So if more than k^2 conflicts remain, you certainly cannot prevent them all. Since you will of course let in the completely non-conflicting people, and each of the at most k^2 remaining conflicts involves two participants, you are left with at most 2k^2 = 200 people, and you only need to try all subsets of size ten out of two hundred. There are about C(200, 10) ≈ 2 · 10^16 of them, and this many operations can already be enumerated on a cluster.

If it is safe to admit people who have no conflicts at all, what about those who are involved in only one conflict? In fact, they too can be let in, as long as you close the door on their opponent. Indeed, if Alice is in conflict only with Bob, then letting Alice in out of the two of them never hurts: Bob may have other conflicts, but Alice certainly does not. Moreover, there is no point in keeping both of them out. After such operations, at most k^2 = 100 guests have an unresolved fate: in total there are at most k^2 conflicts, each with two participants, and each remaining guest participates in at least two. Hence it only remains to try C(100, 10) ≈ 1.7 · 10^13 options, which can well be computed in half a day on a laptop.
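For the curious, the counts above are easy to sanity-check with Python's standard library (a quick check, nothing more):

```python
from math import comb

print(f"{2 ** 1000:.2e}")        # brute force over all subsets: ~1.07e301
print(f"{comb(1000, 10):.2e}")   # ten-element subsets of 1000 people: ~2.63e23
print(f"{comb(200, 10):.2e}")    # after the quadratic kernel: ~2.25e16
print(f"{comb(100, 10):.2e}")    # after also handling degree-1 people: ~1.73e13
```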

In fact, even better conditions can be reached by simple reasoning. Note that we must certainly resolve every dispute, that is, from each conflicting pair we must choose at least one person to keep out. Consider the following algorithm: take any conflict, remove one of its participants and recurse on the rest, then remove the other participant and recurse as well. Since we throw someone out at every step, the recursion tree of this algorithm is a binary tree of depth k, so in total the algorithm runs in O(2^k (n + m)), where n is the number of vertices and m is the number of edges. In our example this is about ten million operations, which can be computed in a fraction of a second not just on a laptop but even on a mobile phone.
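Here is a minimal sketch of this branching in Python; the graph is represented as a dict mapping each vertex to a set of neighbors, a hypothetical representation chosen for brevity:

```python
def has_cover(adj, k):
    """Return True if the conflict graph has a vertex cover of size <= k."""
    # Find any remaining conflict (edge).
    edge = next(((u, v) for u in adj for v in adj[u]), None)
    if edge is None:
        return True          # no conflicts left: done
    if k == 0:
        return False         # a conflict remains but the budget is spent
    u, v = edge

    def without(vertex):
        # Copy of the graph with `vertex` removed (kept simple, not optimized).
        return {w: adj[w] - {vertex} for w in adj if w != vertex}

    # Branch: either u or v must be kept out of the bar.
    return has_cover(without(u), k - 1) or has_cover(without(v), k - 1)
```

The recursion depth never exceeds k, which is exactly why the running time is 2^k times a polynomial rather than 2^n.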

The above is an example of a parameterized algorithm. Parameterized algorithms are algorithms that run in time f(k) · poly(n), where poly is a polynomial, f is an arbitrary computable function, and k is some parameter which, quite possibly, is much smaller than the size of the problem.

The reasoning leading up to this algorithm is an example of kernelization, one of the common techniques for designing parameterized algorithms. Kernelization is the reduction of the size of an instance to a value bounded by a function of the parameter. The resulting instance is often called the kernel. Thus, by simple reasoning about vertex degrees, we obtained a quadratic kernel for the Vertex Cover problem, parameterized by the size of the answer. There are other parameterizations one can choose for this problem (for example, Vertex Cover Above LP), but this is the one we'll discuss here.

PACE Challenge

The PACE Challenge (The Parameterized Algorithms and Computational Experiments Challenge) was born in 2015 to establish a connection between parameterized algorithms and the approaches used in practice to solve computational problems. The first three competitions were devoted to computing the treewidth of a graph (Treewidth), finding a Steiner tree (Steiner Tree), and finding a set of vertices that cuts all cycles (Feedback Vertex Set). This year, one of the problems you could try your hand at was the vertex cover problem described above.

The competition is gaining popularity every year. According to preliminary data, this year 24 teams took part in the vertex cover track alone. It is worth noting that the competition lasts not a few hours or even a week, but several months. Teams have the opportunity to study the literature, come up with an original idea of their own, and try to implement it. In essence, this competition is a research project. The ideas behind the most effective solutions will be presented, and the winners awarded, alongside the IPEC conference (International Symposium on Parameterized and Exact Computation), part of ALGO, the largest annual algorithmic meeting in Europe. More information about the competition can be found online, along with the results of previous years.

Solution scheme

To deal with the vertex cover problem, I tried parameterized algorithms. They typically consist of two parts: simplification rules (which ideally lead to kernelization) and splitting rules. Simplification rules preprocess the input in polynomial time; the goal of applying them is to reduce the problem to an equivalent problem of smaller size. The splitting rules are the most expensive part of the algorithm: it is their application that leads to a total running time of f(k) · poly(n) instead of plain polynomial time. In our case, the splitting rules are based on the fact that for each vertex, either it or all of its neighbors must be taken into the answer.

The general scheme is as follows: apply the simplification rules, then select some vertex and make two recursive calls: in the first we take the vertex into the answer, and in the second we take all of its neighbors. This is what we call splitting (branching) on this vertex.
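In pseudocode-like Python, the scheme looks roughly like this. This is a sketch, not the competition code: `simplify` and `choose_vertex` stand in for the rules discussed below, and the graph operations (`without`, `neighbors`, `has_no_edges`) are abstract stand-ins:

```python
def solve(graph, k):
    """Decision version: is there a vertex cover of size <= k?"""
    graph, k = simplify(graph, k)       # simplification rules, polynomial time
    if k < 0:
        return False                    # the rules already exceeded the budget
    if graph.has_no_edges():
        return True
    v = choose_vertex(graph)            # e.g. a vertex of maximum degree (next section)
    # Branch 1: take v into the answer.
    if solve(graph.without([v]), k - 1):
        return True
    # Branch 2: don't take v; then all of its neighbors must be in the answer.
    nbrs = graph.neighbors(v)
    return solve(graph.without(nbrs + [v]), k - len(nbrs))
```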

Exactly one more addition will be made to this scheme, in the next section.

Ideas for splitting (branching) rules

Let's discuss how to choose a vertex for splitting.
The main idea is greedy in the algorithmic sense: take a vertex of maximum degree and split on exactly that one. Why does this seem better? Because in the second branch of the recursive call we remove many vertices at once, so we can hope that a small graph remains and we can deal with it quickly.

This approach, together with the simple kernelization techniques already discussed, performs decently, solving some tests of several thousand vertices. But, for example, it does not work well for cubic graphs (that is, graphs in which every vertex has degree three).
There is another idea based on a fairly simple observation: if the graph is disconnected, the problem can be solved on its connected components independently, combining the answers at the end. This, by the way, is the small modification to the scheme promised earlier, and it significantly speeds up the solution: previously, in this case, we would spend the product of the components' running times; now we spend their sum. And to speed up branching, we need to turn a connected graph into a disconnected one.

How do we do that? If the graph has an articulation point, split on it. An articulation point is a vertex whose removal disconnects the graph. All articulation points of a graph can be found by a classical algorithm in linear time. This approach noticeably speeds up branching.
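Both ideas are easy to prototype with off-the-shelf tools, for example with networkx (the competition solver uses its own graph representation, but the calls below are real networkx APIs; `solve_component` is a hypothetical per-component solver):

```python
import networkx as nx

def solve_by_components(G):
    """Solve each connected component independently and combine the answers."""
    cover = set()
    for comp in nx.connected_components(G):
        cover |= solve_component(G.subgraph(comp))
    return cover

def articulation_candidates(G):
    """Vertices whose removal disconnects the graph; splitting on one of them
    makes both recursive calls fall apart into smaller components."""
    return list(nx.articulation_points(G))
```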
When any of the highlighted vertices is removed, the graph falls apart into connected components.

We will do this, but we want more: for example, to look for small vertex cuts in the graph and split on the vertices of such a cut. The most efficient way I know of to find a minimum global vertex cut uses the Gomory-Hu tree, which is built in cubic time. At the PACE Challenge, a typical graph size is several thousand vertices, so billions of operations would have to be performed at every node of the recursion tree. It turns out it is simply impossible to solve the problem that way in the allotted time.

Let's try to optimize the solution. A minimum vertex cut between a given pair of vertices can be found by any algorithm that constructs a maximum flow. On such a network we can run Dinic's algorithm, which is very fast in practice. I suspect it is even theoretically possible to prove a bound of O(m√n) on its running time here, which is quite acceptable.
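For prototyping, a minimum vertex cut between a concrete pair of vertices can be obtained directly from networkx, which reduces it to a maximum-flow computation under the hood. This is a sketch of the idea tried below (the competition code would implement the vertex-splitting network and Dinic's algorithm by hand):

```python
import networkx as nx

def balanced_cut_candidate(G, pairs):
    """Among the given (s, t) pairs, return the vertex cut whose removal
    splits the graph most evenly (a rough notion of 'balanced')."""
    best_cut, best_score = None, -1
    for s, t in pairs:
        if G.has_edge(s, t):
            continue                        # no vertex cut separates adjacent vertices
        cut = nx.minimum_node_cut(G, s, t)  # max-flow based minimum s-t vertex cut
        rest = G.copy()
        rest.remove_nodes_from(cut)
        smallest = min((len(c) for c in nx.connected_components(rest)), default=0)
        if smallest > best_score:           # prefer cuts that split off a large piece
            best_cut, best_score = cut, smallest
    return best_cut
```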

Several times I tried looking for cuts between random pairs of vertices and taking the most balanced of them. Unfortunately, this gave poor results on the open PACE Challenge tests. I compared it with the algorithm that splits on maximum-degree vertices, running both with a limit on recursion depth: the algorithm looking for cuts this way left larger graphs behind. The reason is that the cuts turned out to be very unbalanced: after removing 5-10 vertices, only 15-20 would split off.

It is worth noting that papers on the theoretically fastest algorithms use far more advanced techniques for choosing the splitting vertices. Such techniques are very complex to implement and often have poor time and memory performance in practice. I was not able to identify any of them that seemed practical.

How to apply simplification rules

We already have ideas for kernelization. Let me remind you:

  1. If there is an isolated vertex, remove it.
  2. If there is a vertex of degree 1, remove it and take its neighbor into the answer.
  3. If there is a vertex of degree at least k + 1, take it into the answer.

With the first two, everything is clear; with the third there is one catch. In the joke problem about the bar we were given an upper bound k, but at the PACE Challenge we just need to find a vertex cover of minimum size. This is a typical transformation of search problems into decision problems, and in theory it often makes no difference between the two kinds. In practice, if we are writing a vertex cover solver, there can be a difference, as in the third rule above.
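A sketch of the three rules, in the same adjacency-dict representation as before; here `k` may be None when no bound on the answer is known yet, which is an assumption of this sketch rather than the competition code:

```python
def apply_rules(adj, k, answer):
    """Apply the three simplification rules until none of them fires."""
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if v not in adj:                          # already removed on this pass
                continue
            deg = len(adj[v])
            if deg == 0:                              # rule 1: isolated vertex
                del adj[v]
                changed = True
            elif deg == 1:                            # rule 2: take the only neighbor
                u = next(iter(adj[v]))
                take(adj, u, answer)
                k = k - 1 if k is not None else None
                changed = True
            elif k is not None and deg >= k + 1:      # rule 3: high-degree vertex
                take(adj, v, answer)
                k -= 1
                changed = True
    return adj, k

def take(adj, v, answer):
    """Put v into the cover and delete it from the graph."""
    answer.add(v)
    for u in adj.pop(v):
        adj[u].discard(v)
```

Note that rule 3 only makes sense once some bound k is available, which is exactly the catch discussed above.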

From an implementation point of view, there are two ways to proceed. The first approach is called iterative deepening. It goes like this: we start from some reasonable lower bound on the answer and run our algorithm using this bound as an upper bound on the answer, never recursing deeper than it. If we find some answer, it is guaranteed to be optimal; otherwise we can increase the bound by one and start again.

The other approach is to store the current best answer and search for a smaller one, updating k whenever a better answer is found so as to cut off more unnecessary branches of the search.

After a few nights of experiments, I settled on a combination of these two methods: first I run my algorithm with some limit on recursion depth (chosen so that this takes negligible time compared to the main run), and I use the best solution found as an upper bound on the answer, that is, as that same k.
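Sketched in the same style as before, with a decision-version `solve(graph, k)` like the one above and `lower_bound` as a hypothetical helper (for example, the size of a maximal matching is a valid lower bound for vertex cover):

```python
def minimum_vertex_cover_size(graph):
    """Iterative deepening: try k = lb, lb + 1, ... until one bound succeeds."""
    k = lower_bound(graph)
    while not solve(graph, k):
        k += 1
    return k                         # the first k that succeeds is optimal

def branch_and_bound(graph, taken, best):
    """Alternative: carry the best answer found so far and prune against it."""
    if graph.has_no_edges():
        return min(best, len(taken))
    if len(taken) + 1 >= best:       # edges remain, so at least one more vertex is needed
        return best                  # this branch cannot beat the current best
    ...                              # branch on a vertex as before, threading `best` through
```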

Vertices of degree 2

We have dealt with vertices of degree 0 and 1. It turns out that vertices of degree 2 can be handled as well, but this requires more complex operations on the graph.

To explain this, we need some notation. Call the degree-2 vertex v, and call its neighbors x and y. Then we have two cases.

  1. When x and y are neighbors. Then we can take x and y into the answer and delete v. Indeed, at least two vertices of this triangle must go into the answer, and we certainly don't lose by taking x and y: they may have other neighbors, while v does not.
  2. When x and y are not neighbors. Then, it is claimed, all three vertices can be glued into one. The idea is that in this case there is an optimal answer in which we take either v, or both x and y. Moreover, in the first case we have to take all the neighbors of x and y into the answer, while in the second case this is not necessary. This corresponds exactly to the cases where the glued vertex is not taken into the answer and where it is. It only remains to note that in both cases the operation reduces the answer by one. A sketch of the gluing follows this list.
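A sketch of the gluing for the second case, again on the adjacency-dict representation; the reversal bookkeeping of the real solver is omitted, and the `("merged", v)` label is just one way to mint a fresh vertex:

```python
def fold_degree_two(adj, v, answer_ops):
    """Glue a degree-2 vertex v and its non-adjacent neighbors x, y into one vertex."""
    x, y = adj[v]
    assert y not in adj[x]                  # case 2: x and y are not neighbors
    merged = ("merged", v)                  # fresh vertex standing for "v vs. {x, y}"
    adj[merged] = (adj[x] | adj[y]) - {v, x, y}
    for u in adj[merged]:
        adj[u] -= {x, y}                    # redirect u's edges to the glued vertex
        adj[u].add(merged)
    for w in (v, x, y):
        del adj[w]
    answer_ops.append((merged, v, x, y))    # needed to decode the answer afterwards
    # Decoding: if `merged` ends up in the cover, replace it with {x, y};
    # otherwise take v. Either way the original answer is exactly one larger.
```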


It is worth noting that this approach is quite difficult to implement carefully in honest linear time. Gluing vertices is a complex operation: the neighbor lists have to be copied. If this is done carelessly, the running time can become asymptotically suboptimal (for example, if many edges are copied after each gluing). I settled on finding whole paths of degree-2 vertices and analyzing a bunch of special cases, like cycles consisting of such vertices, or cycles of such vertices with exactly one exception.

In addition, this operation must be reversible, so that when returning from the recursion we can restore the graph to its original form. To ensure this, I did not clear the edge lists of merged vertices, so I always knew which edges had to go where. This graph representation requires care as well, but it gives honest linear time. And for graphs with several tens of thousands of edges, it fits into the processor cache, which gives a big advantage in speed.

Linear kernel

Finally, the most interesting part of the kernelization.

To begin with, recall that in bipartite graphs a minimum vertex cover can be found in O(m√n). To do this, use the Hopcroft-Karp algorithm to find a maximum matching, and then apply the Kőnig-Egerváry theorem.
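With networkx this is a few lines (a sketch for illustration; `left` is the set of vertices of one part, and the solver itself would implement Hopcroft-Karp directly):

```python
import networkx as nx
from networkx.algorithms import bipartite

def bipartite_min_vertex_cover(G, left):
    # Hopcroft-Karp maximum matching, O(m * sqrt(n)).
    matching = bipartite.hopcroft_karp_matching(G, top_nodes=left)
    # Kőnig's theorem: turn the maximum matching into a minimum vertex cover.
    return bipartite.to_vertex_cover(G, matching, top_nodes=left)
```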

The idea of the linear kernel is as follows: first we double the graph, that is, instead of each vertex v we create two vertices v_L and v_R, and instead of each edge u-v we create two edges u_L-v_R and v_L-u_R. The resulting graph is bipartite. Find a minimum vertex cover in it. Some vertices of the original graph land in it twice, some once, and some not at all. The Nemhauser-Trotter theorem states that we can then remove the vertices that were not hit at all and take into the answer the vertices that were hit twice. Moreover, it says that of the remaining vertices (those hit exactly once), at least half must be taken into the answer.

We have just learned to leave no more than 2k vertices. Indeed, if the answer on the remainder is at least half of all its vertices, then the remainder contains at most 2k vertices.
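A sketch of the kernel construction itself, reusing the bipartite routine above; the labels `(v, 0)` and `(v, 1)` are just one way to encode the two copies of each vertex:

```python
import networkx as nx
from networkx.algorithms import bipartite

def nemhauser_trotter_kernel(G):
    """Classify vertices as 'in answer' / 'removed' / 'kernel' via the doubled graph."""
    H = nx.Graph()
    H.add_nodes_from((v, side) for v in G for side in (0, 1))
    for u, v in G.edges():
        H.add_edge((u, 0), (v, 1))
        H.add_edge((v, 0), (u, 1))
    left = {(v, 0) for v in G}
    matching = bipartite.hopcroft_karp_matching(H, top_nodes=left)
    cover = bipartite.to_vertex_cover(H, matching, top_nodes=left)
    hits = {v: ((v, 0) in cover) + ((v, 1) in cover) for v in G}
    in_answer = {v for v, h in hits.items() if h == 2}   # take these into the answer
    kernel = {v for v, h in hits.items() if h == 1}      # at most 2k of these survive
    # Vertices with h == 0 are simply removed from the problem.
    return in_answer, kernel
```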

Here I managed to take a small step forward. Clearly, the kernel constructed this way depends on which minimum vertex cover of the bipartite graph we took, and I would like to take one that leaves as few vertices as possible. Previously this was known how to do only in O(nm) time; I came up with an implementation of this algorithm in O(m√n), so this kernel can be computed on graphs with hundreds of thousands of vertices at every stage of branching.

Results

Practice shows that my solution works well on tests with several hundred vertices and several thousand edges. On such tests one can expect a solution to be found within half an hour. Generally speaking, the chances of finding an answer in acceptable time increase if the graph has sufficiently many vertices of high degree, say degree 10 and above.

To participate in the competition, solutions had to be submitted to optil.io. Judging by the scoreboard presented there, my solution takes third place out of twenty on the open tests, at a wide margin from second place. To be completely honest, it is not entirely clear how solutions will be evaluated at the competition itself: for example, my solution passes fewer tests than the fourth-place solution, but runs faster on those it does pass.

The results on the closed tests will be known on July 1.

Source: habr.com