GSoC 2019: Checking graphs for bipartiteness and monad transformers

Last summer I participated in Google Summer of Code, Google's program for students. Every year the organizers select several open-source projects, including ones from such well-known organizations as Boost.org and The Linux Foundation, and Google invites students from all over the world to work on them.

As a participant of Google Summer of Code 2019, I did a project on the Alga library with Haskell.org, the organization behind the development of Haskell, one of the best-known functional programming languages. Alga is a library providing a type-safe representation of graphs in Haskell. It is used, for example, in semantic, a library from GitHub that builds semantic trees, call graphs and dependency graphs from code and can compare them. My project was to add a type-safe representation of bipartite graphs to Alga, along with algorithms for that representation.

In this post I will talk about my Haskell implementation of an algorithm that checks whether a graph is bipartite. Even though it is one of the most basic algorithms, getting a nice functional-style implementation took me several iterations and quite a lot of work. In the end I settled on an implementation based on monad transformers.


About Me

My name is Vasily Alferov, and I am a fourth-year student at HSE St. Petersburg. Earlier on this blog I wrote about my project on parameterized algorithms and about a trip to ZuriHac. Right now I am doing an internship at the University of Bergen in Norway, where I work on approaches to the List Coloring problem. My interests include parameterized algorithms and functional programming.

About the implementation of the algorithm

Foreword

Students participating in the program are strongly encouraged to blog, and I was given a place to do so on Summer of Haskell. This article is a translation of a post I wrote there in July, in English, with a short preface.

The pull request with the code in question can be found here.

You can read about the results of my work (in English) here.

The post assumes that the reader is familiar with the basic concepts of functional programming, although I will try to recall each term when it comes up.

Checking graphs for bipartiteness

The algorithm for checking whether a graph is bipartite is usually presented in an algorithms course as one of the simplest graph algorithms. The idea is straightforward: we somehow assign vertices to the left or right part, and when a conflicting edge is found, we conclude that the graph is not bipartite.

In a little more detail: first we put some vertex into the left part. Obviously, all neighbours of this vertex must lie in the right part. Then all the neighbours of those neighbours must lie in the left part, and so on. We keep assigning parts to vertices as long as the connected component of the starting vertex still contains vertices that have not been assigned a part. We then repeat this for every connected component.

If there is an edge between two vertices that ended up in the same part, it is not difficult to find an odd cycle in the graph, and an odd cycle, as is well known (and easy to see), cannot exist in a bipartite graph. Otherwise we have a valid partition into two parts, which means the graph is bipartite.

Typically, this algorithm is implemented with breadth-first search or depth-first search. In imperative languages depth-first search is usually used, as it is slightly simpler and requires no additional data structures. I also chose depth-first search as the more traditional option.

So we arrive at the following scheme. We traverse the vertices of the graph with a depth-first search, assigning parts to them and flipping the part number as we walk along an edge. If we try to assign a part to a vertex that has already been assigned one, we can safely say the graph is not bipartite. When every vertex has been assigned a part and every edge has been examined, we have a valid partition.
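This scheme can be sketched on plain adjacency lists. This is a toy illustration of the classical two-colouring algorithm; the types and names here are mine, not Alga's:

```haskell
import Control.Monad (foldM)
import qualified Data.Map as Map

-- A graph as an adjacency map (hypothetical stand-in for a real graph type).
type AdjList a = Map.Map a [a]

-- Try to 2-colour the graph with a depth-first search.
-- Nothing means a conflicting edge was found, i.e. the graph is not bipartite.
twoColour :: Ord a => AdjList a -> Maybe (Map.Map a Bool)
twoColour g = foldM start Map.empty (Map.keys g)
  where
    -- Start a new search from every vertex not yet coloured.
    start m v
      | v `Map.member` m = Just m
      | otherwise        = visit True m v
    -- Visit a vertex expecting colour c; flip the colour along each edge.
    visit c m v = case Map.lookup v m of
        Just c' -> if c' == c then Just m else Nothing  -- conflict
        Nothing -> foldM (visit (not c)) (Map.insert v c m)
                         (Map.findWithDefault [] v g)
```

A triangle makes `twoColour` fail, while a path of three vertices gets a valid colouring.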

Purity of computation

In Haskell, we assume that all computations are pure. However, if that were literally the case, we could not print anything to the screen. In fact, pure computation is so lazy that there is no pure reason to compute anything at all. Every computation that happens in a program is forced, one way or another, by the "impure" IO monad.

Monads are a way of representing computations with effects in Haskell. An explanation of how they work is beyond the scope of this post. A good and understandable description can be read in English here.

What I want to point out here is that while some monads, such as IO, are implemented with compiler magic, almost all the others are implemented in ordinary library code, and all the computation in them is pure.

There are many effects, and each has its own monad. The theory here is powerful and beautiful: all monads implement the same interface. We will talk about the following three monads:

  • Either e a is a computation that either returns a value of type a or throws an exception of type e. The behaviour of this monad is very similar to exception handling in imperative languages: errors can be caught or passed further up. The main difference is that the monad is implemented entirely in ordinary library code in Haskell itself, whereas imperative languages usually rely on operating system mechanisms.
  • State s a is a computation that returns a value of type a and has access to a mutable state of type s.
  • Maybe a. The Maybe monad expresses a computation that can be interrupted at any moment by returning Nothing. However, we will be talking about the MonadPlus instance for the Maybe type, which expresses the opposite effect: a computation that can be interrupted at any moment in order to return a particular value.
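As a small illustration (my own sketch, not from the original post; all function names here are made up), this is what each of the three effects looks like in code:

```haskell
import Control.Monad (guard, msum)
import Control.Monad.State (State, get, put, runState)

-- Either: a computation that may throw a String "exception".
safeDiv :: Int -> Int -> Either String Int
safeDiv _ 0 = Left "division by zero"
safeDiv x y = Right (x `div` y)

-- State: a counter threaded through an otherwise pure computation.
tick :: State Int Int
tick = do n <- get
          put (n + 1)
          return n

-- MonadPlus for Maybe: msum keeps the first branch that succeeds.
firstEven :: [Int] -> Maybe Int
firstEven xs = msum [ guard (even x) >> return x | x <- xs ]
```

Here `runState tick 5` yields `(5, 6)`, and `firstEven [1,3,4,6]` yields `Just 4`.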

Implementation of the algorithm

We have two data types, Graph a and Bigraph a b. The first represents graphs whose vertices are labelled with values of type a; the second represents bipartite graphs whose left-hand vertices are labelled with values of type a and whose right-hand vertices are labelled with values of type b.

These are not types from the Alga library: Alga has no representation for undirected bipartite graphs. I made the types up like this for clarity.

We also need helper functions with the following signatures:

-- The list of neighbours of a given vertex.
neighbours :: Ord a => a -> Graph a -> [a]

-- Build a bipartite graph from a graph and a function that, for every
-- vertex, gives its part and its label in that part, ignoring
-- conflicting edges.
toBipartiteWith :: (Ord a, Ord b, Ord c) => (a -> Either b c)
                                         -> Graph a
                                         -> Bigraph b c

-- The list of vertices of a graph.
vertexList :: Ord a => Graph a -> [a]

The signature of the function we are going to write looks like this:

type OddCycle a = [a]
detectParts :: Ord a => Graph a -> Either (OddCycle a) (Bigraph a a)

It is easy to see that if we found a conflicting edge during the depth-first search, the odd cycle lies on top of the recursion stack. Thus, to restore it, we need to cut off everything from the recursion stack up to the first occurrence of the last vertex.
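For example (a hypothetical helper of my own, mirroring the trimming described above), if the search returns the stack with the conflicting vertex repeated at the end, dropping the prefix recovers the cycle:

```haskell
-- Keep the suffix of the stack after the first occurrence of the final
-- (conflicting) vertex: that suffix is exactly the odd cycle.
recoverCycle :: Eq a => [a] -> [a]
recoverCycle c = tail (dropWhile (/= last c) c)
```

On the stack `[1,2,3,4,5,3]` this yields `[4,5,3]`: the odd cycle 3-4-5.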

We implement the depth-first search by maintaining an associative array of part numbers for each vertex. The recursion stack will be maintained automatically through the Functor instance of the monad we choose: we only need to prepend each vertex on the path to the result returned from the recursive function.

My first idea was to use the Either monad, which seems to implement exactly the effects that we need. The first implementation I wrote was very close to this option. In fact, I had five different implementations at one point and ended up with another one.

First, we need to maintain an associative array of part IDs: that calls for State. Second, we need to be able to stop when a conflict is detected. That can be either the Monad instance for Either or the MonadPlus instance for Maybe. The main difference is that Either can return a value when the computation has not been stopped, while Maybe in that case only reports the fact. Since we do not need a separate value on success (it is already stored in the State), we choose Maybe. And at the point where we need to combine the effects of two monads, monad transformers come in, which do exactly that: combine effects.

Why did I choose such a complex type? Two reasons. First, the implementation turns out to be very similar to the imperative one. Second, when returning from the recursion we need to manipulate the value produced in the event of a conflict in order to restore the odd cycle, and that is much easier to do in the Maybe monad.
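Before the full implementation, here is a tiny sketch (my own example, not from the post) of how MaybeT layered over State behaves: state updates survive even when the computation aborts via guard, which is exactly what the algorithm relies on:

```haskell
import Control.Monad (guard)
import Control.Monad.State (State, get, modify, runState)
import Control.Monad.Trans.Maybe (MaybeT, runMaybeT)

-- Increment a counter until it reaches the limit, then abort.
-- The abort produces Nothing, but the state accumulated so far survives.
countTo :: Int -> MaybeT (State Int) ()
countTo limit = do
    modify (+ 1)
    n <- get
    guard (n < limit)
    countTo limit
```

Running `runState (runMaybeT (countTo 3)) 0` yields `(Nothing, 3)`: the Maybe layer reports the abort, while the State layer keeps the counter.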

Thus, we get such an implementation.

{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Monad (guard, msum)
import Control.Monad.State (State, get, modify, runState)
import Control.Monad.Trans.Maybe (MaybeT, runMaybeT)
import Data.Maybe (fromJust)
import qualified Data.Map as Map

data Part = LeftPart | RightPart

otherPart :: Part -> Part
otherPart LeftPart  = RightPart
otherPart RightPart = LeftPart

type PartMap a = Map.Map a Part
type OddCycle a = [a]

toEither :: Ord a => PartMap a -> a -> Either a a
toEither m v = case fromJust (v `Map.lookup` m) of
                    LeftPart  -> Left  v
                    RightPart -> Right v

type PartMonad a = MaybeT (State (PartMap a)) [a]

detectParts :: forall a. Ord a => Graph a -> Either (OddCycle a) (Bigraph a a)
detectParts g = case runState (runMaybeT dfs) Map.empty of
                     (Just c, _)  -> Left  $ oddCycle c
                     (Nothing, m) -> Right $ toBipartiteWith (toEither m) g
    where
        inVertex :: Part -> a -> PartMonad a
        inVertex p v = ((:) v) <$> do modify $ Map.insert v p
                                      let q = otherPart p
                                      msum [ onEdge q u | u <- neighbours v g ]

        {-# INLINE onEdge #-}
        onEdge :: Part -> a -> PartMonad a
        onEdge p v = do m <- get
                        case v `Map.lookup` m of
                             Nothing -> inVertex p v
                             Just q  -> do guard (q /= p)
                                           return [v]

        processVertex :: a -> PartMonad a
        processVertex v = do m <- get
                             guard (v `Map.notMember` m)
                             inVertex LeftPart v

        dfs :: PartMonad a
        dfs = msum [ processVertex v | v <- vertexList g ]

        oddCycle :: [a] -> [a]
        oddCycle c = tail (dropWhile (/= last c) c)

The where block is the core of the algorithm. I will try to explain what is going on inside it.

  • inVertex is the part of the depth-first search where we visit a vertex for the first time. Here we assign the vertex a part number and run onEdge on all its neighbours. This is also where we rebuild the call stack: if msum returned a value, we prepend the vertex v to it.
  • onEdge is the part where we visit an edge. It is called twice for each edge. Here we check whether the vertex on the other side has been visited, and visit it if not. If it has, we check whether the edge is conflicting. If it is, we return a value: the very top of the recursion stack, onto which the remaining vertices will be attached on the way back.
  • processVertex checks, for each vertex, whether it has been visited, and runs inVertex on it if not.
  • dfs runs processVertex on every vertex.

That's all.
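To see the whole pipeline run, here is a self-contained toy variant of the same code (my own simplification: the graph is an adjacency Map, and instead of a Bigraph the success case returns just the pair of parts):

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Monad (guard, msum)
import Control.Monad.State (State, get, modify, runState)
import Control.Monad.Trans.Maybe (MaybeT, runMaybeT)
import Data.Either (partitionEithers)
import Data.Maybe (fromJust)
import qualified Data.Map as Map

-- Toy stand-in for the article's Graph type: an adjacency map.
type Graph a = Map.Map a [a]

data Part = LeftPart | RightPart deriving Eq

otherPart :: Part -> Part
otherPart LeftPart  = RightPart
otherPart RightPart = LeftPart

type PartMap a   = Map.Map a Part
type OddCycle a  = [a]
type PartMonad a = MaybeT (State (PartMap a)) [a]

-- Instead of Bigraph a a, the Right case is simply (left part, right part).
detectParts :: forall a. Ord a => Graph a -> Either (OddCycle a) ([a], [a])
detectParts g = case runState (runMaybeT dfs) Map.empty of
    (Just c,  _) -> Left  (oddCycle c)
    (Nothing, m) -> Right (partitionEithers (map (toEither m) (Map.keys g)))
  where
    toEither m v = case fromJust (Map.lookup v m) of
        LeftPart  -> Left  v
        RightPart -> Right v

    inVertex :: Part -> a -> PartMonad a
    inVertex p v = (v :) <$> do
        modify (Map.insert v p)
        msum [ onEdge (otherPart p) u | u <- Map.findWithDefault [] v g ]

    onEdge :: Part -> a -> PartMonad a
    onEdge p v = do
        m <- get
        case Map.lookup v m of
            Nothing -> inVertex p v
            Just q  -> guard (q /= p) >> return [v]  -- succeed only on conflict

    processVertex :: a -> PartMonad a
    processVertex v = do
        m <- get
        guard (Map.notMember v m)
        inVertex LeftPart v

    dfs :: PartMonad a
    dfs = msum [ processVertex v | v <- Map.keys g ]

    oddCycle :: [a] -> [a]
    oddCycle c = tail (dropWhile (/= last c) c)
```

For the triangle on vertices {1,2,3} this returns an odd cycle on the Left, while for the path 1-2-3 it returns the partition Right ([1,3],[2]).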

The story of the INLINE pragma

The INLINE pragma was not in the first version of the algorithm; it appeared later. While looking for a better implementation, I found that on some graphs the version without INLINE was noticeably slower. Given that semantically the functions should behave identically, this surprised me a lot. Stranger still, on another machine with a different version of GHC there was no noticeable difference.

After a week of reading the GHC Core output, I was able to fix the problem with a single line containing an explicit INLINE. At some point between GHC 8.4.4 and GHC 8.6.5, the optimizer stopped performing this inlining on its own.

I did not expect to run into this kind of grime in Haskell programming. Still, even today optimizers sometimes make mistakes, and giving them hints is our job. For example, here we know that the function must be inlined, because it is inlined in the imperative version, and that is a reason to give the compiler a hint.

What happened next?

After that I implemented the Hopcroft-Karp algorithm, again with different monads, and with it the program came to an end.

Thanks to Google Summer of Code I gained practical experience in functional programming, which not only helped me get an internship at Jane Street the following summer (I am not sure how well known that place is even among Habr's well-informed audience, but it is one of the few where you can spend a summer doing functional programming), but also introduced me to the wonderful world of applying this paradigm in practice, which is noticeably different from my experience with traditional languages.

Source: habr.com
