ZuriHac: Practicing Functional Programming

In June of this year, the small Swiss town of Rapperswil hosted an event called ZuriHac for the tenth time. This time it brought together more than five hundred Haskell enthusiasts, from beginners to the founding fathers of the language. Although the organizers call this event a hackathon, it is neither a conference nor a hackathon in the classic sense; its format differs from traditional programming events. We learned about ZuriHac by lucky chance, took part in it, and now consider it our duty to tell you about this unusual find!


About Us

This article was prepared by two third-year students of the Applied Mathematics and Informatics program at the Higher School of Economics - St. Petersburg: Vasily Alferov and Elizaveta Vasilenko. For both of us, the passion for functional programming began with a series of lectures by D. N. Moskvin in our second year at the university. Vasily is currently participating in the Google Summer of Code program, implementing algebraic graphs in Haskell under the guidance of the Alga project team. Elizaveta applied her functional programming skills in her term paper on implementing an anti-unification algorithm, with subsequent application in type theory.

Event format

The target audience is owners of open-source projects, programmers who want to participate in their development, functional programming researchers, and people who are simply passionate about Haskell. This year's venue, HSR Hochschule für Technik Rapperswil, brought together developers from more than fifty open Haskell projects from around the world to talk about their products and get fresh people interested in their development.


Photo from ZuriHac's Twitter

The scheme is very simple: you write a short description of your project in advance and send it to the organizers, who post the information about your project on the event page. In addition, on the first day the authors of the projects have thirty seconds to pitch from the stage what they are doing and what needs to be done. Then interested people find the authors and ask about the tasks in detail.

We don't have our own open projects yet, but we really want to contribute to existing ones, so we registered as regular participants. For three days we worked with two groups of developers. It turned out that studying the code together and communicating live makes the interaction between project authors and contributors very productive: at ZuriHac we managed to get our bearings in areas that were new to us and to help two completely different teams by completing a task in each of the projects.

In addition to valuable practice, ZuriHac also featured several lectures and workshops. Two lectures stuck with us in particular. In the first, Andrey Mokhov from Newcastle University spoke about selective applicative functors, a type class intended to sit between applicative functors and monads. In the other, one of the creators of Haskell, Simon Peyton Jones, explained how type inference works in the GHC compiler.


Lecture by Simon Peyton Jones. Photo from ZuriHac's Twitter

The workshops held during the hackathon were divided into three categories depending on the participants' level, and the tasks offered to those joining project development were also labeled by difficulty. The small but friendly community of functional programmers is happy to welcome newcomers into its ranks. To follow the lectures by Andrey Mokhov and Simon Peyton Jones, however, our university course on functional programming proved very useful.

Registration for the event is free for both ordinary participants and project authors. We applied in the first days of June and were quickly moved from the waiting list to the list of confirmed participants.

Now let us talk about the projects whose development we took part in.

pandoc

Pandoc is a universal converter of text documents - in effect, from any format to any other. For example, from docx to pdf, or from Markdown to MediaWiki. Its author, John MacFarlane, is a professor of philosophy at the University of California, Berkeley. Pandoc is quite well known, and some of our acquaintances were surprised to learn that it is written in Haskell.


The list of document formats supported by Pandoc. The site also has a full graph, but that picture does not fit into the article.

Of course, Pandoc does not implement a direct conversion for every pair of formats. To support such an extensive set of transformations, a standard architectural solution is used: first the entire document is translated into a special internal intermediate representation, and then a document in the target format is generated from that representation. The developers call the internal representation an "AST", which stands for Abstract Syntax Tree. Looking at the intermediate representation is easy: you just need to set the output format to "native":

$ cat example.html
<h1>Hello, World!</h1>

$ pandoc -f html -t native example.html
[Header 1 ("hello-world",[],[]) [Str "Hello,",Space,Str "World!"]]

Readers who have worked even a little with Haskell can already guess from this small example that Pandoc is written in Haskell: the output of this command is a string representation of Pandoc's internal structures, built in much the same way as is usual in Haskell, e.g. in the standard library.

So here you can see that the internal representation is a recursive structure, each internal node of which holds a list. For example, at the top level there is a list of one element: a first-level heading with the attributes ("hello-world",[],[]). Inside this heading is a list containing the string "Hello,", followed by a space and the string "World!".

As you can see, the internal representation is not much different from HTML. It is a tree, where each internal node reports some information about the formatting of its descendants, and the actual contents of the document are in the leaves.

If you go down to the level of a specific implementation, the data type for the entire document is defined like this:

data Pandoc = Pandoc Meta [Block]

Here, Block is exactly the internal nodes mentioned above, and Meta is meta-information about the document, such as the title, creation date, and authors. This differs between formats, and Pandoc tries to preserve such information whenever possible when translating from one format to another.

Almost all constructors of the Block type - for example, Header or Para (paragraph) - take as arguments attributes and a list of lower-level nodes, as a rule of type Inline. For example, Space and Str are constructors of the Inline type, and an HTML tag also turns into its own special Inline. We see no point in giving the full definitions of these types, but note that they can be viewed here.
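As a rough illustration, the shape of these types might be sketched as follows. This is a simplified sketch of our own, not Pandoc's actual definitions, which live in Text.Pandoc.Definition, use Text instead of String, and have many more constructors:

```haskell
-- Simplified sketch of Pandoc's AST types; the real definitions in
-- Text.Pandoc.Definition are richer than this.
type Attr = (String, [String], [(String, String)])

data Inline = Str String | Space
  deriving (Show, Eq)

data Block
  = Header Int Attr [Inline]   -- heading: level, attributes, inline content
  | Para [Inline]              -- paragraph of inline elements
  deriving (Show, Eq)

-- The "Hello, World!" document shown in native form above:
helloDoc :: [Block]
helloDoc =
  [Header 1 ("hello-world", [], []) [Str "Hello,", Space, Str "World!"]]
```

Even in this toy version, the recursive list-of-lists structure from the native output is clearly visible.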

Interestingly, the Pandoc type is a monoid. This means there is an empty document, and documents can be concatenated. This is convenient when writing Readers: you can split the document into parts with arbitrary logic, parse each part separately, and then combine everything into one document. Meta-information is then collected from all parts of the document at once.
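To illustrate what such an instance looks like, here is a toy sketch in the spirit of Pandoc's Monoid instance. The names Meta' and Doc and the list-based metadata are our own simplifications, not Pandoc's real code:

```haskell
-- A toy document type that is a Monoid: metadata and content are
-- merged piecewise, and mempty is the "empty document".
newtype Meta' = Meta' [(String, String)] deriving (Show, Eq)

data Doc = Doc Meta' [String] deriving (Show, Eq)

instance Semigroup Meta' where
  Meta' a <> Meta' b = Meta' (a ++ b)

instance Monoid Meta' where
  mempty = Meta' []               -- no metadata at all

instance Semigroup Doc where
  Doc m1 bs1 <> Doc m2 bs2 = Doc (m1 <> m2) (bs1 ++ bs2)

instance Monoid Doc where
  mempty = Doc mempty []          -- the empty document
```

A Reader can then parse fragments independently and fold them together with `mconcat`, and the metadata from every fragment ends up in the result.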

When converting, say, from LaTeX to HTML, a special module called LaTeXReader first converts the input document to the AST, then another module called HTMLWriter converts the AST to HTML. Thanks to this architecture, there is no need to write a quadratic number of conversions: it is enough to write a Reader and a Writer for each new format, and all possible pairs of conversions are supported automatically.
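The idea can be sketched in a few lines. These are simplified types of our own: real Pandoc readers and writers work on the full AST, run in a monad, and take options:

```haskell
-- A toy version of the reader/writer architecture. The shared
-- intermediate representation here is just a list of words.
type AST' = [String]

type Reader' = String -> AST'
type Writer' = AST' -> String

-- Any pair of formats is supported by composing a Reader with a Writer,
-- so n formats need n readers and n writers, not n^2 converters.
convert :: Reader' -> Writer' -> String -> String
convert reader writer = writer . reader

-- Two toy "formats": space-separated and line-separated text.
spaceReader :: Reader'
spaceReader = words

lineWriter :: Writer'
lineWriter = unlines
```

Adding a new toy format means writing one new Reader' and one new Writer', and `convert` immediately supports every pairing with the existing formats.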

It is clear that such an architecture also has its drawbacks, long predicted by specialists in software architecture. The most significant is the cost of making changes to the syntax tree. If a change is big enough, the code in all Readers and Writers has to change. For example, one of the challenges facing the Pandoc developers is supporting complex table formats. Right now Pandoc only handles the simplest tables, with a header, columns, and a value in each cell; the colspan attribute in HTML, for example, is simply ignored. One reason for this behavior is the absence of a unified table representation scheme across all, or at least many, formats - so it is unclear in what form tables should be stored in the internal representation. But even after choosing a specific representation, absolutely all Readers and Writers that support tables will need to change.

Haskell was chosen not only because of the authors' great love of functional programming. Haskell is known for its extensive text-processing capabilities. One example is parsec, a library that actively uses functional programming concepts - monoids, monads, applicative and alternative functors - to write arbitrary parsers. The full power of Parsec can be seen in an example from the HaskellWiki, which gives a complete parser for a simple imperative programming language. Of course, Parsec is actively used in Pandoc as well.

In short, monads are used for sequential parsing: first one thing is read, then another. For example, in this snippet:

whileParser :: Parser Stmt
whileParser = whiteSpace >> statement

First the whitespace is consumed, and then a statement, which also has the type Parser Stmt.

Alternative functors are used for rollback if parsing fails. For example,

statement :: Parser Stmt
statement = parens statement <|> sequenceOfStmt

This means that we either try to read a statement in parentheses, or sequentially read several statements.

Applicative functors are used primarily as shorthand for monads. For example, suppose the function tok reads some token (it is a real function from LaTeXReader). Consider this combination:

const <$> tok <*> tok

It will read two tokens in a row and return the first of them.
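To make these three idioms concrete without depending on Parsec itself, here is a minimal hand-rolled parser of our own; the names and implementation are a sketch, but it supports the same monadic sequencing, Alternative backtracking, and applicative shortcuts described above:

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isDigit, isSpace)

-- A parser consumes a prefix of the input and either fails or
-- returns a value plus the remaining input.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> fmap (\(a, r) -> (f a, r)) (p s)

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> do
    (f, s')  <- pf s
    (a, s'') <- pa s'
    Just (f a, s'')

instance Monad Parser where
  Parser p >>= f = Parser $ \s -> do
    (a, s') <- p s
    runParser (f a) s'

instance Alternative Parser where
  empty = Parser (const Nothing)
  -- if the first parser fails, fall back to the second on the same input
  Parser p <|> Parser q = Parser $ \s -> p s <|> q s

satisfy :: (Char -> Bool) -> Parser Char
satisfy f = Parser $ \s -> case s of
  (c:cs) | f c -> Just (c, cs)
  _            -> Nothing

spaces :: Parser ()
spaces = () <$ many (satisfy isSpace)

number :: Parser Int
number = read <$> some (satisfy isDigit)

-- monadic sequencing, in the spirit of whiteSpace >> statement:
numberAfterSpaces :: Parser Int
numberAfterSpaces = spaces >> number
```

With this, `numberAfterSpaces` skips leading whitespace and reads a number, `p <|> q` backtracks exactly as in the statement example, and `const <$> p <*> q` reads two things and keeps the first.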

For all of these classes, Haskell has nice symbolic operators, which makes Reader programming look like ASCII art. Just take a look at this wonderful code.

Our tasks were related to the LaTeXReader. Vasily's task was to support the mbox and hbox commands, which are useful when writing packages in LaTeX. Elizaveta's task was to support the epigraph command, which formats epigraphs in LaTeX documents.

Hatrace

UNIX-like operating systems often implement the ptrace system call. It is useful for debugging and simulating program environments, allowing you to track the system calls a program makes. For example, the very useful strace utility uses ptrace internally.

Hatrace is a library that provides an interface to ptrace in Haskell. The thing is, ptrace itself is highly intricate, and using it directly is rather difficult, especially from functional languages.

Hatrace behaves like strace when run and takes similar arguments. It differs from strace in that it is also a library that provides a simpler interface than just ptrace.

With the help of hatrace we have already caught one unpleasant bug in the GHC Haskell compiler: when killed at the wrong moment, it generates incorrect object files and does not recompile them on restart. Scripting against system calls made it possible to reproduce the bug reliably in a single run, whereas random kills reproduced it in about two hours.

We added system call interfaces to the library: Elizaveta added brk, and Vasily added mmap. As a result of our work, the arguments of these system calls can be used more easily and precisely when working with the library.

Source: habr.com
