LLVM from a Go perspective

Compiler development is a very difficult task. Fortunately, with the development of projects like LLVM, the problem has become much simpler, allowing even a lone programmer to create a new language with performance close to C. Working with LLVM is complicated by the fact that the system consists of a huge amount of code and comes with relatively little documentation. To try to fix this shortcoming, the author of the material whose translation we are publishing today demonstrates code examples written in Go and shows how they are translated first into Go SSA and then into LLVM IR using the TinyGo compiler. The Go SSA and LLVM IR code has been slightly edited to remove parts that are not relevant to the explanations given here, in order to make them easier to follow.


First example

The first function I'm going to look at here is a simple mechanism for adding numbers:

func myAdd(a, b int) int {
    return a + b
}

This function is very simple; it would be hard to come up with anything simpler. It translates into the following Go SSA code:

func myAdd(a int, b int) int:
entry:
    t0 = a + b                                                    int
    return t0

In this representation of the function, type hints are placed on the right; in most cases they can be ignored.

This small example already shows one of the key aspects of SSA: when code is converted into SSA form, each expression is broken down into the most elementary parts it consists of. In our case, the statement return a + b is actually two operations: adding two numbers and returning the result.

In addition, here you can see the basic blocks of the program; in this code there is only one block, the entry block. We will talk more about blocks below.

The Go SSA code is easily converted to LLVM IR:

define i64 @myAdd(i64 %a, i64 %b) {
entry:
  %0 = add i64 %a, %b
  ret i64 %0
}

You can see that although the syntax is different, the structure of the function has basically not changed. The LLVM IR code is a little more strongly typed than the Go SSA code and is closer to C: in the function declaration the return type comes first, and each argument's type is indicated before the argument name. In addition, to simplify IR parsing, the names of global entities are prefixed with the @ symbol and local names with the % symbol (a function is also considered a global entity).

One thing to note about this code is that the decision about how to represent the Go int type, which can be a 32-bit or a 64-bit value depending on the compiler and the compilation target, is made when the LLVM IR is generated. This is one of the many reasons why LLVM IR is not, as many people think, platform-independent. Code created for one platform cannot simply be taken and compiled for another platform (unless you approach this task with extreme care).
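
For example, if the same function were compiled for a target where int is 32 bits wide, the generated IR might look roughly like the sketch below (illustrative, not actual TinyGo output):

define i32 @myAdd(i32 %a, i32 %b) {
entry:
  %0 = add i32 %a, %b
  ret i32 %0
}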

Another interesting point worth noting is that the i64 type is not a signed integer: it is neutral with respect to the sign of the number. Depending on the instruction, it can represent both signed and unsigned numbers. For the addition operation this does not matter: addition works exactly the same for signed and unsigned values. Here I would like to note that in the C language, overflow of a signed integer variable is undefined behavior, so the Clang frontend adds the nsw (no signed wrap) flag to the operation, which tells LLVM that it can assume the addition never overflows.
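
For illustration, a signed 32-bit addition coming from a C frontend might look roughly like this (a sketch of the idea with a made-up function name, not verbatim Clang output):

define i32 @cAdd(i32 %a, i32 %b) {
entry:
  %0 = add nsw i32 %a, %b     ; nsw: the optimizer may assume no signed overflow
  ret i32 %0
}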

This can matter for some optimizations. For example, adding two i16 values on a 32-bit platform (with 32-bit registers) requires a sign-extension operation after the addition in order to keep the result within the i16 range. Because of this, it is often more efficient to perform integer operations at the machine's register width.
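
As a rough sketch of the idea (hypothetical function, for illustration only): at the IR level the addition is just an i16 add, but as soon as the result has to live in a wider value, an explicit sign extension appears, and on a 32-bit machine the backend has to emit a corresponding extension instruction after the 32-bit add.

define i32 @add16(i16 %a, i16 %b) {
entry:
  %0 = add i16 %a, %b          ; only the low 16 bits of the result are meaningful
  %1 = sext i16 %0 to i32      ; widening requires an explicit sign extension
  ret i32 %1
}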

What happens next with this IR code is not of particular interest to us now. The code is optimized (although in a simple example like ours there is nothing left to optimize) and then converted to machine code.

Second example

The next example we'll look at is a bit more complicated: a function that sums a slice of integers:

func sum(numbers []int) int {
    n := 0
    for i := 0; i < len(numbers); i++ {
        n += numbers[i]
    }
    return n
}

This code translates to the following Go SSA code:

func sum(numbers []int) int:
entry:
    jump for.loop
for.loop:
    t0 = phi [entry: 0:int, for.body: t6] #n                       int
    t1 = phi [entry: 0:int, for.body: t7] #i                       int
    t2 = len(numbers)                                              int
    t3 = t1 < t2                                                  bool
    if t3 goto for.body else for.done
for.body:
    t4 = &numbers[t1]                                             *int
    t5 = *t4                                                       int
    t6 = t0 + t5                                                   int
    t7 = t1 + 1:int                                                int
    jump for.loop
for.done:
    return t0

Here you can already see more constructs that are typical of code in SSA form. Probably the most obvious feature of this code is the absence of structured control-flow statements. To control the flow of computation there are only conditional and unconditional jumps and, if we count it as a flow-control instruction, return.

In fact, you may notice that the program is not divided into blocks using curly braces (as in languages of the C family). Instead, it is divided by labels, reminiscent of assembly languages, and presented as basic blocks. In SSA, a basic block is a contiguous sequence of code that starts with a label and ends with a block-terminating instruction such as return or jump.

Another interesting detail of this code is the phi instruction. It is quite unusual and may take some time to figure out. Remember that SSA stands for Static Single Assignment. This is an intermediate representation of code used by compilers in which each variable is assigned a value only once. It works fine for expressing simple functions like myAdd shown above, but it is not suitable for more complex functions like the sum function discussed in this section: in particular, the variables i and n change during the execution of the loop.

SSA gets around the restriction of assigning values to variables only once with the help of the so-called phi instruction (its name is taken from the Greek alphabet). It is one of the tricks you have to resort to in order to put code from languages like C into SSA form. The result of this instruction is the current value of the variable (i or n), and its parameters are a list of basic blocks and values. For example, consider this statement:

t0 = phi [entry: 0:int, for.body: t6] #n

Its meaning is as follows: if the previous basic block was the entry block, then t0 is the constant 0, and if the previous basic block was for.body, then the value t6 from that block should be used. All this may look rather mysterious, but it is exactly what makes SSA work. From a human point of view it makes the code harder to understand, but the fact that each value is assigned only once greatly simplifies many optimizations.

Note that if you write your own compiler, you usually don't have to deal with this sort of thing. Even Clang does not generate these phi instructions; it uses the alloca mechanism instead (which resembles working with ordinary local variables). Then an LLVM optimization pass called mem2reg converts the allocas to SSA form. TinyGo, however, takes its input from Go SSA, which, conveniently, has already been converted to SSA form.
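
As a rough sketch of what that looks like, a C-style frontend might emit myAdd roughly as follows before the mem2reg pass runs (hypothetical, simplified output for illustration); after mem2reg it collapses into the SSA form shown earlier:

define i64 @myAdd(i64 %a, i64 %b) {
entry:
  %a.addr = alloca i64              ; stack slots instead of SSA values
  %b.addr = alloca i64
  store i64 %a, i64* %a.addr
  store i64 %b, i64* %b.addr
  %0 = load i64, i64* %a.addr
  %1 = load i64, i64* %b.addr
  %2 = add i64 %0, %1
  ret i64 %2
}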

Another new feature of this fragment of intermediate code is that indexed access to slice elements is represented as two operations: computing the address of the element and dereferencing the resulting pointer. You can also see constants added directly to the IR code (for example, 1:int); in the myAdd example this did not occur. Now that we've dealt with these features, let's look at what this code turns into when converted to LLVM IR:

define i64 @sum(i64* %ptr, i64 %len, i64 %cap) {
entry:
  br label %for.loop

for.loop:                                         ; preds = %for.body, %entry
  %0 = phi i64 [ 0, %entry ], [ %5, %for.body ]
  %1 = phi i64 [ 0, %entry ], [ %6, %for.body ]
  %2 = icmp slt i64 %1, %len
  br i1 %2, label %for.body, label %for.done

for.body:                                         ; preds = %for.loop
  %3 = getelementptr i64, i64* %ptr, i64 %1
  %4 = load i64, i64* %3
  %5 = add i64 %0, %4
  %6 = add i64 %1, 1
  br label %for.loop

for.done:                                         ; preds = %for.loop
  ret i64 %0
}

Here, as before, we can see the same structure expressed with different syntax. For example, in the phi calls, the values and labels are swapped. However, there are a few things here that deserve special attention.

For starters, you can see a completely different function signature. LLVM does not support slices, so, as an optimization, the TinyGo compiler that generated this intermediate code split the description of this data structure into parts. It could have represented the three slice components (ptr, len, and cap) as a struct, but representing them as three separate values enables some optimizations. Other compilers may represent the slice in some other way, depending on the calling conventions of the target platform.
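
For comparison, a compiler that chose the struct representation might produce a signature like the sketch below (hypothetical; the body just extracts the length field so the example stays valid):

define i64 @slicelen({ i64*, i64, i64 } %slice) {
entry:
  %len = extractvalue { i64*, i64, i64 } %slice, 1   ; field 1 is the length
  ret i64 %len
}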

Another interesting feature of this code is the use of the getelementptr instruction (often abbreviated as GEP).

This instruction works with pointers and is used to obtain a pointer to a slice element. For example, let's compare it with the following code written in C:

int* sliceptr(int *ptr, int index) {
    return &ptr[index];
}

Or with the following equivalent:

int* sliceptr(int *ptr, int index) {
    return ptr + index;
}

The most important thing here is that the getelementptr instruction does not perform any dereferencing. It just calculates a new pointer based on an existing one. You can think of it as mul and add instructions at the hardware level. You can read more about the GEP instruction in the LLVM documentation.
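
As a sketch, the C sliceptr function above might compile to IR along these lines (illustrative only, assuming a 32-bit C int):

define i32* @sliceptr(i32* %ptr, i32 %index) {
entry:
  %0 = getelementptr i32, i32* %ptr, i32 %index   ; pointer arithmetic, no memory access
  ret i32* %0
}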

Another interesting feature of this intermediate code is the use of the icmp instruction. This is a general-purpose instruction used for integer comparisons. The result of this instruction is always a value of type i1, a boolean value. In this case, the comparison uses the slt keyword (signed less than), since we are comparing two numbers previously represented by the int type. If we were comparing two unsigned integers, we would still use icmp, but with the ult keyword. Floating-point numbers are compared with a different instruction, fcmp, which works in a similar way.
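
For illustration, here is a minimal sketch of a signed comparison (hypothetical function name); swapping slt for ult would give the unsigned variant, and fcmp with a predicate such as olt would be used for floating-point values:

define i1 @less(i64 %a, i64 %b) {
entry:
  %0 = icmp slt i64 %a, %b   ; signed "a < b"
  ret i1 %0
}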

Results

I believe that in this article I have covered the most important features of LLVM IR. Of course, there is much more to it. In particular, the intermediate representation can contain many annotations that allow optimization passes to take into account facts about the code that the compiler knows but that cannot be expressed in the IR in any other way. Examples are the inbounds flag on GEP instructions, or the nsw and nuw flags that can be added to add instructions. The same goes for the private keyword, which tells the optimizer that the function it marks will not be referenced from outside the current compilation unit. This enables many interesting interprocedural optimizations, such as eliminating unused arguments.
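
To show what such annotations might look like, here is a hypothetical snippet that combines several of them (the function and its behavior are made up purely for illustration):

define private i64 @nextValue(i64* %p) {
entry:
  %0 = getelementptr inbounds i64, i64* %p, i64 1   ; inbounds: pointer stays within the object
  %1 = load i64, i64* %0
  %2 = add nsw i64 %1, 1                            ; nsw: no signed wraparound assumed
  ret i64 %2
}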

Details about LLVM can be found in the documentation, which you will refer to frequently when developing your own LLVM-based compiler. There is also a guide that walks through developing a compiler for a very simple language. Both of these sources of information will come in handy when building your own compiler.

Dear readers, do you use LLVM?


Source: habr.com
