The path to typechecking 4 million lines of Python code. Part 1

Today we bring to your attention the first part of the translation of the material on how Dropbox deals with type control of Python code.

The path to typechecking 4 million lines of Python code. Part 1

Dropbox writes a lot in Python. It's a language that we use extremely widely, both for back-end services and desktop client applications. We also use Go, TypeScript and Rust a lot, but Python is our main language. Considering our scale, and we are talking about millions of lines of Python code, it turned out that the dynamic typing of such code unnecessarily complicated its understanding and began to seriously affect labor productivity. To mitigate this problem, we have begun to gradually transition our code to static type checking using mypy. This is probably the most popular standalone type checking system for Python. Mypy is an open source project, its main developers work in Dropbox.

Dropbox was one of the first companies to implement static type checking in Python code at this scale. Mypy is used in thousands of projects these days. This tool countless times, as they say, "tested in battle." We have come a long way to get to where we are now. Along the way, there were many unsuccessful undertakings and failed experiments. This post covers the history of static type checking in Python, from its rocky beginnings as part of my research project, to the present day, when type checking and type hinting have become commonplace for countless developers who write in Python. These mechanisms are now supported by many tools such as IDEs and code analyzers.

β†’ Read the second part

Why is type checking necessary?

If you've ever used dynamically typed Python, you might have some confusion as to why there's been such a fuss around static typing and mypy lately. Or maybe you like Python precisely because of its dynamic typing, and what is happening simply upsets you. The key to the value of static typing is the scale of the solutions: the larger your project, the more you lean towards static typing, and in the end, the more you really need it.

Suppose a certain project has reached the size of tens of thousands of lines, and it turned out that several programmers are working on it. Looking at a similar project, based on our experience, we can say that understanding its code will be the key to keeping developers productive. Without type annotations, it can be difficult to figure out, for example, what arguments to pass to a function, or what types a function can return. Here are typical questions that are often difficult to answer without using type annotations:

  • Can this function return None?
  • What should this argument be? items?
  • What is the attribute type id: int is it, str, or maybe some custom type?
  • Should this argument be a list? Is it possible to pass a tuple to it?

If you look at the following type-annotated code snippet and try to answer similar questions, it turns out that this is the simplest task:

class Resource:
    id: bytes
    ...
    def read_metadata(self, 
                      items: Sequence[str]) -> Dict[str, MetadataItem]:
        ...

  • read_metadata does not return None, since the return type is not Optional[…].
  • Argument items is a sequence of lines. It cannot be iterated randomly.
  • Attribute id is a string of bytes.

In an ideal world, one would expect that all such subtleties would be described in the built-in documentation (docstring). But experience gives a lot of examples of the fact that such documentation is often not observed in the code with which you have to work. Even if such documentation is present in the code, one cannot count on its absolute correctness. This documentation can be vague, inaccurate, and open to misunderstandings. In large teams or large projects, this problem can become extremely acute.

While Python excels in the early or intermediate stages of projects, at some point successful projects and companies that use Python may face the vital question: β€œShould we rewrite everything in a statically typed language?”.

Type checking systems like mypy solve the above problem by providing the developer with a formal language for describing types, and by checking that type declarations match the program implementation (and, optionally, by checking for their existence). In general, we can say that these systems put at our disposal something like carefully checked documentation.

The use of such systems has other advantages, and they are already completely non-trivial:

  • The type checker can detect some minor (and not so minor) bugs. A typical example is when they forget to process a value None or some other special condition.
  • Code refactoring is greatly simplified because the type checking system is often very accurate about what code needs to be changed. At the same time, we do not need to hope for 100% code coverage with tests, which, in any case, is usually not feasible. We don't need to delve into the depths of the stack trace to find out the cause of the problem.
  • Even on large projects, mypy can often do full type checking in a fraction of a second. And the execution of tests usually takes tens of seconds or even minutes. The type checking system gives the programmer instant feedback and allows him to do his job faster. He no longer needs to write fragile and hard to maintain unit tests that replace real entities with mocks and patches just to get code test results faster.

IDEs and editors such as PyCharm or Visual Studio Code use the power of type annotations to provide developers with code completion, error highlighting, and support for commonly used language constructs. And these are just some of the benefits of typing. For some programmers, all this is the main argument in favor of typing. This is something that benefits immediately after implementation. This use case for types does not require a separate type checking system such as mypy, although it should be noted that mypy helps keep type annotations consistent with code.

Background of mypy

The history of mypy began in the UK, in Cambridge, a few years before I joined Dropbox. I have been involved, as part of my doctoral research, in the unification of statically typed and dynamic languages. I was inspired by an article on incremental typing by Jeremy Siek and Walid Taha, and by the Typed Racket project. I tried to find ways to use the same programming language for various projects - from small scripts to code bases consisting of many millions of lines. At the same time, I wanted to ensure that in a project of any scale, one would not have to make too big compromises. An important part of all this was the idea of ​​gradually moving from an untyped prototype project to a comprehensively tested statically typed finished product. These days, these ideas are largely taken for granted, but in 2010 it was a problem that was still being actively explored.

My original type checking work was not aimed at Python. Instead, I used a small "homemade" language Alor. Here is an example that will let you understand what we are talking about (type annotations are optional here):

def Fib(n as Int) as Int
  if n <= 1
    return n
  else
    return Fib(n - 1) + Fib(n - 2)
  end
end

Using a simplified native language is a common approach used in scientific research. This is so, not least because it allows you to quickly conduct experiments, and also due to the fact that what has nothing to do with research can be easily ignored. Real-world programming languages ​​tend to be large-scale phenomena with complex implementations, and this slows down experimentation. However, any results based on a simplified language look a little suspicious, since in obtaining these results the researcher may have sacrificed considerations important for the practical use of languages.

My type checker for Alore looked very promising, but I wanted to test it out by experimenting with real code, which, you might say, wasn't written in Alore. Luckily for me, the Alore language was largely based on the same ideas as Python. It was easy enough to change the typechecker so that it could work with Python's syntax and semantics. This allowed us to try type checking in open source Python code. In addition, I wrote a transpiler to convert code written in Alore to Python code and used it to translate my typechecker code. Now I had a type checking system written in Python that supported a subset of Python, some kind of that language! (Certain architectural decisions that made sense for Alore were poorly suited for Python, and this is still noticeable in parts of the mypy codebase.)

In fact, the language supported by my type system could not quite be called Python at this point: it was a variant of Python due to some limitations of the Python 3 type annotation syntax.

It looked like a mixture of Java and Python:

int fib(int n):
    if n <= 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)

One of my ideas at the time was to use type annotations to improve performance by compiling this kind of Python to C, or perhaps JVM bytecode. I got to the stage of writing a compiler prototype, but I abandoned this idea, since type checking itself looked quite useful.

I ended up presenting my project at PyCon 2013 in Santa Clara. I also talked about this with Guido van Rossum, the benevolent Python dictator for life. He convinced me to drop my own syntax and stick with the standard Python 3 syntax. Python 3 supports function annotations, so my example could be rewritten as shown below, resulting in a normal Python program:

def fib(n: int) -> int:
    if n <= 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)

I had to make some compromises (first of all, I want to note that I invented my own syntax for this very reason). In particular, Python 3.3, the most recent version of the language at the time, did not support variable annotations. I discussed with Guido by e-mail various possibilities for syntactic design of such annotations. We decided to use type comments for variables. This served the intended purpose, but was somewhat cumbersome (Python 3.6 gave us a nicer syntax):

products = []  # type: List[str]  # Eww

Type comments also came in handy to support Python 2, which doesn't have built-in support for type annotations:

f fib(n):
    # type: (int) -> int
    if n <= 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)

It turned out that these (and other) trade-offs didn't really matter - the benefits of static typing meant that users soon forgot about less-than-perfect syntax. Since no special syntactic constructs were used in the type-checked Python code, existing Python tools and code processing processes continued to work normally, making it much easier for developers to learn the new tool.

Guido also convinced me to join Dropbox after I completed my graduate thesis. This is where the most interesting part of the mypy story begins.

To be continued ...

Dear Readers, If you use Python, please tell us about the scale of projects you develop in this language.

The path to typechecking 4 million lines of Python code. Part 1
The path to typechecking 4 million lines of Python code. Part 1

Source: habr.com

Add a comment