The path to typechecking 4 million lines of Python code. Part 2

Today we are publishing the second part of our translation of the story of how Dropbox brought type checking to several million lines of Python code.


→ Read the first part

Official type support (PEP 484)

We did our first serious experiments with mypy at Dropbox during Hack Week 2014. Hack Week is a one-week event hosted by Dropbox during which employees can work on anything they like! Some of Dropbox's most famous engineering projects started at events like this. As a result of this experiment, we concluded that mypy looked promising, although the project was not yet ready for widespread use.

At the time, the idea of standardizing Python's type hinting system was in the air. As I said, since Python 3.0 it had been possible to attach type annotations to functions, but these were just arbitrary expressions, with no defined syntax or semantics. During program execution, these annotations were, for the most part, simply ignored. After Hack Week, we started working on standardizing their semantics. This work led to PEP 484 (Guido van Rossum, Łukasz Langa, and I worked on this document together).
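The idea can be shown with a minimal, self-contained example (the functions here are mine, not from the PEP): annotations follow PEP 484 syntax, mypy checks them statically, and at runtime CPython ignores them.

```python
from typing import Optional

def greeting(name: str, excited: bool = False) -> str:
    # mypy verifies that callers pass a str and use the str result.
    message = "Hello, " + name
    if excited:
        message += "!"
    return message

def find_user(user_id: int) -> Optional[str]:
    # Optional[str] means "str or None" -- a common PEP 484 idiom.
    users = {1: "guido"}
    return users.get(user_id)
```

Running this file through mypy would flag a call like `greeting(42)` as an error, while plain CPython would only fail at runtime.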

Our motives could be viewed from two sides. First, we hoped that the entire Python ecosystem could adopt a common approach to using type hints ("type hints" is the term used in Python for what are elsewhere called type annotations). Given the possible risks, this would be better than many mutually incompatible approaches. Second, we wanted to openly discuss the mechanics of type annotation with many people in the Python community. In part, this desire was dictated by the fact that we did not want to look like "apostates" from the basic ideas of the language in the eyes of the broader community of Python programmers. Python is a dynamically typed language, known for "duck typing", so some initial suspicion toward the idea of static typing in the community was inevitable. But that sentiment eventually faded after it became clear that static typing was not going to be mandatory (and after people realized it was really useful).

The type hint syntax that was eventually adopted was very similar to what mypy supported at the time. PEP 484 shipped with Python 3.5 in 2015. Python was no longer merely a dynamically typed language. I like to think of this event as a milestone in the history of Python.

Start of migration

At the end of 2015, a three-person team was created at Dropbox to work on mypy. It included Guido van Rossum, Greg Price, and David Fisher. From that moment on, things began to move extremely quickly. The first obstacle to mypy's growth was performance. As I hinted above, in the early days of the project I had considered porting the mypy implementation to C, but that idea was shelved for the time being. We were stuck running the system on the CPython interpreter, which is not fast enough for tools like mypy. (The PyPy project, an alternative Python implementation with a JIT compiler, didn't help either.)

Fortunately, some algorithmic improvements came to our aid. The first powerful "accelerator" was incremental checking. The idea behind this improvement was simple: if none of a module's dependencies have changed since the previous run of mypy, then we can reuse the data cached during the previous run for those dependencies. We only needed to perform type checking on the changed files and on the files that depended on them. Mypy even went a little further: if the external interface of a module did not change, mypy concluded that other modules importing it did not need to be rechecked.
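The invalidation logic can be sketched as a toy graph traversal (the module names and the reverse-dependency map here are hypothetical illustrations, not mypy's actual data structures):

```python
def modules_to_recheck(changed, reverse_deps):
    """Return the changed modules plus everything that transitively
    imports them -- the minimal set an incremental run must revisit."""
    stale = set(changed)
    stack = list(changed)
    while stack:
        mod = stack.pop()
        # reverse_deps maps a module to the modules that import it.
        for dependent in reverse_deps.get(mod, ()):
            if dependent not in stale:
                stale.add(dependent)
                stack.append(dependent)
    return stale
```

In real mypy the picture is finer still: if only a module's internals change while its external interface stays the same, its importers do not join the stale set at all.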

Incremental checking helped us a lot when annotating large amounts of existing code. The point is that this process usually involves many iterative runs of mypy as annotations are gradually added to the code and gradually improved. The first run of mypy was still very slow, as it had to check a lot of dependencies. To improve the situation, we implemented a remote caching mechanism. If mypy detects that the local cache is probably out of date, it downloads a current cache snapshot for the entire codebase from a centralized repository. It then performs an incremental check using that snapshot. This was another big step forward in mypy's performance journey.

This was a period of rapid, organic adoption of type checking at Dropbox. By the end of 2016, we already had about 420,000 lines of Python code with type annotations. Many users became enthusiastic about type checking, and more and more Dropbox development teams were using mypy.

Everything looked good then, but we still had a lot to do. We began running periodic internal user surveys to identify the project's problem areas and understand which issues needed to be resolved first (this practice is still used in the company today). Two tasks, it became clear, were the most important: we needed more type coverage, and mypy needed to run faster. It was very clear that our work on speeding up mypy and rolling it out across the company's projects was far from over. Fully realizing the importance of these two tasks, we set about solving them.

More performance!

Incremental checking made mypy faster, but it still wasn't fast enough. Many incremental checks lasted about a minute. The reason for this was cyclic imports. This probably won't surprise anyone who has worked with large codebases written in Python. We had sets of hundreds of modules, each of which indirectly imported all the others. If any file in an import cycle changed, mypy had to process all the files in that cycle, and often any modules importing modules from that cycle as well. One such cycle was the infamous "dependency tangle" that caused a lot of trouble at Dropbox. At one point this structure contained several hundred modules and was imported, directly or indirectly, by many tests; it was also used in production code.

We considered unraveling the circular dependencies, but we didn't have the resources to do so. There was too much code that we were not familiar with. As a result, we came up with an alternative approach. We decided to make mypy run fast even with "dependency tangles". We achieved this goal with the mypy daemon. The daemon is a server process that implements two interesting features. First, it keeps information about the entire codebase in memory. This means that every run of mypy doesn't have to reload cached data for thousands of imported dependencies. Second, it carefully analyzes dependencies between functions and other entities at a fine-grained level. For example, if the function foo calls a function bar, then foo depends on bar. When a file changes, the daemon first processes only the changed file, in isolation. It then looks for externally visible changes to that file, such as changed function signatures. The daemon uses the fine-grained dependency information only to recheck those functions that actually use the modified function. Usually, with this approach, there are very few functions to check.
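A toy version of such a fine-grained dependency map can be built from the AST. This is only a sketch of the idea (and only catches direct calls by simple name), not the daemon's actual representation:

```python
import ast

def call_dependencies(source: str):
    """Map each top-level function to the names of functions it calls."""
    deps = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            deps[node.name] = {
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            }
    return deps
```

Inverting such a map tells the daemon, when bar's signature changes, to recheck exactly foo and bar's other callers rather than the whole import cycle.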

Implementing all of this was tricky, as the original implementation of mypy was heavily oriented toward processing one file at a time. We had to deal with a lot of edge cases requiring repeated checks when something changed in the code. For example, this happens when a class is assigned a new base class. Once we had done what we wanted, we were able to reduce the execution time of most incremental checks to a few seconds. That felt like a big win to us.

Even more performance!

Together with the remote caching I described above, the mypy daemon almost completely solved the problems that arise when a programmer frequently runs type checking after changing a small number of files. However, the system's performance in the worst case was still far from optimal. A clean start of mypy could take over 15 minutes, which was far more than we would have liked. Every week the situation got worse as programmers kept writing new code and adding annotations to existing code. Our users were still hungry for more performance, and we were happy to oblige.

We decided to go back to one of the early mypy ideas: converting Python code to C code. Experiments with Cython (a system that translates Python code into C code) did not give us any visible speedup, so we decided to revive the idea of writing our own compiler. Since the mypy codebase (written in Python) already contained all the necessary type annotations, it seemed worthwhile to try to use those annotations to speed up the system. I quickly created a prototype to test this idea. It showed more than a 10-fold performance increase on various micro-benchmarks. Our idea was to compile Python modules into CPython C extension modules, and to turn type annotations into runtime type checks (usually type annotations are ignored at runtime and used only by type checkers). We were, in effect, planning to translate the mypy implementation from Python into a statically typed language that would look (and, for the most part, behave) exactly like Python. (This kind of cross-language migration has become something of a tradition in the mypy project: the original implementation of mypy was written in Alore, and then there was a syntactic hybrid of Java and Python.)
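The "annotations become runtime checks" idea can be imitated in pure Python with a toy decorator. Mypyc generates the equivalent checks natively in C; everything below is a hypothetical sketch for illustration, not mypyc's API:

```python
import inspect

def typechecked(func):
    """Enforce simple class-type argument annotations at call time."""
    sig = inspect.signature(func)
    hints = dict(func.__annotations__)

    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked in this toy version.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        return func(*args, **kwargs)

    return wrapper

@typechecked
def double(n: int) -> int:
    return n * 2
```

Once arguments are checked at the boundary, compiled code can trust them and use fast, unboxed C-level representations internally.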

Targeting the CPython extension API was key to keeping the project manageable. We didn't have to implement a virtual machine or any of the libraries mypy needed. In addition, the entire Python ecosystem, with all its tools (such as pytest), would still be available to us. This meant that we could continue to use interpreted Python code during development, allowing us to keep a very fast edit-and-test workflow instead of waiting for the code to compile. It felt like we were getting the best of both worlds, and we liked it.

The compiler, which we named mypyc (because it uses mypy as a front end for type analysis), turned out to be a very successful project. Overall, we achieved about a 4x speedup for frequent mypy runs without caching. Developing the core of the mypyc project took a small team of Michael Sullivan, Ivan Levkivskyi, Hugh Khan, and myself about 4 calendar months. This was much less work than rewriting mypy in, say, C++ or Go would have been, and it required far fewer changes to the project than a rewrite in another language. We also hoped to bring mypyc to a level where other Dropbox programmers could use it to compile and speed up their own code.

To achieve this level of performance, we had to apply some interesting engineering solutions. The compiler can speed up many operations by using fast, low-level C constructs. For example, a call to a compiled function is translated into a call to a C function, and such a call is much faster than calling an interpreted function. Some operations, such as dictionary lookups, still had to go through normal CPython C API calls, which were only marginally faster after compilation. There we removed the overhead of interpretation, but that alone gave only a small performance gain.

To identify the most common "slow" operations, we profiled the code. Armed with this data, we tried either to tweak mypyc so that it would generate faster C code for such operations, or to rewrite the corresponding Python code using faster operations (and sometimes we simply had no easy solution for a given problem). Rewriting the Python code often proved an easier way to solve a problem than making the compiler perform the same transformation automatically. In the long run, we wanted to automate many of these transformations, but at the time we were focused on speeding up mypy with as little effort as possible. And, moving toward that goal, we cut a few corners.
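The workflow looked roughly like this: run a hot path under the standard-library profiler and read off the most expensive calls (the workload below is a stand-in I made up, not actual mypy code):

```python
import cProfile
import io
import pstats

def hot_path():
    # Hypothetical workload standing in for a slow mypy operation.
    return sum(i * i for i in range(1000))

def profile_report(func, top=5):
    """Run func under cProfile; return its result and a stats report
    sorted by cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()
```

The report's top entries point at the functions worth either hand-rewriting in faster Python or teaching mypyc to compile better.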

To be continued ...

Dear readers! What were your impressions of the mypy project when you first learned about it?


Source: habr.com
