The path to typechecking 4 million lines of Python code. Part 3

This is the third part of our translation of Dropbox's account of how it introduced type checking for its Python code.

→ Previous parts: first and second

Achieving 4 million lines of typed code

Another major challenge (and the second most common concern raised in internal surveys) was increasing the amount of Dropbox code covered by type checks. We tried several approaches, from letting the typed codebase grow organically, to focusing the efforts of mypy team members, to static and dynamic automated type inference. In the end there was no single winning strategy, but by combining many approaches we were able to grow the amount of annotated code quickly.

As a result, the number of annotated lines in our largest Python repository (the backend code) has reached almost 4 million. The work on static typing took about three years. Mypy now supports several kinds of coverage reports that make it easier to track typing progress. In particular, it can report sources of type imprecision, such as explicit uses of the unchecked type Any in annotations, or imports of third-party libraries that have no type annotations. As part of our effort to improve type checking precision at Dropbox, we have contributed improved type definitions (so-called stub files) for some popular open-source libraries to typeshed, the centralized repository of Python stubs.

We have implemented (and standardized in subsequent PEPs) new type system features that allow more precise types for some Python-specific patterns. A notable example is TypedDict, which provides types for JSON-like dictionaries that have a fixed set of string keys, each with a value of its own type. We will continue to extend the type system. Our next step may be to improve support for Python's numeric stack.
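As an illustration of what TypedDict expresses, here is a minimal sketch. The `Movie` record and the `describe` function are hypothetical names chosen for the example; only `typing.TypedDict` itself (Python 3.8+) comes from the standard library.

```python
from typing import TypedDict  # available in the standard library since Python 3.8


class Movie(TypedDict):
    # Hypothetical JSON-like record: a fixed set of string keys,
    # each with its own value type.
    title: str
    year: int


def describe(movie: Movie) -> str:
    # A type checker knows that movie["title"] is a str and movie["year"] an int.
    return f"{movie['title']} ({movie['year']})"


movie: Movie = {"title": "Blade Runner", "year": 1982}
summary = describe(movie)

# A type checker such as mypy is expected to reject lines like the following,
# even though plain Python would happily build the dictionary at runtime:
#   bad: Movie = {"title": "Alien", "year": "1979"}  # wrong value type for "year"
#   movie["director"]                                 # key not declared in Movie
```

At runtime a TypedDict value is an ordinary dict; the extra precision exists only for the type checker.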

Figure: Number of lines of annotated code: server

Figure: Number of lines of annotated code: client

Figure: Total number of lines of annotated code

Here is an overview of the main steps we took to increase the amount of annotated code at Dropbox:

Annotation strictness. We gradually tightened the requirements for annotating new code. We started with linter hints that suggested adding annotations to files that already had some. We now require type annotations in new Python files and in most existing files.

Typing reports. We send weekly reports to teams about the level of typing of their code and give advice on what to annotate first.

Popularizing mypy. We talk about mypy at various events and talk to teams to help them get started using type annotations.

Surveys. We run periodic user surveys to identify the major pain points. We are willing to go quite far to address them (even to the point of creating a new language to speed up mypy!).

Performance. We have greatly improved the performance of mypy through the use of the daemon and mypyc. This was done in order to smooth out the inconveniences that arise during the annotation process, and in order to be able to work with large amounts of code.

Integration with editors. We built tools for running mypy in the editors that are popular at Dropbox, including PyCharm, Vim, and VS Code. This greatly sped up the cycle of annotating code and checking the result, a cycle that is repeated constantly when annotating existing code.

Static analysis. We built a tool that infers function signatures using static analysis. It only works in relatively simple situations, but it helped us increase annotation coverage without much effort.

Support for third party libraries. Many of our projects use the SQLAlchemy toolkit. It takes advantage of the dynamic features of Python that PEP 484 types are unable to model directly. We, in accordance with PEP 561, created the corresponding stub file and wrote a plugin for mypy (open source) that improves support for SQLAlchemy.
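The kind of number a typing report might contain can be sketched with the standard `ast` module. This is a hypothetical illustration, not the tooling Dropbox uses: the function `annotation_stats` and its definition of "fully annotated" (return type plus every parameter except `self` and `cls`, ignoring `*args` and `**kwargs`) are assumptions made for the example.

```python
import ast
from typing import Tuple


def annotation_stats(source: str) -> Tuple[int, int]:
    """Return (fully annotated functions, all functions) for one module's source."""
    annotated = 0
    total = 0
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        total += 1
        args = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
        # Simplification: "self" and "cls" are conventionally left unannotated,
        # and *args / **kwargs are not inspected at all.
        params = [a for a in args if a.arg not in ("self", "cls")]
        if node.returns is not None and all(a.annotation is not None for a in params):
            annotated += 1
    return annotated, total


SAMPLE = '''
def typed(x: int, y: str = "a") -> bool:
    return bool(x) and bool(y)

def untyped(x, y):
    return x

class C:
    def method(self, n: int) -> int:
        return n

    def partial(self, n: int):
        return n
'''

done, total = annotation_stats(SAMPLE)
coverage = f"{done}/{total} functions fully annotated"
```

Here `typed` and `method` count as fully annotated, while `untyped` and `partial` (missing a return type) do not.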

The difficulties we encountered

The path to 4 million lines of typed code was not always smooth. Along the way we hit a number of potholes and made a few mistakes. Here are some of the problems we ran into. We hope that describing them will help others avoid the same ones.

Skipped files. We started by checking only a small number of files; anything outside that set was not checked. Files were added to the checked set when they received their first annotations. Anything imported from a module outside the checked set was treated as a value of type Any, which is not checked at all. This caused a significant loss of typing precision, especially in the early stages of the migration. The approach nevertheless worked surprisingly well, although adding a file to the checked set often exposed problems in other parts of the codebase. In the worst case, merging two isolated areas of code that had each been type-checked independently revealed that their types were incompatible with each other, and many annotations had to be changed. Looking back, we should have added core library modules to mypy's type-checking scope as early as possible. That would have made our work much more predictable.
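To make the loss of precision concrete, here is a minimal sketch. The names `load_settings` and `seconds_to_ms` are hypothetical, and the explicit `Any` return type stands in for a function imported from a module outside the checked set.

```python
from typing import Any


def load_settings() -> Any:
    # Stand-in for a function imported from an unchecked module:
    # the type checker sees its result as Any.
    return {"timeout": "30"}  # note: the value is a str, not an int


def seconds_to_ms(seconds: int) -> int:
    return seconds * 1000


settings = load_settings()
# A type checker accepts this call, because operations on Any are not checked,
# even though the argument is a str at runtime.
result = seconds_to_ms(settings["timeout"])
```

Nothing fails here, which is the problem: `result` silently becomes a 2000-character string instead of the integer 30000, and the checker has no way to notice.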

Annotating old code. When we started, we had about 4 million lines of existing Python code. Annotating all of it was clearly no easy task. We built a tool called PyAnnotate that collects type information while tests run and adds type annotations to the code based on what it collected. However, the tool did not see wide adoption. Collecting type information was slow, and the generated annotations often needed many manual edits. We considered running the tool automatically on every test build, or collecting type information from a small fraction of real network requests, but decided against both because either approach is too risky.

As a result, most of the code was annotated manually by its owners. To steer this process, we prepare reports on the most important modules and functions that still need annotations. For example, it is important to annotate a library module that is used in hundreds of places, whereas an old service that is being replaced by a new one matters much less. We are also experimenting with using static analysis to generate type annotations for legacy code.
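The idea of collecting types at runtime, mentioned above for PyAnnotate, can be sketched with a decorator. This is not how PyAnnotate is implemented; `OBSERVED`, `record_types`, `suggest_signature` and `repeat` are hypothetical names, and the output format is simply a PEP 484 style type comment.

```python
import functools
import inspect
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Set, Tuple

# Hypothetical registry: function name -> set of observed
# (argument type names, return type name) combinations.
OBSERVED: DefaultDict[str, Set[Tuple[Tuple[str, ...], str]]] = defaultdict(set)


def record_types(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record the runtime types of the arguments and result of every call."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        bound = signature.bind(*args, **kwargs)
        result = func(*args, **kwargs)
        arg_types = tuple(type(v).__name__ for v in bound.arguments.values())
        OBSERVED[func.__name__].add((arg_types, type(result).__name__))
        return result

    return wrapper


def suggest_signature(name: str) -> str:
    """Render a type comment if exactly one type combination was observed."""
    observations = OBSERVED[name]
    if len(observations) != 1:
        return "# type: (needs manual review)"
    arg_types, return_type = next(iter(observations))
    return f"# type: ({', '.join(arg_types)}) -> {return_type}"


@record_types
def repeat(text, times):
    return text * times


# Calls like these would normally come from a test run.
repeat("ab", 2)
repeat("xyz", times=3)
suggestion = suggest_signature("repeat")
```

Even this toy version hints at the drawbacks described above: every call pays for the bookkeeping, and the result only reflects the types that the tests happened to exercise, so a human still has to review it.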

Cyclic imports. Earlier, I mentioned import cycles (the "dependency tangles") that made it hard to speed up mypy. We also had to work hard to make mypy support all the idioms that arise from these cycles. We recently completed a major redesign that fixed most of mypy's problems with import cycles. Those problems went back to the very early days of the project, to Alore, the research language that mypy originally targeted. Alore's syntax made import cycles easy to handle. Modern mypy inherited some limitations from its earlier, simpler implementation (which was a good fit for Alore). Python makes import cycles hard to handle, mainly because its expressions are ambiguous. For example, an assignment might actually define a type alias. Mypy cannot always detect such things until most of the import cycle has been processed. Alore had no such ambiguities. Poor decisions made early in a system's design can give the programmer an unpleasant surprise many years later.
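The ambiguity between assignments and type aliases can be shown in a few lines. This is a minimal sketch with hypothetical names (`Vector`, `default_scale`, `scale`); in a real import cycle the right-hand side would come from another module in the cycle, so a checker could not classify the assignment until that module had been analyzed.

```python
from typing import List

# Syntactically, both statements below are ordinary assignments.
Vector = List[float]  # defines a type alias, usable in annotations
default_scale = 2.0   # defines a plain variable


def scale(values: Vector, factor: float = default_scale) -> Vector:
    # Whether "Vector" is valid as an annotation depends on what the
    # assignment above turned out to be.
    return [v * factor for v in values]


scaled = scale([1.0, 2.5])
```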

Results: the path to 5 million lines of code and to new horizons

The mypy project has come a long way from its early prototypes to a type checker for 4 million lines of production code. As mypy evolved, Python's type hints were standardized. A strong ecosystem has since grown up around typed Python: it includes library support, helper tools for IDEs and editors, and several type checkers, each with its own pros and cons.

While type checking is already taken for granted at Dropbox, I'm sure we're still in the early days of Python code typing. I think type checking technologies will continue to evolve and improve.

If you haven't used type checking in your large-scale Python project yet, then know that now is a very good time to start moving to static typing. I have spoken to those who have made this transition. None of them regretted it. Type checking makes Python a much better language than "plain Python" for developing large projects.

Dear readers, do you use type checking in your Python projects?


Source: habr.com
