Facebook open-sourced Cinder, a fork of CPython used by Instagram

Facebook has published the source code of Cinder, a fork of CPython 3.8.5, the reference implementation of the Python programming language. Cinder runs in Facebook's production infrastructure, where it powers Instagram, and includes a number of performance optimizations.

The code has been published to open a discussion about porting the finished optimizations to mainline CPython and to help other projects that work on CPython performance. Facebook does not intend to support Cinder as a standalone open source project; the code is presented as it is used in the company's infrastructure, without extra cleanup or documentation. Cinder is also not being promoted as an alternative to CPython: the main goal of development is to improve CPython itself.

The Cinder code is described as fairly robust and tested in production, but users who run into problems will have to resolve them on their own, since Facebook does not guarantee that it will respond to external bug reports and pull requests. At the same time, Facebook does not rule out constructive cooperation with the community and is ready to discuss ideas on how to make Cinder even faster or how to speed up the transfer of the finished changes into CPython itself.

The main optimizations implemented in Cinder:

  • Inline bytecode caching ("shadow bytecode"). The method identifies cases where a generic opcode is executed in a way that can be optimized, and dynamically replaces that opcode with a faster specialized variant (for example, for calls to frequently used functions).
  • Eager coroutine evaluation. When an async function call completes immediately (it never blocks on an await and reaches its return statement first), its result is substituted directly, without creating a coroutine object and without involving the event loop. In Facebook's code, which uses async/await heavily, the optimization gives a speedup of about 5%.
  • Selective JIT compilation at the level of individual methods and functions (method-at-a-time). It is enabled with the "-X jit" option or the PYTHONJIT=1 environment variable and speeds up many benchmarks by a factor of 1.5 to 4. JIT compilation pays off only for frequently executed functions; for rarely used ones the compilation overhead can only slow the program down.

    With the "-X jit-list-file=/path/to/jitlist.txt" option or the "PYTHONJITLISTFILE=/path/to/jitlist.txt" environment variable, you can specify a file listing the functions for which the JIT may be used (in the form path.to.module:funcname or path.to.module:ClassName.method_name). The list of functions that should be JIT-compiled can be determined from profiling results. Support for dynamic JIT compilation, driven by internal analysis of how often functions are called, is expected in the future, but given how processes are run at Instagram, JIT compilation at startup also suits Facebook.

    The JIT first converts Python bytecode to a high-level intermediate representation (HIR), which stays fairly close to Python bytecode but targets a register-based virtual machine instead of a stack-based one, and also carries type information and other performance-relevant details (e.g. reference counting). The HIR is then converted to SSA (static single assignment) form and goes through optimization passes that take into account reference-counting results and memory-usage data. The result is a low-level intermediate representation (LIR) that is close to assembly language. After a further round of LIR-based optimizations, machine instructions are generated using the asmjit library.

  • Strict mode for modules. The functionality has three components: (1) the StrictModule type; (2) a static analyzer capable of determining that executing a module does not affect code outside that module; (3) a module loader that detects modules opted into strict mode (marked with "import __strict__" in the code), checks that they do not interfere with other modules, and loads strict modules into sys.modules as StrictModule objects.
  • Static Python, an experimental bytecode compiler that uses type annotations to generate type-specialized bytecode, which runs faster when combined with JIT compilation. In some benchmarks, the combination of Static Python and the JIT shows up to a 7x performance improvement over stock CPython. In many situations, the results are estimated to be close to those of the MyPyC and Cython compilers.
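
As a rough illustration of the eager coroutine evaluation item in the list above, the plain-Python sketch below shows why such a shortcut is possible: a coroutine that never suspends finishes on its very first step, so its result is available without an event loop. This is not Cinder's implementation (Cinder works inside the interpreter and, per the description above, avoids creating the coroutine at all); the names suspend, cached_lookup and try_eager are hypothetical.

```python
import types


@types.coroutine
def suspend():
    # Hypothetical stand-in for an awaitable that really has to wait.
    yield


async def cached_lookup(key, cache):
    # Fast path: returns before reaching any suspending await.
    if key in cache:
        return cache[key]
    # Slow path: suspends, so a real program would need an event loop here.
    await suspend()
    return None


def try_eager(coro):
    """Run one step of coro; return (True, result) if it finished at once."""
    try:
        coro.send(None)
    except StopIteration as stop:
        # The coroutine hit `return` without ever suspending.
        return True, stop.value
    # It suspended. This demo simply discards it; a real scheduler would
    # hand the suspended coroutine over to the event loop instead.
    coro.close()
    return False, None


hit = try_eager(cached_lookup("a", {"a": 1}))
miss = try_eager(cached_lookup("b", {"a": 1}))
print(hit, miss)
```

In this sketch a coroutine object is still created on every call; the point is only that the fast path produces its value in a single step.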

Source: opennet.ru
