Facebook publishes Hermit, a toolkit for repeatable program execution

Facebook (banned in the Russian Federation) has published the source code of Hermit, a toolkit that provides an environment for deterministic program execution. Given the same input data, separate runs produce the same result and follow the same execution path. The project is written in Rust and distributed under the BSD license.

During normal execution, the result is affected by various external factors, such as the current time, thread scheduling, virtual memory addresses, output from the pseudo-random number generator, and assorted unique identifiers. Hermit runs a program in a container in which these factors stay constant from one run to the next. Repeatable execution that fully reproduces otherwise volatile environment conditions can be used for diagnosing bugs, debugging across multiple reruns, providing a fixed environment for regression tests, stress testing, troubleshooting multithreading issues, and reproducible build systems.


A reproducible environment is created by intercepting system calls. Some calls are replaced with Hermit's own handlers that return constant results; others are passed through to the kernel, and the result is then scrubbed of nondeterministic data. System calls are intercepted with the reverie framework, whose code Facebook has also published. To keep file-system changes and network requests from affecting execution, programs run on a fixed file-system image with access to external networks disabled. When a program reads from the pseudo-random number generator, Hermit returns a predefined sequence that is the same on every run.
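The split between replaced and passed-through calls can be illustrated with a toy model. The sketch below is not Hermit's or reverie's API: the class `DeterministicEnv`, its methods, and the fixed clock step are hypothetical names chosen for illustration. It assumes a virtual clock that advances by a constant step on each read, a seeded generator standing in for the kernel's entropy source, and a pass-through `stat` whose volatile fields are zeroed.

```python
import os
import random


class DeterministicEnv:
    """Toy stand-in for an intercepting syscall layer (hypothetical, not Hermit's API)."""

    def __init__(self, seed: int = 0, tick_ns: int = 1_000) -> None:
        self._rng = random.Random(seed)  # fixed seed: same "random" stream every run
        self.clock_ns = 0                # virtual clock starts at a constant value
        self.tick_ns = tick_ns           # assumed fixed step added on every clock read

    # Replaced call: answered by the sandbox itself, never reaches the kernel.
    def clock_gettime(self) -> int:
        self.clock_ns += self.tick_ns
        return self.clock_ns

    # Replaced call: bytes come from the seeded generator, not the OS entropy pool.
    def getrandom(self, n: int) -> bytes:
        return bytes(self._rng.getrandbits(8) for _ in range(n))

    # Passed-through call: the real kernel answers, then volatile fields are scrubbed.
    def stat(self, path: str) -> dict:
        st = os.stat(path)
        return {"size": st.st_size, "mtime_ns": 0, "ino": 0}


def program(env: DeterministicEnv) -> tuple:
    """A 'program' whose output would normally differ on every run."""
    start = env.clock_gettime()
    token = env.getrandom(8).hex()
    elapsed = env.clock_gettime() - start
    return start, token, elapsed


first = program(DeterministicEnv(seed=42))
second = program(DeterministicEnv(seed=42))
print(first == second)  # True: same seed and virtual clock give identical output
```

A real interposer works at the system-call boundary of an unmodified binary rather than through an object the program calls explicitly; the model only shows why constant answers plus scrubbed pass-through results make two runs indistinguishable.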

The thread scheduler is among the hardest sources of nondeterminism to control, because its behavior depends on many external factors, such as the number of CPU cores and how many other threads are running. To make scheduling repeatable, Hermit serializes all threads, pins them to a single CPU core, and passes control between them in a fixed order. Each thread may execute only a fixed amount of work before it is stopped and control passes to the next thread. This limit is enforced with the CPU's PMU (Performance Monitoring Unit), which stops execution after a specified number of conditional branches.
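The effect of serializing threads under a fixed branch budget can be modelled in a few lines. In this sketch, which assumes nothing about Hermit's internals, each thread is a Python generator and every `yield` stands in for one counted conditional branch; `run_serialized`, `racy_worker`, and `simulate` are hypothetical names. The scheduler runs one thread at a time in a fixed round-robin order and preempts it after `quantum` events.

```python
from collections import deque
from typing import Generator, List

# A "thread" is modelled as a generator; each `yield` stands for one
# counted conditional branch, i.e. one unit of the per-thread budget.
Thread = Generator[None, None, None]


def run_serialized(threads: List[Thread], quantum: int) -> List[int]:
    """Run threads one at a time, in a fixed round-robin order.

    A thread is preempted after `quantum` branch events. Returns the schedule
    trace: the index of the thread that executed each step.
    """
    ready = deque(enumerate(threads))
    trace: List[int] = []
    while ready:
        tid, thread = ready.popleft()
        for _ in range(quantum):
            try:
                next(thread)
            except StopIteration:
                break  # thread finished: do not requeue it
            trace.append(tid)
        else:
            ready.append((tid, thread))  # budget used up: back of the queue
    return trace


def racy_worker(shared: dict, iterations: int) -> Thread:
    """Non-atomic read-modify-write on a shared counter (a classic data race)."""
    for _ in range(iterations):
        value = shared["counter"]
        yield  # preemption point between the read...
        shared["counter"] = value + 1
        yield  # ...and after the write


def simulate(quantum: int):
    shared = {"counter": 0}
    threads = [racy_worker(shared, 3) for _ in range(2)]
    trace = run_serialized(threads, quantum)
    return shared["counter"], trace


for q in (1, 3, 100):
    print(q, simulate(q)[0])
```

With a large budget each worker runs to completion and the counter reaches 6; smaller budgets split the read from the write and lose updates. Whatever the budget, the same count and the same trace come out on every run, which is the property that makes a race reproducible once it has been observed.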

To diagnose thread race conditions, Hermit offers a mode that detects operations executed out of order that led to a crash. It finds them by comparing the states recorded during a run that completed correctly with those recorded during a run that ended abnormally.
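The article does not detail how the comparison works, so the following is only an illustration of the idea, not Hermit's algorithm. It assumes each run is logged as a list of `(thread id, label)` events with labels unique within a run, and reports cross-thread pairs whose relative order flipped between the passing and the failing run. The function name `reordered_pairs` and the example events are hypothetical.

```python
from typing import List, Sequence, Tuple

# An event is (thread id, label); labels are assumed unique within a run.
Event = Tuple[int, str]


def reordered_pairs(passing: Sequence[Event],
                    failing: Sequence[Event]) -> List[Tuple[Event, Event]]:
    """Return cross-thread event pairs whose order differs between two runs.

    Each pair (a, b) has `a` before `b` in the passing run but after it in the
    failing run: candidates for the race that caused the failure.
    """
    position = {event: i for i, event in enumerate(failing)}
    common = [e for e in passing if e in position]
    pairs = []
    for i, a in enumerate(common):
        for b in common[i + 1:]:
            # Same-thread events keep program order, so only cross-thread flips matter.
            if a[0] != b[0] and position[a] > position[b]:
                pairs.append((a, b))
    return pairs


passing = [(0, "init config"), (0, "publish config"), (1, "read config"), (1, "use config")]
failing = [(0, "init config"), (1, "read config"), (0, "publish config"), (1, "use config")]
print(reordered_pairs(passing, failing))
```

Here the only flipped pair is thread 0 publishing the configuration versus thread 1 reading it, which points straight at the missing synchronization. The pairwise scan is quadratic, which is acceptable for a sketch but not for long production traces.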

Source: opennet.ru
