Release of ControlFlag 1.0, a tool for detecting errors in C code

Intel has published ControlFlag 1.0, the first major release of a tool that identifies errors and anomalies in source code using a machine learning system trained on a large body of existing code. Unlike traditional static analyzers, ControlFlag does not apply predefined rules, which can hardly anticipate every possible case; instead, it relies on statistics about how language constructs are used across a large number of existing projects. ControlFlag is written in C++ and released as open source under the MIT license.

The system is trained by building a statistical model from the existing body of code in open-source projects published on GitHub and similar public repositories. During training, the system identifies typical patterns for constructing structures in the code and builds a syntax tree of the relationships between these patterns, reflecting the flow of code execution in the program. The result is a reference decision tree that aggregates the development experience of all the analyzed source code. The code under review goes through the same pattern-identification process, and its patterns are checked against the reference decision tree. Large discrepancies with neighboring branches indicate an anomaly in the pattern being checked.
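The idea of learning what is "typical" from a corpus and flagging what is rare can be illustrated with a minimal sketch. This is not ControlFlag's actual algorithm, which works on a tree of patterns; it is a deliberately simplified flat frequency table. The function names, the training data, and the 5% threshold are all assumptions made for illustration.

```python
from collections import Counter


def train(patterns):
    """Count how often each abstracted pattern occurs in the training corpus."""
    return Counter(patterns)


def is_anomalous(model, pattern, threshold=0.05):
    """Flag a pattern whose share of all observed patterns is below threshold.

    The threshold value is an arbitrary assumption, not a ControlFlag setting.
    """
    total = sum(model.values())
    if total == 0:
        return True  # nothing learned yet, so every pattern is unfamiliar
    return model[pattern] / total < threshold


# Hypothetical, hand-written training data standing in for patterns
# extracted from many open-source projects.
corpus = (
    ["(ptr == NULL) || (ptr == NULL)"] * 40
    + ["(ptr == NULL) && (ptr == NULL)"] * 35
    + ["(ptr == NULL) | (ptr == NULL)"] * 1
)
model = train(corpus)

for candidate in ("(ptr == NULL) || (ptr == NULL)",
                  "(ptr == NULL) | (ptr == NULL)"):
    verdict = "anomalous" if is_anomalous(model, candidate) else "typical"
    print(candidate, "->", verdict)
```

A pattern that never appeared in the corpus has a count of zero and is therefore also reported as anomalous, which mirrors the general principle that rarity, not a hand-written rule, drives the verdict.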


As an example of ControlFlag's capabilities, the developers analyzed the source code of the OpenSSL and cURL projects:

  • In OpenSSL, the anomalous constructs "(s1 == NULL) ^ (s2 == NULL)" and "(s1 == NULL) | (s2 == NULL)" were flagged because they do not match the commonly used pattern "(s1 == NULL) || (s2 == NULL)". The tool also flagged the expressions "(-2 == rv)" (the minus sign was a typo) and "(BIO_puts(bp, ":") <= 0)" (when checking whether the function completed successfully, the comparison should have been "== 0").
  • In cURL, the tool found an error that static analyzers had missed: the structure member "s->keepon", which has a numeric type, was compared with the boolean value TRUE.

ControlFlag 1.0 offers full support for typical C language patterns and can detect anomalies in the conditions of "if" statements. For example, when analyzing the fragment "if (x = 7) y = x;", the system determines that "if" statements usually use the "variable == number" construct to compare numeric values, so the "variable = number" in the condition is most likely a typo. The package includes a script for downloading existing C repositories from GitHub and using them to build the model. Pre-trained models are also available, so you can start checking code right away.
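The "if (x = 7)" example depends on reducing a concrete condition to an abstract pattern such as "variable == number" before consulting the statistics. The sketch below shows one way that abstraction step might look. It is a toy regular-expression tokenizer, not ControlFlag's parser; the names abstract_condition and looks_like_typo, the pattern counts, and the 1% threshold are hypothetical.

```python
import re

# Numbers are tried before identifiers, and two-character operators
# before one-character ones, so "==" is not split into two "=" tokens.
TOKEN = re.compile(
    r"\s*(?:(\d+)|([A-Za-z_]\w*)|(==|!=|<=|>=|&&|\|\||[=<>!&|^()+\-*/]))"
)


def abstract_condition(cond):
    """Replace literals and identifiers with placeholders, keep operators."""
    cond = cond.strip()
    out, pos = [], 0
    while pos < len(cond):
        m = TOKEN.match(cond, pos)
        if m is None:
            raise ValueError(f"unsupported input at position {pos}: {cond!r}")
        number, ident, op = m.groups()
        if number is not None:
            out.append("number")
        elif ident is not None:
            out.append("variable")
        else:
            out.append(op)
        pos = m.end()
    return " ".join(out)


# Hypothetical counts, standing in for statistics learned from real code.
PATTERN_COUNTS = {"variable == number": 980, "variable = number": 3}


def looks_like_typo(cond, counts=PATTERN_COUNTS, threshold=0.01):
    """Report a condition whose pattern is rare among the known patterns."""
    total = sum(counts.values())
    return counts.get(abstract_condition(cond), 0) / total < threshold


print(abstract_condition("x = 7"))
print(looks_like_typo("x = 7"), looks_like_typo("x == 7"))
```

Abstracting away the specific names and values is what lets statistics gathered from unrelated projects apply to the code under review: "x = 7" and "count = 42" fall into the same rare bucket.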

Source: opennet.ru
