Integrate static analysis into the process, rather than using it to look for bugs

I was inspired to write this article by the large number of materials on static analysis that keep coming my way. First, the PVS-Studio blog, which actively promotes itself on Habré with reviews of errors its tool finds in open-source projects; PVS-Studio recently added Java support. And, of course, the developers of IntelliJ IDEA, whose built-in analyzer is probably the most advanced for Java today, couldn't stay away either.

When reading such reviews, one gets the feeling that we are talking about a magic elixir: press the button and a list of defects appears before your eyes. It seems that as analyzers improve, more and more bugs will be found automatically, and the products scanned by these robots will keep getting better without any effort on our part.

But there are no magic elixirs. I would like to talk about what is usually left unsaid in posts of the "here is what our robot can find" kind: what analyzers cannot do, what their real role and place in the software delivery process is, and how to implement them correctly.

Ratchet (source: Wikipedia).

What Static Analyzers Can Never Do

What is source code analysis, from a practical point of view? We feed in some sources and, in a short time (much shorter than a test run), get some information about our system. The fundamental and mathematically insurmountable limitation is that only a rather narrow class of information can be obtained this way.

The most famous example of a problem that cannot be solved by static analysis is the halting problem: it is a theorem stating that no general algorithm can be devised that determines from a program's source code whether it will loop forever or terminate in finite time. An extension of this result is Rice's theorem, which states that for any non-trivial property of computable functions, determining whether an arbitrary program computes a function with that property is an algorithmically undecidable problem. For example, it is impossible to write an analyzer that determines from arbitrary source code whether the program under analysis implements an algorithm that computes, say, the square of an integer.

Thus, the capabilities of static analyzers have insurmountable limits. A static analyzer will never be able to detect in all cases things such as, for example, the occurrence of a "null pointer exception" in languages that allow null, or the occurrence of "attribute not found" in dynamically typed languages. All the most advanced static analyzer can do is highlight special cases, and their number, among all the possible problems in your source code, is, without exaggeration, a drop in the ocean.
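
To make this concrete, here is a small illustration of my own (a hypothetical example, not taken from any analyzer's documentation): whether the attribute access below succeeds depends entirely on runtime input, so no static tool can decide it for every possible run.

import sys


class Config:
    """A plain holder whose attributes are created at run time."""

    def __init__(self, raw: str) -> None:
        # Attributes are created dynamically from "key=value" pairs, so the
        # set of attributes is unknown until the program actually runs.
        for pair in raw.split(","):
            key, _, value = pair.partition("=")
            setattr(self, key.strip(), value.strip())


if __name__ == "__main__":
    cfg = Config(sys.argv[1] if len(sys.argv) > 1 else "host=localhost")
    # Whether the next line raises AttributeError depends solely on the input
    # string, so no analyzer can decide it for every possible invocation.
    print(cfg.port)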

Static analysis is not a search for bugs

The conclusion follows from the above: static analysis is not a means of reducing the number of defects in a program. I would venture to say that when first applied to your project, it will find "amusing" places in the code, but, most likely, will not find any defects that affect the quality of your program.

The examples of defects automatically found by analyzers are impressive, but we should not forget that these examples were obtained by scanning a large set of large codebases. In the same way, crackers who can try a few simple passwords against a huge number of accounts eventually find the accounts that do have a simple password.

Does this mean that static analysis should not be used? Of course not, and for exactly the same reason that it is worth checking every new password against a stop list of "simple" passwords.

Static analysis is more than finding bugs

In fact, the range of problems that analysis solves in practice is much broader. After all, static analysis in the general sense is any check of source code performed before it is run. Here are some things you can do with it:

  • Checking coding style in the broadest sense of the word. This includes checking formatting, looking for empty or redundant parentheses, setting thresholds on metrics such as the number of lines or the cyclomatic complexity of a method, and so on: everything that potentially makes the code more readable and maintainable. In Java such a tool is Checkstyle, in Python it is flake8. Programs of this class are usually called "linters".
  • Not only executable code can be analyzed. Resource files such as JSON, YAML, XML and .properties can (and should!) be automatically checked for validity. Isn't it better to learn that an unpaired quote breaks the JSON structure at the early stage of automatic pull request validation rather than during test execution or at run time? Suitable tools exist, for example YAMLlint and JSONLint; a minimal check of this kind is sketched after this list.
  • Compilation (or parsing, for dynamic programming languages) is also a kind of static analysis. As a rule, compilers can emit warnings that point to problems with source code quality, and those warnings should not be ignored.
  • Sometimes compilation means more than compiling executable code. For example, if you keep documentation in AsciiDoctor format, then at the moment it is converted to HTML/PDF the AsciiDoctor handler (a Maven plugin) can issue warnings, for instance about broken internal links. And that is a good reason to reject a pull request with documentation changes.
  • Spell checking is also a type of static analysis. The aspell utility can check spelling not only in documentation but also in program source code (comments and literals) in various programming languages, including C/C++, Java and Python. A spelling error in the user interface or documentation is also a defect!
  • Configuration tests (see this and this report for what they are), although executed in a unit-test runtime such as pytest, are in fact also a kind of static analysis, since they do not execute the source code under test.
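
As a minimal sketch of the resource-file check mentioned above (my own illustration under obvious assumptions: Python with the PyYAML package installed, hypothetical file names), validity can be verified with nothing more than the standard parsers:

import json
import sys

import yaml  # PyYAML; assumed to be installed


def check_resources(paths):
    """Try to parse each resource file; return a list of (path, error) pairs."""
    errors = []
    for path in paths:
        try:
            with open(path, encoding="utf-8") as f:
                if path.endswith(".json"):
                    json.load(f)
                elif path.endswith((".yml", ".yaml")):
                    yaml.safe_load(f)
        except (json.JSONDecodeError, yaml.YAMLError) as e:
            errors.append((path, str(e)))
    return errors


if __name__ == "__main__":
    # Hypothetical invocation: python check_resources.py config/app.json config/app.yaml
    problems = check_resources(sys.argv[1:])
    for path, message in problems:
        print(f"{path}: {message}")
    sys.exit(1 if problems else 0)  # non-zero exit fails the quality gate

Wiring such a script into pull request validation means a broken resource file fails the build long before any test runs.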

As you can see, finding bugs plays the least important role in this list, and everything else is available through free open-source tools.

Which of these kinds of static analysis should be used in your project? Of course, the more the better! The main thing is to implement it correctly, which is what we will discuss next.

Delivery pipeline as a multi-stage filter and static analysis as its first cascade

The classic metaphor for continuous integration is the pipeline through which changes flow, from a source code change to delivery into production. The standard sequence of stages in this pipeline looks like this:

  1. static analysis
  2. compilation
  3. unit tests
  4. integration tests
  5. UI tests
  6. manual check

Changes rejected in the Nth stage of the pipeline are not propagated to stage N+1.

Why exactly this way and not otherwise? In the testing part of the pipeline, testers will recognize the well-known testing pyramid.

Test pyramid. Source: an article by Martin Fowler.

At the bottom of this pyramid are the tests that are easier to write, run faster and are less prone to false positives. Therefore there should be more of them, they should cover more code, and they should be executed first. At the top of the pyramid the opposite is true, so the number of integration and UI tests should be reduced to the necessary minimum. The human in this chain is the most expensive, slowest and most unreliable resource, so they come at the very end and do the work only if the previous stages have found no defects. The same principles are used to build the parts of the pipeline that are not directly related to testing!

I would like to offer an analogy in the form of a multi-stage water filtration system. Dirty water (changes with defects) is fed to the input, and at the output we must get clean water with all the unwanted contaminants removed.

Multistage filter. Source: Wikimedia Commons.

As you know, cleaning filters are designed so that each subsequent cascade can filter out an ever finer fraction of contaminants, while the coarser purification cascades have a higher throughput and a lower cost. In our analogy this means that the input quality gates are faster, require less effort to run, and are themselves less demanding to operate, and that is exactly the order in which they are arranged. The role of static analysis, which, as we now understand, can weed out only the grossest defects, is the role of the mud-trap grate at the very beginning of the filter cascade.

Static analysis by itself does not improve the quality of the final product, just as a "mud trap" does not make water drinkable. And yet, together with the other elements of the pipeline, its importance is obvious. Although in a multi-stage filter the later stages are in principle capable of capturing everything the input stages do, it is clear what an attempt to make do with the fine purification stages alone, without the input stages, would lead to.

The purpose of the "mud trap" is to relieve the subsequent cascades from catching the grossest defects. For example, at the very least, a code reviewer should not be distracted by incorrectly formatted code or by violations of established coding standards (such as extra parentheses or too deeply nested branches). Bugs like NPEs should be caught by unit tests, but if, even before the tests run, the analyzer tells us that a bug is bound to happen, that will significantly speed up fixing it.

I think it is now clear why static analysis does not improve the quality of a product when applied occasionally, and why it should be used constantly to filter out changes with gross defects. Asking whether a static analyzer will improve the quality of your product is roughly equivalent to asking, "Will water taken from a dirty pond become more drinkable if it is passed through a colander?"

Implementation in a legacy project

An important practical question: how do you introduce static analysis into the continuous integration process as a quality gate? With automated tests everything is obvious: there is a set of tests, and the failure of any of them is sufficient reason to consider that the build has not passed the quality gate. An attempt to install a gate in the same way based on static analysis results fails: there are too many analysis warnings in legacy code, you do not want to ignore them completely, but it is also impossible to block the delivery of a product just because it contains analyzer warnings.

On first use, the analyzer produces a huge number of warnings on any project, the vast majority of which have nothing to do with the correct functioning of the product. It is impossible to fix all these remarks at once, and many of them do not need fixing at all. After all, we know that our product as a whole worked even before static analysis was introduced!

As a result, many people limit themselves to occasional use of static analysis, or use it only in an informational mode, where the analyzer report is simply produced during the build. This is equivalent to having no analysis at all: if we already have many warnings, the appearance of one more (however serious) in a code change goes unnoticed.

The following ways of introducing quality gates are known:

  • Setting a limit on the total number of warnings, or on the number of warnings divided by the number of lines of code. This works poorly, because such a gate freely lets through changes with new defects as long as the limit is not exceeded.
  • Fixing, at some point in time, all the old warnings in the code as ignored, and failing the build when new warnings appear. This functionality is provided by PVS-Studio and by some online services, such as Codacy. I have not had a chance to work with PVS-Studio; as for my experience with Codacy, its main problem is that deciding what is an "old" and what is a "new" warning is a rather complicated algorithm that does not always work correctly, especially when files are heavily modified or renamed. In my experience, Codacy could let new warnings slip through in a pull request and, at the same time, block a pull request because of warnings unrelated to the changes in that PR's code.
  • In my opinion, the most effective solution is the "ratcheting" method described in the book Continuous Delivery. The main idea is that the number of static analysis warnings is stored as a property of each release, and only changes that do not increase that number are allowed through.

Ratchet

It works like this:

  1. At the initial stage, recording the number of warnings found by the analyzers is added to the release metadata. Thus, when an upstream build is made, your repository manager records not just "release 7.0.2" but "release 7.0.2 containing 100500 Checkstyle warnings". If you use an advanced repository manager (such as Artifactory), storing such metadata about your release is easy.
  2. Now, during the build, every pull request compares its own number of warnings with the number recorded for the current release. If the PR increases this number, the code does not pass the static analysis quality gate. If the number of warnings decreases or stays the same, it passes (a sketch of this comparison follows the list).
  3. At the next release, the recalculated number of warnings is written back into the release metadata.
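
Here is a minimal sketch of that comparison (my own illustration; the tool names and numbers are hypothetical and not tied to any particular CI system):

def ratchet_gate(baseline: dict, current: dict) -> list:
    """Compare a PR build's per-tool warning counts against the last release.

    Returns a list of human-readable violations; an empty list means the gate passes.
    """
    violations = []
    for tool, count in current.items():
        allowed = baseline.get(tool, 0)
        if count > allowed:
            violations.append(
                f"{tool}: {count} warnings, but the current release has only {allowed}"
            )
    return violations


if __name__ == "__main__":
    # Hypothetical numbers: the current release carries 100 Checkstyle and 15 SpotBugs warnings.
    baseline = {"checkstyle": 100, "spotbugs": 15}
    current = {"checkstyle": 98, "spotbugs": 16}  # two warnings fixed, but one new SpotBugs warning
    problems = ratchet_gate(baseline, current)
    for problem in problems:
        print(problem)
    raise SystemExit(1 if problems else 0)  # non-zero exit code fails the build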

Slowly but steadily, as with a ratchet, the number of warnings will tend toward zero. Of course, the system can be fooled by introducing a new warning while fixing someone else's. That is normal, because over the long run it still gives results: warnings are usually fixed not one at a time but in whole groups of a certain type, and all the easily eliminated warnings disappear quickly.

This graph shows the total number of Checkstyle warnings over six months of running such a "ratchet" on one of our open-source projects. The number of warnings has dropped by an order of magnitude, and this happened naturally, in parallel with the development of the product!

[Graph: total number of Checkstyle warnings over six months]

I use a modified version of this method, counting warnings separately for each project module and each analysis tool. The result is a YAML file with build metadata that looks something like this:

celesta-sql:
  checkstyle: 434
  spotbugs: 45
celesta-core:
  checkstyle: 206
  spotbugs: 13
celesta-maven-plugin:
  checkstyle: 19
  spotbugs: 0
celesta-unit:
  checkstyle: 0
  spotbugs: 0

In any advanced CI system, a ratchet can be implemented for any static analysis tool without relying on plugins or third-party tools. Each analyzer produces its report in a simple text or XML format that is easy to parse; all that remains is to write the necessary logic in the CI script. You can see how this is implemented in our open-source projects based on Jenkins and Artifactory here or here. Both examples depend on the ratchetlib library: the countWarnings() method counts XML tags in the files generated by Checkstyle and Spotbugs in the usual way, and compareWarningMaps() implements the ratchet itself, raising an error when the number of warnings in any category increases.
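
For illustration, here is a rough Python analogue of that counting step (my own sketch, not the ratchetlib code; the report paths are hypothetical and should be adjusted to your build layout). Checkstyle reports each violation as an error element and SpotBugs as a BugInstance element, so counting warnings boils down to counting tags:

import xml.etree.ElementTree as ET


def count_warnings(report_path: str, tag: str) -> int:
    """Count occurrences of a given tag in an analyzer's XML report."""
    root = ET.parse(report_path).getroot()
    return len(root.findall(f".//{tag}"))


if __name__ == "__main__":
    # Hypothetical report locations produced by a Maven build.
    counts = {
        "checkstyle": count_warnings("target/checkstyle-result.xml", "error"),
        "spotbugs": count_warnings("target/spotbugsXml.xml", "BugInstance"),
    }
    print(counts)  # e.g. {'checkstyle': 434, 'spotbugs': 45}

The resulting dictionary can then be fed into the same kind of comparison shown earlier and written into the release metadata.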

An interesting ratchet implementation is possible for spell checking comments, string literals and documentation with aspell. As you know, when checking spelling, not all words missing from the standard dictionary are incorrect; they can be added to a user dictionary. If you make the user dictionary part of the project's source code, the spelling quality gate can be formulated as follows: running aspell with the standard and the user dictionary must find no spelling errors at all.
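
A minimal sketch of such a gate (my own illustration; the file names are hypothetical, and the exact aspell options may need adjusting for your environment):

import subprocess
import sys


def unknown_words(path: str, personal_dict: str) -> set:
    """Run `aspell list` on a file and return words unknown to both dictionaries."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # `aspell list` reads text from stdin and prints one unknown word per line;
    # --personal points at the project word list kept under version control.
    result = subprocess.run(
        ["aspell", "--lang=en", f"--personal={personal_dict}", "list"],
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return set(result.stdout.split())


if __name__ == "__main__":
    # Hypothetical invocation: python spell_gate.py docs/index.adoc ./project.pws
    words = unknown_words(sys.argv[1], sys.argv[2])
    for word in sorted(words):
        print(word)
    sys.exit(1 if words else 0)  # any unknown word fails the quality gate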

On the importance of pinning the analyzer version

In conclusion, one more point: no matter how you integrate analysis into your delivery pipeline, the analyzer version must be pinned. If you allow the analyzer to update spontaneously, then when the next pull request is built, new defects may surface that have nothing to do with code changes and exist only because the new analyzer version can find more defects, and this will break your pull request acceptance process. Upgrading the analyzer should be a conscious action. That said, strictly pinning the version of every build component is a necessary requirement in general and a topic for a separate discussion.

Conclusions

  • Static analysis will not find bugs for you and will not improve the quality of your product as a result of a single application. The only positive effect on quality comes from its continuous use in the delivery process.
  • Finding bugs is not the main task of analysis at all; the vast majority of its useful functions are available in open-source tools.
  • Implement quality gates based on the results of static analysis at the very first stage of the delivery pipeline, using a ratchet for legacy code.

References

  1. Continuous Delivery
  2. A. Kudryavtsev: "Program analysis: how to understand that you are a good programmer", a talk on different methods of code analysis (not only static!)

Source: habr.com
