How to implement a static code analyzer in a legacy project without demotivating the team

Trying out a static code analyzer is easy. Integrating it, especially into the development of a large legacy project, takes skill. With the wrong approach, the analyzer can add work, slow down development, and demotivate the team. Let's briefly talk about how to properly integrate static analysis into the development process and start using it as part of CI/CD.

Introduction

Recently, my attention was drawn to the article "Getting Started With Static Analysis Without Overwhelming the Team". On the one hand, it is a good article worth reading. On the other hand, it seems to me that it does not give a complete answer on how to painlessly introduce static analysis in a project with a large amount of legacy code. It says that you can live with technical debt and work only with new code, but there is no answer to what to do with that technical debt later.

Our PVS-Studio team offers its own view on this topic. Let's look at how the problem of introducing a static code analyzer arises in the first place, how to overcome it, and how to painlessly eliminate technical debt gradually.

Problems

As a rule, it is not difficult to run a static analyzer and look at how it works [1]. You may see interesting bugs in the code, or even frightening potential vulnerabilities. You may even fix something, but at this point many programmers give up.

All static analyzers produce false positives. This is inherent in the static analysis methodology, and nothing can be done about it: in the general case, the problem is unsolvable, as Rice's theorem confirms [2]. Machine learning algorithms will not help either [3]. If a person cannot always tell whether a given piece of code is wrong, you should not expect this from a program :).

False positives are not a problem once the static analyzer has been configured:

  • outdated rule sets are disabled;
  • some irrelevant diagnostics are turned off;
  • for C or C++, macros containing constructs that trigger useless warnings at every expansion site are marked up;
  • custom functions that behave like system ones (your own analogues of memcpy or printf) are annotated [4];
  • individual false positives are suppressed pointwise with comments;
  • and so on.

In this case, we can expect a low false positive rate, on the order of 10-15% [5]. In other words, roughly 9 out of 10 analyzer warnings will point to a real problem in the code, or at least to "code with a strong smell". Agreed, such a scenario is extremely pleasant, and the analyzer becomes a real friend of the programmer.

In reality, in a large project the initial picture is completely different. The analyzer generates hundreds or thousands of warnings for the legacy code, and it is impossible to quickly tell which of them are relevant. Sitting down to deal with all of them at once is irrational, since the main work would stop for days or weeks. As a rule, the team cannot afford such a scenario. There would also be a huge number of diffs polluting the change history. And hastily mass-editing that many code fragments would inevitably introduce new typos and errors.

And most importantly, such a heroic fight against warnings makes little sense. Since the project has been working successfully for many years, most of the critical errors in it have already been fixed. Yes, these fixes were expensive: you had to debug, collect negative user feedback about bugs, and so on. A static analyzer would have helped fix many of these errors quickly and cheaply, at the code-writing stage. But at this point these errors are fixed one way or another, and the analyzer mostly detects non-critical defects in the old code. That code may be unused or used very rarely, and an error in it may have no noticeable consequences. Perhaps somewhere a button's shadow is the wrong color, but that does not prevent anyone from using the product.

Of course, even minor errors are still errors, and sometimes a real vulnerability hides behind one. However, dropping everything to spend days or weeks dealing with defects that barely show themselves seems like a dubious idea.

Programmers look, and look, and look at all these warnings on old working code... And they conclude: let's do without static analysis; we'd better go write new useful functionality.

In a way, they are right. They believe that first they must somehow get rid of all these warnings, and only then will they benefit from regular use of the analyzer. Otherwise, new warnings will simply sink into the old ones, and no one will pay attention to them.

The analogy with compiler warnings applies here. It is not without reason that it is recommended to keep the number of compiler warnings at zero. If there are 1000 warnings, then when they become 1001, no one will notice, and it is unclear where to look for the newest one.

The worst thing in this story is when someone from above forces the team to use static analysis at this point. This only demotivates them, since from their point of view it is additional bureaucratic complexity that just gets in the way. No one will look at the analyzer reports, and the tool will be used only "on paper": formally, analysis is built into the DevOps process, but in practice no one benefits from it. We have heard detailed stories like this from attendees at conference booths. Such an experience can discourage programmers from using static analysis tools for a long time, if not forever.

Implementing and Eliminating Technical Debt

In fact, there is nothing difficult or scary about introducing static analysis, even in a big old project.

CI/CD

Moreover, the analyzer can be made part of the continuous development process right away. For example, the PVS-Studio distribution includes utilities for conveniently viewing the report in the format you need, as well as notifications for the developers who wrote the problematic code. For those interested in running PVS-Studio from CI/CD systems, I recommend the relevant section of the documentation and our series of articles on the topic.

But let's get back to the issue of the large number of false positives in the first stages of adopting code analysis tools.

Fixing existing technical debt and dealing with new warnings

Modern commercial static analyzers let you study only the new warnings that appear in new or modified code. Implementations of this mechanism differ, but the essence is the same. In the PVS-Studio static analyzer, this functionality works as follows.

To start using static analysis quickly, we suggest that PVS-Studio users employ the mass warning suppression mechanism [6]. The general idea is as follows. The user runs the analyzer and gets a lot of warnings. Since a project that has been developed for many years is alive, evolving, and earning money, the report will most likely not contain many warnings pointing to critical defects. In other words, critical bugs have already been fixed one way or another, by more expensive means or thanks to customer feedback. Accordingly, everything the analyzer finds now can be considered technical debt, which is impractical to try to eliminate immediately.

You can tell PVS-Studio to consider these warnings irrelevant for now (postponing the technical debt for later), and it will no longer show them. The analyzer creates a special file where it saves information about the currently uninteresting warnings, and from then on PVS-Studio issues warnings only for new or modified code. All of this is implemented intelligently: if, for example, an empty line is added at the beginning of a source file, the analyzer understands that nothing has really changed and remains silent. This markup file can be committed to the version control system. It is large, but that is not a problem, since there is no need to update it often.

Now all programmers will see only the warnings related to new or changed code. So the analyzer can be put to use, as they say, from the very next day. Technical debt can be returned to later, with errors fixed and the analyzer tuned gradually.

So, the first problem with introducing the analyzer into a big old project is solved. Now let's figure out what to do with the technical debt.

Bug fixing and refactoring

The simplest and most natural way is to set aside some time to go through the mass-suppressed analyzer warnings and deal with them gradually. In some places you should fix errors in the code; in others you should refactor to tell the analyzer that the code is not problematic. A simple example:

if (a = b)

Most C++ compilers and analyzers complain about such code, since it is highly likely the author actually meant to write (a == b). But there is an unspoken convention, usually noted in the documentation, that extra parentheses mean the programmer wrote such code deliberately, and there is no need to complain. For example, the PVS-Studio documentation for the V559 diagnostic (CWE-481) explicitly states that the following line will be considered correct and safe:

if ((a = b))

Another example: is a break forgotten in this C++ code or not?

case A:
  foo();
case B:
  bar();
  break;

The PVS-Studio analyzer will issue the V796 (CWE-484) warning here. Perhaps this is not an error; in that case, you should give the analyzer a hint by adding the [[fallthrough]] attribute or, for example, __attribute__((fallthrough)):

case A:
  foo();
  [[fallthrough]];
case B:
  bar();
  break;

One could say that such code changes do not fix errors. True, but they do two useful things. First, the analyzer report gets rid of false positives. Second, the code becomes more understandable for the people maintaining it. And this is very important! For this alone it is worth doing minor refactoring to make the code clearer and easier to maintain. If the analyzer cannot tell whether a break is needed, neither can fellow programmers.

In addition to fixing bugs and refactoring, you can selectively suppress clearly false analyzer warnings, and some irrelevant diagnostics can be disabled entirely. For example, some consider the V550 warnings about comparing float/double values meaningless, while others classify them as important and worth studying [7]. It is up to the development team to decide which warnings are relevant and which are not.

There are other ways to suppress false positives; for example, the macro markup mentioned earlier. All of this is described in more detail in the documentation. The most important thing is to understand that if you approach false positives gradually and systematically, there is nothing scary about them. The vast majority of uninteresting warnings disappear after configuration, leaving only the places that really require careful study and some changes in the code.

Also, we always help our clients set up PVS-Studio if any difficulties arise. There have even been cases when we ourselves eliminated false positives and fixed errors [8]. Just in case, I decided to mention that such an option for extended cooperation is also possible :).

Ratchet Method

There is another interesting approach to gradually improving code quality by eliminating static analyzer warnings. The idea is that the number of warnings is allowed only to decrease.


The number of warnings issued by the static analyzer is recorded. The quality gate is configured so that you can only commit code that does not increase this number. As a result, a process of steadily decreasing the warning count begins, driven by tuning the analyzer and fixing errors.

Even if someone wants to cheat a little and decides to pass the quality gate not by eliminating warnings in their new code but by improving old neighboring code, it is not a problem. Either way, the ratchet turns in one direction, and the number of defects will gradually decrease. Even those unwilling to fix their own new defects still have to improve something in the code nearby. At some point, the easy ways to reduce the warning count run out, and the moment comes when real errors get fixed.

This methodology is described in more detail in a very interesting article by Ivan Ponomarev "Implement static analysis into the process, rather than looking for bugs with it", which I recommend reading for anyone interested in improving code quality.

The author of the article also has a report on this topic: "Continuous static analysis".

Conclusion

I hope that after this article readers will be friendlier toward static analysis tools and will want to introduce them into their development process. If you have any questions, we are always ready to consult users of our PVS-Studio static analyzer and help with its adoption.

There are other typical doubts about whether static analysis can really be convenient and useful. I tried to dispel most of them in the article "Reasons to introduce the PVS-Studio static code analyzer into the development process" [9].

Thank you for your attention and come download and try the PVS-Studio analyzer.

Additional links

  1. Andrey Karpov. How to quickly see interesting warnings generated by the PVS-Studio analyzer for C and C++ code?
  2. Wikipedia. Rice's theorem.
  3. Andrey Karpov, Victoria Khanieva. Using Machine Learning in Static Analysis of Program Source Code.
  4. PVS-Studio. Documentation. Additional diagnostic settings.
  5. Andrey Karpov. Characteristics of the PVS-Studio analyzer on the example of EFL Core Libraries, 10-15% false positives.
  6. PVS-Studio. Documentation. Mass suppression of analyzer messages.
  7. Ivan Andryashin. About how we tested static analysis on our project of an educational simulator of X-ray endovascular surgery.
  8. Pavel Eremeev, Svyatoslav Razmyslov. How the PVS-Studio team improved the Unreal Engine code.
  9. Andrey Karpov. Reasons to introduce the PVS-Studio static code analyzer into the development process.


If you want to share this article with an English-speaking audience, please use the translation link: Andrey Karpov. How to introduce a static code analyzer in a legacy project and not to discourage the team.

Source: habr.com
