Finding bugs in LLVM 8 using the PVS-Studio analyzer

Finding bugs in LLVM 8 using the PVS-Studio analyzer
More than two years have passed since the last check of the LLVM project code using our PVS-Studio analyzer. Let's make sure that the PVS-Studio analyzer is still the leading tool for detecting errors and potential vulnerabilities. To do this, let's check and find new bugs in the LLVM 8.0.0 release.

Article to be written

To be honest, I did not want to write this article. It is not interesting to write about a project that we have repeatedly checked (1, 2, 3). It's better to write about something new, but I have no choice.

Every time a new version of LLVM is released or updated Clang Static Analyzer, we receive questions of the following type in the mail:

Look, the new version of Clang Static Analyzer has learned how to find new bugs! It seems to me that the relevance of using PVS-Studio is decreasing. Clang finds more bugs than before and catches up with PVS-Studio in terms of capabilities. What do you think about it?

To this I always want to answer something along the lines of:

We don't sit idle either! We have significantly improved the capabilities of the PVS-Studio analyzer. So don't worry, we continue to lead as before.

Unfortunately this is a bad answer. It has no proofs. And that is why I am writing this article now. So, the LLVM project was once again checked and various errors were found in it. Those that seemed interesting to me, I will now demonstrate. These errors cannot be found by Clang Static Analyzer (or it is extremely inconvenient to do it with its help). And we can. And I found and wrote out all these errors in one evening.

But the writing of the article was delayed for several weeks. I just couldn't bring myself to put it all in the form of text :).

By the way, if you are interested in what technologies are used in the PVS-Studio analyzer to detect errors and potential vulnerabilities, then I suggest you familiarize yourself with this note.

New and old diagnostics

As already noted, about two years ago, the LLVM project was once again checked, and the errors found were corrected. Now this article will introduce a new batch of errors. Why were new bugs found? There are 3 reasons for this:

  1. The LLVM project is evolving, old code is being changed and new code is appearing. Naturally, there are new errors in the modified and written code. This demonstrates well that static analysis should be applied regularly and not on a case-by-case basis. Our articles show well the capabilities of the PVS-Studio analyzer, but this has nothing to do with improving the quality of the code and reducing the cost of fixing errors. Use a static code analyzer regularly!
  2. We refine and improve existing diagnostics. Therefore, the analyzer can detect errors that were not noticed during previous checks.
  3. PVS-Studio has new diagnostics, which didn't exist 2 years ago. I decided to separate them into a separate section to visually show the development of PVS-Studio.

Defects identified by diagnostics that existed 2 years ago

Fragment N1: Copy-Paste

static bool ShouldUpgradeX86Intrinsic(Function *F, StringRef Name) {
  if (Name == "addcarryx.u32" || // Added in 8.0
    ....
    Name == "avx512.mask.cvtps2pd.128" || // Added in 7.0
    Name == "avx512.mask.cvtps2pd.256" || // Added in 7.0
    Name == "avx512.cvtusi2sd" || // Added in 7.0
    Name.startswith("avx512.mask.permvar.") || // Added in 7.0     // <=
    Name.startswith("avx512.mask.permvar.") || // Added in 7.0     // <=
    Name == "sse2.pmulu.dq" || // Added in 7.0
    Name == "sse41.pmuldq" || // Added in 7.0
    Name == "avx2.pmulu.dq" || // Added in 7.0
  ....
}

PVS-Studio warning: V501 [CWE-570] There are identical sub-expressions 'Name.startswith("avx512.mask.permvar.")' to the left and to the right of the '||' operator. AutoUpgrade.cpp 73

It is double checked that the name starts with the substring "avx512.mask.permvar.". In the second check, they clearly wanted to write something else, but forgot to correct the copied text.

Fragment N2: Typo

enum CXNameRefFlags {
  CXNameRange_WantQualifier = 0x1,
  CXNameRange_WantTemplateArgs = 0x2,
  CXNameRange_WantSinglePiece = 0x4
};

void AnnotateTokensWorker::HandlePostPonedChildCursor(
    CXCursor Cursor, unsigned StartTokenIndex) {
  const auto flags = CXNameRange_WantQualifier | CXNameRange_WantQualifier;
  ....
}

PVS-Studio warning: V501 There are identical sub-expressions 'CXNameRange_WantQualifier' to the left and to the right of the '|' operator. CIndex.cpp 7245

The same named constant is used twice due to a typo CXNameRange_WantQualifier.

Fragment N3: Operator precedence confusion

int PPCTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) {
  ....
  if (ISD == ISD::EXTRACT_VECTOR_ELT && Index == ST->isLittleEndian() ? 1 : 0)
    return 0;
  ....
}

PVS-Studio warning: V502 [CWE-783] Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '==' operator. PPCTargetTransformInfo.cpp 404

In my opinion, this is a very beautiful mistake. Yes, I know that I have strange ideas about beauty :).

Now, according to operator precedence, the expression is evaluated as follows:

(ISD == ISD::EXTRACT_VECTOR_ELT && (Index == ST->isLittleEndian())) ? 1 : 0

From a practical point of view, such a condition does not make sense, since it can be reduced to:

(ISD == ISD::EXTRACT_VECTOR_ELT && Index == ST->isLittleEndian())

This is a clear mistake. Most likely, they wanted to compare 0/1 with a variable Index. To fix the code, you need to add parentheses around the ternary operator:

if (ISD == ISD::EXTRACT_VECTOR_ELT && Index == (ST->isLittleEndian() ? 1 : 0))

By the way, the ternary operator is very dangerous and provokes logical errors. Be very careful with it and don't be greedy with parentheses. I have looked at this topic in more detail. here, in the chapter "Beware the ?: operator and enclose it in parentheses".

Fragment N4, N5: Null pointer

Init *TGParser::ParseValue(Record *CurRec, RecTy *ItemType, IDParseMode Mode) {
  ....
  TypedInit *LHS = dyn_cast<TypedInit>(Result);
  ....
  LHS = dyn_cast<TypedInit>(
    UnOpInit::get(UnOpInit::CAST, LHS, StringRecTy::get())
      ->Fold(CurRec));
  if (!LHS) {
    Error(PasteLoc, Twine("can't cast '") + LHS->getAsString() +
                    "' to string");
    return nullptr;
  }
  ....
}

PVS-Studio warning: V522 [CWE-476] Dereferencing of the null pointer 'LHS' might take place. TGParser.cpp 2152

If the pointer LHS is null, then a warning should be issued. However, instead, this very null pointer will be dereferenced: LHS->getAsString().

This is a very typical situation when an error is hidden in an error handler, since no one is testing them. Static analyzers check all reachable code, no matter how often it is used. This is a very good example of how static analysis complements other testing and error prevention techniques.

Similar pointer handling error RHS allowed in code just below: V522 [CWE-476] Dereferencing of the null pointer 'RHS' might take place. TGParser.cpp 2186

Fragment N6: Using the pointer after moving

static Expected<bool>
ExtractBlocks(....)
{
  ....
  std::unique_ptr<Module> ProgClone = CloneModule(BD.getProgram(), VMap);
  ....
  BD.setNewProgram(std::move(ProgClone));                                // <=
  MiscompiledFunctions.clear();

  for (unsigned i = 0, e = MisCompFunctions.size(); i != e; ++i) {
    Function *NewF = ProgClone->getFunction(MisCompFunctions[i].first);  // <=
    assert(NewF && "Function not found??");
    MiscompiledFunctions.push_back(NewF);
  }
  ....
}

PVS-Studio warning: V522 [CWE-476] Dereferencing of the null pointer 'ProgClone' might take place. miscompilation.cpp 601

Start smart pointer ProgClone ceases to own the object:

BD.setNewProgram(std::move(ProgClone));

In fact, now ProgClone is a null pointer. Therefore, a null pointer should be dereferenced just below:

Function *NewF = ProgClone->getFunction(MisCompFunctions[i].first);

But, in fact, this will not happen! Note that the loop is not actually being executed.

At the beginning of the container MiscompiledFunctions cleared:

MiscompiledFunctions.clear();

Next, the size of this container is used in the loop condition:

for (unsigned i = 0, e = MisCompFunctions.size(); i != e; ++i) {

It is easy to see that the loop is not running. I think this is also a mistake, and the code should be written differently.

It seems that we have met the very famous parity of errors! One mistake masks another :).

Fragment N7: Using the pointer after moving

static Expected<bool> TestOptimizer(BugDriver &BD, std::unique_ptr<Module> Test,
                                    std::unique_ptr<Module> Safe) {
  outs() << "  Optimizing functions being tested: ";
  std::unique_ptr<Module> Optimized =
      BD.runPassesOn(Test.get(), BD.getPassesToRun());
  if (!Optimized) {
    errs() << " Error running this sequence of passes"
           << " on the input program!n";
    BD.setNewProgram(std::move(Test));                       // <=
    BD.EmitProgressBitcode(*Test, "pass-error", false);      // <=
    if (Error E = BD.debugOptimizerCrash())
      return std::move(E);
    return false;
  }
  ....
}

PVS-Studio warning: V522 [CWE-476] Dereferencing of the null pointer 'Test' might take place. miscompilation.cpp 709

Again the same situation. At the beginning, the contents of the object are moved, and then it is used as if nothing had happened. I see this situation more and more often in program code, after C++ introduced move semantics. This is why I love the C++ language! There are more and more ways to shoot yourself in the foot. The PVS-Studio analyzer will always have a job :).

Fragment N8: Null pointer

void FunctionDumper::dump(const PDBSymbolTypeFunctionArg &Symbol) {
  uint32_t TypeId = Symbol.getTypeId();
  auto Type = Symbol.getSession().getSymbolById(TypeId);
  if (Type)
    Printer << "<unknown-type>";
  else
    Type->dump(*this);
}

PVS-Studio warning: V522 [CWE-476] Dereferencing of the null pointer 'Type' might take place. PrettyFunctionDumper.cpp 233

In addition to error handlers, debug printout functions are usually not tested either. We have just such a case. The function is waiting for the user, who, instead of solving their problems, will be forced to fix it.

Correctly:

if (Type)
  Type->dump(*this);
else
  Printer << "<unknown-type>";

Fragment N9: Null pointer

void SearchableTableEmitter::collectTableEntries(
    GenericTable &Table, const std::vector<Record *> &Items) {
  ....
  RecTy *Ty = resolveTypes(Field.RecType, TI->getType());
  if (!Ty)                                                              // <=
    PrintFatalError(Twine("Field '") + Field.Name + "' of table '" +
                    Table.Name + "' has incompatible type: " +
                    Ty->getAsString() + " vs. " +                       // <=
                    TI->getType()->getAsString());
   ....
}

PVS-Studio warning: V522 [CWE-476] Dereferencing of the null pointer 'Ty' might take place. SearchableTableEmitter.cpp 614

I think that everything is clear and does not require explanation.

Fragment N10: Typo

bool FormatTokenLexer::tryMergeCSharpNullConditionals() {
  ....
  auto &Identifier = *(Tokens.end() - 2);
  auto &Question = *(Tokens.end() - 1);
  ....
  Identifier->ColumnWidth += Question->ColumnWidth;
  Identifier->Type = Identifier->Type;                    // <=
  Tokens.erase(Tokens.end() - 1);
  return true;
}

PVS-Studio warning: V570 The 'Identifier->Type' variable is assigned to itself. FormatTokenLexer.cpp 249

There is no point in assigning a variable to itself. They probably meant to write:

Identifier->Type = Question->Type;

Fragment N11: Suspicious break

void SystemZOperand::print(raw_ostream &OS) const {
  switch (Kind) {
    break;
  case KindToken:
    OS << "Token:" << getToken();
    break;
  case KindReg:
    OS << "Reg:" << SystemZInstPrinter::getRegisterName(getReg());
    break;
  ....
}

PVS-Studio warning: V622 [CWE-478] Consider inspecting the 'switch' statement. It's possible that the first 'case' operator is missing. SystemZAsmParser.cpp 652

There is a very suspicious operator at the beginning break. Did you forget to write something else here?

Fragment N12: Pointer check after dereference

InlineCost AMDGPUInliner::getInlineCost(CallSite CS) {
  Function *Callee = CS.getCalledFunction();
  Function *Caller = CS.getCaller();
  TargetTransformInfo &TTI = TTIWP->getTTI(*Callee);

  if (!Callee || Callee->isDeclaration())
    return llvm::InlineCost::getNever("undefined callee");
  ....
}

PVS-Studio warning: V595 [CWE-476] The 'Callee' pointer was utilized before it was verified against nullptr. Check lines: 172, 174. AMDGPUInline.cpp 172

Index callee at the beginning is dereferenced at the time of the function call getTTI.

And then it turns out that this pointer should be checked for equality nullptr:

if (!Callee || Callee->isDeclaration())

But it's' too late…

Fragment N13 - N…: Pointer check after dereference

The situation discussed in the previous code snippet is not unique. She meets here:

static Value *optimizeDoubleFP(CallInst *CI, IRBuilder<> &B,
                               bool isBinary, bool isPrecise = false) {
  ....
  Function *CalleeFn = CI->getCalledFunction();
  StringRef CalleeNm = CalleeFn->getName();                 // <=
  AttributeList CalleeAt = CalleeFn->getAttributes();
  if (CalleeFn && !CalleeFn->isIntrinsic()) {               // <=
  ....
}

PVS-Studio warning: V595 [CWE-476] The 'CalleeFn' pointer was utilized before it was verified against nullptr. Check lines: 1079, 1081. SimplifyLibCalls.cpp 1079

And here:

void Sema::InstantiateAttrs(const MultiLevelTemplateArgumentList &TemplateArgs,
                            const Decl *Tmpl, Decl *New,
                            LateInstantiatedAttrVec *LateAttrs,
                            LocalInstantiationScope *OuterMostScope) {
  ....
  NamedDecl *ND = dyn_cast<NamedDecl>(New);
  CXXRecordDecl *ThisContext =
    dyn_cast_or_null<CXXRecordDecl>(ND->getDeclContext());         // <=
  CXXThisScopeRAII ThisScope(*this, ThisContext, Qualifiers(),
                             ND && ND->isCXXInstanceMember());     // <=
  ....
}

PVS-Studio warning: V595 [CWE-476] The 'ND' pointer was utilized before it was verified against nullptr. Check lines: 532, 534. SemaTemplateInstantiateDecl.cpp 532

And here:

  • V595 [CWE-476] The 'U' pointer was utilized before it was verified against nullptr. Check lines: 404, 407. DWARFFormValue.cpp 404
  • V595 [CWE-476] The 'ND' pointer was utilized before it was verified against nullptr. Check lines: 2149, 2151. SemaTemplateInstantiate.cpp 2149

And then it became uninteresting for me to study warnings with the number V595. So I don't know if there are other similar errors besides those listed here. Most likely there is.

Fragment N17, N18: Suspicious shift

static inline bool processLogicalImmediate(uint64_t Imm, unsigned RegSize,
                                           uint64_t &Encoding) {
  ....
  unsigned Size = RegSize;
  ....
  uint64_t NImms = ~(Size-1) << 1;
  ....
}

PVS-Studio warning: V629 [CWE-190] Consider inspecting the '~(Size - 1) << 1' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. AArch64AddressingModes.h 260

Perhaps this is not a bug, and the code works exactly as intended. But this is clearly a very suspicious place and needs to be checked.

Let's say a variable Size is 16, and then the author of the code planned to get in the variable NImms value:

1111111111111111111111111111111111111111111111111111111111100000

However, in reality, the value will be:

0000000000000000000000000000000011111111111111111111111111100000

The matter is that all calculations occur with use of 32-bit unsigned type. And only then, this 32-bit unsigned type will be implicitly expanded to uint64_t. In this case, the high bits will be zero.

You can fix the situation like this:

uint64_t NImms = ~static_cast<uint64_t>(Size-1) << 1;

Similar situation: V629 [CWE-190] Consider inspecting the 'Immr << 6' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. AArch64AddressingModes.h 269

Fragment N19: Keyword missing else?

void AMDGPUAsmParser::cvtDPP(MCInst &Inst, const OperandVector &Operands) {
  ....
  if (Op.isReg() && Op.Reg.RegNo == AMDGPU::VCC) {
    // VOP2b (v_add_u32, v_sub_u32 ...) dpp use "vcc" token.
    // Skip it.
    continue;
  } if (isRegOrImmWithInputMods(Desc, Inst.getNumOperands())) {    // <=
    Op.addRegWithFPInputModsOperands(Inst, 2);
  } else if (Op.isDPPCtrl()) {
    Op.addImmOperands(Inst, 1);
  } else if (Op.isImm()) {
    // Handle optional arguments
    OptionalIdx[Op.getImmTy()] = I;
  } else {
    llvm_unreachable("Invalid operand type");
  }
  ....
}

PVS-Studio warning: V646 [CWE-670] Consider inspecting the application's logic. It's possible that 'else' keyword is missing. AMDGPUAsmParser.cpp 5655

There is no error here. Since the then-block of the first if ends in keep on going, it doesn't matter, there is a keyword else or not. Either way, the code will work the same way. However, missed else makes the code more obscure and dangerous. If in the future keep on going disappear, the code will start to work in a completely different way. I think it's better to add else.

Fragment N20: Four typos of the same type

LLVM_DUMP_METHOD void Symbol::dump(raw_ostream &OS) const {
  std::string Result;
  if (isUndefined())
    Result += "(undef) ";
  if (isWeakDefined())
    Result += "(weak-def) ";
  if (isWeakReferenced())
    Result += "(weak-ref) ";
  if (isThreadLocalValue())
    Result += "(tlv) ";
  switch (Kind) {
  case SymbolKind::GlobalSymbol:
    Result + Name.str();                        // <=
    break;
  case SymbolKind::ObjectiveCClass:
    Result + "(ObjC Class) " + Name.str();      // <=
    break;
  case SymbolKind::ObjectiveCClassEHType:
    Result + "(ObjC Class EH) " + Name.str();   // <=
    break;
  case SymbolKind::ObjectiveCInstanceVariable:
    Result + "(ObjC IVar) " + Name.str();       // <=
    break;
  }
  OS << Result;
}

PVS-Studio warnings:

  • V655 [CWE-480] The strings were concatenated but are not utilized. Consider inspecting the 'Result + Name.str()' expression. Symbol.cpp 32
  • V655 [CWE-480] The strings were concatenated but are not utilized. Consider inspecting the 'Result + "(ObjC Class) " + Name.str()' expression. Symbol.cpp 35
  • V655 [CWE-480] The strings were concatenated but are not utilized. Consider inspecting the 'Result + "(ObjC Class EH) " + Name.str()' expression. Symbol.cpp 38
  • V655 [CWE-480] The strings were concatenated but are not utilized. Consider inspecting the 'Result + "(ObjC IVar) " + Name.str()' expression. Symbol.cpp 41

By chance, the + operator is used instead of the += operator. The result is constructs that are devoid of meaning.

Fragment N21: Undefined behavior

static void getReqFeatures(std::map<StringRef, int> &FeaturesMap,
                           const std::vector<Record *> &ReqFeatures) {
  for (auto &R : ReqFeatures) {
    StringRef AsmCondString = R->getValueAsString("AssemblerCondString");

    SmallVector<StringRef, 4> Ops;
    SplitString(AsmCondString, Ops, ",");
    assert(!Ops.empty() && "AssemblerCondString cannot be empty");

    for (auto &Op : Ops) {
      assert(!Op.empty() && "Empty operator");
      if (FeaturesMap.find(Op) == FeaturesMap.end())
        FeaturesMap[Op] = FeaturesMap.size();
    }
  }
}

Try to find the dangerous code yourself. And this is a picture to distract attention so as not to immediately look at the answer:

Finding bugs in LLVM 8 using the PVS-Studio analyzer

PVS-Studio warning: V708 [CWE-758] Dangerous construction is used: 'FeaturesMap[Op] = FeaturesMap.size()', where 'FeaturesMap' is of 'map' class. This may lead to undefined behavior. RISCVCompressInstEmitter.cpp 490

Problematic line:

FeaturesMap[Op] = FeaturesMap.size();

If element Op not found, then a new element is created in the map and the number of elements in this map is written there. It's just unknown whether the function will be called size before or after adding a new element.

Fragment N22-N24: Repeated Assignments

Error MachOObjectFile::checkSymbolTable() const {
  ....
  } else {
    MachO::nlist STE = getSymbolTableEntry(SymDRI);
    NType = STE.n_type;                              // <=
    NType = STE.n_type;                              // <=
    NSect = STE.n_sect;
    NDesc = STE.n_desc;
    NStrx = STE.n_strx;
    NValue = STE.n_value;
  }
  ....
}

PVS-Studio warning: V519 [CWE-563] The 'NType' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 1663, 1664. MachOObjectFile.cpp 1664

I don't think there is a real error here. Just an extra repeated assignment. But still a bug.

Similar to:

  • V519 [CWE-563] The 'B.NDesc' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 1488, 1489. llvm-nm.cpp 1489
  • V519 [CWE-563] The variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 59, 61. coff2yaml.cpp 61

Fragment N25-N27: More re-assignments

Now consider a slightly different version of reassignment.

bool Vectorizer::vectorizeLoadChain(
    ArrayRef<Instruction *> Chain,
    SmallPtrSet<Instruction *, 16> *InstructionsProcessed) {
  ....
  unsigned Alignment = getAlignment(L0);
  ....
  unsigned NewAlign = getOrEnforceKnownAlignment(L0->getPointerOperand(),
                                                 StackAdjustedAlignment,
                                                 DL, L0, nullptr, &DT);
  if (NewAlign != 0)
    Alignment = NewAlign;
  Alignment = NewAlign;
  ....
}

PVS-Studio warning: V519 [CWE-563] The 'Alignment' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 1158, 1160. LoadStoreVectorizer.cpp 1160

This is a very strange code that apparently contains a logical error. At the beginning, variable Alignment is assigned a value depending on the condition. And then the assignment happens again, but now without any check.

Similar situations can be seen here:

  • V519 [CWE-563] The 'Effects' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 152, 165. WebAssemblyRegStackify.cpp 165
  • V519 [CWE-563] The 'ExpectNoDerefChunk' variable is assigned values ​​twice successively. Perhaps this is a mistake. Check lines: 4970, 4973. SemaType.cpp 4973

Fragment N28: Always true condition

static int readPrefixes(struct InternalInstruction* insn) {
  ....
  uint8_t byte = 0;
  uint8_t nextByte;
  ....
  if (byte == 0xf3 && (nextByte == 0x88 || nextByte == 0x89 ||
                       nextByte == 0xc6 || nextByte == 0xc7)) {
    insn->xAcquireRelease = true;
    if (nextByte != 0x90) // PAUSE instruction support             // <=
      break;
  }
  ....
}

PVS-Studio warning: V547 [CWE-571] Expression 'nextByte != 0x90' is always true. X86DisassemblerDecoder.cpp 379

The check doesn't make sense. Variable nextByte always not equal to value 0x90, which follows from the previous check. This is some kind of logical error.

Fragment N29 - N...: Always true/false conditions

The analyzer gives a lot of warnings that the entire condition (V547) or part of it (V560) is always true or false. Often these are not real errors, but simply sloppy code, the result of macro expansion, and the like. However, it makes sense to look at all these warnings, since from time to time there are real logical errors. For example, this piece of code is suspicious:

static DecodeStatus DecodeGPRPairRegisterClass(MCInst &Inst, unsigned RegNo,
                                   uint64_t Address, const void *Decoder) {
  DecodeStatus S = MCDisassembler::Success;

  if (RegNo > 13)
    return MCDisassembler::Fail;

  if ((RegNo & 1) || RegNo == 0xe)
     S = MCDisassembler::SoftFail;
  ....
}

PVS-Studio warning: V560 [CWE-570] A part of conditional expression is always false: RegNo == 0xe. ARMDisassembler.cpp 939

The constant 0xE is the value 14 in decimal. Examination RegNo == 0xe doesn't make sense, because if RegNo > 13, then the function will complete its execution.

There were many other warnings with ID V547 and V560, but as with V595, I was not interested in studying these warnings. It was already clear that I had enough material to write an article :). Therefore, it is not known how many errors of this type can be detected in LLVM using PVS-Studio.

I will give an example why it is boring to study these responses. The analyzer is absolutely right in issuing a warning for the following code. But this is not a mistake.

bool UnwrappedLineParser::parseBracedList(bool ContinueOnSemicolons,
                                          tok::TokenKind ClosingBraceKind) {
  bool HasError = false;
  ....
  HasError = true;
  if (!ContinueOnSemicolons)
    return !HasError;
  ....
}

PVS-Studio warning: V547 [CWE-570] Expression '!HasError' is always false. UnwrappedLineParser.cpp 1635

Fragment N30: ​​Suspicious return

static bool
isImplicitlyDef(MachineRegisterInfo &MRI, unsigned Reg) {
  for (MachineRegisterInfo::def_instr_iterator It = MRI.def_instr_begin(Reg),
      E = MRI.def_instr_end(); It != E; ++It) {
    return (*It).isImplicitDef();
  }
  ....
}

PVS-Studio warning: V612 [CWE-670] An unconditional 'return' within a loop. R600OptimizeVectorRegisters.cpp 63

This is either a bug or a specific technique that is designed to explain something to programmers who read the code. This design does not explain anything to me and looks very suspicious. It's better not to write like that :).

Tired? Then it's time to make tea or coffee.

Finding bugs in LLVM 8 using the PVS-Studio analyzer

Defects revealed by new diagnostics

I think 30 triggers of the old diagnostics is enough. Now let's see what interesting things can be found with new diagnostics that appeared in the analyzer after previous checks. In total, 66 general-purpose diagnostics have been added to the C++ analyzer during this time.

Fragment N31: Unreachable code

Error CtorDtorRunner::run() {
  ....
  if (auto CtorDtorMap =
          ES.lookup(JITDylibSearchList({{&JD, true}}), std::move(Names),
                    NoDependenciesToRegister, true))
  {
    ....
    return Error::success();
  } else
    return CtorDtorMap.takeError();

  CtorDtorsByPriority.clear();

  return Error::success();
}

PVS-Studio warning: V779 [CWE-561] Unreachable code detected. It is possible that an error is present. ExecutionUtils.cpp 146

As you can see, both operator branches if end with an operator call return. Accordingly, the container CtorDtorsByPriority will never be cleared.

Fragment N32: Unreachable code

bool LLParser::ParseSummaryEntry() {
  ....
  switch (Lex.getKind()) {
  case lltok::kw_gv:
    return ParseGVEntry(SummaryID);
  case lltok::kw_module:
    return ParseModuleEntry(SummaryID);
  case lltok::kw_typeid:
    return ParseTypeIdEntry(SummaryID);                        // <=
    break;                                                     // <=
  default:
    return Error(Lex.getLoc(), "unexpected summary kind");
  }
  Lex.setIgnoreColonInIdentifiers(false);                      // <=
  return false;
}

PVS-Studio warning: V779 [CWE-561] Unreachable code detected. It is possible that an error is present. LLParser.cpp 835

An interesting situation. Let's take a look at this place first:

return ParseTypeIdEntry(SummaryID);
break;

At first glance, it seems that there is no error here. It looks like the operator break is superfluous and can be simply removed. However, not all so simple.

The analyzer issues a warning on the lines:

Lex.setIgnoreColonInIdentifiers(false);
return false;

Indeed, this code is unreachable. All cases in Switch end with operator call return. And now senseless lonely break doesn't look so harmless! Perhaps one of the branches should end with break, not on return?

Fragment N33: Random zeroing of high bits

unsigned getStubAlignment() override {
  if (Arch == Triple::systemz)
    return 8;
  else
    return 1;
}

Expected<unsigned>
RuntimeDyldImpl::emitSection(const ObjectFile &Obj,
                             const SectionRef &Section,
                             bool IsCode) {
  ....
  uint64_t DataSize = Section.getSize();
  ....
  if (StubBufSize > 0)
    DataSize &= ~(getStubAlignment() - 1);
  ....
}

PVS-Studio warning: V784 The size of the bit mask is less than the size of the first operand. This will cause the loss of higher bits. RuntimeDyld.cpp 815

Please note that the function getStubAlignment returns type unsigned. Let's calculate the value of the expression, assuming that the function returns the value 8:

~(getStubAlignment() - 1)

~(8u-1)

0xFFFFFFF8u

Now notice that the variable DataSize has a 64-bit unsigned type. It turns out that when performing the DataSize & 0xFFFFFFF8u operation, all thirty-two high-order bits will be set to zero. Most likely, this is not what the programmer wanted. I suspect that he wanted to calculate: DataSize & 0xFFFFFFFFFFFFFFF8u.

To fix the error, you should write like this:

DataSize &= ~(static_cast<uint64_t>(getStubAlignment()) - 1);

Or so:

DataSize &= ~(getStubAlignment() - 1ULL);

Fragment N34: Unsuccessful Explicit Type Casting

template <typename T>
void scaleShuffleMask(int Scale, ArrayRef<T> Mask,
                      SmallVectorImpl<T> &ScaledMask) {
  assert(0 < Scale && "Unexpected scaling factor");
  int NumElts = Mask.size();
  ScaledMask.assign(static_cast<size_t>(NumElts * Scale), -1);
  ....
}

PVS-Studio warning: V1028 [CWE-190] Possible overflow. Consider casting operands of the 'NumElts * Scale' operator to the 'size_t' type, not the result. X86ISelLowering.h 1577

Explicit type casting is used to avoid overflow when multiplying type variables int. However, here explicit type casting does not protect against overflow. At the beginning, the variables will be multiplied, and only then the 32-bit result of the multiplication will be expanded to type size_t.

Fragment N35: Unsuccessful Copy-Paste

Instruction *InstCombiner::visitFCmpInst(FCmpInst &I) {
  ....
  if (!match(Op0, m_PosZeroFP()) && isKnownNeverNaN(Op0, &TLI)) {
    I.setOperand(0, ConstantFP::getNullValue(Op0->getType()));
    return &I;
  }
  if (!match(Op1, m_PosZeroFP()) && isKnownNeverNaN(Op1, &TLI)) {
    I.setOperand(1, ConstantFP::getNullValue(Op0->getType()));        // <=
    return &I;
  }
  ....
}

V778 [CWE-682] Two similar code fragments were found. Perhaps this is a typo and 'Op1' variable should be used instead of 'Op0'. InstCombineCompares.cpp 5507

This new interesting diagnostic reveals situations when a piece of code was copied, and some names were changed in it, but they were not corrected in one place.

Please note that in the second block we changed Op0 on Op1. But in one place they did not fix it. Most likely it should have been written like this:

if (!match(Op1, m_PosZeroFP()) && isKnownNeverNaN(Op1, &TLI)) {
  I.setOperand(1, ConstantFP::getNullValue(Op1->getType()));
  return &I;
}

Fragment N36: Confusion in variables

struct Status {
  unsigned Mask;
  unsigned Mode;

  Status() : Mask(0), Mode(0){};

  Status(unsigned Mask, unsigned Mode) : Mask(Mask), Mode(Mode) {
    Mode &= Mask;
  };
  ....
};

PVS-Studio warning: V1001 [CWE-563] The 'Mode' variable is assigned but is not used by the end of the function. SIModeRegister.cpp 48

It is very dangerous to give function arguments the same names as class members. It's very easy to get confused. We have just such a case. This expression doesn't make sense:

Mode &= Mask;

The function argument changes. And that's it. This argument is no longer used. Most likely, it should have been written like this:

Status(unsigned Mask, unsigned Mode) : Mask(Mask), Mode(Mode) {
  this->Mode &= Mask;
};

Fragment N37: Confusion in variables

class SectionBase {
  ....
  uint64_t Size = 0;
  ....
};

class SymbolTableSection : public SectionBase {
  ....
};

void SymbolTableSection::addSymbol(Twine Name, uint8_t Bind, uint8_t Type,
                                   SectionBase *DefinedIn, uint64_t Value,
                                   uint8_t Visibility, uint16_t Shndx,
                                   uint64_t Size) {
  ....
  Sym.Value = Value;
  Sym.Visibility = Visibility;
  Sym.Size = Size;
  Sym.Index = Symbols.size();
  Symbols.emplace_back(llvm::make_unique<Symbol>(Sym));
  Size += this->EntrySize;
}

PVS-Studio warning: V1001 [CWE-563] The 'Size' variable is assigned but is not used by the end of the function. Object.cpp 424

The situation is similar to the previous one. Should be written:

this->Size += this->EntrySize;

Fragment N38-N47: Forgot to check the pointer

Previously, we considered examples of triggering diagnostics V595. Its essence is that the pointer is dereferenced at the beginning, and only then it is checked. Young diagnostics V1004 is the opposite of it in meaning, but also reveals a lot of errors. It detects situations when the pointer was checked at the beginning and then forgotten to do so. Consider such cases found inside LLVM.

int getGEPCost(Type *PointeeType, const Value *Ptr,
               ArrayRef<const Value *> Operands) {
  ....
  if (Ptr != nullptr) {                                            // <=
    assert(....);
    BaseGV = dyn_cast<GlobalValue>(Ptr->stripPointerCasts());
  }
  bool HasBaseReg = (BaseGV == nullptr);

  auto PtrSizeBits = DL.getPointerTypeSizeInBits(Ptr->getType());  // <=
  ....
}

PVS-Studio warning: V1004 [CWE-476] The 'Ptr' pointer was used unsafely after it was verified against nullptr. Check lines: 729, 738. TargetTransformInfoImpl.h 738

Variable Ptr may be equal nullptr, as evidenced by the check:

if (Ptr != nullptr)

However, below this pointer is dereferenced without a preliminary check:

auto PtrSizeBits = DL.getPointerTypeSizeInBits(Ptr->getType());

Let's consider another similar case.

llvm::DISubprogram *CGDebugInfo::getFunctionFwdDeclOrStub(GlobalDecl GD,
                                                          bool Stub) {
  ....
  auto *FD = dyn_cast<FunctionDecl>(GD.getDecl());
  SmallVector<QualType, 16> ArgTypes;
  if (FD)                                                                // <=
    for (const ParmVarDecl *Parm : FD->parameters())
      ArgTypes.push_back(Parm->getType());
  CallingConv CC = FD->getType()->castAs<FunctionType>()->getCallConv(); // <=
  ....
}

PVS-Studio warning: V1004 [CWE-476] The 'FD' pointer was used unsafely after it was verified against nullptr. Check lines: 3228, 3231. CGDebugInfo.cpp 3231

Pay attention to the pointer FD. I am sure the problem is clearly visible, and no special explanation is required.

And more:

static void computePolynomialFromPointer(Value &Ptr, Polynomial &Result,
                                         Value *&BasePtr,
                                         const DataLayout &DL) {
  PointerType *PtrTy = dyn_cast<PointerType>(Ptr.getType());
  if (!PtrTy) {                                                   // <=
    Result = Polynomial();
    BasePtr = nullptr;
  }
  unsigned PointerBits =
      DL.getIndexSizeInBits(PtrTy->getPointerAddressSpace());     // <=
  ....
}

PVS-Studio warning: V1004 [CWE-476] The 'PtrTy' pointer was used unsafely after it was verified against nullptr. Check lines: 960, 965. InterleavedLoadCombinePass.cpp 965

How to protect yourself from such errors? Be more attentive at Code-Review and use the PVS-Studio static analyzer for regular code reviews.

There is no point in citing other code fragments with errors of this type. I will leave only a list of warnings in the article:

  • V1004 [CWE-476] The 'Expr' pointer was used unsafely after it was verified against nullptr. Check lines: 1049, 1078. DebugInfoMetadata.cpp 1078
  • V1004 [CWE-476] The 'PI' pointer was used unsafely after it was verified against nullptr. Check lines: 733, 753. LegacyPassManager.cpp 753
  • V1004 [CWE-476] The 'StatepointCall' pointer was used unsafely after it was verified against nullptr. Check lines: 4371, 4379. Verifier.cpp 4379
  • V1004 [CWE-476] The 'RV' pointer was used unsafely after it was verified against nullptr. Check lines: 2263, 2268. TGParser.cpp 2268
  • V1004 [CWE-476] The 'CalleeFn' pointer was used unsafely after it was verified against nullptr. Check lines: 1081, 1096. SimplifyLibCalls.cpp 1096
  • V1004 [CWE-476] The 'TC' pointer was used unsafely after it was verified against nullptr. Check lines: 1819, 1824. Driver.cpp 1824

Fragment N48-N60: Not critical, but a defect (possible memory leak)

std::unique_ptr<IRMutator> createISelMutator() {
  ....
  std::vector<std::unique_ptr<IRMutationStrategy>> Strategies;
  Strategies.emplace_back(
      new InjectorIRStrategy(InjectorIRStrategy::getDefaultOps()));
  ....
}

PVS-Studio warning: V1023 [CWE-460] A pointer without owner is added to the 'Strategies' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-isel-fuzzer.cpp 58

To add an element to the end of a type container std::vector > you can't just write xxx.push_back(new X), since there is no implicit conversion from X* в std::unique_ptr.

A common solution is to write xxx.emplace_back(new X), since it compiles: method emplace_back constructs the element directly from the arguments and can therefore use explicit constructors.

It is not safe. If the vector is full, then memory is re-allocated. The reallocate operation may fail, resulting in an exception being thrown. std::bad_alloc. In this case, the pointer will be lost and the created object will never be deleted.

The safe solution is to create unique_ptr, which will own the pointer before the vector tries to reallocate the memory:

xxx.push_back(std::unique_ptr<X>(new X))

Since C++14, 'std::make_unique' can be used:

xxx.push_back(std::make_unique<X>())

This type of defect is not critical for LLVM. If it fails to allocate memory, then the compiler will simply stop. However, for applications with a long uptime, which can't just exit if memory allocation fails, this can be a real nasty bug.

So, although this code does not pose a practical danger to LLVM, I found it useful to talk about this error pattern and that the PVS-Studio analyzer has learned how to detect it.

Other warnings of this type:

  • V1023 [CWE-460] A pointer without owner is added to the 'Passes' container by the 'emplace_back' method. A memory leak will occur in case of an exception. PassManager.h 546
  • V1023 [CWE-460] A pointer without owner is added to the 'AAs' container by the 'emplace_back' method. A memory leak will occur in case of an exception. AliasAnalysis.h 324
  • V1023 [CWE-460] A pointer without owner is added to the 'Entries' container by the 'emplace_back' method. A memory leak will occur in case of an exception. DWARFDebugFrame.cpp 519
  • V1023 [CWE-460] A pointer without owner is added to the 'AllEdges' container by the 'emplace_back' method. A memory leak will occur in case of an exception. CFGMST.h 268
  • V1023 [CWE-460] A pointer without owner is added to the 'VMaps' container by the 'emplace_back' method. A memory leak will occur in case of an exception. SimpleLoopUnswitch.cpp 2012
  • V1023 [CWE-460] A pointer without owner is added to the 'Records' container by the 'emplace_back' method. A memory leak will occur in case of an exception. FDRLogBuilder.h 30
  • V1023 [CWE-460] A pointer without owner is added to the 'PendingSubmodules' container by the 'emplace_back' method. A memory leak will occur in case of an exception. ModuleMap.cpp 810
  • V1023 [CWE-460] A pointer without owner is added to the 'Objects' container by the 'emplace_back' method. A memory leak will occur in case of an exception. DebugMap.cpp 88
  • V1023 [CWE-460] A pointer without owner is added to the 'Strategies' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-isel-fuzzer.cpp 60
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 685
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 686
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 688
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 689
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 690
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 691
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 692
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 693
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 694
  • V1023 [CWE-460] A pointer without owner is added to the 'Operands' container by the 'emplace_back' method. A memory leak will occur in case of an exception. GlobalISelEmitter.cpp 1911
  • V1023 [CWE-460] A pointer without owner is added to the 'Stash' container by the 'emplace_back' method. A memory leak will occur in case of an exception. GlobalISelEmitter.cpp 2100
  • V1023 [CWE-460] A pointer without owner is added to the 'Matchers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. GlobalISelEmitter.cpp 2702

Conclusion

In total, I wrote out 60 warnings, after which I stopped. Are there any other defects that the PVS-Studio analyzer detects in LLVM? Yes, I have. However, when I wrote out the code snippets for the article, it was late evening, or rather, even night, and I decided that it was time to wrap up.

I hope you found it interesting and would like to try the PVS-Studio analyzer.

You can download the analyzer and get the trawl key at this page.

Most importantly, use static analysis regularly. One-time checks, which we carry out in order to popularize the methodology of static analysis and PVS-Studio are not a normal scenario.

Good luck in improving the quality and reliability of the code!

Finding bugs in LLVM 8 using the PVS-Studio analyzer

If you want to share this article with an English-speaking audience, please use the translation link: Andrey Karpov. Finding Bugs in LLVM 8 with PVS-Studio.

Source: habr.com

Add a comment