Trojan Source attack to inject code changes that are invisible to the developer

Researchers at the University of Cambridge have published a technique for silently inserting malicious code into source code that passes peer review. The attack (CVE-2021-42574), named Trojan Source, relies on crafting text that looks different to the compiler or interpreter than it does to a person viewing the code. Working examples are demonstrated for compilers and interpreters for C, C++ (gcc and clang), C#, JavaScript (Node.js), Java (OpenJDK 16), Rust, Go, and Python.

The method uses special Unicode characters in code comments that change the display order of bidirectional text. These control characters cause some parts of the text to be displayed left to right and others right to left. Their legitimate everyday use is, for example, embedding Hebrew or Arabic strings in a source file. But when text runs with different directions are combined on one line using these characters, passages displayed right to left can visually overlap ordinary text displayed left to right.

With this method, an attacker can add a malicious construct to the code and then hide it from anyone viewing the code by placing right-to-left characters in the following comment or inside a string literal, so that entirely different characters appear in place of the malicious insert. The code remains valid, but it is displayed one way and interpreted another.

During code review, the developer sees the characters in visual order and finds an innocuous comment in a modern text editor, web interface, or IDE, whereas the compiler or interpreter uses the logical character order and processes the malicious insertion as is, paying no attention to the bidirectional text in comments. The problem affects popular code editors (VS Code, Emacs, Atom) as well as repository code-viewing interfaces (GitHub, GitLab, Bitbucket, and all Atlassian products).

The method can be used to carry out malicious actions in several ways: adding a hidden "return" statement that ends function execution early; commenting out code that appears to the reviewer to be live (for example, to disable important checks); and assigning different string values so that string comparisons fail.

For example, an attacker might propose a change that includes the following line, where {U+XXXX} denotes the Unicode character with that code point:

    if access_level != "user{U+202E} {U+2066}// Check if admin{U+2069} {U+2066}" {

The review interface will display it as:

    if access_level != "user" { // Check if admin
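To make the mismatch concrete, the logical content of that literal can be inspected directly. The following Python sketch rebuilds the string from the example above (the variable names are illustrative and not taken from the research) and shows that what the compiler compares against is not the four-character string "user" the reviewer sees.

```python
# Bidirectional control characters used in the example above.
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

# The string literal in logical order, as a compiler or interpreter reads it.
literal = "user" + RLO + " " + LRI + "// Check if admin" + PDI + " " + LRI

# What the reviewer believes the literal contains.
expected = "user"

print(literal == expected)  # False
# ascii() escapes the invisible characters so they show up in the output.
print(ascii(literal))
```

Printing with escapes, as `ascii()` does here, is one simple way to make such characters visible when inspecting a suspicious line.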

A second variant of the attack (CVE-2021-42694) uses homoglyphs: characters that look alike but differ in meaning and have different Unicode code points (for example, "ɑ" resembles "a", "ɡ" resembles "g", and "ɩ" resembles "l"). Some languages allow such characters in function and variable names, where they can mislead developers. For example, two functions with visually identical names can be defined to perform different actions. Without detailed analysis, it is not immediately clear which of the two is called at a given place.
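A rough way to spot such lookalikes is to list every non-ASCII character in an identifier together with its Unicode name. The sketch below is only an illustration: the identifier `sayHello` and the helper `describe` are hypothetical, and real projects that legitimately use non-ASCII identifiers would need a finer check than "anything outside ASCII".

```python
import unicodedata

def describe(identifier: str) -> list[str]:
    """Return 'U+XXXX NAME' for every non-ASCII character in the identifier."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"
        for ch in identifier
        if ord(ch) > 127
    ]

latin = "sayHello"
lookalike = "s\u0251yHello"  # U+0251 in place of the ASCII letter "a"

print(latin == lookalike)   # False
print(describe(latin))      # []
print(describe(lookalike))  # ['U+0251 LATIN SMALL LETTER ALPHA']
```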

As a protective measure, it is recommended that compilers, interpreters, and build tools that support Unicode emit an error or warning when comments, string literals, or identifiers contain unpaired control characters that change the text direction (U+202A, U+202B, U+202C, U+202D, U+202E, U+2066, U+2067, U+2068, U+2069, U+061C, U+200E and U+200F). Such characters should also be explicitly prohibited in programming language specifications and handled by code editors and repository interfaces.
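The recommended check for unpaired control characters could look roughly like the following Python sketch. It is a simplification and not the logic of any particular compiler: it treats each line independently, tracks embeddings/overrides and isolates with two separate counters, and ignores the standalone marks U+061C, U+200E and U+200F, which have no closing counterpart. The function name `has_unpaired_bidi` is hypothetical.

```python
EMBEDDING_OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO
ISOLATE_OPENERS = {"\u2066", "\u2067", "\u2068"}              # LRI, RLI, FSI
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING: closes an embedding or override
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: closes an isolate

def has_unpaired_bidi(line: str) -> bool:
    """Return True if the line leaves an embedding, override, or isolate open,
    or contains a terminator with nothing to close."""
    open_embeddings = 0
    open_isolates = 0
    for ch in line:
        if ch in EMBEDDING_OPENERS:
            open_embeddings += 1
        elif ch in ISOLATE_OPENERS:
            open_isolates += 1
        elif ch == PDF:
            if open_embeddings == 0:
                return True
            open_embeddings -= 1
        elif ch == PDI:
            if open_isolates == 0:
                return True
            open_isolates -= 1
    return open_embeddings > 0 or open_isolates > 0

# The line from the attack example: the override and the last isolate stay open.
attack = 'if access_level != "user\u202E \u2066// Check if admin\u2069 \u2066" {'
print(has_unpaired_bidi(attack))                    # True
print(has_unpaired_bidi('x = "\u2067שלום\u2069"'))  # False: the isolate is closed
```

Flagging only unbalanced sequences, rather than every control character, leaves room for legitimate right-to-left text in strings and comments.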

Addendum 1: Fixes have been prepared for GCC, LLVM/Clang, Rust, Go, Python, and binutils. GitHub, Bitbucket, and Jira have also fixed the issue. A fix for GitLab is in preparation. To find problematic code, the following command is suggested: grep -r $'[\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069]' /path/to/source

Addendum 2: Russ Cox, one of the developers of the Plan 9 OS and the Go programming language, criticized the excessive attention given to the described attack, which has long been known (Go, Rust, C++, Ruby) and was not taken seriously. According to Cox, the problem is mainly one of displaying information correctly in code editors and web interfaces, and it is solved by using the right tools and code analyzers during review. Instead of drawing attention to speculative attacks, he argues, it would be more appropriate to focus on improving review processes for code and dependencies.

Russ Cox also believes that compilers are the wrong place to fix the problem, because banning dangerous characters at the compiler level leaves a huge layer of tooling in which they remain acceptable, such as build systems, assemblers, package managers, and various configuration and data parsers. He cites the Rust project as an example: it prohibited LTR/RTL control characters in the compiler but did not add a fix to the Cargo package manager, which allows a similar attack through the Cargo.toml file. Likewise, files such as BUILD.bazel, CMakefile, Cargo.toml, Dockerfile, GNUmakefile, Makefile, go.mod, package.json, pom.xml, and requirements.txt can become attack vectors.
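In the spirit of that argument, a review-time check would need to cover every file in a repository, not only those a compiler reads. The following Python sketch walks a directory tree and reports lines containing any of the control characters listed above, regardless of file type. The function name `find_bidi_controls` is hypothetical, and the sketch assumes files are UTF-8 (undecodable bytes are skipped).

```python
from pathlib import Path

# The control characters listed in the recommendation above.
BIDI_CONTROLS = set(
    "\u061C\u200E\u200F\u202A\u202B\u202C\u202D\u202E\u2066\u2067\u2068\u2069"
)

def find_bidi_controls(root: str) -> list[tuple[str, int]]:
    """Return (path, line number) pairs for every line under root that
    contains a bidirectional control character, whatever the file type."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Assumption: files are UTF-8; bytes that do not decode are dropped.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.split("\n"), start=1):
            if BIDI_CONTROLS.intersection(line):
                hits.append((str(path), number))
    return hits
```

Calling `find_bidi_controls("/path/to/source")` would then flag manifests such as Cargo.toml or package.json alongside ordinary source files.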

Source: opennet.ru
