Missing commas found in 3.6% of tested Python repositories

A study has been published on how vulnerable Python code is to errors caused by misplaced commas. The problems stem from two language rules: adjacent string literals that are not separated by a comma are automatically concatenated, and a value followed by a comma is treated as a tuple. After an automated analysis of 666 GitHub repositories containing Python code, the researchers identified possible comma issues in 5% of the projects studied.
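The two language rules can be seen in a minimal sketch. The variable names are illustrative only and are not taken from the study:

```python
# Rule 1: adjacent string literals with no comma between them are joined
# into a single string, so this list has two elements, not three.
names = ["alpha", "beta" "gamma"]
print(len(names))   # 2
print(names[1])     # betagamma

# Rule 2: a trailing comma makes a tuple, with or without parentheses.
value = 1,
print(type(value).__name__)    # tuple

# Parentheses alone only group an expression; no comma means no tuple.
grouped = (1)
print(type(grouped).__name__)  # int
```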

Further manual inspection showed that real errors were present in only 24 repositories (3.6%); the remaining 1.4% were false positives (for example, a comma may be deliberately omitted between lines to concatenate multi-line file paths, long hashes, HTML blocks or SQL expressions). Notably, the 24 repositories with real errors included such large projects as TensorFlow, Google V8, Sentry, Pydata xarray, rapidpro, django-colorfield and django-helpdesk. Problems with commas are not specific to Python, however, and often crop up in C/C++ projects (recent fixes have been made in LLVM, Mono and TensorFlow).
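To illustrate why an automated scan produces false positives, here is a rough sketch of one possible heuristic. It is not the researchers' tool: the function name find_implicit_concatenations and the rule it applies (flag a string literal that directly follows another string literal on an earlier line inside brackets) are assumptions made for this example. It relies only on the standard tokenize module.

```python
import io
import tokenize

def find_implicit_concatenations(source):
    """Return (line, literal) pairs for string literals that directly follow
    another string literal from an earlier line, inside brackets."""
    hits = []
    depth = 0    # current nesting level of (), [] and {}
    prev = None  # previous token, ignoring layout and comments
    layout = {tokenize.NL, tokenize.NEWLINE, tokenize.COMMENT,
              tokenize.INDENT, tokenize.DEDENT}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in layout:
            continue
        if tok.type == tokenize.OP:
            if tok.string in ("(", "[", "{"):
                depth += 1
            elif tok.string in (")", "]", "}"):
                depth -= 1
        if (tok.type == tokenize.STRING
                and prev is not None
                and prev.type == tokenize.STRING
                and depth > 0
                and tok.start[0] > prev.end[0]):
            hits.append((tok.start[0], tok.string))
        prev = tok
    return hits

sample = '''paths = [
    "/releases"
    "/discover",
    "/issues",
]
'''
print(find_implicit_concatenations(sample))  # [(3, '"/discover"')]
```

A heuristic like this cannot tell an accidental omission from a deliberate multi-line SQL string or file path, which is why candidates still need the kind of manual review described above.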

The main types of errors studied:

  • An accidentally omitted comma in a list, tuple or set, which causes adjacent strings to be concatenated instead of being treated as separate values. For example, in Sentry, one of the tests was missing a comma between the strings "releases" and "discover" in a list, so the test checked a non-existent "/releasesdiscover" handler instead of checking "/releases" and "/discover" separately.

    Another example: a missing comma in rapidpro caused two different rules to be merged on line 572.

  • A missing comma at the end of a single-element tuple definition, so that a plain value is assigned rather than a tuple. For example, "values = (1,)" assigns a one-element tuple to the variable, whereas "values = (1)" assigns an integer. The parentheses in these assignments do not affect the type and are optional; the parser recognizes a tuple solely by the presence of commas. In the following setting, a string is assigned instead of a tuple:

        REST_FRAMEWORK = {
            'DEFAULT_PERMISSION_CLASSES': (
                'rest_framework.permissions.IsAuthenticated'  # a string is assigned instead of a tuple
            )
        }
  • The opposite situation: an extra comma in an assignment. If a comma is accidentally left at the end of an assignment, a tuple is assigned instead of the intended value (for example, "value = 1," instead of "value = 1").

Source: opennet.ru
