Build system failures caused by changed checksums of GitHub archives

GitHub changed how the automatically generated ".tar.gz" and ".tgz" archives on release pages are produced. The archives' checksums changed as a result, causing widespread failures in automated build systems that verify archives downloaded from GitHub against previously stored checksums, such as those recorded in package metadata or build scripts.
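The kind of integrity check that broke can be sketched roughly as follows. This is a minimal illustration in Python, not the code of any particular build system; the names sha256_hex and verify_archive and the placeholder byte strings are assumptions made for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_archive(data: bytes, expected_sha256: str) -> bool:
    """Check downloaded bytes against a checksum pinned earlier."""
    return sha256_hex(data) == expected_sha256.strip().lower()


# Placeholder bytes standing in for a real download (assumption).
first_download = b"archive bytes as originally served"
pinned = sha256_hex(first_download)  # what a port or build script would store

# The same release, regenerated with a different byte layout.
later_download = b"archive bytes after regeneration"

print(verify_archive(first_download, pinned))  # True
print(verify_archive(later_download, pinned))  # False
```

Any change to the served bytes, however small, makes such a check fail, even when the unpacked sources are unchanged.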

Starting with release 2.38, Git uses a built-in gzip implementation by default, which unified support for this compression method across operating systems and made archive creation faster. GitHub picked up the change when it updated the version of git in its infrastructure. The problem is that compressed archives produced by the built-in zlib-based implementation differ at the byte level from those produced by the gzip utility, so the "git archive" command yields archives with different checksums depending on the git version.
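The effect can be reproduced in miniature with Python's standard gzip module. The sketch below does not compare git's built-in implementation with the gzip utility; as a stand-in, it compresses the same sample payload with two different compression levels (an assumption chosen for illustration, as is the payload itself) and compares the results. It requires Python 3.8 or later for the mtime parameter.

```python
import gzip
import hashlib

# Arbitrary sample payload standing in for a tar stream (assumption).
payload = b"".join(f"file {i}: {i * i}\n".encode() for i in range(2000))

# mtime=0 keeps the gzip header timestamp fixed, so only the
# compression settings differ between the two outputs.
fast = gzip.compress(payload, compresslevel=1, mtime=0)
best = gzip.compress(payload, compresslevel=9, mtime=0)

same_bytes = fast == best
same_content = gzip.decompress(fast) == gzip.decompress(best) == payload

print("compressed bytes identical:", same_bytes)
print("decompressed content identical:", same_content)
print("checksums identical:",
      hashlib.sha256(fast).hexdigest() == hashlib.sha256(best).hexdigest())
```

Both outputs are valid gzip streams that unpack to the same data, yet their bytes, and therefore their checksums, should differ.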

As a result, after the git update at GitHub, release pages began serving slightly different archives that no longer matched the old checksums. The problem surfaced in various build systems, continuous integration systems, and toolkits for building packages from source. For example, about 5800 FreeBSD ports whose sources are downloaded from GitHub were broken.

In response to the first complaints about the failures, GitHub representatives initially pointed out that stable checksums for archives had never been guaranteed. After it was shown that restoring the affected build systems would require a massive amount of work to update metadata across various ecosystems, GitHub changed its mind, reverted the change, and brought back the old method of generating archives.

The Git developers have not yet reached a decision and are still discussing possible actions. The options under consideration include reverting to the external gzip utility by default; adding a "--stable" flag to maintain compatibility with older archives; tying the built-in implementation to a separate archive format; using the gzip utility for old commits and the built-in implementation for commits after a certain date; and guaranteeing format stability only for uncompressed archives.
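The last option suggests one way a consumer could verify archives without depending on the compressor: compute the checksum over the decompressed stream instead of the ".gz" file. The following Python sketch illustrates the idea under the assumption that the uncompressed tar stream itself stays stable; content_sha256 is a hypothetical helper and the payload is a placeholder.

```python
import gzip
import hashlib


def content_sha256(gz_bytes: bytes) -> str:
    """Hash the decompressed payload rather than the .gz container."""
    return hashlib.sha256(gzip.decompress(gz_bytes)).hexdigest()


# Placeholder for an uncompressed tar stream (assumption).
tar_stream = b"placeholder for an uncompressed tar stream\n" * 500

# Two differently compressed copies of the same stream.
archive_a = gzip.compress(tar_stream, compresslevel=1, mtime=0)
archive_b = gzip.compress(tar_stream, compresslevel=9, mtime=0)

print(content_sha256(archive_a) == content_sha256(archive_b))  # True
```

The trade-off is that the archive has to be decompressed before it can be verified.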

The decision is complicated by the fact that reverting to the external utility does not fully solve the problem of checksum stability, since a change in the external gzip program can also change the resulting archive. A set of patches is currently under review that restores the old behavior by default (calling the external gzip utility) and uses the built-in implementation when the gzip utility is not present on the system. The patches also add a note to the documentation stating that the output of "git archive" is not guaranteed to be stable and that the format may change in the future.
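The fallback logic described for the proposed patches can be illustrated with a short Python sketch. It is not git's actual implementation, which is written in C; compress_gzip is a hypothetical helper, and the "-cn" options are chosen here because they make gzip write to standard output and omit the file name and timestamp from the header.

```python
import gzip
import shutil
import subprocess


def compress_gzip(data: bytes) -> bytes:
    """Prefer the external gzip utility; fall back to the built-in one."""
    gzip_path = shutil.which("gzip")
    if gzip_path is not None:
        # -c writes to stdout, -n omits the name and timestamp from the header.
        result = subprocess.run(
            [gzip_path, "-cn"], input=data, capture_output=True, check=True
        )
        return result.stdout
    # No gzip on PATH: use the zlib-based implementation from the stdlib.
    return gzip.compress(data, mtime=0)


sample = b"example archive contents\n" * 100
compressed = compress_gzip(sample)
print(gzip.decompress(compressed) == sample)  # True on either path
```

Either path produces a valid gzip stream, but the exact bytes can differ between the two, which is why the stability caveat in the documentation still applies.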

Source: opennet.ru
