GitHub has created a thousand-year repository in which to save Open Source repositories for posterity

The former coal mine that houses the Arctic World Archive vault. Photo credit: Guy Martin/Bloomberg Businessweek

Free software is the cornerstone of modern civilization and the common heritage of all mankind. The mission of the GitHub Archive Program is to preserve this code for future generations, so that the fate of the Library of Alexandria is never repeated.

To do this, GitHub will keep many backups on different media, including long-term storage in the Arctic Code Vault on Svalbard. The vault is located in a former coal mine, 250 meters deep in the permafrost, and is designed for a storage life of at least 1,000 years.

A snapshot of humanity's programming code will be taken on February 2, 2020.

The preservation project was launched together with the Long Now Foundation, the Internet Archive, the Software Heritage Foundation, the Arctic World Archive and other partners.

Project LOCKSS

Code that is vital today can be forgotten or lost over time. The worst case is a global catastrophe in which we lose everything stored on "ephemeral" media: HDDs, SSDs, CDs and DVDs, which are designed to last several decades, and tapes, which have a nominal service life of 30 years and require strict control of temperature and humidity.

The solution to the problem is to duplicate backups, that is, to have several organizations archive the software in different forms. This approach is known as LOCKSS (Lots Of Copies Keep Stuff Safe), a project that started almost 20 years ago. In May 2019, LOCKSS 2.0 alpha was presented: the first prototype of software for long-term distributed data storage with support for multiple participants and external storage.
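The idea of keeping several independent copies and comparing them can be illustrated with a short Python sketch. It is not the LOCKSS protocol or its API, only a minimal illustration under the assumption that each participant holds a full copy and that the majority of copies are intact. The holder names and the `audit_copies` helper are hypothetical.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one stored copy."""
    return hashlib.sha256(data).hexdigest()


def audit_copies(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Compare all copies by hash.

    Returns the digest shared by the largest group of holders and the
    sorted list of holders whose copy disagrees with that digest.
    """
    digests = {holder: digest(data) for holder, data in copies.items()}
    majority_digest, _count = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(h for h, d in digests.items() if d != majority_digest)
    return majority_digest, damaged


# Hypothetical holders, used for illustration only.
copies = {
    "archive_a": b"print('hello')\n",
    "archive_b": b"print('hello')\n",
    "archive_c": b"print('hellp')\n",  # simulated bit rot in one copy
}
majority, damaged = audit_copies(copies)
print(damaged)
```

A holder listed in `damaged` could then re-fetch the file from any holder whose digest matches the majority, which is why more independent copies make silent corruption easier to detect and repair.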

The system's designers assume that hardware can far outlast ephemeral storage media, so "there are a range of possible futures in which working modern computers exist but their software is largely lost."

GitHub points to many lost technologies that could have been useful: Roman concrete (its recipe was only rediscovered in 2014), the antimalarial DFDT, and the lost blueprints of the Saturn V rocket. It is easy to imagine a future in which today's software is treated as quaint, long-forgotten junk until an unexpected need for it arises. "Like any backup, the GitHub archive program is also designed for the unforeseen future," says the program's website.

GitHub Archive

GitHub Archive provides three levels of backups:

  • Hot: almost real time
  • Warm: updated every month to a year
  • Cold: updated every 5+ years

After any action by GitHub users, all Git data is replicated to several data centers around the world. Git backups, issues, pull requests, and all user data on GitHub are stored in several places. This information is available in real time via the GitHub API.
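As a rough illustration of reading that live data, the sketch below requests one repository's metadata from the public GitHub REST API endpoint `GET /repos/{owner}/{repo}` using only the Python standard library. The helper names (`repo_url`, `fetch_repo_metadata`, `main`), the `User-Agent` string and the choice of example repository are assumptions made for this example, and unauthenticated requests are rate limited.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def repo_url(owner: str, repo: str) -> str:
    """Build the REST API URL for a single repository."""
    return f"{API_ROOT}/repos/{owner}/{repo}"


def fetch_repo_metadata(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Fetch repository metadata as a dict (performs a network request)."""
    request = urllib.request.Request(
        repo_url(owner, repo),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "archive-sketch",  # arbitrary name chosen for this example
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def main() -> None:
    # Example repository chosen for illustration only.
    info = fetch_repo_metadata("octocat", "Hello-World")
    print(info["full_name"], info["stargazers_count"], info["pushed_at"])
```

Calling `main()` prints the repository's full name, star count and last push time. Signals of this kind (stars, recent activity) are what the article later mentions as criteria for including inactive repositories in the snapshot.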

In addition, the GHTorrent crawler recursively indexes GitHub and publishes archives on a daily or monthly basis. Through GH Archive, snapshots from the archive can be retrieved with BigQuery queries. Other copies of the code are held by the Internet Archive's well-known Wayback Machine, which keeps copies in multiple locations. Finally, the Software Heritage Foundation will regularly crawl GitHub and add its public repositories to its own archive, which has a public API.

GitHub Arctic Code Vault

On February 2, 2020, GitHub will make a copy of all active public repositories and place it in the GitHub Arctic Code Vault.

The data will be stored on 3,500-foot film reels provided by the Norwegian company Piql, which specializes in long-term data storage. According to ISO measurements, this polyester-based silver halide film has a lifespan of 500 years, and simulated aging tests have shown that Piql film retains information for at least twice as long.

In addition, the GitHub Archive Program has partnered with researchers from Microsoft's Project Silica to write all public repositories onto quartz glass platters using a femtosecond laser. This storage medium is expected to keep the data safe for more than 10,000 years.

The GitHub Arctic Code Vault is being built inside the Arctic World Archive (AWA), 250 meters deep in the permafrost. The archive is located in a former coal mine on the Svalbard archipelago, not far from the North Pole. Global warming is expected to affect only the top few meters of permafrost and does not threaten the mine in the foreseeable future (several thousand years).

Svalbard is governed by an international treaty as a demilitarized zone. According to GitHub, it is one of the most remote and geopolitically stable human settlements on Earth. Not far away is the famous Svalbard Global Seed Vault, humanity's main hope in the event of an apocalypse.

Svalbard Global Seed Vault

AWA is a joint initiative between the Norwegian state-owned mining company Store Norske Spitsbergen Kulkompani (SNSK) and the digital preservation provider Piql AS. It already stores historical and cultural data from Italy, Brazil, Norway, the Vatican and other countries.

Photo credit: Guy Martin/Bloomberg Businessweek

The reels of GitHub code will be stored in a steel-walled container inside a sealed chamber. The February 2, 2020 snapshot will include all active GitHub repositories and a significant share of inactive ones (selected by stars, dependencies and so on), including all binary files up to 100 KB. Each repository will be stored as a separate tar file. Everything should fit on 200 reels of 120 GB each.
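The packaging rule described here, one tar file per repository with large binaries left out, can be sketched in Python. This is only an illustration and not GitHub's actual tooling: the NUL-byte test for "binary", the reading of 100 KB as 100 × 1024 bytes, and the helper names `looks_binary` and `archive_repo` are all assumptions.

```python
from __future__ import annotations

import os
import tarfile
import tempfile

# "up to 100 KB" comes from the article; treating KB as 1024 bytes is an assumption.
MAX_BINARY_BYTES = 100 * 1024


def looks_binary(path: str, probe: int = 8192) -> bool:
    """Heuristic (assumption): a NUL byte in the first chunk marks a binary file."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(probe)


def archive_repo(repo_dir: str, tar_path: str) -> list[str]:
    """Write one uncompressed tar file for one repository directory.

    Binary files larger than MAX_BINARY_BYTES are skipped.
    Returns the archive-relative names of the files that were stored.
    """
    archived = []
    with tarfile.open(tar_path, "w") as tar:
        for root, dirs, files in os.walk(repo_dir):
            dirs.sort()  # walk subdirectories in a stable order
            for name in sorted(files):
                full = os.path.join(root, name)
                if os.path.islink(full) or not os.path.isfile(full):
                    continue
                if os.path.getsize(full) > MAX_BINARY_BYTES and looks_binary(full):
                    continue
                arcname = os.path.relpath(full, repo_dir)
                tar.add(full, arcname=arcname)
                archived.append(arcname)
    return archived


# Demo on a throwaway directory: one small source file, one oversized binary.
with tempfile.TemporaryDirectory() as work:
    repo = os.path.join(work, "demo-repo")
    os.makedirs(repo)
    with open(os.path.join(repo, "main.py"), "w") as f:
        f.write("print('hi')\n")
    with open(os.path.join(repo, "blob.bin"), "wb") as f:
        f.write(b"\0" * (MAX_BINARY_BYTES + 1))
    names = archive_repo(repo, os.path.join(work, "demo-repo.tar"))
    print(names)
```

The tar file is written outside the repository directory so that the archive does not try to include itself. At the stated capacity, 200 reels of 120 GB give roughly 24 TB in total, which is why a size filter of this kind matters.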

Together with the archive, GitHub will store a human-readable index and technical guides to QR decoding, file formats, character encodings and other important metadata, so that future generations can convert the data back into source code.

The archive will also include a general Tech Tree guide in case future readers run out of working computers and have to rebuild technology from scratch.

Source: habr.com
