Have you ever wondered how much information is lost without a trace? After all, information is what Habr exists for. Do you know what happens most often to resources built on user posts? Authors insert images and videos from third-party sites, and after some time these are no longer available. That is what Habrastorage was created for. Practice has shown that no one (except editors and a few enthusiasts) uploads images there on their own. Therefore, at some point, the Habr administration made this function automatic: each image that appears in a publication is automatically uploaded to the storage and will not disappear from there as long as Habr itself exists. Of course, there are exceptions and something might go wrong, but that is not our topic today.
The biggest problem with this whole scheme of uploading images to Habrastorage arose when it was introduced. By that time, some old publications had already lost their images, and so they have remained. Today we will try to find out how much graphic information Habr has lost since its birth. Besides, maybe we can find some of what is missing? After all, this "image cannot be loaded" stub is annoying, isn't it? Today's detective story is dedicated to just that. Let's get started!
Perhaps you were brought to this article by a mention in the tracker? Then one of your old publications has probably lost an image, and I found it. If you don't feel like reading the whole post, you can just scroll down to the spoiler at the very end (section "The results"), which lists all publications and found images. Thank you!
Introduction and Methods
Our detective story will start from the very beginning (logically, right?): from the beginning of Habr. After all, the earlier a post was published, the more likely it is that its images were lost somewhere in history. That is why we will start from 2006 and move forward a little.
All publications from the 40 hubs that are currently at the top of the rating are considered. The complete list of these hubs is given under the spoiler. In fact, many of them did not exist back then; however, when new hubs were added, publications were moved there.
At the very beginning of Habr, there were not as many publications as there are now, and there were even fewer pictures in them. In total, in 2006 (starting from 05.06.2006), 221 posts were published in the listed hubs. 53 of these posts contain a total of 75 images. The maximum number of images (10) is in the post "Ten gadgets that changed the world". 50 images are already on Habrastorage. Another 25 are lost. All of them are unique and do not repeat.
Interesting fact: Two of the images lead to Habr itself, but have not been available for a long time. These are the images http://www.habrahabr.ru/tmp/sup_blogs_preview.gif and http://www.habrahabr.ru/tmp/upgrade-chart.gif.
So, for 2006, 33.3% of the images in publications are lost.
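The percentage above follows directly from the two counts given for 2006. Here is a minimal sketch of the arithmetic; the function name lost_share is just an illustrative choice and not part of any tool used for this study.

```python
def lost_share(on_storage: int, lost: int) -> float:
    """Return the percentage of lost images among all images found in posts."""
    total = on_storage + lost
    if total == 0:
        return 0.0  # no images at all, so nothing could be lost
    return 100.0 * lost / total


# 2006: 50 images are on Habrastorage, 25 are lost.
print(f"{lost_share(50, 25):.1f}%")  # prints 33.3%
```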
2007
In 2007, the number of publications increased significantly, as did the number of images - 1 posts were published. 713 Posts contain 599 images. 1 images were transferred to Habrastorage, and 467 were lost (16.2%).
Interesting fact: the publication "Top 100 Mac OS Applications" contains the maximum number of images for 2007 (100) and contains no text written by the author.
In addition, some of these lost images are repeated. For example, one of them occurs 6 times in a single article that has only 6 pictures. Also, the image "Up.gif" is repeated 21 times, "Down.gif" 16 times and "Same.gif" 8 times, all from the same domain. All these 45 images come from one post, which has 47 pictures in total.
Interesting fact: the most unexpected image (or rather, a problem in the formatting of the publication) is here. As a result, Habr tries to download the image from http://#/.
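Repetitions like these are easy to count once the image links of a post are collected into a list. The sketch below is only an illustration: the example.com domain and the two extra file names are made up, and only the repeat counts (21, 16 and 8) come from the text above.

```python
from collections import Counter


def repeated_images(urls: list[str]) -> dict[str, int]:
    """Map each URL that occurs more than once to its number of occurrences."""
    counts = Counter(urls)
    return {url: n for url, n in counts.items() if n > 1}


# Hypothetical reconstruction of the post with 47 pictures (domain is made up).
post_images = (
    ["http://example.com/Up.gif"] * 21
    + ["http://example.com/Down.gif"] * 16
    + ["http://example.com/Same.gif"] * 8
    + ["http://example.com/a.png", "http://example.com/b.png"]
)

repeats = repeated_images(post_images)
print(len(post_images), sum(repeats.values()), len(set(post_images)))
```

Counting unique links separately matters for the search itself: 45 lost pictures in such a post are really only three files to find.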
Fig. 1. General statistics of the publications considered
Can anything be restored?
Partial restoration is not difficult. For example, the laziest way is to use the Internet Archive and try to load saved copies of the publication pages. In addition, you can try to find the images themselves in the archive using direct links.
Life hack: you need to check for the presence of images in all versions of the page in the archive, not only the oldest and the newest.
Unfortunately, although this method works in some cases, it is hard to restore even half of the images this way. Therefore, the next step is to check cross-posts, the originals of translations and, of course, archived copies of the original pages.
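Checking every archived version by hand is tedious, so it can be scripted. The sketch below assumes the Wayback Machine CDX search endpoint with JSON output (a header row followed by one row per capture); the helper names are my own, and this is not necessarily how the search for this article was done.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Wayback Machine CDX search endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target: str) -> str:
    """Build a query that lists every successful capture of `target`."""
    params = {"url": target, "output": "json", "filter": "statuscode:200"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def snapshot_urls(rows: list[list[str]]) -> list[str]:
    """Turn a CDX JSON response (header row + data rows) into snapshot links."""
    if not rows:
        return []
    header, *data = rows
    ts, orig = header.index("timestamp"), header.index("original")
    return [f"https://web.archive.org/web/{r[ts]}/{r[orig]}" for r in data]


def all_snapshots(target: str) -> list[str]:
    """Fetch the capture list over the network and return all snapshot links."""
    with urlopen(cdx_query_url(target), timeout=30) as resp:
        return snapshot_urls(json.load(resp))
```

Calling all_snapshots with a publication URL or a direct image link returns one link per capture, so each version can be checked in turn rather than only the first and the last.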
In addition, you can try to find the desired images using one of Habr's unofficial mirrors, which once worked and still store some of the copied information.
The last and most difficult option is to use search engines. If you know exactly what should be in the image (there is a description and context), there is a chance of finding files with the same name, provided someone once copied them to another resource.
Naturally, each subsequent step increases the search time non-linearly.
What we managed to find
You may not be too impressed by the number of images found so far: there are 300 of them (contained in 140 publications by 81 authors). If we take into account the number of "losses" (1), then the result is about 24.2%. Why are there fewer missing images than before? Because all useless images (like view counters) and non-existent images (like the already mentioned http://#/, as well as http://fig.jpg/ etc.) were removed from consideration.
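Such a clean-up can be approximated with a simple heuristic. The sketch below is an assumption about how one might do it, not the procedure actually used here: it rejects links without a host (like http://#/), links whose "domain" ends in an image extension (like http://fig.jpg/), and hosts from a caller-supplied blocklist of counters. The blocklist entry in the usage example is hypothetical.

```python
from urllib.parse import urlparse

IMAGE_EXTENSIONS = {"jpg", "jpeg", "gif", "png", "bmp"}


def is_plausible_image_url(url: str, blocked_hosts: frozenset[str] = frozenset()) -> bool:
    """Heuristically decide whether a link could ever have pointed to a real image."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme not in ("http", "https") or not host:
        return False  # e.g. http://#/ has no host at all
    if host in blocked_hosts:
        return False  # e.g. view counters, supplied by the caller
    if host.rsplit(".", 1)[-1] in IMAGE_EXTENSIONS:
        return False  # e.g. http://fig.jpg/ is a file name typed as a domain
    return True


candidates = [
    "http://#/",
    "http://fig.jpg/",
    "http://counter.example.com/hit.gif",  # hypothetical counter host
    "http://www.habrahabr.ru/tmp/upgrade-chart.gif",
]
blocked = frozenset({"counter.example.com"})
kept = [u for u in candidates if is_plausible_image_url(u, blocked)]
print(kept)
```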
How did such a round number come about? The fact is that this is where about 300 hours of searching ended. Initially, I was going to go up to 333, but 300 looks pretty good too. In addition, at the moment the search has covered about 33% of all the "victims".
Fig. 2. Current search results
All found images (except one .bmp; with it there would be 301) have been uploaded to hsto.org, and links to them and to the publications, as well as the indexes of the images within them, are given in the next section.
The results
So, the successfully found images are listed under the spoiler, together with the publication id, the index of the image within the text of the publication (starting from 1, not from 0) and the author of the publication. If you are the author of one of the mentioned publications and the images found are correct, please fix your posts. Thank you!
By the way, some images are actually still available for viewing in publications, but they have not been transferred to Habrastorage, and therefore at some point they may also become inaccessible.
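To see which images in a post are still hosted elsewhere, and at which 1-based position, the post HTML can be scanned for img tags. This is a minimal sketch under assumptions: hsto.org is mentioned above, while habrastorage.org is added as an assumed second storage host, and the class and function names are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: images on these hosts (or their subdomains) count as already stored.
STORAGE_HOSTS = ("hsto.org", "habrastorage.org")


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def external_images(post_html: str) -> list[tuple[int, str]]:
    """Return (index, url) for images not on Habrastorage; indexes start from 1."""
    collector = ImageCollector()
    collector.feed(post_html)
    result = []
    for index, src in enumerate(collector.sources, start=1):
        host = (urlparse(src).hostname or "").lower()
        on_storage = any(host == h or host.endswith("." + h) for h in STORAGE_HOSTS)
        if not on_storage:
            result.append((index, src))
    return result
```

The index counts every image in the post, including those already on the storage, which matches the numbering convention used in the results list.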
Perhaps someone will think that restoring such outdated information makes no sense. And besides, some of the images found were meaningless even when they were published. Undoubtedly, that is so.
Still, any information is important, at least in terms of historical analysis. Not to mention that in some original materials it plays a key role. Yes, at the moment Habr is not even 15 years old and some of the sources are still available, but over time there will be fewer and fewer of them, and therefore it is worth considering in advance whether something will remain for later, or whether there will be an eternal "image not available".
Unfortunately, Habrastorage does not yet directly support uploading all image formats, but maybe someday this will be fixed.
The last problem that I want to mention, and which you have probably thought about: "what if the author has not used Habr for a long time and is not interested in fixing old stuff?" This question has crossed my mind more than once, but the solution is not so difficult. Old posts can always be corrected by the UFO, represented by the moderators (you can, after all, Exosphere?), or by the administration (Boomburum can give someone the task).
What do you think, is it worth trying to restore at least something?
That's all for today. Thank you for your attention and may all your images be uploaded to Habrastorage without any problems! Let there be no such
P.S. If you find any typos or errors in the text, please let me know. This can be done by selecting a piece of text and pressing "Ctrl / ⌘ + Enter" if you have Ctrl / ⌘, or via private messages. If neither option is available, write about the errors in the comments. Thank you!
P.P.S. Perhaps you will also be interested in my other studies of Habr, or you would like to propose a topic for the next publication, or maybe even a new series of publications.
Where to find the list and how to make an offer
All information can be found in a special repository, habr-detective. There you can also find out which proposals have already been made and what is already in the works.
Also, you can mention me (by writing VaskivskyiYe) in the comments to a publication that you think is interesting for research or analysis.