Habra detective: your picture is lost

Have you ever wondered how much information is lost without a trace? After all, information is what Habr exists for. Do you know what most often happens to resources built on user posts? Authors insert images and videos from third-party sites, and after some time those files are no longer available. That is what Habrastorage was created for. Practice has shown that almost no one (except editors and a few enthusiasts) uploads images there on their own. So at some point the Habr administration made this automatic: each image that appears in a publication is uploaded to the storage and will not disappear from there as long as Habr itself exists. Of course, there are exceptions and something might go wrong, but that is not today's topic.

The biggest problem with this scheme of uploading images to Habrastorage arose when it was introduced. By that time, some old publications had already lost their pictures, and so they stayed that way. Today we will try to find out how much graphic information Habr has lost since its birth. Besides, maybe we can find some of what is missing? After all, that "image cannot be loaded" stub is annoying, isn't it? Today's detective story is dedicated to just that. Let's get started!

Perhaps you were brought to this article by a mention in the tracker? Then a picture has probably disappeared from one of your old publications, and I have found it. If you don't feel like reading the whole post, you can scroll down to the spoiler at the very end (the section "The results"), which lists all the publications and the images found. Thank you!

Introduction and Methods

Our detective story will start from the very beginning (logical, right?): from the beginning of Habr. After all, the earlier a post was published, the more likely it is that its images have been lost somewhere along the way. That is why we will start from 2006 and move forward a little.

All publications from the 40 hubs currently at the top of the rating are considered. The complete list of these hubs is given under the spoiler. Many of them did not exist back then, but when new hubs were added, publications were moved into them.

List of hubs

*nix, Algorithms, Artificial Intelligence, Astronautics, biotechnology, Brain, C++, development management, DIY, Ecology, game development, games and game consoles, geek health, History of IT, Information Security, IT career, IT infrastructure, IT companies, Java, JavaScript, Legislation in IT, Lifehacks for geeks, Machine learning, Manufacture and development of electronics, Nginx, Open source, Personnel Management, Physics, Popular science, Product Management, Programming, Project management, Python, reading room, Reverse engineering, Social networks and communities, System administration, System Analysis and Design, The future is here, Website development

The information was collected using a set of PHP scripts. Each post was downloaded, the content of the <div id="post-content-body"> tag was extracted and checked for <img> tags inside. For each image, its link was saved together with the ID of the publication on Habr. This information is analyzed below.
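The original scripts were written in PHP and are not shown here, so what follows is only a minimal Python sketch of the same idea. The class name PostImageParser, the function extract_images and the sample HTML are my own assumptions for illustration; only the <div id="post-content-body"> container and the <img> tags come from the description above.

```python
from html.parser import HTMLParser


class PostImageParser(HTMLParser):
    """Collects the src of every <img> inside <div id="post-content-body">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0  # 0 means we are outside the post body
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth > 0:
                # A nested <div> inside the post body.
                self.div_depth += 1
            elif attrs.get("id") == "post-content-body":
                # Entering the post body.
                self.div_depth = 1
        elif tag == "img" and self.div_depth > 0 and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1


def extract_images(post_id, html):
    """Return (post_id, index, src) tuples; the index starts from 1."""
    parser = PostImageParser()
    parser.feed(html)
    parser.close()
    return [(post_id, index, src)
            for index, src in enumerate(parser.images, start=1)]


# Hypothetical page fragment, not taken from a real publication.
sample = (
    '<div id="sidebar"><img src="http://example.com/banner.gif"></div>'
    '<div id="post-content-body"><p>Text'
    '<div class="quote"><img src="http://example.com/a.png"/></div>'
    '<img src="http://example.com/b.jpg"></div>'
    '<img src="http://example.com/footer.gif">'
)
for row in extract_images(607, sample):
    print(row)
```

Only the two images inside the post body are reported; the banner and the footer image are ignored because they sit outside the container.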

What was published and when

2006

At the very beginning of Habr there were not as many publications as there are now, and there were even fewer pictures in them. In total, 221 posts were published in the listed hubs in 2006 (starting from 05.06.2006). 53 of these posts contain a total of 75 images. The maximum number of images per post (10) is in "Ten gadgets that changed the world". 50 images are already on Habrastorage. Another 25 are lost. All of them are unique and do not repeat.

Interesting fact: Two of the images lead to Habr itself, but have not been available for a long time. These are the images http://www.habrahabr.ru/tmp/sup_blogs_preview.gif and http://www.habrahabr.ru/tmp/upgrade-chart.gif.

So, 33.3% of the images in publications from 2006 are lost.
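The percentage is simply the share of lost images among all images found for the year. A tiny sketch of that arithmetic, using the 2006 figures (50 images on Habrastorage, 25 lost); the function name loss_rate is my own:

```python
def loss_rate(saved, lost):
    """Return lost images as a percentage of all images found."""
    total = saved + lost
    if total == 0:
        return 0.0
    return 100.0 * lost / total


# 2006: 50 images are on Habrastorage, 25 are lost.
print(f"{loss_rate(50, 25):.1f}%")  # prints 33.3%
```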

2007

In 2007, the number of publications increased significantly, as did the number of images: 1,713 posts were published, and 599 of them contain a total of 1,467 images. Most of these images were transferred to Habrastorage, and 16.2% were lost.

Interesting fact: The publication "Top 100 Mac OS Applications" contains the maximum number of images for 2007 (100) and no text written by the author.

In addition, some of these lost images are repeated. One of them occurs 6 times in a single article that has only 6 pictures. Also, the image "Up.gif" is repeated 21 times, "Down.gif" 16 times and "Same.gif" 8 times, all from the same domain. All 45 of these images come from one post, which has 47 pictures in total.

That leaves 191 unique <img> elements.
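How the repeats were counted is not described, so here is one possible sketch that treats two images as the same when their URLs match exactly. The host example.com and the function name duplicate_summary are assumptions; the repeat counts are the ones from the post with 45 repeated images.

```python
from collections import Counter


def duplicate_summary(urls):
    """Return (total, unique, per-URL counts sorted by frequency)."""
    counts = Counter(urls)
    return len(urls), len(counts), counts.most_common()


# Hypothetical URLs; only the file names and repeat counts come from the text.
urls = (
    ["http://example.com/Up.gif"] * 21
    + ["http://example.com/Down.gif"] * 16
    + ["http://example.com/Same.gif"] * 8
)
total, unique, counts = duplicate_summary(urls)
print(total, unique)  # prints 45 3
```

Exact string comparison is the simplest choice; it would miss the same file served from two different addresses.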

2008

Since the number of publications on Habr only grew from year to year, for 2008 our detective will consider 2,520 publications and 2,969 images. Note that it was in 2008 that the number of images finally exceeded the number of publications. At the same time, not every post contains pictures, and the largest number of images in a single post is found in the publication "The History of Google's Holiday Logos". 1,943 images are already saved on Habrastorage, and the rest are lost (34.6%).

Interesting fact: The most unexpected "image" is really a formatting problem in one of the publications. As a result, Habr tries to download the image from http://#/.

Fig. 1. General statistics for the publications considered

Can anything be restored?

Partial restoration is not difficult. For example, the laziest way is to use the Internet Archive and try to load saved copies of the publication pages. In addition, you can try to "find" the images themselves in the archive using their direct links.

Life hack: check for the images in all archived versions of the page, not only the oldest and the newest.
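To look through every archived version rather than just the first and last, one option is the Wayback Machine CDX API, which lists all captures of a URL. This is only a sketch of how such a check could look, not the way the search was actually done. The function names are my own, and the network call in find_snapshots is left for you to run by hand.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(image_url):
    """Build a CDX query that lists every capture of image_url as JSON."""
    params = {
        "url": image_url,
        "output": "json",
        "fl": "timestamp,original,statuscode",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_urls(rows):
    """Turn CDX JSON rows (first row is the header) into snapshot links.

    Only captures with HTTP status 200 are kept. The "id_" suffix asks
    the archive for the file as it was stored, without page rewriting.
    """
    if not rows:
        return []
    header, captures = rows[0], rows[1:]
    links = []
    for row in captures:
        capture = dict(zip(header, row))
        if capture.get("statuscode") == "200":
            links.append(
                "https://web.archive.org/web/"
                f"{capture['timestamp']}id_/{capture['original']}"
            )
    return links


def find_snapshots(image_url):
    """Query the archive over the network (not executed in this sketch)."""
    with urlopen(cdx_query_url(image_url), timeout=30) as response:
        body = response.read().decode("utf-8")
    return snapshot_urls(json.loads(body)) if body.strip() else []


# Hand-made rows in the CDX JSON layout, used instead of a live request.
rows = [
    ["timestamp", "original", "statuscode"],
    ["20070101000000", "http://example.com/a.gif", "404"],
    ["20080202000000", "http://example.com/a.gif", "200"],
]
print(snapshot_urls(rows))
```

Every successful capture is returned, so a picture that was saved only in some middle snapshot is not missed.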

Unfortunately, although this method works in some cases, it is hard to restore even half of the images this way. Therefore, the next step is to check cross-posts, the originals of translations and, of course, archived copies of the original pages.

In addition, you can try to find the images on one of Habr's unofficial mirrors, which once operated and still store some of the copied content.

The last and most difficult option is to use search engines. If you know exactly what should be in the image (there is a description and context), there is a chance of finding files with the same name if someone once copied them to another resource.

Naturally, each next step increases the search time non-linearly.

What we managed to find

You may not be too impressed by the number of images found so far: there are 300 of them (in 140 publications by 81 authors). Relative to the total number of "losses", that is about 24.2%. Why are there fewer missing images than before? All useless images (like view counters) and non-existent images (like the already mentioned http://#/, as well as http://fig.jpg/ etc.) were removed from consideration.
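The exact filtering rules are not given, so here is only a rough sketch of how such links could be weeded out. The counter host and the list of extensions are hypothetical; the two broken addresses are the ones mentioned above.

```python
from urllib.parse import urlsplit

IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".bmp")
# Hypothetical counter host; a real list would have to be built by hand.
COUNTER_HOSTS = {"counter.example.com"}


def is_worth_searching(url):
    """Return False for links that cannot point to a real lost picture."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme not in ("http", "https") or not host:
        return False  # e.g. http://#/ has no host at all
    if host in COUNTER_HOSTS:
        return False  # view counters carry no content
    if host.endswith(IMAGE_EXTENSIONS) and parts.path in ("", "/"):
        return False  # e.g. http://fig.jpg/ is a file name typed as a host
    return True


for url in (
    "http://#/",
    "http://fig.jpg/",
    "http://counter.example.com/hit.gif",
    "http://www.habrahabr.ru/tmp/upgrade-chart.gif",
):
    print(url, is_worth_searching(url))
```

Only the last address passes the filter; the other three are dropped before any searching starts.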

How did such a round number come about? The fact is that about 300 hours of searching had run out. Initially I was going to go up to 333, but 300 looks pretty good too. In addition, the search has so far covered about 33% of all the "victims".

Fig. 2. Current search results

All the images found (except one .bmp; with it there would be 301) have been uploaded to hsto.org. Links to them and to the publications, as well as the indexes of the images within them, are given in the next section.

The results

So, the successfully found images are listed under the spoiler, along with the ID of each publication, the index of the image within the text of the publication (starting from 1, not from 0) and the author of the publication. If you are the author of one of these publications and the images found are correct, please fix your posts. Thank you!

By the way, some images are actually still available for viewing in publications, but they have not been transferred to Habrastorage, and therefore at some point they may also become inaccessible.

300 pictures

Author                  Post ID  Indexes and links

0x62ash                 27149    1
0xa8                    11105    1
2Bad                    607      1
                        1097     1
                        1106     1, 2, 3, 5, 24
                        13836    2
4ese                    30820    1, 2, 3, 5
8cinq                   41853    1
                        46498    1
Adam_B                  12582    1
ainu                    39501    1
alardus                 2628     1
Alaska                  23447    1, 2
alex_raiden             24479    2
                        30594    3
                        39037    1
                        40312    1, 2, 3, 4
                        44152    1, 2, 3
                        46294    1
                        46741    1
                        47782    1, 2, 3, 4, 5
alfsoft                 42782    1, 2, 3, 4, 5
alize                   37779    1, 2
altblog                 44677    1
arestov                 37921    1
artch                   19726    1
badlittleduck           16292    1, 2, 3, 4, 5
Barkov                  26335    1
BBSoD                   8505     1
bO_oblik                22150    1, 2, 3, 4, 5
                        22186    1
                        22215    1
                        22322    1, 2, 3, 4, 5, 6
                        22334    1, 2
                        22375    1, 2, 3
                        22510    1, 2
                        22614    1
                        22836    1, 2
                        26181    1, 2, 3, 4, 6
                        28196    1, 2, 3, 4, 5, 6, 7, 8
                        29706    1, 2, 3, 4
                        31490    1, 2, 3, 4
                        36713    1
                        37180    1
                        37249    1
                        37306    1, 2
                        38013    1
                        38389    1, 2
                        41104    1, 2
                        41647    1
                        41821    1, 2
clean_v                 12783    1
chulak                  45783    1, 2, 3, 4, 5, 6, 7
Coss                    31069    1
CurlyBrace              11010    1
                        11941    1
                        14157    1
                        37303    1
dreikanter              31320    1, 2, 4
entze                   40767    1
Fenniks                 20843    2
                        23902    1
                        39109    1
firstbyte               38314    1
freetonik               26593    1
frujo                   40987    1
garbuz                  29694    1
gorinich                12027    1
gravity                 28840    1
href                    46908    1, 2
iljava                  30902    2, 3
impostor                26566    1
invladis                42904    1
Karlsson                8971     down.gif, Same.gif, tpci_trends.png, Up.gif
                        31042    1
                        31050    1
                        31141    1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
Klaus                   15775    1, 2, 3, 4, 5, 6, 7, 8
Line_13                 16891    2
le0pard                 38391    1
LukaSafonov             43537    1
meako                   26705    1
Midgard                 31419    2, 3, 4
Mio                     396      1
                        753      1
                        936      1
mosaic                  744      1
Mr_Floppy               28343    1
[author name missing]   44476    1
officer                 110      1
oleg_bunin              7207     1
                        7226     1
                        8679     1
                        12768    1
olegafx                 43934    1, 2, 3, 4, 5, 6, 7, 8-9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
islander                37146    2, 3
ponomar                 14141    1
porchini                21850    1, 2
Pure_BY                 8416     1
RAF                     851      1, 2
ramber                  43693    1
roster                  44380    1
ruskar                  42578    3, 5, 8
sainted                 702      1
SamDark                 30104    1
Ladder                  37804    4
Shapelez                23260    1
                        44379    1, 2
                        46113    1
                        46599    1
                        47536    1
slaff                   8134     1, 2
smartov                 17160    3
smitana                 30375    1
spanasik                44755    17
spiritus_sancti         41129    1, 2
summer dream            3801     1
sunnybear               31211    1, 2
Switch                  9095     1
Taoorus                 37507    1
thoggen                 38733    1
                        45024    1
                        45170    1
tsepelev                36611    1
VadimUA                 46922    1
wine                    26073    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
                        30171    1, 2, 3
XaocCPS                 40036    1
                        284390   1
                        284392   1
                        284394   1
                        284396   1
yaneblog                39007    1, 6
                        40621    3
Yesutin                 9453     1
                        9645     1
                        31078    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
yshilyaev               5556     1, 2, 3
zada                    31123    2
Zigzag                  15492    1

Instead of a conclusion

Perhaps someone will think that restoring such outdated information makes no sense. And besides, some of the images found were meaningless even when they were published. That is undoubtedly true.

Still, any information is important, at least for historical analysis, not to mention that in some original articles the images play a key role. Yes, at the moment Habr is not even 15 years old and some of the sources are still available, but over time there will be fewer and fewer of them. So it is worth considering in advance whether something will remain for later, or whether there will be an eternal "image not available".

And do not forget that stubs in place of inaccessible pictures are simply annoying. Of course, few people will read "some old junk", but such people do exist. So as long as these publications are still on Habr, their content should be as complete as possible.

Unfortunately, Habrastorage does not yet directly support uploading all image formats, but maybe someday this will be fixed.

The last problem I want to mention is one you have probably thought of already: "what if the author has not used Habr for a long time and is not interested in fixing old junk?" This question has crossed my mind more than once, but the solution is not that hard. Old posts can always be corrected by the UFO in the person of the moderators (you can, can't you, Exosphere?) or by the administration (Boomburum can assign someone the task).

What do you think, is it worth trying to restore at least something?

That's all for today. Thank you for your attention, and may all your images be uploaded to Habrastorage without any problems!

P.S. If you find any typos or errors in the text, please let me know. You can do this by selecting a piece of text and pressing "Ctrl / ⌘ + Enter" (if you have Ctrl / ⌘), or via private messages. If neither option is available, write about the errors in the comments. Thank you!

P.P.S. Perhaps you will also be interested in my other studies of Habr, or you would like to propose a topic for the next publication, or maybe even for a new series of publications.

Where to find the list and how to make a suggestion

All information can be found in a dedicated repository, habr-detective. There you can also see which suggestions have already been made and what is already in the works.

Also, you can mention me (by writing VaskivskyiYe) in the comments to a publication that you think is interesting for research or analysis.

Source: habr.com
