🥇Habrastatistics: how Habr lives without geektimes

Hey Habr.

This article is a logical continuation of the rating The best articles of Habr for 2018. And although the year is not over yet, but as you know, there were changes in the rules in the summer, so it became interesting to see if it affected anything.

In addition to the actual statistics, there will also be an updated rating of articles, as well as some sources for those who are interested in how it works.

For those who are interested in what happened, continued under the cut. Those who are interested in a more detailed analysis of the sections of the site can also look at next part.

Initial data

This rating is unofficial, and I have no insider data. As it is easy to see by looking at the address bar of the browser, all articles on Habré have continuous numbering. Then it's a matter of technique, just in a cycle we read all the articles in a row (in one thread and with pauses so as not to load the server). The values themselves were obtained by a simple Python parser (there are sources here) and saved in a csv file like this:

2019-08-11T22:36Z,https://habr.com/ru/post/463197/,"Blazor + MVVM = Silverlight наносит ответный удар, потому что древнее зло непобедимо",votes:11,votesplus:17,votesmin:6,bookmarks:40,views:5300,comments:73 2019-08-11T05:26Z,https://habr.com/ru/news/t/463199/,"В NASA испытали систему автономного управления одного микроспутника другим",votes:15,votesplus:15,votesmin:0,bookmarks:2,views:1700,comments:7

Performing the shaping

For parsing, we will use Python, Pandas and Matplotlib. Those who are not interested in statistics can skip this part and go straight to the articles.

First you need to load the dataset into memory and select the data for the desired year.

import pandas as pd
import datetime
import matplotlib.dates as mdates
from matplotlib.ticker import FormatStrFormatter
from pandas.plotting import register_matplotlib_converters


df = pd.read_csv("habr.csv", sep=',', encoding='utf-8', error_bad_lines=True, quotechar='"', comment='#')
dates = pd.to_datetime(df['datetime'], format='%Y-%m-%dT%H:%MZ')
df['datetime'] = dates
year = 2019
df = df[(df['datetime'] >= pd.Timestamp(datetime.date(year, 1, 1))) & (df['datetime'] < pd.Timestamp(datetime.date(year+1, 1, 1)))]

print(df.shape)

It turns out that this year (although it is not over yet) at the time of writing, 12715 articles have been published. For comparison, for the whole of 2018 - 15904. In general, a lot - this is about 43 articles per day (and this is only with a positive rating, how many articles are loaded in total that have gone negative or have been deleted, one can only guess or roughly estimate by omissions among identifiers).

Select the required fields from the dataset. As metrics, we will use the number of views, comments, rating values and the number of bookmarks.

def to_float(s):
    # "bookmarks:22" => 22.0
    num = ''.join(i for i in s if i.isdigit())
    return float(num)

def to_int(s):
    # "bookmarks:22" => 22
    num = ''.join(i for i in s if i.isdigit())
    return int(num)

def to_date(dt):
    return dt.date() 

date = dates.map(to_date, na_action=None)
views = df["views"].map(to_int, na_action=None)
bookmarks = df["bookmarks"].map(to_int, na_action=None)
votes = df["votes"].map(to_float, na_action=None)
votes_up = df["up"].map(to_float, na_action=None)
votes_down = df["down"].map(to_float, na_action=None)
comments = df["comments"].map(to_int, na_action=None)

df['date'] = date
df['views'] = views
df['votes'] = votes
df['bookmarks'] = bookmarks
df['up'] = votes_up
df['down'] = votes_down

Now the data has been added to the dataset and we can use it. Let's group the data by days and take the average values.

g = df.groupby(['date'])
days_count = g.size().reset_index(name='counts')
year_days = days_count['date'].values
grouped = g.median().reset_index()
grouped['counts'] = days_count['counts']
counts_per_day = grouped['counts'].values
counts_per_day_avg = grouped['counts'].rolling(window=20).mean()
view_per_day = grouped['views'].values
view_per_day_avg = grouped['views'].rolling(window=20).mean()
votes_per_day = grouped['votes'].values
votes_per_day_avg = grouped['votes'].rolling(window=20).mean()
bookmarks_per_day = grouped['bookmarks'].values
bookmarks_per_day_avg = grouped['bookmarks'].rolling(window=20).mean()

Now the fun part, we can look at the graphs.

Let's see the number of publications on Habré in 2019.

import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (16, 8)
fig, ax = plt.subplots()

plt.bar(year_days, counts_per_day, label='Articles/day')
plt.plot(year_days, counts_per_day_avg, 'g-', label='Articles avg/day')
plt.xticks(rotation=45)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%Y"))  
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
plt.legend(loc='best')
plt.tight_layout()
plt.show()

The result is interesting. As you can see, Habr was slightly "sausage" during the year. I don't know the reason.

For comparison, 2018 looks a little "smoother":

In general, I did not see any drastic decrease in the number of published articles in 2019 on the chart. Moreover, on the contrary, it seems to have even grown a little since the summer.

But the next two graphs depress me a little more.

Average views per article:

Average rating per article:

As you can see, the average number of views during the year is slightly reduced. This can be explained by the fact that new articles have not yet been indexed by search engines, and they are not found so often. But the decrease in the average rating per article is more incomprehensible. The feeling is that readers either simply do not have time to view so many articles or do not pay attention to the ratings. From the point of view of the author encouragement program, this trend is very unpleasant.

By the way, this was not the case in 2018, and the schedule is more or less even.

In general, resource owners have something to think about.

But let's not talk about sad things. In general, we can say that Habr “survived” the summer changes quite successfully, and the number of articles on the site has not decreased.

Rating

Now actually, the rating. Congratulations to those who got into it. Let me remind you once again that the rating is unofficial, maybe I missed something, and if some article should definitely be here, but it is not, write, I will add it manually. As a rating, I use calculated metrics, which I think turned out to be quite interesting.

Top articles by number of views

LED lies of unprecedented proportions 241000 views, 569 comments, rating +364.0/-1.0
'Blow Job Article': Scientists processed 109 hours of oral sex to develop an AI that sucks cock 236000 views, 361 comments, rating +240.0/-68.0
What the designer smoked: unusual firearms 235000 views, 123 comments, rating +119.0/-9.0
How I didn't work at Sberbank for a year 233000 views, 580 comments, rating +449.0/-14.0
Scientists have found the oldest living vertebrate on Earth 221000 views, 211 comments, rating +82.0/-14.0
Discarded smart bulbs are a valuable source of personal information 219000 views, 147 comments, rating +73.0/-11.0
Development King 178000 views, 668 comments, rating +315.0/-60.0
Fraudsters and digital signatures - everything is very bad 175000 views, 778 comments, rating +356.0/-0.0
The series 'Chernobyl': watch and think 172000 views, 803 comments, rating +164.0/-25.0
Worst sound volume control UI ever 166000 views, 176 comments, rating +292.0/-30.0
Honest programmer resume 165000 views, 283 comments, rating +410.0/-40.0
I ruin developers' lives with my code reviews and I'm sorry 164000 views, 12 comments, rating +33.0/-3.0
How Megafon fell asleep on mobile subscriptions 162000 views, 676 comments, rating +624.0/-2.0
Riot on Pikachu. Users go to Reddit en masse 160000 views, 484 comments, rating +215.0/-41.0
Cheap and expensive AAA batteries 159000 views, 382 comments, rating +363.0/-6.0
Retire at 22 156000 views, 922 comments, rating +259.0/-100.0
Man without smartphone 152000 views, 736 comments, rating +173.0/-25.0
Do you want eternal LEDs? Uncover soldering irons and files. Or do-it-yourself home lighting 149000 views, 262 comments, rating +94.0/-6.0
What not to do if your phone is stolen 144000 views, 638 comments, rating +259.0/-27.0
On February 1, 2019 your site may stop working 143000 views, 162 comments, rating +89.0/-8.0

Top articles by rating-to-views ratio

Loosening the Screws, Part 2: Posting Vote Deadline and Other Changes 14000 views, rating +238.0/-3.0
Pretty fanciful 'Beginnings' of Euclid in TeX 10800 views, rating +136.0/-0.0
User reward to the authors of Habr 26400 views, rating +320.0/-0.0
Reporting typos in publications 18900 views, rating +179.0/-2.0
hello world! Or Habr in English, v1.0 21000 views, rating +178.0/-2.0
Life on particles 34000 views, rating +267.0/-2.0
Civilization Springs, 5/5 25800 views, rating +201.0/-1.0
Playing Tetris on an electromechanical screen 16300 views, rating +124.0/-0.0
Recreating fonts from a CRT screen 13400 views, rating +101.0/-0.0
Mathematical model of the Dobble game 14600 views, rating +110.0/-0.0
Important message about invites in the profile 18300 views, rating +137.0/-8.0
We loosen the nuts in the rules of Habr 48300 views, rating +338.0/-13.0
Street magic codec comparison. Revealing secrets 21700 views, rating +144.0/-0.0
Smart number parser written in words 20500 views, rating +136.0/-1.0
Generic and metaprogramming models: Go, Rust, Swift, D and more 17000 views, rating +110.0/-2.0
I create a global knowledge base on batteries 22200 views, rating +139.0/-0.0
How I wrote and published a book about Moscow State University, or 12 critical mistakes 21600 views, rating +134.0/-0.0
About a kote, a wife, two sons, an idea... and more. Story to be continued 43000 views, rating +269.0/-8.0
Computed video at 755 megapixels: plenoptics yesterday, today and tomorrow 41500 views, rating +244.0/-0.0
Plot Density in Retail 27500 views, rating +160.0/-1.0

Top articles by comments to views ratio

GitHub started blocking repositories of users from Crimea, Cuba, Iran, North Korea and Syria 44500 views, 1309 comments, rating +115.0/-6.0
Ukrainian lessons 60400 views, 1672 comments, rating +285.0/-41.0
We loosen the nuts in the rules of Habr 48300 views, 1285 comments, rating +338.0/-13.0
Rally against the isolation of Runet 50900 views, 923 comments, rating +204.0/-32.0
How to ride on two wheels to work 47100 views, 781 comments, rating +113.0/-10.0
Plane crash at Sheremetyevo: historical analogies 82400 views, 1211 comments, rating +147.0/-11.0
Engineers rescue people who disappeared in the forest, but the forest does not give up yet 28900 views, 423 comments, rating +132.0/-1.0
Rally against the isolation of Runet 63300 views, 820 comments, rating +182.0/-20.0
How children are protected from information - and an enchanting story about where it first came from (18+) 65400 views, 811 comments, rating +175.0/-2.0
hello world! Or Habr in English, v1.0 21000 views, 249 comments, rating +178.0/-2.0
How to buy potatoes correctly if you are color blind 51800 views, 607 comments, rating +135.0/-3.0
What is it like to be a free software maintainer 22900 views, 259 comments, rating +129.0/-3.0
Loosening the Screws, Part 2: Posting Vote Deadline and Other Changes 14000 views, 158 comments, rating +238.0/-3.0
Pilot production of electronics at a minimum price 34200 views, 382 comments, rating +165.0/-3.0
How do we equip a megaphone 39800 views, 405 comments, rating +140.0/-6.0
Nuclear wars of the distant past? 83400 views, 843 comments, rating +133.0/-5.0
hello world! Or English Habr, v1.0 60300 views, 591 comments, rating +268.0/-7.0
Space as a vague memory 43200 views, 402 comments, rating +190.0/-7.0
User reward to the authors of Habr 26400 views, 245 comments, rating +320.0/-0.0
The principles of the free market in the understanding of the United States 56300 views, 502 comments, rating +160.0/-44.0

Top most controversial articles

State and T-killers 752 comments, rating +83.0/-80.0, 15100 views
These Toxic Guys: They Poison Projects 120 comments, rating +67.0/-51.0, 50300 views
Why should you learn Go 70 comments, rating +76.0/-57.0, 23100 views
I have read 80 resumes, I have questions 635 comments, rating +135.0/-94.0, 90700 views
Why It's Really Impossible to Be a Vegetarian 940 comments, rating +76.0/-52.0, 51600 views
Functional programming: a stupid toy that kills productivity. Part 1 394 comments, rating +100.0/-68.0, 54000 views
We wrote the most useful code in our lives, but it was thrown into the trash. Together with us 259 comments, rating +101.0/-63.0, 62900 views
Petition to Apple 96 comments, rating +90.0/-52.0, 39300 views
Why Windows does not rule in 2019, or CHADNT? 881 comments, rating +123.0/-70.0, 75000 views
I'm not real 246 comments, rating +105.0/-59.0, 63900 views
Five frightening trends in modern development 262 comments, rating +95.0/-52.0, 77400 views
The sooner you forget OOP, the better for you and your programs 1271 comments, rating +131.0/-63.0, 128000 views
A year driving an electric car 1098 comments, rating +131.0/-58.0, 71800 views
I'll stop throwing good in the trash 179 comments, rating +147.0/-62.0, 34400 views
Catch me if you can 215 comments, rating +141.0/-58.0, 65400 views
Retire at 22 922 comments, rating +259.0/-100.0, 156000 views
Psychiatrist's response to the article 'Sick and well' 272 comments, rating +154.0/-55.0, 43400 views
New programming languages are quietly killing our connection to reality 764 comments, rating +164.0/-52.0, 106000 views
Alcoholism in the last stage 597 comments, rating +208.0/-60.0, 123000 views
'Blow Job Article': Scientists processed 109 hours of oral sex to develop an AI that sucks cock 361 comments, rating +240.0/-68.0, 236000 views

Top articles by rating

How Megafon fell asleep on mobile subscriptions, 676 comments, rating +624.0/-2.0, 162000 views
'Mobile content' is free, without SMS and registrations. Details of fraud from Megafon, 474 comments, rating +488.0/-8.0, 112000 views
Innovation in Russian, 612 comments, rating +480.0/-33.0, 127000 views
How I didn't work at Sberbank for a year, 580 comments, rating +449.0/-14.0, 233000 views
How Protonmail is blocked in Russia, 398 comments, rating +418.0/-7.0, 102000 views
10 years in IT with a diagnosis of schizophrenia, tips for survival, 281 comments, rating +403.0/-8.0, 122000 views
Honest programmer resume, 283 comments, rating +410.0/-40.0, 165000 views
When 'a' is not equal to 'a'. On the trail of a hack, 64 comments, rating +374.0/-5.0, 74600 views
Zoom it! Modern Resolution Upscaling, 214 comments, rating +366.0/-1.0, 104000 views
LED lies of unprecedented proportions, 569 comments, rating +364.0/-1.0, 241000 views
Cheap and expensive AAA batteries, 382 comments, rating +363.0/-6.0, 159000 views
Fraudsters and digital signatures - everything is very bad, 778 comments, rating +356.0/-0.0, 175000 views
Japan: a country so common sense that it is sometimes irrational for us, 483 comments, rating +365.0/-12.0, 138000 views
We loosen the nuts in the rules of Habr, 1285 comments, rating +338.0/-13.0, 48300 views
User reward to the authors of Habr, 245 comments, rating +320.0/-0.0, 26400 views
How do I catch a hacker?, 273 comments, rating +305.0/-6.0, 110000 views
Myths of modern popular physics, 556 comments, rating +304.0/-6.0, 99600 views
Now good developers are measured by views and subscribers - and this is bad, 486 comments, rating +324.0/-26.0, 74800 views
Survive a head-on collision and why amnesia isn't what you think, 165 comments, rating +297.0/-4.0, 61800 views
Port scanner in the personal account of Rostelecom, 194 comments, rating +300.0/-8.0, 111000 views

Top articles by number of bookmarks

42 Google Advanced Search Operators (complete list) 47100 views, 917 bookmarks
How to become a Java developer in 1,5 years 89500 views, 894 bookmarks
sampler. Console utility for visualizing the result of any shell commands 58400 views, 801 bookmarks
HBO, thanks for reminding me… 'Chernobyl first aid kit' by a Belarusian pharmacist 89500 views, 797 bookmarks
Practical Tips, Examples, and SSH Tunnels 40000 views, 787 bookmarks
256 lines of bare C++: writing a ray tracer from scratch in a few hours 60000 views, 745 bookmarks
Asynchronous programming (full course) 36700 views, 690 bookmarks
'Burned' employees: is there a way out? 116000 views, 688 bookmarks
Extensive overview of Python interviews. Tips & Tricks 28400 views, 687 bookmarks
15 Machine Learning Books for Beginners 18700 views, 670 bookmarks
Course of lectures on JavaScript and Node.js in KPI 52500 views, 656 bookmarks
How do I write math notes in LaTeX in Vim 58100 views, 652 bookmarks
What I learned the hard way (30 years in software development) 100000 views, 651 bookmarks
A selection of useful slides from Julia Evans 41000 views, 587 bookmarks
HTTP headers for responsible developer 33600 views, 566 bookmarks
N+7 useful books 42700 views, 563 bookmarks
Hacking the CAN bus of a car. Virtual Dashboard 60700 views, 562 bookmarks
Cautious move to the Netherlands with wife and mortgage. Part 1: job search 76200 views, 555 bookmarks
TCP vs. UDP or the future of network protocols 50300 views, 538 bookmarks
Best Linux distributions for older computers 66000 views, 523 bookmarks

Top by bookmarks to views ratio

15 Machine Learning Books for Beginners 670 bookmarks, 18700 views
Music for your projects: 12 thematic resources with tracks licensed under Creative Commons 477 bookmarks, 18100 views
Extensive overview of Python interviews. Tips & Tricks 687 bookmarks, 28400 views
A selection of datasets for machine learning 455 bookmarks, 19000 views
Dungeon generator based on graph nodes 304 bookmarks, 12700 views
A simple explanation of pathfinding algorithms and A* 316 bookmarks, 13500 views
Web tools, or where should a pentester start? 421 bookmarks, 18800 views
Learning Docker Part 2: Terms and Concepts 341 bookmarks, 15600 views
Learning Docker Part 3: Dockerfiles 297 bookmarks, 13800 views
Tools for analyzing and debugging .NET applications 244 bookmarks, 11600 views
How to debug environment variables in Linux 322 bookmarks, 15900 views
How to take the first steps in robotics? 224 bookmarks, 11200 views
Labyrinths: classification, generation, search for solutions 318 bookmarks, 16000 views
Practical Tips, Examples, and SSH Tunnels 787 bookmarks, 40000 views
Course of lectures 'Fundamentals of digital signal processing' 418 bookmarks, 21400 views
42 Google Advanced Search Operators (complete list) 917 bookmarks, 47100 views
3D game shaders for beginners 239 bookmarks, 12400 views
Spot bypass of PKH locks on an OpenWrt router using WireGuard and DNSCrypt 302 bookmarks, 15700 views
Working on the skill of using grouping and data visualization in Python 192 bookmarks, 10000 views
Another Github 2: machine learning, datasets and Jupyter Notebooks 265 bookmarks, 13900 views

Top articles by number of comments

Ukrainian lessons 1672 comments, 60400 views
Rocket 9M729. A few words about the "violator" of the INF Treaty 1371 comments, 83000 views
GitHub started blocking repositories of users from Crimea, Cuba, Iran, North Korea and Syria 1309 comments, 44500 views
We loosen the nuts in the rules of Habr 1285 comments, 48300 views
The sooner you forget OOP, the better for you and your programs 1271 comments, 128000 views
Plane crash at Sheremetyevo: historical analogies 1211 comments, 82400 views
How did Generation Y become the Burnt Generation? 1122 comments, 81500 views
Electric car is not for me 1116 comments, 50700 views
A year driving an electric car 1098 comments, 71800 views
The current state of the science of consciousness 1021 comments, 27500 views
Finland summed up the preliminary results of the experiment with a guaranteed basic income 999 comments, 62100 views
Fair Economy Talk 997 comments, 7700 views
Why It's Really Impossible to Be a Vegetarian 940 comments, 51600 views
Darling we're killing the internet 933 comments, 120000 views
Rally against the isolation of Runet 923 comments, 50900 views
Retire at 22 922 comments, 156000 views
Choosing a car for an IT specialist, or tips for teapots from a teapot 914 comments, 43400 views
Why Senior Developers Can't Get Jobs 901 comments, 119000 views
The plan is back in the economy 892 comments, 27800 views
Personal urban teleporter 889 comments, 40800 views

And finally, the last Antitop by the number of dislikes

Retire at 22, 922 comments, rating +259.0/-100.0
I have read 80 resumes, I have questions, 635 comments, rating +135.0/-94.0
Darling we're killing the internet, 933 comments, rating +392.0/-83.0
State and T-killers, 752 comments, rating +83.0/-80.0
Why Windows does not rule in 2019, or CHADNT?, 881 comments, rating +123.0/-70.0
Functional programming: a stupid toy that kills productivity. Part 1, 394 comments, rating +100.0/-68.0
'Blow Job Article': Scientists processed 109 hours of oral sex to develop an AI that sucks cock, 361 comments, rating +240.0/-68.0
We wrote the most useful code in our lives, but it was thrown into the trash. Together with us, 259 comments, rating +101.0/-63.0
The sooner you forget OOP, the better for you and your programs, 1271 comments, rating +131.0/-63.0
I'll stop throwing good in the trash, 179 comments, rating +147.0/-62.0
Development King, 668 comments, rating +315.0/-60.0
Alcoholism in the last stage, 597 comments, rating +208.0/-60.0
I'm not real, 246 comments, rating +105.0/-59.0
Catch me if you can, 215 comments, rating +141.0/-58.0
A year driving an electric car, 1098 comments, rating +131.0/-58.0
Why should you learn Go, 70 comments, rating +76.0/-57.0
Psychiatrist's response to the article 'Sick and well', 272 comments, rating +154.0/-55.0
Petition to Apple, 96 comments, rating +90.0/-52.0
New programming languages are quietly killing our connection to reality, 764 comments, rating +164.0/-52.0
Five frightening trends in modern development, 262 comments, rating +95.0/-52.0

Uff. I have a few more interesting selections, but I won't bore readers.

Conclusion

When constructing the rating, I paid attention to two points that seemed interesting.

Firstly, after all, 60% of the top are articles of the “geektimes” genre. Whether there will be fewer of them next year, and how Habr will look like without articles about beer, space, medicine, and so on - I don’t know. Definitely, readers will miss something. Let's see.

Secondly, the bookmark top turned out to be unexpectedly high quality. This is psychologically understandable, readers may not pay attention to the rating, and if the article is needed, then it will be added to the bookmarks. And here is just the largest concentration of useful and serious articles. I think the site owners should somehow think about the connection between the number of bookmarks and the reward program if they want to increase this particular category of articles here on Habré.

Something like this. Hope it was informative.

The list of articles turned out to be long, well, it's probably for the best. Happy reading everyone.

Source: habr.com

Habrastatistics: how Habr lives without geektimes

Initial data

Performing the shaping

Rating

Conclusion

Add a comment Отменить ответ