Habrastatistics: how Habr lives without Geektimes

Hello, Habr.

This article is a logical continuation of the rating "The best Habr articles of 2018". The year is not over yet, but as you know, the rules changed over the summer, so it became interesting to see whether that had any effect.

Besides the statistics themselves, there is an updated rating of articles, plus some source code for those who are curious about how it works.

For those interested in the results, the continuation is under the cut. Those who want a more detailed analysis of the site's sections can also look at the next part.

Source data

This rating is unofficial, and I have no insider data. As you can easily see from the address bar of your browser, all articles on Habr are numbered sequentially. The rest is a matter of technique: we simply read every article in turn in a loop (in a single thread and with pauses, so as not to load the server). The values themselves were extracted by a simple parser written in Python (the source code is available here) and saved to a csv file that looks roughly like this:

2019-08-11T22:36Z,https://habr.com/ru/post/463197/,"Blazor + MVVM = Silverlight наносит ответный удар, потому что древнее зло непобедимо",votes:11,votesplus:17,votesmin:6,bookmarks:40,views:5300,comments:73
2019-08-11T05:26Z,https://habr.com/ru/news/t/463199/,"В NASA испытали систему автономного управления одного микроспутника другим",votes:15,votesplus:15,votesmin:0,bookmarks:2,views:1700,comments:7
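The collection loop described above might look roughly like the sketch below. It is not the author's parser: the function name `crawl_sequential`, the injected `fetch` callable and the in-memory `PAGES` dictionary are assumptions made for illustration, so that the loop can be run without touching the network.

```python
import time
from typing import Callable, Iterator, Optional, Tuple


def crawl_sequential(first_id: int,
                     last_id: int,
                     fetch: Callable[[int], Optional[str]],
                     delay_s: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep
                     ) -> Iterator[Tuple[int, str]]:
    """Visit ids first_id..last_id one by one, in a single thread.

    fetch(article_id) is a hypothetical callable that returns the page
    text, or None when the id has no public article (deleted, hidden,
    never published). A pause follows every request.
    """
    for article_id in range(first_id, last_id + 1):
        page = fetch(article_id)
        if page is not None:
            yield article_id, page
        sleep(delay_s)  # be polite to the server


# Stand-in for real HTTP requests: two of the three ids exist.
PAGES = {463197: "<html>A</html>", 463199: "<html>B</html>"}
pauses = []  # records the requested delays instead of really sleeping

found = list(crawl_sequential(463197, 463199, PAGES.get,
                              delay_s=0.5, sleep=pauses.append))
print(found)
print(pauses)
```

Passing `fetch` and `sleep` in as parameters keeps the loop itself trivial to test; a real run would plug in an HTTP client and leave `sleep` at its default.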

Processing

For the analysis we will use Python, Pandas and Matplotlib. Those who are not interested in statistics can skip this part and go straight to the articles.

First, the dataset has to be loaded into memory and the rows for the required year selected.

import pandas as pd
import datetime
import matplotlib.dates as mdates
from matplotlib.ticker import FormatStrFormatter
from pandas.plotting import register_matplotlib_converters

# Let matplotlib plot pandas/datetime values on date axes
register_matplotlib_converters()

# error_bad_lines was removed in pandas 2.0; malformed rows raise an error by default
df = pd.read_csv("habr.csv", sep=',', encoding='utf-8', quotechar='"', comment='#')
dates = pd.to_datetime(df['datetime'], format='%Y-%m-%dT%H:%MZ')
df['datetime'] = dates
year = 2019
# Keep rows in the half-open interval [Jan 1 of year, Jan 1 of year+1)
df = df[(df['datetime'] >= pd.Timestamp(datetime.date(year, 1, 1))) & (df['datetime'] < pd.Timestamp(datetime.date(year+1, 1, 1)))].copy()

print(df.shape)
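To see the year filter at work without the real `habr.csv`, here is a minimal sketch on a four-row in-memory sample. The `load_year` helper, the sample rows and the `example.org` links are invented for illustration; only the timestamp format and the half-open date interval follow the code above.

```python
import io

import pandas as pd

# Made-up rows in the same timestamp format as the real file.
SAMPLE = '''datetime,link,title,votes
2018-12-31T23:59Z,https://example.org/1,"Old, with a comma",votes:3
2019-01-01T00:00Z,https://example.org/2,"First of 2019",votes:5
2019-08-11T22:36Z,https://example.org/3,"Summer post",votes:11
2020-01-01T00:00Z,https://example.org/4,"Next year",votes:1
'''


def load_year(csv_source, year):
    """Read the csv and keep rows with start <= datetime < start of next year."""
    df = pd.read_csv(csv_source, sep=',', quotechar='"')
    df['datetime'] = pd.to_datetime(df['datetime'], format='%Y-%m-%dT%H:%MZ')
    start = pd.Timestamp(year=year, month=1, day=1)
    end = pd.Timestamp(year=year + 1, month=1, day=1)
    return df[(df['datetime'] >= start) & (df['datetime'] < end)].copy()


df_2019 = load_year(io.StringIO(SAMPLE), 2019)
print(df_2019.shape)
print(df_2019['title'].tolist())
```

The upper bound is exclusive, so a post stamped exactly at midnight on January 1 of the following year is left out, while the quoted title containing a comma is still read as a single field.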

It turns out that this year (even though it is not over yet), 12715 articles had been published at the time of writing. For comparison, the total for all of 2018 was 15904. That is a lot: about 43 articles per day. And this counts only articles with a positive rating; how many articles in total were submitted and then went negative or were deleted can only be guessed, or roughly estimated from the gaps between identifiers.

Let's select the required fields from the dataset. As metrics we will use the number of views, the number of comments, the rating value and the number of bookmarks.
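Both pieces of arithmetic in the paragraph above can be written down in a few lines. This is only a sketch: the helper names are made up, the first two identifiers come from the csv sample shown earlier and the third is invented, and a gap in the numbering is merely an upper bound on hidden articles, since the assumption that every missing id was once an article is not verified here.

```python
def articles_per_day(total_articles, days):
    """Average number of published articles per calendar day."""
    return total_articles / days


def count_id_gaps(article_ids):
    """Number of ids missing between the smallest and the largest seen id."""
    ids = sorted(set(article_ids))
    span = ids[-1] - ids[0] + 1  # how many ids the range could hold
    return span - len(ids)


rate_2018 = articles_per_day(15904, 365)
print(round(rate_2018, 1))

# 463197 and 463199 appear in the csv sample; 463203 is hypothetical.
gaps = count_id_gaps([463197, 463199, 463203])
print(gaps)
```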

def to_float(s):
    # "bookmarks:22" => 22.0
    # Only digits are kept, so a minus sign, if present, is discarded
    num = ''.join(i for i in s if i.isdigit())
    return float(num)

def to_int(s):
    # "bookmarks:22" => 22
    num = ''.join(i for i in s if i.isdigit())
    return int(num)

def to_date(dt):
    # Timestamp => datetime.date (drops the time of day)
    return dt.date()

# Convert the "name:value" strings to numbers
date = df['datetime'].map(to_date, na_action=None)
views = df["views"].map(to_int, na_action=None)
bookmarks = df["bookmarks"].map(to_int, na_action=None)
votes = df["votes"].map(to_float, na_action=None)
votes_up = df["up"].map(to_float, na_action=None)
votes_down = df["down"].map(to_float, na_action=None)
comments = df["comments"].map(to_int, na_action=None)

# Write the converted columns back into the dataset
df['date'] = date
df['views'] = views
df['votes'] = votes
df['bookmarks'] = bookmarks
df['up'] = votes_up
df['down'] = votes_down
df['comments'] = comments
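One caveat about the digit-only converters: if a field ever carried a sign, for example a hypothetical `votes:-5`, the minus would be lost. Whether the real file contains such values is not known from the sample, so the variant below is only a sketch of a sign-preserving alternative; `parse_metric` is an illustrative name, not part of the original parser.

```python
import re

# An optionally signed integer or decimal number.
_NUMBER = re.compile(r'[-+]?\d+(?:\.\d+)?')


def parse_metric(field):
    """Turn a "name:value" string into a float, keeping the sign.

    Only the text after the last colon is inspected, so digits that
    might appear in the name part are ignored.
    """
    _, _, value = field.rpartition(':')
    match = _NUMBER.search(value)
    if match is None:
        raise ValueError(f"no number found in {field!r}")
    return float(match.group())


print(parse_metric("bookmarks:22"))
print(parse_metric("votes:-5"))
```

It can be passed to `Series.map` in the same way as `to_float`.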

Now the converted values are in the dataset and we can work with them. Let's group the data by day, take the median of each metric per day, and smooth the result with a 20-day rolling mean.

g = df.groupby(['date'])
# Number of articles published on each day
days_count = g.size().reset_index(name='counts')
year_days = days_count['date'].values
# Median of each numeric metric per day; text columns are skipped
grouped = g.median(numeric_only=True).reset_index()
grouped['counts'] = days_count['counts']
# Raw daily values plus a 20-day rolling mean for each metric
counts_per_day = grouped['counts'].values
counts_per_day_avg = grouped['counts'].rolling(window=20).mean()
view_per_day = grouped['views'].values
view_per_day_avg = grouped['views'].rolling(window=20).mean()
votes_per_day = grouped['votes'].values
votes_per_day_avg = grouped['votes'].rolling(window=20).mean()
bookmarks_per_day = grouped['bookmarks'].values
bookmarks_per_day_avg = grouped['bookmarks'].rolling(window=20).mean()
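A toy example may make the grouping and smoothing easier to follow. The numbers below are invented, and the window is shortened to 3 days so the effect is visible on four dates; the calls are the same `groupby`, `median` and `rolling(...).mean()` used above.

```python
import pandas as pd

# Invented data: six articles spread over four days.
toy = pd.DataFrame({
    'date':  ['2019-01-01', '2019-01-01', '2019-01-01',
              '2019-01-02',
              '2019-01-03', '2019-01-03',
              '2019-01-04'],
    'views': [100, 300, 200, 400, 100, 500, 600],
})

g = toy.groupby('date')
daily = g['views'].median().reset_index()   # one row per day
daily['counts'] = g.size().values           # articles published that day
# The first window-1 rows have no full window yet, so they are NaN.
daily['views_avg'] = daily['views'].rolling(window=3).mean()

print(daily)
```

The leading NaN values are why the smoothed line on the graphs starts later than the bars: with `window=20` the first 19 days have no rolling mean.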

Now for the interesting part: we can look at the graphs.

Let's look at the number of publications on Habr in 2019.

import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (16, 8)
fig, ax = plt.subplots()

plt.bar(year_days, counts_per_day, label='Articles/day')
plt.plot(year_days, counts_per_day_avg, 'g-', label='Articles avg/day')
plt.xticks(rotation=45)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%Y"))  
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
plt.legend(loc='best')
plt.tight_layout()
plt.show()

The result is interesting. As you can see, Habr has been a little "bumpy" throughout the year. I don't know why.

[Figure: number of articles published per day on Habr, 2019]

For comparison, 2018 looks a little smoother:

[Figure: number of articles published per day on Habr, 2018]

Overall, I do not see any drastic drop in the number of published articles in 2019 on the graph. On the contrary, it even seems to have grown slightly since the summer.

But the next two graphs depress me a little more.

Average number of views per article:

[Figure: average number of views per article, 2019]

Average rating per article:

[Figure: average rating per article, 2019]

As you can see, the average number of views declines slightly over the year. This can be explained by the fact that new articles have not yet been indexed by search engines and are found less often. The drop in the average rating per article is harder to explain. The impression is that readers either do not have time to look through so many articles or do not pay attention to ratings. From the point of view of the author reward program, this trend is very unpleasant.

By the way, this did not happen in 2018, where the graph is more or less flat.

[Figure: average rating per article, 2018]

In general, the owners of the resource have something to think about.

But let's not dwell on sad things. Overall, we can say that Habr "survived" the summer changes quite successfully, and the number of articles on the site did not decrease.

Rating

Now, the rating itself. Congratulations to those who made it in. Let me remind you once again that the rating is unofficial and I may have missed something; if some article definitely belongs here but is missing, write to me and I will add it manually. As ranking criteria I use computed metrics, which I think turned out to be quite interesting.

Top articles by number of views

Top articles by rating-to-views ratio

Top articles by comments-to-views ratio

Top most controversial articles

Top articles by rating

Top articles by number of bookmarks

Top articles by bookmarks-to-views ratio

Top articles by number of comments

And finally, the last one: the anti-top by number of dislikes
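The computed metrics behind these lists are not shown in the article, so the following is only a sketch of how such rankings could be produced with pandas. The sample rows are invented, the helper names are illustrative, and the "controversy" score in particular (the smaller of the up and down vote counts) is an assumption, not the author's formula.

```python
import pandas as pd

# Invented articles using the column names from the processing step.
articles = pd.DataFrame({
    'title':     ['A', 'B', 'C', 'D'],
    'views':     [10000, 2000, 50000, 8000],
    'votes':     [50, 40, 100, 8],
    'up':        [60, 41, 150, 40],
    'down':      [10, 1, 50, 32],
    'bookmarks': [100, 90, 200, 16],
    'comments':  [20, 10, 400, 160],
})


def top_by_ratio(df, numerator, denominator, n=3, min_denominator=1):
    """Top n rows by numerator/denominator, skipping tiny denominators."""
    eligible = df[df[denominator] >= min_denominator]
    ratio = eligible[numerator] / eligible[denominator]
    return eligible.assign(ratio=ratio).nlargest(n, 'ratio')


def most_controversial(df, n=3):
    """Assumed score: an article is controversial when both the up and
    the down counts are high, i.e. min(up, down) is large."""
    score = df[['up', 'down']].min(axis=1)
    return df.assign(controversy=score).nlargest(n, 'controversy')


by_rating_per_view = top_by_ratio(articles, 'votes', 'views')['title'].tolist()
by_comments_per_view = top_by_ratio(articles, 'comments', 'views')['title'].tolist()
by_bookmarks_per_view = top_by_ratio(articles, 'bookmarks', 'views')['title'].tolist()
controversial = most_controversial(articles)['title'].tolist()
by_dislikes = articles.nlargest(3, 'down')['title'].tolist()

print(by_rating_per_view, by_comments_per_view,
      by_bookmarks_per_view, controversial, by_dislikes)
```

The `min_denominator` guard is there because ratios on articles with very few views are noisy; on real data a threshold well above 1 would probably make more sense.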

Phew. I have a few more interesting selections, but I will not bore the readers.

Conclusion

While building the rating, I noticed two points that seemed interesting.

First, 60% of the top still consists of "geektimes"-style articles. Whether there will be fewer of them next year, and what Habr will look like without articles about beer, space, medicine and so on, I do not know. Readers will certainly lose something. We'll see.

Second, the top by bookmarks turned out to be unexpectedly high in quality. This makes psychological sense: readers may not pay attention to the rating, but if they need an article, they add it to their bookmarks. And this is precisely where the concentration of useful and serious articles is highest. I think the site owners should consider tying the number of bookmarks to the reward program in some way if they want to grow this particular category of articles on Habr.

That's about it. I hope it was informative.

The list of articles turned out to be long, but that is probably for the better. Happy reading, everyone.

Source: www.hab.com