Habrastatistics: how Habr lives without geektimes

Hi Habr.

This article is a logical continuation of the rating of the best Habr articles of 2018. The year is not over yet, but as you know, the rules changed over the summer, so it became interesting to see whether this affected anything.

Besides the statistics themselves, there will be an updated rating of articles, as well as some source code for those who are curious how it all works.

For those interested in what came of it, the continuation is under the cut. Those interested in a more detailed analysis of the site's sections can also look at the next part.

Source data

This rating is unofficial, and I have no insider information. As you can easily see by looking at your browser's address bar, all articles on Habr are numbered consecutively. After that it is a matter of technique: we simply read all the articles one after another in a loop (in a single thread and with pauses, so as not to load the server). The values themselves were extracted with a simple parser in Python (the source is available here) and saved to a csv file that looks roughly like this:

2019-08-11T22:36Z,https://habr.com/ru/post/463197/,"Blazor + MVVM = Silverlight наносит ответный удар, потому что древнее зло непобедимо",votes:11,votesplus:17,votesmin:6,bookmarks:40,views:5300,comments:73
2019-08-11T05:26Z,https://habr.com/ru/news/t/463199/,"В NASA испытали систему автономного управления одного микроспутника другим",votes:15,votesplus:15,votesmin:0,bookmarks:2,views:1700,comments:7
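
The sequential crawl described above can be sketched roughly as follows. This is not the parser used for this article, only an illustration of the idea: `crawl`, `fetch` and `pause_sec` are hypothetical names, the URL pattern is taken from the first sample row (news items use a different path), and `fetch` is passed in from outside so that the sketch does not assume any particular HTTP library.

```python
import time
from typing import Callable, Iterator, Optional, Tuple


def crawl(first_id: int, last_id: int,
          fetch: Callable[[str], Optional[str]],
          pause_sec: float = 1.0) -> Iterator[Tuple[int, str]]:
    """Visit article ids one by one, in a single thread, pausing between requests.

    `fetch` is a hypothetical callable: it takes a URL and returns the page
    text, or None when there is no readable article under that id.
    """
    for article_id in range(first_id, last_id + 1):
        url = f"https://habr.com/ru/post/{article_id}/"
        page = fetch(url)
        if page is not None:
            yield article_id, page
        time.sleep(pause_sec)  # pause so as not to load the server
```

Each yielded page would then go to the parser that extracts the date, title and counters and appends one line to the csv file.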

Processing

For the analysis we will use Python, Pandas and Matplotlib. Those not interested in statistics can skip this part and go straight to the articles.

First, load the dataset into memory and select the data for the desired year.

import pandas as pd
import datetime
import matplotlib.dates as mdates
from matplotlib.ticker import FormatStrFormatter
from pandas.plotting import register_matplotlib_converters


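# The column names used below ("datetime", "views", "up", ...) are expected in the file's header row.
# on_bad_lines='error' replaces error_bad_lines=True, which was removed in pandas 2.0.
df = pd.read_csv("habr.csv", sep=',', encoding='utf-8', on_bad_lines='error', quotechar='"', comment='#')
dates = pd.to_datetime(df['datetime'], format='%Y-%m-%dT%H:%MZ')
df['datetime'] = dates
year = 2019
# .copy() makes the filtered frame independent, so new columns can be added to it safely
df = df[(df['datetime'] >= pd.Timestamp(datetime.date(year, 1, 1))) & (df['datetime'] < pd.Timestamp(datetime.date(year+1, 1, 1)))].copy()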

print(df.shape)

It turns out that this year (although it is not over yet) 12715 articles had been published at the time of writing. For comparison, the whole of 2018 had 15904. In general, that is a lot: about 43 articles per day. And this counts only articles with a positive rating; how many articles went negative or were deleted can only be guessed, or roughly estimated from the gaps between identifiers.
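
The estimate from gaps between identifiers could look something like this. It is only a sketch under my own assumptions: the helper names `article_id` and `count_missing_ids` are made up for illustration, and a gap only says that no article with that id was saved, not why.

```python
import re
from typing import Iterable

ID_RE = re.compile(r"/(\d+)/?$")


def article_id(url: str) -> int:
    """Extract the trailing numeric id from an article URL."""
    match = ID_RE.search(url)
    if match is None:
        raise ValueError(f"no numeric id in {url!r}")
    return int(match.group(1))


def count_missing_ids(urls: Iterable[str]) -> int:
    """Count ids between the smallest and largest seen id that are absent."""
    ids = {article_id(u) for u in urls}
    if not ids:
        return 0
    return (max(ids) - min(ids) + 1) - len(ids)


urls = [
    "https://habr.com/ru/post/463197/",
    "https://habr.com/ru/news/t/463199/",
]
print(count_missing_ids(urls))  # 1: id 463198 is not in the list
```

Run over the link column of the whole csv file, this gives a rough upper bound on the number of articles that did not make it into the dataset.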

Let's select the required fields from the dataset. As metrics we will use the number of views, the number of comments, the rating value and the number of bookmarks.

def to_float(s):
    # "bookmarks:22" => 22.0
    num = ''.join(i for i in s if i.isdigit())
    return float(num)

def to_int(s):
    # "bookmarks:22" => 22
    num = ''.join(i for i in s if i.isdigit())
    return int(num)

def to_date(dt):
    return dt.date() 

date = dates.map(to_date, na_action=None)
views = df["views"].map(to_int, na_action=None)
bookmarks = df["bookmarks"].map(to_int, na_action=None)
votes = df["votes"].map(to_float, na_action=None)
votes_up = df["up"].map(to_float, na_action=None)
votes_down = df["down"].map(to_float, na_action=None)
comments = df["comments"].map(to_int, na_action=None)

df['date'] = date
df['views'] = views
df['votes'] = votes
df['bookmarks'] = bookmarks
df['up'] = votes_up
df['down'] = votes_down
df['comments'] = comments  # the comment counts were parsed above but never stored
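
To make the conversion concrete, here is what the digit-only helper does with a few sample values. `to_int` repeats the helper from the article; `to_signed_float` is a hypothetical variant of my own, which matters only if the file ever contains a negative value, because the digit filter silently drops the minus sign.

```python
def to_int(s):
    # Same idea as in the article: keep only the digits. "bookmarks:22" => 22
    return int(''.join(ch for ch in s if ch.isdigit()))


def to_signed_float(s):
    # Hypothetical variant: take everything after the last ':' so a minus sign survives.
    return float(s.rsplit(':', 1)[-1])


print(to_int("views:5300"))          # 5300
print(to_int("votes:-3"))            # 3, the minus sign is dropped
print(to_signed_float("votes:-3"))   # -3.0
print(to_signed_float("votes:11"))   # 11.0
```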

Now the data has been added to the dataset and we can use it. Let's group the data by day and take the average values (the code uses the daily median).

g = df.groupby(['date'])
days_count = g.size().reset_index(name='counts')
year_days = days_count['date'].values
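# numeric_only=True skips the text columns (title, link), which recent pandas versions refuse to take a median of
grouped = g.median(numeric_only=True).reset_index()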
grouped['counts'] = days_count['counts']
counts_per_day = grouped['counts'].values
counts_per_day_avg = grouped['counts'].rolling(window=20).mean()
view_per_day = grouped['views'].values
view_per_day_avg = grouped['views'].rolling(window=20).mean()
votes_per_day = grouped['votes'].values
votes_per_day_avg = grouped['votes'].rolling(window=20).mean()
bookmarks_per_day = grouped['bookmarks'].values
bookmarks_per_day_avg = grouped['bookmarks'].rolling(window=20).mean()
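
The grouping and smoothing above are easier to follow on a tiny example. The numbers below are made up, and the window is shortened from 20 days to 2 so that the effect is visible on three days of data.

```python
import datetime
import pandas as pd

# Made-up data: three articles on Jan 1, two on Jan 2, one on Jan 3.
toy = pd.DataFrame({
    "date": [datetime.date(2019, 1, 1)] * 3
            + [datetime.date(2019, 1, 2)] * 2
            + [datetime.date(2019, 1, 3)],
    "views": [100, 300, 1100, 200, 400, 500],
})

g = toy.groupby("date")
per_day = g["views"].median().reset_index()      # one row per day
per_day["counts"] = g.size().values              # articles published that day
per_day["views_avg"] = per_day["views"].rolling(window=2).mean()
print(per_day)
```

The first value of `views_avg` is NaN because the window is not yet full. With `window=20`, as in the article's code, the first 19 days have no smoothed value, so the smoothed line on the graphs starts almost three weeks into the year.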

Now for the most interesting part: we can look at the graphs.

Let's look at the number of publications on Habr in 2019.

import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (16, 8)
fig, ax = plt.subplots()

plt.bar(year_days, counts_per_day, label='Articles/day')
plt.plot(year_days, counts_per_day_avg, 'g-', label='Articles avg/day')
plt.xticks(rotation=45)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%Y"))  
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
plt.legend(loc='best')
plt.tight_layout()
plt.show()

The result is interesting. As you can see, Habr was a bit "jittery" throughout the year. I don't know why.

[Figure: number of articles per day on Habr, 2019]

For comparison, 2018 looks a bit smoother:

[Figure: number of articles per day on Habr, 2018]

Overall, I did not see any drastic drop in the number of published articles in 2019 on the graph. On the contrary, it even seems to have grown slightly since the summer.

But the next two graphs upset me a little more.

Average number of views per article:

[Figure: average number of views per article, 2019]

Average rating per article:

[Figure: average rating per article, 2019]

As you can see, the average number of views declines slightly over the year. This can be explained by the fact that new articles have not yet been indexed by search engines and are not found as often. But the decline in the average rating per article is harder to explain. The impression is that readers either do not have time to look through so many articles or simply do not pay attention to ratings. From the point of view of the author reward programme, this trend is rather unpleasant.

By the way, this did not happen in 2018, and the graph is more or less flat.

[Figure: the same graph for 2018]

In general, the owners of the resource have something to think about.

But let's not dwell on sad things. Overall, we can say that Habr "survived" the summer changes quite successfully, and the number of articles on the site did not decrease.

Rating

Now, the rating itself. Congratulations to those who made it in. Let me remind you once again that the rating is unofficial. I may have missed something, and if some article definitely belongs here but is absent, write to me and I will add it manually. As the rating I use calculated metrics, which I think turned out quite interesting.
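
For those curious how such calculated metrics can be built, here is a minimal sketch on made-up numbers. The helper `top`, the derived column names and especially the definition of "controversial" (many votes on both sides, measured as the smaller of the up and down counts) are assumptions made for illustration, not necessarily the formulas behind the lists below.

```python
import pandas as pd

# Made-up numbers, only to show the mechanics.
articles = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "views": [50000, 8000, 20000, 3000],
    "votes": [40, 90, 10, 30],
    "up": [60, 95, 45, 30],
    "down": [20, 5, 35, 0],
    "bookmarks": [100, 400, 20, 15],
})


def top(df, column, n=3):
    """Return the n rows with the largest value in `column`."""
    return df.nlargest(n, column)[["title", column]]


articles["votes_per_view"] = articles["votes"] / articles["views"]
articles["bookmarks_per_view"] = articles["bookmarks"] / articles["views"]
# Assumed definition of "controversial": many votes on both sides.
articles["controversy"] = articles[["up", "down"]].min(axis=1)

print(top(articles, "views"))
print(top(articles, "votes_per_view"))
print(top(articles, "controversy"))
```

Ratios such as rating per view favour articles that were liked by a large share of those who opened them, which is why they produce quite different lists from the plain counters.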

Top articles by number of views

Top articles by rating-to-views ratio

Top articles by comments-to-views ratio

Top most controversial articles

Top articles by rating

Top articles by number of bookmarks

Top articles by bookmarks-to-views ratio

Top articles by number of comments

And finally, the last one: Antitop by number of dislikes

Phew. I have a few more interesting selections, but I won't bore the readers.

Conclusion

While building the rating, I noticed two points that seemed interesting.

First, 60% of the top still consists of "geektimes"-type articles. Whether there will be fewer of them next year, and what Habr will look like without articles about beer, space, medicine and so on, I don't know. Readers will certainly lose something. We'll see.

Second, the top by bookmarks turned out to be of unexpectedly high quality. This is psychologically understandable: readers may pay no attention to the rating, but if an article is needed, it gets added to bookmarks. And this is exactly where the highest concentration of useful and serious articles is found. I think the site owners should think through the connection between the number of bookmarks and the reward programme if they want to grow this particular category of articles here on Habr.

Something like that. I hope it was informative.

The list of articles turned out long, but that is probably for the best. Happy reading, everyone.

source: www.habr.com