In the previous part, Habr's traffic was analyzed by its main parameters: the number of articles, their views and their ratings. However, the popularity of the site's individual sections remained unexplored. I wanted to look at this in more detail and find the most popular and the least popular hubs. Finally, I will take a closer look at the "geektimes effect" and finish with a new selection of the best articles based on the new rankings.
For those interested in what came of it, the rest is under the cut.
Let me remind you once again that these statistics and ratings are unofficial, and I have no insider information. There is also no guarantee that I have not made a mistake somewhere or missed something. Still, I think the result turned out interesting. We will start with the code; those who are not interested in it can skip the first sections.
Data collection
The first version of the parser only recorded the number of views, the number of comments and the article rating. That was already useful, but it does not allow more complex queries. It is time to look at the thematic sections of the site; this makes some interesting studies possible, for example seeing how the popularity of the "C++" section has changed over several years.
The article parser has been improved: it now also returns the hubs the article belongs to, along with the author's nickname and karma (a lot of interesting things can be done with these too, but that will come later). The data is saved in a csv file that looks something like this:
2018-12-18T12:43Z,https://habr.com/ru/post/433550/,"Мессенджер Slack — причины выбора, косяки при внедрении и особенности сервиса, облегчающие жизнь",votes:7,votesplus:8,votesmin:1,bookmarks:32,
views:8300,comments:10,user:ReDisque,karma:5,subscribers:2,hubs:productpm+soft
...
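To make the format easier to work with, here is a minimal sketch of how one such row could be turned into a Python dict. The function name parse_row and the choice of which fields are numeric are my assumptions, inferred only from the sample row above; I also assume that each record sits on a single line in the real file.

```python
import csv
import io

# Fields that look like integer counters in the sample row (an assumption).
INT_FIELDS = ("votes", "votesplus", "votesmin", "bookmarks",
              "views", "comments", "subscribers")


def parse_row(line: str) -> dict:
    """Parse one CSV line into a dict; the layout is inferred from the sample row."""
    # csv.reader handles the quoted title, which may itself contain commas.
    fields = next(csv.reader(io.StringIO(line)))
    record = {"date": fields[0], "url": fields[1], "title": fields[2]}
    for field in fields[3:]:
        key, sep, value = field.partition(":")
        if not sep:
            continue  # skip empty fields, e.g. after a trailing comma
        record[key] = value
    for key in INT_FIELDS:
        if key in record:
            record[key] = int(record[key])
    if "karma" in record:
        record["karma"] = float(record["karma"])
    # Hubs are joined with "+" in the file; split them into a list.
    hubs = record.get("hubs", "")
    record["hubs"] = hubs.split("+") if hubs else []
    return record
```

Keeping the hubs as a list makes the later per-hub grouping straightforward.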
Let's get a list of the site's main thematic hubs.
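import requests

# Str is the str helper class from the previous part; it provides find_between().

def get_as_str(link: str) -> Str:
    """Download a page and return its HTML, or an empty Str on a request error."""
    try:
        r = requests.get(link)
        return Str(r.text)
    except requests.RequestException:
        return Str("")

def get_hubs():
    """Collect the URL names of the thematic hubs from the hub listing pages."""
    hubs = []
    for p in range(1, 12):
        page_html = get_as_str("https://habr.com/ru/hubs/page%d/" % p)
        # page_html = get_as_str("https://habr.com/ru/hubs/geektimes/page%d/" % p)  # Geektimes
        # page_html = get_as_str("https://habr.com/ru/hubs/develop/page%d/" % p)  # Develop
        # page_html = get_as_str("https://habr.com/ru/hubs/admin/page%d" % p)  # Admin
        # Each hub on the page sits in its own "media-obj media-obj_hub" block.
        for hub in page_html.split("media-obj media-obj_hub"):
            info = Str(hub).find_between('"https://habr.com/ru/hub', 'list-snippet__tags')
            # Thematic (profile) hubs are marked with an asterisk.
            if "*</span>" in info:
                hub_name = info.find_between('/', '/"')
                if 0 < len(hub_name) < 32:
                    hubs.append(hub_name)
    print(hubs)
    return hubs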
The find_between function and the Str class extract the string between two tags; I used them earlier. Thematic hubs are marked with "*", so they are easy to pick out, and you can uncomment the corresponding lines to get the hubs of the other categories.
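The Str class itself is not shown in this part. For readers who want to run the code, here is a minimal sketch of what such a helper might look like; this is my reconstruction from how it is used above, not the original implementation, and the behaviour on a missing marker (returning an empty string) is an assumption.

```python
class Str(str):
    """A str subclass with a helper for cutting out text between two markers."""

    def find_between(self, start: str, end: str) -> "Str":
        """Return the text between the first `start` and the next `end`.

        Returns an empty Str if either marker is missing (an assumption).
        """
        begin = self.find(start)
        if begin == -1:
            return Str("")
        begin += len(start)
        stop = self.find(end, begin)
        if stop == -1:
            return Str("")
        # Slicing a str subclass yields a plain str, so wrap it again
        # to keep find_between() chainable.
        return Str(self[begin:stop])
```

Returning a Str rather than a plain str is what allows the chained call info.find_between(...) in get_hubs.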
The output of get_hubs is a rather impressive list, which we save as a dictionary. I am deliberately giving the list in full so that you can appreciate its size.
The other hubs were saved in the same way. Now it is easy to write a function that tells whether an article belongs to geektimes or to a profile hub.
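A minimal sketch of such a check, assuming the hub names have been saved as sets. The set contents below are short illustrative placeholders rather than the real lists, and the function names are hypothetical.

```python
# Illustrative placeholders only; the real sets come from get_hubs().
GEEKTIMES_HUBS = {"DIY", "popular_science", "gadgets"}
PROFILE_HUBS = {"cpp", "python", "controllers"}


def is_geektimes(hubs) -> bool:
    """True if at least one of the article's hubs is a geektimes hub."""
    return not GEEKTIMES_HUBS.isdisjoint(hubs)


def is_profile(hubs) -> bool:
    """True if at least one of the article's hubs is a profile hub."""
    return not PROFILE_HUBS.isdisjoint(hubs)


def is_geektimes_only(hubs) -> bool:
    """True if the article is in geektimes hubs and in no profile hub."""
    return is_geektimes(hubs) and not is_profile(hubs)
```

Here hubs is the list obtained by splitting the hubs field of a record on "+". An article can be both geektimes and profile at once, which is why the "only" variant is a separate check.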
We plot the number of published articles using Matplotlib:
I separated "geektimes" and "geektimes only" articles in the chart, because an article can belong to both kinds of sections at once (for example, "DIY" + "microcontrollers" + "C++"). I used the label "profile" for the site's specialized articles, although the English word "profile" may not be quite the right term for this.
In the previous part we asked about the "geektimes effect", which is related to the change in the payment rules for geektimes articles that took effect this summer. Let's plot the geektimes articles separately:
The result is interesting. The ratio of views of geektimes articles to total views is somewhere around 1:5. But while total views fluctuated noticeably, views of the "entertainment" articles stayed at roughly the same level.
You can also see that the total number of views of articles in the "geektimes" section did fall after the rules changed, but, judging by eye, by no more than 5% of the total.
It is interesting to look at the average number of views per article:
For "entertainment" articles it is about 40% above the average. This is probably not surprising. The dip at the beginning of April is unclear to me: maybe that is what really happened, or it is some kind of error, or maybe one of the geektimes authors went on vacation ;).
By the way, the graph shows two more noticeable peaks in article views: the New Year and the May holidays.
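For reference, the per-article average discussed above could be computed roughly as follows. This is only a sketch: it assumes each article is a dict with a "date" string in the format of the csv sample and an integer "views" field, and it groups by calendar day, which may not be the exact grouping used for the charts.

```python
from collections import defaultdict


def average_views_by_day(articles) -> dict:
    """Map each publication day (YYYY-MM-DD) to the mean views per article."""
    totals = defaultdict(lambda: [0, 0])  # day -> [sum of views, article count]
    for article in articles:
        day = article["date"][:10]  # "2018-12-18T12:43Z" -> "2018-12-18"
        totals[day][0] += article["views"]
        totals[day][1] += 1
    return {day: views / count for day, (views, count) in sorted(totals.items())}
```

Running it once on all articles and once on the geektimes subset gives two series that can be compared directly.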
Hubs
Let's move on to the promised analysis of the hubs. Here are the top 20 hubs by number of views:
Surprisingly, the most popular hub by views is "Information Security"; the top 5 also includes "Programming" and "Popular Science".
The anti-top is occupied by Gtk and Cocoa.
I'll let you in on a secret: the top hubs can also be seen here, although the number of views is not shown there.
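A ranking like the one above can be built by summing views per hub. The sketch below assumes each article is a dict with a "hubs" list and an integer "views" field, and that an article posted to several hubs counts in full towards each of them; both are my assumptions rather than a description of the exact method used here.

```python
from collections import Counter


def top_hubs_by_views(articles, n=20):
    """Return the n hubs with the most total views as (hub, views) pairs."""
    views = Counter()
    for article in articles:
        for hub in article["hubs"]:
            # An article in several hubs contributes its views to each one.
            views[hub] += article["views"]
    return views.most_common(n)
```

Calling top_hubs_by_views(articles, n=None) returns all hubs in descending order, so the end of that list gives the anti-top.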
Ratings
And finally, the promised ratings. Using the data from the hub analysis, we can show the most popular articles in the most popular hubs for this year, 2019.
And lastly, so that no one feels left out, here is the rating for the least visited hub, "gtk". Only one article was published there during the year, so it "automatically" takes first place in the rating.
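For completeness, here is a sketch of how such a per-hub rating could be selected. It assumes articles are dicts with "hubs", "date" and "views" fields as in the csv sample, and that the rating is ordered by views; the metric is my assumption, and the function name is hypothetical.

```python
def top_articles(articles, hub, year, n=10):
    """Return up to n articles from `hub` published in `year`, most viewed first."""
    prefix = str(year)
    selected = [
        a for a in articles
        if hub in a["hubs"] and a["date"].startswith(prefix)
    ]
    return sorted(selected, key=lambda a: a["views"], reverse=True)[:n]
```

For a hub with a single article in the year, the result is a one-element list, which is the "automatic" first place mentioned above.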