Habrastatistics: exploring the most and least visited sections of the site

Hello, Habr.

In the previous part, Habr's traffic was analyzed by its main parameters: the number of articles, their views and their ratings. The question of how popular the individual sections of the site are, however, remained unexamined. It seemed interesting to look at this in more detail and find the most popular and the least popular hubs. Finally, I will take a closer look at the "geektimes effect" and finish with a new selection of the best articles based on the new rankings.


For those interested in what came of it, the continuation is under the cut.

Let me remind you once again that the statistics and ratings are unofficial; I have no insider information. There is also no guarantee that I did not make a mistake somewhere or miss something. Still, I think the result turned out interesting. We will start with the code; those who are not interested in it can skip the first sections.

Data collection

The first version of the parser only took into account the number of views, the comments and the article ratings. That is already useful, but it does not allow more complex queries. It is time to analyze the thematic sections of the site; this makes quite interesting research possible, for example seeing how the popularity of the "C++" section has changed over several years.

The article parser has been improved: it now returns the hubs an article belongs to, as well as the author's nickname and rating (a lot of interesting things can be done with these too, but that will come later). The data is saved in a CSV file that looks roughly like this:

2018-12-18T12:43Z,https://habr.com/ru/post/433550/,"Мессенджер Slack — причины выбора, косяки при внедрении и особенности сервиса, облегчающие жизнь",votes:7,votesplus:8,votesmin:1,bookmarks:32,
views:8300,comments:10,user:ReDisque,karma:5,subscribers:2,hubs:productpm+soft
...
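As a rough illustration of how one such record could be read back, here is a minimal sketch using only the standard library. The sample title is shortened, and the keys `link` and `title` are names I made up for the unlabeled columns; `parse_record` is a hypothetical helper, not part of the original parser.

```python
import csv
import io

# One record in the layout shown above (title shortened for the example).
SAMPLE = ('2018-12-18T12:43Z,https://habr.com/ru/post/433550/,"Slack, a review",'
          'votes:7,votesplus:8,votesmin:1,bookmarks:32,'
          'views:8300,comments:10,user:ReDisque,karma:5,subscribers:2,'
          'hubs:productpm+soft')


def parse_record(line: str) -> dict:
    """Split one CSV line into a dict; 'name:value' fields lose their prefix."""
    # csv.reader respects the quotes, so commas inside the title are safe.
    fields = next(csv.reader(io.StringIO(line)))
    # 'link' and 'title' are assumed names for the unlabeled columns.
    record = {"datetime": fields[0], "link": fields[1], "title": fields[2]}
    for field in fields[3:]:
        name, _, value = field.partition(":")
        # Whole numbers become int; everything else stays a string.
        record[name] = int(value) if value.lstrip("-").isdigit() else value
    return record


record = parse_record(SAMPLE)
print(record["views"], record["hubs"].split("+"))
```

Values that are not whole numbers (a fractional karma, for instance) are simply left as strings in this sketch.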

First we get a list of the main thematic hubs of the site.

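import requests

def get_as_str(link: str) -> Str:
    # Str is the str subclass from the earlier part (it adds find_between)
    try:
        r = requests.get(link)
        return Str(r.text)
    except Exception:
        # On any request error, fall back to an empty page
        return Str("")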

def get_hubs():
    hubs = []
    for p in range(1, 12):
        page_html = get_as_str("https://habr.com/ru/hubs/page%d/" % p)
        # page_html = get_as_str("https://habr.com/ru/hubs/geektimes/page%d/" % p)  # Geektimes
        # page_html = get_as_str("https://habr.com/ru/hubs/develop/page%d/" % p)  # Develop
        # page_html = get_as_str("https://habr.com/ru/hubs/admin/page%d" % p)  # Admin
        for hub in page_html.split("media-obj media-obj_hub"):
            info = Str(hub).find_between('"https://habr.com/ru/hub', 'list-snippet__tags') 
            if "*</span>" in info:
                hub_name = info.find_between('/', '/"')
                if len(hub_name) > 0 and len(hub_name) < 32:
                    hubs.append(hub_name)
    print(hubs)

The find_between function and the Str class extract the string between two tags; I used them earlier. Thematic hubs are marked with a "*", so they are easy to pick out, and you can uncomment the corresponding lines to get the sections of the other categories.
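The Str class itself is not shown in this part. A minimal sketch of what such a helper might look like follows; this is my assumption about its behavior (take the text between the first `start` marker and the next `end` marker, or return an empty string), not the author's actual implementation.

```python
class Str(str):
    """str subclass with a helper for cutting out text between two markers."""

    def find_between(self, start: str, end: str) -> "Str":
        # Locate the first occurrence of the start marker.
        begin = self.find(start)
        if begin < 0:
            return Str("")
        begin += len(start)
        # Search for the end marker only after the start marker.
        stop = self.find(end, begin)
        if stop < 0:
            return Str("")
        return Str(self[begin:stop])


html = Str('<a href="https://habr.com/ru/hub/python/" class="x">Python<span>*</span></a>')
hub_name = html.find_between('/hub/', '/"')
print(hub_name)
```

Returning a Str rather than a plain str lets calls be chained, which is how get_hubs uses it.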

The result of the get_hubs function is a rather impressive list, which we save as a set. I am deliberately giving the list in full so that you can appreciate its size.

hubs_profile = {'infosecurity', 'programming', 'webdev', 'python', 'sys_admin', 'it-infrastructure', 'devops', 'javascript', 'open_source', 'network_technologies', 'gamedev', 'cpp', 'machine_learning', 'pm', 'hr_management', 'linux', 'analysis_design', 'ui', 'net', 'hi', 'maths', 'mobile_dev', 'productpm', 'win_dev', 'it_testing', 'dev_management', 'algorithms', 'go', 'php', 'csharp', 'nix', 'data_visualization', 'web_testing', 's_admin', 'crazydev', 'data_mining', 'bigdata', 'c', 'java', 'usability', 'instant_messaging', 'gtd', 'system_programming', 'ios_dev', 'oop', 'nginx', 'kubernetes', 'sql', '3d_graphics', 'css', 'geo', 'image_processing', 'controllers', 'game_design', 'html5', 'community_management', 'electronics', 'android_dev', 'crypto', 'netdev', 'cisconetworks', 'db_admins', 'funcprog', 'wireless', 'dwh', 'linux_dev', 'assembler', 'reactjs', 'sales', 'microservices', 'search_technologies', 'compilers', 'virtualization', 'client_side_optimization', 'distributed_systems', 'api', 'media_management', 'complete_code', 'typescript', 'postgresql', 'rust', 'agile', 'refactoring', 'parallel_programming', 'mssql', 'game_promotion', 'robo_dev', 'reverse-engineering', 'web_analytics', 'unity', 'symfony', 'build_automation', 'swift', 'raspberrypi', 'web_design', 'kotlin', 'debug', 'pay_system', 'apps_design', 'git', 'shells', 'laravel', 'mobile_testing', 'openstreetmap', 'lua', 'vs', 'yii', 'sport_programming', 'service_desk', 'itstandarts', 'nodejs', 'data_warehouse', 'ctf', 'erp', 'video', 'mobileanalytics', 'ipv6', 'virus', 'crm', 'backup', 'mesh_networking', 'cad_cam', 'patents', 'cloud_computing', 'growthhacking', 'iot_dev', 'server_side_optimization', 'latex', 'natural_language_processing', 'scala', 'unreal_engine', 'mongodb', 'delphi',  'industrial_control_system', 'r', 'fpga', 'oracle', 'arduino', 'magento', 'ruby', 'nosql', 'flutter', 'xml', 'apache', 'sveltejs', 'devmail', 'ecommerce_development', 'opendata', 'Hadoop', 'yandex_api', 'game_monetization', 
'ror', 'graph_design', 'scada', 'mobile_monetization', 'sqlite', 'accessibility', 'saas', 'helpdesk', 'matlab', 'julia', 'aws', 'data_recovery', 'erlang', 'angular', 'osx_dev', 'dns', 'dart', 'vector_graphics', 'asp', 'domains', 'cvs', 'asterisk', 'iis', 'it_monetization', 'localization', 'objectivec', 'IPFS', 'jquery', 'lisp', 'arvrdev', 'powershell', 'd', 'conversion', 'animation', 'webgl', 'wordpress', 'elm', 'qt_software', 'google_api', 'groovy_grails', 'Sailfish_dev', 'Atlassian', 'desktop_environment', 'game_testing', 'mysql', 'ecm', 'cms', 'Xamarin', 'haskell', 'prototyping', 'sw', 'django', 'gradle', 'billing', 'tdd', 'openshift', 'canvas', 'map_api', 'vuejs', 'data_compression', 'tizen_dev', 'iptv', 'mono', 'labview', 'perl', 'AJAX', 'ms_access', 'gpgpu', 'infolust', 'microformats', 'facebook_api', 'vba', 'twitter_api', 'twisted', 'phalcon', 'joomla', 'action_script', 'flex', 'gtk', 'meteorjs', 'iconoskaz', 'cobol', 'cocoa', 'fortran', 'uml', 'codeigniter', 'prolog', 'mercurial', 'drupal', 'wp_dev', 'smallbasic', 'webassembly', 'cubrid', 'fido', 'bada_dev', 'cgi', 'extjs', 'zend_framework', 'typography', 'UEFI', 'geo_systems', 'vim', 'creative_commons', 'modx', 'derbyjs', 'xcode', 'greasemonkey', 'i2p', 'flash_platform', 'coffeescript', 'fsharp', 'clojure', 'puppet', 'forth', 'processing_lang', 'firebird', 'javame_dev', 'cakephp', 'google_cloud_vision_api', 'kohanaphp', 'elixirphoenix', 'eclipse', 'xslt', 'smalltalk', 'googlecloud', 'gae', 'mootools', 'emacs', 'flask', 'gwt', 'web_monetization', 'circuit-design', 'office365dev', 'haxe', 'doctrine', 'typo3', 'regex', 'solidity', 'brainfuck', 'sphinx', 'san', 'vk_api', 'ecommerce'}

For comparison, the geektimes sections look more modest:

hubs_gt = {'popular_science', 'history', 'soft', 'lifehacks', 'health', 'finance', 'artificial_intelligence', 'itcompanies', 'DIY', 'energy', 'transport', 'gadgets', 'social_networks', 'space', 'futurenow', 'it_bigraphy', 'antikvariat', 'games', 'hardware', 'learning_languages', 'urban', 'brain', 'internet_of_things', 'easyelectronics', 'cellular', 'physics', 'cryptocurrency', 'interviews', 'biotech', 'network_hardware', 'autogadgets', 'lasers', 'sound', 'home_automation', 'smartphones', 'statistics', 'robot', 'cpu', 'video_tech', 'Ecology', 'presentation', 'desktops', 'wearable_electronics', 'quantum', 'notebooks', 'cyberpunk', 'Peripheral', 'demoscene', 'copyright', 'astronomy', 'arvr', 'medgadgets', '3d-printers', 'Chemistry', 'storages', 'sci-fi', 'logic_games', 'office', 'tablets', 'displays', 'video_conferencing', 'videocards', 'photo', 'multicopters', 'supercomputers', 'telemedicine', 'cybersport', 'nano', 'crowdsourcing', 'infographics'}

The remaining hubs were saved in the same way. Now it is easy to write a function that tells whether an article belongs to geektimes or to a profile hub.

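from typing import List

def is_geektimes(hubs: List) -> bool:
    # True if at least one of the article's hubs is a geektimes hub
    return len(set(hubs) & hubs_gt) > 0

def is_geektimes_only(hubs: List) -> bool:
    # A geektimes article with no profile hub attached
    return is_geektimes(hubs) and not is_profile(hubs)

def is_profile(hubs: List) -> bool:
    # True if at least one of the article's hubs is a profile hub
    return len(set(hubs) & hubs_profile) > 0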

Similar functions were written for the other sections ("development", "administration", etc.).
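Those similar functions are not listed in the article. Here is a small sketch of how the whole family could share one helper. The three sets are tiny stand-ins for the full ones, and `sample_admin` in particular is hypothetical, since the hubs of the "administration" section are not given here.

```python
from typing import Iterable, Set

# Tiny stand-ins for the full hub sets listed above.
sample_gt: Set[str] = {"DIY", "space", "gadgets"}
sample_profile: Set[str] = {"cpp", "python", "controllers"}
# Hypothetical: the article does not list the hubs of the "administration" section.
sample_admin: Set[str] = {"sys_admin", "linux", "devops"}


def in_section(hubs: Iterable[str], section: Set[str]) -> bool:
    """True if at least one of the article's hubs belongs to the section."""
    return not section.isdisjoint(hubs)


def classify(hubs: Iterable[str]) -> dict:
    """Compute all section flags for one article."""
    hubs = list(hubs)
    gt = in_section(hubs, sample_gt)
    profile = in_section(hubs, sample_profile)
    return {
        "is_geektimes": gt,
        "is_profile": profile,
        "is_geektimes_only": gt and not profile,
        "is_admin": in_section(hubs, sample_admin),
    }


print(classify(["DIY", "controllers", "cpp"]))
print(classify(["space"]))
```

An article tagged "DIY" + "controllers" + "cpp" counts as both geektimes and profile, so it is not "geektimes only"; an article tagged only "space" is.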

Processing

It is time to start the analysis. We load the dataset and process the hub data.

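import datetime
from typing import List

import pandas as pd

def to_list(s: str) -> List[str]:
    # "hubs:popular_science+astronomy" => [popular_science, astronomy]
    return s.split(':')[1].split('+')

def to_date(dt: datetime.datetime) -> datetime.date:
    return dt.date()

# on_bad_lines='error' replaces error_bad_lines=True, which was removed in pandas 2.0
df = pd.read_csv("habr_2019.csv", sep=',', encoding='utf-8', on_bad_lines='error', quotechar='"', comment='#')
dates = pd.to_datetime(df['datetime'], format='%Y-%m-%dT%H:%MZ')
# Shift the UTC timestamps by three hours (presumably to Moscow time)
dates += datetime.timedelta(hours=3)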
df['date'] = dates.map(to_date, na_action=None)
hubs = df["hubs"].map(to_list, na_action=None)
df['hubs'] = hubs
df['is_profile'] = hubs.map(is_profile, na_action=None)
df['is_geektimes'] = hubs.map(is_geektimes, na_action=None)
df['is_geektimes_only'] = hubs.map(is_geektimes_only, na_action=None)
df['is_admin'] = hubs.map(is_admin, na_action=None)
df['is_develop'] = hubs.map(is_develop, na_action=None)
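To see what these transformations do without the real habr_2019.csv, here is a small sketch on a two-row DataFrame built in memory. The sample rows are invented, and treating the three-hour shift as a conversion to Moscow time is my assumption.

```python
import datetime

import pandas as pd

# Two invented rows in the same shape as the parsed CSV columns.
df = pd.DataFrame({
    "datetime": ["2018-12-31T22:30Z", "2019-01-01T09:00Z"],
    "hubs": ["hubs:popular_science+astronomy", "hubs:cpp"],
})

# Parse the UTC timestamps, then shift by three hours (assumed: Moscow time).
dates = pd.to_datetime(df["datetime"], format="%Y-%m-%dT%H:%MZ")
df["date"] = (dates + datetime.timedelta(hours=3)).dt.date

# "hubs:a+b" -> ["a", "b"]
df["hubs"] = df["hubs"].map(lambda s: s.split(":", 1)[1].split("+"))

print(df[["date", "hubs"]])
```

Note how the first article, posted late on December 31 in UTC, lands on January 1 after the shift; this matters when grouping by day.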

Now we can group the data by day and show the number of publications for the different hubs.

g = df.groupby(['date'])
days_count = g.size().reset_index(name='counts')
year_days = days_count['date'].values
grouped = g.sum().reset_index()
profile_per_day_avg = grouped['is_profile'].rolling(window=20, min_periods=1).mean()
geektimes_per_day_avg = grouped['is_geektimes'].rolling(window=20, min_periods=1).mean()
geektimesonly_per_day_avg = grouped['is_geektimes_only'].rolling(window=20, min_periods=1).mean()
admin_per_day_avg = grouped['is_admin'].rolling(window=20, min_periods=1).mean()
develop_per_day_avg = grouped['is_develop'].rolling(window=20, min_periods=1).mean()
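The same grouping and smoothing can be tried on a toy dataset. This is only a sketch with invented rows, and the window is 2 instead of 20 so that the effect is visible on three days of data.

```python
import datetime

import pandas as pd

df = pd.DataFrame({
    "date": [datetime.date(2019, 1, 1)] * 3
            + [datetime.date(2019, 1, 2)] * 2
            + [datetime.date(2019, 1, 3)],
    "is_profile":   [True, True, False, True, False, True],
    "is_geektimes": [False, True, True, False, True, False],
})

# Summing booleans counts the True values, i.e. articles per day in each section.
per_day = df.groupby("date")[["is_profile", "is_geektimes"]].sum()

# Trailing moving average; min_periods=1 lets the first days use a shorter window.
smoothed = per_day.rolling(window=2, min_periods=1).mean()
print(smoothed)
```

Selecting only the flag columns before calling sum() avoids summing text columns such as titles or links.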

We plot the number of published articles using Matplotlib:

[Figure: number of articles published per day by section, 20-day moving average]

I separated "geektimes" and "geektimes only" articles in the chart, because an article can belong to both sections at the same time (for example, "DIY" + "microcontrollers" + "C++"). I used the label "profile" to denote the profile articles of the site, although the English term profile may not be entirely correct for this.

In the previous part, the question came up about the "geektimes effect" connected with the change in the payment rules for geektimes articles starting this summer. Let's show the geektimes articles separately:

df_gt = df[(df['is_geektimes_only'] == True)]
group_gt = df_gt.groupby(['date'])
days_count_gt = group_gt.size().reset_index(name='counts')
grouped = group_gt.sum().reset_index()
year_days_gt = days_count_gt['date'].values
view_gt_per_day_avg = grouped['views'].rolling(window=20, min_periods=1).mean()

The result is interesting. The approximate ratio of views of geektimes articles to the total is somewhere around 1:5. But while the total number of views fluctuated noticeably, the views of "entertainment" articles stayed at roughly the same level.

[Figure: daily views of geektimes-only articles compared with total views]

You can also see that the total number of views of articles in the "geektimes" section did fall after the rules changed, but, judging by eye, by no more than 5% of the total.

It is interesting to look at the average number of views per article:

[Figure: average number of views per article]

For "entertainment" articles it is about 40% higher than average. This is probably not surprising. The dip at the beginning of April is unclear to me: maybe that is what really happened, or it is some kind of parsing error, or maybe one of the geektimes authors went on vacation ;).

By the way, the chart shows two more noticeable peaks in the number of article views: the New Year holidays and the May holidays.
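The code for the average-views series is not shown, so here is a minimal sketch of one way to compute it: total views per day divided by the number of articles per day, once for all articles and once for the "geektimes only" subset. The rows are invented and `avg_views_per_article` is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({
    "date":  ["2019-04-01", "2019-04-01", "2019-04-01", "2019-04-02", "2019-04-02"],
    "views": [1000, 3000, 8000, 2000, 6000],
    "is_geektimes_only": [False, False, True, False, True],
})


def avg_views_per_article(frame: pd.DataFrame) -> pd.Series:
    """Total views divided by the number of articles, for each day."""
    g = frame.groupby("date")["views"]
    return g.sum() / g.size()


all_avg = avg_views_per_article(df)
gt_avg = avg_views_per_article(df[df["is_geektimes_only"]])

# How many times more views a geektimes-only article gets than the average one.
ratio = gt_avg / all_avg
print(ratio)
```

On the real data, a ratio near 1.4 would correspond to the "about 40% higher" figure mentioned above.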

Hubs

Let's move on to the promised analysis of hubs. We list the top 20 hubs by number of views:

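import matplotlib.pyplot as plt

hubs_info = []
# hubs_all is assumed to hold the names of all hubs collected above
for hub_name in hubs_all: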
    mask = df['hubs'].apply(lambda x: hub_name in x)
    df_hub = df[mask]

    count, views = df_hub.shape[0], df_hub['views'].sum()
    hubs_info.append((hub_name, count, views))

# Draw hubs
hubs_top = sorted(hubs_info, key=lambda v: v[2], reverse=True)[:20]
top_views = list(map(lambda x: x[2], hubs_top))
top_names = list(map(lambda x: x[0], hubs_top))

plt.rcParams["figure.figsize"] = (8, 6)
plt.bar(range(0, len(top_views)), top_views)
plt.xticks(range(0, len(top_names)), top_names, rotation=90)
plt.ticklabel_format(style='plain', axis='y')
plt.tight_layout()
plt.show()

The result:

[Figure: top 20 hubs by number of views]

Surprisingly, the most popular hub in terms of views is "Information Security"; the top 5 leaders also include "Programming" and "Popular Science".

The anti-top is occupied by GTK and Cocoa.

[Figure: least viewed hubs]

I'll tell you a secret: the top hubs can also be seen here, although the number of views is not shown there.

Rating

And finally, the promised rating. With the hub analysis data, we can show the most popular articles in the most popular hubs for 2019.
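The selection code is not included in the article. One possible sketch is below; it ranks by views, which is my assumption, since the exact criterion behind the rating is not stated. The rows are invented and `top_articles` is a hypothetical helper.

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "views": [500, 9000, 1200, 300],
    "hubs":  [["python"], ["infosecurity", "python"], ["infosecurity"], ["gtk"]],
})


def top_articles(frame: pd.DataFrame, hub_name: str, n: int = 10) -> pd.DataFrame:
    """The n most viewed articles that list hub_name among their hubs."""
    mask = frame["hubs"].apply(lambda hubs: hub_name in hubs)
    return frame[mask].nlargest(n, "views")[["title", "views"]]


print(top_articles(df, "infosecurity", n=2))
print(top_articles(df, "gtk"))
```

A hub with a single article, like "gtk" in this toy data, trivially puts that article in first place.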

Information Security

Programming

Popular Science

Career

Legislation in IT

Web Development

GTK

And finally, so that no one is offended, I will give the rating of the least visited hub, "gtk". In a year, exactly one article was published in it, which also "automatically" takes the first line of the rating.

Conclusion

There will be no conclusion. Happy reading, everyone.

Source: www.habr.com
