How to build your own autoscaler for a cluster

Hello! We train people to work with big data. It is impossible to imagine an educational program on big data without its own cluster, where all participants work together. For this reason, our program always has one 🙂 We take care of configuring, tuning and administering it, while the students launch MapReduce jobs directly on it and use Spark.

In this post we will tell you how we solved the problem of uneven cluster load by writing our own autoscaler on top of the Mail.ru Cloud Solutions cloud.

The problem

Our cluster is not used in a typical way. Utilization is highly uneven. For example, there are practical classes, when all 30 people and the teacher log in to the cluster and start using it. Then there are the days before a deadline, when the load rises sharply. The rest of the time the cluster runs underloaded.

Solution #1 is to keep a cluster that can withstand peak loads but sits idle the rest of the time.

Solution #2 is to keep a small cluster and manually add nodes to it before classes and during peak loads.

Solution #3 is to keep a small cluster and write an autoscaler that monitors the current cluster load and, using various APIs, adds nodes to the cluster and removes them from it.

In this post we will talk about solution #3. Such an autoscaler depends heavily on external factors rather than internal ones, and providers often do not offer it. We use the Mail.ru Cloud Solutions cloud infrastructure and wrote our autoscaler using the MCS API. And since we teach how to work with data, we decided to show how you can write a similar autoscaler for your own purposes and use it with your own cloud.

Prerequisites

First, you need a Hadoop cluster. We, for example, use the HDP distribution.

For nodes to be added and removed quickly, roles must be distributed among the nodes in a particular way.

  1. Master node. There is not much to explain here: it is the main node of the cluster, where, for example, the Spark driver is launched if you use interactive mode.
  2. Data node. This is a node where you store data on HDFS and where computations take place.
  3. Compute node. This is a node where you store nothing on HDFS, but where computations take place.

An important point: autoscaling is done with nodes of the third type. If you start removing and adding nodes of the second type, the response will be very slow, because decommissioning and recommissioning will take hours on your cluster. That, of course, is not what you expect from autoscaling. So we do not touch nodes of the first and second types. They form a minimum viable cluster that exists for the whole duration of the program.

So, our autoscaler is written in Python 3. It uses the Ambari API to manage cluster services and the Mail.ru Cloud Solutions (MCS) API to start and stop machines.

Solution architecture

  1. The autoscaler.py module. It contains three classes: 1) functions for working with Ambari, 2) functions for working with MCS, 3) functions directly related to the autoscaler logic.
  2. The observer.py script. In essence it consists of various rules: when and at which moments to call the autoscaler functions.
  3. The config.py configuration file. It contains, for example, the list of nodes allowed for autoscaling and other parameters that affect, for example, how long to wait after a new node has been added. It also holds the timestamps of class start times, so that the maximum allowed cluster configuration is launched before a class.
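
The post does not show config.py itself, so here is a minimal sketch of what such a file might contain. Only the names scaling_hosts, scale_up_thresholds and cooldown_period appear in the code later in this post; the other names, the structure of the thresholds and all values are assumptions made for illustration.

```python
# config.py: hypothetical sketch, every value below is a placeholder
from datetime import datetime

# Compute nodes (type 3) that the autoscaler is allowed to add and remove.
scaling_hosts = [
    "compute-1.example.internal",
    "compute-2.example.internal",
]

# Share of YARN resources in use that triggers scaling.
# The dict layout and scale_down_thresholds are assumptions.
scale_up_thresholds = {"ram": 0.8, "cpu": 0.8}
scale_down_thresholds = {"ram": 0.3, "cpu": 0.3}

# Minutes to wait after a scaling action before taking another one.
cooldown_period = 10

# Class start times (assumed name); the cluster is brought to its
# maximum allowed size before each of them.
class_start_times = [
    datetime(2020, 3, 14, 19, 0),
]
```

Keeping these values in a separate module means observer.py can read them as config.scaling_hosts, config.cooldown_period and so on.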

Now let's look at pieces of code from the first two files.

1. The autoscaler.py module

The Ambari class

This is what a piece of code containing the Ambari class looks like:

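# Imports used by the three classes of autoscaler.py shown in this post
import json
import logging
import time
from collections import deque

import requests


class Ambari: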
    def __init__(self, ambari_url, cluster_name, headers, auth):
        self.ambari_url = ambari_url
        self.cluster_name = cluster_name
        self.headers = headers
        self.auth = auth

    def stop_all_services(self, hostname):
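        # url addresses the components on the host, url2 the host itself
        url = self.ambari_url + self.cluster_name + '/hosts/' + hostname + '/host_components/'
        url2 = self.ambari_url + self.cluster_name + '/hosts/' + hostname
        # First request: list the components deployed on this host
        req0 = requests.get(url2, headers=self.headers, auth=self.auth)
        services = req0.json()['host_components']
        services_list = [x['HostRoles']['component_name'] for x in services]
        # Second request (below): ask Ambari to move them to the INSTALLED (stopped) state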
        data = {
            "RequestInfo": {
                "context":"Stop All Host Components",
                "operation_level": {
                    "level":"HOST",
                    "cluster_name": self.cluster_name,
                    "host_names": hostname
                },
                "query":"HostRoles/component_name.in({0})".format(",".join(services_list))
            },
            "Body": {
                "HostRoles": {
                    "state":"INSTALLED"
                }
            }
        }
        req = requests.put(url, data=json.dumps(data), headers=self.headers, auth=self.auth)
        if req.status_code in [200, 201, 202]:
            message = 'Request accepted'
        else:
            message = req.status_code
        return message

Above, as an example, you can see the implementation of the stop_all_services function, which stops all services on the desired cluster node.

The Ambari class takes the following as input:

  • ambari_url, for example 'http://localhost:8080/api/v1/clusters/',
  • cluster_name, the name of your cluster in Ambari,
  • headers = {'X-Requested-By': 'ambari'},
  • and auth, which holds your Ambari login and password: auth = ('login', 'password').

The function itself is nothing more than a couple of calls to Ambari through its REST API. Logically, we first get the list of services running on the node, and then ask, for the given cluster and the given node, to move the services from that list to the INSTALLED state. The functions that start all services, put nodes into the Maintenance state and so on look similar: they are just a few requests through the API.
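
To illustrate how similar these functions are, here is a sketch that builds the request body for both stopping and starting components. The helper build_state_request and the sample cluster, host and component names are hypothetical, and using the STARTED state to start components is our assumption about the Ambari API rather than code from the original module.

```python
def build_state_request(cluster_name, hostname, components, state, context):
    """Build the JSON body for a PUT to .../hosts/<hostname>/host_components/.

    state is the desired component state, e.g. 'INSTALLED' (stopped)
    or 'STARTED' (running).
    """
    return {
        "RequestInfo": {
            "context": context,
            "operation_level": {
                "level": "HOST",
                "cluster_name": cluster_name,
                "host_names": hostname,
            },
            # Restrict the request to the listed components only
            "query": "HostRoles/component_name.in({0})".format(",".join(components)),
        },
        "Body": {"HostRoles": {"state": state}},
    }


components = ["NODEMANAGER", "METRICS_MONITOR"]
stop_body = build_state_request(
    "my_cluster", "compute-1", components, "INSTALLED", "Stop All Host Components")
start_body = build_state_request(
    "my_cluster", "compute-1", components, "STARTED", "Start All Host Components")
```

The two bodies differ only in the context string and the target state, which is why the corresponding methods of the class end up looking almost identical.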

The Mcs class

This is what a piece of code containing the Mcs class looks like:

class Mcs:
    def __init__(self, id1, id2, password):
        self.id1 = id1
        self.id2 = id2
        self.password = password
        self.mcs_host = 'https://infra.mail.ru:8774/v2.1'

    def vm_turn_on(self, hostname):
        self.token = self.get_mcs_token()
        host = self.hostname_to_vmname(hostname)
        vm_id = self.get_vm_id(host)
        # vm_id is a local variable here, not an attribute of the instance
        mcs_url1 = self.mcs_host + '/servers/' + vm_id + '/action'
        headers = {
            'X-Auth-Token': '{0}'.format(self.token),
            'Content-Type': 'application/json'
        }
        data = {'os-start' : 'null'}
        mcs = requests.post(mcs_url1, data=json.dumps(data), headers=headers)
        return mcs.status_code

The Mcs class takes as input the user id (id1), the id of the project inside the cloud (id2), and the user's password. In the vm_turn_on function we want to turn on one of the machines. The logic here is a bit more complicated. At the beginning of the code three other functions are called: 1) we need to get a token, 2) we need to convert the hostname into the name of the machine in MCS, 3) we need to get the id of that machine. After that we simply make a POST request and start the machine.

This is what the function for obtaining a token looks like:

    def get_mcs_token(self):
        # nocatalog is passed via params below, so it is not repeated in the URL
        url = 'https://infra.mail.ru:35357/v3/auth/tokens'
        headers = {'Content-Type': 'application/json'}
        data = {
            'auth': {
                'identity': {
                    'methods': ['password'],
                    'password': {
                        'user': {
                            'id': self.id1,
                            'password': self.password
                        }
                    }
                },
                'scope': {
                    'project': {
                        'id': self.id2
                    }
                }
            }
        }
        params = (('nocatalog', ''),)
        req = requests.post(url, data=json.dumps(data), headers=headers, params=params)
        self.token = req.headers['X-Subject-Token']
        return self.token
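
Steps 2 and 3 of vm_turn_on (converting the hostname to a machine name and looking up its id) are not shown in this post. Below is a rough sketch of how they might work. It assumes that the machine name in the cloud equals the short hostname and that the server list has the shape {"servers": [{"id": ..., "name": ...}]}, as in the OpenStack Compute API; both are assumptions, and find_vm_id is a hypothetical helper rather than the get_vm_id method from the module.

```python
def hostname_to_vmname(hostname):
    # Assumption: the VM name in the cloud is the hostname without its domain
    return hostname.split(".")[0]


def find_vm_id(servers_response, vm_name):
    """Return the id of the server called vm_name, or None if it is absent.

    servers_response is the parsed JSON of a server listing, assumed
    to look like {"servers": [{"id": "...", "name": "..."}, ...]}.
    """
    for server in servers_response.get("servers", []):
        if server.get("name") == vm_name:
            return server["id"]
    return None


# Sample data standing in for a real API response
sample = {
    "servers": [
        {"id": "a1b2", "name": "compute-1"},
        {"id": "c3d4", "name": "compute-2"},
    ]
}
vm_id = find_vm_id(sample, hostname_to_vmname("compute-2.example.internal"))
```

Returning None for an unknown machine lets the caller decide whether to skip the host or raise an error, instead of failing while building the URL.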

The Autoscaler class

This class contains the functions related to the operating logic itself.

This is what a piece of code from this class looks like:

class Autoscaler:
    def __init__(self, ambari, mcs, scaling_hosts, yarn_ram_per_node, yarn_cpu_per_node):
        self.scaling_hosts = scaling_hosts
        self.ambari = ambari
        self.mcs = mcs
        self.q_ram = deque()
        self.q_cpu = deque()
        self.num = 0
        self.yarn_ram_per_node = yarn_ram_per_node
        self.yarn_cpu_per_node = yarn_cpu_per_node

    def scale_down(self, hostname):
        # Default result, returned when hostname is not in scaling_hosts
        message = None
        flag1 = flag2 = flag3 = flag4 = flag5 = False
        if hostname in self.scaling_hosts:
            while True:
                time.sleep(5)
                status1 = self.ambari.decommission_nodemanager(hostname)
                if status1 == 'Request accepted' or status1 == 500:
                    flag1 = True
                    logging.info('Decommission request accepted: {0}'.format(flag1))
                    break
            while True:
                time.sleep(5)
                status3 = self.ambari.check_service(hostname, 'NODEMANAGER')
                if status3 == 'INSTALLED':
                    flag3 = True
                    logging.info('NodeManager decommissioned: {0}'.format(flag3))
                    break
            while True:
                time.sleep(5)
                status2 = self.ambari.maintenance_on(hostname)
                if status2 == 'Request accepted' or status2 == 500:
                    flag2 = True
                    logging.info('Maintenance request accepted: {0}'.format(flag2))
                    break
            while True:
                time.sleep(5)
                status4 = self.ambari.check_maintenance(hostname, 'NODEMANAGER')
                if status4 == 'ON' or status4 == 'IMPLIED_FROM_HOST':
                    flag4 = True
                    self.ambari.stop_all_services(hostname)
                    logging.info('Maintenance is on: {0}'.format(flag4))
                    logging.info('Stopping services')
                    break
            time.sleep(90)
            status5 = self.mcs.vm_turn_off(hostname)
            while True:
                time.sleep(5)
                status5 = self.mcs.get_vm_info(hostname)['server']['status']
                if status5 == 'SHUTOFF':
                    flag5 = True
                    logging.info('VM is turned off: {0}'.format(flag5))
                    break
            if flag1 and flag2 and flag3 and flag4 and flag5:
                message = 'Success'
                logging.info('Scale-down finished')
                logging.info('Cooldown period has started. Wait for several minutes')
        return message

As input we take the Ambari and Mcs classes, the list of nodes allowed for scaling, and the node configuration parameters: the memory and CPU allocated to a node in YARN. There are also two internal parameters, q_ram and q_cpu, which are queues. We use them to store the values of the current cluster load. If we see that the load has been consistently high over the last 5 minutes, we decide that we need to add one more node to the cluster. The same applies when the cluster is underutilized.

The code above is an example of a function that removes a machine from the cluster and stops it in the cloud. First the YARN NodeManager is decommissioned, then Maintenance mode is turned on, then we stop all services on the machine and shut down the virtual machine in the cloud.
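
The queue-based decision described earlier (add a node when the load has stayed high for the last 5 minutes) can be sketched with a bounded deque. The LoadWindow class, the sampling interval and the thresholds are assumptions for illustration, not the logic of the original module.

```python
from collections import deque


class LoadWindow:
    """Keeps the most recent load samples and answers threshold questions."""

    def __init__(self, samples):
        # A deque with maxlen drops the oldest sample automatically
        self.values = deque(maxlen=samples)

    def add(self, value):
        self.values.append(value)

    def is_full(self):
        return len(self.values) == self.values.maxlen

    def all_above(self, threshold):
        # Only decide once the window covers the whole observation period
        return self.is_full() and all(v > threshold for v in self.values)

    def all_below(self, threshold):
        return self.is_full() and all(v < threshold for v in self.values)


# Assumption: one sample every 30 seconds, so 10 samples cover 5 minutes
q_ram = LoadWindow(samples=10)
for used in [0.85, 0.9, 0.88, 0.92, 0.95, 0.91, 0.9, 0.87, 0.93, 0.96]:
    q_ram.add(used)

scale_up = q_ram.all_above(0.8)     # load stayed high for the whole window
scale_down = q_ram.all_below(0.3)   # load stayed low for the whole window
```

Requiring every sample in the window to cross the threshold keeps a short spike from triggering a scale-up, at the price of reacting a few minutes later.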

2. The observer.py script

Sample code from it:

# scaler, cloud, config and webhook are defined earlier in observer.py
if scaler.assert_up(config.scale_up_thresholds):
    hostname = cloud.get_vm_to_up(config.scaling_hosts)
    if hostname is not None:
        status1 = scaler.scale_up(hostname)
        if status1 == 'Success':
            # Report the new node to Slack, then pause for the cooldown period
            post = {"text": "{0} has been successfully scaled-up".format(hostname)}
            json_data = json.dumps(post)
            req = requests.post(webhook, data=json_data.encode('ascii'), headers={'Content-Type': 'application/json'})
            time.sleep(config.cooldown_period*60)

In it, we check whether the conditions for increasing cluster capacity have been met and whether there are any machines in reserve, get the hostname of one of them, add it to the cluster and post a message about it to our team's Slack. After that the cooldown_period begins, during which we neither add nor remove anything from the cluster and simply monitor the load. If the load has stabilized and lies within the corridor of optimal values, we just keep monitoring. If one node was not enough, we add another.

When a class is coming up, we already know for certain that one node will not be enough, so we immediately start all free nodes and keep them active until the end of the class. This is done using a list of class timestamps.
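
As a rough illustration of the timestamp check, the sketch below decides whether the current moment falls inside a class window. The function name, the 30-minute lead time and the 3-hour class duration are all assumptions, not values from our configuration.

```python
from datetime import datetime, timedelta


def class_window_active(now, class_starts,
                        lead=timedelta(minutes=30),
                        duration=timedelta(hours=3)):
    """Return True if now lies between (start - lead) and (start + duration)
    for at least one class start time."""
    return any(start - lead <= now <= start + duration for start in class_starts)


starts = [datetime(2020, 3, 14, 19, 0)]

# 15 minutes before the class: the cluster should already be at full size
before_class = class_window_active(datetime(2020, 3, 14, 18, 45), starts)
# Midday, no class nearby: regular load-based scaling applies
midday = class_window_active(datetime(2020, 3, 14, 12, 0), starts)
```

While the function returns True, the observer would start all free nodes and skip scale-down; once it returns False, the usual threshold rules take over again.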

Conclusion

An autoscaler is a good and convenient solution for cases where your cluster load is uneven. You get the cluster configuration you need for peak loads and at the same time do not keep that cluster running during periods of low load, which saves money. And all of this happens automatically, without your involvement. The autoscaler itself is nothing more than a set of requests to the cluster manager API and the cloud provider API, written according to a certain logic. What you definitely need to remember is the division of nodes into three types, as described earlier. And you will be happy.

Source: www.habr.com
