How to build your own autoscaler for a cluster

Hello! We teach people to work with big data. It is hard to imagine a big data training program without its own cluster, one that all participants work on together. For this reason, our program always has one 🙂 We handle its configuration, tuning and administration, while the students launch MapReduce jobs directly on it and use Spark.

In this post we explain how we solved the problem of uneven cluster load by writing our own autoscaler on top of the Mail.ru Cloud Solutions cloud.

Problem

Our cluster is not used in a typical way. Its utilization is highly uneven. For example, there are practical classes, when all 30 students and the teacher log in to the cluster and start using it. There are also days before a deadline, when the load rises sharply. The rest of the time the cluster is underloaded.

Solution #1 is to keep a cluster that can withstand peak loads, but it will sit idle the rest of the time.

Solution #2 is to keep a small cluster and manually add nodes to it before classes and during peak loads.

Solution #3 is to keep a small cluster and write an autoscaler that monitors the current cluster load and, using various APIs, adds nodes to the cluster and removes them from it.

In this post we discuss solution #3. Such an autoscaler depends heavily on external factors rather than internal ones, and cloud providers often do not offer one. We use the Mail.ru Cloud Solutions cloud infrastructure and wrote our autoscaler against the MCS API. Since we teach how to work with data, we decided to show how you can write a similar autoscaler for your own purposes and use it with your own cloud.

Prerequisites

First, you need a Hadoop cluster. We, for example, use the HDP distribution.

For nodes to be added and removed quickly, roles must be distributed among the nodes in a particular way.

  1. Master node. There is not much to explain here: it is the main node of the cluster, where, for example, the Spark driver is launched if you use interactive mode.
  2. Data node. This is a node where you store data on HDFS and where computation also takes place.
  3. Compute node. This is a node where you store nothing on HDFS, but where computation takes place.

An important point. Autoscaling is done with nodes of the third type. If you start removing and adding nodes of the second type, the response will be very slow: decommissioning and recommissioning take hours on a cluster. That, of course, is not what you expect from autoscaling. In other words, we do not touch nodes of the first and second types. They form the minimum viable cluster that exists for the whole duration of the program.

So, our autoscaler is written in Python 3. It uses the Ambari API to manage cluster services and the Mail.ru Cloud Solutions (MCS) API to start and stop machines.

Solution architecture

  1. Module autoscaler.py. It contains three classes: 1) functions for working with Ambari, 2) functions for working with MCS, 3) functions directly related to the autoscaler logic.
  2. Script observer.py. In essence it consists of various rules: when and at what moments to call the autoscaler functions.
  3. Configuration file config.py. It contains, for example, the list of nodes allowed to autoscale and other parameters that control, for example, how long to wait after a new node has been added. It also holds the timestamps of class start times, so that the maximum allowed cluster configuration is launched before a class begins.
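
To make the role of config.py more concrete, here is a minimal sketch of what such a file might contain. The names scaling_hosts, cooldown_period and scale_up_thresholds appear in the observer excerpt later in this post; every other name and every value below is an assumption made for illustration, not the authors' actual configuration.

```python
# config.py (hypothetical sketch; all values are placeholders)
from datetime import datetime

# Compute nodes that the autoscaler is allowed to start and stop.
scaling_hosts = ["compute-1.example.local", "compute-2.example.local"]

# Minutes to wait after a scaling action before acting again.
cooldown_period = 10

# Fractions of YARN memory and CPU in use that trigger scaling (assumed format).
scale_up_thresholds = {"ram": 0.8, "cpu": 0.8}
scale_down_thresholds = {"ram": 0.3, "cpu": 0.3}

# Start times of classes; the cluster is brought to full size beforehand.
class_start_times = [
    datetime(2020, 3, 2, 19, 0),
    datetime(2020, 3, 4, 19, 0),
]
```

Keeping these values in a plain Python module means the observer script can simply import them, at the cost of having to restart the script when they change.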

Let us now look at code fragments from the first two files.

1. Module autoscaler.py

Ambari class

This is what a code fragment with the Ambari class looks like:

import json

import requests


class Ambari:
    def __init__(self, ambari_url, cluster_name, headers, auth):
        self.ambari_url = ambari_url
        self.cluster_name = cluster_name
        self.headers = headers
        self.auth = auth

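    def stop_all_services(self, hostname):
        # url addresses the components of the host, url2 the host itself.
        url = self.ambari_url + self.cluster_name + '/hosts/' + hostname + '/host_components/'
        url2 = self.ambari_url + self.cluster_name + '/hosts/' + hostname
        # Fetch the host description and collect the names of its components.
        req0 = requests.get(url2, headers=self.headers, auth=self.auth)
        services = req0.json()['host_components']
        services_list = [s['HostRoles']['component_name'] for s in services]
        # Ask Ambari to move all these components to INSTALLED, i.e. stopped.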
        data = {
            "RequestInfo": {
                "context":"Stop All Host Components",
                "operation_level": {
                    "level":"HOST",
                    "cluster_name": self.cluster_name,
                    "host_names": hostname
                },
                "query":"HostRoles/component_name.in({0})".format(",".join(services_list))
            },
            "Body": {
                "HostRoles": {
                    "state":"INSTALLED"
                }
            }
        }
        req = requests.put(url, data=json.dumps(data), headers=self.headers, auth=self.auth)
        if req.status_code in [200, 201, 202]:
            message = 'Request accepted'
        else:
            message = req.status_code
        return message

Above, as an example, you can see the implementation of the stop_all_services function, which stops all services on the given cluster node.

The constructor of the Ambari class takes:

  • ambari_url, for example 'http://localhost:8080/api/v1/clusters/',
  • cluster_name - the name of your cluster in Ambari,
  • headers = {'X-Requested-By': 'ambari'},
  • and auth, which holds your Ambari login and password: auth = ('login', 'password').

The function itself is nothing more than a couple of calls to Ambari through its REST API. Logically, we first get the list of services running on the node and then ask Ambari, for the given cluster and the given node, to move the services from that list to the INSTALLED state. The functions that start all services, put a node into the Maintenance state and so on look similar: they are just a few API requests.
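
As an illustration of how similar those other requests are, here is a sketch of the request bodies for starting all components on a host and for switching a host's maintenance mode. The function names are hypothetical, and the bodies follow the same shape as stop_all_services above; check the field names against your Ambari version before relying on them.

```python
def start_services_request(cluster_name, hostname, components):
    """Body for a PUT to <ambari_url><cluster>/hosts/<hostname>/host_components/.

    Mirrors stop_all_services, but asks for the STARTED state.
    """
    return {
        "RequestInfo": {
            "context": "Start All Host Components",
            "operation_level": {
                "level": "HOST",
                "cluster_name": cluster_name,
                "host_names": hostname,
            },
            "query": "HostRoles/component_name.in({0})".format(",".join(components)),
        },
        "Body": {"HostRoles": {"state": "STARTED"}},
    }


def maintenance_request(hostname, state="ON"):
    """Body for a PUT to <ambari_url><cluster>/hosts/<hostname>."""
    return {
        "RequestInfo": {
            "context": "Turn {0} Maintenance Mode for {1}".format(state, hostname)
        },
        "Body": {"Hosts": {"maintenance_state": state}},
    }
```

Both bodies would be sent with requests.put(url, data=json.dumps(body), headers=..., auth=...), exactly as in the fragment above.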

Mcs class

This is what a code fragment with the Mcs class looks like:

class Mcs:
    def __init__(self, id1, id2, password):
        self.id1 = id1
        self.id2 = id2
        self.password = password
        self.mcs_host = 'https://infra.mail.ru:8774/v2.1'

    def vm_turn_on(self, hostname):
        self.token = self.get_mcs_token()
        host = self.hostname_to_vmname(hostname)
        vm_id = self.get_vm_id(host)
        # Use the id looked up above; self.vm_id is never set in this fragment.
        mcs_url1 = self.mcs_host + '/servers/' + vm_id + '/action'
        headers = {
            'X-Auth-Token': '{0}'.format(self.token),
            'Content-Type': 'application/json'
        }
        data = {'os-start' : 'null'}
        mcs = requests.post(mcs_url1, data=json.dumps(data), headers=headers)
        return mcs.status_code

The constructor of the Mcs class takes the id of the project in the cloud and the id of the user, as well as the user's password. In the vm_turn_on function we want to switch on one of the machines. The logic here is slightly more involved. At the start of the code three other functions are called: 1) we need to get a token, 2) we need to convert the hostname into the name of the machine in MCS, 3) we need to get the id of that machine. Then we simply make a POST request and start the machine.
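
The helpers hostname_to_vmname and get_vm_id are not shown in this post. The sketch below illustrates what they might do, assuming that the VM name is the short hostname and that the server list comes from an OpenStack-style GET /servers response of the form {"servers": [{"id": ..., "name": ...}]}. All function names here are hypothetical.

```python
import json


def hostname_to_vmname(hostname):
    # Assumption: the VM is named after the short hostname, without the domain.
    return hostname.split(".")[0]


def pick_vm_id(servers_response, vm_name):
    """Return the id of the server whose name matches exactly, or None."""
    for server in servers_response.get("servers", []):
        if server.get("name") == vm_name:
            return server["id"]
    return None


def power_action_json(turn_on):
    """JSON body for POST /servers/<id>/action that starts or stops a VM."""
    action = "os-start" if turn_on else "os-stop"
    # None is serialized as JSON null.
    return json.dumps({action: None})
```

The exact-match comparison matters because a name filter on the server list can return several machines with similar names, for example node-1 and node-10.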

This is what the function that obtains a token looks like:

    def get_mcs_token(self):
        url = 'https://infra.mail.ru:35357/v3/auth/tokens?nocatalog'
        headers = {'Content-Type': 'application/json'}
        data = {
            'auth': {
                'identity': {
                    'methods': ['password'],
                    'password': {
                        'user': {
                            'id': self.id1,
                            'password': self.password
                        }
                    }
                },
                'scope': {
                    'project': {
                        'id': self.id2
                    }
                }
            }
        }
        # The URL already carries ?nocatalog, so no extra query params are passed.
        req = requests.post(url, data=json.dumps(data), headers=headers)
        self.token = req.headers['X-Subject-Token']
        return self.token

Autoscaler class

This class contains the functions related to the operating logic itself.

This is what a code fragment of this class looks like:

import logging
import time
from collections import deque


class Autoscaler:
    def __init__(self, ambari, mcs, scaling_hosts, yarn_ram_per_node, yarn_cpu_per_node):
        self.scaling_hosts = scaling_hosts
        self.ambari = ambari
        self.mcs = mcs
        self.q_ram = deque()
        self.q_cpu = deque()
        self.num = 0
        self.yarn_ram_per_node = yarn_ram_per_node
        self.yarn_cpu_per_node = yarn_cpu_per_node

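    def scale_down(self, hostname):
        flag1 = flag2 = flag3 = flag4 = flag5 = False
        # Default result, so that a value is returned even when the host is
        # not allowed to scale or one of the steps did not complete.
        message = 'Failed'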
        if hostname in self.scaling_hosts:
            while True:
                time.sleep(5)
                status1 = self.ambari.decommission_nodemanager(hostname)
                if status1 == 'Request accepted' or status1 == 500:
                    flag1 = True
                    logging.info('Decommission request accepted: {0}'.format(flag1))
                    break
            while True:
                time.sleep(5)
                status3 = self.ambari.check_service(hostname, 'NODEMANAGER')
                if status3 == 'INSTALLED':
                    flag3 = True
                    logging.info('Nodemanager decommissioned: {0}'.format(flag3))
                    break
            while True:
                time.sleep(5)
                status2 = self.ambari.maintenance_on(hostname)
                if status2 == 'Request accepted' or status2 == 500:
                    flag2 = True
                    logging.info('Maintenance request accepted: {0}'.format(flag2))
                    break
            while True:
                time.sleep(5)
                status4 = self.ambari.check_maintenance(hostname, 'NODEMANAGER')
                if status4 == 'ON' or status4 == 'IMPLIED_FROM_HOST':
                    flag4 = True
                    self.ambari.stop_all_services(hostname)
                    logging.info('Maintenance is on: {0}'.format(flag4))
                    logging.info('Stopping services')
                    break
            time.sleep(90)
            status5 = self.mcs.vm_turn_off(hostname)
            while True:
                time.sleep(5)
                status5 = self.mcs.get_vm_info(hostname)['server']['status']
                if status5 == 'SHUTOFF':
                    flag5 = True
                    logging.info('VM is turned off: {0}'.format(flag5))
                    break
            if flag1 and flag2 and flag3 and flag4 and flag5:
                message = 'Success'
                logging.info('Scale-down finished')
                logging.info('Cooldown period has started. Wait for several minutes')
        return message

The constructor takes the Ambari and Mcs classes, the list of nodes that are allowed to scale, and the node configuration parameters: the memory and CPU allocated to a node in YARN. There are also two internal parameters, q_ram and q_cpu, which are queues. We use them to store the values of the current cluster load. If we see that the load has been consistently elevated over the last 5 minutes, we decide to add +2 nodes to the cluster. The same applies to the state when the cluster is underutilized.
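
The queue-based decision can be sketched as a fixed-size sliding window. This is not the authors' assert_up implementation, which is not shown here; the class name, the one-sample-per-minute rate and the thresholds are assumptions.

```python
from collections import deque


class LoadWindow:
    """Keeps the most recent load samples, each a fraction between 0 and 1."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=size)

    def add(self, used_fraction):
        self.samples.append(used_fraction)

    def is_full(self):
        return len(self.samples) == self.samples.maxlen

    def all_above(self, threshold):
        # Scale up only when every sample in a full window exceeds the threshold.
        return self.is_full() and all(s > threshold for s in self.samples)

    def all_below(self, threshold):
        # Scale down only when every sample in a full window is under the threshold.
        return self.is_full() and all(s < threshold for s in self.samples)
```

With one sample per minute, LoadWindow(5) covers the last 5 minutes. Requiring a full window avoids scaling on a single spike right after the script starts.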

The code above is an example of a function that removes a machine from the cluster and stops it in the cloud. First the YARN NodeManager is decommissioned, then Maintenance mode is switched on, then we stop all services on the machine and shut down the virtual machine in the cloud.
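
Each step in scale_down polls in a while True loop, which never ends if Ambari or the cloud stops answering as expected. One possible variation, shown here only as a sketch and not as part of the original code, is a small polling helper with a timeout. The name wait_until and its default values are assumptions.

```python
import time


def wait_until(check, timeout=600, interval=5, sleep=time.sleep, clock=time.monotonic):
    """Call check() every `interval` seconds until it returns a truthy value.

    Returns True on success and False once `timeout` seconds have passed.
    The sleep and clock arguments exist so the helper can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A step such as waiting for the NodeManager to stop could then be written as wait_until(lambda: ambari.check_service(hostname, 'NODEMANAGER') == 'INSTALLED'), with the caller deciding what to do when False comes back.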

2. Script observer.py

Sample code from it:

if scaler.assert_up(config.scale_up_thresholds):
    # Pick a stopped machine from the list of hosts that are allowed to scale.
    hostname = cloud.get_vm_to_up(config.scaling_hosts)
    if hostname is not None:
        status1 = scaler.scale_up(hostname)
        if status1 == 'Success':
            # Report the scale-up to Slack through an incoming webhook.
            post = {"text": "{0} has been successfully scaled-up".format(hostname)}
            json_data = json.dumps(post)
            req = requests.post(webhook, data=json_data.encode('ascii'), headers={'Content-Type': 'application/json'})
            # cooldown_period is given in minutes.
            time.sleep(config.cooldown_period*60)

Here we check whether the conditions for increasing cluster capacity have been met and whether there are any machines in reserve. If so, we take the hostname of one of them, add it to the cluster and post a message about it in our team's Slack. After that the cooldown_period begins, during which we neither add anything to the cluster nor remove anything from it, and only monitor the load. If the load has stabilized and stays within the corridor of optimal values, we simply keep monitoring. If one node was not enough, we add another.

When a class is coming up, we already know for certain that one node will not be enough, so we start all free nodes at once and keep them running until the class ends. This is driven by the list of class timestamps.
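
The schedule check can be sketched as follows. The function name, the 30-minute lead time and the 3-hour class duration are assumptions chosen for illustration; the post only states that a list of timestamps is used.

```python
from datetime import datetime, timedelta


def class_window_active(now, class_starts,
                        lead=timedelta(minutes=30),
                        duration=timedelta(hours=3)):
    """True if `now` falls between `lead` before a class and its assumed end."""
    return any(start - lead <= now <= start + duration for start in class_starts)
```

While this returns True, the observer would start every free node and skip the scale-down rules; once it returns False, the ordinary load-based rules apply again.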

Conclusion

An autoscaler is a good and convenient solution when your cluster load is uneven. You get the cluster configuration you need for peak loads and, at the same time, you do not keep that cluster running while it is underloaded, which saves money. On top of that, all of this happens automatically, without your involvement. The autoscaler itself is nothing more than a set of requests to the cluster manager API and the cloud provider API, written according to a certain logic. What you really must remember is the division of nodes into 3 types, as described earlier. And you will be happy.

Source: www.habr.com
