How to build your own autoscaler for a cluster

Hello! We train people to work with big data. It is impossible to imagine a big data course without its own cluster, on which all participants work together. That is why our program always has one 🙂 We handle its setup, tuning, and administration, while the students launch MapReduce jobs on it directly and use Spark.

In this post we describe how we solved the problem of uneven cluster load by writing our own autoscaler on top of the Mail.ru Cloud Solutions cloud.

Problem

Our cluster is not used in a typical way. Its utilization is very uneven. For example, there are practical classes, when all 30 students and the instructor log in to the cluster and start using it. There are also the days before a deadline, when the load rises sharply. The rest of the time the cluster runs underloaded.

Solution #1 is to keep a cluster that can withstand peak loads but sits idle the rest of the time.

Solution #2 is to keep a small cluster and manually add nodes to it before classes and during peak loads.

Solution #3 is to keep a small cluster and write an autoscaler that monitors the current cluster load and, using various APIs, adds nodes to the cluster and removes them.

In this article we discuss solution #3. Such an autoscaler depends heavily on external factors rather than internal ones, and cloud providers often do not offer one. We use the Mail.ru Cloud Solutions cloud infrastructure and wrote our autoscaler against the MCS API. Since we teach people how to work with data, we decided to show how you can write a similar autoscaler for your own purposes and use it with your cloud.

Prerequisites

First, you need a Hadoop cluster. We, for example, use the HDP distribution.

For nodes to be added and removed quickly, the roles must be split between nodes in a particular way.

  1. Master node. Nothing special to explain here: the main node of the cluster, where, for example, the Spark driver starts if you use interactive mode.
  2. Data node. This is a node on which you store data in HDFS and on which computation also runs.
  3. Compute node. This is a node on which you store nothing in HDFS, but on which computation runs.

Important: autoscaling happens only through nodes of the third type. If you start removing and adding nodes of the second type, the response will be very slow, because decommissioning and recommissioning a data node takes hours on a cluster. That, of course, is not what you expect from autoscaling. So we do not touch nodes of the first and second types. They form the minimum viable cluster that exists for the whole duration of the program.

So, our autoscaler is written in Python 3. It uses the Ambari API to manage cluster services and the Mail.ru Cloud Solutions (MCS) API to start and stop machines.

Solution architecture

  1. Module autoscaler.py. It contains three classes: 1) functions for working with Ambari, 2) functions for working with MCS, 3) functions that implement the autoscaler logic itself.
  2. Script observer.py. In essence it consists of rules: when and under which conditions to call the autoscaler functions.
  3. Configuration file config.py. It contains, for example, the list of nodes allowed for autoscaling and other parameters, such as how long to wait after a new node has been added. It also holds the timestamps of class start times, so that the maximum allowed cluster configuration is launched before a class begins.
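
To make the description of config.py more concrete, here is a minimal sketch of what such a file could look like. Only the names scaling_hosts, scale_up_thresholds and cooldown_period appear in the code excerpts of this article; the host names, all values, the shape of the thresholds, and the names scale_down_thresholds and class_schedule are assumptions made for illustration.

```python
# config.py: a hypothetical sketch, not the authors' actual file.
from datetime import datetime

# Compute-only nodes (type 3) that the autoscaler may add or remove.
# The host names are placeholders.
scaling_hosts = [
    "compute-1.example.local",
    "compute-2.example.local",
]

# Assumed shape: fraction of YARN resources in use that triggers scaling.
scale_up_thresholds = {"ram": 0.8, "cpu": 0.8}
scale_down_thresholds = {"ram": 0.3, "cpu": 0.3}

# Minutes to wait after a scaling action before acting again.
cooldown_period = 10

# Class start times; the cluster is brought to full size before each one.
class_schedule = [
    datetime(2020, 3, 2, 19, 0),
    datetime(2020, 3, 4, 19, 0),
]
```

Keeping these values in one module means observer.py can import them without knowing anything about Ambari or the cloud API.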

Now let's look at pieces of code from the first two files.

1. The autoscaler.py module

The Ambari class

Here is what a piece of the code containing the Ambari class looks like:

import json

import requests


class Ambari:
    def __init__(self, ambari_url, cluster_name, headers, auth):
        self.ambari_url = ambari_url
        self.cluster_name = cluster_name
        self.headers = headers
        self.auth = auth

    def stop_all_services(self, hostname):
        host_url = self.ambari_url + self.cluster_name + '/hosts/' + hostname
        components_url = host_url + '/host_components/'
        # Step 1: get the list of components present on this host.
        host_info = requests.get(host_url, headers=self.headers, auth=self.auth)
        services = host_info.json()['host_components']
        services_list = [s['HostRoles']['component_name'] for s in services]
        # Step 2: ask Ambari to move all of them to INSTALLED, i.e. stopped.
        data = {
            "RequestInfo": {
                "context": "Stop All Host Components",
                "operation_level": {
                    "level": "HOST",
                    "cluster_name": self.cluster_name,
                    "host_names": hostname
                },
                "query": "HostRoles/component_name.in({0})".format(",".join(services_list))
            },
            "Body": {
                "HostRoles": {
                    "state": "INSTALLED"
                }
            }
        }
        req = requests.put(components_url, data=json.dumps(data), headers=self.headers, auth=self.auth)
        # Return a fixed string on success, otherwise the raw HTTP status code.
        if req.status_code in [200, 201, 202]:
            message = 'Request accepted'
        else:
            message = req.status_code
        return message

Above, as an example, you can see the implementation of the stop_all_services function, which stops all services on the given cluster node.

When creating an instance of the Ambari class, you pass:

  • ambari_url, for example 'http://localhost:8080/api/v1/clusters/',
  • cluster_name, the name of your cluster in Ambari,
  • headers = {'X-Requested-By': 'ambari'}
  • and auth, your username and password for Ambari: auth = ('login', 'password').

The function itself is nothing more than a couple of calls to the Ambari REST API. Logically, we first get the list of services running on the node, and then ask Ambari, for the given cluster and the given node, to move the services from that list to the INSTALLED state. The functions that start all services, put a node into Maintenance state, and so on look similar: each is just a few API requests.
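
Since the start and stop functions differ only in the target state, the request can be built by one helper. The sketch below is an illustration, not the authors' code: the function name build_state_request and its signature are assumptions, and it only constructs the URL and body, mirroring stop_all_services above. STARTED is the Ambari state for running components, just as INSTALLED is the state for stopped ones.

```python
import json


def build_state_request(ambari_url, cluster_name, hostname, components, state):
    """Return (url, body) for a PUT that moves host components to `state`.

    Hypothetical helper: `state` is expected to be "INSTALLED" (stopped)
    or "STARTED" (running).
    """
    url = ambari_url + cluster_name + "/hosts/" + hostname + "/host_components/"
    if state == "STARTED":
        context = "Start All Host Components"
    else:
        context = "Stop All Host Components"
    body = {
        "RequestInfo": {
            "context": context,
            "operation_level": {
                "level": "HOST",
                "cluster_name": cluster_name,
                "host_names": hostname,
            },
            # Restrict the request to the listed components only.
            "query": "HostRoles/component_name.in({0})".format(",".join(components)),
        },
        "Body": {"HostRoles": {"state": state}},
    }
    return url, json.dumps(body)
```

The result would then be sent with requests.put(url, data=body, headers=..., auth=...), exactly as in stop_all_services.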

The Mcs class

Here is what a piece of the code containing the Mcs class looks like:

class Mcs:
    def __init__(self, id1, id2, password):
        self.id1 = id1  # user id (used in get_mcs_token)
        self.id2 = id2  # project id (used in get_mcs_token)
        self.password = password
        self.mcs_host = 'https://infra.mail.ru:8774/v2.1'

    def vm_turn_on(self, hostname):
        # 1) get a fresh token, 2) map the hostname to the VM name in MCS,
        # 3) look up the id of that VM.
        self.token = self.get_mcs_token()
        host = self.hostname_to_vmname(hostname)
        vm_id = self.get_vm_id(host)
        # Use the local vm_id obtained above to build the action URL.
        mcs_url1 = self.mcs_host + '/servers/' + vm_id + '/action'
        headers = {
            'X-Auth-Token': self.token,
            'Content-Type': 'application/json'
        }
        # 'os-start' is the server action that powers on a stopped VM.
        data = {'os-start' : 'null'}
        mcs = requests.post(mcs_url1, data=json.dumps(data), headers=headers)
        return mcs.status_code

When creating an instance of the Mcs class we pass the user id (id1), the id of the project in the cloud (id2), and the user's password. In the vm_turn_on function we want to power on one of the machines. The logic here is slightly more involved. At the start of the function, three other functions are called: 1) we obtain a token, 2) we convert the hostname into the name of the machine in MCS, 3) we get the id of that machine. After that we simply make a POST request and start the machine.
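
The article calls a vm_turn_off function later but does not show it. Assuming it mirrors vm_turn_on, the only difference would be the action name: os-stop is the OpenStack Compute counterpart of os-start. The helper below is a hypothetical sketch (build_power_action is not a name from the original code) that only builds the URL and the body; it uses JSON null as the action value, which is the form shown in the OpenStack Compute API reference.

```python
import json


def build_power_action(mcs_host, vm_id, turn_on):
    """Return (url, body) for a server power action request.

    Hypothetical helper: turn_on=True builds an os-start request,
    turn_on=False builds an os-stop request.
    """
    url = mcs_host + "/servers/" + vm_id + "/action"
    action = "os-start" if turn_on else "os-stop"
    # None is serialized as JSON null.
    body = json.dumps({action: None})
    return url, body
```

The pair would be sent with requests.post(url, data=body, headers=headers), using the same X-Auth-Token header as in vm_turn_on.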

Here is what the function for obtaining a token looks like:

# Method of the Mcs class.
def get_mcs_token(self):
    # ?nocatalog asks the identity service to omit the service catalog
    # from the response, which keeps the reply small.
    url = 'https://infra.mail.ru:35357/v3/auth/tokens?nocatalog'
    headers = {'Content-Type': 'application/json'}
    data = {
        'auth': {
            'identity': {
                'methods': ['password'],
                'password': {
                    'user': {
                        'id': self.id1,
                        'password': self.password
                    }
                }
            },
            'scope': {
                'project': {
                    'id': self.id2
                }
            }
        }
    }
    req = requests.post(url, data=json.dumps(data), headers=headers)
    # The token comes back in a response header, not in the body.
    self.token = req.headers['X-Subject-Token']
    return self.token

The Autoscaler class

This class contains the functions that implement the operating logic itself.

Here is what a piece of the code for this class looks like:

import logging
import time
from collections import deque


class Autoscaler:
    def __init__(self, ambari, mcs, scaling_hosts, yarn_ram_per_node, yarn_cpu_per_node):
        self.scaling_hosts = scaling_hosts
        self.ambari = ambari
        self.mcs = mcs
        # Queues holding recent RAM and CPU load samples.
        self.q_ram = deque()
        self.q_cpu = deque()
        self.num = 0
        self.yarn_ram_per_node = yarn_ram_per_node
        self.yarn_cpu_per_node = yarn_cpu_per_node

    def scale_down(self, hostname):
        flag1 = flag2 = flag3 = flag4 = flag5 = False
        # Default result, so the function always returns a value.
        message = 'Failure'
        if hostname in self.scaling_hosts:
            # Step 1: request decommissioning of the YARN NodeManager.
            while True:
                time.sleep(5)
                status1 = self.ambari.decommission_nodemanager(hostname)
                if status1 == 'Request accepted' or status1 == 500:
                    flag1 = True
                    logging.info('Decommission request accepted: {0}'.format(flag1))
                    break
            # Step 2: wait until the NodeManager is actually stopped.
            while True:
                time.sleep(5)
                status3 = self.ambari.check_service(hostname, 'NODEMANAGER')
                if status3 == 'INSTALLED':
                    flag3 = True
                    logging.info('Nodemanager decommissioned: {0}'.format(flag3))
                    break
            # Step 3: switch the host to Maintenance mode.
            while True:
                time.sleep(5)
                status2 = self.ambari.maintenance_on(hostname)
                if status2 == 'Request accepted' or status2 == 500:
                    flag2 = True
                    logging.info('Maintenance request accepted: {0}'.format(flag2))
                    break
            # Step 4: once Maintenance is on, stop all services on the host.
            while True:
                time.sleep(5)
                status4 = self.ambari.check_maintenance(hostname, 'NODEMANAGER')
                if status4 == 'ON' or status4 == 'IMPLIED_FROM_HOST':
                    flag4 = True
                    self.ambari.stop_all_services(hostname)
                    logging.info('Maintenance is on: {0}'.format(flag4))
                    logging.info('Stopping services')
                    break
            # Give the services time to stop before powering off the VM.
            time.sleep(90)
            # Step 5: power off the VM in the cloud and wait for SHUTOFF.
            status5 = self.mcs.vm_turn_off(hostname)
            while True:
                time.sleep(5)
                status5 = self.mcs.get_vm_info(hostname)['server']['status']
                if status5 == 'SHUTOFF':
                    flag5 = True
                    logging.info('VM is turned off: {0}'.format(flag5))
                    break
            if flag1 and flag2 and flag3 and flag4 and flag5:
                message = 'Success'
                logging.info('Scale-down finished')
                logging.info('Cooldown period has started. Wait for several minutes')
        return message

The class takes as input instances of the Ambari and Mcs classes, the list of nodes allowed for scaling, and the node configuration parameters: the memory and CPU allocated to a node in YARN. There are also two internal parameters, q_ram and q_cpu, which are queues. We use them to store recent values of the cluster load. If we see that the load has been consistently high over the last 5 minutes, we decide to add one node to the cluster. The same applies to the state where the cluster is underutilized.
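
The queue-based decision described above can be sketched with a bounded deque. This is an illustration under assumptions, not the authors' implementation: the class name LoadWindow, its methods, and the idea of storing load as a fraction between 0 and 1 are all hypothetical. With one sample per minute, a window of 5 samples corresponds to the 5 minutes mentioned in the text.

```python
from collections import deque


class LoadWindow:
    """Keeps the most recent load samples and reports sustained high or low load."""

    def __init__(self, size):
        # deque(maxlen=...) drops the oldest sample automatically when full.
        self.samples = deque(maxlen=size)

    def add(self, used_fraction):
        self.samples.append(used_fraction)

    def is_full(self):
        return len(self.samples) == self.samples.maxlen

    def sustained_above(self, threshold):
        # True only if every sample in a full window exceeds the threshold.
        return self.is_full() and min(self.samples) > threshold

    def sustained_below(self, threshold):
        # True only if every sample in a full window is under the threshold.
        return self.is_full() and max(self.samples) < threshold


window = LoadWindow(size=5)
for sample in [0.9, 0.85, 0.95, 0.9, 0.88]:
    window.add(sample)
scale_up_needed = window.sustained_above(0.8)  # True: all five samples exceed 0.8
```

Requiring the whole window to agree avoids reacting to a single short spike, at the cost of a delay equal to the window length.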

The code above is an example of a function that removes a machine from the cluster and stops it in the cloud. First the YARN NodeManager is decommissioned, then Maintenance mode is turned on, and then we stop all services on the machine and shut down the virtual machine in the cloud.
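
Each step in scale_down polls in a while True loop, so a step that never succeeds would block forever. One possible refinement, shown here as a sketch and not as part of the original code, is a small polling helper with a timeout. The name wait_until and its parameters are assumptions; sleep and clock are injectable only to make the helper easy to test.

```python
import time


def wait_until(check, timeout=600, interval=5, sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A step of scale_down could then be written as wait_until(lambda: ambari.check_service(hostname, 'NODEMANAGER') == 'INSTALLED'), with the caller deciding what to do when the result is False.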

2. The observer.py script

Sample code from it:

if scaler.assert_up(config.scale_up_thresholds):
    hostname = cloud.get_vm_to_up(config.scaling_hosts)
    if hostname is not None:
        status1 = scaler.scale_up(hostname)
        if status1 == 'Success':
            # Report the scale-up to Slack through the webhook.
            post = {"text": "{0} has been successfully scaled-up".format(hostname)}
            json_data = json.dumps(post)
            req = requests.post(webhook, data=json_data.encode('ascii'), headers={'Content-Type': 'application/json'})
            # Cooldown: cooldown_period is given in minutes.
            time.sleep(config.cooldown_period*60)

Here we check whether the conditions for increasing cluster capacity have been met and whether there are machines in reserve, get the hostname of one of them, add it to the cluster, and post a message about it to our team's Slack. Then cooldown_period starts, during which we do not add or remove anything from the cluster and only monitor the load. If the load has stabilized and stays within the corridor of optimal values, we simply continue monitoring. If one node was not enough, we add another.

For cases when a class is coming up, we already know for sure that one node will not be enough, so we immediately start all free nodes and keep them active until the end of the class. This is driven by the list of class timestamps.
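
The schedule check can be sketched as a pure function over the list of class start times. Everything here is an assumption for illustration: the function name class_in_progress, the 30-minute warm-up before a class, and the 180-minute class length are not taken from the original code.

```python
from datetime import datetime, timedelta


def class_in_progress(now, class_starts, warmup_minutes=30, class_minutes=180):
    """Return True if `now` falls between the warm-up before a class and its end."""
    for start in class_starts:
        window_open = start - timedelta(minutes=warmup_minutes)
        window_close = start + timedelta(minutes=class_minutes)
        if window_open <= now <= window_close:
            return True
    return False
```

While this function returns True, the observer would start every free node from the scaling list and skip the scale-down rules; once it returns False, normal load-based scaling resumes.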

Conclusion

An autoscaler is a good and convenient solution when your cluster load is uneven. You get the cluster configuration you need for peak loads and, at the same time, do not keep that cluster running during periods of low load, which saves money. On top of that, it all happens automatically, without your involvement. The autoscaler itself is nothing more than a set of requests to the cluster manager API and the cloud provider API, arranged according to a certain logic. The one thing you must remember is the division of nodes into 3 types, as described earlier. And you will be happy.

Source: www.hab.com
