Hi! We train people to work with big data. It is impossible to imagine an educational program in big data without its own cluster, on which everyone works together. For this reason our program has always had one :) We let students into its configuration, tuning, and administration, and they immediately start running MapReduce jobs there and using Spark.
In this post we will tell you how we solved the problem of uneven cluster load by writing our own autoscaler that uses the cloud.
The problem
Our cluster is not used in a typical way, and its utilization is very uneven. For example, there are practical classes, when all 30 people and an instructor come to the cluster and start using it. Or there are days just before a deadline when the load spikes sharply. The rest of the time the cluster runs underloaded.
Solution #1 is to keep a cluster that can withstand the peak loads but sits idle the rest of the time.
Solution #2 is to keep a small cluster to which you manually add nodes before classes and during peak loads.
Solution #3 is to keep a small cluster and write an autoscaler that monitors the current cluster load and, using various APIs, adds nodes to and removes nodes from the cluster.
In this post we will talk about solution #3. Such an autoscaler depends heavily on external factors rather than internal ones, and providers rarely offer it out of the box. We use the Mail.ru Cloud Solutions cloud infrastructure and wrote an autoscaler using the MCS API. And since we teach how to work with data, we decided to show how you can write a similar autoscaler for your own purposes and use it with your own cloud.
Prerequisites
First of all, you need a Hadoop cluster. We, for example, use the HDP distribution.
For nodes to be added and removed quickly, you need a certain division of roles among the nodes.
- Master node. There is no need to explain anything here: it is the main node of the cluster, on which, for example, the Spark driver is launched if you use interactive mode.
- Data node. This is a node on which you store data in HDFS and on which computations take place.
- Computing node. This is a node on which you do not store anything in HDFS, but on which computations take place.
An important point: autoscaling happens with nodes of the third type. If you start removing and adding nodes of the second type, the response speed will be very low: decommissioning and recommissioning will take hours on your cluster. That, of course, is not what you expect from autoscaling. In other words, we do not touch nodes of the first and second types. They form the minimum viable cluster that exists for the whole duration of the program.
So, our autoscaler is written in Python 3, uses the Ambari API to manage cluster services, and uses the Mail.ru Cloud Solutions (MCS) API to start and stop machines.
Solution architecture
- Module autoscaler.py. It contains three classes: 1) functions for working with Ambari, 2) functions for working with MCS, and 3) functions related directly to the logic of the autoscaler itself.
- Script observer.py. It essentially consists of rules: when and at which moments to call the autoscaler's functions.
- Configuration file config.py. It contains, among other things, a list of nodes allowed for autoscaling and other parameters that affect, for example, how long to wait after a new node has been added. It also contains timestamps for the start of classes, so that the maximum permitted cluster configuration is deployed before a class begins.
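For a sense of what such a configuration file might hold, here is a minimal sketch of a config.py. The parameter names that also appear in the code snippets below (scaling_hosts, cooldown_period, scale_up_thresholds, yarn_ram_per_node, yarn_cpu_per_node) come from the real code; the concrete values, the scale_down_thresholds entry, and the class_schedule format are illustrative assumptions:

```python
# Hypothetical config.py sketch; all values are illustrative, not our real ones.
scaling_hosts = ['hadoop-compute-1', 'hadoop-compute-2', 'hadoop-compute-3']  # nodes allowed for autoscaling
cooldown_period = 5            # minutes to just watch the load after adding/removing a node
yarn_ram_per_node = 32768      # MB of RAM YARN allocates on one compute node
yarn_cpu_per_node = 8          # vCPUs YARN allocates on one compute node
scale_up_thresholds = {'ram': 0.8, 'cpu': 0.8}    # utilization levels that trigger a scale-up
scale_down_thresholds = {'ram': 0.3, 'cpu': 0.3}  # utilization levels that trigger a scale-down
# Timestamps of class starts: the maximum permitted cluster configuration
# is rolled out shortly before each of these moments.
class_schedule = ['2019-09-02 19:00', '2019-09-04 19:00']
```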
Now let's look at code snippets from the first two files.
1. The autoscaler.py module
The Ambari class
Here is what a code snippet containing the Ambari class looks like:
import json
import requests

class Ambari:
    def __init__(self, ambari_url, cluster_name, headers, auth):
        self.ambari_url = ambari_url
        self.cluster_name = cluster_name
        self.headers = headers
        self.auth = auth

    def stop_all_services(self, hostname):
        url = self.ambari_url + self.cluster_name + '/hosts/' + hostname + '/host_components/'
        url2 = self.ambari_url + self.cluster_name + '/hosts/' + hostname
        # First find out which components are present on the host
        req0 = requests.get(url2, headers=self.headers, auth=self.auth)
        services = req0.json()['host_components']
        services_list = list(map(lambda x: x['HostRoles']['component_name'], services))
        # Then ask Ambari to move all of them into the INSTALLED (stopped) state
        data = {
            "RequestInfo": {
                "context": "Stop All Host Components",
                "operation_level": {
                    "level": "HOST",
                    "cluster_name": self.cluster_name,
                    "host_names": hostname
                },
                "query": "HostRoles/component_name.in({0})".format(",".join(services_list))
            },
            "Body": {
                "HostRoles": {
                    "state": "INSTALLED"
                }
            }
        }
        req = requests.put(url, data=json.dumps(data), headers=self.headers, auth=self.auth)
        if req.status_code in [200, 201, 202]:
            message = 'Request accepted'
        else:
            message = req.status_code
        return message
Above, as an example, you can see the implementation of the stop_all_services function, which stops all services on the desired cluster node.
To the input of the Ambari class you pass:

- ambari_url, for example 'http://localhost:8080/api/v1/clusters/';
- cluster_name, the name of your cluster in Ambari;
- headers = {'X-Requested-By': 'ambari'};
- auth, your Ambari login and password: auth = ('login', 'password').
The function itself is nothing more than a couple of calls to Ambari via the REST API. Logically, we first get the list of services running on the node, and then, on the given cluster and the given node, ask Ambari to move the services from that list to the INSTALLED state. The functions for launching all services, for switching nodes into the Maintenance state, and so on look similar: they are just a few small requests through the API.
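As an illustration of how small those requests are, here is a sketch of the payload such a maintenance helper could send. build_maintenance_payload is a hypothetical name, and the body follows the general Ambari REST convention of PUTting a desired maintenance_state for a host:

```python
def build_maintenance_payload(state='ON'):
    # Body for a PUT to <ambari_url><cluster_name>/hosts/<hostname>,
    # analogous in spirit to the stop_all_services request above.
    return {
        "RequestInfo": {"context": "Turn maintenance mode " + state},
        "Body": {"Hosts": {"maintenance_state": state}}
    }
```

A maintenance_off counterpart would send the same body with state='OFF'; the surrounding requests.put call is identical to the one in stop_all_services.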
The Mcs class
Here is what a code snippet containing the Mcs class looks like:
class Mcs:
    def __init__(self, id1, id2, password):
        self.id1 = id1            # user id in the cloud
        self.id2 = id2            # project id in the cloud
        self.password = password
        self.mcs_host = 'https://infra.mail.ru:8774/v2.1'

    def vm_turn_on(self, hostname):
        self.token = self.get_mcs_token()
        host = self.hostname_to_vmname(hostname)
        vm_id = self.get_vm_id(host)
        mcs_url1 = self.mcs_host + '/servers/' + vm_id + '/action'
        headers = {
            'X-Auth-Token': '{0}'.format(self.token),
            'Content-Type': 'application/json'
        }
        data = {'os-start': 'null'}
        mcs = requests.post(mcs_url1, data=json.dumps(data), headers=headers)
        return mcs.status_code
To the input of the Mcs class we pass the project id inside the cloud and the user id, as well as the user's password. In the vm_turn_on function we want to power on one of the machines. The logic here is a little more complicated. At the beginning of the code three other functions are called: 1) we need to get a token, 2) we need to convert the hostname into the name of the machine in MCS, and 3) we need to get the id of that machine. Then we simply make a post request and start that machine.
Here is what the function for getting a token looks like:
def get_mcs_token(self):
    # Keystone v3 password authentication; the issued token comes back
    # in the X-Subject-Token response header.
    url = 'https://infra.mail.ru:35357/v3/auth/tokens?nocatalog'
    headers = {'Content-Type': 'application/json'}
    data = {
        'auth': {
            'identity': {
                'methods': ['password'],
                'password': {
                    'user': {
                        'id': self.id1,
                        'password': self.password
                    }
                }
            },
            'scope': {
                'project': {
                    'id': self.id2
                }
            }
        }
    }
    params = (('nocatalog', ''),)
    req = requests.post(url, data=json.dumps(data), headers=headers, params=params)
    self.token = req.headers['X-Subject-Token']
    return self.token
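The two remaining helpers called from vm_turn_on can be sketched in the same spirit. How hostname_to_vmname maps names depends entirely on your naming scheme (here we assume the VM name is simply the host's short name), and extract_vm_id is a hypothetical pure helper that picks the id out of a Nova-style GET /servers response body:

```python
def hostname_to_vmname(hostname):
    # Assumption: the cloud VM name is the Hadoop host's short name,
    # i.e. the FQDN with the domain suffix stripped.
    return hostname.split('.')[0]

def extract_vm_id(servers_json, vm_name):
    # Pick the VM id out of a GET /servers response body.
    for server in servers_json.get('servers', []):
        if server['name'] == vm_name:
            return server['id']
    return None

# get_vm_id would then be roughly:
# req = requests.get(self.mcs_host + '/servers',
#                    headers={'X-Auth-Token': self.token})
# vm_id = extract_vm_id(req.json(), vm_name)
```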
The Autoscaler class
This class contains functions related to the operating logic of the autoscaler itself.
Here is what a code snippet from this class looks like:
import time
import logging
from collections import deque

class Autoscaler:
    def __init__(self, ambari, mcs, scaling_hosts, yarn_ram_per_node, yarn_cpu_per_node):
        self.scaling_hosts = scaling_hosts
        self.ambari = ambari
        self.mcs = mcs
        self.q_ram = deque()
        self.q_cpu = deque()
        self.num = 0
        self.yarn_ram_per_node = yarn_ram_per_node
        self.yarn_cpu_per_node = yarn_cpu_per_node

    def scale_down(self, hostname):
        flag1 = flag2 = flag3 = flag4 = flag5 = False
        if hostname in self.scaling_hosts:
            # Step 1: ask Ambari to decommission the YARN NodeManager on the host
            while True:
                time.sleep(5)
                status1 = self.ambari.decommission_nodemanager(hostname)
                if status1 == 'Request accepted' or status1 == 500:
                    flag1 = True
                    logging.info('Decommission request accepted: {0}'.format(flag1))
                    break
            # Step 2: wait until the NodeManager is actually decommissioned
            while True:
                time.sleep(5)
                status3 = self.ambari.check_service(hostname, 'NODEMANAGER')
                if status3 == 'INSTALLED':
                    flag3 = True
                    logging.info('Nodemanager decommissioned: {0}'.format(flag3))
                    break
            # Step 3: switch the host into maintenance mode
            while True:
                time.sleep(5)
                status2 = self.ambari.maintenance_on(hostname)
                if status2 == 'Request accepted' or status2 == 500:
                    flag2 = True
                    logging.info('Maintenance request accepted: {0}'.format(flag2))
                    break
            # Step 4: once maintenance is confirmed, stop all services on the host
            while True:
                time.sleep(5)
                status4 = self.ambari.check_maintenance(hostname, 'NODEMANAGER')
                if status4 == 'ON' or status4 == 'IMPLIED_FROM_HOST':
                    flag4 = True
                    self.ambari.stop_all_services(hostname)
                    logging.info('Maintenance is on: {0}'.format(flag4))
                    logging.info('Stopping services')
                    break
            # Step 5: power the VM off in the cloud and wait for SHUTOFF
            time.sleep(90)
            status5 = self.mcs.vm_turn_off(hostname)
            while True:
                time.sleep(5)
                status5 = self.mcs.get_vm_info(hostname)['server']['status']
                if status5 == 'SHUTOFF':
                    flag5 = True
                    logging.info('VM is turned off: {0}'.format(flag5))
                    break
            if flag1 and flag2 and flag3 and flag4 and flag5:
                message = 'Success'
                logging.info('Scale-down finished')
                logging.info('Cooldown period has started. Wait for several minutes')
                return message
As input we take the Ambari and Mcs classes, the list of nodes allowed for scaling, and the node configuration parameters: the memory and cpu allocated to a node in YARN. There are also 2 internal parameters, q_ram and q_cpu, which are queues. We use them to store the current cluster load values. If we see that over the last 5 minutes the load has steadily increased, then we decide that we need to add +1 node to the cluster. The same applies to the cluster underutilization state.
The code above is an example of the scale_down function, which removes a machine from the cluster and stops it in the cloud. First the YARN NodeManager is decommissioned, then Maintenance mode is switched on, then we stop all services on the machine and turn off the virtual machine in the cloud.
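The queue logic described above (load that has stayed high for the last five minutes means +1 node) can be sketched as a small pure class. LoadTracker and its parameters are illustrative; the real autoscaler keeps two such queues, q_ram and q_cpu:

```python
from collections import deque

class LoadTracker:
    # Illustrative sketch of the queue-based decision; window and
    # threshold values here are assumptions, not our production settings.
    def __init__(self, window=5, threshold=0.8):
        self.q = deque(maxlen=window)   # last `window` utilization samples
        self.threshold = threshold

    def add_sample(self, utilization):
        self.q.append(utilization)      # old samples fall off automatically

    def should_scale_up(self):
        # Scale up only when the window is full and every sample is above
        # the threshold, i.e. the load has been high for the whole period.
        return len(self.q) == self.q.maxlen and min(self.q) > self.threshold
```

A symmetric check with max(self.q) below a lower threshold would cover the underutilization (scale-down) side.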
2. The observer.py script
A code example from it:
if scaler.assert_up(config.scale_up_thresholds) == True:
    hostname = cloud.get_vm_to_up(config.scaling_hosts)
    if hostname != None:
        status1 = scaler.scale_up(hostname)
        if status1 == 'Success':
            text = {"text": "{0} has been successfully scaled-up".format(hostname)}
            post = {"text": "{0}".format(text)}
            json_data = json.dumps(post)
            req = requests.post(webhook, data=json_data.encode('ascii'), headers={'Content-Type': 'application/json'})
            time.sleep(config.cooldown_period*60)
In it we check whether the conditions for increasing cluster capacity have been met and whether there are any machines in reserve; if so, we get the hostname of one of them, add it to the cluster, and post a message about it in our team's Slack. After that the cooldown_period starts, during which we do not add or remove anything from the cluster but only monitor the load. If it has stabilized and sits within the corridor of optimal load values, we simply continue monitoring. If one node was not enough, we add another one.
For cases when a class is coming up, we already know for sure that one node will not be enough, so we immediately start all the free nodes and keep them running until the end of the class. This works using the list of class timestamps.
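Putting the observer rules together, one iteration of its loop could look roughly like this. assert_down and get_vm_to_down are hypothetical mirrors of the assert_up and get_vm_to_up calls shown above, and the Slack notification is omitted for brevity:

```python
import time

def observer_step(scaler, cloud, config):
    # One pass of the observer: check thresholds, scale if needed,
    # then sit out the cooldown period just watching the load.
    if scaler.assert_up(config.scale_up_thresholds):
        hostname = cloud.get_vm_to_up(config.scaling_hosts)
        if hostname is not None and scaler.scale_up(hostname) == 'Success':
            time.sleep(config.cooldown_period * 60)
            return 'scaled-up'
    elif scaler.assert_down(config.scale_down_thresholds):
        hostname = cloud.get_vm_to_down(config.scaling_hosts)
        if hostname is not None and scaler.scale_down(hostname) == 'Success':
            time.sleep(config.cooldown_period * 60)
            return 'scaled-down'
    return 'idle'
```

The real observer.py simply calls such a step in an endless loop, with the class-schedule rule from config.py overriding the thresholds before a class starts.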
Conclusion
An autoscaler is a good, convenient solution for those cases when your cluster load is uneven. You get the cluster configuration you need for peak loads, while not keeping that whole cluster around during underload, which saves money. And all of this happens automatically, without your involvement. The autoscaler itself is nothing more than a set of requests to the cluster manager API and the cloud provider API, written according to a certain logic. What you definitely need to remember is the division of nodes into 3 types, as we wrote earlier. And you will be happy.
Source: www.habr.com