Hello! We train people to work with big data. It is hard to imagine a big data training program without a cluster of its own that all participants work on together. For this reason, our program always has one. We set it up, tune it and administer it, while the students launch MapReduce jobs directly on it and use Spark.
In this post we explain how we solved the problem of uneven cluster load by writing our own autoscaler on top of a cloud.
The problem
Our cluster is not used in a typical way. Utilization is very uneven. For example, there are practical classes, when all 30 participants and the instructor log in to the cluster and start using it. There are also the days before a deadline, when the load rises sharply. The rest of the time the cluster runs underloaded.
Solution #1 is to keep a cluster that can withstand peak loads but sits idle the rest of the time.
Solution #2 is to keep a small cluster and manually add nodes to it before classes and during peak loads.
Solution #3 is to keep a small cluster and write an autoscaler that monitors the current cluster load and, using various APIs, adds nodes to the cluster and removes them from it.
In this post we discuss solution #3. An autoscaler like this depends far more on external factors than on internal ones, and providers often do not offer one. We use the Mail.ru Cloud Solutions cloud infrastructure and wrote our autoscaler using the MCS API. And since we teach how to work with data, we decided to show how you can write a similar autoscaler for your own purposes and use it with your own cloud.
Prerequisites
First, you need a Hadoop cluster. We, for example, use the HDP distribution.
For nodes to be added and removed quickly, roles must be distributed among the nodes in a particular way.
- Master node. Nothing much needs explaining here: this is the main node of the cluster, where, for example, the Spark driver is launched if you use interactive mode.
- Data node. This is a node on which you store data in HDFS and on which computation also takes place.
- Compute node. This is a node on which you store nothing in HDFS but on which computation takes place.
An important point. Autoscaling is done with nodes of the third type only. If you start removing and adding nodes of the second type, the response will be very slow: decommissioning and recommissioning will take hours on your cluster. That, of course, is not what you expect from autoscaling. So we do not touch nodes of the first and second types. They form a minimal viable cluster that exists for the whole duration of the program.
So, our autoscaler is written in Python 3, uses the Ambari API to manage cluster services, and uses the MCS API to start and stop machines.
Solution architecture
- The autoscaler.py module. It contains three classes: 1) functions for working with Ambari, 2) functions for working with MCS, 3) functions directly related to the autoscaler logic.
- The observer.py script. It essentially consists of rules: when and at what moments to call the autoscaler functions.
- The config.py configuration file. It contains, for example, the list of nodes allowed for autoscaling and other parameters that affect, for example, how long to wait after a new node has been added. It also holds the timestamps of class start times, so that the maximum allowed cluster configuration is brought up before a class.
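To make the role of the configuration file concrete, here is a minimal sketch of what such a config.py might contain. Only the names scaling_hosts, scale_up_thresholds and cooldown_period appear in the article's code; their shapes, the host names, and the names node_startup_wait and class_start_times are assumptions made for illustration.

```python
# Hypothetical config.py sketch. Values and several names are assumptions.
from datetime import datetime

# Compute nodes the autoscaler is allowed to add or remove (example names).
scaling_hosts = ["compute-1.example.com", "compute-2.example.com"]

# Assumed shape: load fractions above which one more node is added.
scale_up_thresholds = {"ram": 0.8, "cpu": 0.8}

# Minutes to wait after a scaling action before acting again.
# observer.py multiplies this value by 60 before sleeping.
cooldown_period = 5

# Assumed name: minutes to wait for a newly added node to become usable.
node_startup_wait = 10

# Assumed name and format: class start times that trigger a full scale-up.
class_start_times = [
    datetime(2020, 3, 2, 19, 0),
    datetime(2020, 3, 4, 19, 0),
]
```

Keeping these values in a separate module means the thresholds and the class schedule can change without touching the autoscaler logic.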
Now let's look at code fragments from the first two files.
1. The autoscaler.py module
The Ambari class
This is what the code fragment containing the Ambari class looks like:
import json

import requests


class Ambari:
    def __init__(self, ambari_url, cluster_name, headers, auth):
        self.ambari_url = ambari_url
        self.cluster_name = cluster_name
        self.headers = headers
        self.auth = auth

    def stop_all_services(self, hostname):
        # url addresses the host's components, url2 the host itself
        url = self.ambari_url + self.cluster_name + '/hosts/' + hostname + '/host_components/'
        url2 = self.ambari_url + self.cluster_name + '/hosts/' + hostname
        # Step 1: get the list of components on the host
        req0 = requests.get(url2, headers=self.headers, auth=self.auth)
        services = req0.json()['host_components']
        services_list = [x['HostRoles']['component_name'] for x in services]
        # Step 2: ask Ambari to move these components to the INSTALLED (stopped) state
        data = {
            "RequestInfo": {
                "context": "Stop All Host Components",
                "operation_level": {
                    "level": "HOST",
                    "cluster_name": self.cluster_name,
                    "host_names": hostname
                },
                "query": "HostRoles/component_name.in({0})".format(",".join(services_list))
            },
            "Body": {
                "HostRoles": {
                    "state": "INSTALLED"
                }
            }
        }
        req = requests.put(url, data=json.dumps(data), headers=self.headers, auth=self.auth)
        # Return a readable message on success, otherwise the raw status code
        if req.status_code in [200, 201, 202]:
            message = 'Request accepted'
        else:
            message = req.status_code
        return message
Above, as an example, you can see the implementation of the stop_all_services function, which stops all services on the given cluster node.
The constructor of the Ambari class takes:
- ambari_url, for example 'http://localhost:8080/api/v1/clusters/',
- cluster_name, the name of your cluster in Ambari,
- headers = {'X-Requested-By': 'ambari'},
- auth, your Ambari login and password: auth = ('login', 'password').
The function itself is nothing more than a couple of REST API calls to Ambari. Logically, we first get the list of services running on the node, then ask Ambari, for the given cluster and the given node, to move the services in that list to the INSTALLED state. The functions that start all services, put nodes into the Maintenance state and so on look similar: they are just a few API requests.
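As an illustration of how similar those other functions are, here is a rough sketch of how a request for switching a host into Maintenance mode could be assembled. It is not the authors' code: the helper name build_maintenance_request is hypothetical, and the payload assumes that Ambari's host resource accepts a maintenance_state field, so check it against the REST documentation of your Ambari version. The function only builds the URL and body and sends nothing.

```python
import json


def build_maintenance_request(ambari_url, cluster_name, hostname, state="ON"):
    """Return (url, body) for a PUT request that changes a host's maintenance mode.

    Hypothetical helper: the payload layout is an assumption about the Ambari API.
    """
    if state not in ("ON", "OFF"):
        raise ValueError("state must be 'ON' or 'OFF'")
    # Same URL scheme as in stop_all_services: <base>/<cluster>/hosts/<hostname>
    url = ambari_url + cluster_name + "/hosts/" + hostname
    body = {
        "RequestInfo": {"context": "Turn {0} Maintenance Mode".format(state.capitalize())},
        "Body": {"Hosts": {"maintenance_state": state}},
    }
    return url, json.dumps(body)
```

The returned pair could then be passed to requests.put together with the same headers and auth that the Ambari class already stores.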
The Mcs class
This is what the code fragment containing the Mcs class looks like:
import json

import requests


class Mcs:
    def __init__(self, id1, id2, password):
        self.id1 = id1
        self.id2 = id2
        self.password = password
        self.mcs_host = 'https://infra.mail.ru:8774/v2.1'

    def vm_turn_on(self, hostname):
        # 1) get a token, 2) map the hostname to the VM name in MCS, 3) look up the VM id
        self.token = self.get_mcs_token()
        host = self.hostname_to_vmname(hostname)
        vm_id = self.get_vm_id(host)
        # vm_id is a local variable here, not an attribute of the class
        mcs_url1 = self.mcs_host + '/servers/' + vm_id + '/action'
        headers = {
            'X-Auth-Token': '{0}'.format(self.token),
            'Content-Type': 'application/json'
        }
        # None is serialized as JSON null, the value the os-start action takes
        data = {'os-start': None}
        mcs = requests.post(mcs_url1, data=json.dumps(data), headers=headers)
        return mcs.status_code
The constructor of the Mcs class takes the project id within the cloud and the user id, as well as the user's password. In the vm_turn_on function we want to turn on one of the machines. The logic here is a little more involved. At the start of the code, three other functions are called: 1) we need to get a token, 2) we need to convert the hostname into the machine's name in MCS, 3) we need to get the id of that machine. After that we simply make a POST request and start the machine.
Here is what the function for getting a token looks like:
    # Method of the Mcs class
    def get_mcs_token(self):
        # The nocatalog flag is already part of the URL, so no extra params are passed
        url = 'https://infra.mail.ru:35357/v3/auth/tokens?nocatalog'
        headers = {'Content-Type': 'application/json'}
        # Password authentication scoped to a project
        data = {
            'auth': {
                'identity': {
                    'methods': ['password'],
                    'password': {
                        'user': {
                            'id': self.id1,
                            'password': self.password
                        }
                    }
                },
                'scope': {
                    'project': {
                        'id': self.id2
                    }
                }
            }
        }
        req = requests.post(url, data=json.dumps(data), headers=headers)
        # The token is returned in a response header, not in the body
        self.token = req.headers['X-Subject-Token']
        return self.token
The Autoscaler class
This class contains the functions related to the operating logic itself.
Here is what a code fragment of this class looks like:
import logging
import time
from collections import deque


class Autoscaler:
    def __init__(self, ambari, mcs, scaling_hosts, yarn_ram_per_node, yarn_cpu_per_node):
        self.scaling_hosts = scaling_hosts
        self.ambari = ambari
        self.mcs = mcs
        self.q_ram = deque()
        self.q_cpu = deque()
        self.num = 0
        self.yarn_ram_per_node = yarn_ram_per_node
        self.yarn_cpu_per_node = yarn_cpu_per_node

    def scale_down(self, hostname):
        flag1 = flag2 = flag3 = flag4 = flag5 = False
        # Stays None if the host is not allowed to be scaled
        message = None
        if hostname in self.scaling_hosts:
            # Step 1: request decommissioning of the YARN NodeManager
            # (status 500 is also accepted here, as in the original code)
            while True:
                time.sleep(5)
                status1 = self.ambari.decommission_nodemanager(hostname)
                if status1 == 'Request accepted' or status1 == 500:
                    flag1 = True
                    logging.info('Decommission request accepted: {0}'.format(flag1))
                    break
            # Step 2: wait until the NodeManager is actually stopped
            while True:
                time.sleep(5)
                status3 = self.ambari.check_service(hostname, 'NODEMANAGER')
                if status3 == 'INSTALLED':
                    flag3 = True
                    logging.info('Nodemanager decommissioned: {0}'.format(flag3))
                    break
            # Step 3: request Maintenance mode for the host
            while True:
                time.sleep(5)
                status2 = self.ambari.maintenance_on(hostname)
                if status2 == 'Request accepted' or status2 == 500:
                    flag2 = True
                    logging.info('Maintenance request accepted: {0}'.format(flag2))
                    break
            # Step 4: once Maintenance mode is on, stop all services on the host
            while True:
                time.sleep(5)
                status4 = self.ambari.check_maintenance(hostname, 'NODEMANAGER')
                if status4 == 'ON' or status4 == 'IMPLIED_FROM_HOST':
                    flag4 = True
                    self.ambari.stop_all_services(hostname)
                    logging.info('Maintenance is on: {0}'.format(flag4))
                    logging.info('Stopping services')
                    break
            # Step 5: give the services time to stop, then shut the VM down in the cloud
            time.sleep(90)
            status5 = self.mcs.vm_turn_off(hostname)
            while True:
                time.sleep(5)
                status5 = self.mcs.get_vm_info(hostname)['server']['status']
                if status5 == 'SHUTOFF':
                    flag5 = True
                    logging.info('VM is turned off: {0}'.format(flag5))
                    break
            if flag1 and flag2 and flag3 and flag4 and flag5:
                message = 'Success'
                logging.info('Scale-down finished')
                logging.info('Cooldown period has started. Wait for several minutes')
        return message
As input we take instances of the Ambari and Mcs classes, the list of nodes allowed for scaling, and the node configuration parameters: the memory and CPU allocated to a node in YARN. There are also two internal parameters, q_ram and q_cpu, which are queues. We use them to store the values of the current cluster load. If we see that the load has been consistently high over the last 5 minutes, we decide that we need to add one more node to the cluster. The same applies to the state in which the cluster is underutilized.
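The queue-based decision described above can be sketched roughly as follows. This is an illustration rather than the authors' implementation: the LoadWindow class, the decide function, the threshold values, and the rule that scale-up needs either metric to be high while scale-down needs both to be low are all assumptions.

```python
from collections import deque


class LoadWindow:
    """Keeps the last `size` load samples (hypothetical helper)."""

    def __init__(self, size):
        # A bounded deque drops the oldest sample automatically.
        self.samples = deque(maxlen=size)

    def add(self, value):
        self.samples.append(value)

    def is_full(self):
        return len(self.samples) == self.samples.maxlen

    def all_above(self, threshold):
        # True only when the window is full and every sample exceeds the threshold.
        return self.is_full() and all(v > threshold for v in self.samples)

    def all_below(self, threshold):
        return self.is_full() and all(v < threshold for v in self.samples)


def decide(ram, cpu, up=0.8, down=0.3):
    """Return 'scale_up', 'scale_down' or 'hold' (thresholds are assumed values)."""
    if ram.all_above(up) or cpu.all_above(up):
        return "scale_up"
    if ram.all_below(down) and cpu.all_below(down):
        return "scale_down"
    return "hold"
```

With one sample per minute, a window of size 5 corresponds to the five-minute period mentioned above. Requiring a full window keeps a single spike from triggering a scaling action.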
The scale_down code is an example of a function that removes a machine from the cluster and stops it in the cloud. First the YARN NodeManager is decommissioned, then Maintenance mode is turned on, then we stop all services on the machine and shut down the virtual machine in the cloud.
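Each step of that sequence is a "poll until the expected status appears" loop. One possible way to factor the pattern out, and to add a timeout so that a stuck node cannot block the autoscaler forever, is sketched below. The wait_until helper and its default values are hypothetical and are not part of the original code.

```python
import time


def wait_until(check, expected, interval=5, timeout=600):
    """Poll check() until it returns a value in `expected` or `timeout` seconds pass.

    Hypothetical helper. Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check() in expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A step of scale_down could then be written as wait_until(lambda: ambari.check_service(hostname, 'NODEMANAGER'), {'INSTALLED'}), with the caller deciding what to do when False is returned.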
2. The observer.py script
Sample code from it:
if scaler.assert_up(config.scale_up_thresholds):
    # Pick one of the reserved machines that is currently switched off
    hostname = cloud.get_vm_to_up(config.scaling_hosts)
    if hostname is not None:
        status1 = scaler.scale_up(hostname)
        if status1 == 'Success':
            # Report the scale-up to the team's Slack through a webhook
            post = {"text": "{0} has been successfully scaled-up".format(hostname)}
            json_data = json.dumps(post)
            req = requests.post(webhook, data=json_data.encode('ascii'), headers={'Content-Type': 'application/json'})
            # Cooldown: config.cooldown_period is given in minutes
            time.sleep(config.cooldown_period*60)
In it, we check whether the conditions for increasing the cluster capacity have been met and whether any machines are held in reserve, get the hostname of one of them, add it to the cluster and post a message about it to our team's Slack. After that the cooldown_period starts, during which we neither add anything to the cluster nor remove anything from it, but only monitor the load. If the load has stabilized and is within the acceptable range of load values, we simply continue monitoring. If one node was not enough, we add another.
When a class is coming up, we already know for certain that one node will not be enough, so we start all free nodes at once and keep them running until the end of the class. This is done using a list of class timestamps.
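A check against such a list of timestamps might look like the sketch below. The function name class_window_active, the 30-minute lead time and the 3-hour class duration are assumptions made for illustration. The article does not say how early the nodes are started or how the end of a class is determined.

```python
from datetime import datetime, timedelta


def class_window_active(now, class_starts,
                        lead=timedelta(minutes=30),
                        duration=timedelta(hours=3)):
    """Return True if `now` falls between `lead` before a class start and its assumed end.

    Hypothetical helper: `lead` and `duration` are assumed values.
    """
    return any(start - lead <= now < start + duration for start in class_starts)
```

While this returns True, the observer would bring up every host in the scaling list and skip scale-down decisions.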
Conclusion
An autoscaler is a good and convenient solution for cases when your cluster load is uneven. You get the cluster configuration you need for peak loads and at the same time do not keep that cluster running during underload, which saves money. And all of this happens automatically, without your involvement. The autoscaler itself is nothing more than a set of requests to the cluster manager API and the cloud provider API, written according to a certain logic. What you definitely need to remember is the division of nodes into 3 types, as described earlier. And you will be happy.
Source: www.habr.com