How to Build Your Own Autoscaler for a Cluster

Hello! We teach people to work with big data. It is impossible to imagine a big data training program without a dedicated cluster that all participants share, so our program always has one 🙂 We configure, tune, and administer it, while the participants launch MapReduce jobs directly on it and use Spark.

In this post we explain how we solved the problem of uneven cluster load by writing our own autoscaler on top of the Mail.ru Cloud Solutions cloud.

Problem

Our cluster is not used in a typical way: the load on it is very uneven. For example, there are hands-on classes, when all 30 participants and an instructor log in to the cluster and start using it. There are also the days before a deadline, when the load rises sharply. The rest of the time the cluster runs underloaded.

Solution #1 is to keep a cluster that can withstand peak loads but sits idle the rest of the time.

Solution #2 is to keep a small cluster and add nodes to it manually before classes and during peak loads.

Solution #3 is to keep a small cluster and write an autoscaler that monitors the current cluster load and uses various APIs to add nodes to the cluster and remove them from it.

This post is about solution #3. Such an autoscaler depends heavily on external factors rather than internal ones, and providers often do not offer one. We use the Mail.ru Cloud Solutions (MCS) cloud infrastructure and wrote our autoscaler against the MCS API. And since we teach how to work with data, we decided to show how you can write a similar autoscaler for your own purposes and use it with your cloud.

Prerequisites

First, you need a Hadoop cluster. We, for example, use the HDP distribution.

To add and remove nodes quickly, the nodes must have a clear division of roles:

  1. Master node. Not much to explain here: this is the main node of the cluster, where, for example, the Spark driver runs if you use interactive mode.
  2. Data node. This is a node that stores data in HDFS and also runs computations.
  3. Compute node. This is a node that stores nothing in HDFS but runs computations.

An important point: autoscaling is done only with nodes of the third type. If you start removing and adding nodes of the second type, the response will be very slow, because decommissioning and recommissioning a data node takes hours. That is obviously not what you expect from autoscaling. So we do not touch nodes of the first and second types. They form the minimum viable cluster that exists for the whole duration of the program.

So, our autoscaler is written in Python 3. It uses the Ambari API to manage cluster services and the Mail.ru Cloud Solutions (MCS) API to start and stop machines.

Solution architecture

  1. The autoscaler.py module. It contains three classes: 1) functions for working with Ambari, 2) functions for working with MCS, and 3) functions that implement the autoscaler logic itself.
  2. The observer.py script. In essence, it is a set of rules that define when to call the autoscaler functions.
  3. The config.py configuration file. It contains, for example, the list of nodes that may be autoscaled and other parameters, such as how long to wait after a new node has been added. It also holds the timestamps of class start times, so that the largest allowed cluster configuration is brought up before each class.
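To make the role of the configuration file more concrete, here is a minimal sketch of what such a config.py might look like. The names scaling_hosts, cooldown_period, and scale_up_thresholds appear in the code excerpts of this post; everything else (the other names, the host names, and all values) is an assumption made up for illustration, not the actual file.

```python
# config.py: hypothetical layout; all values below are illustrative.
from datetime import datetime

# Only compute nodes (type 3) may be scaled; master and data nodes are never listed.
scaling_hosts = ["compute-1.example.internal", "compute-2.example.internal"]

# Minutes to wait after adding or removing a node before acting again.
cooldown_period = 10

# YARN resources contributed by one compute node (assumed units: MB and vcores).
yarn_ram_per_node = 28672
yarn_cpu_per_node = 8

# Assumed shape: fraction of cluster RAM/CPU in use that triggers scaling.
scale_up_thresholds = {"ram": 0.8, "cpu": 0.8}
scale_down_thresholds = {"ram": 0.3, "cpu": 0.3}

# Class sessions as (start, end) pairs; assumed name and format.
class_schedule = [
    (datetime(2020, 3, 2, 19, 0), datetime(2020, 3, 2, 22, 0)),
]
```

Keeping the scale-down thresholds well below the scale-up ones leaves a band of "normal" load in which the autoscaler does nothing, which helps avoid adding and removing the same node repeatedly.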

Now let's look at pieces of code from the first two files.

1. The autoscaler.py module

The Ambari class

Here is a piece of code containing the Ambari class:

import json

import requests


class Ambari:
    def __init__(self, ambari_url, cluster_name, headers, auth):
        self.ambari_url = ambari_url
        self.cluster_name = cluster_name
        self.headers = headers
        self.auth = auth

    def stop_all_services(self, hostname):
        url = self.ambari_url + self.cluster_name + '/hosts/' + hostname + '/host_components/'
        url2 = self.ambari_url + self.cluster_name + '/hosts/' + hostname
        req0 = requests.get(url2, headers=self.headers, auth=self.auth)
        services = req0.json()['host_components']
        services_list = list(map(lambda x: x['HostRoles']['component_name'], services))
        data = {
            "RequestInfo": {
                "context":"Stop All Host Components",
                "operation_level": {
                    "level":"HOST",
                    "cluster_name": self.cluster_name,
                    "host_names": hostname
                },
                "query":"HostRoles/component_name.in({0})".format(",".join(services_list))
            },
            "Body": {
                "HostRoles": {
                    "state":"INSTALLED"
                }
            }
        }
        req = requests.put(url, data=json.dumps(data), headers=self.headers, auth=self.auth)
        if req.status_code in [200, 201, 202]:
            message = 'Request accepted'
        else:
            message = req.status_code
        return message

The code above shows, as an example, the implementation of stop_all_services, which stops all services on the given cluster node.

The Ambari class constructor takes:

  • ambari_url, for example 'http://localhost:8080/api/v1/clusters/',
  • cluster_name – the name of your cluster in Ambari,
  • headers = {'X-Requested-By': 'ambari'}
  • and auth, which holds your Ambari login and password: auth = ('login', 'password').

The function itself is nothing more than a couple of REST API calls to Ambari. Logically, we first get the list of services running on the node, and then ask Ambari to move the services in that list to the INSTALLED state on the given cluster and node. The functions that start all services, put a node into Maintenance mode, and so on look similar: they are just a few API requests.
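As an illustration of how similar those other functions are, here is a minimal sketch of the request that a function like maintenance_on could send. The helper name maintenance_request is hypothetical, and the post does not show the real implementation; the sketch assumes the usual Ambari convention of a PUT to the host resource with Hosts/maintenance_state in the body.

```python
import json


def maintenance_request(ambari_url, cluster_name, hostname, state="ON"):
    """Build the URL and JSON body for switching a host's maintenance mode.

    Hypothetical helper: it only prepares the request, it does not send it.
    """
    if state not in ("ON", "OFF"):
        raise ValueError("state must be 'ON' or 'OFF'")
    # Same URL scheme as in stop_all_services: <ambari_url><cluster>/hosts/<host>
    url = ambari_url + cluster_name + "/hosts/" + hostname
    body = {
        "RequestInfo": {
            "context": "Turn {0} Maintenance Mode for host".format(state.capitalize())
        },
        "Body": {"Hosts": {"maintenance_state": state}},
    }
    return url, json.dumps(body)
```

Inside the Ambari class the result would be sent much like in stop_all_services, for example with requests.put(url, data=body, headers=self.headers, auth=self.auth).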

The Mcs class

Here is a piece of code containing the Mcs class:

class Mcs:
    def __init__(self, id1, id2, password):
        self.id1 = id1
        self.id2 = id2
        self.password = password
        self.mcs_host = 'https://infra.mail.ru:8774/v2.1'

    def vm_turn_on(self, hostname):
        self.token = self.get_mcs_token()
        host = self.hostname_to_vmname(hostname)
        vm_id = self.get_vm_id(host)
        mcs_url1 = self.mcs_host + '/servers/' + vm_id + '/action'
        headers = {
            'X-Auth-Token': '{0}'.format(self.token),
            'Content-Type': 'application/json'
        }
        data = {'os-start': None}  # serialized as JSON null
        mcs = requests.post(mcs_url1, data=json.dumps(data), headers=headers)
        return mcs.status_code

The Mcs class constructor takes the user ID (id1), the ID of the project inside the cloud (id2), and the user's password. In vm_turn_on we want to power on one of the machines. The logic here is a bit more involved. The function starts by calling three other functions: 1) get a token, 2) convert the hostname into the machine's name in MCS, and 3) get that machine's ID. After that we simply send a POST request and start the machine.
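The post does not show hostname_to_vmname or get_vm_id, so the following is only a sketch of what steps 2 and 3 might boil down to. It assumes that the VM name in the cloud equals the short host name, and that the server list comes from a GET request to the /servers endpoint of the compute API, whose JSON has the form {"servers": [{"id": ..., "name": ...}, ...]}. The function find_vm_id is a hypothetical name.

```python
def hostname_to_vmname(hostname):
    """Assumed convention: the VM name is the part of the host name before the first dot."""
    return hostname.split(".", 1)[0]


def find_vm_id(servers_response, vm_name):
    """Return the id of the server named vm_name, or None if it is not listed.

    servers_response is the decoded JSON of a server listing,
    shaped like {"servers": [{"id": "...", "name": "..."}, ...]}.
    """
    for server in servers_response.get("servers", []):
        if server.get("name") == vm_name:
            return server["id"]
    return None
```

Returning None for an unknown machine lets the caller decide whether to log the problem or skip the host, instead of failing on a missing key.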

The function that obtains the token looks like this:

def get_mcs_token(self):
        url = 'https://infra.mail.ru:35357/v3/auth/tokens'  # 'nocatalog' is added via params below
        headers = {'Content-Type': 'application/json'}
        data = {
            'auth': {
                'identity': {
                    'methods': ['password'],
                    'password': {
                        'user': {
                            'id': self.id1,
                            'password': self.password
                        }
                    }
                },
                'scope': {
                    'project': {
                        'id': self.id2
                    }
                }
            }
        }
        params = (('nocatalog', ''),)
        req = requests.post(url, data=json.dumps(data), headers=headers, params=params)
        self.token = req.headers['X-Subject-Token']
        return self.token

The Autoscaler class

This class contains the functions that implement the scaling logic itself.

Here is a piece of code from this class:

import logging
import time
from collections import deque


class Autoscaler:
    def __init__(self, ambari, mcs, scaling_hosts, yarn_ram_per_node, yarn_cpu_per_node):
        self.scaling_hosts = scaling_hosts
        self.ambari = ambari
        self.mcs = mcs
        self.q_ram = deque()
        self.q_cpu = deque()
        self.num = 0
        self.yarn_ram_per_node = yarn_ram_per_node
        self.yarn_cpu_per_node = yarn_cpu_per_node

    def scale_down(self, hostname):
        flag1 = flag2 = flag3 = flag4 = flag5 = False
        message = 'Failed'  # returned if the host is not scalable or a step did not complete
        if hostname in self.scaling_hosts:
            while True:
                time.sleep(5)
                status1 = self.ambari.decommission_nodemanager(hostname)
                if status1 == 'Request accepted' or status1 == 500:
                    flag1 = True
                    logging.info('Decommission request accepted: {0}'.format(flag1))
                    break
            while True:
                time.sleep(5)
                status3 = self.ambari.check_service(hostname, 'NODEMANAGER')
                if status3 == 'INSTALLED':
                    flag3 = True
                    logging.info('NodeManager decommissioned: {0}'.format(flag3))
                    break
            while True:
                time.sleep(5)
                status2 = self.ambari.maintenance_on(hostname)
                if status2 == 'Request accepted' or status2 == 500:
                    flag2 = True
                    logging.info('Maintenance request accepted: {0}'.format(flag2))
                    break
            while True:
                time.sleep(5)
                status4 = self.ambari.check_maintenance(hostname, 'NODEMANAGER')
                if status4 == 'ON' or status4 == 'IMPLIED_FROM_HOST':
                    flag4 = True
                    self.ambari.stop_all_services(hostname)
                    logging.info('Maintenance is on: {0}'.format(flag4))
                    logging.info('Stopping services')
                    break
            time.sleep(90)
            status5 = self.mcs.vm_turn_off(hostname)
            while True:
                time.sleep(5)
                status5 = self.mcs.get_vm_info(hostname)['server']['status']
                if status5 == 'SHUTOFF':
                    flag5 = True
                    logging.info('VM is turned off: {0}'.format(flag5))
                    break
            if flag1 and flag2 and flag3 and flag4 and flag5:
                message = 'Success'
                logging.info('Scale-down finished')
                logging.info('Cooldown period has started. Wait for several minutes')
        return message

The constructor takes Ambari and Mcs instances, the list of nodes that may be scaled, and the node configuration parameters: the memory and CPU allocated to a node in YARN. There are also two internal attributes, q_ram and q_cpu, which are queues. We use them to store the current cluster load values. If we see that the load has been consistently high over the last 5 minutes, we decide to add one node to the cluster. The same applies when the cluster is underutilized.
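The idea of judging load over a window of recent samples can be sketched as follows. This is not the actual assert_up implementation, which the post does not show. The class LoadWindow and its methods are hypothetical, and the sketch assumes one load sample per minute, expressed as a fraction of cluster capacity in use.

```python
from collections import deque


class LoadWindow:
    """Keeps the most recent load samples and compares them with a threshold."""

    def __init__(self, window_size):
        # With one sample per minute, window_size=5 covers the last 5 minutes.
        self.samples = deque(maxlen=window_size)

    def add(self, used_fraction):
        # Appending to a full deque silently drops the oldest sample.
        self.samples.append(used_fraction)

    def is_full(self):
        return len(self.samples) == self.samples.maxlen

    def all_above(self, threshold):
        """True only if the window is full and every sample exceeds the threshold."""
        return self.is_full() and all(s > threshold for s in self.samples)

    def all_below(self, threshold):
        """True only if the window is full and every sample is under the threshold."""
        return self.is_full() and all(s < threshold for s in self.samples)
```

Requiring every sample in a full window to cross the threshold means that a short spike does not trigger scaling; only sustained load or sustained idleness does.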

The code above is an example of a function that removes a machine from the cluster and stops it in the cloud. First the YARN NodeManager is decommissioned, then Maintenance mode is turned on, then we stop all services on the machine and power off the virtual machine in the cloud.
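Each step in scale_down follows the same pattern: repeat a check every few seconds until the expected status appears. A possible way to factor that pattern out, with a timeout so that a stuck step cannot block forever, is sketched below. The helper wait_until and its parameters are hypothetical and not part of the original code. Unlike the original loops, it checks first and sleeps afterwards.

```python
import time


def wait_until(check, expected, interval=5, timeout=600,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() every `interval` seconds until it returns a value in `expected`.

    Returns True on success and False once `timeout` seconds have passed.
    `sleep` and `clock` can be replaced, for example in tests.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if check() in expected:
            return True
        sleep(interval)
    return False
```

With such a helper, the wait for the NodeManager could read wait_until(lambda: self.ambari.check_service(hostname, 'NODEMANAGER'), ('INSTALLED',)), and a False result could be logged and reported instead of looping indefinitely.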

2. The observer.py script

Sample code from it:

if scaler.assert_up(config.scale_up_thresholds):
    hostname = cloud.get_vm_to_up(config.scaling_hosts)
    if hostname is not None:
        status1 = scaler.scale_up(hostname)
        if status1 == 'Success':
            # Notify the team through the Slack incoming webhook
            post = {"text": "{0} has been successfully scaled-up".format(hostname)}
            json_data = json.dumps(post)
            req = requests.post(webhook, data=json_data.encode('ascii'), headers={'Content-Type': 'application/json'})
            # Wait out the cooldown period before the next scaling decision
            time.sleep(config.cooldown_period*60)

Here we check whether the conditions for increasing cluster capacity are met and whether there are any machines in reserve, take the hostname of one of them, add it to the cluster, and post a message about it to our team's Slack. After that the cooldown_period starts, during which we neither add nor remove anything and simply monitor the load. If the load has stabilized and stays within the band of optimal values, we just keep monitoring. If one node was not enough, we add another.

When a class is coming up, we already know that one node will not be enough, so we start all free nodes at once and keep them active until the class ends. This is driven by the list of class timestamps.
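The schedule check can be as small as the sketch below. It assumes the class times are stored as (start, end) pairs of datetime objects and that nodes should come up some time before the class begins. The function name, the pair format, and the 30-minute warm-up are assumptions for illustration, not details from our actual config.py.

```python
from datetime import datetime, timedelta


def class_in_progress(now, class_schedule, warmup=timedelta(minutes=30)):
    """Return True if `now` falls inside any class window, including the warm-up before it."""
    return any(start - warmup <= now <= end for start, end in class_schedule)
```

While this function returns True, the observer would start every free node from the scaling list and skip the scale-down checks; once it returns False, the regular threshold-based logic takes over again.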

Conclusion

An autoscaler is a good and convenient solution when your cluster load is uneven. You get the cluster configuration you need for peak loads and, at the same time, do not keep that cluster running while the load is low, which saves money. On top of that, it all happens automatically, without your involvement. The autoscaler itself is nothing more than a set of requests to the cluster manager API and the cloud provider API, written according to a certain logic. The one thing you must remember is to divide the nodes into three types, as described earlier. And you will be happy.

Source: habr.com
