Kubernetes at DomClick: how to sleep soundly while managing a cluster of 1000 microservices

My name is Viktor Yagofarov, and I develop the Kubernetes platform at DomClick as technical development manager in the Ops (operations) team. I'd like to talk about how our Dev <-> Ops processes are organized, the specifics of operating one of the largest k8s clusters in Russia, and the DevOps/SRE practices our team applies.


Ops team

The Ops team currently has 15 people. Three of them are responsible for the office, and two work in a different time zone and are available, among other times, at night. So someone from Ops is always at a monitor and ready to respond to an incident of any complexity. We have no night shifts, which keeps us sane and gives everyone a chance to get enough sleep and spend their free time somewhere other than at a computer.


Everyone has different skills: networking, DBA, ELK stack, Kubernetes administration and development, monitoring, virtualization, hardware, etc. One thing unites everyone: anyone can stand in for any of us to some extent. For example, anyone can add new nodes to the k8s cluster, upgrade PostgreSQL, write a CI/CD + Ansible pipeline, automate something in Python/Bash/Go, or connect a piece of hardware in the data center. Strong skills in one area don't stop you from changing direction and growing in another. For example, I joined the company as a PostgreSQL specialist, and now my main area of responsibility is Kubernetes clusters. Any growth is welcome on the team, and there is a strong sense of mutual support.

By the way, we are hiring. The requirements for candidates are fairly standard. For me personally, it matters that a person fits into the team and is not confrontational, but can also defend their point of view, wants to grow, is not afraid to try something new and offers their own ideas. We also need programming skills in scripting languages, knowledge of Linux basics, and English. English is needed simply so that, in the middle of a fuckup, a person can google a solution in 10 seconds rather than 10 minutes. It is very hard to find specialists with deep Linux knowledge these days. Funnily enough, two out of three candidates cannot answer the question "What is Load Average? What is it made of?", and the question "How do you get a core dump from a C program?" is treated as something from the world of supermen... or dinosaurs. We have to live with this, because people usually have other strong skills, and we will teach them Linux. Why a DevOps engineer needs to know all this in the modern world of clouds is beyond the scope of this article, but in three words: it's all needed.
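Since "What is Load Average?" is a favorite interview question, here is a minimal sketch of the mechanism for reference. On Linux, it is an exponentially damped moving average of the number of tasks that are either runnable or in uninterruptible sleep (D state), sampled roughly every 5 seconds. The sketch uses floating point for readability, whereas the kernel uses fixed-point arithmetic. The function names are illustrative.

```python
import math

SAMPLE_INTERVAL_S = 5.0  # the kernel samples active tasks roughly every 5 seconds


def update_load(load: float, active_tasks: int, window_s: float) -> float:
    """One EMA step: decay the previous value and blend in the current count.

    active_tasks = runnable tasks + tasks in uninterruptible sleep (D state).
    window_s is 60, 300 or 900 for the 1-, 5- and 15-minute averages.
    """
    decay = math.exp(-SAMPLE_INTERVAL_S / window_s)
    return load * decay + active_tasks * (1.0 - decay)


def simulate(samples: list[int], window_s: float = 60.0) -> float:
    """Feed a sequence of active-task counts through the EMA, starting from 0."""
    load = 0.0
    for n in samples:
        load = update_load(load, n, window_s)
    return load


if __name__ == "__main__":
    # 4 busy tasks for 1 minute (12 samples): the 1-minute average reacts
    # much faster than the 15-minute one.
    print(round(simulate([4] * 12, 60.0), 2), round(simulate([4] * 12, 900.0), 2))
```

This also explains why a load average of 4 on a 32-thread node is not alarming. The number counts waiting and running tasks; it is not a percentage of CPU.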

Tools team

The Tools team carries a large share of the responsibility for automation. Their main job is building convenient GUI and CLI tools for developers. For example, our in-house tool Confer lets you roll out an application to Kubernetes with literally a few mouse clicks and configure its resources, secrets from Vault, etc. Previously we used Jenkins + Helm 2, but we had to build our own tool to eliminate copy-paste and make the software lifecycle uniform.

The Ops team does not write pipelines for developers, but can advise on any issues that come up while writing them (some teams still use Helm 3).

DevOps

This is how we see DevOps:

Dev teams write code and roll it out via Confer to dev -> qa/stage -> prod. Both the Dev and Ops teams are responsible for making sure the code doesn't slow down or throw errors. During the day, the on-call engineer from the Ops team should be the first to respond to an incident with an application. In the evening and at night, the on-call admin (Ops) should wake up the on-call developer if he is certain the problem is not in the infrastructure. All metrics and alerts appear in monitoring automatically or semi-automatically.
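The after-hours escalation rule can be written down as a small decision function. This is only an illustration. The role names and the 09:00-19:00 "daytime" window are assumptions, since the article does not define working hours. During the day, the sketch deliberately models only the first responder.

```python
# Hypothetical working hours; the article does not specify them.
DAY_START_HOUR = 9
DAY_END_HOUR = 19


def is_daytime(hour: int) -> bool:
    """True if the local hour (0-23) falls within working hours."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    return DAY_START_HOUR <= hour < DAY_END_HOUR


def escalation_chain(hour: int, infra_ruled_out: bool) -> list[str]:
    """Return the on-call roles engaged for an incident, in order.

    The Ops on-call engineer always responds first. In the evening and at
    night, Ops wakes the Dev on-call engineer only when it is certain the
    problem is not in the infrastructure.
    """
    chain = ["ops-oncall"]
    if not is_daytime(hour) and infra_ruled_out:
        chain.append("dev-oncall")
    return chain
```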

Ops' area of responsibility begins the moment an application is rolled out to production, but Dev's responsibility does not end there: we do the same job and are in the same boat.

Developers consult admins when they need help writing an admin microservice (for example, a Go backend + HTML5), and admins advise developers on any infrastructure or k8s-related issues.

By the way, we have no monolith at all, only microservices. Counted by the number of Deployments, there are between 900 and 1000 of them in the prod k8s cluster. The number of pods fluctuates between 1700 and 2000. Right now there are about 2000 pods in the prod cluster.

I can't give exact numbers, because we keep track of unneeded microservices and remove them semi-automatically. Unneeded entities in K8s are tracked by useless-operator, which saves a lot of resources and money.
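The internals of that operator are not described here. As a hedged illustration of the "semi-automatic" part, the sketch below flags Deployments that have received no traffic for a configurable period and leaves the final decision to a human. The `last_request_at` field and the 30-day threshold are assumptions. In practice, such a timestamp would have to come from ingress or service metrics.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DeploymentActivity:
    namespace: str
    name: str
    # Hypothetical: when the service last received a request; None if never seen.
    last_request_at: Optional[datetime]


def idle_candidates(
    deployments: list[DeploymentActivity],
    now: datetime,
    idle_for: timedelta = timedelta(days=30),
) -> list[str]:
    """Return "namespace/name" of Deployments idle for longer than idle_for.

    The result is a review list, not a deletion list: a human confirms
    each candidate before anything is removed.
    """
    candidates = []
    for d in deployments:
        if d.last_request_at is None or now - d.last_request_at > idle_for:
            candidates.append(f"{d.namespace}/{d.name}")
    return sorted(candidates)
```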

Resource management

Monitoring

Well-configured monitoring and alerting are the cornerstone of operating a large cluster. We have not yet found a single solution that covers 100% of our monitoring needs, so we periodically build various custom solutions in this environment.

  • Zabbix. Good old monitoring, used mainly to track the overall state of the infrastructure. It tells us when a node is dying in terms of CPU, memory, disks, network, and so on. Nothing fancy, but we also have a separate DaemonSet of agents. We use it, for example, to monitor the state of DNS in the cluster: we look for misbehaving coredns pods and check that external hosts are reachable. It might seem pointless to bother with this, but at high traffic volumes this component is a serious point of failure. I have already described how I fought with DNS performance in a cluster.
  • Prometheus Operator. A set of various exporters gives a broad overview of all cluster components. We then visualize all of this on large Grafana dashboards and use alertmanager for alerting.
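The article's DNS checks run as Zabbix agents in a DaemonSet, and their configuration is not shown. As an illustration only, here is a standalone Python probe built on the same idea. It resolves a list of hostnames through the configured system resolver and reports lookups that fail or are slow. The host list and the 100 ms threshold are assumptions. Note that `socket.getaddrinfo` has no timeout parameter, so a real probe would need to bound it externally or rely on the resolver's own timeout settings.

```python
import socket
import time
from dataclasses import dataclass


@dataclass
class DnsProbe:
    host: str
    ok: bool
    latency_ms: float
    error: str = ""


def probe(host: str) -> DnsProbe:
    """Resolve one hostname via the system resolver and time the lookup."""
    start = time.monotonic()
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror as exc:
        return DnsProbe(host, False, (time.monotonic() - start) * 1000, str(exc))
    return DnsProbe(host, True, (time.monotonic() - start) * 1000)


def slow_or_failed(hosts: list[str], threshold_ms: float = 100.0) -> list[DnsProbe]:
    """Probe every host and keep only the failures and the slow lookups."""
    results = [probe(h) for h in hosts]
    return [r for r in results if not r.ok or r.latency_ms > threshold_ms]


if __name__ == "__main__":
    # Hypothetical targets: an in-cluster service name and an external host.
    for r in slow_or_failed(["kubernetes.default.svc.cluster.local", "example.com"]):
        print(f"{r.host}: ok={r.ok} latency={r.latency_ms:.1f}ms {r.error}")
```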

Another useful tool for us is list-ingress. We wrote it after several cases where one team's Ingress paths overlapped another team's, resulting in 50x errors. Now, before deploying to production, developers check that nobody will be affected. For my team, it is a good tool for the initial diagnosis of Ingress problems. Funnily enough, it was originally written for admins and looked rather "clumsy". Once the dev teams fell in love with it, it changed a lot and no longer looked like "an admin made a web UI for admins". Soon we will retire this tool, and such situations will be validated before the pipeline even rolls out.
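The source of list-ingress is not part of the article. Here is a minimal sketch of the core check, assuming Ingress rules have already been fetched as (namespace, name, host, path) records. It groups the rules by host and path and reports any pair claimed by more than one namespace. Only exact host+path matches are detected. Overlapping prefixes such as `/api` and `/api/v1` would need extra logic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    namespace: str  # the owning team's namespace
    name: str       # Ingress object name
    host: str
    path: str


def find_conflicts(rules: list[IngressRule]) -> dict[tuple[str, str], list[str]]:
    """Map each (host, path) claimed by two or more namespaces to those namespaces."""
    owners: dict[tuple[str, str], set[str]] = defaultdict(set)
    for rule in rules:
        owners[(rule.host, rule.path)].add(rule.namespace)
    return {
        key: sorted(namespaces)
        for key, namespaces in owners.items()
        if len(namespaces) > 1
    }
```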

Team resources in the "Cube"

Before moving on to examples, it's worth explaining how resources are allocated to microservices.

To understand which teams use their resources and how much (CPU, memory, local SSD), we give each team its own namespace in the "Cube". We then cap its maximum CPU, memory and disk, after first discussing the team's needs. As a result, a single team generally cannot block the whole cluster for deployments by reserving thousands of cores and terabytes of memory. Access to a namespace is granted via AD (we use RBAC). Namespaces and their limits are added via a pull request to a GIT repository, and everything is then rolled out automatically by an Ansible pipeline.

An example of resource allocation for a team:

namespaces:

  chat-team:
    pods: 23
    limits:
      cpu: 11
      memory: 20Gi
    requests:
      cpu: 11
      memory: 20Gi
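The article rolls these limits out with an Ansible pipeline but does not show the rendered objects. Assume they map onto a standard Kubernetes `ResourceQuota`, whose `spec.hard` keys include `pods`, `requests.cpu`, `requests.memory`, `limits.cpu` and `limits.memory`. A sketch of the conversion might look like this. The `-quota` naming is hypothetical. JSON is printed because `kubectl apply -f` accepts JSON as well as YAML.

```python
import json


def to_resource_quota(namespace: str, spec: dict) -> dict:
    """Render one team's entry from the namespaces file as a ResourceQuota."""
    hard = {"pods": str(spec["pods"])}
    for kind in ("requests", "limits"):
        for resource, value in spec[kind].items():
            hard[f"{kind}.{resource}"] = str(value)  # e.g. "limits.cpu": "11"
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {"hard": hard},
    }


if __name__ == "__main__":
    chat_team = {
        "pods": 23,
        "limits": {"cpu": 11, "memory": "20Gi"},
        "requests": {"cpu": 11, "memory": "20Gi"},
    }
    print(json.dumps(to_resource_quota("chat-team", chat_team), indent=2))
```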

Requests and Limits

In the "Cube", a Request is the amount of resources guaranteed to be reserved for a pod (one or more docker containers) in the cluster. A Limit is the maximum a pod may use, and it is not guaranteed to be available. The charts often show a team that has set requests too high for all its applications. That team can no longer deploy an application to the "Cube", because all the requests in its namespace have already been "used up".

The correct way out of this situation is to look at actual resource consumption and compare it with the requested amount (Request).
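Here is a minimal sketch of that comparison. It assumes that per-namespace CPU usage (in cores, for example averaged over a long window) and the sum of CPU requests have already been exported. The ratio then shows who is over-reserving. The 0.5 cut-off and the sample numbers are illustrative only.

```python
def utilization(used: dict[str, float], requested: dict[str, float]) -> dict[str, float]:
    """Actual usage divided by the requested amount, per namespace.

    Values well below 1.0 mean resources are reserved but sit idle.
    Namespaces with a zero request are skipped to avoid division by zero.
    """
    return {ns: used.get(ns, 0.0) / req for ns, req in requested.items() if req > 0}


def over_reserved(used: dict[str, float], requested: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Namespaces that use less than `threshold` of what they requested."""
    return sorted(ns for ns, ratio in utilization(used, requested).items() if ratio < threshold)


if __name__ == "__main__":
    requested_cores = {"chat-team": 11.0, "search": 8.0}
    used_cores = {"chat-team": 2.2, "search": 7.1}
    print(over_reserved(used_cores, requested_cores))  # ['chat-team']
```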

[Images]

The screenshots above show that the "Requested" CPUs approach the actual number of CPU threads, while Limits can exceed it =)

Now let's look at one namespace in more detail (I chose kube-system, the system namespace for the components of the "Cube" itself) and see the ratio of actually used CPU time and memory to the requested amounts:

[Image]

Clearly, much more memory and CPU is reserved for system services than is actually used. For kube-system this is reasonable: there have been cases where the nginx ingress controller or nodelocaldns hit their CPU limit at peak and consumed a lot of RAM, so the headroom is warranted here. Besides, we can't rely on charts that cover only the last 3 hours: we need historical metrics over a long period.

We have developed a system of "recommendations". For example, here you can see which resources would benefit from higher "limits" (the upper allowed bound) to avoid "throttling". Throttling is what happens when a pod has already used up its CPU time for the allotted time slice and has to wait until it is "unfrozen". It applies to CPU only: a container that exceeds its memory limit is OOM-killed instead of throttled.

[Image]

And here are the pods whose appetite should be curbed:

[Image]
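The article does not publish the rules behind its recommendations. Here is a hedged sketch of one plausible rule set. It flags a pod for a higher CPU limit when a large share of its CFS periods were throttled (the counters `container_cpu_cfs_throttled_periods_total` and `container_cpu_cfs_periods_total` come from cAdvisor). It flags a pod for a lower request when even its 95th-percentile usage over a long history stays far below the request. The thresholds are assumptions. The use of many samples rather than a 3-hour window follows the point about historical metrics made earlier.

```python
import math
from dataclasses import dataclass


@dataclass
class PodCpuStats:
    name: str
    request_cores: float
    limit_cores: float
    usage_samples: list[float]  # CPU cores used, sampled over weeks, not hours
    throttled_periods: int      # increase of container_cpu_cfs_throttled_periods_total
    total_periods: int          # increase of container_cpu_cfs_periods_total


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def recommend(pod: PodCpuStats, throttle_threshold: float = 0.05, waste_threshold: float = 0.5) -> str:
    """Return "raise limit", "lower request", "ok" or "no data"."""
    throttled = pod.throttled_periods / pod.total_periods if pod.total_periods else 0.0
    if throttled > throttle_threshold:
        return "raise limit"  # the pod regularly runs out of CPU quota
    if not pod.usage_samples:
        return "no data"
    if percentile(pod.usage_samples, 95) < pod.request_cores * waste_threshold:
        return "lower request"  # the reservation mostly sits idle
    return "ok"
```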

You could write more than one article about throttling + resource monitoring, so ask your questions in the comments. In short, automating these metrics is a very hard task. It takes a lot of time and a balancing act with "window" functions and "CTEs" in Prometheus / VictoriaMetrics. These terms are in quotes because PromQL has almost nothing like them, so you have to split scary queries across several screens of text and optimize them.
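One way to tame those queries is to assemble them from small parameterized pieces, a poor man's substitute for CTEs. The sketch below builds two PromQL strings. The first is the 95th percentile of per-pod CPU usage over a 7-day window, using a subquery (`expr[range:resolution]`). The second is the per-pod CPU throttling ratio. The metric names are the standard cAdvisor ones. The window and step defaults are assumptions, and the strings are meant to be sent to the Prometheus (or VictoriaMetrics) HTTP query API.

```python
import re

# Kubernetes namespace names are RFC 1123 DNS labels; rejecting anything else
# keeps input from breaking out of the label matcher.
_NAMESPACE_RE = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def _selector(metric: str, namespace: str) -> str:
    """Metric with a namespace matcher; container!="" drops pod-level aggregates."""
    if not _NAMESPACE_RE.fullmatch(namespace):
        raise ValueError(f"invalid namespace: {namespace!r}")
    return f'{metric}{{namespace="{namespace}", container!=""}}'


def cpu_p95_query(namespace: str, window: str = "7d", step: str = "5m") -> str:
    """p95 of per-pod CPU cores used over `window`, evaluated every `step`."""
    usage = f"sum by (pod) (rate({_selector('container_cpu_usage_seconds_total', namespace)}[{step}]))"
    return f"quantile_over_time(0.95, ({usage})[{window}:{step}])"


def throttling_ratio_query(namespace: str, step: str = "5m") -> str:
    """Share of CFS periods in which each pod was throttled."""
    throttled = f"sum by (pod) (rate({_selector('container_cpu_cfs_throttled_periods_total', namespace)}[{step}]))"
    periods = f"sum by (pod) (rate({_selector('container_cpu_cfs_periods_total', namespace)}[{step}]))"
    return f"{throttled} / {periods}"
```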

As a result, developers have tools for monitoring their namespaces in the Cube. They can decide for themselves where and when applications can "trim" their resources, and which pods can be given a whole CPU for the entire night.

Practices

As is fashionable these days, the company follows DevOps and SRE practices. When a company has 1000 microservices, about 350 developers and 15 admins for the entire infrastructure, you have to be "fashionable". Behind all these "buzzwords" lies an urgent need to automate everything and everyone, and admins must not become a bottleneck in the processes.

As Ops, we provide developers with various metrics and dashboards covering service response rates and errors.

We combine methodologies such as RED, USE and Golden Signals. We try to keep the number of dashboards to a minimum, so that you can see at a glance which service is currently degrading (for example, response codes per second or 99th-percentile response time). As soon as the common dashboards need new metrics, we draw and add them right away.
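To make the RED part concrete: for each service, you track Rate (requests per second), Errors and Duration. In this sketch, Errors means the share of 5xx responses and Duration means the 99th percentile, both of which are assumptions. The minimal example below works over a batch of request records. In production, these numbers would normally come from Prometheus counters and `histogram_quantile` over latency histograms rather than from raw requests.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    status: int         # HTTP response code
    duration_ms: float  # response time


def red_summary(requests: list[Request], window_s: float) -> dict[str, float]:
    """Rate, Errors and Duration (nearest-rank p99) for one service and time window."""
    if not requests:
        return {"rate_rps": 0.0, "error_ratio": 0.0, "p99_ms": 0.0}
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_ms for r in requests)
    rank = math.ceil(99 * len(durations) / 100)  # nearest-rank percentile
    return {
        "rate_rps": len(requests) / window_s,
        "error_ratio": errors / len(requests),
        "p99_ms": durations[rank - 1],
    }
```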

I haven't drawn a new chart in a month. This is probably a good sign: it means most of the "wishes" have already been implemented. There used to be weeks when I drew a new chart at least once a day.

[Images]

This result is valuable because developers now rarely come to admins with questions like "where can I see such-and-such metric".

A Service Mesh rollout is just around the corner and should make life much easier for everyone. Colleagues from Tools are already close to implementing an abstract "Istio for sane people". The lifecycle of every HTTP(S) request will be visible in monitoring, and it will always be possible to see "at which stage everything broke" in service-to-service (and other) interactions. Subscribe to the DomClick hub for news. =)

Kubernetes infrastructure support

Historically, we use a patched version of Kubespray, an Ansible role for deploying, scaling and upgrading Kubernetes. At some point, support for non-kubeadm installations was removed from the main branch, and no migration path to kubeadm was provided. As a result, Southbridge made its own fork, with kubeadm support and quick fixes for critical issues.

The process for upgrading our k8s clusters looks like this:

  • Take Kubespray from Southbridge, check it against our branch and merge.
  • Roll out the update to the Stress "Cube".
  • Roll out the update to the Dev "Cube" one node at a time (in Ansible this is "serial: 1").
  • Upgrade Prod on Saturday evening, one node at a time.
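The order above can be sketched as a plan generator. Clusters are processed in the order Stress, Dev, Prod, and within each cluster one node at a time, which is what `serial: 1` achieves in Ansible. The per-node steps (cordon, drain, upgrade, uncordon) are an assumption about what an upgrade involves, not a transcript of the Kubespray playbook, and the node names are hypothetical. The article does not say Stress uses `serial: 1`; the sketch treats it the same way for simplicity.

```python
from typing import Iterator

# Rollout order from the article: Stress first, then Dev, then Prod.
CLUSTER_ORDER = ("stress", "dev", "prod")

# Assumed per-node steps; with serial: 1 only one node is out of service at a time.
NODE_STEPS = ("cordon", "drain", "upgrade", "uncordon")


def rolling_plan(nodes_by_cluster: dict[str, list[str]]) -> Iterator[tuple[str, str, str]]:
    """Yield (cluster, node, step) tuples in execution order."""
    for cluster in CLUSTER_ORDER:
        for node in nodes_by_cluster.get(cluster, []):
            for step in NODE_STEPS:
                yield cluster, node, step
```

Driving each step through a function that stops at the first failed health check keeps a bad release from ever reaching Prod.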

In the future, we plan to replace Kubespray with something faster and move to kubeadm.

In total, we have three "Cubes": Stress, Dev and Prod. We plan to launch another (hot standby) Prod "Cube" in a second data center. Stress and Dev run on virtual machines (oVirt for Stress and a VMWare cloud for Dev). The Prod "Cube" runs on bare metal: 50 nodes of the same type, each with 32 CPU threads, 64-128 GB of memory and 300 GB of SSD in RAID 10. Three "thin" nodes are dedicated to the Prod "Cube" masters, each with 16 GB of memory and 12 CPU threads.
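Here is a quick back-of-the-envelope check of the Prod figures above. Memory is given as a range because the article does not say how many nodes have 64 GB versus 128 GB. The SSD figure assumes that 300 GB is the usable capacity after RAID 10.

```python
WORKERS = 50
THREADS_PER_WORKER = 32
MEMORY_GB_PER_WORKER = (64, 128)  # min and max from the article
SSD_GB_PER_WORKER = 300           # assumed usable capacity after RAID 10
MASTERS = 3
THREADS_PER_MASTER = 12
MEMORY_GB_PER_MASTER = 16


def prod_capacity() -> dict[str, object]:
    """Aggregate capacity of the Prod "Cube" from the per-node figures."""
    lo, hi = MEMORY_GB_PER_WORKER
    return {
        "worker_threads": WORKERS * THREADS_PER_WORKER,
        "worker_memory_gb": (WORKERS * lo, WORKERS * hi),
        "worker_ssd_gb": WORKERS * SSD_GB_PER_WORKER,
        "master_threads": MASTERS * THREADS_PER_MASTER,
        "master_memory_gb": MASTERS * MEMORY_GB_PER_MASTER,
    }


if __name__ == "__main__":
    for key, value in prod_capacity().items():
        print(key, value)
```

That comes to 1600 worker threads, 3.2-6.4 TB of worker memory and about 15 TB of local SSD, plus 36 threads and 48 GB of memory across the masters.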

For Prod, we prefer bare metal and avoid unnecessary layers such as OpenStack: we don't want "noisy neighbors" or CPU steal time. An in-house OpenStack would also roughly double the administrative complexity.

For CI/CD of the "Cube" and other infrastructure components, we use a separate GIT server, Helm 3, Jenkins, Ansible and Docker. The move from Helm 2 to Helm 3 was rather painful, but we are very happy with the atomic option. We love feature branches and deploying to different environments from a single repository.
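Deploying several environments from one repository usually comes down to mapping each branch to a target environment. The branch names, values-file naming and release/chart names below are hypothetical conventions, not DomClick's. The sketch only builds the `helm upgrade --install --atomic` command without running it. With `--atomic`, Helm 3 rolls the release back if the upgrade fails.

```python
import fnmatch
from typing import Optional

# Hypothetical branch conventions, checked top to bottom.
BRANCH_RULES = (
    ("master", "prod"),
    ("dev", "dev"),
    ("feature/*", "stress"),
)


def target_environment(branch: str) -> Optional[str]:
    """Return the environment a branch deploys to, or None if it deploys nowhere."""
    for pattern, env in BRANCH_RULES:
        if fnmatch.fnmatchcase(branch, pattern):
            return env
    return None


def helm_upgrade_cmd(release: str, chart: str, env: str) -> list[str]:
    """Build (not run) a Helm 3 command that uses a per-environment values file."""
    return [
        "helm", "upgrade", "--install", "--atomic",
        release, chart,
        "-f", f"values-{env}.yaml",
    ]
```

The command list can be passed directly to `subprocess.run` in a CI job. Keeping it a pure function makes the branch policy easy to unit-test.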

Conclusion

That, in general terms, is what the DevOps process at DomClick looks like from an operations engineer's point of view. The article turned out less technical than I expected, so follow DomClick news on Habr: there will be more "hardcore" articles about Kubernetes and beyond.

Source: www.habr.com
