"I-Kubernetes inyuse i-latency ngamaxesha angama-10": ngubani onetyala ngale nto?

Translator's note: This article, written by Galo Navarro, Principal Software Engineer at the European company Adevinta, is a fascinating and instructive "investigation" in the field of infrastructure operations. Its original title has been slightly expanded in translation for a reason the author explains at the very beginning.

Note from the author: It looks like this post attracted much more attention than expected. I still get angry comments saying that the title is misleading and that some readers are upset. I understand the reasons for this, so, at the risk of spoiling the whole story, I want to tell you right away what this article is about. A curious thing I have seen as teams migrate to Kubernetes is that whenever a problem arises (such as increased latency after a migration), Kubernetes is the first thing to be blamed, but then it turns out that the orchestrator is not really at fault. This article describes one such case. Its title repeats the exclamation of one of our developers (later you will see that Kubernetes has nothing to do with it). You will not find any surprising revelations about Kubernetes here, but you can expect a couple of good lessons about complex systems.

A couple of weeks ago, my team was migrating a single microservice to a core platform that includes CI/CD, a Kubernetes-based runtime, metrics, and other goodies. The move was a trial run: we planned to use it as a template and migrate about 150 more services in the coming months. All of them are responsible for running some of the largest online platforms in Spain (Infojobs, Fotocasa, etc.).

After we deployed the application to Kubernetes and redirected traffic to it, an alarming surprise awaited us. Request latency in Kubernetes was 10 times higher than in EC2. We had to either find a solution to this problem or abandon the migration of the microservice (and, possibly, the entire project).

Why is latency so much higher in Kubernetes than in EC2?

To find the bottleneck, we collected metrics along the entire request path. Our architecture is simple: an API gateway (Zuul) proxies requests to microservice instances in EC2 or Kubernetes. In Kubernetes we use the NGINX Ingress Controller, and the backends are ordinary Deployment objects running a JVM application on the Spring platform.

                                  EC2
                            +---------------+
                            |  +---------+  |
                            |  |         |  |
                       +-------> BACKEND |  |
                       |    |  |         |  |
                       |    |  +---------+  |                   
                       |    +---------------+
             +------+  |
Public       |      |  |
      -------> ZUUL +--+
traffic      |      |  |              Kubernetes
             +------+  |    +-----------------------------+
                       |    |  +-------+      +---------+ |
                       |    |  |       |  xx  |         | |
                       +-------> NGINX +------> BACKEND | |
                            |  |       |  xx  |         | |
                            |  +-------+      +---------+ |
                            +-----------------------------+

The problem appeared to be related to upstream latency at the backend (I marked the problem area in the diagram as "xx"). In EC2, the application responded in about 20 ms. In Kubernetes, latency grew to 100-200 ms.

We quickly ruled out the likely suspects related to the change of runtime. The JVM version stayed the same. Containerization problems were not involved either: the application was already running successfully in containers on EC2. Load? But we saw high latency even at 1 request per second. Garbage collection pauses could also be ruled out.

One of our Kubernetes admins asked whether the application had external dependencies, since DNS queries had caused similar problems in the past.

Hypothesis 1: DNS name resolution

On each request, our application accesses an AWS Elasticsearch instance one to three times at a domain like elastic.spain.adevinta.com. There is a shell inside our containers, so we can check whether looking up the domain really takes a long time.

DNS queries from the container:

[root@be-851c76f696-alf8z /]# while true; do dig "elastic.spain.adevinta.com" | grep time; sleep 2; done
;; Query time: 22 msec
;; Query time: 22 msec
;; Query time: 29 msec
;; Query time: 21 msec
;; Query time: 28 msec
;; Query time: 43 msec
;; Query time: 39 msec

The same queries from one of the EC2 instances where the application runs:

bash-4.4# while true; do dig "elastic.spain.adevinta.com" | grep time; sleep 2; done
;; Query time: 77 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec

Given that the lookup took about 30 ms, it became clear that DNS resolution when accessing Elasticsearch was indeed contributing to the increased latency.
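
As a quick sanity check on the "about 30 ms" figure, the two dig transcripts above can be summarized with a few lines of Python. This is only an illustrative sketch: the sample values are copied from the output shown above, and the helper name `query_times_ms` is made up for this example.

```python
import re
from statistics import mean, median

# Output copied from the two dig loops shown above.
KUBERNETES_DIG = """\
;; Query time: 22 msec
;; Query time: 22 msec
;; Query time: 29 msec
;; Query time: 21 msec
;; Query time: 28 msec
;; Query time: 43 msec
;; Query time: 39 msec
"""

EC2_DIG = """\
;; Query time: 77 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
"""


def query_times_ms(dig_output: str) -> list:
    """Extract the 'Query time' values (in milliseconds) from dig output."""
    return [int(value) for value in re.findall(r"Query time: (\d+) msec", dig_output)]


k8s = query_times_ms(KUBERNETES_DIG)
ec2 = query_times_ms(EC2_DIG)

print(f"Kubernetes: mean {mean(k8s):.1f} ms, median {median(k8s)} ms")
print(f"EC2:        mean {mean(ec2):.1f} ms, median {median(ec2)} ms")
```

With these samples the container lookups average roughly 29 ms, while on EC2 only the first lookup is slow and the rest report 0 ms.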

However, this was strange for two reasons:

  1. We already have a ton of Kubernetes applications that interact with AWS resources without suffering from high latency. Whatever the cause, it is specific to this case.
  2. We know that the JVM does in-memory DNS caching. In our images, the TTL value is set in $JAVA_HOME/jre/lib/security/java.security to 10 seconds: networkaddress.cache.ttl = 10. In other words, the JVM should cache all DNS lookups for 10 seconds.
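
To make the second point concrete, here is a toy model of a lookup cache with a 10-second TTL. It is not how the JVM implements its cache; the class `TtlDnsCache`, the fake resolver, and the fake clock are all hypothetical and exist only to show why a 10-second TTL should hide most DNS lookups from the request path.

```python
import time
from typing import Callable, Dict, Tuple


class TtlDnsCache:
    """Toy lookup cache: remembers each answer for ttl_seconds."""

    def __init__(self, resolve: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        # host -> (address, time at which the entry expires)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, host: str) -> str:
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now < cached[1]:
            return cached[0]                      # still fresh: no resolver call
        address = self._resolve(host)             # expired or missing: resolve again
        self._entries[host] = (address, now + self._ttl)
        return address


# Demonstration with a fake resolver and a fake clock (no real DNS traffic).
calls = []


def fake_resolve(host: str) -> str:
    calls.append(host)
    return "34.55.5.111"


fake_now = [0.0]
cache = TtlDnsCache(fake_resolve, ttl_seconds=10, clock=lambda: fake_now[0])

for second in range(30):                          # one lookup per second for 30 seconds
    fake_now[0] = float(second)
    cache.lookup("elastic.spain.adevinta.com")

print(f"{len(calls)} resolver calls for 30 lookups")
```

In this model only 3 of the 30 lookups reach the resolver, which is why a constant DNS penalty on every request would be surprising.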

To confirm the first hypothesis, we decided to stop calling DNS for a while and see whether the problem went away. At first we considered reconfiguring the application to talk to Elasticsearch directly by IP address instead of by domain name. That would have required code changes and a new deployment, so we simply mapped the domain to its IP address in /etc/hosts:

34.55.5.111 elastic.spain.adevinta.com

Now the container obtained the IP almost instantly. This brought some improvement, but we were only slightly closer to the expected latency levels. Although DNS resolution was slow, the real cause still eluded us.

Network diagnostics

We decided to analyze the traffic from the container using tcpdump to see exactly what was happening on the network:

[root@be-851c76f696-alf8z /]# tcpdump -leni any -w capture.pcap

We then sent several requests and downloaded the capture (kubectl cp my-service:/capture.pcap capture.pcap) for further analysis in Wireshark.

There was nothing suspicious about the DNS queries (apart from one small detail that I will come back to later). But there were some oddities in the way our service handled each request. Below is a screenshot of the capture showing a request being accepted before the response begins:

[Screenshot: Wireshark capture of a single request]

Packet numbers are shown in the first column. For clarity, I color-coded the different TCP streams.

The green stream, starting with packet 328, shows how the client (172.17.22.150) established a TCP connection to the container (172.17.36.147). After the initial handshake (328-330), packet 331 carried HTTP GET /v1/.., the incoming request to our service. The whole process took 1 ms.

The gray stream (from packet 339) shows that our service sent an HTTP request to the Elasticsearch instance (there is no TCP handshake because it uses an existing connection). This took 18 ms.

So far everything is fine, and the timings roughly match the expected latency (20-30 ms when measured from the client).

However, the blue section takes 86 ms. What is happening in it? With packet 333, our service sent an HTTP GET request to /latest/meta-data/iam/security-credentials, and immediately after it, over the same TCP connection, another GET request to /latest/meta-data/iam/security-credentials/arn:...

We found that this was repeated on every request throughout the trace. DNS resolution is indeed a bit slower in our containers (the explanation for this is quite interesting, but I will save it for a separate article). It turned out that the cause of the long delays was the calls to the AWS Instance Metadata service on every request.
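
To put the three streams in proportion, the durations quoted above can be added up. This is a rough sketch that assumes the three segments run one after another and together account for the whole request, which the capture description suggests but does not state outright.

```python
# Durations read off the packet capture described above, in milliseconds.
segments_ms = {
    "incoming request (green)": 1,
    "Elasticsearch call (gray)": 18,
    "instance metadata calls (blue)": 86,
}

# Assumption: the segments are sequential and do not overlap.
total_ms = sum(segments_ms.values())

for name, duration in segments_ms.items():
    share = duration / total_ms
    print(f"{name:32s} {duration:3d} ms  {share:6.1%}")
print(f"{'total':32s} {total_ms:3d} ms")
```

Under that assumption the total is 105 ms, in line with the 100-200 ms seen in Kubernetes, and the metadata calls account for roughly four fifths of it.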

Hypothesis 2: unnecessary calls to AWS

Both endpoints belong to the AWS Instance Metadata API. Our microservice uses this service while querying Elasticsearch. Both calls are part of the basic authorization flow. The endpoint hit by the first request returns the IAM role associated with the instance.

/ # curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
arn:aws:iam::<account_id>:role/some_role

The second request asks the second endpoint for temporary credentials for this instance:

/ # curl http://169.254.169.254/latest/meta-data/iam/security-credentials/arn:aws:iam::<account_id>:role/some_role
{
    "Code" : "Success",
    "LastUpdated" : "2012-04-26T16:39:16Z",
    "Type" : "AWS-HMAC",
    "AccessKeyId" : "ASIAIOSFODNN7EXAMPLE",
    "SecretAccessKey" : "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "Token" : "token",
    "Expiration" : "2017-05-17T15:09:54Z"
}

The client can use them for a short period and must periodically obtain new credentials (before they reach Expiration). The model is simple: AWS rotates temporary keys frequently for security reasons, but clients can cache them for a few minutes to offset the performance penalty of obtaining new credentials.

The AWS Java SDK is supposed to take care of this process, but for some reason that was not happening.

After searching through the issues on GitHub, we came across issue #1921. It helped us work out in which direction to "dig" further.

The AWS SDK refreshes credentials when one of the following conditions is met:

  • The expiration date (Expiration) falls within EXPIRATION_THRESHOLD, which is hardcoded to 15 minutes.
  • More time has passed since the last attempt to refresh the credentials than REFRESH_THRESHOLD, which is hardcoded to 60 minutes.
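
The two conditions above can be sketched as a small decision function. This is a paraphrase of the rule as described here, not the SDK's actual code: the function name `should_refresh`, its parameters, and the sample timestamps are hypothetical, and only the two threshold values come from the text.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EXPIRATION_THRESHOLD = timedelta(minutes=15)  # value quoted in the list above
REFRESH_THRESHOLD = timedelta(minutes=60)     # value quoted in the list above


def should_refresh(expiration: datetime,
                   last_refresh_attempt: Optional[datetime],
                   now: datetime) -> bool:
    """Return True if either refresh condition from the list above holds."""
    if last_refresh_attempt is None:
        return True  # nothing cached yet (an assumption of this sketch)
    expiring_soon = expiration - now < EXPIRATION_THRESHOLD
    stale = now - last_refresh_attempt > REFRESH_THRESHOLD
    return expiring_soon or stale


# Hypothetical timestamps: credentials issued at noon, checked one second later.
issued = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
one_second_later = issued + timedelta(seconds=1)

short_lived = should_refresh(issued + timedelta(minutes=15), issued, one_second_later)
long_lived = should_refresh(issued + timedelta(hours=6), issued, one_second_later)

print("15-minute credential needs refresh:", short_lived)
print("6-hour credential needs refresh:   ", long_lived)
```

The 6-hour lifetime is an arbitrary value chosen for contrast. A credential that is only valid for 15 minutes already sits inside the expiration threshold one second after it is issued.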

To see the actual expiration date of the credentials we were receiving, we ran the cURL commands shown above from both the container and an EC2 instance. The validity period of the credentials obtained from the container turned out to be much shorter: exactly 15 minutes.
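
The remaining validity can be read straight from the Expiration field of the JSON response. The sketch below shows one way to compute it; the helper `remaining_validity` is hypothetical, and the sample document and the "current time" are made-up values chosen so that exactly 15 minutes remain.

```python
import json
from datetime import datetime, timedelta, timezone


def remaining_validity(credentials_json: str, now: datetime) -> timedelta:
    """Time left until the 'Expiration' timestamp in a credentials document."""
    document = json.loads(credentials_json)
    expiration = datetime.strptime(
        document["Expiration"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return expiration - now


# Hypothetical sample in the same shape as the response shown earlier.
sample = '{"Code": "Success", "Expiration": "2017-05-17T15:09:54Z"}'
now = datetime(2017, 5, 17, 14, 54, 54, tzinfo=timezone.utc)

print("remaining validity:", remaining_validity(sample, now))
```

Running the same calculation on the real responses from the container and from EC2 is enough to compare the two lifetimes.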

Now everything became clear: on the first request, our service obtained temporary credentials. Since they were valid for no more than 15 minutes, the AWS SDK decided to refresh them on the next request. And this happened on every request.

Why did the credentials' validity period become shorter?

AWS Instance Metadata is designed to work with EC2 instances, not Kubernetes. On the other hand, we did not want to change the application's interface. For this we used KIAM, a tool that, using an agent on each Kubernetes node, allows users (the engineers deploying applications to the cluster) to assign IAM roles to containers in pods as if they were EC2 instances. KIAM intercepts calls to the AWS Instance Metadata service and serves them from its own cache, which it has filled in advance from AWS. From the application's point of view, nothing changes.

KIAM supplies short-lived credentials to pods. This makes sense considering that the average lifetime of a pod is shorter than that of an EC2 instance. The default validity period for the credentials is the same 15 minutes.

As a result, when the two default values are layered on top of each other, a problem arises. Each credential supplied to the application expires after 15 minutes. However, the AWS Java SDK forces a refresh of any credential that has less than 15 minutes left before its expiration date.

Consequently, the temporary credential is forced to refresh on every request, which involves a couple of calls to the AWS API and causes a significant increase in latency. In the AWS Java SDK repository we found a feature request that mentions the same problem.

The solution turned out to be simple. We just reconfigured KIAM to request credentials with a longer validity period. Once this was done, requests started flowing without involving the AWS Metadata service, and latency dropped to levels even lower than in EC2.
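
The effect of the fix can be illustrated with a small simulation that counts how often credentials would be fetched for a steady stream of requests. It is a simplified model, not a measurement: the function `count_refreshes`, the one-request-per-second rate, and the one-hour lifetime are assumptions made for this example, and the source does not state which lifetime was actually configured in KIAM.

```python
from datetime import timedelta

EXPIRATION_THRESHOLD = timedelta(minutes=15)  # SDK value quoted in the text
REFRESH_THRESHOLD = timedelta(minutes=60)     # SDK value quoted in the text


def count_refreshes(credential_lifetime: timedelta, requests: int,
                    interval: timedelta = timedelta(seconds=1)) -> int:
    """Count credential fetches for a stream of evenly spaced requests."""
    refreshes = 0
    expires_at = None     # expiry of the cached credential, as an offset from t=0
    refreshed_at = None   # time of the last refresh, as an offset from t=0
    now = timedelta(0)
    for _ in range(requests):
        needs_refresh = (
            expires_at is None
            or expires_at - now < EXPIRATION_THRESHOLD
            or now - refreshed_at > REFRESH_THRESHOLD
        )
        if needs_refresh:
            refreshes += 1
            refreshed_at = now
            expires_at = now + credential_lifetime
        now += interval
    return refreshes


one_hour_of_requests = 3600  # one request per second for an hour

with_default = count_refreshes(timedelta(minutes=15), one_hour_of_requests)
with_longer = count_refreshes(timedelta(hours=1), one_hour_of_requests)

print("15-minute lifetime:", with_default, "fetches")
print("1-hour lifetime:   ", with_longer, "fetches")
```

In this model the 15-minute lifetime triggers a fetch on all 3600 requests, while the one-hour lifetime needs only 2.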

Conclusions

Based on our experience with migrations, one of the most common sources of problems is not bugs in Kubernetes or in other parts of the platform. Nor is it any fundamental flaw in the microservices we are migrating. Problems often arise simply because we put different elements together.

We mix together complex systems that have never interacted with each other before, expecting them to form a single, larger system. Alas, the more elements there are, the more room there is for errors and the higher the entropy.

In our case, the high latency was not the result of bugs or bad decisions in Kubernetes, KIAM, the AWS Java SDK, or our microservice. It was the result of combining two independent default settings: one in KIAM, the other in the AWS Java SDK. Taken separately, both parameters make sense: the proactive credential refresh policy in the AWS Java SDK, and the short validity period of credentials in KIAM. But when you put them together, the results become unpredictable. Two independent and sensible decisions do not necessarily make sense when combined.

P.S. from the translator

You can read more about the architecture of the KIAM tool for integrating AWS IAM with Kubernetes in this article from its creators.


Source: www.habr.com
