Translator's note: This article, written by Galo Navarro, a Principal Software Engineer at the European company Adevinta, is a fascinating and instructive "investigation" in the field of infrastructure operations. Its original title was slightly expanded in translation for a reason the author explains at the very beginning.
Note from the author: It looks like this post ...
A few weeks ago, my team was migrating a single microservice to a core platform that includes CI/CD, a Kubernetes-based runtime, metrics, and other goodies. The move was a trial run: we planned to use it as a blueprint and migrate roughly 150 more services in the coming months. All of them are responsible for running some of the largest online platforms in Spain (Infojobs, Fotocasa, etc.).
After we deployed the application to Kubernetes and routed some traffic to it, an alarming surprise awaited us. Request latency in Kubernetes was 10 times higher than in EC2. We either had to find a solution to this problem or abandon the migration of the microservice (and, possibly, the whole project).
Why is latency so much higher in Kubernetes than in EC2?
To find the bottleneck, we collected metrics along the entire request path. Our architecture is simple: an API gateway (Zuul) proxies requests to microservice instances in EC2 or Kubernetes. In Kubernetes we use the NGINX Ingress Controller, and the backends are ordinary objects.
                               EC2
                         +---------------+
                         |  +---------+  |
                         |  |         |  |
                    +-------> BACKEND |  |
                    |    |  |         |  |
                    |    |  +---------+  |
                    |    +---------------+
          +------+  |
Public    |      |  |
   -------> ZUUL +--+
traffic   |      |  |                 Kubernetes
          +------+  |    +-----------------------------+
                    |    |  +-------+      +---------+ |
                    |    |  |       |  xx  |         | |
                    +-------> NGINX +------> BACKEND | |
                         |  |       |  xx  |         | |
                         |  +-------+      +---------+ |
                         +-----------------------------+
The problem appeared to be related to the initial latency at the backend (I marked the problem area on the diagram as "xx"). In EC2, the application responded in about 20 ms. In Kubernetes, latency rose to 100-200 ms.
We quickly ruled out the likely suspects related to the change of runtime. The JVM version stayed the same. Containerization issues had nothing to do with it either: the application was already running successfully in containers on EC2. Load? But we observed high latency even at 1 request per second. Garbage collection pauses could also be ruled out.
One of our Kubernetes admins wondered whether the application had external dependencies, because DNS queries had caused similar issues in the past.
Hypothesis 1: DNS name resolution
On each request, our application accesses an AWS Elasticsearch instance one to three times at a domain such as elastic.spain.adevinta.com.
DNS queries from inside the container:
[root@be-851c76f696-alf8z /]# while true; do dig "elastic.spain.adevinta.com" | grep time; sleep 2; done
;; Query time: 22 msec
;; Query time: 22 msec
;; Query time: 29 msec
;; Query time: 21 msec
;; Query time: 28 msec
;; Query time: 43 msec
;; Query time: 39 msec
The same queries from one of the EC2 instances where the application runs:
bash-4.4# while true; do dig "elastic.spain.adevinta.com" | grep time; sleep 2; done
;; Query time: 77 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
Given that a lookup took about 30 ms, it became clear that DNS resolution when accessing Elasticsearch was indeed contributing to the increased latency.
However, this was strange for two reasons:
- We already have a ton of Kubernetes applications that talk to AWS resources without suffering from high latency. Whatever the cause, it is specific to this case.
- We know that the JVM does in-memory DNS caching. In our images, the TTL value is set in $JAVA_HOME/jre/lib/security/java.security and is 10 seconds: networkaddress.cache.ttl = 10. In other words, the JVM should cache all DNS queries for 10 seconds.
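To illustrate why a 10-second cache should keep DNS out of the per-request path, here is a minimal Python sketch of a TTL cache placed in front of a resolver. It is not the JVM's implementation: `TtlDnsCache`, `count_lookups`, the fake resolver, the injected clock, and the placeholder address 192.0.2.10 are all hypothetical and exist only for this illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TtlDnsCache:
    """Toy TTL cache in front of a resolver (illustration only, not the JVM's code)."""

    def __init__(self, resolve: Callable[[str], str], ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # host -> (address, expires_at)
        self.lookups = 0  # number of times the real resolver was called

    def get(self, host: str) -> str:
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh: no DNS round trip
        self.lookups += 1
        address = self._resolve(host)
        self._entries[host] = (address, now + self._ttl)
        return address


def count_lookups(requests_per_second: int, duration_seconds: int,
                  ttl_seconds: float = 10.0) -> int:
    """Simulate steady traffic against a fake resolver and count real lookups."""
    fake_now = [0.0]  # simulated clock, advanced manually below
    cache = TtlDnsCache(lambda host: "192.0.2.10", ttl_seconds,
                        clock=lambda: fake_now[0])
    for second in range(duration_seconds):
        fake_now[0] = float(second)
        for _ in range(requests_per_second):
            cache.get("elastic.spain.adevinta.com")
    return cache.lookups
```

With three lookups per second for one minute, `count_lookups(3, 60)` returns 6 rather than 180, so under this model a lookup of roughly 30 ms should only rarely land on the request path.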
To confirm the first hypothesis, we decided to stop calling DNS for a while and see whether the problem went away. First, we considered reconfiguring the application to talk to Elasticsearch directly by IP address rather than by domain name. That would have required code changes and a new deployment, so we simply mapped the domain to its IP address in /etc/hosts:
34.55.5.111 elastic.spain.adevinta.com
Now the container resolved the IP almost instantly. This led to some improvement, but we were only slightly closer to the expected latency levels. Although DNS resolution was slow, the real cause still eluded us.
Network diagnostics
We decided to analyze the traffic from the container using tcpdump to see exactly what was happening on the network:
[root@be-851c76f696-alf8z /]# tcpdump -leni any -w capture.pcap
We then sent several requests and downloaded the capture (kubectl cp my-service:/capture.pcap capture.pcap) for further analysis.
There was nothing suspicious about the DNS queries (apart from one small detail I will come back to later). But there were oddities in how our service handled each request. Below is a screenshot of the capture showing a request being accepted before the response starts:
Packet numbers are shown in the first column. For clarity, I color-coded the different TCP streams.
The green stream starting at packet 328 shows how the client (172.17.22.150) established a TCP connection to the container (172.17.36.147). After the initial handshake (328-330), packet 331 carried HTTP GET /v1/.. - the incoming request to our service. The whole process took 1 ms.
The gray stream (from packet 339) shows that our service sent an HTTP request to the Elasticsearch instance (there is no TCP handshake because it reuses an existing connection). This took 18 ms.
So far everything is fine, and the timings roughly match the expected latency (20-30 ms when measured from the client).
However, the blue section takes 86 ms. What is going on in it? With packet 333, our service sent an HTTP GET request to /latest/meta-data/iam/security-credentials, and immediately afterwards, over the same TCP connection, another GET request to /latest/meta-data/iam/security-credentials/arn:..
We found that this was repeated on every request throughout the trace. DNS resolution is indeed a little slower in our containers (the explanation is quite interesting, but I will save it for a separate article). It turned out that the cause of the long delays was calls to the AWS Instance Metadata service on every request.
Hypothesis 2: unnecessary calls to AWS
Both endpoints belong to the AWS Instance Metadata service. The first request returns the IAM role:
/ # curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
arn:aws:iam::<account_id>:role/some_role
The second request asks the second endpoint for temporary credentials for that instance:
/ # curl http://169.254.169.254/latest/meta-data/iam/security-credentials/arn:aws:iam::<account_id>:role/some_role
{
"Code" : "Success",
"LastUpdated" : "2012-04-26T16:39:16Z",
"Type" : "AWS-HMAC",
"AccessKeyId" : "ASIAIOSFODNN7EXAMPLE",
"SecretAccessKey" : "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
"Token" : "token",
"Expiration" : "2017-05-17T15:09:54Z"
}
The client can use them for a short period and must periodically obtain new credentials (before their Expiration). The model is simple: AWS rotates temporary keys frequently for security reasons, but clients can cache them for a few minutes to offset the performance penalty of fetching new credentials.
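As a small illustration of how a client might decide how long a cached response is still usable, the Python sketch below parses the Expiration field of a metadata response and computes the remaining validity. The helper name `remaining_validity`, the trimmed sample payload, and the chosen "current time" are assumptions made for the example, not part of any AWS library.

```python
import json
from datetime import datetime, timedelta, timezone


def remaining_validity(credentials_json: str, now: datetime) -> timedelta:
    """Return how long the credentials in a metadata response remain valid."""
    payload = json.loads(credentials_json)
    # Expiration is a UTC timestamp such as 2017-05-17T15:09:54Z.
    expiration = datetime.strptime(payload["Expiration"], "%Y-%m-%dT%H:%M:%SZ")
    expiration = expiration.replace(tzinfo=timezone.utc)
    return expiration - now


# Trimmed-down sample response; only the field used here is kept.
sample = '{"Code": "Success", "Expiration": "2017-05-17T15:09:54Z"}'
# Hypothetical "current time", chosen only for this example.
now = datetime(2017, 5, 17, 14, 54, 54, tzinfo=timezone.utc)
print(remaining_validity(sample, now))
```

A client that caches credentials can compare this remaining time against its own refresh threshold before deciding whether to call the metadata service again.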
The AWS Java SDK is supposed to take care of organizing this process, but for some reason that was not happening.
After searching through the issues on GitHub, we found a relevant one.
The AWS SDK refreshes credentials when one of the following conditions is met:
- The expiration date (Expiration) falls within EXPIRATION_THRESHOLD, hardcoded to 15 minutes.
- More time has passed since the last attempt to refresh the credentials than REFRESH_THRESHOLD, hardcoded to 60 minutes.
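The two conditions above can be sketched as a small decision function. This is a Python paraphrase for illustration, not the SDK's Java code: the function name `needs_refresh`, its parameters, and the use of strict comparisons are assumptions, while the two threshold values come from the text.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EXPIRATION_THRESHOLD = timedelta(minutes=15)  # value quoted in the article
REFRESH_THRESHOLD = timedelta(minutes=60)     # value quoted in the article


def needs_refresh(now: datetime,
                  expiration: Optional[datetime],
                  last_refresh: Optional[datetime]) -> bool:
    """Decide whether cached credentials should be fetched again."""
    if expiration is None or last_refresh is None:
        return True  # nothing cached yet
    if expiration - now < EXPIRATION_THRESHOLD:
        return True  # too close to the expiration date
    if now - last_refresh > REFRESH_THRESHOLD:
        return True  # last refresh attempt was too long ago
    return False


t0 = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)  # arbitrary example time
one_second = timedelta(seconds=1)
# Credentials fetched at t0 with a 15-minute lifetime are stale one second later.
print(needs_refresh(t0 + one_second, t0 + timedelta(minutes=15), t0))
# Credentials fetched at t0 with a 6-hour lifetime are still fine.
print(needs_refresh(t0 + one_second, t0 + timedelta(hours=6), t0))
```

The first call prints True and the second prints False: with a lifetime equal to the expiration threshold, the credentials are considered stale almost immediately after they are issued.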
To see the actual expiration date of the credentials we were receiving, we ran the cURL commands above from both the container and the EC2 instance. The validity period of the credentials obtained from the container turned out to be much shorter: exactly 15 minutes.
Now everything became clear: on the first request, our service obtained temporary credentials. Since they were valid for no more than 15 minutes, the AWS SDK would decide to refresh them on the next request. And this happened on every request.
Why was the credentials' validity period shorter?
AWS Instance Metadata is designed to work with EC2 instances, not Kubernetes. On the other hand, we did not want to change the application's interface. For this we used KIAM.
KIAM issues short-lived credentials to pods. This makes sense given that the average lifetime of a pod is shorter than that of an EC2 instance. The default validity period for credentials is 15 minutes.
As a result, when you overlay the two default values on top of each other, a problem arises. Every credential handed to the application expires after 15 minutes. However, the AWS Java SDK forces a refresh of any credential that has less than 15 minutes left before its expiration date.
Consequently, the temporary credential is forcibly refreshed on every request, which entails a couple of calls to the AWS API and causes a significant increase in latency.
The solution turned out to be simple. We just reconfigured KIAM to request credentials with a longer validity period. Once that was done, requests started flowing without involving the AWS Metadata service, and latency dropped to levels even lower than in EC2.
Conclusions
Based on our experience with migrations, one of the most common sources of problems is not bugs in Kubernetes or other parts of the platform. Nor is it any fundamental flaw in the microservices we migrate. Problems often arise simply because we put different elements together.
We mix together complex systems that have never interacted with each other before, expecting that together they will form a single, larger system. Alas, the more elements there are, the more room for errors and the higher the entropy.
In our case, the high latency was not the result of bugs or bad decisions in Kubernetes, KIAM, the AWS Java SDK, or our microservice. It was the result of combining two independent default settings: one in KIAM, the other in the AWS Java SDK. Taken separately, both parameters make sense: the proactive credential refresh policy in the AWS Java SDK, and the short validity period of credentials in KIAM. But put them together and the results become unpredictable. Two independent and reasonable decisions do not necessarily make sense when combined.
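To make the effect of a longer validity period concrete, here is a rough Python simulation of one hour of traffic at one request per second. It rests on loudly stated assumptions: every fetch returns credentials valid for the full lifetime, the refresh rule uses the 15-minute and 60-minute thresholds with strict comparisons, and the 60-minute lifetime is a hypothetical value (the text does not say which value was configured in KIAM). The function name `count_credential_fetches` is likewise made up for this sketch.

```python
EXPIRATION_THRESHOLD_S = 15 * 60  # refresh when less than this remains
REFRESH_THRESHOLD_S = 60 * 60     # refresh when the last fetch is older than this


def count_credential_fetches(lifetime_s: int, requests: int, interval_s: int = 1) -> int:
    """Count metadata fetches for evenly spaced requests (times in seconds)."""
    expiration = None
    last_fetch = None
    fetches = 0
    for i in range(requests):
        now = i * interval_s
        stale = (
            expiration is None
            or expiration - now < EXPIRATION_THRESHOLD_S
            or now - last_fetch > REFRESH_THRESHOLD_S
        )
        if stale:
            fetches += 1
            last_fetch = now
            # Assumption: each fetch yields credentials with a full lifetime.
            expiration = now + lifetime_s
    return fetches


for minutes in (15, 60):
    print(minutes, count_credential_fetches(minutes * 60, requests=3600))
```

Under these assumptions, the 15-minute lifetime triggers a fetch on every one of the 3600 requests, while the 60-minute lifetime needs only 2 fetches in the same hour.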
P.S. from the translator
You can read more about the design of KIAM, a tool for integrating AWS IAM with Kubernetes.
Source: www.habr.com