"Kubernetes mụbara latency site na ugboro 10": onye ga-ata ụta maka nke a?

Translator's note: This article, written by Galo Navarro, Principal Software Engineer at the European company Adevinta, is a fascinating and instructive "investigation" in the field of infrastructure operations. Its original title was slightly expanded in translation, for a reason the author explains at the very beginning.

Note from the author: It seems this post attracted far more attention than expected. I still get angry comments saying that the title is misleading and that some readers feel let down. I understand why, so, at the risk of spoiling the whole plot, I want to tell you right away what this article is about. A curious thing I have seen as teams migrate to Kubernetes is that whenever a problem arises (such as increased latency after a migration), Kubernetes is the first thing to be blamed, and then it turns out that the orchestrator is not really at fault. This article describes one such case. Its title repeats the exclamation of one of our developers (you will see later that Kubernetes has nothing to do with it). You will not find any surprising revelations about Kubernetes here, but you can expect a couple of good lessons about complex systems.

A couple of weeks ago, my team was migrating a single microservice to a core platform that includes CI/CD, a Kubernetes-based runtime, metrics, and other goodies. The move was a trial run: we planned to use it as a baseline and migrate about 150 more services in the coming months. All of them are responsible for running some of the largest online platforms in Spain (Infojobs, Fotocasa, etc.).

After we deployed the application to Kubernetes and redirected some traffic to it, an alarming surprise awaited us. Request latency in Kubernetes was 10 times higher than in EC2. We either had to find a solution to this problem or abandon the migration of the microservice (and, possibly, the entire project).

Why is latency so much higher in Kubernetes than in EC2?

To find the bottleneck, we collected metrics along the entire request path. Our architecture is simple: an API gateway (Zuul) proxies requests to microservice instances in EC2 or Kubernetes. In Kubernetes we use the NGINX Ingress Controller, and the backends are ordinary Deployment objects running a JVM application built on Spring.

                                  EC2
                            +---------------+
                            |  +---------+  |
                            |  |         |  |
                       +-------> BACKEND |  |
                       |    |  |         |  |
                       |    |  +---------+  |                   
                       |    +---------------+
             +------+  |
Public       |      |  |
      -------> ZUUL +--+
traffic      |      |  |              Kubernetes
             +------+  |    +-----------------------------+
                       |    |  +-------+      +---------+ |
                       |    |  |       |  xx  |         | |
                       +-------> NGINX +------> BACKEND | |
                            |  |       |  xx  |         | |
                            |  +-------+      +---------+ |
                            +-----------------------------+

The problem seemed to be related to the initial latency in the backend (I marked the problem area in the diagram as "xx"). In EC2, the application responded in about 20 ms. In Kubernetes, latency increased to 100-200 ms.

We quickly ruled out the likely suspects related to the runtime change. The JVM version was the same. Containerization problems had nothing to do with it either: the application was already running successfully in containers on EC2. Load? We observed high latency even at 1 request per second. Garbage collection pauses were negligible as well.

One of our Kubernetes admins wondered whether the application had external dependencies, because DNS queries had caused similar problems in the past.

Hypothesis 1: DNS name resolution

For each request, our application accesses an AWS Elasticsearch instance one to three times, at a domain like elastic.spain.adevinta.com. Our containers have a shell, so we could check whether looking up the domain really takes a long time.

DNS queries from the container:

[root@be-851c76f696-alf8z /]# while true; do dig "elastic.spain.adevinta.com" | grep time; sleep 2; done
;; Query time: 22 msec
;; Query time: 22 msec
;; Query time: 29 msec
;; Query time: 21 msec
;; Query time: 28 msec
;; Query time: 43 msec
;; Query time: 39 msec

The same queries from one of the EC2 instances where the application is running:

bash-4.4# while true; do dig "elastic.spain.adevinta.com" | grep time; sleep 2; done
;; Query time: 77 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec

Given that a lookup took about 30 ms in the container, it became clear that DNS resolution when accessing Elasticsearch really was contributing to the increase in latency.
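The same comparison can be made from inside a process rather than with dig. The following is a minimal Python sketch, not the tooling used in the investigation: time_lookups is a hypothetical helper, and socket.gethostbyname goes through the operating system resolver (which also consults /etc/hosts), so the numbers are not directly comparable with dig output.

```python
import socket
import time
from typing import Callable, List


def time_lookups(host: str, attempts: int = 5,
                 resolve: Callable[[str], object] = socket.gethostbyname) -> List[float]:
    """Resolve `host` several times and return each duration in milliseconds.

    `resolve` is injectable so that the timing loop can be exercised
    without touching the network.
    """
    durations = []
    for _ in range(attempts):
        start = time.perf_counter()
        resolve(host)
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


# Example call (needs network access and a resolvable name):
#   time_lookups("elastic.spain.adevinta.com")
```

Running it both in the pod and on the EC2 instance would show whether repeated lookups stay slow or drop to near zero once something caches the answer.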

However, this was strange for two reasons:

  1. We already have plenty of Kubernetes applications that talk to AWS resources without suffering from high latency. Whatever the cause, it is specific to this case.
  2. We know that the JVM caches DNS results in memory. In our images, the TTL is set in $JAVA_HOME/jre/lib/security/java.security to 10 seconds: networkaddress.cache.ttl = 10. In other words, the JVM should cache every DNS lookup for 10 seconds.
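To make the second point concrete, here is a minimal sketch of what a 10-second positive DNS cache does. It only illustrates the idea and is not the JVM's implementation; TtlDnsCache and its parameters are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class TtlDnsCache:
    """Caches successful lookups for `ttl_seconds`, like a positive DNS cache."""

    def __init__(self, resolve: Callable[[str], str],
                 ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        # host -> (time the entry was stored, resolved address)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def lookup(self, host: str) -> str:
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: no DNS query is made
        address = self._resolve(host)  # expired or missing: ask the resolver
        self._entries[host] = (now, address)
        return address
```

With such a cache, a service handling one request per second would pay for a real lookup roughly once every ten requests, which is why a 30 ms lookup alone could not explain the slowdown seen on every request.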

To test the first hypothesis, we decided to stop making DNS calls for a while and see whether the problem went away. At first we considered reconfiguring the application to talk to Elasticsearch directly by IP address rather than by domain name. That would have required code changes and a new deployment, so we simply mapped the domain to its IP address in /etc/hosts:

34.55.5.111 elastic.spain.adevinta.com

Now the container resolved the IP almost instantly. This brought some improvement, but we were only slightly closer to the expected latency. Although DNS resolution was slow, the real cause still eluded us.

Network diagnostics

We decided to analyze the traffic from the container with tcpdump to see what exactly was happening on the network:

[root@be-851c76f696-alf8z /]# tcpdump -leni any -w capture.pcap

We then sent several requests and downloaded the capture (kubectl cp my-service:/capture.pcap capture.pcap) for further analysis in Wireshark.

There was nothing suspicious about the DNS queries (except for one small detail that I will come back to later). But there were some oddities in the way our service handled each request. Below is a screenshot of the capture showing a request being accepted before the response begins:

[Screenshot: packet capture opened in Wireshark]

Packet numbers are shown in the first column. For clarity, I have color-coded the different TCP streams.

The green stream, starting at packet 328, shows the client (172.17.22.150) establishing a TCP connection to the container (172.17.36.147). After the initial handshake (328-330), packet 331 carries HTTP GET /v1/.., the incoming request to our service. The whole exchange took 1 ms.

The gray stream (from packet 339) shows our service sending an HTTP request to the Elasticsearch instance (there is no TCP handshake because it reuses an existing connection). This took 18 ms.

So far everything is fine, and the timings roughly match the expected latency (20-30 ms as measured from the client).

The blue section, however, takes 86 ms. What is going on there? At packet 333, our service sent an HTTP GET request to /latest/meta-data/iam/security-credentials, and immediately afterwards, over the same TCP connection, another GET request to /latest/meta-data/iam/security-credentials/arn:...

We found that this pattern repeats for every request in the trace. DNS resolution is indeed a little slower in our containers (the explanation is quite interesting, but I will save it for a separate article). The real cause of the long delays turned out to be calls to the AWS Instance Metadata service on every request.
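As a rough check, the three durations read off the capture can be added up. The sketch below assumes, for illustration only, that the three segments run one after another and account for the whole request; the dictionary keys are just labels for the colored streams.

```python
# Durations quoted above for a single request, in milliseconds.
SEGMENTS_MS = {
    "incoming request (green)": 1,
    "instance metadata calls (blue)": 86,
    "Elasticsearch query (gray)": 18,
}


def latency_share(segments):
    """Return the total duration and each segment's fraction of it."""
    total = sum(segments.values())
    shares = {name: ms / total for name, ms in segments.items()}
    return total, shares


total, shares = latency_share(SEGMENTS_MS)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of {total} ms")
```

Under that assumption the request adds up to 105 ms, in line with the 100-200 ms observed in Kubernetes, and the metadata calls account for most of it.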

Hypothesis 2: unnecessary calls to AWS

Both endpoints belong to the AWS Instance Metadata API. Our microservice uses this service when it accesses Elasticsearch. Both calls are part of the basic authorization flow. The endpoint hit by the first request returns the IAM role associated with the instance.

/ # curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
arn:aws:iam::<account_id>:role/some_role

The second request asks the second endpoint for temporary credentials for this instance:

/ # curl http://169.254.169.254/latest/meta-data/iam/security-credentials/arn:aws:iam::<account_id>:role/some_role
{
    "Code" : "Success",
    "LastUpdated" : "2012-04-26T16:39:16Z",
    "Type" : "AWS-HMAC",
    "AccessKeyId" : "ASIAIOSFODNN7EXAMPLE",
    "SecretAccessKey" : "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "Token" : "token",
    "Expiration" : "2017-05-17T15:09:54Z"
}
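How long such a document remains valid can be computed from its Expiration field. The following is a minimal sketch assuming the timestamp format shown in the sample response; remaining_validity is a hypothetical helper name, and the caller supplies the current time.

```python
import json
from datetime import datetime, timedelta, timezone


def remaining_validity(document: str, now: datetime) -> timedelta:
    """Return how long the credentials in `document` remain valid at `now`.

    `document` is the JSON body returned by the security-credentials
    endpoint; `now` must be a timezone-aware datetime.
    """
    payload = json.loads(document)
    # The sample response uses UTC timestamps such as 2017-05-17T15:09:54Z.
    expiration = datetime.strptime(payload["Expiration"], "%Y-%m-%dT%H:%M:%SZ")
    expiration = expiration.replace(tzinfo=timezone.utc)
    return expiration - now
```

Calling it with datetime.now(timezone.utc) on a freshly fetched document gives the validity period actually granted to the caller.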

The client can use them for a short time and must periodically obtain new credentials (before their Expiration). The model is simple: AWS rotates temporary keys frequently for security reasons, but clients can cache them for a few minutes to offset the performance penalty of fetching new credentials.

The AWS Java SDK is supposed to take care of this process, but for some reason that was not happening.

After searching the issues on GitHub, we came across issue #1921. It helped us decide where to "dig" next.

The AWS SDK refreshes credentials when one of the following conditions holds:

  • The expiration date (Expiration) falls within EXPIRATION_THRESHOLD, which is hardcoded to 15 minutes.
  • More time has passed since the last attempt to refresh the credentials than REFRESH_THRESHOLD, which is hardcoded to 60 minutes.
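The two conditions can be written down as a small predicate. This is a Python sketch of the rule as described above, not the SDK's Java code; the constant names come from the text, while should_refresh and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values as described in the text (hardcoded in the SDK).
EXPIRATION_THRESHOLD = timedelta(minutes=15)
REFRESH_THRESHOLD = timedelta(minutes=60)


def should_refresh(expiration: datetime,
                   last_refresh_attempt: datetime,
                   now: datetime) -> bool:
    """Return True if either refresh condition from the list above holds."""
    # Condition 1: the credentials expire within EXPIRATION_THRESHOLD.
    expires_soon = expiration - now < EXPIRATION_THRESHOLD
    # Condition 2: the last refresh attempt is older than REFRESH_THRESHOLD.
    refresh_overdue = now - last_refresh_attempt > REFRESH_THRESHOLD
    return expires_soon or refresh_overdue
```

Note that the first condition depends only on how much lifetime is left, so credentials issued with a lifetime of 15 minutes or less would satisfy it as soon as any time has passed.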

To see the actual expiration date of the credentials we were receiving, we ran the cURL commands above from both the container and the EC2 instance. The credentials obtained from the container turned out to have a much shorter validity period: exactly 15 minutes.

Now everything was clear: on the first request, our service obtained temporary credentials. Since they were valid for no more than 15 minutes, the AWS SDK decided to refresh them on the next request. And this happened on every request.

Why was the validity period of the credentials so short?

AWS Instance Metadata is designed to work with EC2 instances, not Kubernetes. On the other hand, we did not want to change the application's interface. For this we use KIAM, a tool that runs agents on every Kubernetes node and lets users (the engineers deploying applications to the cluster) assign IAM roles to containers in pods as if they were EC2 instances. KIAM intercepts calls to the AWS Instance Metadata service and serves them from its own cache, having fetched them from AWS beforehand. From the application's point of view, nothing changes.

KIAM issues short-lived credentials to pods. This makes sense given that the average lifetime of a pod is shorter than that of an EC2 instance. The default validity period for credentials is the same 15 minutes.

As a result, when the two defaults are layered on top of each other, a problem arises. Every credential handed to the application expires after 15 minutes. At the same time, the AWS Java SDK forces a refresh of any credential that has less than 15 minutes left before it expires.

Consequently, the temporary credentials are refreshed on every request, which involves a couple of calls to the AWS API and adds significant latency. In the AWS Java SDK repository we found a feature request that mentions a similar problem.
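The interaction of the two defaults can be reproduced with a toy simulation. This is only a sketch: it models the 15-minute expiration rule alone (the 60-minute rule is ignored), count_metadata_fetches is a hypothetical helper, and the 60-minute lifetime is an arbitrary example value rather than the setting we ended up using.

```python
from datetime import timedelta

EXPIRATION_THRESHOLD = timedelta(minutes=15)  # SDK-side default from the text


def count_metadata_fetches(lifetime: timedelta,
                           request_interval: timedelta,
                           requests: int) -> int:
    """Count credential fetches needed to serve `requests` evenly spaced requests.

    Credentials are fetched when none are cached yet, or when the cached
    ones have less than EXPIRATION_THRESHOLD left before they expire.
    """
    fetches = 0
    now = timedelta(0)   # time elapsed since the first request
    expiration = None    # expiry of the cached credentials, if any
    for _ in range(requests):
        if expiration is None or expiration - now < EXPIRATION_THRESHOLD:
            fetches += 1
            expiration = now + lifetime
        now += request_interval
    return fetches


one_second = timedelta(seconds=1)
print(count_metadata_fetches(timedelta(minutes=15), one_second, 100))  # 100
print(count_metadata_fetches(timedelta(minutes=60), one_second, 100))  # 1
```

With a 15-minute lifetime, every one of the 100 requests triggers a fetch; with a longer lifetime, only the first one does.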

The solution turned out to be simple. We reconfigured KIAM to request credentials with a longer validity period. Once that was done, requests started flowing without involving the AWS Metadata service, and latency dropped to levels even lower than in EC2.

Conclusions

In our experience with migrations, one of the most common sources of problems is not bugs in Kubernetes or in other parts of the platform. Nor is it any fundamental flaw in the microservices we are migrating. Problems often arise simply because we put different pieces together.

We combine complex systems that have never interacted with each other before and expect them to form a single, larger system. Alas, the more pieces there are, the more room there is for errors and the higher the entropy.

In our case, the high latency was not the result of bugs or bad decisions in Kubernetes, KIAM, the AWS Java SDK, or our microservice. It was the result of combining two independent default settings: one in KIAM, the other in the AWS Java SDK. Taken separately, both make sense: the eager credential refresh policy in the AWS Java SDK, and the short credential lifetime in KIAM. But when you put them together, the result becomes unpredictable. Two independent, sensible decisions do not necessarily make sense when combined.

P.S. from the translator

You can learn more about the architecture of KIAM, the utility for integrating AWS IAM with Kubernetes, in this article from its creators.

Source: www.habr.com
