"Ua hoʻonui ʻo Kubernetes i ka latency e 10 mau manawa": ʻo wai ka hewa no kēia?

Translator's note: This article, written by Galo Navarro, Principal Software Engineer at the European company Adevinta, is a fascinating and instructive "investigation" in the field of infrastructure operations. Its original title was slightly expanded in translation, for a reason the author explains at the very beginning.

Note from the author: This post seems to have attracted far more attention than expected. I still get angry comments saying the title is misleading and that some readers are disappointed. I understand why, so, at the risk of spoiling the whole intrigue, I want to tell you right away what this article is about. A curious thing I have seen as teams move to Kubernetes is that whenever a problem comes up (such as higher latency after a migration), Kubernetes is the first thing to be blamed, and then it turns out the orchestrator is not really at fault. This article describes one such case. Its title repeats the exclamation of one of our developers (you will see later that Kubernetes had nothing to do with it). You will not find surprising revelations about Kubernetes here, but you can expect a couple of good lessons about complex systems.

A couple of weeks ago, my team was migrating a single microservice to a core platform that includes CI/CD, a Kubernetes-based runtime, metrics, and other goodies. The move was a trial run: we planned to use it as a template and migrate roughly 150 more services in the coming months. All of them are responsible for running some of the largest online platforms in Spain (Infojobs, Fotocasa, etc.).

After we deployed the application to Kubernetes and redirected some traffic to it, an alarming surprise awaited us. Request latency in Kubernetes was 10 times higher than in EC2. We either had to find a solution to this problem or abandon the migration of the microservice (and, possibly, the whole project).

Why is latency so much higher in Kubernetes than in EC2?

To find the bottleneck, we collected metrics along the entire request path. Our architecture is simple: an API gateway (Zuul) proxies requests to microservice instances in EC2 or Kubernetes. In Kubernetes we use the NGINX Ingress Controller, and the backends are ordinary Deployment objects running a JVM application built on Spring.

                                  EC2
                            +---------------+
                            |  +---------+  |
                            |  |         |  |
                       +-------> BACKEND |  |
                       |    |  |         |  |
                       |    |  +---------+  |                   
                       |    +---------------+
             +------+  |
Public       |      |  |
      -------> ZUUL +--+
traffic      |      |  |              Kubernetes
             +------+  |    +-----------------------------+
                       |    |  +-------+      +---------+ |
                       |    |  |       |  xx  |         | |
                       +-------> NGINX +------> BACKEND | |
                            |  |       |  xx  |         | |
                            |  +-------+      +---------+ |
                            +-----------------------------+

The problem appeared to be upstream latency at the backend (I marked the problem area on the diagram as "xx"). In EC2, the application responded in about 20 ms. In Kubernetes, latency rose to 100-200 ms.

We quickly ruled out the likely suspects related to the change of runtime. The JVM version stayed the same. Containerization was not the issue either: the application already ran successfully in containers on EC2. Load? But we observed high latencies even at 1 request per second. Garbage collection pauses could also be ruled out.

One of our Kubernetes administrators asked whether the application had external dependencies, since DNS queries had caused similar problems in the past.

Hypothesis 1: DNS name resolution

On every request, our application accesses an AWS Elasticsearch instance one to three times at a domain like elastic.spain.adevinta.com. Our containers have a shell, so we could check whether looking up the domain really takes a long time.

DNS queries from the container:

[root@be-851c76f696-alf8z /]# while true; do dig "elastic.spain.adevinta.com" | grep time; sleep 2; done
;; Query time: 22 msec
;; Query time: 22 msec
;; Query time: 29 msec
;; Query time: 21 msec
;; Query time: 28 msec
;; Query time: 43 msec
;; Query time: 39 msec

The same queries from one of the EC2 instances where the application runs:

bash-4.4# while true; do dig "elastic.spain.adevinta.com" | grep time; sleep 2; done
;; Query time: 77 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec

Given that the lookup took about 30 ms, it became clear that DNS resolution when accessing Elasticsearch really was contributing to the increase in latency.
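The "about 30 ms" figure can be checked directly from the two dig transcripts above. The sketch below is only an illustration: the helper names `query_times_ms` and `average` are made up for this example, and the input strings are the sample lines copied from the transcripts, not a new measurement.

```python
import re

# "Query time" lines copied from the two dig sessions in the article.
CONTAINER_DIG = """\
;; Query time: 22 msec
;; Query time: 22 msec
;; Query time: 29 msec
;; Query time: 21 msec
;; Query time: 28 msec
;; Query time: 43 msec
;; Query time: 39 msec
"""

EC2_DIG = """\
;; Query time: 77 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
"""


def query_times_ms(dig_output):
    """Extract every 'Query time: N msec' value as an integer."""
    return [int(ms) for ms in re.findall(r"Query time: (\d+) msec", dig_output)]


def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


container = query_times_ms(CONTAINER_DIG)
ec2 = query_times_ms(EC2_DIG)
print(f"container: {average(container):.1f} ms")  # container: 29.1 ms
print(f"EC2:       {average(ec2):.1f} ms")        # EC2:       15.4 ms
```

Note that the EC2 average is dominated by the first, uncached query; every later lookup there returns in 0 ms, while the container pays 20-40 ms each time.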

However, this was strange for two reasons:

  1. We already have a ton of Kubernetes applications that talk to AWS resources without suffering from high latency. Whatever the cause, it is specific to this case.
  2. We know that the JVM caches DNS results in memory. In our images, the TTL value is written in $JAVA_HOME/jre/lib/security/java.security and set to 10 seconds: networkaddress.cache.ttl = 10. In other words, the JVM should cache all DNS queries for 10 seconds.
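To see why a 10-second cache makes slow DNS an unlikely explanation for per-request latency, here is a toy simulation. It is a sketch under stated assumptions, not the JVM's actual cache: the class `TtlDnsCache`, the function `simulate`, and the placeholder address 192.0.2.10 are all hypothetical, and the load is modeled as one request per second with three lookups each.

```python
class TtlDnsCache:
    """Toy in-memory cache that keeps each answer for `ttl` seconds."""

    def __init__(self, resolve, ttl):
        self._resolve = resolve   # function: hostname -> address
        self._ttl = ttl
        self._entries = {}        # hostname -> (address, expires_at)

    def lookup(self, host, now):
        entry = self._entries.get(host)
        # Resolve again only when there is no entry or it has expired.
        if entry is None or now >= entry[1]:
            entry = (self._resolve(host), now + self._ttl)
            self._entries[host] = entry
        return entry[0]


def simulate(seconds, lookups_per_request, ttl):
    """Count real resolutions at one request per second."""
    calls = []

    def fake_resolve(host):
        calls.append(host)
        return "192.0.2.10"  # placeholder address, not a real endpoint

    cache = TtlDnsCache(fake_resolve, ttl)
    for second in range(seconds):
        for _ in range(lookups_per_request):
            cache.lookup("elastic.spain.adevinta.com", now=second)
    return len(calls)


# One minute of traffic, three lookups per request, 10-second TTL.
print(simulate(seconds=60, lookups_per_request=3, ttl=10))  # 6
```

Under these assumptions only 6 of 180 lookups reach the resolver, so a 30 ms resolution cost would show up on a small fraction of requests rather than on every one.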

To confirm the first hypothesis, we decided to stop calling DNS for a while and see whether the problem went away. At first we considered reconfiguring the application to talk to Elasticsearch directly by IP address rather than by domain name. That would have required code changes and a new deployment, so we simply mapped the domain to its IP address in /etc/hosts:

34.55.5.111 elastic.spain.adevinta.com

Now the container got the IP almost instantly. This brought some improvement, but we were only slightly closer to the expected latency levels. Although DNS resolution was slow, the real cause still eluded us.

Network diagnostics

We decided to analyze the traffic from the container using tcpdump to see what exactly was happening on the network:

[root@be-851c76f696-alf8z /]# tcpdump -leni any -w capture.pcap

We then sent several requests and downloaded the capture (kubectl cp my-service:/capture.pcap capture.pcap) for further analysis in Wireshark.

There was nothing suspicious about the DNS queries (except for one small detail that I will come back to later). But there were some oddities in how our service handled each request. Below is a screenshot of the capture showing a request being accepted before the response starts:

[Screenshot: Wireshark capture of the request]

Packet numbers are shown in the first column. For clarity, I color-coded the different TCP streams.

The green stream starting at packet 328 shows the client (172.17.22.150) establishing a TCP connection to the container (172.17.36.147). After the initial handshake (328-330), packet 331 brought HTTP GET /v1/.. - an incoming request to our service. The whole process took 1 ms.

The gray stream (from packet 339) shows that our service sent an HTTP request to the Elasticsearch instance (there is no TCP handshake because it reuses an existing connection). This took 18 ms.

So far everything is fine, and the times roughly match the expected latencies (20-30 ms when measured from the client).

However, the blue section takes 86 ms. What is going on in it? With packet 333, our service sent an HTTP GET request to /latest/meta-data/iam/security-credentials, and immediately afterwards, over the same TCP connection, another GET request to /latest/meta-data/iam/security-credentials/arn:...

We found that this was repeated with every request throughout the trace. DNS resolution is indeed a bit slower in our containers (the explanation for this is quite interesting, but I will save it for a separate article). It turned out that the cause of the long delays was the calls to the AWS Instance Metadata service on every request.

Hypothesis 2: unnecessary calls to AWS

Both endpoints belong to the AWS Instance Metadata API. Our microservice uses this service while working with Elasticsearch. Both calls are part of the basic authorization flow. The endpoint queried by the first request returns the IAM role associated with the instance.

/ # curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
arn:aws:iam::<account_id>:role/some_role

The second request asks the second endpoint for temporary credentials for this instance:

/ # curl http://169.254.169.254/latest/meta-data/iam/security-credentials/arn:aws:iam::<account_id>:role/some_role
{
    "Code" : "Success",
    "LastUpdated" : "2012-04-26T16:39:16Z",
    "Type" : "AWS-HMAC",
    "AccessKeyId" : "ASIAIOSFODNN7EXAMPLE",
    "SecretAccessKey" : "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "Token" : "token",
    "Expiration" : "2017-05-17T15:09:54Z"
}

The client can use them for a short period and must periodically obtain new credentials (before their Expiration). The model is simple: AWS rotates temporary keys frequently for security reasons, but clients can cache them for a few minutes to offset the performance penalty of obtaining new credentials.
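As an illustration of what "before their Expiration" means in practice, the sketch below parses a response shaped like the sample above and reports how much validity remains at a given moment. The function name `remaining_validity` and the chosen value of `now` are assumptions made for this example; only the JSON fields come from the sample response.

```python
import json
from datetime import datetime, timedelta, timezone

# Response body shaped like the sample shown above (values are the sample's).
SAMPLE_RESPONSE = """
{
    "Code" : "Success",
    "LastUpdated" : "2012-04-26T16:39:16Z",
    "Type" : "AWS-HMAC",
    "AccessKeyId" : "ASIAIOSFODNN7EXAMPLE",
    "SecretAccessKey" : "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "Token" : "token",
    "Expiration" : "2017-05-17T15:09:54Z"
}
"""


def remaining_validity(credentials_json, now):
    """Return how long the credentials stay valid, measured from `now`."""
    doc = json.loads(credentials_json)
    expiration = datetime.strptime(
        doc["Expiration"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return expiration - now


# Hypothetical "current time", chosen 15 minutes before the sample Expiration.
now = datetime(2017, 5, 17, 14, 54, 54, tzinfo=timezone.utc)
print(remaining_validity(SAMPLE_RESPONSE, now))  # 0:15:00
```

A client that caches credentials would compare this remaining time against its own refresh threshold to decide when to ask for a new set.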

The AWS Java SDK is supposed to take care of this process, but for some reason that was not happening.

After searching the issues on GitHub, we came across issue #1921. It helped us work out in which direction to "dig" further.

The AWS SDK refreshes credentials when one of the following conditions holds:

  • The expiration date (Expiration) falls within EXPIRATION_THRESHOLD, hardcoded to 15 minutes.
  • More time has passed since the last attempt to refresh the credentials than REFRESH_THRESHOLD, hardcoded to 60 minutes.

To see the actual expiration date of the credentials we were receiving, we ran the cURL commands above from both the container and the EC2 instance. The validity period of the credentials obtained from the container turned out to be much shorter: exactly 15 minutes.

Now everything became clear: on the first request, our service obtained temporary credentials. Since they were valid for no more than 15 minutes, the AWS SDK would decide to refresh them on the next request. And this happened with every request.

Why did the credentials' validity period become shorter?

AWS Instance Metadata is designed to work with EC2 instances, not Kubernetes. On the other hand, we did not want to change the application interface. For this we used KIAM - a tool that, using agents on each Kubernetes node, lets users (the engineers deploying applications to a cluster) assign IAM roles to containers in pods as if they were EC2 instances. KIAM intercepts calls to the AWS Instance Metadata service and serves them from its own cache, having previously obtained the data from AWS. From the application's point of view, nothing changes.

KIAM supplies short-term credentials to pods. This makes sense, given that the average lifespan of a pod is shorter than that of an EC2 instance. The default validity period for credentials is the same 15 minutes.

As a result, if you overlay the two default values on top of each other, a problem arises. Each credential provided to the application expires after 15 minutes. However, the AWS Java SDK forces a refresh of any credential that has less than 15 minutes left before its expiration date.
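The interaction of the two defaults can be reproduced with a small simulation. This is a sketch of the two refresh conditions as described in this article, not the AWS Java SDK's actual code: the function names `needs_refresh` and `count_fetches`, the one-request-per-second load, and the one-hour comparison value are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Thresholds as described in the text; the names mirror the article,
# but this is not the SDK's implementation.
EXPIRATION_THRESHOLD = timedelta(minutes=15)
REFRESH_THRESHOLD = timedelta(minutes=60)


def needs_refresh(expiration, last_refresh, now):
    """Return True if either refresh condition from the text holds."""
    expiring_soon = expiration - now < EXPIRATION_THRESHOLD
    stale = now - last_refresh > REFRESH_THRESHOLD
    return expiring_soon or stale


def count_fetches(credential_ttl, requests, interval=timedelta(seconds=1)):
    """Simulate `requests` requests and count credential fetches."""
    now = datetime(2020, 1, 1, tzinfo=timezone.utc)  # arbitrary start time
    expiration = last_refresh = None
    fetches = 0
    for _ in range(requests):
        if expiration is None or needs_refresh(expiration, last_refresh, now):
            # Hypothetical provider issuing credentials valid for credential_ttl.
            expiration = now + credential_ttl
            last_refresh = now
            fetches += 1
        now += interval
    return fetches


# 15-minute credentials: every request triggers a new fetch.
print(count_fetches(timedelta(minutes=15), requests=100))  # 100
# Longer-lived credentials (one hour, as an example): a single fetch.
print(count_fetches(timedelta(hours=1), requests=100))     # 1
```

With a 15-minute lifetime, a credential is already inside the 15-minute expiration window one second after it is issued, so the check fails on every subsequent request.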

As a result, the temporary credential is forcibly refreshed on every request, which entails two calls to the AWS API and significantly increases latency. In the AWS Java SDK repository we found a feature request that mentions a similar problem.

The solution turned out to be simple. We reconfigured KIAM to request credentials with a longer validity period. Once this was done, requests started flowing without involving the AWS Metadata service, and latency dropped to levels even lower than in EC2.

Conclusions

Based on our experience with migrations, one of the most common sources of problems is not bugs in Kubernetes or in other parts of the platform. Nor is it fundamental flaws in the microservices we are porting. Problems often arise simply because we put different elements together.

We are mixing together complex systems that have never interacted with each other before, expecting them to form a single, larger system. Alas, the more elements there are, the more room there is for errors and the higher the entropy.

In our case, the high latency was not the result of bugs or bad decisions in Kubernetes, KIAM, the AWS Java SDK, or our microservice. It was the result of combining two independent default settings: one in KIAM, the other in the AWS Java SDK. Taken separately, both settings make sense: the eager credential refresh policy in the AWS Java SDK, and the short validity period of credentials in KIAM. But when you put them together, the results become unpredictable. Two independent, individually sensible decisions do not necessarily make sense when combined.

P.S. from the translator

You can learn more about the architecture of the KIAM utility for integrating AWS IAM with Kubernetes in this article from its creators.


Source: www.habr.com
