Sometimes more is less: when reducing load increases latency

As in most of my posts, there was a problem with a distributed service; let's call this service Alvin. This time I did not discover the problem myself: the people on the client side told me about it.

One day I woke up to a disgruntled email about high latency with Alvin, which we were planning to launch soon. Specifically, the client was seeing 99th-percentile latency in the region of 50 ms, well above our latency budget. This was surprising, because I had tested the service extensively, especially for latency, which is a common complaint.

Before putting Alvin into testing, I ran many experiments at 40k queries per second (QPS), all of them showing latency below 10 ms. I was ready to declare that I did not agree with their results. But when I took another look at the email, I noticed something new: I had not tested the exact conditions they described. Their QPS was much lower than mine. I had tested at 40k QPS, but they were at only 1k. I ran another experiment, this time at the lower QPS, just to appease them.

Since I am blogging about this, you have probably already figured out that their numbers were right. I tested my virtual client over and over, always with the same result: a lower request rate not only increases latency, it also increases the number of requests with latency above 10 ms. In other words, if at 40k QPS about 50 requests per second exceeded 50 ms, then at 1k QPS there were 100 requests above 50 ms every second. A paradox!
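To make the size of the paradox concrete, the figures quoted above can be turned into the share of slow requests at each load. This is only a small arithmetic sketch; the function name `slow_fraction` is made up for illustration.

```python
def slow_fraction(slow_per_second: float, qps: float) -> float:
    """Share of all requests that exceed the latency threshold."""
    return slow_per_second / qps


high_load = slow_fraction(50, 40_000)  # about 50 slow requests/s at 40k QPS
low_load = slow_fraction(100, 1_000)   # 100 slow requests/s at 1k QPS

print(f"40k QPS: {high_load:.3%} of requests are slow")
print(f" 1k QPS: {low_load:.3%} of requests are slow")
```

Seen this way, the slow share goes from roughly 0.125% of requests to 10% when the load drops by a factor of 40.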

Narrowing the search

When you face a latency problem in a distributed system with many components, the first step is to draw up a short list of suspects. Let's dig a little deeper into Alvin's architecture:

[Figure: Alvin's architecture]

A good starting point is the list of I/O transitions involved (network calls, disk lookups, and so on). Let's try to find out where the delay is. Besides the obvious I/O with the client, Alvin takes one extra step: it accesses a data store. However, this store runs in the same cluster as Alvin, so latency there should be lower than with the client. So, the list of suspects:

  1. Network call from the client to Alvin.
  2. Network call from Alvin to the data store.
  3. Disk lookup in the data store.
  4. Network call from the data store to Alvin.
  5. Network call from Alvin to the client.

Let's try to cross some items off.

The data store has nothing to do with it

The first thing I did was turn Alvin into a ping-pong server that does not process requests. When it receives a request, it returns an empty response. If latency drops, then the culprit is a bug in Alvin or in the data store implementation, which would be nothing unheard of. The first experiment gave the following graph:

[Figure: latency graph for the ping-pong server]

As you can see, there is no improvement when using the ping-pong server. This means the data store does not add latency, and the list of suspects shrinks by more than half:

  1. Network call from the client to Alvin.
  2. Network call from Alvin to the client.

Great! The list was shrinking fast. I thought I had almost found the cause.
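The ping-pong idea can be sketched in a few lines. Note that this is not Alvin's code: Alvin speaks gRPC, while the sketch below uses a raw TCP socket on localhost and a one-byte reply as a stand-in for the "empty response". The function names `start_ping_pong_server` and `measure_round_trips` are hypothetical.

```python
import socket
import threading
import time


def start_ping_pong_server() -> tuple[socket.socket, int]:
    """Start a localhost TCP server that answers every byte with one byte."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()

    def serve() -> None:
        while True:
            try:
                conn, _ = listener.accept()
            except OSError:
                return  # the listener was closed
            with conn:
                while conn.recv(1):        # wait for a 1-byte "ping"
                    conn.sendall(b"\x00")  # reply at once, no processing

    threading.Thread(target=serve, daemon=True).start()
    return listener, listener.getsockname()[1]


def measure_round_trips(port: int, count: int) -> list[float]:
    """Return round-trip times in milliseconds for `count` pings."""
    samples = []
    with socket.create_connection(("127.0.0.1", port)) as client:
        for _ in range(count):
            started = time.perf_counter()
            client.sendall(b"\x01")
            client.recv(1)
            samples.append((time.perf_counter() - started) * 1000.0)
    return samples


listener, port = start_ping_pong_server()
latencies_ms = measure_round_trips(port, 100)
listener.close()
print(f"max round trip: {max(latencies_ms):.3f} ms")
```

Because the server does no work at all, any latency that remains must come from the transport or from the RPC layer around it, which is exactly what the experiment was meant to isolate.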

gRPC

Now is the time to introduce a new player: gRPC. It is an open-source library from Google for remote procedure calls (RPC). Although gRPC is well optimized and widely used, this was the first time I had used it on a system of this size, and I expected my implementation to be suboptimal, to put it mildly.

The presence of gRPC in the stack raised a new question: could my implementation, or gRPC itself, be causing the latency problem? Adding new suspects to the list:

  1. The client calls the gRPC library.
  2. The gRPC library on the client makes a network call to the gRPC library on the server.
  3. The gRPC library calls Alvin (a no-op in the case of the ping-pong server).

To give you an idea of what the code looks like, my client/Alvin implementation is not much different from the client-server async examples.

Note: The list above is slightly simplified, because gRPC lets you use your own (template?) threading model, in which the execution stacks of gRPC and of the user implementation are intertwined. For the sake of simplicity, we will stick to this model.

Profiling will fix everything

Having crossed off the data store, I thought I was almost done: "Now it's easy! Let's apply a profiler and find out where the delay occurs." I am a big fan of precise profiling, because CPUs are very fast and most of the time they are not the bottleneck. Most delays occur when the processor has to stop processing in order to do something else. Precise CPU profiling shows exactly that: it accurately records every context switch and makes it clear where the delays occur.

I took four profiles: at high QPS (low latency) and with the ping-pong server at low QPS (high latency), both on the client side and on the server side. Just in case, I also took a sampling CPU profile. When comparing profiles, I usually look for an anomalous call stack. For example, on the bad side with high latency there are typically many more context switches (10 times more, or even more than that). But in my case, the number of context switches was almost the same. To my horror, there was nothing significant there.

More debugging

I was desperate. I did not know what other tools I could use, and my next plan was essentially to repeat the experiments with different variations rather than to diagnose the problem clearly.

What if

From the very beginning, the specific 50 ms figure had bothered me. That is a very long time. I decided I would cut pieces out of the code until I could find out exactly which part was causing the problem. Then came an experiment that worked.

As usual, in hindsight everything seems obvious. I put the client on the same machine as Alvin and sent requests to localhost. And the latency increase went away!

Something was wrong with the network.

Learning network engineering skills

I have to admit: my knowledge of networking technologies is terrible, especially considering that I work with them every day. But the network was the prime suspect, and I needed to learn how to debug it.

Fortunately, the Internet loves those who want to learn. The combination of ping and tracert seemed like a good enough start for debugging network transport problems.

First, I ran PsPing against Alvin's TCP port. I used the default settings, nothing special. Out of more than a thousand pings, none exceeded 10 ms, except for the first one, which served as warm-up. This contradicts the observed latency of 50 ms at the 99th percentile: there, for every 100 requests, we should have seen about one request with a latency of 50 ms.
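A back-of-the-envelope check shows how strongly the ping result contradicts the client's numbers. The sketch assumes that pings are independent and that each ping has the same 1-in-100 chance of being slow as a real request; both are simplifying assumptions, not something measured here.

```python
slow_probability = 0.01  # 99th percentile at 50 ms: about 1 request in 100
ping_count = 1000        # "more than a thousand pings"; 1000 is a lower bound

expected_slow = slow_probability * ping_count
chance_of_none = (1 - slow_probability) ** ping_count

print(f"expected slow pings: {expected_slow:.0f}")
print(f"chance of seeing none: {chance_of_none:.6f}")
```

Under these assumptions about ten slow pings would be expected, and the chance of seeing none at all is well below 0.01%, so the plain TCP path to Alvin was very unlikely to be the source of the 50 ms delays.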

Then I tried tracert: there could have been a problem at one of the nodes along the route between Alvin and the client. But the tracer came back empty-handed as well.

So it was not my code, the gRPC implementation, or the network that was causing the delay. I was starting to worry that I would never understand this.

Now what OS are we on

gRPC is widely used on Linux, but it is exotic on Windows. I decided to try an experiment, and it worked: I created a Linux virtual machine, compiled Alvin for Linux, and deployed it.

Here is what happened: the Linux ping-pong server did not show the same delays as the equivalent Windows host, even though the source code was no different. It turned out that the problem was in the gRPC implementation for Windows.

Nagle's algorithm

All this time I had thought I was missing a gRPC flag. Now I understood that it was actually gRPC that was missing a Windows flag. I found an internal RPC library that I was confident worked well with its full set of Winsock flags. Then I added all of those flags to gRPC and deployed Alvin on Windows as a patched ping-pong server!

Almost done: I started removing the added flags one at a time until the regression came back, so that I could pinpoint the cause. It was the infamous TCP_NODELAY, the switch for Nagle's algorithm.

Nagle's algorithm tries to reduce the number of packets sent over the network by delaying the transmission of small messages until enough bytes have accumulated to fill a packet, or until earlier data has been acknowledged. While this may be nice for the average user, it is destructive for real-time servers, because the OS will delay some messages, which causes lag at low QPS. gRPC had this flag set in its Linux implementation for TCP sockets, but not in its Windows one. I fixed this.
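For reference, this is what setting the option looks like on a plain TCP socket. It is not the gRPC patch itself, only a minimal Python sketch of the same socket option using the standard `socket` module.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# TCP_NODELAY = 1 turns Nagle's algorithm off for this socket, so small
# writes are sent right away instead of being held back and coalesced.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
print("Nagle disabled:", nodelay_enabled)
sock.close()
```

The trade-off is more, smaller packets on the wire, which is usually acceptable for request/response traffic where each message should leave as soon as it is written.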

Conclusion

The higher latency at low QPS was caused by an OS optimization. In hindsight, profiling did not detect the delay because it happened in kernel mode rather than in user mode. I do not know whether Nagle's algorithm can be observed through ETW captures, but it would be interesting to find out.

As for the localhost experiment, it probably never touched the actual networking code and Nagle's algorithm did not run, so the latency problems went away when the client reached Alvin through localhost.

The next time you see latency increase as the number of requests per second decreases, Nagle's algorithm should be on your list of suspects!

source: www.habr.com
