Is your storage fast enough for etcd? Let's ask fio

A short story about fio and etcd

The performance of an etcd cluster depends heavily on the performance of its storage. etcd exports several metrics to Prometheus that provide the storage performance information you need, for example wal_fsync_duration_seconds. The etcd documentation says that for storage to be considered fast enough, the 99th percentile of this metric must be below 10 ms. If you plan to run an etcd cluster on Linux machines and want to assess whether your storage (for example, an SSD) is fast enough, you can use fio, a popular tool for testing I/O performance. Run the following command, where test-data is a directory under the mount point of the storage being tested:

fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-data --size=22m --bs=2300 --name=mytest

You only need to look at the output and check that the 99th percentile of the fdatasync duration is below 10 ms. If it is, your storage is reasonably fast. Here is an example of the output:

  sync (usec): min=534, max=15766, avg=1273.08, stdev=1084.70
  sync percentiles (usec):
   | 1.00th=[ 553], 5.00th=[ 578], 10.00th=[ 594], 20.00th=[ 627],
   | 30.00th=[ 709], 40.00th=[ 750], 50.00th=[ 783], 60.00th=[ 1549],
   | 70.00th=[ 1729], 80.00th=[ 1991], 90.00th=[ 2180], 95.00th=[ 2278],
   | 99.00th=[ 2376], 99.50th=[ 9634], 99.90th=[15795], 99.95th=[15795],
   | 99.99th=[15795]
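The check described above can also be scripted. The following is a minimal sketch, not part of the original post: the helper names fio_sync_p99 and p99_under_10ms are hypothetical, and the sketch assumes fio reports the sync percentiles in usec, as in the sample output.

```shell
#!/bin/sh
# Hypothetical helper: read fio output on stdin and print the 99.00th
# percentile from the "sync percentiles" block.
# Assumption: fio reports the block in usec, as in the sample above.
fio_sync_p99() {
  awk '
    /sync percentiles/ { in_sync = 1; next }   # start of the fdatasync block
    in_sync && /99\.00th=\[/ {
      sub(/.*99\.00th=\[ */, "")               # drop everything up to the value
      sub(/\].*/, "")                          # drop everything after it
      print
      exit
    }
  '
}

# Hypothetical helper: succeed if a value in usec is below 10 ms.
p99_under_10ms() {
  [ "$1" -lt 10000 ]
}
```

For example, piping the fio command from this post into fio_sync_p99 and passing the result to p99_under_10ms gives a pass/fail answer. For the sample output above, the extracted value would be 2376.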

Notes

  • We tuned the --size and --bs options for our particular scenario. To get a useful result from fio, supply your own values. Where do you get them? See the section on how we learned to configure fio.
  • During the test, all of the I/O load comes from fio. In a real-life scenario, other write requests will probably reach the storage besides those related to wal_fsync_duration_seconds. The extra load will increase the value of wal_fsync_duration_seconds. So if the 99th percentile is close to 10 ms, your storage has very little speed to spare.
  • Use fio version 3.5 or later (earlier versions do not report fdatasync duration percentiles).
  • The output above is only a snippet of the full fio output.

A long story about fio and etcd

What is the WAL in etcd

Databases usually use a write-ahead log, and etcd uses one too. We will not discuss the write-ahead log (WAL) in detail here. It is enough to know that each member of an etcd cluster keeps one in persistent storage. etcd writes every key-value operation (such as an update) to the WAL before applying it to the store. If a member crashes and restarts between snapshots, it can locally recover the transactions made since the last snapshot from the contents of the WAL.

When a client adds a key to the key-value store or updates the value of an existing key, etcd records the operation in the WAL, which is an ordinary file in persistent storage. etcd must be completely sure that the WAL entry has actually been persisted before it continues processing. On Linux, a single write system call is not enough for this, because the actual write to physical storage may be delayed. For example, Linux may keep the WAL entry in a kernel memory cache (such as the page cache) for some time. To make sure the data really is written to persistent storage, the fdatasync system call is needed after the write, and that is exactly what etcd uses (as you can see in the strace output below, where 8 is the file descriptor of the WAL file):

21:23:09.894875 lseek(8, 0, SEEK_CUR)   = 12808 <0.000012>
21:23:09.894911 write(8, ". 20210220361223255266632$10 20103026"34"rn3fo"..., 2296) = 2296 <0.000130>
21:23:09.895041 fdatasync(8)            = 0 <0.008314>
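In the trace above, the value in angle brackets is the call duration in seconds, so this fdatasync took about 8.3 ms. As a rough sketch (not from the original post), durations like this can be pulled out of a trace with a small filter. The function name fdatasync_ms is hypothetical, and the sketch assumes the trace was recorded with strace's -T option so that each line ends in a `<seconds>` field.

```shell
#!/bin/sh
# Hypothetical helper: read `strace -T` output on stdin and print the duration
# of every fdatasync call on the given file descriptor, in milliseconds.
# Usage: fdatasync_ms 8 < trace.txt
fdatasync_ms() {
  fd="$1"
  awk -v fd="$fd" '
    index($0, "fdatasync(" fd ")") {
      n = split($0, parts, "<")          # the duration is the last <...> field
      sub(/>.*/, "", parts[n])           # strip the closing bracket
      printf "%.3f\n", parts[n] * 1000   # seconds to milliseconds
    }
  '
}
```

Sorting the resulting numbers gives a quick view of the slowest calls without waiting for Prometheus to scrape the metric.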

Unfortunately, writing to persistent storage does not happen instantly. If fdatasync calls are slow, the performance of etcd suffers. The etcd documentation says that storage is considered fast enough if, at the 99th percentile, fdatasync calls take less than 10 ms when writing to the WAL file. There are other useful storage metrics, but in this post we discuss only this one.

Evaluating storage with fio

If you need to assess whether your storage is suitable for etcd, use fio, a very popular I/O testing tool. Keep in mind that disk operations vary widely: synchronous and asynchronous, many classes of system calls, and so on. As a result, fio is quite hard to use. It has many parameters, and different combinations of their values produce very different I/O workloads. To get meaningful numbers for etcd, you should make sure that the test write load from fio is as close as possible to the actual load etcd generates when writing WAL files.

So fio should, at a minimum, generate a load of sequential writes to a file, where each write consists of a write system call followed by an fdatasync system call. Sequential writes in fio require the --rw=write option. To make fio use the write system call rather than pwrite, specify the --ioengine=sync parameter. Finally, to call fdatasync after every write, add the --fdatasync=1 parameter. The other two options in this example (--size and --bs) are specific to the scenario. In the next section we show how to set them.

Why fio, and how we learned to configure it

In this post we describe a real case. We had a Kubernetes v1.13 cluster that we monitored with Prometheus. etcd v3.2.24 was hosted on an SSD. The etcd metrics showed fdatasync latencies that were too high, even when the cluster was idle. The metrics were strange, and we did not really know what they meant. The cluster consisted of virtual machines, so we needed to understand where the problem was: in the physical SSDs or in the virtualization layer. In addition, we often changed the hardware and software configuration and needed a way to evaluate the results. We could run etcd in every configuration and look at the Prometheus metrics, but that was too much trouble. We were looking for a reasonably simple way to evaluate a specific configuration. We also wanted to check whether we understood the Prometheus metrics from etcd correctly.

For that, two problems had to be solved. First, what does the I/O load that etcd creates when writing to the WAL look like? Which system calls are used? What is the size of the records? Second, once we answer those questions, how do we reproduce a similar workload with fio? Remember that fio is a very flexible tool with many options. We solved both problems with one approach: the lsof and strace commands. lsof lists all file descriptors used by a process and the files associated with them. With strace you can inspect an already running process, or start a process and inspect it. strace prints all system calls made by the process under inspection (and by its child processes). The latter matters, since etcd takes just such an approach.

We first used strace to examine the etcd server for Kubernetes while there was no load on the cluster. We saw that almost all WAL records were about the same size: 2200-2400 bytes. That is why the command at the start of the post specifies --bs=2300 (bs is the size in bytes of each fio write). Note that the size of an etcd record depends on the etcd version, the distribution, parameter values, and so on, and it affects the duration of fdatasync. If you have a similar scenario, examine your etcd processes with strace to find the exact numbers.
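One way to turn a trace into a candidate --bs value is to average the sizes of the write calls on the WAL file descriptor. This is only an illustrative sketch: the function name mean_write_size is hypothetical, it relies on the return value that strace prints after the last " = " on each line, and it does not reproduce the exact procedure we followed.

```shell
#!/bin/sh
# Hypothetical helper: read strace output on stdin and print the mean size, in
# bytes, of successful write calls on the given file descriptor.
# Usage: mean_write_size 8 < trace.txt
mean_write_size() {
  fd="$1"
  awk -v fd="$fd" '
    # match write(FD, ...) at line start or after whitespace, but not pwrite(
    $0 ~ ("(^|[ \t])write\\(" fd ",") {
      n = split($0, parts, " = ")    # the return value follows the last " = "
      size = parts[n] + 0            # leading number; any trailing <time> is ignored
      if (size > 0) { total += size; count++ }
    }
    END { if (count) printf "%d\n", total / count }
  '
}
```

A mean hides the spread, so it is worth glancing at the individual sizes as well before settling on a value.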

Then, to get a good picture of what etcd does with the file system, we started it under strace with the -ffttT options. This way we traced the child processes and recorded the output of each in a separate file, and we also got detailed reports on the start time and duration of every system call. We used lsof to confirm our reading of the strace output and to see which file descriptor was used for which purpose. That is how we obtained the strace output shown above. The sync-time statistics confirmed that wal_fsync_duration_seconds from etcd corresponds to fdatasync calls on the WAL file descriptors.

We went through the fio documentation and chose options for our scenario so that fio would generate a load similar to etcd's. We also checked the system calls and their durations by running fio under strace, just as we had done with etcd.

We chose the value of the --size parameter carefully, as it represents the total I/O load from fio. In our case, this is the total number of bytes written to storage. It is directly proportional to the number of write (and fdatasync) system calls. For a given value of bs, the number of fdatasync calls = size / bs. Since we were interested in a percentile, we needed enough samples to be confident, and we calculated that 10^4 would be enough (that is 22 mebibytes). If --size is smaller, outliers can distort the result (for example, a few fdatasync calls take longer than usual and skew the 99th percentile).
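The size / bs relationship is easy to check with shell arithmetic. A small sketch, with a hypothetical function name (sync_calls), assuming fio's default interpretation of the m suffix as mebibytes:

```shell
#!/bin/sh
# Hypothetical helper: number of write+fdatasync pairs for a given --size
# (in bytes) and --bs (in bytes), using integer division.
sync_calls() {
  size_bytes="$1"
  bs="$2"
  echo $((size_bytes / bs))
}

# --size=22m is taken here as 22 MiB = 22 * 1024 * 1024 bytes.
sync_calls $((22 * 1024 * 1024)) 2300   # prints 10029
```

To size a test for a different block size, multiply the desired number of samples by bs and round up to a convenient --size.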

Try it yourself

We have shown you how to use fio to see whether your storage is fast enough for etcd to perform well. Now you can try it yourself using, for example, virtual machines with SSD storage in IBM Cloud.

source: www.habr.com
