How to check disks with fio for sufficient etcd performance

Translator's note: This article is the result of a small investigation carried out by IBM Cloud engineers in search of a solution to a real problem with the performance of the etcd database. We faced a similar task ourselves, but the authors' line of reasoning and the steps they took may be of interest in a broader context.


The short version of the whole article: fio and etcd

The performance of an etcd cluster depends heavily on the speed of the storage underneath it. To help monitor that performance, etcd exports various metrics to Prometheus. One of them is wal_fsync_duration_seconds. According to the etcd documentation, storage can be considered fast enough if the 99th percentile of this metric does not exceed 10 ms.

If you are thinking of setting up an etcd cluster on Linux machines and want to check whether your storage (for example, SSDs) is fast enough, we recommend fio, a popular I/O tester. Just run the following command (the test-data directory must be located on a mounted partition of the drive being tested):

fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-data --size=22m --bs=2300 --name=mytest

All that remains is to look at the output and check whether the 99th percentile of fdatasync durations is below 10 ms. If it is, your storage is fast enough. Here is an example output, in which the 99th percentile is 2376 usec, or about 2.4 ms:

fsync/fdatasync/sync_file_range:
  sync (usec): min=534, max=15766, avg=1273.08, stdev=1084.70
  sync percentiles (usec):
   | 1.00th=[ 553], 5.00th=[ 578], 10.00th=[ 594], 20.00th=[ 627],
   | 30.00th=[ 709], 40.00th=[ 750], 50.00th=[ 783], 60.00th=[ 1549],
   | 70.00th=[ 1729], 80.00th=[ 1991], 90.00th=[ 2180], 95.00th=[ 2278],
   | 99.00th=[ 2376], 99.50th=[ 9634], 99.90th=[15795], 99.95th=[15795],
   | 99.99th=[15795]
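
The check can also be scripted. The following is a minimal Python sketch, not part of fio or etcd, that pulls the 99th percentile out of the text excerpt above and compares it with the 10 ms limit. The function name sync_percentile_ms is hypothetical, and the parser assumes the "sync percentiles" header and the `99.00th=[ ... ]` layout shown in this excerpt; other fio versions may format their output differently.

```python
import re

# Excerpt of fio's text output, copied from the example above.
FIO_OUTPUT = """\
fsync/fdatasync/sync_file_range:
  sync (usec): min=534, max=15766, avg=1273.08, stdev=1084.70
  sync percentiles (usec):
   | 1.00th=[ 553], 5.00th=[ 578], 10.00th=[ 594], 20.00th=[ 627],
   | 30.00th=[ 709], 40.00th=[ 750], 50.00th=[ 783], 60.00th=[ 1549],
   | 70.00th=[ 1729], 80.00th=[ 1991], 90.00th=[ 2180], 95.00th=[ 2278],
   | 99.00th=[ 2376], 99.50th=[ 9634], 99.90th=[15795], 99.95th=[15795],
   | 99.99th=[15795]
"""

# How many of each unit make up one millisecond.
UNITS_PER_MS = {"nsec": 1_000_000, "usec": 1_000, "msec": 1}


def sync_percentile_ms(fio_text, percentile="99.00"):
    """Return the requested sync percentile from fio text output, in ms."""
    header = re.search(r"sync percentiles \((nsec|usec|msec)\):", fio_text)
    if header is None:
        raise ValueError("no 'sync percentiles' section found")
    section = fio_text[header.end():]
    # The lookbehind stops "9.00" from matching inside "99.00".
    pattern = r"(?<![\d.])" + re.escape(percentile) + r"th=\[\s*(\d+)\]"
    value = re.search(pattern, section)
    if value is None:
        raise ValueError(f"percentile {percentile} not found")
    return int(value.group(1)) / UNITS_PER_MS[header.group(1)]


if __name__ == "__main__":
    p99 = sync_percentile_ms(FIO_OUTPUT)
    verdict = "fast enough" if p99 < 10 else "too slow"
    print(f"99th percentile of sync: {p99} ms ({verdict})")
```

For real use, the text would come from a saved fio report rather than from an embedded string.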

A few notes:

  1. In the example above we tuned the --size and --bs parameters for one specific case. To get meaningful results from fio, specify values that suit your own use case. How to choose them is discussed below.
  2. During the test, fio is the only source of load on the disk subsystem. In real life there will probably be other writers to the disk (besides those related to wal_fsync_duration_seconds). Such extra load can increase wal_fsync_duration_seconds. In other words, if the 99th percentile obtained with fio is only slightly below 10 ms, there is a good chance that the storage is not fast enough.
  3. The test requires fio version 3.5 or later, because older versions do not aggregate fdatasync results into percentiles.
  4. The output above is only a small excerpt of fio's full output.

The details: fio and etcd

A few words about etcd WALs

Databases usually use a write-ahead log (WAL), and etcd is no exception. A discussion of WALs is beyond the scope of this article, but for our purposes here is what you need to know: each etcd cluster member keeps a WAL on persistent storage. etcd writes certain key-value store operations (such as updates) to the WAL before applying them. If a node crashes and restarts between snapshots, etcd can recover the transactions performed since the previous snapshot from the contents of the WAL.

So every time a client adds a key to the KV store or updates the value of an existing key, etcd appends a description of the operation to the WAL, which is an ordinary file on persistent storage. etcd MUST be 100% sure that the WAL entry has actually been persisted before it proceeds. On Linux, a write system call alone is not enough for this, because the write to the physical medium may be deferred. For example, Linux may keep the WAL entry in an in-memory kernel cache (such as the page cache) for some time. To guarantee that the data has reached the medium, the write must be followed by an fdatasync system call, and that is exactly what etcd does (as can be seen in the strace output below, where 8 is the file descriptor of the WAL file):

21:23:09.894875 lseek(8, 0, SEEK_CUR)   = 12808 <0.000012>
21:23:09.894911 write(8, ".20210220361223255266632$1020103026"34"rn3fo"..., 2296) = 2296 <0.000130>
21:23:09.895041 fdatasync(8)            = 0 <0.008314>

Unfortunately, writing to persistent storage takes time. Slow fdatasync calls can hurt etcd performance. The etcd documentation states that, for storage to be fast enough, the 99th percentile of the duration of all fdatasync calls made while writing to the WAL file must be below 10 ms. There are other storage-related metrics, but this article focuses on this one.
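
To make the write-then-fdatasync pattern concrete, here is a minimal Python sketch that appends fixed-size blocks to a scratch file and times each sync. It is an illustration only and not how etcd is implemented: the function name time_wal_style_writes, the block size default and the scratch file are all assumptions made for this example.

```python
import os
import tempfile
import time


def time_wal_style_writes(directory, count=100, block_size=2300):
    """Append `count` blocks to a scratch file, syncing after each write.

    Returns the duration of every sync call, in seconds.
    """
    # os.fdatasync is missing on some platforms; fall back to os.fsync there.
    sync = getattr(os, "fdatasync", os.fsync)
    payload = b"\0" * block_size
    durations = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            os.write(fd, payload)  # may only reach the page cache
            start = time.perf_counter()
            sync(fd)  # returns once the kernel reports the data as flushed
            durations.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return durations


if __name__ == "__main__":
    samples = sorted(time_wal_style_writes(tempfile.gettempdir()))
    print(f"median sync: {samples[len(samples) // 2] * 1000:.3f} ms")
```

The directory passed in should live on the storage under test; a temporary directory backed by memory would report unrealistically short syncs.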

Assessing storage with fio

You can assess whether storage is suitable for etcd with fio, a popular I/O tester. Keep in mind that disk I/O can happen in many different ways: sync or async, many different classes of system calls, and so on. The flip side of the coin is that fio is extremely hard to use. The utility has many parameters, and different combinations of their values produce completely different results. To get a meaningful figure for etcd, you must make sure that the write load generated by fio is as close as possible to the load etcd generates when writing to WAL files:

  • This means that, at a minimum, the load generated by fio must be a series of sequential writes to a file, where each write consists of a write system call followed by fdatasync.
  • To get sequential writes, specify the flag --rw=write.
  • To make fio write with write calls (and not with other system calls such as pwrite), use the flag --ioengine=sync.
  • Finally, the flag --fdatasync=1 guarantees that every write is followed by fdatasync.
  • The two other parameters in our example, --size and --bs, can vary with the specific use case. The next section explains how to set them.
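
As a small illustration of how the flags in the list fit together, the Python sketch below assembles the command from the start of the article and annotates each flag with its purpose. The helper build_fio_command is hypothetical; only the fio flags themselves come from the article.

```python
import shlex


def build_fio_command(directory, size="22m", block_size=2300, name="mytest"):
    """Return the fio argument list for an etcd-like WAL write test."""
    return [
        "fio",
        "--rw=write",        # sequential writes
        "--ioengine=sync",   # plain write calls rather than, e.g., pwrite
        "--fdatasync=1",     # fdatasync after every single write
        f"--directory={directory}",  # must sit on the storage under test
        f"--size={size}",    # total bytes written, tuned per use case
        f"--bs={block_size}",  # bytes per write, tuned per use case
        f"--name={name}",
    ]


if __name__ == "__main__":
    print(shlex.join(build_fio_command("test-data")))
```

The returned list can be handed to subprocess.run on a machine where fio is installed.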

Why we chose fio and how we learned to configure it

This article comes from a real case we ran into. We had a cluster on Kubernetes v1.13 monitored with Prometheus. SSDs were used as storage for etcd v3.2.24. The etcd metrics showed fdatasync latencies that were too high, even when the cluster was idle. These measurements looked very suspicious to us, and we were not sure what exactly they represented. On top of that, the cluster consisted of virtual machines, so it was impossible to tell whether the delay was caused by virtualization or by the SSDs.

In addition, we were considering various changes to the hardware and software configuration, so we needed a way to evaluate them. Of course, we could have run etcd in each configuration and looked at the corresponding Prometheus metrics, but that would have taken considerable effort. We needed a simple way to evaluate a given configuration. We also wanted to test our understanding of the Prometheus metrics coming from etcd.

To do this, two problems had to be solved:

  • First, what does the I/O load generated by etcd look like when it writes to WAL files? Which system calls are used? What is the size of the write blocks?
  • Second, suppose we have answers to the questions above. How do we reproduce the corresponding load with fio? After all, fio is an extremely flexible utility with a great many parameters (this is easy to verify; translator's note).

We solved both problems with the same approach, based on the command-line tools lsof and strace:

  • With lsof you can view all the file descriptors used by a process, along with the files they refer to.
  • With strace you can analyze an already running process, or start a process and watch it. The command displays all the system calls made by that process and, optionally, by its descendants. The latter matters for processes that fork, and etcd is one of them.

The first thing we did was use strace to study the etcd server in the Kubernetes cluster while it was idle.

It turned out that the WAL write blocks are very tightly grouped in size, with most of them in the range of 2200-2400 bytes. That is why the command at the beginning of this article uses the flag --bs=2300 (bs is the size in bytes of each write block in fio).

Note that the size of etcd write blocks may vary with the version, the deployment, parameter values, and so on, and that it affects the duration of fdatasync. If you have a similar use case, analyze your own etcd processes with strace to obtain up-to-date values.
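
One possible way to extract block sizes from a saved strace log is sketched below in Python. It assumes lines in the same layout as the strace excerpt earlier in the article; the function name write_sizes and the sample lines (other than the descriptor number 8 and the 2296-byte write) are hypothetical.

```python
import re
import statistics

# Matches lines such as:
# 21:23:09.894911 write(8, "..."..., 2296) = 2296 <0.000130>
# The \b keeps pwrite() and similar calls from matching.
WRITE_RE = re.compile(r"\bwrite\((\d+), .*, (\d+)\)\s*= (\d+)")


def write_sizes(strace_lines, fd):
    """Return the byte counts of successful write() calls on descriptor `fd`."""
    sizes = []
    for line in strace_lines:
        match = WRITE_RE.search(line)
        if match and int(match.group(1)) == fd:
            sizes.append(int(match.group(3)))  # bytes actually written
    return sizes


if __name__ == "__main__":
    sample = [
        '21:23:09.894875 lseek(8, 0, SEEK_CUR)   = 12808 <0.000012>',
        '21:23:09.894911 write(8, "..."..., 2296) = 2296 <0.000130>',
        '21:23:09.895041 fdatasync(8)            = 0 <0.008314>',
        '21:23:10.101000 write(8, "..."..., 2310) = 2310 <0.000118>',
    ]
    sizes = write_sizes(sample, fd=8)
    print(f"median write size: {statistics.median(sizes)} bytes")
```

The median (or a histogram) of these sizes is a reasonable candidate for --bs, under the assumption that descriptor `fd` really is the WAL file.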

Then, to get a clear and complete picture of how etcd works with the file system, we ran it under strace with the flags -ffttT. The -ff part follows child processes and, when an output file is given with -o, writes the output of each one to a separate file. The -tt and -T parts add detailed information about the start time and the duration of every system call.

We also used lsof to confirm our reading of the strace output in terms of which file descriptor was used for which purpose. The result was strace output like the excerpt shown above. Statistical analysis of the sync times confirmed that the wal_fsync_duration_seconds metric from etcd corresponds to fdatasync calls on WAL file descriptors.
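
The article does not say which statistics were computed. As one plausible sketch, the Python code below collects the durations that strace -T prints for fdatasync calls on a given descriptor and takes a nearest-rank percentile, which can then be compared with what Prometheus reports. The names fdatasync_durations and percentile are hypothetical.

```python
import math
import re

# Matches lines such as:
# 21:23:09.895041 fdatasync(8)            = 0 <0.008314>
FDATASYNC_RE = re.compile(r"\bfdatasync\((\d+)\)\s*= 0 <(\d+\.\d+)>")


def fdatasync_durations(strace_lines, fd):
    """Durations in seconds of successful fdatasync() calls on `fd`."""
    durations = []
    for line in strace_lines:
        match = FDATASYNC_RE.search(line)
        if match and int(match.group(1)) == fd:
            durations.append(float(match.group(2)))
    return durations


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    sample = ['21:23:09.895041 fdatasync(8)            = 0 <0.008314>']
    p99 = percentile(fdatasync_durations(sample, fd=8), 99)
    print(f"99th percentile: {p99 * 1000:.3f} ms")
```

With a single sample, as in this tiny demo, the percentile is simply that sample; a real comparison needs a log covering many calls.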

To make fio generate a load similar to that of etcd, we studied the utility's documentation and chose the parameters that suited our task. We verified that the right system calls were being used and confirmed their durations by running fio under strace (just as we had done with etcd).

Particular attention went to choosing the value of the --size parameter. It represents the total I/O load generated by fio; in our case, this is the total number of bytes written to the medium. It is directly proportional to the number of write (and fdatasync) system calls. For a given bs, the number of fdatasync calls equals size / bs.

Since we were interested in a percentile, we wanted the number of samples to be large enough to be statistically significant. We decided that 10^4 samples (which corresponds to a size of 22 MB) would be enough. Smaller values of --size produced more pronounced noise (for example, fdatasync calls that take much longer than usual and distort the 99th percentile).
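
The arithmetic behind these numbers can be checked with a few lines of Python. This sketch assumes that fio interprets the "m" suffix as 2^20 bytes (its default behavior as far as we know) and counts only full blocks; the helper names are hypothetical.

```python
MIB = 1024 * 1024  # assumption: fio reads "22m" as 22 * 2**20 bytes


def fdatasync_calls(size_bytes, block_size):
    """Number of full blocks, and so of write+fdatasync pairs, in one run."""
    return size_bytes // block_size


def size_for_samples(samples, block_size):
    """Smallest size in bytes that yields at least `samples` full blocks."""
    return samples * block_size


if __name__ == "__main__":
    calls = fdatasync_calls(22 * MIB, 2300)
    print(calls)        # 10029 samples, i.e. roughly 10^4
    print(calls // 100)  # about 100 samples lie above the 99th percentile
    print(size_for_samples(10_000, 2300) / MIB)  # just under 22 MiB
```

With a different block size, the same calculation gives the --size needed to keep roughly 10^4 samples.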

It's up to you

This article shows how fio can be used to assess whether the storage intended for etcd is fast enough. Now it's your turn! You can try out virtual machines with SSD-based storage in the IBM Cloud service.

P.S. from the translator

Ready-made examples of using fio for other tasks can be found in the documentation or directly in the project repository (which contains many more of them than the documentation mentions).

Source: www.habr.com
