Durable Data Storage and Linux File APIs

While investigating the durability of data storage in cloud systems, I decided to test myself and make sure I understood the basics. I started by reading the NVMe specification to understand what durability guarantees (that is, guarantees that data will still be there after a system failure) NVMe disks give us. My main conclusion was this: data should be considered corrupt from the moment a write command is issued until the moment it has actually been written to the storage medium. Most programs, however, happily use system calls to write data.

In this post I look at the durability mechanisms provided by the Linux file APIs. It seems as if everything should be simple here: the program calls write(), and once the call completes, the data is safely stored on disk. But write() only copies the application's data into the kernel cache, which lives in RAM. To force the system to write the data to disk, you need some additional mechanism.

Overall, this post is a collection of notes on what I have learned about a topic that interests me. The short version of the most important point: to get durable storage, call fdatasync() or open files with the O_DSYNC flag. If you want to know more about what happens to data on its way from code to disk, see the overview article on ensuring data reaches disk, listed among the resources at the end of this post.

Using the write() function

The write() system call is defined in the IEEE POSIX standard as an attempt to write data to a file descriptor. After write() completes successfully, reads must return exactly the bytes that were written, even when the data is accessed from other processes or threads (see the relevant section of the POSIX standard). In addition, the section on how threads interact with regular file operations contains a note saying that if two threads each call one of these functions, each call must see either all of the specified effects of the other call or none of them. This suggests that every file I/O operation must effectively hold a lock on the resource it works with.

Does this mean that write() is atomic? Technically, yes. Reads must return either all or none of what write() wrote. However, according to the standard, a write() call is not required to write everything it was asked to write. It is allowed to write only part of the data. For example, we might have two threads, each appending 1024 bytes to the file described by the same file descriptor. From the standard's point of view, it is acceptable for each of those write operations to append just a single byte to the file. The operations remain atomic, but once they complete, the data they wrote to the file is interleaved. There is a very interesting discussion of this topic on Stack Overflow.
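Because write() may accept only part of the buffer, callers usually loop until everything has been handed to the kernel. The following is a minimal Python sketch of that loop; the helper name write_all is hypothetical, and the loop does nothing to prevent interleaving when several threads append to the same descriptor.

```python
import os


def write_all(fd: int, data: bytes) -> int:
    """Call write() repeatedly until the kernel has accepted every byte.

    Hypothetical helper: it handles short writes only. It does not make the
    data durable and does not serialize concurrent writers.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        # os.write() returns how many bytes were accepted, which may be
        # fewer than requested (a short write).
        total += os.write(fd, view[total:])
    return total
```

At this point the data is only in the kernel cache; making it durable still requires one of the mechanisms discussed in the rest of the post.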

The fsync() and fdatasync() functions

The simplest way to flush data to disk is to call fsync(). This asks the operating system to transfer all modified blocks from the cache to disk, including all of the file's metadata (access time, modification time, and so on). I believe this metadata is rarely needed, so if you know it is not important to you, you can use fdatasync() instead. The man page for fdatasync() says that the call flushes only the metadata that is "needed in order to allow a subsequent data retrieval to be correctly handled". And that is exactly what most applications care about.
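As an illustration of the write-then-flush pattern, here is a small Python sketch that appends a record and calls fdatasync() before returning. The function name append_durably is hypothetical, and the sketch assumes the file already exists durably (a newly created file needs extra care, as the next paragraph explains).

```python
import os


def append_durably(path: str, record: bytes) -> None:
    """Append a record and flush it with fdatasync() before returning.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(record)
        while len(view) > 0:
            # Handle short writes: drop the bytes the kernel accepted.
            view = view[os.write(fd, view):]
        # Flush the data plus only the metadata needed to read it back.
        os.fdatasync(fd)
    finally:
        os.close(fd)
```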

One problem that can arise here is that these mechanisms do not guarantee that the file can be found again after a failure. In particular, when you create a new file, you need to call fsync() on the directory that contains it. Otherwise, after a failure, the file may turn out not to exist. The reason is that in UNIX, because of hard links, a file can exist in several directories. So when fsync() is called on a file, there is no way for it to know which directories' data should also be flushed to disk (you can read more about this here). It appears that ext4 is able to apply fsync() to the containing directory automatically, but that may not be true of other file systems.
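The directory step can be sketched in Python as follows. The helper name create_file_durably is hypothetical; the sketch assumes a Linux system, where a directory can be opened read-only and passed to fsync().

```python
import os


def create_file_durably(path: str, data: bytes) -> None:
    """Create a new file, flush its contents, then flush its directory entry.

    Hypothetical helper; O_EXCL makes it fail if the file already exists.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            view = view[os.write(fd, view):]
        os.fsync(fd)  # flush file data and metadata
    finally:
        os.close(fd)

    # Flush the containing directory so the new name survives a crash.
    parent = os.path.dirname(os.path.abspath(path))
    dir_fd = os.open(parent, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```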

This mechanism can be implemented differently in different file systems. I used blktrace to study which disk operations the ext4 and XFS file systems use. Both issue regular write commands to disk for the file contents and for the file system journal, flush the cache, and finish with a FUA (Force Unit Access: writing data directly to disk, bypassing the cache) write to the journal. They probably do this to confirm that the operation has been committed. On drives that do not support FUA, this causes two cache flushes. My experiments showed that fdatasync() is slightly faster than fsync(). blktrace indicates that fdatasync() usually writes less data to disk (on ext4, fsync() writes 20 KiB and fdatasync() writes 16 KiB). I also found that XFS is slightly faster than ext4. Here too, blktrace showed that less data is flushed to disk (4 KiB for fdatasync() on XFS).

Controversial situations that arise when using fsync()

I can think of three controversial situations involving fsync() that I have run into in practice.

The first happened in 2008. Back then, the Firefox 3 interface would freeze when a lot of files were being written to disk. The problem was that the interface used an SQLite database to store information about its state. After every change made in the interface, fsync() was called, which gave strong durability guarantees. On the ext3 file system in use at the time, fsync() flushed all "dirty" pages in the system to disk, not just those belonging to the file in question. This meant that clicking a button in Firefox could cause megabytes of data to be written to a magnetic disk, which could take many seconds. The solution, as far as I understand it, was to move database work into asynchronous background tasks. This means that Firefox had been using stronger durability guarantees than it actually needed, and the behavior of ext3 only made the problem worse.

The second problem happened in 2009. After a system crash, users of the new ext4 file system found that many recently created files had zero length, something that did not happen with the older ext3 file system. In the previous paragraph I described how ext3 flushed too much data to disk, which made fsync() very slow. To improve matters, ext4 flushes only the dirty pages that belong to the file in question. Data from other files stays in memory much longer than it did with ext3. This was done to improve performance (by default, data stays in this state for 30 seconds, which you can configure with dirty_expire_centisecs; more information is available here). It means that a large amount of data can be irrecoverably lost after a crash. The solution is to use fsync() in applications that need durable storage and want to protect themselves as much as possible from the consequences of a crash. fsync() works much more efficiently on ext4 than on ext3. The downside of this approach is that, as before, it slows down some operations, such as installing software. For more details, see here and here.
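To check how long dirty data may linger on a given machine, the setting can be read from procfs. This is a small Python sketch; the function name is hypothetical, and it assumes the standard Linux location /proc/sys/vm/dirty_expire_centisecs, where the value is expressed in hundredths of a second.

```python
from pathlib import Path


def dirty_expire_seconds(
    path: str = "/proc/sys/vm/dirty_expire_centisecs",
) -> float:
    """Return the dirty-page expiry interval in seconds.

    Hypothetical helper; the file holds a single integer in centiseconds.
    """
    centisecs = int(Path(path).read_text().strip())
    return centisecs / 100.0
```

On a system with the default setting described above, the file contains 3000, which corresponds to 30 seconds.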

The third problem involving fsync() surfaced in 2018. Within the PostgreSQL project it was discovered that when fsync() encounters an error, it marks "dirty" pages as "clean". As a result, subsequent fsync() calls do nothing with those pages. The modified pages therefore stay in memory and are never written to disk. This is a real disaster, because the application believes that some data has been written to disk when in fact it has not. Such fsync() failures are rare, and when they do occur the application can do almost nothing to deal with the problem. Today, when this happens, PostgreSQL and other applications crash. The paper "Can Applications Recover from fsync Failures?" examines the problem in detail. At the moment, the best solution is to use Direct I/O with the O_SYNC or O_DSYNC flag. With this approach the system reports errors that occur during specific write operations, but the application has to manage its buffers itself. For more details, see here and here.
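One way to reflect this in application code is to treat any fsync() error as fatal instead of retrying. The Python sketch below illustrates the idea; the DurabilityError type and the fsync_or_fail helper are hypothetical names, and a real application might abort the process outright rather than raise.

```python
import os


class DurabilityError(RuntimeError):
    """Raised when fsync() fails; callers should stop, not retry."""


def fsync_or_fail(fd: int) -> None:
    """Call fsync() once and escalate any failure.

    Hypothetical helper. After a failed fsync() the kernel may already have
    marked the affected pages clean, so a retry could succeed without the
    data ever reaching disk.
    """
    try:
        os.fsync(fd)
    except OSError as exc:
        raise DurabilityError(f"fsync failed on fd {fd}: {exc}") from exc
```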

Opening files with the O_SYNC and O_DSYNC flags

Let's return to the Linux mechanisms that provide durable storage. Namely, we are talking about using the O_SYNC or O_DSYNC flag when opening files with the open() system call. With this approach, every write operation behaves as if each write() were followed by fsync() or fdatasync(), respectively. The POSIX specification calls these "Synchronized I/O File Integrity Completion" and "Data Integrity Completion". The main advantage of this approach is that you only need one system call instead of two (for example, write() followed by fdatasync()) to ensure data integrity. The main disadvantage is that all writes through that file descriptor will be synchronized, which can limit how the application code can be structured.
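A minimal Python sketch of this approach follows; the helper name open_dsync is hypothetical. Every successful write on the returned descriptor has data-integrity semantics, so no separate fdatasync() call is made.

```python
import os


def open_dsync(path: str) -> int:
    """Open a file so that each write() behaves like write() + fdatasync().

    Hypothetical helper; returns a raw file descriptor the caller must close.
    """
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o644)
```

A caller would then use it as `fd = open_dsync("journal.bin")`, followed by `os.write(fd, record)` and eventually `os.close(fd)`; the file name here is only an example.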

Using Direct I/O with the O_DIRECT flag

The open() system call supports the O_DIRECT flag, which is designed to bypass the operating system cache and perform I/O by talking to the disk directly. In many cases this means that the write commands issued by the program are translated directly into commands for the disk. In general, however, this mechanism is not a replacement for fsync() or fdatasync(). The disk itself is free to delay or cache those writes. To make matters worse, in some special cases I/O performed with the O_DIRECT flag falls back to traditional buffered I/O. The simplest way around this is to also open the file with the O_DSYNC flag, which means that every write is effectively followed by a call to fdatasync().

It turns out that the XFS file system fairly recently added a "fast path" for O_DIRECT|O_DSYNC writes. If a block is overwritten using O_DIRECT|O_DSYNC, XFS will issue a FUA write instead of flushing the cache, provided the device supports it. I verified this with blktrace on a Linux 5.4/Ubuntu 20.04 system. This approach should be more efficient, because it writes the minimum amount of data to disk and uses a single operation instead of two (a write followed by a cache flush). I found a link to the 2018 kernel patch that implements this. There is some discussion there about applying the optimization to other file systems, but as far as I know, XFS is the only file system that supports it so far.
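Direct I/O generally requires the buffer, offset and length to be aligned to the device's block size, which is easy to overlook. The Python sketch below writes one block with O_DIRECT|O_DSYNC using an anonymous mmap as a page-aligned buffer. The function name, the 4096-byte block size, and the fallback to plain O_DSYNC when the file system rejects O_DIRECT are all assumptions made for illustration.

```python
import errno
import mmap
import os


def direct_dsync_write(path: str, payload: bytes, block: int = 4096) -> int:
    """Write one zero-padded block with O_DIRECT|O_DSYNC.

    Hypothetical helper. Assumes `block` is a multiple of the device's
    logical block size and that the payload fits in one block.
    """
    if len(payload) > block:
        raise ValueError("payload does not fit in one block")
    buf = mmap.mmap(-1, block)  # anonymous mapping: page-aligned, zero-filled
    buf[:len(payload)] = payload
    flags = os.O_WRONLY | os.O_CREAT | os.O_DSYNC
    try:
        fd = os.open(path, flags | os.O_DIRECT, 0o644)
    except OSError as exc:
        if exc.errno != errno.EINVAL:
            raise
        # Assumption: this file system does not support O_DIRECT, so fall
        # back to buffered I/O that is still synchronized by O_DSYNC.
        fd = os.open(path, flags, 0o644)
    try:
        return os.write(fd, buf)
    finally:
        os.close(fd)
        buf.close()
```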

The sync_file_range() function

Linux has the sync_file_range() system call, which lets you flush only part of a file to disk rather than the whole file. The call starts an asynchronous flush and does not wait for it to complete. But the man page for sync_file_range() says that the call is "extremely dangerous", and using it is discouraged. The quirks and dangers of sync_file_range() are described very well in this article. Notably, RocksDB appears to use this call to control when the kernel flushes dirty data to disk, while still using fdatasync() to ensure durability. The RocksDB source code has some interesting comments on the subject. For example, it appears that on ZFS the sync_file_range() call does not actually flush data to disk. Experience tells me that rarely used code probably contains bugs. I would therefore advise against using this system call unless absolutely necessary.

System calls that help ensure durable data storage

I have concluded that there are three approaches to performing I/O that ensures durability. All of them require calling fsync() on the containing directory when a file is first created. The approaches are:

  1. Calling fdatasync() or fsync() after write() (fdatasync() is preferable).
  2. Working with a file descriptor opened with the O_DSYNC or O_SYNC flag (preferably O_DSYNC).
  3. Using pwritev2() with the RWF_DSYNC or RWF_SYNC flag (preferably RWF_DSYNC).

Performance notes

I have not carefully measured the performance of the various mechanisms I looked at. The differences I saw in their speed are very small. This means that I could be wrong, and that under different conditions the same thing could give different results. I will first cover what affects performance the most, and then what affects it less.

  1. Overwriting file data is faster than appending data to a file (the performance gain can be 2-100%). Appending to a file requires additional changes to the file's metadata, even after a fallocate() call, but the size of the effect can vary. For best performance, I recommend calling fallocate() to preallocate the space you need. That space should then be explicitly filled with zeros and flushed with fsync(). This ensures that the corresponding blocks in the file system are marked as "allocated" rather than "unallocated". This gives a small improvement (about 2%). In addition, some disks may be slower on the first access to a block than on later ones. This means that zero-filling the space can lead to a large improvement (about 100%) in performance. In particular, this may happen with AWS EBS disks (this is unofficial information, and I was not able to confirm it). The same goes for GCP Persistent Disk storage (this is official information, confirmed by tests). Other people have made the same observation with different disks.
  2. The fewer system calls, the better the performance (the gain can be about 5%). It looks as though open() with the O_DSYNC flag, or pwritev2() with the RWF_SYNC flag, is faster than calling fdatasync(). I suspect the reason is that fewer system calls are needed to do the same job (one call instead of two). But the performance difference is very small, so you can ignore it entirely and use whatever keeps your application's logic simplest.
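The preallocation recipe from the first item can be sketched in Python as follows. The function name and chunk size are hypothetical, and os.posix_fallocate() is used here as a stand-in for the fallocate() call mentioned above; the two are related but not identical.

```python
import os


def preallocate_zeroed(path: str, size: int, chunk: int = 1 << 20) -> None:
    """Reserve `size` bytes, fill them with zeros, and fsync the file.

    Hypothetical helper; `size` must be greater than zero. Later writes can
    then overwrite existing blocks instead of appending.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, size)  # reserve the space up front
        zeros = bytes(min(chunk, size))
        offset = 0
        while offset < size:
            n = min(len(zeros), size - offset)
            # pwrite() may write fewer than n bytes; advance by what it did.
            offset += os.pwrite(fd, zeros[:n], offset)
        os.fsync(fd)  # persist both the zeros and the allocation metadata
    finally:
        os.close(fd)
```

If the file is newly created, the containing directory still needs its own fsync() call, as noted earlier.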

If you are interested in the topic of durable data storage, here are some useful resources:

  • I/O access methods - an overview of the basics of input/output mechanisms.
  • Ensuring data reaches disk - an article about what happens to data on its way from an application to disk.
  • When should you fsync the containing directory - an answer to the question of when to call fsync() on directories. In short, you need to do it when creating a new file, and the reason for this recommendation is that in Linux there can be many references to the same file.
  • SQL Server on Linux: FUA Internals - a description of how durable storage is implemented in SQL Server on the Linux platform. It contains some interesting comparisons between Windows and Linux system calls. I am fairly sure it was thanks to this material that I learned about the FUA optimization in XFS.

Have you ever lost data that you thought was safely stored on disk?

Source: www.habr.com