Durable Data Storage and Linux File APIs

While researching the durability of data storage in cloud systems, I decided to test myself to make sure I understood the fundamentals. I started by reading the NVMe specification to understand what guarantees NVMe disks give about data durability (that is, the guarantee that data will still be available after a system failure). My main conclusion was this: data must be considered corrupted from the moment the write command is issued until the moment it has been written to the storage medium. However, most programs simply use system calls to write data.

In this post I explore the durable storage mechanisms provided by the Linux file APIs. It seems as if everything should be simple here: a program calls write(), and once the call completes, the data is safely stored on disk. But write() only copies the application's data into the kernel cache in RAM. To force the system to write the data to disk, you need additional mechanisms.

Overall, this post is a collection of notes on what I have learned about a topic that interests me. To put the most important point briefly: for durable storage you need to call fdatasync() or open files with the O_DSYNC flag. If you want to learn more about what happens to data on its way from code to disk, see this article.

Semantics of the write() function

The write() system call is defined in the IEEE POSIX standard as an attempt to write data to a file descriptor. After write() completes successfully, reads of that data must return exactly the bytes that were written, even if the data is accessed from other processes or threads (see the relevant section of the POSIX standard). In the section on how threads interact with regular file operations, there is a note saying that if two threads each call these functions, each call must see either all of the specified effects of the other call or none of them. This leads to the conclusion that all file I/O operations must hold a lock on the resource they operate on.

Does this mean that write() is atomic? Technically, yes. Reads must return either everything or nothing of what write() wrote. But according to the standard, write() is not required to write everything it was asked to write. It is allowed to write only part of the data. For example, we could have two threads each appending 1024 bytes to a file through the same file descriptor. From the standard's point of view, it is an acceptable outcome for each write operation to append only a single byte to the file. The operations remain atomic, but after they complete, the data they wrote to the file will be interleaved. There is an interesting discussion of this topic on Stack Overflow.
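Because write() may accept only part of the buffer, callers usually loop until everything has been written. Below is a minimal Python sketch of that loop; the names write_all and demo_write_all are hypothetical helpers introduced here for illustration, and os.write() is Python's thin wrapper over the write() system call.

```python
import os
import tempfile


def write_all(fd: int, data: bytes) -> int:
    """Call write() repeatedly until every byte of data has been accepted."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        # os.write() returns how many bytes were accepted, which may be
        # fewer than requested (a short write).
        total += os.write(fd, view[total:])
    return total


def demo_write_all() -> bytes:
    # Hypothetical demo: write 1024 bytes to a temporary file, read them back.
    fd, path = tempfile.mkstemp()
    try:
        write_all(fd, b"x" * 1024)
    finally:
        os.close(fd)
    try:
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)
```

Note that the loop as a whole is not a single atomic write: if another thread appends to the same file between iterations, the two writers' data can still end up interleaved.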

The fsync() and fdatasync() functions

The simplest way to flush data to disk is to call fsync(). This function asks the operating system to move all modified blocks from the cache to disk. That includes all of the file's metadata (access time, modification time, and so on). I believe this metadata is rarely needed, so if you know it does not matter to you, you can use fdatasync() instead. The man page for fdatasync() says that the call saves to disk only as much metadata as is needed "to allow a subsequent data retrieval to be correctly handled." And that is exactly what most applications care about.
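As a rough illustration of the write-then-flush pattern, here is a minimal Python sketch. The helper names write_durably and demo_write_durably are hypothetical; os.fdatasync() wraps the fdatasync() system call and is available on Linux.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data to path, then ask the kernel to flush it to the device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        remaining = memoryview(data)
        while remaining:
            # Handle short writes by slicing off what was accepted.
            remaining = remaining[os.write(fd, remaining):]
        # Flush the data and only the metadata needed to read it back.
        os.fdatasync(fd)
    finally:
        os.close(fd)


def demo_write_durably() -> bytes:
    # Hypothetical demo using a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "record.bin")
        write_durably(path, b"payload")
        with open(path, "rb") as f:
            return f.read()
```

Swapping os.fdatasync(fd) for os.fsync(fd) would also flush metadata such as timestamps. This sketch does not yet make a newly created file's directory entry durable.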

One problem that can arise here is that these mechanisms do not guarantee that the file can be found after a failure. In particular, when you create a new file, you need to call fsync() on the directory that contains it. Otherwise, after a crash, the file may turn out not to exist. The reason is that in UNIX, because of hard links, a file can exist in several directories. So when fsync() is called on a file, there is no way to know which directory's data should also be flushed to disk (you can read more about this here). It appears that the ext4 file system can automatically apply fsync() to the directories containing the affected files, but this may not hold for other file systems.
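The sequence described above (flush the file, then flush its parent directory) can be sketched in Python as follows. The function names are hypothetical, and opening a directory with O_RDONLY | O_DIRECTORY in order to fsync() it is the usual Linux idiom rather than something taken from the original post.

```python
import os
import tempfile


def create_file_durably(path: str, data: bytes) -> None:
    """Create a new file and make both its contents and its name durable."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        remaining = memoryview(data)
        while remaining:
            remaining = remaining[os.write(fd, remaining):]
        os.fsync(fd)  # flush file contents and metadata
    finally:
        os.close(fd)

    # Flush the directory so the new directory entry survives a crash.
    parent = os.path.dirname(os.path.abspath(path))
    dir_fd = os.open(parent, os.O_RDONLY | os.O_DIRECTORY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def demo_create_file_durably() -> bytes:
    # Hypothetical demo using a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "new-file.bin")
        create_file_durably(path, b"hello")
        with open(path, "rb") as f:
            return f.read()
```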

This mechanism may be implemented differently on different file systems. I used blktrace to find out which disk operations ext4 and XFS use. Both issue ordinary write commands to disk for the file contents and for the file system journal, flush the cache, and finish with a FUA (Force Unit Access, writing data straight to disk, bypassing the cache) write to the journal. They probably do this to confirm that the transaction has been committed. On drives that do not support FUA, this causes two cache flushes. My experiments showed that fdatasync() is slightly faster than fsync(). blktrace shows that fdatasync() usually writes less data to disk (on ext4, fsync() writes 20 KiB and fdatasync() writes 16 KiB). I also found that XFS is slightly faster than ext4. Here, too, blktrace showed that fdatasync() flushes less data to disk (4 KiB on XFS).

Ambiguous situations that arise when using fsync()

I can think of three ambiguous situations involving fsync() that I have run into in practice.

The first such story happened in 2008. Back then, the Firefox 3 interface froze whenever a large number of files was being written to disk. The problem was that the interface implementation used a SQLite database to store information about its state. After each change in the interface, fsync() was called, which gave good guarantees of durable storage. On the ext3 file system in use at the time, fsync() flushed all "dirty" pages in the system to disk, not just those belonging to the file in question. This meant that clicking a button in Firefox could cause megabytes of data to be written to a magnetic disk, which could take many seconds. The solution, as far as I understand from this material, was to move the database work to asynchronous background tasks. This means Firefox had been using stricter durability requirements than it really needed, and the behavior of ext3 only made the problem worse.

The second problem occurred in 2009. This time, after a system crash, users of the new ext4 file system found that many recently created files had zero length, which did not happen with the older ext3. In the previous paragraph I described how ext3 flushed too much data to disk, which slowed fsync() down considerably. To improve the situation, ext4 flushes only the dirty pages that belong to the particular file. Data from other files stays in memory much longer than with ext3. This was done to improve performance (by default, data stays in this state for 30 seconds; you can tune this with dirty_expire_centisecs; more material on this can be found here). It means that a large amount of data can be irretrievably lost after a crash. The solution is to use fsync() in applications that need durable storage and want to protect their data as far as possible from the consequences of failures. fsync() performs much better on ext4 than on ext3. The drawback of this approach is that, as before, it slows down some operations, such as installing programs. See details about this here and here.

The third problem with fsync() surfaced in 2018. At that time, within the PostgreSQL project, it was discovered that when fsync() encounters an error, it marks "dirty" pages as "clean". As a result, subsequent fsync() calls do nothing with those pages. The modified pages stay in memory and are never written to disk. This is a real disaster, because the application will believe that some data has been written to disk when in fact it has not. Such fsync() failures are rare, and an application in that situation can do almost nothing to fight the problem. Nowadays, when this happens, PostgreSQL and other applications crash. The paper "Can Applications Recover from fsync Failures?" examines this problem in detail. At present the best solution is to use Direct I/O with the O_SYNC or O_DSYNC flag. With this approach the system reports errors that occur on specific write operations, but it requires the application to manage its buffers itself. Read more about this here and here.
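One way to express the "do not retry, stop instead" policy in application code is sketched below. This is only an illustration under stated assumptions: DurabilityError, fsync_or_fail, and the demo function are hypothetical names, and this is not PostgreSQL's actual code.

```python
import os
import tempfile


class DurabilityError(RuntimeError):
    """Raised when the kernel reports that a flush to disk failed."""


def fsync_or_fail(fd: int) -> None:
    """Call fsync() once and treat any failure as fatal.

    After a failed fsync() the kernel may already have marked the dirty
    pages clean, so calling fsync() again could falsely report success.
    """
    try:
        os.fsync(fd)
    except OSError as exc:
        raise DurabilityError(f"fsync failed: {exc}") from exc


def demo_fsync_failure() -> bool:
    # Hypothetical demo: fsync() on an already closed descriptor fails
    # with EBADF, which stands in here for a real I/O error.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    os.unlink(path)
    try:
        fsync_or_fail(fd)
    except DurabilityError:
        return True
    return False
```

A caller would typically let DurabilityError propagate and terminate the process, then recover from its own log on restart rather than trusting the page cache.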

Opening files with the O_SYNC and O_DSYNC flags

Let us return to the Linux mechanisms that provide durable storage. Specifically, we are talking about using the O_SYNC or O_DSYNC flag when opening files with the open() system call. With this approach, every write behaves as if each write() were followed by fsync() or fdatasync(), respectively. In the POSIX specification these are called "Synchronized I/O File Integrity Completion" and "Data Integrity Completion". The main advantage of this approach is that ensuring data integrity takes only one system call instead of two (for example, write() and fdatasync()). The main disadvantage is that all writes through the corresponding file descriptor are synchronized, which can limit how the application code can be structured.
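A minimal Python sketch of this approach follows. The names append_record_dsync and demo_append_record_dsync are hypothetical; os.O_DSYNC is the Python constant for the O_DSYNC flag on Linux.

```python
import os
import tempfile


def append_record_dsync(path: str, record: bytes) -> None:
    """Append a record through a descriptor opened with O_DSYNC.

    Each write() on this descriptor returns only after the data (and the
    metadata needed to read it back) has been flushed, as if fdatasync()
    had been called after it.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | os.O_DSYNC
    fd = os.open(path, flags, 0o644)
    try:
        remaining = memoryview(record)
        while remaining:
            remaining = remaining[os.write(fd, remaining):]
    finally:
        os.close(fd)


def demo_append_record_dsync() -> bytes:
    # Hypothetical demo using a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log.bin")
        append_record_dsync(path, b"first\n")
        append_record_dsync(path, b"second\n")
        with open(path, "rb") as f:
            return f.read()
```

Using os.O_SYNC in place of os.O_DSYNC would make each write behave as if it were followed by fsync() instead.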

Using Direct I/O with the O_DIRECT flag

The open() system call supports the O_DIRECT flag, which is designed to bypass the operating system cache and perform I/O by interacting directly with the disk. In many cases this means that the write commands issued by the program are translated directly into commands for the disk. In general, however, this mechanism is not a replacement for fsync() or fdatasync(). The disk itself can defer or cache write commands. And, to make things worse, in some special cases I/O operations performed with O_DIRECT are translated into traditional buffered operations. The simplest way to solve this problem is to open files with the O_DSYNC flag as well, which means that each write is followed by the equivalent of an fdatasync() call.

It turns out that XFS recently added a "fast path" for O_DIRECT|O_DSYNC writes. If a block is overwritten using O_DIRECT|O_DSYNC, then XFS, instead of flushing the cache, issues a FUA write command if the device supports it. I confirmed this with blktrace on a Linux 5.4/Ubuntu 20.04 system. This approach should be more efficient, since it writes the minimum amount of data to disk and uses one operation instead of two (a write and a cache flush). I found a link to the 2018 kernel patch that implements this mechanism. There is some discussion there about applying this optimization to other file systems, but as far as I know, XFS is the only file system that supports it so far.

The sync_file_range() function

Linux has a system call, sync_file_range(), that lets you flush only part of a file to disk rather than the whole file. The call starts an asynchronous flush and does not wait for it to finish. But the man page for sync_file_range() describes the call as "extremely dangerous". Using it is not recommended. The features and dangers of sync_file_range() are described very well in this material. In particular, RocksDB appears to use this call to control when the kernel flushes dirty data to disk. At the same time, to ensure durable storage, it also uses fdatasync(). The RocksDB code has some interesting comments on this topic. For example, it appears that sync_file_range() does not flush data to disk when ZFS is used. Experience tells me that rarely used code is likely to contain bugs. I therefore advise against using this system call unless it is absolutely necessary.

System calls that help make data durable

I have come to the conclusion that there are three approaches to performing I/O that makes data durable. All of them require calling fsync() on the directory in which the file was created. The approaches are:

  1. Calling fdatasync() or fsync() after write() (fdatasync() is preferable).
  2. Working with a file descriptor opened with the O_DSYNC or O_SYNC flag (preferably O_DSYNC).
  3. Using pwritev2() with the RWF_DSYNC or RWF_SYNC flag (preferably RWF_DSYNC).
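The third approach can be sketched in Python, where os.pwritev() with a flags argument maps onto pwritev2(). The helper names below are hypothetical, and the fallback to pwrite() plus fdatasync() when the flag is unavailable or rejected is my own assumption about how one might degrade gracefully, not something from the original post.

```python
import errno
import os
import tempfile


def pwrite_durably(fd: int, data: bytes, offset: int) -> int:
    """Write data at offset so that this one write is flushed like O_DSYNC."""
    dsync = getattr(os, "RWF_DSYNC", None)
    if dsync is not None and hasattr(os, "pwritev"):
        try:
            # Per-call durability: only this write is synchronized.
            return os.pwritev(fd, [data], offset, dsync)
        except OSError as exc:
            unsupported = (errno.EOPNOTSUPP, errno.EINVAL, errno.ENOSYS)
            if exc.errno not in unsupported:
                raise
    # Assumed fallback: approach 1 from the list (write, then fdatasync).
    written = os.pwrite(fd, data, offset)
    os.fdatasync(fd)
    return written


def demo_pwrite_durably() -> bytes:
    # Hypothetical demo using a temporary file.
    fd, path = tempfile.mkstemp()
    try:
        pwrite_durably(fd, b"hello", 0)
        return os.pread(fd, 5, 0)
    finally:
        os.close(fd)
        os.unlink(path)
```

Like write(), the call may report a short write, so a production version would check the returned byte count and continue from the new offset.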

Performance Notes

I did not carefully measure the performance of the different mechanisms I examined. The differences in speed that I noticed are very small. This means I could be wrong, and under different conditions the same thing could give different results. I will first cover what affects performance more, and then what affects it less.

  1. Overwriting file data is faster than appending data to a file (the performance gain can be 2-100%). Appending data to a file requires additional changes to the file's metadata, even after a fallocate() system call, but the size of this effect can vary. For best performance I recommend calling fallocate() to preallocate the required space. That space should then be explicitly filled with zeros, followed by a call to fsync(). This ensures that the corresponding blocks in the file system are marked as "allocated" rather than "unallocated". This gives a small (about 2%) performance improvement. In addition, on some disks the first access to a block can be slower than subsequent accesses. This means that filling the space with zeros can lead to a significant (about 100%) improvement in performance. In particular, this can happen with AWS EBS disks (this is unofficial data, which I could not confirm). The same applies to GCP Persistent Disk storage (and this is official information, confirmed by tests). Other experts have made the same observation about different disks.
  2. The fewer system calls, the higher the performance (the gain can be about 5%). It appears that calling open() with the O_DSYNC flag or calling pwritev2() with the RWF_SYNC flag is faster than calling fdatasync(). I suspect the point is that this approach needs fewer system calls to do the same job (one call instead of two). But the difference in performance is very small, so you can ignore it entirely and use whatever does not complicate your application's logic.
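The preallocation advice from the first item can be sketched as follows. This is a minimal illustration under stated assumptions: preallocate_zeroed and its demo are hypothetical names, the chunk size is arbitrary, and Python exposes posix_fallocate() rather than the Linux-specific fallocate() mentioned in the text.

```python
import os
import tempfile


def preallocate_zeroed(path: str, size: int, chunk: int = 1 << 20) -> None:
    """Reserve size bytes (size > 0), overwrite them with zeros, then fsync."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, size)  # reserve the space up front
        zeros = bytes(min(chunk, size))
        offset = 0
        while offset < size:
            # Explicitly write zeros so the blocks are actually initialized.
            offset += os.pwrite(fd, zeros[: size - offset], offset)
        os.fsync(fd)  # make the allocation and the zeros durable
    finally:
        os.close(fd)


def demo_preallocate_zeroed() -> bytes:
    # Hypothetical demo: preallocate 8 KiB in 4 KiB chunks.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "prealloc.bin")
        preallocate_zeroed(path, 8192, chunk=4096)
        with open(path, "rb") as f:
            return f.read()
```

Later writes into this region are then overwrites rather than appends, which is the case the text describes as faster.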

If you are interested in the topic of durable data storage, here are some useful resources:

  • I/O Access methods - an overview of the basics of input/output mechanisms.
  • Ensuring data reaches disk - a story about what happens to data on its way from the application to the disk.
  • When should you fsync the containing directory - an answer to the question of when to call fsync() on a directory. In short, you need to do this when creating a new file, and the reason for this recommendation is that in Linux there can be many links to the same file.
  • SQL Server on Linux: FUA Internals - a description of how durable storage is implemented in SQL Server on the Linux platform. It includes some interesting comparisons between Windows and Linux system calls. I am fairly sure it was thanks to this article that I learned about the FUA optimization in XFS.

Have you ever lost data that you thought was safely stored on disk?

Source: www.habr.com