OpenZL 0.2.0

After seven months of development, version 0.2.0 of OpenZL, a framework for building lossless data compressors, has been released.

The framework consists of a core library and tools for creating specialized compressors described in the SDDL language.
Building a good specialized compressor takes two steps:

  1. Parsing the data to extract its structure.
  2. Applying good backend compressors that use the recovered structure to achieve a high compression ratio.

OpenZL provides tools for both stages.
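The two stages can be illustrated with a minimal sketch that does not use OpenZL itself. The record layout (a u32 timestamp followed by a u16 value) is hypothetical, and zlib from the Python standard library stands in for the backend compressor. Stage 1 groups each field into its own stream, and stage 2 compresses each stream separately.

```python
import struct
import zlib

# Hypothetical record layout: u32 timestamp, u16 sensor value (little-endian, packed).
RECORD = struct.Struct("<IH")

def make_records(n):
    """Build n sample records as one contiguous byte string."""
    return b"".join(RECORD.pack(1_000_000 + i, (i * 7) % 500) for i in range(n))

def split_columns(blob):
    """Stage 1: parse the records and group each field into its own stream."""
    timestamps, values = [], []
    for ts, val in RECORD.iter_unpack(blob):
        timestamps.append(ts)
        values.append(val)
    return (struct.pack(f"<{len(timestamps)}I", *timestamps),
            struct.pack(f"<{len(values)}H", *values))

def join_columns(ts_bytes, val_bytes):
    """Inverse of split_columns: interleave the streams back into records."""
    n = len(ts_bytes) // 4
    ts = struct.unpack(f"<{n}I", ts_bytes)
    vals = struct.unpack(f"<{n}H", val_bytes)
    return b"".join(RECORD.pack(t, v) for t, v in zip(ts, vals))

def compress_structured(blob):
    """Stage 2: hand each homogeneous stream to a backend compressor (zlib here)."""
    return [zlib.compress(col) for col in split_columns(blob)]

def decompress_structured(parts):
    return join_columns(*(zlib.decompress(p) for p in parts))

data = make_records(10_000)
parts = compress_structured(data)
assert decompress_structured(parts) == data

# Compare the size of compressing the raw bytes with compressing per-field streams.
print(len(zlib.compress(data)), sum(len(p) for p in parts))
```

How much the per-field layout helps depends on the data. The sketch only prints both sizes and makes no claim about either one.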

The project is written in C and C++ and is distributed under the BSD license.

Major changes

SDDL2

SDDL has been completely rewritten from scratch to meet its original design goals. Whereas the first demo was a simple runtime interpreter, SDDL2 is a full compiler: the parser passes data to a semantic analyzer, which passes a typed abstract syntax tree (AST) to an optimizer, and the optimizer drives a code generator that emits bytecode for a virtual machine.

The main result is faster parsing. When a record's layout can be fully determined from parameters and constants alone, the engine jumps directly to any field without scanning the preceding bytes, which enables zero-copy access and throughput of several GB/s.
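The idea of jumping straight to a field can be sketched in a few lines. This is not the SDDL2 engine or its bytecode. The record layout below is hypothetical, and the sketch only shows why a layout known in advance makes field access a matter of offset arithmetic over a zero-copy view.

```python
import struct

# Hypothetical fixed-size record: u32 id, f64 price, u16 flags (little-endian, packed).
RECORD = struct.Struct("<IdH")
PRICE = struct.Struct("<d")
PRICE_OFFSET = 4  # the price field starts right after the 4-byte id

def price_at(buf, index):
    """Read the price of record `index` without scanning earlier records."""
    offset = index * RECORD.size + PRICE_OFFSET
    return PRICE.unpack_from(buf, offset)[0]

blob = b"".join(RECORD.pack(i, i * 1.5, i & 0xFFFF) for i in range(1000))
view = memoryview(blob)  # zero-copy view over the input buffer

print(price_at(view, 742))
```

When a field's position depends on the contents of earlier fields (for example a length-prefixed string), this shortcut no longer applies and the preceding bytes have to be read first.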

The language itself has evolved along with its tooling. It now supports when clauses for conditional statements, parameterized and anonymous records, access to the fields of a record, and bitwise and logical operators.

On the developer side, the semantic analysis step now detects undefined references, type mismatches, and arity errors at compile time, with the source code location, rather than at run time. A VS Code extension for syntax highlighting of .sddl files has also been released.
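A compile-time check of this kind can be sketched as follows. The data model is invented for illustration and is not SDDL syntax or the SDDL2 analyzer: each record is described by its parameter count and a list of field uses, and the checker reports undefined references and arity errors together with a line number before anything is executed.

```python
from dataclasses import dataclass

@dataclass
class FieldUse:
    line: int        # source line of the field declaration
    type_name: str   # name of the referenced type
    args: tuple      # arguments passed to a parameterized record

# Hypothetical built-in types, mapped to the number of arguments they expect.
BUILTINS = {"u8": 0, "u16": 0, "u32": 0}

def check(records):
    """records maps a name to (param_count, [FieldUse, ...]); returns error strings."""
    arity = dict(BUILTINS)
    arity.update({name: params for name, (params, _) in records.items()})
    errors = []
    for name, (_, fields) in records.items():
        for f in fields:
            if f.type_name not in arity:
                errors.append(
                    f"line {f.line}: undefined reference '{f.type_name}' in {name}")
            elif len(f.args) != arity[f.type_name]:
                errors.append(
                    f"line {f.line}: '{f.type_name}' expects "
                    f"{arity[f.type_name]} argument(s), got {len(f.args)}")
    return errors

schema = {
    "Row":  (1, [FieldUse(2, "u32", ()), FieldUse(3, "u8", ())]),
    "File": (0, [FieldUse(6, "Row", ()), FieldUse(7, "Header", ())]),
}
for message in check(schema):
    print(message)
```

Both problems in the sample schema (a missing argument for Row and the unknown type Header) are reported up front, instead of surfacing only when some input happens to reach the faulty field.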

New built-in LZ codec

OpenZL now includes its own LZ codec, exposed as ZL_GRAPH_LZ, along with a serial compression profile in the zli utility. Work on the codec continues, expanding its feature set and improving performance on small inputs. It currently supports operation comparable to zstd level 1, with a 64 KB compression window.

OpenZL allows each stage of the LZ pipeline to be reconfigured to suit the desired speed. Its graph architecture also allows entropy coding stages to be combined, rather than relying on a single pipeline meant to fit every use case. Multiple stages can be fused into one operation to improve processing speed. As a result, OpenZL achieves about 10% faster compression and 70% faster decompression than Zstandard level 1 on the Silesia corpus in our tests:

Compressor                          Compression ratio   Compression speed   Decompression speed
OpenZL LZ level 1                   2.74                466 MB/s            2288 MB/s
Zstd level 1 with 64K window size   2.74                419 MB/s            1254 MB/s
Zstd level 1                        2.89                424 MB/s            1345 MB/s
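The quoted percentages follow directly from the table. A short calculation, using only the throughput figures listed above, reproduces them:

```python
# Throughput figures in MB/s, copied from the table.
openzl_lz = {"compress": 466, "decompress": 2288}
zstd_level1 = {"compress": 424, "decompress": 1345}

def speedup_percent(new, old):
    """Relative speed gain of `new` over `old`, in percent."""
    return (new / old - 1) * 100

c = speedup_percent(openzl_lz["compress"], zstd_level1["compress"])
d = speedup_percent(openzl_lz["decompress"], zstd_level1["decompress"])
print(f"compression: +{c:.1f}%, decompression: +{d:.1f}%")
```

Note that the comparison is against plain Zstd level 1, which reaches a higher ratio (2.89 versus 2.74). The row with the 64K window is the one that matches OpenZL LZ in compression ratio.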

Support for very large inputs

zli now supports processing large inputs (several gigabytes in size). Before compression, such data is automatically split into chunks of manageable size (about 16 MB by default), which reduces memory usage, improves data locality, and enables parallel processing. SDDL2 uses the same automatic chunking when working with a schema. New segmenters were created or updated in the process, for CSV, Parquet, and generic numeric data, and all segmenters are now serializable and configurable, so a chosen configuration can be stored in a compressor and reused later.

Chunking is applied transparently at compression time. Note that the training pipeline is separate and unaffected, so it is not designed to accept large inputs as training material.
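The chunking approach described above can be sketched generically. The code below does not reproduce zli or the OpenZL segmenters: zlib stands in for the compressor, the function names are illustrative, and only the chunk size of about 16 MB is taken from the text. Each chunk is compressed independently, which bounds the working set per chunk and lets chunks be processed in parallel.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 16 * 1024 * 1024  # about 16 MB, the default mentioned in the text

def iter_chunks(data, size=CHUNK_SIZE):
    """Yield zero-copy slices of `data`, each at most `size` bytes long."""
    view = memoryview(data)
    for start in range(0, len(view), size):
        yield view[start:start + size]

def compress_chunked(data, size=CHUNK_SIZE, workers=4):
    """Compress every chunk independently; the result order matches the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, iter_chunks(data, size)))

def decompress_chunked(frames):
    return b"".join(zlib.decompress(frame) for frame in frames)

payload = bytes(range(256)) * 4096                    # 1 MiB of sample data
frames = compress_chunked(payload, size=256 * 1024)   # small chunks for the demo
assert len(frames) == 4
assert decompress_chunked(frames) == payload
```

The trade-off is that matches cannot cross chunk boundaries, so very small chunks tend to cost compression ratio.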

Improvements to the online graph visualizer (experimental)
The visualizer now displays compression and decompression traces end to end.

A stream preview panel lets you see the actual bytes flowing along each edge, and truncation controls keep even large streams easy to work with.

A settings panel gathers all display options in one place, and a full set of hotkeys (directional navigation, ordered navigation, expanding and collapsing, and node selection) lets you work comfortably with the tool without a mouse.

Traces have been revised, block-based compression is now displayed correctly, and zli can finally produce its own traces using the new --trace and --trace-streams-dir flags.

Miscellaneous

  • Several codecs have been added to the catalog. The Partition and bitpack codecs now use a fused decoder. The floating-point bitsplit codec now includes dedicated encoders and decoders for the fp16, fp32, fp64, and bf16 formats, with specialized acceleration. Range-aware splitting (split_byrange), a length multiplexer, a sentinel codec, an lz4 graph, and small helper functions such as tryParseInt and splitByParam have been added.
  • The API has been simplified.
  • Fuzz testing has been improved.
  • The build and packaging process has been improved for additional platforms.
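To give a feel for what a floating-point bitsplit does, here is a minimal sketch for fp32 values. It is an assumption about the general technique, not the layout or API of the OpenZL codec: each value is split at the IEEE 754 field boundary into a sign-plus-exponent stream and a mantissa stream, so that the often repetitive exponent bits can be compressed separately from the noisier mantissa bits.

```python
import struct

def bitsplit_fp32(values):
    """Split floats into a sign+exponent stream and a mantissa stream."""
    high, mantissa = [], []
    for v in values:
        bits = struct.unpack("<I", struct.pack("<f", v))[0]
        high.append(bits >> 23)           # 1 sign bit followed by 8 exponent bits
        mantissa.append(bits & 0x7FFFFF)  # low 23 mantissa bits
    return high, mantissa

def bitjoin_fp32(high, mantissa):
    """Inverse of bitsplit_fp32: reassemble the original fp32 values."""
    return [struct.unpack("<f", struct.pack("<I", (h << 23) | m))[0]
            for h, m in zip(high, mantissa)]

# Sample values chosen to be exactly representable in fp32.
samples = [1.0, 1.5, 1.75, -2.0, 0.15625]
high, mantissa = bitsplit_fp32(samples)
assert bitjoin_fp32(high, mantissa) == samples
print(high)
```

In this sample the first three values share the same sign and exponent, so their entries in the first stream are identical, which is the kind of redundancy a backend compressor can exploit.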

Source: linux.org.ru