DuckDB 0.6.0 released, a SQLite-like DBMS for analytical queries

The DuckDB 0.6.0 DBMS has been released. It combines SQLite traits such as compactness, the ability to be linked in as an embedded library, storage of the database in a single file, and a convenient CLI, with tools and optimizations for analytical queries that touch a significant portion of the stored data, for example queries that aggregate the entire contents of tables or join several large tables. The project code is distributed under the MIT license. Development is still at the stage of experimental releases, since the storage format has not yet been stabilized and changes from version to version.

DuckDB provides an extended SQL dialect with additional capabilities for handling very complex and long-running queries. Complex types (arrays, structures, unions) are supported, as is the execution of arbitrary and nested correlated subqueries. Multiple queries can run concurrently, and queries can be run directly against CSV and Parquet files. Data can also be imported from the PostgreSQL DBMS.

In addition to the shell code from SQLite, the project uses the PostgreSQL parser packaged as a separate library, the Date Math component from MonetDB, its own implementation of window functions (based on the Segment Tree Aggregation algorithm), a regular expression processor based on the RE2 library, its own query optimizer, an MVCC (Multi-Version Concurrency Control) mechanism for managing concurrently executing operations, and a vectorized query execution engine based on the Hyper-Pipelining Query Execution algorithm, which allows large sets of values to be processed in a single operation.
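
To make the window function remark concrete, here is a minimal Python sketch of segment tree aggregation applied to a sliding window frame. It is an illustration of the general technique only, not DuckDB's implementation; the names SegmentTree and windowed_aggregate are hypothetical and chosen for this example.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class SegmentTree:
    """Iterative segment tree for an associative aggregate (sum, min, max...)."""

    def __init__(self, values: Sequence[T], combine: Callable[[T, T], T], identity: T):
        self.n = len(values)
        self.combine = combine
        self.identity = identity
        # Leaves live at indexes n..2n-1; inner nodes at 1..n-1.
        self.tree: List[T] = [identity] * (2 * self.n)
        self.tree[self.n:] = list(values)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = combine(self.tree[2 * i], self.tree[2 * i + 1])

    def query(self, lo: int, hi: int) -> T:
        """Aggregate values[lo:hi] (half-open range) in O(log n)."""
        left_acc, right_acc = self.identity, self.identity
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and step right
                left_acc = self.combine(left_acc, self.tree[lo])
                lo += 1
            if hi & 1:  # hi is exclusive: step left and take that node
                hi -= 1
                right_acc = self.combine(self.tree[hi], right_acc)
            lo //= 2
            hi //= 2
        return self.combine(left_acc, right_acc)


def windowed_aggregate(values, preceding, following, combine, identity):
    """For each row i, aggregate rows i-preceding .. i+following (clamped)."""
    tree = SegmentTree(values, combine, identity)
    n = len(values)
    return [
        tree.query(max(0, i - preceding), min(n, i + following + 1))
        for i in range(n)
    ]


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    # Roughly: SUM(x) OVER (ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
    print(windowed_aggregate(data, 1, 1, lambda a, b: a + b, 0))
```

Each frame costs O(log n) regardless of its width, which is why the approach suits frames that do not simply slide by one row at a time.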

Among the changes in the new release:

  • Work continued on improving the storage format. An optimistic disk write mode has been implemented: when a large data set is loaded within a single transaction, the data is compressed and written to the database file in streaming mode, without waiting for the transaction to be confirmed by a COMMIT command. By the time COMMIT is received, the data is already on disk, and if ROLLBACK is executed, it is discarded. Previously, the data was first held entirely in memory and written to disk only on commit.
  • Added support for parallel loading of data into individual tables, which significantly increases loading speed on multi-core systems. For example, in the previous release loading a database with 150 million rows on a 10-core CPU took 91 seconds, whereas in the new version the operation completes in 17 seconds. Two parallel loading modes are available: one that preserves the order of the records and one that does not.
  • For data compression, the FSST (Fast Static Symbol Table) algorithm is now used, which packs data inside strings using a shared dictionary of frequently occurring matches. Using the new algorithm reduced the size of a test database from 761MB to 251MB.
  • The Chimp and Patas algorithms have been introduced for compressing floating-point numbers (DOUBLE and FLOAT). Compared with the previously used Gorilla algorithm, Chimp provides a higher compression ratio and faster decompression. Patas lags behind Chimp in compression ratio but is much faster at decompression, which is almost on par with reading uncompressed data.
  • Added an experimental ability to load data from CSV files in multiple parallel threads (SET experimental_parallel_csv=true), which significantly reduces the time needed to load large CSV files. For example, with this option enabled, the load time for a 720 MB CSV file dropped from 3.5 to 0.6 seconds.
  • Index creation and index maintenance operations can now be executed in parallel. For example, a CREATE INDEX operation on a column with 16 million records went from 5.92 to 1.38 seconds.
  • Enabled parallelization of aggregation in queries containing the expression "COUNT(DISTINCT col)".
  • SQL gained support for the UNION type, which allows several types to be bound to a single element (for example, "UNION(num INT, error VARCHAR)").
  • SQL now allows queries that begin with the keyword "FROM" instead of "SELECT". In that case the query is assumed to start with "SELECT *".
  • SQL gained support for the COLUMNS expression, which lets you apply an operation to several columns without repeating the expression. For example, "SELECT MIN(COLUMNS(*)) FROM obs;" runs the MIN function on every column of the obs table, and "SELECT COLUMNS('val[0-9]+') FROM obs;" selects the columns whose names consist of "val" followed by digits.
  • Added support for operations on lists (list comprehensions), for example, "SELECT [x + 1 for x in [1, 2, 3]] AS l;".
  • Memory usage has been optimized. By default, the jemalloc library is now used for memory management on the Linux platform. The performance of hash join operations under limited memory has been significantly improved.
  • Added a ".mode duckbox" output mode to the command-line interface, which drops middle columns to fit the width of the terminal window (useful for a quick look at the results of queries with a large number of columns, such as "SELECT * FROM tbl", which in normal mode wrap across several lines). The ".maxrows X" parameter can additionally limit the number of rows displayed.
  • The CLI provides context-aware autocompletion of input (keywords, table names, functions, column names, and file names are completed).
  • The CLI has a query progress indicator that is enabled by default.
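
As background for the floating-point compression item, the sketch below shows the XOR step that Gorilla-style schemes build on: each value's 64-bit pattern is XORed with its predecessor, so similar neighbours produce words with long runs of zero bits that an encoder can store compactly. It is an assumption-laden illustration, not the Chimp or Patas bitstream format; the function names are hypothetical.

```python
import struct
from typing import List, Tuple


def float_to_bits(x: float) -> int:
    """Reinterpret an IEEE 754 double as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_float(b: int) -> float:
    """Inverse of float_to_bits."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def xor_encode(values: List[float]) -> List[int]:
    """XOR each value's bit pattern with the previous one (first with 0)."""
    out, prev = [], 0
    for v in values:
        bits = float_to_bits(v)
        out.append(bits ^ prev)
        prev = bits
    return out


def xor_decode(words: List[int]) -> List[float]:
    """Undo xor_encode by accumulating the XOR chain."""
    out, prev = [], 0
    for w in words:
        prev ^= w
        out.append(bits_to_float(prev))
    return out


def zero_runs(word: int) -> Tuple[int, int]:
    """Count leading and trailing zero bits of a 64-bit word."""
    if word == 0:
        return 64, 64  # identical neighbours: nothing but zeros
    leading = 64 - word.bit_length()
    trailing = (word & -word).bit_length() - 1
    return leading, trailing


if __name__ == "__main__":
    readings = [20.5, 20.5, 20.75, 21.0]
    for value, word in zip(readings, xor_encode(readings)):
        print(value, f"{word:064b}", zero_runs(word))
```

A real encoder would then write only the non-zero middle bits of each word plus a short header; how that header is laid out is where the individual algorithms differ.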
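
The COLUMNS expression can be thought of as expanding one expression into one copy per matching column. The following Python sketch mimics that expansion over rows held as dictionaries. It is an analogy, not DuckDB code: the helper names are hypothetical, and the use of an unanchored regex search is an assumption of this sketch, not something the release notes above specify.

```python
import re
from typing import Any, Callable, Dict, Iterable, List


def expand_columns(pattern: str, column_names: Iterable[str]) -> List[str]:
    """Return the column names matching the regex, in their original order.

    Assumption: an unanchored search is used here for illustration.
    """
    return [name for name in column_names if re.search(pattern, name)]


def aggregate_columns(
    rows: List[Dict[str, Any]],
    pattern: str,
    func: Callable[[Iterable[Any]], Any],
) -> Dict[str, Any]:
    """Apply func to every column whose name matches pattern.

    Loosely analogous to SELECT func(COLUMNS(pattern)) FROM rows.
    """
    names = list(rows[0]) if rows else []
    selected = expand_columns(pattern, names)
    return {name: func(row[name] for row in rows) for name in selected}


if __name__ == "__main__":
    obs = [
        {"id": 1, "val1": 5, "val2": 9},
        {"id": 2, "val1": 3, "val2": 12},
    ]
    # Loosely analogous to: SELECT MIN(COLUMNS('val[0-9]+')) FROM obs;
    print(aggregate_columns(obs, r"val[0-9]+", min))
    # Loosely analogous to: SELECT MIN(COLUMNS(*)) FROM obs;
    print(aggregate_columns(obs, r".*", min))
```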

Source: opennet.ru
