I-DuckDB 0.6.0 ipapashiwe, i-SQLite Option ye-Analytical Questions

Ukukhutshwa kwe-DuckDB 0.6.0 DBMS kuyafumaneka, ukudibanisa iipropati ezinjalo zeSQLite njengokuqina, ukukwazi ukudibanisa ngendlela yelayibrari egxininisiweyo, ukugcina i-database kwifayile enye kunye ne-interface ye-CLI efanelekileyo, kunye nezixhobo kunye nokulungiswa kokuphunyezwa. imibuzo yocazululo equka inxalenye ebalulekileyo yedatha egciniweyo, umzekelo ehlanganisa yonke imixholo yeetheyibhile okanye ukudibanisa iitheyibhile ezininzi ezinkulu. Ikhowudi yeprojekthi ihanjiswa phantsi kwelayisenisi ye-MIT. Uphuhliso lusekwinqanaba lokuqulunqa ukukhutshwa kovavanyo, kuba ifomathi yokugcina ayikaqiniswa kwaye itshintshe ukusuka kwinguqulelo ukuya kwinguqulelo.

I-DuckDB ibonelela ngolwimi lwe-SQL oluphambili olubandakanya izakhono ezongezelelweyo zokuphatha imibuzo entsonkothileyo kwaye ethatha ixesha. Ukusetyenziswa kweentlobo ezintsonkothileyo (ii-arrays, izakhiwo, iimanyano) kunye nokukwazi ukwenza ii-subqueries ezihambelanayo ezingafanelekanga kunye ne-nested correlating ziyaxhaswa. Ixhasa ukuqhuba imibuzo emininzi ngaxeshanye, iqhuba imibuzo ngokuthe ngqo kwi-CSV kunye neefayile zeParquet. Kuyenzeka ukungenisa kwi-PostgreSQL DBMS.

Ukongeza kwikhowudi yeqokobhe evela kwiSQLite, iprojekthi isebenzisa i-parser evela kwi-PostgreSQL kwithala leencwadi elahlukileyo, icandelo leMathematika yoMhla ukusuka kwi-MonetDB, ukuphunyezwa kwayo kwemisebenzi yefestile (ngokusekwe kwi-algorithm ye-Segment Tree Aggregation), iprosesa yenkcazo eqhelekileyo esekwe ilayibrari RE2, eyakhe umbuzo optimizer, kunye nenkqubo yolawulo MVCC ngaxeshanye ufezekiso lwemisebenzi (Multi-Version Concurrency Control), kunye ne-vectorized queecution injini esekelwe kwi-hyper-Pipelining Query Execution algorithm, evumela iiseti ezinkulu zamaxabiso. ukuba iqhutywe kanye kumsebenzi omnye.

Phakathi kotshintsho kukhupho olutsha:

  • Umsebenzi waqhubeka ekuphuculeni ifomathi yokugcina. Imodi yokubhala i-disk enethemba iphunyeziwe, apho xa kulayishwa isethi enkulu yedatha kwi-transaction enye, idatha ixinzelelwe kwaye ibhalwe kwifayile esuka kwisiseko sedatha kwimodi yokusakaza, ngaphandle kokulinda ukuthengiselana ukuba kuqinisekiswe ngomyalelo we-COMMIT. . Xa umyalelo we-COMMIT ufunyenwe, idatha sele ibhaliwe kwidiski, kwaye xa i-ROLLBACK isenziwa, ilahlwa. Ngaphambili, idatha yayigcinwe ngokupheleleyo kwimemori, kwaye xa izinikele, yagcinwa kwidiski.
  • Inkxaso eyongeziweyo yokulayisha ngokuhambelanayo kwedatha kwiitafile ezahlukeneyo, ezikuvumela ukuba unyuse kakhulu isantya sokulayisha kwiinkqubo ezininzi ezingundoqo. Ngokomzekelo, ekukhutshweni kwangaphambili, ukulayisha i-database kunye ne-150 yezigidi zemiqolo kwi-CPU ye-10-core ithathe imizuzwana ye-91, kodwa kwinguqulo entsha lo msebenzi ugqitywe ngemizuzwana eyi-17. Kukho iindlela ezimbini zokulayisha ezihambelanayo - kunye nokugcinwa kolandelelwano lweerekhodi kwaye ngaphandle kokugcinwa kocwangco.
  • Ukunyanzeliswa kwedatha, i-FSST (i-Fast Static Symbol Table) i-algorithm isetyenzisiweyo, ekuvumela ukuba upakishe idatha ngaphakathi kweentambo usebenzisa isichazi-magama esiqhelekileyo semidlalo eqhelekileyo. Ukusetyenziswa kwe-algorithm entsha kwenza kube lula ukunciphisa ubungakanani bedatha yovavanyo ukusuka kwi-761MB ukuya kwi-251MB.
  • Ii-algorithms zeChimp kunye nePatas ziye zacetywa ukuba zicinezele amanani amanqaku adadayo (DOUBLE kunye neFLOAT). Xa kuthelekiswa ne-algorithm yeGorilla yangaphambili, iChimp ibonelela ngamanqanaba aphezulu oxinzelelo kunye noxinzelelo olukhawulezayo. I-algorithm ye-Patas isala ngasemva kweChimp kumlinganiselo woxinzelelo, kodwa ikhawuleza kakhulu kwisantya soxinzelelo, esiphantse sahluke nokufunda idatha engaxinzelelwanga.
  • Kongezwe amandla okulinga ukulayisha idatha esuka kwiifayile ze-CSV kwimijelo emininzi ehambelanayo (SET experimental_parallel_csv=true), enciphisa kakhulu ixesha elithathayo ukulayisha iifayile ezinkulu ze-CSV. Umzekelo, xa olu khetho lwenziwe, ixesha lokukhuphela lefayile ye-CSV ye-720 MB lancitshiswa ukusuka kwi-3.5 ukuya kwi-0.6 imizuzwana.
  • Ukuphunyezwa okufanayo kokudalwa kwesalathisi kunye nemisebenzi yokulawula iphunyeziwe. Ngokomzekelo, ukusebenza kwe-CREATE INDEX kwikholamu kunye neerekhodi ze-16 yezigidi zancitshiswa ukusuka kwi-5.92 ukuya kwi-1.38 imizuzwana.
  • Ulungelelwaniso olusebenzayo lwe-aggregation lwenziwe kwimibuzo equlathe ibinzana elithi β€œCOUNT(DISTINCT col)”.
  • I-SQL yongeze inkxaso yohlobo lwe-UNION, evumela iindidi ezininzi ukuba ziboshwe kwinto enye (umzekelo, "UNION (num INT, error VARCHAR))").
  • I-SQL ibonelela ngokukwazi ukwenza imibuzo eqala ngegama elithi "UKUSUKA" endaweni yokuthi "KHETHA". Kule meko, kucingelwa ukuba umbuzo uqala ngo "KHETHA *".
  • I-SQL yongeze inkxaso yenkcazo ye-COLUMNS, ekuvumela ukuba wenze umsebenzi kwimihlathi emininzi ngaphandle kokuphinda-phinda intetho. Umzekelo, "KHETHA UMNCINCI(KULAMN (*)) kwi-obs;" iya kubangela ukuba MIN umsebenzi wenziwe kumhlathi ngamnye kwi-obs table, kwaye "KHETHA IMIKHOLAM('val[0-9]+') ukusuka kwi-obs; kwiikholamu ezinegama eliquka "val" kunye namanani.
  • Inkxaso eyongeziweyo yokusebenza kwizintlu, umzekelo, "KHETHA [x + 1 for x in [1, 2, 3]] AS l;".
  • Ukusetyenziswa kwememori kuye kwalungiswa. Ngokungagqibekanga, iqonga leLinux lisebenzisa ilayibrari yejemalloc yolawulo lwenkumbulo. Kuphuculwe kakhulu ukusebenza kokudibanisa i-hash xa inkumbulo ilinganiselwe.
  • Yongezwe ".imode ye-duckbox" imo yemveliso kujongano lwelayini yomyalelo, elahla imihlathi ephakathi ithathela ingqalelo ububanzi befestile ye-terminal (ilungele ukuvavanya ngokukhawuleza iziphumo zemibuzo ngenani elikhulu leekholamu, ezinje ngo β€œKHETHA * UKUSUKA kwi-tbl”, ethi kwimo yesiqhelo isasazwe kwimigca emininzi). Ukusebenzisa iparameter ".maxrows X", unganciphisa inani lemiqolo ebonisiweyo.
  • I-CLI ibonelela ngokugqitywa ngokuzenzekelayo kwegalelo kuthathelwa ingqalelo umxholo (ukufakwa kwamagama angundoqo, amagama etafile, imisebenzi, amagama eekholamu kunye namagama eefayile agqityiwe).
  • I-CLI inesalathiso senkqubela yombuzo esinikwe ngokuzenzekelayo.

umthombo: opennet.ru

Yongeza izimvo