DuckDB 0.6.0 Released, a SQLite Variant for Analytical Queries

The DuckDB 0.6.0 DBMS release is available. It combines SQLite properties such as compactness, the ability to be linked in as an embedded library, storage of the database in a single file, and a convenient CLI interface with tools and optimizations for running analytical queries that cover a significant part of the stored data, for example queries that aggregate the entire contents of tables or join several large tables. The project code is distributed under the MIT license. Development is still at the stage of experimental releases, since the storage format has not yet been stabilized and changes from version to version.

DuckDB provides an extended SQL dialect with additional capabilities for handling very complex and long-running queries. Complex types (arrays, structs, unions) are supported, as is the ability to run arbitrary and nested correlated subqueries. It supports running several queries concurrently and running queries directly over CSV and Parquet files. Importing from the PostgreSQL DBMS is also possible.

In addition to the shell code from SQLite, the project uses a parser from PostgreSQL that has been moved into a separate library, the Date Math component from MonetDB, its own implementation of window functions (based on the Segment Tree Aggregation algorithm), a regular expression processor based on the RE2 library, its own query optimizer, an MVCC (Multi-Version Concurrency Control) mechanism for managing concurrently executing operations, and a vectorized query execution engine based on the Hyper-Pipelining Query Execution algorithm, which allows large sets of values to be processed at once in a single operation.
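
To give a feel for why a segment tree suits window functions, here is a minimal Python sketch. It is not DuckDB's code: the `SegmentTree` class and the `sliding_aggregate` helper are hypothetical names introduced only for this illustration, and the frame is assumed to be a fixed number of rows before and after the current row. The point is that once the tree is built, the aggregate for any frame can be answered in O(log n) instead of rescanning the frame for every row.

```python
class SegmentTree:
    """Iterative segment tree over a fixed list of values (illustrative only)."""

    def __init__(self, values, combine, identity):
        self.size = len(values)
        self.combine = combine
        self.identity = identity
        # Leaves live in nodes[size:], internal nodes in nodes[1:size].
        self.nodes = [identity] * self.size + list(values)
        for i in range(self.size - 1, 0, -1):
            self.nodes[i] = combine(self.nodes[2 * i], self.nodes[2 * i + 1])

    def query(self, lo, hi):
        """Aggregate values[lo:hi] in O(log n) steps."""
        left_acc = self.identity
        right_acc = self.identity
        lo += self.size
        hi += self.size
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and step right
                left_acc = self.combine(left_acc, self.nodes[lo])
                lo += 1
            if hi & 1:  # hi is exclusive: step left and take that node
                hi -= 1
                right_acc = self.combine(self.nodes[hi], right_acc)
            lo //= 2
            hi //= 2
        return self.combine(left_acc, right_acc)


def sliding_aggregate(values, preceding, following, combine, identity):
    """Aggregate a frame of `preceding` rows before and `following` rows after each row."""
    tree = SegmentTree(values, combine, identity)
    n = len(values)
    return [
        tree.query(max(0, i - preceding), min(n, i + following + 1))
        for i in range(n)
    ]


prices = [4, 8, 15, 16, 23, 42]
# Roughly: SUM(price) OVER (ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
window_sums = sliding_aggregate(prices, 1, 1, lambda a, b: a + b, 0)
# Roughly: MIN(price) OVER (ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
window_mins = sliding_aggregate(prices, 2, 0, min, float("inf"))
print(window_sums)  # [12, 27, 39, 54, 81, 65]
print(window_mins)  # [4, 4, 4, 8, 15, 16]
```

The same tree serves any associative aggregate (sum, min, max), which is why one structure can back several window functions.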

Among the changes in the new release:

  • Work continued on improving the storage format. An optimistic disk write mode has been implemented: when a large data set is loaded within a single transaction, the data is compressed and written to the database file in streaming mode, without waiting for the transaction to be confirmed by a COMMIT command. By the time COMMIT is received, the data has already been written to disk, and if a ROLLBACK is performed, it is discarded. Previously, the data was first held entirely in memory and written to disk only on commit.
  • Added support for parallel loading of data into individual tables, which significantly increases load speed on multi-core systems. For example, in the previous release loading a database with 150 million rows on a 10-core CPU took 91 seconds, whereas in the new version the operation completes in 17 seconds. Two parallel loading modes are available: one that preserves the order of records and one that does not.
  • For data compression, the FSST (Fast Static Symbol Table) algorithm is now used, which packs data inside strings using a shared dictionary of frequently occurring matches. Using the new algorithm reduced the size of a test database from 761 MB to 251 MB.
  • The Chimp and Patas algorithms are offered for compressing floating-point numbers (DOUBLE and FLOAT). Compared with the previous Gorilla algorithm, Chimp provides a higher compression ratio and faster decompression. Patas lags behind Chimp in compression ratio but decompresses much faster, at a speed almost indistinguishable from reading uncompressed data.
  • Added an experimental ability to load data from CSV files in multiple parallel threads (SET experimental_parallel_csv=true), which significantly reduces the time needed to load large CSV files. For example, with this option enabled, the load time for a 720 MB CSV file dropped from 3.5 to 0.6 seconds.
  • Parallel execution of index creation and management operations has been implemented. For example, the time for a CREATE INDEX operation on a column with 16 million records dropped from 5.92 to 1.38 seconds.
  • Enabled parallelization of aggregation operations in queries containing the expression “COUNT(DISTINCT col)”.
  • SQL has gained support for the UNION type, which allows several types to be bound to a single element (for example, “UNION(num INT, error VARCHAR)”).
  • SQL now allows queries that begin with the word “FROM” instead of “SELECT”. In this case the query is assumed to begin with “SELECT *”.
  • SQL has gained support for the COLUMNS expression, which lets you apply an operation to multiple columns without repeating the expression. For example, “SELECT MIN(COLUMNS(*)) FROM obs;” applies the MIN function to each column of the obs table, and “SELECT COLUMNS('val[0-9]+') FROM obs;” selects the columns whose names consist of “val” followed by digits.
  • Added support for operations on lists, for example, “SELECT [x + 1 for x in [1, 2, 3]] AS l;”.
  • Memory consumption has been optimized. By default, the jemalloc library is used for memory management on the Linux platform. The performance of hash join operations under limited memory has improved significantly.
  • Added the “.mode duckbox” output mode to the command-line interface, which drops middle columns based on the width of the terminal window (suitable for quickly inspecting the results of queries with a large number of columns, such as “SELECT * FROM tbl”, which in the normal mode are spread over several lines). The “.maxrows X” parameter can additionally limit the number of rows displayed.
  • The CLI provides context-aware autocompletion of input (keywords, table names, function names, column names, and file names are completed).
  • The CLI has a query progress indicator that is enabled by default.
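
The floating-point compression item above compares Chimp and Patas with Gorilla. The short Python sketch below illustrates only the basic idea that Gorilla-style schemes start from: neighbouring values are XORed, and similar values leave only a few meaningful bits to store. It is a toy illustration and implements neither Chimp nor Patas nor DuckDB's storage code; the function names and the sample `readings` are made up for the example.

```python
import struct


def double_to_bits(value: float) -> int:
    """Reinterpret an IEEE 754 double as a 64-bit unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit unsigned integer as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def xor_encode(values):
    """Return the first value's bits, then each value XORed with its predecessor."""
    encoded = []
    previous = 0
    for value in values:
        bits = double_to_bits(value)
        encoded.append(bits ^ previous)
        previous = bits
    return encoded


def xor_decode(encoded):
    """Invert xor_encode by XORing each delta back onto the running value."""
    values = []
    previous = 0
    for delta in encoded:
        previous ^= delta
        values.append(bits_to_double(previous))
    return values


def meaningful_bits(delta: int) -> int:
    """Width of the span between the highest and lowest set bit (0 if none)."""
    if delta == 0:
        return 0
    trailing_zeros = (delta & -delta).bit_length() - 1
    return delta.bit_length() - trailing_zeros


readings = [21.5, 21.5, 21.75, 22.0, 22.0]  # hypothetical sensor values
encoded = xor_encode(readings)
assert xor_decode(encoded) == readings  # the transform is lossless

# Bits a real encoder would still have to store for each value after the first.
widths = [meaningful_bits(delta) for delta in encoded[1:]]
print(widths)  # [0, 1, 4, 0]
```

A real codec additionally has to encode where the meaningful bits sit inside the 64-bit word; how compactly that position is stored and how cheaply it is decoded is where such algorithms differ.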

Source: opennet.ru
