DuckDB 0.6.0 Released, an SQLite Alternative for Analytical Queries

DuckDB 0.6.0 has been released. The DBMS combines SQLite-like properties (compactness, the ability to be linked in as an embedded library, storage of the database in a single file, and a convenient CLI) with tools and optimizations for analytical queries that touch a significant share of the stored data, for example queries that aggregate the entire contents of tables or join several large tables. The project code is distributed under the MIT license. Development is still at the stage of experimental releases, because the storage format has not yet stabilized and changes from version to version.

DuckDB provides an advanced SQL dialect with additional features for handling very complex and long-running queries. It supports complex types (arrays, structures, unions) and can execute arbitrary and nested correlated subqueries. Multiple queries can run concurrently, and queries can be run directly against CSV and Parquet files. Data can also be imported from the PostgreSQL DBMS.

In addition to the shell code from SQLite, the project uses the parser from PostgreSQL (split out into a separate library), the Date Math component from MonetDB, its own implementation of window functions (based on the Segment Tree Aggregation algorithm), a regular expression processor based on the RE2 library, its own query optimizer, an MVCC (Multi-Version Concurrency Control) mechanism for managing concurrently running operations, and a vectorized query execution engine based on the Hyper-Pipelining Query Execution algorithm, which allows large sets of values to be processed at once in a single operation.
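
To illustrate why a segment tree suits window functions, here is a minimal Python sketch of the general technique. It is not DuckDB's implementation: the class name `SegmentTree`, the helper `sliding_window`, and the sample data are hypothetical and chosen only for illustration. The tree is built once, after which the aggregate for any window frame can be answered in logarithmic time instead of rescanning the frame for every row.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class SegmentTree:
    """Iterative segment tree over a fixed sequence (illustrative sketch only)."""

    def __init__(self, values: Sequence[T], combine: Callable[[T, T], T], identity: T):
        self.n = len(values)
        self.combine = combine
        self.identity = identity
        # Leaves live in tree[n:2n]; internal node i covers nodes 2i and 2i+1.
        self.tree: List[T] = [identity] * self.n + list(values)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = combine(self.tree[2 * i], self.tree[2 * i + 1])

    def query(self, lo: int, hi: int) -> T:
        """Aggregate values[lo:hi] (half-open range) in O(log n) steps."""
        left, right = self.identity, self.identity
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and move right
                left = self.combine(left, self.tree[lo])
                lo += 1
            if hi & 1:  # hi is exclusive: step left and take that node
                hi -= 1
                right = self.combine(self.tree[hi], right)
            lo >>= 1
            hi >>= 1
        return self.combine(left, right)


def sliding_window(values, preceding, following, combine, identity):
    """Aggregate a frame of `preceding` rows before and `following` rows after each row."""
    tree = SegmentTree(values, combine, identity)
    n = len(values)
    return [
        tree.query(max(0, i - preceding), min(n, i + following + 1))
        for i in range(n)
    ]


values = [5, 1, 4, 2, 3]
sums = sliding_window(values, 1, 1, lambda a, b: a + b, 0)
mins = sliding_window(values, 1, 1, min, float("inf"))
print(sums)  # [6, 10, 7, 9, 5]
print(mins)  # [1, 1, 1, 2, 2]
```

The frame used here corresponds roughly to a SQL frame of one preceding and one following row; the same tree answers frames of any size without being rebuilt.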

Changes in the new release include:

  • Work on improving the storage format has continued. An optimistic disk-write mode has been implemented: when a large data set is loaded in a single transaction, the data is compressed and streamed to the database file without waiting for the transaction to be confirmed with COMMIT. By the time COMMIT is received the data is already on disk, and if ROLLBACK is executed it is discarded. Previously, the data was first held entirely in memory and written to disk only on commit.
  • Added support for parallel loading of data into individual tables, which significantly speeds up loading on multi-core systems. For example, in the previous release loading a database with 150 million rows on a 10-core CPU took 91 seconds, while in the new version the operation completes in 17 seconds. Two parallel loading modes are available: one that preserves the order of records and one that does not.
  • The FSST (Fast Static Symbol Table) algorithm is now used for data compression. It packs data inside strings using a shared dictionary of frequently occurring matches. The new algorithm reduced the size of the test database from 761 MB to 251 MB.
  • The Chimp and Patas algorithms have been added for compressing floating-point numbers (DOUBLE and FLOAT). Compared with the earlier Gorilla algorithm, Chimp provides a higher compression ratio and faster decompression. Patas trails Chimp in compression ratio but decompresses much faster, at a speed almost indistinguishable from reading uncompressed data.
  • Added an experimental ability to load data from CSV files in several parallel threads (SET experimental_parallel_csv=true), which significantly reduces the time needed to load large CSV files. For example, with this option enabled, the load time for a 720 MB CSV file dropped from 3.5 to 0.6 seconds.
  • Index creation and index management operations can now run in parallel. For example, a CREATE INDEX operation on a column with 16 million records went from 5.92 to 1.38 seconds.
  • Aggregation is now parallelized in queries containing the expression "COUNT(DISTINCT col)".
  • SQL now supports the UNION type, which allows several types to be bound to a single element (for example, "UNION(num INT, error VARCHAR)").
  • SQL queries may now begin with the keyword "FROM" instead of "SELECT". In that case the query is treated as if it began with "SELECT *".
  • SQL now supports the COLUMNS expression, which applies an operation to several columns without repeating the expression. For example, "SELECT MIN(COLUMNS(*)) FROM obs;" applies the MIN function to every column of the obs table, and "SELECT COLUMNS('val[0-9]+') FROM obs;" selects the columns whose names consist of "val" followed by digits.
  • Added support for list comprehensions, for example "SELECT [x + 1 for x in [1, 2, 3]] AS l;".
  • Memory consumption has been optimized. On Linux, the jemalloc library is now used by default for memory management. The performance of hash joins under limited memory has improved significantly.
  • A ".mode duckbox" output mode has been added to the command-line interface. It drops the middle columns according to the width of the terminal window, which is useful for quickly reviewing the results of queries with many columns, such as "SELECT * FROM tbl", whose output in normal mode wraps across several lines. The ".maxrows X" parameter additionally limits the number of rows displayed.
  • The CLI now provides context-aware autocompletion (keywords, table names, function names, column names, and file names are completed).
  • The CLI now shows a query progress indicator by default.
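
The timing and size figures quoted in the list above can be turned into improvement factors with simple arithmetic. The short Python sketch below uses only the numbers given in the text; the dictionary labels are descriptive names invented here for readability.

```python
# (before, after) pairs taken directly from the figures quoted in the changelog.
measurements = {
    "parallel table load, 150M rows (seconds)": (91, 17),
    "parallel CSV load, 720 MB file (seconds)": (3.5, 0.6),
    "CREATE INDEX, 16M records (seconds)": (5.92, 1.38),
    "FSST string compression, test data (MB)": (761, 251),
}

# Improvement factor: how many times smaller the "after" value is.
ratios = {name: before / after for name, (before, after) in measurements.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
# parallel table load, 150M rows (seconds): 5.35x
# parallel CSV load, 720 MB file (seconds): 5.83x
# CREATE INDEX, 16M records (seconds): 4.29x
# FSST string compression, test data (MB): 3.03x
```

These factors come from single quoted measurements, so they indicate the order of the improvement rather than a general guarantee.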
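
The article does not describe how the floating-point compressors work internally. As background, Gorilla-style schemes rest on XORing each value's bit pattern with the previous one: neighbouring values in real data tend to be similar, so the XOR result is mostly zero bits and only a short "meaningful" run needs to be stored. The Python sketch below shows only that underlying observation; it is not the Chimp or Patas encoder, and the function names and sample readings are hypothetical.

```python
import struct
from typing import Tuple


def double_bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_profile(prev: float, curr: float) -> Tuple[int, int, int]:
    """Return (leading zeros, trailing zeros, meaningful bits) of prev XOR curr."""
    x = double_bits(prev) ^ double_bits(curr)
    if x == 0:
        # Identical values: nothing but a "same as before" marker is needed.
        return (64, 0, 0)
    leading = 64 - x.bit_length()
    trailing = (x & -x).bit_length() - 1  # index of the lowest set bit
    return (leading, trailing, 64 - leading - trailing)


readings = [21.5, 21.5, 21.75, 22.0]  # hypothetical sensor values
profiles = [xor_profile(a, b) for a, b in zip(readings, readings[1:])]
for (a, b), profile in zip(zip(readings, readings[1:]), profiles):
    print(f"{a} -> {b}: leading={profile[0]} trailing={profile[1]} meaningful={profile[2]}")
```

For these sample values each step leaves at most a handful of meaningful bits out of 64, which is the redundancy such encoders exploit.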

Source: opennet.ru
