DuckDB 0.6.0 Published, a SQLite Variant for Analytical Queries

The DuckDB 0.6.0 DBMS release is available. It combines SQLite properties such as compactness, the ability to be embedded as a library, storage of the database in a single file, and a convenient CLI with tools and optimizations for analytical queries that cover a significant part of the stored data, for example queries that aggregate the entire contents of tables or join several large tables. The project code is distributed under the MIT license. Development is still at the stage of experimental releases, since the storage format has not yet been stabilized and changes from version to version.

DuckDB provides an advanced SQL dialect that includes additional capabilities for handling very complex and long-running queries. Complex types (arrays, structures, unions) are supported, as are arbitrary and nested correlated subqueries. Several queries can run simultaneously, and queries can be run directly against CSV and Parquet files. Import from the PostgreSQL DBMS is also possible.

In addition to the shell code from SQLite, the project uses the PostgreSQL parser, split out into a separate library, the Date Math component from MonetDB, its own implementation of window functions (based on the Segment Tree Aggregation algorithm), a regular expression processor based on the RE2 library, its own query optimizer, an MVCC (Multi-Version Concurrency Control) mechanism for running operations concurrently, and a vectorized query execution engine based on the Hyper-Pipelining Query Execution algorithm, which allows large sets of values to be processed at once in a single operation.
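To illustrate the idea behind segment-tree aggregation for window functions, here is a minimal Python sketch. It is not DuckDB's implementation; the names SegmentTree and sliding_aggregate are hypothetical and chosen only for this example. The sketch assumes an associative combine function (such as min or addition) and a frame defined by a fixed number of preceding and following rows.

```python
from typing import Callable, List


class SegmentTree:
    """Iterative segment tree over a fixed list, for any associative `combine`."""

    def __init__(self, values: List, combine: Callable):
        self.n = len(values)
        self.combine = combine
        # Leaves live at indices n..2n-1; internal node i covers nodes 2i and 2i+1.
        self.tree = [None] * self.n + list(values)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = combine(self.tree[2 * i], self.tree[2 * i + 1])

    def query(self, lo: int, hi: int):
        """Aggregate values[lo:hi] (half-open range) in O(log n) combine steps."""
        if not 0 <= lo < hi <= self.n:
            raise ValueError("empty or out-of-range frame")
        left_acc = right_acc = None
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and move right
                node = self.tree[lo]
                left_acc = node if left_acc is None else self.combine(left_acc, node)
                lo += 1
            if hi & 1:  # hi - 1 is a left child: take it and move left
                hi -= 1
                node = self.tree[hi]
                right_acc = node if right_acc is None else self.combine(node, right_acc)
            lo //= 2
            hi //= 2
        if left_acc is None:
            return right_acc
        if right_acc is None:
            return left_acc
        return self.combine(left_acc, right_acc)


def sliding_aggregate(values: List, preceding: int, following: int, combine: Callable) -> List:
    """Evaluate `combine` over rows [i - preceding, i + following] for every row i."""
    tree = SegmentTree(values, combine)
    n = len(values)
    return [tree.query(max(0, i - preceding), min(n, i + following + 1)) for i in range(n)]
```

The tree is built once per partition, after which each frame costs a logarithmic number of combine steps instead of a rescan of the whole frame, which is the property that makes this structure attractive for window aggregates.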

Changes in the new release include:

  • Work continued on improving the storage format. An optimistic disk write mode has been implemented: when a large data set is loaded within a single transaction, the data is compressed and streamed to the database file without waiting for the transaction to be confirmed with COMMIT. By the time COMMIT is received the data is already on disk, and if ROLLBACK is executed the data is discarded. Previously, the data was first held entirely in memory and written to disk only on commit.
  • Added support for parallel data loading into individual tables, which significantly increases loading speed on multi-core systems. For example, in the previous release loading a database with 150 million rows on a 10-core CPU took 91 seconds, while in the new version the operation completes in 17 seconds. Two parallel loading modes are available: one that preserves the order of records and one that does not.
  • For data compression, the FSST (Fast Static Symbol Table) algorithm is now used, which packs data inside strings using a shared dictionary of common matches. The new algorithm reduced the size of a test database from 761MB to 251MB.
  • The Chimp and Patas algorithms have been added for compressing floating-point numbers (DOUBLE and FLOAT). Compared with the previously used Gorilla algorithm, Chimp provides a higher compression ratio and faster decompression. Patas lags behind Chimp in compression ratio but is much faster at decompression, which is almost as fast as reading uncompressed data.
  • Added an experimental ability to load data from CSV files in multiple parallel threads (SET experimental_parallel_csv=true), which significantly reduces loading time for large CSV files. For example, with this option enabled, the loading time of a 720 MB CSV file dropped from 3.5 to 0.6 seconds.
  • Parallel execution of index creation and maintenance operations has been implemented. For example, a CREATE INDEX operation on a column with 16 million records dropped from 5.92 to 1.38 seconds.
  • Aggregation is now parallelized in queries containing the expression "COUNT(DISTINCT col)".
  • SQL has gained support for the UNION type, which allows several types to be bound to a single element (for example, "UNION(num INT, error VARCHAR)").
  • SQL now allows queries that begin with the keyword "FROM" instead of "SELECT". In this case the query is assumed to begin with "SELECT *".
  • SQL has gained support for the COLUMNS expression, which applies an operation to several columns without repeating the expression. For example, "SELECT MIN(COLUMNS(*)) from obs;" runs MIN on every column of the obs table, and "SELECT COLUMNS('val[0-9]+') from obs;" selects the columns whose name consists of "val" followed by digits.
  • Added support for list comprehensions, for example, "SELECT [x + 1 for x in [1, 2, 3]] AS l;".
  • Memory consumption has been optimized. By default, the jemalloc library is now used for memory management on Linux. Hash join operations perform significantly better when memory is limited.
  • Added the ".mode duckbox" output mode to the command-line interface, which drops middle columns to fit the width of the terminal window (suitable for a quick visual check of results from queries with many columns, such as "SELECT * FROM tbl", which in the normal mode wrap across several lines). The ".maxrows X" parameter can additionally limit the number of rows displayed.
  • The CLI provides context-aware autocompletion of input (keywords, table names, functions, column names and file names are completed).
  • The CLI has a query progress indicator that is enabled by default.
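The floating-point compressors mentioned in the list above (Gorilla, Chimp, Patas) all start from the same observation: neighbouring values in a column tend to share most of their bits, so XOR-ing each value with its predecessor leaves mostly zeros. The Python sketch below shows only that XOR step and how few "meaningful" bits remain. It does not reproduce the bit-packing of any of these algorithms, and the function names are hypothetical.

```python
import struct
from typing import List


def double_to_bits(x: float) -> int:
    """Reinterpret an IEEE 754 double as a 64-bit unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_double(b: int) -> float:
    """Reinterpret a 64-bit unsigned integer as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def xor_encode(values: List[float]) -> List[int]:
    """XOR every value with its predecessor (the first one with zero)."""
    out = []
    prev = 0
    for v in values:
        bits = double_to_bits(v)
        out.append(bits ^ prev)
        prev = bits
    return out


def xor_decode(xors: List[int]) -> List[float]:
    """Invert xor_encode by re-applying the XOR chain."""
    values = []
    prev = 0
    for x in xors:
        prev ^= x
        values.append(bits_to_double(prev))
    return values


def meaningful_bits(x: int) -> int:
    """Number of bits between the leading and trailing zero runs of x."""
    if x == 0:
        return 0
    trailing_zeros = (x & -x).bit_length() - 1
    return x.bit_length() - trailing_zeros


if __name__ == "__main__":
    readings = [21.5, 21.5, 21.75, 22.0]
    for value, x in zip(readings, xor_encode(readings)):
        print(value, format(x, "064b"), meaningful_bits(x))
```

A repeated value XORs to zero, and a small change leaves only a short run of significant bits. A real encoder then stores just that run together with its position, and the algorithms differ mainly in how they encode it and in how they trade compression ratio against decoding speed.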

Source: opennet.ru
