Release of DBMS SQLite 3.42 and DuckDB 0.8.0. CG/SQL with implementation of stored procedures for SQLite

The release of SQLite 3.42, a lightweight DBMS designed as a plug-in library, has been published. The SQLite code is distributed in the public domain, i.e. can be used without restrictions and free of charge for any purpose. Financial support for SQLite developers is provided by a specially created consortium, which includes companies such as Adobe, Oracle, Mozilla, Bentley and Bloomberg.

Major changes:

  • For FTS5 full-text indexes, the secure-delete command is implemented, which completely cleans up all residual data after deletion.
  • Functions for processing data in JSON format now support JSON5 extensions.
  • Improved query planner. Count-of-view optimization is enabled by default. Eliminated handling of unused columns in subqueries. Improved push-down optimization implementation.
  • Added "--unsafe-testing" option to the CLI to allow some dangerous commands such as ".testctrl" to be used in testing. In safe mode ("--safe"), the ".log on" and ".log off" commands are allowed. Added support for the "β€”" separator, for arguments after which option processing is disabled. Added ":inf" and ":nan" parameters associated with infinity and NaN values.
  • The names of application-defined SQL functions are allowed to intersect with the CROSS, FULL, INNER, LEFT, NATURAL, OUTER, and RIGHT keywords.
  • Expanded PRAGMA integrity_check capabilities. Added check for specifying the NaN value in columns with the NOT NULL condition. Improved information content of error messages.
  • The session addition allows interception of changes from tables without ROWIDs.
  • The "subsec" modifier has been added to functions for working with time and dates to use fractions of seconds.
  • Reduced default recursion depth for arrays and JSON objects from 2000 to 1000.

In addition, the DuckDB 0.8.0 DBMS release has been formed, which develops the SQLite variant, enhanced with features and optimizations for performing analytical queries covering a significant part of the stored data, for example, aggregating the entire contents of tables or merging several large tables. An extended dialect of the SQL language is provided, including additional capabilities for handling very complex and long-running queries, as well as support for complex types (arrays, structures, unions), simultaneous execution of multiple queries, and execution of queries directly from files in CSV, JSON and Parquet format. It is possible to import from PostgreSQL DBMS.

In addition to the shell code from SQLite, the project uses a parser from PostgreSQL, moved to a separate library, the Date Math component from MonetDB, its own implementation of window functions (based on the Segment Tree Aggregation algorithm), a regular expression processor based on the RE2 library, its own query optimizer, MVCC control mechanism simultaneous execution of tasks (Multi-Version Concurrency Control), as well as a vectorized query execution engine based on the Hyper-Pipelining Query Execution algorithm, which allows processing large sets of values ​​​​at once in one operation. The project code is distributed under the MIT license.

In the new version of DuckDB:

  • Added new expressions "PIVOT" and "UNPIVOT" to transform rows into columns and vice versa.
  • Improved parallelization when importing and exporting data. By default, reading from files in CSV format into multiple streams and multi-threaded writing are implemented when using Parquet, CSV and JSON formats.
  • Added "**" operator to recursively iterate over directories when defining file paths (eg FROM "'data/glob/crawl/stackoverflow/**/*.csv';").
  • Added support for merge operations (JOIN) for data in the form of a time series (slices of parameter values ​​at specified time intervals), in which the criterion for merging records is not an exact, but an approximate match of values ​​in a field with time (for example, you can combine records, time which differ by no more than 1 minute).
  • The deferred loading of database metadata has been implemented, which made it possible to speed up the launch of the DBMS tenfold.
  • Added support for connecting custom functions in Python.
  • Added support for the ADBC ​​(Arrow Database Connectivity) API for data transfer using Apache Arrow.
  • Implemented binding for integration with Swift code.
  • Changed the behavior of the division ("/") operator, which now performs floating point calculations by default instead of integer operations. A new operator "//" has been proposed for integer division. The old behavior can be returned by setting "SET integer_division=true;".
  • The method of accounting for null records when sorting has been changed from "NULLS FIRST" to "NULLS LAST", i.e. NULL values ​​will now be displayed at the end instead of the beginning of the list. The old behavior can be restored with the setting "SET default_null_order='nulls_first';".

Additionally, the CG/SQL project developed by Facebook, which provides a code generator for using stored procedures with SQLite, can be noted. CG/SQL allows you to write stored procedures in a special dialect of T-SQL (Transact-SQL), which allows you to call the functions of the standard C library and access data in SQLite. The generated stored procedures are compiled into C code that uses the SQLite C API to perform specified actions and process complex queries. Compiled stored procedures can be linked to C, Java, and Objective-C programs. The project code is written in C and distributed under the MIT license.

Source: opennet.ru

Add a comment