TileDB 2.0 storage engine release

TileDB 2.0 has been released, a storage engine optimized for storing multidimensional arrays and data used in scientific computing. Systems for processing genetic information, spatial data and financial data, i.e. systems that operate on sparse or dense multidimensional arrays, are named as areas of application for TileDB. TileDB offers a C++ library that transparently abstracts access to data and metadata in applications and takes care of all the low-level work of efficient storage. The project code is written in C++ and distributed under the MIT license. Linux, macOS and Windows are supported.

Main features of TileDB:

  • Efficient methods for storing sparse arrays, in which the data is not contiguous: the array is filled in fragments, and most of the elements remain empty or take the same value;
  • Ability to access data in key-value format or as sets of columns (DataFrame);

  • Integration with the cloud storage services AWS S3, Google Cloud Storage and Azure Blob Storage;
  • Support for tiled (block) arrays;
  • Ability to use different data compression and encryption algorithms;
  • Support for integrity checking using checksums;
  • Multi-threaded operation with parallel input/output;
  • Versioning of stored data, including retrieving the state at a given point in the past and atomic updates of entire large data sets;
  • Ability to attach metadata;
  • Support for data grouping;
  • Integration modules for use as a low-level storage engine in Spark, Dask, MariaDB, GDAL, PDAL, Rasterio, gVCF and PrestoDB;
  • Bindings to the C++ API for Python, R, Java and Go.
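
To make the sparse-storage and versioning items above more concrete, here is a minimal conceptual sketch in plain Python. It is not TileDB's API or on-disk format: the class `SparseFragmentStore` and its methods are hypothetical names invented for illustration. It assumes only what the list states, namely that a sparse array is filled in fragments, that empty cells are not stored, and that the state at an earlier point in time can be retrieved.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]


class SparseFragmentStore:
    """Toy model (hypothetical): a sparse 2D array kept as timestamped fragments."""

    def __init__(self) -> None:
        # Each fragment is (timestamp, {coordinate: value}).
        # Only non-empty cells are stored; empty cells cost nothing.
        self._fragments: List[Tuple[int, Dict[Coord, float]]] = []

    def write(self, timestamp: int, cells: Dict[Coord, float]) -> None:
        """Append one fragment; earlier fragments are never modified."""
        self._fragments.append((timestamp, dict(cells)))
        self._fragments.sort(key=lambda frag: frag[0])

    def read(self, at: Optional[int] = None) -> Dict[Coord, float]:
        """Return the non-empty cells visible at time `at` (latest state if None)."""
        state: Dict[Coord, float] = {}
        for timestamp, cells in self._fragments:
            if at is not None and timestamp > at:
                break
            state.update(cells)  # later fragments override earlier ones
        return state


store = SparseFragmentStore()
store.write(1, {(0, 0): 1.0, (1000, 2000): 2.5})
store.write(2, {(0, 0): 9.0, (5, 5): 4.0})

print(store.read(at=1))  # state as of the first write
print(store.read())      # latest state, with (0, 0) overwritten
```

Because each write only appends a new fragment, a reader in this toy model can reconstruct any earlier state by ignoring fragments with a later timestamp.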

Release 2.0 is notable for its support for the "DataFrame" concept, which allows data to be stored as columns of values of arbitrary length, tied to certain attributes. The storage is also optimized for processing sparse arrays with heterogeneous dimensions (cells can store data of different types, and columns of different types, for example those storing a name, a time and a price, can be combined). Support for columns with string data has been added, as have modules for integration with Google Cloud Storage and Azure Blob Storage. The API for the R language has been redesigned.
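
As a rough illustration of combining columns of different types such as name, time and price, the following plain-Python sketch keeps each column in its own list and selects rows by a string key and an integer range. It is a conceptual toy, not the TileDB API: `QuoteFrame`, its methods and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class QuoteFrame:
    """Toy column store (hypothetical): columns of different types, one list each."""

    name: List[str] = field(default_factory=list)     # string column
    time: List[int] = field(default_factory=list)     # integer column
    price: List[float] = field(default_factory=list)  # floating-point column

    def append(self, name: str, time: int, price: float) -> None:
        """Add one row by appending to every column."""
        self.name.append(name)
        self.time.append(time)
        self.price.append(price)

    def select(self, name: str, t_start: int, t_end: int) -> List[Tuple[int, float]]:
        """Return (time, price) pairs for one name with t_start <= time <= t_end."""
        return [
            (t, p)
            for n, t, p in zip(self.name, self.time, self.price)
            if n == name and t_start <= t <= t_end
        ]


frame = QuoteFrame()
frame.append("AAA", 10, 101.5)  # made-up sample rows
frame.append("BBB", 10, 20.0)
frame.append("AAA", 20, 102.0)
frame.append("AAA", 30, 99.5)

print(frame.select("AAA", 15, 30))
```

A real engine would index and compress such columns rather than scan them linearly; the sketch only shows how a string column and a numeric column can jointly address a value column.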

Source: opennet.ru
