Code released for BlazingSQL, a SQL engine that uses the GPU for acceleration

The open-source release of the BlazingSQL SQL engine, which uses the GPU to speed up data processing, has been announced. BlazingSQL is not a full-fledged DBMS; it is positioned as an engine for analyzing and processing large data sets, comparable in its tasks to Apache Spark. The code is written in Python and released under the Apache 2.0 license.

BlazingSQL is suited to running individual analytical queries over large data sets (tens of gigabytes) stored in tabular formats (for example, logs, NetFlow statistics, etc.). BlazingSQL can execute queries against raw CSV and Apache Parquet files hosted in network and cloud file systems such as HDFS and AWS S3, transferring the result directly to GPU memory. By parallelizing operations on the GPU and using faster video memory, BlazingSQL queries run up to 20 times faster than in Apache Spark.

To work with the GPU, BlazingSQL uses RAPIDS, a set of open libraries developed with the participation of NVIDIA that makes it possible to build data processing and analytics applications that run entirely on the GPU (a Python interface is provided for using low-level CUDA primitives and parallelizing computations).

BlazingSQL makes it possible to use SQL instead of the cuDF data processing API (built on Apache Arrow) used in RAPIDS. BlazingSQL is an additional layer that runs on top of cuDF and uses the cuIO library to read data from disk. SQL queries are translated into cuDF function calls that load data into the GPU and perform merge, aggregation, and filtering operations on it. Distributed configurations spanning thousands of GPUs are supported.

BlazingSQL greatly simplifies working with data: instead of hundreds of cuDF function calls, a single SQL query is enough. Using SQL makes it possible to integrate RAPIDS with existing analytics systems without writing specialized handlers and without an intermediate load of the data into an additional DBMS, while maintaining full compatibility with all parts of RAPIDS, exposing the existing functionality through SQL, and delivering cuDF-level performance. This includes support for integration with the XGBoost and cuML libraries for analytics and machine learning tasks.
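
The difference between a single SQL query and a chain of explicit data-processing steps can be illustrated with a small sketch. It does not use BlazingSQL or cuDF (both require a GPU); Python's standard sqlite3 module serves as a CPU-only stand-in, and the table name, column names, and sample NetFlow-like rows are all hypothetical. The point is only that one declarative query replaces separate filter, group, and aggregate calls.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows: (src_ip, dst_port, bytes), loosely modeled on NetFlow records.
FLOWS = [
    ("10.0.0.1", 443, 1200),
    ("10.0.0.2", 80, 300),
    ("10.0.0.1", 443, 800),
    ("10.0.0.3", 22, 50),
    ("10.0.0.2", 443, 700),
]


def total_bytes_sql(rows, port):
    """Declarative version: one SQL query filters, groups, and aggregates."""
    conn = sqlite3.connect(":memory:")  # CPU stand-in, not a GPU engine
    try:
        conn.execute(
            "CREATE TABLE flows (src_ip TEXT, dst_port INTEGER, bytes INTEGER)"
        )
        conn.executemany("INSERT INTO flows VALUES (?, ?, ?)", rows)
        cur = conn.execute(
            "SELECT src_ip, SUM(bytes) FROM flows "
            "WHERE dst_port = ? GROUP BY src_ip ORDER BY src_ip",
            (port,),
        )
        return cur.fetchall()
    finally:
        conn.close()


def total_bytes_steps(rows, port):
    """Imperative version: the same result built from explicit steps."""
    # Step 1: filter rows by destination port.
    filtered = [row for row in rows if row[1] == port]
    # Step 2: group by source address and sum the byte counts.
    totals = defaultdict(int)
    for src_ip, _dst_port, nbytes in filtered:
        totals[src_ip] += nbytes
    # Step 3: sort by source address to match ORDER BY.
    return sorted(totals.items())


if __name__ == "__main__":
    print(total_bytes_sql(FLOWS, 443))
    print(total_bytes_steps(FLOWS, 443))
```

Both functions return the same per-address totals. In a dataframe API, each step in the second function would correspond to one or more library calls, which is the overhead the SQL layer is meant to remove.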

Source: opennet.ru
