The LLVM project introduced HPVM 1.0, a compiler for CPUs, GPUs, FPGAs and accelerators

The developers of the LLVM project have announced the release of HPVM 1.0 (Heterogeneous Parallel Virtual Machine), a compiler aimed at simplifying programming for heterogeneous systems and providing tools for generating code for CPUs, GPUs, FPGAs, and domain-specific hardware accelerators (FPGA and accelerator support did not make it into release 1.0). The project code is distributed under the Apache 2.0 license.

The main idea behind HPVM is to provide a unified representation of parallel programs that can be used to target a variety of parallel hardware: GPUs, vector instructions, multi-core processors, FPGAs, and various specialized accelerator chips. Unlike other systems, HPVM combines three capabilities for organizing heterogeneous computing: an intermediate representation that is independent of the programming language and the hardware, a virtual instruction set architecture (ISA), and runtime scheduling.

HPVM's target-independent intermediate representation (IR) builds on the LLVM 9.0 intermediate representation and extends it with a hierarchical dataflow graph to capture task, data, and pipeline parallelism. The HPVM IR also includes vector instructions and shared memory. The main purpose of the intermediate representation is efficient code generation and optimization for heterogeneous systems.
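To make the idea of a hierarchical dataflow graph concrete, here is a minimal sketch in plain Python. The class and field names are illustrative only, not the actual HPVM IR or API: internal nodes contain child subgraphs (hierarchy), a replication factor models data parallelism, and edges between nodes model pipeline parallelism.

```python
# Illustrative sketch (NOT the actual HPVM API): a hierarchical dataflow
# graph where internal nodes hold child subgraphs and leaf nodes carry
# computation, mirroring how such an IR can capture task, data, and
# pipeline parallelism.

from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    replication: int = 1                              # data parallelism: N dynamic instances
    children: list = field(default_factory=list)      # hierarchy: child subgraph
    edges: list = field(default_factory=list)         # dataflow edges to successor nodes

    def add_child(self, child):
        self.children.append(child)
        return child

    def connect(self, dst):
        # a dataflow edge: the output of self feeds the input of dst
        self.edges.append(dst)

def leaf_count(node):
    """Count leaf (computational) nodes in the hierarchy."""
    if not node.children:
        return 1
    return sum(leaf_count(c) for c in node.children)

# Build a tiny pipeline: load -> filter (4 parallel instances) -> store
root = Node("pipeline")
load = root.add_child(Node("load"))
filt = root.add_child(Node("filter", replication=4))
store = root.add_child(Node("store"))
load.connect(filt)
filt.connect(store)

print(leaf_count(root))  # 3 leaf nodes under the root
```

The point of the hierarchy is that a compiler can map an entire subgraph to one device (task parallelism) while expanding a replicated leaf into many data-parallel instances on that device.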

The virtual instruction set architecture (ISA) provides portability across different types of parallel computing hardware without sacrificing performance on the various elements of a heterogeneous system. The virtual ISA can also be used to ship generic executable program code that can run on CPUs, GPUs, FPGAs, and various accelerators.

At the current stage of development, HPVM offers code generators capable of translating application nodes defined in the virtual ISA for execution on NVIDIA GPUs (via cuDNN and OpenCL), Intel AVX vector instructions, and multi-core x86 CPUs. At run time, HPVM applies flexible scheduling policies, driven both by information about the program (the structure of its graph) and by compiling individual program nodes for execution on any of the target computing devices available in the system.
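The scheduling step described above can be sketched as follows. This is a simplified model, not HPVM's actual scheduler: the device names and the per-node variant tables are hypothetical, and the policy shown is a simple "best available target" rule with a CPU fallback.

```python
# Illustrative sketch (device names and policy are hypothetical): a runtime
# scheduler that maps each graph node to the best available target for which
# a compiled variant exists, falling back to the CPU.

AVAILABLE = {"cpu", "avx"}          # devices present on this machine (no GPU)

# Per-node variants produced by the code generators, listed best-first.
VARIANTS = {
    "conv":   ["gpu", "avx", "cpu"],
    "reduce": ["avx", "cpu"],
    "io":     ["cpu"],
}

def schedule(node):
    """Pick the first preferred device that is actually available."""
    for device in VARIANTS[node]:
        if device in AVAILABLE:
            return device
    return "cpu"

mapping = {n: schedule(n) for n in VARIANTS}
print(mapping)  # {'conv': 'avx', 'reduce': 'avx', 'io': 'cpu'}
```

Because every node keeps compiled variants for several targets, the same binary can adapt to whatever mix of devices the machine actually has, which is the portability property the article attributes to the virtual ISA.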

HPVM is reported to achieve a significant increase in performance: the code produced by its translators is comparable to hand-written OpenCL code for GPUs and vector computing devices.

Compared to the first preview release, HPVM 1.0 adds support for linear-algebra tensor operations, frontends for PyTorch and Keras, approximations of convolution operators, and an approximation tuning framework that automatically selects among approximations for particular tensor operations and picks a configuration that delivers optimal performance.
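The approximation-tuning idea can be illustrated with a small sketch. This is not HPVM's actual tuner: the candidate approximations, their cost and error numbers, and the greedy policy are all made up for illustration. The tuner's job, as the article describes it, is to pick per-operation approximations that stay within an accuracy budget while minimizing cost.

```python
# Illustrative sketch (NOT HPVM's actual tuner; all numbers are invented):
# for each tensor op, choose the cheapest approximation whose estimated
# error stays under a target error budget.

# Candidate variants per op: (name, relative cost, relative error).
CANDIDATES = {
    "conv1": [("fp32", 1.00, 0.000), ("fp16", 0.55, 0.004), ("perforated", 0.35, 0.020)],
    "conv2": [("fp32", 1.00, 0.000), ("fp16", 0.55, 0.003)],
}

def tune(error_budget):
    """Greedily choose the lowest-cost variant per op within the error budget."""
    config = {}
    for op, variants in CANDIDATES.items():
        feasible = [v for v in variants if v[2] <= error_budget]
        config[op] = min(feasible, key=lambda v: v[1])[0]  # cheapest feasible
    return config

print(tune(0.005))  # {'conv1': 'fp16', 'conv2': 'fp16'}
print(tune(0.05))   # {'conv1': 'perforated', 'conv2': 'fp16'}
```

Note how a looser error budget lets the tuner pick a more aggressive (cheaper) approximation for `conv1`, which is the performance/accuracy trade-off such a framework automates.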

Source: opennet.ru
