Google has unveiled Coral NPU (Neural Processing Unit), an open-source platform that provides an open hardware accelerator for machine learning models together with software tools for using it with standard AI engines. Coral can serve as a foundation for energy-efficient systems-on-chip (SoCs) aimed at IoT systems, edge computing, and sensor data collection boards, as well as ultra-low-power consumer wearables such as headphones, augmented reality glasses, and smartwatches. The project is released under the Apache 2.0 license.
The Coral NPU is designed to run always-on AI applications on portable devices with minimal power consumption. The base Coral NPU implementation delivers 512 GOPS (512 billion operations per second) while consuming just a few milliwatts of power. The architecture is intended to be modified to fit the needs of individual SoC manufacturers. Synaptics will be the first manufacturer to produce chips based on the Coral NPU: it has announced the Astra SL2610 series of processors for IoT devices, which includes the Torq NPU subsystem based on the Coral NPU architecture.
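
To put the 512 GOPS figure in perspective, the following is a rough back-of-the-envelope sketch. It assumes the common convention that one multiply-accumulate counts as two operations, and it uses a hypothetical model size and a hypothetical 5 mW power draw as a stand-in for "a few milliwatts"; none of these values come from Google's announcement, and real workloads rarely sustain peak throughput.

```python
# Peak-throughput arithmetic for a 512 GOPS accelerator (illustrative only).
PEAK_OPS_PER_SECOND = 512e9  # 512 GOPS, the figure quoted for the base design


def peak_macs_per_second(ops_per_second=PEAK_OPS_PER_SECOND):
    """MAC rate, ASSUMING one multiply-accumulate is counted as two operations."""
    return ops_per_second / 2


def min_inference_time_s(model_macs, ops_per_second=PEAK_OPS_PER_SECOND):
    """Lower bound on latency if the hardware ran at peak rate the whole time."""
    return model_macs / peak_macs_per_second(ops_per_second)


def ops_per_joule(power_watts, ops_per_second=PEAK_OPS_PER_SECOND):
    """Energy efficiency implied by a given (assumed) power draw."""
    return ops_per_second / power_watts


hypothetical_model_macs = 10e6  # hypothetical: a small always-on model
hypothetical_power_w = 5e-3     # hypothetical: 5 mW, not a published number

print("best-case latency (s):", min_inference_time_s(hypothetical_model_macs))
print("ops per joule:", ops_per_joule(hypothetical_power_w))
```

The result is a best-case bound only: memory traffic and scalar control code would make real inference slower than this estimate.
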
Typical applications of the Coral NPU include AI-based image and audio processing, user interaction, and context awareness. For example, devices can run large language models and applications for facial and object recognition, visual search, speech recognition, live translation, speech transcription, keyword extraction, and gesture and voice control, as well as detection of user activity (walking, running, sleeping) and environment (indoors, outdoors).
The NPU uses the 32-bit RISC-V RV32IMF_Zve32x instruction set architecture, an AXI4 bus, and a four-stage instruction pipeline with in-order dispatch, out-of-order completion, four-way scalar dispatch, and two-way vector dispatch. The processor supports SIMD operations on 128-bit vectors and is equipped with 8 KB of instruction memory and 32 KB of data memory.
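
As a small illustration of what a 128-bit SIMD width means in practice, the sketch below counts how many integer elements fit in one vector for each element width that Zve32x permits (8, 16, and 32 bits). It assumes that the 128-bit figure is the vector register length and ignores register grouping (LMUL); the function names are hypothetical and are not part of any Coral NPU tool.

```python
VECTOR_BITS = 128                  # SIMD width stated for the base design
ZVE32X_ELEMENT_BITS = (8, 16, 32)  # integer element widths allowed by Zve32x


def lanes_per_vector(element_bits, vector_bits=VECTOR_BITS):
    """Number of elements processed side by side in one vector register."""
    if element_bits not in ZVE32X_ELEMENT_BITS:
        raise ValueError(f"unsupported element width: {element_bits}")
    return vector_bits // element_bits


def vector_steps_needed(num_elements, element_bits):
    """Vector-wide steps needed to cover an array (ceiling division)."""
    lanes = lanes_per_vector(element_bits)
    return -(-num_elements // lanes)


for bits in ZVE32X_ELEMENT_BITS:
    print(f"int{bits}: {lanes_per_vector(bits)} lanes per 128-bit vector")
```

Narrower elements give more lanes per vector, which is one reason quantized int8 models suit this class of hardware.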

The NPU consists of three processing components that work together:
- A scalar core: a lightweight, C-programmable RISC-V frontend that manages the flow of data to the back-end cores. It uses a simple run-to-completion model, which keeps power consumption very low while still providing traditional CPU functionality.
- A vector SIMD coprocessor that supports the RISC-V vector extension (RVV v1.0) and executes the same operation on many data elements at once.
- A matrix coprocessor that efficiently performs multiply-accumulate (MAC) operations and is designed to accelerate the basic operations of neural networks.
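
The multiply-accumulate operation that the matrix coprocessor accelerates can be sketched in plain Python as follows. This is only a functional illustration, assuming int8 inputs and a wider accumulator as is typical for quantized inference; it says nothing about how the Coral NPU hardware schedules the work, and the function names are hypothetical.

```python
INT8_MIN, INT8_MAX = -128, 127


def mac(acc, a, b):
    """One multiply-accumulate step: acc + a * b."""
    return acc + a * b


def matvec_int8(weights, activations):
    """Matrix-vector product built entirely from MAC steps.

    Python integers do not overflow, so the accumulator here is effectively
    unbounded; hardware would typically use a fixed-width (e.g. 32-bit) one.
    """
    for value in activations:
        if not INT8_MIN <= value <= INT8_MAX:
            raise ValueError("activation outside int8 range")
    outputs = []
    for row in weights:
        if len(row) != len(activations):
            raise ValueError("row length does not match activation length")
        acc = 0
        for w, x in zip(row, activations):
            if not INT8_MIN <= w <= INT8_MAX:
                raise ValueError("weight outside int8 range")
            acc = mac(acc, w, x)
        outputs.append(acc)
    return outputs


print(matvec_int8([[1, 2], [3, 4]], [5, 6]))
```

A fully connected layer with N inputs and M outputs needs N × M such MAC steps, which is why dedicating hardware to this one operation pays off for neural networks.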

For application developers, the project provides AI model toolchains (IREE and TFLM, TensorFlow Lite Micro), a C compiler, and a simulator. Models built with the TensorFlow, JAX, and PyTorch frameworks can be compiled for the NPU. A model is first compiled into a common intermediate representation, which LLVM then lowers to the RISC-V instructions supported by the Coral NPU.

Source: opennet.ru
