Along with the new processor core, ARM has announced the Mali-G77 GPU, claiming a significant increase in graphics performance: 40% over the current-generation Mali-G76. The gain comes from both process technology and architectural improvements. Mali-G77 can be configured with 7 to 16 cores (scaling from 1 to 32 may become possible in the future), each of them almost the same size as a G76 core. High-end smartphones are therefore likely to ship with the same number of GPU cores as before.
In games, performance improvements of 20% to 40% can be expected, depending on the type of graphics workload. Judging by results in the popular GFXBench Manhattan test, the new GPU's lead over the current generation is large enough to force rival Qualcomm to significantly improve Adreno graphics performance.
According to ARM, the new Mali-G77 architecture on its own delivers an average 30 percent improvement in either power efficiency or performance. Valhall, ARM's second-generation scalar architecture, lets the GPU execute 16 instructions per cycle in parallel per CU, compared with eight in Bifrost (Mali-G76). Other changes include fully hardware-driven dynamic instruction scheduling and a completely new instruction set that maintains backwards compatibility with Bifrost. Support has also been added for the ARM AFBC 1.3 compression format, along with further features (FP16 render targets, layered rendering and vertex shader outputs).
A Bifrost CU contained three execution engines, each with its own instruction cache, register file, and warp control unit. Across these three engines, 24 FMA instructions could be executed per clock at 32-bit floating-point precision (FP32). In Valhall, each CU has a single execution engine split into two compute modules, each able to process 16 warp instructions per clock, for a total throughput of 32 FP32 FMA instructions per CU. Thanks to these architectural changes, the Mali-G77 can perform up to a third more arithmetic in parallel than the Mali-G76.
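The "up to a third" figure follows directly from the per-CU numbers above, as this minimal sketch shows. It assumes each Bifrost engine is 8 lanes wide (inferred from 24 FMA spread over three engines); the function and variable names are illustrative only and do not come from ARM.

```python
from fractions import Fraction

def fma_per_clock(units: int, lanes_per_unit: int) -> int:
    """FP32 FMA operations one CU can issue per clock."""
    return units * lanes_per_unit

# Bifrost (Mali-G76): 3 execution engines; 8 lanes each is inferred from 24 / 3.
bifrost = fma_per_clock(units=3, lanes_per_unit=8)
# Valhall (Mali-G77): 1 execution engine split into 2 compute modules, 16 lanes each.
valhall = fma_per_clock(units=2, lanes_per_unit=16)

# Relative gain in per-clock FMA throughput, kept as an exact fraction.
gain = Fraction(valhall - bifrost, bifrost)
print(bifrost, valhall, gain)  # 24 32 1/3
```

This is a per-clock, per-CU comparison only; it says nothing about clock speeds or how well real workloads fill the wider units.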
In addition, each CU contains two new math function blocks. The new conversion unit (CVT) handles basic integer, boolean, branch, and conversion instructions. The special function unit (SFU) accelerates integer multiplication, division, square root, logarithms, and other complex functions.
The standard FMA block supports several modes: 16 FP32 instructions per cycle, 32 FP16 instructions, or 64 INT8 dot-product operations. These optimizations can deliver performance improvements of up to 60% in machine learning applications.
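As a rough illustration, the sketch below scales these per-block figures up to a whole GPU. It assumes two FMA blocks per CU (one in each of the two compute modules described for Valhall above) and counts peak operations per clock only; no clock frequency is assumed, and all names are hypothetical.

```python
# Operations per clock for one FMA block, as quoted in the article.
OPS_PER_BLOCK = {"FP32": 16, "FP16": 32, "INT8_DOT": 64}
BLOCKS_PER_CU = 2  # assumption: one FMA block in each of the two compute modules

def ops_per_clock(cores: int, mode: str) -> int:
    """Peak operations per clock for a GPU with `cores` CUs in the given mode."""
    return cores * BLOCKS_PER_CU * OPS_PER_BLOCK[mode]

for cores in (7, 16):  # the configuration range given for Mali-G77
    print(cores, {mode: ops_per_clock(cores, mode) for mode in OPS_PER_BLOCK})
```

Under these assumptions a 16-core part peaks at 512 FP32 operations per clock, and each halving of precision doubles that figure.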
Another key change in the Mali-G77 is a doubling of texture engine performance: it now processes 4 bilinear texels per clock, up from 2, or 2 trilinear texels per clock, which enables faster FP16 and FP32 filtering.
ARM has made a number of other changes as well, and as a result the Mali-G77 and Valhall promise significant performance improvements for gaming and machine learning workloads. Importantly, power consumption and chip area remain at Bifrost levels, so mobile devices should gain higher peak performance without higher demands on power, heat dissipation, or size.
Source: 3dnews.ru