At the moment, if we have a PC, the only way to get a specialized unit for AI is to buy separate hardware, either an NVIDIA RTX family GPU or an FPGA mounted on a PCI Express card.
Intel GNA, a precedent
Intel currently has a built-in unit called GNA (Gaussian & Neural Accelerator) that can run some AI algorithms, but it does not work like a systolic array: GNA is a coprocessor with a SIMD configuration. Intel also sells FPGA-based solutions, and with its Intel Xe-HP GPUs it promises to integrate Tensor Core-style units.
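To make the SIMD versus systolic-array distinction concrete, here is a minimal Python model of a generic SIMD unit, not of GNA itself. A SIMD instruction applies one operation to every lane of a vector in lockstep, so a matrix multiplication has to be broken into many separate vector FMAs. The lane width `LANES = 8` and the function names are illustrative assumptions.

```python
# Hypothetical model of a generic SIMD unit: one instruction, many lanes.
# LANES = 8 is an arbitrary illustrative width, not GNA's real configuration.
from typing import List, Tuple

LANES = 8
Vector = List[float]


def simd_fma(acc: Vector, a: Vector, b: Vector) -> Vector:
    """One vector FMA: every lane computes acc[i] + a[i] * b[i] in lockstep."""
    assert len(acc) == len(a) == len(b) == LANES
    return [acc[i] + a[i] * b[i] for i in range(LANES)]


def broadcast(x: float) -> Vector:
    """Copy a scalar into every lane."""
    return [x] * LANES


def matmul_simd(A: List[List[float]], B: List[List[float]]) -> Tuple[List[Vector], int]:
    """C = A @ B, where B has exactly LANES columns.

    Each row of C is built from K broadcast-and-FMA steps, so the whole
    product needs M * K separate vector instructions.
    """
    M, K = len(A), len(B)
    C: List[Vector] = []
    instructions = 0
    for m in range(M):
        acc = broadcast(0.0)
        for k in range(K):
            # Scale row k of B by A[m][k] and add it to the running row of C.
            acc = simd_fma(acc, broadcast(A[m][k]), B[k])
            instructions += 1
        C.append(acc)
    return C, instructions
```

For a 2×3 matrix A and a 3×8 matrix B, this model issues 6 vector instructions, one per (row, k) pair. A matrix unit instead consumes whole blocks of both operands at once.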
The point, however, is to integrate this type of unit into the CPU itself, so that many more applications can benefit from it.
An answer to Apple’s M1
One of the advantages of Apple’s M1 is not so much that the ARM instruction set and register set are more energy efficient, but that its Neural Engine is extremely efficient for certain applications and functions.
These units have become a staple of the smartphone and tablet market because they allow very complex tasks to be performed in a short time and with very few resources, which has left PC processors lagging behind in this regard.
Just as SIMD units brought new x86 instructions with them, matrix or tensor units bring a new type of instruction, called AMX or Advanced Matrix Extensions, which will debut in Intel’s Sapphire Rapids Xeon architecture.
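One of the new AMX instructions, TDPBSSD, multiplies signed 8-bit integers and accumulates the results into 32-bit integers, processing groups of four bytes at a time. The sketch below is a plain-Python reference model of that arithmetic, using lists of lists in place of the tile registers. The names `tdpbssd` and `pack_b` are hypothetical helpers, not an Intel API, and int32 overflow is ignored.

```python
# Reference model of the arithmetic behind AMX's TDPBSSD instruction:
# signed int8 x signed int8 products, summed in groups of 4, accumulated
# into int32. Lists stand in for tiles; int32 overflow is ignored.
from typing import List

Tile = List[List[int]]


def tdpbssd(c: Tile, a: Tile, b: Tile) -> Tile:
    """Return C where C[m][n] += sum_k dot(A[m][4k:4k+4], B[k][4n:4n+4]).

    a: M rows x 4K int8 values
    b: K rows x 4N int8 values (each row packs 4 consecutive K-elements
       for each of the N output columns)
    c: M rows x N int32 accumulators
    """
    M, K, N = len(a), len(b), len(c[0])
    assert all(len(row) == 4 * K for row in a)
    assert all(len(row) == 4 * N for row in b)
    out = [row[:] for row in c]
    for m in range(M):
        for k in range(K):
            for n in range(N):
                # One 4-element dot product per (m, k, n).
                out[m][n] += sum(a[m][4 * k + i] * b[k][4 * n + i] for i in range(4))
    return out


def pack_b(b_plain: Tile) -> Tile:
    """Repack an ordinary (4K x N) matrix into the K x 4N layout used above."""
    rows, N = len(b_plain), len(b_plain[0])
    assert rows % 4 == 0
    return [
        [b_plain[4 * k + i][n] for n in range(N) for i in range(4)]
        for k in range(rows // 4)
    ]
```

With `pack_b`, `tdpbssd(C, A, pack_b(B))` equals C + A·B computed the usual way. The packing mirrors the fact that the instruction works on groups of four bytes rather than single elements.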
The extension adds two elements: on the one hand, a two-dimensional register file made up of registers called “tiles”, and on the other, a set of accelerators capable of operating on those tiles. These accelerators access memory coherently with the rest of the processor and can work interleaved and in parallel with other x86 threads.
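The shape of each tile is not fixed. Software describes it in a 64-byte configuration block that the LDTILECFG instruction loads. In the first palette there are 8 tiles, and each can be at most 16 rows of 64 bytes (1 KiB). The Python sketch below packs that block following the layout in Intel’s programming reference: palette ID, start row, 14 reserved bytes, sixteen 16-bit bytes-per-row fields, and sixteen 8-bit row counts. `make_tile_config` is a hypothetical helper that only builds the bytes and does not touch any hardware.

```python
# Sketch of the 64-byte tile configuration read by AMX's LDTILECFG
# (palette 1: 8 tiles, each up to 16 rows x 64 bytes = 1 KiB).
# make_tile_config is a hypothetical helper; it only builds the bytes.
import struct
from typing import List, Tuple

MAX_TILES = 8
MAX_ROWS = 16
MAX_COLSB = 64  # bytes per row


def make_tile_config(shapes: List[Tuple[int, int]]) -> bytes:
    """Pack (rows, bytes_per_row) for each tile into the 64-byte layout:
    palette_id, start_row, 14 reserved bytes, 16 x uint16 colsb, 16 x uint8 rows.
    """
    if len(shapes) > MAX_TILES:
        raise ValueError("palette 1 has only 8 tile registers")
    colsb = [0] * 16
    rows = [0] * 16
    for i, (r, c) in enumerate(shapes):
        if not (0 < r <= MAX_ROWS and 0 < c <= MAX_COLSB):
            raise ValueError(f"tile {i}: {r}x{c} bytes exceeds 16x64")
        rows[i], colsb[i] = r, c
    # '<' = little-endian, no padding; '14x' emits the reserved zero bytes.
    return struct.pack("<BB14x16H16B", 1, 0, *colsb, *rows)
```

With all 8 tiles at full size, the register file holds 8 × 16 × 64 = 8192 bytes, which is far larger than a set of conventional vector registers.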
The accelerator is called Tile Matrix Multiply, or TMUL. It is a systolic array, a mesh of ALUs capable of executing an FMA (fused multiply-add) instruction in a single cycle, and it uses as its registers the tiles described in the previous paragraph.
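To illustrate how a systolic array works in general, here is a cycle-level Python sketch of an output-stationary array. Each processing element (PE) keeps one element of C and performs one FMA per cycle. Values of A flow to the right, values of B flow downward, and the inputs are skewed so that matching pairs meet in the right PE. The grid size and this particular dataflow are assumptions for illustration; they are not a description of TMUL’s actual microarchitecture.

```python
# Cycle-level sketch of an output-stationary systolic array: a grid of PEs,
# each doing one FMA per cycle. Dataflow and sizes are illustrative
# assumptions, not TMUL's real design.
from typing import List, Tuple

Matrix = List[List[float]]


def systolic_matmul(A: Matrix, B: Matrix) -> Tuple[Matrix, int]:
    M, K, N = len(A), len(B), len(B[0])
    acc = [[0.0] * N for _ in range(M)]    # C[i][j] stays inside PE (i, j)
    a_reg = [[0.0] * N for _ in range(M)]  # A values moving right
    b_reg = [[0.0] * N for _ in range(M)]  # B values moving down
    cycles = M + N + K - 2                 # time for the last pair to meet
    for t in range(cycles):
        new_a = [[0.0] * N for _ in range(M)]
        new_b = [[0.0] * N for _ in range(M)]
        for i in range(M):
            for j in range(N):
                if j == 0:
                    k = t - i  # row i of A enters skewed by i cycles
                    new_a[i][j] = A[i][k] if 0 <= k < K else 0.0
                else:
                    new_a[i][j] = a_reg[i][j - 1]  # shift one PE to the right
                if i == 0:
                    k = t - j  # column j of B enters skewed by j cycles
                    new_b[i][j] = B[k][j] if 0 <= k < K else 0.0
                else:
                    new_b[i][j] = b_reg[i - 1][j]  # shift one PE down
        a_reg, b_reg = new_a, new_b
        for i in range(M):
            for j in range(N):
                # Every PE performs one FMA per cycle, all in parallel in hardware.
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, cycles
```

At cycle t, PE (i, j) holds A[i][k] and B[k][j] with k = t − i − j, so every product reaches the right accumulator. Operands move only between neighboring PEs, with no trip back to the register file, and that is where the efficiency of this kind of unit comes from.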
In the AMD patents, the TMUL unit is called a Data Parallel Cluster, and it sits in each of the processor’s cores. Although Intel will first implement it in Sapphire Rapids, there is no doubt we will see it in the rest of Intel’s processors in the future.