When you buy the latest graphics card, the first thing you see is marketing touting a huge number of cores or processors. But what if we told you that the nomenclature is wrong?
The trap manufacturers fall into is calling simple ALUs cores. NVIDIA, for example, calls its 32-bit floating-point ALUs CUDA cores, but strictly speaking we cannot call them cores or processors: they do not meet the basic requirements to be considered as such.
So what is a core or processor in a GPU?
A core or processor is an integrated circuit, or a part of one, that can execute the complete instruction cycle on its own: fetching instructions from memory, decoding them and executing them.

An ALU is just an execution unit, so it needs a control unit alongside it to form a complete core. And what do we consider a complete core in a GPU? What NVIDIA calls an SM, Intel a Sub-Slice and AMD a Compute Unit.

The reason is that the whole instruction cycle takes place in these units, not in the ALUs or execution units, which handle only one part of it.
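To see the difference in practice, here is a minimal sketch using the CUDA runtime API. It reads the number of SMs, the units that actually fetch, decode and schedule instructions, and contrasts it with the marketed "CUDA core" figure, which is simply SMs multiplied by the FP32 ALUs per SM. The 128 ALUs-per-SM value is only an assumption for illustration, since it varies by architecture.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // properties of GPU 0

    // multiProcessorCount is the number of SMs: the real cores,
    // each with its own instruction fetch, decode and scheduling.
    printf("SMs (the actual cores): %d\n", prop.multiProcessorCount);

    // The marketing figure counts FP32 ALUs, not cores. ALUs per SM
    // depend on the architecture; 128 is assumed here for illustration.
    const int assumedAlusPerSm = 128;
    printf("Marketed \"CUDA cores\" (approx.): %d\n",
           prop.multiProcessorCount * assumedAlusPerSm);
    return 0;
}
```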
GPUs do not “run” programs
Keep in mind that GPUs do not run programs as we know them, that is, a single long sequence of instructions. The exception is shader programs, which run on what are actually the GPU's cores.

Shader programs manipulate data sets or graphics primitives at different stages of the pipeline, but at the hardware level they are presented in the form of kernels.

Kernels, not to be confused with operating system kernels, are self-contained data-plus-instruction packages, also known as execution threads in the context of a GPU.
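As a rough sketch of what such a kernel looks like from the programmer's side, here is a minimal CUDA example; the function and buffer names are ours, chosen only for illustration. The kernel is a tiny, self-contained piece of code, and the GPU launches thousands of lightweight threads, each applying it to one element of the data.

```cuda
#include <cuda_runtime.h>

// A kernel: a small, self-contained instruction sequence. Each GPU thread
// executes it on exactly one element of the data.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
    if (i < n)
        data[i] *= factor;  // barely an instruction's worth of real work
}

int main() {
    const int n = 1 << 20;  // one million elements
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Launch roughly one million threads; the SMs fetch, decode and
    // schedule them, the ALUs only execute.
    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```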
How is a GPU core different from a CPU core?
The main difference is that CPUs are designed primarily for instruction-level parallelism (ILP), while GPUs specialize in parallelism at the thread level (TLP).

Instruction-level parallelism seeks to reduce a program's execution time by executing several of its instructions simultaneously. Cores built around thread-level parallelism instead take many threads at the same time and run them in parallel.

Contemporary CPUs combine ILP and TLP in their architectures, while GPUs stay purely TLP, without any kind of ILP, in order to simplify the control unit and fit as many cores as possible on the chip.
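As a minimal illustration of that contrast, assuming a simple element-wise addition as the workload: the CPU version below is a single thread whose speed depends on the core overlapping independent iterations internally, while the GPU version spreads the same work across thousands of tiny threads.

```cuda
#include <cuda_runtime.h>
#include <vector>

// CPU style (ILP): one thread walks the whole array; performance comes from
// the core overlapping independent iterations internally (pipelining,
// superscalar and out-of-order execution).
void add_cpu(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// GPU style (TLP): each thread does almost nothing; performance comes from
// running thousands of these lightweight threads at once across the SMs.
__global__ void add_gpu(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);
    add_cpu(a.data(), b.data(), c.data(), n);          // one busy thread

    float *da, *db, *dc;
    cudaMalloc(&da, n * sizeof(float));
    cudaMalloc(&db, n * sizeof(float));
    cudaMalloc(&dc, n * sizeof(float));
    cudaMemcpy(da, a.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    add_gpu<<<(n + 255) / 256, 256>>>(da, db, dc, n);  // a million tiny threads
    cudaDeviceSynchronize();

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```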
Running on a GPU or running on a CPU
Most of the time, when a thread reaches the GPU's ALUs it already carries both its instruction and its data. But sometimes the data has to be fetched from the caches or from memory, and to avoid stalling, the GPU's core scheduler does what is known as round-robin: it sets that thread aside and runs it later, once its data has arrived, while other threads execute in the meantime.

A CPU cannot do this, because its threads are very complex instruction streams with many dependencies between them. In a GPU there is no such problem: its execution threads are extremely small and self-contained within their kernels, often lasting a single instruction.

In reality, what GPUs do is group threads into what is called a wave (a warp in NVIDIA terminology, a wavefront in AMD's), assigning each wave to a set of ALUs in the GPU, where the waves are queued up and executed in order. Each core has a limit on the number of threads or waves it can hold, which keeps it busy for a while until it needs a new list, and prevents the huge number of cores from constantly flooding memory with requests.
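Here is a rough sketch of how this latency hiding looks in practice, under our own assumption of a deliberately trivial kernel: each wave that stalls on a memory load is parked by the SM's scheduler, which switches to another resident wave, so launching far more waves than the GPU has cores is exactly what keeps the ALUs busy.

```cuda
#include <cuda_runtime.h>

// Each thread loads one value from global memory and does a little
// arithmetic with it. The load can take hundreds of cycles; while one
// wave (a warp of 32 threads on NVIDIA hardware) waits for its data,
// the SM's scheduler round-robins to another resident wave, so the
// ALUs stay busy as long as enough waves are in flight.
__global__ void scale_half(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = in[i];     // global memory load: long latency
        out[i] = v * 0.5f;   // cheap arithmetic once the data arrives
    }
}

int main() {
    const int n = 1 << 22;
    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    // Deliberately launch far more waves than there are SMs: the surplus
    // is what gives each core something to switch to while other waves
    // wait on memory.
    scale_half<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```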