Throughout the history of processor architectures, we have seen different concepts implemented to increase performance: pipelining, superscalar execution, out-of-order execution, and so on. All of them have delivered faster and faster processors with higher performance per clock cycle.
The concept of hybrid cores is a further step toward higher performance. It is based on combining, within a single core, two types of core: one optimized for complex instructions and the other for simpler ones, sharing their common hardware and working together as if they were a single processor core.
The concept of hybrid cores to increase the IPC
In a processor, not all instructions are equally complex: some require many clock cycles to complete, while others need very few because they are much simpler. In the design of new processors, the tendency so far has been to optimize the most complex instructions in terms of cycle count.
But whatever type of instruction our CPU cores execute, all of them pass through the same components during the instruction cycle, which means that the simplest instructions cannot be optimized for power consumption. A core built around them would deliver the same performance in a binary-compatible processor, but with lower consumption.
The idea boils down to a CPU having two types of execution units, some optimized to execute the most complex instructions and others for the simplest, which makes it possible to optimize the power consumption of the different instructions.
An idea from the world of GPUs
In GPUs we already have two different types of ALUs. On the one hand, there are the SIMD units, such as CUDA cores, which manufacturers usually promote when quoting TFLOPS figures; these units are responsible for executing extremely simple instructions. On the other hand, there are the SFUs, ALUs with a lower computation rate, since they are optimized for more complex instructions.
Well, an SFU consumes far more power to execute a single instruction than a SIMD unit, hence the separation made years ago in NVIDIA and AMD GPUs. When the compute control unit or scheduler detects an instruction that the SFUs can execute, it simply copies that instruction and sends it directly to a free SFU for execution.
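The routing decision described above can be sketched in a few lines. Everything here is illustrative: the opcode names, the split between unit types, and the `dispatch` function are assumptions for the example, not taken from any real NVIDIA or AMD ISA.

```python
# Hypothetical sketch of a GPU scheduler steering instructions either to
# the wide SIMD units (simple arithmetic) or to the scarcer SFUs
# (transcendental / complex operations). All names are invented.
SIMD_OPS = {"add", "mul", "fma"}
SFU_OPS = {"sqrt", "sin", "cos", "rcp"}

def dispatch(opcode):
    """Return which unit type would execute this instruction."""
    if opcode in SFU_OPS:
        return "SFU"    # copied to a free SFU, leaving the SIMD lanes free
    if opcode in SIMD_OPS:
        return "SIMD"
    raise ValueError(f"unknown opcode: {opcode}")

program = ["mul", "sqrt", "add", "sin", "fma"]
print([dispatch(op) for op in program])
```

The point of the split is visible in the output: the two `sqrt`/`sin` instructions go to SFUs while the SIMD lanes keep working on the simple arithmetic.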
Implementation of hybrid cores to increase the IPC
The concept in a CPU is no different. The instruction fetch stage would be almost the same in both types of core, so both would share the program counter that points to the next instruction. It is at the end of the fetch stage, when the instruction register is read, that the instruction would be sent to one type of core or the other for execution.
This means that the two cores would in fact be like conjoined twins, sharing part of the hardware because they share one of the stages of the instruction cycle. But since instructions are decoded and executed in the separate parts of the two cores, not only does the IPC increase in the sense of simultaneous instructions per clock cycle, it also prevents certain instructions from conflicting over the same resources.
Another thing this change affects is how interrupts are handled: requests made by devices that stop code execution. The core optimized for simple instructions can handle them without the other one having to stop.
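The shared-fetch, separate-decode arrangement can be modeled as a toy loop. The classification of opcodes as "simple" or "complex" is purely an assumption for the sketch, as is the single shared program counter feeding both paths.

```python
# Toy model of a hybrid core: one shared fetch stage (one program
# counter), two separate decode/execute paths. Which opcodes count as
# "complex" is invented for the example.
COMPLEX_OPS = {"div", "sqrt"}

def run(program):
    pc = 0                                   # shared program counter
    executed = {"simple": [], "complex": []}
    while pc < len(program):
        inst = program[pc]                   # shared fetch stage
        pc += 1
        # After fetch, the instruction is steered to one of the two
        # sub-cores, each with its own decoder and execution units.
        if inst in COMPLEX_OPS:
            executed["complex"].append(inst)
        else:
            executed["simple"].append(inst)
    return executed

print(run(["add", "div", "sub", "sqrt", "mov"]))
# → {'simple': ['add', 'sub', 'mov'], 'complex': ['div', 'sqrt']}
```

Because each path has its own decoder, the two instruction classes never queue behind one another after the fetch stage.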
Its effects on the CPU pipeline
We need to understand that nowadays all processors are pipelined into multiple stages, so if instruction n is at a given stage, then instruction n + 1 will be at the previous stage and instruction n - 1 at the next one.
The reciprocal of time is frequency (1 / time = frequency), so the trick to increasing clock speed is to reduce the duration of each stage. What is generally done is to increase the number of stages, with the aim that each new stage takes less time and the clock frequency can be higher.
Obviously, subdividing a complex instruction into a larger number of pipeline stages is ideal for achieving high clock speeds. But what about the simplest instructions? It is a headache for architects to split instructions that are already as simple as today's into even smaller steps.
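The frequency relationship above can be checked with back-of-the-envelope numbers. The stage latencies here are made up for illustration; the only real rule is that the clock is limited by the slowest stage.

```python
# Why deeper pipelines allow higher clocks: the clock period must cover
# the longest stage, so f = 1 / t_max. Latencies below are invented.
def clock_ghz(stage_latencies_ns):
    """Clock frequency in GHz, limited by the slowest pipeline stage."""
    return 1.0 / max(stage_latencies_ns)   # 1 ns period -> 1 GHz

shallow = [0.50] * 5    # 5 stages of 0.5 ns each
deep = [0.25] * 10      # same total work split into 10 shorter stages

print(clock_ghz(shallow))  # → 2.0 (GHz)
print(clock_ghz(deep))     # → 4.0 (GHz)
```

Halving each stage's latency doubles the attainable clock, which is exactly the motivation for subdividing stages.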
Differences between hybrid cores and big.LITTLE
In a big.LITTLE processor, the "big" cores are separated from the "LITTLE" cores in the sense that they operate in a switched manner with respect to each other: it is the application that makes a request to the operating system so that one group of cores or the other is activated.
The way these cores operate is that when they receive a specific interrupt, they finish their current work and hand over to the other type. This happens when the workload on the system is very high or when certain conditions are met. In any case, keep in mind that in the big.LITTLE approach each set of cores is complete and fully independent.
In the hybrid core concept, we do not have totally separate cores; rather, they share the fetch stage as well as access to the cache hierarchy and memory. In addition, one does not switch off while the other is working, precisely because they share the memory access hardware. Nor can we forget that big.LITTLE does not increase the IPC of the cores.
Why do hybrid cores increase the IPC of processors?
The reason is simple: having a larger number of threads, together with the fact that the hardware in the decode stage is not shared, is what avoids what is called a conflict. A conflict occurs when two or more instructions compete for a single resource, so one has to wait for the other to finish.
Why are processors not designed without this problem? Such a design is possible, but the transistor budget is limited, which is why architects cheat by sharing some hardware along the way. Most minor architecture updates are generally based on avoiding this type of conflict by adding more internal paths so that it does not occur.
IPC as a marketing term is no longer the number of simultaneous instructions that a CPU core can resolve under the best conditions; the term is now based on taking a benchmark and finding the average number of instructions per cycle the processor issues. This is why avoiding conflicts between instructions is so important, and why hybrid cores, with decode and execution stages separated by core type, are ideal for increasing the IPC.
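The cost of such a conflict can be shown with a schematic throughput calculation. The numbers and the one-instruction-per-decoder-per-cycle assumption are invented for the sketch; the point is only the ratio between shared and private resources.

```python
# Schematic cost of a structural conflict: two instruction streams
# competing for one shared decoder versus each having its own.
def cycles_needed(n_instructions, n_decoders):
    """Cycles to decode n instructions, assuming one per decoder per cycle."""
    # Ceiling division: a partially filled last cycle still costs a cycle.
    return -(-n_instructions // n_decoders)

# Two threads issuing 8 instructions each (16 total):
shared = cycles_needed(16, n_decoders=1)    # both wait on one decoder
private = cycles_needed(16, n_decoders=2)   # one decoder per sub-core
print(shared, private)  # → 16 8
```

With a private decoder per sub-core, neither stream ever stalls waiting for the other, which is the no-conflict property the text describes.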
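The measured definition of IPC described above is just a ratio. The counter values below are invented for illustration; in practice they would come from hardware performance counters over a benchmark run.

```python
# IPC as measured in practice: retired instructions divided by elapsed
# cycles over a whole benchmark, not the peak issue width.
def ipc(instructions_retired, cycles):
    """Average instructions per cycle across a workload."""
    return instructions_retired / cycles

# A core that can issue 4 instructions per cycle at peak may still
# average far less once stalls and conflicts are counted:
print(ipc(instructions_retired=2_400_000, cycles=1_000_000))  # → 2.4
```

Reducing conflicts raises the measured average toward the peak issue width, which is the improvement hybrid cores aim for.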
Which current processor uses hybrid cores to increase the IPC?
The straightforward answer is a resounding none: no processor currently on the market or coming out in the near future uses hybrid cores. They are instead based on the big.LITTLE concept, in which one type of core or the other is used depending on the situation, as will happen in particular with Intel Gen 12, due out in a few months.
The one we know, through clues in various patents published last year, will opt for a hybrid core approach is AMD, although we do not know whether for Zen 4 or Zen 5. That does not mean that Intel, or even other processor designers such as Apple, are not already working on such solutions.
The cause? IPC cannot be raised forever and is becoming increasingly difficult to improve, hence the need for techniques such as hybrid cores to increase it.