Why has this new standard been so difficult to develop? Why did it take so long overall, and yet so little time after version 5.0? The leap is enormous: we are currently on PCIe 4.0, and PCIe 6.0 quadruples its speed over the same x16 link width. Logically, PCI-SIG had to introduce a series of changes to guarantee that data is delivered intact, including PAM4 and FEC. But what is the latter, and how does it work on this particular bus?
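To put the jump in numbers, here is a rough calculation of raw per-direction bandwidth for an x16 link. It assumes the standard transfer rates of 16, 32 and 64 GT/s for PCIe 4.0, 5.0 and 6.0, and it deliberately ignores encoding and flit overhead, so real usable throughput is somewhat lower.

```python
# Raw per-direction bandwidth of a PCIe link, ignoring line-encoding and
# flit/packet overhead (an intentional simplification for this comparison).
GT_PER_S = {"PCIe 4.0": 16, "PCIe 5.0": 32, "PCIe 6.0": 64}

def raw_gb_per_s(gt_per_s: float, lanes: int = 16) -> float:
    """Each transfer moves one bit per lane; divide by 8 to get bytes."""
    return gt_per_s * lanes / 8

for gen, rate in GT_PER_S.items():
    print(f"{gen} x16: {raw_gb_per_s(rate):.0f} GB/s per direction")

speedup = raw_gb_per_s(GT_PER_S["PCIe 6.0"]) / raw_gb_per_s(GT_PER_S["PCIe 4.0"])
print(f"PCIe 6.0 vs PCIe 4.0: {speedup:.0f}x")
```

This prints 32, 64 and 128 GB/s respectively, and a 4x ratio between PCIe 6.0 and PCIe 4.0.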
Forward error correction (FEC), a technology required for PCIe 6.0
We have already covered PAM4, but FEC cannot be understood without it. PAM4 has been part of network engineers' toolkit for several years: in large data centers it has been, along with other technologies, the holy grail for getting more out of existing infrastructure or upgrading it.
But it doesn't stop there. PAM4 was brought to the PCIe bus for modulation reasons, namely to carry more bandwidth for every available hertz, since each symbol encodes two bits instead of one. Despite its advantages, it also has drawbacks that must be mitigated, chief among them a more fragile signal. That is the real reason PCI-SIG included so-called forward error correction, or FEC.
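A minimal sketch of what PAM4 does: two bits are packed into each symbol, which takes one of four amplitude levels, so the same symbol rate carries twice the data. It assumes a Gray-coded mapping, and the level values and helper names (`pam4_encode`, `pam4_decode`) are illustrative, not taken from the PCIe specification. Because the four levels sit closer together than NRZ's two, a little noise is enough to push a symbol into a neighbouring level, which is the fragility FEC has to compensate for.

```python
# Gray-coded PAM4 (assumed mapping): two bits per symbol, four levels.
# Precoding and the exact PCIe 6.0 electrical details are not modeled.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}
LEVEL_TO_BITS = {level: pair for pair, level in GRAY_PAM4.items()}

def pam4_encode(bits: list[int]) -> list[int]:
    """Map each pair of bits to one of four amplitude levels."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def pam4_decode(levels: list[int]) -> list[int]:
    """Map each received level back to its pair of bits."""
    out: list[int] = []
    for level in levels:
        out.extend(LEVEL_TO_BITS[level])
    return out

bits = [1, 0, 0, 1, 1, 1, 0, 0]
symbols = pam4_encode(bits)  # 4 symbols carry 8 bits
print(symbols)               # [3, -1, 1, -3]

# Noise nudges one symbol down to the adjacent level (+1 becomes -1).
noisy = symbols.copy()
noisy[2] = -1
decoded = pam4_decode(noisy)
flipped = sum(a != b for a, b in zip(bits, decoded))
print(flipped)               # 1
```

With Gray coding, a slip to an adjacent level flips only one bit, so each such noise event stays a single bit error rather than two.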
As the name suggests, FEC is a way of correcting errors in a signal sent and received between links or hosts: the transmitter adds redundant information so that the receiver can repair errors on its own, keeping a constant flow of data without having to request a retransmission.
The result is that a signal whose data integrity would otherwise be at risk becomes a stable one with far fewer errors, which ensures that the equipment and its components work properly.
The problem with this technology is its high latency
But not all that glitters is gold. On its own, a conventional FEC is not suited to a bus like PCIe, let alone PCIe 6.0 running at 64 GT/s per lane (about 128 GB/s per direction on an x16 link).
The problem with FEC is that it adds latency on the bus, which slows packet delivery and can cause unwanted delays. This is why PCIe 6.0 takes a particular approach to keep latency low: it targets a first bit error rate (FBER) of 10^-6 and pairs it with a lightweight, low-latency FEC that performs the initial correction.
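To get a feel for what an FBER of 10^-6 means, the arithmetic below multiplies it by the 64 GT/s lane rate. The last figure is a deliberately crude, hypothetical scenario: no FEC at all, and every error event costing one replay of the up-to-100 ns round trip the article cites for a NAK. It is not a model of real PCIe link-layer behaviour, only an order-of-magnitude argument for correcting most errors in place.

```python
# Back-of-the-envelope: how often does a first-bit error occur at FBER = 1e-6?
FBER = 1e-6        # target first bit error rate from the text
LANE_RATE = 64e9   # PCIe 6.0: 64 GT/s per lane, i.e. bits per second per lane
LANES = 16
REPLAY_NS = 100    # NAK round trip quoted in the article (upper bound)

errors_per_s_lane = FBER * LANE_RATE
errors_per_s_link = errors_per_s_lane * LANES

# Hypothetical worst case: no FEC, and every error event forces one replay.
replay_time_fraction = errors_per_s_link * REPLAY_NS * 1e-9

print(f"{errors_per_s_lane:,.0f} error events/s per lane")
print(f"{errors_per_s_link:,.0f} error events/s on an x16 link")
print(f"~{replay_time_fraction:.0%} of each second spent waiting on replays")
```

Under those assumptions roughly a tenth of the link's time would go to replays, which is why the errors need to be fixed on the spot rather than retransmitted.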
FEC can correct errors, but to do so it must know the exact location and magnitude of each error. So why only a lightweight FEC? Very simple: the goal was to pay a latency penalty as close to zero as possible (zero itself is impossible) and then rely on a very robust CRC for detection, combined with a fast link-level replay to handle the errors the FEC could not correct. The lightweight FEC is not foolproof on its own, which is why the CRC is required.
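The idea of finding an error's location and magnitude can be shown with a toy code. This is not the code PCIe 6.0 actually uses; it is a simplified stand-in over the integers modulo the prime 257, with hypothetical `encode`/`decode` helpers. Two check symbols are appended so that both the plain sum and the position-weighted sum of the codeword are zero; on reception, the first syndrome gives the error's magnitude and the ratio of the two gives its position.

```python
from typing import Optional

# Toy single-symbol-error-correcting code modulo a prime (illustration only;
# NOT the PCIe 6.0 FEC). Check symbols may take the value 256, so this is not
# a byte-exact code either.
P = 257  # prime, so every non-zero value has a modular inverse

def encode(data: list[int]) -> list[int]:
    """Append two check symbols so sum(c) == 0 and sum(pos * c) == 0 (mod P).
    Positions are 1-based; the checks sit at positions k+1 and k+2."""
    k = len(data)
    if k + 2 >= P:
        raise ValueError("codeword too long for this modulus")
    a = sum(data) % P
    b = sum((i + 1) * d for i, d in enumerate(data)) % P
    y = ((k + 1) * a - b) % P  # check symbol at position k + 2
    x = (-a - y) % P           # check symbol at position k + 1
    return data + [x, y]

def decode(received: list[int]) -> Optional[list[int]]:
    """Return the corrected codeword, or None if the error is uncorrectable."""
    s0 = sum(received) % P                                     # magnitude
    s1 = sum((i + 1) * r for i, r in enumerate(received)) % P  # magnitude * position
    if s0 == 0 and s1 == 0:
        return list(received)      # no detectable error
    if s0 == 0:
        return None                # syndromes inconsistent with a single error
    pos = s1 * pow(s0, -1, P) % P  # location = s1 / s0
    if not 1 <= pos <= len(received):
        return None                # location out of range: give up
    fixed = list(received)
    fixed[pos - 1] = (fixed[pos - 1] - s0) % P  # subtract the magnitude
    return fixed

word = encode([10, 20, 30, 40])
print(word)  # [10, 20, 30, 40, 214, 200]

corrupted = word.copy()
corrupted[2] = (corrupted[2] + 99) % P  # one symbol hit by an error
print(decode(corrupted) == word)        # True: located and repaired

two_errors = word.copy()
two_errors[0] = (two_errors[0] + 5) % P
two_errors[3] = (two_errors[3] + 7) % P
print(decode(two_errors))               # None: detected, not corrected

bad = word.copy()
bad[0] = (bad[0] + 1) % P
bad[2] = (bad[2] + 1) % P
result = decode(bad)
print(result == word)                   # False: silently "corrected" wrongly
```

The last two cases show why a robust CRC stays in the loop: a two-symbol error is sometimes caught by the decoder and sometimes turned into different, wrong data that only a separate check can flag.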
On the other hand, if a PCIe 6.0 link drops below its top speed, the FEC can potentially be bypassed, which lowers system latency.
What happens if the FEC cannot correct an error? That is when the CRC steps in: it detects the corrupted data and a NAK is generated, triggering a replay whose round trip can add up to 100 ns of latency.
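The FEC, CRC and replay flow described above can be sketched as a hypothetical receive path. `receive_flit` and `no_fec` are illustrative names, `zlib.crc32` stands in for the CRC that PCIe 6.0 actually defines, and the 100 ns penalty is the upper bound the article quotes; real link-layer timing depends on the implementation.

```python
import zlib
from typing import Callable, Optional, Tuple

# Hypothetical receive path: FEC first, CRC second, replay as the fallback.
REPLAY_RTT_NS = 100  # upper bound quoted in the article for the NAK round trip

def receive_flit(
    payload: bytes,
    sent_crc: int,
    fec_decode: Callable[[bytes], Optional[bytes]],
) -> Tuple[str, Optional[bytes], int]:
    """Return (response, accepted data or None, extra latency in ns)."""
    corrected = fec_decode(payload)  # None means the FEC gave up
    if corrected is not None and zlib.crc32(corrected) == sent_crc:
        return "ACK", corrected, 0   # clean or fixed in place: no replay
    return "NAK", None, REPLAY_RTT_NS  # CRC caught it: request a replay

def no_fec(data: bytes) -> Optional[bytes]:
    """Pass-through decoder, e.g. when FEC is bypassed."""
    return data

data = b"flit payload"
crc = zlib.crc32(data)
print(receive_flit(data, crc, no_fec))             # ('ACK', b'flit payload', 0)
print(receive_flit(b"flit pAyload", crc, no_fec))  # ('NAK', None, 100)
```

Passing a different `fec_decode` function lets the same path model a link with FEC enabled or bypassed.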
Using FEC is clearly justified. It is not perfect, but combined with the CRC and link-level replay it is the best way to keep latency as low as possible while correcting errors, something absolutely necessary for traffic as sensitive as data moving between the CPU, memory and GPU.