One of the fundamental building blocks of every GPU is the texture unit, which is responsible for mapping images onto the triangles of each scene, a process we call texturing, and which also performs what we call texture interpolation.
How does texture interpolation work?
Texture interpolation is a step within texture mapping: it creates color gradients from one pixel to the next in order to hide the blocky look that rasterization would otherwise leave in the scene. The simplest method is the bilinear filter, or bilinear interpolation, which takes the 4 texels closest to each pixel and blends them in the interpolation calculation.
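The blend of the 4 nearest texels described above can be written down in a few lines. This is a minimal sketch in Python, not how the hardware is actually wired; the texture is assumed to be a plain 2D list of single-channel values and the coordinates are assumed to be in texel space:

```python
def bilinear_sample(texture, u, v):
    """Sample a texture at continuous coordinates (u, v) by
    bilinearly blending the 4 nearest texels."""
    h, w = len(texture), len(texture[0])
    # Integer coordinates of the top-left of the 2x2 texel neighborhood.
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    # Fractional position inside the texel cell: the blend weights.
    fx, fy = u - x0, v - y0
    # Blend horizontally along the two rows, then vertically.
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bot = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bot * fy
```

Sampling exactly between a black texel (0.0) and a white one (1.0) returns 0.5, the smooth gradient the filter exists to produce.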
Today we also use interpolation systems that are much more complex than the bilinear filter and that take a greater number of texels per sample. This is the case with anisotropic filtering at its different levels, which can use 8, 16 and even 32 samples per pixel in order to achieve better image quality.
However, despite the existence of more complex texture interpolation algorithms, the vast majority of graphics hardware is designed around the bilinear filter, which is the cheapest and easiest of all to implement at the hardware level and delivers results that today can be considered practically free in terms of computational cost.
Texture interpolation and data cache
When the texture unit runs the texturing process for a polygon, it looks up the color value stored in the texture in memory, applies it to the corresponding pixel, performs the corresponding calculations with the specified shader or shaders, and sends the end result to the ROPs to be written to the frame buffer.
In the middle of this process sits the interpolation or filtering of textures. Because it is a repetitive and recurring task, it is carried out by the texture unit itself, which is in charge of applying the various filters. To do this it requires high bandwidth, which grows with the number of samples needed per pixel: at least four texel reads per pixel just for the bilinear filter.
Texture units in today’s GPUs are grouped four by four in each shader unit or compute unit, which means 16 32-bit accesses are needed per clock cycle per shader unit. This is why the data cache of that same unit, and its bandwidth, are used to perform texture filtering.
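The per-clock traffic this implies is easy to tally. The sketch below works through the arithmetic; the 2 GHz clock at the end is purely an assumed figure for illustration, not a number from the article:

```python
# Per-clock texture traffic of one shader/compute unit, following the
# figures above: 4 texture units, each fetching 4 texels of 32 bits
# (one bilinear sample) per clock cycle.
TEXTURE_UNITS = 4
SAMPLES_PER_BILINEAR = 4
BYTES_PER_TEXEL = 4  # 32 bits, e.g. RGBA8888

accesses_per_clock = TEXTURE_UNITS * SAMPLES_PER_BILINEAR  # 16 accesses
bytes_per_clock = accesses_per_clock * BYTES_PER_TEXEL     # 64 bytes

# At an assumed (hypothetical) 2 GHz clock, the data cache of the
# shader unit must sustain this much bandwidth for texturing alone:
bandwidth_gbs = bytes_per_clock * 2e9 / 1e9  # GB/s per shader unit
```

This is why the filtering leans on the shader unit's own data cache: 64 bytes every cycle per unit is far more than a shared, distant cache could serve.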
At the same time, if a higher-precision texture filter is needed, then since each texture unit only takes 4 samples per clock cycle, a larger number of clock cycles is required to run the more complex texture interpolation algorithms, thereby reducing the texturing rate.
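The resulting slowdown follows directly from that 4-samples-per-cycle limit. A small sketch, assuming the unit simply loops until it has gathered all the samples a filter asks for:

```python
import math

def filter_cycles(samples_per_pixel, samples_per_cycle=4):
    """Clock cycles one texture unit needs for a filter that takes
    more samples than the 4 it can fetch per cycle."""
    return math.ceil(samples_per_pixel / samples_per_cycle)

# Bilinear (4 samples) fits in 1 cycle; a 16-sample anisotropic
# filter needs 4 cycles, cutting the texturing rate to a quarter,
# and a 32-sample filter needs 8 cycles.
```

Usage: `filter_cycles(16)` returns 4, which is why the texel fill rate drops as the filter quality rises.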
The cost of calculating texture interpolation
Today’s shaders can perform an enormous number of computations per clock cycle; it is no accident that they reach multiple TFLOPS, FLOPS being floating-point operations per second and the T corresponding to the prefix tera, which refers to 10^12 operations. The computing power of GPUs has therefore increased enormously.
Today, texture interpolation could be done without problems by a shader program inside the GPU, which in theory would make it possible to drop the texture units altogether. Why is it not done? Because to compensate for the loss of those texture units, the compute units that run the shaders would have to be made more powerful at a far greater cost. In other words, we would need to spend more transistors than we would have saved, so texture units remain inside the GPU more than twenty years after the first 3D cards, and they are not going away.
It must be taken into account that texture interpolation is performed for every texel processed in the scene, which is a huge amount, especially at high resolutions such as 1440p or 4K. Removing the texture units from the hardware is therefore not cost effective: any graphics hardware without them would have major performance problems, not to mention that all games and applications already take their existence for granted.
Current texture unit limitations
Once we have explained their usefulness, we need to take into account the limitations of the texture unit when it comes to interpolation. The usual texture format is RGBA8888, in which each component has 8-bit precision and can therefore take 256 values per color component.
This makes texture interpolation much easier to implement at the hardware level, because although each texture unit reads the full 32 bits of each texel from the data cache, each of the four components is processed separately rather than together.
The problem with this implementation? When the texture unit interpolates each of the 4 components, it does so using only 256 values, which, while making texture interpolation easier to implement in hardware, reduces the precision of the result: we do not obtain the ideal interpolation, but an approximation of it.
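The precision loss is easy to see by comparing an exact interpolation against one that, like the hardware, rounds each channel back to one of 256 levels. A minimal sketch under that assumption:

```python
def lerp_exact(a, b, t):
    """Ideal interpolation between two channel values."""
    return a + (b - a) * t

def lerp_8bit(a, b, t):
    """Interpolation as 8-bit-per-channel hardware would store it:
    the result is rounded to the nearest of 256 integer levels."""
    return round(a + (b - a) * t)

# A gradient between two adjacent 8-bit shades (100 and 101) in 8
# steps: the ideal result has 9 distinct values, but the stored,
# quantized result collapses to just the two original shades.
exact = [lerp_exact(100, 101, t / 8) for t in range(9)]
stored = [lerp_8bit(100, 101, t / 8) for t in range(9)]
```

Every intermediate shade snaps to 100 or 101, so the gradient the filter tried to build is not actually representable, which is exactly the approximation error described above.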
This lack of precision, combined with the use of few samples per pixel, means that the textures in games often show image artifacts that degrade the final quality of the scene. The best solution? Using much more complex interpolation methods such as bicubic interpolation, but this would push the hardware to much higher levels of complexity than today, as it would require four times the bandwidth to the data cache.