Next, we are going to explain the theory behind why graphics cards use special memories with high transfer speeds. Some of these concepts many of you will already know, while others will be unfamiliar, since they are usually not discussed in graphics card marketing.
Bandwidths between GPU and VRAM
The GPU uses different bandwidths to render a 3D scene, which we will list below:
- Color buffer (Bc): This is part of the so-called backbuffer, or back buffer, on which the GPU draws the scene. In it, each pixel has RGBA components; if deferred rendering is used, several of these buffers are generated to build the G-Buffer. In current APIs, GPUs support up to 8 such buffers at the same time.
- Depth buffer (Bz): Also known as the Z buffer, this is the buffer in which the position of the pixels of each object relative to the camera is stored, combined with the stencil buffer. Unlike the color buffer, it is not generated during the post-texturing phase, but in the previous one, rasterization.
- Texture buffer (Bt): GPUs use texture maps so large that they cannot fit in the on-chip caches and must be fetched from VRAM; this is a read-only operation. In addition, post-processing effects read the frame buffer as if it were a texture.
This is summarized in the following diagram:
Since VRAM memory chips are full-duplex and transmit both reads and writes at the same time, the bandwidth is the same in both directions. The stage of the graphics pipeline with the heaviest memory traffic is precisely texturing, so this is one of the first explanations of why GPUs require a large bandwidth.
As for the data used during the pre-rasterization stage, the calculation of the scene geometry, it is small enough that it neither results in a huge amount of memory used nor influences the type of memory chosen as VRAM.
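To put the three bandwidths above in perspective, here is a back-of-envelope estimate of the per-second traffic of each buffer. All resolution, format, and texture-read figures are illustrative assumptions, not numbers from any specific GPU:

```python
# Rough estimate of VRAM traffic per second for the three bandwidths
# described above (Bc, Bz, Bt). All figures are illustrative assumptions.

WIDTH, HEIGHT, FPS = 1920, 1080, 60
PIXELS_PER_FRAME = WIDTH * HEIGHT

BYTES_COLOR = 4      # RGBA8 color buffer (Bc)
BYTES_DEPTH = 4      # 32-bit depth/stencil (Bz)
BYTES_TEXTURE = 8    # assumed texture bytes read per pixel (Bt)

def gb_per_second(bytes_per_pixel: float) -> float:
    """Traffic in GB/s for one buffer at the chosen resolution and FPS."""
    return PIXELS_PER_FRAME * bytes_per_pixel * FPS / 1e9

bc = gb_per_second(BYTES_COLOR)
bz = gb_per_second(BYTES_DEPTH)
bt = gb_per_second(BYTES_TEXTURE)

print(f"Bc = {bc:.2f} GB/s, Bz = {bz:.2f} GB/s, Bt = {bt:.2f} GB/s")
print(f"Total = {bc + bz + bt:.2f} GB/s")
```

Note how even these conservative numbers ignore overdraw, which multiplies the color and depth traffic, which is why real GPUs need far more bandwidth than this naive sum suggests.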
The overdraw problem
The algorithm used to render a scene is rasterization, also called the z-buffer algorithm (not to be confused with the painter's algorithm, which instead sorts whole primitives by depth), which in its basic form has the following structure: for each primitive in the scene, and for each pixel covered by that primitive, keep the fragment closest to the camera and store its depth in the z-buffer.
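The basic structure described above can be sketched as follows. This is a toy CPU model, not GPU code; `covered_fragments()` is a hypothetical stand-in for real triangle rasterization:

```python
import math

def rasterize(primitives, width, height):
    """Basic z-buffer algorithm: for each primitive, for each covered
    pixel, keep the fragment closest to the camera."""
    z_buffer = [[math.inf] * width for _ in range(height)]
    color_buffer = [[None] * width for _ in range(height)]

    for prim in primitives:
        # Each primitive yields (x, y, z, color) fragments for the
        # pixels it covers; covered_fragments() is a stand-in for
        # actual triangle scan conversion.
        for x, y, z, color in prim.covered_fragments(width, height):
            if z < z_buffer[y][x]:          # closer to the camera?
                z_buffer[y][x] = z          # update stored depth
                color_buffer[y][x] = color  # overdraw: rewrite the pixel
    return color_buffer, z_buffer
```

The inner `if` is the depth test: whenever it passes for a pixel that was already written, the pixel is painted again, which is exactly the overdraw discussed next.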
This means that if several objects occupy the same position on the X and Y axes relative to the camera, but different positions on the Z axis, then the pixels of each of them are drawn into the final image buffer and end up being processed multiple times. This effect is called overdraw, due to the fact that the GPU paints and repaints pixels at the same position.
Now some of you are rightly thinking the following: if the depth buffer is generated before texturing, how come the hidden pixels are not removed at that point? Actually, there are techniques for this, but at that stage we know nothing about the color of each pixel, or about whether an object is semi-transparent. GPUs therefore cannot remove all the hidden pixels from a scene in which there is even a single transparent object, since its representation would be incorrect.
Intermediate sort vs last sort
Checking pixels one by one to see whether they are visible requires additional circuitry in the GPU and affects the rendering process. The idea behind a GPU is raw power without regard for other elements; if there is any optimization to be done, it is left to the hardware. That is why the check of whether a pixel belongs in the frame buffer is done at the end of the process, which is called Last Sort.
Whereas, if the objects are sorted during the rasterization phase, using the depth buffer as a reference, we call it Intermediate Sort, because it occurs right in the middle of the graphics pipeline.
The second technique avoids overdraw, but as we have seen, it runs into problems when a scene contains transparency. So which do current GPUs use? Well, both, since developers can choose between them. The difference is that with intermediate sorting there is no overdraw.
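The difference between the two approaches can be sketched with a toy fragment pipeline. Everything here is a simplified model under assumed names; the point is only to show why an occluded opaque fragment can be rejected before texturing, while a semi-transparent one cannot:

```python
def process_fragment(x, y, z, material, z_buffer, frame_buffer,
                     intermediate_sort=True):
    """Toy fragment pipeline. With intermediate sorting (an early depth
    test), an occluded OPAQUE fragment is discarded before texturing;
    semi-transparent fragments cannot take that shortcut, which is why
    transparency forces the slower last-sort behavior."""
    key = (x, y)
    if intermediate_sort and not material["transparent"]:
        if z >= z_buffer.get(key, float("inf")):
            return "rejected-before-texturing"   # texturing cost avoided

    color = material["color"]                    # stand-in for texturing

    if z >= z_buffer.get(key, float("inf")):
        return "rejected-after-texturing"        # last sort: wasted work
    if material["transparent"]:
        # Toy blend: pair the new color with what was already there.
        frame_buffer[key] = (frame_buffer.get(key), color)
        return "blended"                         # depth left unwritten
    z_buffer[key] = z
    frame_buffer[key] = color
    return "written"
```

With `intermediate_sort=True` a hidden opaque fragment costs nothing beyond the depth test; with `intermediate_sort=False` it is textured first and thrown away afterward, consuming texture bandwidth for nothing.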
Bandwidth and VRAM: overdraw
The logic behind overdraw is as follows: the first pixel at a position (x, y) will always be drawn into the frame buffer; the second at the same position has a 50% chance of having a smaller Z value and therefore of being written to the final buffer; the third has a 1/3 chance, the fourth 1/4, and so on.
This is called the harmonic series:
H(n) = 1 + 1/2 + 1/3 + 1/4 + … + 1/n
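This claim is easy to check with a small simulation: when n fragments land on the same (x, y) in random depth order, the frame buffer is rewritten each time a new "closest so far" depth arrives, and the average number of writes converges to H(n). A minimal sketch:

```python
import random

def harmonic(n: int) -> float:
    """H(n) = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def writes_for_random_order(n: int, rng: random.Random) -> int:
    """Count frame-buffer writes when n fragments at the same (x, y)
    arrive in random order: a write happens each time a fragment is
    closer to the camera than everything seen before it."""
    writes, best = 0, float("inf")
    for _ in range(n):
        z = rng.random()
        if z < best:      # depth test passes -> pixel is (re)written
            best = z
            writes += 1
    return writes

rng = random.Random(42)
n, trials = 8, 200_000
avg = sum(writes_for_random_order(n, rng) for _ in range(trials)) / trials
print(f"measured {avg:.3f} writes on average vs H({n}) = {harmonic(n):.3f}")
```

So with a depth complexity of 8, on average fewer than 3 of the 8 overlapping fragments are actually written, which is the key to the next point.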
Why is this important? Because even though the number of pixels discarded by overdraw can be very large, massive overdraw does not translate into a proportionally huge number of pixels drawn into the color buffer: if the Z value of an already-textured pixel is greater than the one stored in the image buffer, it is discarded and does not count toward the color buffer bandwidth, even though it was already textured.
VRAM bandwidth: compression mechanisms
In recent years, techniques known as Delta Color Compression, or DCC, have appeared; we recommend looking up the article we wrote about it. These techniques reduce the size of the color buffer so that it takes up much less space. To do this, they tell the GPU to store each pixel as a reference value plus an n-bit delta, where the delta is the difference between the current value and the previous one.
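The underlying idea can be sketched with plain delta encoding. To be clear, this is not any vendor's actual on-chip DCC format (real implementations work on tiles of pixels with dedicated metadata); it only illustrates why storing differences instead of raw values saves bits:

```python
def delta_encode(pixels):
    """Store the first value as-is and every later value as the
    difference from its predecessor. Neighbouring pixels are usually
    similar, so the deltas are small and need fewer bits."""
    if not pixels:
        return []
    deltas = [pixels[0]]
    for prev, cur in zip(pixels, pixels[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    """Reverse of delta_encode: running sum of the deltas."""
    pixels, value = [], 0
    for d in deltas:
        value += d
        pixels.append(value)
    return pixels

# A smooth row of 8-bit channel values becomes mostly tiny deltas.
row = [200, 201, 201, 199, 198, 198, 202]
print(delta_encode(row))
```

In the example, values that needed 8 bits each become deltas that fit in 3 or 4 bits, which is the saving DCC exploits on color buffer traffic.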
Another element is texture compression, which is different from DCC and is used when generating a color buffer that we want to sample later to perform post-processing effects. The problem is that an image using texture compression is not understood by the unit that reads the final image and sends it to the screen.
Bandwidth and VRAM: tile rendering
In tiled rendering, the color buffer and depth buffer are processed internally on the chip, so their traffic never reaches VRAM. GPUs that use this technique, such as those in smartphones, therefore do not require as much bandwidth and can operate with much slower memories.
However, tile renderers have a series of drawbacks that leave them with less raw power than GPUs that do not render the scene this way.
It is hard to guess how much bandwidth each game uses, which is why tools such as NVIDIA’s NSight and Microsoft’s PIX exist: they measure not only the compute load in each part of the GPU but also the bandwidth usage, which allows developers to optimize their use of VRAM.
The reason is that, in scenes with overdraw, there is no way to predict in advance what the cost of each pixel in an image will be. For hardware architects and software engineers, it is best to keep things simple and fit the fastest VRAM within the stipulated cost.
What is taken into account is the ratio of bandwidth to theoretical fill rate, which consists of dividing the bandwidth by the precision per pixel and comparing the result with the theoretical fill rate of the GPU. But it is a factor that is taken less and less into account, especially since GPUs no longer write already-textured pixels directly to VRAM but instead write them to the GPU's own L2 cache, thus reducing the impact on VRAM.
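The ratio described above is simple to compute. The figures below are illustrative assumptions, not the specs of any particular card:

```python
# Ratio of memory bandwidth to theoretical fill rate, as described
# above. All numbers are illustrative assumptions.

BANDWIDTH_BPS = 448e9       # assumed VRAM bandwidth, bytes per second
BYTES_PER_PIXEL = 4         # RGBA8 precision per pixel
FILL_RATE_PPS = 100e9       # assumed theoretical fill rate, pixels/s

# Pixels per second the memory could sustain if every shaded pixel
# were written straight to VRAM:
memory_limited_rate = BANDWIDTH_BPS / BYTES_PER_PIXEL

ratio = memory_limited_rate / FILL_RATE_PPS
print(f"memory can feed {ratio:.2f}x the theoretical fill rate")
```

A ratio above 1 means the memory can, in theory, keep up with the ROPs; as the text notes, caching pixel writes in L2 makes this raw ratio less and less meaningful.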