Ray Tracing on AMD Radeon RX 6000 Graphics Cards: How Does It Work?

When AMD presented its new RDNA graphics architecture a year ago, the news was mixed. The good news was a new graphics architecture after more than five years of GCN; the bad news was the lack of dedicated hardware for real-time ray tracing. A few months ago, however, AMD confirmed that the RDNA 2 architecture will include this type of unit so that it can compete with NVIDIA in this area, although its approach works somewhat differently from NVIDIA’s.

The RX 6000 Intersection Unit: The Key to Ray Tracing

If we look at the ray tracing pipeline, we see that it is always the same regardless of the material: a process repeated over and over in which the intersection between a ray and an object is calculated a huge number of times. This repetitive calculation is far cheaper to do in specialized units than in the shaders themselves.
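To give an idea of the kind of calculation that is repeated for every ray, here is a minimal sketch of a ray-triangle test in Python. It uses the well-known Möller–Trumbore algorithm; this is an illustrative assumption, not a description of what AMD's or NVIDIA's hardware actually implements, and the function and helper names are made up for this example.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def ray_triangle(origin: Vec3, direction: Vec3,
                 v0: Vec3, v1: Vec3, v2: Vec3,
                 eps: float = 1e-9) -> Optional[float]:
    """Return the distance t along the ray to the hit point, or None on a miss."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > eps else None  # ignore hits behind the ray origin
```

A scene can contain millions of triangles and a frame millions of rays, which is why running even a test this small in shader code adds up quickly.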

Since the AMD and NVIDIA units are very similar, we recommend that you read the tutorial on this website titled “What are RT Cores for Ray Tracing and How Do They Work?”, which covers the NVIDIA solution and complements this tutorial, so that you can get a full idea of the differences between the two approaches.

Each compute unit contains one intersection unit, for the following reasons:

They need access to the BVH tree in memory, so, just like the SIMD units that run shader programs, they must be able to reach the entire GPU cache hierarchy.

They must be close to the SIMD units, because the SIMD units depend on the result from the intersection unit to know which shader to apply to the objects hit by rays.

AMD opted for a different solution from NVIDIA’s: integrating the intersection unit into the texture filter unit, or at least letting the two share access to the data cache. We know this from two different sources. The first is Microsoft’s presentation at Hot Chips 2020 on the SoC of its Xbox Series X, whose GPU is built on the RDNA 2 architecture, the same as the AMD RX 6000 graphics cards.

Let’s not forget that AMD itself has confirmed that its ray tracing solution is exactly the same in next-gen consoles with its GPUs and on PC.

The second source is an AMD patent which states that the intersection unit for ray tracing sits in the texture unit. This has raised the concern that the texture unit cannot calculate ray intersections and filter textures at the same time. In reality, texturing is applied at only one stage of the graphics pipeline, the stage where pixel shaders texture the scene, so outside that stage these units are rarely needed.

The texture unit simply applies the bilinear filter, which means it takes the 4 neighboring samples for each pixel and interpolates between them. Contemporary GPUs usually have 4 texture units per compute unit or SM, accompanied by 16 load/store units through which they access the data cache of the compute unit or SM.
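As a rough illustration of what "taking 4 neighboring samples and interpolating" means, the following Python sketch performs bilinear filtering on a small single-channel texture. The function name, the list-of-rows texture layout and the clamp-to-edge behavior are assumptions made for this example only.

```python
import math
from typing import List


def bilinear_sample(texture: List[List[float]], u: float, v: float) -> float:
    """Sample a texture at texel coordinates (u, v) with bilinear filtering.

    texture is a list of rows; coordinates outside the texture are clamped.
    """
    height = len(texture)
    width = len(texture[0])
    # Clamp the coordinates to the valid texel range (clamp-to-edge).
    u = min(max(u, 0.0), width - 1.0)
    v = min(max(v, 0.0), height - 1.0)
    x0 = int(math.floor(u))
    y0 = int(math.floor(v))
    x1 = min(x0 + 1, width - 1)
    y1 = min(y0 + 1, height - 1)
    fx = u - x0  # horizontal blend weight
    fy = v - y0  # vertical blend weight
    # Blend the 4 neighboring texels: first along x, then along y.
    top = texture[y0][x0] * (1.0 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1.0 - fx) + texture[y1][x1] * fx
    return top * (1.0 - fy) + bottom * fy
```

Each sample needs four texel reads from memory, which is why the texture units are wired to the load/store units and the data cache.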

The only difference from NVIDIA’s solution for calculating intersections in ray tracing is that, in the AMD RX 6000, access to the data cache via the load/store units is switched between the texture filtering units and the intersection unit.

Why is the intersection unit for ray tracing in the compute unit?

Execution units within the GPU typically operate with instructions of the register-register type, so they lack a complex mechanism for accessing the memory hierarchy. This makes them simpler cores than those of a CPU and allows more of them to be placed on each chip. The compute unit’s SIMDs access the memory hierarchy, which consists of the GPU’s internal caches and VRAM, through the load/store units.

Almost all types of shader programs tend to work on registers, but one type, pixel shaders, requires access to the memory hierarchy because it works with huge amounts of texture data. The texture units therefore have access to the memory hierarchy alongside the SIMD units.

In the specific case of ray tracing, the position of the objects in the scene has to be stored in a spatial data structure called a BVH (bounding volume hierarchy). This data structure does not fit into the GPU’s internal memory, so the intersection unit must use the memory hierarchy, which means these units are connected to the caches and VRAM as well.
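To make the BVH idea more concrete, here is a minimal Python sketch of a bounding volume hierarchy made of axis-aligned boxes, together with a stack-based traversal that collects the primitives a ray might hit. The class name, field names and the "slab" box test are illustrative assumptions; real GPU BVH layouts are compact binary formats that differ between vendors.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class BVHNode:
    box_min: Vec3
    box_max: Vec3
    children: List["BVHNode"] = field(default_factory=list)
    primitive_ids: List[int] = field(default_factory=list)  # used by leaves


def ray_hits_box(origin: Vec3, direction: Vec3,
                 box_min: Vec3, box_max: Vec3, eps: float = 1e-12) -> bool:
    """Slab test: intersect the ray with the box's three pairs of planes."""
    t_near, t_far = 0.0, float("inf")
    for axis in range(3):
        if abs(direction[axis]) < eps:
            # Ray is parallel to this slab: it must start inside it.
            if not (box_min[axis] <= origin[axis] <= box_max[axis]):
                return False
            continue
        t1 = (box_min[axis] - origin[axis]) / direction[axis]
        t2 = (box_max[axis] - origin[axis]) / direction[axis]
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
        if t_near > t_far:
            return False
    return True


def traverse(root: BVHNode, origin: Vec3, direction: Vec3) -> List[int]:
    """Return ids of primitives whose leaf boxes the ray passes through."""
    candidates: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not ray_hits_box(origin, direction, node.box_min, node.box_max):
            continue  # the whole subtree is skipped
        if node.children:
            stack.extend(node.children)
        else:
            candidates.extend(node.primitive_ids)
    return candidates
```

Every node visited is a read from memory, so a traversal like this only performs well if the unit doing it sits on the cache hierarchy.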

The RX 6000 is more geared towards DirectX 12 Ultimate requirements

There is still a long way to go before ray tracing replaces rasterization: the most optimistic forecasts speak of at least three more years. The reason is that ray tracing requires very high computing power, and there are scenes where even the most powerful GPU would choke trying to achieve adequate performance.

In traditional ray tracing, a ray bounces off various objects until it runs out of energy or simply leaves the scene. To understand energy, keep in mind that every object has a refractive quotient ranging from 0 to 1 that determines how much light it absorbs and how much it reflects. An object with a refractive quotient of 0 absorbs all light and reflects none, while an object with a refractive quotient of 1 reflects all the light that reaches it.

Each time a ray hits an object, it creates new indirect rays, and so on until the energy left after applying the refractive quotient at each bounce is low enough. Naturally, this involves a huge number of intersection calculations, which exceeds the capacity of the intersection units.
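The growth in the number of rays can be estimated with a small Python sketch. It assumes a deliberately simplified scene in which every surface has the same refractive quotient, every ray hits something, and every hit spawns the same number of new rays; the function name and the energy threshold are made up for this example.

```python
def total_rays(quotient: float, rays_per_hit: int,
               min_energy: float = 0.05) -> int:
    """Count the rays traced for one primary ray in a simplified scene.

    quotient: fraction of light kept at each bounce, 0 <= quotient < 1.
    rays_per_hit: new rays spawned by each hit (illustrative assumption).
    min_energy: rays carrying less energy than this are no longer traced.
    """
    if not 0.0 <= quotient < 1.0:
        raise ValueError("quotient must be in [0, 1) or the loop never ends")
    energy = 1.0
    rays_at_level = 1
    total = 0
    while energy >= min_energy:
        total += rays_at_level
        energy *= quotient             # each bounce keeps only this fraction
        rays_at_level *= rays_per_hit  # each hit spawns new indirect rays
    return total


print(total_rays(0.5, 2))  # 31 rays for a single primary ray
```

With a quotient of 0.5 and two new rays per hit, one primary ray already turns into 31 rays, and each of them needs its own BVH traversal and intersection tests.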

To avoid this, APIs such as Vulkan and Microsoft’s DirectX 12 Ultimate have added a new type of shader program, the Ray Generation Shader, with which the generation of new rays is not automatic but must be explicitly invoked by code. This means that in the early years we will see objects in games that do not bounce rays onward, in order to reduce the number of rays in the scene and keep frame rates stable.

This means that when a ray hits an object and would have to continue its path by generating new rays, the intersection unit has to ask the shader program responsible for coordinating the path what to do.
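The division of labor described here can be sketched in a few lines of Python. This is a conceptual model only, not DirectX or Vulkan code: `find_hit` stands in for the intersection unit, `on_hit` stands in for the shader program, and both names are hypothetical.

```python
from typing import Callable, List, Optional, TypeVar

Ray = TypeVar("Ray")
Hit = TypeVar("Hit")


def trace_path(first_ray: Ray,
               find_hit: Callable[[Ray], Optional[Hit]],
               on_hit: Callable[[Ray, Hit], Optional[Ray]],
               max_depth: int = 8) -> List[Hit]:
    """Follow a ray through the scene; new rays exist only if on_hit asks."""
    hits: List[Hit] = []
    ray: Optional[Ray] = first_ray
    for _ in range(max_depth):
        hit = find_hit(ray)     # intersection unit: what did the ray hit?
        if hit is None:
            break               # the ray left the scene
        hits.append(hit)
        ray = on_hit(ray, hit)  # shader code: spawn a new ray or stop
        if ray is None:
            break               # the shader chose not to continue the path
    return hits
```

For example, with a toy scene in which ray "r0" hits a mirror and ray "r1" hits a wall, a shader that only continues the path on mirrors produces two hits, while a shader that never spawns rays stops after the first:

```python
scene = {"r0": "mirror", "r1": "wall"}
print(trace_path("r0", scene.get, lambda ray, hit: "r1" if hit == "mirror" else None))
print(trace_path("r0", scene.get, lambda ray, hit: None))
```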

Is the RX 6000 Ray Tracing Solution Better Than the RTX 3000’s?

Well, we don’t know for sure, because for now the two companies have chosen to give different metrics. In AMD’s case, the information we have indirectly through Microsoft is that the intersection units can perform 4 ray ops per cycle, but we don’t know exactly what these ray ops are. The only other thing we know from Microsoft is that its console’s GPU intersection units are equivalent to 25 TFLOPS, but we don’t know the context of this figure.
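A peak figure can be worked out from the "4 ray ops per cycle" number with simple arithmetic. The sketch below assumes that the figure applies per compute unit, and plugs in the publicly stated Xbox Series X specifications of 52 compute units at 1.825 GHz; neither value comes from this article, and the result is a theoretical ceiling rather than a measured rate.

```python
def peak_ray_ops_per_second(compute_units: int,
                            ray_ops_per_cycle: int,
                            clock_hz: float) -> float:
    """Theoretical peak: every intersection unit busy on every clock cycle."""
    return compute_units * ray_ops_per_cycle * clock_hz


# Assumed values: 52 compute units, 4 ray ops per cycle each, 1.825 GHz.
xbox_peak = peak_ray_ops_per_second(52, 4, 1.825e9)
print(f"{xbox_peak / 1e9:.1f} billion ray ops per second")  # 379.6
```

Since we do not know what counts as a ray op, a number like this cannot be compared directly with NVIDIA's RT-TFLOPS.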

In the case of NVIDIA, the company claims that the RT Cores in the RTX 3080 have a combined power of 58 RT-TFLOPS, but we don’t know whether that is the computing power of the RT Cores themselves or the computing power the CUDA cores would need to deliver the same performance.

Either way, we can only rely on what both companies tell us about their architectures and on the information we have. It looks like the units in the RX 6000 are closer to those in the RTX 2000, with 4 ray-box intersection units and 1 ray-triangle unit, whereas NVIDIA has doubled the latter in the RTX 3000, so its capacity for calculating intersections is somewhat larger.

How this translates into each game depends on a number of factors, but in any case it seems AMD’s solution for ray tracing on its RX 6000 is good and efficient enough to be used in the new generation of consoles as well.