The advent of Ray Tracing: how has it affected NVIDIA and AMD?

As with rendering in general, which at first was possible only on supercomputers, then on workstations and later on home computers with 3D cards, ray tracing has followed the same evolution: what years ago could only be done on the most powerful and expensive systems is now available to everyone.

The evolution of hardware in relation to Ray Tracing

That’s why we decided to look back and show how hardware has evolved in relation to Ray Tracing. We have divided this evolution into five distinct phases, and in them we will cover not only the approaches of the past but also those we will soon see in the future generations of GPUs that will equip our PCs.

Phase 1: CPU rendering

It should be noted that GPUs have long been tied to the rasterization algorithm, so they were not prepared to render scenes with Ray Tracing, which uses a different algorithm.

So what was the solution when you wanted to render a scene with ray tracing? Throwing many CPU cores at the problem. Although it is already part of history, this was the route Intel wanted to take more than a decade ago with its canceled Larrabee, which was nothing more than a set of x86 cores in a configuration very similar to a GPU.

This solution is inefficient because CPUs are scalar systems designed to work on one instruction per thread and, compared to GPUs, run very few simultaneous threads, which forces the construction of large supercomputers made up of dozens if not hundreds of CPUs.

Phase 2: Ray Tracing on the GPU using Compute Shaders

Starting with DirectX 11 and OpenGL 4 (compute shaders entered the core OpenGL specification in version 4.3), a new type of GPU shader program emerged: Compute Shaders, which are not tied to the stages of the graphics pipeline.

Thanks to them, GPUs could devote their power, fully or partially, to problems beyond rasterization, and among them it became possible to run ray tracing on the GPU: not at a speed sufficient for real-time rendering, but as a multi-stage pipeline built out of Compute Shaders.

However, it was not until DirectX 12 that a full-fledged Ray Tracing rendering pipeline was developed, in which each stage is a particular kind of Compute Shader.

This pipeline was introduced in 2018 as DirectX Raytracing and was later adopted by Vulkan; however, this initial GPU implementation of real-time ray tracing fell short in performance, and changes were required to the classic Compute Unit / SM.

Phase 3: Intersection units

A common practice in hardware design is to create accelerators that perform recurring, repetitive tasks at a much lower cost in area and power than a complete processor; the idea is to offload those functions from the general-purpose processors.

These types of units are common in GPUs. For example, in rasterization we find fixed-function units that perform tasks such as triangle rasterization, texture filtering, etc. These hard-wired units perform their task on the input data they are given, which is why they are called fixed function: we cannot modify what they do, that is, they cannot be programmed. The advantage of this type of unit is that it lets us perform these specific calculations with very small units that consume very little power and are highly efficient.

In Ray Tracing, every ray generated in the scene will hit one or more objects, so this calculation, which we call intersection, has to be performed continuously and repeatedly; that makes it a good candidate for implementation as a specialized fixed-function unit.
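
To give an idea of the kind of calculation that gets repeated for every ray, here is a minimal Python sketch of a ray-triangle test. It uses the well-known Möller–Trumbore algorithm, which is an assumption made for illustration: the article does not say which test the hardware units actually implement, and the function and parameter names are hypothetical.

```python
# Illustrative sketch only: Möller–Trumbore ray-triangle intersection.
# The article does not state which algorithm real intersection units use.

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle_intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the distance t along the ray to the hit point, or None on a miss."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:           # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det
    if u < 0.0 or u > 1.0:       # hit point lies outside the triangle
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det
    if v < 0.0 or u + v > 1.0:   # hit point lies outside the triangle
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > eps else None  # ignore hits behind the ray origin
```

The whole test is a short, fixed sequence of multiplications, additions and comparisons with no program-dependent behavior, which is the kind of work that suits a small fixed-function unit.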

Intersection units are located inside the Compute Units / SMs of AMD / NVIDIA GPUs (we are talking about the same type of unit in both cases, just under a different name) and work together with the ALUs that execute the shaders, using the data cache within that same unit.

Phase 4: BVH tree traversal units

A BVH (bounding volume hierarchy) is a spatial data structure that stores the scene geometry in an ordered way. To speed up intersection calculations, the usual practice is to perform them against the BVH tree instead of doing it pixel by pixel.

Without units that traverse the BVH tree, the traversal has to be done by a compute shader program, but with these units implemented in hardware we can forget about carrying out this process in shaders.

In other words, the traversal unit handles each ray and its path through the BVH tree automatically, without the participation of the shader program, and communicates with the intersection unit. At the end of the process, the two together send the results back.
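
The walk through the tree described above can be sketched in a few lines of Python. This is a simplified software model, not a description of any vendor's hardware: the `Node` layout, the slab-based box test and the tiny two-leaf demo scene are all assumptions chosen to keep the example short.

```python
# Illustrative sketch only: stack-based BVH traversal with a ray/box slab test.
# Node layout and demo scene are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Node:
    lo: Vec3                      # minimum corner of the bounding box
    hi: Vec3                      # maximum corner of the bounding box
    children: List["Node"] = field(default_factory=list)
    prim: Optional[str] = None    # leaf payload (a primitive id); None for inner nodes

def ray_hits_box(origin: Vec3, direction: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test: intersect the ray's parameter interval with each axis slab."""
    t_near, t_far = 0.0, float("inf")
    for axis in range(3):
        if direction[axis] == 0.0:
            # Ray is parallel to this slab: it must already start inside it.
            if not (lo[axis] <= origin[axis] <= hi[axis]):
                return False
            continue
        t0 = (lo[axis] - origin[axis]) / direction[axis]
        t1 = (hi[axis] - origin[axis]) / direction[axis]
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def traverse(root: Node, origin: Vec3, direction: Vec3) -> List[str]:
    """Return the leaf primitives whose boxes the ray touches."""
    candidates = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not ray_hits_box(origin, direction, node.lo, node.hi):
            continue              # the whole subtree is skipped
        if node.prim is not None:
            candidates.append(node.prim)
        else:
            stack.extend(node.children)
    return candidates

def build_demo_scene() -> Node:
    """Hypothetical scene: two unit boxes side by side under one root."""
    a = Node((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), prim="A")
    b = Node((2.0, 0.0, 0.0), (3.0, 1.0, 1.0), prim="B")
    return Node((0.0, 0.0, 0.0), (3.0, 1.0, 1.0), children=[a, b])
```

In this model, the primitives returned by `traverse` are the only ones that would still need an exact intersection test; every subtree whose box the ray misses is discarded in a single step.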

It should be noted that in the current version of DirectX 12 Ultimate this is not part of the minimum requirements, and the generation of new rays from the intersections of others has to be controlled through the Ray Generation Shader. The use of this unit is therefore limited, since for the moment it is preferred to let game developers decide the number of rays in the scene.

Phase 5: Coherent Ray Tracing

The next step in the evolution of GPUs for Ray Tracing will be the addition of a coherency engine to the GPU, but first we need to understand what memory coherency means from the point of view of any processor, since it concerns the view of memory that each processor has, especially in today’s GPUs.

If we want to understand the problem of memory coherency, we first have to understand how the cache system of any multi-core system works, whether we are talking about a CPU or a GPU.

Caches are not the actual RAM; instead, they store copies of parts of RAM or of higher cache levels.
The lower cache levels, those closest to the processors, contain copies of data fragments from the higher-level caches.

Therefore, if we want a coherent system, there must be a mechanism that ensures that when a core or any other unit of the GPU changes the value of a piece of data, all copies of that data in all caches, as well as in VRAM, are updated at the same time.
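
The mechanism described above can be illustrated with a toy Python model. It is only a sketch under strong assumptions: one private cache per unit, write-through to VRAM, and a write-update broadcast to the other caches. The class name `ToyMemory` and its `coherent` flag are hypothetical and do not correspond to any real GPU design.

```python
# Illustrative sketch only: a toy write-through, write-update coherency model.
# Real coherency protocols are considerably more involved.

class ToyMemory:
    def __init__(self, n_units, coherent=True):
        self.vram = {}                                  # backing memory
        self.caches = [dict() for _ in range(n_units)]  # one private cache per unit
        self.coherent = coherent
        self.messages = 0                               # update notifications sent

    def read(self, unit, addr):
        cache = self.caches[unit]
        if addr not in cache:                 # miss: fetch a copy from VRAM
            cache[addr] = self.vram.get(addr, 0)
        return cache[addr]

    def write(self, unit, addr, value):
        self.caches[unit][addr] = value
        self.vram[addr] = value               # write-through to VRAM
        if not self.coherent:
            return                            # other copies silently go stale
        for other, cache in enumerate(self.caches):
            if other == unit:
                continue
            self.messages += 1                # every other unit is notified
            if addr in cache:
                cache[addr] = value           # refresh the existing copy
```

With `coherent=False`, a unit that cached a value before another unit overwrote it keeps reading the old copy. With `coherent=True`, the copy is refreshed, at the price of one notification per other unit on every write.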

So what problem do GPUs face today? Earlier we noted that the intersection and BVH traversal units have access to the data cache of their Compute Unit / SM, but since there is no coherency, when changes are made to the data of one Compute Unit / SM the other units cannot see them, and this leads to a good part of the intersection and traversal calculations being repeated even though other units have already done them.

The coherency engine consists of one or more hardware units responsible for notifying all Compute Units / SMs of changes in cache content, which makes it difficult to implement in hardware because of the amount of communication it requires.

In a CPU, coherency can be achieved fairly easily because it contains very few cores, but in a GPU the large number of cores makes coherency difficult to implement; note that the number of communication paths required grows as n², where n is the number of interconnected elements.
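
The n² growth is easy to check with a little arithmetic. The short Python sketch below assumes a full point-to-point interconnect, where every element has a direct link to every other one; the unit counts used in the comment are hypothetical and are not the figures of any particular CPU or GPU.

```python
# Illustrative sketch only: links needed by a full point-to-point interconnect.

def pairwise_links(n: int) -> int:
    """Number of distinct links connecting every pair among n elements."""
    return n * (n - 1) // 2

# Hypothetical sizes: 8 elements need 28 links, 80 elements need 3160.
# Ten times as many elements costs roughly a hundred times as many links.
```

The exact count is n(n-1)/2, which grows in proportion to n² and is what makes an all-to-all design impractical at GPU scale.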

As GPUs are on their way to being split into chiplets, it is possible that this coherency engine will become a chiplet of its own or be located in the central part that connects the different components to each other. In any case, we have not reached that point yet, and given that an architectural change of this kind takes place over a period of 2 to 5 years, we will still have to wait a bit.

Where a Coherency Engine is already used is in Imagination’s PowerVR Wizard architecture, which has had it in hardware for years, but NVIDIA and AMD have not yet used one in their GPUs, and it should be noted that their approach is “slightly” different from Imagination’s. In any case, it is the next step in the evolution of Ray Tracing.