Even the most powerful CPU or GPU available today is not enough for games that use ray tracing to run smoothly. Although the technique is still used in a fairly rudimentary way, ray tracing performance falls short of expectations. What factors affect it?
How does the processor affect ray tracing performance?
We have covered bounding volume hierarchies (BVH) more than once in other articles. Today, however, they are not generated by the GPU. Instead, games use a general BVH for the whole level, which lets them position the different elements of the scene relative to the player. Why? Although the GPU renders the scene, the CPU has to calculate the state of each element in every frame.
This is why games typically use spatial data structures as a map of the level, which serves to locate each item relative to the player and to the level's structure. That way, all the computational effort goes into what is happening near the player's avatar and what the player can see from that point.
Various games have tricks that let us step outside the boundaries of a map or even see everything that is happening in the scene at once. When we do, the game's performance collapses. This is because the spatial data structures generated by the CPU are the tool used to remove from the scene anything the player cannot see, since calculating all of it at the same time would be a waste of resources.
Therefore, when the CPU creates the display list for the GPU, it builds a list for that frame containing all the objects in the player's view. It does not matter to the CPU whether this is strictly correct, even though elements beyond the view volume also affect the scene.
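As a rough illustration of that per-frame culling step, here is a minimal Python sketch. It assumes a simplified 2D level in which the player's view volume is an axis-aligned box. The names `Box`, `overlaps` and `build_display_list` are hypothetical and are not taken from any particular engine.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in a simplified 2D level (hypothetical type)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def overlaps(a: Box, b: Box) -> bool:
    """Return True when the two boxes share at least one point."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


def build_display_list(view: Box, objects: Dict[str, Box]) -> List[str]:
    """Keep only the objects whose bounds touch the player's view volume."""
    return [name for name, bounds in objects.items() if overlaps(view, bounds)]


# Assumed example level: two objects near the player, one far away.
level = {
    "crate": Box(1, 1, 2, 2),
    "npc": Box(4, 3, 5, 4),
    "distant_tower": Box(50, 50, 55, 60),
}
view_volume = Box(0, 0, 10, 10)
display_list = build_display_list(view_volume, level)
```

In this sketch the distant tower is dropped from the list. With ray tracing it could still cast a shadow or show up in a reflection, which is why culling by view alone is no longer enough.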
At the same time, the CPU is also responsible for updating the BVH in each frame and delivering the updated version to the GPU. The static elements of the level do not move within the BVH, but moving ones such as NPCs, other characters or the player do, and the CPU is in charge of updating the state of each one.
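One common way to perform that per-frame update is to "refit" the tree: keep its shape and recompute the boxes from the leaves upwards. The following Python sketch assumes that approach. The article does not say whether a given game refits or rebuilds its BVH, and the names `Node`, `union` and `refit` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Bounds = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def union(a: Bounds, b: Bounds) -> Bounds:
    """Smallest box that contains both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


@dataclass
class Node:
    name: str
    bounds: Bounds
    children: List["Node"] = field(default_factory=list)


def refit(node: Node) -> Bounds:
    """Recompute inner boxes bottom-up; leaves keep the bounds set by game logic."""
    if node.children:
        merged = refit(node.children[0])
        for child in node.children[1:]:
            merged = union(merged, refit(child))
        node.bounds = merged
    return node.bounds


wall = Node("wall", (0, 0, 10, 1))   # static element of the level
npc = Node("npc", (2, 2, 3, 3))      # moving element
root = Node("root", (0, 0, 10, 3), children=[wall, npc])

# Next frame: the CPU moves the NPC, then refits the tree before handing it over.
npc.bounds = (12, 2, 13, 3)
refit(root)
```

Only the boxes on the path from the moved leaf to the root change. The static wall keeps its bounds untouched.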
GPU and ray tracing performance
GPUs are the quintessential parallel processors, so much so that they have been used for years in high performance computing (HPC) to speed up the parallel parts of different programs. Ray tracing is one of the applications that benefits most from parallelization, and therefore, apparently, from the power of GPUs. However, we are still far from the point where GPUs as we know them today are the ideal hardware for ray tracing.
In ray tracing, as in rasterization, the other algorithm for generating 3D graphics, we take a collection of objects in three-dimensional space and represent it in a two-dimensional space made up of a matrix of pixels: the screen. However, no GPU has ever been able to render a scene with real-time ray tracing at the same speed as with rasterization.
Ray tracing does not depend on resolution
This may sound like a bold claim. After all, rendering performance at FHD is not the same as at QHD or 4K, and the differences are considerable. Once you understand what we mean, though, you will see why we describe ray tracing as not depending on resolution.
The original ray tracing algorithm traces primary and secondary rays for each of the pixels in the scene. Done naively, this requires enormous computing power, so instead of using the original algorithm as is, we use spatial data structures such as the BVH to speed it up.
Hence the use of tree-shaped spatial data structures to accelerate GPU performance in ray tracing. However, today's games have extremely complex scenes, and the data structure used for the intersection tests is too large to fit in the GPU's cache. Instead, fragments of the structure are copied from VRAM to the GPU cache as and when they are needed.
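To get a feel for why fetching fragments on demand matters, here is a toy Python model. It assumes a tiny cache with a least-recently-used eviction policy, which is a simplification and not a description of how any real GPU cache behaves. The function `count_misses` and both workloads are made up for illustration.

```python
from collections import OrderedDict
from typing import Iterable


def count_misses(accesses: Iterable[int], capacity: int) -> int:
    """Count how many accesses must fetch a fragment from (simulated) VRAM."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    misses = 0
    for fragment in accesses:
        if fragment in cache:
            cache.move_to_end(fragment)      # mark as most recently used
        else:
            misses += 1
            cache[fragment] = None
            if len(cache) > capacity:
                cache.popitem(last=False)    # evict the least recently used
    return misses


# Assumed workload: 4 fragments, each needed 3 times; the cache holds 2 fragments.
coherent = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
scattered = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]

coherent_misses = count_misses(coherent, capacity=2)
scattered_misses = count_misses(scattered, capacity=2)
```

Both workloads touch the same fragments the same number of times. In this model the coherent order misses 4 times and the scattered order misses on all 12 accesses.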
The other trick, as you probably already know, is the use of dedicated hardware to calculate the intersection of each ray with the data structure. This calculation is performed continuously and recursively and would otherwise consume a large share of GPU resources. With these units we are seeing performance gains of between five times and an order of magnitude.
The problem of data locality
In reality, we do not represent the path of light, because in real life light is so fast that, for practical purposes, it is everywhere at once. The data structure therefore does not represent the path of a ray of light through the scene, but rather the parts of the scene that are affected by it.
What we do is organize the objects in the scene into sets that we call bounding volumes. The rule is that if a ray does not pass through a bounding volume, it cannot affect the objects, or the other bounding volumes, contained within it.
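The rejection rule described above can be sketched in a few lines of Python. This is a minimal 2D example that assumes axis-aligned boxes as the volumes and uses the standard slab test for the ray-box check. The names `Node`, `ray_hits_box` and `traverse`, and the small scene, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec = Tuple[float, float]


@dataclass
class Node:
    name: str
    lo: Vec                      # minimum corner of the box
    hi: Vec                      # maximum corner of the box
    children: List["Node"] = field(default_factory=list)


def ray_hits_box(origin: Vec, direction: Vec, lo: Vec, hi: Vec) -> bool:
    """Slab test: intersect the ray's parameter interval with each axis slab."""
    t_near, t_far = 0.0, float("inf")
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < a or o > b:   # parallel to this slab and outside it
                return False
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def traverse(node: Node, origin: Vec, direction: Vec, visited: List[str]) -> List[str]:
    """Return the leaves the ray may hit; `visited` records every box tested."""
    visited.append(node.name)
    if not ray_hits_box(origin, direction, node.lo, node.hi):
        return []                # the whole subtree is skipped
    if not node.children:
        return [node.name]
    hits: List[str] = []
    for child in node.children:
        hits += traverse(child, origin, direction, visited)
    return hits


left = Node("left", (0.0, 0.0), (4.0, 4.0), [
    Node("a", (0.0, 0.0), (1.0, 1.0)), Node("b", (3.0, 3.0), (4.0, 4.0))])
right = Node("right", (6.0, 0.0), (10.0, 4.0), [
    Node("c", (6.0, 0.0), (7.0, 1.0)), Node("d", (9.0, 3.0), (10.0, 4.0))])
root = Node("root", (0.0, 0.0), (10.0, 4.0), [left, right])

visited: List[str] = []
hits = traverse(root, (0.5, -1.0), (0.0, 1.0), visited)  # vertical ray at x = 0.5
```

The ray misses the `right` volume, so the objects `c` and `d` inside it are never tested at all.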
The first idea that comes to mind is to divide the scene space regularly, so that one part of the GPU takes care of one part of the scene and another part of the GPU takes care of another. In other words, it would be like taking a map and dividing it into quadrants, with the information for each quadrant or even sub-quadrant stored in the different levels of the cache. This method works very well with rasterization.
In ray tracing, on the other hand, if we divide a scene into several subspaces, a single ray of light can pass through several of them, and those subspaces may be assigned to different processing elements in the GPU. This is why a completely different approach is taken: the scene is not divided according to the pixels affected by each ray. Instead, each ray is cataloged according to the areas of the scene that it crosses.
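To make the idea of cataloging rays by the areas they cross more concrete, here is a small Python sketch. It assumes a uniform 2D grid as the subdivision and walks each ray segment cell by cell, in the style of a DDA grid traversal. The function `cells_crossed` and the two example rays are hypothetical, and rays that pass exactly through a cell corner are resolved arbitrarily.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Point = Tuple[float, float]


def cells_crossed(start: Point, end: Point, cell_size: float = 1.0) -> List[Cell]:
    """List, in order, the grid cells that the segment start -> end passes through."""
    x, y = math.floor(start[0] / cell_size), math.floor(start[1] / cell_size)
    dx, dy = end[0] - start[0], end[1] - start[1]
    step_x = 1 if dx > 0 else -1
    step_y = 1 if dy > 0 else -1

    def first_crossing(pos: float, cell: int, delta: float) -> float:
        """Parameter t in [0, 1] at which the segment leaves its first cell."""
        if delta == 0:
            return math.inf
        boundary = (cell + (1 if delta > 0 else 0)) * cell_size
        return (boundary - pos) / delta

    t_max_x = first_crossing(start[0], x, dx)
    t_max_y = first_crossing(start[1], y, dy)
    t_delta_x = cell_size / abs(dx) if dx else math.inf
    t_delta_y = cell_size / abs(dy) if dy else math.inf

    cells = [(x, y)]
    while True:
        if t_max_x < t_max_y:
            if t_max_x > 1.0:    # next boundary lies beyond the end point
                break
            x += step_x
            t_max_x += t_delta_x
        else:
            if t_max_y > 1.0:
                break
            y += step_y
            t_max_y += t_delta_y
        cells.append((x, y))
    return cells


# Assumed example rays, given as (start, end) points.
rays: Dict[str, Tuple[Point, Point]] = {
    "ray_a": ((0.5, 0.5), (3.5, 1.2)),
    "ray_b": ((0.5, 2.5), (0.5, 0.2)),
}

rays_by_cell: Dict[Cell, List[str]] = defaultdict(list)
for ray_id, (start, end) in rays.items():
    for cell in cells_crossed(start, end):
        rays_by_cell[cell].append(ray_id)
```

Here `ray_a` crosses five cells and `ray_b` crosses three, and both end up listed under cell `(0, 0)`, so the work for that cell can be grouped.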
The example of the vehicle and the map
To better understand the situation, suppose we want to record the route of a delivery vehicle. How can we do it? The first option is to draw a map and plot the route on it. The second is to make a list of the areas of the map that the vehicle has crossed and then rebuild its route from that list.
In the second case, we can completely reconstruct the route by turning it into a linear sequence. At the end of the day, we are only interested in where the vehicle has been and not where it hasn't, so with a full map much of the information would be superfluous and could be discarded because it contributes nothing.
But an ordered list alone does not tell us how many miles the driver has to travel to make the deliveries. That information is not taken into account, and the same applies to ray tracing performance: each ray has a different computational cost, not only because it follows a different route, but also because we do not know in advance how many memory jumps it will require.