The ray tracing used in games today is what we call hybrid rendering: the coherent part of the scene is rendered with the rasterization algorithm, while the incoherent part is rendered with ray tracing. So despite what the marketing of various companies says, the era in which games are rendered entirely by ray tracing has not yet arrived.
To make this statement clearer: the scene is rendered with rasterization while completely ignoring indirect lighting, the light produced when a light source strikes an object and is reflected in new directions.
Ray tracing renders the incoherent elements of the scene more accurately and efficiently than rasterization can, but the computational cost of that incoherent part is very high. This is precisely the next big challenge for companies like NVIDIA and AMD: optimizing the performance of the incoherent part of the scene in ray tracing.
Coherent and Incoherent Ray Tracing
Let’s set aside the hybrid rendering used in games for now and turn our attention to pure ray tracing, where rays can be classified in two different ways.
- In pure ray tracing, coherent rays are those that leave the camera and follow the view frustum of the scene; these rays are coherent, but they are not traced in hybrid rendering.
- Incoherent rays are those produced when a ray of light strikes an object and bounces off in a new direction.
- Rays that come from a primary light source, i.e. that were not generated by the impact of a previous ray on an object, are also coherent.
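The distinction above can be sketched in a few lines of Python. The `Ray` class, the pinhole camera, and the `bounces` counter are illustrative assumptions for this article, not any engine's API:

```python
import math
from dataclasses import dataclass

@dataclass
class Ray:
    origin: tuple      # (x, y, z) start point
    direction: tuple   # normalized (x, y, z)
    bounces: int       # 0 = primary ray, >0 = spawned by an earlier hit

def camera_rays(width, height, fov_deg=90.0):
    """Primary (coherent) rays: all share one origin and sweep the
    view frustum in a regular pattern, so neighboring rays hit
    neighboring geometry and the same cache lines."""
    half = math.tan(math.radians(fov_deg) / 2)
    rays = []
    for y in range(height):
        for x in range(width):
            # Map the pixel onto the image plane at z = -1.
            px = (2 * (x + 0.5) / width - 1) * half
            py = (1 - 2 * (y + 0.5) / height) * half
            norm = math.sqrt(px * px + py * py + 1)
            rays.append(Ray((0, 0, 0),
                            (px / norm, py / norm, -1 / norm), 0))
    return rays

def is_coherent(ray):
    """Per the classification above: camera rays and rays straight
    from a light source are coherent; anything spawned by a bounce
    is not."""
    return ray.bounces == 0

rays = camera_rays(4, 4)
bounce = Ray((1, 2, 3), (0, 1, 0), bounces=1)
```

Every ray in `rays` is coherent, while `bounce`, having been spawned by a hit, is not.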
At the visual level, if we are talking only about direct lighting, there is no difference in quality between rendering the scene with rasterization and rendering it with ray tracing. Add to that the fact that all game engines work via rasterization, and you will understand why ray tracing is not used to render the coherent part of the scene.
The performance of the incoherent part of ray tracing on a GPU
The problem is that while ray tracing is much better than rasterization at rendering the incoherent part of a scene, incoherent rays perform far worse than the coherent rays of the scene.
The reason for this performance gap is that not all of the scene information fits in the GPU caches, which is what the ray intersection units access. Incoherent rays do not hit the same area of the scene and therefore do not run the same shader, causing stalls in a large number of GPU threads and, with them, reduced performance.
This is a problem that the film industry solves through ray reordering algorithms, but it can do so easily because the position of the camera is known in advance, so all the incoherent rays of the scene can be converted into coherent rays thanks to a sorting algorithm.
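The article does not spell out how such reordering works; one common approach in the literature, sketched here as an assumption, is to sort rays by a spatial key such as a Morton (Z-order) code of their quantized origin, so that rays touching the same region of the scene become neighbors in memory:

```python
def morton3(x, y, z, bits=10):
    """Interleave the bits of three small integers into one Z-order
    (Morton) key: points that are close in 3D get nearby keys."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

def sort_rays_by_origin(rays, scene_min, scene_size, bits=10):
    """Quantize each ray origin into the scene's bounding box and
    sort by Morton key, grouping rays that will touch the same
    region of the scene (and thus the same cached geometry)."""
    scale = (1 << bits) - 1
    def key(ray):
        q = [int((c - lo) / scene_size * scale)
             for c, lo in zip(ray[0], scene_min)]
        q = [min(max(v, 0), scale) for v in q]
        return morton3(*q, bits)
    return sorted(rays, key=key)

# Rays as (origin, direction) pairs, deliberately scrambled in memory.
rays = [((9.0, 9.0, 9.0), (0, 0, -1)),
        ((0.1, 0.2, 0.1), (0, 0, -1)),
        ((9.1, 8.9, 9.2), (0, 0, -1)),
        ((0.2, 0.1, 0.3), (0, 0, -1))]
ordered = sort_rays_by_origin(rays, scene_min=(0, 0, 0), scene_size=10.0)
```

After sorting, the two rays near the origin are adjacent in the list, as are the two near (9, 9, 9).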
And when rendering a movie they have all the time in the world: they do not have to display a frame every few milliseconds, and the sorting algorithms exist more to save time, and with it the cost, of their powerful render farms. The situation in video games is different.
In a video game, where every frame is unique, this cannot be done; moreover, it would require very powerful hardware so that sorting the rays of the scene does not hurt the high frame rate. That is the next big challenge for GPU manufacturers to solve, and it is crucial if ray tracing is not to stagnate in terms of performance.
Current GPUs are not designed for incoherent ray tracing
The graphics processors we use in our PCs were designed for rasterization, a rendering algorithm that exploits the spatial and temporal locality of memory accesses.
Most of the work the GPU does while rasterizing has a peculiarity: when a shader program runs, especially the Pixel Shader, the data of the pixels and triangles being processed is shared with their closest neighbors in the scene.
So there is a good chance that if the GPU fetches the data of a group of triangles and pixels, pulling everything nearby in memory into its caches, it will already have the data for the neighboring pixels and triangles. Changes must therefore be made so that ray tracing can exploit this feature, which is common to all GPUs.
The spatial data structure
To speed up ray tracing, a spatial data structure is built; this structure is nothing more than an ordered map of the positions of the objects in the scene.
The scene is converted into a sort of cube with several subdivisions that indicate where the objects are located, and there are two types of subdivision:
- The scene is divided into regular blocks of space.
- The scene is divided only where there is geometry or other elements.
In games, the second type was chosen, in the form of the BVH (Bounding Volume Hierarchy), not least because NVIDIA has dedicated hardware in its GPUs to traverse this data tree quickly. There are in turn two types of BVH:
- Static BVHs need to be rebuilt entirely after we edit an object in the scene, but once built they speed up the rendering of the scene.
- Dynamic BVHs allow objects to be updated individually, so rebuilding the BVH takes much less time, but in return the subsequent rendering time increases.
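To make the structure concrete, here is a minimal static BVH in Python: a median-split build over axis-aligned bounding boxes, plus a point query that skips branches whose box misses the point. This is an illustrative sketch, not how GPU drivers actually build their trees:

```python
from dataclasses import dataclass

@dataclass
class Node:
    lo: tuple               # min corner of axis-aligned bounding box
    hi: tuple               # max corner
    items: list = None      # leaf only: indices into the object list
    left: 'Node' = None
    right: 'Node' = None

def bounds(objs, idxs):
    lo = tuple(min(objs[i][0][a] for i in idxs) for a in range(3))
    hi = tuple(max(objs[i][1][a] for i in idxs) for a in range(3))
    return lo, hi

def build(objs, idxs, leaf_size=2):
    """Median-split BVH: wrap the objects in a box, split them along
    the box's longest axis, and recurse until leaves are small."""
    lo, hi = bounds(objs, idxs)
    if len(idxs) <= leaf_size:
        return Node(lo, hi, items=list(idxs))
    axis = max(range(3), key=lambda a: hi[a] - lo[a])
    idxs = sorted(idxs, key=lambda i: objs[i][0][axis] + objs[i][1][axis])
    mid = len(idxs) // 2
    return Node(lo, hi, left=build(objs, idxs[:mid], leaf_size),
                right=build(objs, idxs[mid:], leaf_size))

def query(objs, node, p):
    """Walk the tree, skipping any branch whose box misses point p,
    and return the indices of objects whose own box contains p."""
    if any(p[a] < node.lo[a] or p[a] > node.hi[a] for a in range(3)):
        return []
    if node.items is not None:
        return [i for i in node.items
                if all(objs[i][0][a] <= p[a] <= objs[i][1][a]
                       for a in range(3))]
    return query(objs, node.left, p) + query(objs, node.right, p)

# Four unit boxes spaced along the x axis, given as (min, max) corners.
objs = [((0, 0, 0), (1, 1, 1)), ((2, 0, 0), (3, 1, 1)),
        ((4, 0, 0), (5, 1, 1)), ((6, 0, 0), (7, 1, 1))]
root = build(objs, list(range(4)))
```

Editing any object would force a full `build` here, which is exactly the static-BVH trade-off described above.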
And why is that important? If we want to sort the rays according to their trajectory through the scene, we must first have a map of that scene in which to store the rays' trajectories.
Ray trajectory mapping
One solution is to pre-traverse the scene with the rays, without modifying it, just to find out which objects the different rays will affect and which rays will cross the scene. Once the pre-traversal is complete, the various rays that affect a given part of the scene are stored in a buffer, although they are not related to each other.
Although there is no direct relationship between the different rays in the same place, there is a spatial relationship, which makes it possible to exploit the common architecture of all GPUs when rendering a scene with incoherent rays. The idea is to pre-render the scene without running the shaders that compute the color values of the different objects; we are only interested in knowing which parts of the scene each ray will affect.
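The buffering idea can be sketched as follows; the coarse grid over the scene and the `hits` map are assumptions made for illustration, since the article does not fix a concrete layout:

```python
from collections import defaultdict

def bin_rays_by_hit(hits, grid_res, scene_min, scene_size):
    """Pre-traversal bookkeeping: 'hits' maps ray id -> hit point.
    Each ray is filed into the buffer of the grid cell its hit point
    falls in; rays sharing a buffer have no relationship except the
    spatial one, which is exactly what the caches can exploit."""
    buffers = defaultdict(list)
    cell = scene_size / grid_res
    for ray_id, p in hits.items():
        key = tuple(min(int((c - lo) / cell), grid_res - 1)
                    for c, lo in zip(p, scene_min))
        buffers[key].append(ray_id)
    return buffers

# Hypothetical pre-traversal result: where four rays hit the scene.
hits = {0: (0.5, 0.5, 0.5), 1: (9.5, 9.5, 9.5),
        2: (0.7, 0.4, 0.6), 3: (9.1, 9.8, 9.3)}
buffers = bin_rays_by_hit(hits, grid_res=4,
                          scene_min=(0, 0, 0), scene_size=10.0)
```

Rays 0 and 2 end up in one buffer and rays 1 and 3 in another, so the shading pass can process each region's rays together.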
Pre-traversing the scene with rays
The rays in this pre-pass through the scene execute only one shader, the Ray Generation Shader, which indicates that an object in the scene can generate an indirect light ray. As for the rays themselves, they carry a series of parameters to keep them from bouncing around the scene forever like ping-pong balls.
To do this, a series of parameters must be associated with the rays and the objects, which would be the following:
- A constant for the number of bounces a ray can make in the scene; once that number of bounces has been reached, regardless of any other condition, the ray stops bouncing.
- A constant in each material, ranging from 0 to 1; at each intersection, the energy value of the ray is multiplied by this constant, and when a ray's energy falls low enough, it is discarded.
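The two termination rules above can be sketched in a few lines. The cap and threshold values are hypothetical, as are the per-material constants:

```python
MAX_BOUNCES = 4       # hard cap on bounces, as described above
MIN_ENERGY = 0.05     # below this energy, the ray is discarded

def trace_energy(material_constants):
    """Follow one ray's energy through a sequence of hits, where
    'material_constants' are the 0..1 per-material constants of the
    surfaces it strikes, in order. Returns the number of bounces
    actually performed before one of the two rules stopped it."""
    energy = 1.0
    for bounce, k in enumerate(material_constants):
        if bounce >= MAX_BOUNCES:
            return bounce       # bounce-count rule kicks in
        energy *= k             # attenuate at each intersection
        if energy < MIN_ENERGY:
            return bounce + 1   # energy rule: ray is discarded here
    return min(len(material_constants), MAX_BOUNCES)

# A ray hitting dark surfaces (k = 0.2) dies of low energy quickly,
# while a ray hitting near-mirrors (k = 0.95) hits the bounce cap.
dark = trace_energy([0.2, 0.2, 0.2, 0.2, 0.2])
mirror = trace_energy([0.95] * 10)
```

Here `dark` is 2 (energy falls to 0.04 on the second hit) and `mirror` is 4 (the cap), showing how the two parameters bound the work per ray.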
With this we can already bounce the rays through the scene in a preliminary pass, which helps order the data, because it tells us which parts of the scene the different rays will affect. This will speed up performance significantly, but it requires two hardware changes.
On-board memory to store the spatial data structure
All that remains is to store the entire spatial data structure, along with the pre-traversal data, in a memory as close to the processor as possible. This data structure cannot be stored in caches limited to a few megabytes; not even the Infinity Cache, despite its 128 MB, would be able to hold such a large amount of data.
What is needed is a way to place as much memory as possible near the GPU to store the entire spatial data structure. This memory would not be a cache and would not be part of the processor's memory hierarchy; it would simply hold the entire spatial data structure.
One way to achieve this would be to use SRAM stacked vertically on the GPU, though its implementation could bring further additions that take advantage of it in future GPUs. There are other ways to do it as well; it could even take the form of a new high-density last-level cache.
The next fixed-function units
There will be two, and they will be crucial to increasing performance:
- The first will be responsible for generating the spatial data structure from the position of the geometry in the scene.
- The second will record where each ray hits during the pre-traversal, before ray tracing is applied.
Both units will take advantage of the huge on-board memory that GPUs will include to store the scene's spatial data structure. Thanks to them, we will see a big increase in ray tracing performance.
These units can already be found in hardware such as Imagination's PowerVR Wizard, in the form of the Scene Hierarchy Generator and the Coherency Engine. Their usefulness has been more than demonstrated, though not in extremely complex environments, where the implementation of on-board memory will be necessary.