With the arrival of the next-generation consoles built around AMD RX 6000 series GPUs, both graphics card manufacturers finally ship hardware capable of accelerating AI workloads. Their approaches, however, are different: while NVIDIA tries to tie developers to libraries designed exclusively for its hardware, AMD has chosen not to build its own tools and instead relies on Microsoft's DirectML API.
Looking at the two next-generation consoles, the PlayStation 5 and the Xbox Series X, it might seem obvious that AMD will come out ahead in the use of artificial intelligence in games, but we must start from the fact that DirectML is not designed for specific hardware: it is a fully hardware-agnostic platform.
DirectML works on any kind of processor
DirectML is built on the idea that any type of command can be executed on any type of processor, but not all processors perform equally well: some types of hardware will run these algorithms much faster than others.
The fastest type of unit is the ASIC: neural processing units (NPUs) whose ALUs are arranged as systolic arrays and are designed to execute these algorithms very quickly. Examples of this type of unit are:
- NVIDIA's RTX Tensor Cores
- The NPUs found in smartphone SoCs
The second type of unit, FPGAs, can be configured to behave like ASICs, but because of their large die area and low clock speeds they do not perform as well.
The third type is the GPU. GPUs have no specialized units for this; instead, AI algorithms are written as compute shader programs. They do not perform as well as FPGAs or ASICs, but they outperform the CPU on this kind of workload.
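To see why systolic arrays are so much faster at this kind of work, here is a toy cycle-count model in Python. The numbers are illustrative only, not real hardware timings; the model assumes one multiply-accumulate per cycle per cell and uses the classic latency figure for an output-stationary systolic array.

```python
# Toy comparison (illustrative, not real hardware timings): cycles needed
# for an NxN matrix multiply on a scalar ALU vs. an NxN systolic array.
# Once its pipeline fills, every cell of the systolic array does useful
# work every cycle, so the whole product streams out in O(N) cycles.

def scalar_cycles(n):
    # One multiply-accumulate per cycle: n^3 MACs in total.
    return n ** 3

def systolic_cycles(n):
    # Classic result for an n x n systolic array: the full product is
    # ready after roughly 3n - 2 cycles (pipeline fill + drain).
    return 3 * n - 2

n = 16
print(scalar_cycles(n))    # 4096 MAC cycles on a scalar unit
print(systolic_cycles(n))  # 46 cycles on a 16x16 systolic array
```

The gap widens cubically versus linearly as the matrices grow, which is why dedicated NPUs dominate this workload.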
DirectML is designed to use an ASIC when one is present in the system; if none is available it falls back to the GPU, and as a last resort to the CPU. Libraries such as NVIDIA's cuDNN, on the other hand, only work with NVIDIA GPUs and their Tensor Cores, ignoring any other type of unit in the system.
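The fallback order described above can be sketched in a few lines. This is a hypothetical illustration only; the function and unit names are invented and are not part of the real DirectML API.

```python
# Hypothetical sketch of the fallback order described above.
# The names are invented for illustration, not real DirectML calls.

def pick_processor(available):
    # Prefer a dedicated AI ASIC/NPU, then the GPU, then the CPU.
    for unit in ("npu", "gpu", "cpu"):
        if unit in available:
            return unit
    raise RuntimeError("no usable processor found")

print(pick_processor({"cpu", "gpu", "npu"}))  # → npu
print(pick_processor({"cpu", "gpu"}))         # → gpu
print(pick_processor({"cpu"}))                # → cpu
```

The contrast with cuDNN is that the list would contain a single entry: if no NVIDIA GPU is present, there is nothing to fall back to.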
Super Resolution
Super Resolution is the use of an artificial intelligence algorithm to produce a higher-resolution version of a given image. Its advantage is that it can raise the output resolution of a game without rendering it traditionally, and with fewer resources. Super Resolution only makes sense when the time needed to render the frame traditionally at the target resolution is greater than the time needed to render it at a lower resolution plus the time taken by the scaling algorithm.
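That timing condition can be written down directly. The millisecond figures in the example are hypothetical, chosen only to show both outcomes:

```python
# Sketch of the timing condition stated above: super-resolution only
# wins when native-resolution rendering costs more than low-resolution
# rendering plus the upscaling pass. All times are hypothetical.

def super_resolution_wins(t_native_ms, t_low_res_ms, t_upscale_ms):
    return t_native_ms > t_low_res_ms + t_upscale_ms

# e.g. 16 ms at native 4K vs. 7 ms at 1080p plus 1.5 ms of AI upscaling
print(super_resolution_wins(16.0, 7.0, 1.5))  # → True
print(super_resolution_wins(8.0, 7.0, 1.5))   # → False
```

The second call shows the failure case: when native rendering is already cheap, the upscaling pass is pure overhead.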
Remember that there are two types of super-resolution algorithms:
- The first type is used in films.
- The second type is what we have seen with NVIDIA's DLSS: in real-time games there is no pre-existing full-resolution version of the image in memory, so it has to be generated on the fly, and the processor running the algorithm has only a few milliseconds to do it. It should be noted that what DLSS does is not exclusive to NVIDIA, and anyone can build an equivalent.
The first type is much easier to train, because a higher-resolution version of the film already exists for the AI to compare against during training. In a video game it is different: each frame has never existed before, so the training involved is far more complex and requires constant supervision; this is why NVIDIA has to use its enormous Saturn-V supercomputer to train its AI.
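The difference in how training pairs are built can be sketched for the film case. This is a hypothetical illustration, not any vendor's pipeline: for film, the high-resolution ground truth already exists and only needs downscaling to produce the network's input, whereas for a game every ground-truth frame must first be rendered, which is what makes that pipeline so expensive.

```python
# Hypothetical sketch: building a (low-res input, high-res target)
# training pair for the film case, where the master already exists.

def downscale_2x(img):
    # Average each 2x2 block of pixels into one pixel.
    out = []
    for r in range(0, len(img), 2):
        row = []
        for c in range(0, len(img[r]), 2):
            block = (img[r][c] + img[r][c + 1] +
                     img[r + 1][c] + img[r + 1][c + 1])
            row.append(block / 4)
        out.append(row)
    return out

def film_pair(master_frame):
    # Ground truth is free: the master itself is the training target.
    return downscale_2x(master_frame), master_frame

master = [[0, 4, 8, 8],
          [4, 8, 8, 8],
          [8, 8, 4, 4],
          [8, 8, 4, 0]]
low, high = film_pair(master)
print(low)  # → [[4.0, 8.0], [8.0, 3.0]]
```

For a game there is no `master_frame` to start from; a renderer must produce it at full quality first, for every frame in the training set.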
Another problem arises once the algorithm has been obtained through training. In the case of DLSS 2.0, an algorithm of the second type, the Tensor Cores take about 1.5 ms to carry out the whole process, which means enormous computing power is needed to reach that speed; hence the large number of TFLOPS packed into the Tensor Cores.
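The 1.5 ms figure only makes sense measured against the frame budget. A quick back-of-the-envelope check (the frame rates are examples, not measurements):

```python
# What fraction of a frame does a fixed-cost upscaling pass consume?
# Frame rates here are examples, not measured figures.

def upscale_share(fps, t_upscale_ms):
    frame_budget_ms = 1000.0 / fps
    return t_upscale_ms / frame_budget_ms

print(round(upscale_share(60.0, 1.5), 3))   # → 0.09  (~9% of a 60 fps frame)
print(round(upscale_share(144.0, 1.5), 3))  # → 0.216 (~22% of a 144 fps frame)
```

The faster the target frame rate, the larger the slice the upscaler eats, which is why the pass has to be so heavily accelerated.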
In DirectML an algorithm of the same style can be implemented, but bear in mind which unit runs it: if the GPU itself has to execute the algorithm, it must finish rendering the lower-resolution scene early enough to leave itself time to complete the scaling within the frame.
We may soon see rivals to DLSS 2.0 based on DirectML, but it remains to be seen whether AMD's GPUs are fast enough to take on NVIDIA. NVIDIA's Tensor Cores can perform FP16 and INT8 calculations at a 4:1 ratio compared with an AMD GPU of similar specifications. Not surprisingly, DirectML was first demonstrated running on the Tensor Cores of NVIDIA's Volta.
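What that 4:1 ratio means in practice can be shown with some simple arithmetic. The workload size and TFLOPS figures below are placeholders, not real product specifications; they are chosen only to show how the ratio translates into frame time.

```python
# Hypothetical arithmetic: time to run a fixed-size upscaling workload
# at two throughput levels differing by the 4:1 ratio mentioned above.
# The 0.03 TFLOP workload and TFLOPS figures are made-up placeholders.

def time_ms(workload_tflop, tflops):
    return workload_tflop / tflops * 1000.0

print(round(time_ms(0.03, 80.0), 3))  # → 0.375 (ms at 4x throughput)
print(round(time_ms(0.03, 20.0), 3))  # → 1.5   (ms at baseline throughput)
```

The same network that fits comfortably inside a frame on one unit can blow the budget on hardware with a quarter of the throughput.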
It should be noted that these algorithms do not create a native 4K image; instead they infer the value of each pixel, and there is a margin of error that can make the result diverge from what is expected. That is why support for this technique does not work out of the box and is limited to certain games; but where the AI produces images very close to native 4K, the saving is well worth it.
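To make the idea of a per-pixel error margin concrete, here is a toy comparison between a crude non-AI upscaler and a hypothetical reference image. All pixel values are invented; an AI upscaler's whole job is to drive this kind of error toward zero.

```python
# Toy illustration of the per-pixel error margin: a crude (non-AI)
# upscaler is measured against a known reference. Values are made up.

def nearest_upscale_2x(img):
    # Duplicate each pixel into a 2x2 block: the crudest upscaler.
    out = []
    for row in img:
        wide = [p for p in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def mean_abs_error(a, b):
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    return sum(abs(x - y) for x, y in zip(flat_a, flat_b)) / len(flat_a)

low = [[0, 10], [10, 0]]
reference = [[0, 5, 10, 10],     # hypothetical "native" ground truth
             [5, 5, 10, 10],
             [10, 10, 5, 5],
             [10, 10, 5, 0]]
upscaled = nearest_upscale_2x(low)
print(mean_abs_error(upscaled, reference))  # → 1.875
```

A trained network would replace `nearest_upscale_2x` and be judged by exactly this kind of metric; when its error is small enough, the output is hard to tell apart from native rendering.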