GPU virtualization: how it works and how to use it

Before the arrival of the personal computer, people worked at terminals connected to a central minicomputer, which was in charge of all the tasks. With the advent of cloud computing, where many operating systems run in virtual machines on the same hardware, virtualization is needed not only on processors but also on GPUs.

Why is virtualization necessary on GPUs?

Virtualization is the ability of hardware to present multiple versions of itself so that it can be used by several virtual machines. For example, once a CPU has been virtualized, an operating system running in one virtual machine sees part of that CPU, while other virtual machines see other parts of it, each as a single, separate CPU. In this case we have virtualized the processor, because we have created a virtual version of it for each operating system running on the system.
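A minimal sketch of the CPU case, assuming a hypothetical 8-core host statically divided among three VMs with the illustrative helper `partition_cores`: each VM gets a disjoint set of cores that its guest OS treats as a whole CPU. Real hypervisors usually schedule virtual CPUs onto physical cores dynamically rather than pinning them like this.

```python
def partition_cores(total_cores: int, vm_names: list[str]) -> dict[str, list[int]]:
    """Split physical core IDs into disjoint, contiguous slices, one per VM."""
    if not vm_names or total_cores < len(vm_names):
        raise ValueError("need at least one core per VM")
    per_vm = total_cores // len(vm_names)  # leftover cores stay with the host
    return {
        name: list(range(i * per_vm, (i + 1) * per_vm))
        for i, name in enumerate(vm_names)
    }


layout = partition_cores(8, ["vm0", "vm1", "vm2"])
for vm, cores in layout.items():
    # Each guest OS only ever sees its own slice, as if it were a full CPU.
    print(vm, "sees a CPU with", len(cores), "cores:", cores)
```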

In the case of GPUs, the command lists for graphics and compute are written to specific regions of memory, and in general the GPUs we install in our PCs are designed to work under a single operating system, with no virtual machine in between.

This is not the case with graphics cards for data centers, which usually run several instances of an operating system in different virtual machines, and each customer needs access to the GPU. This is where GPU virtualization is needed.

GPU-side virtualization

GPUs also need hardware-level changes to support virtualization. For this reason, graphics cards with this capability are usually sold only outside the desktop market and carry a much higher price tag than desktop GPUs. Today you can rent GPU power in the cloud to speed up scientific workloads, to render scenes for a movie or series remotely, and so on.

SR-IOV

Each PCI Express device is assigned a unique address range in the PC's memory map. This means that in a virtualized environment, hardware connected to these ports, such as a graphics card, cannot be accessed by more than one machine at a time. Usually, the virtual machines we run on desktop PCs cannot use the graphics card, which remains accessible only to the host operating system.

The solution to this is SR-IOV (Single Root I/O Virtualization), which lets a single PCI Express device present itself as several: a physical function plus multiple virtual functions, each of which can be assigned to a different virtual machine, so that several VMs can access the device simultaneously. In a PC, the CPU communicates with peripherals by reading and writing specific memory addresses. Although today these accesses go through virtual rather than physical addresses, the inviolable rule still holds: the contents of a memory address cannot be manipulated by two clients at the same time, because that would create conflicts over the data it holds. SR-IOV respects this rule by giving each virtual function its own address range.
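The arithmetic below follows the layout rule in the PCI-SIG SR-IOV specification: virtual function n gets routing ID PF + First VF Offset + (n - 1) x VF Stride, and its BAR window starts (n - 1) window sizes after the VF BAR base. The PF address, offset, stride and BAR values are made-up example numbers, and `vf_layout` and `rid_to_bdf` are hypothetical helpers. It shows how every VF ends up with its own, non-overlapping addresses.

```python
from dataclasses import dataclass


@dataclass
class VirtualFunction:
    index: int       # VF number; the SR-IOV spec numbers VFs from 1
    routing_id: int  # 16-bit bus/device/function identifier
    bar_base: int    # first address of this VF's private MMIO window
    bar_size: int    # size of that window in bytes


def vf_layout(pf_rid, first_vf_offset, vf_stride, num_vfs, vf_bar_base, vf_bar_size):
    """Compute each VF's routing ID and BAR window from the PF's SR-IOV settings."""
    vfs = []
    for n in range(1, num_vfs + 1):
        rid = pf_rid + first_vf_offset + (n - 1) * vf_stride
        # The VF BAR describes one window per VF, laid out back to back.
        base = vf_bar_base + (n - 1) * vf_bar_size
        vfs.append(VirtualFunction(n, rid, base, vf_bar_size))
    return vfs


def rid_to_bdf(rid):
    """Format a routing ID as bus:device.function, e.g. 0x0304 -> '03:00.4'."""
    return f"{rid >> 8:02x}:{(rid >> 3) & 0x1F:02x}.{rid & 0x7}"


# Example values (made up): PF at 03:00.0, 4 VFs with 16 MiB windows each.
vfs = vf_layout(0x0300, 4, 1, 4, 0xF000_0000, 16 * 1024 * 1024)
for vf in vfs:
    end = vf.bar_base + vf.bar_size - 1
    print(f"VF{vf.index} {rid_to_bdf(vf.routing_id)} BAR {vf.bar_base:#010x}-{end:#010x}")
```

On Linux, the VFs of an SR-IOV capable device are typically enabled by writing the desired count to the device's `sriov_numvfs` file in sysfs; whether a given GPU and driver support this varies.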

For SR-IOV to work, the PCI Express device, in our case the graphics card, needs a built-in network controller that receives the requests from the different virtual machines that need to access its resources.

Modifications to DMA units for GPU virtualization

The first change is in the DMA units. The GPUs we use in PCs usually have two of them, which give the GPU access to system RAM (not to be confused with VRAM) through a separate channel. In every frame, the GPU needs to read the display list stored in a region of system RAM, or to copy data from RAM to VRAM if it will need that data later. One DMA unit is used for each direction. What about GPUs with virtualization? They use several DMA units in parallel, or a single DMA unit with multiple simultaneous access channels.
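The multi-channel idea can be sketched as one queue per virtual GPU served in round-robin order. This is a toy model under stated assumptions: `MultiChannelDMA`, its methods and the transfer sizes are hypothetical, and real DMA engines arbitrate in hardware with their own policies.

```python
from collections import deque


class MultiChannelDMA:
    """Toy DMA engine with one queue (channel) per virtual GPU.

    Channels are served round-robin, so a virtual GPU with a long queue
    cannot starve the others.
    """

    def __init__(self, vgpu_ids):
        self.channels = {v: deque() for v in vgpu_ids}
        self.order = deque(vgpu_ids)

    def submit(self, vgpu, direction, nbytes):
        # direction is "RAM->VRAM" (upload) or "VRAM->RAM" (readback)
        self.channels[vgpu].append((direction, nbytes))

    def step(self):
        """Take one transfer from the next channel with pending work; None when idle."""
        for _ in range(len(self.order)):
            vgpu = self.order[0]
            self.order.rotate(-1)  # move this channel to the back of the line
            if self.channels[vgpu]:
                direction, nbytes = self.channels[vgpu].popleft()
                return vgpu, direction, nbytes
        return None


dma = MultiChannelDMA(["vgpu0", "vgpu1"])
for size in (4096, 8192, 4096):
    dma.submit("vgpu0", "RAM->VRAM", size)
dma.submit("vgpu1", "VRAM->RAM", 2048)
served = [t[0] for t in iter(dma.step, None)]
print(served)  # ['vgpu0', 'vgpu1', 'vgpu0', 'vgpu0']
```

Even though vgpu0 queued three transfers first, vgpu1's single transfer is served second rather than last.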

The use of the different channels by the virtual GPUs is managed by the integrated network controller, which handles their requests both to physical RAM and to other devices connected over PCI Express. So if the different virtual GPUs need to access, for example, an SSD, they do so through the DMA units.
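A toy model of that controller role, assuming a hypothetical table of address windows per virtual GPU (the names, addresses and the `route` helper are all invented for illustration): each request is checked against the windows its virtual GPU owns and is either routed to RAM or to a peer device, or rejected. On real platforms the IOMMU (Intel VT-d, AMD-Vi) also enforces this kind of per-VM isolation for DMA.

```python
from typing import NamedTuple


class Window(NamedTuple):
    start: int   # first address of the window
    end: int     # exclusive upper bound
    target: str  # where requests in this window are sent


# Hypothetical per-virtual-GPU address windows the controller allows.
ALLOWED = {
    "vgpu0": [Window(0x1000_0000, 0x2000_0000, "system RAM"),
              Window(0xE000_0000, 0xE000_4000, "NVMe SSD")],
    "vgpu1": [Window(0x2000_0000, 0x3000_0000, "system RAM")],
}


def route(vgpu: str, addr: int) -> str:
    """Return the target for a DMA request, or refuse it if the VM does not own the address."""
    for w in ALLOWED.get(vgpu, []):
        if w.start <= addr < w.end:
            return w.target
    raise PermissionError(f"{vgpu} may not access {addr:#x}")


print(route("vgpu0", 0xE000_0010))  # NVMe SSD
```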

GPU Command Processor Changes

The second change concerns the command processor. For pure compute work, with no graphics involved, GPUs are already used to working with several contexts at the same time, because compute command lists are very short and complete quickly. With graphics, things change completely, because usually only one command list is in use.
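To make the difference concrete, here is a minimal round-robin time-slicing sketch, assuming a hypothetical command processor that can preempt a command midway and resume it later. `time_slice`, the durations and the slice length are illustrative; real GPU schedulers and their preemption granularity vary by vendor and generation. A long command list from one VM is cut into slices so that another VM's short list still gets a turn.

```python
from collections import deque


def time_slice(contexts: dict[str, list[int]], slice_us: int) -> list[tuple[str, int]]:
    """Round-robin a command processor across per-VM contexts.

    contexts maps a VM name to the execution time (microseconds) of each
    command in its list. Each turn, a context runs for at most slice_us;
    an unfinished command is preempted and resumes on its next turn.
    Returns (vm, microseconds_run) for every slice, in order.
    """
    if slice_us <= 0:
        raise ValueError("slice_us must be positive")
    queue = deque((vm, deque(cmds)) for vm, cmds in contexts.items() if cmds)
    timeline = []
    while queue:
        vm, cmds = queue.popleft()
        budget, ran = slice_us, 0
        while cmds and budget > 0:
            step = min(cmds[0], budget)
            cmds[0] -= step
            budget -= step
            ran += step
            if cmds[0] == 0:
                cmds.popleft()  # command finished
        timeline.append((vm, ran))
        if cmds:
            queue.append((vm, cmds))  # still has work: back of the queue
    return timeline


print(time_slice({"vm_a": [300, 200], "vm_b": [100]}, 250))
# [('vm_a', 250), ('vm_b', 100), ('vm_a', 250)]
```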

What about non-virtualized GPUs that drive multiple displays? This is not the same as running multiple operating systems in virtual machines, since in that case the display lists all come from a single operating system, which tells the GPU which video output each image must be sent through.

Therefore, a special graphics command processor is needed, one that works together with the DMA units and the integrated network controller so that the GPU behaves not as a single GPU, but as several different virtual ones.

How GPU resources are distributed under virtualization

Contemporary GPUs are generally organized as blocks nested within other blocks. For example, the GA102 used in the NVIDIA RTX 3080 and 3090 is made up, at the top level, of several GPCs (Graphics Processing Clusters); each GPC contains several TPCs (Texture Processing Clusters), and each TPC contains 2 SMs (Streaming Multiprocessors).

Depending on how the manufacturer has designed the distribution of resources, each virtual machine may correspond to one GPC, so in the case of the GA102 the GPU would be split into 7 virtual GPUs. It is also possible to do this at the TPC level, in which case up to 42 VMs can be created on the full chip, but as you can imagine, each of them would be far less powerful.
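The GA102 arithmetic can be checked with a few lines, assuming the full-die configuration NVIDIA describes for GA102 (7 GPCs, 6 TPCs per GPC, 2 SMs per TPC); cut-down products such as the RTX 3080 disable some units, so their ceilings are lower. `partitions` is a hypothetical helper.

```python
# Assumed full GA102 layout; shipping cards enable fewer units.
GPCS = 7
TPCS_PER_GPC = 6
SMS_PER_TPC = 2


def partitions(level: str) -> tuple[int, int]:
    """Return (max number of virtual GPUs, SMs per virtual GPU) at a given granularity."""
    if level == "GPC":
        return GPCS, TPCS_PER_GPC * SMS_PER_TPC
    if level == "TPC":
        return GPCS * TPCS_PER_GPC, SMS_PER_TPC
    raise ValueError(f"unknown level: {level}")


for level in ("GPC", "TPC"):
    count, sms = partitions(level)
    print(f"{level}-level split: up to {count} VMs with {sms} SMs each")
# GPC-level split: up to 7 VMs with 12 SMs each
# TPC-level split: up to 42 VMs with 2 SMs each
```

The GPC split gives each VM six times as many SMs as the TPC split, which is the gap in power between the two approaches.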

For graphics workloads, what gets assigned to each virtual GPU is a full GPC on NVIDIA hardware, or a Shader Engine on AMD hardware, since each of these blocks contains all the components needed to work as a GPU on its own. When the virtualized GPU is used for computation rather than rendering, resources are distributed at the TPC level, or at the level of its AMD RDNA equivalent, the WGP (Workgroup Processor).