NVIDIA Lovelace, possible technical specifications

First of all, take this with a good deal of skepticism. Although the leaker was correct at the time about the specs of the RTX 3000 series and the NVIDIA A100, it is also possible that he has been fed disinformation by NVIDIA in order to protect its plans for the next generation of GeForce. In any case, we will report and comment on any changes that come to light.

Remember that, according to rumors, Lovelace is NVIDIA's next architecture, expected in 2022 at the earliest and built on a 5nm node. It is not known whether that node will come from TSMC or Samsung, although everything points to the former. Even on this point it is worth staying skeptical, because nobody expected the RTX 3000 series to use Samsung's 8nm node.

NVIDIA's possible dramatic leap from Ampere to Lovelace

According to the insider kopite7kimi on Twitter, the AD102 chip of the GeForce Lovelace architecture will have a 12 * 6 structure instead of the 7 * 6 of the GA102. If that sounds like gibberish, it is easy to explain using the layout of the GA102, the GPU used in everything from the RTX 3080 to the RTX 3090, including all the variants in between.

The NVIDIA GA102 has 7 GPCs, each GPC has 6 units called TPCs, and each TPC includes 2 SMs. So what is kopite7kimi's information referring to? That we would have a total of 12 GPCs with 6 TPCs each, which would indicate that NVIDIA is trying to fit as many units as possible into the chip area for the next generation. There are, however, elements that make us skeptical.
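The arithmetic behind these figures can be sketched in a few lines. This is only an illustration: the `GpuLayout` class and its field names are made up for the example, and the AD102 numbers assume the rumor is accurate and that Lovelace keeps 2 SMs per TPC like the GA102.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    """Hypothetical helper describing a GPU as GPCs x TPCs x SMs."""
    name: str
    gpcs: int
    tpcs_per_gpc: int
    sms_per_tpc: int = 2  # GA102 value; ASSUMED unchanged for the rumored AD102

    @property
    def total_sms(self) -> int:
        # Total SMs is simply the product of the three levels of the hierarchy.
        return self.gpcs * self.tpcs_per_gpc * self.sms_per_tpc


ga102 = GpuLayout("GA102", gpcs=7, tpcs_per_gpc=6)
ad102 = GpuLayout("AD102 (rumored)", gpcs=12, tpcs_per_gpc=6)

for gpu in (ga102, ad102):
    print(f"{gpu.name}: {gpu.total_sms} SMs")

# Relative growth in SM count from one generation to the next.
increase = ad102.total_sms / ga102.total_sms - 1
print(f"Increase: {increase:.0%}")
```

Under those assumptions the full GA102 comes out at 84 SMs and the rumored AD102 at 144 SMs, an increase of roughly 71% in a single generation.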

The main reason is that NVIDIA does not usually try to pack as many units as possible into the processor; what it looks for instead is to increase the capability of each one. The company is currently following a roadmap toward full ray tracing support and, despite the power of the RTX 3000 series, we still have not seen support for things like ray tracing coherence or even data-structure acceleration units.

NVIDIA Lovelace would need a new interconnect structure

kopite7kimi's specs imply a very large improvement in the interconnect through which the GPCs communicate with each other and with what they are attached to, such as the L2 cache. One of the reasons a processor's core count is not increased indiscriminately is the amount of power required as the number of interconnects grows, which is why fewer but more powerful cores are usually preferred.

It is not known whether NVIDIA has managed to improve the interconnect that links the GPCs with the L2 cache and with each other enough to allow such a dramatic leap, taking this GPU's configuration from 84 SMs to 144 SMs in a single generation. That would be the biggest leap NVIDIA has ever made in this regard.

The other possibility is that the 5nm node does not allow clock speed increases as large as expected, which forces NVIDIA to raise the number of SMs in the GPU instead. That, however, would require NVIDIA to redo the entire interconnect structure inside the GPU and to overcome the handicaps it faces in terms of power consumption when moving data within a processor.

In any case, we are skeptical. We can believe that NVIDIA is making profound changes to the SMs, and there is a list of things it could do, but an increase of this size in the number of SMs seems to us too exaggerated a leap without changes to the rest of the GPU.

NVIDIA Lovelace, not so monolithic?

A year ago, NVIDIA introduced an experimental chip called RC-18. Its most notable feature, in my view, is GRS, or Ground-Referenced Signaling, a type of vertical interconnect that NVIDIA used to let multiple chips communicate within an MCM.

The idea is that each chip has 4 complete communication channels (North, East, South, West), each with its own transmitter and receiver, so we would be talking about a NoC-type configuration in which each element can communicate directly with the 4 around it.
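As a rough illustration of that four-channel arrangement, here is a minimal sketch of neighbor lookup in a 2D mesh. The function name and the 3 x 4 grid are hypothetical and chosen only for the example; nothing here reflects how RC-18 or any NVIDIA product is actually wired.

```python
def mesh_neighbors(row: int, col: int, rows: int, cols: int) -> dict:
    """Return the directly reachable neighbors of a node in a 2D mesh.

    Each node has up to four links (north, east, south, west); nodes on
    the edge of the grid simply have fewer usable links.
    """
    candidates = {
        "north": (row - 1, col),
        "east": (row, col + 1),
        "south": (row + 1, col),
        "west": (row, col - 1),
    }
    # Keep only the coordinates that fall inside the grid.
    return {
        direction: (r, c)
        for direction, (r, c) in candidates.items()
        if 0 <= r < rows and 0 <= c < cols
    }


# Hypothetical 3 x 4 grid of chips, used purely for illustration.
ROWS, COLS = 3, 4
inner = mesh_neighbors(1, 1, ROWS, COLS)   # interior node: four links
corner = mesh_neighbors(0, 0, ROWS, COLS)  # corner node: two links
print(inner)
print(corner)
```

An interior node reaches all four neighbors in one hop, while a corner node only has its east and south links, which is why traffic between distant nodes in a mesh has to pass through intermediate ones.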

In NVIDIA's example several chips were used, but that does not mean the same cannot be done with a seemingly monolithic GPU. Because GRS communicates through an interposer underneath, the GPU would look like a monolithic chip from the outside while actually being a 3DIC composition in which the interconnect structure sits on a chip placed at the bottom.

VRAM beyond GDDR6X

What we cannot forget about these specs is that this monster of a GPU is going to require some type of high-bandwidth VRAM in order to be fed at a constant rate, and we do not see even GDDR6X feeding a monster with these characteristics. Are we going to see NVIDIA bring its FG-DRAM memory to market for the first time with this GPU?
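To put a rough number on that concern, the sketch below scales the memory bandwidth of the RTX 3090 by SM count. The RTX 3090 figures (384-bit bus, 19.5 Gbps GDDR6X, 82 active SMs) are its published specs; everything about the Lovelace side is an assumption: a full 144-SM chip, the same bus width, and the same bandwidth per SM as today, which ignores clock speeds, cache changes and compression.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8


# RTX 3090: 384-bit bus, 19.5 Gbps GDDR6X, 82 of the GA102's 84 SMs enabled.
rtx3090_bw = memory_bandwidth_gbs(384, 19.5)
rtx3090_sms = 82

# ASSUMPTION: a 144-SM GPU that needs the same bandwidth per SM.
rumored_sms = 144
needed_bw = rtx3090_bw * rumored_sms / rtx3090_sms

# ASSUMPTION: the bus stays 384 bits wide, so only the per-pin rate can grow.
needed_rate = needed_bw * 8 / 384

print(f"RTX 3090 bandwidth: {rtx3090_bw:.0f} GB/s")
print(f"Bandwidth needed at {rumored_sms} SMs: {needed_bw:.0f} GB/s")
print(f"Per-pin data rate needed on a 384-bit bus: {needed_rate:.1f} Gbps")
```

Under these simplifications the GPU would need a little over 1.6 TB/s, or about 34 Gbps per pin on a 384-bit bus, well beyond the 19.5 Gbps of the GDDR6X used on the RTX 3090.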

The problem with FG-DRAM is that it is a type of memory designed to work like HBM, so it would be very expensive. Another possibility would be that, as with RTX IO, data is decompressed on the fly, in this case for VRAM, but we are talking about bandwidths 50 times higher, and decompression at that speed may not be achievable in real time.

This is why NVIDIA Lovelace might come with a new type of VRAM, be it FG-DRAM or something else. We do not think it will, but GDDR6X falls short in front of such a monster if the leaked technical specifications are true.