Starting at 7,199 yuan, Nvidia's RTX 40 series graphics cards are finally here: double the base performance, quadruple the ray tracing

This article is reproduced from Machine Heart (机器之心); edited by Zenan and Du Wei.

Who would have guessed that the top-end RTX 4090 would turn out to be the best value of the lineup?

Despite constant complaints about its prices, Nvidia is still the first name people look to for the strongest AI chips and high-end gaming GPUs. On the evening of September 20, the GTC conference was held online, and the much-anticipated RTX 40 series graphics cards were finally, officially released.

Slightly unusually for such an important keynote, this one was unpretentious and short. Jensen Huang simply stood in an empty metaverse scene and gave his speech.

The whole event wrapped up in an hour and a half.

At the event, Nvidia showed the latest developments across RTX, its AI chips, and the Omniverse product line, including how they are enabling new breakthroughs in artificial intelligence, along with a large number of applications.

Before anything else, Huang opened with RacerX, a fully interactive simulation environment built with Omniverse, full of physically modeled materials, ray tracing, smoke, and flames. Most important of all: "none of it is pre-rendered; it all runs on a single GPU."

Huang said that this kind of fully real-time rendering is what future games should look like.

Naturally, only the latest RTX 40 series graphics cards can drive RacerX. With AMD about to release its own new generation of GPUs, will Nvidia stay ahead this time? And if Nvidia's cards are indeed stronger, at what cost?

RTX 40 Series GPUs: Double the Performance, Quadruple the Ray Tracing

These are Nvidia's third-generation RTX graphics cards, built on the new Ada Lovelace architecture.

Having moved from Samsung to TSMC, the RTX 40 series GPUs use a customized 4N process and pack in 76 billion transistors, up from 28.3 billion in the previous-generation Ampere flagship.

Twenty-five years ago, Nvidia revolutionized computer 3D graphics with the introduction of GPUs with programmable shaders. In 2018, it launched the RTX architecture, whose new RT Cores accelerate real-time ray tracing and whose Tensor Cores handle matrix operations, to unprecedented effect. In the just-launched Ada Lovelace architecture, all three processor types have been improved:

1. The streaming multiprocessors (SMs) gain Shader Execution Reordering (SER), which reschedules work in real time, speeding ray tracing up by 2-3x; on the RTX 4090 they can output 90 TFLOPS of shader performance, twice the previous generation (a back-of-the-envelope check follows this list).

2. The third-generation RT Cores double ray-triangle intersection throughput. The new Opacity Micromap engine doubles geometry performance for alpha-tested content, and the Micro-Mesh engine increases geometric richness without adding BVH build time or storage cost.

3. The new fourth-generation Tensor Cores reach 1.4 petaFLOPS of compute, doubling AI performance.
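As that back-of-the-envelope check: peak FP32 shader throughput is conventionally estimated as CUDA cores × 2 FLOPs per clock (one fused multiply-add) × boost clock. Plugging in the RTX 4090's published figures lands in the same ballpark as the headline number:

```python
# Rough peak-FP32 estimate for the RTX 4090 (sanity check, not an official figure).
cuda_cores = 16384        # RTX 4090 CUDA core count
boost_clock_hz = 2.52e9   # quoted boost clock, 2.52 GHz
flops_per_core_clock = 2  # one fused multiply-add = 2 FLOPs

peak_tflops = cuda_cores * flops_per_core_clock * boost_clock_hz / 1e12
print(f"Peak FP32: {peak_tflops:.1f} TFLOPS")  # ~82.6 TFLOPS, close to the ~90 TFLOPS headline
```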

"Shader Execution Recording is a major innovation like the CPU out-of-order execution back then," Huang Renxun said. "Ray tracing is difficult to process in parallel, and GPU is highly parallel. SER improves efficiency by real-time rearranging shader loads, which can increase ray tracing performance by 2 to 3 times, and game performance by 25%."

Still, ray tracing was once described by David Kirk, Nvidia's former chief scientist, as a technology that "never arrives": doubling performance alone will not let a GPU hold high frame rates in today's big AAA games. That is where AI algorithms come in.

DLSS uses a convolutional autoencoder that generates high-resolution images from the low-resolution frames the GPU renders, greatly reducing the performance required. With the Ada architecture, Nvidia introduced DLSS 3, which not only upscales but also generates entire extra frames. DLSS 3 has four components: the new optical flow accelerator, game-engine motion vectors, a convolutional-autoencoder AI frame generator, and the Reflex low-latency pipeline.

DLSS 3 processes the current frame and the previous frame together. The optical flow accelerator gives the neural network the direction and speed of pixel motion; combined with the engine's geometry and pixel motion vectors, this is fed to the network to generate an intermediate frame (a toy sketch of the idea follows).
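To make the warping idea concrete, here is a toy NumPy sketch. It is emphatically not the DLSS 3 pipeline (which uses a hardware optical flow accelerator and a learned frame generator to resolve occlusions and shading changes); it simply forward-warps the previous frame halfway along per-pixel motion vectors:

```python
# Toy optical-flow frame generation: forward-warp the previous frame halfway
# along per-pixel motion vectors to synthesize a midpoint frame.
import numpy as np

def generate_midpoint_frame(prev_frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """prev_frame: (H, W, 3) image; flow: (H, W, 2) per-pixel (dy, dx) to the next frame."""
    h, w, _ = prev_frame.shape
    mid = np.zeros_like(prev_frame)
    ys, xs = np.mgrid[0:h, 0:w]
    # Each pixel travels half of its full displacement by the midpoint in time.
    ty = np.clip((ys + 0.5 * flow[..., 0]).round().astype(int), 0, h - 1)
    tx = np.clip((xs + 0.5 * flow[..., 1]).round().astype(int), 0, w - 1)
    mid[ty, tx] = prev_frame[ys, xs]  # naive splat; DLSS 3 uses AI to fill holes
    return mid

# Usage: a uniform rightward pan of 4 px/frame yields a ~2 px shift at the midpoint.
frame = np.random.rand(8, 8, 3)
flow = np.zeros((8, 8, 2)); flow[..., 1] = 4.0
middle = generate_midpoint_frame(frame, flow)
```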

"DLSS 3 generates a new frame without involving graphics pipeline processing, which can improve performance by up to 4 times compared to pure rendering," Huang Renxun said. "And games that are bottlenecked by either the CPU or the GPU can benefit from it."

Nvidia showed games such as Cyberpunk 2077 and Microsoft Flight Simulator running with DLSS 3.

Over the past four years, the amount of RTX computation has grown 16-fold. Now only some of the pixels are actually computed; most are inferred by AI.

In the demos, several games simply doubled their frame rates. It is worth noting, though, that DLSS 3 is tightly bound to the new hardware, so owners of 30- and 20-series cards will not get this boost.

Nvidia announced the headline hardware specs of the 40 series: the RTX 4090 uses the AD102 GPU with 16,384 CUDA cores and 24GB of GDDR6X memory at a default TDP of 450W; the RTX 4080 16GB has 9,728 CUDA cores at a 320W TDP; and the RTX 4080 12GB has 7,680 CUDA cores at a 285W TDP. Judging by power draw, the switch from Samsung 8nm to TSMC's 4N process has clearly improved efficiency: performance rises at each tier while power-supply requirements stay put.

How much faster than the previous generation? Between SER, a larger chip, and a boost clock raised from 1.7GHz to 2.52GHz, the RTX 4090 doubles the performance of the 3090 Ti, and in ray tracing performance quadruples (a rough throughput check follows). Huang said that at the same power, Ada delivers twice Ampere's performance.
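The doubling claim is roughly consistent with raw throughput scaling alone. Using the clock figures above and the core counts (16,384 for the 4090 from the spec paragraph; 10,752 for the 3090 Ti, taken from public spec sheets):

```python
# Rough sanity check: shader throughput scales with cores x clock.
# 3090 Ti core count is from public spec sheets; clocks are the article's figures.
scaling = (16384 * 2.52) / (10752 * 1.7)
print(f"Raw shader throughput scaling: {scaling:.2f}x")  # ~2.26x, before SER and memory effects
```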

Further down the stack, the RTX 4080 achieves twice the performance of the 3080 Ti with DLSS enabled.

Finally, pricing: the RTX 4090 Founders Edition is $1,599 and goes on sale October 12; the RTX 4080 16GB is $1,199, and the 12GB version is $899.

In short, the flagship's price has barely moved, while the 80-class card is up by $500. For users in China, RTX 40 series pricing looks like this: the 4090 starts at 12,999 yuan, the 4080 (16GB) at 9,499 yuan, and the 4080 (12GB) at 7,199 yuan.

Partner (non-reference) RTX 4090 cards will likely land around 15,000 yuan.

One thing is worth noting about this generation, though: the 4080 12GB looks very much like what would previously have been named a 4070 Ti.

NVIDIA Omniverse connects the 3D world

Beyond GPUs and AI, Nvidia is also a leader in the metaverse, and Huang introduced a series of Omniverse advances.

Omniverse is NVIDIA's platform for building and running metaverse applications that live where the digital and physical worlds meet. It is also a real-time, large-scale 3D database for building shareable 3D worlds. Above all, Omniverse is a computing platform: you write applications that run on it, and those applications become portals into the virtual world.

Today, Huang released a series of major updates to the Omniverse platform, which now supports Ada Lovelace GPUs and takes a huge leap in ray tracing and large-scene performance.

First up are new neural rendering tools based on GANs and diffusion models, plus OmniGraph, a graph execution engine that procedurally controls behavior, motion, and action.

Second is a major update to Omniverse Physics, which can now handle the motion of complex objects made of many connected parts.

Then there is the all-new CloudXR, which brings Ada's powerful ray tracing to VR, and the first SimReady material library for data generation and digital-twin simulation.

Replicator is one of the most popular Omniverse apps, used to generate synthetic data for training self-driving cars, robots, and all kinds of computer vision models (a toy illustration of the idea follows). Finally, there is the new Omniverse JT connector, which opens Omniverse to industry and manufacturing, where the JT 3D data format is widely used.
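The appeal of synthetic data is that ground-truth labels come for free. The sketch below is a miniature of that randomize-render-label loop in plain NumPy; it is not the actual Replicator API, which works inside Omniverse on USD scenes with physically based rendering:

```python
# Miniature synthetic-data generator (illustrative only, not Omniverse Replicator):
# draw a randomly sized, placed, and colored box, and emit its exact bounding box.
import numpy as np

rng = np.random.default_rng(0)

def make_sample(size: int = 64):
    img = np.zeros((size, size, 3), dtype=np.float32)
    w, h = rng.integers(8, 24, size=2)                  # random box size
    x, y = rng.integers(0, size - w), rng.integers(0, size - h)
    img[y:y + h, x:x + w] = rng.random(3)               # random color
    return img, (int(x), int(y), int(w), int(h))        # label needs no human annotation

dataset = [make_sample() for _ in range(1000)]          # instant labeled training set
```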

Suffice it to say, Omniverse is an enterprise platform spanning the entire product lifecycle, from design and styling to engineering, manufacturing, marketing, and operations. Just as the Internet connects websites, Omniverse connects 3D worlds.

Huang showed how companies are using Omniverse to create digital twins of factories, logistics warehouses, automated production lines, and industrial plants, walking through several such scenarios.

The Omniverse computing platform consists of three parts: RTX computers used by creators, designers, and engineers; OVX servers, which host the Nucleus database and run the virtual-world simulations; and NVIDIA GDN, the portal into Omniverse.

With GeForce Now, NVIDIA has built a global graphics delivery network (GDN) covering 100 regions. Just as content delivery networks (CDNs) made streaming Internet video efficient, GDN streams responsive, ultra-fast RTX graphics; combining NVIDIA RTX PCs with NVIDIA GPUs in the cloud, it forms a global Omniverse computing platform.

NVIDIA Omniverse Cloud is a software- and infrastructure-as-a-service suite for designing, publishing, and experiencing Omniverse applications anytime, anywhere, on any device. Huang showed how Rimac, a pioneer in supercars and advanced electric-vehicle technology, uses Omniverse Cloud to enable collaborative workflows for its 3D teams and deliver advanced 3D experiences to users.

Huang said NVIDIA Omniverse Cloud is an IaaS product that connects the cloud, on-premises systems, and individual devices to run Omniverse applications. Replicator and Farm can also run in the cloud, Farm being the engine that scales jobs such as rendering out across many machines. Replicator and Farm containers are already available on AWS.

Drive Thor, a new generation of autonomous driving chip

In autonomous driving, carmakers keep demanding more computing power, and each generation of Nvidia's products has had to double performance.

The rise of intelligent machines has set off a wave of AI, and deep learning has opened a new door for improving system capabilities, drastically changing everything from how software is developed to how it runs. That makes a new generation of processors imperative. NVIDIA Xavier was the world's first autonomous-driving chip designed for deep learning, and Nvidia has delivered a huge leap in processor performance every two years since.

At the same time, to broaden where autonomous driving can operate and to improve safety, both the number and the resolution of sensors keep growing, and ever more complex AI models are being introduced, all of which push Nvidia to keep raising performance.

In 2021, Nvidia announced Atlan, a 1,000 TOPS SoC. Today, Huang said its place has been taken by Thor, which has twice Atlan's throughput and more than twice the performance it can deliver. Reaching those targets rests on three pillars: Grace, Hopper, and Ada Lovelace. Hopper contributes the Transformer Engine for fast-evolving models such as ViT, Grace contributes its CPU, and Ada's multi-instance GPU capability helps consolidate onboard computing resources, cutting costs by hundreds of dollars.

Nvidia Drive Thor employs many new technologies and can be configured in multiple modes: its full 2,000 TOPS (and 2,000 teraFLOPS) can be devoted to the autonomous-driving pipeline, or it can be split, part for cockpit AI and infotainment and part for assisted driving. Thor's multi-compute-domain isolation lets concurrent, time-sensitive processes run without interference, so one computer can run Linux, QNX, and Android simultaneously.

Thor also consolidates many computing resources, lowering cost and power while leaping ahead in functionality. Today, a car's parking, active safety, driver monitoring, camera mirrors, instrument cluster, and infotainment are each controlled by a different computing device. In the future, these functions will no longer be handled by separate devices but by software running on Thor that improves over time.

The Thor chip is expected to be used in cars in 2025.

NVIDIA Drive is an end-to-end platform for developing and deploying self-driving cars. It includes Replicator synthetic-data generation, Drive Sim, and Drive Map on the development side, and full-stack driving and in-car AI applications, AI computers, and the Hyperion autonomous-vehicle reference architecture on the deployment side.

NVIDIA Drive received a series of feature updates, starting with an AI workflow called the Neural Reconstruction Engine, now a major feature of Drive Sim. It builds 3D scenes from recorded sensor data, which, once imported into Drive Sim, can be enhanced with human-created or AI-generated content. This video-to-3D-geometry workflow can also run on OVX systems.

Demo of the video-to-3D workflow.

Another important Drive Sim feature is hardware-in-the-loop, which means the entire in-vehicle software stack can run in the AI factory, and even the environment inside the car can be simulated. The car of the future will have not just a simple dashboard but surround displays blending digital and physical design, so automotive, software, and electronics engineers can collaborate in Drive Sim while all the real computers and software stacks run at once.

Drive Sim becomes a virtual design studio.

NVIDIA has also made strong progress on other parts of the end-to-end Drive system, such as Replicator synthetic-data generation, AI model improvements, Drive Map's fleet-built maps, and urban and highway driving and parking.

Drive Map's self-driving fleet map building.

New Micro Robot System-on-Module

Drive Orin, Nvidia's second-generation autonomous-driving computer chip, has been very successful so far, appearing in more than 40 cars, trucks, and robotaxis. Jetson, Nvidia's robotics computer, now has one million developers and is used by some 6,000 companies.

At today's GTC, Huang announced Jetson Orin Nano, a micro-robot system-on-module that is 80 times faster than the previous Jetson Nano. Jetson Orin Nano runs the NVIDIA Isaac robotics stack and supports the GPU-accelerated ROS 2 framework (a minimal ROS 2 sketch follows).
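For a flavor of what software on such a module looks like, here is a minimal ROS 2 node in Python (rclpy). The node, topic name, and publish rate are illustrative placeholders, not taken from Nvidia's Isaac samples:

```python
# Minimal ROS 2 heartbeat node (rclpy); hypothetical names, standard ROS 2 API.
import rclpy
from rclpy.node import Node
from std_msgs.msg import String

class HeartbeatNode(Node):
    def __init__(self):
        super().__init__('orin_nano_heartbeat')
        # Publish a status string at 1 Hz on a made-up "robot_status" topic.
        self.pub = self.create_publisher(String, 'robot_status', 10)
        self.timer = self.create_timer(1.0, self.tick)

    def tick(self):
        msg = String()
        msg.data = 'alive'
        self.pub.publish(msg)

def main():
    rclpy.init()
    node = HeartbeatNode()
    rclpy.spin(node)  # runs until interrupted
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()
```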

Huang also introduced Metropolis, Nvidia's edge AI platform, which interprets data from cameras, lidar, and other IoT sensors to improve the safety and efficiency of warehouses, factories, retail stores, and cities.

From industry to scientific research, from autonomous driving to the metaverse, Nvidia's business has long since expanded from GPUs into countless fields, and it leads in many of them. For ordinary consumers, too, graphics cards have long since stopped being just for gaming.

Now that the new generation of GPUs is here, will you pick up a discounted RTX 30, or trade the old for the new?
