Late-night bombshell! NVIDIA releases the world's most powerful AI chip, with performance leaping up to 30x. Jensen Huang is the Steve Jobs of this era

Just now, NVIDIA released the world’s most powerful AI chip.

Generative AI has reached a tipping point.

The two-hour GTC 2024 keynote felt more like a stadium concert. NVIDIA senior scientist Jim Fan joked that "Jensen Huang is the new Taylor Swift."

That quip captures Jensen Huang's current standing in the AI industry.

Last year, Jensen Huang declared that AI's "iPhone moment" had arrived, and we have since watched daily life being rewritten by AI. Today's keynote showed that the pace of that change is accelerating wildly.

Over the past 10 years, NVIDIA has driven roughly a million-fold gains in AI computing, far outpacing Moore's Law. In other words, NVIDIA is writing its own law of iteration. From chip compute to AI deployment, from automobile manufacturing to medical logistics, NVIDIA has pushed industry after industry forward while advancing itself.

Moore's Law is dead, but NVIDIA gave birth to a new Moore's Law.

Outside of PC graphics cards, NVIDIA rarely registers in our daily lives, yet the technological progress of many products around us depends on it. After reading this first summary of GTC 2024, you may have a clearer sense of the AIGC wave.

A tweet posted on X last night by OpenAI CEO Sam Altman may serve as a footnote to the era:

This is the most interesting year in human history, except for all future years.

The world's most powerful AI chip is born, and its performance rockets

This is the most advanced GPU in production in the world today.

The protagonist of the event was the "Blackwell B200" AI chip. Jensen Huang said the chip is named after David Blackwell, a mathematician known for his work in game theory and probability.

Built on TSMC's 4NP process, each compute die in the Blackwell architecture packs 104 billion transistors, another breakthrough over the 80 billion of the previous-generation GH100 GPU.

Blackwell B200 is not a single GPU in the traditional sense. It is built from two GPU dies joined by a 10 TB/s NV-HBI (NVIDIA High-Bandwidth Interface) link, which lets the two dies operate as one unified chip.

As a result, the B200 carries 208 billion transistors in total and delivers up to 20 petaflops of FP4 compute. The GB200 "superchip" goes further, pairing two B200 GPUs with a single Grace CPU, and can raise LLM (large language model) inference efficiency by as much as 30 times.

The GB200 delivers a big jump as well. On a GPT-3-scale LLM benchmark (175 billion parameters), the GB200 offers 7 times the performance of the H100 and 4 times its training speed.

What's more, it cuts cost and energy consumption to as little as 1/25th of the H100's.

Previously, although NVIDIA's H100 AI processor was wildly popular, each H100 had a peak power draw of up to 700 watts, more than the average power consumption of an ordinary American household. Experts predicted that, with H100s deployed at scale, their aggregate power draw would rival that of a large American city, or even exceed that of some small European countries.

Jensen Huang said that training a 1.8-trillion-parameter model previously required 8,000 Hopper GPUs and 15 megawatts of power; now 2,000 Blackwell GPUs can do the same job on just 4 megawatts.
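Those figures imply an interesting detail. As a quick sanity check (pure arithmetic on the numbers quoted above, nothing more):

```python
# Back-of-the-envelope check of the training-power figures quoted in the keynote.
hopper_gpus, hopper_megawatts = 8000, 15
blackwell_gpus, blackwell_megawatts = 2000, 4

# Average facility power per GPU during the run, in watts
hopper_w_per_gpu = hopper_megawatts * 1e6 / hopper_gpus          # 1875.0 W
blackwell_w_per_gpu = blackwell_megawatts * 1e6 / blackwell_gpus # 2000.0 W

# Facility-level power ratio for the same training job
power_ratio = hopper_megawatts / blackwell_megawatts             # 3.75x

print(hopper_w_per_gpu, blackwell_w_per_gpu, power_ratio)
```

Per-GPU power barely changes; the 3.75x facility-level saving comes from needing only a quarter as many GPUs for the same training job.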

The Blackwell B200's performance gains also show up in its energy profile. Using the latest NVLink interconnect, the B200 supports the same eight-GPU architecture and 400GbE network switches as before, delivering far higher performance at the same 700W peak power as the previous-generation H100/H200.

Another point worth noting is FP4 compute. Jensen Huang said that AI computing power has grown a thousand-fold over the past eight years, and the most critical improvement here is the second-generation Transformer Engine, which uses FP4 precision to substantially improve compute throughput, bandwidth efficiency, and supported model size.

Measured in FP8, the precision commonly used for AI today, the B200's two compute dies give it 2.5 times the performance of the H100; each individual Blackwell die is about 25% faster than the previous-generation Hopper chip.
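To make the precision trade-off concrete, here is a minimal sketch of 4-bit quantization in pure Python. A clear hedge: NVIDIA's FP4 is a floating-point format whose exact encoding is not described in this article, so this stand-in uses signed-integer levels purely to illustrate the core idea, that halving the bits per value (versus FP8) can roughly double throughput per byte at some cost in accuracy.

```python
# Illustrative only: symmetric 4-bit integer quantization as a stand-in for
# low-precision formats like FP4 (15 signed levels instead of float32's range).

def quantize_4bit(values):
    """Map floats onto the signed levels [-7, 7] with one shared scale."""
    scale = max(abs(v) for v in values) / 7
    q = [max(-7, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.91, -0.42, 0.07, -1.30, 0.55]
q, s = quantize_4bit(weights)
restored = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(max_err, 4))  # each value now fits in 4 bits
```

Each element's rounding error is bounded by half the scale step; hardware formats like FP4 trade exactly this kind of precision loss for higher throughput and lower memory traffic.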

NVIDIA senior scientist Jim Fan calls the new Blackwell B200 GPU "a new performance beast."

Compute now exceeds 1 exaflop in a single rack; by comparison, the first DGX that Jensen Huang hand-delivered to OpenAI managed 0.17 petaflops. A GPT-4-scale model with 1.8 trillion parameters can be trained on 2,000 Blackwell GPUs in 90 days.

It is no exaggeration to say that a new Moore's Law was born.

Since Blackwell is available in several different variants, Nvidia also provides specs for the full server node, with three main options.

The first and largest is the GB200 NVL72 system: 18 1U servers, each carrying two GB200 superchips. The system provides 72 B200 GPUs, delivering 1,440 petaflops of FP4 AI inference performance and 720 petaflops of FP8 AI training performance, and it is liquid-cooled. A single NVL72 can handle models of up to 27 trillion parameters (for comparison, GPT-4 is reported at no more than 1.7 trillion).
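The NVL72 headline numbers follow directly from the per-GPU figures quoted earlier; a quick consistency check:

```python
# Consistency check on the GB200 NVL72 figures quoted above.
servers = 18
superchips_per_server = 2
gpus_per_superchip = 2   # each GB200 superchip = 2 B200 GPUs + 1 Grace CPU

total_gpus = servers * superchips_per_server * gpus_per_superchip  # 72

fp4_pflops_per_gpu = 20  # B200 peak FP4 (inference)
fp8_pflops_per_gpu = 10  # FP8 runs at half the FP4 rate

print(total_gpus,
      total_gpus * fp4_pflops_per_gpu,   # 1440 PFLOPS FP4
      total_gpus * fp8_pflops_per_gpu)   # 720 PFLOPS FP8
```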

The second option is the HGX B200, which pairs eight B200 GPUs with an x86 CPU in a single server node. Each B200 can be configured up to 1,000W and delivers up to 18 petaflops of FP4 throughput, about 10% slower than the GPUs in the GB200.

Finally, NVIDIA will also launch the HGX B100, with the same general specifications as the HGX B200 (an x86 CPU plus eight B100 GPUs) but drop-in compatible with existing HGX H100 infrastructure, allowing the fastest possible deployment of Blackwell GPUs; each GPU's TDP is capped at 700W.

Before this launch, NVIDIA had already become a multitrillion-dollar company on the strength of AI chips such as the H100 and H200, overtaking giants like Amazon in market value. The new Blackwell B200 GPU and GB200 "superchip" look set to extend that lead, perhaps even past Apple.

The era of software defining everything is coming

In 2012, a small group of researchers released AlexNet, a breakthrough image-recognition system that far outperformed earlier methods on tasks such as classifying dogs and cats, becoming an iconic demonstration of the potential of deep learning and convolutional neural networks (CNNs) in image recognition.

It was after seeing this opportunity that Jensen Huang decided to go all in on AI. Interestingly, where AI once recognized images and produced text labels, it now generates images from text.

So now that the generative AI wave has arrived, what can we do with it? Jensen Huang offered some answers.

Traditional weather models combined with NVIDIA's CorrDiff model can produce far finer-grained forecasts over regions spanning hundreds or even thousands of kilometers, predicting the impact range of events such as typhoons and so minimizing property damage. CorrDiff will also be opened to more countries and regions in the future.

Generative AI can not only understand images and audio; with enormous computing power behind it, it can also scan billions of compounds to screen candidates for new drugs.

Playing the role of AI arms dealer, Jensen Huang also introduced NIM (NVIDIA Inference Microservices), prepackaged services for building and deploying AI. In the future, you may even assemble an AI "super team": break a task into a series of subtasks, then let different AIs handle retrieval, software optimization, and more.

The facilities, warehouses, and factory buildings of the future will be software-defined.

Whether humanoid robots, self-driving cars, or robotic arms, autonomous machines need a software-level operating system. For example, by combining AI with Omniverse, NVIDIA built a virtual robot warehouse covering 100,000 square meters.

In this physically accurate simulated environment, 100 ceiling-mounted cameras track all activity in the warehouse in real time using NVIDIA Metropolis software, feeding the route planning of autonomous mobile robots (AMRs).

These simulations also include software-in-the-loop testing of the AI agents, to evaluate and optimize the system's ability to adapt to the unpredictability of the real world.

In one simulated scenario, the AMR encountered an incident on its way to pick up a pallet, blocking its intended route. Nvidia Metropolis then updated and sent a real-time occupancy map to the control system, which calculated the new optimal path.

Warehouse operators can also ask questions to the visual model through natural language, and the model can understand details and activities and provide instant feedback to improve operational efficiency.

It is worth mentioning that Apple Vision Pro also made an appearance. Through Omniverse Cloud, enterprises can stream interactive Universal Scene Description (OpenUSD) 3D applications to Vision Pro in real time, helping users explore virtual worlds like never before.

The keynote closed with the familiar robot segment. As Jensen Huang spread his arms and stood alongside humanoid robots, he declared: "the intersection of computer graphics, physics, and artificial intelligence... it all starts at this moment."

▲ Little easter egg

Ten years ago at GTC, Jensen Huang first emphasized the importance of machine learning. While many people still regarded NVIDIA as a maker of "gaming graphics cards," the company was already at the forefront of the AI revolution.

In 2024, billed as year one of AI applications, NVIDIA is already using AI software and hardware to empower industries across many fields: large language models, conversational AI, edge computing, big data, autonomous driving, bionic robots...

Drug discovery is not our expertise; computing is. Building cars is not our expertise; the AI computers needed to build cars are. Frankly, it's hard for one company to be good at all of these things, but we are very good at the AI computing part.

Rather than the leader of any single industry, NVIDIA is more like the power behind the scenes: wherever AI is mentioned, NVIDIA is an unavoidable topic.

As Huang said, NVIDIA is already a platform company.

Early positioning and the broad tide of history have allowed NVIDIA to capture more than 70% of AI chip market sales at the dawn of the AI era; the company's market value passed US$2 trillion not long ago.

Perhaps this is also why Apple, after years of struggle, abandoned its car project and is investing heavily in generative AI. Whether measured in economic returns or technological trends, it is a gamble well worth taking.

At a time when we are still questioning the usefulness of "AI", NVIDIA has proven with actions that AI has become an indispensable part of the new era.

Author: Li Chaofan, Xiao Fanbo, Mo Chongyu

