Built for US$10 billion: just how powerful is the most powerful AI chip in history?

Over the past two days, we revisited Jensen Huang's keynote at GTC 2024. Digging deeper into the products, we discovered some highlights we had missed while staying up late for the live event.

First, Huang's speaking style is humorous, natural, and highly engaging; it is no wonder he can turn a technology launch event into something like a concert.

Second, reviewing the newly released Blackwell architecture and its GPUs alongside previous generations, I can only say that their computing performance, cost profile, and future potential far exceeded my expectations.

Fittingly, the first two letters of NVIDIA, N and V, are said to stand for "Next Version."

As at past GTCs, NVIDIA delivered the next generation of products on schedule, faster and more capable than before. Yet this time is completely different: Blackwell represents not just the next generation of products, but the next era.

Rediscovering the most powerful GPU on Earth

An introduction usually starts with a name, so let's start there with the latest and most powerful AI chip.

The architecture is named after David Harold Blackwell, an American statistician and one of the namesakes of the Rao-Blackwell theorem. More importantly, he was the first Black member of the National Academy of Sciences and the first Black tenured professor at the University of California, Berkeley.

The "Blackwell" unveiled at GTC 2024 takes its name from him. It is not that Blackwell made any particular contribution to NVIDIA; rather, it has become a convention in NVIDIA's naming system to name GPU microarchitectures after famous scientists and mathematicians.

Since 2006, NVIDIA has successively launched the Tesla, Fermi, Kepler, Maxwell, Pascal, Volta, Turing, and Ampere architectures, each honoring the academic giant of the same name.

The criteria seem to be twofold: the name must be famous, and the person must have substance. Whether each name corresponds one-to-one to the traits of the product it adorns is, in practice, a loose correlation at best.

It should be emphasized that what is being named here is not an individual chip but the entire GPU architecture (Jensen Huang calls it a platform).

Chip architecture refers to the basic design and organizational structure of a chip. Different architectures determine a chip's performance, energy efficiency, processing power, and compatibility, and also affect how applications execute and how efficiently they run.

To put it simply: suppose you own a stadium (the raw material for making chips) and plan a complete renovation. Whether the venue will host concerts or sporting events (the chip's purpose) determines the layout, staffing, decoration, and publicity (the chip architecture).

Therefore, chip architecture and chip design are interrelated and jointly determine chip performance.

For example, x86 and ARM, names you hear often, are the two mainstream CPU architectures: the former is known for raw performance, the latter for excellent power efficiency. Each has its own strengths.

Built on multiple generations of NVIDIA technology, the B200 and B100 chips under the Blackwell architecture deliver outstanding performance, efficiency, and scale, and open a new chapter for AIGC.

But why is it called an "AI nuclear bomb"? How powerful is the new GPU? A comparison with the previous generation gives a more intuitive sense.

At GTC 2022, Jensen Huang unveiled the new Hopper architecture and the H100 chip:

1. It is manufactured on TSMC's 4nm process and integrates 80 billion transistors, a full 26 billion more than the previous-generation A100.
2. The H100's FP16, TF32, and FP64 performance is 3 times that of the A100, at 2,000 TFLOPS, 1,000 TFLOPS, and 60 TFLOPS respectively. It can train a 395-billion-parameter large model in just 1 day; in Huang's own words, "20 of them could carry the entire world's internet traffic."
3. The H100 helped push NVIDIA's market value past US$2 trillion, making it the third-largest technology company after Microsoft and Apple.

According to statistical analysis by market tracker Omdia, NVIDIA sold approximately 500,000 H100 and A100 GPUs in the third quarter of last year; together, these cards weighed nearly 1,000 tons.

So far, the Hopper H100 is still the most powerful GPU on sale, by a wide margin.

The Blackwell B200 has once again redefined "most powerful," with performance gains far beyond a conventional product iteration.

In terms of process technology, the B200 uses TSMC's second-generation 4nm process, joining two reticle-limit dies via a 10 TB/s chip-to-chip interconnect into a single unified GPU with 208 billion transistors in total (104 billion per die). Compared with the N4 process used for the Hopper H100, the new node alone brings about a 6% improvement, while overall performance rises by roughly 250%.

In terms of performance, the second-generation Transformer engine, with its new 4-bit floating-point (FP4) AI format, lets Blackwell double the compute and model sizes it can serve. Single-chip AI performance reaches 20 PetaFLOPS (20×10^15 floating-point operations per second), 4 times that of the previous-generation Hopper H100, with AI inference performance up to 30 times higher.

In terms of energy efficiency: training a 1.8-trillion-parameter model used to require 8,000 Hopper GPUs drawing 15 megawatts. Now 2,000 Blackwell GPUs can do the same job on just 4 megawatts, cutting power draw by roughly 73%.
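As a quick sanity check, the quoted keynote figures (8,000 Hopper GPUs at 15 MW versus 2,000 Blackwell GPUs at 4 MW for the same workload) can be plugged into a few lines of arithmetic:

```python
# Sanity check of the quoted training-power figures from the keynote:
# 8,000 Hopper GPUs drawing 15 MW vs. 2,000 Blackwell GPUs drawing 4 MW.

hopper_gpus, hopper_mw = 8_000, 15.0
blackwell_gpus, blackwell_mw = 2_000, 4.0

# Fraction of total power saved for the whole training job.
power_reduction = 1 - blackwell_mw / hopper_mw

# System power budget per GPU, in megawatts.
hopper_per_gpu = hopper_mw / hopper_gpus        # 0.001875 MW ≈ 1.9 kW
blackwell_per_gpu = blackwell_mw / blackwell_gpus  # 0.002 MW = 2.0 kW

print(f"total power cut by {power_reduction:.0%}")  # → 73%
```

So the saving comes almost entirely from needing a quarter as many GPUs; the per-GPU system power budget is essentially unchanged.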

So Jensen Huang's claim that "Blackwell will become the most powerful chip in the world" is no empty boast; it is simply a statement of fact.

Not cheap, and not simple

Analysts at financial services firm Raymond James have estimated the B200's cost.

NVIDIA's cost to manufacture each H100 is about US$3,320, against a selling price of US$25,000 to US$30,000. Based on the performance gap between the two chips, they estimate the B200 will cost 50% to 60% more to make than the H100, or roughly US$6,000.

In an exclusive interview with CNBC after the keynote, Jensen Huang revealed that a Blackwell GPU will be priced at about US$30,000 to US$40,000, and that developing the entire new architecture cost approximately US$10 billion.

We had to invent some new technology to make it (the new architecture) possible.

Going by its past rhythm, NVIDIA releases a new generation of AI chips roughly every two years. Against previous generations, Blackwell brings significant gains in both computing performance and energy efficiency; most tangibly, the dual-die Blackwell package is nearly twice the physical size of Hopper.

The high cost lies not only in the chips but also in designing data centers and integrating with other companies' data centers, because in Jensen Huang's view, NVIDIA does not make chips; it builds data centers.

According to NVIDIA's latest financial report, fourth-quarter revenue reached a record US$22.1 billion, a year-on-year increase of 265%; net profit was US$12.3 billion, up 765% year-on-year.

The data center segment, the largest source of revenue, hit a record US$18.4 billion, up 27% from the third quarter and 409% from the same period last year.

R&D costs are high, but the positive returns are higher.

The data centers NVIDIA is building today are complete systems: a full hardware stack plus all the software. Blackwell, the GPU, is only one part.

The data center is decomposed into modules, and users can freely choose the software and hardware services they need. NVIDIA tailors networking, storage, the control platform, security, and management to each requirement, with a dedicated team providing technical support.

Are such global ambitions and customized services paying off? The data tells the story: as of March 5, NVIDIA's market value had surpassed giants such as Alphabet and Amazon, and then Saudi Aramco, making it the third most valuable company in the world at US$2.4 trillion, behind only Microsoft and Apple.

The global data center market currently stands at approximately 200 billion euros (approximately RMB 787.3 billion), and NVIDIA holds part of it. Jensen Huang predicts this market could grow to US$1-2 trillion in the future.

NVIDIA CFO Colette Kress offered this analysis:

Data center revenue in the fiscal fourth quarter was primarily driven by generative AI and its related training. We estimate that approximately 40% of data center revenue over the past year was derived from AI.

Less than a month earlier, Jensen Huang had said in the earnings release:

Accelerated computing and generative AI have reached a tipping point, with demand surging across businesses, industries and countries around the world.

Customization is certainly not exclusive to NVIDIA, but in the AI era, few companies can still provide "head-to-toe" service, and NVIDIA is one of them.

For a pig to take off, it first needs the wind

At the intersection of virtual reality, high-performance computing and artificial intelligence, GPUs are even replacing CPUs as the brains of AI computers.

The core reason generative AI has sparked heated discussion across industries is that it has begun to work and learn like a human: chatting, writing copy, drawing, making videos, analyzing and summarizing research. And all of these astonishing results require astronomical amounts of sample data.

For example, if you remember the name "Aifaner," it may be because this account's daily posts have reinforced the memory through repetition; or because you had never seen "Ai" and "Faner" combined before, and the novelty left a deep impression; or because the orange logo planted a distinctive visual symbol in your mind.

Every small detail consolidates the image of "Aifaner" in your mind, but when information from tech media across the country blurs together, more such symbols are needed to deepen the impression and avoid confusion.

AI's deep learning probably follows this logic, and GPU is the best choice for processing massive amounts of information.

Since OpenAI ignited the AIGC boom, most well-known companies have rushed to ship models of every size. Smart cars, translation software, office documents, phone assistants, even robot vacuums now have AI.

GPUs seemingly became the object of global competition overnight. According to statistics from market tracker Omdia, Tencent, Alibaba, Baidu, ByteDance, Tesla, Meta, and Microsoft have each purchased as many as 150,000 H100 GPUs (last year's most powerful chip).

Technical fundamentals and the spirit of the times have together driven the GPU boom and built NVIDIA's "graphics card empire." According to Wells Fargo statistics, NVIDIA currently holds a 98% share of the data center AI market.

Standing in the wind, even a pig can fly.

But when one company's share of an industry approaches 100%, there must be reasons behind it at least as important as catching the wind.

In 1999, NVIDIA pioneered the concept of the GPU; in 2006 it launched CUDA, a major technological turning point in the company's history. CUDA lowered the barrier to GPU programming: developers could write GPU programs in C/C++ and other languages, the GPU broke free of its single role in image processing, and high-performance computing entered the world of graphics cards.

AlphaGo's victory in 2016; Bitcoin's surge and the mining boom of 2017; NVIDIA's bet on the autonomous driving market in between; and then the arrival of large AI models such as ChatGPT in 2023. The seeds NVIDIA planted years ago finally reached their harvest.

The tailwind matters, but so do forward-looking market positioning, diversified applications, and massive investment in innovation. Had any one link been out of place, today's near-perfect market dominance would not exist.

For NVIDIA, however, the most pressing question is how to keep its lead at this crossroads of the times.

Blackwell is a key step in consolidating those gains. Before many customers have even received the H100s they ordered, the B200 and B100 assembly lines have already started up.

In his keynote, Jensen Huang reiterated a point he has made in past earnings calls: "general-purpose computing has reached a bottleneck."

So now we need bigger models, we need bigger GPUs, and we need to stack GPUs together.

This is not about reducing costs, but about increasing scale.

There is some humility in that, but there is also, of course, enormous market demand.

Today, OpenAI's largest model already has 1.8T (trillion) parameters and needs to process trillions of tokens during training. Even with a PetaFLOPS-class GPU, training such a model would take on the order of 1,000 years.
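To see where a number like "1,000 years" might come from, here is a back-of-the-envelope sketch using the widely cited rule of thumb that training cost is roughly 6 × parameters × tokens FLOPs. The 3-trillion-token figure is an illustrative assumption, not a number from the keynote:

```python
# Back-of-the-envelope training-time estimate for a 1.8T-parameter model.
# Rule of thumb: training FLOPs ≈ 6 * parameters * tokens.
# ASSUMPTION (illustrative): 3 trillion training tokens.

params = 1.8e12        # 1.8 trillion parameters, as quoted
tokens = 3e12          # assumed token count
gpu_flops = 1e15       # a "PetaFLOPS-class" GPU: 10^15 FLOP/s sustained

total_flops = 6 * params * tokens          # ≈ 3.24e25 FLOPs
seconds = total_flops / gpu_flops
years = seconds / (365 * 24 * 3600)

print(f"about {years:,.0f} years on a single 1 PFLOPS GPU")
```

Under these assumptions the single-GPU training time lands right around a millennium, which is why the answer is not a faster chip alone but thousands of GPUs stacked together.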

Hopper is great, but we need a more powerful GPU.

The first wave of discussion stirred up by GTC 2024 has slowly subsided over the past few days. It is foreseeable that the Blackwell GPU series, fifth-generation NVLink, and RAS engine unveiled at the conference will deliver further shocks as they reach the market; it is harder to predict how many surprises and changes "the tipping point that generative AI has reached" will bring to the world.

At the moment of the AIGC explosion, on the eve of AGI, the string of AI firecrackers NVIDIA has lit has only just begun to go off.


Ai Faner | Original link