Flagship chips are underperforming, and Samsung can't take all the blame

Even if you are not a consumer electronics enthusiast, you have probably come across the "fire dragon" meme of the past two years.

▲ Picture from: "Game of Thrones"

The main reason is that recent generations of Android flagship chips have stumbled on power consumption one after another. High performance has come with high energy consumption, and with it a steep rise in phone temperatures.

This has one upside and one downside. The upside is that manufacturers' cooling designs keep getting better; the downside is that chip tuning is becoming more conservative and thermal throttling thresholds are being set lower.

Under sustained heavy load (such as running "Genshin Impact"), phones will actively lower the frequency of the chip's ultra-large core within roughly 10 to 15 minutes. If the temperature is still high, the large cores are throttled next.
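The staged throttling described above can be sketched roughly as follows. This is only an illustration, not any vendor's actual thermal policy: the temperature limit, the frequency step, and the names `Cluster` and `throttle` are all assumptions made up for the example.

```python
from dataclasses import dataclass

# All numbers are hypothetical, chosen only to illustrate the staged policy.
THROTTLE_TEMP_C = 45.0   # assumed temperature limit (the "temperature wall")
STEP_MHZ = 200           # assumed frequency reduction per throttling decision


@dataclass
class Cluster:
    name: str
    freq_mhz: int
    min_mhz: int

    def step_down(self) -> bool:
        """Lower the clock by one step; return False if already at the floor."""
        if self.freq_mhz <= self.min_mhz:
            return False
        self.freq_mhz = max(self.min_mhz, self.freq_mhz - STEP_MHZ)
        return True


def throttle(temp_c: float, ultra: Cluster, large: Cluster) -> str:
    """Make one throttling decision: ultra-large core first, large cores second.

    Returns the name of the cluster that was slowed down, "none" if the
    temperature is below the limit, or "floor" if nothing can go lower.
    """
    if temp_c < THROTTLE_TEMP_C:
        return "none"
    if ultra.step_down():
        return ultra.name
    if large.step_down():
        return large.name
    return "floor"
```

With an ultra-large core at 3000MHz (floor 2600MHz) and large cores at 2500MHz, repeated calls at 47°C would lower the ultra-large core twice before touching the large cores, which mirrors the order described in the text.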

On the other hand, flagship chips built on a more advanced process should deliver better battery life in everyday use.

In practice, however, the battery-life gain is a drop in the bucket. It makes little difference, and phones rely on high-power fast charging to get through the day.

There is one more factor: the recent instability of Samsung foundry's 4nm process can be counted as a cause. Samsung's own Exynos 2200 flagship chip has also underperformed, and clearly not by design.

As a result, Qualcomm, having been "badly burned", announced that the just-unveiled Snapdragon 8+ Gen 1 SoC uses TSMC's 4nm process, and stated directly in its slides that performance is up 10% and power consumption is down 30%.
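Taken at face value, the two slide figures can be combined into a performance-per-watt number. The short calculation below is only a sketch: the function name is made up for the example, and it assumes both figures hold at the same time, which the slides may not actually claim (such numbers are sometimes quoted as alternatives), so the combined figure is best read as an upper bound.

```python
def perf_per_watt_gain(perf_gain: float, power_cut: float) -> float:
    """Relative change in performance per watt.

    perf_gain: fractional performance increase (0.10 means +10%).
    power_cut: fractional power reduction (0.30 means -30%).
    """
    return (1.0 + perf_gain) / (1.0 - power_cut) - 1.0


both = perf_per_watt_gain(0.10, 0.30)        # assumes both claims hold at once
perf_only = perf_per_watt_gain(0.10, 0.0)    # same power, 10% faster
power_only = perf_per_watt_gain(0.0, 0.30)   # same speed, 30% less power

print(f"{both:.1%} {perf_only:.1%} {power_only:.1%}")  # 57.1% 10.0% 42.9%
```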

It would seem that the poor showing of Android flagship chips is all down to Samsung's 4nm process being "too bad". So is TSMC the savior?

TSMC's 4nm is just a "fig leaf"

TSMC and Samsung are effectively the two giants of advanced-process chip manufacturing. Between them they all but dominate the world market for chips made on processes below 10nm.

▲ Picture from: wccftech.com

In just a few years they have gone from 10nm to 4nm, and both are now building 3nm production lines and fabs. The competition is intensifying.

Unlike TSMC, which is a pure-play foundry, Samsung is an integrated device manufacturer (IDM) that designs chips, manufactures chips, and sells its own Exynos chips.

Ten years ago Samsung was ahead of TSMC. Apple's A4 chip was itself heavily modified from a Samsung Exynos design, and Samsung manufactured it.

Because of Samsung's dual role as supplier and competitor, and because Apple also depended on Samsung for screens and memory, the risk was too high, so Apple began backing TSMC to spread that risk.

After some twists and turns, TSMC built a new production line, assigned a dedicated team, and finally won the exclusive foundry contract for Apple's A8 chip. With the record sales of the iPhone 6 and 6 Plus, TSMC benefited greatly.

From then on, Apple's A-series chips were tied to TSMC, and Apple helped TSMC's development by steering resources its way. Today, Apple's A-series and M-series chips are all manufactured by TSMC, and Apple has become its highest-priority customer, bar none.

▲ TSMC and Apple are deeply bound. Image from: appuals.com

At the same time, the "myth" of TSMC's highly stable chip manufacturing was born.

Samsung, trailing TSMC at both 5nm and 4nm, has not lost heart; instead it has gone all in. It announced an investment of 133 trillion won (about 800 billion yuan) aimed at the 3nm process, with the goal of becoming the world's largest SoC manufacturer.

▲ Picture from: Samsung

It is also abandoning FinFET technology and jumping straight to GAAFET transistors in order to overtake TSMC. Success or failure hinges on this move.

Back to the present: Samsung's 5nm and 4nm processes trail TSMC's in transistor density and process stability, so there is indeed a gap once that carries over to flagship chips.

MediaTek's Dimensity 9000, launched at the start of this year, uses TSMC's 4nm process. In its 1+3+4 three-cluster design, the Cortex-X2 ultra-large core (3.05GHz), the A710 large cores (2.85GHz), and the A510 mid-cores (1.8GHz) are clocked higher overall than those of the Qualcomm Snapdragon 8 Gen 1.

On paper, that means higher performance and better energy efficiency, in other words a perfect flagship chip.

However, a few months later, when flagships equipped with the Dimensity 9000 finally launched, their real-world energy efficiency turned out to be not much different from the Qualcomm version.

So when Qualcomm loudly promotes the Snapdragon 8+ Gen 1 on TSMC's 4nm process as a better performer, I don't actually have high expectations.

▲ With the Snapdragon 8+ Gen 1 released, many manufacturers' top-tier "super-size" models will also return; the main event is coming.

Since the Snapdragon 8+ Gen 1 raises clocks across the board (Cortex-X2 3.2GHz + A710 2.75GHz + A510 2.0GHz), absolute performance will improve. How much depends on each manufacturer's tuning, and the same goes for energy efficiency.

Seen this way, TSMC's 4nm process is more like a "fig leaf" for flagship chips, covering up the very weak new stock (public-version) architecture from Arm.

Arm's stock (public-version) architecture is the real "culprit"

After ten years, Arm has moved to the ninth version of its architecture, and the latest, Armv9, is a relatively significant instruction-set upgrade.

Alongside the instruction-set upgrade, Arm also announced its stock CPU IP: the ultra-large core Cortex-X2, the large core (performance core) Cortex-A710, and the mid-core (efficiency core) Cortex-A510.

▲ Picture from: Arm

The stock CPU design still uses a three-cluster layout, namely 1+3+4. It evolved from the earlier big.LITTLE architecture, and the goal is simply "the right core for the right job" in order to improve energy efficiency.

Hybrid architectures that mix large and small cores are now widely used in both desktop and mobile CPUs, x86 and Arm alike.

▲ Intel's 12th-generation Core also adopts a hybrid P+E architecture.

In Arm's stock three-cluster design, each cluster is supposed to do its own job: the ultra-large X2 core delivers peak performance, the large A710 cores handle everyday performance needs, and the mid-core A510 cores complete light tasks at low power.

The three kinds of core each have their own purpose, so their design and scheduling should be weighted accordingly.
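The "right core for the right job" idea can be illustrated with a toy selector. This is a minimal sketch under stated assumptions, not how any real scheduler works: the single `load` figure, both thresholds, and the names `CoreKind` and `pick_core` are hypothetical.

```python
from enum import Enum


class CoreKind(Enum):
    MID = "Cortex-A510"     # light tasks at low power
    LARGE = "Cortex-A710"   # everyday performance needs
    ULTRA = "Cortex-X2"     # peak performance


# Hypothetical utilization thresholds (0.0 to 1.0). Real schedulers are tuned
# per device and use far richer signals than a single load estimate.
LARGE_THRESHOLD = 0.35
ULTRA_THRESHOLD = 0.80


def pick_core(load: float) -> CoreKind:
    """Map an estimated task load to the smallest core kind that can serve it."""
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be between 0.0 and 1.0")
    if load >= ULTRA_THRESHOLD:
        return CoreKind.ULTRA
    if load >= LARGE_THRESHOLD:
        return CoreKind.LARGE
    return CoreKind.MID
```

Under these assumed thresholds, a background task at 10% load lands on an A510, ordinary foreground work at 50% on an A710, and only a burst near full load wakes the X2. The scheme saves power only if each core is efficient in the range it is meant to cover.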

The Cortex-X2 is a comprehensively optimized X1: the L3 cache doubles to 8MB, cache area grows, communication latency is reduced, and IPC (which can loosely be read as performance) improves by 16%.

▲ The super-large core has improved significantly. Image from: Arm

Judging from shipping products, the Snapdragon 8 Gen 1 and Dimensity 9000 do outperform the Snapdragon 888 when running flat out, and power consumption does not "explode".

It is reasonable to trade high power consumption for high performance.

But the large core and the mid-core have serious problems, and it is these two newly "renamed" cores that make flagship chips stumble again and again.

The Cortex-A710 does not use a new architecture. It is still an optimization of the classic A78, and calling it the A79 might be more accurate.

AnandTech called the new name "an interesting marketing tidbit", which says enough about the A710's performance.

▲ High energy consumption and high performance. Image from: Arm

Arm's slides claim a 10% performance improvement for the A710 along with a 30% gain in energy efficiency. Judging from the curve, however, the higher performance sits mostly in the high-power region, and it comes from doubling the L3 cache (to 8MB).

The energy-efficiency gain comes only from narrowing the A710 core's dispatch width (from 6 to 5), not from any optimization of the architecture.

▲ Do not imitate. Image from: tenor

The A710 is an optimized A78, and the A78 is a higher-clocked A77. For several years Arm's large-core design team has kept mining the potential of the A77 architecture, but once the A78 reached the architecture's sweet-spot frequency, the A710's energy efficiency fell apart. This is especially so when the system needs high performance, but not enough to justify switching to the ultra-large X2 core: power consumption simply takes off.

Pairing a 4nm A78 directly with the ultra-large X2 core might even have produced better results.

As a large core, the A710 should be designed for performance more than for energy efficiency. Arm went in the wrong direction.

▲ Newly designed A510. Image from: Arm

By comparison, the Cortex-A510 is a newly designed architecture. And unlike the X2 and A710, both of which came from the Austin team, it was designed by the Cambridge team.

The A510 adopts a number of new design ideas, such as a hyper-threading-like arrangement in which cores share the L2 cache. L1, L2, and L3 bandwidth is also doubled compared with the A55. As a result, floating-point performance improves by 50% and integer performance by 35%.

However, the A510 still uses in-order execution rather than the out-of-order execution found in the efficiency cores of Apple's A-series chips. To keep instructions from stalling, the A510's front end was widened, the cache was doubled, and the back end was enlarged.

▲ Arm being fairly honest here; note that the vertical axis is energy consumption. Image from: Arm

The design goal is also fairly clear: better "performance". The end result, however, achieved little.

According to Arm's slides, the A510 outperforms the A55 only at high power.

In the low-power region, which is what an efficiency core is for, it struggles to pull ahead of the A55, and in places it even regresses.

▲ Do not imitate. Image from: tenor

Overall, in the three-cluster architecture Arm has been pushing in recent years, only the ultra-large Cortex-X2 core has changed in a reasonably sensible way. The large Cortex-A710 core focuses on energy efficiency, while the mid-core Cortex-A510 has started chasing peak performance.

If Arm's stock CPU IP is like this, don't expect the flagship chips built by modifying it to perform much better.

Big app makers that won't embrace the 64-bit ecosystem must share the blame

With Armv9, the biggest change is dropping 32-bit applications entirely and fully embracing 64-bit applications.

In other words, in theory no core in the three-cluster design should still support 32-bit applications. But for the Android app environment in the Chinese market, Arm made a special exception and kept the A710 cores compatible with 32-bit applications.

That is to say, when you open a 32-bit app, it forces the power-hungry A710 cores to stay active, even if you are only listening to music with the screen off.
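Why a 32-bit app keeps the A710 awake can be shown with a small model. The core capabilities follow the article's description (only the A710 runs 32-bit code); the relative power costs and the function names are invented for illustration and are not measurements.

```python
from __future__ import annotations

# Per the article: all three core types run 64-bit code, only the A710 runs 32-bit.
CORES_64BIT = frozenset({"Cortex-X2", "Cortex-A710", "Cortex-A510"})
CORES_32BIT = frozenset({"Cortex-A710"})

# Hypothetical, unitless cost of keeping one core of each type awake.
ACTIVE_COST = {"Cortex-A510": 1, "Cortex-A710": 3, "Cortex-X2": 6}


def eligible_cores(is_32bit: bool) -> frozenset[str]:
    """Return the core types that are able to execute the app's code."""
    return CORES_32BIT if is_32bit else CORES_64BIT


def cheapest_core(is_32bit: bool) -> str:
    """Pick the lowest-cost core type among those that can run the app."""
    return min(eligible_cores(is_32bit), key=lambda core: ACTIVE_COST[core])


print(cheapest_core(is_32bit=False))  # Cortex-A510
print(cheapest_core(is_32bit=True))   # Cortex-A710
```

In this model a 64-bit music player can idle along on an A510, while the 32-bit build of the same app has no choice but the A710, whatever the workload.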

In fact, Arm has been promoting 64-bit applications since Armv8, and Google Play has required new apps to support 64-bit since August 2019.

However, many app makers in China have not made the change. Many commonly used apps, such as Alipay, QQ, and NetEase Cloud Music, are still 32-bit, and there is no schedule for when 64-bit versions will launch.

In addition, the app stores of many Chinese Android manufacturers have no dedicated 64-bit section, so 32-bit and 64-bit apps are mixed together.
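Since stores do not label them, one rough way to tell the two kinds of app apart is to look inside the APK itself. An APK is a zip archive, and native libraries sit under `lib/<abi>/`, where `arm64-v8a` and `x86_64` are 64-bit ABIs and `armeabi-v7a`, `armeabi`, and `x86` are 32-bit. The sketch below is a simplified check with made-up function names, and it ignores cases such as split APKs.

```python
import zipfile

ABI_64 = {"arm64-v8a", "x86_64"}
ABI_32 = {"armeabi", "armeabi-v7a", "x86"}


def native_abis(apk) -> set:
    """Return the ABI directory names found under lib/ inside an APK.

    `apk` may be a file path or a binary file-like object.
    """
    abis = set()
    with zipfile.ZipFile(apk) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Native libraries are stored as lib/<abi>/<library>.so
            if len(parts) >= 3 and parts[0] == "lib":
                abis.add(parts[1])
    return abis


def classify(abis: set) -> str:
    """Summarize whether an app can run as a 64-bit process."""
    if not abis:
        return "no native code"   # apps without native libraries are not tied to an ABI
    if abis & ABI_64:
        return "64-bit capable"
    if abis & ABI_32:
        return "32-bit only"
    return "unknown"
```

For example, `classify(native_abis("some_app.apk"))` (the file name is a placeholder) would report "32-bit only" for an app that ships only `lib/armeabi-v7a/` libraries.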

That said, OPPO, vivo, and Xiaomi have begun pushing 64-bit apps. The first stage requires new apps to be 64-bit. As for commonly used existing apps, no measures have been announced yet.

In recent years, Android flagship chips have run into trouble again and again. The most fundamental reasons are that the design direction of Arm's stock architecture runs against the original intent of the three-cluster design, and that Chinese app makers have not actively embraced 64-bit apps.

As for TSMC versus Samsung, or Dimensity versus Qualcomm, the difference on actual devices is far smaller than the numbers on the slides suggest.

#Follow Aifaner's official WeChat account: Aifaner (WeChat ID: ifanr) for more great content as soon as it is published.
