Start with GPU specifications, architecture, cost and performance


Nvidia was the victim of a cyber attack at the end of February in which it was hacked and lost a large amount of data. The breach was a disaster not only for Nvidia, but also for other chip companies and, by extension, the national security of "Western" countries.
According to reports, the stolen data includes detailed specifications and simulation data for Nvidia's next-generation Hopper and Ada GPUs. Hopper has since been announced by Nvidia at GTC and is shipping now; its specifications exactly match the leak. Ada, named after Ada Lovelace, is still a few months away.
Ada, the next generation of client and professional visualization GPUs, is the subject of this article. Based on the leaked specifications and simulation data, SemiAnalysis and Locuza teamed up to analyze the chip architectures, die sizes, and ASIC costs of the lineup.
SemiAnalysis and Locuza did not download any files from the LAPSUS$ hack, but many excerpts were shared publicly online.
Based on these excerpts from the leak, the following specifications can be extracted for Nvidia's next-generation Ada Lovelace GPU lineup and compared to the current-generation Ampere lineup.

A block diagram of each chip, architectural analysis, estimated die sizes, how those die sizes were derived, and some cost and positioning analysis will be presented.

The standout of the Ada lineup is the AD102, estimated at around 611.3mm². Compared to the previous-generation GA102, this is a huge leap: with 5 additional GPCs, the CUDA core count increases by roughly 70%. The memory bus width stays at 384 bits, but memory speeds are expected to rise slightly to around 21Gbps. Even so, that bandwidth alone is not enough to keep such a beast of a chip fed. Instead, the AD102 carries 96MB of L2 cache, far more than the 6MB of the previous-generation GA102.

Interestingly, this is the same amount of cache that AMD's Navi 22 GPU carries as "Infinity Cache". Hopefully Nvidia names its large L2 "Nfinity Cache" just to amuse everyone.
AMD's Infinity Cache is an L3 cache, and despite the differences in cache hierarchy between the two vendors, the general trend in hit rates should be similar. Using AMD's figures as a guide, hit rates are roughly 78% at 1080p, 69% at 1440p, and 53% at 4K. These high hit rates substantially reduce memory bandwidth requirements.
If Nvidia's large L2 behaves in a similar fashion, it would go a long way toward feeding the AD102 despite only a slight increase in raw memory bandwidth. High-end Ada configurations should come with 24GB of GDDR6X, but expect some cut-down configurations with less memory as well.
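To make the bandwidth argument concrete, here is a minimal sketch of how a high hit rate cuts the traffic that must be served from GDDR6X. The hit rates are the AMD figures quoted above; the total request traffic is a made-up round number used only for illustration.

```python
# Minimal sketch: how cache hit rate reduces the traffic that must come from DRAM.
# Hit rates are the AMD figures quoted in the text; the traffic number is illustrative.

def dram_traffic_gb_s(total_request_traffic: float, hit_rate: float) -> float:
    """Traffic left over for GDDR6X after the L2/Infinity Cache absorbs its share."""
    return total_request_traffic * (1.0 - hit_rate)

total = 1500.0  # GB/s of raw request traffic, purely hypothetical
for resolution, hit in [("1080p", 0.78), ("1440p", 0.69), ("4K", 0.53)]:
    remaining = dram_traffic_gb_s(total, hit)
    print(f"{resolution}: {remaining:.0f} GB/s of {total:.0f} GB/s must come from memory")
```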

The AD103 has a very interesting configuration, estimated at around 379.69mm². This is a huge step down from the AD102, probably the largest gap in recent memory between the top chip and the second chip of a GPU generation: the AD102 has more than 70% more CUDA cores than the AD103.

Another interesting point is that the CUDA core count is exactly the same as the current-generation high-end GA102. The memory bus is 256 bits wide, much narrower than the AD102's 384-bit bus. As such, AD103-based gaming GPUs top out at 16GB, though cut-down versions may exist. Although memory bandwidth is much lower than the GA102's, the 64MB of L2 cache should still keep this GPU fed.

Given that Nvidia will be using a custom TSMC "4N" node, expect higher clock frequencies than the GA102. The clock increase, coupled with architectural improvements, should allow the AD103 to outperform the current-generation flagship RTX 3090 Ti if brought to a power-hungry desktop. It is worth noting that the GA103 never appeared on desktops and was only used in high-end notebook GPUs, so the same may well happen again in the Ada generation.


The AD104, estimated at around 300.45mm², may be the best choice in the Ada series in terms of performance and cost-effectiveness. The 192-bit bus brings 12GB of memory to gaming GPUs, a high enough capacity while keeping the bill of materials (BOM) at a reasonable level.

Meanwhile, Nvidia's 104-class GPU designs tend to deliver performance similar to the previous generation's 102-class parts. If this trend continues, cost/performance should be excellent. It may even be better than that, since Nvidia could push clocks high enough to reach performance levels above the RTX 3090.

Expect Nvidia's top-of-the-line AD104 desktop GPU with GDDR6X to draw up to 350W or even 400W, and expect this to be the GPU most enthusiasts end up buying. The chip can also be run efficiently; expect configurations without G6X memory and with clocks backed off a bit.

The AD106 is a true mass-market GPU, estimated at around 203.21mm². It may well be the highest-volume GPU in the series, as the 106-class parts were the highest-volume GPUs of the Pascal, Turing, and Ampere generations. With a 128-bit bus, it will mainly be paired with 8GB of memory.

In high-end configurations, expect performance similar to the GA104 at its best in the RTX 3070 Ti. Given that the AD106 has only 3 GPCs versus 6 GPCs in the GA104, that assumption may be a bit optimistic.

This GPU will also likely be the highest-volume GPU in mobile devices. With 32MB of L2 cache, hit rates might land around 55% at 1080p, 38% at 1440p, and 27% at 4K, similar to AMD's Navi 23.

Before discussing this generation's baby, the AD107, a little background is required.

The excerpts posted on Twitter from the leaked documents do not specify the cache size for this GPU. For the previous GPUs, the same 16MB per 64-bit memory controller/frame buffer partition (FBP) was assumed. For the AD107, this doesn't make much sense: the GPC count and bus width stay the same, while only the TPC count per GPC drops to 4. If the L2 cache stayed the same, the die size would only drop from ~203.21mm² to ~184.28mm². That tiny reduction is not enough to separate the two GPUs in the stack.

Instead, assume a relationship similar to that between the TU116 and TU106 in the Turing generation. The TU116 has 0.5MB of L2 cache per FBP instead of 1MB like the TU10x parts. If the same 50% reduction in L2 cache per FBP is applied, the AD107 comes out to approximately 145.54mm². That seems much more reasonable for product positioning and cost.


With these assumptions, the AD107 looks like an excellent mobile GPU. Since it doesn't need more PCIe lanes, it is trimmed to 8 lanes; Nvidia typically cuts its bottom GPU down to this lane count. Performance should be good enough to beat Intel's best Meteor Lake iGPU configurations, yet cheap enough for low-cost laptops.
Overall, Ada is a pretty interesting lineup. At the high end, there is a considerable increase in performance (and power consumption). The AD102 has a similar die size to the GA102, but uses the more expensive custom TSMC 4N process technology instead of the cheaper custom Samsung 8N process technology.
The considerable density increase of TSMC's N4 derivative relative to Samsung's 8nm derivative justifies the cost.
Interestingly, despite being a much newer node, SemiAnalysis sources report that TSMC N4's parametric yield is actually slightly better than Samsung's 8nm node, albeit with similarly poor defect yields. This is basically not a problem for GPUs, as nearly every die can be harvested as a cut-down part.

In terms of die size and overall BOM, the rest of the Ada lineup is more modest. Although the wafer cost is much higher, performance should generally be higher than Ampere at the same power while the chips cost less to manufacture. SemiAnalysis played around with wafer cost and die yield calculators to get some cost estimates, but in the end Nvidia's silicon cost is just a fraction of the end-user price. Nvidia sells the chips at a markup and negotiates pricing for the memory used by ODM/AIB partners. Those ODM/AIB partners still have to purchase and integrate the memory, power components, and cooling, at potentially low profit margins.
Nvidia seems to have struck a good balance between L2 cache size and memory bus width. Memory capacities will remain reasonable, as most GPUs will use 16Gb G6X or G6 devices. Generally speaking, the AD104 replaces the GA102 and the AD106 replaces the GA104 in the performance tiers, with memory costing the same and the chips costing less to make. Board-level components such as packaging, cooling, and power delivery are cheaper thanks to higher efficiency and smaller circuit boards.
When comparing the same die position in the stack (e.g. GA104 vs AD104), there is an increase in memory size, but this is needed because 8GB is too little for the segment and 16GB is too expensive.
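The memory capacities quoted throughout the lineup follow directly from bus width and device density. A quick check, under the assumption stated above that most SKUs use 16Gb (2GB) GDDR6/GDDR6X devices, one per 32-bit channel:

```python
# Memory capacity from bus width, assuming one 16Gb (2GB) GDDR6/GDDR6X device
# per 32-bit channel, as suggested above for most SKUs.

def capacity_gb(bus_width_bits: int, device_gbit: int = 16) -> int:
    channels = bus_width_bits // 32
    return channels * device_gbit // 8   # Gbit -> GB per device

for name, bus in [("AD102", 384), ("AD103", 256), ("AD104", 192), ("AD106/AD107", 128)]:
    print(f"{name}: {bus}-bit bus -> {capacity_gb(bus)} GB")
# Prints 24, 16, 12, and 8 GB respectively, matching the figures in the text.
```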

However, fears of high power consumption are warranted. Nvidia will likely push each chip as hard as it did in the previous generation. In fact, it is conceivable that power moves up one tier in the stack, i.e. the top AD104 configuration reaches 3080-level power consumption and the top AD106 configuration reaches 3070-level power consumption. Rumors point to the top-of-the-line AD102 setting a new record for GPU power consumption.
Next, we'll break down how these die size estimates were derived.
The first step in the die size analysis is to collect the architectural changes in Ada and compare them to Ampere. The SM architecture version is 8.9 instead of 8.6, so this is mostly a generational improvement; a 10% increase in SM size was therefore assumed. It isn't clear what the SM changes are, but they may include 192KB of L1 cache and updated tensor cores.
The biggest expected change is the addition of a new 3rd-generation RT core. On the IO side, the leak suggests that NVLink has been removed from the lineup entirely, implying that Nvidia won't be positioning Ada for multi-GPU datacenter and professional visualization applications. Expect PCIe 5.0, better memory controllers supporting higher GDDR6X speeds, and DisplayPort 2.0. Newer NVENC and NVDEC blocks, which should add AV1 encoding, are also likely.

The biggest change in Ada is of course the L2 cache. Instead of a small L2, Nvidia appears to have taken a page from AMD's Infinity Cache and adopted a much larger cache across the board. Since most of the specs are known, Ampere's GA102 IP blocks can be used to construct a hypothetical GPU die with the same configuration as the AD102. This ignores certain changes such as SM architecture updates, larger encoder blocks, PCIe 5.0, DisplayPort 2.0, and memory controllers tuned for faster GDDR6X.

Using GA102 building blocks, this hypothetical Ampere GPU with the AD102's configuration comes out to a die size of 1629.60mm² on 8nm. What's immediately noticeable is that the L2 cache is huge. AMD has an even larger L3 Infinity Cache on its Navi 21 GPU, yet does not devote nearly as much area to it. Yes, AMD is on the denser N7 node, but that is only a small piece of the puzzle. Most of the density difference comes from the layout and configuration of the L2 cache.
The GA102 uses 48 SRAM slices of 128KB each, giving 1MB of L2 per 64-bit memory controller/frame buffer partition (FBP). The GA100, on the other hand, uses 80 SRAM slices of 512KB each. As the comparison with AMD's cache shows, these larger slices increase density considerably.
The GA100's density improvement is far more than a process node shrink alone would provide. The same effect can be seen with AMD's L3 Infinity Cache.

Although AMD trails Nvidia in many design elements, it is arguably ahead in certain areas such as cache and packaging, much of which stems from the pedigree of its CPU team. AMD is very good at building extremely dense, high-performance caches for GPUs, as Infinity Cache shows. In fact, in the final die size estimates, Nvidia's 96MB L2 is still nowhere near as dense as AMD's 96MB L3 Infinity Cache.

In any case, simply shrinking the GA102 building blocks from Samsung 8nm to TSMC 4N would not bring the die to a reasonable size; the cache design requires architectural rework. The leak indicates there is now 16MB of L2 per 64-bit memory controller/FBP in the AD102. It is estimated that Nvidia will move to 48 SRAM slices of 2048KB each.
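As a sanity check, the slice counts and sizes quoted in this article multiply out to the stated L2 capacities (a trivial consistency check, not new information):

```python
# Consistency check: total L2 capacity = number of SRAM slices x slice size.
# Slice counts and sizes are the figures quoted in the text above.

def total_l2_mb(slices: int, slice_kb: int) -> int:
    return slices * slice_kb // 1024

print("GA102:", total_l2_mb(48, 128), "MB")    # 6 MB, matching the GA102's L2
print("GA100:", total_l2_mb(80, 512), "MB")    # 40 MB
print("AD102:", total_l2_mb(48, 2048), "MB")   # 96 MB, matching the leaked figure
```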
With this cache configuration, the theoretical cache bandwidth can be estimated.
AMD's Infinity Cache on Navi 21 delivers 1.99TB/s at 1.94GHz. If Nvidia ran the AD102's L2 at the same 1.94GHz, it would achieve 5.96TB/s. Final product clocks will differ: frequencies around 2.25GHz seem realistic for Ada on the desktop, while RDNA3 is expected to clock above 2.5GHz. Nvidia appears to be making the design choice of a high-bandwidth cache at the expense of density.
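A rough sketch of how those bandwidth figures fall out: bandwidth is bytes delivered per clock times clock frequency. The 64-byte-per-slice-per-clock width (16 channels for Navi 21's Infinity Cache, 48 slices for AD102's L2) is an assumption chosen to be consistent with the 1.99TB/s and 5.96TB/s figures above, not a leaked specification.

```python
# Back-of-the-envelope cache bandwidth: (slices x bytes per slice per clock) x clock.
# The 64-byte-per-slice-per-clock width is an assumption made to match the quoted figures.

def cache_bw_tb_s(slices: int, bytes_per_clk: int, clock_ghz: float) -> float:
    return slices * bytes_per_clk * clock_ghz / 1000.0  # GB/s -> TB/s

print(f"Navi 21 Infinity Cache: {cache_bw_tb_s(16, 64, 1.94):.2f} TB/s")  # ~1.99 TB/s
print(f"AD102 L2 at 1.94 GHz:   {cache_bw_tb_s(48, 64, 1.94):.2f} TB/s")  # ~5.96 TB/s
print(f"AD102 L2 at 2.25 GHz:   {cache_bw_tb_s(48, 64, 2.25):.2f} TB/s")  # plausible desktop clock
```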
Nvidia could have introduced a higher-density cache with 8MB to 16MB per slice. That could bring L2 density close to AMD's Infinity Cache, but it would cause L2 bandwidth to drop below even Ampere's. In the end, that was likely not an acceptable option.
The impact of this different cache architecture on the L2 area of the AD102 building blocks was then estimated. A shrink factor to TSMC N7 and a further shrink factor to TSMC N4 were applied. The cache blocks were treated as roughly a 60:40 SRAM-to-logic split, which determines how much of the block shrinks like SRAM. A 10% growth factor was applied to the SMs to account for any architectural changes there, and different shrink factors were used for the various digital logic blocks based on their mix of SRAM and logic (typically 30:70).
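The methodology can be summarized as a per-block weighted shrink. The sketch below is only an illustration of that procedure; the shrink factors and block areas are placeholder assumptions, not the actual numbers used by SemiAnalysis.

```python
# Illustration of the die-shrink methodology: each block scales by a weighted average
# of an SRAM shrink factor and a logic shrink factor, according to its SRAM:logic mix.
# All numbers below are placeholders, not the values actually used in the analysis.

SRAM_SHRINK = 0.80   # hypothetical area ratio for SRAM on the newer node
LOGIC_SHRINK = 0.55  # hypothetical area ratio for logic on the newer node

def scaled_area(area_mm2: float, sram_frac: float, growth: float = 1.0) -> float:
    logic_frac = 1.0 - sram_frac
    return area_mm2 * (sram_frac * SRAM_SHRINK + logic_frac * LOGIC_SHRINK) * growth

blocks = {
    # name: (area on the old node in mm^2, SRAM fraction, extra growth factor)
    "L2 cache": (120.0, 0.60, 1.00),   # cache blocks treated as ~60:40 SRAM:logic
    "SMs":      (300.0, 0.30, 1.10),   # digital logic ~30:70, plus 10% for arch changes
}
analog_mm2 = 60.0                      # analog/PHY area kept constant (shrinks very little)

total = analog_mm2 + sum(scaled_area(a, s, g) for a, s, g in blocks.values())
print(f"Scaled die estimate: {total:.1f} mm^2 (placeholder numbers)")
```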

Finally, the analog portions of the chip were kept the same size, since they shrink very little; any savings there are balanced by upgrades that may add area, such as PCIe 5.0, faster GDDR6X memory controllers, and DisplayPort 2.0. NVLink was removed from these figures. The result is approximately 611.3mm², which independently matches kopite7kimi's claim of a die size of about 600mm².
After working through the big chip, it's time to configure the rest of the lineup. GPC count, TPC count, L2 size, command buffers, the various PHYs, crossbars, and so on can all be scaled down based on each GPU's configuration. All the shrink factors chosen are somewhat arbitrary, based on extrapolations from TSMC's statements and from actual products, so the end result is a bit of a shot in the dark. For the AD107, the cache architecture was altered slightly because there is less cache per FBP.
Overall, Ada Lovelace does not appear to be architecturally very different from the current Ampere architecture, but it brings changes such as improved ray tracing cores, improved encoders, and much larger L2 caches that reduce costs while significantly improving performance, albeit on a more expensive TSMC N4-based custom node. Nvidia has maintained its tradition of keeping memory sizes balanced across the stack, with modest increases at each tier. Compared to AMD, the rumored high end offers very high performance, but also at high cost. The more interesting comparison is AMD's Navi 33, which should land somewhere between the AD104 and AD106. The spread is huge, but the leaks suggest a strong competitor in the mass market.
AMD is currently far behind in ray tracing performance, and the lack of differentiating software features like DLSS and Broadcast does hurt its competitiveness, but this is shaping up to be the most competitive GPU generation in a decade.
GPU prices are falling fast as the Ethereum 2.0 transition slams mining demand and consumers shift their spending mix from goods to services. Combined with higher inflation, these factors suggest that Ada Lovelace (and RDNA 3) GPUs should offer pretty good value for money in the $400 to $1,000 market. The top of the stack is likely to deliver amazing performance levels, but at a higher cost.
3D chip technology is disrupting computing: AMD, Graphcore, and Intel all make big moves
High-performance processor research shows that a new way to continue Moore's Law is emerging. Each generation of processors needs to perform better than the last, which means integrating ever more logic into the silicon. But chip manufacturing currently faces two problems: the ability to shrink transistors, and the logic and memory blocks built from them, is slowing down; and chips have reached their size limits.
Moore's Law. Source: wikipedia

A lithography tool can only pattern an area of about 850 square millimeters, roughly the size of a top-of-the-line Nvidia GPU.
In recent years, system-on-chip developers have begun to break large chip designs into smaller chiplets and connect them together within the same package. In CPUs, the connection technology is mostly 2.5D packaging, in which chiplets are placed side by side and linked with short, dense interconnects. As most manufacturers have agreed on a 2.5D chiplet-to-chiplet communication standard, momentum for this kind of integration will continue to grow.
However, as data demands grow, keeping large amounts of data as close as if it were on the same chip requires even shorter, denser connections, which can only be achieved by stacking one chip on top of another. Connecting two chips face to face can provide thousands of connections per square millimeter.
It takes a lot of innovation to make this work: engineers have to figure out how to keep one chip in the stack from overheating the other, how to prevent the occasional bad chiplet from scrapping the whole system, and so on.
Recently, Samuel K. Moore, the senior editor covering semiconductors at IEEE Spectrum, described three ways 3D chip technology is disrupting computing, focusing on industry-leading work from AMD, Graphcore, and Intel.
AMD Zen 3
PCs have long offered the option of adding memory to speed up very large applications and data-heavy workloads. AMD's next-generation CPU chiplets offer a similar option, thanks to 3D die stacking.
Both Zen 2 and Zen 3 processor cores use the same TSMC manufacturing process and therefore have the same size transistors, interconnects, etc. AMD made a lot of architectural changes, and even without the additional cache memory, Zen 3's average performance improved by 19%.

One of the highlights of Zen 3 is the vertical stacking of chips using through-silicon vias (TSVs), a way of connecting stacked dies to each other. The TSVs are built into Zen 3's highest-level cache, an SRAM block called L3 that sits in the middle of the compute chiplet and is shared by all 8 cores.
In processors aimed at data-heavy workloads, the backside of the Zen 3 wafer is thinned until the TSVs are exposed, and a 64-megabyte SRAM chiplet is then attached to those exposed TSVs using hybrid bonding, a process akin to cold-welding copper. The result is a dense set of connections with a pitch as small as 9 micrometers. Finally, for structural stability and heat conduction, blank silicon chiplets are attached over the rest of the Zen 3 CPU die. (A die is the small square of silicon cut out of the wafer during processor manufacturing.)
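To put that 9µm pitch in perspective, a simple pitch-to-density calculation (assuming a uniform square grid of bond pads, which is a simplification) shows how hybrid bonding reaches the "thousands of connections per square millimeter" mentioned earlier; the 36µm figure is included only for comparison with the Foveros connections described later.

```python
# Connection density from bond pitch, assuming a uniform square grid of pads.

def connections_per_mm2(pitch_um: float) -> float:
    pads_per_mm = 1000.0 / pitch_um     # pads along one millimeter
    return pads_per_mm ** 2             # pads within one square millimeter

print(f"9 um hybrid bonding pitch: ~{connections_per_mm2(9):,.0f} connections/mm^2")
print(f"36 um pitch (for comparison): ~{connections_per_mm2(36):,.0f} connections/mm^2")
```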

AMD's 3D V-Cache technology stacks a 64-megabyte SRAM cache (red) and two blank structural chiplets onto a Zen 3 compute chiplet.
"Adding extra memory by placing it beside the CPU die is not practical, because data would take too long to reach the processor cores. Despite tripling the L3 cache size, 3D V-Cache adds only four clock cycles of latency, something that is only possible with 3D stacking," said John Wuu, a senior design engineer at AMD.
Larger caches have a place in high-end gaming: a desktop Ryzen CPU with 3D V-Cache speeds up 1080p gaming by an average of 15%. Wuu noted that the industry's ability to shrink SRAM is slowing relative to its ability to shrink logic, so future SRAM expansions will likely continue to use more mature manufacturing processes while the compute chiplets are pushed to the leading edge of Moore's Law.
Graphcore Bow AI Processor
3D integration can speed up computation even when one of the chips in the stack has no transistors at all. Graphcore, a UK-based AI computer company, achieved a dramatic increase in system performance simply by mounting a power-delivery die on its AI processor.
Adding the power-delivery silicon means the combined chip, called Bow, can run faster (1.85GHz vs 1.35GHz) at lower voltage than its predecessor. That translates into computers that train neural networks 40% faster while consuming 16% less energy than the previous generation. Users don't need to change their software to get this improvement.
The power-delivery die is a stack of capacitors and through-silicon vias. The TSVs carry power and data to the processor chip, but it is the capacitors that really make the difference. Like the bit-storage components in DRAM, these capacitors are formed in deep, narrow trenches in the silicon. Because these reservoirs of charge sit so close to the processor's transistors, power delivery is smoothed out, allowing the processor cores to run faster at lower voltage.
Without the power-delivery chip, the processor would have to raise its operating voltage above nominal to run at 1.85GHz, consuming much more power. With it, the processor can also hit a given clock frequency while consuming less power.
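A rough way to see why this works is the standard dynamic-power relationship for CMOS, P ≈ C·V²·f, which means energy per operation scales roughly with V². The clock figures below are the published ones quoted above; the voltages are made-up illustrative values, since they are not given here.

```python
# Dynamic CMOS power scales roughly as C * V^2 * f, so energy per operation scales as V^2.
# Clocks are the figures quoted above; the voltages are purely illustrative guesses.

f_old, f_new = 1.35, 1.85          # GHz: previous generation vs Bow
print(f"Clock increase: {f_new / f_old - 1:.0%}")            # ~37%, close to the quoted ~40%

v_old, v_new = 0.85, 0.78          # volts: hypothetical, not published values
energy_ratio = (v_new / v_old) ** 2
print(f"Energy per operation: {energy_ratio:.0%} of before")  # ~84%, i.e. roughly 16% less
```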

Graphcore Bow AI accelerator uses 3D die stacking to boost performance by 40%.
Bow's manufacturing process is unique. Most 3D stacking is done by bonding one chiplet onto another while the latter is still on the wafer, a process called chip-on-wafer (see AMD's Zen 3 above). Bow instead uses TSMC's wafer-on-wafer process, in which a whole wafer of one type is bonded to a whole wafer of another type and only then diced into chips.
Simon Knowles, Graphcore's chief technology officer, said this is the first chip on the market to use the technique, and that it enables a higher connection density between the two dies than can be achieved with a chip-on-wafer process.

BOW-2000.
Although the power-delivery chiplet has no transistors today, it may gain some in the near future. Using the technology solely for power delivery is just the first step, Knowles said, and it will go much further in the near future.
Learn more: https://spectrum.ieee.org/graphcore-ai-processor
Intel Ponte Vecchio supercomputer chip
The Aurora supercomputer aims to be one of the first high-performance computers (HPC) in the US to break the exaflop barrier: a billion billion (10^18) high-precision floating point operations per second. To get Aurora to that level, Ponte Vecchio packs more than 100 billion transistors across 47 pieces of silicon into a single processor. Intel used both 2.5D and 3D technologies to squeeze 3,100 square millimeters of silicon (almost equal to four Nvidia A100 GPUs) into a 2,330 square millimeter footprint.
The Intel Ponte Vecchio processor integrates 47 chiplets into a single processor.
Each Ponte Vecchio is actually two sets of mirrored chiplets connected using Intel's 2.5D integration technology, Co-EMIB, which forms a high-density interconnect bridge between the two 3D stacks of chiplets. The bridge itself is a small piece of silicon embedded in the package's organic substrate; interconnects can be made about twice as dense on silicon as on an organic substrate. Co-EMIB dies also connect the high-bandwidth memory and I/O chiplets to the base tile, the largest chiplet, on which the other chiplets are stacked.
The base tile uses Intel's 3D stacking technology, called Foveros, to stack the compute and cache chiplets on top of it. The technology creates a dense array of die-to-die vertical connections between the two chips at a 36-micrometer pitch. Signals and power enter the stack through TSVs, wider vertical interconnects that run straight through most of the silicon.

Eight compute tiles, four cache tiles, and eight blank tiles for cooling the processor are all connected to the base tile with Foveros. The base tile itself provides cache memory and a network that lets the compute tiles access that memory.
Intel researcher Gomes said none of this was easy: Ponte Vecchio required innovation in yield management, clock circuits, thermal regulation, and power delivery. For example, Intel engineers chose to supply the processor at a higher-than-normal voltage (1.8 volts) so that the current would be low enough to simplify the package. Circuits in the base tile step that voltage down to close to 0.7V for the compute tiles, and each compute tile has its own power domain in the base tile. The key is a new kind of high-efficiency inductor called a coaxial magnetic integrated inductor. Because these are built into the package substrate, the circuit actually passes back and forth between the base tile and the package before delivering voltage to the compute tiles.
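The logic behind the 1.8V supply is a simple I = P/V trade-off: delivering the same power at a higher voltage means proportionally less current through the package. The power figure below is a made-up round number for illustration; only the 1.8V and ~0.7V levels come from the text.

```python
# Why deliver power at a higher voltage: for a fixed power, current scales as 1/V.
# The 600 W package power is purely illustrative; 1.8 V and ~0.7 V are from the text above.

def current_amps(power_w: float, volts: float) -> float:
    return power_w / volts

power = 600.0  # watts, hypothetical
print(f"Delivered at 0.7 V, the package would carry ~{current_amps(power, 0.7):.0f} A")
print(f"Delivered at 1.8 V, it carries only ~{current_amps(power, 1.8):.0f} A; "
      f"the base tile then steps it down to ~0.7 V locally")
```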
Gomes said it took 14 years from the first petaflop supercomputer in 2008 to this year's exaflops, and that advanced packaging techniques, such as 3D stacking, will help increase computing power.
GPUs will become mainstream, and localization is on the horizon
Computing power will become the engine of the digital economy and the cornerstone of an intelligent society, with combinations of heterogeneous chips providing massive amounts of it, and the GPU becoming the mainstream choice. Future AI computing will take the form of heterogeneous systems with the CPU as the control center and GPUs, FPGAs, and ASICs as accelerator cards for specific scenarios. From the perspective of deployment flexibility, efficiency, and the essential characteristics of AI algorithms, the GPU will be the AI compute chip in greatest demand, expected to account for 57% of demand in 2025.
The Chinese market will grow rapidly, and the dawn of GPU localization is beginning to emerge: it is estimated that by 2024, China's artificial intelligence technology market will reach 17.2 billion US dollars. In AI training and inference, many startups have already released GPU products, providing a driving force for localization.
By 2030, demand for AI computing power is projected to reach 16,206 EFLOPS, and the technical limits of compute chips constrain how the computing power landscape can evolve toward ubiquity.

Training refers to AI techniques that simulate how humans receive, learn, and understand external information; inference refers to AI techniques that simulate how humans derive the logic contained in that information through mental activities such as learning, judgment, and analysis.
In terms of deployment flexibility, the CPU is the most flexible, followed by the GPU, with the FPGA and ASIC last.
In terms of computing efficiency, the ASIC is the most efficient, followed by the FPGA, with the GPU and CPU last.
Heterogeneous computing is the balanced outcome: weighing deployment flexibility against computing efficiency, heterogeneous computing is the compromise, and CPU+GPU, CPU+FPGA, or CPU+ASIC combinations are the trend.
The CPU is good at scheduling but has only average computing power

GPU is good at floating point calculation and has strong parallel processing ability

FPGA can be programmed flexibly and is good at high-speed processing of fixed services

The ASIC can be customized to requirements but can only be used for specific scenarios

From the perspective of technology trends, GPU will become the mainstream AI chip

The global AI chip market will grow rapidly: by 2025, the global market is expected to reach 30 billion US dollars, a compound annual growth rate of about 37% from 2019 to 2025.
GPU market share is expected to exceed 50%: AI applications rely heavily on convolution operations, exactly the kind of work GPUs excel at; the GPU share of the AI chip market is expected to reach about 57% in 2025.
The training chip market is dominated by GPUs: in 2019, China's training chip market was about 4.1 billion yuan, and NVIDIA held roughly 90% of it with products such as the V100 series; counting AMD's GPU products as well, GPUs hold about 95% of the training chip market.
The inference chip market shows a clearer trend toward diversification: in 2019, China's inference chip market was about 3.7 billion yuan. Compared with the training market, the shares of FPGAs and ASICs are higher, and the share of AMD's GPU products has also increased; GPUs no longer dominate single-handedly.
China's GPU chip and board market will grow rapidly: by 2024 it is expected to reach 37 billion yuan, a compound annual growth rate of about 30%; the training market will account for about 36%, the inference market about 58%, and high-performance computing about 6%.

Reference links
https://mp.weixin.qq.com/s/fJfQv8_PmoEIDp8_Y74Cfg
https://mp.weixin.qq.com/s/B_pNd0662c0t1gb7HwwBsQ
https://mp.weixin.qq.com/s/bSowhmoRqVJm5jHArm6XsA
https://semianalysis.substack.com/p/nvidia-ada-lovelace-leaked-specifications?s=r

Source: blog.csdn.net/wujianing_110117/article/details/124374168