Can China’s chip stacking strategy really challenge Nvidia’s AI dominance?

China is pioneering a novel semiconductor strategy known as chip stacking to counteract the tightening US restrictions on advanced chip manufacturing. This innovative method aims to bridge the performance divide with Nvidia’s cutting-edge GPUs by vertically integrating older, domestically producible chips, circumventing the barriers to accessing the latest fabrication technologies.

Rethinking Chip Advancement: Vertical Integration Over Process Shrinking

At the heart of this approach lies a straightforward yet ingenious idea: when the fabrication of next-generation chips is off-limits, enhance system capabilities by intelligently combining existing chip technologies. Wei Shaojun, vice-president of the China Semiconductor Industry Association and a professor at Tsinghua University, recently introduced a design that merges 14-nanometer logic chips with 18-nanometer DRAM through advanced three-dimensional hybrid bonding techniques.

This strategy is particularly significant because US export controls target the tools and technology needed to manufacture logic chips below 14nm and DRAM below 18nm. By operating at these thresholds rather than pushing beyond them, Chinese manufacturers can continue production without running afoul of the export limitations.

The technical innovation involves “software-defined near-memory computing,” which minimizes the latency caused by data transfer between processors and memory, a critical bottleneck in AI workloads. By stacking chips vertically, the design places processing and memory units in close physical proximity, drastically reducing data movement delays.
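The intuition behind near-memory computing can be sketched with a simple roofline-style estimate: a workload takes the longer of its compute time and its data-movement time, so for memory-bound AI workloads, raising memory bandwidth matters more than raising peak FLOPS. The numbers below are illustrative assumptions, not measured figures for any actual chip.

```python
# Toy roofline model: why moving memory closer (higher bandwidth) helps
# memory-bound workloads far more than adding compute does.
# All numbers are illustrative assumptions.

def workload_time_s(flops, bytes_moved, peak_flops, bandwidth_Bps):
    """The workload is bound by the slower of compute and memory traffic."""
    return max(flops / peak_flops, bytes_moved / bandwidth_Bps)

FLOPS = 1e12    # 1 TFLOP of arithmetic work
BYTES = 4e11    # 400 GB moved: a memory-heavy AI inference pattern
PEAK = 120e12   # 120 TFLOPS, the aggregate figure claimed in the article

# Off-package DRAM vs. a stacked near-memory design: bandwidth is the lever.
t_far = workload_time_s(FLOPS, BYTES, PEAK, 100e9)   # ~100 GB/s off-chip
t_near = workload_time_s(FLOPS, BYTES, PEAK, 2e12)   # ~2 TB/s stacked

print(f"off-package: {t_far:.1f} s, stacked: {t_near:.1f} s")
# → off-package: 4.0 s, stacked: 0.2 s  (20x faster with zero extra FLOPS)
```

In this hypothetical scenario the compute time (about 8 ms) is negligible either way; the entire speedup comes from shortening the data path, which is exactly the bottleneck the stacked design attacks.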

Utilizing 3D hybrid bonding, the chips are connected via copper-to-copper interfaces at pitches smaller than 10 micrometers, effectively eliminating the spatial gaps that traditionally slow down chip communication.
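The significance of a sub-10-micrometer pitch is easiest to see as a density calculation: on a square grid, halving the pitch quadruples the number of vertical connections per unit area. The 40 µm comparison figure below is an assumed, typical microbump pitch for conventional packaging, not a number from the article.

```python
# Vertical connections per square millimetre at a given bond pitch,
# assuming a uniform square grid of bond pads.
def bonds_per_mm2(pitch_um):
    per_side = 1000 / pitch_um   # bonds along a 1 mm edge
    return per_side ** 2

print(bonds_per_mm2(10))   # hybrid bonding at a 10 um pitch → 10000.0
print(bonds_per_mm2(40))   # assumed conventional microbump pitch → 625.0
```

At the cited sub-10 µm pitch, hybrid bonding offers on the order of 16x the connection density of this assumed microbump baseline, which is what makes wide, low-latency processor-to-memory interfaces feasible.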

Evaluating Performance: Ambitions Versus Reality

Wei asserts that this stacked chip configuration could rival Nvidia’s 4nm GPUs, boasting energy efficiency figures of 2 TFLOPS per watt and an aggregate performance of 120 TFLOPS. However, Nvidia’s A100 GPU, often cited as a benchmark, delivers up to 312 TFLOPS, more than double the claimed output of the chip stacking design.
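The claimed figures can be cross-checked with simple arithmetic: 120 TFLOPS at 2 TFLOPS per watt implies a roughly 60 W part. The A100's 400 W TDP used below is a published Nvidia specification, not a figure from this article.

```python
# Sanity-check the claimed figures for the stacked design.
claimed_tflops = 120
claimed_tflops_per_watt = 2
implied_watts = claimed_tflops / claimed_tflops_per_watt
print(implied_watts)   # → 60.0 W implied power draw

# Compare against the A100 figures: 312 TFLOPS peak (cited in the text)
# at a 400 W TDP (published spec, not from this article).
a100_tflops, a100_watts = 312, 400
print(round(claimed_tflops / a100_tflops, 2))   # → 0.38 of A100 throughput
print(round(a100_tflops / a100_watts, 2))       # → 0.78 TFLOPS/W for A100
```

If both sets of numbers hold, the stacked design would trail the A100 badly on absolute throughput while beating it comfortably on efficiency per watt, which is consistent with the article's framing of it as an efficiency play rather than a raw-performance one.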

This disparity underscores the inherent challenges of the chip stacking method. While the architectural concept is promising, it cannot fully compensate for the advantages of advanced process nodes, which offer superior transistor density, enhanced power efficiency, and improved thermal management.

Strategic Rationale Behind China’s Chip Stacking Initiative

Beyond raw performance, the chip stacking strategy reflects a broader strategic shift in China’s semiconductor ambitions. Huawei’s founder, Ren Zhengfei, encapsulates this vision by advocating for “state-of-the-art performance through stacking and clustering chips node for node,” signaling a move away from competing solely on process node miniaturization.

With industry leaders like TSMC and Samsung advancing toward 3nm and 2nm fabrication processes, technologies currently inaccessible to China, the focus pivots to system-level innovation and software optimization as alternative competitive fronts.

Another critical factor is the dominance of Nvidia’s CUDA software ecosystem, which underpins much of AI computing today. Wei describes this as a “triple dependence” involving models, architectures, and ecosystems. Chinese chip developers face the daunting task of either replicating CUDA’s extensive capabilities or persuading developers to transition away from a mature, widely adopted platform. The chip stacking approach offers a novel computing paradigm that could bypass this dependency.

Technical and Practical Challenges Ahead

While 3D chip stacking is an established technology in high-bandwidth memory and advanced packaging, applying it to create new computing architectures introduces significant hurdles. Thermal management is a primary concern, as stacking multiple 14nm active dies generates substantial heat, complicating cooling solutions compared to more efficient 4nm or 5nm chips.

Additionally, manufacturing yields in 3D stacking are notoriously difficult to optimize; a defect in any layer can jeopardize the entire chip stack. Furthermore, the software infrastructure necessary to fully exploit these architectures is still in its infancy and will require considerable development time.
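The yield problem compounds multiplicatively: if a single defective die ruins the whole assembly, the stack's yield is the per-die yield raised to the number of stacked dies. The 90% per-die yield below is an illustrative assumption, not a published figure.

```python
# Compound yield for a 3D stack: with no repair or redundancy, one bad
# die scraps the whole assembly, so per-die yields multiply.
def stack_yield(per_die_yield, num_dies):
    return per_die_yield ** num_dies

# Illustrative: 90% per-die yield (assumption) across growing stack heights.
for dies in (1, 2, 4, 8):
    print(dies, round(stack_yield(0.90, dies), 3))
# → 1 0.9
#   2 0.81
#   4 0.656
#   8 0.43
```

Even a respectable 90% per-die yield collapses below 50% for an eight-die stack, which is why known-good-die testing before bonding is critical to making such architectures economical.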

Realistically, this strategy may excel in specific applications where memory bandwidth is more critical than sheer computational power-such as AI inference, certain data analytics, and specialized workloads. However, achieving parity with Nvidia’s GPUs across the full spectrum of AI training and inference remains a distant objective.

Implications for the Global AI Chip Landscape

The adoption of chip stacking as a cornerstone of China’s semiconductor development marks a strategic pivot from attempting to replicate Western chip designs to leveraging unique architectural innovations aligned with domestic manufacturing capabilities.

Although it remains uncertain whether this approach can close the performance gap with Nvidia, it clearly demonstrates China’s adaptability in the face of export restrictions by focusing on system design, packaging technologies, and integrated software-hardware optimization.

For the global AI sector, this evolution adds complexity to the competitive environment. Nvidia’s supremacy is increasingly challenged not only by established rivals like AMD and Intel but also by emerging architectural innovations that could redefine the concept of an “AI chip.”

Despite its current limitations, the chip stacking strategy embodies a disruptive architectural shift that merits close observation as the semiconductor industry continues to evolve.