AI chips are now memory chips with logic attached

What happened

Epoch AI published a component cost breakdown for modern AI accelerators showing that memory has grown to nearly two-thirds of the bill of materials (BOM). The compute die, the part everyone calls "the chip", is now a minority of the parts cost. HBM stacks, the silicon interposer that wires them to the logic die, and the advanced CoWoS-style packaging that holds it all together dominate the BOM.

The trend isn't subtle. Earlier accelerator generations were logic-heavy: a big GPU die, GDDR memory soldered around it on a PCB, and packaging that was mostly mechanical. Today's parts (H100, H200, B200, MI300X, TPU v5p) place several HBM stacks (eight on B200 and MI300X) right beside the logic die on a shared silicon interposer, which carries the thousands of short wires between them. Each HBM stack is itself a 3D tower of DRAM dies sitting on a base logic die and connected by through-silicon vias. You are buying, in effect, a small mountain of DRAM with a compute die welded to one side.
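Part of why the stacks sit on an interposer is bandwidth: each HBM stack exposes a very wide interface, and per-package bandwidth scales with the number of stacks. A minimal sketch of that arithmetic follows. The interface widths (1024 bits for HBM3-class, 2048 bits for HBM4-class) are the standard widths; the per-pin data rates and the eight-stack package are illustrative assumptions, not the specs of any particular product.

```python
# Peak bandwidth of one HBM stack = interface width (bits) x per-pin rate (Gb/s) / 8.
# Pin rates and stack counts used below are illustrative assumptions.


def stack_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth of a single HBM stack, in GB/s."""
    return bus_width_bits * pin_rate_gbps / 8


def package_bandwidth_tbs(num_stacks: int, bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak aggregate bandwidth of a package, in TB/s (stacks operate in parallel)."""
    return num_stacks * stack_bandwidth_gbs(bus_width_bits, pin_rate_gbps) / 1000


if __name__ == "__main__":
    # HBM3-class stack: 1024-bit interface at an assumed 6.4 Gb/s per pin.
    print(stack_bandwidth_gbs(1024, 6.4))  # 819.2
    # HBM4-class stack: 2048-bit interface at an assumed 8 Gb/s per pin.
    print(stack_bandwidth_gbs(2048, 8.0))  # 2048.0
    # Hypothetical package with eight HBM3-class stacks on one interposer.
    print(package_bandwidth_tbs(8, 1024, 6.4))
```

Doubling the interface width doubles per-stack bandwidth at the same pin rate, which is why stack count and interface width, not the logic die, set a package's bandwidth ceiling.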

The pricing reflects this. HBM3e sells at multiples of commodity DDR5's per-gigabyte price, and HBM output at SK Hynix, Micron, and Samsung is largely committed through 2026. Nvidia's bottleneck on Blackwell shipments has been CoWoS capacity and HBM allocation, not wafer supply from TSMC's logic nodes.

Why it matters

The cost shift is the physical manifestation of a workload shift. Large-model inference, and autoregressive decoding in particular, is bandwidth-bound long before it is FLOP-bound. Each decode step streams the model's weights and KV cache out of HBM while performing only a few FLOPs per byte moved (low arithmetic intensity), so on an H100 the tensor cores spend most of their time waiting for data to arrive from memory. Training's large matrix multiplies have higher arithmetic intensity, but normalization, elementwise ops, and optimizer updates still lean heavily on memory bandwidth. The industry's response has been to throw memory bandwidth at the problem: HBM3e on the H200 reaches about 4.8 TB/s, and Blackwell (B200) reaches about 8 TB/s per package. That bandwidth isn't free; it is, almost literally, what you're paying for.
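A roofline-style check makes the "waiting on memory" claim concrete. A kernel is memory-bound when its arithmetic intensity (FLOPs per byte read from HBM) falls below the hardware's ridge point (peak FLOP/s divided by peak bytes/s). The sketch below assumes approximate H100 SXM datasheet figures (about 989 TFLOP/s dense BF16 and about 3.35 TB/s) and models only the weight matmuls of one decode step; KV-cache reads are ignored, which makes the intensity estimate optimistic.

```python
# Roofline sketch: memory-bound if arithmetic intensity < ridge point.
# Hardware numbers are approximate datasheet figures, treated as assumptions.


def ridge_point(peak_flops: float, peak_bytes_per_s: float) -> float:
    """FLOPs per byte required to keep the compute units busy."""
    return peak_flops / peak_bytes_per_s


def decode_intensity(batch_size: int, bytes_per_param: float) -> float:
    """Approximate FLOPs per byte for weight matmuls in one decode step.

    Each parameter does ~2 FLOPs (multiply + add) per sequence in the batch,
    but its bytes are read from HBM only once per step. KV-cache reads are
    ignored, so this is an upper-bound estimate of intensity.
    """
    return 2 * batch_size / bytes_per_param


# Assumed H100 SXM-class figures: ~989 TFLOP/s dense BF16, ~3.35 TB/s HBM3.
H100_RIDGE = ridge_point(989e12, 3.35e12)  # roughly 295 FLOPs/byte

for batch in (1, 8, 64):
    ai = decode_intensity(batch, bytes_per_param=2)  # BF16 weights
    bound = "memory-bound" if ai < H100_RIDGE else "compute-bound"
    print(f"batch={batch:3d} intensity={ai:6.1f} FLOP/B ridge={H100_RIDGE:.0f} -> {bound}")
```

With BF16 weights the intensity equals the batch size, so under these assumptions a single decode stream sits roughly 300x below the ridge point, and even batch 64 remains memory-bound.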

This reframes a lot of conventional wisdom about the AI hardware stack. The narrative for two years has been "TSMC is the bottleneck" and "Nvidia's moat is CUDA." Both are still true, but they're not where the marginal dollar goes. The marginal dollar goes to Korean and American memory makers (SK Hynix, Samsung, Micron), and the marginal capacity constraint is advanced packaging: the CoWoS-S and CoWoS-L lines at TSMC that bond HBM and logic onto an interposer. When Jensen Huang has said on earnings calls that supply is constrained, the constraint has largely lived in TSMC's advanced-packaging fabs in Taiwan and in HBM fabs such as SK Hynix's in Icheon, not in leading-edge logic wafer starts.

It also explains the competitive dynamics. AMD's MI300X competes with H100 not because AMD finally caught up on logic; it competes because AMD put more HBM in each package (192GB vs 80GB) and priced accordingly. The MI300X's selling point in inference benchmarks is largely a memory-capacity story: it fits bigger models on a single GPU, which reduces or eliminates tensor-parallelism overhead. Groq's LPU, Cerebras's wafer-scale engine, and SambaNova's RDU each make a different memory-bandwidth bet (Groq and Cerebras lean on large on-chip SRAM rather than HBM), and each company's pitch eventually reduces to a slide about bytes per FLOP.
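The capacity argument comes down to one line of arithmetic: weight bytes equal parameter count times bytes per parameter. The sketch below checks how many GPUs a hypothetical 70B-parameter dense model in BF16 needs just to hold its weights, using the HBM capacities quoted above (plus H200's 141GB). It deliberately ignores KV cache, activations, and runtime overhead, so real deployments need more headroom than it reports.

```python
import math

# HBM capacity per GPU, as the vendors quote it (GB).
HBM_GB = {"H100": 80, "H200": 141, "MI300X": 192}


def weight_footprint_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight memory in GB: billions of parameters x bytes per parameter."""
    return params_billion * bytes_per_param


def min_gpus_for_weights(params_billion: float, bytes_per_param: float, hbm_gb: float) -> int:
    """Fewest GPUs whose combined HBM holds the weights (KV cache and overhead ignored)."""
    return math.ceil(weight_footprint_gb(params_billion, bytes_per_param) / hbm_gb)


if __name__ == "__main__":
    # Hypothetical 70B-parameter dense model in BF16 (2 bytes/param): 140 GB of weights.
    for gpu, capacity in HBM_GB.items():
        print(f"{gpu}: {min_gpus_for_weights(70, 2, capacity)} GPU(s) for weights alone")
```

The H100 needs two devices (and therefore tensor parallelism) before serving a single token. The H200 technically holds the 140 GB but leaves essentially nothing for KV cache, while the MI300X leaves roughly 50 GB of headroom, which is the capacity story in miniature.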

The community reaction on Hacker News focused on the obvious second-order question: if memory is the bottleneck, why isn't the memory vendor capturing more of the margin? SK Hynix's HBM3e is sold out and prices have roughly doubled year-over-year, yet Nvidia's company-wide gross margin has stayed above 70%. The answer is the integration premium: the interposer and packaging, the validated thermal solution, and the software stack. DRAM is a commodity; integrated AI accelerators are not, yet. The interesting question for the next five years is whether HBM vendors can climb the value chain by selling more integrated subsystems, or whether they remain the supplier that captures the volume but not the margin.

What this means for your stack

If you're a practitioner doing capacity planning for 2026, the operational implication is direct: when you model GPU availability and pricing, the supply curve to watch is HBM, not logic. Cloud providers' GPU shortages aren't going to ease because TSMC opens another N3 line — they ease when Micron's Taichung HBM expansion comes online and when CoWoS-L capacity ramps. That's a different timeline and a different set of public earnings calls to track.
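One way to see why the logic line matters less is a min-of-constraints model: shippable accelerators are capped by the scarcest input, so adding capacity to a non-binding input changes nothing. The sketch below is a toy with hypothetical quarterly capacities, not real figures, and it assumes one compute die per package for simplicity.

```python
from dataclasses import dataclass

# Toy supply model: output is capped by the scarcest input.
# All capacities are hypothetical units per quarter.


@dataclass
class QuarterlySupply:
    logic_dies: int          # good compute dies (assumes one per package)
    hbm_stacks: int          # HBM stacks allocated to this product
    cowos_slots: int         # advanced-packaging (interposer) assemblies available
    stacks_per_package: int  # HBM stacks needed per accelerator


def shippable(s: QuarterlySupply) -> tuple[int, str]:
    """Return (units that can ship, name of the binding constraint)."""
    limits = {
        "logic": s.logic_dies,
        "hbm": s.hbm_stacks // s.stacks_per_package,
        "packaging": s.cowos_slots,
    }
    binding = min(limits, key=limits.get)
    return limits[binding], binding


if __name__ == "__main__":
    base = QuarterlySupply(1_000_000, 6_000_000, 700_000, 8)
    print(shippable(base))  # (700000, 'packaging')
    # Doubling logic output changes nothing while packaging binds.
    print(shippable(QuarterlySupply(2_000_000, 6_000_000, 700_000, 8)))  # (700000, 'packaging')
    # Ramping packaging moves the bottleneck to HBM allocation.
    print(shippable(QuarterlySupply(1_000_000, 6_000_000, 900_000, 8)))  # (750000, 'hbm')
```

The useful habit for planning is to identify which input is binding and track that input's ramp schedule rather than the headline logic-fab announcements.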

For inference workloads specifically, the math now favors fitting your model on fewer, fatter GPUs rather than more, smaller ones: the marginal cost of HBM per GB is what you're optimizing against, and tensor parallelism across nodes pays the latency tax twice (once over NVLink, which isn't HBM, and again over Ethernet, which really isn't HBM). MI300X's 192GB and H200's 141GB are the capacities to plan around. If your serving stack is built around H100-era 80GB assumptions (heavy tensor parallelism, aggressive KV cache eviction), you may be able to simplify the topology in 2026 with parts that just hold the whole model.
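"Holding the whole model" in serving means weights plus KV cache, and the KV cache grows with both batch size and context length. A minimal sizing sketch follows, assuming a Llama-2-70B-style shape (80 layers, 8 KV heads via grouped-query attention, head dimension 128) with a BF16 cache; the batch size and context length are hypothetical serving parameters.

```python
# Serving footprint sketch: weights plus KV cache.
# KV bytes per token = 2 (K and V) x layers x kv_heads x head_dim x bytes per element.


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_elem: float) -> float:
    """KV-cache bytes stored for each token of each sequence."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def serving_footprint_gb(weights_gb: float, kv_per_token: float, batch: int, context_len: int) -> float:
    """Weights plus KV cache for `batch` sequences of `context_len` tokens, in GB (1e9 bytes)."""
    return weights_gb + kv_per_token * batch * context_len / 1e9


if __name__ == "__main__":
    # Assumed Llama-2-70B-style shape with a BF16 (2-byte) KV cache.
    per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128, bytes_per_elem=2)
    print(per_token)  # 327680 bytes per token
    # 140 GB of BF16 weights, 16 concurrent sequences at 8,192 tokens each.
    total = serving_footprint_gb(140, per_token, batch=16, context_len=8192)
    print(f"{total:.1f} GB total")  # 182.9 GB total
```

At this hypothetical load the KV cache adds about 43 GB on top of the weights: too much for a single 80GB or 141GB part, but under 192GB with roughly 9 GB to spare before activations and runtime overhead. Quantizing the cache to FP8 would halve the KV term.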

For training and model design, the same logic suggests that the next architectural win for frontier labs isn't a smarter optimizer; it's any technique that reduces the bytes moved per unit of useful work. That's why mixture-of-experts (fewer weights read per token), sparse attention variants, FP8 weights, and KV cache quantization keep getting research attention disproportionate to their accuracy gains. They directly attack the line item that's eating the BOM.
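A quick way to see why these techniques pay off is the bandwidth ceiling on single-stream decoding: if every generated token must stream the active weights from HBM at least once, tokens per second cannot exceed bandwidth divided by bytes read per token. The sketch below uses the ~4.8 TB/s H200 figure quoted earlier; the dense 70B model and the MoE with about 13B active parameters are hypothetical configurations, and KV-cache reads, kernel overheads, and batching are ignored.

```python
# Upper bound on bandwidth-bound, single-stream decode speed:
#   tokens/s <= HBM bandwidth / (active params x bytes per param)
# KV-cache reads, kernel overheads and batching are ignored.


def max_tokens_per_s(bandwidth_bytes_per_s: float, active_params: float, bytes_per_param: float) -> float:
    """Ceiling on decode tokens/s when every token streams the active weights once."""
    return bandwidth_bytes_per_s / (active_params * bytes_per_param)


H200_BW = 4.8e12  # ~4.8 TB/s

cases = {
    "dense 70B, BF16 weights": (70e9, 2),
    "dense 70B, FP8 weights": (70e9, 1),
    # Hypothetical MoE that activates ~13B parameters per token.
    "MoE, 13B active, FP8": (13e9, 1),
}

for name, (params, bpp) in cases.items():
    print(f"{name}: <= {max_tokens_per_s(H200_BW, params, bpp):.0f} tokens/s")
```

Under these assumptions the ceilings are about 34, 69, and 369 tokens/s. Halving bytes per parameter doubles the ceiling, and MoE raises it further by shrinking the weights touched per token, although all expert weights must still reside in HBM, so MoE trades bandwidth pressure for capacity pressure.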

Looking ahead

The "AI chip" as a discrete object is a useful fiction; what's actually being sold is a memory-bandwidth subsystem with enough logic attached to consume the bandwidth. Expect the next two generations of accelerators to lean harder into this. HBM4 arrives in 2026 with a wider interface (2048 bits per stack, double HBM3's 1024) and more capable logic base dies, and the long-running "memory-centric computing" research from Samsung and SK Hynix (processing-in-memory, near-memory compute) suddenly looks less speculative when memory is already two-thirds of the cost. The companies that win the next round won't be the ones with the best matrix multipliers; they'll be the ones who figured out how to move fewer bytes.

// read from source

Memory has grown to nearly two-thirds of AI chip component costs
