Apple's new AI architecture runs on Gemini. The Foundati...

What happened

At WWDC 2026, Apple unveiled a redesigned AI architecture that quietly drops its own Apple Foundation Models from the critical path and routes most heavy inference through Google's Gemini family instead. The 451-point Hacker News thread caught it within hours: the keynote slides still say "Apple Intelligence," but the architecture diagram shows Gemini as the workhorse cloud model, with Apple Foundation Models demoted to a small on-device tier for narrow tasks like text rewriting and notification summaries.

The deal, according to MacRumors' reporting, is a multi-year licensing arrangement where Google supplies Gemini weights that Apple serves inside its own Private Cloud Compute (PCC) infrastructure. The chips are Apple silicon. The attestation, the sealed enclaves, the no-logging guarantees — all Apple. But the tokens coming out the other end are generated by Gemini. Apple kept the wrapper and outsourced the contents.

This is the first WWDC since the 2024 Apple Intelligence launch where Apple has publicly acknowledged that its own foundation models aren't competitive at the frontier. Internal benchmarks shown on stage — which Apple has historically refused to share — put the on-device Apple model at roughly GPT-4o-mini parity on summarization and rewriting, and well behind Gemini 2.5 on reasoning, code, and multi-step agentic tasks. The on-device tier stays. The 3B "server" tier that Apple shipped in 2024 is being deprecated.

Why it matters

Apple's 2024 pitch was a specific bet: build vertically integrated AI the way it built vertically integrated chips. Own the model, own the silicon, own the privacy story, charge nothing extra. That bet has now partially failed in public. The privacy architecture won. The model didn't.

The interesting move is that Apple chose Gemini over OpenAI for the heavy tier. OpenAI is still wired in as the optional ChatGPT escalation path — the "do you want to send this to ChatGPT?" prompt from 2024 is unchanged. But the default cloud model, the one users will hit thousands of times a day without consent prompts, is Gemini. Google reportedly agreed to ship weights to Apple's PCC enclaves rather than serve via API — a concession OpenAI was unwilling to make at acceptable terms. That's the deal that closed.

Compare this to the Microsoft–OpenAI structure, where Microsoft pipes user data into Azure-hosted OpenAI endpoints. Apple's version is stricter: Google never sees the prompts, never sees the responses, can't log, can't train on the traffic. The weights are essentially licensed binaries running inside someone else's secure enclave. This is the first major frontier-model deal that treats the model as a sealed binary rather than an API. It's a structure that only Apple, with its hardware-rooted attestation stack, could credibly enforce.

Community reaction on HN split along predictable lines. The privacy-skeptical camp ('how do you actually verify Gemini-in-PCC behaves like Gemini-on-Google-servers?') has a real point — Apple's attestation proves the binary ran in a sealed enclave, but doesn't prove the binary is the same weights Google ships elsewhere. The pragmatist camp ('Apple finally stopped shipping mediocre 3B models pretending to be frontier') is louder and probably right. Several ex-Apple ML engineers in the thread confirmed what's been an open secret: the foundation models org has been bleeding senior talent to Anthropic and Google DeepMind for eighteen months. You can't ship a frontier model with the team Apple has left.

The second-order effect is what this does to the on-device AI narrative everyone else has been chasing. Qualcomm, Samsung, and the Windows Copilot+ PC ecosystem have spent two years arguing that on-device inference is the future and the cloud is a fallback. Apple just publicly disagreed. The on-device tier is for latency-sensitive narrow tasks; everything that requires actual capability goes to a cloud model running someone else's weights. That's a much more honest architecture, and it's going to force the rest of the industry to stop pretending a 7B on-device model is a credible Gemini competitor.

What this means for your stack

If you've been building against the Apple Foundation Models API that shipped in iOS 18, the contract doesn't change but the behavior does. Same Swift API surface, dramatically better quality on anything that routes to the cloud tier, slightly different latency profile (Gemini-in-PCC is reportedly 80–150ms slower than the old Apple server model on first token, but faster on long generations). Re-run your evals. Anything you tuned around the old model's weaknesses — particularly the verbose, hedge-heavy summarization style — is going to behave differently.

If you've been holding off on Apple Intelligence integration because the model wasn't good enough, the calculus changed today. You're now getting Gemini-class capability with Apple-class privacy guarantees, for free, behind a stable Swift API. That's a genuinely new offering. Notably, the new architecture also exposes a tool-use interface that maps to App Intents — meaning your existing App Intents become callable by the Gemini-backed assistant without you writing a single line of LLM-specific code. This is the integration story Apple has been promising since 2024 and finally has a model capable of delivering on.

For anyone running their own LLM infrastructure: Apple just validated the "sealed enclave running someone else's weights" pattern at planetary scale. Expect AWS Nitro Enclaves and GCP Confidential Computing to start shipping reference architectures for this within six months. The legal and commercial templates Apple and Google negotiated — weight licensing with no-log attestation — will get copied.

Looking ahead

The quiet question is what happens to Apple's foundation models team. The on-device tier still needs someone, and Apple isn't going to permanently outsource its AI roadmap to a competitor that also makes phones. The most likely read is that this is a two-to-three-year bridge while Apple either rebuilds the team or acquires its way back to the frontier — Perplexity, Mistral, and Anthropic have all been named in the rumor mill this year. Either way, the era of Apple pretending its in-house models compete at the frontier is over, and the architecture they shipped today is more honest, more capable, and more interesting than anything they showed in 2024.

Apple's new AI architecture runs on Gemini. The Foundation Models pitch is dead.

// tldr

// viewpoints

// deep dive

What happened

Why it matters

What this means for your stack

Looking ahead

// read from source

Apple reveals new AI architecture built around Google Gemini models

// community takes

Apple's new AI architecture runs on Gemini. The Foundation Models pitch is dead.

// tldr

// viewpoints

// deep dive

What happened

Why it matters

What this means for your stack

Looking ahead

// read from source

Apple reveals new AI architecture built around Google Gemini models

// community takes

// share this