Apple adds Nvidia confidential compute to Google Gemini Siri

AIMay 28, 2026 at 05:00 PM

Read full story

Source: 9to5Mac

TLDR: CAMBRIDGE, Mass.—Apple plans to distill Google Gemini into smaller models for on device runs, but still routes some Siri queries to Google Cloud using Nvidia confidential compute. That mix lets Apple scale beyond its Private Cloud Compute limits while pushing for cloud privacy protections.

Key Takeaways:

Apple is leaning on on device AI while preparing WWDC reveals for iOS 27 and a revamped Siri.
Apple uses Gemini distillation for local models, then runs some queries in Google Cloud on licensed Gemini.
Nvidia confidential compute encrypts data and models in cloud processing, slowing queries slightly but strengthening privacy claims.

Apple is doing the classic AI magic trick: whispering promises about privacy while quietly upgrading its compute stack behind the scenes.

No comments yet. Be the first to share your thoughts!

Apple is doing the classic AI magic trick: whispering promises about privacy while quietly upgrading its compute stack behind the scenes.

Q&A

If Apple cannot fit full Gemini on its servers, what happens when user demand spikes during major releases?

Apple will likely lean more on Google Cloud capacity, turning Private Cloud Compute into a partial fallback rather than the default engine.

How does distillation change the quality tradeoff between local Siri answers and cloud accuracy?

Smaller distilled models can run faster and safer on device, but may struggle with edge cases that full Gemini handles better.

Why would Nvidia confidential compute matter more for Apple than for competitors selling pure cloud AI?

Apple has made privacy a product promise, so encryption during processing helps it justify outsourcing without sounding like it compromised.

Could Apple’s interest in smaller model startups reshape the market for on device AI tooling?

Yes. If buyers like Apple reward compression and efficiency, more startups will target distillation pipelines and local inference hardware aware software.

What might Apple change next if WWDC users complain about Siri latency on cloud backed requests?

Apple could adjust confidential compute usage, improve caching, or route only specific intents to the cloud while keeping routine tasks local.

Apple adds Nvidia confidential compute to Google Gemini Siri

Key Takeaways:

Q&A

If Apple cannot fit full Gemini on its servers, what happens when user demand spikes during major releases?

How does distillation change the quality tradeoff between local Siri answers and cloud accuracy?

Why would Nvidia confidential compute matter more for Apple than for competitors selling pure cloud AI?

Could Apple’s interest in smaller model startups reshape the market for on device AI tooling?

What might Apple change next if WWDC users complain about Siri latency on cloud backed requests?

Top in AI

DataHub targets SQL query log context to stop hallucinated joins

Has the hunt for AI compute uncovered the next Cerebras?

Databricks warns enterprise AI deals hinge on operational trust

RSI chases AGI style acceleration, but definitions blur

Apple readies Siri for ChatGPT pressure on iPhone