TLDR: CAMBRIDGE, Mass. - Apple plans to distill Google's Gemini into smaller models that run on device, but it will still route some Siri queries to Google Cloud, where they are protected by Nvidia confidential compute. That mix lets Apple scale beyond the capacity limits of its Private Cloud Compute while still pushing for privacy protections in the cloud.
Key Takeaways:
- Apple is leaning on on-device AI as it prepares to unveil iOS 27 and a revamped Siri at WWDC.
- Apple distills Gemini into smaller local models, then runs some queries on licensed Gemini models in Google Cloud.
- Nvidia confidential compute encrypts data and models during cloud processing, which slows queries slightly but strengthens Apple's privacy claims.
Apple is doing the classic AI magic trick: whispering promises about privacy while quietly upgrading its compute stack behind the scenes.
Q&A
If Apple cannot fit the full Gemini model on its own servers, what happens when user demand spikes during major releases?
Apple will likely lean more on Google Cloud capacity, turning Private Cloud Compute into a partial fallback rather than the default engine.
How does distillation change the quality tradeoff between local Siri answers and cloud accuracy?
Smaller distilled models can run faster on device and keep data local, but they may struggle with edge cases that the full Gemini model handles better.
Why would Nvidia confidential compute matter more for Apple than for competitors selling pure cloud AI?
Apple has made privacy a product promise, so encryption during processing helps it justify outsourcing without appearing to compromise on that promise.
Could Apple's interest in startups building smaller models reshape the market for on-device AI tooling?
Yes. If buyers like Apple reward compression and efficiency, more startups will likely target distillation pipelines and hardware-aware software for local inference.
What might Apple change next if users complain after WWDC about Siri latency on cloud-backed requests?
Apple could adjust its use of confidential compute, improve caching, or route only specific intents to the cloud while keeping routine tasks on device.