TL;DR
Apple’s new Gemini-powered Siri will rely on Nvidia H100 GPUs for server-side inference, according to a new report from The Information. This marks Apple’s first major publicly reported AI partnership with Nvidia and signals a shift away from its near-exclusive reliance on Google’s TPUs for AI workloads, a move that could reshape the AI chip market.
What Happened
A detailed report from The Information, published Thursday, June 4, 2026, reveals that Apple is deploying thousands of Nvidia H100 GPUs to run cloud-based inference for its upcoming Gemini-enhanced Siri. Citing multiple people familiar with the project, the report says Apple has already ordered additional Nvidia B200 Blackwell GPUs, expected to arrive later this year. The move is a sharp departure from Apple’s long-standing preference for Google’s Tensor Processing Units (TPUs) for AI workloads.
Key Facts
- Apple is using Nvidia H100 GPUs — not Google TPUs — to run inference for the Gemini-powered Siri upgrade expected in iOS 20, slated for release in September 2026.
- The deployment involves thousands of H100 chips across multiple data centers, with Apple already committing to Nvidia’s upcoming B200 Blackwell GPUs for future capacity expansion.
- Apple’s Gemini-powered Siri will process complex queries server-side via the Gemini 2.5 Pro model, while simpler requests will be handled on-device using Apple’s own smaller language models.
- The report states Apple is paying Nvidia a per-GPU-hour rate that is “significantly higher” than what it pays Google for TPU access, reflecting the scarcity of H100 supply.
- Apple’s AI infrastructure now includes both Google TPUs (for training) and Nvidia GPUs (for inference), marking a dual-supplier strategy after years of near-exclusive reliance on Google’s chips.
- The Gemini-powered Siri will be able to perform multi-step reasoning tasks — such as booking a flight and adding it to a calendar — that the current Siri cannot handle.
- Apple’s total AI compute spend is projected to exceed $5 billion in 2026, up from an estimated $2.5 billion in 2025, according to supply chain estimates cited in the report.
Breaking It Down
The decision to use Nvidia GPUs for inference, rather than Google TPUs, is arguably the most consequential technical choice Apple has made in its AI strategy. For years, Apple’s AI infrastructure was built almost entirely around Google’s custom chips, with Apple purchasing TPU capacity to train models such as the on-device transformer behind iOS 18’s Writing Tools. The shift to Nvidia for inference suggests that Apple’s internal evaluations found Nvidia’s CUDA ecosystem and TensorRT-LLM optimization tools delivered better latency and throughput for production workloads than Google’s equivalent offerings.
Apple’s per-GPU-hour cost for Nvidia H100s is “significantly higher” than its TPU cost — yet Apple is still expanding that relationship, signaling that raw price is not the deciding factor.
This premium pricing underscores a critical market reality: Nvidia’s GPUs remain the reference platform for AI inference despite competition from AMD’s MI300X, Intel’s Gaudi 3, and Google’s TPU v5p. Apple’s willingness to pay a premium, and to pre-order next-generation B200 Blackwell GPUs, indicates that performance and ecosystem maturity outweigh cost considerations for a company that treats user experience as its primary competitive moat.
The dual-supplier strategy also has strategic implications. By keeping Google TPUs for training and Nvidia GPUs for inference, Apple avoids putting all its AI compute eggs in one basket. This hedge matters because Apple and Google remain competitors in search, advertising, and mobile platforms, even as they partner on Gemini integration. The report notes that Apple’s training workloads still run on TPUs, likely because Google’s custom-designed interconnects scale better for the massive training jobs behind Apple’s own foundation models.
What Comes Next
- September 2026 — iOS 20 launch: The Gemini-powered Siri will debut alongside iPhone 18, with the first wave of multi-step reasoning capabilities. Early adopters will test the system’s latency and accuracy under real-world server loads.
- Late 2026 — B200 Blackwell deployment: Apple’s pre-ordered Nvidia B200 GPUs are expected to begin arriving in Q4 2026, potentially doubling inference capacity per rack compared to H100s. That added capacity could enable more complex Gemini-powered features such as real-time document analysis and image generation.
- Q1 2027 — Apple’s own AI chip roadmap: The report hints that Apple is accelerating development of its own server-grade AI chip, codenamed “Baltra,” which could begin replacing Nvidia and Google hardware by 2028. An internal Apple presentation cited in the report sets a target of 10x power efficiency over Nvidia’s H100.
- Mid-2027 — EU regulatory review: The European Commission is expected to scrutinize the Apple-Google Gemini partnership under the Digital Markets Act, potentially requiring Apple to offer competing AI assistants from other providers — a development that could reshape the revenue model for this integration.
The Bigger Picture
This story sits at the intersection of three major trends reshaping technology. First, the AI infrastructure arms race: Apple’s projected $5 billion compute spend in 2026 makes it a major AI hardware buyer, although that figure remains well below the AI infrastructure budgets of Microsoft, Google, and Amazon. The decision to pay a premium for Nvidia GPUs, despite having access to Google TPUs, validates Nvidia’s dominant market position and suggests that inference workloads, not just training, will drive the next wave of GPU demand.
Second, the Apple-Google AI partnership represents an unusually close collaboration between two fierce competitors. Apple benefits from Google’s Gemini 2.5 Pro model, arguably the most capable publicly available model, while Google gains access to over 2 billion active Apple devices, making its AI technology a daily utility for a massive user base. The regulatory risks, however, are substantial: if EU regulators force Apple to offer alternative AI assistants, the exclusive Gemini integration could become a liability.
Third, the shift toward on-device plus cloud AI is becoming the dominant architecture for consumer AI. Apple’s approach, which uses small on-device models for simple queries and cloud models for complex reasoning, mirrors strategies from Samsung (Galaxy AI), Google (Gemini Nano on Pixel phones), and Microsoft (Copilot+ PCs). The key differentiator will likely be latency: if Apple can make cloud inference feel as fast as on-device processing, it will have solved the fundamental user-experience challenge of cloud-based AI.
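To make the on-device/cloud split concrete, here is a minimal Python sketch of a query router. It assumes a deliberately simple heuristic: requests that are long, or that appear to contain more than one step (detected by counting "and"/"then" separators), go to the cloud model, and everything else stays on-device. The names `route`, `estimate_steps`, `RouterConfig`, and the numeric cutoffs are hypothetical. Neither the report nor Apple describes how Siri actually decides where a request runs, and a production system would more plausibly use a trained classifier.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    ON_DEVICE = "on_device"  # small local language model
    CLOUD = "cloud"          # large server-side model (e.g. a Gemini-class model)


# Hypothetical heuristic: conjunctions such as "and" / "then" often separate
# steps in a request ("book a flight and add it to my calendar").
# "and then" is listed first so it counts as one separator, not two.
_STEP_SEPARATOR = re.compile(r"\b(?:and then|then|and)\b")


@dataclass(frozen=True)
class RouterConfig:
    max_on_device_words: int = 12  # assumed cutoff, not an Apple figure
    max_on_device_steps: int = 1   # assumed cutoff, not an Apple figure


def estimate_steps(query: str) -> int:
    """Return 1 plus the number of step separators found in the query."""
    return 1 + len(_STEP_SEPARATOR.findall(query.lower()))


def route(query: str, config: RouterConfig = RouterConfig()) -> Route:
    """Keep short single-step queries on-device; send everything else to the cloud."""
    too_long = len(query.split()) > config.max_on_device_words
    multi_step = estimate_steps(query) > config.max_on_device_steps
    return Route.CLOUD if (too_long or multi_step) else Route.ON_DEVICE


if __name__ == "__main__":
    for q in (
        "Set a timer for ten minutes",
        "Book a flight to Paris and add it to my calendar",
    ):
        print(f"{q!r} -> {route(q).value}")
```

Running the file routes the timer request on-device and the flight-plus-calendar request to the cloud. The design point that ties back to latency: a cheap local check like this costs almost nothing next to a network round trip, so only the requests that genuinely need the larger model pay the cloud latency.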
Key Takeaways
- [Nvidia’s Inference Dominance]: Apple’s choice of Nvidia H100 GPUs — despite higher costs — confirms that Nvidia’s hardware and software ecosystem remains the preferred platform for production AI inference, not just training.
- [Apple’s Dual-Supplier Strategy]: By using Google TPUs for training and Nvidia GPUs for inference, Apple avoids vendor lock-in while optimizing each workload for its best-suited hardware.
- [$5 Billion AI Spend]: Apple’s projected 2026 AI compute budget of $5 billion underscores the enormous capital requirements for competing in the AI assistant market, with inference costs likely to exceed training costs over time.
- [Regulatory Risk Ahead]: The exclusive Gemini integration faces potential EU regulatory action under the Digital Markets Act, which could force Apple to open Siri to competing AI models, a development that would fundamentally alter the partnership’s value.


