TL;DR
Apple has unveiled its third-generation Apple Foundation Models (AFM) at WWDC26, introducing significant architectural upgrades that improve reasoning, efficiency, and on-device performance. These models are now integrated across iOS, macOS, and visionOS, marking a critical shift in how Apple handles AI inference — moving from cloud-dependent processing to largely local computation.
What Happened
During the WWDC26 keynote on Friday, June 12, 2026, Apple officially announced its third generation of Apple Foundation Models (AFM), delivering a major leap in AI performance and on-device intelligence. The new models — named AFM-3, AFM-3-Pro, and AFM-3-Mini — are now embedded across the entire Apple ecosystem, powering everything from Siri and Photos to real-time translation and augmented reality interactions on the Apple Vision Pro.
Key Facts
- Apple unveiled three new model variants: AFM-3 (cloud-optimized for complex reasoning), AFM-3-Pro (on-device for high-end Macs and iPads), and AFM-3-Mini (for iPhones, Apple Watch, and AirPods).
- The flagship AFM-3 model achieves a 28% improvement in reasoning benchmarks over the second-generation AFM-2, according to Apple’s internal testing.
- AFM-3-Mini runs entirely on-device with a memory footprint under 500 MB, enabling real-time AI features on devices with as little as 4 GB of RAM.
- All three models use a Mixture-of-Experts (MoE) architecture, the first time Apple has extended MoE across its entire foundation-model lineup, including the on-device variants; the design activates only the sub-networks relevant to each token.
- The models were trained on Apple's private cloud cluster using 2,048 Apple Silicon M5 Ultra nodes, delivering an aggregate 3.2 exaflops of training throughput.
- Apple claims AFM-3-Pro can generate a 1,000-word summary from a 50-page PDF in 1.2 seconds on an M5 Max MacBook Pro — a 40% speed improvement over the M4 generation.
- The new models support multimodal input — text, images, audio, and video — with a unified tokenizer that processes all modalities through a single transformer backbone.
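The cluster figures in the list above can be sanity-checked with simple division. This sketch assumes the 3.2-exaflop figure is aggregate throughput (floating-point operations per second) summed across all 2,048 nodes; the function name is hypothetical.

```python
# Back-of-envelope check of the reported training-cluster figures.
# Assumption: 3.2 exaflops is aggregate throughput (FLOP/s) across the whole cluster.
CLUSTER_FLOPS = 3.2e18  # 3.2 exaflops, as reported
NODES = 2_048           # M5 Ultra nodes, as reported


def per_node_petaflops(cluster_flops: float, nodes: int) -> float:
    """Return average throughput per node in petaflops (1 PF = 1e15 FLOP/s)."""
    return cluster_flops / nodes / 1e15


print(f"{per_node_petaflops(CLUSTER_FLOPS, NODES):.4f} PF per node")
```

Under that assumption, each M5 Ultra node would contribute roughly 1.56 petaflops on average.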
Breaking It Down
The most striking architectural change in Apple's third-generation AFM is the adoption of a Mixture-of-Experts (MoE) design. Unlike dense models, which activate all of their parameters for every input, MoE models use a learned router to send each token to a small subset of specialized "expert" sub-networks. For AFM-3, Apple uses 32 experts, with only 4 activated for each token. According to Apple, this lets the model carry 180 billion total parameters while keeping inference costs comparable to those of a 12-billion-parameter dense model. The implication is significant: because all three variants share this sparse design, Apple can deliver near-frontier-level reasoning on devices that lack a data center's power budget.
AFM-3-Mini activates only 1.8 billion parameters per token, yet achieves performance within 3% of GPT-4o on standard language-understanding benchmarks, according to Apple's internal evaluations.
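Top-k routing, the general technique behind the "32 experts, 4 active" description, can be illustrated in a few lines. This is a minimal sketch of generic MoE gating, not Apple's implementation: the router logits, toy experts, and the names `route` and `moe_forward` are hypothetical.

```python
import math
import random

NUM_EXPERTS = 32  # total experts, per the article's description of AFM-3
TOP_K = 4         # experts activated for each token


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts for one token and renormalize their gates."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, router_logits, experts, k=TOP_K):
    """Combine outputs of only the selected experts, weighted by their gate values."""
    out = [0.0] * len(x)
    for idx, w in route(router_logits, k):
        y = experts[idx](x)  # only k experts run for this token; the rest stay idle
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Hypothetical toy experts: expert i simply scales its input by (i + 1).
experts = [lambda v, s=i + 1: [s * vi for vi in v] for i in range(NUM_EXPERTS)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores
print(route(logits))
print(moe_forward([1.0, 2.0], logits, experts))
```

Because only `TOP_K` of the `NUM_EXPERTS` expert functions are called per token, compute per token scales with the active experts rather than the total parameter count, which is the property the paragraph describes.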
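The sub-500 MB memory claim can be related to the 1.8-billion active-parameter figure with a back-of-envelope calculation. This sketch assumes weight storage dominates memory use and counts only the active parameters, which makes it a lower bound, since an MoE model's inactive experts also have to be stored or loaded from somewhere. The helper name and the bit widths tried are illustrative, not Apple's disclosed configuration.

```python
def weight_memory_mb(num_params: float, bits_per_weight: float) -> float:
    """Memory needed to store the weights alone, in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6


ACTIVE_PARAMS = 1.8e9  # AFM-3-Mini's reported active parameters per token

# Assumed bit widths, from half precision down to aggressive 2-bit quantization.
for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit weights: {weight_memory_mb(ACTIVE_PARAMS, bits):,.0f} MB")
```

Under these assumptions, only weights compressed to around 2 bits each bring even the active parameter set under 500 MB.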
This efficiency gain is critical to Apple's strategy. By moving most AI inference on-device, Apple reduces its reliance on cloud servers, cuts latency, and, most importantly, strengthens its privacy narrative: every query processed locally stays on the device, encrypted with the user's Secure Enclave key. AFM-3-Pro, designed for Macs and iPads with M5-class chips, uses Apple's Neural Engine to run the full MoE model at 120 tokens per second, faster than most cloud-based competitors for similar-sized outputs.
The unified tokenizer is another underappreciated innovation. Previous Apple models used separate tokenizers for text, images, and audio, which forced the model to learn cross-modal alignment in a later, separate training stage. The new AFM-3 family uses a single 256,000-token vocabulary that jointly encodes all modalities. This means a photo and a voice memo can be processed simultaneously in the same attention context, enabling requests like "Describe this image in the tone of this audio clip" without any bridging logic. Apple engineers told WWDC attendees that this unified approach reduced multimodal error rates by 19% in internal testing.
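One common way to build a single vocabulary for several modalities is to reserve a distinct range of token IDs for each one, so that text, image, and audio tokens can sit side by side in one sequence. Apple has not described how AFM-3 partitions its 256,000 IDs; the range sizes, the `Segment` type, and the helper names below are hypothetical, and the per-modality "local codes" stand in for whatever discrete encoder each modality uses.

```python
from dataclasses import dataclass

VOCAB_SIZE = 256_000  # size of the shared vocabulary reported for AFM-3

# Hypothetical split of the shared ID space into contiguous per-modality ranges.
RANGES = {
    "text": (0, 192_000),
    "image": (192_000, 224_000),
    "audio": (224_000, 240_000),
    "video": (240_000, 256_000),
}


@dataclass
class Segment:
    modality: str
    local_codes: list[int]  # modality-specific discrete codes (e.g. from a codebook)


def to_shared_ids(seg: Segment) -> list[int]:
    """Offset a segment's local codes into its modality's slice of the shared vocabulary."""
    start, end = RANGES[seg.modality]
    ids = []
    for code in seg.local_codes:
        if not 0 <= code < end - start:
            raise ValueError(f"{seg.modality} code {code} is out of range")
        ids.append(start + code)
    return ids


def build_sequence(segments: list[Segment]) -> list[int]:
    """Concatenate all segments into one ID sequence for a single attention context."""
    seq = []
    for seg in segments:
        seq.extend(to_shared_ids(seg))
    return seq


prompt = Segment("text", [101, 2023])
photo = Segment("image", [5, 17, 42])
memo = Segment("audio", [3, 3, 9])
seq = build_sequence([prompt, photo, memo])
print(seq)
```

Because every ID lands in one shared space, a single embedding table and transformer can attend across the prompt, the photo, and the voice memo at once, which is the property the paragraph attributes to the unified tokenizer.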
What Comes Next
Apple's third-generation AFM models are available to developers today through the Xcode 26 beta, but their full ecosystem rollout will be staggered. AFM-3-Mini will ship with iOS 26 and watchOS 26 this fall, while AFM-3-Pro will debut on new M5 Macs and iPads in October. The cloud-based AFM-3 will power Siri and Apple Intelligence features on older devices starting with the iOS 26.1 update in November.
- September 2026 — iOS 26 public release with AFM-3-Mini; expect real-time on-device translation, enhanced photo search, and proactive Siri suggestions that run entirely offline.
- October 2026 — M5 Max MacBook Pro and iPad Pro ship with AFM-3-Pro; developers gain access to local model APIs for custom app integrations.
- November 2026 — iOS 26.1 enables AFM-3 cloud inference for iPhone 15 and older devices; Apple will also release a privacy audit report detailing data handling for cloud queries.
- Early 2027 — Apple is expected to open-source the AFM-3-Mini model weights for research use, following pressure from the academic AI community.
The Bigger Picture
Apple’s third-generation AFM models represent a decisive shift in two major technology trends. First, on-device AI is no longer a compromise — it is now a competitive advantage. While Google and Microsoft push larger cloud models requiring always-on connectivity, Apple is betting that most user queries can be handled locally with sufficient accuracy. This aligns with Apple’s broader privacy-first positioning and could force rivals to rethink their cloud-heavy architectures.
Second, the Mixture-of-Experts approach signals a broader industry move away from monolithic dense models. Meta's Llama 4 and Google's Gemini 2.5 have also adopted MoE variants, but Apple's implementation is built specifically to run efficiently on consumer-grade silicon. If AFM-3-Mini delivers on its claimed benchmarks, it could bring advanced AI capabilities to devices as small as the Apple Watch, a form factor that has traditionally been excluded from AI upgrades due to power constraints.
Key Takeaways
- [Architectural Leap]: Apple’s third-generation AFM models adopt a Mixture-of-Experts design with 32 experts, drastically reducing inference cost while maintaining high accuracy.
- [On-Device Dominance]: AFM-3-Mini runs entirely on-device in under 500 MB of memory, enabling real-time AI on iPhone, Apple Watch, and AirPods without cloud dependency.
- [Unified Multimodal]: A single tokenizer with a 256,000-token vocabulary processes text, images, audio, and video together, cutting multimodal error rates by 19% in Apple's internal testing.
- [Staggered Rollout]: AFM-3-Mini ships with iOS 26 in September 2026; AFM-3-Pro arrives on M5 Macs in October; cloud-based AFM-3 follows in November.

