Dev.to Machine Learning | 2h ago | Research & Papers | Products & Services

iPhone 17 Pro Runs 400B LLM: On-Device AI Changes Everything

A developer ran Flash-MoE, a 400-billion-parameter large language model, on an iPhone 17 Pro entirely on-device, with no internet access. The demo shows that a model far larger than a phone's memory can run locally, if slowly.

Why it matters

Running a model of this size on a phone points toward privacy-preserving, offline-capable AI applications that do not depend on the cloud.

Key Points

  • A 400B-parameter LLM ran on an iPhone 17 Pro with just 12GB of RAM
  • Because the model uses a Mixture of Experts architecture, only a small subset of its weights is needed per token, so weights can be streamed from storage to the GPU on demand
  • Techniques such as quantization and speculative decoding make it possible to run large models on limited hardware
  • The work builds on Apple's 2023 research on efficient LLM inference with limited memory
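
To put the memory gap in numbers, here is a rough back-of-the-envelope sketch. The parameter count and RAM size come from the article; the bit widths are illustrative assumptions, since the source does not say which quantization level was used, and the helper name `weight_footprint_gb` is made up for this example.

```python
def weight_footprint_gb(num_params, bits_per_weight):
    """Approximate storage for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 400e9   # total parameters, from the article
RAM_GB = 12          # iPhone 17 Pro RAM, from the article

# Bit widths below are illustrative assumptions, not taken from the source.
for bits in (16, 8, 4, 2):
    size = weight_footprint_gb(NUM_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {size:6.0f} GB, about {size / RAM_GB:.0f}x the phone's RAM")
```

At 4 bits per weight the estimate comes to 200 GB, which is consistent with the 200GB figure cited in the Details section. Even then the weights are many times larger than RAM, which is why they have to be streamed from storage rather than loaded up front.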

Details

The developer @anemll posted a demo of an iPhone 17 Pro running a 400-billion-parameter Mixture of Experts (MoE) language model entirely on-device, with no cloud or internet access. This is a notable engineering result, because the phone's 12GB of RAM is only a small fraction of the 200GB needed to hold the full model. The key techniques are:

  • Streaming model weights from the phone's storage directly to the GPU as they are needed
  • Using the MoE architecture so that only a small subset of the 400B parameters is activated for each token
  • Aggressive weight quantization to reduce the amount of data transferred
  • Speculative decoding, described as a way to pre-fetch the experts most likely to be needed

This builds on Apple's 2023 research paper 'LLM in a Flash', which proposed techniques for running large models with limited memory. On-device inference is very slow, at 0.6 tokens per second, but it offers privacy, offline access, and on-device personalization that cloud-based AI cannot provide.
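
The interplay between on-demand weight streaming and sparse expert activation can be illustrated with a toy sketch. This is not the demo's actual implementation: the expert count, top-k value, cache size, file layout, and every name below (`ExpertCache`, `route`, `moe_forward`) are hypothetical, and small files in a temporary directory stand in for the phone's storage. It only shows the general idea of loading an expert's weights when the router selects it and evicting the least recently used expert when memory is full.

```python
import math
import os
import random
import tempfile
from array import array
from collections import OrderedDict

NUM_EXPERTS = 16   # hypothetical; chosen only to keep the toy small
TOP_K = 2          # experts activated per token (assumption)
DIM = 8            # toy hidden size
CACHE_SLOTS = 4    # experts that fit in "RAM" at once (assumption)


def write_toy_experts(directory, seed=0):
    """Write one small weight file per expert to stand in for on-device storage."""
    rng = random.Random(seed)
    for e in range(NUM_EXPERTS):
        weights = array("f", (rng.uniform(-1, 1) for _ in range(DIM)))
        with open(os.path.join(directory, f"expert_{e}.bin"), "wb") as f:
            weights.tofile(f)


class ExpertCache:
    """Holds at most `slots` experts in memory, evicting the least recently used."""

    def __init__(self, directory, slots):
        self.directory = directory
        self.slots = slots
        self.resident = OrderedDict()
        self.loads = 0  # number of reads from storage

    def get(self, expert_id):
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)  # mark as most recently used
            return self.resident[expert_id]
        weights = array("f")
        path = os.path.join(self.directory, f"expert_{expert_id}.bin")
        with open(path, "rb") as f:
            weights.fromfile(f, DIM)
        self.loads += 1
        self.resident[expert_id] = weights
        if len(self.resident) > self.slots:
            self.resident.popitem(last=False)  # drop the least recently used expert
        return weights


def route(router_scores, k=TOP_K):
    """Return the indices of the k highest-scoring experts."""
    order = sorted(range(len(router_scores)), key=lambda i: router_scores[i], reverse=True)
    return order[:k]


def moe_forward(token, router_scores, cache):
    """Softmax-weighted sum of the selected experts' toy dot-product outputs."""
    chosen = route(router_scores)
    gates = [math.exp(router_scores[i]) for i in chosen]
    total = sum(gates)
    out = 0.0
    for gate, i in zip(gates, chosen):
        weights = cache.get(i)  # touches storage only on a cache miss
        out += (gate / total) * sum(w * x for w, x in zip(weights, token))
    return out, chosen


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_toy_experts(d)
        cache = ExpertCache(d, CACHE_SLOTS)
        rng = random.Random(1)
        for _ in range(20):
            token = [rng.uniform(-1, 1) for _ in range(DIM)]
            scores = [rng.random() for _ in range(NUM_EXPERTS)]
            moe_forward(token, scores, cache)
        print(f"storage reads: {cache.loads}, resident experts: {len(cache.resident)}")
```

Only `TOP_K` experts are touched per token, so the working set stays small even though all experts exist on disk. Each cache miss costs a storage read, which is one plausible reason throughput in a setup like this would be limited by storage bandwidth.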
