
The article introduces Google Gemma 2B, a lightweight 2B-parameter version of Gemma that can run on Android and iPhone without WiFi. Support for it is a significant update from MLC, aimed at improving how well mobile devices run local LLMs. The MLC community's quick adoption of Gemma 2B, and their success in running it locally, shows the potential of lightweight LLMs in mobile computing. Running LLMs on the edge addresses important issues like latency and privacy, marking a shift towards more efficient and secure model deployments.
Main Points
Gemma 2B running on Android and iPhone
Google Gemma 2B, a lightweight 2B-parameter version of Gemma, can now run on mobile devices, including Android and iPhone. The model generates 20 tokens/sec and does not require WiFi to operate. This reflects the progress of lightweight, local LLMs suited to mobile platforms.
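To put the 20 tokens/sec figure in perspective, the arithmetic below estimates how long a reply of a given length would take. It is a minimal sketch: the function name and the 200-token reply length are illustrative assumptions, and only the 20 tokens/sec rate comes from the article.

```python
def generation_seconds(num_tokens: int, tokens_per_sec: float = 20.0) -> float:
    """Estimate decode time for a reply, assuming a steady token rate."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


# A hypothetical 200-token reply at the reported 20 tokens/sec.
reply_seconds = generation_seconds(200)
```

Under these assumptions, a 200-token reply takes about 10 seconds. Actual speed will vary by device and prompt length.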
Gemma 2B local implementation
The MLC community successfully got Gemma 2B running locally in under a day using the new MLC SLM compilation flow, demonstrating the rapid adoption and integration capabilities of open source LLM projects.
Benefits of running models on the edge
Running small models on the edge, combined with larger models in the cloud, addresses key issues such as latency, compute pressure, and privacy. These are the main benefits of hybrid cloud/local setups.
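One way to picture a hybrid cloud/local setup is a small router that decides, per prompt, whether to use the on-device model or the cloud model. The sketch below is illustrative only: the routing rules, the word-count threshold, the privacy markers, and the stub backends are all assumptions, and none of it uses the MLC or OctoAI APIs.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Route:
    """A named text-generation backend (hypothetical interface)."""
    name: str
    generate: Callable[[str], str]


def make_router(
    local: Route,
    cloud: Route,
    max_local_words: int = 64,  # assumed threshold, not from the article
    private_markers: Tuple[str, ...] = ("password", "ssn"),  # assumed markers
) -> Callable[..., Tuple[str, str]]:
    """Build a function that picks a backend for each prompt."""

    def route(prompt: str, online: bool = True) -> Tuple[str, str]:
        lowered = prompt.lower()
        is_private = any(marker in lowered for marker in private_markers)
        is_short = len(prompt.split()) <= max_local_words
        # Stay on-device for private prompts, when offline, or for short
        # prompts where a network round trip would add latency.
        if is_private or not online or is_short:
            chosen = local
        else:
            chosen = cloud
        return chosen.name, chosen.generate(prompt)

    return route


# Stub backends standing in for an on-device model and a cloud model.
local_backend = Route("local", lambda p: f"[local reply to {len(p.split())} words]")
cloud_backend = Route("cloud", lambda p: f"[cloud reply to {len(p.split())} words]")

route = make_router(local_backend, cloud_backend)
```

With these stubs, a short question stays on the device, a long prompt goes to the cloud when a connection is available, and anything containing a privacy marker stays local regardless of length. A real deployment would replace the keyword check with a proper policy.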
Insights
MLC-LLM is one of the core technologies underpinning OctoAI’s tech stack.
As CEO Luis Ceze puts it: “Hardware portability for LLMs not only enables more efficient use of cloud resources, it also enables new use cases. Chain local Phi-2 or Gemma 2B with large LLMs running on OctoAI cloud and you have a magic model cocktail!”
Links
- MLC community
- open source project
- Junru Shao
- OctoAI
- MLC chat beta for iPhone
- our repo