The article introduces Google Gemma 2B, a lightweight 2B-parameter version of Gemma that can run on Android and iPhone without WiFi. Getting it running on phones is a significant update from MLC, aimed at improving how well mobile devices run local LLMs. The MLC community's quick adoption of Gemma 2B, and its success in running the model locally, shows the potential of lightweight LLMs in mobile computing. Running LLMs on the edge addresses important issues such as latency and privacy, marking a shift towards more efficient and secure model deployments.

Main Points

Gemma 2B running on Android and iPhone

Google Gemma 2B, a lightweight 2B-parameter version of Gemma, can now run on mobile devices, including Android and iPhone. The model generates 20 tokens/sec and does not need WiFi to operate, which shows how far lightweight, local LLMs for mobile platforms have come.
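
To put the 20 tokens/sec figure in perspective, here is a rough back-of-the-envelope sketch. It assumes the rate is a steady decode speed and ignores prompt processing and model load time, neither of which the article quantifies. The function name and the example reply lengths are illustrative only.

```python
def generation_seconds(num_tokens: int, tokens_per_sec: float = 20.0) -> float:
    """Estimate how long it takes to generate num_tokens at a steady decode rate.

    Assumption: a constant rate (20 tokens/sec by default, the figure quoted
    above); real devices vary with prompt length, thermals, and hardware.
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


if __name__ == "__main__":
    # Hypothetical reply lengths, chosen only for illustration.
    for n in (50, 200, 500):
        print(f"{n} tokens: about {generation_seconds(n):.1f} s")
```

Under these assumptions, a short answer of a few dozen tokens arrives in a few seconds, while a reply of several hundred tokens takes closer to half a minute.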

Gemma 2B local implementation

The MLC community got Gemma 2B running locally in under a day using the new MLC SLM compilation flow, demonstrating how quickly open-source LLM projects can adopt and integrate new models.

Benefits of running models on the edge

Running small models on the edge, combined with larger models in the cloud, addresses key issues such as latency, compute pressure, and privacy. These are the main benefits of hybrid cloud/local setups.
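
The hybrid idea above can be sketched as a small router that decides, per request, whether to use the on-device model or the cloud model. This is a minimal illustration, not the MLC or OctoAI API: `HybridRouter`, the `local` and `cloud` callables, and the `max_local_words` threshold are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function that maps a prompt to a completion.
Generate = Callable[[str], str]


@dataclass
class HybridRouter:
    local: Generate             # stand-in for an on-device model such as Gemma 2B
    cloud: Generate             # stand-in for a larger cloud-hosted model
    max_local_words: int = 64   # illustrative threshold, not from the article

    def route(self, prompt: str, private: bool = False, online: bool = True) -> str:
        """Return "local" or "cloud" for a given request."""
        # Privacy-sensitive or offline requests never leave the device.
        if private or not online:
            return "local"
        # Short prompts stay local to avoid a network round trip (latency).
        if len(prompt.split()) <= self.max_local_words:
            return "local"
        # Longer requests go to the larger cloud model.
        return "cloud"

    def generate(self, prompt: str, private: bool = False, online: bool = True) -> str:
        target = self.route(prompt, private, online)
        model = self.local if target == "local" else self.cloud
        return model(prompt)


if __name__ == "__main__":
    # Stub models so the sketch runs without any LLM runtime installed.
    router = HybridRouter(
        local=lambda p: f"[local] {p}",
        cloud=lambda p: f"[cloud] {p}",
    )
    print(router.generate("Summarize my note", private=True))
```

The routing rule here is deliberately simple. A real setup would likely weigh task difficulty, battery, and cost as well, but the same shape applies: keep private or latency-sensitive work on the device and send heavier work to the cloud.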

Insights

MLC-LLM is one of the core technologies underpinning OctoAI’s tech stack.

As OctoAI CEO Luis Ceze puts it: “Hardware portability for LLMs not only enables more efficient use of cloud resources, it also enables new use cases. Chain local Phi-2 or Gemma 2B with large LLMs running on OctoAI cloud and you have a magic model cocktail!”

URL

https://medium.com/octoml/look-ma-no-wifi-gemma-on-android-and-iphone-and-more-local-llm-updates-from-mlc-239f10adba3c?s=31