Google says LiteRT is almost there for any-device, edge agentic AI – and beats the pants off Llama

With increasing NPU support, Google argues it has the platform that makes it trivial to roll your own agents for Android, iOS and even Raspberry Pi, via what used to be TensorFlow Lite.

By Phillip de Wet

Updated May 12, 2026 2:49 PM / Published May 12, 2026 10:00 AM / 3 min read

Photo by Rayson Tan on Unsplash

Google says its framework for local AI inference, LiteRT, is getting closer to being able to run agentic workflows on edge devices cheaply, using the most popular open-source models.

Google rebranded TensorFlow Lite, its runtime for local AI, as LiteRT in 2024 because, it said at the time, the runtime was evolving beyond TensorFlow models.

With LiteRT, Google said it was not building a platform for one AI model, but creating a way for users to deploy the model of their choice on the edge device of their choice simply, without the friction common in cross-platform workflows.

Fast forward and – albeit with some bits still in preview – the Alphabet company says it is just about there, complete with agentic capabilities and even some NPU acceleration.

Oh, and it beats the pants off Meta's Llama.

Phillip de Wet started in journalism as an honest B2B IT-sector reporter before straying into business, politics, and international affairs. He is currently in recovery, studying AI while writing about everything from hardware to infosec policy.
