Offline, free, lightweight mobile LLM. Is it actually real?

I’m genuinely curious. Has anyone shipped an offline, free, lightweight mobile LLM, especially for a speech-based app?

I’ve tried building an on-device AI assistant, and the reality is messy:

  • Models are still huge
  • Mobile tooling is painful (on Android: JNI bindings plus bundling model files as assets)
  • Latency and memory constraints are real
  • “Lightweight” feels like a myth unless you compromise hard
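
To put a rough number on "huge", here's a back-of-envelope sketch. It only counts the weights (parameters × bits per weight), so it ignores the KV cache, activations, and runtime overhead, and real quantized files usually come out a bit larger. The function name and the parameter counts are just illustrative, not any specific model.

```python
def weight_footprint_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Assumption: every weight is stored at the same bit width, with no
    per-block scales or metadata (real quantized formats add some overhead).
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Illustrative sizes only: 1B, 3B, and 7B parameters at 16/8/4 bits.
    for params in (1, 3, 7):
        for bits in (16, 8, 4):
            size = weight_footprint_gib(params, bits)
            print(f"{params}B params @ {bits}-bit: ~{size:.2f} GiB")
```

Even at 4 bits per weight, a 7B-parameter model needs over 3 GiB for weights alone, which is why "lightweight" on a phone tends to mean a much smaller model, aggressive quantization, or both.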

So I’m asking the community:

Is there a truly usable LLM for mobile right now that runs fully offline and costs nothing?

If yes, what did you use and how did you ship it?

If no, what’s the closest thing you’ve tried?