Latest writing
Engineering deep-dives, design essays, and product devlogs on building a native AI app for iOS and Android.

What is a native AI app, and why does it matter in 2026?
A native AI app runs intelligence directly on the phone — fast, private, and offline-capable. Here's the definition, the stack, and why it's eating cloud-only chatbots for breakfast.

Shipping an on-device LLM without melting the phone
What I learned moving inference from a cloud endpoint to a 3B-parameter LLM running on the Apple Neural Engine — latency, thermals, memory, and the tricks that actually worked.

Building an AI app on Android with TFLite and MediaPipe in 2026
A practical walkthrough of the Android on-device AI stack: Gemma, TFLite, MediaPipe LLM Inference, and how to make it feel as fast as the iOS version.

Why Swift is quietly becoming a great AI runtime
Strict concurrency, value semantics, and Metal Performance Shaders make Swift surprisingly pleasant for ML glue code on iOS.

Designing trust into an AI app, one micro-interaction at a time
Hallucinations are a UX problem before they're a model problem. Three patterns I'm using to design trust into a native AI app.

What 'private AI' actually means when the model is on the phone
Privacy isn't a checkbox in your settings screen. Here's how on-device inference changes the threat model — and the four places it can still leak.

On-device RAG: giving a small model a long memory
How to build a retrieval layer that runs entirely on the phone, so a 3B-parameter native AI app can feel like it actually knows you.

Neural Engine vs. GPU vs. CPU: where should mobile inference actually run?
A field guide to picking the right accelerator on iOS and Android — with the gotchas nobody warns you about until your battery graph falls off a cliff.
