The era of on-device Generative AI has arrived, and for Android developers it brings a paradigm shift as significant as the move from imperative UI to Jetpack Compose. When Google announced Gemini Nano and the AICore system service, the promise was clear: powerful, private, low-latency AI running directly on the silicon in our pockets. However, once developers move beyond simple "Hello World" prompts, they run into a formidable technical wall: the architecture of memory. If you have ever wondered why your on-device model suddenly "forgets" the beginning of a conversation, or why a long prompt makes your app lag or crash with an OutOfMemoryError, you are dealing with the twin challenges of Context Windows and Stateless Inference. Understanding these concepts isn't just academic; it is the difference between a glitchy prototype and a production-ready AI application.…
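To make the "forgetting" concrete: if inference is stateless, any memory the model seems to have is history that the app re-sends with every prompt, and the context window caps how much of that history fits. The Kotlin sketch below illustrates this under explicit assumptions. It does not use the AICore or Gemini Nano API. The `Turn` and `ConversationBuffer` types are hypothetical, and the four-characters-per-token estimate is a rough heuristic, not the model's real tokenizer.

```kotlin
// Hypothetical sketch: client-side conversation memory for a stateless model.
// ASSUMPTION: ~4 characters per token. This is a rough heuristic, not Gemini Nano's tokenizer.

data class Turn(val role: String, val text: String)

class ConversationBuffer(private val maxTokens: Int) {
    private val turns = ArrayDeque<Turn>()

    // Hypothetical token estimator: ceil(length / 4), minimum 1.
    private fun estimateTokens(text: String): Int = maxOf(1, (text.length + 3) / 4)

    fun totalTokens(): Int = turns.sumOf { estimateTokens(it.text) }

    fun size(): Int = turns.size

    fun add(turn: Turn) {
        turns.addLast(turn)
        // Evict the oldest turns until the history fits the budget.
        // A single turn that exceeds the budget on its own is kept;
        // the caller would need to truncate it.
        while (turns.size > 1 && totalTokens() > maxTokens) {
            turns.removeFirst()
        }
    }

    // Inference is stateless, so every request must carry the history it needs.
    fun buildPrompt(): String = buildString {
        for (t in turns) append(t.role).append(": ").append(t.text).append('\n')
        append("model:")
    }
}
```

Evicting the oldest turns first is exactly the behavior that makes the model appear to forget how a conversation began. The turns are still in your app, but they are no longer part of the prompt.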