Edge AI in Practice: Attempting to Run Hermes Agent on an Android Inference Server

1 / 2

Edge AI in Practice: Attempting to Run Hermes Agent on an Android Inference Server

DEV Community·Wilson Felipe·28 days ago

#d6FMq2jB

#edgecomputing #ai #programming #server #agent #phone

Reading 0:00

15s threshold

If NASA landed on the Moon using a computer with only 4KB of RAM, why can't I run a personal AI Agent directly on my phone? 🚀📱 Over the last few weeks, I decided to test the limits of Edge AI . The idea was to create a local inference server on my Samsung S20 FE, using it as the engine for autonomous agents like Hermes or OpenClaw. To achieve this, I forked the Google AI Edge Gallery , built an embedded Ktor server , and exposed the Gemma 4 model through an API 100% compatible with the OpenAI standard (/v1/chat/completions). And the best part? It worked! I even managed to get the model to execute native Function Calling directly from the phone to check the weather forecast. But it wasn't all smooth sailing at the bleeding edge of technology. I quickly hit a physical wall: context management (KV Cache) and RAM.…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Create free account Log in

Menu

Edge AI in Practice: Attempting to Run Hermes Agent on an Android Inference Server