Published May 29, 2026, 10:31 AM EDT

Shekhar Vaidya is a veteran technology journalist and computer science engineer. He is the founder of TechLatest, where he has spent years providing technical analysis on hardware and Windows ecosystems. Now a Computing Writer at XDA, Shekhar leverages his deep background in NAS, storage solutions, and PC internals to help readers master their tech.

Once my daily gaming sessions dropped to just weekends, my RTX 4070 Ti spent most of its time idle. For months, I was paying $20 a month for Claude Pro while that perfectly capable card sat in my PC doing almost nothing. The funny part was that most of the AI prompts I sent from my phone had nothing to do with complex coding tasks or deep research. They were quick summaries, rough ideas, and random Q&A sessions throughout the day. I had tried local models before, at 7B, 12B, and even 24B parameters, and always went back to the cloud. Their outputs were weak, the context window was tiny, and inference was slow.…