This is a submission for the Gemma 4 Challenge: Write About Gemma 4

When choosing Gemma 4 models for your local system, the amount of dedicated video RAM on your GPU might not be the only factor. Adjusting GPU offloading, and in particular reducing it when you still have ample system RAM, might make larger models more accessible than you previously considered.

1. Introduction

Why would anyone bother running a local AI model?

That is a fair question. Claude, Gemini, ChatGPT, and other frontier systems are already extremely capable. They are easy to access, constantly improving, and in many cases better than anything most of us can run on a laptop. If the only question were raw intelligence, local models would often lose.

But raw intelligence is not the only question. There are several practical reasons someone might want a local model available.

First, some data is private, sensitive, experimental, or simply not something you want to send to a hosted service.…