This is a submission for the Gemma 4 Challenge: Write About Gemma 4

The "Memory Wall" Problem

As a systems engineer focused on high-performance data ingestion, I find that the most interesting part of Gemma 4 isn't the benchmarks; it's how it physically handles memory.

Most open models hit a "Memory Wall" at long context lengths. In a standard Transformer, the Key-Value (KV) cache grows linearly with sequence length and can eventually consume more VRAM than the model weights themselves. Gemma 4 addresses this through a Divergent Architecture that splits "Edge" models (E2B/E4B) from "Server" models (31B Dense).

1. Per-Layer Embeddings (PLE)

The E2B variant is a masterclass in memory-compute trade-offs. It uses Per-Layer Embeddings (PLE), where a secondary embedding signal is fed into every decoder layer. By spending nearly 46% of its parameter budget on these lookup tables, Gemma 4 prevents token identity collision in the narrow hidden states required for 2B-scale models.…
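To put a rough number on the "Memory Wall" described at the top of this post, here is a back-of-the-envelope sketch of how a standard full-attention KV cache scales with context length. The function names and the configuration (32 layers, 8 KV heads, head dimension 128, 8B parameters in 16-bit) are hypothetical values chosen for illustration; they are not Gemma 4's actual hyperparameters, and the formula ignores optimizations such as sliding-window attention or cache quantization.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch_size=1):
    """Size in bytes of a plain full-attention KV cache."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_elem * batch_size)


def crossover_context(weight_bytes, num_layers, num_kv_heads, head_dim,
                      bytes_per_elem=2):
    """Smallest context length at which the cache strictly exceeds the weights."""
    per_token = kv_cache_bytes(num_layers, num_kv_heads, head_dim, 1,
                               bytes_per_elem)
    return weight_bytes // per_token + 1


# Hypothetical configuration for illustration only (NOT Gemma 4's real values).
HYPOTHETICAL = dict(num_layers=32, num_kv_heads=8, head_dim=128)
WEIGHT_BYTES = 8 * 10**9 * 2  # assumed: 8B parameters stored in 16-bit

if __name__ == "__main__":
    for seq_len in (8_192, 131_072, 1_048_576):
        gib = kv_cache_bytes(seq_len=seq_len, **HYPOTHETICAL) / 2**30
        print(f"{seq_len:>9} tokens -> {gib:.1f} GiB of KV cache")
    print("cache exceeds weights at",
          crossover_context(WEIGHT_BYTES, **HYPOTHETICAL), "tokens")
```

Under these assumed numbers each token costs 128 KiB of cache, so the cache reaches 1 GiB at 8,192 tokens and overtakes the 16 GB of weights at roughly 122,000 tokens. The point is only the shape of the curve: the cost is linear in context length, so any fixed weight budget is eventually exceeded.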