Conventional mixture‑of‑experts designs give each transformer layer its own private expert set, so the total expert parameter count grows linearly with depth. Recent work shows that a single, globally shared pool of experts can deliver comparable predictive quality while dramatically curtailing that budget. The dominant paradigm has treated depth scaling and expert capacity as inseparable: every new layer brings a fresh collection of feed‑forward sub‑networks, and that layer's router merely picks the top‑k among them. This design simplifies implementation but strictly couples model depth to the number of learnable expert parameters, even though earlier analyses hinted that many layers rely on overlapping knowledge. UniPool breaks the coupling by replacing per‑layer ownership with one shared pool that all routers draw from.…
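To make the coupling concrete, the following minimal sketch contrasts the expert parameter count of per‑layer ownership with that of a single shared pool, and shows per‑layer routers selecting top‑k experts from the same pool. It is an illustration under stated assumptions, not UniPool's actual implementation: the function names, the two‑matrix feed‑forward expert (biases ignored), the linear dot‑product router, and all sizes are hypothetical choices made for this example.

```python
import random


def expert_param_count(num_layers, num_experts, d_model, d_ff, shared):
    """Count expert FFN parameters (assumed: two weight matrices, no biases).

    With per-layer ownership there is one expert set per layer; with a
    shared pool there is a single set regardless of depth.
    """
    per_expert = 2 * d_model * d_ff
    num_pools = 1 if shared else num_layers
    return num_pools * num_experts * per_expert


def top_k(scores, k):
    """Return the indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def make_router(num_experts, d_model, rng):
    """Hypothetical linear router: one score vector per expert in the pool."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d_model)]
            for _ in range(num_experts)]


def route(router, token, k):
    """Score every pool expert for one token and keep the top-k indices."""
    scores = [sum(w * x for w, x in zip(row, token)) for row in router]
    return top_k(scores, k)


# Illustrative sizes only (assumptions, not values from the source).
NUM_LAYERS, NUM_EXPERTS, D_MODEL, D_FF, K = 12, 8, 64, 256, 2

per_layer_total = expert_param_count(NUM_LAYERS, NUM_EXPERTS, D_MODEL, D_FF, shared=False)
shared_total = expert_param_count(NUM_LAYERS, NUM_EXPERTS, D_MODEL, D_FF, shared=True)
print("per-layer experts:", per_layer_total)
print("shared pool      :", shared_total)

# Each layer keeps its own router, but every router indexes the same pool.
rng = random.Random(0)
routers = [make_router(NUM_EXPERTS, D_MODEL, rng) for _ in range(NUM_LAYERS)]
token = [rng.gauss(0.0, 1.0) for _ in range(D_MODEL)]
for layer, router in enumerate(routers):
    print("layer", layer, "-> pool experts", route(router, token, K))
```

In this sketch the per‑layer count scales with `NUM_LAYERS` while the shared count does not, so adding depth adds only routers, not experts. The pool size is a free parameter here; a shared design could use a pool larger than any single layer's expert set and still hold fewer expert parameters overall. Router weights are deliberately left out of the count.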