Shared expert pool reduces parameters while maintaining performance

DEV Community·Papers Mache·18 days ago

#ai #machinelearning #abotwrotethis #software #expert #pool

Conventional mixture‑of‑experts designs hand each transformer layer its own private expert set, causing the total expert parameter count to swell linearly with depth. Recent work shows that a single, globally shared pool of experts can deliver comparable predictive quality while dramatically curtailing that budget. The dominant paradigm has treated depth scaling and expert capacity as inseparable: every new layer brings a fresh collection of feed‑forward sub‑networks, and the routing logic merely picks the top‑k among them. This architecture simplifies implementation but forces a strict coupling between model depth and the number of learnable expert parameters, even though earlier analyses hinted that many layers rely on overlapping knowledge. UniPool breaks the coupling by replacing per‑layer ownership with one shared pool that all routers draw from.…
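
To make the depth coupling concrete, here is a minimal back-of-the-envelope sketch in Python. It is not UniPool's implementation: the two-layer feed-forward expert shape, the per-layer linear router, and all sizes (`d_model`, `d_ff`, `experts_per_layer`, `pool_size`) are hypothetical choices made only for illustration. It counts expert and router parameters under per-layer ownership versus a single shared pool, and shows a plain top-k pick over router scores.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Parameters in one two-layer feed-forward expert (weights plus biases)."""
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)


def router_params(d_model: int, num_experts: int) -> int:
    """Assumed router: one bias-free linear map producing a score per expert."""
    return d_model * num_experts


def per_layer_total(num_layers: int, experts_per_layer: int,
                    d_model: int, d_ff: int) -> int:
    """Each layer owns a private expert set, so experts scale with depth."""
    experts = num_layers * experts_per_layer * ffn_params(d_model, d_ff)
    routers = num_layers * router_params(d_model, experts_per_layer)
    return experts + routers


def shared_pool_total(num_layers: int, pool_size: int,
                      d_model: int, d_ff: int) -> int:
    """One global pool; only the (small) routers are replicated per layer."""
    experts = pool_size * ffn_params(d_model, d_ff)
    routers = num_layers * router_params(d_model, pool_size)
    return experts + routers


def top_k(scores: list[float], k: int) -> list[int]:
    """Indices of the k highest router scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to illustrate the scaling behaviour.
    d_model, d_ff = 512, 2048
    for num_layers in (12, 24):
        private = per_layer_total(num_layers, 8, d_model, d_ff)
        shared = shared_pool_total(num_layers, 32, d_model, d_ff)
        print(f"layers={num_layers}: per-layer={private:,} shared={shared:,}")
    # Every layer's router selects from the same pool of 32 experts.
    print(top_k([0.1, 0.9, 0.3, 0.7], k=2))
```

Under these assumptions, doubling the depth doubles the per-layer expert count, while the shared-pool total grows only by the extra routers. Whether quality holds at a given pool size is an empirical question that this arithmetic does not answer.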
