# Introduction For a long time, running transformer models meant maintaining a Python server, paying for GPU time, and routing every inference request through an API. The user typed something, it left their machine, touched your infrastructure, and came back as a prediction. That architecture made sense when the models were too large to run anywhere else. It is no longer the only option. Transformers.js changes the equation. It runs state-of-the-art NLP models directly in the browser, on the user's device, with no server involved. The models download once, cache locally, and run offline from that point forward.β¦