We Didn’t Just Train AI on the Internet. We Started Training It on Itself.

DEV Community: datascience · Arpit Gupta · 3 days ago

There’s a quiet assumption in almost every AI discussion right now: “If we scale compute and models, intelligence will keep improving.”

That assumption is starting to break. Not loudly. But structurally.

The real bottleneck isn’t compute

We’ve optimized for compute as if it were the main constraint. GPUs. Clusters. Parallelism. Faster training runs.

But there’s a less visible constraint emerging: we are running out of high-quality human data.

And worse, we are replacing it with something fundamentally different: synthetic content generated by the very models we are training.

The internet used to be messy. That was the advantage.

Early foundation models had something we are quietly losing: a mostly human internet. Not clean. Not structured. Not optimized. But real.…
