12 million tokens, linear cost: Subquadratic's bet against the attention tax

DEV Community · Andrew Kew · 27 days ago
#ai #llm #api #context #attention #token

The quadratic attention problem has quietly shaped everything you've built with LLMs. RAG pipelines, agentic decomposition, hybrid architectures: these aren't the natural shape of AI systems. They're workarounds. Doubling the context quadruples the compute, so everyone stopped at a million tokens and engineered around the rest.

Subquadratic, a Miami-based startup with 11 PhD researchers on staff, launched its first model this week and says it's done with workarounds. Its new architecture, Subquadratic Selective Attention (SSA), claims linear scaling in both compute and memory with respect to context length. The result: a 12-million-token context window, available via API today.

"For prompt A, words one and six are going to be important to each other. For prompt B, maybe it's words two and three. It's different for every single input." - Alex Whedon, CTO

What actually changed

The quadratic bottleneck comes from dense attention: with 1,000 tokens, every token attends to every other, for 1,000² (one million) comparisons.…
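The scaling argument above can be checked with a few lines of arithmetic. The sketch below is only an illustration, not Subquadratic's method: the article does not describe how SSA works internally, so `budgeted_comparisons` and its `keys_per_token` parameter are hypothetical stand-ins for any scheme in which each token attends to a fixed number of others. The budget of 1,000 is likewise an assumed value.

```python
def dense_comparisons(n_tokens: int) -> int:
    """Dense attention: every token is scored against every token, n * n."""
    return n_tokens * n_tokens


def budgeted_comparisons(n_tokens: int, keys_per_token: int) -> int:
    """Hypothetical linear scheme: each token attends to at most
    `keys_per_token` others. This is an assumption for illustration,
    not a description of SSA."""
    return n_tokens * min(keys_per_token, n_tokens)


if __name__ == "__main__":
    budget = 1_000  # assumed fixed budget, illustrative only
    for n in (1_000, 1_000_000, 12_000_000):
        dense = dense_comparisons(n)
        linear = budgeted_comparisons(n, budget)
        print(f"{n:>12,} tokens: dense={dense:.2e}  budgeted={linear:.2e}")
```

Under these assumptions, doubling the token count quadruples the dense count but only doubles the budgeted one, which is the gap the article refers to as the attention tax.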
