AI/ML Research Digest — May 23, 2026

DEV Community: machinelearning · Papers Mache · about 17 hours ago

Extreme KV‑Cache Compression and Long‑Context Efficiency

Static quantization is giving way to rotation‑based and context‑sensitive schemes. OCTOPUS and OScaR reach near‑lossless INT2 performance while cutting cache size dramatically [1], [2]. Sparse token indexers replace dense caches with a searchable sketch, preserving attention fidelity at lower memory cost [3]. Linear‑attention decoupling splits the KV stream into a short‑term mutable part and a long‑term static part, keeping long‑context reasoning accurate without quadratic growth [4]. Together these ideas let models handle thousands of tokens on modest hardware, easing a bottleneck for many retrieval‑augmented and multilingual applications.

Verifiable Rewards for LLM Reasoning

Reinforcement learning from verifiable rewards (RLVR) refines policy updates with token‑level credit signals rather than the coarse, response‑level credit of the GRPO baseline. Discriminative token weighting assigns higher reward to correct intermediate steps, improving math and code accuracy [5].…
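
To make the contrast between response‑level and token‑level credit concrete, here is a minimal Python sketch. It is not the method of [5]. It assumes a GRPO‑style baseline that normalizes scalar rewards within a group of sampled answers, and the per‑token weights and the function names `group_advantages` and `token_advantages` are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """GRPO-style baseline (assumed form): normalize each response's
    scalar reward against the mean and std of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def token_advantages(advantage, token_weights):
    """Spread one response-level advantage over its tokens.

    The weights are hypothetical per-token scores (for example, higher
    for tokens in a verified intermediate step). They are rescaled to
    average 1, so the response's mean per-token advantage is unchanged.
    """
    avg = mean(token_weights)
    return [advantage * w / avg for w in token_weights]


# Four sampled answers to one prompt; reward is 1.0 if the final
# answer passes the verifier, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advs = group_advantages(rewards)

# Coarse baseline: every token of response 0 shares advs[0].
# Token-level variant: hypothetical weights for its five tokens.
weights = [0.5, 2.0, 0.5, 1.5, 0.5]
per_token = token_advantages(advs[0], weights)

print(advs)
print(per_token)
```

Rescaling the weights to average 1 is one possible design choice: it leaves the response's overall credit as the group baseline set it and only redistributes that credit across tokens. For a response with a negative advantage, the same weights would concentrate the penalty on the highly weighted tokens, so a real scheme would need to decide how to weight incorrect steps.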
