Reddit - Please wait for verification

📰

Reddit - Please wait for verification

Machine Learning·/u/Bobby-Ly·3 days ago

#machinelearning #reddit #article #discussion #englishlanguage

Reading 0:00

15s threshold

Vision Transformers waste 90% of their compute recalculating stationary asphalt. NeuroFlow tracks semantic surprise in embedding space, physically eliminating background tokens before the encoder. Result: 55.8x wall-clock speedup for ViTs on high-res video (1792p) with 97% fidelity. No fine-tuning required. NeuroFlow is a dynamic routing framework for Vision Transformer video inference. It exploits temporal redundancy by tracking per-patch semantic surprise via an Exponential Moving Average (EMA) of patch-level embeddings, effectively answering the architectural mismatch between O(N2) self-attention and highly redundant natural video streams. Key Contributions Architecture C (Dual-Memory Reconstruction): A completely training-free inference engine that combines a Layer 0 Retinal Gate with a Layer 12 Cortical Cache. It achieves 71.55% zero-shot top-1 accuracy at 84.0% token sparsity on SigLIP, retaining 92.4% of dense accuracy without modifying any weights.…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Create free account Log in

Menu

Reddit - Please wait for verification