#Accumulation

8 posts

Gradient Accumulation OOM: Hidden Memory Spike Explained

DEV Community: pytorch·TildAlice·about 1 month ago
#rktVxrIn
#dev #code #gradient #accumulation #batch #photo

You Set batch_size=1, Enabled Gradient Accumulation, and It Still Crashes

Gradient accumulation is supposed to be the silver bullet for training large models on small GPUs.…

Gradient Accumulation vs Large Batch: Memory & Cost Test

DEV Community: pytorch·TildAlice·about 1 month ago
#YylZIFrw
#dev #batch #gradient #accumulation #size #article

Why This Matters: The Memory Trap Nobody Warns You About

Gradient accumulation promises to let you train with "effective batch size 128" on a GPU that can barely fit batch size 8. Sounds perfect, right?…
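
To make the "effective batch size 128" figure concrete, here is a minimal sketch of the arithmetic behind it, using only the two numbers quoted in the teaser (128 and 8). Everything else is an assumption made for illustration: the function names are hypothetical, and the toy one-parameter model `y = w * x` with a mean-squared-error loss is not taken from the linked post. The sketch only shows that summing micro-batch gradients, each scaled by `1 / steps`, matches the full-batch gradient. It does not model GPU memory use.

```python
def accumulation_steps(effective_batch, micro_batch):
    """Number of micro-batches to accumulate before one optimizer step."""
    if effective_batch % micro_batch != 0:
        raise ValueError("effective batch must be a multiple of micro batch")
    return effective_batch // micro_batch


def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error for the toy model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch):
    """Accumulate per-micro-batch gradients, each scaled by 1 / steps."""
    steps = accumulation_steps(len(xs), micro_batch)
    total = 0.0
    for i in range(steps):
        lo = i * micro_batch
        hi = lo + micro_batch
        # Scaling by 1 / steps turns the sum of micro-batch means
        # into the mean over the whole effective batch.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / steps
    return total


# Hypothetical toy data: 128 samples, processed 8 at a time.
xs = [float(i % 7) for i in range(128)]
ys = [3.0 * x + 1.0 for x in xs]

steps = accumulation_steps(128, 8)
full = grad_mse(0.5, xs, ys)
acc = accumulated_grad(0.5, xs, ys, 8)
```

Under these assumptions, `steps` is 128 // 8, and `full` and `acc` agree up to floating-point rounding. In a real training loop the same `1 / steps` scaling is usually applied to the loss before the backward pass.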
