#Ingero

Tracing a 13x PyTorch Slowdown to a Hidden NumPy Synchronization

DEV Community: pytorch · Ingero Team · about 1 month ago
#dev #class #code #numpy #ingero #article

TL;DR: A .cpu().numpy() call buried inside a forward pass was forcing a full CPU-GPU synchronization on every batch, every loop iteration. The GPU would finish its work in milliseconds, then sit idle for ~2 seconds waiting for Python and NumPy to catch…
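To make the pattern in the summary concrete, here is a minimal sketch. It is not the article's code. The function names, tensor shapes, and the per-batch metric are hypothetical, and the article's actual fix is not visible in this excerpt. The first function converts to NumPy inside the loop, so the host waits for the device on every batch. The second keeps intermediate results as tensors and copies them to the host once, after the loop.

```python
import torch

def mean_norms_with_sync(batches):
    """Convert to NumPy per batch: each .cpu() call makes the host wait
    until the device has finished the work that produced the tensor."""
    out = []
    for x in batches:
        norm = x.norm(dim=1).mean()
        out.append(float(norm.cpu().numpy()))  # host blocks here every iteration
    return out

def mean_norms_deferred(batches):
    """Keep per-batch results as tensors and copy to the host once."""
    norms = [x.norm(dim=1).mean() for x in batches]
    return torch.stack(norms).cpu().numpy().tolist()  # single transfer after the loop

# Hypothetical data; falls back to the CPU when no GPU is present.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)
batches = [torch.randn(8, 16, device=device) for _ in range(4)]

with_sync = mean_norms_with_sync(batches)
deferred = mean_norms_deferred(batches)
```

Both functions return the same values. On a GPU, the difference would show up in how often the host has to stall, not in the results. Whether deferring the transfer is possible depends on whether later steps in the loop actually need the NumPy value.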
