#DataLoader

3 posts

124x Slower: What PyTorch DataLoader Actually Does at the Kernel Level

DEV Community: pytorch·Ingero Team·about 1 month ago

TL;DR: PyTorch's DataLoader can be 50-124x slower than direct tensor indexing for in-memory GPU workloads. We reproduced a real PyTorch issue on an RTX 4090 and traced every CUDA API call and Linux kernel event to find the root cause.…
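To make the comparison in the TL;DR concrete, here is a minimal sketch of the two access patterns it contrasts: iterating a `DataLoader` over an in-memory `TensorDataset` versus slicing the tensors directly. This is not the article's benchmark. The helper names (`batches_via_dataloader`, `batches_via_indexing`, `time_both`) and the tensor sizes are hypothetical, and the timings you get will depend on your hardware and PyTorch version.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def batches_via_dataloader(features, labels, batch_size):
    """Collect batches by iterating a DataLoader over in-memory tensors.

    With the default settings used here (num_workers=0, default collate),
    each batch is assembled from individually indexed samples.
    """
    loader = DataLoader(
        TensorDataset(features, labels), batch_size=batch_size, shuffle=False
    )
    return [(x, y) for x, y in loader]


def batches_via_indexing(features, labels, batch_size):
    """Collect the same batches by slicing the tensors directly."""
    n = features.shape[0]
    return [
        (features[i : i + batch_size], labels[i : i + batch_size])
        for i in range(0, n, batch_size)
    ]


def time_both(n=10_000, dim=32, batch_size=256):
    """Return (dataloader_seconds, indexing_seconds) for one pass each.

    The sizes are illustrative assumptions, not the article's configuration.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    features = torch.randn(n, dim, device=device)
    labels = torch.randint(0, 10, (n,), device=device)

    results = []
    for fn in (batches_via_dataloader, batches_via_indexing):
        if device == "cuda":
            torch.cuda.synchronize()  # finish pending GPU work before timing
        start = time.perf_counter()
        fn(features, labels, batch_size)
        if device == "cuda":
            torch.cuda.synchronize()  # wait for the GPU work we just queued
        results.append(time.perf_counter() - start)
    return tuple(results)
```

Calling `time_both()` returns the two wall-clock times, so the ratio can be checked locally. Both helpers yield identical batches in the same order, which keeps the comparison limited to how the batches are assembled.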
