Ingero Team

Author Profile

0 karma, 0 posts, joined about 1 month ago

🌐 Source: dev.to

TL;DR: After del tensor; torch.cuda.empty_cache(), PyTorch's caching allocator still...

4 days ago

🌐 Source: dev.to

From Dev.to - ai: What Inference-Platform Benchmark Posts Leave Out

20 days ago

🌐 Source: dev.to

From Dev.to - machinelearning: MCP Shows What the Agent Did. eBPF Shows Why the GPU Stalled.

22 days ago

🌐 Source: dev.to

From Dev Community: MCP Tools Are New API Surfaces. eBPF Sees What They Actually Touch.

26 days ago

🌐 Source: dev.to

From Dev RSS Feed: A Cluster Stall Looks Healthy on Every Host. The Cause Is in the Pattern Across Hosts.

27 days ago

🌐 Source: dev.to

From Dev Community: GPU Utilization Is a Counter, Not a Cause

29 days ago

🌐 Source: dev.to

From Dev.to - pytorch: CUDA Out of Memory at 60% Utilization: Tracing PyTorch GPU Memory Fragmentation

29 days ago

🌐 Source: dev.to

TL;DR: A .cpu().numpy() call buried inside a forward pass was forcing a full CPU-GPU synchronization on every batch, every loop iteration. The GPU would finish its work in milliseconds, then sit idle for ~2 seconds waiting for Python and NumPy to catch up. Replacing the NumPy log

about 1 month ago

🌐 Source: dev.to

TL;DR: PyTorch's DataLoader can be 50-124x slower than direct tensor indexing for in-memory GPU workloads. We reproduced a real PyTorch issue on an RTX 4090 and traced every CUDA API call and Linux kernel event to find the root cause. The GPU wasn't slow - it was starving. DataLo

about 1 month ago