Ingero Team

Author Profile

0 karma · 0 posts · joined about 1 month ago

Tracing torch.cuda.empty_cache() on an RTX 4090 - Where Do the 53 MB Go?

🌐 Source: dev.to

TL;DR: After del tensor; torch.cuda.empty_cache(), PyTorch's caching allocator still...

#dev #empty_cache #pytorch #cuda #memory #allocator #article #englishlanguage

4 days ago

What Inference-Platform Benchmark Posts Leave Out

🌐 Source: dev.to

#machinelearning #ai #gpu #rank #echo #ingero #events #demo

20 days ago

MCP Shows What the Agent Did. eBPF Shows Why the GPU Stalled.

🌐 Source: dev.to

#ai #machinelearning #monitoring #agent #layer #calls #ebpf #article

22 days ago

MCP Tools Are New API Surfaces. eBPF Sees What They Actually Touch.

🌐 Source: dev.to

#ai #machinelearning #monitoring #tool #agent #kernel #call #ingero

26 days ago

A Cluster Stall Looks Healthy on Every Host. The Cause Is in the Pattern Across Hosts.

🌐 Source: dev.to

#machinelearning #devops #echo #events #fullscreen #rank #article #video

27 days ago

GPU Utilization Is a Counter, Not a Cause

🌐 Source: dev.to

#gpuobservability #ebpf #gpu #kernel #utilization #throughput #time #nvidia

29 days ago

CUDA Out of Memory at 60% Utilization: Tracing PyTorch GPU Memory Fragmentation

🌐 Source: dev.to

#gpu #memory #fullscreen #cudamalloc #cuda #article #englishlanguage

29 days ago

Tracing a 13x PyTorch Slowdown to a Hidden NumPy Synchronization

🌐 Source: dev.to

TL;DR: A .cpu().numpy() call buried inside a forward pass was forcing a full CPU-GPU synchronization on every batch and every loop iteration. The GPU would finish its work in milliseconds, then sit idle for ~2 seconds waiting for Python and NumPy to catch up. Replacing the NumPy log...

#dev #class #code #numpy #ingero #article #discussion #englishlanguage

about 1 month ago

124x Slower: What PyTorch DataLoader Actually Does at the Kernel Level

🌐 Source: dev.to

TL;DR: PyTorch's DataLoader can be 50-124x slower than direct tensor indexing for in-memory GPU workloads. We reproduced a real PyTorch issue on an RTX 4090 and traced every CUDA API call and Linux kernel event to find the root cause. The GPU wasn't slow; it was starving. DataLo...

#dev #class #code #strong #dataloader #article #englishlanguage

about 1 month ago