#Cuda

36 posts

 Nvidia's long-awaited N1/N1X SoC specs leak ahead of Computex launch — N1 to feature up to 20 Arm-based cores, standard N1 equipped with 12- and 10-core configs

Latest from Tom's Hardware · Hassam Nasir · about 14 hours ago

The N1X reportedly comes in two SKUs: a top-end 20-core option with 6,144 CUDA cores matching the desktop RTX 5070, and a cut-down 18-core option with 5,120 CUDA cores.…

NVIDIA CUDA 13.3 Enhances GPU Development with Tile Programming in C++, Compiler Autotuning, and Python

NVIDIA Technical Blog · Jonathan Bentz · 3 days ago
#developer #include #cuda #cccl #import #python

NVIDIA CUDA 13.3 brings new capabilities and performance optimizations to developers across the CUDA ecosystem. The launch of NVIDIA CUDA Tile programming in…

Develop High-Performance GPU Kernels in C++ with NVIDIA CUDA Tile

NVIDIA Technical Blog · Jonathan Bentz · 3 days ago
#developer #include #define #tile #auto #float

Developers can now use NVIDIA CUDA Tile programming within large existing C++ GPU codebases to develop highly optimized GPU kernels using tile-based…

Why CUDA kernels silently corrupt memory and how to catch the bug

DEV Community · Alan West · 20 days ago
#ifndef #cuda #rust #kernel #scratch #compute

A practical guide to debugging silent memory corruption in CUDA kernels, with compute-sanitizer workflows and a look at Rust-on-GPU tooling.

How to Deploy Llama 3.2 90B with vLLM + Speculative Decoding on a $16/Month DigitalOcean GPU Droplet: 2.5x Faster Inference at 1/110th Claude Cost

DEV Community · RamosAI · 21 days ago

From Dev.to - tutorial

CUDA Proves Nvidia Is a Software Company - Slashdot

hardware.slashdot.org · 21 days ago
#comments #modal_box #cuda #nvidia #gpus #single

Nvidia's real AI moat isn't "a piece of hardware," writes Wired's Sheon Han. It's CUDA: a mature, deeply optimized software ecosystem that keeps machine-learning workloads tied to Nvidia GPUs.…

The Zero-Trust Docker Pipeline: Securing GPU/AI Container Images from Build to Production

DEV Community · Pavan Madduri · 24 days ago
#ai #docker #cuda #base #nvidia #article

From Dev.to - docker

How to Deploy Llama 3.2 Vision with TensorRT on a $14/Month DigitalOcean GPU Droplet: 3x Faster Multimodal Inference at 1/120th Claude Vision Cost

DEV Community · RamosAI · 26 days ago

From Dev.to - tutorial

How to Deploy Llama 3.2 11B with TensorRT-LLM on a $12/Month DigitalOcean GPU Droplet: 4x Faster Inference at 1/70th API Cost

DEV Community · RamosAI · 29 days ago

From Dev.to - webdev

How to Deploy Llama 3.2 405B with vLLM + Tensor Parallelism on a $40/Month DigitalOcean GPU Cluster: Enterprise-Scale Inference at 1/30th API Cost

DEV Community · RamosAI · about 1 month ago

From Dev.to - tutorial

Your AI, Your Rules: Running a Local LLM with GPU Acceleration on Proxmox

DEV Community · Clint · about 1 month ago
#part #key #fullscreen #nvidia #llama #cuda

From 3 tok/s frustration to 21 tok/s GPU-hybrid inference - a real engineer's guide to self-hosted...
