Forward Deployed Engineer: AI + HPC at Cedana | Y Combinator

Hacker News · 4 days ago

Introducing Cedana

The Problem

AI and HPC infrastructure suffers from scarcity and high costs, so every failure is expensive in both time and money. Cluster productivity directly determines research output and revenue. Achieving high utilization and throughput is increasingly challenging because of the complexity of workloads, hardware, and operations.

Cedana's Solution

Cedana maximizes AI+HPC cluster utilization and reliability with automated GPU checkpointing infrastructure. We enable transparent, fast migration of GPU workloads across instances without losing work. Workloads migrate automatically, raising reliability and throughput while accelerating time to results. Our system operates at the kernel/OS level, requires no code or config changes, and works seamlessly with Kubernetes, SLURM, and NVIDIA Dynamo. Today, we're deploying into leading inference platforms, neoclouds, enterprise, and research clusters.…