Setting up Ray on GKE: How I spent a week optimising Docker pulls?

DEV Community · Emin Mammadov · 23 days ago

#llmops #ray #kubernetes #gcp #cluster #node

I spent a week debugging slow Ray cluster starts on GKE. The fix was a region mismatch that is not very obvious from the docs.

We've been running Ray on GKE (with Anyscale) for over a year on the AI Platform team at Geotab. As self-hosted LLM workloads grow, Ray is one of the tools that makes scaling them practical. Introducing Ray and making it a go-to platform for multiple teams has been a rewarding but challenging path.

One issue I kept running into: slow Ray cluster spawn times. Here's where the time actually went, and what helped.

1. GKE node provisioning: 2-3 minutes

When Ray's autoscaler asks for a new node, GKE has to allocate a VM, boot the OS, register the kubelet, and join the cluster. GPU nodes add another 30-50 seconds for driver install. We treated this as a baseline cost - no point optimizing anything else until the node exists. That changed a bit recently, when GCP introduced GKE Active Buffer, which aims to minimize that time. I haven't tested it yet, but it's on the list.

2.…
