Gemma-4-31B on v6e-4 TPU Benchmarks

DEV Community·xbill·24 days ago

#ai #devchallenge #google #llm #model #throughput

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

model: Gemma-4-31B

🚀 Gemma 4 TPU v6e-4 Performance Report

📋 Deployment Overview

Model: google/gemma-4-31B-it
Hardware: Cloud TPU v6e-4 (Trillium)
Runtime: v2-alpha-tpuv6e (Flex-start)
TPU Location: southamerica-east1-c
Serving Engine: vLLM (v0.20.2rc1.dev111+g8eb401134)

📊 Performance Summary (C1 - C1024)

Peak Prefill Throughput: 463,345 tokens/sec
Avg TTFT (~1.6k tokens): 2.597 seconds
Avg TTFT (16k tokens): 4.775 seconds

📈 Concurrency Scaling Matrix (Mean per Concurrency)

concurrency    avg_ttft (s)    prefill_tps (tokens/sec)
          1        0.546599                     14778.3
          2        0.562068                     28121.7
          4        0.595823                     51869.1
          8        0.679816                     88055.5
         16        0.872466                    133697
         32        1.16488                     191631
         64        1.55596                     261802
        128        2.15464                     328909
        256        3.55723                     352654
        512        7.59987                     318854
       1024       21.005                       240170

🔍 Key Findings

Efficiency Saturated: Maximum throughput was achieved at concurrency 256, reaching 463,345 tok/s. …
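To make the saturation finding easier to check, here is a minimal Python sketch that works only from the mean values in the concurrency matrix above. It finds the concurrency level with the highest mean prefill throughput and expresses each level as a fraction of ideal linear scaling from concurrency 1. The function names and the "scaling efficiency" metric are illustrative assumptions, not part of the original benchmark harness, and because the inputs are per-concurrency means, the 463,345 tokens/sec peak figure will not appear in the output.

```python
# Mean prefill throughput (tokens/sec) per concurrency level,
# copied from the "Concurrency Scaling Matrix" in this report.
MEAN_PREFILL_TPS = {
    1: 14778.3, 2: 28121.7, 4: 51869.1, 8: 88055.5,
    16: 133697.0, 32: 191631.0, 64: 261802.0, 128: 328909.0,
    256: 352654.0, 512: 318854.0, 1024: 240170.0,
}


def scaling_efficiency(tps_by_concurrency):
    """Throughput at each level divided by perfect linear scaling from concurrency 1.

    This metric is an assumption made for illustration; the report itself
    does not define an efficiency figure.
    """
    base = tps_by_concurrency[1]
    return {c: tps / (base * c) for c, tps in tps_by_concurrency.items()}


def saturation_point(tps_by_concurrency):
    """Concurrency level with the highest mean throughput."""
    return max(tps_by_concurrency, key=tps_by_concurrency.get)


efficiency = scaling_efficiency(MEAN_PREFILL_TPS)
peak_concurrency = saturation_point(MEAN_PREFILL_TPS)

for c in sorted(MEAN_PREFILL_TPS):
    print(f"{c:>5}  {MEAN_PREFILL_TPS[c]:>10.1f} tok/s  {efficiency[c]:6.1%} of linear")
print("Highest mean throughput at concurrency", peak_concurrency)
```

On these numbers the mean throughput tops out at concurrency 256 and falls at 512 and 1024, which matches the saturation point named in the key findings.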
