
Dynamic batching for Encoder-Decoder MT training or generation when long sequence caps the batch size [P]

Reddit r/MachineLearning·u/Leather_Loan5314·about 1 month ago



I built a small PyTorch sampler called **dynabatch** after running into a specific batching issue while fine-tuning an NLLB-200 600M model. Training on an RTX 5090, the largest fixed batch size I could use was 8; anything bigger led to OOM. While monitoring training with **nvidia-smi**, it looked like only a few batches were actually stressing the GPU, and much of the time utilization was far lower. My guess was that the fixed batch size was being dictated by the longest source/target examples, while batches of shorter examples probably had room for more samples. So I tried to make the batch size change as the sequence lengths changed.…
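To make the idea concrete, here is a minimal sketch of length-aware batching. It is not dynabatch's actual code, which this excerpt does not show. It assumes memory use scales roughly with batch size times the longest sequence in the batch (the padded token count), which is only a proxy. The function name `length_budget_batches`, the `max_padded_tokens` parameter, and the example lengths are all made up for illustration.

```python
from typing import Iterator, List, Sequence


def length_budget_batches(
    lengths: Sequence[int], max_padded_tokens: int
) -> Iterator[List[int]]:
    """Yield lists of example indices whose padded size fits a token budget.

    lengths[i] is assumed to be max(source_len, target_len) for example i.
    """
    # Sort longest first, so each batch's padded length is set by its first item.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    batch: List[int] = []
    batch_max = 0
    for idx in order:
        new_max = max(batch_max, lengths[idx])
        # Padded cost if idx were added: (batch size + 1) * longest sequence.
        if batch and (len(batch) + 1) * new_max > max_padded_tokens:
            yield batch
            batch, new_max = [], lengths[idx]
        batch.append(idx)
        batch_max = new_max
    if batch:
        yield batch


# Hypothetical lengths: two long examples and six short ones.
lengths = [512, 480, 64, 60, 32, 30, 16, 16]
# Budget calibrated as "a fixed batch of 2 fits at the longest length".
batches = list(length_budget_batches(lengths, max_padded_tokens=2 * 512))
print([len(b) for b in batches])
```

With this calibration the long examples still go through in small batches, while the short ones are packed into a larger batch under the same budget. A materialized list like `batches` can be passed to `torch.utils.data.DataLoader` through its `batch_sampler` argument; shuffling the order of batches between epochs is left out of the sketch.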
