mainstream. We first saw them in language, then in vision, and now in video and speech as well. The recipe is familiar by now: first pretrain a big neural net on a large enough dataset, then apply the model to downstream tasks without any per-task adaptation. For many industrial applications, time series is a crucial modality. We frequently need to do forecasting, anomaly detection, and classification on different kinds of recorded data. The current practice is usually to build a dedicated model for the specific problem at hand. That can work, but it involves a fair amount of “reinventing the wheel”, and it may deliver suboptimal performance if the dataset for the current problem is small. Naturally, we’d like to ask: can we apply the same recipe here, that is, pretrain a large time series model and use it for any downstream task, out of the box? That’s the bet behind time series foundation models, or TSFMs.…