Introduction to Chronos-2: From univariate to universal forecasting

Time series forecasting is essential for numerous applications in business, science, and engineering. Recently, foundation models have led to a paradigm shift in time series forecasting. Unlike statistical models that extrapolate from a single time series, or earlier deep-learning models that were trained for specific tasks, time series foundation models (TSFMs) are trained once on large-scale time series data and then applied across diverse forecasting problems.

Since their initial release, Amazon’s TSFMs, Chronos and Chronos-Bolt, have collectively been downloaded over 600 million times from Hugging Face, demonstrating the popularity of TSFMs and their applicability across a wide range of forecasting scenarios.

Despite their success, existing TSFMs have an important limitation: they support only univariate forecasting, predicting a single time series at a time. Although univariate forecasting is important, many scenarios demand more. Real-world forecasting problems often involve multiple time series that evolve together (multivariate forecasting) or external factors that influence the outcome (covariate-informed forecasting). For example, cloud infrastructure metrics such as CPU usage, memory usage, and storage I/O evolve together and benefit from joint modeling. Likewise, retail demand is strongly influenced by promotional activities, while energy consumption patterns are driven by weather conditions.

To address this limitation, we introduce Chronos-2, a foundation model designed to handle arbitrary forecasting tasks (univariate, multivariate, and covariate-informed) in a zero-shot setting. Chronos-2 leverages in-context learning (ICL) to enable these capabilities without additional task-specific training.
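
As a quick illustration, the sketch below shows what zero-shot usage might look like. It assumes Chronos-2 loads through a pipeline interface like the open-source chronos-forecasting package; the model ID and loading details are illustrative, not confirmed.

```python
import torch
from chronos import BaseChronosPipeline  # entry point in the chronos-forecasting package

# Assumption: Chronos-2 loads through the same pipeline interface as earlier
# Chronos models; the model ID below is illustrative.
pipeline = BaseChronosPipeline.from_pretrained(
    "amazon/chronos-2",
    device_map="cuda",          # use "cpu" if no GPU is available
    torch_dtype=torch.bfloat16,
)

# Zero-shot univariate forecast: no fine-tuning, just context in, quantiles out.
context = torch.tensor([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0,
                        136.0, 119.0, 104.0, 118.0, 115.0, 126.0, 141.0, 135.0])
quantiles, mean = pipeline.predict_quantiles(
    context, prediction_length=12, quantile_levels=[0.1, 0.5, 0.9]
)
print(quantiles.shape)  # (batch, prediction_length, num_quantiles)
```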

For multivariate forecasting, Chronos-2 can jointly forecast multiple time series that evolve together, capturing dependencies between them to improve overall accuracy. For example, cloud operations teams can jointly forecast CPU usage, memory usage, and storage I/O to anticipate resource bottlenecks before they occur.
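
The sketch below illustrates this with three synthetic cloud metrics arranged as a single group. Treating a 2-D input as one jointly modeled group is an assumption about the Chronos-2 interface; earlier Chronos releases would forecast such a batch independently.

```python
import torch

# Synthetic stand-ins for one week of hourly cloud metrics.
t = torch.arange(168, dtype=torch.float32)
cpu = 50 + 20 * torch.sin(2 * torch.pi * t / 24) + torch.randn(168)
mem = 0.6 * cpu + 10 + torch.randn(168)                       # memory tracks CPU load
io = 30 + 15 * torch.sin(2 * torch.pi * (t - 3) / 24) + torch.randn(168)

# One group of co-evolving series: shape (variates, time steps).
group = torch.stack([cpu, mem, io])

# Hypothetical grouped call (`pipeline` as in the earlier sketch). Passing the
# metrics together, rather than one at a time, is what lets the model exploit
# cross-series dependencies.
quantiles, mean = pipeline.predict_quantiles(
    group, prediction_length=24, quantile_levels=[0.1, 0.5, 0.9]
)
```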

For covariate-informed forecasting, Chronos-2 can incorporate external factors that influence predictions. The model supports both past-only covariates (such as historical traffic volume that signals future trends) and known future covariates (such as planned promotions or weather forecasts). It also handles categorical covariates, such as holiday indicators or campaign types. For example, a retailer can forecast demand while taking into account planned sales promotions and holiday schedules to optimize inventory levels.
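
Here is a hedged sketch of that retail scenario. The long-format DataFrame layout and the predict_df method are assumptions about how such a call could look, shown only to make the inputs concrete.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
dates = pd.date_range("2025-01-01", periods=60, freq="D")

# Historical context: target plus covariates observed over the same window.
context = pd.DataFrame({
    "timestamp": dates,
    "demand": 100 + 30 * rng.random(60),                  # target to forecast
    "promotion": rng.integers(0, 2, 60),                  # promo flag, known into the future
    "holiday_type": rng.choice(["none", "public"], 60),   # categorical covariate
})

# Known-future covariates over the horizon (e.g., the planned promo calendar).
future = pd.DataFrame({
    "timestamp": pd.date_range(dates[-1] + pd.Timedelta("1D"), periods=14, freq="D"),
    "promotion": [1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
    "holiday_type": ["none"] * 14,
})

# Hypothetical method name (`pipeline` as in the first sketch); shown only
# to illustrate which inputs a covariate-informed forecast involves.
forecast = pipeline.predict_df(context, future_df=future, target="demand")
```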

Chronos-2’s enhanced ICL capabilities also improve univariate forecasting by enabling cross-learning, where the model shares information across univariate time series, leading to more accurate predictions. This is especially valuable in cold-start scenarios: a logistics company opening a new distribution center can leverage patterns from existing facilities to generate accurate forecasts, even with minimal operating history.
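
A sketch of that cold-start workflow, under the assumption that submitting related series together lets Chronos-2 share information across them in context; the grouping convention here is hypothetical.

```python
import torch

# Two established facilities with two years of weekly history, plus a new
# distribution center with only six weeks of data (values are synthetic).
established_a = 500 + torch.randn(104).cumsum(0)
established_b = 450 + torch.randn(104).cumsum(0)
new_center = 480 + torch.randn(6).cumsum(0)  # cold start: minimal history

# Hypothetical grouped call (`pipeline` as in the first sketch): we assume the
# short series can borrow seasonal and trend patterns from its peers in context.
quantiles, mean = pipeline.predict_quantiles(
    [established_a, established_b, new_center],
    prediction_length=8,
    quantile_levels=[0.1, 0.5, 0.9],
)
```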

The complete Chronos-2 pipeline. Input time series (targets and covariates) are first normalized using a robust scaling scheme, after which time-index and mask metafeatures are added. The resulting sequences are partitioned into non-overlapping patches and mapped to high-dimensional embeddings via a residual network. The core transformer stack operates on these patch embeddings and produces multi-patch quantile outputs corresponding to the masked future patches in the input. Each transformer block alternates between time attention and group attention layers: the time attention layer aggregates information across patches within a single time series, while the group attention layer aggregates information across all series within a group at each patch index. The figure illustrates two multivariate time series, each with a known covariate, with the corresponding groups highlighted in blue and red. This example is for illustrative purposes only; Chronos-2 supports any number of targets and optional covariates.
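
To make the first two stages of the caption concrete, here is a toy rendering of robust scaling and non-overlapping patching; the particular scaling statistics and patch length are illustrative assumptions, not the model’s actual choices.

```python
import torch

def robust_scale(x: torch.Tensor) -> torch.Tensor:
    """Normalize a series with median/IQR, one common robust scaling scheme.
    The exact scheme Chronos-2 uses is an assumption here."""
    med = x.nanmedian()
    q1, q3 = torch.nanquantile(x, 0.25), torch.nanquantile(x, 0.75)
    return (x - med) / (q3 - q1 + 1e-8)

def to_patches(x: torch.Tensor, patch_len: int) -> torch.Tensor:
    """Split a series into non-overlapping patches: (num_patches, patch_len)."""
    usable = (x.numel() // patch_len) * patch_len
    return x[:usable].reshape(-1, patch_len)

series = torch.randn(96).cumsum(0)
patches = to_patches(robust_scale(series), patch_len=16)  # shape (6, 16)
```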

Building a universal TSFM like Chronos-2 required innovation on two fronts: model architecture and training data. Downstream forecasting tasks differ in their dimensionality and semantic content. Since it is impossible to know in advance how the variables of an unseen task will interact, the model must infer these interactions from the available context.

Our group attention mechanism accounts for such interactions through information exchange within groups of time series of arbitrary size. For example, when Chronos-2 forecasts cloud metrics, CPU usage patterns can inform memory usage predictions. Group attention can also account for covariates, for example, by using information from promotional schedules to help forecast demand.
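
The alternating-attention idea can be sketched in a few lines of PyTorch. This is a simplified illustration of time and group attention over a (series, patches, embedding) tensor, not the authors’ implementation; feed-forward sublayers and masking are omitted.

```python
import torch
import torch.nn as nn

class TimeGroupBlock(nn.Module):
    """One transformer block alternating time attention and group attention.
    A simplified sketch of the idea described above."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.group_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm_time = nn.LayerNorm(d_model)
        self.norm_group = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (series, patches, d_model) -- patch embeddings for one group.
        h = self.norm_time(x)
        out, _ = self.time_attn(h, h, h)         # mix patches within each series
        x = x + out
        h = self.norm_group(x).transpose(0, 1)   # (patches, series, d_model)
        out, _ = self.group_attn(h, h, h)        # mix series at each patch index
        return x + out.transpose(0, 1)

block = TimeGroupBlock(d_model=64, n_heads=4)
group = torch.randn(3, 12, 64)   # e.g., CPU, memory, I/O: 3 series x 12 patches
assert block(group).shape == group.shape
```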

The training corpus is as critical as the architectural innovations. A universal TSFM must be trained on heterogeneous time series tasks, but high-quality pretraining data with multivariate dependencies and informative covariates is scarce. To address this problem, we rely on synthetic time series data, generated by imposing multivariate structure on time series sampled from univariate base generators.
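
A cartoon of this data recipe: sample independent univariate base series, then impose cross-series structure with a random mixing matrix. The generator below is purely illustrative; the actual synthetic-data pipeline is more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

def base_series(length: int) -> np.ndarray:
    """A toy univariate generator: trend + seasonality + noise."""
    t = np.arange(length)
    trend = rng.normal(0, 0.01) * t
    season = rng.uniform(0.5, 2.0) * np.sin(2 * np.pi * t / rng.integers(12, 48))
    return trend + season + rng.normal(0, 0.1, length)

def synthetic_multivariate(n_series: int, length: int) -> np.ndarray:
    """Impose multivariate structure by linearly mixing independent base series."""
    latents = np.stack([base_series(length) for _ in range(n_series)])
    mixing = rng.normal(size=(n_series, n_series))  # random cross-series dependencies
    return mixing @ latents  # each output series is a blend of all latents

data = synthetic_multivariate(n_series=4, length=256)  # shape (4, 256)
```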

Results of experiments on the fev-bench time series benchmark. The average win rate and skill score are computed with respect to the scaled quantile loss (SQL), which evaluates probabilistic-forecasting performance. Higher values are better for both. Chronos-2 outperforms all existing pretrained models by a significant margin on this comprehensive benchmark, which includes univariate, multivariate, and covariate-informed forecasting tasks.

Chronos-2 results in univariate mode and the corresponding gains from in-context learning (ICL), shown as stacked bars, on the covariate subset of fev-bench. ICL delivers large gains on tasks with covariates, demonstrating Chronos-2’s ability to use covariates effectively via ICL. Besides Chronos-2, only TabPFN-TS and COSMIC support covariates, and Chronos-2 outperforms all baselines (including TabPFN-TS and COSMIC) by a wide margin.

Results on the GIFT-Eval time series benchmark: the average win rate and skill score in terms of (a) probabilistic and (b) point forecasting metrics. Higher values are better for both win rate and skill score. Chronos-2 surpasses the previously best-performing models, TimesFM-2.5 and TiRex.

Empirical evaluation confirms that Chronos-2 represents a leap in the capabilities of TSFMs. On the comprehensive time series benchmark fev-bench, which spans a wide range of forecasting tasks (univariate, multivariate, and covariate-informed), Chronos-2 outperforms existing TSFMs by a large margin. We see the largest gains on covariate-informed tasks, demonstrating the strength of Chronos-2 in this practically important setting. On the GIFT-Eval benchmark, Chronos-2 ranks first among pretrained models, and it significantly outperforms its predecessor, Chronos-Bolt, achieving a win rate of over 90% in head-to-head comparisons.

The ICL capabilities of Chronos-2 position it as a viable general-purpose forecasting model that can be used “as is” in production pipelines, significantly simplifying them. Chronos-2 is now available open source (links below). We invite researchers and practitioners to engage with Chronos-2 and join us in advancing research on time series foundation models.

Learn more:

Chronos-2 Technical Report
Chronos GitHub Repository
