CODES Benchmark
We introduce CODES, a benchmark for comprehensive evaluation of surrogate architectures for coupled
ODE systems. Besides standard metrics like mean squared error (MSE) and inference time, CODES
provides insights into surrogate behaviour across multiple dimensions like interpolation,
extrapolation, sparse data, uncertainty quantification and gradient correlation. The benchmark
emphasizes usability through features such as integrated parallel training, a web-based
configuration generator, and pre-implemented baseline models and datasets. Extensive documentation
ensures sustainability and provides the foundation for collaborative improvement. By offering a fair
and multi-faceted comparison, CODES helps researchers select the most suitable surrogate for their
specific dataset and application while deepening our understanding of surrogate learning behaviour.
Motivation
There are many efforts to use machine learning models ("surrogates") to replace the costly numerics
involved in solving coupled ODEs. But for the end user, it is not obvious how to choose the
right surrogate for a given task. Usually, the best choice depends on both the dataset and the
target application.
Dataset specifics - how "complex" is the dataset?
- How many samples are there?
- Are the trajectories very dynamic, or do they evolve rather slowly?
- How dense is the distribution of initial conditions?
- Is the data domain of interest well-covered by the domain of the training set?
Task requirements:
- What is the required accuracy?
- How important is inference time? Is the training time limited?
- Are there computational constraints (memory or processing power)?
- Is uncertainty estimation required (e.g. to replace uncertain predictions by numerics)?
- How much predictive flexibility is required? Do we need to interpolate or extrapolate across
time?
Besides these practical considerations, one overarching question is always: Does the model only learn
the data, or does it "understand" something about the underlying dynamics?
Goals
This benchmark aims to aid in choosing the best surrogate model for the task at hand and additionally
to shed some light on the above questions.
To achieve this, a selection of surrogate models is implemented in this repository. They can be
trained on one of the included datasets or a custom dataset and then benchmarked on the
corresponding test dataset.
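To make the data side concrete, below is a minimal, self-contained sketch of what such a dataset can look like: many trajectories of a small coupled ODE system (Lotka-Volterra, purely as an illustration), each integrated from a different initial condition. The array layout `(n_samples, n_timesteps, n_quantities)` is an assumption for this sketch only; the format CODES actually expects for custom datasets is described on the Docs page.

```python
# Illustrative only: a toy "coupled ODE dataset" built with SciPy.
# The (n_samples, n_timesteps, n_quantities) layout is an assumption for
# this sketch, not necessarily the exact format CODES uses internally.
import numpy as np
from scipy.integrate import solve_ivp


def lotka_volterra(t, y, a=1.0, b=0.4, c=0.4, d=0.1):
    """Right-hand side of a small coupled ODE system (Lotka-Volterra)."""
    prey, predator = y
    return [a * prey - b * prey * predator, d * prey * predator - c * predator]


rng = np.random.default_rng(0)
t_eval = np.linspace(0.0, 20.0, 101)                      # shared time grid
initial_conditions = rng.uniform(0.5, 2.0, size=(64, 2))  # 64 samples, 2 coupled quantities

trajectories = np.stack([
    solve_ivp(lotka_volterra, (t_eval[0], t_eval[-1]), y0, t_eval=t_eval).y.T
    for y0 in initial_conditions
])
print(trajectories.shape)  # (64, 101, 2) -> (n_samples, n_timesteps, n_quantities)
```

A surrogate would then be trained to reproduce such trajectories from their initial conditions, and the metrics below quantify how well it does so on the held-out test set.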
Some of the metrics included in the benchmark (there are many more!):
- Absolute and relative error of the models (a generic sketch of these follows below the list).
- Inference time.
- Number of trainable parameters.
- Memory requirements (WIP).
- Predictive uncertainty.
- Pearson correlation coefficients.
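As a point of reference, here is a generic NumPy sketch of the first two metrics. These are illustrative definitions only; the exact formulas, normalisations, and epsilon handling used inside CODES may differ.

```python
# Generic error metrics on arrays of shape (n_samples, n_timesteps, n_quantities).
# Illustrative definitions; the formulas used inside CODES may differ in detail.
import numpy as np


def absolute_error(pred: np.ndarray, true: np.ndarray) -> np.ndarray:
    """Element-wise absolute error."""
    return np.abs(pred - true)


def relative_error(pred: np.ndarray, true: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Element-wise relative error, guarded against division by zero."""
    return np.abs(pred - true) / (np.abs(true) + eps)


# Dummy example:
true = np.random.rand(8, 101, 2)
pred = true + 0.01 * np.random.randn(8, 101, 2)
print(absolute_error(pred, true).mean(), relative_error(pred, true).mean())
```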
Besides this, there are plenty of plots and visualisations providing insights into
the models' behaviour:
- Error distributions - per model, across time or per quantity.
- Insights into interpolation and extrapolation across time.
- Behaviour when training with sparse data or varying batch size.
- Predictions with uncertainty and predictive uncertainty across time.
- Correlations between the prediction error and either the predictive uncertainty or the dynamics
(gradients) of the data (a minimal sketch of the gradient correlation follows below this list).
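The gradient-correlation diagnostic mentioned in the last item can be sketched in a few lines: correlate the magnitude of the data's time gradient with the absolute prediction error. The snippet below is a minimal NumPy/SciPy illustration of the idea on dummy data; the normalisation and aggregation choices in CODES itself may differ.

```python
# Minimal sketch of the gradient-correlation diagnostic; not the CODES implementation.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
true = np.cumsum(rng.normal(size=(8, 101, 2)), axis=1)   # dummy trajectories
pred = true + 0.05 * rng.normal(size=true.shape)         # dummy surrogate predictions

gradients = np.abs(np.gradient(true, axis=1))            # |d(data)/dt| along the time axis
errors = np.abs(pred - true)                             # absolute prediction error

r, p = pearsonr(gradients.ravel(), errors.ravel())
print(f"Pearson r between |gradient| and error: {r:.3f} (p = {p:.1e})")
```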
Some prime use-cases of the benchmark are:
- Finding the best-performing surrogate on a dataset. Here, best-performing could mean high
accuracy, low inference times or any other metric of interest (e.g. most accurate uncertainty
estimates, ...).
- Comparing the performance of a novel surrogate architecture against the implemented baseline models.
- Gaining insights into a dataset or comparing datasets using the built-in dataset insights.
On This Website
This website should be a helpful guide to using CODES. Here is what can be found where (all links are in the header):
- Overview. You are here :)
- Benchmark. If you want a thorough overview of the structure of CODES and how to use it, this is the page to consult. It explains the structure and configuration without too much technical detail.
- Docs. The more detailed and technical guide on code structure, configuration, and how to add your own dataset or model to the benchmark.
- Results. An exemplary evaluation of a benchmark run. This page gives an overview of what to expect as output of the benchmark.
- Config Maker. A JavaScript-based tool that helps you set up the benchmark configuration. It generates the required config.yaml, which can then be downloaded.
- GitHub. Link to the CODES GitHub repo.
- API Docs. Link to the technical API documentation, auto-generated with Sphinx.
- Paper. Link to the CODES paper on arXiv, which was accepted at ML4PS@NeurIPS 2024.