# Residual and non-residual variables and remappers for downscaling #1005

yoel-zerah wants to merge 44 commits into `ds-collab` from `ds-collab-residuals`
## Conversation
See #973 for kwargs in remapper (including boxcox); there might be some duplication here. It would be nice to run the pre-commit hooks so the diff is clean.
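For context, a minimal sketch of the Box-Cox transform family mentioned in this comment; the `lambd` shape parameter is exactly the kind of kwarg being discussed. This is illustrative only, not the anemoi implementation:

```python
import math

# Illustrative Box-Cox transform pair; `lambd` is the shape parameter
# that a kwargs mechanism in the remapper would supply.
def boxcox(x: float, lambd: float = 0.33) -> float:
    if lambd == 0.0:
        return math.log(x)
    return (x ** lambd - 1.0) / lambd

def inv_boxcox(y: float, lambd: float = 0.33) -> float:
    if lambd == 0.0:
        return math.exp(y)
    return (y * lambd + 1.0) ** (1.0 / lambd)
```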
With the new version of Lightning (2.6.0), the default changes to `RichProgressBar` if the `rich` package is available (https://github.com/Lightning-AI/pytorch-lightning/releases/tag/2.6.0). On HPCs with Slurm, where the output is written to a file, this does not work. This introduces a progress bar callback, which allows you to use the different Lightning progress bars or your own custom progress bar. The progress bar class can be set in the callbacks and will be instantiated, e.g.

```
progress_bar:
  _target_: pytorch_lightning.callbacks.TQDMProgressBar
  refresh_rate: 1
```

If no `progress_bar` is set, or the instantiation fails, a default `pytorch_lightning.callbacks.TQDMProgressBar` will be used. This also ensures backwards compatibility with existing configs. Tests with different Lightning versions (2.5.2, 2.5.5, 2.6.0), both training interactively and with a Slurm job, worked well with the `pytorch_lightning.callbacks.TQDMProgressBar`.

- Interactive debug run with RichProgressBar: https://mlflow.ecmwf.int/#/experiments/395/runs/266ea7709a2a476e9337a60079f151b8/artifacts
- Interactive debug run with TQDMProgressBar: https://mlflow.ecmwf.int/#/experiments/395/runs/b417e46603974cbc80823938434b1103/artifacts
- Slurm debug run with TQDMProgressBar: https://mlflow.ecmwf.int/#/experiments/413/runs/8d2ecd0b1c68429b881fadc5db1bf51d/artifacts
- Slurm debug run with RichProgressBar: https://mlflow.ecmwf.int/#/experiments/413/runs/d37e47b2ed814aacb607e6ccc7a5ec37/artifacts

**Note:** The `pytorch_lightning.callbacks.RichProgressBar` does not work in Slurm jobs; the progress bar only shows after the end of training.

Co-authored-by: anaprietonem <ana.prietonemesio@ecmwf.int>
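The fallback behaviour described above (instantiate the configured progress-bar class, otherwise use the default `TQDMProgressBar`) can be sketched as follows. This is a hypothetical illustration of the mechanism, not the actual callback code:

```python
import importlib

# Stand-in for the default pytorch_lightning.callbacks.TQDMProgressBar.
DEFAULT_BAR = object()

def instantiate_progress_bar(config: dict):
    """Instantiate the class named by config['progress_bar']['_target_'];
    fall back to the default if unset or if instantiation fails."""
    try:
        spec = dict(config["progress_bar"])
        module_path, cls_name = spec.pop("_target_").rsplit(".", 1)
        cls = getattr(importlib.import_module(module_path), cls_name)
        return cls(**spec)  # remaining keys (e.g. refresh_rate) become kwargs
    except Exception:
        return DEFAULT_BAR
```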
Hello,

(comment on the diff around `return x.pow_(lambd)` and `def inverse_power_transform(x, lambd=0.33, tangent_linear_above_one=False):`)

The `clip_negative` argument is missing? It seems like, the way this is implemented, you need the same args for both the forward and backward transforms? Maybe it would make more sense to have a class for each mapper, with forward and backward methods, so you can share the arguments more explicitly? But that's probably outside the scope of this PR.
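A minimal sketch of the class-based alternative suggested here, where the forward and backward transforms share `lambd` and `clip_negative` through the constructor. All names are assumptions for illustration, not the PR's code:

```python
# Sketch of the suggested design: one class per mapper, so the forward
# and backward transforms share their arguments explicitly.
class PowerMapper:
    def __init__(self, lambd: float = 0.33, clip_negative: bool = True):
        self.lambd = lambd
        self.clip_negative = clip_negative

    def forward(self, x: float) -> float:
        # Clip negatives before applying the power, if requested.
        if self.clip_negative:
            x = max(x, 0.0)
        return x ** self.lambd

    def backward(self, y: float) -> float:
        # Inverse power transform, reusing the shared lambd.
        return y ** (1.0 / self.lambd)
```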
## Description
This PR is for merging `ds-collab-residuals` into `ds-collab`. It brings two new features for downscaling:

### Training with residual and non-residual variables

For a given input `x` (not forcings, and ignoring whether `high_res` or `low_res`), an output variable `y` and a target `y_target`:

- **Residual variables** are variables for which the actual prediction `y_res` by the model is the difference between the output `y` and the input `x`, and must match the target residual `y_target - x`. Residual variables require that the same field exists in both the input and output data.
- **Non-residual variables**, conversely, are variables that are a direct prediction `y` by the model, and attempt to match the target `y_target` directly. These variables can exist either in both input and output datasets or in the output dataset only.

> **Note:** Residual or non-residual status concerns output variables.
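The distinction above can be summarised with a small sketch (illustrative only; `combine_output` is not a function from the codebase):

```python
# For a residual variable the model predicts y_res = y - x, so the final
# output is reconstructed as x + y_res; a non-residual (direct-prediction)
# variable is emitted as-is.
def combine_output(x: float, prediction: float, is_residual: bool) -> float:
    return x + prediction if is_residual else prediction
```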
To specify whether variables are residual or not, two recipe fields are introduced: `data.residual_fields` and `data.direct_prediction_fields`. This is how they behave:

- If `direct_prediction_fields` and `residual_fields` are undefined (or set to `null`), all output variables are selected as non-residual.
- If `direct_prediction_fields: []` and `residual_fields` is undefined, then all variables are residual. Conversely, if `direct_prediction_fields` is undefined and `residual_fields: []`, then all variables are non-residual.
- If `residual_fields` is undefined and `direct_prediction_fields` contains a list of variables that doesn't account for all of the output variables, the remaining variables are residual. Conversely, if `direct_prediction_fields` is undefined and `residual_fields` contains such a partial list, the remaining variables are non-residual.

Checks are performed, and the config will fail if:

- `direct_prediction_fields` and `residual_fields` are both defined as empty lists;
- `direct_prediction_fields` and `residual_fields` have variables in common;
- the union of `direct_prediction_fields` and `residual_fields` is different from the set of output variables.

### Remappers for downscaling
Remappers are processors: they apply a transformation to the variables before they are fed to the model, and the inverse transformation when they are output by the model, just like a normalizer.

Like normalizers, remappers are defined in `data.processors`. All processors (`normalizer`, `remapper` and `imputer`) are applied sequentially, in the same order as they are defined in the recipe (and in reverse order for the inverse transformation).

This PR adds the possibility for downscaling to use remappers with `anemoi.models.preprocessing.ds_remapper:TopRemapper` (inspired by `anemoi.models.preprocessing.multi_dataset_normalizer:TopNormalizer`).

Remappers are specified similarly to normalizers:
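The recipe snippet itself is not reproduced here, but the processor-ordering semantics described above (forward transforms in recipe order, inverse transforms in reverse order) can be sketched as follows. This is a toy illustration, not the anemoi processor code:

```python
# Each processor is modelled as a (forward, inverse) pair, e.g. a
# normalizer or a remapper. Forward transforms run in recipe order;
# inverse transforms run in reverse order, mirroring the text above.
class ProcessorChain:
    def __init__(self, processors):
        self.processors = processors

    def forward(self, x):
        for fwd, _ in self.processors:
            x = fwd(x)
        return x

    def inverse(self, y):
        for _, inv in reversed(self.processors):
            y = inv(y)
        return y

normalizer = (lambda x: x / 2.0, lambda x: x * 2.0)
remapper = (lambda x: x + 1.0, lambda x: x - 1.0)
chain = ProcessorChain([normalizer, remapper])
```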
The PR also introduces the possibility to supply kwargs to the mapping functions used in remappers, such as the Box-Cox parameter (this feature doesn't exist yet in `main`!).

### Other features

- **Inference defaults.** Hardcoded inference defaults are applied if nothing else is supplied; one of two sets of default parameters is selected depending on whether `config.training.training_approach` is set to `probabilistic_low_noise` or `probabilistic_high_noise`. If inference default parameters are supplied in the training recipe, they take precedence. Finally, if parameters are supplied in the inference recipe, those are used instead.
- **Overfitting on a single sample:** `config.dataloader.overfit_on_index` and `config.dataloader.overfit_on_date`. When neither `overfit_on_index` nor `overfit_on_date` is defined (or they are set to `null`), nothing changes: training proceeds normally and iterates through the training dataset. When `overfit_on_index` is set to an integer, the dataloader is overridden and all training epochs are made of a single training date, whose index in the training dataset is given by `overfit_on_index`. When `overfit_on_date` is set to a date (format `YYYY-MM-DDThh:mm:ss`), and this date exists in the training dataset, the sample with this date is selected as the only sample to iterate on. `overfit_on_date` overrides whatever is put into `overfit_on_index`.

### Backward compatibility
There is no breaking change in this PR, other than a behaviour change in variable selection:

In a `ds-collab` recipe, `direct_prediction_fields` and `residual_fields` wouldn't be defined, and all variables would be taken as residuals. On the contrary, with `ds-collab-residuals`, the same recipe yields a different behaviour: all variables are non-residual.

As detailed above, to make sure that all variables are residual again in a `ds-collab` recipe, just define `config.data.direct_prediction_fields: []`.

📚 Documentation preview 📚: https://anemoi-training--1005.org.readthedocs.build/en/1005/
📚 Documentation preview 📚: https://anemoi-graphs--1005.org.readthedocs.build/en/1005/
📚 Documentation preview 📚: https://anemoi-models--1005.org.readthedocs.build/en/1005/