Residual and non-residual variables and remappers for downscaling #1005

Open
yoel-zerah wants to merge 44 commits into ds-collab from ds-collab-residuals

Conversation

@yoel-zerah

@yoel-zerah yoel-zerah commented Mar 24, 2026

Description

This PR merges ds-collab-residuals into ds-collab. It brings two new features for downscaling:

  1. The possibility to train with residual and non-residual variables at the same time.
  2. The management of remappers for downscaling.

Training with residual and non-residual variables

For a given input x (not a forcing, and regardless of whether it is high_res or low_res), an output variable y and a target y_target:
Residual variables are variables for which the model's actual prediction y_res is the difference between the output y and the input x, and must match the target residual y_target - x:

y_res = y - x  => y = y_res + x

Residual variables require that the same field exists in both the input and output data.

Conversely, non-residual variables are variables that are a direct prediction y by the model, and attempt to match the target y_target directly. These variables can either exist in both input and output datasets or in the output dataset only.
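As an illustrative sketch (variable names, shapes, and the helper function are hypothetical, not the actual anemoi API), reconstructing the full output from a mixed batch of residual and direct-prediction variables could look like:

```python
import numpy as np

def apply_residual_split(x, y_pred, residual_mask):
    """Reconstruct full outputs from a mix of residual and direct predictions.

    x             : model input fields, shape (n_vars,)
    y_pred        : raw model output, shape (n_vars,)
    residual_mask : boolean array, True where the variable is residual

    For residual variables the model predicts y_res = y - x, so the full
    output is recovered as y = y_res + x; direct-prediction variables are
    passed through unchanged.
    """
    return np.where(residual_mask, y_pred + x, y_pred)

x = np.array([280.0, 0.0])      # e.g. [temperature, precipitation]
y_pred = np.array([1.5, 2.0])   # residual for temperature, direct for tp
mask = np.array([True, False])  # temperature is residual, tp is direct
full = apply_residual_split(x, y_pred, mask)
# full[0] == 281.5 (residual added back to the input), full[1] == 2.0 (direct)
```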

Note

Only output variables are concerned by the residual/non-residual distinction.

To specify whether variables are residual or not, two recipe fields are introduced: data.residual_fields and data.direct_prediction_fields. They behave as follows:

  • If both direct_prediction_fields and residual_fields are undefined (or set to null), all output variables are selected as non-residual.
  • If direct_prediction_fields: [] and residual_fields is undefined, then all variables are residual. Conversely, if direct_prediction_fields is undefined and residual_fields: [], then all variables are non-residual.
  • If residual_fields is undefined and direct_prediction_fields contains a list of variables that doesn't account for all of the output variables, the remaining variables are residual. Conversely, if direct_prediction_fields is undefined and residual_fields contains a list of variables that doesn't account for all of the output variables, the remaining variables are non-residual.
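As an illustration (the field names tp, t2m, etc. are placeholders), a recipe in which tp is a direct prediction and every other output variable is residual could look like:

```yaml
data:
  direct_prediction_fields: [tp]  # tp matches its target directly
  # residual_fields left undefined: all remaining output variables
  # (e.g. t2m) are treated as residuals
```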

Checks are performed, and the config will fail if:

  • direct_prediction_fields and residual_fields are both defined as empty lists
  • direct_prediction_fields and residual_fields have variables in common
  • the union of direct_prediction_fields and residual_fields differs from the set of output variables
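A minimal sketch of these validation rules (not the actual anemoi implementation; function and error messages are illustrative) might look like:

```python
def check_field_config(direct_fields, residual_fields, output_vars):
    """Validate the residual/direct split described above.

    direct_fields / residual_fields may be None (undefined) or a list;
    output_vars is the full set of output variable names.
    Raises ValueError on an invalid configuration.
    """
    # Check 1: both defined as empty lists is ambiguous and rejected
    if direct_fields == [] and residual_fields == []:
        raise ValueError("direct_prediction_fields and residual_fields are both empty lists")
    direct = set(direct_fields or [])
    residual = set(residual_fields or [])
    # Check 2: a variable cannot be both residual and direct
    if direct & residual:
        raise ValueError(f"variables in both lists: {direct & residual}")
    # Check 3: when both are given explicitly, together they must cover
    # exactly the output variables (otherwise the remainder is inferred)
    if direct_fields is not None and residual_fields is not None:
        if direct | residual != set(output_vars):
            raise ValueError("the union of both lists must equal the output variables")

# valid: tp is direct, residual_fields undefined so the rest is residual
check_field_config(["tp"], None, ["tp", "t2m"])
```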

Remappers for downscaling

Remappers are processors: they apply a transformation to the variables before they are fed to the model, and the inverse transformation when they are output by the model, just like a normalizer.
Like normalizers, remappers are defined in data.processors:

data:
  processors:
     normalizer:
        ...
     remapper: # first remapper
       ...
     remapper_2: # second remapper, etc. (each remapper needs its own key, since YAML mapping keys must be unique)
       ...

All processors (normalizer, remapper and imputer) are applied sequentially, in the order in which they are defined in the recipe (and in reverse order for the inverse transformation).
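The sequential forward / reversed inverse ordering can be sketched with a minimal toy pipeline (this is not the anemoi processor API, just an illustration of the round-trip property):

```python
class Processor:
    """Minimal processor with a forward transform and its inverse."""
    def __init__(self, forward, inverse):
        self.forward, self.inverse = forward, inverse

def apply_processors(processors, x):
    # Forward pass: apply processors in recipe order
    for p in processors:
        x = p.forward(x)
    return x

def invert_processors(processors, x):
    # Inverse pass: apply inverses in reverse order so the pipeline round-trips
    for p in reversed(processors):
        x = p.inverse(x)
    return x

normalizer = Processor(lambda x: x / 10.0, lambda x: x * 10.0)
shift = Processor(lambda x: x + 1.0, lambda x: x - 1.0)
pipeline = [normalizer, shift]

y = apply_processors(pipeline, 25.0)  # (25 / 10) + 1 = 3.5
x = invert_processors(pipeline, y)    # (3.5 - 1) * 10 = 25.0
```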

This PR adds the possibility for downscaling to use remappers via anemoi.models.preprocessing.ds_remapper:TopRemapper (inspired by anemoi.models.preprocessing.multi_dataset_normalizer:TopNormalizer).
Remappers are specified similarly to normalizers:

data:
  processors:
    remapper:
      _target_: anemoi.models.preprocessing.ds_remapper:TopRemapper
      _convert_: all
      config:
        default: "none"  # default remapper, applied to all variables unless they are listed in a method below
        boxcox: [tp]     # the boxcox mapping is applied to tp only
        method_kwargs:   # additional keyword arguments for mappers
          boxcox:        # the mapping for which a keyword argument is specified
            lambd: 0.1   # the kwarg and its value

The PR also introduces the possibility to supply kwargs to the mapping functions used in remappers, such as the Box-Cox parameter (this feature doesn't exist in main yet!).
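For illustration, a minimal Box-Cox forward/inverse pair with a lambd keyword argument (a standalone sketch; the actual mappers in ds_remapper may differ in details such as clipping) could be:

```python
import math

def boxcox(x, lambd=0.1):
    """Box-Cox transform: (x**lambd - 1) / lambd, or log(x) when lambd == 0."""
    if lambd == 0.0:
        return math.log(x)
    return (x ** lambd - 1.0) / lambd

def inverse_boxcox(y, lambd=0.1):
    """Inverse transform, so that inverse_boxcox(boxcox(x, l), l) == x."""
    if lambd == 0.0:
        return math.exp(y)
    return (lambd * y + 1.0) ** (1.0 / lambd)

# Round trip for a precipitation-like value with the lambd kwarg from the recipe
tp = 5.0
assert abs(inverse_boxcox(boxcox(tp, lambd=0.1), lambd=0.1) - tp) < 1e-9
```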

Other features

  • The behaviour of defaults for inference noise parameters (scheduler and sampler) has been updated:
    Hardcoded inference defaults are applied if nothing else is supplied; one of two sets of default parameters is selected depending on whether config.training.training_approach is set to probabilistic_low_noise or probabilistic_high_noise. Inference default parameters supplied in the training recipe take precedence over the hardcoded defaults. Finally, parameters supplied in the inference recipe take precedence over both.
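The precedence described above could be sketched like this (parameter names such as sigma_max are hypothetical; only the override order mirrors the text):

```python
# Hardcoded defaults, keyed by training approach (values are placeholders)
HARDCODED_DEFAULTS = {
    "probabilistic_low_noise": {"sigma_max": 10.0},
    "probabilistic_high_noise": {"sigma_max": 100.0},
}

def resolve_noise_params(training_approach, training_recipe=None, inference_recipe=None):
    """Later sources override earlier ones: hardcoded < training recipe < inference recipe."""
    params = dict(HARDCODED_DEFAULTS[training_approach])
    params.update(training_recipe or {})   # training recipe overrides hardcoded defaults
    params.update(inference_recipe or {})  # inference recipe overrides everything
    return params

assert resolve_noise_params("probabilistic_low_noise") == {"sigma_max": 10.0}
assert resolve_noise_params("probabilistic_high_noise", {"sigma_max": 50.0})["sigma_max"] == 50.0
assert resolve_noise_params("probabilistic_low_noise", {"sigma_max": 50.0}, {"sigma_max": 5.0})["sigma_max"] == 5.0
```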
  • Two new recipe fields have been added to allow training on a single training data sample (overfitting): config.dataloader.overfit_on_index and config.dataloader.overfit_on_date. When neither overfit_on_index nor overfit_on_date is defined (or both are set to null), nothing changes: training proceeds normally and iterates through the training dataset. When overfit_on_index is set to an integer, the dataloader is overridden and every training epoch consists of a single training date, whose index in the training dataset is overfit_on_index.
    When overfit_on_date is set to a date (format YYYY-MM-DDThh:mm:ss) and this date exists in the training dataset, the sample with this date is selected as the only sample to iterate on. overfit_on_date overrides whatever is set in overfit_on_index.
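For example, a recipe fragment pinning training to a single date (the date below is a placeholder) could look like:

```yaml
dataloader:
  overfit_on_date: "2020-01-01T00:00:00"  # must exist in the training dataset
  # overfit_on_index would be ignored here, since overfit_on_date takes precedence
```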

Backward compatibility

There is no breaking change in this PR, other than a behaviour change in variable selection:
In a ds-collab recipe, direct_prediction_fields and residual_fields would not be defined, and all variables would be treated as residuals. With ds-collab-residuals, the same recipe yields a different behaviour: all variables are non-residual.
As detailed above, to make all variables residual again in a ds-collab recipe, simply define config.data.direct_prediction_fields: [].


📚 Documentation preview 📚: https://anemoi-training--1005.org.readthedocs.build/en/1005/


📚 Documentation preview 📚: https://anemoi-graphs--1005.org.readthedocs.build/en/1005/


📚 Documentation preview 📚: https://anemoi-models--1005.org.readthedocs.build/en/1005/

@OpheliaMiralles
Contributor

OpheliaMiralles commented Mar 25, 2026

See #973 for kwargs in remappers (including boxcox); there might be some duplication here. It would be nice to run the pre-commit hooks so the diff is clean.

jakob-schloer and others added 3 commits April 2, 2026 01:46
With the new version of lightning (2.6.0) the default changes to
RichProgressBar if the rich package is available
(https://github.com/Lightning-AI/pytorch-lightning/releases/tag/2.6.0).
On HPCs with Slurm, where the output is written to a file, this does not
work.

This introduces a progress bar callback, which allows you to use the
different lightning progress bars or your custom progress bar. The
progress bar class can be set in the callbacks and will be instantiated,
e.g.
```
progress_bar:
  _target_: pytorch_lightning.callbacks.TQDMProgressBar
  refresh_rate: 1
```
If no `progress_bar` is set, or the instantiation fails, a default
`pytorch_lightning.callbacks.TQDMProgressBar` will be used. This also
ensures backwards compatibility with the configs.

Tests with different lightning versions (2.5.2, 2.5.5, 2.6.0), both training
interactively and with a Slurm job, worked well with the
`pytorch_lightning.callbacks.TQDMProgressBar`.

- Interactive debug run with RichProgressBar:
https://mlflow.ecmwf.int/#/experiments/395/runs/266ea7709a2a476e9337a60079f151b8/artifacts
- Interactive debug run with TQDMProgressBar:
https://mlflow.ecmwf.int/#/experiments/395/runs/b417e46603974cbc80823938434b1103/artifacts
- Slurm debug run with TQDMProgressBar:
https://mlflow.ecmwf.int/#/experiments/413/runs/8d2ecd0b1c68429b881fadc5db1bf51d/artifacts
- Slurm debug run with RichProgressBar:
https://mlflow.ecmwf.int/#/experiments/413/runs/d37e47b2ed814aacb607e6ccc7a5ec37/artifacts

**Note:** The `pytorch_lightning.callbacks.RichProgressBar` does not
work in Slurm jobs: the progress bar only shows after the end of
training.

***As a contributor to the Anemoi framework, please ensure that your
changes include unit tests, updates to any affected dependencies and
documentation, and have been tested in a parallel setting (i.e., with
multiple GPUs). As a reviewer, you are also responsible for verifying
these aspects and requesting changes if they are not adequately
addressed. For guidelines about those please refer to
https://anemoi.readthedocs.io/en/latest/***

By opening this pull request, I affirm that all authors agree to the
[Contributor License
Agreement.](https://github.com/ecmwf/codex/blob/main/Legal/contributor_license_agreement.md)

<!-- readthedocs-preview anemoi-training start -->
----
📚 Documentation preview 📚:
https://anemoi-training--739.org.readthedocs.build/en/739/

<!-- readthedocs-preview anemoi-training end -->

<!-- readthedocs-preview anemoi-graphs start -->
----
📚 Documentation preview 📚:
https://anemoi-graphs--739.org.readthedocs.build/en/739/

<!-- readthedocs-preview anemoi-graphs end -->

<!-- readthedocs-preview anemoi-models start -->
----
📚 Documentation preview 📚:
https://anemoi-models--739.org.readthedocs.build/en/739/

<!-- readthedocs-preview anemoi-models end -->

---------
Co-authored-by: anaprietonem <ana.prietonemesio@ecmwf.int>
@yoel-zerah
Author

yoel-zerah commented Apr 2, 2026

See #973 for kwargs in remappers (including boxcox); there might be some duplication here. It would be nice to run the pre-commit hooks so the diff is clean.

Hello,
I'm definitely expecting some duplication with feat(models)-atanh-transform, since #973 is based on the developments in this branch. However, this PR is not meant to be merged into main, but into another downscaling branch that doesn't yet have the mapper kwargs feature.

@elkir elkir added the ATS Approval Not Needed No approval needed by ATS label Apr 2, 2026
return x.pow_(lambd)


def inverse_power_transform(x, lambd=0.33, tangent_linear_above_one=False):

@Lun4m Lun4m Apr 4, 2026


The clip_negative argument is missing? It seems like, the way this is implemented, you need the same args for both the forward and backward transforms. Maybe it would make more sense to have a class for each mapper with forward and backward methods, so the arguments can be shared more explicitly? But that's probably outside the scope of this PR.


Projects

Status: To be triaged


6 participants