# Residual and non-residual variables and remappers for downscaling #1005

yoel-zerah wants to merge 44 commits into `ds-collab` from `ds-collab-residuals`
## Conversation
See #973 for kwargs in remapper (including boxcox); there might be some duplication here. It would be nice to run the pre-commit hooks so the diff is clean.
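For context, a minimal sketch of the Box-Cox transform family mentioned in this comment; the `lambd` shape parameter is exactly the kind of kwarg being discussed. This is illustrative only, not the anemoi implementation:

```python
import math

# Illustrative Box-Cox transform pair; `lambd` is the shape parameter
# that a kwargs mechanism in the remapper would supply.
def boxcox(x: float, lambd: float = 0.33) -> float:
    if lambd == 0.0:
        return math.log(x)
    return (x ** lambd - 1.0) / lambd

def inv_boxcox(y: float, lambd: float = 0.33) -> float:
    if lambd == 0.0:
        return math.exp(y)
    return (y * lambd + 1.0) ** (1.0 / lambd)
```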
With the new version of Lightning (2.6.0), the default changes to `RichProgressBar` if the `rich` package is available (https://github.com/Lightning-AI/pytorch-lightning/releases/tag/2.6.0). On HPCs with Slurm, where the output is written to a file, this does not work. This introduces a progress bar callback, which allows you to use the different Lightning progress bars or your own custom progress bar. The progress bar class can be set in the callbacks and will be instantiated, e.g.

```
progress_bar:
  _target_: pytorch_lightning.callbacks.TQDMProgressBar
  refresh_rate: 1
```

If no `progress_bar` is set, or the instantiation fails, a default `pytorch_lightning.callbacks.TQDMProgressBar` will be used. This also ensures backwards compatibility with existing configs. Tests with different Lightning versions (2.5.2, 2.5.5, 2.6.0), both training interactively and with a Slurm job, worked well with the `pytorch_lightning.callbacks.TQDMProgressBar`.

- Interactive debug run with RichProgressBar: https://mlflow.ecmwf.int/#/experiments/395/runs/266ea7709a2a476e9337a60079f151b8/artifacts
- Interactive debug run with TQDMProgressBar: https://mlflow.ecmwf.int/#/experiments/395/runs/b417e46603974cbc80823938434b1103/artifacts
- Slurm debug run with TQDMProgressBar: https://mlflow.ecmwf.int/#/experiments/413/runs/8d2ecd0b1c68429b881fadc5db1bf51d/artifacts
- Slurm debug run with RichProgressBar: https://mlflow.ecmwf.int/#/experiments/413/runs/d37e47b2ed814aacb607e6ccc7a5ec37/artifacts

**Note:** The `pytorch_lightning.callbacks.RichProgressBar` does not work in Slurm jobs; the progress bar only shows after the end of training.

Co-authored-by: anaprietonem <ana.prietonemesio@ecmwf.int>
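The fallback behaviour described above (instantiate the configured progress-bar class, otherwise use the default `TQDMProgressBar`) can be sketched as follows. This is a hypothetical illustration of the mechanism, not the actual callback code:

```python
import importlib

# Stand-in for the default pytorch_lightning.callbacks.TQDMProgressBar.
DEFAULT_BAR = object()

def instantiate_progress_bar(config: dict):
    """Instantiate the class named by config['progress_bar']['_target_'];
    fall back to the default if unset or if instantiation fails."""
    try:
        spec = dict(config["progress_bar"])
        module_path, cls_name = spec.pop("_target_").rsplit(".", 1)
        cls = getattr(importlib.import_module(module_path), cls_name)
        return cls(**spec)  # remaining keys (e.g. refresh_rate) become kwargs
    except Exception:
        return DEFAULT_BAR
```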
Hello,

(comment on the diff around `return x.pow_(lambd)` and `def inverse_power_transform(x, lambd=0.33, tangent_linear_above_one=False):`)

The `clip_negative` argument is missing? It seems like, the way this is implemented, you need the same args for both the forward and backward transforms? Maybe it would make more sense to have a class for each mapper, with forward and backward methods, so you can share the arguments more explicitly? But that's probably outside the scope of this PR.
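A minimal sketch of the class-based alternative suggested here, where the forward and backward transforms share `lambd` and `clip_negative` through the constructor. All names are assumptions for illustration, not the PR's code:

```python
# Sketch of the suggested design: one class per mapper, so the forward
# and backward transforms share their arguments explicitly.
class PowerMapper:
    def __init__(self, lambd: float = 0.33, clip_negative: bool = True):
        self.lambd = lambd
        self.clip_negative = clip_negative

    def forward(self, x: float) -> float:
        # Clip negatives before applying the power, if requested.
        if self.clip_negative:
            x = max(x, 0.0)
        return x ** self.lambd

    def backward(self, y: float) -> float:
        # Inverse power transform, reusing the shared lambd.
        return y ** (1.0 / self.lambd)
```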
## Description
This PR is for merging `ds-collab-residuals` into `ds-collab`. It brings two new features for downscaling:

### Training with residual and non-residual variables

For a given input `x` (not forcings, and ignoring whether `high_res` or `low_res`), an output variable `y` and a target `y_target`:

- **Residual variables** are variables for which the actual prediction `y_res` by the model is the difference between the output `y` and the input `x`, and must match the target residual `y_target - x`. Residual variables require that the same field exists in both the input and output data.
- **Non-residual variables**, conversely, are variables that are a direct prediction `y` by the model, and attempt to match the target `y_target` directly. These variables can exist either in both input and output datasets or in the output dataset only.

> **Note:** Residual or non-residual status concerns output variables.
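The distinction above can be summarised with a small sketch (illustrative only; `combine_output` is not a function from the codebase):

```python
# For a residual variable the model predicts y_res = y - x, so the final
# output is reconstructed as x + y_res; a non-residual (direct-prediction)
# variable is emitted as-is.
def combine_output(x: float, prediction: float, is_residual: bool) -> float:
    return x + prediction if is_residual else prediction
```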
To specify whether variables are residual or not, two recipe fields are introduced: `data.residual_fields` and `data.direct_prediction_fields`. This is how they behave:

- If `direct_prediction_fields` and `residual_fields` are undefined (or set to `null`), all output variables are selected as non-residual.
- If `direct_prediction_fields: []` and `residual_fields` is undefined, then all variables are residual. Conversely, if `direct_prediction_fields` is undefined and `residual_fields: []`, then all variables are non-residual.
- If `residual_fields` is undefined and `direct_prediction_fields` contains a list of variables that doesn't account for all of the output variables, the remaining variables are residual. Conversely, if `direct_prediction_fields` is undefined and `residual_fields` contains such a partial list, the remaining variables are non-residual.

Checks are performed, and the config will fail if:

- `direct_prediction_fields` and `residual_fields` are both defined as empty lists;
- `direct_prediction_fields` and `residual_fields` have variables in common;
- the union of `direct_prediction_fields` and `residual_fields` is different from the set of output variables.

### Remappers for downscaling
Remappers are processors: they apply a transformation to the variables before they are fed to the model, and the inverse transformation when they are output by the model, just like a normalizer.

Like normalizers, remappers are defined in `data.processors`. All processors (`normalizer`, `remapper` and `imputer`) are applied sequentially, in the same order as they are defined in the recipe (and in reverse order for the inverse transformation).

This PR adds the possibility for downscaling to use remappers with `anemoi.models.preprocessing.ds_remapper:TopRemapper` (inspired by `anemoi.models.preprocessing.multi_dataset_normalizer:TopNormalizer`).

Remappers are specified similarly to normalizers:
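The recipe snippet itself is not reproduced here, but the processor-ordering semantics described above (forward transforms in recipe order, inverse transforms in reverse order) can be sketched as follows. This is a toy illustration, not the anemoi processor code:

```python
# Each processor is modelled as a (forward, inverse) pair, e.g. a
# normalizer or a remapper. Forward transforms run in recipe order;
# inverse transforms run in reverse order, mirroring the text above.
class ProcessorChain:
    def __init__(self, processors):
        self.processors = processors

    def forward(self, x):
        for fwd, _ in self.processors:
            x = fwd(x)
        return x

    def inverse(self, y):
        for _, inv in reversed(self.processors):
            y = inv(y)
        return y

normalizer = (lambda x: x / 2.0, lambda x: x * 2.0)
remapper = (lambda x: x + 1.0, lambda x: x - 1.0)
chain = ProcessorChain([normalizer, remapper])
```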
The PR also introduces the possibility to supply kwargs to the mapping functions used in remappers, such as the Box-Cox parameter (this feature doesn't exist yet in `main`!).

### Other features

- **Inference defaults.** Hardcoded inference defaults are applied if nothing else is supplied; one of two sets of default parameters is selected depending on whether `config.training.training_approach` is set to `probabilistic_low_noise` or `probabilistic_high_noise`. If inference default parameters are supplied in the training recipe, they take precedence. Finally, if parameters are supplied in the inference recipe, those are used instead.
- **Overfitting on a single sample:** `config.dataloader.overfit_on_index` and `config.dataloader.overfit_on_date`. When neither `overfit_on_index` nor `overfit_on_date` is defined (or they are set to `null`), nothing changes: training proceeds normally and iterates through the training dataset. When `overfit_on_index` is set to an integer, the dataloader is overridden and all training epochs are made of a single training date, whose index in the training dataset is given by `overfit_on_index`. When `overfit_on_date` is set to a date (format `YYYY-MM-DDThh:mm:ss`), and this date exists in the training dataset, the sample with this date is selected as the only sample to iterate on. `overfit_on_date` overrides whatever is put into `overfit_on_index`.

### Backward compatibility
There is no breaking change in this PR, other than a behaviour change in variable selection:

In a `ds-collab` recipe, `direct_prediction_fields` and `residual_fields` wouldn't be defined, and all variables would be taken as residuals. On the contrary, with `ds-collab-residuals`, the same recipe yields a different behaviour: all variables are non-residual.

As detailed above, to make sure that all variables are residual again in a `ds-collab` recipe, just define `config.data.direct_prediction_fields: []`.

📚 Documentation preview 📚: https://anemoi-training--1005.org.readthedocs.build/en/1005/
📚 Documentation preview 📚: https://anemoi-graphs--1005.org.readthedocs.build/en/1005/
📚 Documentation preview 📚: https://anemoi-models--1005.org.readthedocs.build/en/1005/