Clipping is currently applied after the optimizer step. The gradient-clipping code controlled by the `max-grad-norm` CLI parameter should run between the `backward()` and `step()` calls.
Node Prediction Code
```python
self.optimizer.zero_grad()

loss.backward()
rt_profiler.record('train_backward')

self.optimizer.step()
rt_profiler.record('train_step')

if max_grad_norm is not None:
    th.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm, grad_norm_type)
```
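For reference, a minimal standalone sketch of the corrected ordering: clip gradients after `backward()` but before `step()`, so the optimizer consumes the clipped gradients. The toy model, loss, and hyperparameter values here are illustrative stand-ins, not GraphStorm's actual trainer code.

```python
import torch as th

# Illustrative stand-ins for the trainer's model/optimizer and CLI values.
model = th.nn.Linear(4, 2)
optimizer = th.optim.SGD(model.parameters(), lr=0.1)
max_grad_norm, grad_norm_type = 1.0, 2.0

x = th.randn(8, 4)
loss = model(x).sum()

optimizer.zero_grad()
loss.backward()
if max_grad_norm is not None:
    # Clip BEFORE the optimizer step, so step() sees the clipped gradients.
    th.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm, grad_norm_type)
optimizer.step()
```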
graphstorm/python/graphstorm/trainer/np_trainer.py, lines 214 to 221 in f3a0636