Closed
Description
The `mu_product` field in the NAdam state is never initialized and therefore defaults to zero. See here the initialization of the state (I also searched the rest of the code and there was no other initialization):
By definition, `mu_product` is only ever updated by multiplying it by a new factor at each step, so if it starts at 0 it will always remain 0.
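To illustrate the point, here is a minimal sketch in plain Python. It assumes the momentum schedule that PyTorch's NAdam uses (`beta1 * (1 - 0.5 * 0.96 ** (step * momentum_decay))`) with its default hyperparameters. The function names `mu_schedule` and `run_mu_product` are hypothetical and are not taken from the library this issue is about.

```python
def mu_schedule(step, beta1=0.9, momentum_decay=0.004):
    """Momentum coefficient mu_t, assuming PyTorch's NAdam formulation."""
    return beta1 * (1.0 - 0.5 * 0.96 ** (step * momentum_decay))


def run_mu_product(initial, steps=5):
    """Fold mu_1..mu_steps into a running product that starts at `initial`."""
    mu_product = initial
    for step in range(1, steps + 1):
        mu_product *= mu_schedule(step)
    return mu_product


# Starting from 0 the product can never leave 0; starting from 1 it tracks
# the actual product of the momentum coefficients.
print("init 0:", run_mu_product(0.0))
print("init 1:", run_mu_product(1.0))
```

With a zero start the value stays exactly 0 regardless of the number of steps, so any term that depends on `1 - mu_product` sees a constant 1 instead of the intended correction.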
In the PyTorch documentation, you can see that in the source code for NAdam, the `_init_group` function initializes `mu_product` to 1: