Description
Hello!
I came across texar-pytorch while in the process of writing my own version of the Executor, and I'm really happy that someone has already done the work for it, so first of all, thanks for the excellent repository.
Broad Query
One of the main questions I have pertains to the design requirement that the loss be included as an item in the dictionary returned by the model's forward method. Essentially, I'm wondering what the recommended pattern is for including terms in the loss (e.g. regularization terms) that should not be part of the validation loss. Conceptually, I think of the forward pass as being complete once the model has made its predictions, which is what I believe the predict() method is for. Making the computation of the loss a responsibility of the forward pass, however, can lead to certain problems.
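To make the question concrete, here is a rough sketch of how I currently read the expected interface. The class and method bodies below are my own toy example, not actual texar-pytorch code:

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    """Toy classifier; names and structure are illustrative assumptions."""

    def __init__(self, in_dim: int, n_classes: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, n_classes)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, features: torch.Tensor, labels: torch.Tensor):
        logits = self.linear(features)
        # The Executor expects the loss to be part of this returned dict,
        # which is the design choice I'm asking about.
        return {"loss": self.loss_fn(logits, labels),
                "preds": logits.argmax(dim=-1)}

    def predict(self, features: torch.Tensor):
        # Pure inference: no loss computation here.
        return {"preds": self.linear(features).argmax(dim=-1)}
```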
Context
Ideally, the training loss ought to be computed in a separate forward pass after the training epoch is completed. I'm aware that most people use an average over the training batches as an approximation of the training loss. However, this becomes an issue when comparing against a validation loss curve, where the difference between the training and validation curves indicates generalization error. This is for two reasons in the typical case:
- The model changes at the end of each batch when the optimizer takes a step, so it's an unfair comparison against the evaluation setting. The model might also overfit on a single batch.
- The model might have dropout and batch norm turned on during training which behave differently during evaluation.
As far as I can tell, the training loss is computed within the training loop in the executor, as opposed to an additional forward pass over the training set with model.eval(). Is my understanding correct?
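For illustration, this is roughly what I would like to be able to run after each training epoch. It is a hypothetical helper written against the toy model above, not something I expect the Executor to already provide:

```python
import torch

@torch.no_grad()
def epoch_loss(model, data_loader, device="cpu"):
    """Recompute the average loss over a full dataset with the model in
    eval mode, so dropout/batch norm behave as at validation time and the
    parameters no longer change between batches."""
    model.eval()
    total, count = 0.0, 0
    for features, labels in data_loader:
        out = model(features.to(device), labels.to(device))
        total += out["loss"].item() * labels.size(0)
        count += labels.size(0)
    model.train()
    return total / count

# After each training epoch, both curves would then be comparable:
# train_loss = epoch_loss(model, train_loader)
# val_loss = epoch_loss(model, val_loader)
```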
On regularization
Typically, any regularization terms over the model parameters are added to the loss before the call to backward(). With the given interface, it seems like the right place to do this would be in the forward pass. However, I find it a little weird to make the model responsible for regularizing itself. I usually have a separate nn.Module subclass responsible for regularizing a model and dependency-inject the model into this class. That way I can swap out different regularizers without changing the model, in keeping with the Open-Closed principle from SOLID.
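As a sketch of the pattern I mean (assuming the model returns a dict with a "loss" entry, as in the toy example above; the wrapper name is hypothetical):

```python
import torch
import torch.nn as nn

class L2Regularizer(nn.Module):
    """The model is injected, and the regularization term is added outside
    of it, only while the wrapper is in training mode."""

    def __init__(self, model: nn.Module, weight: float = 1e-4):
        super().__init__()
        self.model = model
        self.weight = weight

    def forward(self, *args, **kwargs):
        out = self.model(*args, **kwargs)
        if self.training:
            # L2 penalty over all parameters; skipped in eval mode so the
            # validation loss stays free of auxiliary terms.
            penalty = sum(p.pow(2).sum() for p in self.model.parameters())
            out["loss"] = out["loss"] + self.weight * penalty
        return out

# Swapping regularizers then means wrapping the same model differently:
# regularized = L2Regularizer(MyModel(in_dim=128, n_classes=10), weight=1e-4)
```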
Could you please explain how to achieve these two things (loss computation over the training set after the training loop, and computing auxiliary loss terms outside the model) with the current setup of the executor? Both seem to be a direct consequence of requiring the model to compute the loss, which strikes me as a little problematic. These are currently the two main problems precluding my use of this otherwise awesome repo, so I'd appreciate any insight.
Thanks!