Discussion about this post

Sebastian Müller

In neural networks, weight-averaging the five models can sometimes work quite well as an alternative to ensembling (especially if you use techniques like Git Re-Basin).
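
A minimal sketch of naive weight averaging in PyTorch, assuming all five models share exactly the same architecture (the function name here is made up for illustration). Plain averaging ignores permutation symmetries between networks, which is what Git Re-Basin tries to align before merging:

```python
import copy
import torch

def average_weights(models):
    """Return a new model whose floating-point parameters are the
    element-wise mean of the corresponding parameters in `models`.

    Assumes every model has the same architecture. Integer buffers
    (e.g. BatchNorm's num_batches_tracked) are left untouched.
    """
    avg_state = copy.deepcopy(models[0].state_dict())
    for key, value in avg_state.items():
        if value.dtype.is_floating_point:
            avg_state[key] = torch.stack(
                [m.state_dict()[key] for m in models]
            ).mean(dim=0)
    merged = copy.deepcopy(models[0])
    merged.load_state_dict(avg_state)
    return merged
```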

Keeping a held-out test set to pick the best method among the ones you presented can also help.
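
A tiny sketch of that selection step, with placeholder names (`candidates` maps a method name to a function that trains and returns a model): fit each candidate on the training split and score it on a test split that nothing else ever touched.

```python
def pick_best_method(candidates, X_train, y_train, X_test, y_test, metric_fn):
    """Train each candidate method and return the name of the one that
    scores best on the held-out test split, plus all the scores."""
    scores = {
        name: metric_fn(y_test, fit(X_train, y_train).predict(X_test))
        for name, fit in candidates.items()
    }
    return max(scores, key=scores.get), scores
```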

PT

In high-stakes situations, where small performance gains might mean a lot of money, here is what I did: instead of evaluating a single model, I would evaluate the whole training-and-validation strategy.

Suppose you are comparing inside out vs. parameter donation: you can simulate how those two strategies would perform out of sample and out of time by replaying your dataset over time, as in time-series validation: at each cutoff, apply each strategy to all the data available up to that point, then test the resulting model on the next period. You can break the time intervals by week, month, quarter, or year, depending on your problem.
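
A rough sketch of that kind of rolling backtest, with illustrative names: `train_fn` stands for an entire strategy (inside out, parameter donation, ...) applied end to end, not a single model.

```python
import numpy as np

def backtest_strategy(X, y, timestamps, cutoffs, train_fn, metric_fn):
    """Score a whole training strategy out of sample and out of time.

    For each pair of consecutive cutoffs, the strategy sees everything
    before the first cutoff and is evaluated on the interval between
    the two, mimicking how it would have been deployed at that point.
    """
    scores = []
    for start, end in zip(cutoffs[:-1], cutoffs[1:]):
        train = timestamps < start
        test = (timestamps >= start) & (timestamps < end)
        if not train.any() or not test.any():
            continue
        model = train_fn(X[train], y[train])  # the strategy, end to end
        scores.append(metric_fn(y[test], model.predict(X[test])))
    return np.array(scores)
```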

If parameter donation wins, then you can be confident it's the best strategy to use, even if you cannot make a meaningful claim about the test error! You can even quantify how much better you expect it to be than the other strategies. This lets you trade off the risk of uncertain test results against the potential performance gains of training with more or fresher data.
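
To put a number on that expectation, one simple option (again just a sketch) is to compare the per-period backtest scores of two strategies pairwise:

```python
import numpy as np

def expected_improvement(scores_a, scores_b):
    """Mean and standard error of the per-period gain of strategy B over A.

    Both inputs are one score per backtest period (e.g. from
    backtest_strategy above), so the comparison is paired.
    """
    diff = np.asarray(scores_b) - np.asarray(scores_a)
    return diff.mean(), diff.std(ddof=1) / np.sqrt(len(diff))
```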
