5 Comments
Nov 14, 2023 · Liked by Christoph Molnar

You can also mix, adapt and customize your distribution.

author

Yes, that's true. But loss functions still give more flexibility, because they allow you to leave the realm of distributions.

Thank you for this well-written and insightful article. Could you please point me to a reference that derives the connection between the data distribution and the loss function?

Proper scoring rules are another useful concept related to the above: they link statistical functionals (mean, median, etc.) with loss functions (see e.g. https://www.bundesbank.de/resource/blob/635562/7d3de0f3fc003e5b4864828143f268cf/mL/2012-06-01-eltville-11-gneiting-paper-data.pdf). For example, forecasts of the conditional mean of the distribution can be assessed with a range of loss functions beyond the mean squared error, each with a different sensitivity to over- and underprediction. One such function is the QLIKE loss = y/x - log(y/x) - 1 (x = forecast, y = observed), which is popular in the volatility forecasting literature.
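
To make the QLIKE point concrete, here is a minimal sketch, not taken from the linked paper. It assumes simulated positive observations (a gamma sample chosen purely for illustration) and a constant forecast searched over a grid; the function names `qlike` and `squared_error` are made up for this example. It checks numerically that both losses are minimised, on average, near the sample mean, while QLIKE treats over- and underprediction differently.

```python
import numpy as np


def qlike(forecast, observed):
    """QLIKE loss y/x - log(y/x) - 1 with x = forecast, y = observed (both positive)."""
    ratio = observed / forecast
    return ratio - np.log(ratio) - 1.0


def squared_error(forecast, observed):
    """Squared error, shown for comparison."""
    return (observed - forecast) ** 2


# Assumption: synthetic positive data, used only to illustrate the idea.
rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=1.5, size=5000)

# Evaluate the average loss of every constant forecast on a fine grid.
grid = np.linspace(0.5, 10.0, 1901)  # grid step of 0.005
mean_qlike = np.array([qlike(c, y).mean() for c in grid])
mean_se = np.array([squared_error(c, y).mean() for c in grid])

best_qlike = grid[mean_qlike.argmin()]
best_se = grid[mean_se.argmin()]

print("sample mean:       ", y.mean())
print("best under QLIKE:  ", best_qlike)
print("best under squared:", best_se)

# Asymmetry: forecasting half the observed value costs more
# than forecasting double the observed value.
print("under by factor 2:", qlike(1.0, 2.0))
print("over by factor 2: ", qlike(2.0, 1.0))
```

Both grid minimisers should land within one grid step of the sample mean, which is what it means for the two losses to target the same functional. The difference only shows up in how errors on either side of the mean are penalised.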

Something worth mentioning is that maximizing the likelihood implicitly assumes that the prior distribution (that of the parameters) is uniform or irrelevant. By Bayes' theorem, the posterior is proportional to the product of the likelihood and the prior. Sometimes, when we know enough about the prior, it's worth maximizing the posterior instead (MAP estimation), which can give regularisation out of the box. For example, in the linear regression problem, if we assume the prior distribution is Gaussian, then we get the same loss plus an L2 regularisation term.
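
A short sketch of that derivation, under assumptions that are mine rather than the article's: Gaussian noise with variance \sigma^2, a zero-mean isotropic Gaussian prior with variance \tau^2 on the weights w, and a design matrix X.

```latex
\begin{aligned}
y &= Xw + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I), \quad w \sim \mathcal{N}(0, \tau^2 I) \\
p(w \mid X, y) &\propto p(y \mid X, w)\, p(w) \\
-\log p(w \mid X, y) &= \frac{1}{2\sigma^2}\lVert y - Xw\rVert_2^2 + \frac{1}{2\tau^2}\lVert w\rVert_2^2 + \text{const} \\
\hat{w}_{\text{MAP}} &= \arg\min_w \; \lVert y - Xw\rVert_2^2 + \lambda \lVert w\rVert_2^2, \qquad \lambda = \frac{\sigma^2}{\tau^2}
\end{aligned}
```

As \tau^2 grows the prior flattens, \lambda goes to zero, and the MAP estimate approaches the maximum likelihood (least squares) estimate.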
