Bridging the Gap: From Statistical Distributions to Machine Learning Loss Functions

When I consulted for researchers on which statistical analysis to use for their data, a common first step was to think about the distribution of the target variable. Is it a count, like the number of emails received within an hour? Poisson distribution it is. You can also mix, adapt, and customize your distribution.
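To make the distribution-to-loss step concrete, here is a minimal sketch of how the Poisson choice becomes a trainable loss, namely the Poisson negative log-likelihood (the helper `poisson_nll` and the toy numbers are illustrative, not from any particular library):

```python
import numpy as np

def poisson_nll(y, mu):
    """Poisson negative log-likelihood, up to the constant log(y!) term.

    Minimizing this over the predicted rates mu is equivalent to
    maximum-likelihood fitting of a Poisson model for count targets.
    """
    # Full NLL is mu - y*log(mu) + log(y!); log(y!) is constant in mu,
    # so it is dropped for optimization.
    return np.mean(mu - y * np.log(mu))

# Toy example: hourly email counts and the rates a model predicts for them.
y = np.array([3, 0, 7, 2])
mu = np.array([2.5, 0.8, 6.0, 2.2])  # predicted rates, must be positive
print(poisson_nll(y, mu))
```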
Thank you for this well-written and insightful article. Could you please point me to a reference that derives the connection between the data distribution and the loss function?
Proper scoring rules (see e.g. https://www.bundesbank.de/resource/blob/635562/7d3de0f3fc003e5b4864828143f268cf/mL/2012-06-01-eltville-11-gneiting-paper-data.pdf) are another useful concept related to the above: they link statistical functionals (mean, median, etc.) with loss functions. For example, forecasts of the conditional mean of a distribution can be assessed with a range of loss functions beyond mean squared error, each with different sensitivities to over- and under-prediction. One such function is the QLIKE loss, y/x - log(y/x) - 1 (x = forecast, y = observed), which is popular in the volatility-forecasting literature.
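A minimal sketch of the QLIKE loss exactly as defined in the comment above (the function name `qlike` and the toy values are illustrative):

```python
import numpy as np

def qlike(y, x):
    """QLIKE loss: y/x - log(y/x) - 1, with x = forecast, y = observed.

    It is zero exactly when x == y, and it penalizes under-prediction
    (x < y) more heavily than over-prediction at the same ratio.
    """
    r = y / x
    return np.mean(r - np.log(r) - 1)

# Toy check with positive values (e.g., realized vs. forecast variance).
y = np.array([1.2, 0.9, 2.0])
print(qlike(y, y))        # perfect forecast -> 0.0
print(qlike(y, 0.5 * y))  # under-prediction: r = 2,   loss ~ 0.307
print(qlike(y, 2.0 * y))  # over-prediction:  r = 0.5, loss ~ 0.193
```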
Something worth mentioning is that maximizing the likelihood assumes the prior distribution (that of the parameter) is uniform or irrelevant. By Bayes' theorem, the posterior is proportional to the product of the likelihood and the prior. Sometimes, when we know enough about the prior, it is worth maximizing the posterior instead, which can give regularisation out of the box. For example, in the linear regression problem, if we assume a Gaussian prior on the weights, we get the same loss plus an L2 regularisation term.
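To spell out that last step, here is the standard derivation (a sketch assuming Gaussian noise with variance sigma^2 and a zero-mean Gaussian prior with variance tau^2 on the weights w):

```latex
-\log p(w \mid y, X)
  = -\log p(y \mid X, w) - \log p(w) + \text{const}
  = \frac{1}{2\sigma^2}\lVert y - Xw \rVert^2
    + \frac{1}{2\tau^2}\lVert w \rVert^2 + \text{const}.
```

Minimizing this is exactly ridge regression: squared error plus an L2 penalty with weight lambda = sigma^2 / tau^2.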