Log Odds or Probability

Which Space Should You Choose for Classifier Interpretation?

Christoph Molnar
Apr 25, 2023

What's the bigger difference?

A jump from 80.0% to 90.0%?
Or from 98.0% to 99.9%?

Well, that depends on which space you're asking the question in.

The obvious answer would be that the first jump is larger, because it's a difference of 0.100, while the other is only 0.019.

If you thought so too, then it's because you calculated the difference in probability space, which is also the space in which I presented the question.

But you can also answer the question in log odds or logit space.

The logit of a probability is: logit(p) = log(p/(1-p)). We also call this the log odds. You’ll find the term “log odds” more often in classic statistics and “logits” more often in machine learning, at least that’s my impression so far. And in the logit space, the second jump is larger! Because in logit space we have:

log(0.9/0.1) - log(0.8/0.2) ≈ 0.8

log(0.999/0.001) - log(0.98/0.02) ≈ 3

The second jump (0.980 -> 0.999) also remains larger if we look at the multiplicative difference instead of the additive difference:


log(0.9/0.1) / log(0.8/0.2) ≈ 1.58

and

log(0.999/0.001) / log(0.98/0.02) ≈ 1.77
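
If you want to check these numbers yourself, here's a quick verification in Python using scipy.special.logit (nothing here is specific to this post, just standard SciPy):

```python
# Quick numeric check of the two jumps in logit (log odds) space
from scipy.special import logit  # logit(p) = log(p / (1 - p))

# Additive differences on the log odds scale
print(logit(0.9) - logit(0.8))     # ≈ 0.81
print(logit(0.999) - logit(0.98))  # ≈ 3.02

# Multiplicative (ratio) differences
print(logit(0.9) / logit(0.8))     # ≈ 1.58
print(logit(0.999) / logit(0.98))  # ≈ 1.77
```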

But why is the second jump larger in logit space?

When probabilities are near 0 or 1, the logit function stretches out the values. So the logit amplifies small differences in probabilities close to 0 or close to 1.

[Figure: Logits versus probabilities. x-axis: probability, y-axis: logit. The curve is steep near 0, roughly linear in the middle, and steep again near 1. In green: the jump from 98.0% to 99.9%; in red: the jump from 80% to 90%.]


When logit versus probability matters

Many people probably had their first encounter with logits when using logistic regression, because to interpret the coefficients we have to use the odds, which are just p/(1-p).

The β in logistic regression can be interpreted as the additive change in the log odds when the feature value is increased by 1 unit.

I always forget the interpretation, which is why I created a cheat sheet.


That’s just how logistic regression works because it expresses the probability through the logistic function:

\(P(Y=1|X) = \frac{1}{1 + e^{-X\beta}}\)

To get the interpretation of β, you have to invert the logistic function, which gives you the logit.
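
Applying the logit to both sides makes this explicit: the model is linear on the log odds scale,

\(\text{logit}(P(Y=1|X)) = \log\frac{P(Y=1|X)}{1-P(Y=1|X)} = X\beta\)

so increasing a feature \(x_j\) by one unit adds \(\beta_j\) to the log odds, which is the same as multiplying the odds by \(e^{\beta_j}\).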

But you do have a choice: you can also interpret so-called marginal effects, where you make small changes to the input variables and observe the changes in the model output. Since for logistic regression the outputs are probabilities (at least they are between 0 and 1), the marginal effects show how feature changes influence the predicted probability instead of the log odds.
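
Here's a minimal sketch of both views with statsmodels; the data is simulated and all numbers are purely illustrative:

```python
# Minimal sketch: coefficients (log odds scale) vs. marginal effects
# (probability scale) for a logistic regression, on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
true_logit = 0.5 + 1.0 * X[:, 0] - 2.0 * X[:, 1]  # assumed true model
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

result = sm.Logit(y, sm.add_constant(X)).fit(disp=0)

print(result.params)                   # additive changes in the log odds
print(result.get_margeff().summary())  # average marginal effects on the probability
```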

But the distinction also matters for other interpretation techniques. When you use Shapley values on a classifier, you have the option to add a link function. Well, one is always used, but by default it's the identity link f(x) = x. If you pick the logit link instead, you get the interpretation on the log odds scale.

And that can make sense, especially since on the logit (log odds) scale, the feature effects are additive rather than multiplicative.

But it also means that feature effects that push the probabilities toward the extremes will get more emphasis.
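
Here's a minimal sketch of this choice with the shap library (assuming shap and scikit-learn are installed; the model and data are just placeholders):

```python
# Minimal sketch: Shapley values with identity vs. logit link, using
# shap's KernelExplainer. Model and data are illustrative placeholders.
import shap
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

f = lambda data: model.predict_proba(data)[:, 1]  # probability of class 1
background = shap.sample(X, 100)  # background data for the explainer

# Identity link: attributions sum to the predicted probability
explainer_prob = shap.KernelExplainer(f, background, link="identity")

# Logit link: attributions are on the log odds scale instead
explainer_logit = shap.KernelExplainer(f, background, link="logit")

print(explainer_prob.shap_values(X[:1]))
print(explainer_logit.shap_values(X[:1]))
```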

When to use which

Pro log odds:

  • Log odds are additive: feature effects combine by addition on the log odds scale, while on the odds scale they multiply. Since many explanation methods have a tendency to express the prediction as a linear sum, the log odds might be a more natural choice.

  • In many situations, a jump at the edges, like from 0.001 to 0.01, is more important than one in the middle, like from 0.5 to 0.6.

  • Log odds avoid the extreme values of 0 and 1: the logit maps probabilities onto the whole real line

Pro probabilities:

  • Intuitive interpretation: Probabilities represent the chance of an event occurring, which we humans can grasp more easily than anything on a logarithmic scale

  • Making decisions: The output of the classifier might be used in decision-making. Here it's often easier to work with probabilities to make a decision, calculate expected costs, and so on (see the sketch after this list)

  • Familiarity: Probabilities simply feel more natural to interpret, since we at least feel like we understand them better than log odds. Probabilities are something we hear about every day, like the probability of winning the lottery or the probability of getting cancer, but we usually don't have to work with log odds in daily life
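
To make the decision-making point concrete, here's a tiny sketch of turning a log odds output back into a probability for an expected-cost decision (all costs and numbers are made up):

```python
# Tiny sketch: converting log odds back to a probability to compare
# expected costs. All numbers are made up for illustration.
from scipy.special import expit  # inverse of the logit

log_odds = 1.5       # hypothetical classifier output on the log odds scale
p = expit(log_odds)  # back to a probability, here ≈ 0.82

cost_false_negative = 100.0  # assumed cost of missing a positive case
cost_false_positive = 10.0   # assumed cost of a false alarm

expected_cost_ignore = p * cost_false_negative
expected_cost_act = (1 - p) * cost_false_positive
print("act" if expected_cost_act < expected_cost_ignore else "ignore")
```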

