
Bayesian modeling from first principle and memes

Christoph Molnar
May 9, 2023

It took me a long time to understand Bayesian statistics. So many angles from which to approach it: Bayes' theorem, probability as a degree of belief, Bayesian updating, priors and posteriors, ... But my favorite angle is the following first principle:

> In Bayesian statistics, model parameters are random variables.

The "model" here can be a simple distribution. The mean of a distribution, the coefficient in logistic regression, the correlation coefficient – all these parameters are variables with a distribution.

Let's follow the implications of the parameters-are-variables premise to its full conclusion:



Parameters are variables.

Therefore, modeling means estimating P(θ|X), the distribution of the parameters θ given the data X.



But there is a problem with P(θ|X): it's unclear how parameters are distributed given the data. The inverse, P(X|θ), the distribution of the data given the parameters, is much more natural to estimate.


Fortunately, a mathematical "trick" can help: Bayes' theorem. The theorem inverts the conditioning, expressing P(θ|X) in terms of P(X|θ), which is the good old likelihood function.
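
Written out, with the same terms the post uses:

P(θ|X) = P(X|θ) · P(θ) / P(X)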



Bayes' theorem also involves P(θ), the prior distribution. That's why Bayesians must specify a parameter distribution BEFORE observing data.

Model estimation equals update from prior to posterior.
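
To make the prior-to-posterior update concrete, here is a minimal sketch in Python. The coin-flip setup and the Beta(2, 2) prior are my illustrative choices, not from the post; I use them because the Beta prior is conjugate to the Binomial likelihood, so in this special case the posterior has a closed form (in general it does not, which is where the next points come in).

```python
# Prior belief about a coin's heads probability θ: Beta(2, 2),
# a mild belief that θ is near 0.5.
a_prior, b_prior = 2, 2

# Observed data X: 10 flips, 7 heads.
n, heads = 10, 7

# Conjugacy gives the posterior in closed form:
# θ | X ~ Beta(a_prior + heads, b_prior + tails).
a_post = a_prior + heads
b_post = b_prior + (n - heads)

print("prior mean of θ:    ", a_prior / (a_prior + b_prior))  # 0.5
print("posterior mean of θ:", a_post / (a_post + b_post))     # 9/14 ≈ 0.64
```

The posterior mean lands between the prior belief (0.5) and the observed frequency (0.7): the data has updated, but not replaced, the prior.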


Bayes' theorem also involves the term P(X), called the evidence, which is usually infeasible to compute.


So Bayesians usually sample from the posterior instead, using techniques such as Markov chain Monte Carlo (MCMC). Sampling only requires the posterior up to a constant factor, so P(X) never has to be computed.

This makes Bayesian models a bit more computationally intensive to estimate.
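
To show what "sampling from the posterior" looks like in code, here is a minimal random-walk Metropolis sampler for the coin example above. This is a bare-bones illustrative sketch, not the post's code or a production sampler; note that it only ever evaluates the unnormalized posterior P(X|θ) · P(θ).

```python
import numpy as np

rng = np.random.default_rng(0)

# Same coin example: 10 flips, 7 heads, Beta(2, 2) prior.
n, heads = 10, 7
a, b = 2, 2

def log_unnormalized_posterior(theta):
    # log P(X|θ) + log P(θ), up to additive constants.
    # The evidence P(X) is never needed.
    if not 0 < theta < 1:
        return -np.inf
    log_lik = heads * np.log(theta) + (n - heads) * np.log(1 - theta)
    log_prior = (a - 1) * np.log(theta) + (b - 1) * np.log(1 - theta)
    return log_lik + log_prior

# Random-walk Metropolis: propose a small step around the current value,
# accept with probability min(1, posterior ratio).
samples, theta = [], 0.5
for _ in range(20_000):
    proposal = theta + rng.normal(scale=0.1)
    log_ratio = (log_unnormalized_posterior(proposal)
                 - log_unnormalized_posterior(theta))
    if np.log(rng.uniform()) < log_ratio:
        theta = proposal
    samples.append(theta)

print("posterior mean ≈", np.mean(samples[2_000:]))  # close to 9/14 ≈ 0.64
```

In practice you would reach for a dedicated library such as PyMC or Stan, which implement far more efficient samplers, but the loop above is the whole idea.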


From this first-principles perspective, Bayesian statistics made a lot more sense to me. It becomes clear why Bayesians have to work with a prior distribution, why estimation can be time-consuming, and why we need Bayes' theorem.

Bayesian statistics is not just a method; it's a mindset that tells you how to model the world with data. But when I learned about it, I always felt too close to the mathematics and the models, so it took me a long time to understand the big principles behind Bayesian stats.

Bayesian stats is not the only "mindset": there are also frequentist stats, causal inference, supervised machine learning, and many more. That's why I wrote Modeling Mindsets: to understand the big principles without getting lost in the details.

Get Modeling Mindsets

If you've already read "Modeling Mindsets", I would be extremely grateful if you could take a moment to leave a review on Amazon. Your honest feedback helps other readers make informed decisions about the book, and it also helps me as an author.


5 Comments
Sairam Sundaresan (writes Gradient Ascent) · May 9 · Liked by Christoph Molnar

I don't think I've laughed this much reading an article on Bayesian modeling. This was such a fun read!

Benedict Neo (writes Neo) · May 9, edited · Liked by Christoph Molnar

Thanks for this article! I'm taking a Bayesian statistics course and would like to clarify something. Is it true that P(X) is the normalizing constant, that in practice we can just take the numerator P(X|θ) P(θ), which is proportional to the posterior distribution, and that MCMC is used to sample from the posterior even when its normalized form is unknown? So there is no need to deal with P(X)?

Expand full comment
Reply
Share
1 reply by Christoph Molnar
3 more comments...
Top
New
Community

No posts

Ready for more?

© 2023 Christoph Molnar
Privacy ∙ Terms ∙ Collection notice
Start WritingGet the app
Substack is the home for great writing