Don't be dogmatic about interpretability-by-design versus post-hoc
When to use which ML interpretation approach
Should machine learning models be interpretable by design or should we interpret complex models post-hoc, with tools such as permutation feature importance and SHAP?
Interpretability-by-design versus post-hoc interpretation is a divisive topic:
Rudin (2019) argues that we shouldn’t rely on post-hoc interpretation in high-stakes scenarios.
I know many statisticians who exclusively use inherently interpretable (statistical) models.
You rarely see interpretable-by-design models win Kaggle competitions; the winners often rely on post-hoc explanations instead.
Many companies rely on good old logistic regression and won’t easily switch to more complex models (+ post-hoc interpretation).
Can we resolve this conflict?
Is design vs. post-hoc a philosophical debate?
Is the interpretability debate like the Bayesian versus frequentist stats question? Unresolved, maybe hopelessly so? Bayesianism versus frequentism is a situation where you must decide for yourself what the true nature of probability is and then pick one of the two camps. If you’ve read Modeling Mindsets, you know what I mean.
For a long time, I thought about interpretability by design versus post-hoc interpretability as a philosophical question. But I now have a more fine-grained view, because I believe there are interpretation scenarios where you should pick one approach over the other.
What often gets too little attention in these debates are the actual goals we have in mind when “interpreting” a model. If you can specify the reason why you need interpretability, you can sometimes easily decide on one approach. The interpretability debate then shrinks to the few goals where a cultural divide between interpretability-by-design and post-hoc interpretability remains.
It seems obvious that you should have a goal in mind when picking an interpretation approach. I’ve been super-focused on interpretable ML for over 6 years now, but my focus has been more method-centric. For example, in my book Interpretable Machine Learning, most chapters represent one interpretation method. My views changed thanks to a discussion with Timo, with whom I work on the book “Supervised Machine Learning for Science”.
Let’s make it more concrete: How do your goals relate to interpretability-by-design versus post-hoc interpretability?
Pick interpretation approaches based on your goals
Let’s talk more about some of the reasons why people use interpretability in the first place.
Debug and improve the model: I’m currently participating in an ML competition, and I’m using SHAP and permutation feature importance to find modeling errors such as target leakage and to improve the model with, for example, better-engineered features (see the sketch after this list).
Justify the predictions in high-stakes scenarios: ML is used in lots of places where maybe it shouldn’t be used at all, like risk assessment of criminal recidivism (e.g. COMPAS). Such high-stakes scenarios require a way to challenge the model outputs, which in turn requires understanding how the decision was made.
Extract insights from the model about the world: Many researchers already rely on machine learning + interpretability in their work. A good example is this Almond Yield Prediction Paper using random forests and other ML models in combination with post-hoc interpretation methods such as partial dependence plots and permutation feature importance.
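To make the debugging goal a bit more concrete, here is a minimal sketch of how SHAP can surface a suspicious feature. The dataset and model are stand-ins, not my actual competition setup, and the shap package is assumed to be installed.

```python
# Minimal sketch: inspect SHAP values to spot suspicious features.
# Dataset and model are placeholders, not the actual competition setup.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree-based models.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# A feature that dominates the summary plot far more than domain knowledge
# suggests is a candidate for target leakage.
shap.summary_plot(shap_values, X)
```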
These are 3 very different interpretation goals with different demands on interpretability, especially regarding the design vs. post-hoc decision.
Model debugging and improvement is probably the greatest interpretability playground because many approaches can help here. You can make the argument that model-agnostic post-hoc interpretation methods such as permutation feature importance and SHAP are best here because you can compare them across models. But interpretable models such as linear regression can also help you improve other, more complex models.
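As a sketch of that cross-model comparison, the snippet below applies the same model-agnostic tool, permutation feature importance, to a simple and a more complex model so their feature rankings can be compared side by side. The dataset and the two models are placeholders.

```python
# Minimal sketch: compare feature rankings across models with one
# model-agnostic tool (permutation feature importance).
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A simple and a more complex model, inspected with the same tool.
models = {
    "linear": LinearRegression().fit(X_train, y_train),
    "boosting": GradientBoostingRegressor(random_state=0).fit(X_train, y_train),
}

for name, model in models.items():
    result = permutation_importance(
        model, X_test, y_test, n_repeats=10, random_state=0
    )
    ranking = np.argsort(result.importances_mean)[::-1]
    print(name)
    for i in ranking[:3]:
        # Large disagreements between the rankings, or one feature that
        # dominates both models, are good starting points for debugging.
        print(f"  {X.columns[i]}: {result.importances_mean[i]:.3f}")
```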
In the case of high-stakes decision-making, I agree with Rudin (2019) that our best shot is inherently interpretable models. The reason: that’s the only way to produce explanations that are 100% faithful to how the model actually works. Post-hoc interpretations are insufficient because they usually simplify the model’s relationships.
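As an illustration (not Rudin’s specific proposal), here is a minimal sketch of an interpretable-by-design model: a sparse logistic regression whose non-zero coefficients are the complete decision rule. The dataset is a stand-in; a real high-stakes application would require far more care with features, fairness, and validation.

```python
# Minimal sketch of an interpretable-by-design model: the explanation
# is the model itself (a sparse logistic regression).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# L1 regularization keeps the model sparse and therefore easier to inspect.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
model.fit(X, y)

coefs = model.named_steps["logisticregression"].coef_[0]
for feature, coef in zip(X.columns, coefs):
    if coef != 0:
        # These coefficients ARE the decision rule; no approximation involved.
        print(f"{feature}: {coef:+.2f}")
```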
The third case about extracting insights has a cultural divide. In classic statistical modeling, statisticians assume that the interpretable model reflects the data-generating process, interpret the model (coefficients), and extend this interpretation to the world. That’s how most quantitative science works at the moment. What happens when you use more complex machine learning models and interpret them post-hoc? Timo and I have argued in Freiesleben et al. (2022) and Molnar et al. (2021) that there is a justification for using the best-performing model (with constraints) and a post-hoc interpretation that may be extended to the “world” under certain circumstances.
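Here is a minimal sketch of that insights workflow, with a stand-in dataset rather than the almond-yield data: fit a well-performing model, then inspect it post-hoc with partial dependence.

```python
# Minimal sketch: fit a well-performing model, then extract insights
# post hoc with partial dependence plots. Dataset and features are placeholders.
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Partial dependence shows the average predicted outcome as a feature varies;
# extending this to a claim about the world requires the assumptions discussed
# in the papers above (e.g., good performance, limited feature correlation).
PartialDependenceDisplay.from_estimator(model, X, features=["MedInc", "AveRooms"])
plt.show()
```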
To summarize: Think of why you need interpretability before you pick an approach.
I would be happy to hear about your interpretation goals and use cases, either in the comments on Substack or via e-mail.
Picking an interpretation approach that matches your goal and modeling task constraints (data types, correlation, etc.) is difficult. So if there’s enough interest, I’ll write about how to pick the best interpretation tool for each goal in a follow-up post. Just let me know!
Great summary post!
From a business practitioner perspective, I may add 'justify investment and increase the chance of business implementation success'
Some companies are complicated environments with many stakeholders who have different levels of knowledge and different fears. To bring a model from idea to production, it can help to have interpretability-by-design, or at least interpretability-during-design, to align experts and drive commitment to change.
Does this make sense?
I really enjoyed reading this. You've identified a real failing in the way machine learning interpretability is taught, where the big-picture question of "what is the goal?" is an afterthought or even ignored. Far too often people produce an importance chart and call it a day.
I'm looking forward to the follow-up posts. I hope you can also bring "which data to use" together with these posts on "which tool to use" because I often find the former question even more difficult than the latter, at least in my work. Currently I'm trying to design anti-discrimination tests that could be applied to machine learning models.
Questions on the COMPAS algorithm: is the issue about ML, or simply that the algorithm isn't transparent? My understanding is that protecting IP is the real reason for the lack of transparency, and the algorithm used has little to do with it. If they released the code of their algorithm to the public, people like yourself could approximately (if not perfectly) understand how it's making decisions, but then you could also reverse-engineer it and sell it yourself.
I skimmed Rudin (2019) and I think that's what she's saying, although I'm not sure I agree that drastically simplifying the model to gain simpler interpretability is always going to be worth the trade-off, even in "high-stakes" situations. I'm OK with 50-variable ML models, as long as some good explainability is also provided.