SHAP Is Not All You Need
Love this! I am a big fan of permutation importance.
For the first scenario, since the model was overfit, would SHAP be more useful on a validation set? That way you would at least see the model failing clearly on certain examples. Is it worthwhile comparing SHAP values for correct versus incorrect predictions?
(Obviously this doesn’t hold if there is some sort of distribution shift in production, etc.)
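To make the validation-set idea concrete, here is a minimal sketch of what I mean (the toy data, model choice, and variable names are my own assumptions, not from the post): compute SHAP values on held-out data rather than the training set, then compare the mean |SHAP| per feature between correct and incorrect predictions.

```python
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Toy data standing in for the post's setup.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Explain the *validation* set rather than the (possibly overfit) training set.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_val)  # shape: (n_samples, n_features)

# Split explanations by whether the prediction was correct.
correct = model.predict(X_val) == y_val
print("mean |SHAP|, correct:  ", np.abs(shap_values[correct]).mean(axis=0).round(3))
print("mean |SHAP|, incorrect:", np.abs(shap_values[~correct]).mean(axis=0).round(3))
```

If the two profiles diverge strongly for certain features, that would at least flag where the model's explanations break down on the examples it gets wrong.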
Haha 🙈 I totally get your rant. The two quotes speak for themselves...
Nice post. I am quite critical of Shapley values: I already know all the pitfalls of feature importance and similar methods, and other things seem problematic for Shapley values on top of that.
I tried out information gain for feature selection, but even there pitfalls appear when there are too many factors. The whole topic is still a bit blurry for me and quite hard to handle.
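For reference, this is a minimal sketch of the information-gain-style selection I tried, using scikit-learn's mutual information estimator (the toy data and the choice of k are my own illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Toy data: 20 features, only 5 of which are actually informative.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=0)

# Estimate mutual information between each feature and the target.
mi = mutual_info_classif(X, y, random_state=0)
print(mi.round(3))

# Keep the k highest-scoring features. With many factors the estimates
# get noisy and spurious features can sneak in, which is the pitfall
# mentioned above.
selector = SelectKBest(mutual_info_classif, k=5).fit(X, y)
X_selected = selector.transform(X)
print(X_selected.shape)  # (1000, 5)
```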
Great post highlighting the need to use and compare several XAI methods for data analysis.
Here is a blog post with my additional thoughts on this topic: https://medium.com/@jb.excoffier/pitfalls-when-computing-feature-importances-3f5b0e2c198c