9 Comments

Love this, I am a big fan of permutation importance.

For the first scenario, since the model was overfit, is SHAP more useful with a validation set? That way you would at least see the model clearly fail on certain examples. Is it worthwhile to look at SHAP separately for correct versus incorrect predictions?

(Obviously this doesn’t hold if there is some sort of distribution shift in production, etc.)

If I remember correctly, the Shapley values were computed on a separate dataset not used for training. Since SHAP is tied to how much the model output varies, it shouldn't matter much whether the values are computed on training data or other data.

Haha 🙈 I totally get your rant. The two quotes speak for themselves...

Great post highlighting the need to use and compare several XAI methods for data analysis.

Here is a blog post with my additional thoughts on this topic: https://medium.com/@jb.excoffier/pitfalls-when-computing-feature-importances-3f5b0e2c198c

What's funny is that I got a paper rejected for the opposite reason: I studied the treeSHAP algorithm theoretically and did not compare it with more common approaches like PDP and PFI. In hindsight, I understand the rejection, since a lot of the theory behind SHAP can seem unnecessarily complicated and unjustified given existing methods. For instance, I was unable to justify to one reviewer why treeSHAP is "better" than PFI. But, as you discussed, SHAP and PFI do not even use the same "notion" of importance.

There is always reviewer number 2 demanding things that don't make sense ...

What are your thoughts on this process for utilizing SHAP for feature selection on a saturated model?

https://towardsdatascience.com/your-features-are-important-it-doesnt-mean-they-are-good-ff468ae2e3d4

I.e., throwing an entire feature set at a fit, then pulling out the noisy features? I have an extremely noisy dataset. When I investigated the features, I found that the relationships of those with the highest spearmanR/mutual info with the response can shift drastically, so I've tried to dissect those dependency shifts and find contrasts in other features between the positively and negatively correlated instances. So in essence, there are certainly interaction effects, but the data is so complex that they're not easily investigated. Thanks!

Error contribution analysis is closer to feature selection. SHAP importance doesn't tell you which features to keep. If two features are strongly correlated, the SHAP importance of one of them might be non-zero, yet if you remove that feature, the model's performance stays the same.
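
To make the correlated-features point concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the data, the plain least-squares model, and the attribution `coef * (x - mean)`, which corresponds to SHAP values for a linear model only under a feature-independence assumption. It is not the setup from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # x2 is a near-duplicate of x1
x3 = rng.normal(size=n)
y = x1 + x2 + x3 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
train, test = slice(0, 1000), slice(1000, None)

def fit_ols(X_train, y_train):
    """Ordinary least squares with an intercept as the last coefficient."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return coef

def r2(coef, X_eval, y_eval):
    """R^2 of the fitted linear model on the given data."""
    A = np.column_stack([X_eval, np.ones(len(X_eval))])
    resid = y_eval - A @ coef
    return 1.0 - np.mean(resid ** 2) / np.var(y_eval)

# Model with all three features.
coef_full = fit_ols(X[train], y[train])
r2_full = r2(coef_full, X[test], y[test])

# Model refit without x2 (column index 1).
keep = [0, 2]
coef_reduced = fit_ols(X[train][:, keep], y[train])
r2_without_x2 = r2(coef_reduced, X[test][:, keep], y[test])

# Mean absolute linear attribution of x2 in the full model.
attribution_x2 = np.mean(
    np.abs(coef_full[1] * (X[test][:, 1] - X[train][:, 1].mean()))
)

print(f"mean |attribution| of x2: {attribution_x2:.3f}")
print(f"R^2 with x2: {r2_full:.4f}, without x2: {r2_without_x2:.4f}")
```

In this toy setup the attribution for `x2` is clearly non-zero, while the test R^2 barely moves once `x2` is dropped, because `x1` carries almost the same information.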

For your case, one possibility is to look at the group importance of the features in question, either via hierarchical SHAP, as I explain in my book https://christophmolnar.com/books/shap/, or by using permutation feature importance and permuting the features within a block together.
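
As a rough sketch of the second option, permuting a block of features together could look like the following. The function name `grouped_permutation_importance`, the toy data, and the hard-coded `predict` function are all hypothetical and only there for illustration; the idea is simply that every column in a group is shuffled with the same row permutation, so the dependence inside the block is preserved while its link to the target is broken.

```python
import numpy as np

def grouped_permutation_importance(predict, X, y, groups, n_repeats=10, seed=0):
    """Mean increase in MSE when all columns of a group are permuted jointly.

    groups maps a group name to a list of column indices.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean((y - predict(X)) ** 2)
    importances = {}
    for name, cols in groups.items():
        increases = []
        for _ in range(n_repeats):
            perm = rng.permutation(len(X))
            X_perm = X.copy()
            # Same row permutation for every column in the block.
            X_perm[:, cols] = X[perm][:, cols]
            permuted_mse = np.mean((y - predict(X_perm)) ** 2)
            increases.append(permuted_mse - baseline)
        importances[name] = float(np.mean(increases))
    return importances

# Toy data: x1 and x2 are strongly correlated, x3 is independent.
rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + x3 + 0.1 * rng.normal(size=n)

def predict(X):
    """Stand-in for a fitted model (assumed for this example)."""
    return X[:, 0] + X[:, 1] + X[:, 2]

imp = grouped_permutation_importance(
    predict, X, y, groups={"x1_x2_block": [0, 1], "x3": [2]}
)
print(imp)
```

The block gets a single importance score, which avoids having to split credit between `x1` and `x2`. In practice this should be computed on held-out data with the actual fitted model.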

Nice post. I am quite critical of Shapley values: I already know the pitfalls of feature importance and similar methods, but Shapley values seem to have additional problems of their own.

I tried out information gain for feature selection, but even there I ran into pitfalls when there are too many factors. In fact, the whole topic is still a bit blurry for me and quite hard to handle.
