You said something a bit self-contradictory at the end of your article: "I wouldn’t directly use feature importance measures for feature selection. We have enough methods to select non-performing features with feature selection. Feature importance like SHAP importance and others can, however, allow you to make qualitative decisions to remove a feature."

To me, the possibility of making qualitative decisions to remove a feature sounds very useful, and the suggestion that enough methods are already available for feature selection seems somewhat arbitrary. So it would be very interesting to understand why you don't recommend using feature importance measures for feature selection. Is it because they are somewhat model-specific? Is it because many importance measures are unreliable for strongly correlated features? Or is there something else that I can't think of right now?


To clarify a bit: the methods I'm referring to, like PFI or SHAP importance, are not designed for feature selection. One might be tempted to use PFI, for example, to kick out non-performing features. But PFI doesn't necessarily show you by how much the performance of the model would change if it were retrained without the feature.
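To make that gap concrete, here is a minimal sketch (not from the original thread) using scikit-learn on synthetic data. It puts permutation importance side by side with the "retrain without the feature" estimate; the dataset, model, and all variable names are illustrative assumptions, and the two numbers will generally not agree:

```python
# Hypothetical sketch: PFI (shuffle a column, keep the fitted model) vs.
# the retrain-and-drop estimate (refit the model without the column).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
base_mse = mean_squared_error(y_test, model.predict(X_test))

# PFI: the increase in error when one column is permuted, model unchanged.
pfi = permutation_importance(
    model, X_test, y_test, scoring="neg_mean_squared_error", random_state=0
)

# Retrain-and-drop: remove the column entirely and refit from scratch.
for j in range(X.shape[1]):
    reduced = RandomForestRegressor(random_state=0).fit(
        np.delete(X_train, j, axis=1), y_train
    )
    drop_mse = mean_squared_error(y_test, reduced.predict(np.delete(X_test, j, axis=1)))
    print(f"feature {j}: PFI={pfi.importances_mean[j]:.1f}, "
          f"retrain MSE increase={drop_mse - base_mse:.1f}")
```

With correlated features especially, a retrained model can compensate for the dropped column, so the retrain increase can be much smaller than what PFI suggests.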

In this case it makes more sense to use a feature selection wrapper, where feature selection becomes part of the optimization process.
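A wrapper in this sense could look like the following sketch (my own illustration, not from the thread), assuming scikit-learn's `SequentialFeatureSelector`; here the optimization target (cross-validated accuracy) is stated explicitly, rather than inferred from an importance score:

```python
# Hypothetical sketch of wrapper-style feature selection: candidate feature
# subsets are scored by cross-validated model performance.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=4,        # the quantified goal: keep 4 features
    direction="forward",           # greedily add the feature that helps most
    scoring="accuracy",
    cv=5,
)
selector.fit(X, y)
print(selector.get_support())      # boolean mask of the selected features
```

Because each candidate subset is evaluated by actually fitting the model, the selection directly optimizes the stated goal instead of relying on importance scores computed for a single fitted model.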

And to refine the "we have enough methods": first it should be clear why features are being selected (e.g. for performance, speed-up, ...) and how that goal is quantified. Then there is a large pool of feature selection methods that can direct the modeling process toward the specified goal(s). These tools are better suited to such goal-oriented feature selection than importance methods, which were designed for interpretation.


My apologies, I re-read your original post and feel like I largely missed its whole point with my question. But maybe this is the reason why the post was written in the first place: feature selection and feature importance are so tightly intertwined. Admittedly, I also had a bit of a secret agenda. In my understanding, if we know that our features are uncorrelated, then PFI probably *would* show us by how much the performance of the model would change if it were retrained without a feature. Is this correct, or would you need some additional, stronger assumptions? And would an average of absolute SHAP values for this feature on a validation dataset give you something similar in this case?
