How to Deal With Disagreeing Interpretations

Methods like SHAP, LIME, and permutation importance can disagree. A disagreement can hint at a problem, but it doesn't have to.

Christoph Molnar

Nov 15, 2022

SHAP, LIME, PFI, ... you can interpret ML models with many different methods.

It's all fun and games until two methods disagree. 😵‍💫

What can you do if LIME says X1 has a positive contribution, but SHAP says X1 has a negative effect?

A scientist, a robot, and a human-sized lime sit at a table in a library and have a heated argument — created with Stable Diffusion.

The Disagreement Problem

The disagreement problem was named and studied in this paper, coming out of the lab of Prof. Hima Lakkaraju. A lab worth following in the field of interpretable and trustworthy machine learning.

The paper authors studied attribution methods for explaining predictions, but the disagreement problem also applies to other methods.

The authors interviewed 25 data scientists. 22 used multiple interpretation methods and found disagreements:

Different top features (21/25)
Different ordering of features (18/25)
Different effect directions of top features (19/25)

It's a small study, but I found it quite insightful.
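The three kinds of disagreement listed above can be checked mechanically. Here is a minimal sketch, assuming each method returns one attribution value per feature. The function names, the choice of k, and the attribution values are made up for illustration and are not taken from the paper.

```python
def top_k(attributions, k):
    """Return the k feature names with the largest absolute attribution."""
    ranked = sorted(attributions, key=lambda name: abs(attributions[name]), reverse=True)
    return ranked[:k]


def compare(a, b, k=2):
    """Check two attribution dicts for the three kinds of disagreement."""
    top_a, top_b = top_k(a, k), top_k(b, k)
    shared = set(top_a) & set(top_b)
    return {
        # Do both methods pick the same set of top-k features?
        "same_top_features": set(top_a) == set(top_b),
        # Do they also rank those features in the same order?
        "same_ordering": top_a == top_b,
        # Do shared top features have the same effect direction (sign)?
        "same_directions": all((a[f] > 0) == (b[f] > 0) for f in shared),
    }


# Made-up attributions from two hypothetical methods for one prediction.
method_a = {"x1": 0.8, "x2": -0.5, "x3": 0.1}
method_b = {"x1": -0.6, "x2": -0.7, "x3": 0.2}

result = compare(method_a, method_b)
print(result)
```

In this toy case both methods agree on which two features matter most, but not on their order or on the direction of x1.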

2 Types of Disagreements

We have to distinguish 2 types of disagreements:

The methods should agree but don't.
The methods don't have to agree, because they target different aspects.

At worst you have a mixture of both problems.

Scenario 1: Methods should agree, but don't.

LIME is a good example: it builds a local surrogate model, for example a linear model, around the data point to be explained.

Then there are gradient-based methods (specific to models trained with gradients, like neural networks) that explain predictions using the gradient of the prediction with respect to the input features.

Under specific circumstances (small neighborhood), I'd expect LIME and gradient-based methods to agree on, for example, the direction of the effect a feature has on the prediction.

But as the paper showed, they often disagree.

One issue is robustness: LIME can even disagree with itself if computed multiple times. Interpretation methods, when estimated with data, are subject to uncertainty.

Another issue: Interpretation methods might only be expected to agree under specific circumstances.

Take LIME: Parameters like kernel width and binning of features have a big influence on the interpretation. Changing these parameters can move LIME far away from being comparable to gradient-based methods.
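To see how much the neighborhood matters, here is a minimal sketch of the idea. It is not the LIME library: I assume a one-feature toy model f(x) = sin(x), a fixed grid of perturbed points, a Gaussian kernel, and a weighted least-squares line as the local surrogate. All names and numbers are illustrative.

```python
import math


def f(x):
    """Toy 'black box' model with a single feature."""
    return math.sin(x)


def local_slope(x0, kernel_width, grid):
    """Slope of a kernel-weighted least-squares line fitted around x0."""
    weights = [math.exp(-((x - x0) ** 2) / (2 * kernel_width ** 2)) for x in grid]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, grid)) / total
    mean_y = sum(w * f(x) for w, x in zip(weights, grid)) / total
    cov = sum(w * (x - mean_x) * (f(x) - mean_y) for w, x in zip(weights, grid))
    var = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, grid))
    return cov / var


# Perturbed points: an evenly spaced grid on [-3, 3] (an assumption for determinism).
grid = [-3 + 0.01 * i for i in range(601)]
x0 = 2.0

# Gradient-style explanation: central finite difference at x0.
gradient = (f(x0 + 1e-5) - f(x0 - 1e-5)) / 2e-5

narrow = local_slope(x0, kernel_width=0.1, grid=grid)   # small neighborhood
wide = local_slope(x0, kernel_width=10.0, grid=grid)    # almost global fit

print(f"gradient: {gradient:.3f}, narrow kernel: {narrow:.3f}, wide kernel: {wide:.3f}")
```

With the narrow kernel the surrogate slope is negative and close to the gradient. With the wide kernel the surrogate mostly picks up the global upward trend and the sign flips, even though the model and the data point are the same.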

Scenario 2: Methods don't have to agree when they have different goals.

LIME and SHAP quantify different things, so their results have to be interpreted differently.

LIME builds a local model: the shape of the prediction function around the data point matters a lot. For SHAP not so much.

SHAP computes Shapley values, a game-theoretic solution for fairly attributing the outcome of a cooperative game to its players. While the SHAP authors showed equivalence to LIME under very specific circumstances, the two methods compute feature attributions in very different ways.

Disagreement in scenario 2 isn't because one of the methods is "wrong". Disagreement is because the two methods have different interpretation targets.
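A small sketch of how different targets lead to different signs. I assume a two-feature toy model and compute exact Shapley values by replacing "absent" features with a fixed baseline value. That is a simplification of what SHAP does (SHAP averages over background data), and the model, data point, and baseline are made up. The local slope stands in for what a very local surrogate or a gradient would report.

```python
from itertools import permutations


def model(x):
    """Toy model: quadratic in x1, linear in x2."""
    x1, x2 = x
    return (x1 - 2) ** 2 + x2


def shapley_values(predict, x, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for j in order:
            current[j] = x[j]          # feature j "joins the coalition"
            value = predict(current)
            phi[j] += value - previous  # marginal contribution of feature j
            previous = value
    return [p / len(orderings) for p in phi]


x = [3.0, 1.0]          # data point to explain (made up)
baseline = [0.0, 0.0]   # reference point (an assumption)

phi = shapley_values(model, x, baseline)

# Local view: slope of the prediction with respect to x1 at the data point.
eps = 1e-5
slope_x1 = (model([x[0] + eps, x[1]]) - model([x[0] - eps, x[1]])) / (2 * eps)

print("Shapley values:", phi)
print("Local slope for x1:", round(slope_x1, 3))
```

The Shapley value for x1 is negative: compared to the baseline, having x1 = 3 lowers the prediction. The local slope for x1 is positive: nudging x1 up from 3 raises the prediction. Neither is wrong, they answer different questions.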

How To Handle Disagreement

I recommend the following to handle disagreements:

Quantify robustness. Simplest way: compute the method multiple times and check how much the results vary.
Understand what each method quantifies
Or just decide once which interpretation method is the right one for your question and don’t compute multiple interpretations.
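Quantifying robustness by repetition could look like the sketch below. I use permutation feature importance on a toy setup because it only needs the standard library. The "trained" model, the data, and the number of repetitions are all assumptions for illustration.

```python
import random
import statistics


def model(row):
    """Stand-in for a trained model: relies strongly on x1, weakly on x2."""
    return 2.0 * row[0] + 0.1 * row[1]


def mse(rows, targets):
    return sum((model(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)


def permutation_importance(rows, targets, feature, rng):
    """Increase in MSE after shuffling one feature column."""
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return mse(permuted, targets) - mse(rows, targets)


# Made-up data: two standard normal features, target = model output plus noise.
data_rng = random.Random(0)
rows = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
targets = [model(r) + data_rng.gauss(0, 0.5) for r in rows]

# Repeat the computation with different seeds and summarize the spread.
summary = {}
for feature in (0, 1):
    runs = [
        permutation_importance(rows, targets, feature, random.Random(seed))
        for seed in range(20)
    ]
    summary[feature] = (statistics.mean(runs), statistics.stdev(runs))
    print(f"x{feature + 1}: mean={summary[feature][0]:.3f}, sd={summary[feature][1]:.3f}")
```

If the spread across repetitions is large compared to the difference between two methods, the "disagreement" may just be estimation noise.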

You can find introductions to many interpretation methods in my book Interpretable Machine Learning, available for free.

In Other News: Interpretation Cheat Sheets

It’s surprisingly difficult to find concise and correct advice on interpreting different interpretable models (like logistic regression) and methods such as SHAP.

So I’m experimenting with a new product: Interpretation Cheat Sheets.

My first cheat sheet is for interpreting the coefficients of logistic regression — the interpretation of odds, log odds, and logits just doesn’t come naturally and it saves a lot of headaches to have a concise interpretation cheat sheet. Including text templates for convenient and correct interpretation of coefficients.

I’d be happy to get feedback on the cheat sheet idea in general. Let me know what the next cheat sheet should be about to help you.

Mindful Modeler
