6 Things I Hate About SHAP as a Maintainer
A guest post by Tobias Pitters
Tobias is a SHAP maintainer and works at the XAI startup CloudExplain.
SHAP is one of the most widely used libraries for model explainability, and for good reason – it makes explaining complex models intuitive and even beautiful. I’ve been working on SHAP since October 2023 and became a maintainer in January 2024. In that time, I’ve contributed to many improvements and fixes, and I still love the power of the underlying math, how easy it is to integrate, and the amazing visualizations (I think that’s a big reason SHAP became so popular in the first place).
But being a maintainer also means seeing where things break, slow down, or get frustrating for users. After 1.5 years, I’ve gathered a list of pain points that I’d really like to see improved – both for the health of the project and for everyone who uses it.
1. Explanations can get slow
When you're working in a notebook and explaining a model on a few thousand observations, everything is fine: you can pick whichever explainer works best. But as soon as you add more features and more observations, speed drops significantly. For larger models with hundreds of thousands of observations, an explanation can run for hours.
Some of this slowness is inherent to how SHAP values are calculated. But much of it comes from implementation choices, such as limited parallelization and many plain Python loops. Recently, we merged a fix that offloaded some looping to Cython, improving the speed by roughly 5% for the KernelExplainer (see this PR or this PR). I believe there is still a lot of room for improvement; we just haven’t prioritized this yet.
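To make the cost concrete, here is a minimal sketch of the kind of workload where the KernelExplainer gets slow, together with the usual mitigation of summarizing the background data (the model and synthetic dataset are illustrative choices of my own):

```python
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: many rows and features is exactly where explanations get slow.
X, y = make_regression(n_samples=10_000, n_features=50, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# KernelExplainer evaluates the model on many perturbed samples per explained row,
# so its runtime grows with both the background size and the number of rows explained.
background = shap.kmeans(X, 50)          # summarize the background to 50 centroids
explainer = shap.KernelExplainer(model.predict, background)

# Explaining even a small slice takes a while; the full dataset can take hours.
shap_values = explainer.shap_values(X[:100])
```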
2. DeepExplainer trouble
The DeepExplainer is the natural approach to explain TensorFlow or PyTorch models in a DeepLIFT fashion, where importance is backpropagated layer by layer. Unfortunately, TensorFlow changes from version 2.4 onwards have made this much harder. Many layers are internally built from combinations of basic operations, like multiplication, activation functions, and addition. Previously, we could simply overwrite the backpropagation logic for these low-level operations, and it would automatically apply to higher‑level constructs such as LSTMs. This allowed us to calculate SHAP values for a wide range of layers without extra work. But with TensorFlow’s switch to eager execution and the hiding of these internals, this shortcut no longer works. Today, the custom logic must be implemented separately for each layer we want to support.
As a result, certain layers, including LayerNorm and more complex ones like LSTM or Attention layers, are no longer supported by the DeepExplainer.
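For models built only from supported basic layers, the API itself is straightforward; here is a minimal sketch with a small, illustrative PyTorch model of my own choosing:

```python
import shap
import torch
import torch.nn as nn

# A small fully connected network built from basic, supported layers.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

X = torch.randn(1000, 20)   # data to explain
background = X[:100]        # background sample used to estimate expectations

explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(X[:10])
```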
3. TreeExplainer – fast but tough to change
Trees are still widely used and a strong choice for tabular prediction problems. Fortunately, there is an analytic way to calculate SHAP values for trees, and our implementation is extremely fast because it leverages C. Unfortunately, we currently lack a maintainer who is deeply familiar with C, and the code is quite old, which makes it difficult to change. We've also encountered memory bugs, the very low-level issues that happen when memory is handled incorrectly, which have thankfully been fixed by the community. But I suspect other problems are still lurking, and I'd love to get this code into a more maintainable state, either by rewriting it in Rust or by finding someone who can take ownership of this part of the codebase. I've discussed a Rust rewrite with some XAI experts, and we agreed it could be a suitable way forward: Rust's memory-safe design would avoid these memory issues while keeping performance close to the current implementation.
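Using the fast C path from Python stays simple; here is a minimal sketch with an illustrative XGBoost model and synthetic data of my own choosing:

```python
import shap
import xgboost
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=5_000, n_features=20, random_state=0)
model = xgboost.XGBRegressor(n_estimators=200).fit(X, y)

# TreeExplainer uses the exact, tree-specific algorithm implemented in C,
# so this stays fast even for large datasets.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
```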
Also, did you know there's a GPU version of the TreeExplainer? Sadly, we only support it if you install from source (by cloning the repository and running pip install -e .) and have a GPU available. My dream would be to offer an additional install flag (pip install shap[gpu]) or pre-built GPU wheels (pip install shap-gpu). This requires some updates to our build infrastructure, but it is totally doable, as other packages have shown.
4. Failing tests and breaking code due to upstream packages
SHAP supports models from many different ML libraries: scikit-learn, TensorFlow, PyTorch, LightGBM, XGBoost, CatBoost, some text models from the transformers library, and even NGBoost. This means we must stay compatible with all of them, and we run tests to make sure of that. The problem is that supporting so many packages across multiple Python versions means our CI pipelines (our automated testing and integration workflows) fail often.
Whenever anything in these libraries changes, it can break our code. As a rough estimate, I spend around 30% of my SHAP time fixing breakage from upstream changes. These failures are often obscure and hard to debug. For example, we once had our test suite failing on Python 3.10 while working seamlessly on Python 3.11 and onwards. The failing tests were related to transformers code. After a couple of hours of debugging, it turned out that we needed to upgrade our pipelines to Python 3.10.12 specifically, since transformers had started using a filter parameter for warnings that was only backported to Python 3.10.12 (and is not available in 3.10.0-3.10.11). This kind of feature addition in a patch release is unusual and caught us off guard (see the Python release notes here).
There aren't many ways around this. We could improve our logging and add type hints, but a lot would still remain uncovered.
5. Plotting
I love SHAP's plots and believe they're a big factor in its success. The variety is impressive, and there's not much more to be desired – except for the things that are missing but difficult to add to the legacy plotting codebase. And when I say "legacy," I mean it – some plotting functions are literally named that internally, see here, or here, or the waterfall_legacy function, which you shouldn't use anymore.
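For reference, the modern plotting path goes through the Explanation-based API rather than the legacy functions; here is a minimal sketch, with an illustrative XGBoost model and synthetic data of my own choosing:

```python
import shap
import xgboost
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1_000, n_features=10, random_state=0)
model = xgboost.XGBRegressor(n_estimators=100).fit(X, y)

# The newer API returns an Explanation object, which the shap.plots module consumes directly.
explainer = shap.Explainer(model, X)
explanation = explainer(X)

shap.plots.waterfall(explanation[0])   # preferred over the old waterfall_legacy
shap.plots.beeswarm(explanation)       # summary across the whole dataset
```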
Cleaning these up and simplifying the often 300+ lines of code behind a single plotting function would be great. Furthermore, some plots are generated in JavaScript, which is fine, but that code is quite old and hard to test, and I would like to see it rewritten as well. That, however, would require more extensive testing for plotting. We do test the plots already, but they remain the least well-tested part of our codebase. With thorough testing in place, we could tackle some bugs and also extend the plotting to give users more control, e.g., adjusting color palettes, distances, and heights.
6. Other things
Some smaller, miscellaneous issues and missing features:
JAX is a library for building deep learning networks, comparable to PyTorch or TensorFlow, just with a more functional approach. It is gaining traction, and I would love to support it in the Deep- and GradientExplainer. That said, there are plenty of things we should fix before building large new features, which would bring problems of their own.
We don't have good type annotations, a feature I really enjoy in other libraries. This is a twofold issue for us: one part is simply a lack of time to implement them; the other is that many of our functions are quite polymorphic and accept inputs of many different types, which also makes the resulting type hints hard for users to read (see the sketch after this list).
I would really like to have nightly builds and to support other practices the scientific Python community has laid out, like lazily loading code when needed.
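To illustrate the polymorphism point from the list above, here is a hypothetical sketch (not actual shap code; the names and Union members are my own) of how honest annotations for such flexible inputs quickly become hard to read:

```python
from typing import Optional, Sequence, Union

import numpy as np
import pandas as pd
import scipy.sparse

# Hypothetical sketch, not real shap code: many entry points accept roughly this
# many input shapes, so an honest annotation becomes a wide Union that is
# technically correct but hard for users to read at a glance.
ArrayLike = Union[
    np.ndarray,
    pd.DataFrame,
    pd.Series,
    scipy.sparse.spmatrix,
    Sequence[float],
    Sequence[Sequence[float]],
]

def explain(data: ArrayLike, background: Optional[ArrayLike] = None) -> np.ndarray:
    """Toy stand-in for a polymorphic explainer entry point."""
    # Normalize everything to a dense NumPy array, just to show the kind of
    # branching this polymorphism forces on the implementation.
    if scipy.sparse.issparse(data):
        data = data.toarray()
    return np.asarray(data, dtype=float)
```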
The positive outlook
I don't want to sound pessimistic here or leave you with a feeling of fatalistic surrender: I believe in, and work toward, a package that keeps getting better, and so do a lot of contributors. Being honest with oneself and recognizing the current shortcomings is the first step, and seeing the value SHAP delivers makes it feel worth continued investment. I love it when a PR pops up that tackles an issue we've had for a long time, solves one we didn't realize we had, or builds a whole new feature for SHAP. It's great to see how this brings people together and how things keep moving forward. I also love that this package has helped explain black-box ML algorithms, brought explainable AI to the mainstream, helps drive broader ML adoption, and helps us understand more deeply how complex algorithms work. It helped me dive deep into explainable AI and work on coding aspects I'd never touched before.
If any of these points resonated with you, or you have something to share or would like to see improved in SHAP, feel free to leave a comment or file an issue. We always welcome anyone who wants to help make SHAP better, so feel free to stop by and contribute.

