A short history of SHAP
The 70-year history of one of the most popular machine learning explanation methods
Want a deeper dive? Check out my book on SHAP, which provides theory and hands-on examples to explain your machine learning models!
This post is a brief history of SHAP, focusing on three milestones:
1953: Shapley values in game theory
2012: First steps in machine learning
2017: SHAP, the Cambrian explosion
SHAP is a technique for attributing the prediction of a machine learning model to the features. It’s a technique from the field of explainable AI / interpretable machine learning.
This chapter is an excerpt from my upcoming book on SHAP, which will be published on July 11th.
If you’d like to get early access and give feedback, I’m looking for beta readers. Just reply to this mail/post or send an email.
Lloyd Shapley wants a fair game
Shapley values are named after their inventor, Lloyd Shapley, who first formulated them in 1951.
The 1950s were a particularly active phase for game theory, during which numerous core concepts were developed, including repeated games, the prisoner's dilemma, fictitious play, and, of course, Shapley values.
Lloyd Shapley was a mathematician by training, but he excelled in game theory: fellow theorist Robert Aumann called him the “greatest game theorist of all time.”
Following World War II, Shapley completed his Ph.D. at Princeton University with a thesis titled “Additive and Non-Additive Set Functions.” In 1953, he published “A Value for n-Person Games”, introducing Shapley values.
In 2012, Lloyd Shapley and Alvin Roth received the Nobel Prize in Economics1 for their research in “market design” and “matching theory.”
The fundamental idea behind Shapley values is to measure each player's contribution in a game by averaging over all possible coalitions that could form without that player. In other words, a player's Shapley value represents the expected marginal contribution they make to every possible coalition.
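Written as a formula (standard notation, not specific to this post): for a set of players N and a value function v that assigns a payout v(S) to every coalition S, player j's Shapley value is

$$\phi_j = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}\,\big(v(S \cup \{j\}) - v(S)\big),$$

i.e., a weighted average of j's marginal contributions v(S ∪ {j}) − v(S) over all coalitions S that don't contain j.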
Since then, Shapley values have become a foundational concept in cooperative game theory, applied across various fields including political science, economics, and computer science.
They are widely used to determine fair and efficient ways of distributing resources within a group, such as dividing profits among shareholders, allocating costs among collaborators, and assigning credit to contributors in a research project.
At this point, Shapley values were not yet used in machine learning. In fact, machine learning was just beginning at the time.
First steps in machine learning
Fast forward to 2010. Shapley hadn't received his Nobel in Economics yet, but the theory of Shapley values had been around for nearly 60 years.
Machine learning, on the other hand, had made enormous progress in those years. In 2012, the ImageNet competition, led by Fei-Fei Li, was won for the first time by a deep neural network (AlexNet), with a significant margin over the next-best (non-neural-network) approach. Machine learning kept improving in many other areas as well, attracting more and more research.
In 2010, researchers Erik Štrumbelj and Igor Kononenko published a paper titled “An efficient explanation of individual classifications using game theory”. They proposed using Shapley values to explain predictions of machine learning models.
Four years later, in 2014, they published another paper with an improved method for computing Shapley values: a sampling estimator that samples coalitions of features instead of iterating over all of them.
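To make the idea concrete, here is a minimal sketch of such a sampling estimator in Python (my own illustration, not the authors' code; the function and variable names are hypothetical, and predict is assumed to accept a 2D array):

```python
import numpy as np

def sampled_shapley_value(predict, x, X_background, j, n_samples=1000, rng=None):
    """Monte Carlo estimate of feature j's Shapley value for instance x."""
    if rng is None:
        rng = np.random.default_rng()
    n_features = len(x)
    total = 0.0
    for _ in range(n_samples):
        order = rng.permutation(n_features)                 # random feature ordering
        z = X_background[rng.integers(len(X_background))]   # random background row
        pos = int(np.where(order == j)[0][0])
        x_plus, x_minus = z.copy(), z.copy()
        preceding = order[:pos]            # features "before" j come from the explained instance
        x_plus[preceding] = x[preceding]
        x_minus[preceding] = x[preceding]
        x_plus[j] = x[j]                   # only difference: j taken from x vs. from the background
        total += predict(x_plus.reshape(1, -1))[0] - predict(x_minus.reshape(1, -1))[0]
    return total / n_samples
```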
However, the approach didn't gain traction, at least not in the way it eventually would with SHAP.
Some possible reasons for the lack of popularity:
Explainable AI/Interpretable machine learning wasn't as widely recognized at the time.
The papers by Štrumbelj and Kononenko didn't include code, which made the method harder to use.
The sampling method was still relatively slow and unsuitable for image or text classification.
But all these reasons are somewhat speculative.
Let's examine the events that led to Shapley values becoming popular in machine learning.
SHAP causes a Cambrian explosion
Around 2016, a paper introduced LIME, which stands for Local Interpretable Model-Agnostic Explanations. I view the LIME paper as a catalyst for the field of explainable AI and interpretable machine learning. That's a bit subjective, because it served as a catalyst for me. However, due to its timing and popularity, this paper marks the beginning of a heightened interest in interpreting machine learning models.
The prevailing sentiment at the time was: “Oh, we are developing increasingly advanced machine learning algorithms, such as deep neural networks, but, gosh, look at them, we have no idea how these models generate their predictions; how can we trust them?”
Then came SHAP.
Not long after, in 2017, Scott Lundberg and Su-In Lee published a paper called “A Unified Approach to Interpreting Model Predictions”.
It was published at NIPS, now NeurIPS2, which stands for Neural Information Processing Systems.
This is a major machine learning conference, and research published there is more likely to draw attention. But what was the SHAP paper about? After all, Shapley values for machine learning had already been defined in 2010/2014.
Lundberg and Lee reformulated the estimation of Shapley values as a weighted linear regression, an approach they called the Kernel SHAP estimator.
Additionally, the paper showed how their proposed estimation method “unites” other explanation techniques, such as DeepLIFT, LIME, and Layer-Wise Relevance Propagation.
Thus, the novel contributions were:
A new estimator for Shapley values.
The unification of several existing explanation methods.
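To sketch the new estimator (my simplified notation, not the paper's exact formulation): Kernel SHAP represents a coalition as a binary vector z' ∈ {0,1}^M indicating which of the M features are "present," evaluates the model on inputs where the absent features are replaced by background values, and fits a linear model g(z') = φ_0 + Σ_j φ_j z'_j to these predictions by weighted least squares, where each coalition is weighted by the Shapley kernel

$$\pi_x(z') = \frac{M - 1}{\binom{M}{|z'|}\,|z'|\,(M - |z'|)}$$

with |z'| the number of present features. With exactly these weights, the fitted coefficients φ_j are the Shapley values.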
Here are my speculations on why SHAP gained popularity:
The venue it was published in (NIPS/NeurIPS).
Good timing: explainable AI had been getting a lot of attention.
Continued research by the authors and others.
Solid, open-source Python implementation.
The role of open source code shouldn't be underestimated, as it allowed people to incorporate Shapley values into their projects.
Since then, SHAP has continued to grow in popularity. Another important development came in 2020, when Lundberg proposed an efficient computation method for SHAP tailored specifically to tree-based models. This was a significant advancement: tree boosting performs well in many applications, so it became possible to quickly estimate Shapley values for state-of-the-art models.

Lundberg's other notable achievement was extending SHAP beyond individual predictions by stacking Shapley values, akin to assembling Legos, to create global model interpretations. This approach was enabled by the fast computation for tree-based models, as obtaining global explanations with Kernel SHAP would have been much slower.
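As a small illustration of both points, here's a hedged sketch (not from the book; it assumes the shap package and scikit-learn are installed, and the toy data and model are made up): TreeExplainer provides the fast, tree-specific computation, and averaging the absolute Shapley values across many predictions is one simple way of "stacking" local explanations into a global summary.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Toy data: 200 samples, 5 features (purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 2 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Fast Shapley value computation for tree-based models (TreeSHAP)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# Local explanation: attributions for a single prediction
print(shap_values[0])

# Global view: mean absolute attribution per feature ("stacked" local explanations)
print(np.abs(shap_values).mean(axis=0))
```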
With the help of numerous contributors, Lundberg continued to develop the shap package, which has now grown into a comprehensive library with many estimators and functionalities. Since then, other researchers have built upon SHAP and Shapley values, suggesting extensions.
Additionally, SHAP has been implemented in other places, meaning the shap package isn't the only option.
Want a deeper dive? Check out my book on SHAP, which provides theory and hands-on examples to explain your machine learning models!
In reality, it's not the Nobel prize, but the “Nobel Memorial Prize in Economic Sciences,” or officially, the “Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel.” It's a sort of imitation Nobel prize created by economists since they weren't included in the original five Nobel Prizes.
The name NIPS was criticized due to its association with "nipples" and being used as a slur against Japanese people, so it was changed to NeurIPS.