Approaches to uncertainty, causality, and interpretability with supervised learning
Supervised learning excels at producing highly optimized predictive models.
Features in, prediction out. Almost ascetic compared to the richness that statistical models offer with their coefficients, confidence intervals, interpretability, …
But fortunately, there are several ways to “enrich” predictive models and use them for more than just prediction.
This post presents 3 approaches that allow you to do more with predictive models:
🎯 Uncertainty quantification with conformal prediction
➡️ Causal effect estimation with double machine learning
🧐 Interpretability with model-agnostic interpretation
The best thing about the approaches in this post: they are all model-agnostic, which means they work with any predictive model. Learning these approaches is a smart time investment, as they have a longer shelf life than methods tied to specific models.
Uncertainty Quantification with Conformal Prediction
Many machine learning models come without built-in uncertainty quantification. ML models spit out predictions like champions, but we often don’t know how trustworthy these predictions are.
A clever solution is conformal prediction. Conformal prediction turns "weak" heuristic uncertainty scores into rigorous prediction intervals with guaranteed coverage.
Conformal prediction provides a general recipe for uncertainty quantification, and has various implementations for specific use cases:
Prediction sets for multi-class models
Calibration of classification scores so they can be interpreted as probabilities
Fixing coverage of quantile regression
Conformal predictive distributions for regression
…
All these implementations follow the general recipe of calibrating a heuristic uncertainty score using a separate calibration dataset.
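As a concrete illustration, here's a minimal sketch of split conformal prediction for regression. The model, data, and coverage level are just illustrative choices; the recipe works with any regressor:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Illustrative data and model; conformal prediction works with any regressor
X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
X_train, X_calib, y_train, y_calib = train_test_split(X, y, test_size=0.5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Heuristic uncertainty score: absolute residuals on the calibration set
scores = np.abs(y_calib - model.predict(X_calib))

# Calibrate: conformal quantile for 90% coverage (finite-sample corrected)
alpha = 0.1
n = len(scores)
q_hat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Prediction interval for new data: prediction +/- q_hat
y_pred = model.predict(X_calib[:5])
lower, upper = y_pred - q_hat, y_pred + q_hat
```

The resulting intervals cover the true outcome with (roughly) the target probability, no matter how good or bad the underlying model is, as long as the calibration data and new data are exchangeable.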
Free resources:
Introduction: https://arxiv.org/abs/2107.07511
Overview: https://github.com/valeman/awesome-conformal-prediction
Causal Effect Estimation With Double Machine Learning
A typical causal question is: Does the intervention work?
The scenario: We observed an intervention (treatment, policy change, marketing campaign, …) and want to know how it causally affects an outcome.
The problem: We didn’t conduct a scientific experiment, so the intervention is correlated with other variables (confounders).
A difficult part of causal inference is causal identification: Drawing a DAG, including all confounders, identifying whether a causal effect can be estimated at all, and so on.
Double machine learning doesn’t help you with the difficult parts of causality. Sorry. You still have to do the hard work.
There are many approaches to causal effect estimation, so what's the deal with double machine learning? The original double machine learning paper asks: Can we use supervised machine learning for causal effect estimation? Double machine learning follows an old playbook of instrumental variables and partial regression, with one difference: the nuisance functions are learned with supervised machine learning. It turns out that naively plugging in highly optimized prediction models introduces two problems: overfitting bias and regularization bias. The contribution of double machine learning is to correct both, using Neyman-orthogonal score functions and cross-fitting.
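To make this concrete, here's a minimal sketch of the partialling-out flavor of double machine learning on simulated data with a known effect. The learners and the data-generating process are illustrative choices, not prescribed by the method:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

# Simulated data: the confounder drives both treatment and outcome
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                   # confounders
t = X[:, 0] + rng.normal(size=n)              # treatment, confounded by X
y = 2.0 * t + X[:, 0] + rng.normal(size=n)    # outcome; true effect of t is 2.0

# Stage 1: cross-fitted nuisance predictions (out-of-fold, against overfitting bias)
y_hat = cross_val_predict(GradientBoostingRegressor(random_state=0), X, y, cv=5)
t_hat = cross_val_predict(GradientBoostingRegressor(random_state=0), X, t, cv=5)

# Stage 2: partial regression of outcome residuals on treatment residuals
theta = LinearRegression().fit((t - t_hat).reshape(-1, 1), y - y_hat)
print(theta.coef_[0])  # should be close to 2.0
```

The cross-fitted (out-of-fold) predictions counter the overfitting bias, and the residual-on-residual regression is the orthogonalization step that deals with the regularization bias.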
Free resources:
Introduction: https://econml.azurewebsites.net/spec/estimation/dml.html
Critique: https://arxiv.org/pdf/2108.11294.pdf
Interpretability With Model-Agnostic Interpretation
If you pick the best-performing model, there's a good chance that it is a not-so-interpretable model or an ensemble of models. And even if an interpretable model sometimes emerges at the top of the benchmark, interpretability would be a lottery: the next retraining might produce a different model.
Good news: You can interpret any model using model-agnostic interpretation methods. They work for any model because they don't rely on a specific model structure; instead, they analyze how the predictions change when input features are manipulated. Three popular methods are:
SHAP for explaining individual predictions
Permutation feature importance for ranking features
Partial dependence plots for feature effects
...
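As a quick taste, here's a minimal sketch of two of these methods (permutation feature importance and a partial dependence plot) using scikit-learn; the model and dataset are illustrative placeholders. SHAP is available via the separate shap package:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay, permutation_importance
from sklearn.model_selection import train_test_split

# Illustrative model and data; the methods below work with any fitted model
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Permutation feature importance: performance drop when a feature is shuffled
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, imp in sorted(zip(X.columns, result.importances_mean), key=lambda p: -p[1]):
    print(f"{name}: {imp:.3f}")

# Partial dependence plot: average prediction as one feature is varied
PartialDependenceDisplay.from_estimator(model, X_test, features=["bmi"])
plt.show()
```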
If you want an in-depth explanation of these interpretation methods, you can get my book Interpretable Machine Learning on Amazon.
Free resources:
Interpretable Machine Learning (free online version): https://christophm.github.io/interpretable-ml-book/
Uncertainty quantification, causal inference, and interpretability used to be "reserved" for classic statistical modeling, but more and more model-agnostic tools are emerging to fill these gaps for machine learning models.
Using these tools also means adopting statistical thinking: being mindful of the data-generating process, identifying sources of uncertainty, doing the hard work of causal identification, and so on.
So there are at least 3 elements in this approach to modeling:
machine learning + model-agnostic tools + statistical thinking
Performance-driven, yet mindful of data and model.