Week #5: Design Your Own Conformal Predictor
Tips on picking non-conformity scores and evaluating conformal predictors
Reading time: 7 min
Welcome to course week #5 of Introduction to Conformal Prediction (#1, #2, #3, #4).
This week, we’ll learn about
Tips for building your own conformal prediction algorithm
A general recipe for 1D conformal prediction
How to evaluate conformal predictors
Even if you don’t plan to implement a conformal predictor yourself in the near future, you’ll profit from this chapter.
Why would you build your own conformal predictor?
While research on conformal prediction is running hot, there are still gaps in conformal prediction methods.
The task and model you are working on might have some quirks that force you to adapt or create a conformal prediction algorithm.
Walking through the steps to develop your own conformal predictor also makes you better at applying and evaluating conformal prediction in your application.
Let’s get started.
Build your own conformal predictor
These are the steps to build your own conformal predictor:
Before you start: Check the Awesome CP repo to see whether a suitable conformal predictor already exists.
Identify or create a heuristic of uncertainty for your model.
Turn the heuristic notion of uncertainty into a non-conformity score.
Start with the general recipe of conformal prediction but with your (new) non-conformity score.
Optional: Adapt parts of the recipe.
Evaluate your new conformal predictor.
Let’s talk about these different steps.
Find The Right Non-Conformity Score
The non-conformity score is the biggest differentiator between conformal predictors.
Conformal predictors may also differ in other parts of the recipe, but the non-conformity score makes or breaks the conformal predictor.
Tip: The simplest way to create your own conformal predictor is to define a suitable non-conformity score.
I haven’t mentioned any constraints on the non-conformity score so far. That’s because you have lots of freedom in picking one and will still end up with a conformal predictor, meaning marginal coverage is guaranteed. But if you use an unsuitable score, the prediction sets or intervals will be super large, and conditional coverage will remain far out of reach.
A few tips for picking a non-conformity score:
The score should be small for “certain” predictions and large for uncertain predictions.
The better the score is at ranking the predictions by uncertainty, the tighter and more adaptive your prediction regions become.
The score has to be 1-dimensional.
Often, we don’t get the non-conformity score handed to us by the model. But for many models, there’s at least a heuristic uncertainty measure that we can transform into one.
Start With a Heuristic Notion of Uncertainty
We have to distinguish between the non-conformity score and an uncertainty heuristic:
The uncertainty heuristic is a measure of how certain the prediction is and sometimes comes “for free” from the model. You might not be able to use the heuristic as a non-conformity score directly, but you can turn it into one. For example, the probability output of a classification model is just an uncertainty heuristic.
Examples of heuristics and non-conformity scores they are turned into:
class probabilities → cumulative probabilities (summed up to the true class)
class probabilities → probability (of true class)
the variance of prediction → standardized residuals
quantile range → distance to the nearest interval bound (negative inside the range, positive outside)
The non-conformity score is then the measure for which we find the threshold using the calibration data.
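To make this concrete, here is a minimal sketch of the second example (probability of the true class) in code. It is not tied to any particular library; `calib_probs`, `calib_labels`, and the choice of alpha = 0.1 are placeholders for your own model output and calibration data. One common variant uses 1 − probability of the true class, so the score is small for confident, correct predictions.

```python
import numpy as np

# A minimal sketch (not a specific library's API): turn class probabilities
# into a non-conformity score and find the threshold on the calibration set.
# `calib_probs` (n x K array of class probabilities) and `calib_labels`
# are placeholders for your own model output and calibration data.

def nonconformity_scores(probs, labels):
    # 1 - probability of the true class: small when the model is
    # confident and correct, large when it is uncertain.
    return 1.0 - probs[np.arange(len(labels)), labels]

def calibrate_threshold(scores, alpha=0.1):
    # Finite-sample corrected quantile of the calibration scores.
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")

def prediction_set(probs_single, q_hat):
    # All classes whose non-conformity score stays below the threshold.
    return np.where(1.0 - probs_single <= q_hat)[0]

# Usage (with your own arrays):
# scores = nonconformity_scores(calib_probs, calib_labels)
# q_hat = calibrate_threshold(scores, alpha=0.1)
# pred_set = prediction_set(new_probs, q_hat)
```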
Tip: If your starting point is a 1-dimensional heuristic of uncertainty, you can follow a simple recipe to build a conformal predictor out of it.
A general recipe for 1D conformal prediction
If your heuristic notion of uncertainty is already a 1-dimensional number, you can use the following template for the non-conformity score:
s(x, y) = |y − f(x)| / u(x)
Here, f(x) is the model prediction and u(x) is the heuristic notion of uncertainty. Some examples:
variance of the prediction across k-nearest neighbors
variance of the prediction across models within an ensemble
variance of the prediction when perturbing the features of the model
If that formula looks familiar to you, it might be because this framework is already used for conformal regression, where u(x) is the variance of the prediction measured by a second model that tries to predict the residuals.
For more details on 1D non-conformity scores, read this introduction paper, chapter 2.3.2.
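Translated into code, the template could look like the following sketch. `model` stands for any fitted regression model with a `.predict` method and `uncertainty` stands for your 1-dimensional heuristic u(x), for example one of the variance estimates above; both are assumptions you would replace with your own pieces.

```python
import numpy as np

# Sketch of the 1D recipe: scale the absolute residual by the heuristic u(x),
# calibrate a threshold on held-out data, and turn it into intervals.
# `model` (with a .predict method) and `uncertainty` (your u(x)) are
# placeholders for your own model and heuristic.

def calibrate(model, uncertainty, X_calib, y_calib, alpha=0.1):
    residuals = np.abs(y_calib - model.predict(X_calib))
    scores = residuals / uncertainty(X_calib)      # s(x, y) = |y - f(x)| / u(x)
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")

def predict_interval(model, uncertainty, X, q_hat):
    y_hat = model.predict(X)
    width = q_hat * uncertainty(X)                 # interval: f(x) ± q_hat * u(x)
    return y_hat - width, y_hat + width
```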
Metrics to evaluate conformal predictors
Mess around as much as you want with the non-conformity score and with adapting the recipe of conformal prediction. If you set up the evaluation the right way, you will be able to judge whether your approach was successful.
You should evaluate 3 metrics:
Marginal coverage
Average region size
Conditional coverage
To evaluate these metrics, you need data. You have two options:
simulated data
“real” data
When doing the evaluation, make sure that training, calibration, and evaluation of the prediction regions happen on separate datasets. Also, you will have to repeat the computations multiple times with different data splits or, if you are using simulated data, with freshly drawn data, because you will want to average the results of repeated runs to get a good estimate of the metrics.
Tip: You can save a lot of time by cleverly storing results, see this video, minute 9.
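A minimal sketch of such an evaluation loop, assuming a hypothetical `run_one_split(seed)` helper that splits (or simulates) the data, calibrates the predictor, and returns the coverage and average region size on the evaluation part:

```python
import numpy as np

# Repeat the evaluation over several splits (or freshly simulated datasets)
# and average the metrics. `run_one_split` is a hypothetical helper that
# returns (coverage, average_region_size) for one split/seed.

def repeated_evaluation(run_one_split, n_repeats=50):
    results = np.array([run_one_split(seed) for seed in range(n_repeats)])
    mean_coverage, mean_size = results.mean(axis=0)
    return mean_coverage, mean_size
```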
How to evaluate marginal coverage:
Apply the conformal predictor to the evaluation set. This will give you lots of prediction regions. Count how often the prediction region covers the true outcome. The marginal coverage should come out as at least 1 - ɑ (up to sampling noise). If not, you did something wrong and can't call your method conformal.
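For regression intervals, this check is a one-liner. A sketch with placeholder arrays `lower`, `upper`, and `y_eval` for the evaluation-set interval bounds and true outcomes:

```python
import numpy as np

# Fraction of evaluation points whose true outcome falls inside the
# prediction interval. `lower`, `upper`, and `y_eval` are placeholders
# for your evaluation-set interval bounds and true outcomes.

def marginal_coverage(lower, upper, y_eval):
    covered = (y_eval >= lower) & (y_eval <= upper)
    return covered.mean()   # should be at least 1 - alpha, up to noise
```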
How to evaluate the region size:
Plot a histogram of the prediction region sizes (on the evaluation set) and compute summary statistics like the mean size. The smaller the average prediction region size, the better. If the histogram shows a nice spread, that’s a good indicator of conditional coverage (see next point); a very low variance of region sizes should alert you that the predictor might not be adaptive.
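A sketch, reusing the placeholder interval bounds from above (for classification you would count the classes per prediction set instead):

```python
import matplotlib.pyplot as plt

# Region sizes on the evaluation set: summary stats plus a histogram.
# `lower` and `upper` are the placeholder interval bounds from above.
sizes = upper - lower
print("mean size:", sizes.mean(), "std:", sizes.std())
plt.hist(sizes, bins=30)
plt.xlabel("prediction region size")
plt.show()
```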
How to evaluate conditional coverage:
This one is a bit tricky since you can’t perfectly measure conditional coverage, but you can check whether your CP algorithm has at least some adaptivity. The idea: region sizes correlate with the difficulty of the prediction, so if the coverage within each size group is close to 1 - ɑ, that’s an indicator that conditional coverage is good. To measure it: 1) split or bin the prediction regions by size and 2) compute the coverage within each region-group. If the coverages are all close to 1 - ɑ, that’s a good indicator that your CP algorithm is adaptive.
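A rough sketch of this size-stratified check, again with the placeholder arrays from the earlier sketches:

```python
import numpy as np

# Bin the prediction regions by size and compute the coverage per bin.
# Empty bins (possible when many sizes are identical) come out as nan.
def size_stratified_coverage(lower, upper, y_eval, n_bins=5):
    sizes = upper - lower
    covered = (y_eval >= lower) & (y_eval <= upper)
    edges = np.quantile(sizes, np.linspace(0, 1, n_bins + 1))
    bin_idx = np.digitize(sizes, edges[1:-1])      # bin index 0 .. n_bins - 1
    return [covered[bin_idx == b].mean() for b in range(n_bins)]
```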
Tip: Always evaluate marginal coverage and average region size. Also evaluate coverage stratified by region size as an approximation of conditional coverage.
If your conformal predictor does well on these 3 metrics (or at least on marginal coverage and region size), then congrats, you just built a conformal predictor!
And that’s a wrap for the lessons in this course.
Any Questions?
I hope you enjoyed this e-mail course on conformal prediction and learned a lot!
This course was a lot of fun and a great opportunity for me to learn about conformal prediction. It had been on my TODO list for a long time, and I’m happy that I finally got to it. Conformal prediction was really worth the time investment. It has been rather intense, since conformal prediction was new to me and I’m also writing a book about it. I have dreamt about conformal prediction at least twice now. 😂
While I have no lesson planned for the coming week, I’d be happy to answer any questions you have about conformal prediction:
Do you struggle with understanding parts of conformal prediction?
Are you interested in conformal prediction for certain tasks?
Are you wondering whether you can use conformal prediction in your application?
Don’t hesitate: You can comment below this post or send me a message in private.
If there are enough questions, I might do a Q&A session for next week's newsletter.
I’d be happy to answer your questions!
Book Update
I’m making lots of progress with the book. It’s gonna be called “Introduction to Conformal Prediction With Python”.
It expands on this course and additionally features fully reproducible code examples.
If you are interested in getting informed when it’s done, make sure that you are subscribed to this newsletter.
Dive Deeper
Again, I can’t recommend the Awesome CP repo enough. Check this CP supermarket before venturing out to build your own conformal predictor.
Watch this tutorial by Anastasios Angelopoulos and Stephen Bates, who are also the authors of the Gentle Introduction paper.
Check out my book on Conformal Prediction which provides hands-on examples to quantify uncertainty in your application!