Thanks a lot for this course! Love the intuitive step-by-step explanation and the hands-on example (e.g., choosing a concrete dataset like the Beans dataset is great).

A little question about the procedure, though. Take the frequency plot (which contains the predicted probabilities for all predictions, across all classes). We applied the 0.999 threshold so that 95% of the data points include the true class. Now, 95% was chosen arbitrarily. Suppose someone has the idea to set the threshold to 1.0 to get 100% coverage. In other words, with this procedure anyone can always achieve 100% coverage by setting the threshold to 1.0, which sounds good on paper but does nothing useful. Maybe I am misunderstanding something, but would you mind clarifying?
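To make the concern concrete, here is a minimal sketch on synthetic data (not the Beans data and not the course's code; the softmax "model scores" below are made up for illustration). It thresholds the uncertainty score s = 1 - probability and reports coverage next to the average prediction set size, so the cost of pushing the threshold to 1.0 becomes visible.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 2000, 7  # synthetic stand-in: 2000 samples, 7 classes

# Made-up "model scores": softmax over random logits, with the true class boosted.
y = rng.integers(0, k, size=n)
logits = rng.normal(size=(n, k))
logits[np.arange(n), y] += 2.0
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

scores = 1.0 - probs  # uncertainty score for every (sample, class) pair


def coverage_and_size(threshold):
    """Coverage and mean set size when keeping classes with score <= threshold."""
    in_set = scores <= threshold
    coverage = in_set[np.arange(n), y].mean()  # share of sets containing the true class
    avg_size = in_set.sum(axis=1).mean()       # average number of classes per set
    return coverage, avg_size


for q in (0.5, 0.9, 0.99, 1.0):
    cov, size = coverage_and_size(q)
    print(f"threshold={q:<5} coverage={cov:.3f} avg set size={size:.2f}")
```

At a threshold of 1.0 every class passes the test, so each set contains all 7 classes: coverage is 100%, but the sets carry no information. That is presumably why the coverage level is fixed first and the threshold derived from it.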

Jan 12 · Liked by Christoph Molnar

Excellent introduction to CP. Looking forward to reading more blog posts on it. Unfortunately, the 'conformal prediction' approach to evaluating uncertainty has been missing from popular machine learning books and tutorials, although plenty of academic articles have been published in the last few years.


Great post, very easy to digest and intuitive.

We ran into a similar problem a few weeks ago and, taking inspiration from top-3 and top-5 predictions on benchmark datasets (e.g., ImageNet), ended up doing exactly what you explained in adaptive conformal predictions.

It's a pain to set up the results in production though 😅, as the UI team prefers a fixed number of outputs.


Thanks for the tutorial. It is helping me program conformal prediction from scratch in R so I can better understand how it works. I'm using the Beans data with SVM. When I set the probability to 0.8 or 0.9, I get a number of beans that do not register in any class.

For example, for a particular future sample, my 1-p values are [0.99, 0.99, 0.99, 0.73, 0.99, 0.27, 0.99], with qhat_80 = 0.19, qhat_95 = 0.69 and qhat_99 = 0.90. The 95% prediction set is [6] and the 99% prediction set is [4, 6]; this makes sense and seems correct. But none of the values are less than qhat_80, so qhat_80 does not classify the sample at all? How should I handle such future samples? Should I assign them to the most probable class? Or did I make a conceptual mistake in the process?

If I use an adaptive prediction set, I get cumulative p-values of [0.73, 0.99, 0.99, 0.99, 0.99, 0.99, 0.99]. Here 0.73 is greater than both qhat_80 and qhat_95, so both predict [6]. The second entry is the first value greater than qhat_99, so the 99% prediction set is [4, 6]. So, should the take-home message be to 'always' use adaptive prediction sets?
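For reference, here is a minimal Python sketch of the plain thresholding step from my first example, using only the numbers quoted above (the helper name `prediction_set` is my own, not from MAPIE or the course). It keeps every class whose 1-p score is at most qhat.

```python
import numpy as np

# 1 - p for each of the 7 classes, as quoted in the comment above
scores = np.array([0.99, 0.99, 0.99, 0.73, 0.99, 0.27, 0.99])
qhats = {"80%": 0.19, "95%": 0.69, "99%": 0.90}


def prediction_set(scores, qhat):
    """Return the 1-based class labels whose score is at most qhat."""
    return [int(i) + 1 for i in np.flatnonzero(scores <= qhat)]


sets = {level: prediction_set(scores, q) for level, q in qhats.items()}
for level, labels in sets.items():
    print(level, labels)
# 80% []
# 95% [6]
# 99% [4, 6]
```

So the empty set at 80% follows directly from the rule. As far as I understand, an empty set is a possible output of this score-thresholding approach and not necessarily a programming error.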


Hi Christoph,

Could you please comment on the size of the calibration dataset and how it factors into the outputs of conformal prediction?

Many thanks for the course!



Hi Christoph Molnar,

Thanks for this great, interesting lesson. I want to work on conformal prediction, but I have one question, and I would appreciate your help if possible. Question: Is conformal prediction beneficial for imbalanced datasets in binary classification?


Shouldn’t this be “equivalent to class “probabilities” > 0.999”?

“We know that for bean uncertainties s_i below 0.999 (equivalent to class “probabilities” > 0.001) we can be sure that with a probability of 95% we have the correct class included.”


Hi Christoph, thank you for the nice materials. I'd like to ask whether the underlying code is (or could be) published in a fully reproducible manner.

For instance, I ran into discrepancies with the Beans dataset because I don't know the exact format in which you loaded it, and I then encountered issues with the MAPIE library.




Hi Christoph,

Great post!! One thing I'm confused about is the assumption that probabilistic predictions are themselves 'confident', i.e., a classifier that says "f(bean=navy) = 0.8" may itself be uncertain and it turns out the full prediction (with uncertainty) should be "f(bean=navy) = 0.8 [0.1, 0.9]" (a simple example of this would be creating predictive intervals from B classification trees in a random forest). Does conformal prediction still work in that case?


Hi Christoph,

Great first post, I'm looking forward to the next few weeks. I had a question when it comes to implementing conformal prediction and generating prediction sets.

1. Is the calibration dataset the same as the test dataset when we perform a training/validation/test split when model fitting? So we would train a model in the normal way and when it comes to generating calibrated predictions and prediction sets, we use the MAPIE function to do this on the test set? I've tried to include a pipeline with the bean dataset here:


Hopefully I've generated the prediction sets at the correct stage and with the appropriate datasplit.

2. Will the point predictions from the MAPIE classifier be different to those produced by the classifier using the predict method? In my own case they're mostly the same with 1 or 2 predictions being different. The difference is more pronounced in the regression case.

I found the plots you used very helpful, maybe computing conformity scores on a single probability vector by hand would be instructive as well? But it's about balancing if it's more effort than it's worth. Thanks!

Dec 22, 2022 · edited Dec 22, 2022

Hi, why do we actually need the uncertainty scores s=1-f(x,y) (also more confusing since it said 1-s(x,y) below the histogram), instead of working with the equivalent "probabilities" as certainties? Is it being lazy to fit the definition of 95% quantile, rather than saying "take the top 95% of the certainties"?
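To check the equivalence I have in mind, here is a small sketch on made-up numbers (the beta-distributed "probabilities" are an assumption for illustration, not the course data). It compares the 95% quantile of the scores s = 1 - p with the 5% quantile of the probabilities themselves.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up "probabilities" assigned to the true class for 1000 calibration points.
p_true = rng.beta(5, 1, size=1000)
s_true = 1.0 - p_true  # the corresponding uncertainty scores

q_score = np.quantile(s_true, 0.95)  # 95% quantile of the uncertainty scores
q_prob = np.quantile(p_true, 0.05)   # 5% quantile of the "probabilities"

# The two thresholds mirror each other: q_score is (numerically) 1 - q_prob.
print(q_score, 1.0 - q_prob)

# Both rules keep the same calibration points.
same_points = np.array_equal(s_true <= q_score, p_true >= q_prob)
print(same_points)
```

If this is right, the two formulations select the same points, and the choice between scores and "probabilities" is mostly one of convention.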


Hi, thanks for embarking on this course. I'm happy to join and learn.

I think the problem with interpreting model scores as probabilities (the paragraph that starts with "Unfortunately, we shouldn’t interpret these scores as actual probabilities") is not explained clearly enough and would benefit from a counterexample that shows why model scores are not probabilities.


"but stew seems to be the"

"The data scientist doesn’t fully the model scores"

Thank you,

