13 Comments

Keeping calibration and test sets different is usually a good idea, as otherwise the model will be peeking into the test set during the calibration stage. In the research papers we have always kept them separate.

Thanks for the input, Valeriy. I agree that you are on the safe side if you keep them separate.

Thanks Christoph for this series!

Do you plan to use R as well?

In R, there are several packages for conformal analysis, most of them for regression.

Just one handles classification (conformalClassification), but only for "randomForest" models.

Thanks,

Carlos.

Hey Carlos, for now I'm focusing on Python.

Could you recommend an R library for conformal prediction that is comparable with Python's MAPIE?

Thanks for this series. I am just following it now after subscribing to your newsletter.

I loved your IML book and was quite pleased that it was done in R, which I prefer to Python. Just totally off the cuff: I would like your personal comments on why you've switched from R to Python.

That's a good question.

I use both R and Python and wanted to do a project in Python again. I also thought I would reach more readers, since Python seems more dominant in machine learning: https://www.kaggle.com/kaggle-survey-2022

Thanks for the response. Oh well. I am sorry that I have to AGREE with you--although I love R (specifically, Tidyverse R) and prefer it to Python, I certainly recognize that Python is far more popular, so it is indeed the better choice to reach more readers.

Thanks for this series; I'm learning a lot!

Regarding which data to use for calibration, it seems to depend on how the conformal prediction results will factor into the ultimate decision-making. If they are viewed as a "nice to have" measure of prediction uncertainty, then perhaps it's not a big deal. But if we want rigor behind assuring the bean company CEO that our [conformal sets which include only two levels] contain the correct value 95% of the time, it would seem we would want to test the robustness of the q threshold on truly unseen data?
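
For concreteness, here is roughly what I mean by checking the q threshold on truly unseen data. This is only a minimal sketch with synthetic data, a generic classifier, and a simple one-minus-true-class-probability score; all names are made up:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; any classifier with predict_proba would work the same way.
X, y = make_classification(n_samples=3000, n_classes=3, n_informative=6, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Conformity score on the calibration set: 1 minus the predicted probability of the true class.
alpha = 0.05
probs_calib = clf.predict_proba(X_calib)
scores = 1 - probs_calib[np.arange(len(y_calib)), y_calib]

# Finite-sample-corrected quantile gives the threshold q_hat.
n = len(scores)
q_hat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Prediction sets and empirical coverage on truly unseen test data.
probs_test = clf.predict_proba(X_test)
pred_sets = probs_test >= 1 - q_hat
coverage = pred_sets[np.arange(len(y_test)), y_test].mean()
print(f"Empirical coverage on the held-out test set: {coverage:.3f}")
```

The point is just that q_hat comes from the calibration set alone, while the coverage check runs on data that touched neither training nor calibration.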

That's my hunch as well.

If the test set is only used to evaluate the uncalibrated model outputs, it might be OK to use the same data for both calibration and testing, as I pondered in the post.

But to evaluate the prediction sets (i.e., the calibrated or conformalized model), you definitely need calibration data that is separate from the test data.

Great article, Christoph!

I have a comment regarding using test data for calibration.

I'll try to elaborate based on conformal regression because I know that a bit better :-)

In the case of conformal regression, when we use quantile regression we may choose to determine a conformal correction to our predicted quantiles. This is determined from some dataset where we have known labels, but which the training algorithm has not seen, so as not to risk overfitting.
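
In code, that correction step looks roughly like this. It is only a sketch, using sklearn's quantile gradient boosting and the usual CQR-style score; the data and variable names are made up:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, split into training, calibration, and "new" points.
X, y = make_regression(n_samples=2000, noise=10.0, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_calib, X_new, y_calib, y_new = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

alpha = 0.1  # target ~90% intervals

# Two quantile models for the lower and upper bound.
lo_model = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, y_train)
hi_model = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, y_train)

# Conformity score on the calibration set: how far the label falls outside the predicted
# quantile band (negative if it is comfortably inside).
lo, hi = lo_model.predict(X_calib), hi_model.predict(X_calib)
scores = np.maximum(lo - y_calib, y_calib - hi)

# The conformal correction: finite-sample-adjusted (1 - alpha) quantile of the scores.
n = len(scores)
q_hat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Widen (or shrink, if q_hat is negative) the predicted band for new points.
lower = lo_model.predict(X_new) - q_hat
upper = hi_model.predict(X_new) + q_hat
```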

If we used the test set for calibration, we would a) treat the test set as if we knew the labels (which we don't in reality), and b) when we then compute a test score with the conformal quantile correction applied, we would leak information from the test labels into the scoring.

I therefore wonder if it is better to just split our validation dataset into a validation and a calibration dataset, with a test dataset held out until the very last step. If we tune hyperparameters, we also can't use the validation set for calibration, since we would risk overfitting to the validation dataset; reusing it for calibration would mean reusing already "used" data.

If we don't care about scoring e.g. after retraining on all training data, then we can just use all test data for calibration and go into production.

What do you think?

If test and calibration are the same data, you run into trouble as soon as the performance evaluation is for the conformalized/calibrated model, as you pointed out.

I think they can only be the same if the performance evaluation concerns just the uncalibrated model outputs.

If we want to test the calibrated outputs, then the calibration data would have to be split off from the training or validation set. Or you could frame it as a 4-way split into training, validation, calibration, and testing.
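
As a rough sketch of that 4-way split (the proportions are arbitrary and the data is just a placeholder):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Placeholder data; in practice X, y come from your own dataset.
X, y = make_classification(n_samples=5000, random_state=0)

# 60% training, 20% validation (model selection / tuning),
# 10% calibration (conformal threshold), 10% test (evaluating the *calibrated* model).
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_valid, X_rest, y_valid, y_rest = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
```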

If we don't require a final performance evaluation, I agree you could use the "test data" for calibration.

Thanks a lot for this! Do you know if there is an implementation somewhere of *group* k-fold for conformal prediction? I need this in my use case, as data points within groups can be very similar. If not, I'll have to dig into how to combine the output of multiple folds...
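
Something like the following is what I had in mind: keep whole groups together with sklearn's GroupKFold, pool the out-of-fold scores, and take a single quantile from the pooled scores. Just a rough sketch with synthetic data and an absolute-residual score, an approximation rather than a full CV+/jackknife+:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold

# Synthetic stand-ins; in practice X, y, and `groups` come from the real data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, 0] + rng.normal(scale=0.5, size=1000)
groups = rng.integers(0, 50, size=1000)

alpha = 0.1
scores = []

# Out-of-fold absolute residuals, with whole groups kept together in each fold.
for train_idx, calib_idx in GroupKFold(n_splits=5).split(X, y, groups=groups):
    model = RandomForestRegressor(random_state=0).fit(X[train_idx], y[train_idx])
    scores.append(np.abs(y[calib_idx] - model.predict(X[calib_idx])))

scores = np.concatenate(scores)
n = len(scores)
q_hat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Final model trained on everything; intervals use the pooled threshold.
final_model = RandomForestRegressor(random_state=0).fit(X, y)
preds = final_model.predict(X)  # swap in genuinely new data in practice
lower, upper = preds - q_hat, preds + q_hat
```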
