How TabICL and TabPFN handle missing values
You can pass training or test data containing missing values to the tabular foundation models TabPFN and TabICL, and prediction will “just work”. But what happens in the background?
Let’s find out.
How TabICL handles missing values
For TabICLv2, we can find that information in the README, section “Preprocessing”:
Create a separate category for missing values in categorical features
Perform mean imputation for missing numerical values (encoded as NaN)
This looks like a pre-processing layer that you could attach to any model. Creating a missing value category is pretty standard. Mean imputation for numerical values is something I see often, too, unfortunately: it discards the fact that a value was missing and shrinks the feature’s variance, so a lot of information is lost.
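To make the two README rules concrete, here is a minimal pure-Python sketch of such a pre-processing layer. It is not TabICL’s actual code: the function names and the `__missing__` label are hypothetical, and missing categorical entries are assumed to arrive as `None`.

```python
import math

MISSING_CATEGORY = "__missing__"  # hypothetical label, not TabICL's actual token


def impute_categorical(column):
    """Replace missing entries (None) with a dedicated category."""
    return [MISSING_CATEGORY if value is None else value for value in column]


def impute_numerical_mean(column):
    """Replace NaN entries with the mean of the observed values."""
    observed = [value for value in column if not math.isnan(value)]
    mean = sum(observed) / len(observed)  # assumes at least one observed value
    return [mean if math.isnan(value) else value for value in column]


colors = ["red", None, "blue", "red"]
ages = [20.0, math.nan, 40.0, 60.0]

print(impute_categorical(colors))   # ['red', '__missing__', 'blue', 'red']
print(impute_numerical_mean(ages))  # [20.0, 40.0, 40.0, 60.0]
```

In a real pipeline the mean would be computed on the training data and reused for the test data; the sketch skips that split for brevity.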
A look into the TabICLv2 paper confirms that missing-value handling is indeed just a pre-processing layer:
Adding missing indicators (Le Morvan & Varoquaux, 2025) or introducing missingness during pretraining may improve the handling of missing values, which are currently imputed by the mean, but remain unexplored.
My recommendation for TabICL, in its current version: handle missing-data imputation yourself. Or at least confirm that a missing value category for categorical features and mean imputation for numerical features is what you want.
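One way to check whether the default is acceptable is to see if missingness itself carries signal about the target. The sketch below uses a hypothetical helper and toy data, and assumes a binary target and a feature with both missing and observed entries. It compares the target rate between rows where the feature is missing and rows where it is observed.

```python
import math


def target_rate_by_missingness(feature, target):
    """Return (mean target where feature is missing, mean target where observed)."""
    missing = [t for x, t in zip(feature, target) if math.isnan(x)]
    observed = [t for x, t in zip(feature, target) if not math.isnan(x)]
    # Assumes both groups are non-empty.
    return sum(missing) / len(missing), sum(observed) / len(observed)


# Toy data: income is missing more often for the positive class.
income = [math.nan, 30.0, math.nan, 50.0, 45.0, math.nan, 38.0, 52.0]
label = [1, 0, 1, 0, 1, 0, 0, 0]

rate_missing, rate_observed = target_rate_by_missingness(income, label)
print(f"target rate if missing: {rate_missing:.2f}, if observed: {rate_observed:.2f}")
```

A large gap between the two rates suggests that plain mean imputation would throw away useful information, and that your own imputation (or an added indicator column) is worth the effort.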
How TabPFN handles missing values
Let’s have a look at TabPFN’s README to see how it handles missing values:
Q: Can TabPFN handle missing values?
Yes!
Ok … cool, I guess. Not very informative though.
Let’s go deeper. The technical report for TabPFN-3.0 says:
Native missing-value handling. For each cell that is NaN, TabPFN-3 computes a binary indicator and concatenates it with the cell value before embedding. The model therefore receives an explicit signal about missing data and can condition its predictions accordingly, rather than relying on upstream imputation.
This sounds promising. Let’s go even deeper.
The TabPFN code shows that missing values are encoded and passed into the model: the NanHandlingEncoderStep class encodes NaN as -2.0, Inf as 2.0, and -Inf as 4.0. The features themselves are then mean-imputed, but the additional “missingness” channel is concatenated with the original features before the linear embedding. It gets a bit more complicated because features are grouped, but that’s not important right now. All we need to know is that the TabPFN-3.0 model has per-feature missingness information available at inference time.
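The encoding described above can be sketched for a single feature column in plain Python. This is an illustration, not TabPFN’s implementation: the function name is hypothetical, and two details are assumptions on my part, namely that ordinary finite values get indicator 0.0 and that Inf and -Inf cells are replaced by the mean just like NaN cells.

```python
import math

# Indicator codes as described for NanHandlingEncoderStep.
NAN_CODE, POS_INF_CODE, NEG_INF_CODE = -2.0, 2.0, 4.0


def encode_column(column):
    """Return (imputed values, indicator channel) for one feature column."""
    finite = [v for v in column if math.isfinite(v)]
    mean = sum(finite) / len(finite)  # assumes at least one finite value
    values, indicators = [], []
    for v in column:
        if math.isnan(v):
            code = NAN_CODE
        elif v == math.inf:
            code = POS_INF_CODE
        elif v == -math.inf:
            code = NEG_INF_CODE
        else:
            code = 0.0  # assumption: 0.0 marks an ordinary finite value
        indicators.append(code)
        # Assumption: non-finite cells are all replaced by the mean.
        values.append(v if math.isfinite(v) else mean)
    return values, indicators


column = [1.0, math.nan, 3.0, math.inf, -math.inf]
values, indicators = encode_column(column)

# One pair per cell: [imputed value, indicator], concatenated before embedding.
model_input = [[v, i] for v, i in zip(values, indicators)]
print(model_input)
# [[1.0, 0.0], [2.0, -2.0], [3.0, 0.0], [2.0, 2.0], [2.0, 4.0]]
```

The three non-finite cells all carry the same imputed value 2.0, but the indicator channel keeps them distinguishable from each other and from a genuinely observed 2.0.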
Having these missingness indicators in the architecture is only half of the story. The other half is pre-training with missing values. Unfortunately, TabPFN’s pre-training code is not open source, but my guess is that TabPFN was pre-trained on datasets with missing values.
Because TabPFN’s architecture encodes missingness explicitly, it is well equipped to deal with missing values. According to this research paper, TabPFN can, in theory, even handle the most difficult type of missingness: missing not at random (MNAR), where the probability that a value is missing depends on the unobserved value itself. However, I haven’t tried this myself and have not seen any benchmarks yet.
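To see why MNAR is the hard case, here is a small simulation with made-up data. The distribution, the threshold of 60, and the “high values are never reported” mechanism are all assumptions chosen for illustration. Under this mechanism, the mean of the reported values underestimates the true mean, so mean imputation fills the gaps with a value that is systematically too low.

```python
import math
import random

random.seed(0)

# Hypothetical "true" incomes; in practice these are never fully observed.
true_income = [random.gauss(50.0, 10.0) for _ in range(1000)]

# Assumed MNAR mechanism: values above 60 are never reported.
observed_income = [x if x <= 60.0 else math.nan for x in true_income]

reported = [x for x in observed_income if not math.isnan(x)]
true_mean = sum(true_income) / len(true_income)
observed_mean = sum(reported) / len(reported)

print(f"true mean: {true_mean:.1f}, mean of reported values: {observed_mean:.1f}")
```

A model that sees an explicit missingness indicator can, at least in principle, learn that “missing” here means “probably high”, which imputation alone cannot express.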

