9 Comments
Noah Legall

Loving the series, Christoph. Having a PFNshap would be a great addition to the interpretation toolkit.

Christoph Molnar

Turns out that TabICL, for example, has a more efficient implementation that relies on the model's ability to handle NAs: https://github.com/soda-inria/tabicl/blob/main/src/tabicl/shap/_shap.py

I hadn't thought of that. It's an elegant solution.
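To illustrate the idea: if a model treats NaN as "feature absent", you can define the Shapley value function by masking out-of-coalition features with NaN instead of averaging over a background dataset. Below is a minimal, hypothetical sketch of that (not the TabICL implementation); `nan_tolerant_predict` is a toy stand-in for a NaN-aware model.

```python
import itertools
import math
import numpy as np

def nan_tolerant_predict(X):
    # Toy stand-in for a model (like TabICL) that treats NaN as "absent":
    # predicts the mean of the observed features, 0.0 if none are observed.
    out = np.zeros(len(X))
    for k, row in enumerate(X):
        obs = row[~np.isnan(row)]
        out[k] = obs.mean() if obs.size else 0.0
    return out

def shapley_via_nan_masking(predict, x, n_features):
    """Exact Shapley values where 'feature removed' means 'set to NaN',
    so no background dataset is needed for the value function."""
    phi = np.zeros(n_features)
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for r in range(len(others) + 1):
            for S in itertools.combinations(others, r):
                # Standard Shapley weight |S|! (n-|S|-1)! / n!
                w = (math.factorial(len(S))
                     * math.factorial(n_features - len(S) - 1)
                     / math.factorial(n_features))
                x_S = np.full((1, n_features), np.nan)
                x_S[0, list(S)] = x[list(S)]
                x_Si = x_S.copy()
                x_Si[0, i] = x[i]  # add feature i to the coalition
                phi[i] += w * (predict(x_Si)[0] - predict(x_S)[0])
    return phi

x = np.array([1.0, 2.0, 6.0])
phi = shapley_via_nan_masking(nan_tolerant_predict, x, 3)
```

The attributions sum to the full prediction minus the empty-coalition prediction (the efficiency property), and the exact enumeration would be replaced by sampling for more than a handful of features.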

Yashas Nadig

Is mechanistic interpretability at the same level of interpretability as the other methods we use for classical models, though?

Christoph Molnar

If it is anything like LLM mechanistic interpretability, I expect it to be very different: more for learning how the models work than for telling me which factors were important for a certain prediction.

Fabiano Araujo

Great TFM series, Christoph. Thanks for sharing your analyses 👏🏼

Brad Chapman

Great post and discussion. Companies like Goodfire https://www.goodfire.ai/research have made some good progress on model interpretability. Their Alzheimer's biomarkers are especially impressive. Hoping that we see more of this in the tabular space.

James

What are your thoughts on training a more interpretable surrogate model on the TabPFN probabilities? If they align well, would that get us ~95% there, do you think?
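A minimal sketch of that surrogate (distillation) idea, with a logistic regression standing in for TabPFN so the example runs on scikit-learn alone; swap in a real TabPFN classifier in practice:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeRegressor

# Stand-in "teacher": any fitted classifier with predict_proba.
# (In practice this would be TabPFN; assumed here for a runnable example.)
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
teacher = LogisticRegression(max_iter=1000).fit(X, y)
p_teacher = teacher.predict_proba(X)[:, 1]

# Interpretable surrogate: a shallow tree regressed on the teacher's
# probabilities (not on the raw labels) -- i.e. knowledge distillation.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, p_teacher)
p_surrogate = surrogate.predict(X)

# Fidelity: R^2 of the surrogate against the teacher's probabilities.
# High fidelity means explanations of the tree roughly explain the teacher.
fidelity = surrogate.score(X, p_teacher)
```

The fidelity score is the key quantity for the "~95% there" question: the surrogate's explanations are only trustworthy to the extent that it actually mimics the teacher on the data region of interest.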

Mark Herrmann

Very exciting, and this series lets me follow the development with the amount of time I can invest, so thanks 🙏

When it comes to interpretability of PFNs, I'm trying to get my head around this question: rather than trying to understand classical feature importance metrics, is there no way of learning a) which of the DAG-based synthetic datasets were predominantly exploited for the prediction, and then b) which causal structures in the associated DAGs resemble structures in my data? So kind of an alternative to causal discovery, if you will. Or does this not make any sense?

Christoph Molnar

That makes sense, especially since pre-training already involves DAGs, so it's not a stretch to directly estimate quantities from the DAGs themselves, like treatment effects; see Do-PFN and CausalFM. But interestingly, in-context learning can be seen as marginalizing over a prior distribution of possible DAGs, so you don't get a single DAG out of the predictions. Indirectly, there might be ways to attempt recovering the DAG, like through embeddings of TabPFN: https://arxiv.org/pdf/2511.07236

I'm very excited to see more research here. I expect a lot to happen at the intersection of tabular foundation models and causality.