Loving the series, Christoph. Having a PFN-SHAP would be a great addition to the interpretation toolkit.
Turns out that TabICL, for example, has a more efficient SHAP implementation that relies on TabICL's ability to handle NAs: https://github.com/soda-inria/tabicl/blob/main/src/tabicl/shap/_shap.py
I hadn't thought of that. It's an elegant solution.
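The trick in the linked TabICL implementation, as I understand it: for a model that natively handles missing values, you can evaluate a coalition of features by setting the excluded features to NaN, instead of retraining or sampling background data. A toy sketch of exact Shapley values computed that way (the `predict` function here is a stand-in that averages non-NaN values, not TabICL's actual API; exact enumeration is exponential in the number of features, so this is demo-sized only):

```python
import math
from itertools import combinations

import numpy as np

def predict(x):
    """Stand-in for a model that handles NaNs natively:
    averages the available (non-NaN) feature values."""
    vals = x[~np.isnan(x)]
    return vals.mean() if vals.size else 0.0

def shapley_values(x):
    """Exact Shapley values by toggling excluded features to NaN
    for every coalition (exponential in len(x), demo only)."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for coalition in combinations(others, size):
                weight = (math.factorial(size)
                          * math.factorial(n - size - 1)
                          / math.factorial(n))
                with_i = np.full(n, np.nan)
                idx = list(coalition) + [i]
                with_i[idx] = x[idx]
                without_i = np.full(n, np.nan)
                without_i[list(coalition)] = x[list(coalition)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

x = np.array([1.0, 2.0, 6.0])
phi = shapley_values(x)
# Efficiency axiom: contributions sum to f(x) - f(empty coalition).
assert np.isclose(phi.sum(), predict(x) - predict(np.full(3, np.nan)))
```

The same masking idea is what makes the approach cheap for a PFN: one forward pass per coalition, no retraining.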
Is mechanistic interpretability on the same level of interpretability as the methods we use for classical models, though?
If it's anything like LLM mechanistic interpretability, I expect it to be very different: more useful for learning how the models work internally than for telling you which features were important for a particular prediction.
Great TFM series, Christoph. Thanks for sharing your analyses 👏🏼
Great post and discussion. Companies like Goodfire https://www.goodfire.ai/research have made good progress on model interpretability; their Alzheimer's biomarker work is especially impressive. Hoping we see more of this in the tabular space.
What are your thoughts on training a more interpretable surrogate model on the TabPFN probabilities? If they align well, would that get us ~95% of the way there, do you think?
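For what it's worth, the surrogate idea is easy to prototype: fit the black box, then fit an interpretable model on its predicted probabilities (not the labels) and check fidelity on held-out data. A minimal sketch with scikit-learn, using a random forest as a stand-in for TabPFN (swap in `TabPFNClassifier` if you have it installed):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stand-in black box; replace with TabPFNClassifier for the real thing.
black_box = RandomForestClassifier(random_state=0).fit(X_train, y_train)
p_train = black_box.predict_proba(X_train)[:, 1]

# Interpretable surrogate: a shallow regression tree fitted on the
# black box's probabilities rather than on the original labels.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X_train, p_train)

# Fidelity: how well the surrogate mimics the black box on unseen data.
p_test = black_box.predict_proba(X_test)[:, 1]
r2 = surrogate.score(X_test, p_test)
print(f"surrogate fidelity (R^2 vs. black-box probabilities): {r2:.2f}")
```

The fidelity R² is the "~95% there" question made measurable: if it stays high on held-out data, the tree's splits are a reasonable summary of what the black box is doing; if it drops, the surrogate is explaining a different model than the one you deployed.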
Very exciting, and this series lets me follow the developments with the amount of time I can invest, so thanks 🙏
When it comes to the interpretability of PFNs, I'm trying to get my head around this question: rather than computing classical feature importance metrics, is there a way of learning (a) which of the DAG-based synthetic datasets were predominantly exploited for a prediction, and then (b) which causal structures in the associated DAGs resemble structures in my data? Kind of an alternative to causal discovery, if you will. Or does this not make any sense?
That makes sense, especially since pre-training already involves DAGs, so it's not a stretch to estimate quantities directly from the DAGs, such as treatment effects; see Do-PFN and CausalFM. Interestingly, though, in-context learning can be seen as considering a prior distribution over possible DAGs, so you don't get a single DAG out of the predictions. Indirectly, there might be ways to attempt recovering the DAG, for example through TabPFN's embeddings: https://arxiv.org/pdf/2511.07236
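To make the "prior distribution over DAGs" idea concrete, here's a toy sketch of how a PFN-style prior can generate one synthetic dataset: sample a random DAG, then sample data from a structural causal model defined on it. This is my own linear-Gaussian simplification for illustration, not TabPFN's actual prior:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_dag(n_nodes, edge_prob=0.4):
    """Random DAG as an upper-triangular adjacency matrix
    (upper-triangular => guaranteed acyclic)."""
    adj = rng.random((n_nodes, n_nodes)) < edge_prob
    return np.triu(adj, k=1)

def sample_from_scm(adj, n_samples):
    """Linear-Gaussian structural causal model on the DAG:
    each node is a weighted sum of its parents plus noise."""
    n_nodes = adj.shape[0]
    weights = rng.normal(size=adj.shape) * adj
    data = np.zeros((n_samples, n_nodes))
    for j in range(n_nodes):  # column order is a topological order
        data[:, j] = data @ weights[:, j] + rng.normal(size=n_samples)
    return data

adj = random_dag(5)
X = sample_from_scm(adj, n_samples=1000)
# One column as the target, the rest as features, as in PFN-style priors.
features, target = X[:, :-1], X[:, -1]
```

Pre-training repeats this over millions of sampled DAGs, which is exactly why the in-context predictions marginalize over DAGs instead of committing to one.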
I'm very excited to see more research here. I expect a lot to happen at the intersection of tabular foundation models and causality.