Discussion about this post

The AI Architect

Brilliant unpacking of how the posterior predictive distribution framework solves the tabular pre-training problem. The key move of integrating over latent tasks is elegant because it sidesteps the column alignment issue that makes traditional transfer learning fail for structured data. What's wild is that this approach doesn't require explicit modeling of p(φ), just a generative process that implicitly defines it. I've been working with TabPFN recently, and this context makes the synthetic prior generation strategy make way more sense.

Marco Barbero Mota

As I understand it, you can think of the posterior of φ given the data as the likelihood of the DGP that generated the training data. So it is a likelihood over causal graphs that explain the context you input into the model. What I am also thinking is that this likelihood must live on the support defined by the synthetic graphs that made it into pre-training. So what happens if the model encounters some weird training data that no graph it has seen explains?
