Am I missing something in the passage 'Datapoint attention attends to cells from other data points. So if you look at the vector representation for, say, cell x_12 (column 1, row 2)'? Don't you mean row 1, column 2?
This article is phenomenal by the way. Thank you!
Good catch! I fixed it.
Thanks for the writeup. Wanted to point out something -- it's a bit confusing to call this an "encoder-decoder" architecture. When people say that about transformers, they usually mean two transformer stacks that handle sequences differently. TabPFN is an encoder-only transformer with a *feature* encoder and an *output* decoder, not to be confused with an encoder-decoder model like T5.
Thanks for writing this, it clarifies a lot. Does the cell-based abstraction scale to huge datasets? Brilliant insights!
Is there reason to think it could completely botch certain datasets? Real-world datasets just too far out of left field compared to its pretraining universe?
It's possible, and I've anecdotally heard from other people who experienced this on their data. After all, there's no free lunch.
Indeed
The usual suspects will probably say, "No really! This time it *is* a free lunch!!" Oh well