7 Comments
K Dunn

Am I missing something in the passage "Datapoint attention attends to cells from other data points. So if you look at the vector representation for, say, cell x_12 (column 1, row 2)"? Don't you mean row 1, column 2?

This article is phenomenal by the way. Thank you!

Christoph Molnar

Good catch! I fixed it.

Inwon K

Thanks for the writeup. Wanted to point out something -- it's a bit confusing to call this an "encoder-decoder" architecture. When people say that about transformers, they are usually referring to two transformer stacks that handle sequences differently. TabPFN is an encoder-only transformer with a *feature* encoder and an *output* decoder, not to be confused with how T5 is an encoder-decoder.

Rainbow Roxy

Thanks for writing this, it clarifies a lot. Does the cell-based abstraction scale to huge datasets? Brilliant insights!

Per Proteous

Is there reason to think it could completely botch certain datasets? Real-world datasets just too far out of left field vs. its pretrained universe?

Christoph Molnar

It's possible, and I've heard it anecdotally from other people who experienced this on their own data. After all, there's no free lunch.

Per Proteous

Indeed

The usual suspects will probably say, "No, really! This time it *is* a free lunch!!" Oh well.