Understand interpretation methods via functional decomposition
This is so helpful. Please give this view for partial dependence plots (PDP) as well. Partial effect plots (PEP) are closely related; I don't know whether the intuition differs between the two. (Here's a brief explanation of the difference: https://stats.stackexchange.com/questions/371439/partial-effects-plots-vs-partial-dependence-plots-for-random-forests). PEPs seem to be more popular among people who model with a statistical mindset, whereas PDPs are more popular among those with a machine learning mindset.
This is a beautifully intuitive explanation. However, I feel you left us hanging with ALE, which is precisely what interests me the most. So, I understand that for x2, ALE gives only the f_2(x2) component and nothing else. But then what about the interactions? Am I correct to understand that
* the simple (first-order) ALE for x2 does not incorporate any part of the x2 interactions; and
* the ALE interactions x1_x2 and x2_x3 map directly to the functional decomposition components f_12(x1, x2) and f_23(x2, x3)?
(I won't comment for now on the three-way interaction, since that is not yet well-developed for ALE).
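
To make the question concrete, here is the decomposition as I understand it, written out for three features. The names ALE_2, ALE_12 and ALE_23 are my own shorthand, not notation from the post, and the relations marked with a question mark are exactly what I am asking about, not something I am asserting:

```latex
\begin{align*}
f(x) &= f_0 + f_1(x_1) + f_2(x_2) + f_3(x_3) \\
     &\quad + f_{12}(x_1, x_2) + f_{13}(x_1, x_3) + f_{23}(x_2, x_3) \\
     &\quad + f_{123}(x_1, x_2, x_3) \\[4pt]
% Question 1: does the first-order ALE isolate the main effect only?
\mathrm{ALE}_2(x_2) &\overset{?}{=} f_2(x_2) \\
% Question 2: do the second-order ALEs isolate the pure pairwise terms?
\mathrm{ALE}_{12}(x_1, x_2) &\overset{?}{=} f_{12}(x_1, x_2), \qquad
\mathrm{ALE}_{23}(x_2, x_3) \overset{?}{=} f_{23}(x_2, x_3)
\end{align*}
```

If both hold, then none of f_12, f_23 or f_123 would leak into the one-dimensional ALE curve for x2.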