First of all, this is a very nice article! I have some questions regarding the inductive biases introduced by feature engineering.
Do all types of inductive bias essentially boil down to a preference for certain functions over others? For example, let's say we are asked to predict house prices. If we fix a learner, does using features x1, x2 (e.g. house area in m^2, number of tables) instead of x3, x4 (e.g. number of rooms, number of swimming pools) introduce any inductive bias? In both cases we are just left with learning a map from R^2 to R.
The only way I can think of features introducing an inductive bias is if we view the problem as follows. There is an input space X and we are interested in finding a map X -> Y. If we use a learner with x1 and x2, then this amounts to finding a function:
X -> g(X) -> h(g(X)) -> Y
where (x1, x2) = g(X). Now we have an inductive bias: the learner only chooses h, so the overall function it picks must be a composition h(g(X)), and the constraint arises because g is fixed in advance.
I would say the choice of features can be seen as a strong inductive bias.
In this view, you define your function space as all maps R^4 -> R over (x1, x2, x3, x4); allowing the learner to use only x1 and x2 is then a strong constraint, i.e. a restrictive bias.
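A minimal sketch of this view, under assumptions that are not in the thread: the learner is linear least squares, the data is synthetic, and the target is made to depend only on x3 and x4. The helper names fit_linear and mse are hypothetical. Restricting the learner to two features is written as a fit over R^4 -> R with the other two weights fixed to zero, which makes the restriction on the function space explicit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 houses, four candidate features x1..x4 (columns 0..3).
X = rng.normal(size=(200, 4))
# Assumption for illustration only: the target depends on x3 and x4 alone.
y = 3.0 * X[:, 2] - 2.0 * X[:, 3]


def fit_linear(X, y, allowed):
    """Least-squares fit of a linear map R^4 -> R whose weights are
    forced to zero for every feature not listed in `allowed`."""
    w = np.zeros(X.shape[1])
    coef, *_ = np.linalg.lstsq(X[:, allowed], y, rcond=None)
    w[allowed] = coef
    return w


def mse(w):
    """Mean squared error of the linear predictor X @ w on the data."""
    return float(np.mean((X @ w - y) ** 2))


w12 = fit_linear(X, y, [0, 1])  # learner restricted to x1, x2
w34 = fit_linear(X, y, [2, 3])  # learner restricted to x3, x4

print("weights using x1, x2:", w12, "MSE:", mse(w12))
print("weights using x3, x4:", w34, "MSE:", mse(w34))
```

Both fits search a two-parameter family, but they are different subsets of the same space of maps R^4 -> R. In this toy setup only the second subset contains the target function, so the feature choice alone decides whether the learner can represent it.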