This is part 2 of the inductive bias series, which explores inductive biases for machine learning from theory to practice.
The 2-day universe
Imagine a universe that only exists for two days. The example is inspired by Forster (1999).
On each of the two evenings, a symbol appears in the sky: either a square ⬛️ or a Shiba Inu 🐕. The inhabitants know that this will happen and try to devise a prediction rule, since there's nothing else to do in this boring place. They list the following possibilities for the first and second evenings:
(⬛️,🐕) Square followed by Shiba Inu
(⬛️,⬛️) Square on both evenings
(🐕,🐕) Shiba Inu on both evenings
(🐕,⬛️) Shiba Inu on evening one, then the square
The inhabitants agree to wait for the first symbol and then predict the next day’s symbol. They have already devised four possible prediction rules for day two, which we can also call “hypotheses”:
1. Predict for day two the same symbol as day one
2. Predict the opposite symbol of day one
3. Predict Shiba Inu 🐕 regardless of day one
4. Predict Square ⬛️ regardless of day one
The first evening comes, and with it a Shiba Inu in the sky. The inhabitants feel blessed.
Which one is the correct hypothesis? Prediction rules 1 and 3 predict a Shiba Inu again, and rules 2 and 4 predict the square. Unfortunately, we can’t say which rule is correct, since we don't know the laws of this universe. Even if the Shiba Inu appears again on day two, we still won't know whether the universe followed rule 1, rule 3, or a completely different rule altogether.
Each prediction rule for day two has the same probability of making a correct prediction (50%) if we assume that each of the four sequences is equally likely. Giving each possibility the same "ignorance" prior leads straight to the no free lunch theorem: averaged over all possible universes, no prediction rule is better than any other. Without assumptions, we can only enumerate the hypotheses. To deliver something at the end of the day, we must narrow that hypothesis space down.
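To make the 50% claim concrete, here is a minimal Python sketch that enumerates the four sequences and the four rules and computes each rule's chance of being right. The names (`RULES`, `accuracy`, `uniform`, `biased`) are my own, and the "biased" prior is a made-up assumption that is not part of the original example: it simply supposes that universes in which the symbol repeats are three times as likely as the others.

```python
from fractions import Fraction
from itertools import product

SQUARE, SHIBA = "square", "shiba"
SYMBOLS = (SQUARE, SHIBA)


def opposite(symbol):
    """Return the other one of the two symbols."""
    return SHIBA if symbol == SQUARE else SQUARE


# The four prediction rules: each maps the day-one symbol to a day-two guess.
RULES = {
    "1: same as day one": lambda day_one: day_one,
    "2: opposite of day one": opposite,
    "3: always Shiba Inu": lambda day_one: SHIBA,
    "4: always square": lambda day_one: SQUARE,
}

# The four possible universes: every (day one, day two) pair.
SEQUENCES = list(product(SYMBOLS, repeat=2))


def accuracy(rule, prior):
    """Probability that `rule` predicts day two correctly under `prior`."""
    return sum(p for (day_one, day_two), p in prior.items()
               if rule(day_one) == day_two)


# "Ignorance" prior: all four sequences are equally likely.
uniform = {seq: Fraction(1, 4) for seq in SEQUENCES}

# Hypothetical assumption (not from the original example):
# sequences that repeat the symbol are three times as likely.
biased = {
    (day_one, day_two): Fraction(3, 8) if day_one == day_two else Fraction(1, 8)
    for day_one, day_two in SEQUENCES
}

uniform_accuracy = {name: accuracy(rule, uniform) for name, rule in RULES.items()}
biased_accuracy = {name: accuracy(rule, biased) for name, rule in RULES.items()}

for name in RULES:
    print(f"{name:24} uniform={uniform_accuracy[name]}  biased={biased_accuracy[name]}")
```

Under the uniform prior every rule comes out at 1/2. Only once we commit to an assumption about the universe, such as the invented "repeats are more likely" prior, does one rule (here rule 1) pull ahead of the others.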
Our universe and the problem of induction
Bob works in IT. Today is just another day where Bob takes the subway to work. It will be just another regular day. Bob loves routine.
But will it be just another day? Bob asks himself suddenly. Where does he get the conviction that the future will always be like the past? Of course, his experience tells him that today will likely be just like all the other office days. But for all he knows, he could be living in a simulation, and while he’s in the subway some 15-year-olds are patching an update into the system. Unnerving. Bob has discovered the problem of induction, and it is now growing as an uncomfortable feeling in the pit of his stomach, making him worried. His routine might be endangered.
When he exits the subway, will his office building even be there? His intuition screams, “Of course the building will be there, like every damn morning!” But Bob has doubts. What if the building grew legs and ran off to find a sunnier spot?
Finally, Bob shakes off these uncomfortable thoughts. Today he’ll give the machine learners in the company an extra tough time. His gut feeling tells him that they have something to do with the problem of induction. Seems like it’s time for unscheduled maintenance of the GPU servers. And indeed, it will become yet another routine day for Bob.
We have to take an inductive leap
Bob has an advantage over the fluffy aliens: our universe has existed for far more than two days. Bob has never seen a building grow legs. And although there is a slight probability that the building no longer exists when he arrives, anyone would bet a lot of money that the average building will still exist tomorrow. Our experience says that buildings exist for years and decades.
However, even in our universe, we take an “inductive leap”, because there is no guarantee of continuance. It’s an assumption. Assuming that the past predicts the future is so deeply ingrained in us that we might not see it as an untestable assumption. And, mostly, it’s a good assumption to have! How absurd and unthinkable would a world without any structure be? But to truly understand prediction by induction, we have to meditate on the fact that every prediction rests on assumptions.
Induction means learning a general rule from specific examples. The inhabitants of the 2-day universe didn’t have that luxury, but we do. We can observe our universe and try to extract rules from it.
Sometimes we should question our assumptions. Black swans may be lurking, or we might have been mistaken about our universe all along. We may be like the turkey in Nassim Taleb’s The Black Swan, which was fed throughout the year and predicted yet another day of normal life, only for that next day to turn out to be Thanksgiving.
In this part of the series, I haven’t spoken much about machine learning and inductive biases themselves, but rather about their foundations. Inductive biases are the ingredients that allow machines to take the inductive leap, as we will discuss next week. Stay tuned.