From Theory to Practice: Inductive Biases in Machine Learning
Part 1 of the inductive bias series
How do different models behave when you extrapolate one of your features? I discussed this question a while ago on Twitter/LinkedIn along with the following figure:
Depending on the ML algorithm, the models differ wildly: the linear model extrapolates to all eternity. All tree-based models remain flat, no matter how far we increase the feature. The neural network and the SVM extrapolate more wildly. k-nearest neighbors shows little jumps: as MedInc increases, neighbors are exchanged one by one for other neighbors until the prediction also flattens out.
These models behave so differently due to their inductive biases.
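If you want to get a feeling for this behavior yourself, here is a minimal sketch, assuming scikit-learn (and an internet connection to fetch the California housing data). The choice of models and the 0–50 sweep of MedInc are my own assumptions for illustration, not necessarily the exact setup behind the figure:

```python
# Minimal sketch: fit a few models on the California housing data, then
# sweep MedInc far beyond the training range and compare how each extrapolates.
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

X, y = fetch_california_housing(return_X_y=True, as_frame=True)

models = {
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "k-nearest neighbors": KNeighborsRegressor(),
}
for model in models.values():
    model.fit(X, y)

# Take one observation and replace MedInc with a grid from 0 to 50
# (the observed maximum is around 15, so most of the grid is extrapolation).
query = X.iloc[[0]]
grid = pd.concat([query] * 200, ignore_index=True)
grid["MedInc"] = np.linspace(0, 50, 200)

for name, model in models.items():
    preds = model.predict(grid)
    print(f"{name}: at MedInc=0 -> {preds[0]:.2f}, at MedInc=50 -> {preds[-1]:.2f}")
```

You should see the linear model's prediction keep climbing with MedInc, while the forest and kNN predictions plateau; plotting the predictions against the grid reproduces the shapes from the figure.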
At first, I planned to expand this social media post to a newsletter issue, but then I descended into a rabbit hole. I reemerged with a longer newsletter series on inductive biases and an enthusiasm I usually only get from too much coffee.
A better understanding of inductive biases could lead to a way of modeling that is more creative and friendlier to the data-generating process.
This newsletter series (you are reading part 1) is my attempt to catch my thoughts before they elude me again. And to manifest my newfound enthusiasm for inductive biases.
Inductive biases drive learning
If you look up definitions of inductive bias, you find stuff like:
[An inductive bias is] any basis for choosing one generalization over another, other than strict consistency with the observed training instances
The term bias is overloaded in machine learning. I will only talk about inductive bias, not statistical bias (in terms of error) or social bias (preferring one social group over another).
Inductive biases are forces that push the learning algorithm in a certain direction. These biases may exclude some functions altogether (restrictive bias) or create a preference for one form over another (preferential bias).
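To make the preferential kind concrete, here is a small sketch, assuming scikit-learn and made-up data: ridge regression can express exactly the same weighted sums as ordinary least squares, but its L2 penalty makes it prefer solutions with smaller coefficients.

```python
# Minimal sketch of a preferential bias: ridge regression can represent the
# same functions as ordinary least squares, but the L2 penalty makes it
# prefer solutions with small coefficients.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.5, size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print("OLS coefficients:  ", np.round(ols.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))  # shrunk toward zero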
A simple example of an algorithm that comes with a strong restrictive bias is our good old friend linear regression:
While sufficiently large neural networks can approximate any continuous function, linear regression models are restricted in their expressiveness: they can only express the prediction as a weighted sum of the features.
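As a minimal illustration (again assuming scikit-learn and made-up data): no matter how non-linear the target is, the fitted model's prediction is nothing but an intercept plus a weighted sum of the features.

```python
# Minimal sketch of the restrictive bias of linear regression: the prediction
# is always intercept + weighted sum of features, even for a non-linear target.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 - X[:, 2]   # a decidedly non-linear target

model = LinearRegression().fit(X, y)

x_new = rng.normal(size=(1, 3))
weighted_sum = model.intercept_ + x_new @ model.coef_
print(model.predict(x_new), weighted_sum)  # identical: that's all the model can do
```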
Convolutional neural networks (CNNs) are another example. It's not a coincidence that CNNs work well for image data; it's a consequence of their inductive biases. Here are just two of the inductive biases that are useful for image data:
Translation invariance: The convolutional layers in a CNN work like filters that recognize patterns in the image. Because the same filter slides across every position, it doesn't matter where the pattern (like cat ears) occurs in the image. Whether it's in the top left or in the middle of the image, a filter that recognizes cat ears will fire no matter what (see the sketch after this list).
Hierarchical composition of concepts: CNNs are deep neural networks that stack many convolutional layers, typically followed by dense layers. This architecture introduces an inductive bias toward learning concepts hierarchically, meaning concepts that become more and more abstract from layer to layer.
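Here is a toy sketch of the first bias, assuming PyTorch and a hand-made 2x2 "pattern" standing in for cat ears: the convolution produces a feature map that simply shifts along with the pattern (translation equivariance), and pooling over all locations turns that into a response that doesn't depend on where the pattern is (translation invariance).

```python
# Toy sketch: a convolutional filter responds to a pattern wherever it appears.
import torch
import torch.nn.functional as F

pattern = torch.tensor([[1.0, -1.0], [-1.0, 1.0]])  # stand-in for "cat ears"
kernel = pattern.view(1, 1, 2, 2)                    # a filter tuned to that pattern

def response(image):
    fmap = F.conv2d(image.view(1, 1, 8, 8), kernel)  # feature map shifts with the pattern
    return fmap.max().item()                         # pooling over locations -> invariance

top_left, center = torch.zeros(8, 8), torch.zeros(8, 8)
top_left[0:2, 0:2] = pattern
center[3:5, 3:5] = pattern

print(response(top_left), response(center))  # same activation, regardless of position
```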
The coming posts will feature many more examples of inductive biases. The list is endless, since every modeling choice introduces an inductive bias, from data augmentation to feature engineering and hyperparameter configuration. There wouldn't be any machine learning without inductive biases.
Why I got excited about inductive biases
To me, inductive biases always felt like arcane academic knowledge. More theoretical than practical.
However, my view of inductive biases has changed. The better you understand the inductive biases introduced by your modeling choices, the better a modeler you become.
As I went down the rabbit hole, my thoughts became more speculative: Could inductive biases be the right language and creative tool to make machine learning a better modeling practice? A practice that is mindful of the data-generating process, robust, and much more delicate?
I got excited about a deeper understanding of inductive biases because it reminded me of my initial motivation for this newsletter: to bring the rigor of classic statistical modeling to machine learning. The About section of Mindful Modeler says:
Machine learning has become mainstream while falling short in the silliest ways: lack of interpretability, biased and missing data, wrong conclusions, … To statisticians, these shortcomings are often unsurprising. Statisticians are relentless in their quest to understand how the data came about. They make sure that their models reflect the data-generating process and interpret models accordingly.
Thinking more deliberately about the inductive biases of our modeling choices might be a path toward models that reflect the data-generating process well.
This post is part of a longer series. It starts with a philosophical view of learning, followed by the inductive nature of machine learning and specific inductive biases, how to leverage inductive biases to become a better modeler, and why talking about inductive biases may be a step toward improving machine learning.
Buckle up. Next week we will take the inductive leap.