This is a very good post, and the diagram at the top should be early in all ML textbooks.
> These are mathematical procedures that compare a data point with a learned pattern.
NB *pattern* here is a vague term. E.g., in k-NN we compare against particular data points in the training set (the nearest ones); in Naive Bayes we instead compare against a pseudo-data point, essentially the centroid of the class. But "pattern" can mean other things in other contexts.
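A toy sketch of those two kinds of comparison (the data and query are made up, and a plain nearest-centroid comparison stands in for the per-class means mentioned above):

```python
import numpy as np

# Toy illustration of the two kinds of "pattern" (data and query are made up).
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]])  # training points
y = np.array([0, 0, 1, 1])                                      # class labels
query = np.array([0.9, 0.8])

# k-NN style: compare the query against individual training points.
nearest = np.argmin(np.linalg.norm(X - query, axis=1))
knn_label = y[nearest]                                           # 1-NN label

# Centroid style: compare the query against one pseudo-point per class.
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
centroid_label = int(np.argmin(np.linalg.norm(centroids - query, axis=1)))
```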
> The linear regression model is a dot product.
True but a bit misleading in context, as we are not really using this dot product to calculate the *similarity* between the weights and the query point.
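For concreteness, the quoted claim as a one-line sketch (all numbers are made up, not taken from the post):

```python
import numpy as np

# A linear regression prediction is the dot product of the learned weights
# with the query point, plus a bias (hypothetical values for illustration).
w = np.array([0.5, -1.2, 3.0])   # hypothetical learned weights
b = 0.1                          # hypothetical bias term
x = np.array([2.0, 0.3, 1.0])    # query point
y_hat = np.dot(w, x) + b
```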
I'm not sure I see the distinction between dot product and distance. Isn't the dot product just the cosine of the angle between the two vectors, scaled by their magnitudes? I think if we normalize that scaling away, the angle between the two vectors is a distance measure (symmetric, has a zero, obeys the triangle inequality)?
Thanks for the feedback. Euclidean distance and dot product are related to each other: https://math.stackexchange.com/a/2981910
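A quick numerical check of the identity in that linked answer, ||a - b||² = ||a||² + ||b||² - 2(a·b), with made-up example vectors:

```python
import numpy as np

# Verify ||a - b||^2 = ||a||^2 + ||b||^2 - 2 (a . b) on arbitrary vectors.
rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=3)

lhs = np.linalg.norm(a - b) ** 2
rhs = np.linalg.norm(a) ** 2 + np.linalg.norm(b) ** 2 - 2 * np.dot(a, b)
assert np.isclose(lhs, rhs)
```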
By normalizing the vectors we go from dot product to cosine similarity.
Taking (1 - cosine similarity), we get cosine distance (which is not a true distance function).
Taking sqrt(2 • cosine distance) gives us Euclidean distance of the normalized vectors.
So this only works for normalized vectors and requires a few steps. For me, that separates the two enough to treat them as distinct notions of similarity.
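Spelled out as a rough numerical check of the chain above (the example vectors are arbitrary):

```python
import numpy as np

# Check: for normalized vectors, sqrt(2 * cosine distance) equals Euclidean distance.
rng = np.random.default_rng(1)
a, b = rng.normal(size=5), rng.normal(size=5)

# Normalize, so the dot product becomes cosine similarity.
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
cos_sim = np.dot(a_n, b_n)

cos_dist = 1 - cos_sim                    # cosine distance (not a true metric)
euclid = np.linalg.norm(a_n - b_n)        # Euclidean distance of the normalized vectors
assert np.isclose(np.sqrt(2 * cos_dist), euclid)
```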
Your feedback convinced me I should at least add a section to the book about the connection between the two.
Fascinating idea! What would be the statistical analogue of the cross product? Maybe PCA.