4 Comments
James McDermott

This is a very good post, and the diagram at the top should be early in all ML textbooks.

> These are mathematical procedures that compare a data point with a learned pattern.

NB: *pattern* here is a vague term. E.g. in k-NN, we compare against particular data points in the training set, the nearest ones. In Gaussian Naive Bayes, by contrast, we compare against a pseudo-data point per class: the vector of per-class feature means, i.e. the class centroid. But "pattern" can mean other things in other contexts.
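
A minimal NumPy sketch of that contrast (the toy data is made up for illustration; the centroid comparison is roughly what Gaussian Naive Bayes with equal variances and priors boils down to):

```python
import numpy as np

# Toy 2-D training data, two classes (values made up for illustration)
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
y = np.array([0, 0, 1, 1])
query = np.array([0.5, 0.2])

# k-NN (k=1): the "pattern" is the stored training points themselves;
# the query is compared against the nearest one
dists = np.linalg.norm(X - query, axis=1)
print("1-NN class:", y[np.argmin(dists)])

# Centroid view: the "pattern" is one pseudo-data point per class
classes = np.unique(y)
centroids = np.array([X[y == c].mean(axis=0) for c in classes])
cdists = np.linalg.norm(centroids - query, axis=1)
print("nearest-centroid class:", classes[np.argmin(cdists)])
```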

> The linear regression model is a dot product.

True but a bit misleading in context, as we are not really using this dot product to calculate the *similarity* between the weights and the query point.

Ted Lorenzen

I'm not sure I see the distinction between dot product and distance. Isn't the dot product just the unscaled angular separation of the two vectors? I think that if we do scale it, the angle between the two vectors will be a distance measure (symmetric, has a zero, obeys the triangle inequality).

Christoph Molnar

Thanks for the feedback. Euclidean distance and dot product are related to each other: https://math.stackexchange.com/a/2981910

By normalizing the vectors we go from dot product to cosine similarity.

Taking (1 - cosine similarity), we get cosine distance (which is not a true distance function, since it violates the triangle inequality).

Taking sqrt(2 · cosine distance) gives us the Euclidean distance of the normalized vectors, since for unit vectors ||a − b||^2 = 2 − 2·(cosine similarity) = 2·(cosine distance).

So the equivalence only holds for normalized vectors and takes a few steps. For me, that separates the two enough to treat them as distinct notions of similarity.
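
A quick NumPy check of that chain (a minimal sketch; the example vectors are made up):

```python
import numpy as np

# Two arbitrary example vectors (made up for illustration)
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, -1.0, 0.5])

# Normalize: the dot product of unit vectors is the cosine similarity
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)

cos_sim = a_n @ b_n
cos_dist = 1.0 - cos_sim          # cosine distance (not a true metric)
euclid = np.sqrt(2.0 * cos_dist)  # sqrt(2 * cosine distance)

# Matches the Euclidean distance of the normalized vectors directly
print(np.isclose(euclid, np.linalg.norm(a_n - b_n)))  # True
```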

Your feedback convinced me I should at least add a section to the book about the connection between the two.

Matt Gruner

Fascinating idea! What would be the statistical analogue of the cross product? Maybe PCA.
