This is a very good post, and the diagram at the top should be early in all ML textbooks.
> These are mathematical procedures that compare a data point with a learned pattern.
NB *pattern* here is a vague term. E.g., in k-NN we compare against particular data points in the training set (the nearest ones); in Naive Bayes we instead compare against a pseudo-data point, essentially the centroid of the class. But "pattern" can mean other things in other contexts.
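A toy sketch of those two kinds of comparison (the data and query are made up, and a plain nearest-centroid comparison stands in for the per-class means mentioned above):

```python
import numpy as np

# Toy illustration of the two kinds of "pattern" (data and query are made up).
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]])  # training points
y = np.array([0, 0, 1, 1])                                      # class labels
query = np.array([0.9, 0.8])

# k-NN style: compare the query against individual training points.
nearest = np.argmin(np.linalg.norm(X - query, axis=1))
knn_label = y[nearest]                                           # 1-NN label

# Centroid style: compare the query against one pseudo-point per class.
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
centroid_label = int(np.argmin(np.linalg.norm(centroids - query, axis=1)))
```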
> The linear regression model is a dot product.
True but a bit misleading in context, as we are not really using this dot product to calculate the *similarity* between the weights and the query point.
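For concreteness, the quoted claim as a one-line sketch (all numbers are made up, not taken from the post):

```python
import numpy as np

# A linear regression prediction is the dot product of the learned weights
# with the query point, plus a bias (hypothetical values for illustration).
w = np.array([0.5, -1.2, 3.0])   # hypothetical learned weights
b = 0.1                          # hypothetical bias term
x = np.array([2.0, 0.3, 1.0])    # query point
y_hat = np.dot(w, x) + b
```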
I'm not sure I see the distinction between dot product and distance. Isn't the dot product just the cosine of the angle between the two vectors, scaled by their magnitudes? I think if we normalize that scaling away, the angle between the two vectors is a distance measure (symmetric, has a zero, obeys the triangle inequality)?
Thanks for the feedback. Euclidean distance and dot product are related to each other: https://math.stackexchange.com/a/2981910
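A quick numerical check of the identity in that linked answer, ||a - b||² = ||a||² + ||b||² - 2(a·b), with made-up example vectors:

```python
import numpy as np

# Verify ||a - b||^2 = ||a||^2 + ||b||^2 - 2 (a . b) on arbitrary vectors.
rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=3)

lhs = np.linalg.norm(a - b) ** 2
rhs = np.linalg.norm(a) ** 2 + np.linalg.norm(b) ** 2 - 2 * np.dot(a, b)
assert np.isclose(lhs, rhs)
```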
By normalizing the vectors we go from dot product to cosine similarity.
Taking (1 - cosine similarity), we get cosine distance (which is not a true distance function).
Taking sqrt(2 • cosine distance) gives us Euclidean distance of the normalized vectors.
So this only works for normalized vectors and requires a few steps. For me, that separates the two enough to treat them as distinct notions of similarity.
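Spelled out as a rough numerical check of the chain above (the example vectors are arbitrary):

```python
import numpy as np

# Check: for normalized vectors, sqrt(2 * cosine distance) equals Euclidean distance.
rng = np.random.default_rng(1)
a, b = rng.normal(size=5), rng.normal(size=5)

# Normalize, so the dot product becomes cosine similarity.
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
cos_sim = np.dot(a_n, b_n)

cos_dist = 1 - cos_sim                    # cosine distance (not a true metric)
euclid = np.linalg.norm(a_n - b_n)        # Euclidean distance of the normalized vectors
assert np.isclose(np.sqrt(2 * cos_dist), euclid)
```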
Your feedback convinced me I should at least add a section to the book about the connection between the two.
Fascinating idea! What would be the statistical analogue of the cross product? Maybe PCA.