John Shea: Nice post, but the name "cosine similarity" is misleading and unhelpful to your readers. You are simply using a correlation coefficient, which in many applications is defined as <x,y>/(||x|| ||y||), where <x,y> denotes the inner product (the dot product for vectors) and ||x|| denotes the norm of x.
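
For readers who want to check the formula <x,y>/(||x|| ||y||) numerically, here is a minimal Python sketch. The function names and the sample vectors are illustrative assumptions, not taken from the post. It also shows one caveat worth keeping in mind: the expression matches the usual (Pearson) correlation coefficient only when the vectors are mean-centered first; without centering it is sometimes called the uncentered correlation.

```python
import math

def normalized_inner_product(x, y):
    """Return <x, y> / (||x|| ||y||) for two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def centered(v):
    """Subtract the mean from every component (hypothetical helper)."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]

# Sample vectors chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]

# Formula applied to the raw vectors.
raw = normalized_inner_product(x, y)

# Same formula after mean-centering: this is the Pearson correlation.
pearson = normalized_inner_product(centered(x), centered(y))

print(raw, pearson)
```

With these sample vectors the two values differ, so whether the formula is read as a "correlation" depends on whether the data have zero mean.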

John Shea: Thanks for your reply. However, I still hold that there is no purpose in mentioning cosine, other than that it makes the post sound more exciting. Just because a student wrote that in a paper that is on the web does not make it a more accurate description than calling it a correlation coefficient. In fact, in some research papers, people tend to write things in a way that makes them sound more novel than they are, especially if a student is doing the writing. I think introducing the idea of "cosine" is completely unhelpful, especially since it is too hard to visualize what an angle would mean in more than 3 dimensions. On the other hand, the definition and idea of correlation are simple and clear, and anyone with a good knowledge of the Fourier transform, statistics, etc., will immediately understand the implications. In 2 dimensions, it is easy to visualize vectors and the angles between them, but it is only the fact that the correlation equals the cosine of the angle between the two vectors that allows you to mention cosine at all.
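
The 2-dimensional case mentioned above can be checked directly. The sketch below is only an illustration under assumed inputs: the vectors `u` and `v` and the function name are hypothetical. It measures the angle between two plane vectors with `math.atan2` and compares its cosine with <u,v>/(||u|| ||v||).

```python
import math

def normalized_inner_product(x, y):
    """Return <x, y> / (||x|| ||y||) for two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Two vectors in the plane, chosen only for illustration.
u = (3.0, 0.0)
v = (2.0, 2.0)

# Angle between them, measured from each vector's direction angle.
angle = math.atan2(v[1], v[0]) - math.atan2(u[1], u[0])

cos_of_angle = math.cos(angle)
ratio = normalized_inner_product(u, v)

# The two quantities agree up to floating-point rounding.
print(cos_of_angle, ratio, math.isclose(cos_of_angle, ratio))
```

In higher dimensions there is no single direction angle to measure this way, so the ratio itself is what gets computed; the code above only covers the planar case.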