How does one determine similarity between people online? (Quora, Japanese translation)




How do you quantify similarities between people who use shopping sites, social networks, Q&A sites, etc.? (E.g., how do Netflix, Amazon, PayPal, Twitter, LinkedIn, Facebook, and Quora calculate user similarity or assign users to groups?)





Here are some educated guesses about what serves as a basis for their work:

(1) Use vector models. For each user, look at the text they wrote.

Build a tf-idf vector from it. See for details.

Then compute the cosine similarity between the vectors of any two users.
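A minimal sketch of the tf-idf plus cosine approach. Assumptions: tokenization is plain whitespace splitting, tf is the raw term count, idf is log(N / df) with each user's full text treated as one document, and the helper names and sample texts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(texts):
    """Build a sparse tf-idf vector (term -> weight) for each user's text."""
    tokens = {user: tokenize(text) for user, text in texts.items()}
    n_users = len(tokens)
    # Document frequency: how many users' texts contain each term.
    df = Counter()
    for words in tokens.values():
        df.update(set(words))
    return {
        user: {term: count * math.log(n_users / df[term])
               for term, count in Counter(words).items()}
        for user, words in tokens.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical user texts.
texts = {
    "alice": "i love machine learning and python",
    "bob": "python and machine learning are fun",
    "carol": "my cat loves to sleep all day",
}
vecs = tfidf_vectors(texts)
print(cosine(vecs["alice"], vecs["bob"]))    # shares four terms, so > 0
print(cosine(vecs["alice"], vecs["carol"]))  # no shared terms: 0.0
```

Terms that every user writes get an idf of zero, so very common words drop out of the comparison automatically.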

You can supplement this with fancier techniques such as latent semantic indexing (LSI).
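LSI can be sketched as a truncated SVD of the user-term matrix A: each user is represented by its row of U_k Σ_k, and users are compared by cosine in that k-dimensional space. To stay dependency-free, this sketch (an assumption, not how a production system would do it) obtains U_k and Σ_k from the top eigenpairs of A A^T using power iteration with deflation, since those eigenvalues are the squared singular values. In practice one would call a linear-algebra library's SVD instead. The term vectors and helper names are hypothetical.

```python
import math
import random


def gram_matrix(vectors, users):
    """A A^T for the user-term matrix A: entry (i, j) is the dot product of users i and j."""
    return [[sum(w * vectors[b].get(t, 0.0) for t, w in vectors[a].items())
             for b in users]
            for a in users]


def top_eigenpairs(matrix, k, iterations=200, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    rng = random.Random(seed)
    pairs = []
    for _ in range(min(k, n)):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T M v estimates the eigenvalue.
        lam = sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Deflation: remove this eigen-direction so the next pass finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsi_coordinates(vectors, k):
    """Map each user to k latent dimensions: the rows of U_k Sigma_k."""
    users = sorted(vectors)
    pairs = top_eigenpairs(gram_matrix(vectors, users), k)
    # Eigenvalues of A A^T are squared singular values, so sigma = sqrt(lambda).
    return {user: [v[i] * math.sqrt(max(lam, 0.0)) for lam, v in pairs]
            for i, user in enumerate(users)}


def dense_cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


# Hypothetical term-count vectors for four users.
vectors = {
    "alice": {"python": 2, "code": 1},
    "bob": {"python": 1, "code": 2},
    "carol": {"cat": 2, "dog": 1},
    "dave": {"cat": 1, "dog": 2},
}
coords = lsi_coordinates(vectors, k=2)
print(dense_cosine(coords["alice"], coords["bob"]))
print(dense_cosine(coords["alice"], coords["carol"]))
```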

If you want to implement a text-based similarity measure, you can look at the package in Lucene (open source).

(2) Networking sites such as Facebook might simply look at how many friends you have in common and compute something like the Jaccard index.
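The Jaccard index of two friend sets is the number of shared friends divided by the number of distinct friends across both lists. A minimal sketch with hypothetical friend lists; returning 0.0 for two empty sets is a convention assumed here.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical friend lists.
friends = {
    "alice": {"bob", "carol", "dave", "erin"},
    "frank": {"bob", "carol", "gina"},
}
print(jaccard(friends["alice"], friends["frank"]))  # 2 shared / 5 total = 0.4
```

Dividing by the union, rather than just counting shared friends, keeps users with very large friend lists from looking similar to everyone.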




(1) Use a vector model. For each user, build a TF-IDF vector from the text that user wrote.





(Related keywords: LSI / Jaccard Index)





A good place to start is the book Programming Collective Intelligence.

Whole books have been written on this area, so a few words cannot give anywhere near a full answer.

That said, if you use the collaborative filtering approach, find how many actions (e.g., products purchased, likes, friends) two people have in common. To make this work well, you have to assign weights to the actions they share. The correlation value between two people is then the sum of the weights of those shared actions. The tricky part is assigning good weights: a weight has to be relative to the two people, to the cluster the two people belong to, and to the whole population.
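The answer does not say how the weights should be chosen. This sketch assumes one simple choice, an inverse-popularity weight log(N / n_a), where N is the number of users and n_a the number who performed action a, so rare shared actions count more than common ones. It covers only the "whole population" part; the per-pair and per-cluster adjustments the answer mentions are left out. Action names and helpers are hypothetical.

```python
import math
from collections import Counter


def action_weights(user_actions):
    """Hypothetical weighting: log(N / n_a), so rarer actions weigh more."""
    n_users = len(user_actions)
    popularity = Counter()
    for actions in user_actions.values():
        popularity.update(set(actions))
    return {action: math.log(n_users / n) for action, n in popularity.items()}


def common_action_score(u, v, user_actions, weights):
    """Sum of the weights of the actions that users u and v have in common."""
    shared = set(user_actions[u]) & set(user_actions[v])
    return sum(weights[action] for action in shared)


# Hypothetical purchases and likes.
user_actions = {
    "alice": {"buy:book_123", "like:jazz", "buy:phone_case"},
    "bob": {"buy:book_123", "like:jazz"},
    "carol": {"buy:phone_case", "like:pop"},
    "dave": {"buy:phone_case", "like:jazz"},
}
weights = action_weights(user_actions)
print(common_action_score("alice", "bob", user_actions, weights))
print(common_action_score("alice", "dave", user_actions, weights))
```

Alice shares two actions with both Bob and Dave, but the score ranks Bob higher because the book purchase is rarer in this population than the phone case.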








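However, the tricky part is assigning appropriate weights to the actions. The weights need to be relative to the two people, to the cluster of the two people, and to the whole population.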



Recommendation mining is very useful for determining similarity between people who use shopping sites, social networks, and Q&A sites, based on their likes, choices, and online behavior.
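No particular measure is named here. One common choice when the behavior takes the form of explicit ratings is the Pearson correlation over the items both users have rated, which ignores the fact that one user may simply rate everything higher. A minimal sketch with hypothetical movie ratings; returning 0.0 when there are fewer than two shared items or no variance is an assumption of this sketch.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation of two users' ratings over the items both have rated."""
    shared = ratings_a.keys() & ratings_b.keys()
    n = len(shared)
    if n < 2:
        return 0.0  # assumption: too little overlap counts as "no evidence"
    xs = [ratings_a[item] for item in shared]
    ys = [ratings_b[item] for item in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant rater has no defined correlation
    return cov / math.sqrt(var_x * var_y)


# Hypothetical 0-5 movie ratings.
alice = {"movie_a": 5, "movie_b": 3, "movie_c": 1}
bob = {"movie_a": 4, "movie_b": 2, "movie_c": 0, "movie_d": 5}
carol = {"movie_a": 1, "movie_b": 3, "movie_c": 5}
print(pearson(alice, bob))    # 1.0: same taste, bob just rates lower
print(pearson(alice, carol))  # -1.0: opposite taste
```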

Some popular recommendation engines are:


You can also measure the similarity between texts posted by authors on Q&A sites, social networks, blogs, etc. For that, you can try the following techniques:

  • Jaccard Index
  • SimRank
  • Vector space model with Cosine Similarity
  • Sorensen similarity index
  • BM25 Okapi model
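Two of the listed text measures are short enough to sketch directly. The Sørensen index compares sets of terms like Jaccard but counts the overlap twice: 2|A ∩ B| / (|A| + |B|). Okapi BM25 is a ranking function rather than a symmetric similarity, so this sketch, as an assumption, treats one user's text as the query and scores other users' texts against it. The idf form log(1 + (N - n + 0.5) / (n + 0.5)) and the defaults k1 = 1.2 and b = 0.75 are common choices, not the only ones; whitespace tokenization and all names are hypothetical.

```python
import math
from collections import Counter


def sorensen(a, b):
    """Sørensen (Dice) index 2|A ∩ B| / (|A| + |B|) of two sets of terms."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of every tokenized document in `docs` for a tokenized query."""
    if not docs:
        return {}
    n_docs = len(docs)
    avgdl = sum(len(tokens) for tokens in docs.values()) / n_docs
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    scores = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in set(query):  # each distinct query term counted once
            f = tf[term]
            if f == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = f + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * f * (k1 + 1) / norm
        scores[name] = score
    return scores


# Hypothetical posts, tokenized by splitting on whitespace.
alice = "machine learning with python".split()
posts = {
    "bob": "python tips for machine learning".split(),
    "carol": "photos of my cat sleeping".split(),
}
print(sorensen(set(alice), set(posts["bob"])))  # 2*3 / (4 + 5)
print(bm25_scores(alice, posts))
```

Unlike the set-based Sørensen index, BM25 rewards repeated terms with diminishing returns (controlled by k1) and penalizes long texts relative to the average length (controlled by b).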







  • Jaccard coefficient
  • SimRank
  • Vector space model with cosine similarity
  • Sorensen similarity index
  • BM25 Okapi model

(Related keywords: Jaccard Index / SimRank / Vector space model with Cosine Similarity / Sorensen similarity index / BM25 Okapi model)
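SimRank is the one graph-based measure in the list: two nodes are similar if their neighbors are similar, s(a, b) = C / (|I(a)| |I(b)|) · Σ s(i, j) over in-neighbors i of a and j of b, with s(a, a) = 1. A naive iterative sketch follows. As assumptions, it treats a hypothetical user-item graph as undirected (each node's neighbor set stands in for its in-neighbors) and uses C = 0.8 with a fixed number of iterations. It is quadratic in the number of nodes, so it only suits small graphs.

```python
from itertools import product


def simrank(neighbors, c=0.8, iterations=10):
    """Naive iterative SimRank; `neighbors` maps each node to its neighbor set."""
    nodes = list(neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, nodes):
            na, nb = neighbors[a], neighbors[b]
            if a == b:
                new[(a, b)] = 1.0
            elif not na or not nb:
                new[(a, b)] = 0.0  # a node with no neighbors is similar only to itself
            else:
                total = sum(sim[(i, j)] for i in na for j in nb)
                new[(a, b)] = c * total / (len(na) * len(nb))
        sim = new
    return sim


# Hypothetical user-item graph, treated as undirected.
neighbors = {
    "alice": {"book", "jazz"},
    "bob": {"book", "jazz"},
    "carol": {"cat_food"},
    "book": {"alice", "bob"},
    "jazz": {"alice", "bob"},
    "cat_food": {"carol"},
}
sim = simrank(neighbors)
print(sim[("alice", "bob")])    # approaches 2/3 as iterations grow
print(sim[("alice", "carol")])  # 0.0: the two users are in disconnected parts of the graph
```

Because users and items are scored together, the same run also yields item-item similarities (here, "book" and "jazz" become similar because the same users like both).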