How to compare which similarity measurement gives better results?
n01399670
New Altair Community Member
For my text document data sets, I have run 'Data to Similarity' using Cosine, Jaccard, Dice, and other similarity measures. My goal is to determine which similarity measure gives better results for my input data set. How do I do the comparative check?
Best Answer
I don't think there is a simple answer to this question. Each of these measures quantifies similarity in a slightly different way. You can read about the exact calculations on Wikipedia or other sites. You need to select the one that corresponds most closely to the way you are thinking about similarity between your texts. In a supervised learning problem you can make this parameter subject to optimization and determine the "best" answer based on overall model performance, but if you are simply computing similarity for its own sake, then there is no way for RapidMiner to tell you which one is the "best" for that comparison.
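To make the supervised idea a bit more concrete, here is a rough Python sketch (plain Python, not a RapidMiner process) that treats the similarity measure as a parameter and scores each choice by leave-one-out 1-nearest-neighbour accuracy. The helper names (`tokens`, `loo_1nn_accuracy`, `MEASURES`), the whitespace tokenization, and the tiny labeled corpus are all made up for illustration; it assumes you have class labels for your documents, and your own data and model would replace them.

```python
from collections import Counter
from math import sqrt


def tokens(text):
    """Naive whitespace tokenizer (an assumption; use your own preprocessing)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity on raw term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard similarity on token sets: |A & B| / |A | B|."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice(a, b):
    """Dice similarity on token sets: 2 * |A & B| / (|A| + |B|)."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


MEASURES = {"cosine": cosine, "jaccard": jaccard, "dice": dice}


def loo_1nn_accuracy(docs, labels, sim):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier under `sim`."""
    toks = [tokens(d) for d in docs]
    correct = 0
    for i in range(len(docs)):
        # Most similar other document according to the chosen measure.
        best = max((j for j in range(len(docs)) if j != i),
                   key=lambda j: sim(toks[i], toks[j]))
        correct += labels[best] == labels[i]
    return correct / len(docs)


# Hypothetical toy corpus and labels, for illustration only.
docs = [
    "the striker scored a late goal in the match",
    "the keeper saved a penalty in the match",
    "the team won the league after a late goal",
    "the central bank raised the interest rate",
    "the bank cut the interest rate again",
    "markets fell after the rate decision",
]
labels = ["sport", "sport", "sport", "finance", "finance", "finance"]

scores = {name: loo_1nn_accuracy(docs, labels, f) for name, f in MEASURES.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

One thing to keep in mind: on plain token sets, Dice is a monotonic function of Jaccard (D = 2J / (1 + J)), so the two should rank neighbours the same way and only differ in the absolute values. In a ranking-based check like this one, the more meaningful comparison is usually between the set-based measures and cosine on weighted vectors.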
Answers
Thanks for clarifying!!