How to compare which similarity measurement gives better results?
n01399670
For my text document data sets, I have run 'Data to Similarity' using the Cosine, Jaccard, Dice, and other similarity measures. My goal is to determine which similarity measure gives better results for my input data set. How do I run a comparative check?
Accepted answers
Telcontar120
I don't think there is a simple answer to this question. Each of these measures quantifies similarity in a slightly different way. You can read about the exact calculations on Wikipedia or other sites. You need to select the one that corresponds most closely to the way you are thinking about similarity between your texts. In a supervised learning problem you can make this parameter subject to optimization and determine the "best" answer based on overall model performance, but if you are simply computing similarity for its own sake, then there is no way for RapidMiner to tell you which one is the "best" for that comparison.
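To make the supervised route concrete, here is a minimal sketch in plain Python rather than a RapidMiner process. It assumes you have class labels for your documents, which the original question does not say. The toy documents, the labels, the whitespace tokenizer, and the helper name `loo_accuracy` are all hypothetical and only there for illustration. Each measure is scored by leave-one-out 1-nearest-neighbor accuracy, which is one possible stand-in for "overall model performance".

```python
import math
from collections import Counter


def tokens(text):
    # Naive whitespace tokenizer (an assumption; real text needs better preprocessing).
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity on raw term-frequency vectors.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def jaccard(a, b):
    # Jaccard similarity on token sets: |A & B| / |A | B|.
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice(a, b):
    # Dice coefficient on token sets: 2 |A & B| / (|A| + |B|).
    sa, sb = set(tokens(a)), set(tokens(b))
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def loo_accuracy(docs, labels, sim):
    # Hypothetical helper: for each document, find its most similar other
    # document and check whether the two share a label.
    hits = 0
    for i, doc in enumerate(docs):
        others = (j for j in range(len(docs)) if j != i)
        best = max(others, key=lambda j: sim(doc, docs[j]))
        hits += labels[best] == labels[i]
    return hits / len(docs)


# Toy labeled data, made up for this sketch.
docs = [
    "the team won the football match",
    "the striker scored a late goal in the match",
    "bake the bread in a hot oven",
    "knead the dough then bake the bread",
    "the coach praised the team after the goal",
    "mix flour and water to make the dough",
]
labels = ["sports", "sports", "cooking", "cooking", "sports", "cooking"]

MEASURES = {"cosine": cosine, "jaccard": jaccard, "dice": dice}
scores = {name: loo_accuracy(docs, labels, fn) for name, fn in MEASURES.items()}

for name, acc in scores.items():
    print(f"{name}: leave-one-out 1-NN accuracy = {acc:.2f}")
```

One thing this kind of check makes visible: on token sets, Dice is a monotonic function of Jaccard (D = 2J / (1 + J)), so the two always rank neighbors identically and cannot be separated by a nearest-neighbor comparison. Any difference between them only shows up when the raw similarity values themselves feed into the model.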
All comments
n01399670
Thanks for clarifying!!