Text Similarity Detection
statspro
New Altair Community Member
Hi, I am a beginner with RapidMiner. I want to use the Data to Similarity operator to check text similarity, but my problem is a little different. I have an Excel file with two columns (UserID and Review), and I want to check the text similarity among reviews that share the same UserID.
For example, I have UserIDs A1, B1, C1, A1, B1, A1, B1, A1, etc. I want to check the text similarity of the reviews given by A1 only.
UserID Review
A1 I love McDonald
B1 McDonald is bad
C1 I love McDonald in Newyork
A1 I love McDonald
B1 abc love McDonald
A1 I love McDonald when I was in Paris.
B1 My Experience of McDonald is Pathetic
A1 I love it
I would appreciate it if anyone could help me with this.
Thanks,
Arun
Answers
Hi Arun,
Use the Loop Values operator to loop over the distinct UserIDs. Inside the loop, use Filter Examples to keep only the examples of the current user, then apply Data to Similarity.
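Outside RapidMiner, the same loop-and-filter idea can be sketched in plain Python. This is only an illustration of the logic, not RapidMiner code: the function names `group_reviews_by_user` and `review_pairs_per_user` are made up for this example, and the rows are the sample data from the question.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows from the question: (UserID, Review)
rows = [
    ("A1", "I love McDonald"),
    ("B1", "McDonald is bad"),
    ("C1", "I love McDonald in Newyork"),
    ("A1", "I love McDonald"),
    ("B1", "abc love McDonald"),
    ("A1", "I love McDonald when I was in Paris."),
    ("B1", "My Experience of McDonald is Pathetic"),
    ("A1", "I love it"),
]


def group_reviews_by_user(rows):
    """Collect the reviews of each user ID.

    Hypothetical helper: plays the role of Loop Values + Filter Examples.
    """
    grouped = defaultdict(list)
    for user_id, review in rows:
        grouped[user_id].append(review)
    return dict(grouped)


def review_pairs_per_user(rows):
    """For each user, list every pair of that user's own reviews.

    These pairs are what a similarity measure would then be applied to.
    """
    return {
        user_id: list(combinations(reviews, 2))
        for user_id, reviews in group_reviews_by_user(rows).items()
    }


pairs = review_pairs_per_user(rows)
for user_id, user_pairs in pairs.items():
    print(user_id, len(user_pairs), "pairs to compare")
```

Each user's reviews are only ever compared with that same user's other reviews, which is what the filter inside the loop achieves. A user with a single review, such as C1, yields no pairs.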
Best regards,
Marius
Hello,
Which similarity measure should be used in the Data to Similarity operator in the case mentioned above?
Thank you in advance.
For text data, CosineSimilarity (under Numerical Measures) is often a good choice.
Remember to first convert the texts to TF-IDF values, or another suitable numerical representation, using the Process Documents operators from the Text Processing extension. Otherwise RapidMiner cannot "understand" the raw, unprepared texts.
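To make the TF-IDF plus cosine idea concrete, here is a minimal standard-library Python sketch. It is an assumption-laden illustration, not what RapidMiner does internally: the tokenizer, the smoothed IDF formula, and the names `tokenize`, `tfidf_vectors`, and `cosine_similarity` are all choices made for this example, and RapidMiner's own weighting may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed scheme)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """Turn each text into a sparse {term: weight} TF-IDF vector.

    Uses a smoothed IDF, log((1 + n) / (1 + df)) + 1, so that terms
    occurring in every text still get a non-zero weight (an assumption).
    """
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())
    idf = {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
    return [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Reviews of user A1 from the question
a1_reviews = [
    "I love McDonald",
    "I love McDonald",
    "I love McDonald when I was in Paris.",
    "I love it",
]

vectors = tfidf_vectors(a1_reviews)
sim_identical = cosine_similarity(vectors[0], vectors[1])  # same text twice
sim_partial = cosine_similarity(vectors[0], vectors[3])    # shares "i" and "love"
print(sim_identical, sim_partial)
```

Identical reviews score (up to floating-point rounding) 1, while reviews that share only some words score somewhere between 0 and 1. Without the conversion to numeric vectors there is nothing for the cosine measure to operate on, which is why the preprocessing step matters.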
Best regards,
Marius