Text Similarity Detection

statspro
statspro New Altair Community Member
edited November 5 in Community Q&A
Hi, I am a beginner with RapidMiner. I want to use the Data to Similarity operator to check text similarity, but my problem is a little different. I have an Excel file with two columns (UserID and Review), and I want to check the text similarity only among reviews that share the same UserID.
For example, I have the UserIDs A1, B1, C1, A1, B1, A1, B1, A1, etc. Now I want to check the text similarity of the reviews given by A1 only.

UserID      Review
A1            I love McDonald
B1            McDonald is bad
C1            I love McDonald in Newyork
A1            I love McDonald
B1            abc love McDonald
A1            I love McDonald when I was in Paris.
B1            My Experience of McDonald is Pathetic
A1            I love it

I would appreciate it if anyone could help me with this.

Thanks,
Arun

Answers

  • MariusHelf
    MariusHelf New Altair Community Member
    Hi Arun,

    Use the Loop Values operator to loop over the different UserIDs. Inside the loop, use Filter Examples to keep only the examples of the current user, then apply Data to Similarity.

    Best regards,
    Marius
  • veve
    veve New Altair Community Member
    Hello,

    Which similarity measure should be used in the "Data to Similarity" operator in the case mentioned above?

    Thank you in advance.
  • MariusHelf
    MariusHelf New Altair Community Member
    For text data, CosineSimilarity (in Numerical Measures) is often a good choice.

    Please remember to convert the texts to TF-IDF values (or another suitable numerical representation) using the Process Documents operators from the Text Processing extension. Otherwise RapidMiner cannot "understand" the raw, unprepared texts.

    Best regards,
    Marius
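
For anyone who wants to sanity-check the idea outside RapidMiner, the steps suggested in the answers (loop over each UserID, keep only that user's reviews, turn them into TF-IDF vectors, then compare them with cosine similarity) can be sketched in plain Python. This is only an illustration, not what the RapidMiner operators do internally. The function names, the simple tokenizer, the smoothed IDF formula, and the choice to compute TF-IDF weights separately per user are all assumptions made for this sketch. The rows are the sample data from the question.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations

# Sample rows from the question: (UserID, Review)
ROWS = [
    ("A1", "I love McDonald"),
    ("B1", "McDonald is bad"),
    ("C1", "I love McDonald in Newyork"),
    ("A1", "I love McDonald"),
    ("B1", "abc love McDonald"),
    ("A1", "I love McDonald when I was in Paris."),
    ("B1", "My Experience of McDonald is Pathetic"),
    ("A1", "I love it"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption), so terms present in every document keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_by_user(rows):
    """Group reviews by UserID and compare every pair of reviews within each group."""
    groups = defaultdict(list)
    for user, review in rows:
        groups[user].append(review)

    result = {}
    for user, reviews in groups.items():
        vectors = tfidf_vectors(reviews)  # weights computed per user (assumption)
        result[user] = {
            (i, j): cosine(vectors[i], vectors[j])
            for i, j in combinations(range(len(reviews)), 2)
        }
    return result


if __name__ == "__main__":
    for user, pairs in similarity_by_user(ROWS).items():
        for (i, j), score in pairs.items():
            print(f"{user}: review {i} vs review {j} -> {score:.3f}")
```

The pair indices refer to the position of each review within that user's own list, so (0, 1) for A1 compares that user's first and second review. A user with a single review, such as C1, simply produces no pairs. Computing the TF-IDF weights over the whole file before grouping would be an equally reasonable variant and would give somewhat different scores.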