Text Mining - Document Similarity (word position)
Hello,
I'm looking for a way to measure the similarity between documents where the position of the words is relevant.
I've already implemented the example with the "Data Similarity" operator (CosineSimilarity), as in:
https://community.rapidminer.com/t5/RapidMiner-Studio-Forum/How-to-compare-similarity-of-large-number-of-documents/td-p/16002
But I need to take into account the order/position of the words, not only their frequency or occurrence.
For example:
Example 1: A B C D E F G
Example 2: A X B D Y F G
Example 3: G F E A B C D
Examples 1 and 2 should be more similar than Examples 1 and 3. Although Example 3 has exactly the same words as Example 1 (CosineSimilarity=1), they are in different positions. Example 2 only has two different words (X, Y) and one other word (B) in a different position, but near its original position.
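To make the expected ranking concrete, here is a minimal sketch in plain Python, outside RapidMiner. It assumes that a token-level edit distance (Levenshtein distance over words instead of characters) is an acceptable stand-in for "similarity that respects word position"; that choice is only one possible option, and the function names `token_edit_distance` and `order_aware_similarity` are hypothetical, not RapidMiner operators.

```python
def token_edit_distance(a, b):
    """Levenshtein distance between two token lists (insert/delete/substitute, cost 1 each)."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a to each prefix of b
    for i, tok_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def order_aware_similarity(doc_a, doc_b):
    """Similarity in [0, 1]: 1 minus the edit distance normalized by the longer document."""
    a, b = doc_a.split(), doc_b.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty documents are treated as identical
    return 1.0 - token_edit_distance(a, b) / longest


example_1 = "A B C D E F G"
example_2 = "A X B D Y F G"
example_3 = "G F E A B C D"

print(order_aware_similarity(example_1, example_2))
print(order_aware_similarity(example_1, example_3))
```

With this measure, Example 1 vs. Example 2 needs 3 edits out of 7 tokens, while Example 1 vs. Example 3 needs 6, so the first pair scores higher even though the second pair shares every word. That is the behavior I am looking for.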
I think this is a difficult problem to explain, and I'm not sure whether RapidMiner can give me a solution.
Best regards,
Silvia