Text Mining - Documents Similarity (words position)

New Altair Community Member

Feb 22, 2018

Updated Nov 5, 2024 by Jocelyn

Hello,

I'm looking for a way to measure the similarity between documents where word position is relevant.
I've already implemented the sample that uses the "Data to Similarity" operator (CosineSimilarity), as in:
https://community.rapidminer.com/t5/RapidMiner-Studio-Forum/How-to-compare-similarity-of-large-number-of-documents/td-p/16002
But I need to take into account the order/position of the words, not only their frequency or occurrence.
For example:
Example 1: A B C D E F G
Example 2: A X B D Y F G
Example 3: G F E A B C D

Examples 1 and 2 should be more similar than Examples 1 and 3: although Example 3 contains exactly the same words as Example 1 (CosineSimilarity = 1), they are in different positions. Example 2 differs only by two new words (X, Y) and by one word (B) that has moved but is still near its original position.
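
To make the requirement concrete, here is a minimal sketch in plain Python (outside RapidMiner) of one measure that behaves this way: an edit distance computed over word tokens instead of characters, normalised by the length of the longer document. The function names token_edit_distance and order_aware_similarity are hypothetical, and the choice of edit distance is only an assumption about what a suitable order-aware measure could look like, not something the "Data to Similarity" operator is known to provide.

```python
def token_edit_distance(a, b):
    """Levenshtein distance over word tokens (lists of strings)."""
    # prev[j] holds the distance between the tokens of `a` seen so far
    # (excluding the current one) and the first j tokens of `b`.
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(
                prev[j] + 1,         # delete tok_a
                curr[j - 1] + 1,     # insert tok_b
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def order_aware_similarity(doc_a, doc_b):
    """Similarity in [0, 1]; 1 means identical word sequences."""
    a, b = doc_a.split(), doc_b.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty documents are treated as identical
    return 1.0 - token_edit_distance(a, b) / longest


ex1 = "A B C D E F G"
ex2 = "A X B D Y F G"
ex3 = "G F E A B C D"

print("Example 1 vs 2:", order_aware_similarity(ex1, ex2))
print("Example 1 vs 3:", order_aware_similarity(ex1, ex3))
```

With the three examples, Example 1 vs 2 needs 3 token edits and Example 1 vs 3 needs 6, so the first pair scores higher even though Example 3 shares every word with Example 1.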

I think this problem is difficult to explain, and I'm not sure whether RapidMiner can offer a solution.

Best regards,
Silvia

Find more posts tagged with

AI Studio

Text Mining + NLP
