I am still fairly new to RapidMiner. From what I have seen so far, it is clearly a powerful result of massive brainpower!
Is there an operator to avoid redundant n-gram generation? What would be the best way to match, say, 5 lists of descriptions against the master list without redundant work?
Can you think of a setup where I treat all lists as equal? Basically a cloud of descriptions, where I aggregate the most similar ones?
Ingo Mierswa wrote:Well, there is no non-redundant n-gram generation directly. Maybe you could remove the duplicates afterwards. I am not sure I have understood your task correctly, but it might be possible to calculate not only the n-grams but also a vectorized representation using, for example, TF-IDF. In that case the redundant terms would no longer occur, and you could calculate a similarity instead, which would also deliver "fuzzy" matches (which could be disregarded if only perfect matches are of interest). But as I said, I am not 100% sure I got you correctly...
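
To make the TF-IDF suggestion a bit more concrete, here is a rough sketch in plain Python, outside RapidMiner. It is not a RapidMiner process and does not reproduce any operator. The function names (`char_ngrams`, `tfidf_vectors`, `best_matches`), the choice of character trigrams, the smoothed IDF formula, and the sample descriptions are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    if len(s) < n:
        return [s] if s else []
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def tfidf_vectors(docs, n=3):
    """L2-normalized TF-IDF vectors (as dicts) over character n-grams.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, which is one common
    variant and an assumption here, not RapidMiner's exact weighting.
    """
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each n-gram counted once per document
    total = len(docs)
    vectors = []
    for c in counts:
        vec = {g: tf * (math.log((1 + total) / (1 + df[g])) + 1.0)
               for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({g: w / norm for g, w in vec.items()} if norm else {})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(g, 0.0) for g, w in a.items())


def best_matches(master, candidates, n=3, threshold=0.0):
    """For each candidate, return (candidate, best master entry or None, score)."""
    if not master:
        raise ValueError("master list must not be empty")
    vectors = tfidf_vectors(master + candidates, n)
    master_vecs = vectors[:len(master)]
    cand_vecs = vectors[len(master):]
    results = []
    for cand, cv in zip(candidates, cand_vecs):
        scores = [cosine(cv, mv) for mv in master_vecs]
        best = max(range(len(master)), key=scores.__getitem__)
        score = scores[best]
        match = master[best] if score >= threshold else None
        results.append((cand, match, score))
    return results


if __name__ == "__main__":
    master = ["stainless steel bolt M8", "copper pipe 15mm", "rubber gasket"]
    other = ["Stainless steel bolt m8", "coper pipe 15 mm"]
    for cand, match, score in best_matches(master, other, threshold=0.3):
        print(f"{cand!r} -> {match!r} ({score:.2f})")
```

Each description is vectorized exactly once, so no n-grams are regenerated per comparison. For the "all lists equal" case, the same idea should carry over: pool every list into one collection, build the vectors once, and compare all pairs, keeping those above a similarity threshold. Setting the threshold close to 1.0 keeps only near-perfect matches.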