List of words that are filtered with Stopwords, Stemming and Tokenizing?

Jonas97
New Altair Community Member
Hello,
is there a function in RapidMiner that I can use to create a list of the words, or a count of the words, that the process steps Filter Stopwords, Stemming and Tokenizing have identified and excluded from the analysis of the text corpus?
Thank you in advance!
Jonas
Answers
I am not sure if there is a direct way to view this, but you could accomplish it in two passes. First, run your documents through with tokenizing only. Then run them through a second time with tokenizing plus the other text processing options you want (stopwords, stemming, etc.). Finally, take the two resulting wordlist datasets and use Set Minus (join type) to get the non-matches.
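To make the set-difference idea concrete outside of RapidMiner, here is a minimal sketch in plain Python. It does not use any RapidMiner API. The stopword list, the toy stemmer, and all function names are hypothetical stand-ins chosen only for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "is", "and", "a", "of"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Toy stemmer: strips a few common suffixes (not a real Porter stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def wordlist_tokenize_only(text):
    """First pass: wordlist after tokenizing only."""
    return set(tokenize(text))


def wordlist_full(text):
    """Second pass: tokenize, drop stopwords, then stem."""
    return {naive_stem(t) for t in tokenize(text) if t not in STOPWORDS}


def removed_or_changed(text):
    """Set minus: words in the first wordlist that are missing from the second."""
    return wordlist_tokenize_only(text) - wordlist_full(text)


if __name__ == "__main__":
    sample = "The miner is mining the texts and counting words"
    diff = removed_or_changed(sample)
    print(sorted(diff))  # the words themselves
    print(len(diff))     # the number of words
```

One thing this sketch makes visible: the difference contains not only the filtered stopwords but also every word whose form was changed by stemming (here "mining", "texts", "counting", "words"), because the stemmed form no longer matches the original token. If you only want the stopwords, it may be simpler to compare the tokenize-only wordlist against a tokenize-plus-stopword-filter wordlist, and add stemming in a separate comparison.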