I'm new to RapidMiner and want to generate n-grams from an Excel file that contains comments and replies from forum posts. My process design currently contains the following operators: Data, Process Documents (with Tokenize, Filter Stopwords English, Generate n-grams, and Filter Tokens by Length), and Write Excel. I'm not sure why my results show every possible combination of words in the data instead of only the combinations that occur twice or more. Maybe I'm missing an important detail. I really need urgent help! TIA!
(The images below show my current problem.)
What I want it to look like
What it actually looks like
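
To make the target output concrete, here is a minimal Python sketch of the result I'm after. It is not RapidMiner code, only an illustration of the logic: tokenize, drop stopwords, build n-grams, filter by length, then keep only the n-grams that occur at least twice across all documents. The stopword list, the function names, and the `min_count`, `max_n`, and `min_len` parameters are all hypothetical and chosen just for this example.

```python
import re
from collections import Counter

# Hypothetical tiny stopword list, for illustration only.
STOPWORDS = {"the", "a", "is", "to", "and", "of", "in", "it"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, max_n=2):
    """Yield every n-gram of length 1..max_n, joined with underscores."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield "_".join(tokens[i:i + n])


def frequent_ngrams(docs, max_n=2, min_count=2, min_len=3):
    """Count n-grams over all documents and keep those seen min_count+ times."""
    counts = Counter()
    for doc in docs:
        # Stopwords are removed before n-grams are built, as in my process.
        tokens = [t for t in tokenize(doc) if t not in STOPWORDS]
        # Length filter is applied to the generated n-gram strings.
        counts.update(g for g in ngrams(tokens, max_n) if len(g) >= min_len)
    # This is the step I seem to be missing: drop everything seen only once.
    return {g: c for g, c in counts.items() if c >= min_count}


docs = ["The forum post is great", "Great forum post", "Another reply"]
result = frequent_ngrams(docs)
```

With these three sample comments, `forum_post` is kept because it appears in two comments, while `post_great` and `another_reply` are dropped because they appear only once. That final frequency filter is the behavior I can't reproduce in my process.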