Operator Generate n-Grams (Terms)

Hi,

I would like to know how the n-grams are generated. I noticed that some words are grouped together as n-grams (terms) while others are not (they stay single words). How does it decide which terms to group together and which not? Many of the most frequently occurring terms have no n-gram groupings...

Thomas_Ott

The way n-grams work is like this: if you set it to 2, it will make the following combinations from the sentence "RapidMiner Studio is the best."

RapidMiner_Studio

Studio_is

is_the

the_best
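The pairing above can be reproduced outside RapidMiner with a few lines of Python. This is only a sketch of the idea, not the operator's actual implementation: the function name `generate_ngrams`, the whitespace tokenization, and the stripping of the trailing period are all assumptions made for illustration.

```python
def generate_ngrams(tokens, n=2, sep="_"):
    """Join every run of n consecutive tokens with sep (hypothetical helper)."""
    return [sep.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Naive tokenization (an assumption): drop the final period, split on whitespace.
tokens = "RapidMiner Studio is the best.".rstrip(".").split()
bigrams = generate_ngrams(tokens, n=2)
print(bigrams)
# ['RapidMiner_Studio', 'Studio_is', 'is_the', 'the_best']
```

Every pair of neighbouring tokens becomes a candidate term, so which pairs survive depends on what happens to the tokens before and after this step.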

Assuming your corpus of documents is about RapidMiner Studio reviews and you have TF-IDF set as your word vector creation method, it will likely give "is_the" a very low value and give "RapidMiner_Studio" and "the_best" higher values. Of course, if you have stemming, filtering, and pruning set, it might just drop "is_the" completely, and that's probably what's happening with your process.
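To make the filtering effect concrete, here is a rough Python sketch, assuming stopwords are removed before the n-grams are built. The two-word stopword list and the helper `generate_ngrams` are hypothetical and only stand in for whatever operators sit in the actual process.

```python
def generate_ngrams(tokens, n=2, sep="_"):
    """Join every run of n consecutive tokens with sep (hypothetical helper)."""
    return [sep.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical, deliberately tiny stopword list for this example only.
STOPWORDS = {"is", "the"}

# Naive tokenization (an assumption): drop the final period, split on whitespace.
tokens = "RapidMiner Studio is the best.".rstrip(".").split()

# Filter stopwords first, then build bigrams from whatever tokens remain.
filtered = [t for t in tokens if t.lower() not in STOPWORDS]
bigrams_after_filter = generate_ngrams(filtered, n=2)
print(bigrams_after_filter)
# ['RapidMiner_Studio', 'Studio_best']
```

In this sketch "is_the" never appears, because both of its words are gone before any pairing takes place. A very frequent word that is on the stopword list therefore cannot show up in any n-gram.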

Fred12

Well, inside the Process Documents operator I had tokenize, stemming, stopwords, and n-gram operators, so this might have been the cause...