
Generate n-Grams (Terms) operator

Fred12
New Altair Community Member
Updated by Jocelyn

hi,

I would like to know how the n-grams are generated. I noticed that some words are grouped together as n-grams (terms), while others stay as single words. How does the operator decide which terms to group together and which not? Many of the most frequently occurring terms have no n-gram groupings...


    N-grams work like this: if you set it to 2, every pair of consecutive tokens is joined into one term. For the sentence "RapidMiner Studio is the best." it will make the following combinations:

     

    RapidMiner_Studio

    Studio_is

    is_the

    the_best
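
The pairing above can be sketched in a few lines of Python. This is only an illustration of the sliding-window idea, not RapidMiner's actual implementation; the function name `term_ngrams` and the simple `\w+` tokenizer are assumptions made for the example.

```python
import re

def term_ngrams(text, n=2, sep="_"):
    """Join every run of n consecutive tokens into one term (hypothetical helper)."""
    # Assumed tokenizer: split on anything that is not a word character.
    tokens = re.findall(r"\w+", text)
    # Slide a window of length n over the token list.
    return [sep.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

bigrams = term_ngrams("RapidMiner Studio is the best.")
print(bigrams)
```

Note that every adjacent pair is produced, so which n-grams you end up seeing depends on what the later weighting and pruning steps keep.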

     

    Assuming your corpus of documents consists of RapidMiner Studio reviews and you have TF-IDF set as your word vector creation, "is_the" will likely get a very low value, while "RapidMiner_Studio" and "the_best" get higher values. Of course, if you also have stemming, filtering, and pruning set, "is_the" might be dropped completely, and that's probably what's happening in your process.
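
To make the weighting argument concrete, here is a rough sketch using the textbook formula tf * log(N / df). The three-document corpus is made up for the example, and RapidMiner's own TF-IDF may use a different normalization, so treat the numbers as illustrative only.

```python
import math
from collections import Counter

def bigrams(text):
    # Assumed tokenizer: whitespace split; pairs of neighbours joined with "_".
    tokens = text.split()
    return ["_".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]

# Hypothetical mini-corpus, invented for this example.
docs = [
    "RapidMiner Studio is the best",
    "this tool is the slowest",
    "support is the best",
]
doc_terms = [bigrams(d) for d in docs]

# Document frequency: in how many documents each bigram occurs.
df = Counter(term for terms in doc_terms for term in set(terms))
n_docs = len(docs)

def tf_idf(term, terms):
    tf = terms.count(term) / len(terms)   # relative frequency in this document
    idf = math.log(n_docs / df[term])     # zero when the term is in every document
    return tf * idf

scores = {term: tf_idf(term, doc_terms[0]) for term in doc_terms[0]}
for term, score in scores.items():
    print(f"{term}: {score:.3f}")
```

In this toy corpus "is_the" occurs in every document, so its IDF is log(1) = 0 and its weight vanishes, while the rarer "RapidMiner_Studio" scores highest.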

    Fred12
    New Altair Community Member
    OP

    Well, inside the Process Documents operator I had the Tokenize, Stemming, Stopwords, and n-gram operators, so this might have been the cause...
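
A small sketch of why that operator order matters: if stopwords are removed before the n-grams are built, the frequent words are already gone and cannot take part in any grouping. The stopword list and helper names below are hypothetical, stemming is left out for brevity, and this is not how RapidMiner implements these operators internally.

```python
import re

STOPWORDS = {"is", "the"}  # hypothetical, tiny list for the example

def tokenize(text):
    # Assumed tokenizer: keep runs of word characters.
    return re.findall(r"\w+", text)

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def ngrams(tokens, n=2):
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = tokenize("RapidMiner Studio is the best.")

# Order from the process above: stopword filtering first, then n-grams.
filtered_first = ngrams(remove_stopwords(tokens))

# Alternative order: n-grams built on the unfiltered tokens.
ngrams_first = ngrams(tokens)

print(filtered_first)
print(ngrams_first)
```

With filtering first, "is" and "the" never appear inside a bigram, which would explain frequent words showing up without any n-gram groupings.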