How can I plot the frequency of words?

LindsayKelevra New Altair Community Member
edited November 5 in Community Q&A

Hello everyone!

I'm trying to use the Generate Gaussian operator to plot the frequency of words, but its results are very different from the ones I calculated manually. I need this to decide which values to discard through pruning. What formula does RapidMiner use to create the Gaussian?

Thank you.

Answers

  • Telcontar120
    Telcontar120 New Altair Community Member
    Are you expecting your word frequencies to follow a normal distribution? It's not clear that a normal distribution is the best a priori model for word frequencies; that depends on the type of text.
    I am also not clear on how conformity to a theoretical statistical distribution would affect pruning. You might be better off simply setting pruning thresholds by frequency or by percentage at a few different levels and seeing which words are dropped as a consequence. Typically, a lot of words with only a handful of occurrences do nothing for model performance but can lead to large datasets and long runtimes.
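    To illustrate the "try a few thresholds and see what gets dropped" idea outside of RapidMiner, here is a minimal Python sketch. It is not RapidMiner's implementation: the helper names (document_frequencies, prune), the toy documents, and the whitespace tokenization are all made up for illustration, and it assumes that pruning is based on the number of documents a word appears in.

```python
from collections import Counter


def document_frequencies(docs):
    """Count, for each word, how many documents it appears in (assumed pruning basis)."""
    counts = Counter()
    for doc in docs:
        # set() so a word repeated inside one document is counted only once
        counts.update(set(doc.lower().split()))
    return counts


def prune(doc_freq, n_docs, min_count=None, min_percent=None):
    """Split words into (kept, dropped) using an absolute or a percentage threshold."""
    kept, dropped = [], []
    for word, df in sorted(doc_freq.items()):
        too_rare = (
            (min_count is not None and df < min_count)
            or (min_percent is not None and 100.0 * df / n_docs < min_percent)
        )
        (dropped if too_rare else kept).append(word)
    return kept, dropped


# Hypothetical toy corpus, only for demonstration
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
    "a bird sang",
]

doc_freq = document_frequencies(docs)

# Try a few absolute thresholds and inspect which words fall out
for threshold in (2, 3):
    kept, dropped = prune(doc_freq, len(docs), min_count=threshold)
    print(f"min_count={threshold}: kept={kept} dropped={dropped}")

# The same comparison with a percentage threshold
kept, dropped = prune(doc_freq, len(docs), min_percent=50)
print(f"min_percent=50: kept={kept} dropped={dropped}")
```

    Looking at the dropped list at each level is usually enough to pick a threshold, with no need to fit a distribution first.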