"Remove all lines with text occurrence smaller than 10 from a certain column"

Moritz
New Altair Community Member
Updated by Jocelyn

Hi, 

I'm trying to refer to a certain column of the sample set and remove all lines where its value is smaller than 10. What's the way to do that?

e.g. 

Process Documents from Files >> Filter Stopwords >> Tokenize >> Transform Cases >> Stem >> ??? now remove all lines where the column "text occurrence" is lower than 10 ???


    Your question is a bit confusing. Do you want to get rid of tokens that occur fewer than 10 times, or sentences (lines) that have fewer than 10 tokens? In either case, RapidMiner can do it. In the first case, just use the pruning options in Process Documents and set an absolute threshold of 10. In the second case, split each sentence into a separate document (you can use "Cut Documents" for this), then apply "Extract Token Number", and then filter out any document (sentence) that has fewer than 10 tokens.
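For readers who want to see the second case spelled out, here is a minimal sketch of the same logic in plain Python rather than RapidMiner. It assumes a naive sentence split on end punctuation as a stand-in for "Cut Documents" and a simple word regex as a stand-in for "Extract Token Number"; the function name `keep_long_sentences` and the `min_tokens` parameter are hypothetical and only serve the illustration.

```python
import re


def keep_long_sentences(text, min_tokens=10):
    """Return the sentences of `text` that contain at least `min_tokens` tokens."""
    # Crude sentence split on whitespace that follows '.', '!' or '?'
    # (an assumption; RapidMiner's "Cut Documents" is configurable).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Count word tokens per sentence and keep only the long enough ones.
    return [s for s in sentences if len(re.findall(r"\w+", s)) >= min_tokens]
```

Calling `keep_long_sentences(text)` on a document drops every sentence with fewer than 10 tokens and returns the rest in their original order.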

    Moritz
    New Altair Community Member
    OP

    Thanks for the fast answer. 

     

    Let me try to rephrase a bit: the task is to remove all words from our document with a total occurrence smaller than 10. I already tried the pruning option, but since there is no way to refer to the column "total occurrence", I don't have the opportunity to prune by it, i.e. to remove all words with a total occurrence smaller than 10.



    Ah, got it. "Wordlist to Data" will let you take the wordlist and turn it into an example set, and then you will be able to filter on the "Total Occurrences" column.

    Moritz
    New Altair Community Member
    OP

    Okay. I did the first part, but I still can't filter on columns. Where do I apply the filter, and which filter do I apply?

     

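    Use "Filter Examples" on the example set produced by "Wordlist to Data", and set your condition to keep rows where the "Total Occurrences" column is at least 10 (so that only words occurring fewer than 10 times are removed).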
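As a rough illustration of what the "Wordlist to Data" plus "Filter Examples" combination computes, here is a minimal Python sketch, not RapidMiner itself. It assumes the documents are already tokenized, lowercased and stemmed, and that the threshold is "at least 10 total occurrences"; the function name `filter_wordlist` and the `min_total` parameter are hypothetical.

```python
from collections import Counter


def filter_wordlist(documents, min_total=10):
    """Map each word to its total count, keeping only words seen >= min_total times.

    `documents` is a list of token lists, one list per document.
    """
    # Total occurrences of every token across all documents (the word list).
    totals = Counter(token for doc in documents for token in doc)
    # Equivalent of filtering rows on the total-occurrences column.
    return {word: count for word, count in totals.items() if count >= min_total}
```

For example, with `docs = [["data"] * 6 + ["mine"] * 3, ["data"] * 5 + ["mine"] * 2 + ["text"] * 10]`, calling `filter_wordlist(docs)` keeps "data" and "text" and drops "mine", which appears only 5 times in total.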