Choose elements from Column

Me_Again447
Me_Again447 New Altair Community Member
edited November 2024 in Community Q&A

My problem is that I need to remove all rows from a data sheet that have a unique value in a specific column.

 

For example, let's say there is a column with values from 1 to 9, and each value can occur anywhere from 0 to 100 times or more. If the values 1 and 2 each occur only once in the column, I want to remove their rows.

 

Any ideas?

 

Thanks

 

Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Use the RegEx parameter in Select Attributes, write the RegEx, and then toggle on Invert Condition.

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Ah, I just read your post more closely: you want to remove rows based on a specific column value.

  • Me_Again447
    Me_Again447 New Altair Community Member

    I can't find the correct RegEx ... I don't understand the syntax for it :/

  • FBT
    FBT New Altair Community Member

    You could try generating a filter attribute with "Generate Attributes" and then filter out the rows that have the specific filter value with "Filter Examples". If you can post a small subset of your data, I'll have a look. 

  • Me_Again447
    Me_Again447 New Altair Community Member

    There are almost 150 different values, and 80% of them occur only once or twice (example values: A, B, AA, CA, GT, etc.).

    I need to remove them in order to have a clear sample result.

     

  • FBT
    FBT New Altair Community Member
    Answer ✓

    Ok, got it. It sounds like you could use the "Aggregate" operator with the aggregation function "Count" on your attributes to find the values that should be filtered out (because they rarely show up). Then you could feed those values into the "Filter Examples" operator, e.g. via a macro ("Extract Macro" operator). You would need the "Multiply" operator to get separate copies of your data, though. It may become a bit labor-intensive if you have a large number of attributes, but there is probably a way to handle that situation with a loop operator.
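
    For anyone who wants to see the count-then-filter idea outside of RapidMiner, here is a minimal sketch in plain Python. It is not the RapidMiner process itself, only the same logic: count how often each value occurs, then keep the rows whose value occurs often enough. The function name `drop_rare_rows`, the `min_count` threshold, and the sample rows are all made up for illustration.

```python
from collections import Counter


def drop_rare_rows(rows, column, min_count=2):
    """Keep only rows whose value in `column` occurs at least `min_count` times."""
    # Step 1 (roughly what an aggregate with "count" gives you):
    # how often does each value occur in the chosen column?
    counts = Counter(row[column] for row in rows)
    # Step 2 (roughly the filter step): keep rows whose value is frequent enough.
    return [row for row in rows if counts[row[column]] >= min_count]


# Hypothetical sample data: "AA" and "GT" occur only once.
rows = [
    {"id": 1, "code": "A"},
    {"id": 2, "code": "AA"},
    {"id": 3, "code": "A"},
    {"id": 4, "code": "GT"},
    {"id": 5, "code": "CA"},
    {"id": 6, "code": "CA"},
]

kept = drop_rare_rows(rows, "code")
print([row["id"] for row in kept])  # prints [1, 3, 5, 6]
```

    Raising `min_count` to 3 would also drop values that occur only twice, which may be closer to the "once or twice" case described above.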

  • Me_Again447
    Me_Again447 New Altair Community Member

    Thanks a lot ... the Aggregate and filter operators did the trick and gave me the results I needed.

     

    Thank you all for your help :)

  • JEdward
    JEdward New Altair Community Member

    Interestingly, a similar problem and solution are taught in the official RM Radoop training.
    I recommend going through as many RapidMiner training courses as you can: as well as a snazzy certificate, you get quite a few practical tips on how to approach data mining problems like this.