Choose elements from Column
My problem is that I need to remove all rows from a data sheet that have a unique value in a specific column.
For example, let's say there is a column that holds values from 1 to 9, and each value can occur anywhere from 0 to 100 times or more. If the values 1 and 2 occur only once in the column, I want to remove their rows.
Any ideas?
Thanks
Best Answer
-
Ok, got it. It sounds like you could use the "Aggregate" operator with the aggregation function "Count" on your attributes to find the values that should be filtered out (because they rarely show up). You could then use those values as input to the "Filter Examples" operator, e.g. via a macro ("Extract Macro" operator). You would need the "Multiply" operator to work on separate copies of your data, though. This may become a bit labor-intensive if you have a large number of attributes, but there is probably a way to handle that kind of situation with a loop operator.
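For anyone who wants to see the count-then-filter idea outside the RapidMiner GUI, here is a minimal sketch in plain Python. It is not a RapidMiner process; the function name `drop_rare_rows`, the column name `code`, and the sample rows are all made up for illustration, and `min_count` is an assumed threshold you would adjust yourself.

```python
from collections import Counter


def drop_rare_rows(rows, column, min_count=2):
    """Keep only rows whose value in `column` occurs at least `min_count` times."""
    # Step 1 (the "Aggregate" + "Count" idea): count how often each value occurs.
    counts = Counter(row[column] for row in rows)
    # Step 2 (the "Filter Examples" idea): keep rows whose value is frequent enough.
    return [row for row in rows if counts[row[column]] >= min_count]


# Hypothetical sample data: "B" and "GT" each occur only once.
rows = [
    {"id": 1, "code": "A"},
    {"id": 2, "code": "B"},
    {"id": 3, "code": "A"},
    {"id": 4, "code": "GT"},
    {"id": 5, "code": "A"},
]

kept = drop_rare_rows(rows, "code")
print(kept)
```

With the default `min_count=2`, only values that occur exactly once are dropped; passing `min_count=3` would also drop values that occur twice.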
Answers
-
Use the RegEx parameter in Select Attributes, write the RegEx, and then toggle on Invert Condition.
-
Ah, I just read your post more carefully: you want to remove rows based on a specific column value.
-
I can't find the correct RegEx ... I don't understand its syntax.
-
You could try generating a filter attribute with "Generate Attributes" and then filter out the rows that have the specific filter value with "Filter Examples". If you can post a small subset of your data, I'll have a look.
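As a rough illustration of that flag-then-filter pattern (again plain Python, not RapidMiner operators), the sketch below adds a made-up boolean attribute called `remove` and then filters on it. The helper name `add_filter_flag`, the column `code`, and the set of unwanted values are assumptions for the example only.

```python
def add_filter_flag(rows, column, unwanted):
    """Return copies of the rows with a hypothetical boolean 'remove' attribute."""
    # Comparable in spirit to generating a new attribute from an expression.
    return [{**row, "remove": row[column] in unwanted} for row in rows]


# Hypothetical sample data and unwanted values.
sample = [
    {"id": 1, "code": "A"},
    {"id": 2, "code": "B"},
    {"id": 3, "code": "A"},
    {"id": 4, "code": "GT"},
]

flagged = add_filter_flag(sample, "code", unwanted={"B", "GT"})
# Comparable in spirit to filtering on the generated attribute.
remaining = [row for row in flagged if not row["remove"]]
print(remaining)
```

This only works when you already know which values to drop; with many rare values, counting them first is less manual.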
-
There are almost 150 different values, and 80% of them occur only once or twice (example values: A, B, AA, CA, GT, etc.).
I need to remove them in order to have a clear sample result.
0 -
Thanks a lot! The Aggregate and Filter operators did the trick and gave me the results I needed.
Thank you all for your help :)
-
Interestingly, a similar problem and solution are taught in the official RM Radoop training.
I recommend going through as many RapidMiner training courses as you can: besides a snazzy certificate, you get quite a few practical tips on how to approach data mining problems like this.