Data Cleanup

iason
iason New Altair Community Member
edited November 5 in Community Q&A
Hello all,

This is my first post and my first attempt to work with actual data in RapidMiner, so please excuse any ignorance.

What I am trying to achieve is to clean up my data, which is imported from CSV files.
First of all, I have a lot of missing values, which show up as ? on the tables. I need a way to keep those out.
Secondly, I have some rules (e.g. att1*att2 < 5000) and I want to filter the data based on them, preferably without adding an extra column.
I can do all that in a spreadsheet and import clean data into RM, but it would save a lot of time if it could be done internally.

Thank you all in advance.

Answers

  • IngoRM
    IngoRM New Altair Community Member
    Hi,

    First of all, I have a lot of missing values, which show up as ? on the tables. I need a way to keep those out.
    What would you like to filter out: examples (rows) that contain a missing value (?), or attributes (columns) that contain missing values? For the first, use the operator "Filter Examples" with the condition "no missing attributes"; for the second, use the operator "Select Attributes" with the filter type "no missing values".

    Secondly, I have some rules (e.g. att1*att2 < 5000) and I want to filter the data based on them, preferably without adding an extra column.
    Currently the best option is probably a three-step workaround: create a temporary index column holding the expression with the operator "Generate Attributes", filter the examples on that column with "Filter Examples", and then remove the index column again with "Select Attributes".

    We are currently revising the operator "Filter Examples" for one of the next versions, and it will then allow expressions like this to be used directly in the operator.

    Cheers,
    Ingo
  • iason
    iason New Altair Community Member
    Thank you, problem solved
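
For readers who would rather pre-clean the CSV outside RapidMiner (the spreadsheet route mentioned in the question), the two steps discussed above can be sketched in plain Python. This is only an illustration under assumptions, not RapidMiner's API: the sample data, the `label` column, and the function name `clean_rows` are hypothetical, and only the literal "?" marker is treated as missing.

```python
import csv
import io

MISSING = "?"  # missing-value marker as described in the thread

# Hypothetical sample data; att1 and att2 follow the names used in the question.
CSV_TEXT = """att1,att2,label
10,20,a
?,30,b
100,60,c
40,?,d
50,90,e
"""


def clean_rows(csv_file, limit=5000):
    """Return rows with no missing values that satisfy att1*att2 < limit."""
    kept = []
    for row in csv.DictReader(csv_file):
        # Step 1: skip any row in which some cell is exactly the "?" marker.
        if MISSING in row.values():
            continue
        # Step 2: apply the rule on the fly, so no extra column is stored.
        if float(row["att1"]) * float(row["att2"]) < limit:
            kept.append(row)
    return kept


cleaned = clean_rows(io.StringIO(CSV_TEXT))
for row in cleaned:
    print(row)
```

To run this on a real file, pass an open file handle instead of the `io.StringIO` wrapper, for example `clean_rows(open("data.csv", newline=""))`, where `data.csv` is a placeholder name. Because the product is computed inside the condition, the cleaned rows keep exactly the original columns.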