Operator to discard attributes with more than x% missing values

dan_agape
dan_agape New Altair Community Member
edited November 5 in Community Q&A
A suggestion: the operator named in the subject seems to be missing from RM. It is very useful in the data pre-processing step, and this simple but essential function is offered by almost every popular DM suite.

Best
Dan

Answers

  • IngoRM
    IngoRM New Altair Community Member
    Hi,

    you are right, such an operator would be nice. I have uploaded a process with our new Community Extension that performs exactly the desired task. It is called "Discard Attribute with More than x% Missing Values (Loops + Macros)". You can download and execute the process with a few clicks once you have installed our new myExperiment Community Extension from the help menu of RapidMiner.

    This process loops over all attributes and calculates the fraction of missing values for each attribute. If this fraction is larger than the fraction defined in the first "Set Macro" operator (macro: max_unknown), the attribute will be removed from the example set.

    Cheers,
    Ingo

  • dan_agape
    dan_agape New Altair Community Member
    Hi Ingo,

    Thanks for the prompt reply. The RM team does a great job, and we, the users, thank you for that.

    BTW, it is excellent that most of the Weka algorithms are included as a plug-in component. However, it would perhaps be useful to include all of the pre-processing functionality from there as well, although RM is already very strong in this area. In particular, the operator from the subject would then have been included.

    Best
    Dan
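
For readers who want to see the logic of the process Ingo describes outside RapidMiner, here is a minimal Python sketch. It is not the uploaded process itself. It only mirrors the steps in the reply: loop over all attributes, compute the fraction of missing values for each, and drop every attribute whose fraction is larger than max_unknown. The function names, the list-of-dicts representation of an example set, the use of None for a missing value, and the sample data are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

# Assumption: an "example set" is modelled as a list of rows,
# each row a dict from attribute name to value, with None = missing.
Row = Dict[str, Optional[Any]]


def missing_fraction(rows: List[Row], attribute: str) -> float:
    """Return the fraction of rows in which `attribute` is missing."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(attribute) is None)
    return missing / len(rows)


def discard_sparse_attributes(rows: List[Row], max_unknown: float) -> List[Row]:
    """Drop every attribute whose missing fraction is larger than max_unknown."""
    # Collect attribute names in first-seen order.
    attributes: List[str] = []
    for row in rows:
        for name in row:
            if name not in attributes:
                attributes.append(name)

    # Keep an attribute unless its missing fraction exceeds the threshold.
    kept = [
        name for name in attributes
        if missing_fraction(rows, name) <= max_unknown
    ]
    return [{name: row.get(name) for name in kept} for row in rows]


# Hypothetical sample data: "income" is missing in 3 of 4 rows.
example_set: List[Row] = [
    {"age": 34, "income": None, "city": "Graz"},
    {"age": None, "income": None, "city": "Linz"},
    {"age": 51, "income": None, "city": None},
    {"age": 29, "income": 4200, "city": "Wien"},
]

filtered = discard_sparse_attributes(example_set, max_unknown=0.5)
print(sorted(filtered[0].keys()))  # prints ['age', 'city']
```

With max_unknown set to 0.5, "income" (75% missing) is removed while "age" and "city" (25% missing each) are kept. The comparison is strict, so an attribute sitting exactly at the threshold stays in, matching the "larger than" wording in the reply.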