Impute Missing Values

User: "Marek_Lubicz"
Altair Community Member
Updated by Jocelyn
Working with my students on dealing with missing and imbalanced data in RM, we found that the Impute Missing Values operator, used in the Tutorial Process for that operator, removes the label role from the class attribute (of the Labor-Negotiations dataset) and transfers it to the duration attribute.
You can easily check the attributes and their roles at the k-NN (or any other learner inside the operator) input, both outside and inside the operator.
I was not able to explain this behaviour (although of course it is easy to work around it using Set Role twice).
Does anybody know the formal explanation?

    User: "YYH"
    Altair Employee
    Accepted Answer
    Updated by YYH
    As explained in the first reply, the learner is built iteratively on each column. When you impute column A, the operator automatically sets column A as the label, because you need to predict the missing values in column A. When you impute column B in the next round, the learner uses the non-missing values of column B as the label to predict the missing values in column B. Repeat this (setting a different column as the label in each step) for column C, column D, column E, …, until you have imputed the missing values in all columns.
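    To make the mechanism concrete, here is a minimal sketch of that loop in Python/scikit-learn. This is not RapidMiner's actual Java implementation; the column names, the simple fill of the remaining gaps with 0, and the choice of a k-NN learner are just assumptions for the toy example.

        import pandas as pd
        from sklearn.neighbors import KNeighborsClassifier

        def impute_iteratively(df, make_learner):
            """For each column that has missing values, temporarily treat that
            column as the label, train a learner on the rows where it is
            present, and predict the missing entries."""
            result = df.copy()
            for target in result.columns:
                missing = result[target].isna()
                if not missing.any():
                    continue
                # Regular attributes for this round: all the other columns.
                # For simplicity this sketch fills their own gaps with 0;
                # the real operator handles this differently.
                features = result.drop(columns=[target]).fillna(0)
                learner = make_learner()
                learner.fit(features[~missing], result.loc[~missing, target])
                result.loc[missing, target] = learner.predict(features[missing])
            return result

        # Toy data loosely inspired by Labor-Negotiations attribute names
        # (the values are made up for illustration).
        data = pd.DataFrame({
            "duration":      [1, 2, 3, 2, 1, 3],
            "wage-increase": [2, None, 4, 4, None, 2],
            "pension":       [0, 1, None, 1, 0, 1],
        })
        print(impute_iteratively(data, lambda: KNeighborsClassifier(n_neighbors=2)))

    Each pass through the loop changes which attribute plays the label role, which is also why a breakpoint inside the subprocess shows a different label attribute in each iteration.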
    More explanation and implementation details can be found in the open source code on GitHub:
    https://github.com/rapidminer/rapidminer-studio/blob/master/src/main/java/com/rapidminer/operator/preprocessing/filter/MissingValueImputation.java

    What is a role? Check this out: https://community.rapidminer.com/discussion/54288/roles-and-labels-a-quick-guide
    User: "YYH"
    Altair Employee
    Accepted Answer
    Updated by YYH
    You should insert breakpoints before the learner inside the nested subprocess and check the refreshed metadata in each iteration. In your screenshot, the metadata is valid for the first iteration only.
    User: "Marek_Lubicz"
    Altair Community Member
    OP
    Accepted Answer
    Thank you for your comprehensive replies, including directing me to the operator code on GitHub, which clarifies a lot, particularly "setting one of the regular attributes to label under the assumption that all attributes are from the same type".
    I think we could mark the question as solved from a practical point of view (although it could be interesting to investigate the case where the above assumption does not hold while a learner accepts attributes of only a specific type, like a Decision Tree with a selected criterion; maybe at least an explanation in the IMV operator description in Help would be helpful, if not enabling the IMV to impute missing values for attributes of that specific type).