How to handle empty fields problems (Not missing data) in a data set

User: "MasoudG"
New Altair Community Member
Updated by Jocelyn
Hello guys.
I have a data set that I collected from 35 companies. One of my attributes is "do they have this type of plan", with the values "Yes" and "No", and my second attribute is "how much is the price of this plan". For the companies whose first attribute is "Yes", the value is a number such as 30 euros, but for the companies whose first attribute is "No", this field is empty.
I want to do clustering, but because of the empty fields I can't proceed. I don't want to remove this attribute or any example, or fill in these fields with any missing-data technique, because the values are not missing.
Is there any technique in RapidMiner to define: if the first attribute is "No", then ignore the second attribute for that example?
Thank you very much

    User: "jacobcybulski"
    New Altair Community Member
    Accepted Answer
    I agree with @David_A. You can replace those missing values with something meaningful: 0 for numerical values that are absent but meaningful (I assume that a price that is not there can be interpreted as zero), and "undefined" for nominal attributes, so that you can treat these in a special way. If you are concerned that the extra zeroes will distort your statistics, e.g. during cluster analysis, then in your mind these cases should be treated separately. If so, and you want to do segmentation analysis, run your clustering in two separate processes (filtering these examples out in one and in for the other) and interpret each result separately. If you want to use the cluster attribute to build a predictive model, rename the two cluster attributes C1 and C2, create the other one as a dummy attribute with some specific value in each subset (in a sense putting all of those examples into a separate cluster), and append all the examples back together, which gives you two extra columns for further processing.

    Jacob
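
The split-and-append idea from the accepted answer can be sketched outside RapidMiner in a few lines of plain Python. This is only an illustration under stated assumptions: the company rows are made-up sample data, `kmeans_1d` is a toy helper written for this sketch (not a RapidMiner or library function), and price is the only attribute being clustered. In RapidMiner itself the same steps would presumably map onto operators such as Filter Examples, a clustering operator, and Append.

```python
from statistics import mean

# Hypothetical sample data: (company, has_plan, price_eur).
# price_eur is None when has_plan is "No": the field is empty, not missing.
companies = [
    ("A", "Yes", 30.0),
    ("B", "No", None),
    ("C", "Yes", 32.0),
    ("D", "Yes", 80.0),
    ("E", "No", None),
    ("F", "Yes", 75.0),
]


def kmeans_1d(values, k=2, iterations=20):
    """Toy 1-D k-means for illustration; returns one cluster index per value."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Move each centroid to the mean of its members (if it has any).
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


# Step 1: filter the examples into two groups instead of imputing a price.
with_plan = [row for row in companies if row[1] == "Yes"]
without_plan = [row for row in companies if row[1] == "No"]

# Step 2: cluster only the companies that actually have a price.
labels = kmeans_1d([price for _, _, price in with_plan], k=2)

# Step 3: give the "No" companies their own dedicated cluster and append
# both groups back together.
clustered = [(name, f"plan_cluster_{lab}") for (name, _, _), lab in zip(with_plan, labels)]
clustered += [(name, "no_plan") for name, _, _ in without_plan]
result = dict(clustered)
print(result)
```

The design choice here is that the companies without a plan never enter the distance calculation, so no artificial zero price can pull a centroid, yet every example keeps a cluster label for later modelling.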