discretize by variance?

User: "omoratto"
New Altair Community Member
Updated by Jocelyn
Hi. I have a database in which each row represents a person, and one of the columns is income. I tried to apply k-means to cluster the data set. I normalized the income column and also tried a log transform, but either way the results are not logical: people with very dissimilar incomes end up in the same group. Income is not the only variable, but it is an important one. Because income has a very large coefficient of variation (1000%), I thought I could construct bins that each have a similar, bounded coefficient of variation, say up to 30%. After discretizing, I would transform the bins into numerical values so that they can be used by the k-means operator.

Can this be done in RapidMiner? Any ideas would help.
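
For readers who want to try the binning idea outside RapidMiner, here is a minimal Python sketch. It is not a RapidMiner operator. It assumes strictly positive incomes, a greedy pass over the sorted values, and an ordinal bin index as the numeric encoding. The function names `cv`, `bin_by_cv` and `encode` are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev

def cv(values):
    """Coefficient of variation: population std / mean (0.0 if the mean is 0)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0

def bin_by_cv(incomes, max_cv=0.30):
    """Greedily grow bins over the sorted incomes. A new bin is started
    whenever adding the next value would push the bin's CV above max_cv."""
    bins, current = [], []
    for x in sorted(incomes):
        if current and cv(current + [x]) > max_cv:
            bins.append(current)
            current = []
        current.append(x)
    if current:
        bins.append(current)
    return bins

def encode(incomes, bins):
    """Map each income to the ordinal index of the bin that contains it."""
    upper_edges = [b[-1] for b in bins]  # largest value in each bin
    return [next(i for i, edge in enumerate(upper_edges) if x <= edge)
            for x in incomes]

# Made-up incomes, used only to show the mechanics.
incomes = [10, 11, 12, 100, 110, 1000]
bins = bin_by_cv(incomes)
codes = encode(incomes, bins)
```

The ordinal code keeps the ordering of the bins but discards the distances between them. Replacing the index with the bin mean (or its log) is an alternative encoding if those distances matter for the clustering.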

    User: "omoratto"
    New Altair Community Member
    OP
    Accepted Answer
    Brian, thank you so much for your feedback. I tried your suggested approach, using a normalization other than the z-transformation; unfortunately, it came up with two groups. What I decided to do instead was apply an outlier detection model before clustering. That way, I split the data set into two sections (outlier, non-outlier) and applied k-means to each section. It worked pretty well.

    Thank you
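
The split-then-cluster workflow described in this answer can be sketched in plain Python as follows. The post does not say which outlier detection model was used, so this sketch assumes Tukey's IQR fence on the income column as a stand-in. The k-means is a bare-bones Lloyd's iteration written out by hand, not RapidMiner's operator, and the sample rows, column names and function names are all made up for illustration.

```python
import random
from statistics import quantiles

def split_outliers(rows, key, k=1.5):
    """Split rows into (inliers, outliers) with Tukey's IQR fence on rows[key].
    Assumption: the original post does not name its outlier model."""
    q1, _, q3 = quantiles([r[key] for r in rows], n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    inliers = [r for r in rows if lo <= r[key] <= hi]
    outliers = [r for r in rows if not lo <= r[key] <= hi]
    return inliers, outliers

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of numeric tuples.
    Returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(p, centers[c])))
                  for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels

# Made-up data: ten ordinary incomes and two extreme ones.
rows = [{"income": v, "age": a} for v, a in [
    (20, 25), (22, 30), (25, 41), (27, 38), (30, 52), (31, 47),
    (33, 29), (35, 36), (38, 44), (40, 58), (900, 45), (1200, 60)]]

inliers, outliers = split_outliers(rows, "income")
for name, part in (("non-outlier", inliers), ("outlier", outliers)):
    pts = [(r["income"], r["age"]) for r in part]
    print(name, kmeans(pts, k=2))
```

Clustering each section separately keeps a handful of extreme incomes from dominating the distance computation for everyone else, which is one plausible reason the approach worked well here. In practice the features would normally be normalized within each section before clustering.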