discretize by variance?
Hi. I have a database in which each row represents a person, and one of the columns is income. I tried to apply k-means to cluster the data set. I tried both normalizing the income column and applying a log transform to it, but either way the results are not sensible: people with very different incomes end up in the same cluster. Income is not the only variable, but it is an important one. Because income has a very large coefficient of variation (1000%), I thought I could build bins whose coefficient of variation stays below a limit, i.e., at most 30% per bin. After discretizing, I would need to convert the bins back to numerical values so that the k-means operator can use them.
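A minimal sketch of the binning idea described above, written in plain Python rather than as a RapidMiner operator. It sorts the incomes, then greedily grows a bin until adding the next income would push the bin's coefficient of variation (population standard deviation divided by mean) above the limit, at which point it starts a new bin. The function name `cv_bins`, the greedy strategy, and the use of each bin's mean income as its numeric value are assumptions for illustration. It also assumes every income is strictly positive.

```python
import math
from typing import List, Tuple


def cv_bins(incomes: List[float], max_cv: float = 0.30) -> Tuple[List[int], List[float]]:
    """Greedily group incomes into bins whose coefficient of variation
    (population std dev / mean) stays at or below max_cv.

    Returns (labels, bin_means): labels[i] is the bin index of incomes[i]
    in the original row order; bin_means[b] is the mean income of bin b.
    Hypothetical helper; assumes all incomes are strictly positive.
    """
    if not incomes:
        return [], []
    if any(x <= 0 for x in incomes):
        raise ValueError("this sketch assumes strictly positive incomes")

    # Visit rows in ascending income order, remembering original positions.
    order = sorted(range(len(incomes)), key=lambda i: incomes[i])
    labels = [0] * len(incomes)
    bin_means: List[float] = []

    bin_id = 0
    n, total, total_sq = 0, 0.0, 0.0  # running count, sum, sum of squares of the open bin
    for i in order:
        x = incomes[i]
        if n > 0:
            # CV the open bin would have if x were added to it.
            n2, t2, sq2 = n + 1, total + x, total_sq + x * x
            mean2 = t2 / n2
            var2 = max(sq2 / n2 - mean2 * mean2, 0.0)  # clamp tiny negative rounding noise
            if math.sqrt(var2) / mean2 > max_cv:
                # Adding x would break the CV limit: close this bin and open a new one.
                bin_means.append(total / n)
                bin_id += 1
                n, total, total_sq = 0, 0.0, 0.0
        n, total, total_sq = n + 1, total + x, total_sq + x * x
        labels[i] = bin_id
    bin_means.append(total / n)  # close the last bin
    return labels, bin_means


labels, bin_means = cv_bins([1000, 10, 105, 11, 100, 12], max_cv=0.30)
print(labels)     # [2, 0, 1, 0, 1, 0]
print(bin_means)  # [11.0, 102.5, 1000.0]
```

Each person's income could then be replaced by a numeric value for their bin, for example `math.log(bin_means[b])`, before running k-means. Because the greedy pass works on sorted values, tied incomes can occasionally be split across two neighbouring bins; the result is one heuristic partition, not an optimal one.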
Can this be done in RapidMiner? Any ideas that could help would be appreciated.