Hello everyone,
I want to build several classification models, and I have two questions.
1) First, I want to build a decision tree, so as far as I understand I have to convert my numeric attributes to nominal ones. I can do this with the discretizing operator, but my numeric attributes all have different distributions. Do you know of any literature that recommends the best discretization method for each case? I also read that I could discretize with k-means clustering, but that doesn’t work with missing values.
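
To show why the distribution matters to me, here is a rough sketch in plain Python (not the tool I am using) of two common binning schemes, equal-width and equal-frequency. The function names and the sample values are made up for illustration, and passing missing values through as None is just one possible way to treat them.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width.
    Missing values (None) are passed through unchanged."""
    known = [v for v in values if v is not None]
    lo, hi = min(known), max(known)
    # Fall back to width 1.0 if all known values are identical.
    width = (hi - lo) / n_bins or 1.0
    bins = []
    for v in values:
        if v is None:
            bins.append(None)
        else:
            # Clamp so the maximum value lands in the last bin.
            bins.append(min(int((v - lo) / width), n_bins - 1))
    return bins


def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins intervals that hold roughly
    the same number of observations. Missing values stay None."""
    known = sorted(v for v in values if v is not None)
    # Cut points are taken at evenly spaced positions in the sorted data.
    cuts = [known[len(known) * i // n_bins] for i in range(1, n_bins)]
    return [None if v is None else sum(v >= c for c in cuts) for v in values]


# Hypothetical skewed attribute with one outlier and one missing value.
sample = [1, 2, 3, 4, 100, None]
width_result = equal_width_bins(sample, 2)
freq_result = equal_frequency_bins(sample, 2)
print(width_result)  # [0, 0, 0, 0, 1, None]
print(freq_result)   # [0, 0, 1, 1, 1, None]
```

With the skewed sample, equal-width binning puts almost everything into one bin, while equal-frequency binning spreads the values more evenly. That is why I am unsure which method fits which attribute.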
2) I often read that I have to split my dataset into a training part and a testing part. I can do this with the splitting operator. What I don’t understand is why I should split into only two parts and not three. What about my unclassified observations? Are they included in both the training and the testing part? In my opinion, I have to split the data into a training part, a testing part, and a real prediction part.
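
To make clearer what I have in mind, here is a minimal sketch in plain Python (again not the tool I am using). The function name, the `label` key, the 30% test fraction, and the sample rows are all assumptions for illustration. Observations without a label go into the prediction part, and only the labeled ones are divided into training and testing.

```python
import random


def three_way_split(rows, label_key="label", test_fraction=0.3, seed=42):
    """Split rows into (train, test, predict).
    Rows whose label is missing (None) form the prediction part;
    labeled rows are shuffled and divided into train and test."""
    unlabeled = [r for r in rows if r.get(label_key) is None]
    labeled = [r for r in rows if r.get(label_key) is not None]

    # Fixed seed so the shuffle is reproducible.
    rng = random.Random(seed)
    rng.shuffle(labeled)

    n_test = round(len(labeled) * test_fraction)
    test = labeled[:n_test]
    train = labeled[n_test:]
    return train, test, unlabeled


# Hypothetical data: 10 classified and 3 unclassified observations.
rows = [{"x": i, "label": "yes" if i % 2 else "no"} for i in range(10)]
rows += [{"x": i, "label": None} for i in range(10, 13)]

train, test, predict = three_way_split(rows)
print(len(train), len(test), len(predict))  # 7 3 3
```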
Thank you very much.
Regards