Classification model and preparing data
Hello everyone,
I want to build different classification models. I have two questions.
1) First, I want to build a decision tree, so I have to convert my numeric attributes into nominal ones. I can do this with the discretizing operator, but my numeric attributes all have different distributions. Do you know of any literature that says which discretization method is best in each case? I also read that I can do it with k-means clustering, but that doesn’t work with missing values.
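To make the discretization question concrete, here is a minimal sketch in plain Python (not RapidMiner) of two common strategies, equal-width and equal-frequency binning, applied to a skewed attribute. The function names, the sample values, and the choice to give missing values their own "missing" category are assumptions made for illustration only; they are not necessarily what the discretizing operator does.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width.
    Missing values (None) get the label "missing" (an illustrative choice)."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    labels = []
    for v in values:
        if v is None:
            labels.append("missing")
        else:
            # Clamp so that the maximum value lands in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            labels.append(f"bin{idx}")
    return labels


def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins intervals holding roughly
    the same number of observations. Missing values are kept separate."""
    present = sorted(v for v in values if v is not None)
    n = len(present)
    # Cut points are taken at ranks n/n_bins, 2n/n_bins, ...
    cuts = [present[(i * n) // n_bins] for i in range(1, n_bins)]
    labels = []
    for v in values:
        if v is None:
            labels.append("missing")
        else:
            idx = sum(v >= c for c in cuts)  # number of cut points at or below v
            labels.append(f"bin{idx}")
    return labels


# Hypothetical skewed attribute with one outlier and one missing value.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100, None]
width_labels = equal_width_bins(values, 3)
freq_labels = equal_frequency_bins(values, 3)
print(width_labels)
print(freq_labels)
```

On this skewed sample, equal-width binning puts almost every value into the first bin, while equal-frequency binning spreads the values evenly. That is the kind of difference between distributions the question is about.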
2) I often read that I have to split my dataset into a training part and a testing part. I can do this with the splitting operator. I don’t understand why I should split into only two parts and not three. What about my unclassified observations? Are they included in both parts (training and testing)? In my opinion, I have to split the data into a training part, a testing part, and a real prediction part.
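The three-part idea can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the rows are dictionaries, a missing class is stored as None under a hypothetical "label" key, and the function name, the 70/30 ratio, and the seed are made up. In this sketch only the rows with a known class are divided into training and testing parts, because accuracy can only be measured where the true class is known; the unclassified rows are set aside as the prediction part.

```python
import random


def partition(rows, label_key="label", test_fraction=0.3, seed=42):
    """Split rows into (train, test, to_predict).

    Rows with a known class are shuffled and divided into train and test.
    Rows whose class is None are returned separately for later prediction.
    """
    labeled = [r for r in rows if r[label_key] is not None]
    unlabeled = [r for r in rows if r[label_key] is None]

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(labeled)

    n_test = int(round(len(labeled) * test_fraction))
    test = labeled[:n_test]
    train = labeled[n_test:]
    return train, test, unlabeled


# Hypothetical data: 10 classified observations and 3 unclassified ones.
rows = [{"x": i, "label": "yes" if i % 2 else "no"} for i in range(10)]
rows += [{"x": i, "label": None} for i in range(10, 13)]

train, test, to_predict = partition(rows)
print(len(train), len(test), len(to_predict))
```

With these assumptions, the training and testing parts never contain an unclassified observation, and no observation appears in more than one part.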
Thank you very much.
Regards