Classification model and preparing data

User: "keb1811"
New Altair Community Member
Updated by Jocelyn

Hello everyone,

I want to build different classification models. I have two questions.

1) First, I want to build a decision tree, so I think I have to convert the numeric values into nominal ones. I can do this with a discretization operator, but my numeric attributes all have different distributions. Do you know of any literature that recommends the best method for each case? I also read that I can discretize with k-means clustering, but that doesn't work with missing values.
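To illustrate the two most common unsupervised binning strategies, here is a minimal plain-Python sketch (not RapidMiner's implementation) of equal-width and equal-frequency discretization. The function names and the sample data are hypothetical. Missing values (`None`) are passed through as missing rather than assigned to a bin, which is one way to sidestep the missing-value problem mentioned above. Equal-width bins suit roughly uniform attributes, while equal-frequency bins are often suggested for skewed ones.

```python
import bisect
from typing import List, Optional

Values = List[Optional[float]]


def equal_width_bins(values: Values, k: int) -> List[Optional[int]]:
    """Split the observed range into k intervals of equal width; None stays None."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 if all observed values are equal
    # min(..., k - 1) keeps the maximum value in the last bin instead of a bin k.
    return [None if v is None else min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values: Values, k: int) -> List[Optional[int]]:
    """Choose cut points so each bin holds roughly the same number of observed values."""
    present = sorted(v for v in values if v is not None)
    n = len(present)
    cuts = [present[(i * n) // k] for i in range(1, k)]  # approximate i/k quantiles
    return [None if v is None else bisect.bisect_right(cuts, v) for v in values]


ages = [23.0, None, 35.0, 41.0, 29.0, 90.0]  # hypothetical data, skewed by one outlier
print(equal_width_bins(ages, 3))      # [0, None, 0, 0, 0, 2]: most values share bin 0
print(equal_frequency_bins(ages, 3))  # [0, None, 1, 2, 1, 2]: values spread over all bins
```

The outlier stretches the equal-width intervals so that almost everything lands in the first bin, which is exactly why the attribute's distribution matters when picking a method.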

2) I often read that I have to split my dataset into a training part and a testing part, which I can do with a splitting operator. I don't understand why I should split into only two parts and not three: what about my unclassified observations? Are they included in both the training and the testing part? In my opinion, I need to split into a training part, a testing part and a part for the actual prediction.

Thank you very much.

Regards

    User: "MartinLiebig"
    Altair Employee
    Accepted Answer
    Hi @keb1811,
    For 1) RapidMiner's Decision Tree can handle numerical values directly, so you do not need to convert them. Discretizing may sometimes help, but there is no single "best" method.
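Decision tree learners usually handle a numerical attribute by searching for a threshold and splitting the data into `x <= t` and `x > t`, which is why no discretization is required. The sketch below shows that threshold search on a single attribute, using Gini impurity for simplicity and midpoints between neighbouring values as candidate thresholds. It illustrates the general technique, not RapidMiner's exact split criterion or code, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def gini(labels: List[str]) -> float:
    """Gini impurity: 0.0 for a pure set, higher for mixed class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(xs: List[float], ys: List[str]) -> Tuple[float, float]:
    """Return (threshold t, weighted Gini) of the best split x <= t versus x > t."""
    pairs = sorted(zip(xs, ys))
    best_t, best_score = float("nan"), float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbouring values
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


print(best_threshold([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))  # (6.5, 0.0)
```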

    For 2) you are basically right: in the literature, the real prediction data (you might rather call it the application data set) is often neglected. You basically separate it out first (using the Filter Examples operator) and then do your splitting.

    Cheers,
    Martin
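A minimal plain-Python sketch of the workflow from point 2 of the accepted answer: first separate the unlabeled rows (the application set, which the Filter Examples step would remove), then split only the labeled rows into training and test sets. The row format, the `label` key, the 70/30 ratio, the seed and the function name are assumptions for illustration.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, object]


def separate_and_split(
    rows: List[Row], label: str = "label", train_ratio: float = 0.7, seed: int = 42
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Return (train, test, application) sets; names and defaults are hypothetical."""
    # Step 1 (what Filter Examples would do): unlabeled rows become the application set.
    application = [r for r in rows if r.get(label) is None]
    labeled = [r for r in rows if r.get(label) is not None]
    # Step 2: shuffle only the labeled rows, then cut them into training and test parts.
    shuffled = labeled[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:], application


# Usage: ten labeled rows plus one row whose label is still unknown.
rows = [{"x": i, "label": "yes" if i % 2 else "no"} for i in range(10)]
rows.append({"x": 99, "label": None})
train, test, application = separate_and_split(rows)
print(len(train), len(test), len(application))  # 7 3 1
```

The model is then trained on `train`, evaluated on `test`, and only afterwards applied to `application`, so the unclassified observations never leak into training or evaluation.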
    User: "MartinLiebig"
    Altair Employee
    Accepted Answer
    Hi,
    You got the idea right, great! You can basically replace the Decision Tree operator with a Cross Validation operator and place the Decision Tree inside its training subprocess. You will receive the model at the mod port on top.
    Best,
    Martin
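To show what cross-validation does conceptually, here is a minimal sketch of k-fold cross-validation in plain Python. Each fold serves once as the test set while the model is trained on the remaining folds; the averaged accuracy estimates performance, and a final model is trained on all the data. RapidMiner's operator documentation describes the model port as delivering a model built on the complete input data, and the sketch mirrors that, but it is an illustration rather than the operator's actual code. The majority-class learner is a hypothetical stand-in for the Decision Tree, and folds are assigned round-robin without the shuffling or stratification real implementations usually add.

```python
from collections import Counter
from typing import Callable, List, Tuple

# A learner takes training data and returns a model (here, a prediction function).
Model = Callable[[float], str]
Learner = Callable[[List[float], List[str]], Model]


def majority_learner(xs: List[float], ys: List[str]) -> Model:
    """Hypothetical stand-in learner: always predicts the most frequent training label."""
    most_common = Counter(ys).most_common(1)[0][0]
    return lambda x: most_common


def cross_validate(xs: List[float], ys: List[str], learner: Learner, k: int = 5) -> Tuple[float, Model]:
    """Return (mean accuracy over k folds, model trained on all data)."""
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of examples")
    # Round-robin assignment: example i goes to fold i % k (no shuffling here).
    folds = [list(range(f, n, k)) for f in range(k)]
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        model = learner([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct = sum(model(xs[i]) == ys[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    # Mirrors the model port: one more training run on the complete data.
    final_model = learner(xs, ys)
    return sum(accuracies) / k, final_model


xs = [float(i) for i in range(10)]
ys = ["a"] * 8 + ["b"] * 2
accuracy, model = cross_validate(xs, ys, majority_learner, k=5)
print(accuracy, model(3.5))  # 0.8 a
```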