How do I partition my data to create testing and training sets?

User: "georgebezerra83"
New Altair Community Member
Updated by Jocelyn

Hi RapidMiner,

 

I'm unsure which operators I should be using, but I was thinking of Split Data, Split, or maybe Wrapper-X-Validation.

 

Thanks!

    User: "georgebezerra83"
    New Altair Community Member
    OP
    Accepted Answer

    Thank you for the response, Thomas. I tried using the Cross Validation operator, but my data doesn't have a label attribute, since the values are all integers. Would the Split operator be more useful in this case for getting the training and test sets? I need to split the data into random, equal-sized train and test sets.

     

    Best,

    George Bezerra
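To make the request concrete, here is a minimal Python sketch of a random, equal-sized train/test split. It is not RapidMiner code (operators there are configured in the process designer); the function name `random_equal_split` and the toy `rows` list are hypothetical and only illustrate the idea.

```python
import random


def random_equal_split(rows, seed=42):
    """Return (train, test): two random halves of rows.

    Hypothetical helper for illustration only. With an odd number
    of rows, the test half receives the extra row.
    """
    shuffled = list(rows)                  # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Toy data: ten integer "examples".
rows = list(range(10))
train, test = random_equal_split(rows)
print(len(train), len(test))
```

Fixing the seed makes the partition repeatable, which helps when comparing models on the same split.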

    User: "Telcontar120"
    New Altair Community Member
    Accepted Answer

    You are going to need to create a label at some point if you intend to do modeling in RapidMiner, which seems likely since you say that you want a train and test set.  You use the "Set Role" operator for that.  It doesn't matter whether the label is an integer, since many algorithms can predict numerical labels.

    Once the label is set, you can use Cross Validation.  You could also use Split Validation, although as @Thomas_Ott already said, Cross Validation is superior for many reasons.  The Split operator literally just splits your dataset into multiple chunks; it does not directly have anything to do with training or testing.
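As a rough illustration of how cross validation differs from a single split, the Python sketch below generates k-fold train/test index sets. It is a conceptual sketch under stated assumptions, not how RapidMiner's Cross Validation operator is implemented; `k_fold_indices` and its parameters are hypothetical names.

```python
import random


def k_fold_indices(n_rows, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation.

    Hypothetical helper for illustration only. Each row index lands
    in exactly one test fold and in the training set of the others.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)        # randomize row order once
    folds = [indices[i::k] for i in range(k)]   # k disjoint, near-equal folds
    for i, test_idx in enumerate(folds):
        # Training set = every fold except the current test fold.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(10, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

A single split scores the model on one held-out portion only, whereas here every row is used for testing exactly once, so the performance estimate depends less on one particular random partition.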