How to "tell" RM what to use for training/testing
Dear Miners,
Please help me get the hang of this.
I have a decision tree model and a dataset with about 60,000 rows that have the label attribute and 15,000 rows without it. I assumed (and wanted) the rows with the label attribute to be the training data, and the rows with a missing label to be the test values, which I want to export at the end for external validation.
Now my export only has 5,900 rows. It seems that RM does not use the "empty" rows for testing. Instead, it replaces missing values with the mean value (the default option) and splits the whole dataset into a test set and a training set.
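For illustration, here is a minimal sketch of the setup I intended. It is written in Python rather than as an RM process. Rows with a label train the model, and rows without a label are only scored and exported, so nothing is imputed and nothing is split at random. The row layout, the `label` key and all helper names are hypothetical. A majority-class predictor stands in for the decision tree so that the example runs with the standard library alone.

```python
from collections import Counter


# Hypothetical row layout: each row is a dict of attribute values plus an
# optional "label"; None (or an absent key) means the label is missing.
def split_by_label(rows, label_key="label"):
    """Return (training rows, rows to score) depending on whether the label is present."""
    training = [r for r in rows if r.get(label_key) is not None]
    to_score = [r for r in rows if r.get(label_key) is None]
    return training, to_score


class MajorityClassModel:
    """Stand-in for the decision tree: always predicts the most common training label."""

    def fit(self, rows, label_key="label"):
        counts = Counter(r[label_key] for r in rows)
        self.prediction = counts.most_common(1)[0][0]
        return self

    def predict(self, rows):
        return [self.prediction for _ in rows]


def score_unlabeled(rows, label_key="label"):
    """Train on labeled rows only and attach a prediction to every unlabeled row."""
    training, to_score = split_by_label(rows, label_key)
    model = MajorityClassModel().fit(training, label_key)
    # Every unlabeled row is kept and scored; no label is imputed, no row is dropped.
    return [dict(r, prediction=p) for r, p in zip(to_score, model.predict(to_score))]


if __name__ == "__main__":
    data = [
        {"x": 1, "label": "yes"},
        {"x": 2, "label": "yes"},
        {"x": 3, "label": "no"},
        {"x": 4, "label": None},
        {"x": 5},
    ]
    scored = score_unlabeled(data)
    print(len(scored))  # one scored row per unlabeled input row
    print(scored)
```

With this logic the export would contain exactly one row per unlabeled input row, i.e. about 15,000 rows in my case, instead of a slice of a random split.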
I am wondering how to fix this without having to disassemble the entire design (which would be painful, since I have already incorporated the model outcome into my thesis draft).
Could you please help me?
Kind regards
A data science newbie