How is data split into training/test sets in RapidMiner Go?
Thank you,
Is there any more information on this? I am new to data science and teaching myself, so I'm a bit confused by the terminology.
I am asking because I have noticed that the predictive power of my models changes depending on the order in which the rows of the dataset they were built on were originally uploaded.
To clarify: is the data split 60/40, with rows assigned to the 60% and the 40% randomly but in a way that keeps the same distribution?
Or does the split put the first 60% of rows into one set and the last 40% into the other?
You can find free learning materials on https://academy.rapidminer.com/. I would recommend checking them out.
For example, on validation I found the following materials that could help you:
- White paper: https://academy.rapidminer.com/learn/article/how-to-correctly-validate-machine-learning-models
- Videos: https://academy.rapidminer.com/learn/video/validation-demo, https://academy.rapidminer.com/learn/video/validating-a-model
When splitting data, Go always shuffles the dataset. If the label is nominal (categorical), Go also ensures that both subsets keep the same label distribution as the original dataset.
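A shuffled split can still give different results when the same rows are uploaded in a different order. The sketch below shows one possible reason, under a loud assumption: it supposes the shuffle uses a fixed random seed, which this thread does not confirm for Go. `shuffled_split` is a hypothetical helper, not part of RapidMiner. Python's `random.Random(seed).shuffle` applies the same permutation of positions to any list of the same length, so with a fixed seed the test set always holds the rows that sat at the same positions in the upload.

```python
import random

def shuffled_split(rows, train_fraction=0.6, seed=42):
    """Hypothetical helper: shuffle a copy of `rows` with a FIXED seed
    (an assumption, not confirmed for Go), then cut it into train/test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # same seed -> same permutation of positions
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(10))            # the same ten rows...
reordered = list(reversed(rows))  # ...uploaded in the opposite order

train_a, test_a = shuffled_split(rows)
train_b, test_b = shuffled_split(reordered)

# Row x sits at position x in `rows` and at position 9 - x in `reordered`,
# so the second test set is the "mirror" of the first one.
print(sorted(test_a))
print(sorted(test_b))
```

Unless the chosen positions happen to be symmetric, the two runs train and evaluate on different rows. With a single 60/40 holdout, that alone can shift the measured performance.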
Regards,
Andras
We use a 60/40 split for every model. If the target column is nominal, Go builds random subsets and ensures that the distribution of its values is the same as in the original dataset. Otherwise, Go builds the subsets randomly.
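To make the difference between a stratified random split and a "first 60% of rows" split concrete, here is a minimal sketch of a 60/40 split that behaves as described above. It is not Go's actual code: `split_60_40` and `label_of` are hypothetical names, and stratifying by shuffling and cutting each class separately is an assumed technique. The demo data is deliberately sorted by label to show what a sequential split would do.

```python
import random
from collections import Counter, defaultdict

def split_60_40(rows, label_of, nominal, train_fraction=0.6, seed=0):
    """Hypothetical sketch: return (train, test), stratified by label if nominal."""
    rng = random.Random(seed)
    if not nominal:
        # Non-nominal target: plain random split (shuffle once, cut once).
        shuffled = list(rows)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        return shuffled[:cut], shuffled[cut:]
    # Nominal target: group rows by class and split each group separately,
    # so both subsets keep (approximately) the original class proportions.
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    train, test = [], []
    for members in groups.values():
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    # Mix the classes back together inside each subset.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test

# 100 rows sorted by label: 70 "yes" rows followed by 30 "no" rows.
data = [(i, "yes") for i in range(70)] + [(i, "no") for i in range(70, 100)]

train, test = split_60_40(data, label_of=lambda r: r[1], nominal=True)
train_counts = Counter(r[1] for r in train)  # 42 "yes", 18 "no"
test_counts = Counter(r[1] for r in test)    # 28 "yes", 12 "no"

# For comparison: taking the first 60% of rows on this sorted data
# leaves no "no" rows in the training set at all.
sequential_counts = Counter(r[1] for r in data[:60])
print(train_counts, test_counts, sequential_counts)
```

Here the 70/30 class ratio is kept exactly in both subsets because 70 and 30 divide evenly. In general, rounding per class makes the proportions approximate.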
Regards,
Andras