Why does RapidMiner Studio reduce the number of rows in the model results?
jsdrew
New Altair Community Member
I am using RapidMiner Studio for the first time. I've loaded a dataset and run Auto Model on it, but the exported results have only about 11,000 rows while the dataset has 29,000 rows. How do I get predictions for all rows?
Best Answer
Hi @jsdrew,
a basic principle of predictive modeling is that you shouldn't use a model to predict the outcome of a record it was built on. Evaluating a model on its own training records would favor overfitted models.
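Therefore, Auto Model does a "split validation". It takes about 2/3 of the data for building the model and uses the rest for evaluating the model by comparing the known label to the predicted one. Only that held-out part receives predictions, which is why the exported results contain fewer rows than your dataset.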
If you take the process created by Auto Model and replace the split validation with a cross validation, the process will take longer (which is why Auto Model doesn't use it), because it builds 10 or 11 models: one per fold, plus possibly a final model on all the data. In return, every row is held out in exactly one fold, so you get a prediction for every row in your data.
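To make the difference concrete, here is a minimal sketch in plain Python, outside of RapidMiner. It is not what Auto Model runs internally. The helper names (`holdout_indices`, `kfold_indices`, `fit_majority`), the random labels, and the toy "majority label" model are assumptions made up for illustration. It only counts how many rows end up with a prediction under each validation scheme.

```python
import random


def holdout_indices(n_rows, train_fraction=2 / 3, seed=0):
    """Return (train, test) row indices for one split validation."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = int(n_rows * train_fraction)
    return idx[:cut], idx[cut:]


def kfold_indices(n_rows, k=10, seed=0):
    """Yield (train, test) row indices for each of k folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all rows
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def fit_majority(labels, train):
    """Toy stand-in for a model: always predict the most common training label."""
    counts = {}
    for i in train:
        counts[labels[i]] = counts.get(labels[i], 0) + 1
    return max(counts, key=counts.get)


n_rows = 29_000  # dataset size from the question
rng = random.Random(1)
labels = [rng.choice(["yes", "no"]) for _ in range(n_rows)]  # made-up labels

# Split validation: one model, predictions only for the held-out rows.
train, test = holdout_indices(n_rows)
model = fit_majority(labels, train)
split_predictions = {i: model for i in test}

# Cross validation: one model per fold, each row predicted by the one
# model that did not see it during training.
cv_predictions = {}
for train, test in kfold_indices(n_rows, k=10):
    model = fit_majority(labels, train)
    for i in test:
        cv_predictions[i] = model

print("rows predicted by split validation:", len(split_predictions))
print("rows predicted by cross validation:", len(cv_predictions))
```

With the split, only about a third of the rows receive a prediction. With the 10 folds, all 29,000 rows do, at the cost of training 10 models instead of one.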
The Academy has videos for these topics if you need more information.
Regards,
Balázs