"Different performance results online-training vs. model loading"
balamir
New Altair Community Member
I'm new to RapidMiner and I'm experimenting with setting up my model. I first tried a random forest learner. Here is the tree view showing my setup (similar to a tutorial setup).
I got around 75% accuracy. Then I created another experiment which wrote its model to a file.
Then I loaded that model and ran the experiment again.
The last experiment gave me ~97% accuracy (in one run it was 100%).
Did I misunderstand the flow? I assume the first two experiments generate the same (or a similar) model when I give them the same input set. So why does loading the model give such high accuracy? I tried it a few times just to make sure it was not a lucky selection of the features.
Thanks for any explanation.
Answers
Hello balamir and welcome to RapidMiner
First of all: there is no need to make screenshots. It is enough to copy the text from the XML tab in the RapidMiner GUI and post it here. This also has the advantage that we can see all the parameters you have set.
@your question:
You misunderstood the concept of cross-validation.
In the first setup, in each of the 10 iterations only 9/10 of the dataset is used to create the model; the remaining 1/10 is used to calculate the accuracy.
In the next setup you use 10/10 of your data to create the model. Then you apply that model 10 times, each time to 1/10 of the same dataset.
The key difference is that in your first setup the data you use for validation has NOT been used to create the model, in contrast to your second/third setup.
Since the model in the third setup has already seen all the data, nothing can surprise it, so the accuracy is much higher. The model is being scored on its own training data, which rewards memorizing that data rather than generalizing. This is what we call overfitting.
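To make the difference concrete, here is a minimal sketch in plain Python (not RapidMiner). It is only an illustration under assumptions: the data is synthetic, the helper names (`make_data`, `cross_validated_accuracy`, `resubstitution_accuracy`) are made up for this post, and a 1-nearest-neighbour classifier stands in for the random forest because it needs no external library. It compares the honest cross-validated accuracy with the accuracy you get when the model is trained on all the data and then applied to pieces of that same data.

```python
import random

def make_data(n=200, noise=0.25, seed=0):
    """Synthetic 2-D points; label is 1 if x + y > 1, flipped with probability `noise`."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        label = int(x + y > 1)
        if rng.random() < noise:
            label = 1 - label  # label noise: the model cannot truly "learn" these
        data.append(((x, y), label))
    return data

def predict_1nn(train, point):
    """Return the label of the training example closest to `point`."""
    nearest = min(
        train,
        key=lambda ex: (ex[0][0] - point[0]) ** 2 + (ex[0][1] - point[1]) ** 2,
    )
    return nearest[1]

def accuracy(train, test):
    """Fraction of `test` examples classified correctly by 1-NN built on `train`."""
    hits = sum(predict_1nn(train, p) == label for p, label in test)
    return hits / len(test)

def cross_validated_accuracy(data, k=10):
    """First setup: every fold is scored by a model that never saw it."""
    scores = []
    for i in range(k):
        test = data[i::k]
        train = [ex for j, ex in enumerate(data) if j % k != i]
        scores.append(accuracy(train, test))
    return sum(scores) / k

def resubstitution_accuracy(data, k=10):
    """Second/third setup: model built on ALL data, then applied to folds of it."""
    scores = [accuracy(data, data[i::k]) for i in range(k)]
    return sum(scores) / k

data = make_data()
cv_acc = cross_validated_accuracy(data)
resub_acc = resubstitution_accuracy(data)
print(f"cross-validated accuracy:            {cv_acc:.2f}")
print(f"train-on-all, test-on-same accuracy: {resub_acc:.2f}")
```

In this sketch the second number is perfect by construction, because each point's nearest neighbour in the training set is the point itself. A random forest will usually not memorize quite that completely, but the effect points in the same direction: only the cross-validated number says anything about unseen data.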
I strongly suggest that you reread the description of cross-validation in the RapidMiner tutorial and/or take a look at a good book.
Here is another thread regarding crossvalidation (only the first and second post are relevant): http://rapid-i.com/rapidforum/index.php/topic,62.0.html
greetings,
Steffen
Thanks steffen for the warm welcome and quick reply. I'm aware of cross-validation but I didn't make the connection. Your explanation clarified the difference between the two setups.