Cross validation

Papad
New Altair Community Member
Hello,
Can anybody help me with this problem?
In the first picture I have this:

Here I measure the performance on the same data, and the accuracy is 87.44%.
When I run the same procedure inside cross-validation, like this:

(inside cross validation)

the accuracy I get is 82.11%.
It is the same procedure, just inside a Cross Validation operator.
Why is there a difference between the two cases?
What I understand is that in the second case my model is trained and the performance is then measured on the testing partition, so the result is more realistic.
So more training doesn't always mean greater accuracy?
I hope my question is clear.
Thanks in advance.
Best Answers
The first picture measures how well you describe your training data. The second one measures how well you predict unknown (out-of-sample) data. You almost always want the second.
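The gap between the two numbers can be reproduced outside RapidMiner. Here is a minimal, hypothetical sketch (not the original process from the pictures) using a tiny 1-nearest-neighbour classifier on noisy data: scored on its own training data it is trivially 100% accurate, while 10-fold cross-validation, which always tests on points the model never saw, reports a lower and more honest number.

```python
# Hypothetical illustration (not the thread's RapidMiner process):
# score a 1-nearest-neighbour classifier on its own training data
# vs. with 10-fold cross-validation.
import random

def predict_1nn(train, point):
    """Return the label of the training point closest to `point`."""
    nearest = min(train, key=lambda t: (t[0] - point) ** 2)
    return nearest[1]

random.seed(0)
# Noisy 1-D data: label is 1 when x > 0.5, but 20% of labels are flipped.
data = []
for _ in range(200):
    x = random.random()
    label = int(x > 0.5)
    if random.random() < 0.2:
        label = 1 - label
    data.append((x, label))

# "First picture": test on the training data itself.
resub_acc = sum(predict_1nn(data, x) == y for x, y in data) / len(data)

# "Second picture": 10-fold cross-validation, testing on held-out folds.
k = 10
correct = 0
for i in range(k):
    test = data[i::k]                                  # every k-th row is the test fold
    train = [d for j, d in enumerate(data) if j % k != i]
    correct += sum(predict_1nn(train, x) == y for x, y in test)
cv_acc = correct / len(data)

print(f"Accuracy on training data: {resub_acc:.1%}")   # 100%: each point is its own neighbour
print(f"Cross-validated accuracy:  {cv_acc:.1%}")      # lower: measured on unseen points
```

The training-data score is perfect only because the model memorised the noise; the cross-validated score is the one that says something about future data.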
Hello @Papad
As Martin said, in the first case you are training and testing the model on the same data, which is not a valid way to evaluate it. In the second case you are cross-validating the model: it is trained on one part of the data and tested on another part that it has never seen. This is the best method to validate your model.
To understand cross-validation, here is an excellent post from Scott.
https://community.rapidminer.com/discussion/54621/cross-validation-and-its-outputs-in-rm-studio
Thanks
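To make "training on one part and testing on a part the model never saw" concrete, here is a small sketch of what a 10-fold cross-validation operator does internally (a simplified assumption of the mechanics, not RapidMiner's actual implementation): the data is split into 10 folds, and each fold serves as the test set exactly once while the other 9 are used for training.

```python
# Hypothetical sketch of 10-fold cross-validation mechanics:
# each row is tested exactly once, by a model trained without it.
data = list(range(20))  # stand-in for 20 example rows
k = 10

folds = [data[i::k] for i in range(k)]  # 10 disjoint folds of 2 rows each
for i, test in enumerate(folds):
    train = [x for f in folds if f is not test for x in f]
    # here you would train the model on `train` and score it on `test`
    print(f"fold {i}: train on {len(train)} rows, test on {len(test)} rows")

# Every row appears in exactly one test fold.
tested = sorted(x for f in folds for x in f)
assert tested == data
```

The reported cross-validation accuracy is the average of the 10 per-fold accuracies, so every example contributes to the estimate once as unseen test data.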
Answers
What I can't fully understand is this: in the cross-validation case we have one set of data, and we already know the results, so how does this tell us anything about unknown data?