"Improve root_mean_squared_error in results with testing dataset"
pepe_jaen
New Altair Community Member
Hello all.
This is my first topic in this forum, and I would be very pleased to receive any help with my RapidMiner process.
I'm trying to create a model to estimate a float label using 9 attributes in a dataset of 430 examples. Those 9 attributes were selected beforehand using wrapper validation and a correlation matrix.
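For context, the attribute selection I did beforehand is roughly equivalent to the following Python/scikit-learn sketch (an illustration only; my real process uses RapidMiner operators, and the data here is synthetic):

    import numpy as np
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.svm import SVR

    # Synthetic stand-in data: 430 rows, 20 candidate attributes, a float label.
    rng = np.random.default_rng(0)
    X_all = rng.normal(size=(430, 20))
    w = rng.normal(size=20)
    y_all = X_all @ w + rng.normal(scale=0.1, size=430)

    # Wrapper-style selection: keep the 9 attributes that help the model most.
    selector = SequentialFeatureSelector(
        SVR(kernel="rbf"), n_features_to_select=9, direction="forward", cv=5
    )
    X_sel = selector.fit_transform(X_all, y_all)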
I trained an SVM model on the first 380 examples. After using "Loop Parameters", I obtained SVM [kernel = epanechnikov; kernel cache = 200; C = 10.0; convergence epsilon = 0.001 ...].
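In scikit-learn terms, the parameter search looks roughly like this (continuing the synthetic sketch above; scikit-learn's SVR has no Epanechnikov kernel, so the RBF kernel stands in, and the grid values are illustrative):

    from sklearn.model_selection import GridSearchCV

    # First 380 rows for training, the rest held out for testing.
    X_train, y_train = X_sel[:380], y_all[:380]
    X_test, y_test = X_sel[380:], y_all[380:]

    # Grid search plays the role of the "Loop Parameters" operator here.
    param_grid = {"C": [0.1, 1.0, 10.0, 100.0], "epsilon": [0.001, 0.01, 0.1]}
    search = GridSearchCV(
        SVR(kernel="rbf"),
        param_grid,
        scoring="neg_root_mean_squared_error",
        cv=5,
    )
    search.fit(X_train, y_train)
    print(search.best_params_)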
With these SVM parameters I was able to reach a root_mean_squared_error of 0.0001 (using the "Apply Model" + "Performance" operators). The label and the predicted label are practically identical with this model (100% performance).
If I test on the rest of the dataset (49 examples), the root_mean_squared_error is very high (1.411), and the predictions are not close to the label values.
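The gap I am seeing corresponds to something like this (continuing the sketch; the 0.0001 and 1.411 values come from my RapidMiner process, not from this code):

    import numpy as np
    from sklearn.metrics import mean_squared_error

    model = search.best_estimator_
    rmse_train = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
    rmse_test = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    # In my process: rmse_train is about 0.0001 while rmse_test is about 1.411.
    print(rmse_train, rmse_test)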
I also tried windowing with horizon = 1 and sliding window validation, with the same result: 100% performance on the training dataset and very low performance on the testing dataset.
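The sliding window setup roughly corresponds to a time-ordered split like this (an assumption on my part; scikit-learn's TimeSeriesSplit only approximates RapidMiner's Sliding Window Validation operator, with horizon = 1 meaning the model predicts the step right after the training window):

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    # Each fold trains on a leading window and tests on the rows right after it.
    tscv = TimeSeriesSplit(n_splits=5)
    for train_idx, test_idx in tscv.split(X_train):
        m = SVR(kernel="rbf", C=10.0).fit(X_train[train_idx], y_train[train_idx])
        preds = m.predict(X_train[test_idx])
        rmse = np.sqrt(np.mean((y_train[test_idx] - preds) ** 2))
        print(len(train_idx), len(test_idx), rmse)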
Is there any way to train the model so that the performance of the predicted label improves on the testing dataset (the last 49 examples)?
Or do you have any other ideas for improving it?
Thanks in advance.