Question data
I have two CSV files, each with several features. Feat1 is the model, Feat2 is a power measure, Feat3 is a binary flag for something the object has (1) or does not have (0), Feat4 is a feature whose meaning I don't know, Feat5 is the device installation date, Feat6 and Feat7 are the latitude and longitude, and Feat8 is the number of maintenance interventions. The training CSV has values for Feat8 and the test CSV does not. My goal is to estimate Feat8 for the test set. How can I do this? Thanks

Telcontar120
New Altair Community Member
You should review the RapidMiner tutorials for Cross Validation and for Apply Model. Basically, you are going to define Feat 8 as the label and build your model on that. Then you are going to save that model and apply it to the second dataset.
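For readers who think in code, the steps above (set Feat 8 as the label, cross-validate on the training set, then apply the model to the unlabeled set) can be sketched in Python roughly as follows. This is only an illustration, not the RapidMiner process: the rows are made up, the "model" is a deliberately trivial stand-in (the average Feat 8 per device model) for whatever learner you actually choose, and fit, predict and cross_validate are hypothetical helper names.

```python
import random
from statistics import mean

# Made-up training rows: feat1 is the device model, feat8 is the label.
TRAIN = [{"feat1": m, "feat8": y}
         for m, ys in (("A", [1, 2, 3, 1, 2, 3]), ("B", [5, 6, 7, 5, 6, 7]))
         for y in ys]
# Made-up test rows: same feature, but no feat8 yet.
TEST = [{"feat1": "A"}, {"feat1": "B"}, {"feat1": "C"}]


def fit(rows):
    """Toy stand-in for a learner: mean label per device model."""
    labels_by_model = {}
    for row in rows:
        labels_by_model.setdefault(row["feat1"], []).append(row["feat8"])
    return {
        "by_model": {m: mean(ys) for m, ys in labels_by_model.items()},
        "fallback": mean(row["feat8"] for row in rows),  # for unseen models
    }


def predict(model, row):
    """Apply the fitted model to one row (labelled or not)."""
    return model["by_model"].get(row["feat1"], model["fallback"])


def cross_validate(rows, k=3, seed=0):
    """k-fold cross validation; returns the mean absolute error over folds."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    fold_errors = []
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = fit(train)
        fold_errors.append(
            mean(abs(predict(model, r) - r["feat8"]) for r in held_out)
        )
    return mean(fold_errors)


# 1) Estimate how well the approach generalises, using the training set only.
cv_mae = cross_validate(TRAIN, k=3)
# 2) Fit on all training rows, then apply the model to the unlabeled test rows.
final_model = fit(TRAIN)
predictions = [predict(final_model, row) for row in TEST]
print("cross-validated MAE:", cv_mae)
print("predicted feat8 for test rows:", predictions)
```

The cross validation only tells you how much to trust the model. The predictions for the test set come from a model fitted on the whole training set, which matches what Cross Validation followed by Apply Model does in RapidMiner.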
Hi @andre5007,
The workflow looks fine if you have your own test set. However, as Brian mentioned above, cross validation is always a smart option on your training set.
https://academy.rapidminer.com/learn/article/cross-validation
https://academy.rapidminer.com/learn/video/validating-a-model
https://rapidminer.com/blog/validate-models-cross-validation/
HTH!
YY
I put a filter at the beginning because one value was missing, and that caused an error.
Then, inside the Cross Validation operator, I placed the Decision Tree in the training subprocess, and Apply Model and Performance in the testing subprocess.
Then I connected the Cross Validation output to a second Apply Model, and I also connected the test data set, for which I have to predict Feat 8, to that Apply Model.
Do you think I should change any of the operator parameters? I only changed what was necessary to get the process to run.
What do you think I can improve? Am I on the right path?
Thanks
Best regards
André
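As an aside, the filtering step described above (dropping examples with a missing value before training) might look like this outside RapidMiner. It is a minimal sketch with made-up CSV content and a hypothetical helper name, load_complete_rows; the real files have more columns.

```python
import csv
import io

# Made-up stand-in for the training CSV; the second row is missing feat2.
TRAINING_CSV = """feat1,feat2,feat8
A,10.5,2
B,,6
A,7.0,1
"""


def load_complete_rows(csv_text):
    """Return (rows with every field filled in, number of rows dropped)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept, dropped = [], 0
    for row in reader:
        # A field is missing if it is absent (None) or blank.
        if any(value is None or value.strip() == "" for value in row.values()):
            dropped += 1
        else:
            kept.append(row)
    return kept, dropped


kept, dropped = load_complete_rows(TRAINING_CSV)
print(f"kept {len(kept)} rows, dropped {dropped}")
```

Dropping rows is the simplest option when only one value is missing. If many rows had gaps, replacing the missing values would usually be preferable to losing the data.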






Hi @andre5007, it looks like your prediction target is numerical (integers). Are you sure you want to build a decision tree or another predictive model for classification rather than regression? I would parse the label as a number and try regression decision trees or GLM/GBT for regression.
Hi @andre5007, my point was that regression is a better fit than classification for your data, because the label is an integer count. For the difference between regression and classification, see https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/
Regression is a better model for your data than classification because the label is an integer. To understand the difference between regression and classification, see https://nimblebox.ai/blog/regression-machine-learning
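One way to see why the integer label points towards regression: a classifier treats 0, 1, 2, ... as unrelated categories, so a prediction that is off by one counts as just as wrong as one that is off by five. A small sketch with made-up numbers (the lists and the helper names accuracy and mae are hypothetical, purely for illustration):

```python
from statistics import mean

# Made-up maintenance counts and two sets of made-up predictions.
actual = [0, 1, 2, 5]
off_by_one = [1, 2, 3, 6]    # never exact, but always close
mostly_exact = [0, 1, 2, 0]  # three exact hits, one badly wrong


def accuracy(y_true, y_pred):
    """Classification view: only exact matches count."""
    return mean(1 if t == p else 0 for t, p in zip(y_true, y_pred))


def mae(y_true, y_pred):
    """Regression view: how far off the counts are on average."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


for name, pred in (("off_by_one", off_by_one), ("mostly_exact", mostly_exact)):
    print(name, "accuracy:", accuracy(actual, pred), "MAE:", mae(actual, pred))
```

Accuracy calls the first set of predictions useless even though every estimate is within one intervention of the truth. The mean absolute error reflects that distance, which is usually what matters for a count such as Feat 8.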
Hi @yyhuang
Why do you think regression decision trees or GLM/GBT for regression are better?
Thanks
André