Question data
andre5007
New Altair Community Member
I have two CSV files, each with several features. Feat1 is the model; Feat2 is a power measure; Feat3 indicates whether the object has a certain property (1 = has it, 0 = does not); Feat4 is a feature whose meaning I don't know; Feat5 is the device installation date; Feat6 and Feat7 are the latitude and longitude; and Feat8 is the number of maintenance interventions. The training CSV has values for Feat8 and the test CSV does not. My goal is to estimate Feat8 for the test set. How can I do this? Thanks
Answers
-
You should review the RapidMiner tutorials for Cross Validation and for Apply Model. Basically, you define Feat 8 as the label, build your model on the training data, then save that model and apply it to the second dataset.
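For readers working outside RapidMiner, the same train-then-apply idea can be sketched in plain Python. This is only an illustration under stated assumptions: the column names, the tiny inline CSV data, and the nearest-neighbour averaging used as the model are all hypothetical stand-ins, not the poster's data or the operators discussed in this thread.

```python
import csv
import io
import math

# Hypothetical miniature versions of the two CSV files; the real column
# names and values are not given in the thread.
TRAIN_CSV = """feat2,feat3,feat8
10.0,1,4
12.0,1,5
30.0,0,1
33.0,0,2
"""
TEST_CSV = """feat2,feat3
11.0,1
31.0,0
"""

def load(text, label=None):
    """Return (features, labels); labels is None when no label column is named."""
    rows = list(csv.DictReader(io.StringIO(text)))
    names = [n for n in rows[0] if n != label]
    X = [[float(r[n]) for n in names] for r in rows]
    y = [float(r[label]) for r in rows] if label else None
    return X, y

def knn_predict(X_train, y_train, x, k=2):
    """Average the labels of the k training rows closest to x."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))[:k]
    return sum(lbl for _, lbl in nearest) / k

# "Build the model" on the labelled file, then apply it to the unlabelled one.
X_train, y_train = load(TRAIN_CSV, label="feat8")
X_test, _ = load(TEST_CSV)
predictions = [knn_predict(X_train, y_train, x) for x in X_test]
print(predictions)
```

The point is the division of roles: the label column is read only from the training file, and the test file contributes features alone.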
-
OK, I will take a look and try it. If I have any questions, can you help me?
-
Can someone tell me if I'm on the right track, please?
-
Hi @andre5007,
The workflow looks fine if you have your own test set. However, as Brian mentioned above, cross validation is always a smart option on your training set.
https://academy.rapidminer.com/learn/article/cross-validation
https://academy.rapidminer.com/learn/video/validating-a-model
https://rapidminer.com/blog/validate-models-cross-validation/
HTH!
YY
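As a rough picture of what cross validation does with the training set, here is a minimal sketch in plain Python. The data, the interleaved fold assignment, and the mean-predicting placeholder learner are assumptions made for illustration; RapidMiner's Cross Validation operator has its own sampling options, and any real learner would replace `fit_mean`.

```python
import math

def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k interleaved folds (no shuffling)."""
    return [list(range(i, n_rows, k)) for i in range(k)]

def fit_mean(X_train, y_train):
    """Placeholder learner: always predicts the mean training label."""
    mean = sum(y_train) / len(y_train)
    return lambda row: mean

def cross_validate(X, y, k, fit):
    """Return the RMSE obtained on each fold while it is held out."""
    scores = []
    for fold in k_fold_indices(len(y), k):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        squared = [(predict(X[i]) - y[i]) ** 2 for i in fold]
        scores.append(math.sqrt(sum(squared) / len(squared)))
    return scores

# Hypothetical training rows: one numeric feature and an intervention count.
X = [[10.0], [12.0], [30.0], [33.0], [11.0], [31.0]]
y = [4.0, 5.0, 1.0, 2.0, 4.0, 1.0]

fold_rmse = cross_validate(X, y, k=3, fit=fit_mean)
print(fold_rmse, sum(fold_rmse) / len(fold_rmse))
```

Each row is used for testing exactly once, so the averaged score estimates how the model would behave on data it has not seen, without touching the separate test file.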
-
I just noticed that the screenshot I sent was wrong; it was not the one I meant to select.
I put a filter at the beginning because one value was missing, which caused an error.
Then, inside the cross validation, I placed the decision tree in the training part, and Apply Model and Performance in the testing part.
Then I connected the cross validation to a second Apply Model, which also receives the test dataset where Feat 8 has to be predicted. Do you think I should change anything in the operators' parameters? I only changed what was necessary to get the process to run.
What do you think I can improve? Or am I on the right path now?
Thanks
Best regards
André
-
Looks like a good setup for basic model construction and validation with an additional out-of-sample validation.
-
Can you explain how I can improve the value that I marked in red? Thanks
-
Hi @andre5007, it looks like your prediction target is numerical (integers). Are you sure you want to build a decision tree or any other predictive model for classification, rather than regression? I would parse the label into numbers and try regression decision trees, or a GLM or GBT for regression.
-
Hi @andre5007, my point was that regression is a better fit than classification for your data, because the label is an integer count. For the difference between regression and classification, see https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/
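One way to see the difference is through how each framing scores a wrong prediction. The sketch below uses made-up counts (they are not from the poster's data): a classification metric such as accuracy treats every miss the same, while a regression metric such as mean absolute error rewards predictions that are numerically close.

```python
# Hypothetical true and predicted intervention counts for five devices.
actual = [2, 2, 5, 0, 3]
near_misses = [3, 1, 4, 1, 2]  # every prediction is off by exactly one
wild_misses = [9, 8, 0, 7, 9]  # every prediction is far from the truth

def accuracy(y_true, y_pred):
    """Classification view: fraction of predictions that match exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Regression view: average distance between prediction and truth."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

for name, predicted in [("near", near_misses), ("wild", wild_misses)]:
    print(name, accuracy(actual, predicted), mean_absolute_error(actual, predicted))
```

Both prediction sets get the same accuracy because none of the values match exactly, yet their absolute errors differ widely. With a label that counts interventions, that ordering information is what a regression model can use and a classifier ignores.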
-
As a model for your data, regression is better than classification because the label is an integer. To understand the difference between regression and classification, see https://nimblebox.ai/blog/regression-machine-learning