Question data

andre5007
andre5007 New Altair Community Member
edited November 2024 in Community Q&A
I have two CSV files, each with several features.
Feat1 is the model, Feat2 is a power measure, Feat3 is something the object either has or does not have (1 = has, 0 = does not have), Feat4 is a feature whose meaning I don't know, Feat5 is the device installation date, Feat6/7 are the latitude and longitude, and Feat8 is the number of maintenance interventions.
The training CSV has values for Feat8 and the test CSV does not.
My goal is to estimate Feat8 for the test set.
How can I do this? 
Thanks

Best Answers

  • YYH
    YYH
    Altair Employee
    Answer ✓
    Hi @andre5007, it looks like your prediction target is numerical (integers). Are you sure you want to build a decision tree or any other predictive model for classification, rather than regression? I would parse the label as a number and try regression decision trees or GLM/GBT for regression.
  • andre5007
    andre5007 New Altair Community Member
    Answer ✓
    Hi @yyhuang
    Why do you think regression decision trees or GLM/GBT for regression are better?
    Thanks
    André
  • YYH
    YYH
    Altair Employee
    Answer ✓
    Hi @andre5007, my point was that regression is a better fit than classification for your data, because the label is an integer. For the difference between regression and classification, see https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/
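One way to see why an integer count label calls for regression: a classification metric such as accuracy only checks for exact matches, while a regression metric such as mean absolute error (MAE) measures how far off each prediction is. The sketch below uses made-up label values and two hypothetical sets of predictions purely for illustration; none of the numbers come from the thread's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute distance between prediction and true label."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up intervention counts (a stand-in for Feat8).
y_true = [0, 2, 5, 1]

# Two hypothetical models.
close = [1, 3, 4, 2]    # never exact, but always off by just one
erratic = [0, 2, 0, 9]  # two exact hits, two large misses

for name, y_pred in [("close", close), ("erratic", erratic)]:
    print(name, accuracy(y_true, y_pred), mean_absolute_error(y_true, y_pred))
```

Accuracy ranks the erratic predictions higher because it treats every miss as equally wrong, whereas MAE ranks the close predictions higher. For a count such as the number of maintenance interventions, the distance-based view is usually the one that matters.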

Answers

  • Telcontar120
    Telcontar120 New Altair Community Member
    You should review the RapidMiner tutorials for Cross Validation and for Apply Model. Basically you are going to define Feat 8 as the label and build your model on that, and then you are going to save that model and apply it to the 2nd dataset.
  • andre5007
    andre5007 New Altair Community Member
    OK, I will take a look and try it. If I have any questions, can you help me?
    
    
  • andre5007
    andre5007 New Altair Community Member
    Can someone tell me if I'm on the right track, please?


  • YYH
    YYH
    Altair Employee
    Hi @andre5007,

    The workflow looks fine if you have your own test set. However, as Brian mentioned above, cross validation is always a smart option on your training set.

    https://academy.rapidminer.com/learn/article/cross-validation
    https://academy.rapidminer.com/learn/video/validating-a-model
    https://rapidminer.com/blog/validate-models-cross-validation/

    HTH!

    YY

  • andre5007
    andre5007 New Altair Community Member
    I just noticed that the screenshot I sent was wrong; it was not the one I meant to select.

    I put a filter at the beginning because the data had a missing value, which caused an error.

    Then, inside the Cross Validation operator, I placed the Decision Tree in the training subprocess, and Apply Model and Performance in the testing subprocess.

    Then I connected the Cross Validation to a second Apply Model, which also receives the test data set for which I have to predict Feat8.

    Do you think I should change any of the operator parameters? I only changed them where it was necessary to get the process to run.

    What do you think I can improve? Am I on the right path now?

    Thanks
    Best regards
    André


  • Telcontar120
    Telcontar120 New Altair Community Member
    Looks like a good setup for basic model construction and validation with an additional out-of-sample validation. 
  • andre5007
    andre5007 New Altair Community Member
    Can you explain how I can improve the value that I marked in red?
    Thanks
    
  • rugmanasokan
    rugmanasokan New Altair Community Member
    Regression is a better model for your data than classification, because the label is an integer. To understand the difference between regression and classification, see https://nimblebox.ai/blog/regression-machine-learning
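For readers who prefer code to RapidMiner operators, the workflow discussed in this thread (filter out rows with missing values, cross-validate a regression model on the training set, then apply the fitted model to the unlabeled test set) can be sketched in Python with scikit-learn. This is only a rough analogue, not the RapidMiner process itself: the data is randomly generated as a stand-in for the two CSV files, and the feature layout, tree depth, and number of folds are all assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-in for the training CSV: two numeric features plus an
# integer label playing the role of Feat8.
X_train = rng.uniform(0, 10, size=(200, 2))
y_train = np.clip(np.round(X_train[:, 0] + rng.normal(0, 0.5, size=200)), 0, None)
X_train[3, 1] = np.nan  # simulate the missing value that caused the error

# Hypothetical stand-in for the test CSV: same features, no label.
X_test = rng.uniform(0, 10, size=(50, 2))

# Step 1: filter out training rows that contain a missing value.
complete = ~np.isnan(X_train).any(axis=1)
X_train, y_train = X_train[complete], y_train[complete]

# Step 2: cross-validate a regression tree on the training set only.
model = DecisionTreeRegressor(max_depth=4, random_state=0)
scores = cross_val_score(
    model, X_train, y_train, cv=5, scoring="neg_mean_absolute_error"
)
cv_mae = -scores.mean()  # scikit-learn reports the negated error

# Step 3: fit on all training rows and apply the model to the test set.
model.fit(X_train, y_train)
feat8_pred = model.predict(X_test)

print(f"cross-validated MAE: {cv_mae:.2f}")
print(feat8_pred[:5])
```

The cross-validated error plays the role of the Performance operator inside Cross Validation, and the final `predict` call corresponds to the second Apply Model on the test set.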