Doubt

andre5007
andre5007 New Altair Community Member
edited November 5 in Community Q&A
Good morning.
Can someone help me with my question?
I have the following process, which predicts the value of feat 8 from the other features, since those features affect the value of feat 8.
Can someone tell me whether it is possible to add an operator here that excludes the outliers, so that they are not used for the estimate?
Can you also tell me whether, in the process I built, all the features are important for predicting feat 8, or whether the only one that matters is feat 9, the feature I created from the number of days the object has been installed?
Thanks
André

Answers

  • YYH
    YYH
    Altair Employee
    To exclude the outliers, you first need to detect them. You can try anomaly detection models for that, e.g. the Tukey test, One-Class SVM, etc.

    To understand the predictions, you can use
  • andre5007
    andre5007 New Altair Community Member
    Hi @yyhuang
    Does your RapidMiner process use more than one feature to determine the feat 8 values, and not just feat 9?
    Could you explain to me why you placed this operator there?

    Can you also explain why not all the IDs from the CSV file appear in the cross validation results?

    Sorry for so many questions but I'm really noob at this.
    Ty
    André


  • andre5007
    andre5007 New Altair Community Member
    Answer ✓
    Hi @yyhuang

    I was thinking of another approach using only Excel. I can calculate feat 9, the number of days the device has been active, and then a feat 10 defined as feat 9 / feat 8, so that I get the number of interventions per day. I then calculated the average of feat 10 for each model. Now I can go to the test CSV, calculate feat 9, take the per-model average of feat 10 from the other CSV, and calculate feat 8 for the test CSV.
    I tried to do the same in RapidMiner, but I don't think it is possible: RapidMiner tells me that feat 10 is missing for Retrieve test, and even if I add the 'Generate Attributes' operator, it cannot be calculated there because I don't have feat 8.
    Is there any way to take the feat 10 calculated in Retrieve train and pass it to Retrieve test, so that it becomes one more feature with importance in the estimation of feat 8?



  • andre5007
    andre5007 New Altair Community Member
    Hi @yyhuang
    Sorry for asking so many questions. I managed to detect several outliers, and now I would like to remove them to see whether that affects the forecast. However, whenever I add, for example, the 'Detect Outlier (Distances)' operator, the process takes ages to run. I don't know what I am doing wrong. Could you help me, please?
    Thank you and sorry for the trouble.
    André
  • YYH
    YYH
    Altair Employee
    Answer ✓
    The Target Encoding operator handles the nominal attributes (feature 1 and feature 4) inside the cross validation. It is similar to a nominal-to-numerical conversion. I will post the documentation link if there is one, but you can always check the Help view for the operator documentation. You can delete the operator if you prefer not to convert the nominal attributes. I applied the encoding because nominal attributes with too many distinct values can cause tree models to overfit.
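To illustrate the idea outside RapidMiner, here is a minimal Python sketch of target encoding applied inside cross validation. It is not the operator's actual implementation: the helper names, the simple fold split, and the toy values are all assumptions made for the example. The point it shows is that each row is encoded with label means learned from the other folds only.

```python
from collections import defaultdict
from statistics import mean


def fit_target_encoding(categories, targets):
    """Learn the mean label per category, plus a global mean as fallback."""
    grouped = defaultdict(list)
    for category, target in zip(categories, targets):
        grouped[category].append(target)
    mapping = {category: mean(values) for category, values in grouped.items()}
    return mapping, mean(targets)


def encode_within_folds(categories, targets, n_folds=3):
    """Encode every row using statistics from the other folds only."""
    n_rows = len(categories)
    encoded = [None] * n_rows
    for fold in range(n_folds):
        # Hypothetical split: row i belongs to fold i % n_folds.
        train_idx = [i for i in range(n_rows) if i % n_folds != fold]
        test_idx = [i for i in range(n_rows) if i % n_folds == fold]
        mapping, fallback = fit_target_encoding(
            [categories[i] for i in train_idx],
            [targets[i] for i in train_idx],
        )
        for i in test_idx:
            # Categories unseen in the training folds get the global mean.
            encoded[i] = mapping.get(categories[i], fallback)
    return encoded


# Toy data (assumed values, not taken from the thread's CSV files).
feature_1 = ["A", "B", "A", "B", "A", "C"]
feature_8 = [10, 20, 14, 24, 12, 30]
encoded_feature_1 = encode_within_folds(feature_1, feature_8)
print(encoded_feature_1)
```

Because the encoding for a row never sees that row's own label, the cross validation estimate is not inflated by the encoding step.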


    I can run the process with the new feature 10 added, so I am not sure about the error in your screenshot. However, I would not use feature 10 = feature 9 / feature 8 as a predictor in my model, because this new feature is derived from the label (feature 8) and would cause data leakage.
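For comparison, the Excel approach described earlier in the thread can be sketched in Python. This is only an illustration under assumptions: the column name `model`, the function names, and the numbers are hypothetical. The ratio feature 9 / feature 8 is averaged per model on the training rows only, and the test rows receive that average through a lookup, so no test label is needed.

```python
from collections import defaultdict
from statistics import mean


def average_ratio_per_model(train_rows):
    """Average feat9 / feat8 per model, using training rows only."""
    ratios = defaultdict(list)
    for row in train_rows:
        if row["feat8"] > 0:  # the ratio is undefined when feat8 is zero
            ratios[row["model"]].append(row["feat9"] / row["feat8"])
    return {model: mean(values) for model, values in ratios.items()}


def estimate_feat8(test_rows, ratio_by_model):
    """Estimate feat8 as feat9 divided by the model's average ratio."""
    estimates = []
    for row in test_rows:
        ratio = ratio_by_model.get(row["model"])
        # Models that never appear in training cannot be estimated this way.
        estimates.append(None if ratio is None else row["feat9"] / ratio)
    return estimates


# Toy rows (assumed values, not taken from the thread's CSV files).
train = [
    {"model": "M1", "feat9": 100, "feat8": 4},
    {"model": "M1", "feat9": 150, "feat8": 10},
    {"model": "M2", "feat9": 200, "feat8": 4},
]
test = [
    {"model": "M1", "feat9": 60},
    {"model": "M2", "feat9": 100},
    {"model": "M3", "feat9": 10},
]

ratio_by_model = average_ratio_per_model(train)
estimates = estimate_feat8(test, ratio_by_model)
print(estimates)
```

The row-level ratio itself is never used as a predictor here. Only the per-model average learned from the training data reaches the test rows.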


    For the geolocation outliers, you can run a quick Tukey test and then apply a filter that removes the flagged outliers.
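The two steps (Tukey test, then filter) can be sketched in Python as follows. This is a sketch under assumptions rather than the RapidMiner process itself: the column names `lat` and `lon`, the conventional multiplier k = 1.5, and the coordinates are all hypothetical.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, columns, k=1.5):
    """Keep only rows whose value lies inside the fences for every column."""
    fences = {
        column: tukey_fences([row[column] for row in rows], k)
        for column in columns
    }
    return [
        row
        for row in rows
        if all(fences[c][0] <= row[c] <= fences[c][1] for c in columns)
    ]


# Toy geolocation rows (assumed values); row 6 has an implausible latitude.
rows = [
    {"id": 1, "lat": 38.70, "lon": -9.14},
    {"id": 2, "lat": 38.72, "lon": -9.13},
    {"id": 3, "lat": 38.71, "lon": -9.15},
    {"id": 4, "lat": 38.73, "lon": -9.12},
    {"id": 5, "lat": 38.74, "lon": -9.16},
    {"id": 6, "lat": 52.00, "lon": -9.14},
]

kept = remove_outliers(rows, ["lat", "lon"])
print([row["id"] for row in kept])
```

The fences are computed once on the full set of rows and then applied as a plain filter, which mirrors the detect-then-filter order described above.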

  • andre5007
    andre5007 New Altair Community Member
    edited May 2021
    So I can remove my outliers like this?


    Ty @yyhuang
  • andre5007
    andre5007 New Altair Community Member
    @yyhuang
    Sorry to bother you, but I would appreciate it if you could explain how, and with which operators, I can do this.
    Could you send some screenshots showing how you remove the outliers, or share the process so that I can keep it and try to understand how it was done?
    Thanks
    André