How to reduce RMSE/SE when it's too high

User111113
User111113 New Altair Community Member
edited November 5 in Community Q&A
Hi All,

My data has 2 integer attributes; all the others are polynominal:
id
state
year
month
leads (int)
responses (int)
typeOfMail
status

I used a split model, dividing my 22 months of data into 20 months and 2 months, and got an RMSE of 12.41 and squared_error: 154.176 +/- 335.663.
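
For reference, the two reported figures are related: RMSE is the square root of the mean squared error. The sketch below checks that with the numbers from this post. It also uses a few made-up residuals (not the poster's data) to illustrate why the value after "+/-", assumed here to be the standard deviation of the per-row squared errors, can exceed the mean when a few rows are predicted very badly.

```python
import math
import statistics

# Reported mean squared error from the post.
mean_squared_error = 154.176
rmse = math.sqrt(mean_squared_error)
print(round(rmse, 2))  # 12.42, in line with the reported RMSE of about 12.41

# Made-up residuals (actual minus predicted), purely for illustration:
# three small misses and one large one.
residuals = [1, -2, 3, -30]
squared = [r ** 2 for r in residuals]

mean_sq = statistics.mean(squared)
spread = statistics.pstdev(squared)  # spread of the squared errors

# One large miss dominates, so the spread is larger than the mean.
print(mean_sq, spread)
```

If the spread is much larger than the mean, as in the reported result, it may be worth looking at the individual rows with the largest errors before trying more models.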

I don't know how to reduce this, and I'm not sure I can apply other models because I believe my options are limited.

I already tried other combinations of models, such as adding k-NN and a decision tree, but that didn't help.

I also tried splitting the data into 18 and 4 months (22 in total), which didn't help either.

What should I do?

Answers

  • varunm1
    varunm1 New Altair Community Member
    edited January 2020
    Hello @User111113

    You can add feature selection and hyperparameter optimization to your process. Feature selection can be done with the "Automatic Feature Engineering" operator, and hyperparameter selection with "Optimize Parameters (Grid)". Both of these should be on the training side of the validation operator.

    I am not sure how much data you have, but if it is not a significant amount, your models might overfit, as you are using two complex and data-hungry algorithms.

    You can also generate new features with the same Automatic Feature Engineering operator.
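
The idea of keeping the tuning "on the training side" can be sketched outside RapidMiner. The plain-Python example below is only an illustration under stated assumptions: the rows are made up, the toy k-NN regressor stands in for whatever learner is used, and names such as `knn_predict` and `score` are hypothetical. The point is that the hyperparameter is chosen using the training months only, and the held-out months are scored once at the end.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def knn_predict(train, x, k):
    """Average the targets of the k training rows whose feature is closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def score(k, fit_rows, eval_rows):
    """RMSE of a k-NN fitted on fit_rows and evaluated on eval_rows."""
    preds = [knn_predict(fit_rows, x, k) for x, _ in eval_rows]
    return rmse([y for _, y in eval_rows], preds)

# Made-up rows: (month_index, leads, responses). Not the poster's data.
rows = [(m, 100 + 10 * (m % 6), 20 + 2 * (m % 6) + (m % 3)) for m in range(1, 23)]

train_months = [r for r in rows if r[0] <= 20]  # first 20 months
test_months = [r for r in rows if r[0] > 20]    # last 2 months, never used for tuning

# Inner split of the training months only: fit on months 1-16, tune on 17-20.
inner_train = [(leads, resp) for m, leads, resp in train_months if m <= 16]
inner_valid = [(leads, resp) for m, leads, resp in train_months if m > 16]

# Try each candidate value of k and keep the one with the lowest inner RMSE.
candidates = [1, 2, 3, 5]
best_k = min(candidates, key=lambda k: score(k, inner_train, inner_valid))

# Refit on all training months and score the untouched holdout once.
full_train = [(leads, resp) for _, leads, resp in train_months]
holdout = [(leads, resp) for _, leads, resp in test_months]
test_rmse = score(best_k, full_train, holdout)
print(best_k, round(test_rmse, 3))
```

If the tuning were allowed to see the last two months, the reported error would look better than it really is, which is why both operators belong inside the training part of the validation.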
  • User111113
    User111113 New Altair Community Member
    Hi @varunm1

    Thank you for your response. Where do I add the "Automatic Feature Engineering" operator in my process, as shown in the screenshots above?

    Should I use both operators together? And by "should be on the training side of the validation operator", do you mean the same place where I have my models, so that one more link goes from my "Multiply" operator to one of these operators and then on to the model?

    I would like to use automatic feature generation but am not sure how and where to place it. I am using the last 2 years of data, which I still think is not enough for this type of prediction.

    Kindly help me with the next steps. Thank you.

  • varunm1
    varunm1 New Altair Community Member
    edited January 2020
    Here is a sample process I built quickly. You can look inside the validation operator and see the parameters I selected for each operator.

    You can import it by downloading it to your PC and selecting File > Import Process in RapidMiner.

  • User111113
    User111113 New Altair Community Member
    Thank you, I am trying to run it this way right now.
  • User111113
    User111113 New Altair Community Member
    @varunm1
    Thank you for your responses.

    I got this error. I am still a little confused about which parameters I should choose inside the "Optimize Parameters (Grid)" operator.

    I chose what was in the sample but got the error below, so now I am going to run it again with something else.

    For some reason each run takes about 25 minutes to complete, and I am not sure how to reduce that, but getting a lower error rate is important.


  • varunm1
    varunm1 New Altair Community Member
    Answer ✓
    Hello @User111113

    The Optimize Parameters (Grid) operator tests different settings of the algorithm. It tries every combination of the hyperparameter values you specify and selects the best one based on the lowest error.

    It takes time because a model is trained for each combination. For example, the hyperparameters of a decision tree are the ones you can find in its parameters, and the operator shows how many combinations will be trained. Try reading the help for the Optimize Parameters (Grid) operator, and also search for hyperparameter tuning on Google.

    I just gave a sample; you need to change it based on your understanding and requirements.
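
To see why a grid search can take so long, it helps to count the combinations. The sketch below is plain Python, not RapidMiner, and the parameter names and values are made up for illustration; the real ones are whatever is selected in the grid operator.

```python
from itertools import product

# Hypothetical grid: three values for two parameters, two for a third.
grid = {
    "max_depth": [5, 10, 20],
    "min_leaf_size": [2, 5, 10],
    "confidence": [0.1, 0.25],
}

# Every combination of one value per parameter.
combos = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(combos))  # 18 models to train, one per combination
print(combos[0])    # {'max_depth': 5, 'min_leaf_size': 2, 'confidence': 0.1}
```

The count is the product of the number of values per parameter, so trimming each list by even one value shortens the run noticeably. If the grid sits inside a cross-validation, each combination is also trained once per fold.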