Random forest visualization

dgrossu
dgrossu New Altair Community Member
edited November 5 in Community Q&A
Hello,

Would you help me decide which model I should use?
Thank you

Best Answer

  • rfuentealba
    rfuentealba New Altair Community Member
    Answer ✓
    Hello @dgrossu,

    Root Mean Squared Error (RMSE) is a metric that tells us how far our predicted values are from our observed values, on average, with large errors weighted more heavily than small ones. The Generalized Linear Model has the lowest RMSE, so I would go for that one.

    The other value I normally take a peek at is the squared correlation (R²), which is the square of the Pearson correlation between predicted and observed values, IIRC. I'll leave it to you to look up the details on Wikipedia, but for now: the closer it is to 1, the stronger the correlation, and again the GLM is the algorithm with the highest R².
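
For anyone who wants to see the two metrics side by side, here is a minimal sketch in plain Python. It is not from the original thread: the model names, the `rmse` and `squared_correlation` helpers, and the numbers are all made up for illustration, so treat it as a rough outline of the comparison rather than what the tool computes internally.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))

def squared_correlation(observed, predicted):
    """Square of the Pearson correlation between observed and predicted."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov ** 2 / (var_o * var_p)

# Hypothetical data: one set of observed values, two models' predictions.
observed = [3.0, 5.0, 7.0, 9.0]
predictions = {
    "model_a": [2.5, 5.5, 6.5, 9.5],
    "model_b": [4.0, 4.0, 8.0, 8.0],
}

for name, predicted in predictions.items():
    print(name, rmse(observed, predicted), squared_correlation(observed, predicted))

# Lower RMSE is better; higher squared correlation is better.
best = min(predictions, key=lambda name: rmse(observed, predictions[name]))
print("lowest RMSE:", best)
```

In this toy data both metrics point to the same model, which is the situation described above. When they disagree, you have to decide which one matters more for your use case.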

    So I'll take SVM.

    No, just kidding. For that data, it's the GLM.

    Make sure you re-test your model every now and then, because these values are only good for your training/testing data. Once you put the model into production, you may find that it doesn't tolerate the new data. It's a matter of computing these values again every X amount of time and making the required adjustments.
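
One possible way to do that periodic check is sketched below in plain Python. It is an illustration only: the `has_degraded` helper, the 10% tolerance, and the sample numbers are assumptions, not something from the original thread or from any particular tool.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))

def has_degraded(baseline_rmse, observed, predicted, tolerance=0.10):
    """Return True when RMSE on fresh data exceeds the baseline RMSE
    by more than `tolerance` (relative; 0.10 means 10%, an arbitrary choice)."""
    current = rmse(observed, predicted)
    return current > baseline_rmse * (1 + tolerance)

# Hypothetical numbers: RMSE measured at validation time, then a fresh batch
# of production data with the model's predictions for it.
baseline = 0.5
new_observed = [4.0, 6.0, 8.0, 10.0]
new_predicted = [3.0, 7.0, 7.0, 11.0]

if has_degraded(baseline, new_observed, new_predicted):
    print("RMSE got worse on new data; time to revisit the model")
else:
    print("model still within tolerance")
```

How often to run a check like this, and how much degradation is acceptable, depends on how quickly your data changes.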

    All the best,

    Rod.


