Auto Model Issue

Judy
Judy New Altair Community Member
edited November 2024 in Community Q&A
Hi 

I have tried to use the AutoModel tool for forecasting. The Excel data I imported has 1000 rows, but the prediction results for the linear model and for deep learning contain only about half of them (~500 rows). Why is this?

Please advise.

Thanks!

Best Answers

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Answer ✓
    Hi @Judy

    It's not a bug; it's a change introduced in RapidMiner 9.1.
    By default, AutoModel now splits the data into 60% for training the model and 40% for testing it (see the Split Data operator in the generated process).
    Predictions are made only on the test set. So for a dataset of around 1000 examples, you get around 0.4 * 1000 = 400 predictions.

    I hope it helps,

    Regards,

    Lionel
  • Judy
    Judy New Altair Community Member
    Answer ✓
    Hi @lionelderkrikor, thank you for explaining! :)
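
The arithmetic in Lionel's answer can be illustrated with a minimal plain-Python sketch of a shuffled 60/40 hold-out split. This is not RapidMiner's Split Data implementation; the function name `holdout_split`, the seed value, and the use of row indices as stand-ins for the imported Excel rows are all assumptions made for illustration.

```python
import random


def holdout_split(rows, train_ratio=0.6, seed=42):
    """Shuffle rows and split them into (train, test) by ratio.

    Hypothetical helper that only mimics the idea of a 60/40 split;
    it is not RapidMiner's Split Data operator.
    """
    rng = random.Random(seed)          # arbitrary seed, for reproducibility
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(1000))               # placeholder for 1000 imported rows
train, test = holdout_split(rows)
# Only the test partition is scored, so only these rows receive predictions.
print(len(train), len(test))           # 600 400
```

Because the model is evaluated on rows it never saw, the prediction output necessarily covers only the test partition, which is why the result has fewer rows than the input.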

Answers

  • daniel_beck
    daniel_beck New Altair Community Member
    Hi,

    For a predictive maintenance case, I predict a Health Index for each job of a machine. The Health Index is simply the percentage of successful jobs the machine produced.

    My plan is to sum up all predicted Health Index values from RapidMiner and compare them with the real outcome of the machine. That way I can see over a timeline whether the predicted values trend toward a higher or a lower Health Index.

    To do so, I need 100% of the "explained predictions". So far, AutoModel restricts them to the "40% hold-out set", as also described in the forum. I tried changing this by adjusting the "Split Data" operator and the "random seed", but I did not manage to export more than those 40% of the samples.

    Any ideas from your side on how to get more samples into the export?

    Thanks a lot,
    Daniel
  • varunm1
    varunm1 New Altair Community Member
    Hi @daniel_beck

    You can use the Cross Validation operator instead of Split Data, so that you get predictions for all the samples in your dataset.
  • IngoRM
    IngoRM New Altair Community Member
    To be honest, it sounds like you actually want to apply the model to the data you trained it on, which is obviously a big no-no. Opening the process and changing it to do that is one option. An alternative would be the cross-validation approach @varunm1 mentioned, but then you end up with predictions from, let's say, ten different models, and I'm not sure how much that would tell you either... Why can't you just sum up the health indices from the hold-out set? That is the only option I can see that has at least a directly interpretable meaning.
    Just my 2c,
    Ingo
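
The trade-off between varunm1's suggestion and Ingo's caveat can be sketched in plain Python with k-fold "out-of-fold" predictions: every row gets exactly one prediction, but each fold's rows are scored by a different model trained on the remaining folds. This is a minimal sketch under stated assumptions, not what RapidMiner's Cross Validation operator does internally. The helper names, k = 10, the seed, the trivial mean-predicting "model", and the placeholder health-index values are all hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n, k=10, seed=42):
    """Shuffle row indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean_model(targets):
    """Hypothetical stand-in 'model': always predicts the training mean."""
    m = mean(targets)
    return lambda _x: m


def out_of_fold_predictions(xs, ys, k=10, seed=42):
    """Return one prediction per row, each from a model that never saw that row."""
    preds = [None] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train_ys = [ys[i] for i in range(len(xs)) if i not in held_out]
        model = fit_mean_model(train_ys)   # a different model for every fold
        for i in fold:
            preds[i] = model(xs[i])
    return preds


xs = list(range(1000))                        # placeholder feature values
ys = [(i % 100) / 100 for i in range(1000)]   # placeholder "health index" targets
preds = out_of_fold_predictions(xs, ys)
print(len(preds))                             # 1000: every row has a prediction
print(sum(preds), sum(ys))                    # summed predictions vs. actuals
```

The sums can be compared as Daniel planned, but, as Ingo points out, the 1000 predictions come from ten separately trained models, so their total does not describe any single model. Summing predictions over only the hold-out rows of one model avoids that mixing.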