What type of validation does Auto Model use for small data sets?
Chin
New Altair Community Member
Hi Everyone,
I am in the middle of using RapidMiner Auto Model for classification in my thesis, but I can't seem to find information on what type of validation Auto Model uses on a data set of 100 items. What type of validation does Auto Model use in my situation, and can someone link me to documentation that I can reference for my write-up?
Also, what is the default split between testing and training data for Auto Model?
Thanks so much in advance for your help!
Answers
Hello @Chin
Currently, Auto Model splits the dataset into a 60:40 ratio (train:test). As far as I understand, this is the same for any dataset and doesn't depend on its size. Once Auto Model executes, you can open the generated process and see how it works.
Hope this helps
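To make the ratio concrete, here is a minimal sketch of a shuffled 60:40 split in plain Python. This is an illustration of the ratio only, not RapidMiner's actual implementation; the function name and fixed seed are my own choices.

```python
import random

def train_test_split(rows, train_ratio=0.6, seed=42):
    """Shuffle rows and split them by ratio (60:40 by default)."""
    rows = rows[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)   # fixed seed for reproducibility
    cut = int(len(rows) * train_ratio)  # 60% boundary
    return rows[:cut], rows[cut:]

# 100-example dataset, as in the question: 60 train / 40 test
data = list(range(100))
train, test = train_test_split(data)
```

On 100 rows this gives exactly 60 training and 40 test examples, which is the case the original question asks about.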
Just to add to the great explanation of @varunm1: in addition to the 60% : 40% split, we then perform a multiple hold-out set validation on the 40% test data. That is, we split the 40% again into 7 parts, evaluate the model on each part, get rid of the two extremes / outliers, and build the average of the rest. This way we keep many of the benefits of a cross-validation without its biggest drawback: a 5x-10x runtime increase. In my experiments, I did not find significant differences between this approach and cross-validation, and if I ever find the time, I will write a nice blog post about it :-)

Hope this helps,
Ingo
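The steps Ingo describes can be sketched as follows. This is my own illustrative version, not RapidMiner's code; `score_fn` is a hypothetical stand-in for whatever metric you compute on one hold-out chunk.

```python
import random
from statistics import mean

def multi_holdout_score(test_rows, score_fn, parts=7, seed=42):
    """Multiple hold-out estimate: split the test set into `parts` chunks,
    score the model on each chunk, drop the two extreme scores (min and max),
    and average the rest."""
    rows = test_rows[:]
    random.Random(seed).shuffle(rows)
    chunks = [rows[i::parts] for i in range(parts)]  # 7 roughly equal parts
    scores = sorted(score_fn(chunk) for chunk in chunks)
    trimmed = scores[1:-1]                           # discard the two extremes
    return mean(trimmed)

# Illustration: 40 hold-out labels, "model" that always predicts class 1,
# score_fn = accuracy of that dummy model on one chunk
labels = [1] * 30 + [0] * 10
accuracy = lambda chunk: sum(chunk) / len(chunk)
est = multi_holdout_score(labels, accuracy)
```

Because only one model evaluation happens per chunk (rather than one model *training* per fold), this stays close to plain hold-out runtime while smoothing out an unlucky single split.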
The only drawback of this method is that it cannot provide predictions for all the samples in our dataset, in case we want to analyze them (example: healthcare data for an individual patient). But this is a specific requirement, so in this case we need to manually add cross-validation to the Auto Model process. There is always a trade-off.
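For contrast, here is a sketch of the cross-validation variant that does yield one out-of-fold prediction per sample, which is what you would want for the per-patient analysis mentioned above. This is an illustrative k-fold loop in plain Python, not the RapidMiner operator; `fit` and `predict` are hypothetical stand-ins for training and applying a model.

```python
def kfold_predictions(rows, fit, predict, k=5):
    """k-fold cross-validation that returns one out-of-fold prediction per row:
    each row is predicted by a model trained on the other k-1 folds."""
    preds = [None] * len(rows)
    fold_of = [i % k for i in range(len(rows))]  # simple round-robin fold assignment
    for fold in range(k):
        train = [r for i, r in enumerate(rows) if fold_of[i] != fold]
        model = fit(train)                        # train on everything outside this fold
        for i, r in enumerate(rows):
            if fold_of[i] == fold:
                preds[i] = predict(model, r)      # predict only the held-out rows
    return preds

# Illustration with a trivial "model" that memorizes the mean of its training data
data = list(range(100))
preds = kfold_predictions(data,
                          fit=lambda train: sum(train) / len(train),
                          predict=lambda model, row: model,
                          k=5)
```

Unlike the multiple hold-out scheme, every one of the 100 rows ends up with a prediction, at the cost of training k models instead of one.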
PS: I saw the main CV implementation video (link below), but that is different from manipulating the one in Auto Model. Is there any guide on doing this for Auto Model specifically? How do I link things? E.g., Split Data has only exa as input and par as output, but Cross Validation has mod, exa, tes, and per as output.