Auto Model: choosing the data set split (e.g. linear sampling)

tomMEM
tomMEM New Altair Community Member
edited November 5 in Community Q&A
Hello, I wonder whether it is possible to specify the sampling type, e.g. linear sampling, for generating the training and test data sets within the "Auto Model" module.
Somehow the predicted values are far too good, so it would be better for my data set to split it using linear sampling.
Of course, this can be done after Auto Model by editing the stored process, but for convenience it would be better to choose it up front.
Thank you.

Best Answer

  • Caperez
    Caperez Altair Community Member
    Answer ✓
    Hi @behnish
    Auto Model performs many operations automatically, following standard good practices for ML. Each model built this way has a lot of parameters, and they are unmanageable from a single panel.
    The best solution is to run Auto Model and then go into the resulting model and adapt it.

    Regards.

Answers

  • tomMEM
    tomMEM New Altair Community Member
    Hello @ceaperez, thank you for the prompt response. Indeed, Auto Model gives a great overview of models and feature sets. So that is the way to do it: adapt it afterwards.
    Best. T
  • tomMEM
    tomMEM New Altair Community Member

    Hello, it looks like Auto Model is designed to extract interleaved training and test sets at a ratio of 0.6 to 0.4 across the whole range of the example set. With this split, the model gives a very good regression on my data set.

    Building the model from training and test sets created by linear sampling (0.9 to 0.1) gave roughly 4 times worse performance. This suggests that the model needs further steps to generalize better, and it shows how much the preparation of the training set matters.
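
    The difference between the two splits can be sketched in plain Python (not RapidMiner operators). The function names, the block length `period`, and the 1000-row example are assumptions made for illustration, and the interleaved scheme is only a guess at what Auto Model does, based on the observation above. The sketch measures how far, in rows, each test example is from its nearest training example:

```python
def linear_split(n_rows, train_ratio=0.9):
    """Linear sampling: the first block of rows trains, the remaining rows test."""
    cut = round(n_rows * train_ratio)
    indices = list(range(n_rows))
    return indices[:cut], indices[cut:]


def interleaved_split(n_rows, train_ratio=0.6, period=10):
    """Interleaved sampling (assumed scheme): in every run of `period` rows,
    the first rows go to training and the rest to testing."""
    n_train = round(period * train_ratio)
    train, test = [], []
    for i in range(n_rows):
        (train if i % period < n_train else test).append(i)
    return train, test


def max_gap_to_train(train, test):
    """Largest row distance from any test row to its nearest training row."""
    return max(min(abs(t - tr) for tr in train) for t in test)


lin_train, lin_test = linear_split(1000, 0.9)
int_train, int_test = interleaved_split(1000, 0.6, period=10)

print("linear:     ", len(lin_train), len(lin_test), max_gap_to_train(lin_train, lin_test))
print("interleaved:", len(int_train), len(int_test), max_gap_to_train(int_train, int_test))
```

    If neighbouring rows are correlated (for example, ordered measurements), every interleaved test row has a training row only a few rows away, which can make the test error look better than it would be on truly unseen data. The linear split forces the model to extrapolate over a whole block of rows.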

    Thus, it would still be nice to have a choice of data set splitting in Auto Model.

    In addition, the problem remains of how to optimize the model further so that it generalizes better. One way could be to run the model on a variety of data set splits to optimize the model parameters, or to add random noise to the data, as is done in image recognition approaches.
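
    Both ideas can be sketched in plain Python, again outside RapidMiner. The helper names `blocked_folds` and `add_gaussian_noise`, the fold count, and the noise level `sigma` are hypothetical choices for illustration, not settings taken from Auto Model:

```python
import random


def blocked_folds(n_rows, n_folds=5):
    """Yield (train, test) index lists where each test fold is one contiguous block."""
    fold_size = n_rows // n_folds
    for k in range(n_folds):
        start = k * fold_size
        # The last fold absorbs any leftover rows.
        stop = n_rows if k == n_folds - 1 else start + fold_size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_rows))
        yield train, test


def add_gaussian_noise(rows, sigma=0.05, seed=0):
    """Return a noisy copy of numeric feature rows; the input is not modified."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in rows]


# Example: inspect the folds for a 10-row data set.
for train_idx, test_idx in blocked_folds(10, n_folds=3):
    print("train", train_idx, "test", test_idx)

# Example: augment two feature rows with small random noise.
print(add_gaussian_noise([[1.0, 2.0], [3.0, 4.0]], sigma=0.05))
```

    A parameter setting would then be scored by its average error over all folds rather than on a single split, and the noisy copies would be added to the training rows only, never to the test rows.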