Auto Model Performance. Is it training, testing, or validation?

Konradlk
Konradlk New Altair Community Member
edited November 2024 in Community Q&A

Best Answer

  • varunm1
    varunm1 New Altair Community Member
    edited November 2019 Answer ✓
    @Konradlk

    Here you go. I tried a couple of neural network configurations, varying the layer sizes and adding new layers. The best performance (in my trials) came from a single layer with 2 neurons. Adding more neurons or layers reduces the test performance, which suggests overfitting.

    The attached process seemed optimal, with an RMSE of 0.023 and a squared correlation of 0.5. You can try other models and compare them with the neural network to see whether the RMSE decreases and the squared correlation increases. A higher squared correlation and a lower RMSE are better.

    Below are the testing data performances (RMSE and squared correlation, respectively):
    NN, single layer, 4 neurons:          0.025  0.430
    NN, single layer, 10 neurons:         0.027  0.419
    NN, two layers, 2 neurons per layer:  0.027  0.395
    NN, single layer, 2 neurons:          0.023  0.500
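    For anyone who wants to compare models the same way outside RapidMiner, here is a small sketch of the two metrics used above. It assumes "squared correlation" means the squared Pearson correlation between label and prediction (R²); the label/prediction values are made up for illustration.

    ```python
    import numpy as np

    # Illustrative label and prediction values (not from the attached process)
    y_true = np.array([0.10, 0.20, 0.30, 0.40, 0.50])
    y_pred = np.array([0.12, 0.18, 0.33, 0.38, 0.52])

    # Root mean squared error: lower is better
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

    # Squared Pearson correlation between label and prediction: higher is better
    sq_corr = np.corrcoef(y_true, y_pred)[0, 1] ** 2

    print(f"RMSE: {rmse:.3f}, squared correlation: {sq_corr:.3f}")
    ```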

    Hope this helps.

Answers

  • varunm1
    varunm1 New Altair Community Member
    Hello @Konradlk

    Auto Model splits the original dataset 60:40 (train:test). Validation in Auto Model is a multi hold-out set validation: the model is trained on 60% of the data, and the remaining 40% test data is divided into 7 subsets. Once the model is trained, it is used to make predictions on each of the 7 subsets independently, and the performances over these 7 subsets are averaged. So the performance you see in Auto Model comes from the test data, using a multi hold-out validation method.
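    To make the idea concrete, here is a minimal sketch of that validation scheme in scikit-learn (an analogue, not RapidMiner's actual implementation; the data, model, and metric are my assumptions):

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    # Synthetic regression data for illustration
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.1, size=200)

    # 60:40 train/test split, as Auto Model does
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.6, random_state=0)

    model = LinearRegression().fit(X_train, y_train)

    # Divide the 40% test data into 7 subsets, score each independently,
    # and average the performance
    rmses = []
    for X_sub, y_sub in zip(np.array_split(X_test, 7),
                            np.array_split(y_test, 7)):
        pred = model.predict(X_sub)
        rmses.append(mean_squared_error(y_sub, pred) ** 0.5)

    print(f"multi hold-out RMSE: {np.mean(rmses):.3f}")
    ```

    The averaging over subsets gives a more stable estimate than a single hold-out score would.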

    Hope this helps. Please let me know if you need more information.
  • Konradlk
    Konradlk New Altair Community Member
    Thank you so much, @varunm1. Do you have resources to find the other errors?
  • varunm1
    varunm1 New Altair Community Member
    Do you have resources to find the other errors?
    Can you inform what kind of resources and errors you are looking for?

    If you click on "Performance" for each model, you can find different performance metrics such as accuracy, precision, recall, etc.
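    For reference, these classification metrics are straightforward to reproduce by hand. A hedged sketch using scikit-learn as a stand-in (the label values below are purely illustrative):

    ```python
    from sklearn.metrics import accuracy_score, precision_score, recall_score

    # Illustrative true labels and predictions for a binary classifier
    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

    print("accuracy :", accuracy_score(y_true, y_pred))   # (TP+TN)/all = 6/8
    print("precision:", precision_score(y_true, y_pred))  # TP/(TP+FP) = 3/4
    print("recall   :", recall_score(y_true, y_pred))     # TP/(TP+FN) = 3/4
    ```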
  • Konradlk
    Konradlk New Altair Community Member
    edited November 2019
    @varunm1

    Hi, I'm looking to get the performance vector for each step of the process, i.e. the performance vectors for training, validation, and testing.

    I was previously using a process a coworker left me, and they explicitly said that they need the errors for all 3 stages. I am sorry if this is unclear; I do not have the greatest understanding of this and am trying to learn very quickly.

    My goal is to run several different prediction models and compare the performance of the different models.

    The picture below is what I was left with. I can post more information if necessary.


  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Hi @Konradlk,

    Your process is correct.
    You effectively have:
     - the training performance (given by the Performance operator in the "training" part of the Cross Validation operator),
     - the validation performance (given by the Performance operator in the "testing" part of the Cross Validation operator),
     - the testing performance (given by the Performance operator in the main process).
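    The three performance vectors above can be sketched like this in scikit-learn (an analogue of the RapidMiner operators, since they are visual; the data and model are my assumptions):

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import cross_validate, train_test_split

    # Synthetic regression data for illustration
    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 4))
    y = X.sum(axis=1) + rng.normal(scale=0.2, size=300)

    # Hold out a final test set, analogous to the Performance operator
    # in the main process
    X_dev, X_test, y_dev, y_test = train_test_split(
        X, y, test_size=0.25, random_state=1)

    # Cross-validation yields training and validation performance
    cv = cross_validate(LinearRegression(), X_dev, y_dev, cv=5,
                        scoring="neg_root_mean_squared_error",
                        return_train_score=True)
    print("training RMSE  :", -cv["train_score"].mean())
    print("validation RMSE:", -cv["test_score"].mean())

    # Testing performance on the untouched hold-out set
    model = LinearRegression().fit(X_dev, y_dev)
    test_rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print("testing RMSE   :", test_rmse)
    ```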

    Do you encounter any errors with this process?

    Regards,

    Lionel


  • Konradlk
    Konradlk New Altair Community Member
    @lionelderkrikor

    I do encounter errors when I try to change the neural network to Deep Learning, Generalized Linear Model, or SVM.

    The problem I encounter is that no matter which predictive model I run, I get exactly the same errors for each performance test.

    When I run Auto Model I get different errors for each model, but not when I change them in my process. I change the models by just swapping the Neural Net operator for whatever else I want to run.
  • varunm1
    varunm1 New Altair Community Member
    "I do encounter errors when I try to change the neural network to Deep Learning, Generalized Linear Model, or SVM."

    Can you share the details of those errors? If possible, provide us with your data and .rmp file to debug.

    "When I run Auto Model I get different errors for each model, but not when I change them in my process."
    You might get different errors because the processes are different.
  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    @Konradlk

    The validation method is different in Auto Model and in your process:
     - In Auto Model, a split validation with a multi hold-out set is performed, as described by Varun. You can open the process generated by Auto Model to understand how your model is validated.
     - In your process, you are using a Cross Validation.

    Although the performance should not differ significantly between the two, the use of two different validation methods can explain the differences.

    Moreover, you are applying a preprocessing step (Normalization) to your data. To my knowledge, Auto Model does not apply such a preprocessing step by default. This difference in preprocessing can also explain the difference in the performance results. Once again, you can open the process generated by Auto Model and compare it to your process.
      
    But so that we can reproduce what you observe and find out exactly what is going on, can you share your data and your process (the process from your screenshot)?

    Regards,

    Lionel
  • Konradlk
    Konradlk New Altair Community Member
    @varunm1 @lionelderkrikor
    Once again, thank you both for your time and help. I am going to attach my .rmp file and both Excel files I use. If either of you can help me figure out how to get decent data for a neural network and at least one other predictive model, I would be so grateful.

    For both of the Excel files, only the last sheet is used.
  • varunm1
    varunm1 New Altair Community Member
    edited November 2019
    Hello @Konradlk

    Do you have any reference performance values, or values you are aiming for? I modified your process and added an Optimize Parameters (Grid) operator for the neural network. I didn't change the layer information inside the neural network, such as adding neurons or layers.

    I attached the working process, without errors. You can change the layers in the Neural Net operator inside the Optimize Parameters (Grid) to see how different layer configurations perform. I will try other settings; you can add layers and try as well. Use squared correlation and RMSE as your performance evaluation metrics.
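    For readers who want the same idea in code: a grid search over network layer configurations can be sketched with scikit-learn's GridSearchCV and MLPRegressor (an analogue of the Optimize Parameters (Grid) and Neural Net operators; the data, grid values, and settings below are my assumptions):

    ```python
    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.neural_network import MLPRegressor

    # Synthetic regression data for illustration
    rng = np.random.default_rng(2)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.05, size=200)

    grid = GridSearchCV(
        MLPRegressor(max_iter=2000, random_state=0),
        # Layer configurations to try: one layer with 2, 4, or 10 neurons,
        # and two layers of 2 neurons each
        param_grid={"hidden_layer_sizes": [(2,), (4,), (10,), (2, 2)]},
        scoring="neg_root_mean_squared_error",
        cv=3,
    )
    grid.fit(X, y)
    print("best layers :", grid.best_params_["hidden_layer_sizes"])
    print("best CV RMSE:", -grid.best_score_)
    ```

    Each candidate configuration is cross-validated, and the one with the lowest RMSE wins; that mirrors how the Optimize Parameters (Grid) operator picks a setting.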

    Please let us know if you have more questions.