Simulator and test/training data in AutoModel

Chemical_eng
New Altair Community Member
edited November 5 in Community Q&A
Hello,
I am using AutoModel. I have some questions:
1. Is the simulator based on the test data, the training data, or all of the data?
2. How do I ensure that my test dataset is balanced? I have a lot of categorical variables, so how do I make sure the test set is balanced with respect to them?
3. Can I see metrics for both the training and the test error rates? I think I only see them for the test set.

Thanks


Answers

  • MartinLiebig
    Altair Employee
    Answer ✓

    1. The simulator essentially uses neither of the two data sets. In the end it applies the model to whatever data you configure; the training/test data is only used to determine the min and max of each input.
    2. Usually you don't balance on attributes, only on labels. You can do this by changing the costs/gains matrix (a rough stand-in outside AutoModel is sketched after this post).
    3. Only test error rates are reported, because training error rates are rarely of any use.

    BR,
    Martin
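
    To make point 2 concrete outside of AutoModel: the effect of a costs/gains matrix on a classification label can be approximated with class weights. A minimal scikit-learn sketch, with made-up data and a made-up 5x penalty on the rare class:

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        # Imbalanced toy data: ~90% class 0, ~10% class 1
        X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

        # Penalize mistakes on the rare class 5x more, similar in spirit
        # to raising its entry in a costs/gains matrix
        clf = LogisticRegression(class_weight={0: 1, 1: 5}).fit(X_tr, y_tr)
        print(clf.score(X_te, y_te))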
  • Chemical_eng
    New Altair Community Member
    Hi, thanks for your answer. Some comments:

    2. My label is continuous, so I have a regression problem. It is more that some of my inputs are categorical variables which are not equally distributed, so it is about ensuring that the test set represents these inputs (see the sketch after this post).

    3. I wanted the error on the training dataset to compare against the test error for overfitting/underfitting purposes, but that's OK.

    Also another question: I get different results when I train the algorithm on the same dataset. I assume this is caused by randomness in the algorithm or in the train-test split. I am using the model for optimization, and it gives me different recommendations every time I train. Any ideas on how to keep the model fixed? Can we fix these random parameters, or choose an average or best combination?
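
    For the regression case described above, both points can be handled directly outside of AutoModel: stratify the train/test split on the unevenly distributed categorical input, and report the training error next to the test error. A minimal scikit-learn sketch with entirely made-up column names and data:

        import numpy as np
        import pandas as pd
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.metrics import mean_squared_error
        from sklearn.model_selection import train_test_split

        # Hypothetical data: one skewed categorical input, one numeric input
        rng = np.random.default_rng(0)
        n = 600
        df = pd.DataFrame({
            "reactor_type": rng.choice(["A", "B", "C"], size=n, p=[0.7, 0.2, 0.1]),
            "temperature": rng.normal(350.0, 20.0, size=n),
        })
        df["yield"] = (0.1 * df["temperature"]
                       + df["reactor_type"].map({"A": 0.0, "B": 5.0, "C": 10.0})
                       + rng.normal(0.0, 1.0, size=n))

        X = pd.get_dummies(df[["reactor_type", "temperature"]])
        y = df["yield"]

        # Stratifying on the categorical input keeps the rare categories
        # represented proportionally in the test set
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, stratify=df["reactor_type"], random_state=42)

        model = RandomForestRegressor(random_state=42).fit(X_tr, y_tr)
        for name, Xs, ys in [("train", X_tr, y_tr), ("test", X_te, y_te)]:
            rmse = mean_squared_error(ys, model.predict(Xs)) ** 0.5
            print(name, "RMSE:", round(rmse, 3))  # a large gap suggests overfitting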
  • MartinLiebig
    Altair Employee
    Answer ✓
    I suppose not all random seeds are set to a fixed value, so the split, and also the randomness in some algorithms (or the randomness introduced by parallel computation), changes the results slightly.

    The results should not differ much, right?

    BR,
    Martin
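
    To illustrate the seed point in scikit-learn terms (AutoModel's internals may differ): once every random_state is pinned, both the split and the learner become deterministic, so repeated training yields the identical model. A minimal sketch:

        from sklearn.datasets import make_regression
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import train_test_split

        X, y = make_regression(n_samples=500, n_features=5, noise=1.0, random_state=0)

        def fit_once(seed):
            # Same seed for the split and for the learner on every call
            X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
            model = RandomForestRegressor(n_estimators=100, random_state=seed)
            return model.fit(X_tr, y_tr).score(X_te, y_te)

        print(fit_once(42) == fit_once(42))  # True: fixed seeds, reproducible results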
  • Chemical_eng
    New Altair Community Member
    They do differ somewhat significantly when we do the optimization. I think this problem is complex, so maybe we would need some extra advice.