forward selection and backward elimination

AD2019
AD2019 New Altair Community Member
edited November 2024 in Community Q&A
I ran a multiple regression model on a dataset with 15 variables, first using the "forward selection" nested operator and then using the "backward elimination" nested operator. I got dramatically different models: the first had 3 independent variables, the second had 8 IVs. Why such a big difference? I realize the serial addition or elimination of IVs may yield local optima, but is it common to get such wildly different "optimal" models for the same dataset? How can training yield such dramatically different trained models?
thanks in advance,
AD
Answers

  • varunm1
    varunm1 New Altair Community Member
    Answer ✓
    Hello @AD2019

    Yes, you can get highly varying results from both types of selection. The reason is the parameter settings of these operators. In forward selection, the operator adds attributes one at a time based on the improvement in performance; as soon as a round brings no improvement, it stops (stuck in a local optimum). The number of speculative rounds lets it look past a single non-improving round, which helps it avoid local optima. The stopping behavior and the maximal number of selections or eliminations also determine how many attributes remain after selection. A minimal sketch of this greedy loop is at the end of this answer.

    For example, on a dataset of mine with 408 attributes, forward selection selected 8 attributes and backward elimination selected 401, both with default settings.
    Hope this helps. Please inform if you need more information.
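
    To make the mechanics concrete, here is a minimal Python sketch of a greedy forward-selection loop with speculative rounds. It illustrates the idea only, not RapidMiner's actual implementation; the function name, the cross-validated R² criterion, and the synthetic scikit-learn dataset are all assumptions of the sketch.

    ```python
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    def forward_selection(X, y, speculative_rounds=1):
        """Greedily add the single best attribute each round; stop only
        after `speculative_rounds` consecutive rounds bring no improvement.
        Illustrative sketch, not RapidMiner's operator."""
        selected, candidates = [], list(range(X.shape[1]))
        best_score, best_subset = -np.inf, []
        stale_rounds = 0
        while candidates:
            # Score every candidate attribute added to the current selection.
            scores = {j: cross_val_score(LinearRegression(),
                                         X[:, selected + [j]], y, cv=5).mean()
                      for j in candidates}
            j_best = max(scores, key=scores.get)
            selected.append(j_best)
            candidates.remove(j_best)
            if scores[j_best] > best_score:
                best_score, best_subset = scores[j_best], list(selected)
                stale_rounds = 0
            else:
                stale_rounds += 1                      # speculative round
                if stale_rounds >= speculative_rounds:
                    break                              # local optimum: give up
        return best_subset, best_score

    X, y = make_regression(n_samples=200, n_features=15, n_informative=5,
                           noise=10.0, random_state=0)
    subset, score = forward_selection(X, y, speculative_rounds=3)
    print(sorted(subset), round(score, 3))
    ```

    With speculative_rounds=1 the loop stops at the very first non-improving round, which is one way a run can end up with a much smaller subset than expected.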
  • AD2019
    AD2019 New Altair Community Member
    Thank you for the response. I did increase the number of speculative iterations to get around the issue of local optima, but this option is only available for forward selection. The backward elimination algorithm does not have this option, and my suspicion is that it is getting stuck in a local optimum, whereas forward selection (with speculative iterations set to 30) is getting around the local optimum problem.
  • varunm1
    varunm1 New Altair Community Member
    edited October 2019 Answer ✓
    The backward elimination algorithm does not have this option and my suspicion is that it is getting stuck in a local optimum
    @AD2019
    Can you check again? Backward elimination also has this option; a minimal sketch of the analogous elimination loop is below.
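
    For symmetry, here is the elimination direction under the same assumptions as the forward sketch above (again only an illustration of the idea, not RapidMiner's code): start from all attributes and greedily drop the one whose removal hurts the score least, with the same speculative-rounds stopping rule.

    ```python
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    def backward_elimination(X, y, speculative_rounds=1):
        """Greedily drop the attribute whose removal gives the best score;
        stop only after `speculative_rounds` rounds bring no improvement.
        Illustrative sketch, not RapidMiner's operator."""
        selected = list(range(X.shape[1]))
        best_score = cross_val_score(LinearRegression(), X, y, cv=5).mean()
        best_subset, stale_rounds = list(selected), 0
        while len(selected) > 1:
            scores = {j: cross_val_score(LinearRegression(),
                                         X[:, [k for k in selected if k != j]],
                                         y, cv=5).mean()
                      for j in selected}
            j_drop = max(scores, key=scores.get)   # its removal hurts least
            selected.remove(j_drop)
            if scores[j_drop] > best_score:
                best_score, best_subset = scores[j_drop], list(selected)
                stale_rounds = 0
            else:
                stale_rounds += 1
                if stale_rounds >= speculative_rounds:
                    break                          # local optimum reached
        return best_subset, best_score

    X, y = make_regression(n_samples=200, n_features=15, n_informative=5,
                           noise=10.0, random_state=0)
    subset, score = backward_elimination(X, y, speculative_rounds=3)
    print(sorted(subset), round(score, 3))
    ```

    Note that both loops are greedy: even with identical speculative settings they follow different paths through the space of attribute subsets, so there is no guarantee they converge on the same model.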

  • AD2019
    AD2019 New Altair Community Member
    My apologies, you are correct: backward elimination does have the speculative option. I ran forward and backward with speculative iterations set to 30 and still get very different models: 3 IVs in one direction, 8 in the other. I guess this is okay if the objective is prediction ("I don't care what the IVs are as long as prediction is good"), but it is disturbing if you are building a model to understand the contribution of the IVs. Sometimes the inner workings of RapidMiner are inscrutable. In another regression model, I had set alpha to 0.01 for feature selection using the t-test, yet RapidMiner produced IVs with a p-value of 0.05. I didn't understand that one. A sketch of the behavior I expected is below.
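
    As an illustration of the expected behavior on that last point, here is a minimal sketch of a t-test filter with an alpha threshold. The helper function, the binary label, and the synthetic data are assumptions for illustration; this is not RapidMiner's operator.

    ```python
    import numpy as np
    from scipy.stats import ttest_ind

    def t_test_filter(X, y, alpha=0.01):
        """Keep only attributes whose two-sample t-test p-value between
        the two label groups falls below alpha."""
        keep = []
        for j in range(X.shape[1]):
            _, p = ttest_ind(X[y == 0, j], X[y == 1, j])
            if p < alpha:
                keep.append(j)
        return keep

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=200)
    X = rng.normal(size=(200, 10))
    X[:, 0] += y                            # attribute 0 is genuinely discriminative
    print(t_test_filter(X, y, alpha=0.01))  # an attribute with p = 0.05 would be rejected
    ```

    Under this reading, an attribute with a p-value of 0.05 should never survive an alpha of 0.01, which is why the reported behavior is puzzling.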