Unexpected results from Automatic Feature Engineering

pblack476
pblack476 New Altair Community Member
edited November 2024 in Community Q&A
So I am trying to squeeze the most accurate regression possible out of my model, and for that I have narrowed the candidates down to GLM, GBT, and SVM as the best learners for my data. I am optimizing GLM first, as it trains the fastest.

I then generated a bunch of features with loops (manually) and selected the best broad group for GLM (we are still talking about 400+ features). This group was not optimal for SVM or GBT, but I wasn't optimizing those yet.

I then ran AFE on that set to get the best GLM performance possible. It was no surprise that I got 8 or 9 optimal features that gave me the same GLM performance I had with 400+. I was happy about that and applied that FeatureSet to my data so I could cut out the long AFE process.

However, this new dataset performs considerably better with most learners, including SVM and GBT, even though it was optimized for GLM.

I then proceeded to repeat the process for SVM, thinking that if I got such an improvement from a GLM-oriented FeatureSet, I would get an even better one from running AFE on SVM. But no: the SVM AFE returned a SIMPLER FeatureSet (even when I selected for accuracy) with decent performance, and it did not beat the GLM AFE FeatureSet.

I did not think that was possible under most circumstances, and yet it happened.
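
As an aside for readers unfamiliar with the wrapper idea behind AFE, here is a minimal sketch in Python. It assumes scikit-learn stand-ins (LinearRegression for GLM, SVR for SVM, GradientBoostingRegressor for GBT) and hand-made candidate feature subsets; it is not RapidMiner's actual AFE implementation, only an illustration of scoring candidate feature sets with a fast inner learner and then reusing the winning set with other models.

```python
# Minimal sketch of the wrapper idea behind AFE (not RapidMiner's actual
# implementation): score candidate feature subsets with a fast inner learner,
# keep the best subset, then reuse it with slower learners.
import numpy as np
from sklearn.linear_model import LinearRegression      # stand-in for GLM
from sklearn.svm import SVR                            # stand-in for SVM
from sklearn.ensemble import GradientBoostingRegressor # stand-in for GBT
from sklearn.model_selection import cross_val_score

def score_subset(X, y, columns, learner):
    """Cross-validated score of one candidate feature subset (higher is better)."""
    return cross_val_score(learner, X[:, columns], y, cv=5,
                           scoring="neg_mean_absolute_error").mean()

def wrapper_select(X, y, candidate_subsets, learner):
    """Pick the candidate subset that scores best for the given inner learner."""
    return max(candidate_subsets, key=lambda cols: score_subset(X, y, cols, learner))

# Hypothetical data and candidate subsets (in practice AFE generates these itself).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 20)), rng.normal(size=200)
candidates = [list(range(5)), list(range(5, 12)), list(range(20))]

best_for_glm = wrapper_select(X, y, candidates, LinearRegression())

# The subset selected for the fast learner can then be tried with the other models.
for model in (SVR(), GradientBoostingRegressor()):
    print(type(model).__name__, score_subset(X, y, best_for_glm, model))
```
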
Answers

  • varunm1
    varunm1 New Altair Community Member
    edited November 2019
    I did not think that was possible under most circumstances, and yet it happened.
    This performance is after getting the features from AFE and then applying them to SVM or other models with optimal parameter selection, right?

    Out of curiosity, is the difference in performance huge? I have seen a few instances in research where GLM performed comparably to SVMs, but not a huge difference where GLM totally outperformed SVM.
  • pblack476
    pblack476 New Altair Community Member
    @varunm1 To clarify, what happened was this: I trained a GLM and got back a FeatureSet from AFE (the one that was supposed to be best for GLM). I used that FeatureSet to predict with SVM and got an improvement over training SVM with its own AFE.

    So the GLM FeatureSet was not only best for GLM but also for SVM. The same applies to GBT and DT: both got consistently better with this FeatureSet, but I have not yet tested them against their own respective optimal FeatureSets.


    The difference in my case was very substantial. Trying to predict stock prices, I went from 2.03% relative error with the SVM AFE FeatureSet on SVM to 1.6% with the GLM set. At the same time, performance went from 2.5% to 2.1% on GLM. And this happens across multiple labels on the set as well. In my specific case, a 0.4% difference in error is very meaningful because this is supposed to be used for trading strategies later on.

    GBT and DT also improved with those sets by similar amounts. But SVM seems to reap the most rewards from this.
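
For reference, a minimal sketch of how a relative-error figure such as 1.6% can be computed, assuming it means the average absolute deviation of the prediction from the actual value, divided by the actual value; the prices and predictions below are made up purely for illustration.

```python
# Sketch of a relative-error figure like "1.6%", assuming it is the average
# absolute deviation of the prediction from the actual value, divided by the
# actual value.
import numpy as np

def relative_error(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean(np.abs(actual - predicted) / np.abs(actual))

# Hypothetical stock prices and predictions, for illustration only.
actual    = np.array([100.0, 102.5, 99.8, 101.2])
predicted = np.array([101.4, 100.9, 100.5, 102.0])
print(f"{relative_error(actual, predicted):.2%}")  # about 1.11% for these values
```
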

  • IngoRM
    IngoRM New Altair Community Member
    Answer ✓
    Hi,
    In general, the wrapper approach we are using with AFE is supposed to deliver a specific feature set for the inner learner the feature engineering is optimized for.  And while these sets are often somewhat similar across multiple models, they typically also differ at least somewhat based on the model type, so I understand your confusion.
    Here is the most likely reason why the feature set from the GLM also works better for the SVM than the one created for the SVM itself: the SVM is MUCH slower than the GLM learner, which means that in the same amount of time there will be many more feature sets tried in the GLM case than in the SVM case.
    The SVM therefore simply did not have the same time for finding the optimal set when the optimization was stopped.  In that sense, the SVM feature set was still somewhat suboptimal for the SVM. The GLM feature set, which was optimized for a different learner but had more time to be developed, happens to beat the one found for the SVM (so far).
    There could also just be smaller random effects causing this, but in my experience the reason above is typically why other feature sets - which are likely not optimal for the model either - can outperform the one optimized for the model, which simply has not been optimized enough (yet) and is therefore even more suboptimal.
    Hope this helps,
    Ingo
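
A small sketch of the time-budget effect described above, assuming random candidate feature subsets scored by cross-validation and scikit-learn stand-ins for the two learners: within the same wall-clock budget, the cheap model simply evaluates far more candidate sets, so its search gets further. Timings and models are illustrative only.

```python
# Sketch of why a fixed time budget favors the fast inner learner: count how
# many random candidate feature subsets each model can evaluate before the
# budget runs out.
import time
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X, y = rng.normal(size=(500, 30)), rng.normal(size=500)

def candidates_tried(learner, budget_seconds=5.0):
    """Count random feature subsets evaluated before the time budget runs out."""
    tried, deadline = 0, time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        cols = rng.choice(X.shape[1], size=10, replace=False)
        cross_val_score(learner, X[:, cols], y, cv=3,
                        scoring="neg_mean_absolute_error")
        tried += 1
    return tried

print("GLM-like:", candidates_tried(LinearRegression()))
print("SVM-like:", candidates_tried(SVR()))  # typically far fewer in the same time
```
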
  • varunm1
    varunm1 New Altair Community Member
    the SVM is MUCH slower than the GLM learner, which means that in the same amount of time there will be many more feature sets tried in the GLM case than in the SVM case.
    Woah, is this the same case when we don't select the "Use time limit" option in AFE? I thought it checks all of them.


  • IngoRM
    IngoRM New Altair Community Member
    Nope, BUT in this case the optimization may still stop earlier if there is no progress for some number of generations.  Given that a (non-linear) SVM is already more powerful than a linear regression, this is more likely to happen for the SVM, which effectively leads to the same outcome.
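
A minimal sketch of that stopping rule, assuming a simple generational search; the names (evaluate, next_generation, patience) are hypothetical placeholders, not RapidMiner operators or parameters.

```python
# Sketch of an early-stop criterion: halt the search when the best score has
# not improved for a fixed number of generations.
def evolve(initial_population, evaluate, next_generation, patience=5, max_generations=100):
    """Run a generational search, stopping after `patience` generations without improvement."""
    population = initial_population
    best_score, stale = float("-inf"), 0
    for _ in range(max_generations):
        scores = [evaluate(candidate) for candidate in population]
        generation_best = max(scores)
        if generation_best > best_score:
            best_score, stale = generation_best, 0
        else:
            stale += 1
            if stale >= patience:   # no progress for `patience` generations
                break
        population = next_generation(population, scores)
    return best_score
```
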
  • pblack476
    pblack476 New Altair Community Member
    edited November 2019
    @IngoRM Indeed! I ran the SVM AFE without a time limit and got the same score (1.6%) as with the GLM set.

    One thing I have observed, however, is that even with the time limit turned off, some pre-selection of the subset on which you run AFE makes a difference.

    I had a "pruned" featureset that I used as a base for GLM before AFE. That set gave me my base score. However, when I used the full set, one that CONTAINED the entire "pruned" set within it + some other attributes, the AFE results were worse (2.1% vs. 2.3% Rel. Error). Even without a time limit it seems that the addition of noise can impact the results.
  • IngoRM
    IngoRM New Altair Community Member
    Yes, it can.  But for small differences, the fact that the optimization algorithm uses randomized heuristics, which are likely (but not guaranteed) to find an optimal solution, may also contribute to this.  This is what I meant above with "There could also be just smaller random effects..." in my earlier answer.
    Cheers,
    Ingo