Running out of features during feature selection
I keep running into the same error while using the FEATURE SELECTION operator with a GLM learner inside. It starts with 56 features and very quickly runs out of features every time I try to run the process.
These are my GLM settings:
These are my feature selection settings:
Please advise. I can also provide any additional information if needed.
Thanks!
Best Answer
Are some of your features constant or highly correlated? Unfortunately, the GLM learner removes those. The error message coming from H2O is a bit weird, because the learner actually WAS presented with features (the FS makes sure that there is always at least one input column) BEFORE it removed them itself :-) If collinear features are the problem, you can uncheck the setting for removing collinear columns in the GLM parameters. If you have constants in your data, it is best to remove them before you start the feature selection to avoid the problem.
Hope that helps,
Ingo
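A quick way to check for both causes before running the process is to scan the data yourself. The following is a minimal Python sketch, not RapidMiner or H2O code. The data layout (a dict mapping column names to lists of numbers), the function names and the 0.99 correlation threshold are all assumptions chosen for illustration, and H2O's own rules for what counts as collinear may differ.

```python
from itertools import combinations
from math import sqrt


def constant_columns(data):
    """Return names of columns whose values are all identical."""
    return [name for name, values in data.items() if len(set(values)) <= 1]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def collinear_pairs(data, threshold=0.99):
    """Return column pairs whose absolute correlation is >= threshold.

    Constant columns are skipped because their correlation is undefined.
    The threshold is a hypothetical choice, not an H2O setting.
    """
    constants = set(constant_columns(data))
    names = [n for n in data if n not in constants]
    return [(a, b) for a, b in combinations(names, 2)
            if abs(pearson(data[a], data[b])) >= threshold]


# Hypothetical example data
data = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # exactly 2 * a, so perfectly collinear
    "c": [5.0, 5.0, 5.0, 5.0],  # constant
    "d": [1.0, 0.0, 1.0, 0.0],
}
print(constant_columns(data))  # ['c']
print(collinear_pairs(data))   # [('a', 'b')]
```

Columns reported by the first check are candidates to remove before the feature selection; pairs reported by the second point to the collinearity setting as the likely culprit.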
Answers
Hi @IngoRM
I am getting back to this thread as I have faced the problem again.
Previously I disabled the removal of collinear columns in the nested GLM, and that helped: the process worked OK.
This time I ran into it again and found out that there was actually one constant column in my data after filtering a smaller subset for the feature selection.
Hence my question: can't the Feature Selection operator just ignore such columns? This can happen eventually, as in my case, and the error message itself is quite confusing.
Thanks!
Yeah, the error message is indeed bad. Unfortunately there is nothing we can do about it, because we do not "own" that particular part of the code... :-( I am personally a bit torn on the constant handling here, though. If we just keep the constant column in, we avoid the error in this particular case, but it bugs me that a feature selection, which is supposed to get rid of weak features, would be forced to keep a constant column. That kind of defeats the purpose... also because this is really undocumented, special behavior of the H2O learner that we would need to work around...
So I would actually prefer to keep it the way it is, but that requires you to use a Remove Useless Attributes operator beforehand. The last option would be to remove all constant features automatically BEFORE we start the feature selection (and throw an error if that removes all columns), but that makes the behavior a bit implicit, which is not great either...
Any opinions on this?
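The "last option" above (removing constant features automatically before the selection starts, and failing loudly if nothing is left) could look roughly like the following Python sketch. It is not how RapidMiner implements anything; `drop_constant_features` and the example data are hypothetical. The example also reproduces the situation from the previous post: columns that vary in the full data (`group`, `x2`) become constant once the data is filtered to a smaller subset.

```python
def drop_constant_features(data):
    """Drop constant columns before feature selection (hypothetical helper).

    Returns (kept, removed). Raises ValueError if every column is constant,
    following the "throw an error if that removes all columns" idea.
    """
    kept = {name: values for name, values in data.items() if len(set(values)) > 1}
    if not kept:
        raise ValueError("all input features are constant; nothing left to select from")
    removed = sorted(set(data) - set(kept))
    return kept, removed


# Hypothetical example data: every column varies in the full table.
full = {
    "group": ["north", "north", "south", "south"],
    "x1": [0.5, 1.5, 2.5, 3.5],
    "x2": [1, 1, 1, 2],
}

# Filter to the rows where group == "north", i.e. a smaller subset.
rows = [i for i, g in enumerate(full["group"]) if g == "north"]
subset = {name: [values[i] for i in rows] for name, values in full.items()}

kept, removed = drop_constant_features(subset)
print(removed)  # ['group', 'x2']
print(kept)     # {'x1': [0.5, 1.5]}

try:
    drop_constant_features({"c": [7, 7, 7]})
except ValueError as err:
    print(err)  # all input features are constant; nothing left to select from
```

Running such a filter as an explicit, separate step keeps the removal visible in the process, which addresses the concern about the behavior being implicit.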
Sorry @kypexin for posting here. @IngoRM, do you think the post below is caused by the same issue? I asked the user to set a breakpoint and check, and it shows that there is a feature going into the model, so I am not sure why it throws the same H2O error. I tried different datasets but did not encounter this error. I am just curious why it returns an error when there are features going into the GBT.
https://community.rapidminer.com/discussion/55390/forward-selection-error-thrown#latest
Yes, good catch! This is indeed extremely likely to be the same issue. This error message is only shown if the H2O model has removed all features itself (which is super annoying; I wish we could just turn this behavior off...). Typically this happens because of collinear features (that check can actually be turned off, but it cannot be the reason in the other thread, since there is only one input feature anyway...). The other reason is a constant input, which H2O simply removes as well. This is what I think is going on here: all values in the window are constant, H2O removes that feature, and then it complains that there are no features left (sigh)... I will bring this up with our engineers to see if they can talk to the H2O folks to make this work. But to be honest, I would not hold my breath...
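Assuming the input in the other thread is a single series cut into fixed-size windows (an assumption; the thread does not show the setup), a minimal Python sketch like this one lists the windows in which every value is identical. Those are the training sets where the only input feature would be constant and would be dropped by H2O. The function name and the `size`/`step` parameters are hypothetical.

```python
def constant_windows(series, size, step=1):
    """Return start indices of windows whose values are all identical.

    Hypothetical helper: a model trained on such a window would see a
    constant input, which H2O drops; if that was the only feature,
    no features are left.
    """
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    starts = range(0, len(series) - size + 1, step)
    return [s for s in starts if len(set(series[s:s + size])) == 1]


series = [3, 3, 3, 4, 5, 5, 5, 5]
print(constant_windows(series, 3))          # [0, 4, 5]
print(constant_windows(series, 3, step=2))  # [0, 4]
```

If this returns any indices for the real data, that would support the constant-window explanation for the error.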