Disappearing attributes

Noel
New Altair Community Member
Hi All-
I have a process (attached) where, prior to the data stream getting piped into an Optimize Selection (Evolutionary) operator, there are 23 attributes including an index called "anch_dt" (this is a time series analysis). The first breakpoint I set in the process is at this point. Once the data gets piped into the Optimize Selection (Evolutionary) operator, however, only 14 attributes are present and the index "anch_dt" is one that is missing. I set a second breakpoint at this point in the process.
Where did the rest of the data go?
Any help would be greatly appreciated. I've been banging my head against this obstacle and I suspect mental blindness is preventing me from seeing the obvious error.
Best, Noel
Answers
-
Hi Noel, I am assuming that you are trying to predict "anch_dt". If not, and you feel it is essential to your process, then you need to exclude it from the Optimize Selection. From your description, Optimize Selection is working the way it should. Let me look at your process. This may be as simple as setting a label.
-
Hi,
the thing is that the goal of the "Optimize Selection (Evolutionary)" operator is to remove attributes. So when you run the process, it directly starts to evaluate the performance of your model on a subset of the attributes and compares it to other subsets (that's the evolutionary part). The problem is that your windowing relies on the complete attribute set, which might then break if either the index or the label attribute is missing. From what I have seen of your process and tried out with some sample data, it should be sufficient to assign special roles to the anch_dt and ccc_bonds_stw_differentiated attributes (ID and label, respectively).
I hope that helps,
David
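To make the role suggestion concrete, here is a minimal Python sketch of the idea. It is a toy illustration, not RapidMiner's implementation: the functions `candidate_subset` and `window_step`, the placeholder attribute names `x0`..`x20`, and the 0.5 keep probability are all assumptions made up for this example. It only shows that an attribute with a special role survives every candidate subset, while a regular attribute can be dropped and then break a later step that needs it.

```python
import random

def candidate_subset(attributes, roles, rng):
    """Build one random candidate subset (toy model of a selection step).

    Attributes with a special role are always kept; regular attributes
    are kept with an assumed probability of 0.5.
    """
    kept = []
    for name in attributes:
        if roles.get(name) in ("id", "label"):
            kept.append(name)      # protected by its special role
        elif rng.random() < 0.5:
            kept.append(name)      # regular attribute survived this draw
    return kept

def window_step(columns, index_attribute, horizon_attribute):
    """Stand-in for a windowing step: it only checks that its columns exist."""
    for required in (index_attribute, horizon_attribute):
        if required not in columns:
            raise KeyError(f"Attribute not found: {required}")
    # Return the columns that would be windowed (everything but the index).
    return [c for c in columns if c != index_attribute]

# 23 attributes in total; the x* names are hypothetical placeholders.
attributes = ["anch_dt", "ccc_bonds_stw"] + [f"x{i}" for i in range(21)]
rng = random.Random(0)

# Without roles, anch_dt is a regular attribute and can be dropped.
no_roles = [candidate_subset(attributes, {}, rng) for _ in range(200)]
dropped_index = sum("anch_dt" not in subset for subset in no_roles)
print("candidates missing anch_dt without roles:", dropped_index)

# With roles, both attributes appear in every candidate.
roles = {"anch_dt": "id", "ccc_bonds_stw": "label"}
with_roles = [candidate_subset(attributes, roles, rng) for _ in range(200)]
print("candidates missing anch_dt with roles:",
      sum("anch_dt" not in subset for subset in with_roles))
```

In this sketch, calling `window_step` on a candidate that lacks `anch_dt` raises the same kind of "Attribute not found" error, which is why the roles (or windowing before the selection) matter.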
-
@David_A quick question: is it normal that attributes were filtered out before the breakpoint? I see that the breakpoint is set to "before". Does this breakpoint belong to the internal process in Optimize Selection, rather than coming before feature selection?
Thanks
-
Hi Noel, I can't find your horizon attribute ccc_bonds_stw_differentiated. If you set your horizon attribute to ccc_bonds_stw, then your process runs with your missing attributes. Set the windowing operator correctly and that should help.
-
Hi Alex- As always, thank you for your help. I made the horizon attribute change you suggested, thanks. Unfortunately, I'm still getting the "Attribute not found" error (because anch_dt is missing). Error screen cap below.
As you said, I'm trying to predict "ccc_bonds_stw" and since the data are all time series, I'm windowing and using "anch_dt" as the index. Apologies if I'm missing something or misunderstood, but I'm still stuck.
Off topic, I took your suggestion and bought Marcos Lopez de Prado's book. I'm still at the beginning, but it looks like it'll be helpful.
-Noel
-
Hi Noel, I have your process working but I am out of the office for a couple of hours. I will upload it when I am back in.
-
Super, thanks. Beautiful day here in CT. Hope you're getting a taste of it wherever you are.
-
-
Fantastic! Thanks, Alex, I'm good to go.
If you wouldn't mind one more question, though... I definitely should've tried moving the windowing operator outside the optimize selection, but why didn't the original process work? Based on the settings, was 14 the max number of attributes for the Optimize Selection operator? Also, do indices not find their way into the operator or was it just a coincidence in this case?
Have a great night and thanks again.
-Noel
-
The algorithm is filtering out that attribute. The feature selection algorithm treats your index as a regular attribute because it is not set to any special role. 14 is not a maximum number; based on your data, 14 features are useful. The number of features selected by Optimize Selection depends on their relevance to the prediction: if they are not useful, they are removed.
Hope this clarifies.
-
@varunm1- Thanks. Not to beat a dead horse (that's a terrible expression), but the reduction in attributes seemed to happen immediately, upstream of the cross validation. So to me, a novice, it appears that the Optimize Selection operator hasn't even started its work. If this is the case (and it very well might not be), what caused the attribute reduction?
-
-
A quick way to see what is happening is to switch to the regular Optimize Selection operator. Setting the direction to Forward will select one attribute; Backward, on the other hand, will select all the attributes. You can see this on the entry port by right-clicking and choosing "show example set". This does give the impression that the reduction is happening ahead of the operator, but what you see is the operator's own starting selection. It is just something to be aware of.
On the Optimize Selection (Evolutionary) operator you can force the starting selection by setting the exact number of attributes.
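As a rough illustration of why the attribute count at the entry port differs by mode, here is a small Python sketch. It is not RapidMiner code: `first_candidate`, the placeholder attribute names, and the 0.5 switch-on probability for the evolutionary case are assumptions chosen for the example. It only mimics the behaviour described above: forward starts with one attribute, backward starts with all of them, and an evolutionary start is a random subset unless an exact number is forced.

```python
import random

def first_candidate(attributes, mode, exact_number=None, p_on=0.5, seed=0):
    """Return the attribute subset a selection scheme might evaluate first."""
    if mode == "forward":
        return attributes[:1]              # start small and grow
    if mode == "backward":
        return list(attributes)            # start with everything and shrink
    if mode == "evolutionary":
        rng = random.Random(seed)
        if exact_number is not None:
            # Forced size: pick exactly this many attributes at random.
            chosen = set(rng.sample(attributes, exact_number))
        else:
            # Assumed behaviour: each attribute is switched on with p_on.
            chosen = {a for a in attributes if rng.random() < p_on}
        return [a for a in attributes if a in chosen]
    raise ValueError(f"unknown mode: {mode}")

# 23 attributes, as in the thread; the x* names are hypothetical.
attributes = ["anch_dt"] + [f"x{i}" for i in range(22)]

for mode in ("forward", "backward", "evolutionary"):
    print(mode, len(first_candidate(attributes, mode)))

# Forcing the exact number to the full width keeps every attribute.
print("exact", len(first_candidate(attributes, "evolutionary", exact_number=23)))
```

Under these assumptions, a random start on 23 attributes would typically show only part of them at a breakpoint set before the inner process, which matches the impression that attributes vanished before the operator did any work.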