Disappearing attributes
Noel
Hi All-
I have a process (attached) where, prior to the data stream getting piped into an Optimize Selection (Evolutionary) operator, there are 23 attributes including an index called "anch_dt" (this is a time series analysis). The first breakpoint I set in the process is at this point. Once the data gets piped into the Optimize Selection (Evolutionary) operator, however, only 14 attributes are present and the index "anch_dt" is one that is missing. I set a second breakpoint at this point in the process.
Where did the rest of the data go?
Any help would be greatly appreciated. I've been banging my head against this obstacle and I suspect mental blindness is preventing me from seeing the obvious error.
Best, Noel
hughesfleming68
Hi Noel, I am assuming that you are trying to predict "anch_dt". If not, and you feel it is essential to your process, then you need to exclude it from the Optimize Selection. From your description, Optimize Selection is working the way it should. Let me look at your process. This may be as simple as setting a label.
David_A
Hi,
The thing is that the goal of the "Optimize Selection (Evolutionary)" operator is to remove attributes. So when you're running the process, it directly starts to evaluate the performance of your model on that subset and compares it to others (that's the evolutionary part).
The problem is that your windowing relies on the complete subset, which then might break if either the index or the label attribute is missing. From what I have seen of your process and tried out with some sample data, it should be sufficient to assign special roles (label and ID) to the anch_dt and ccc_bonds_stw_differentiated attributes.
I hope that helps,
David
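To make the windowing point above concrete, here is a minimal Python sketch. It is not RapidMiner's Windowing operator; `window_examples` is a hypothetical helper and the data is made up. It only shows that a windowing step needs both the index and the horizon attribute, so it fails once an upstream selection step has dropped one of them.

```python
def window_examples(rows, index_attr, horizon_attr, window_size):
    """Build (index, past_values, target) triples from a list of dict rows.

    Hypothetical helper for illustration only: both the index and the
    horizon attribute must still be present in the incoming rows.
    """
    if not rows:
        return []
    available = set(rows[0])
    for required in (index_attr, horizon_attr):
        if required not in available:
            raise KeyError(f"Attribute not found: {required}")
    # Order by the index, then slide a fixed-size window over the series.
    ordered = sorted(rows, key=lambda r: r[index_attr])
    windows = []
    for end in range(window_size, len(ordered)):
        past = [r[horizon_attr] for r in ordered[end - window_size:end]]
        target = ordered[end]
        windows.append((target[index_attr], past, target[horizon_attr]))
    return windows


# Made-up series: the index runs 1..5 and the value equals the index.
full = [{"anch_dt": d, "ccc_bonds_stw": float(d)} for d in range(1, 6)]
print(window_examples(full, "anch_dt", "ccc_bonds_stw", window_size=2))

# Simulate a selection step that removed the index column.
reduced = [{"ccc_bonds_stw": r["ccc_bonds_stw"]} for r in full]
try:
    window_examples(reduced, "anch_dt", "ccc_bonds_stw", window_size=2)
except KeyError as err:
    print("windowing failed:", err)
```

The second call fails for the same reason the process in this thread does: the windowing step asks for an attribute that is no longer in the example set.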
varunm1
@David_A quick question: is it normal that attributes were filtered out before the breakpoint? I see that the breakpoint is set to "before". Is this breakpoint related to the internal process in Optimize Selection rather than to a point before feature selection?
Thanks
hughesfleming68
Hi Noel, I can't find your horizon attribute ccc_bonds_stw_differentiated. If you set your horizon attribute to ccc_bonds_stw then your process runs with your missing attributes. Set the windowing operator correctly and that should help.
Noel
Hi Alex- As always, thank you for your help. I made the horizon attribute change you suggested, thanks. Unfortunately, I'm still getting the "Attribute not found" error (because anch_dt is missing). Error screen cap below.
As you said, I'm trying to predict "ccc_bonds_stw" and since the data are all time series, I'm windowing and using "anch_dt" as the index. Apologies if I'm missing something or misunderstood, but I'm still stuck.
Off topic, I took your suggestion and bought Marcos Lopez de Prado's book. I'm still at the beginning, but it looks like it'll be helpful.
-Noel
hughesfleming68
Hi Noel, I have your process working but I am out of the office for a couple of hours. I will upload it when I am back in.
Noel
Super, thanks. Beautiful day here in CT. Hope you're getting a taste of it wherever you are.
hughesfleming68
Hi Noel, here you go. Let me know if this is what you were expecting.
Noel_Process.rmp
Noel
Fantastic! Thanks, Alex, I'm good to go.
If you wouldn't mind one more question, though... I definitely should've tried moving the windowing operator outside the optimize selection, but why didn't the original process work? Based on the settings, was 14 the max number of attributes for the Optimize Selection operator? Also, do indices not find their way into the operator or was it just a coincidence in this case?
Have a great night and thanks again.
-Noel
varunm1
The algorithm is filtering out that attribute. The feature selection algorithm treats your index as a regular attribute because it is not set to any special role. 14 is not a maximum number; based on your data, 14 features are useful. The number of features selected by Optimize Selection depends on their relevance to the prediction. If they are not useful, they will be removed.
Hope this clarifies
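The role behaviour described above can be modelled in a few lines of Python. This is an illustrative sketch, not RapidMiner code: `apply_selection` and the `feat_*` names are hypothetical, and the masks are picked by hand rather than found by an optimizer. The assumption is that attributes with a special role are not candidates for removal, while everything else is.

```python
def apply_selection(attributes, roles, mask):
    """Keep special-role attributes; filter regular ones with a boolean mask.

    `roles` maps attribute name -> role ("label", "id", ...). Attributes
    without an entry count as regular and are candidates for removal.
    """
    regular = [a for a in attributes if a not in roles]
    if len(mask) != len(regular):
        raise ValueError("mask must have one entry per regular attribute")
    kept_regular = {a for a, keep in zip(regular, mask) if keep}
    return [a for a in attributes if a in roles or a in kept_regular]


attributes = ["anch_dt", "ccc_bonds_stw", "feat_a", "feat_b", "feat_c"]

# No roles set: the index is an ordinary candidate and can be dropped.
no_roles = apply_selection(attributes, {}, [False, True, True, False, True])
print(no_roles)

# Roles set: only the three feat_* columns are candidates.
roles = {"anch_dt": "id", "ccc_bonds_stw": "label"}
with_roles = apply_selection(attributes, roles, [True, False, True])
print(with_roles)
```

With roles assigned, `anch_dt` survives every candidate subset, so a downstream windowing step can still find it.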
Noel
@varunm1-
Thanks. Not to beat a dead horse (that’s a terrible expression), the reduction in attributes seemed to happen immediately — upstream of the cross validation. So to me, a novice, it appears that the optimize selection operator hasn’t even started its work. If this is the case (and it very well might not be), what caused the attribute reduction?
Noel
@varunm1-
I appreciate your help.
varunm1
@Noel
I tested your process, and in my previous comment I asked @David_A for his input on how the breakpoint works for this operator. From my preliminary understanding, once the data passes your Filter Examples operator, it is reduced by Optimize Selection.
hughesfleming68
A quick way to see what is happening is to switch to the regular Optimize Selection operator. Setting the direction to Forward will select one attribute; Backward, on the other hand, will select all the attributes. You can see this at the entry port by right-clicking and choosing Show Example Set. This gives the impression that the reduction happens ahead of the operator. It is just something to be aware of.
On the Optimize Selection (Evolutionary) operator, you can force the starting selection by setting the exact number of attributes.
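A rough Python sketch of why the first example set seen inside the operator is already reduced. This models the behaviour described above and is not RapidMiner's implementation: `initial_mask`, `p_initialize` and `exact_number` are names chosen for this sketch, and the random initialization for the evolutionary case is an assumption.

```python
import random


def initial_mask(n_attributes, strategy, p_initialize=0.5,
                 exact_number=None, seed=None):
    """Return the first candidate subset as a list of booleans."""
    rng = random.Random(seed)
    if strategy == "forward":
        # Forward selection: the first evaluated subset has one attribute.
        mask = [False] * n_attributes
        mask[0] = True
        return mask
    if strategy == "backward":
        # Backward elimination: start from all attributes.
        return [True] * n_attributes
    if strategy == "evolutionary":
        if exact_number is not None:
            # Forced starting size: pick exactly that many attributes.
            chosen = set(rng.sample(range(n_attributes), exact_number))
            return [i in chosen for i in range(n_attributes)]
        # Assumed behaviour: each attribute starts switched on with
        # probability p_initialize, and at least one attribute is kept.
        mask = [rng.random() < p_initialize for _ in range(n_attributes)]
        if not any(mask):
            mask[rng.randrange(n_attributes)] = True
        return mask
    raise ValueError(f"unknown strategy: {strategy}")


for strategy in ("forward", "backward", "evolutionary"):
    mask = initial_mask(23, strategy, seed=1)
    print(strategy, sum(mask), "of", len(mask), "attributes")

# Forcing the starting size keeps all 23 attributes in the first subset.
print(sum(initial_mask(23, "evolutionary", exact_number=23)))
```

Under these assumptions, a breakpoint at the inner entry port shows the first candidate subset rather than the full input, which would explain why the reduction appears to happen before the operator has done any work.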