Disappearing attributes

Hi All-

I have a process (attached) where, prior to the data stream getting piped into an Optimize Selection (Evolutionary) operator, there are 23 attributes including an index called "anch_dt" (this is a time series analysis). The first breakpoint I set in the process is at this point. Once the data gets piped into the Optimize Selection (Evolutionary) operator, however, only 14 attributes are present and the index "anch_dt" is one that is missing. I set a second breakpoint at this point in the process.

Where did the rest of the data go?

Any help would be greatly appreciated. I've been banging my head against this obstacle and I suspect mental blindness is preventing me from seeing the obvious error.

Best, Noel

Accepted answers

hughesfleming68

Hi Noel, here you go. Let me know if this is what you were expecting.

Noel_Process.rmp

hughesfleming68

A quick way to see what is happening is to switch to the regular Optimize Selection operator. Setting the direction to Forward will select one attribute; Backward, on the other hand, will select all of the attributes. You can see this on the entry port by right-clicking and selecting "show example set". This does give the impression that the reduction is happening ahead of the operator. It is just something to be aware of.

On the Optimize Selection (Evolutionary) operator, you can force the starting selection by setting the exact number of attributes.
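
The difference between the two directions can be sketched in a few lines of Python. This is only an illustration of the idea described above, not RapidMiner's implementation: the helper `starting_selection` and the attribute names `attr_3` and `attr_4` are made up for the example.

```python
def starting_selection(attributes, direction):
    """Return the attribute subset a greedy selection would evaluate first.

    Hypothetical helper for illustration only; it is not a RapidMiner API.
    """
    if direction == "forward":
        # Forward selection builds up from nothing, so the first subset
        # handed to the inner process contains a single attribute.
        return attributes[:1]
    if direction == "backward":
        # Backward elimination starts from the full set and removes from there.
        return list(attributes)
    raise ValueError(f"unknown direction: {direction!r}")


attributes = ["anch_dt", "ccc_bonds_stw", "attr_3", "attr_4"]
forward_start = starting_selection(attributes, "forward")
backward_start = starting_selection(attributes, "backward")
print(len(forward_start), len(backward_start))
```

Either way, the example set seen at the inner entry port is the candidate subset, not the full data that arrived at the operator.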

All comments

hughesfleming68

Hi Noel, I am assuming that you are trying to predict "anch_dt". If not, and you feel it is essential to your process, then you need to exclude it from the Optimize Selection. From your description, Optimize Selection is working the way it should. Let me look at your process. This may be as simple as setting a label.

David_A

Hi,

The thing is that the goal of the "Optimize Selection (Evolutionary)" operator is to remove attributes. So when you're running the process, it directly starts to evaluate the performance of your model on a subset of the attributes and compares it to others (that's the evolutionary part).

The problem is that your windowing relies on the complete attribute set, which then might break if either the index or the label attribute is missing. From what I have seen of your process and tried out with some sample data, it should be sufficient to assign special roles to the anch_dt (ID) and ccc_bonds_stw_differentiated (label) attributes.
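
As a rough sketch of why special roles help, assuming the selection only ever switches regular attributes on or off: the Python below is a toy model, not RapidMiner code, and the helper `apply_selection` and the attribute names `attr_3` and `attr_4` are hypothetical.

```python
def apply_selection(roles, kept_regular):
    """Return the attributes that survive a selection step.

    Assumption for this sketch: attributes with a special role (id, label)
    are never candidates for removal; regular ones survive only if selected.
    """
    return [
        name
        for name, role in roles.items()
        if role != "regular" or name in kept_regular
    ]


roles = {
    "anch_dt": "regular",
    "ccc_bonds_stw_differentiated": "regular",
    "attr_3": "regular",
    "attr_4": "regular",
}

# Without special roles, the index and the target can be dropped like any
# other attribute, which is what breaks the windowing.
before = apply_selection(roles, kept_regular={"attr_3"})

# With roles assigned, both always reach the windowing step.
roles["anch_dt"] = "id"
roles["ccc_bonds_stw_differentiated"] = "label"
after = apply_selection(roles, kept_regular={"attr_3"})

print(before)
print(after)
```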

I hope that helps,

David

varunm1

@David_A, quick question: is it normal that attributes were filtered out before the breakpoint? I see that the breakpoint is set to "before". Is this breakpoint related to the internal process in Optimize Selection rather than to the point before feature selection?

Thanks

hughesfleming68

Hi Noel, I can't find your horizon attribute ccc_bonds_stw_differentiated. If you set your horizon attribute to ccc_bonds_stw then your process runs with your missing attributes. Set the windowing operator correctly and that should help.

Noel

Hi Alex- As always, thank you for your help. I made the horizon attribute change you suggested, thanks. Unfortunately, I'm still getting the "Attribute not found" error (because anch_dt is missing). Error screen cap below.

As you said, I'm trying to predict "ccc_bonds_stw" and since the data are all time series, I'm windowing and using "anch_dt" as the index. Apologies if I'm missing something or misunderstood, but I'm still stuck.

Off topic, I took your suggestion and bought Marcos Lopez de Prado's book. I'm still at the beginning, but it looks like it'll be helpful.

-Noel
--

Image: https://us.v-cdn.net/6030995/uploads/editor/7y/l1lkb834t1m1.jpg

hughesfleming68

Hi Noel, I have your process working but I am out of the office for a couple of hours. I will upload it when I am back in.

Noel

Super, thanks. Beautiful day here in CT. Hope you're getting a taste of it wherever you are.

Noel

Fantastic! Thanks, Alex, I'm good to go.

If you wouldn't mind one more question, though... I definitely should've tried moving the windowing operator outside the optimize selection, but why didn't the original process work? Based on the settings, was 14 the max number of attributes for the Optimize Selection operator? Also, do indices not find their way into the operator or was it just a coincidence in this case?

Have a great night and thanks again.
-Noel

varunm1

The algorithm is filtering out that attribute. The feature selection algorithm treats your index as a regular attribute because it is not set to any special role. 14 is not a maximum; based on your data, 14 features are useful. The number of features selected by Optimize Selection depends on their relevance to the prediction; if they are not useful, they are removed.

Hope this clarifies

Noel

@varunm1- Thanks. Not to beat a dead horse (that’s a terrible expression), the reduction in attributes seemed to happen immediately — upstream of the cross validation. So to me, a novice, it appears that the optimize selection operator hasn’t even started its work. If this is the case (and it very well might not be), what caused the attribute reduction?

Noel

@varunm1- I appreciate your help.

varunm1

@Noel I tested your process, and in my previous comment I asked @David_A for his input on how the breakpoint works for this operator. From my preliminary understanding, once the data passes your Filter Examples operator, it is already being reduced by Optimize Selection.
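
One possible reading of why the reduced set shows up immediately: an evolutionary search needs a first candidate subset before anything can be evaluated, and that candidate is typically drawn at random. The Python below is a toy sketch under that assumption, not the operator's actual code. The initialization probability `p = 0.5` is an assumed value (check the operator's own parameters), and `random_individual` and the `attr_*` names are hypothetical.

```python
import random


def random_individual(attributes, p=0.5, rng=None):
    """Draw one random candidate subset, keeping each attribute with probability p."""
    rng = rng or random.Random()
    chosen = [name for name in attributes if rng.random() < p]
    # Fall back to a single random attribute so the subset is never empty.
    return chosen or [rng.choice(attributes)]


# 23 attributes, as in the original process: the index plus 22 placeholders.
attributes = ["anch_dt"] + [f"attr_{i}" for i in range(1, 23)]

rng = random.Random(0)
individuals = [random_individual(attributes, rng=rng) for _ in range(1000)]

sizes = [len(ind) for ind in individuals]
average_size = sum(sizes) / len(sizes)
index_missing = sum("anch_dt" not in ind for ind in individuals)

print(f"average subset size: {average_size:.1f} of {len(attributes)}")
print(f"candidates without anch_dt: {index_missing} of {len(individuals)}")
```

Under these assumptions a first candidate holds roughly half of the 23 attributes, and an index without a special role is absent from about half of the candidates, before any performance has been measured.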

hughesfleming68