"Getting the names of the attributes selected in Loop Attribute Subset"
wasperen
New Altair Community Member
I am using Loop Attribute Subsets, and it nicely generates a collection of all combinations of the relevant attributes.
But I would like to create an example set that says: combining A+B gives result X, combining A+C gives result Y, etc. Is there a way, inside the loop, to find out which attributes are currently being used?
Something like %{attributes} that gives me A;B. I could then add that as an attribute to my result set...
Answers
Hi,
this is of course possible: you can use the "Log" operator inside the loop to access the current iteration's used attributes, the attribute count, and a performance value (if available). I have uploaded a sample process to myExperiment.org:
http://www.myexperiment.org/workflows/2211.html
You can easily download the process with our Community Extension from myExperiment (search in the forum for more information about the extension).
The result is a table containing the attribute names and the attribute count. I also calculated a performance with an inner cross-validation and stored it in the table as well. Below is the result for "Golf":
used_attributes                       used_number  performance
Outlook, Temperature                  2.0          0.7
Outlook, Temperature, Wind            3.0          0.7
Outlook                               1.0          0.65
Temperature                           1.0          0.65
Outlook, Humidity                     2.0          0.65
Humidity, Wind                        2.0          0.65
Temperature, Humidity, Wind           3.0          0.65
Wind                                  1.0          0.6
Outlook, Wind                         2.0          0.6
Temperature, Humidity                 2.0          0.6
Temperature, Wind                     2.0          0.55
Outlook, Temperature, Humidity        3.0          0.55
Outlook, Temperature, Humidity, Wind  4.0          0.55
Humidity                              1.0          0.45
Outlook, Humidity, Wind               3.0          0.35
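To read such a log, one option is to rank the rows by performance (highest first) and, among ties, by attribute count (fewest first); that ordering is consistent with the Golf table above. The rank_subsets helper below is a hypothetical Python sketch, not part of RapidMiner, and the four sample rows are copied from that table.

```python
from typing import List, Tuple

# One logged row: (used_attributes, used_number, performance)
Row = Tuple[str, float, float]


def rank_subsets(rows: List[Row]) -> List[Row]:
    """Sort rows: highest performance first, then fewest attributes first."""
    # sorted() is stable, so rows that tie on both keys keep their log order.
    return sorted(rows, key=lambda r: (-r[2], r[1]))


if __name__ == "__main__":
    log = [
        ("Outlook, Temperature, Wind", 3.0, 0.7),
        ("Humidity", 1.0, 0.45),
        ("Outlook, Temperature", 2.0, 0.7),
        ("Outlook", 1.0, 0.65),
    ]
    for row in rank_subsets(log):
        print(row)
```

With these rows, the smallest best subset ("Outlook, Temperature") comes out first.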
Hope that helps,
Ingo
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.008" expanded="true" name="Process">
<process expanded="true" height="674" width="919">
<operator activated="true" class="retrieve" compatibility="5.1.008" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
<parameter key="repository_entry" value="//Samples/data/Golf"/>
</operator>
<operator activated="true" class="loop_attribute_subsets" compatibility="5.1.008" expanded="true" height="60" name="Loop Subsets" width="90" x="179" y="30">
<process expanded="true" height="674" width="919">
<operator activated="true" class="x_validation" compatibility="5.1.008" expanded="true" height="112" name="Validation" width="90" x="45" y="30">
<process expanded="true" height="674" width="434">
<operator activated="true" class="decision_tree" compatibility="5.1.008" expanded="true" height="76" name="Decision Tree" width="90" x="45" y="30"/>
<connect from_port="training" to_op="Decision Tree" to_port="training set"/>
<connect from_op="Decision Tree" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true" height="674" width="434">
<operator activated="true" class="apply_model" compatibility="5.1.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="5.1.008" expanded="true" height="76" name="Performance" width="90" x="179" y="30"/>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="log" compatibility="5.1.008" expanded="true" height="76" name="Log" width="90" x="179" y="30">
<list key="log">
<parameter key="used_attributes" value="operator.Loop Subsets.value.feature_names"/>
<parameter key="used_number" value="operator.Loop Subsets.value.feature_number"/>
<parameter key="performance" value="operator.Validation.value.performance"/>
</list>
</operator>
<connect from_port="example set" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
<portSpacing port="source_example set" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Loop Subsets" to_port="example set"/>
<connect from_op="Loop Subsets" from_port="example set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
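For readers who want the idea outside RapidMiner, the following Python sketch (not RapidMiner code) mimics what the process above logs: one row per non-empty attribute subset, holding the joined names (like feature_names), the count (like feature_number), and a score. The function name loop_attribute_subsets and the evaluate callback are hypothetical; the order in which RapidMiner visits subsets may differ, and the constant scorer is only a placeholder for the cross-validated performance.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def loop_attribute_subsets(
    attributes: Sequence[str],
    evaluate: Callable[[Tuple[str, ...]], float],
) -> List[Tuple[str, float, float]]:
    """Return one (names, count, performance) row per non-empty subset."""
    rows = []
    # Visit every non-empty subset, smallest subsets first (an assumption).
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            names = ", ".join(subset)   # plays the role of feature_names
            count = float(len(subset))  # plays the role of feature_number
            rows.append((names, count, evaluate(subset)))
    return rows


if __name__ == "__main__":
    golf = ["Outlook", "Temperature", "Humidity", "Wind"]
    # Placeholder scorer; the real process uses the Validation performance.
    rows = loop_attribute_subsets(golf, evaluate=lambda subset: 0.0)
    for names, count, perf in rows:
        print(f"{names}\t{count}\t{perf}")
    print(len(rows), "subsets")
```

Four attributes give 2^4 - 1 = 15 non-empty subsets, which matches the 15 rows in the Golf table.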
Hi Ingo (yes, I learn quickly),
Thanks for this. It is a bit of a detour, but it works for me.
Kind regards,
Willem
By the way, using this logger inside Optimize Selection (Brute Force) does not seem to give proper values for feature_names: only one set shows up. Could that be because of the parallel execution?
Hi,
> Hi Ingo (yes, I learn quickly)
thanks for the greetings...
> Using this logger in an Optimize Selection (Brute Force) does not give proper values for the feature_names value... Or so it seems. Only one shows up. Is that maybe because of the parallel execution?
No, the reason is actually much simpler and lies in the implementation: the "Optimize Selection (...)" operators deliver only the feature names of the best individual found so far, since all of these algorithms are based on populations (similar to evolutionary approaches). Delivering the feature names of all sets in the current population would be an option, but then one would not know which performance belongs to which feature set. If you want this level of detail, the loop operator is probably the better option.
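To make the difference concrete, here is a deliberately simplified, sequential Python sketch (not RapidMiner code). brute_force_log and the toy scorer are hypothetical, and real Optimize Selection works on populations and may run in parallel, which this ignores. For every evaluated subset it records both the subset itself (what the loop exposes) and the best subset found so far (what the optimizer's feature_names reports), so the second column repeats once a good set has been found.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def brute_force_log(
    attributes: Sequence[str],
    evaluate: Callable[[Tuple[str, ...]], float],
) -> List[Tuple[str, str, float]]:
    """Log (current subset, best subset so far, current performance) per step."""
    log = []
    best_names, best_perf = "", float("-inf")
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            perf = evaluate(subset)
            names = ", ".join(subset)
            if perf > best_perf:
                best_names, best_perf = names, perf
            # Loop-style logging sees `names`; optimizer-style logging only
            # sees `best_names`, which stays constant after the best set appears.
            log.append((names, best_names, perf))
    return log


if __name__ == "__main__":
    toy = lambda s: 1.0 if s == ("A", "B") else 0.0  # hypothetical scorer
    for current, best, perf in brute_force_log(["A", "B", "C"], toy):
        print(f"{current:<8} {best:<6} {perf}")
```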
Cheers,
Ingo