FFS template: logging anomaly or bug?
wessel
New Altair Community Member
Hello,
I ran the Feature Selection template, looked at the log results, and noticed many '?' (NaN) symbols at the start of the log.
So I made some modifications to the template to understand where these '?' symbols come from.
But I can't explain why FFS runs the same attribute subset many times and gets a different performance score each generation.
The log looks like this:
# FS.performance FS.generation Performance.kappa Apply Model.applycount FS.feature_names
NaN 0.0 0.0 1.0 ?
NaN 0.0 0.39 2.0 ?
NaN 0.0 0.39 3.0 ?
NaN 0.0 0.0 4.0 ?
NaN 0.0 0.0 5.0 ?
NaN 0.0 0.0 6.0 ?
NaN 0.0 0.0 7.0 ?
NaN 0.0 0.199 8.0 ?
NaN 0.0 0.0 9.0 ?
NaN 0.0 0.0 10.0 ?
NaN 0.0 0.0 11.0 ?
NaN 0.0 0.0 12.0 ?
NaN 0.0 0.0 13.0 ?
NaN 0.0 0.0 14.0 ?
NaN 0.0 0.0 15.0 ?
NaN 0.0 0.0 16.0 ?
0.39 1.0 0.39 17.0 wage-inc-1st
0.39 1.0 0.39 18.0 wage-inc-1st
0.39 1.0 0.0 19.0 wage-inc-1st
0.39 1.0 0.39 20.0 wage-inc-1st
0.39 1.0 0.196 21.0 wage-inc-1st
0.39 1.0 0.0 22.0 wage-inc-1st
0.39 1.0 0.39 23.0 wage-inc-1st
0.39 1.0 0.39 24.0 wage-inc-1st
0.39 1.0 0.39 25.0 wage-inc-1st
0.39 1.0 0.5 26.0 wage-inc-1st
0.39 1.0 0.0 27.0 wage-inc-1st
0.39 1.0 0.39 28.0 wage-inc-1st
0.39 1.0 0.39 29.0 wage-inc-1st
0.39 1.0 0.39 30.0 wage-inc-1st
0.39 1.0 0.39 31.0 wage-inc-1st
0.5 2.0 0.5 32.0 wage-inc-2nd, statutory-holidays
0.5 2.0 0.39 33.0 wage-inc-2nd, statutory-holidays
0.5 2.0 0.5 34.0 wage-inc-2nd, statutory-holidays
0.5 2.0 0.5 35.0 wage-inc-2nd, statutory-holidays
0.5 2.0 0.196 36.0 wage-inc-2nd, statutory-holidays
0.5 2.0 0.5 37.0 wage-inc-2nd, statutory-holidays
0.5 2.0 0.5 38.0 wage-inc-2nd, statutory-holidays
0.5 2.0 0.5 39.0 wage-inc-2nd, statutory-holidays
0.5 2.0 0.5 40.0 wage-inc-2nd, statutory-holidays
0.5 2.0 0.0 41.0 wage-inc-2nd, statutory-holidays
0.5 2.0 0.5 42.0 wage-inc-2nd, statutory-holidays
0.5 2.0 0.5 43.0 wage-inc-2nd, statutory-holidays
0.5 2.0 0.5 44.0 wage-inc-2nd, statutory-holidays
0.5 2.0 0.5 45.0 wage-inc-2nd, statutory-holidays
0.5 3.0 0.5 46.0 wage-inc-2nd, statutory-holidays
0.5 3.0 0.39 47.0 wage-inc-2nd, statutory-holidays
0.5 3.0 0.5 48.0 wage-inc-2nd, statutory-holidays
0.5 3.0 0.5 49.0 wage-inc-2nd, statutory-holidays
0.5 3.0 0.196 50.0 wage-inc-2nd, statutory-holidays
0.5 3.0 0.5 51.0 wage-inc-2nd, statutory-holidays
0.5 3.0 0.5 52.0 wage-inc-2nd, statutory-holidays
0.5 3.0 0.5 53.0 wage-inc-2nd, statutory-holidays
0.5 3.0 0.5 54.0 wage-inc-2nd, statutory-holidays
0.5 3.0 0.5 55.0 wage-inc-2nd, statutory-holidays
0.5 3.0 0.5 56.0 wage-inc-2nd, statutory-holidays
0.5 3.0 0.5 57.0 wage-inc-2nd, statutory-holidays
0.5 3.0 0.5 58.0 wage-inc-2nd, statutory-holidays
0.5 4.0 0.5 59.0 wage-inc-2nd, statutory-holidays
0.5 4.0 0.39 60.0 wage-inc-2nd, statutory-holidays
0.5 4.0 0.5 61.0 wage-inc-2nd, statutory-holidays
0.5 4.0 0.5 62.0 wage-inc-2nd, statutory-holidays
0.5 4.0 0.196 63.0 wage-inc-2nd, statutory-holidays
0.5 4.0 0.5 64.0 wage-inc-2nd, statutory-holidays
0.5 4.0 0.5 65.0 wage-inc-2nd, statutory-holidays
0.5 4.0 0.5 66.0 wage-inc-2nd, statutory-holidays
0.5 4.0 0.5 67.0 wage-inc-2nd, statutory-holidays
0.5 4.0 0.5 68.0 wage-inc-2nd, statutory-holidays
0.5 4.0 0.5 69.0 wage-inc-2nd, statutory-holidays
0.5 4.0 0.5 70.0 wage-inc-2nd, statutory-holidays
0.5 5.0 0.5 71.0 wage-inc-2nd, statutory-holidays
0.5 5.0 0.39 72.0 wage-inc-2nd, statutory-holidays
0.5 5.0 0.5 73.0 wage-inc-2nd, statutory-holidays
0.5 5.0 0.5 74.0 wage-inc-2nd, statutory-holidays
0.5 5.0 0.196 75.0 wage-inc-2nd, statutory-holidays
0.5 5.0 0.5 76.0 wage-inc-2nd, statutory-holidays
0.5 5.0 0.5 77.0 wage-inc-2nd, statutory-holidays
0.5 5.0 0.5 78.0 wage-inc-2nd, statutory-holidays
0.5 5.0 0.5 79.0 wage-inc-2nd, statutory-holidays
0.5 5.0 0.5 80.0 wage-inc-2nd, statutory-holidays
0.5 5.0 0.5 81.0 wage-inc-2nd, statutory-holidays
0.5 6.0 0.5 82.0 wage-inc-2nd, statutory-holidays
0.5 6.0 0.39 83.0 wage-inc-2nd, statutory-holidays
0.5 6.0 0.5 84.0 wage-inc-2nd, statutory-holidays
0.5 6.0 0.5 85.0 wage-inc-2nd, statutory-holidays
0.5 6.0 0.196 86.0 wage-inc-2nd, statutory-holidays
0.5 6.0 0.5 87.0 wage-inc-2nd, statutory-holidays
0.5 6.0 0.5 88.0 wage-inc-2nd, statutory-holidays
0.5 6.0 0.5 89.0 wage-inc-2nd, statutory-holidays
0.5 6.0 0.5 90.0 wage-inc-2nd, statutory-holidays
0.5 6.0 0.5 91.0 wage-inc-2nd, statutory-holidays
0.5 7.0 0.5 92.0 wage-inc-2nd, statutory-holidays
0.5 7.0 0.39 93.0 wage-inc-2nd, statutory-holidays
0.5 7.0 0.5 94.0 wage-inc-2nd, statutory-holidays
0.5 7.0 0.5 95.0 wage-inc-2nd, statutory-holidays
0.5 7.0 0.196 96.0 wage-inc-2nd, statutory-holidays
0.5 7.0 0.5 97.0 wage-inc-2nd, statutory-holidays
0.5 7.0 0.5 98.0 wage-inc-2nd, statutory-holidays
0.5 7.0 0.5 99.0 wage-inc-2nd, statutory-holidays
0.5 7.0 0.5 100.0 wage-inc-2nd, statutory-holidays
0.5 8.0 0.5 101.0 wage-inc-2nd, statutory-holidays
0.5 8.0 0.39 102.0 wage-inc-2nd, statutory-holidays
0.5 8.0 0.5 103.0 wage-inc-2nd, statutory-holidays
0.5 8.0 0.5 104.0 wage-inc-2nd, statutory-holidays
0.5 8.0 0.196 105.0 wage-inc-2nd, statutory-holidays
0.5 8.0 0.5 106.0 wage-inc-2nd, statutory-holidays
0.5 8.0 0.5 107.0 wage-inc-2nd, statutory-holidays
0.5 8.0 0.5 108.0 wage-inc-2nd, statutory-holidays
0.5 9.0 0.5 109.0 wage-inc-2nd, statutory-holidays
0.5 9.0 0.39 110.0 wage-inc-2nd, statutory-holidays
0.5 9.0 0.5 111.0 wage-inc-2nd, statutory-holidays
0.5 9.0 0.5 112.0 wage-inc-2nd, statutory-holidays
0.5 9.0 0.196 113.0 wage-inc-2nd, statutory-holidays
0.5 9.0 0.5 114.0 wage-inc-2nd, statutory-holidays
0.5 9.0 0.5 115.0 wage-inc-2nd, statutory-holidays
0.5 10.0 0.5 116.0 wage-inc-2nd, statutory-holidays
0.5 10.0 0.39 117.0 wage-inc-2nd, statutory-holidays
0.5 10.0 0.5 118.0 wage-inc-2nd, statutory-holidays
0.5 10.0 0.5 119.0 wage-inc-2nd, statutory-holidays
0.5 10.0 0.196 120.0 wage-inc-2nd, statutory-holidays
0.5 10.0 0.5 121.0 wage-inc-2nd, statutory-holidays
0.5 11.0 0.5 122.0 wage-inc-2nd, statutory-holidays
0.5 11.0 0.39 123.0 wage-inc-2nd, statutory-holidays
0.5 11.0 0.5 124.0 wage-inc-2nd, statutory-holidays
0.5 11.0 0.5 125.0 wage-inc-2nd, statutory-holidays
0.5 11.0 0.196 126.0 wage-inc-2nd, statutory-holidays
0.5 12.0 0.5 127.0 wage-inc-2nd, statutory-holidays
0.5 12.0 0.39 128.0 wage-inc-2nd, statutory-holidays
0.5 12.0 0.5 129.0 wage-inc-2nd, statutory-holidays
0.5 12.0 0.196 130.0 wage-inc-2nd, statutory-holidays
0.5 13.0 0.5 131.0 wage-inc-2nd, statutory-holidays
0.5 13.0 0.39 132.0 wage-inc-2nd, statutory-holidays
0.5 13.0 0.196 133.0 wage-inc-2nd, statutory-holidays
0.5 14.0 0.39 134.0 wage-inc-2nd, statutory-holidays
0.5 14.0 0.196 135.0 wage-inc-2nd, statutory-holidays
0.39 15.0 0.39 136.0 wage-inc-2nd, statutory-holidays
This is the XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Root">
<description><p> Transformations of the attribute space may ease learning in a way, that simple learning schemes may be able to learn complex functions. This is the basic idea of the kernel trick. But even without kernel based learning schemes the transformation of feature space may be necessary to reach good learning results. </p> <p> RapidMiner offers several different feature selection, construction, and extraction methods. This selection process (the well known forward selection) uses an inner cross validation for performance estimation. This building block serves as fitness evaluation for all candidate feature sets. Since the performance of a certain learning scheme is taken into account we refer to processes of this type as &quot;wrapper approaches&quot;.</p> <p>Additionally the process log operator plots intermediate results. You can inspect them online in the Results tab. Please refer to the visualization sample processes or the RapidMiner tutorial for further details.</p> <p> Try the following: <ul> <li>Start the process and change to &quot;Result&quot; view. There can be a plot selected. Plot the &quot;performance&quot; against the &quot;generation&quot; of the feature selection operator.</li> <li>Select the feature selection operator in the tree view. Change the search directory from forward (forward selection) to backward (backward elimination). Restart the process. All features will be selected.</li> <li>Select the feature selection operator. Right click to open the context menu and repace the operator by another feature selection scheme (for example a genetic algorithm).</li> <li>Have a look at the list of the process log operator. Every time it is applied it collects the specified data. Please refer to the RapidMiner Tutorial for further explanations. After changing the feature selection operator to the genetic algorithm approach, you have to specify the correct values. <table><tr><td><icon>groups/24/visualization</icon></td><td><i>Use the process log operator to log values online.</i></td></tr></table> </li> </ul> </p></description>
<process expanded="true" height="500" width="576">
<operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
<parameter key="repository_entry" value="//Samples/data/Labor-Negotiations"/>
</operator>
<operator activated="true" class="split_data" expanded="true" height="94" name="Split Data" width="90" x="179" y="30">
<enumeration key="partitions">
<parameter key="ratio" value="0.5"/>
<parameter key="ratio" value="0.5"/>
</enumeration>
</operator>
<operator activated="true" class="optimize_selection" expanded="true" height="94" name="FS" width="90" x="313" y="30">
<parameter key="limit_generations_without_improval" value="false"/>
<parameter key="generations_without_improval" value="2"/>
<parameter key="maximum_number_of_generations" value="1"/>
<process expanded="true" height="500" width="570">
<operator activated="true" class="weka:W-J48" expanded="true" height="76" name="W-J48" width="90" x="45" y="30"/>
<operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="180" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_binominal_classification" expanded="true" height="76" name="Performance" width="90" x="315" y="30">
<parameter key="accuracy" value="false"/>
<parameter key="kappa" value="true"/>
<parameter key="AUC" value="true"/>
<parameter key="sensitivity" value="true"/>
<parameter key="specificity" value="true"/>
</operator>
<operator activated="true" class="log" expanded="true" height="76" name="ProcessLog" width="90" x="447" y="30">
<parameter key="filename" value="C:\Users\wessel\Desktop\sdf.log"/>
<list key="log">
<parameter key="FS.performance" value="operator.FS.value.performance"/>
<parameter key="FS.generation" value="operator.FS.value.generation"/>
<parameter key="kappa" value="operator.Performance.value.kappa"/>
<parameter key="Perf.AUC" value="operator.Performance.value.AUC"/>
<parameter key="Perf.sens" value="operator.Performance.value.sensitivity"/>
<parameter key="Performance.speci" value="operator.Performance.value.specificity"/>
<parameter key="Apply Model.applycount" value="operator.Apply Model.value.applycount"/>
<parameter key="J48.looptime" value="operator.W-J48.value.looptime"/>
<parameter key="FS.feature_names" value="operator.FS.value.feature_names"/>
</list>
</operator>
<connect from_port="example set" to_op="W-J48" to_port="training set"/>
<connect from_port="through 1" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="W-J48" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_op="ProcessLog" to_port="through 1"/>
<connect from_op="ProcessLog" from_port="through 1" to_port="performance"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="source_through 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Split Data" to_port="example set"/>
<connect from_op="Split Data" from_port="partition 1" to_op="FS" to_port="example set in"/>
<connect from_op="Split Data" from_port="partition 2" to_op="FS" to_port="through 1"/>
<connect from_op="FS" from_port="example set out" to_port="result 1"/>
<connect from_op="FS" from_port="performance" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Answers
-
Hi,
I guess the performance values are logged before they are set, so the NaN marks an unknown value. I think it's neither a bug nor a logging anomaly, but it can't be avoided if the logging is performed inside the operator whose value is logged: the performance is only known once the subprocess has completed, so logging a parent operator's values must fail whenever they depend on the results of its subprocess...
Greetings,
Sebastian
-
Hmm,
so how can I fix this?
The Log operator is the last element in the flow.
Maybe I need to add a macro which finds out what feature names are used?
Is there such a macro?
attribute names, number of attributes
? 1
wage-inc-1st 2 <-- clearly this is wrong: the count says there are 2 attributes here, but the attribute names column lists only 1
-
Hi,
First of all, I would suggest using the Forward Attribute Selection operator instead, because it's much faster, more reliable, and can cope with more data than the old Feature Selection operator. In fact, the latter has only been kept for compatibility reasons...
With the next update this operator will be able to provide the current attributes as a value for logging, but to my knowledge there is no workaround until then.
Greetings,
Sebastian
-
So there is no way to log the attributes that are currently used?
Is it possible to use a Groovy script to output this?
Like writing to a log file:
attribute subset, performance
I tried the following script:
// Append a line of text to a file on the desktop.
void log(def text) {
    def ouFile = new File("C:/Users/wluijben/Desktop/scriptoutput.txt");
    ouFile << text << '\n';
}

log("hello world");

// Fetch the example set handed to the Execute Script operator
// and write the name of every regular attribute to the file.
ExampleSet exampleSet = operator.getInput(ExampleSet.class);
for (Attribute attribute : exampleSet.getAttributes()) {
    String attributeName = attribute.getName();
    log(attributeName);
}
It outputs the attribute names, but I'm not sure yet how to output the performance.
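Something like this might work if the script can also fetch the performance vector, but I haven't verified that operator.getInput(PerformanceVector.class) and getMainCriterion() behave this way inside Execute Script, so treat it as an untested sketch:
import com.rapidminer.operator.performance.PerformanceVector;

// Append a line of text to the output file (same helper as above).
void log(def text) {
    def ouFile = new File("C:/Users/wluijben/Desktop/scriptoutput.txt");
    ouFile << text << '\n';
}

// Collect the names of the regular attributes currently in the subset.
ExampleSet exampleSet = operator.getInput(ExampleSet.class);
String atts = "";
for (Attribute attribute : exampleSet.getAttributes()) {
    atts = atts + " " + attribute.getName();
}

// Assumption: the performance vector is also delivered to this operator's
// inputs; log its main criterion (e.g. kappa) next to the attribute subset.
PerformanceVector perf = operator.getInput(PerformanceVector.class);
log(atts + ", " + perf.getMainCriterion().getAverage());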
-
I think this is how you do it.
For the iris dataset it outputs:
1 0.0 0.6 a1
2 0.0 0.52 a2
3 0.0 0.89 a3
4 0.0 0.93 a4
5 1.0 0.93 a1 a4
6 1.0 0.93 a2 a4
7 1.0 0.93 a3 a4
Can someone check whether I'm doing anything that makes the script unnecessarily slow?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Root">
<process expanded="true" height="638" width="573">
<operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="rename_by_replacing" expanded="true" height="76" name="rename" width="90" x="45" y="120">
<parameter key="replace_what" value="attribute_"/>
<parameter key="replace_by" value="a"/>
</operator>
<operator activated="true" class="split_data" expanded="true" height="94" name="split" width="90" x="179" y="30">
<enumeration key="partitions">
<parameter key="ratio" value="0.5"/>
<parameter key="ratio" value="0.5"/>
</enumeration>
</operator>
<operator activated="true" class="optimize_selection" expanded="true" height="94" name="FS" width="90" x="313" y="30">
<process expanded="true" height="638" width="573">
<operator activated="true" class="select_attributes" expanded="true" height="76" name="no special" width="90" x="45" y="30">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="class"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="execute_script" expanded="true" height="76" name="Execute Script" width="90" x="179" y="30">
<parameter key="script" value="ExampleSet exampleSet = operator.getInput(ExampleSet.class); String atts = ""; for (Attribute attribute : exampleSet.getAttributes()) { String name = attribute.getName(); 	atts = atts + " " + name; } operator.getProcess().getMacroHandler().addMacro("atts", atts)"/>
</operator>
<operator activated="true" class="weka:W-J48" expanded="true" height="76" name="W-J48" width="90" x="179" y="165"/>
<operator activated="true" class="apply_model" expanded="true" height="76" name="Applier" width="90" x="112" y="300">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="246" y="300"/>
<operator activated="true" class="provide_macro_as_log_value" expanded="true" height="76" name="Provide Macro as Log Value" width="90" x="313" y="30">
<parameter key="macro_name" value="atts"/>
</operator>
<operator activated="true" class="log" expanded="true" height="94" name="MyLog" width="90" x="380" y="120">
<list key="log">
<parameter key="generation" value="operator.FS.value.generation"/>
<parameter key="performance" value="operator.Performance.value.performance"/>
<parameter key="atts" value="operator.Provide Macro as Log Value.value.macro_value"/>
</list>
</operator>
<connect from_port="example set" to_op="no special" to_port="example set input"/>
<connect from_port="through 1" to_op="Applier" to_port="unlabelled data"/>
<connect from_op="no special" from_port="example set output" to_op="Execute Script" to_port="input 1"/>
<connect from_op="no special" from_port="original" to_op="W-J48" to_port="training set"/>
<connect from_op="Execute Script" from_port="output 1" to_op="Provide Macro as Log Value" to_port="through 1"/>
<connect from_op="W-J48" from_port="model" to_op="Applier" to_port="model"/>
<connect from_op="Applier" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_op="MyLog" to_port="through 1"/>
<connect from_op="Provide Macro as Log Value" from_port="through 1" to_op="MyLog" to_port="through 2"/>
<connect from_op="MyLog" from_port="through 1" to_port="performance"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="source_through 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
</process>
</operator>
<operator activated="true" class="log_to_data" expanded="true" height="94" name="Log to Data" width="90" x="447" y="120">
<parameter key="log_name" value="MyLog"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="rename" to_port="example set input"/>
<connect from_op="rename" from_port="example set output" to_op="split" to_port="example set"/>
<connect from_op="split" from_port="partition 1" to_op="FS" to_port="example set in"/>
<connect from_op="split" from_port="partition 2" to_op="FS" to_port="through 1"/>
<connect from_op="FS" from_port="weights" to_port="result 1"/>
<connect from_op="FS" from_port="performance" to_op="Log to Data" to_port="through 1"/>
<connect from_op="Log to Data" from_port="exampleSet" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
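For readability, the Groovy embedded in the Execute Script operator above is:
// Collect the names of all regular attributes of the incoming example set
// and store them in the process macro "atts".
ExampleSet exampleSet = operator.getInput(ExampleSet.class);
String atts = "";
for (Attribute attribute : exampleSet.getAttributes()) {
    String name = attribute.getName();
    atts = atts + " " + name;
}
operator.getProcess().getMacroHandler().addMacro("atts", atts)
The Provide Macro as Log Value operator then exposes this macro, so the Log operator can record the current attribute subset next to the FS generation and the performance.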
-
Hi,
I don't think it's your script in particular; scripting in general just isn't that fast... It looks okay to me.
Greetings,
Sebastian