What is the correct way to replace all instances of 999 in data with empty?
martyns
New Altair Community Member
I have a dataset where all missing values are represented by 999.
If I try to replace them with:
<operator name="FeatureIterator" class="FeatureIterator" expanded="yes">
    <parameter key="work_on_input" value="false"/>
    <operator name="Mapping" class="Mapping">
        <parameter key="attributes" value="%{loop_feature}"/>
        <list key="value_mappings">
        </list>
        <parameter key="replace_what" value="999"/>
        <parameter key="replace_by" value="?"/>
    </operator>
</operator>
then everything goes horribly wrong and when I try to send the data through a model it fails drastically in terms of the nominal variables.
If I set the filter to numeric only, the order of the attributes changes, which upsets the model application with this warning:
May 28, 2009 10:12:33 AM: [Warning] W-J48: The order of attributes is not equal for the training and the application example set. This might lead to problems for some models.
A great helper on the list suggested that I should not enter a replace_by value for 999, but then nothing appears to happen at all.
I have unticked work_on_input, as it then seems to pass the modified example set along to the model applier further down the process. Should I be ticking work_on_input and placing the model applier differently?
So, what is the correct way to replace all values of 999 in a dataset with empty or blank values?
And how about just for numeric values?
Thanks!
Answers
G'Day!
A bit long-winded, but this does it:
<operator name="Root" class="Process" expanded="yes">
    <!-- Generate a random example set to demonstrate on -->
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="random"/>
    </operator>
    <!-- Plant a 999 in att1 so there is something to replace -->
    <operator name="SetData" class="SetData">
        <parameter key="attribute_name" value="att1"/>
        <parameter key="example_index" value="1"/>
        <parameter key="value" value="999"/>
    </operator>
    <!-- Turn the numerical attributes into nominal ones so Replace can treat the values as text -->
    <operator name="Numerical2FormattedNominal" class="Numerical2FormattedNominal">
    </operator>
    <!-- Replace 999 in all attributes; no replace_by is given -->
    <operator name="Replace" class="Replace">
        <parameter key="attributes" value=".*"/>
        <parameter key="replace_what" value="999"/>
    </operator>
    <!-- Convert the nominal numbers back to numerical attributes -->
    <operator name="NominalNumbers2Numerical" class="NominalNumbers2Numerical">
    </operator>
    <operator name="Numerical2Real" class="Numerical2Real">
    </operator>
</operator>
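For anyone who wants to check the intended result outside RapidMiner, here is a minimal sketch of the same idea in plain Python. It is not RapidMiner code: the function name replace_sentinel and the sample rows are made up for illustration. It assumes the data is a list of rows keyed by attribute name, swaps 999 for a missing value (NaN) in numeric cells only, and keeps every row's attribute order as it was, which is what the J48 warning in the question is about.

```python
import math

SENTINEL = 999  # the missing-value code used in the question


def replace_sentinel(rows, sentinel=SENTINEL):
    """Return new rows with numeric cells equal to `sentinel` set to NaN.

    Nominal (string) cells are left alone, and each row keeps its
    original attribute order.
    """
    cleaned = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            # bool is a subclass of int in Python, so exclude it explicitly
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            new_row[name] = math.nan if is_number and value == sentinel else value
        cleaned.append(new_row)
    return cleaned


# Hypothetical sample data: two numeric attributes and one nominal one
rows = [
    {"att1": 999, "label": "999", "att2": 0.5},
    {"att1": 1.2, "label": "yes", "att2": 999.0},
]
cleaned = replace_sentinel(rows)
```

The sketch compares whole values rather than matching text, so a value such as 1999 is left untouched and the nominal "999" in label is not altered.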