Select all attributes having only missing values
Hi,
How can I select all the attributes that have ONLY missing values in RapidMiner? I don't want to select attributes that have both missing and non-missing values. If I set the attribute_filter_type option of the 'Select Attributes' operator to 'no_missing_values' and invert the selection, it also selects the attributes that have both missing and non-missing values. But I need to select only the attributes in which all values are missing.
Thanks,
Zubair
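To make the distinction in the question concrete, here is a minimal sketch in plain Python rather than RapidMiner. It assumes a toy table stored as a dict of column lists, with `None` standing in for a missing value; the column names and both helper functions are hypothetical. Inverting a "no missing values" filter roughly corresponds to `has_any_missing`, whereas the question asks for `is_all_missing`.

```python
# Hypothetical toy table: column name -> list of values, None marks a missing value.
table = {
    "Outlook": ["sunny", None, "rain"],   # some values missing
    "Humidity": [None, None, None],       # all values missing
    "Wind": [True, False, True],          # no values missing
}


def has_any_missing(values):
    """True if at least one value is missing."""
    return any(v is None for v in values)


def is_all_missing(values):
    """True only if the column is non-empty and every value is missing."""
    return len(values) > 0 and all(v is None for v in values)


# Columns with at least one missing value (too broad for the question).
any_missing = [name for name, col in table.items() if has_any_missing(col)]
# Columns in which every value is missing (what the question asks for).
all_missing = [name for name, col in table.items() if is_all_missing(col)]

print(any_missing)
print(all_missing)
```

In this toy table the first list contains both `Outlook` and `Humidity`, while the second contains only `Humidity`.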
Well, a shorter version (although not exactly the same) is to just use "Replace Missing Values" with a constant value that does NOT occur in the data. Then use "Remove Useless Attributes". Of course, this also removes any other attributes that are constant (but how useful are those?). You can then turn the constant value you used above back into a missing value with "Declare Missing Value".
Here is the code:
<?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" breakpoints="after" class="subprocess" compatibility="7.3.000" expanded="true" height="82" name="Subprocess" width="90" x="45" y="34">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Retrieve Golf" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Samples/data/Golf"/>
</operator>
<operator activated="true" class="declare_missing_value" compatibility="7.3.000" expanded="true" height="82" name="Declare Missing Value" width="90" x="179" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Humidity"/>
<parameter key="mode" value="expression"/>
<parameter key="expression_value" value="Humidity&gt;5"/>
</operator>
<operator activated="true" class="declare_missing_value" compatibility="7.3.000" expanded="true" height="82" name="Declare Missing Value (2)" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Temperature"/>
<parameter key="numeric_value" value="80.0"/>
</operator>
<connect from_op="Retrieve Golf" from_port="output" to_op="Declare Missing Value" to_port="example set input"/>
<connect from_op="Declare Missing Value" from_port="example set output" to_op="Declare Missing Value (2)" to_port="example set input"/>
<connect from_op="Declare Missing Value (2)" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Get A data set</description>
</operator>
<operator activated="true" class="replace_missing_values" compatibility="7.3.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="179" y="34">
<parameter key="default" value="value"/>
<list key="columns"/>
<parameter key="replenishment_value" value="-99"/>
</operator>
<operator activated="true" class="remove_useless_attributes" compatibility="7.3.000" expanded="true" height="82" name="Remove Useless Attributes" width="90" x="313" y="34"/>
<operator activated="true" class="declare_missing_value" compatibility="7.3.000" expanded="true" height="82" name="Declare Missing Value (4)" width="90" x="447" y="34">
<parameter key="mode" value="nominal"/>
<parameter key="nominal_value" value="-99"/>
</operator>
<operator activated="true" class="declare_missing_value" compatibility="7.3.000" expanded="true" height="82" name="Declare Missing Value (3)" width="90" x="581" y="34">
<parameter key="numeric_value" value="-99.0"/>
</operator>
<connect from_op="Subprocess" from_port="out 1" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_op="Remove Useless Attributes" to_port="example set input"/>
<connect from_op="Remove Useless Attributes" from_port="example set output" to_op="Declare Missing Value (4)" to_port="example set input"/>
<connect from_op="Declare Missing Value (4)" from_port="example set output" to_op="Declare Missing Value (3)" to_port="example set input"/>
<connect from_op="Declare Missing Value (3)" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
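For readers who want to see the logic outside RapidMiner, the three steps described above can be sketched in plain Python. This is only an illustration under stated assumptions: the table is a dict of column lists, `None` stands in for a missing value, and the sentinel `-99` is assumed not to occur in the real data. The function name and the toy data are hypothetical, and the constant-column check is a rough stand-in for "Remove Useless Attributes", whose exact criteria are not reproduced here.

```python
SENTINEL = -99  # assumption: this value does not occur anywhere in the real data


def drop_all_missing_via_sentinel(table, sentinel=SENTINEL):
    """Drop columns that are entirely missing (and, as a side effect, constant ones)."""
    # Step 1: replace every missing value with the sentinel.
    filled = {
        name: [sentinel if v is None else v for v in col]
        for name, col in table.items()
    }
    # Step 2: drop columns that contain at most one distinct value.
    kept = {name: col for name, col in filled.items() if len(set(col)) > 1}
    # Step 3: turn the sentinel back into a missing value.
    return {
        name: [None if v == sentinel else v for v in col]
        for name, col in kept.items()
    }


# Hypothetical toy data, loosely modelled on the Golf sample.
table = {
    "Temperature": [85, None, 64],   # partly missing: kept
    "Humidity": [None, None, None],  # entirely missing: dropped
    "Constant": [1, 1, 1],           # constant: dropped as a side effect
    "Play": ["no", "yes", "yes"],    # complete: kept
}

result = drop_all_missing_via_sentinel(table)
print(result)
```

Note that the `Constant` column disappears together with the all-missing `Humidity` column, which is the "not exactly the same" caveat mentioned above.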
Cheers,
Ingo
Dear Zubairali,
Interesting question. I did not find a single-operator solution. Attached is a longer process that does the job. I would be curious whether there is an easier way to do it.
~Martin