Replace missing values for weight with average/mean of other attribute (item identifier)
FrancisC
New Altair Community Member
Hi,
I have a data set containing supermarket data and two of my attributes are item weight and item identifier.
A lot of examples are missing weight info, but because of the item identifier I know what they have to be (see image: DRA24 has to be 19.350 and DRA59 has to be 8.270).
How can I replace the missing weight values with the average (mean) weight of the examples that share the same item identifier?
Or is there another way to fix the missing values for weight?
Answers
-
Hi @FrancisC,
I would use Group Into Collection like this:
<?xml version="1.0" encoding="UTF-8"?><process version="9.8.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.8.000" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="utility:create_exampleset" compatibility="9.8.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="179" y="136">
<parameter key="generator_type" value="attribute functions"/>
<parameter key="number_of_examples" value="100"/>
<parameter key="use_stepsize" value="false"/>
<list key="function_descriptions">
<parameter key="Item_Id" value="round(10*rand())"/>
<parameter key="Item_Weight" value="if(rand()&lt;0.1,rand()*100,MISSING_NUMERIC)"/>
</list>
<parameter key="add_id_attribute" value="false"/>
<list key="numeric_series_configuration"/>
<list key="date_series_configuration"/>
<list key="date_series_configuration (interval)"/>
<parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="column_separator" value=","/>
<parameter key="parse_all_as_nominal" value="false"/>
<parameter key="decimal_point_character" value="."/>
<parameter key="trim_attribute_names" value="true"/>
</operator>
<operator activated="true" class="operator_toolbox:group_into_collection" compatibility="2.8.000-SNAPSHOT" expanded="true" height="82" name="Group Into Collection" width="90" x="313" y="136">
<parameter key="group_by_attribute" value="Item_Id"/>
<parameter key="group_by_attribute (numerical)" value=""/>
<parameter key="sorting_order" value="none"/>
<description align="center" color="transparent" colored="false" width="126">Get one example set per item_id</description>
</operator>
<operator activated="true" class="loop_collection" compatibility="9.8.000" expanded="true" height="82" name="Loop Collection" width="90" x="447" y="136">
<parameter key="set_iteration_macro" value="false"/>
<parameter key="macro_name" value="iteration"/>
<parameter key="macro_start_value" value="1"/>
<parameter key="unfold" value="false"/>
<process expanded="true">
<operator activated="true" class="replace_missing_values" compatibility="9.8.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="313" y="136">
<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="default" value="average"/>
<list key="columns"/>
</operator>
<connect from_port="single" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Create ExampleSet" from_port="output" to_op="Group Into Collection" to_port="exa"/>
<connect from_op="Group Into Collection" from_port="col" to_op="Loop Collection" to_port="collection"/>
<connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
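For readers who want to check the logic outside RapidMiner, here is a minimal sketch in plain Python of what the process above computes: split the examples by item identifier, take the mean of the known weights in each group, and use it for the missing ones. It is an illustration only and not part of the original process. The function name `fill_missing_by_group_mean` and the `XYZ00` row are hypothetical, missing values are assumed to be represented as `None`, and a group with no known weight at all is simply left as `None` here.

```python
from collections import defaultdict
from statistics import mean


def fill_missing_by_group_mean(rows, key="Item_Id", target="Item_Weight"):
    """Return copies of rows where a missing target is replaced by its group mean."""
    # Collect the known (non-missing) target values per group.
    known = defaultdict(list)
    for row in rows:
        if row[target] is not None:
            known[row[key]].append(row[target])

    # One mean per group that has at least one known value.
    group_mean = {group: mean(values) for group, values in known.items()}

    filled = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are not modified
        if new_row[target] is None:
            # Stays None when the whole group has no known weight.
            new_row[target] = group_mean.get(new_row[key])
        filled.append(new_row)
    return filled


rows = [
    {"Item_Id": "DRA24", "Item_Weight": 19.35},
    {"Item_Id": "DRA24", "Item_Weight": None},
    {"Item_Id": "DRA59", "Item_Weight": 8.27},
    {"Item_Id": "DRA59", "Item_Weight": None},
    {"Item_Id": "XYZ00", "Item_Weight": None},  # hypothetical id with no known weight
]

result = fill_missing_by_group_mean(rows)
for row in result:
    print(row)
```

The last row shows the edge case to watch for in any group-wise approach: if every example of an item is missing its weight, there is nothing to average, so that item needs a different fallback.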
-
Thank you so much! Unfortunately, I don't know how to write code. Is there an operator that could do the same?
-
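Hi @FrancisC,
this is a process, not code you have to write: the XML is just the exported form of a process built from the operators Group Into Collection, Loop Collection and Replace Missing Values. Please check https://community.rapidminer.com/discussion/50470/import-xml-code-to-process on how to get the XML into your RapidMiner.
Best,
Martin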