Replace missing values for weight with average/mean of other attribute (item identifyer)

FrancisC
FrancisC New Altair Community Member
edited November 2024 in Community Q&A
Hi,

I have a data set containing supermarket data and two of my attributes are item weight and item identifier.
A lot of examples are missing weight info, but because of the item identifier I know what they have to be (see image: DRA24 has to be 19.350 and DRA59 has to be 8.270)

How can I replace the missing values for weight based on the average or mean of the item identifier attribute?
Or is there another way how I can fix the missing values for weight?

Welcome!

It looks like you're new here. Sign in or register to get started.

Answers

  • MartinLiebig
    MartinLiebig
    Altair Employee
    I would use Group Into Collection like this:

    Spoiler
    <?xml version="1.0" encoding="UTF-8"?><process version="9.8.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.8.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.8.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="179" y="136">
            <parameter key="generator_type" value="attribute functions"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions">
              <parameter key="Item_Id" value="round(10*rand())"/>
              <parameter key="Item_Weight" value="if(rand()&lt;0.1,rand()*100,MISSING_NUMERIC)"/>
            </list>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="operator_toolbox:group_into_collection" compatibility="2.8.000-SNAPSHOT" expanded="true" height="82" name="Group Into Collection" width="90" x="313" y="136">
            <parameter key="group_by_attribute" value="Item_Id"/>
            <parameter key="group_by_attribute (numerical)" value=""/>
            <parameter key="sorting_order" value="none"/>
            <description align="center" color="transparent" colored="false" width="126">Get one example set per item_id</description>
          </operator>
          <operator activated="true" class="loop_collection" compatibility="9.8.000" expanded="true" height="82" name="Loop Collection" width="90" x="447" y="136">
            <parameter key="set_iteration_macro" value="false"/>
            <parameter key="macro_name" value="iteration"/>
            <parameter key="macro_start_value" value="1"/>
            <parameter key="unfold" value="false"/>
            <process expanded="true">
              <operator activated="true" class="replace_missing_values" compatibility="9.8.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="313" y="136">
                <parameter key="return_preprocessing_model" value="false"/>
                <parameter key="create_view" value="false"/>
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="default" value="average"/>
                <list key="columns"/>
              </operator>
              <connect from_port="single" to_op="Replace Missing Values" to_port="example set input"/>
              <connect from_op="Replace Missing Values" from_port="example set output" to_port="output 1"/>
              <portSpacing port="source_single" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Group Into Collection" to_port="exa"/>
          <connect from_op="Group Into Collection" from_port="col" to_op="Loop Collection" to_port="collection"/>
          <connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


  • FrancisC
    FrancisC New Altair Community Member
    Thank you so much! Unfortunately, I don't know how to write code.. Is there an operator that could do the same?
  • MartinLiebig
    MartinLiebig
    Altair Employee
    this is a process. Please check https://community.rapidminer.com/discussion/50470/import-xml-code-to-process on how to get the XML into your RapidMiner.

    Best,
    Martin

Welcome!

It looks like you're new here. Sign in or register to get started.

Welcome!

It looks like you're new here. Sign in or register to get started.