one-hot encoding missing value attribute

jbalint
jbalint New Altair Community Member
edited November 5 in Community Q&A
Hi,
I'm facing an issue using the One Hot Encoding operator. Attributes are generated for all but one of the values. I've included an example based on the Titanic data. The result includes newly added attributes


Best Answer

  • IngoRM
    IngoRM New Altair Community Member
    Answer ✓
    That’s by design. The omitted value is redundant: if both generated columns are zero, this means that the “missing” column would be a one. Hope this helps,
    Ingo
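
To make the reasoning in this answer concrete, here is a minimal Python sketch (not RapidMiner code) that restores the omitted indicator from the two that are present. The rows and the helper name `add_omitted_indicator` are hypothetical, and the sketch assumes every row has exactly one non-missing Passenger Class value.

```python
# Hypothetical rows mimicking the operator's output: only First and Third
# get indicator columns; Second is the omitted value.
rows = [
    {"Passenger Class = First": 1, "Passenger Class = Third": 0},
    {"Passenger Class = First": 0, "Passenger Class = Third": 0},
    {"Passenger Class = First": 0, "Passenger Class = Third": 1},
]


def add_omitted_indicator(row, omitted_name):
    """Return a copy of row with the omitted value's indicator restored."""
    restored = dict(row)
    # Assumption: exactly one class applies per row, so the omitted
    # indicator is 1 minus the sum of the indicators that are present.
    restored[omitted_name] = 1 - sum(row.values())
    return restored


full_rows = [add_omitted_indicator(r, "Passenger Class = Second") for r in rows]
for r in full_rows:
    print(r)
```

If the attribute could contain missing values, an all-zero row would be ambiguous, so this shortcut would not be safe without checking for them first.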

Answers

  • jbalint
    jbalint New Altair Community Member
    edited March 2020
    Sorry, I posted too early while trying to add a tag.

    Anyway, the result includes the newly added attributes "Passenger Class = First" and "Passenger Class = Third", but omits a new attribute for second class. Process attached.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.5.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.4.000" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="UTF-8"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.5.000" expanded="true" height="68" name="Retrieve Titanic" origin="GENERATED_TUTORIAL" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Titanic"/>
          </operator>
          <operator activated="true" class="model_simulator:one_hot_encoding" compatibility="9.5.000" expanded="true" height="103" name="One Hot Encoding" origin="GENERATED_TUTORIAL" width="90" x="179" y="34">
            <parameter key="return_preprocessing_model" value="false"/>
            <parameter key="create_view" value="false"/>
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Passenger Class"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="remove with too many values" value="false"/>
            <parameter key="maximum number of values" value="20"/>
            <parameter key="perform encoding" value="true"/>
          </operator>
          <connect from_op="Retrieve Titanic" from_port="output" to_op="One Hot Encoding" to_port="example set input"/>
          <connect from_op="One Hot Encoding" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    

  • jbalint
    jbalint New Altair Community Member
    Thanks. I guess I can use Loop Values to get them all.
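
The idea of looping over every value to get one indicator per value can be sketched in plain Python as follows. This is only an illustration of the approach, not the Loop Values operator itself; the function name `one_hot_all_values` and the sample rows are hypothetical.

```python
def one_hot_all_values(rows, attribute):
    """Create one 0/1 indicator per distinct value of the given attribute."""
    # Collect every distinct value so that none is left out.
    values = sorted({row[attribute] for row in rows})
    encoded = []
    for row in rows:
        # Keep all other attributes and drop the original nominal one.
        new_row = {k: v for k, v in row.items() if k != attribute}
        for value in values:
            new_row[f"{attribute} = {value}"] = 1 if row[attribute] == value else 0
        encoded.append(new_row)
    return encoded


# Hypothetical sample rows, one per class.
passengers = [
    {"Passenger Class": "First"},
    {"Passenger Class": "Second"},
    {"Passenger Class": "Third"},
]
encoded = one_hot_all_values(passengers, "Passenger Class")
for row in encoded:
    print(row)
```

Note that the full set of indicators always sums to one per row, so the columns are linearly dependent, which may matter for models such as linear regression.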