[SOLVED] Convert binominal to numeric?

wessel
wessel New Altair Community Member
edited November 5 in Community Q&A
Dear All,

I have several binominal attributes on which I want to run linear regression.
To do that, I must convert these binominal attributes with values "true" and "false" to real attributes with values "1" and "0".
How can I do this?

I tried the Generate Attributes operator, but this did not work.
I used the following settings:
attribute name: myNewAtt
functional expression: if(myAtt == true, 1, 0)

Even though this expression is functionally correct, it always returns 0.

Best regards,

Wessel

Answers

  • wessel
    wessel New Altair Community Member
    The following process does work. It uses three operators:
    1. Replace (replaces all "true" values with 1)
    2. Replace (replaces all "false" values with 0)
    3. Parse Numbers (converts the resulting nominal attributes to numerical ones)

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.017">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
       <process expanded="true" height="642" width="778">
         <operator activated="true" class="replace" compatibility="5.1.017" expanded="true" height="76" name="Replace" width="90" x="59" y="140">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="|cluster_2|cluster_1|cluster_0"/>
           <parameter key="replace_what" value="true"/>
           <parameter key="replace_by" value="1"/>
         </operator>
         <operator activated="true" class="replace" compatibility="5.1.017" expanded="true" height="76" name="Replace (2)" width="90" x="187" y="85">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="|cluster_2|cluster_1|cluster_0"/>
           <parameter key="replace_what" value="false"/>
           <parameter key="replace_by" value="0"/>
         </operator>
         <operator activated="true" class="parse_numbers" compatibility="5.1.017" expanded="true" height="76" name="Parse Numbers" width="90" x="315" y="30">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="|cluster_2|cluster_1|cluster_0"/>
         </operator>
         <connect from_op="Replace" from_port="example set output" to_op="Replace (2)" to_port="example set input"/>
         <connect from_op="Replace (2)" from_port="example set output" to_op="Parse Numbers" to_port="example set input"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
       </process>
     </operator>
    </process>
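
For readers who want to see what the three steps compute, here is a minimal sketch in plain Python, outside RapidMiner. It is an illustration only: the function name `binominal_to_numeric` and the row values are made up, and it assumes each value is exactly "true" or "false" (whole-value matches rather than pattern replacement).

```python
# Illustration only: mimics Replace -> Replace (2) -> Parse Numbers on plain lists.
# The column names come from the process above; the row values are invented.

def binominal_to_numeric(values, true_label="true", false_label="false"):
    """Map the two nominal labels to 1.0 / 0.0 (hypothetical helper)."""
    replaced = []
    for v in values:
        # Steps 1 and 2: replace the labels with the strings "1" and "0".
        if v == true_label:
            replaced.append("1")
        elif v == false_label:
            replaced.append("0")
        else:
            raise ValueError(f"unexpected label: {v!r}")
    # Step 3: parse the strings into numbers.
    return [float(s) for s in replaced]


rows = {
    "cluster_0": ["true", "false", "false"],
    "cluster_1": ["false", "true", "false"],
    "cluster_2": ["false", "false", "true"],
}
numeric = {name: binominal_to_numeric(col) for name, col in rows.items()}
print(numeric["cluster_0"])  # [1.0, 0.0, 0.0]
```

The intermediate strings are kept on purpose, to show why a separate parsing step is needed after the two replacements.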
  • earmijo
    earmijo New Altair Community Member
    Hi Wessel:

    Two additional solutions:

    1) Use Weka's Linear Regression operator. It will code the binominal attributes for you automatically, which is very convenient.

    2) Use the "Nominal to Numerical" operator and select dummy coding. You then have to define a "comparison group" for each binominal variable; that value will be coded 0. Going by your message, the comparison group would be "false".

    Regards,

    \E

    Here's an example that uses the Golf dataset:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.017">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
        <process expanded="true" height="637" width="950">
          <operator activated="true" class="retrieve" compatibility="5.1.017" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.1.017" expanded="true" height="94" name="Nominal to Numerical" width="90" x="182" y="72">
            <parameter key="coding_type" value="dummy coding"/>
            <parameter key="use_comparison_groups" value="true"/>
            <list key="comparison_groups">
              <parameter key="Wind" value="false"/>
              <parameter key="Outlook" value="sunny"/>
            </list>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
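
To make the idea of a comparison group concrete, here is a small sketch in plain Python, outside RapidMiner. It is an assumption-laden illustration, not the operator's actual implementation: `dummy_code` is a hypothetical helper, the rows are invented, and the names of the output columns will differ from whatever RapidMiner generates.

```python
# Illustration only: dummy coding with a comparison group on plain lists.

def dummy_code(values, comparison_group):
    """Return one 0/1 column per category except the comparison group.

    Rows belonging to the comparison group get 0 in every column.
    """
    categories = sorted(set(values) - {comparison_group})
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
    }


wind = ["false", "true", "false", "true"]          # invented rows
outlook = ["sunny", "overcast", "rain", "sunny"]   # invented rows

wind_coded = dummy_code(wind, comparison_group="false")
outlook_coded = dummy_code(outlook, comparison_group="sunny")
print(wind_coded)     # {'true': [0, 1, 0, 1]}
print(outlook_coded)  # {'overcast': [0, 1, 0, 0], 'rain': [0, 0, 1, 0]}
```

A binominal attribute therefore yields a single 0/1 column, while an attribute with three values, such as Outlook, yields two.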