Scientific Notation for very small numbers 1E-12

dragoljub New Altair Community Member
edited November 5 in Community Q&A
I have imported some data from a CSV file using the AML operator. Some of the columns contain very small values, on the order of 1E-12.

I noticed that in the results view all very small numbers are shown as zeros. Even in the meta data view the statistics are all zero. However, when you copy and paste an entry, you can see that the correct E-12 number is stored there.

Does RapidMiner use these numbers correctly (in the 1E-10 to 1E-12 range), or do the processing operators treat them as zero? I suppose I could scale everything up by some constant, but is that necessary?

Also, is there any way to show scientific notation in the results view?

Thanks,
-Gagi

Answers

  • dragoljub New Altair Community Member
    I have also noticed that this could be a problem when using the 'Remove Useless' operator. For very small numbers the statistics do not seem to be calculated correctly: the values appear to be interpreted as zero rather than as normalized values.

    -Gagi
  • haddock New Altair Community Member
    Hi there,

    In RapidMiner reals really are reals; they are only rounded for display, according to the 'fractiondigits.number' preference setting. As for imposing scientific notation or other formats, the process below uses the Format Numbers operator with the pattern 0.###E0:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="206" width="681">
          <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="111" y="67">
            <parameter key="attributes_lower_bound" value="-1.0E-100"/>
            <parameter key="attributes_upper_bound" value="1.0E-100"/>
          </operator>
          <operator activated="true" class="format_numbers" expanded="true" height="76" name="Format Numbers" width="90" x="313" y="75">
            <parameter key="format_type" value="pattern"/>
            <parameter key="pattern" value="0.###E0"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Format Numbers" to_port="example set input"/>
          <connect from_op="Format Numbers" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
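
To illustrate haddock's point outside of RapidMiner, here is a minimal Python sketch. It is not RapidMiner code: it assumes the values are stored as ordinary double-precision floats, and the helper names show_fixed and show_scientific are made up for this example. It shows that a fixed number of fraction digits makes E-12 values look like zero, while the stored values are untouched and scientific notation displays them properly. The 0.###E0 pattern in the process above appears to follow Java's DecimalFormat conventions, which would drop trailing zeros; the Python format used here always prints three fraction digits.

```python
# Illustration only: plain Python floats (IEEE 754 doubles), not RapidMiner code.
values = [1.5e-12, 3.2e-11, 7.0e-10]


def show_fixed(x, digits=3):
    """Render x with a fixed number of fraction digits (hypothetical helper)."""
    return f"{x:.{digits}f}"


def show_scientific(x, digits=3):
    """Render x in scientific notation with `digits` fraction digits (hypothetical helper)."""
    return f"{x:.{digits}E}"


fixed = [show_fixed(v) for v in values]
scientific = [show_scientific(v) for v in values]

print(fixed)        # every entry is displayed as 0.000
print(scientific)   # the same values shown in scientific notation
print(sum(values))  # arithmetic still uses the real, non-zero values
```

Only the rendering changes between the two lists. The numbers themselves are never rounded, so scaling by a constant is not needed just to keep them from becoming zero.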

  • land New Altair Community Member
    Hi,
    In addition to what haddock said: the Remove Useless operator uses the standard deviation of an attribute's values to decide whether the attribute is useless. If your numbers are very small, you will have to lower the threshold accordingly.
    I think it would be smarter to use a threshold weighted by the mean. In any case, the Remove Useless operator should be avoided, if possible, for attributes that have differing values at all. A learner-based attribute selection is far preferable.

    Greetings,
      Sebastian
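
The threshold issue Sebastian describes can be sketched in a few lines of Python. This is only an illustration under assumptions, not the operator's actual implementation: the function names, the "less than or equal" comparison, and the mean-weighted variant (standard deviation divided by the absolute mean) are all hypothetical choices made for this example.

```python
import statistics


def useless_by_std(column, threshold):
    """Flag a column whose standard deviation is at or below the threshold.

    Assumed rule for illustration; the real operator may differ in detail.
    """
    return statistics.pstdev(column) <= threshold


def relative_std(column):
    """Standard deviation divided by |mean| (hypothetical mean-weighted measure)."""
    mean = statistics.mean(column)
    if mean == 0:
        raise ValueError("relative deviation is undefined for a zero mean")
    return statistics.pstdev(column) / abs(mean)


column = [1.0e-12, 2.0e-12, 3.0e-12, 4.0e-12]
scaled = [v * 1e12 for v in column]  # same data, scaled up by a constant

# With an absolute threshold, the E-12 column looks constant unless the
# threshold is lowered to the scale of the data.
print(useless_by_std(column, 1e-6))    # flagged as useless
print(useless_by_std(column, 1e-13))   # kept once the threshold is lowered
print(useless_by_std(scaled, 1e-6))    # kept after scaling

# The mean-weighted measure does not depend on the scale of the data.
print(relative_std(column), relative_std(scaled))
```

Under these assumptions, either lowering the threshold or scaling the data gives the same decision, and a mean-weighted measure makes the choice of scale irrelevant.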