Import Integer in e notation

mataio New Altair Community Member
edited November 5 in Community Q&A
Hello everybody,

I have a rather simple question, but I can't find an answer, so I'll try here.

I have a simple RapidMiner process (Read CSV + Store) that imports data from a text file. The data includes a column of very small numbers such as 1.23e-5. No matter which settings I try in the Import Configuration Wizard, the column is imported as "?".

Does anyone have an idea how I can pass the data correctly to RapidMiner?

Thanks for your help :)

Answers

  • homburg New Altair Community Member
    Hello mataio,

    The reason for your problem is the lowercase "e" in your numbers. You can solve it by loading the numerical column as nominal, replacing all occurrences of "e" with "E", and finally parsing the result to numbers. Please have a look at the following process (just paste it into the XML tab of RapidMiner):
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.008">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="6.0.008" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="read_csv" compatibility="6.0.008" expanded="true" height="60" name="Read CSV" width="90" x="112" y="75">
           <parameter key="csv_file" value="C:\Users\hhomburg\Documents\num_test.csv"/>
           <parameter key="column_separators" value=","/>
           <parameter key="first_row_as_names" value="false"/>
           <list key="annotations"/>
           <parameter key="encoding" value="windows-1252"/>
           <list key="data_set_meta_data_information">
             <parameter key="0" value="att1.true.polynominal.attribute"/>
             <parameter key="1" value="att2.true.polynominal.attribute"/>
           </list>
         </operator>
         <operator activated="true" class="replace" compatibility="6.0.008" expanded="true" height="76" name="Replace" width="90" x="246" y="75">
           <parameter key="attribute_filter_type" value="single"/>
           <parameter key="attribute" value="att2"/>
           <parameter key="replace_what" value="e"/>
           <parameter key="replace_by" value="E"/>
         </operator>
         <operator activated="true" class="parse_numbers" compatibility="6.0.008" expanded="true" height="76" name="Parse Numbers" width="90" x="380" y="75">
           <parameter key="attribute_filter_type" value="single"/>
           <parameter key="attribute" value="att2"/>
           <parameter key="unparsable_value_handling" value="replace with missing values"/>
         </operator>
         <connect from_op="Read CSV" from_port="output" to_op="Replace" to_port="example set input"/>
         <connect from_op="Replace" from_port="example set output" to_op="Parse Numbers" to_port="example set input"/>
         <connect from_op="Parse Numbers" from_port="example set output" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
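    A more cautious variant of the Replace step above would restrict the pattern to an "e" that sits between a digit and the exponent, so that any other text in the column stays untouched. This is only a sketch and not part of the tested process: it assumes that "replace what" is interpreted as a regular expression and that capturing groups can be referenced as $1 and $2 in "replace by", so please check this against your RapidMiner version.
         <operator activated="true" class="replace" compatibility="6.0.008" expanded="true" height="76" name="Replace" width="90" x="246" y="75">
           <parameter key="attribute_filter_type" value="single"/>
           <parameter key="attribute" value="att2"/>
           <!-- assumed regex: a digit, a lowercase e, then an optionally signed digit -->
           <parameter key="replace_what" value="(\d)e([+-]?\d)"/>
           <!-- keep both captured parts and swap only the e for an E -->
           <parameter key="replace_by" value="$1E$2"/>
         </operator>
    With this pattern, 1.23e-5 should become 1.23E-5, while a value without a digit on both sides of the "e" is left as it is.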
    Please note: because your numbers are very small, increase the rapidminer.general.fractiondigits.numbers setting in RapidMiner to a value greater than 3 so that enough fraction digits are displayed. You will find this setting under Tools -> Preferences.

    Cheers,
       Helge