parse numbers output not numerical
I am reading a .csv file that has some numbers formatted as currency, e.g. $1,000 or $500. RapidMiner reads these as polynominal, so I am using the Replace operator to remove the $ and , characters. Both removals work fine, but oddly, for sums of $999 and below, which never had a comma in them, I receive an error message: "No Number: according to the specified format, 500 cannot be parsed as a number". There are no spaces or other stray characters. Any ideas what could cause this? Thanks...
You can just be extra cautious and replace all characters that won't parse with the replace operator. It works for me on your dataset.
<?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="7.3.000" expanded="true" height="68" name="Read CSV" width="90" x="179" y="85">
<parameter key="csv_file" value="C:\Users\think\Downloads\Insurance Preparation.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="Order.true.integer.attribute"/>
<parameter key="1" value="No\. Risks.true.integer.attribute"/>
<parameter key="2" value="Value Insured.true.polynominal.attribute"/>
<parameter key="3" value="Employees.true.integer.attribute"/>
<parameter key="4" value="Rent.true.polynominal.attribute"/>
<parameter key="5" value="Preparation Time.true.real.attribute"/>
</list>
</operator>
<operator activated="true" class="replace" compatibility="7.3.000" expanded="true" height="82" name="Replace" width="90" x="313" y="85">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Rent|Value Insured"/>
<parameter key="replace_what" value="[-!&quot;#$%&amp;'()*+,/:;&lt;=&gt;?@\[\\\]_`{|}~a-zA-Z\s]"/>
</operator>
<operator activated="true" class="parse_numbers" compatibility="7.3.000" expanded="true" height="82" name="Parse Numbers" width="90" x="581" y="85">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Rent|Value Insured"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Replace" to_port="example set input"/>
<connect from_op="Replace" from_port="example set output" to_op="Parse Numbers" to_port="example set input"/>
<connect from_op="Parse Numbers" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
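For anyone who wants to see what that character class strips without opening Studio, here is a minimal Python sketch. It is not part of the process above: `clean_currency` is a hypothetical helper name, and it assumes Python's `re` module treats this particular class the same way RapidMiner's regular expressions do for these inputs. Note that the class also removes `-`, so a leading minus sign would be lost.

```python
import re

# Character class copied from the Replace operator's "replace what" parameter.
# It matches punctuation other than '.', ASCII letters, and whitespace.
NON_NUMERIC = re.compile(r"""[-!"#$%&'()*+,/:;<=>?@\[\\\]_`{|}~a-zA-Z\s]""")


def clean_currency(text: str) -> float:
    """Delete every character matched by NON_NUMERIC, then parse as a float.

    Hypothetical helper for illustration only; it mirrors Replace followed
    by Parse Numbers, but it is not RapidMiner code.
    """
    return float(NON_NUMERIC.sub("", text))


if __name__ == "__main__":
    for raw in ["$1,000", "$500", " $1,000,000 ", "$12.50"]:
        print(repr(raw), "->", clean_currency(raw))
```

Only digits and the decimal point survive the replacement, which is why values with and without a comma end up in the same parseable form.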
Thanks Thomas, yes I am using the Parse Numbers operator... that's what is giving me the error message.
I think you were referring to the decimal separator character? The trouble is that if I change that to a comma, then 1,000,000 becomes 1.000.000, which doesn't read as a number.
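To make the separator problem concrete, here is a small sketch in plain Python rather than RapidMiner. Both function names are made up for illustration, and the sketch only mimics the idea of a decimal versus a grouping character; it does not reproduce how Parse Numbers works internally.

```python
def parse_with_decimal_comma(text: str) -> float:
    """Treat ',' as the decimal separator by mapping it to '.' before parsing."""
    return float(text.replace(",", "."))


def parse_with_grouping_comma(text: str) -> float:
    """Treat ',' as a thousands separator by dropping it before parsing."""
    return float(text.replace(",", ""))


if __name__ == "__main__":
    # As a grouping character, the comma is simply discarded.
    print(parse_with_grouping_comma("1,000,000"))

    # As a decimal separator, a single comma silently changes the value.
    print(parse_with_decimal_comma("1,000"))

    # With two commas the text becomes "1.000.000", which is not a number.
    try:
        parse_with_decimal_comma("1,000,000")
    except ValueError as err:
        print("cannot parse:", err)
```

The second case is the risky one: a value with one comma still parses, just to the wrong number, so it is safer to remove the commas than to reinterpret them.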
If you enable the XML view in Studio, then you can copy the XML provided and replace the default XML, and then hit the green check mark at the top of the window. That will render the process in the main process view and you will be able to see the operators and their configuration. Sharing the raw XML is thus an easy way of sharing a RapidMiner process and you will see it commonly done this way on the community forum posts.
Certainly is easy when you know how! Many thanks Brian, I really like this one-size-fits-all replacement operator and will include it in our training.
For the benefit of others: Brian's killer replacement operator has this in the 'replace what' parameter: [-!"#$%&'()*+,/:;<=>?@\[\\\]_`{|}~a-zA-Z\s]
David
Hi,
this sounds a bit odd. Could you provide an example process? And did you try Trim to remove leading and trailing whitespace?
~Martin