Parse Nominal ERROR
Hello!
I am trying to parse a nominal attribute to numerical, but there seems to be a format issue. When I use the "Nominal to Numerical" operator, I get one new column for each value of the attribute I am trying to parse.
If I use the "Parse Numbers" operator, it shows the error: "The setup does not seem to contain any obvious errors, but you should check the log messages or activate the debug mode in the settings dialog in order to get more information about this problem"
Any ideas?
Thanks in advance.
Answers
I forgot to mention: my nominal attribute contains numbers, and the decimal separator is a comma. I specify this in the operator.
Hello @agurruchaga, welcome to the community! I'd recommend posting your XML process here (see https://youtu.be/KkgB5QXWXJ8 and "Read Before Posting" on the right when you reply) and attaching your dataset. This way we can replicate what you're doing and help you better.
Scott
Hi @agurruchaga,
I will try to provide part of an answer:
"When I use "Nominal to Numerical" Operator, I get one new column for each value of the attribute i am trying to parse"
It seems that you have set the "coding type" parameter of the "Nominal to Numerical" operator to dummy coding. In that case, one new column per value is the expected behavior.
You can instead set the coding type to unique integers. In this case, your attribute values will be replaced by numeric values.
For example, consider an attribute called Color whose possible values are Red 12, Blue 24 and Green 35.
After the "Nominal to Numerical" transformation, the attribute values will be 0, 1 and 2.
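The difference between the two coding types can be sketched in a few lines of Python. This is only an illustration of the idea, not RapidMiner's implementation; the helper names `unique_integer_coding` and `dummy_coding` are hypothetical, and the order in which RapidMiner assigns the integers may differ from the first-appearance order assumed here.

```python
# Illustrative sketch only, not RapidMiner code.
# Assumption: integers are assigned in order of first appearance.

def unique_integer_coding(values):
    """Replace each distinct nominal value by one integer (0, 1, 2, ...)."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return [mapping[v] for v in values], mapping


def dummy_coding(name, values):
    """Create one 0/1 column per distinct value, which is why extra columns appear."""
    categories = list(dict.fromkeys(values))  # distinct values, order preserved
    return {f"{name} = {c}": [1 if v == c else 0 for v in values] for c in categories}


colors = ["Red 12", "Blue 24", "Green 35", "Blue 24"]
codes, mapping = unique_integer_coding(colors)
print(codes)  # [0, 1, 2, 1]
print(list(dummy_coding("Color", colors)))  # ['Color = Red 12', 'Color = Blue 24', 'Color = Green 35']
```

Note that unique integers only label the categories; they do not recover the numbers contained in the text (12, 24, 35), so this coding is not a way to parse numeric strings.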
Is this what you are looking for?
Regards,
Lionel
Thanks, both. I now see that I should not use Nominal to Numerical; it isn't what I am looking for. The problem is that "Parse Numbers" throws the error I mentioned in my first post. Here is the process XML:
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="set_macro" compatibility="8.0.001" expanded="true" height="68" name="Fecha de datos" width="90" x="45" y="34">
<parameter key="macro" value="fecha_datos"/>
<parameter key="value" value="20171231"/>
</operator>
<operator activated="true" class="subprocess" compatibility="8.0.001" expanded="true" height="82" name="Importar y tratar" width="90" x="45" y="136">
<process expanded="true">
<operator activated="true" class="jdbc_connectors:read_access" compatibility="8.0.001" expanded="true" height="68" name="Importar BBDD Lecturas" width="90" x="45" y="34">
<parameter key="define_connection" value="url"/>
<parameter key="database_system" value="UCanAccess"/>
<parameter key="database_url" value="jdbc:ucanaccess://Z:\ITALIA\Medidas\NON ORARI\LecturasItalia.accdb;MirrorFolder=java.io.tmpdir;jackcessOpener=com.rapidminer.jdbc.AccessCryptCodecOpener"/>
<parameter key="username" value="noUser"/>
<parameter key="password" value="WYO/VTwMmmI1YQyZ9ygN6w=="/>
<parameter key="define_query" value="table name"/>
<parameter key="use_default_schema" value="true"/>
<parameter key="table_name" value="LECT_FACTURA_DISTR"/>
<parameter key="prepare_statement" value="false"/>
<enumeration key="parameters"/>
<parameter key="datamanagement" value="double_array"/>
<parameter key="data_management" value="auto"/>
<parameter key="database_file" value="Z:\ITALIA\Medidas\NON ORARI\LecturasItalia.accdb"/>
<description align="center" color="transparent" colored="false" width="126"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="8.0.001" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
<list key="function_descriptions">
<parameter key="DataMisura_num" value="concat(cut(DataMisura,6,4),cut(DataMisura,3,2),cut(DataMisura,0,2))"/>
</list>
<parameter key="keep_all" value="true"/>
</operator>
<operator activated="true" class="parse_numbers" compatibility="8.0.001" expanded="true" height="82" name="Parse Numbers" width="90" x="313" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="EaF1"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="nominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="file_path"/>
<parameter key="block_type" value="single_value"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="single_value"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="decimal_character" value=","/>
<parameter key="grouped_digits" value="false"/>
<parameter key="grouping_character" value=","/>
<parameter key="unparsable_value_handling" value="skip attribute"/>
</operator>
<connect from_op="Importar BBDD Lecturas" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Parse Numbers" to_port="example set input"/>
<connect from_op="Parse Numbers" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="set_macro" compatibility="8.0.001" expanded="true" height="68" name="Fecha_ant" width="90" x="179" y="34">
<parameter key="macro" value="fecha_ant"/>
<parameter key="value" value="20171130"/>
</operator>
<operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="103" name="Multiply" width="90" x="179" y="136"/>
<operator activated="true" class="filter_examples" compatibility="8.0.001" expanded="true" height="103" name="Filter Examples (2)" width="90" x="447" y="340">
<parameter key="parameter_expression" value="DataMisura_num &lt; %{fecha_datos}"/>
<parameter key="condition_class" value="expression"/>
<parameter key="invert_filter" value="false"/>
<list key="filters_list">
<parameter key="filters_entry_key" value="DataMisura.lt.12/31/2017"/>
</list>
<parameter key="filters_logic_and" value="true"/>
<parameter key="filters_check_metadata" value="true"/>
</operator>
<operator activated="true" class="filter_examples" compatibility="8.0.001" expanded="true" height="103" name="Filter Examples (4)" width="90" x="581" y="340">
<parameter key="parameter_expression" value="DataMisura_num &lt; %{fecha_datos}"/>
<parameter key="condition_class" value="custom_filters"/>
<parameter key="invert_filter" value="false"/>
<list key="filters_list">
<parameter key="filters_entry_key" value="FlussoMisure.equals.PNO"/>
</list>
<parameter key="filters_logic_and" value="true"/>
<parameter key="filters_check_metadata" value="true"/>
</operator>
<connect from_op="Importar y tratar" from_port="out 1" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 2" to_op="Filter Examples (2)" to_port="example set input"/>
<connect from_op="Filter Examples (2)" from_port="example set output" to_op="Filter Examples (4)" to_port="example set input"/>
<connect from_op="Filter Examples (4)" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
The column "EaF1" contains numbers with a comma as the decimal character, e.g. 238,60.
Thanks.
OK, it seems the Parse Numbers operator doesn't accept blank values. I created another attribute that converts blanks to 0, and it works now (although it isn't the nicest solution, IMO). Thanks for your answers.
Hi @agurruchaga,
For handling those blanks, the "unparsable value handling" parameter is available. Assuming those blank cells represent missing entries, you could set this parameter to "replace with missing values". In further processing, you could then apply the Replace Missing Values operator to handle them.
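To illustrate what that parameter choice does, here is a minimal Python sketch. It is not RapidMiner code, and the function names `parse_decimal_comma` and `replace_missing` are hypothetical. It parses comma-decimal strings such as 238,60, turns blank or unparsable cells into missing values (`None`) instead of failing, and fills the missing values in a separate step, similar in spirit to applying Replace Missing Values afterwards.

```python
# Illustrative sketch only, not RapidMiner code.
# Assumption: ',' is the decimal character and no digit grouping is used.

def parse_decimal_comma(text):
    """Parse a string such as '238,60' into 238.6; return None if it cannot be parsed."""
    if text is None:
        return None
    try:
        # float() ignores surrounding whitespace but rejects '' and '   '
        return float(text.replace(",", "."))
    except ValueError:
        return None  # blank or unparsable cell -> missing value


def replace_missing(values, replacement=0.0):
    """Second step: fill missing values, e.g. with 0 as in the workaround above."""
    return [replacement if v is None else v for v in values]


raw = ["238,60", "", "12,5", "n/a"]
parsed = [parse_decimal_comma(v) for v in raw]
print(parsed)  # [238.6, None, 12.5, None]
print(replace_missing(parsed))  # [238.6, 0.0, 12.5, 0.0]
```

Keeping the two steps separate means the parsed attribute still records which cells were originally empty, and the choice of replacement value (0, mean, etc.) can be made later.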
Sidenote:
Depending on the source of your data, there may sometimes be leading or trailing blanks in those cells. These can be removed with the Trim operator.
Best regards,
Edin