Trim Not Working
Hi All,
I am trying to use the Trim operator to remove a space at the start of my attribute values,
but it doesn't seem to be working. I am using v7.5.
<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.5.001" expanded="true" height="68" name="Retrieve CountryAndGPName" width="90" x="246" y="85">
<parameter key="repository_entry" value="../Data/CountryAndGPName"/>
</operator>
<operator activated="true" breakpoints="before,after" class="trim" compatibility="7.5.001" expanded="true" height="82" name="Trim" width="90" x="447" y="85">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Country"/>
</operator>
<connect from_op="Retrieve CountryAndGPName" from_port="output" to_op="Trim" to_port="example set input"/>
<connect from_op="Trim" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
I'm not on my regular machine, so I can't import the XML, but one note of caution: Trim only works with the polynominal data type. If your values are numbers with stray spaces, I'd suggest converting them to polynominal, applying Trim, and then converting back to numerical.
Hi,
It looks like the whitespace in front of your data points is not real whitespace. When I import the data as UTF-8, I get a weird symbol, which indicates a character that cannot be displayed. Unless you know exactly what this character is, I think the simplest way is to use the "Replace" operator with a regular expression. See if the one below works for you:
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" breakpoints="before" class="replace" compatibility="7.6.001" expanded="true" height="82" name="Replace" width="90" x="313" y="136">
<parameter key="replace_what" value="[^\u0000-\u007F]+"/>
</operator>
<connect from_op="Replace" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
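To see what the `replace_what` pattern above matches, here is a minimal Python sketch run outside RapidMiner. The sample values are hypothetical, and the narrower "leading whitespace only" pattern is an assumption of mine, not something from the process above. It shows that `[^\u0000-\u007F]+` removes every non-ASCII character, which includes the invisible leading character but also accented letters.

```python
import re

# Hypothetical sample values; "\u00a0" stands in for the invisible leading character.
values = ["\u00a0France", "\u00a0C\u00f4te d'Ivoire"]

# Same pattern as replace_what above: any run of non-ASCII characters.
non_ascii = re.compile(r"[^\u0000-\u007F]+")
stripped_all = [non_ascii.sub("", v) for v in values]

# Narrower alternative (an assumption, not from the thread): remove only
# leading whitespace, explicitly including the non-breaking space.
leading_ws = re.compile(r"^[\s\u00A0]+")
stripped_leading = [leading_ws.sub("", v) for v in values]

print(stripped_all)      # the accented letter is removed too
print(stripped_leading)  # the accented letter is kept
```

If the attribute may contain legitimate non-ASCII text, a pattern limited to the start of the value is probably the safer choice.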
Haha @FBT, I was working on the same thing at the same time. It's a non-breaking space (Unicode U+00A0, which URL-encodes to %C2%A0 in UTF-8). Trim will not take care of this, but this will do the trick:
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve CountryAndGPName (2)" width="90" x="246" y="85">
<parameter key="repository_entry" value="//Google Drive/RapidMiner/CountryAndGPName"/>
</operator>
<operator activated="true" class="web:encode_urls" compatibility="7.3.000" expanded="true" height="82" name="Encode URLs" width="90" x="380" y="85">
<parameter key="url_attribute" value="Country"/>
</operator>
<operator activated="true" class="replace" compatibility="7.6.001" expanded="true" height="82" name="Replace" width="90" x="514" y="85">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Country"/>
<parameter key="replace_what" value="%C2%A0"/>
</operator>
<connect from_op="Retrieve CountryAndGPName (2)" from_port="output" to_op="Encode URLs" to_port="example set input"/>
<connect from_op="Encode URLs" from_port="example set output" to_op="Replace" to_port="example set input"/>
<connect from_op="Replace" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Scott
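As a rough illustration of where `%C2%A0` comes from, this Python sketch (run outside RapidMiner, with a hypothetical sample value) identifies the leading character and mimics the encode-then-replace idea. It assumes standard percent-encoding of UTF-8 bytes; the Encode URLs operator may encode other characters differently.

```python
import unicodedata
from urllib.parse import quote, unquote

value = "\u00a0United Kingdom"  # hypothetical sample value

# Identify the invisible leading character.
first = value[0]
name = unicodedata.name(first)   # official Unicode name of the character
encoded_char = quote(first)      # percent-encoded UTF-8 bytes

# Encode the whole value, drop the encoded non-breaking space, decode again.
encoded = quote(value)
cleaned = unquote(encoded.replace("%C2%A0", ""))

print(name, encoded_char)
print(encoded)
print(cleaned)
```

Note that encoding also rewrites ordinary spaces and other special characters in the value, so in this sketch the result is decoded again after the replacement.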