[Solved] Removing a truncated URL from a full URL
Kate_Strydom
New Altair Community Member
Good Morning,
Does anyone know how to remove a truncated URL from a full URL in RapidMiner?
I have two attributes:
truncated URL: http://www.unisa.ac.za
full URL: www.unisa.ac.za/news/index.php/2014/03/changes-to-mayjune-2014-examination-period/
I would like the new attribute to have
/news/index.php/2014/03/changes-to-mayjune-2014-examination-period
I have converted the attributes from nominal to text attributes.
Thanks for the help.
Regards
Kate
Answers
Hi,
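I have created a small process that does this. It loops over the examples, extracts each row's truncated URL into a macro, and removes that value from the full URL with a "Replace" operator. The process expects the attributes to be named short_url and full_url. Note that in your example the truncated URL starts with http:// while the full URL does not; I have assumed that your actual data does not have this mismatch. If you do need to remove the http:// first, add another "Replace" operator after retrieving your data and replace 'http://' with an empty string.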
Regards,
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.2.000-SNAPSHOT">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.2.000-SNAPSHOT" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="6.2.000-SNAPSHOT" expanded="true" height="60" name="Retrieve data" width="90" x="45" y="30">
        <parameter key="repository_entry" value="data"/>
      </operator>
      <operator activated="true" class="loop_examples" compatibility="6.2.000-SNAPSHOT" expanded="true" height="76" name="Loop Examples" width="90" x="179" y="30">
        <parameter key="iteration_macro" value="index"/>
        <process expanded="true">
          <operator activated="true" class="extract_macro" compatibility="6.2.000-SNAPSHOT" expanded="true" height="60" name="Extract Macro" width="90" x="112" y="30">
            <parameter key="macro" value="to_replace"/>
            <parameter key="macro_type" value="data_value"/>
            <parameter key="attribute_name" value="short_url"/>
            <parameter key="example_index" value="%{index}"/>
            <list key="additional_macros"/>
          </operator>
          <operator activated="true" class="replace" compatibility="6.2.000-SNAPSHOT" expanded="true" height="76" name="Replace" width="90" x="246" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="full_url"/>
            <parameter key="replace_what" value="%{to_replace}"/>
          </operator>
          <connect from_port="example set" to_op="Extract Macro" to_port="example set"/>
          <connect from_op="Extract Macro" from_port="example set" to_op="Replace" to_port="example set input"/>
          <connect from_op="Replace" from_port="example set output" to_port="example set"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_example set" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve data" from_port="output" to_op="Loop Examples" to_port="example set"/>
      <connect from_op="Loop Examples" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
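The optional extra step mentioned above (stripping http:// before the loop) could look roughly like the fragment below. This is only a sketch and was not part of the tested process: the operator name "Remove http prefix" and its x/y position are made up for illustration, and it assumes the prefix only needs to be removed from short_url. The operator class, parameter keys, and port names are copied from the process above.

```xml
<!-- Hypothetical extra operator, placed between "Retrieve data" and "Loop Examples" -->
<operator activated="true" class="replace" compatibility="6.2.000-SNAPSHOT" expanded="true" height="76" name="Remove http prefix" width="90" x="112" y="120">
  <parameter key="attribute_filter_type" value="single"/>
  <parameter key="attribute" value="short_url"/>
  <parameter key="replace_what" value="http://"/>
  <!-- No replace_by is given, as in the inner Replace operator, so matches are replaced with nothing -->
</operator>
<!-- These two connections would take the place of the direct "Retrieve data" to "Loop Examples" connection -->
<connect from_op="Retrieve data" from_port="output" to_op="Remove http prefix" to_port="example set input"/>
<connect from_op="Remove http prefix" from_port="example set output" to_op="Loop Examples" to_port="example set"/>
```

As far as I know, replace_what is interpreted as a regular expression, so writing it as ^http:// should restrict the match to the start of the value if that matters for your data.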
Marco
Hi Marco,
Thanks so much. The code works well.
Regards
Kate