[Solved] Removing a truncated URL from a full URL

Kate_Strydom New Altair Community Member
edited November 5 in Community Q&A
Good Morning,

Does anyone know how to remove a truncated URL from a full URL in RapidMiner?

I have two attributes:

truncated URL: http://www.unisa.ac.za
full URL: www.unisa.ac.za/news/index.php/2014/03/changes-to-mayjune-2014-examination-period/

I would like the new attribute to contain:
/news/index.php/2014/03/changes-to-mayjune-2014-examination-period

I have converted the attributes from nominal to text attributes.

Thanks for the help.

Regards
Kate

Answers

  • Marco_Boeck New Altair Community Member
    Hi,

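    I have created a small process that does this. Note that in your example the truncated URL starts with http:// while the full URL does not. I have assumed that your actual data does not mix different prefixes. If you need to remove the http:// first, just add another "Replace" operator right after retrieving your data and replace 'http://' with an empty string. The process below also assumes that the two attributes are named short_url and full_url and that the data is stored in a repository entry named "data"; adjust these to match your own data.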

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.2.000-SNAPSHOT">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.2.000-SNAPSHOT" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="6.2.000-SNAPSHOT" expanded="true" height="60" name="Retrieve data" width="90" x="45" y="30">
            <parameter key="repository_entry" value="data"/>
          </operator>
          <operator activated="true" class="loop_examples" compatibility="6.2.000-SNAPSHOT" expanded="true" height="76" name="Loop Examples" width="90" x="179" y="30">
            <parameter key="iteration_macro" value="index"/>
            <process expanded="true">
              <operator activated="true" class="extract_macro" compatibility="6.2.000-SNAPSHOT" expanded="true" height="60" name="Extract Macro" width="90" x="112" y="30">
                <parameter key="macro" value="to_replace"/>
                <parameter key="macro_type" value="data_value"/>
                <parameter key="attribute_name" value="short_url"/>
                <parameter key="example_index" value="%{index}"/>
                <list key="additional_macros"/>
              </operator>
              <operator activated="true" class="replace" compatibility="6.2.000-SNAPSHOT" expanded="true" height="76" name="Replace" width="90" x="246" y="30">
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="full_url"/>
                <parameter key="replace_what" value="%{to_replace}"/>
              </operator>
              <connect from_port="example set" to_op="Extract Macro" to_port="example set"/>
              <connect from_op="Extract Macro" from_port="example set" to_op="Replace" to_port="example set input"/>
              <connect from_op="Replace" from_port="example set output" to_port="example set"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve data" from_port="output" to_op="Loop Examples" to_port="example set"/>
          <connect from_op="Loop Examples" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
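    For the case where the truncated URLs do start with http://, the extra "Replace" operator mentioned above might look roughly like the fragment below. This is a sketch that has not been run against real data: the operator name "Strip http" and its x/y position are made up for illustration, and the parameters simply mirror the "Replace" operator inside the loop (leaving replace_by unset, as that operator does, so matches are replaced with an empty string). It belongs in the outer process, and the two connect lines take the place of the original direct connection from "Retrieve data" to "Loop Examples".

    ```xml
    <!-- Hypothetical operator "Strip http": removes the http:// prefix from short_url -->
    <operator activated="true" class="replace" compatibility="6.2.000-SNAPSHOT" expanded="true" height="76" name="Strip http" width="90" x="112" y="120">
      <parameter key="attribute_filter_type" value="single"/>
      <parameter key="attribute" value="short_url"/>
      <parameter key="replace_what" value="http://"/>
    </operator>
    <!-- These two lines replace the original Retrieve data -> Loop Examples connection -->
    <connect from_op="Retrieve data" from_port="output" to_op="Strip http" to_port="example set input"/>
    <connect from_op="Strip http" from_port="example set output" to_op="Loop Examples" to_port="example set"/>
    ```

    Keep in mind that "replace what" is treated as a regular expression, so the %{to_replace} macro in the loop is matched as a pattern rather than as literal text. The dots in www.unisa.ac.za are harmless here, but if your URLs contain characters such as ? or +, wrapping the macro as \Q%{to_replace}\E should make the match literal.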
    Regards,
    Marco
  • Kate_Strydom New Altair Community Member
    Hi Marco,

    Thanks so much. The code works well.

    Regards
    Kate