"[SOLVED] How to Split an Attribute and Keep the Split Character?"

jan_kvacek
New Altair Community Member
Updated by Jocelyn
Hello!

I am having trouble splitting attributes in RapidMiner Studio. My attribute looks like this:

"A002W0541G001"

I need to split it into several new attributes:

"A002"  "W0541"  "G001"  and so on.

But Split always drops the character I use to determine where to split the original attribute. Is there any way to keep it?

Thank you for your help!

Jan

    If it's always 4 chars, 5 chars, 5 chars, you might simply use Generate Attributes with cut?
    Martin Schmitz wrote:

    If it's always 4 chars, 5 chars, 5 chars, you might simply use Generate Attributes with cut?
    Unfortunately it is not. I need to do something like "find a letter, take the letter and all the numbers after it, and make that a new attribute".
    It sounds like you just need the right RegEx. 
    Assuming you have a pattern of [Letter+Numbers][Letter+Numbers] then this works: "(?<=[0-9]++)(.*?)(?=[A-Z])"
    Positive lookbehind to check there are numbers before, lookahead to check for the letter.  Anything in between is used to split.

    Sample process below:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.1.000-BETA">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.1.000-BETA" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <parameter key="parallelize_main_process" value="false"/>
        <process expanded="true">
          <operator activated="true" class="generate_data_user_specification" compatibility="7.1.000-BETA" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="112" y="85">
            <list key="attribute_values">
              <parameter key="myData" value="&quot;A002W0541G001&quot;"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="generate_data_user_specification" compatibility="7.1.000-BETA" expanded="true" height="68" name="Generate Data by User Specification (2)" width="90" x="112" y="187">
            <list key="attribute_values">
              <parameter key="myData" value="&quot;A02202W0541G001G002231&quot;"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="append" compatibility="7.1.000-BETA" expanded="true" height="103" name="Append" width="90" x="313" y="85">
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="merge_type" value="all"/>
          </operator>
          <operator activated="true" class="split" compatibility="7.1.000-BETA" expanded="true" height="82" name="Split" width="90" x="447" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="myData"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="split_pattern" value="(?&lt;=[0-9]++)(.*?)(?=[A-Z])"/>
            <parameter key="split_mode" value="ordered_split"/>
          </operator>
          <connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
          <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
          <connect from_op="Append" from_port="merged set" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
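
The idea behind the split pattern in the process above can be tried outside RapidMiner. The following is a minimal sketch in Python, not the Split operator itself. It assumes the simpler zero-width pattern `(?<=[0-9])(?=[A-Z])` (a position with a digit before it and an uppercase letter after it), which is a substitute for the posted pattern because Python's `re` module does not accept variable-length lookbehind. The function name `split_keep_prefix` is hypothetical.

```python
import re

# Zero-width boundary: a digit on the left, an uppercase letter on the right.
# Nothing is consumed by the match, so no character is dropped by the split.
BOUNDARY = re.compile(r"(?<=[0-9])(?=[A-Z])")


def split_keep_prefix(value):
    """Split a code such as 'A002W0541G001' in front of every letter."""
    # re.split handles empty matches on Python 3.7 and later.
    return BOUNDARY.split(value)


if __name__ == "__main__":
    print(split_keep_prefix("A002W0541G001"))
    # ['A002', 'W0541', 'G001']
    print(split_keep_prefix("A02202W0541G001G002231"))
    # ['A02202', 'W0541', 'G001', 'G002231']
```

The two inputs are the same values the sample process generates. Whether the shorter pattern behaves identically inside the Split operator is an assumption that should be checked in RapidMiner.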
    On a side note... does anyone happen to know the right RegEx to split into n-grams?  I want one that splits a nominal value like "RapidMiner" into "Ra ap pi id dM Mi in ne er".  When I try it I always get "Ra pi dM in er", which isn't right.  I wrote a rather complex loop to do it instead, but would prefer to do it with one operator.
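
A possible reason for the "Ra pi dM in er" result is that each regex match consumes its two characters, so the next match can only start after them. One common workaround is to put the capturing group inside a lookahead, so the match itself is empty and the next attempt starts one character later. The sketch below shows this in Python rather than in a RapidMiner operator. The function name `char_ngrams` is hypothetical, and whether any single RapidMiner operator can produce overlapping pieces this way is not confirmed here.

```python
import re


def char_ngrams(text, n=2):
    """Return all overlapping character n-grams of text."""
    # The lookahead matches an empty string but captures the next n characters,
    # so consecutive matches can overlap.
    pattern = re.compile(r"(?=(.{%d}))" % n)
    return pattern.findall(text)


if __name__ == "__main__":
    print(" ".join(char_ngrams("RapidMiner")))
    # Ra ap pi id dM Mi in ne er
```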
    JEdward wrote:

    It sounds like you just need the right RegEx. 
    Assuming you have a pattern of [Letter+Numbers][Letter+Numbers] then this works: "(?<=[0-9]++)(.*?)(?=[A-Z])"
    Positive lookbehind to check there are numbers before, lookahead to check for the letter.  Anything in between is used to split.
    This just does the thing! Thank you.
    Very nice.  Thanks.  This is something I face often.  Maybe a feature request to simply add a checkbox option to keep the split text instead of removing it?  ;)