"Regular Expression search"

Railsman
Railsman New Altair Community Member
edited November 5 in Community Q&A
Hi all,

I am trying to filter rows in a dataset that contain characters such as ', | and @. Which operator is best suited to this, and how would the regular expression be written?

Thanks in advance.

Answers

  • haddock
    haddock New Altair Community Member
    Greetings Railsman!

    In the process below, I generate examples whose attribute values are value1 through value10. The offending characters are then inserted into value1 and value2. Finally, a regex replacement finds those edited values and sets them to missing, and a filter drops every example that has a missing value.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="341" width="815">
          <operator activated="true" class="generate_nominal_data" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="30">
            <parameter key="number_of_values" value="10"/>
          </operator>
          <operator activated="true" class="replace" expanded="true" height="76" name="Insert |" width="90" x="246" y="75">
            <parameter key="replace_what" value="value1"/>
            <parameter key="replace_by" value="val|ue1"/>
          </operator>
          <operator activated="true" class="replace" expanded="true" height="76" name="Insert @" width="90" x="380" y="75">
            <parameter key="replace_what" value="value2"/>
            <parameter key="replace_by" value="@value2"/>
          </operator>
          <operator activated="true" class="replace" expanded="true" height="76" name="Replace | or @" width="90" x="581" y="120">
            <parameter key="replace_what" value=".*@.*|.*\|.*"/>
          </operator>
          <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="715" y="165">
            <parameter key="condition_class" value="no_missing_attributes"/>
          </operator>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Insert |" to_port="example set input"/>
          <connect from_op="Insert |" from_port="example set output" to_op="Insert @" to_port="example set input"/>
          <connect from_op="Insert @" from_port="example set output" to_op="Replace | or @" to_port="example set input"/>
          <connect from_op="Replace | or @" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="198"/>
          <portSpacing port="sink_result 2" spacing="90"/>
        </process>
      </operator>
    </process>
    If you play around with the regex parameter editor (new in V5), you'll soon see just how useful, but impenetrable, regex can be.
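
    To see what the pattern in the process above accepts and rejects outside RapidMiner, here is a minimal Python sketch. It assumes Python's re module treats this simple pattern the same way RapidMiner's regex engine does. The helper names is_flagged and keep_clean, the sample value o'value4, and the character-class variant that also covers the apostrophe from the question are made up for illustration and are not part of the process.

```python
import re

# Pattern copied from the replace_what parameter of the "Replace | or @" operator.
# The pipe inside the second alternative is escaped so it is read literally.
ORIGINAL = re.compile(r".*@.*|.*\|.*")

# Hypothetical variant: inside a character class the pipe needs no escape,
# and the apostrophe mentioned in the question is covered as well.
WITH_APOSTROPHE = re.compile(r".*['|@].*")


def is_flagged(value, pattern=ORIGINAL):
    """Return True when the whole value matches, i.e. it contains an offending character."""
    return pattern.fullmatch(value) is not None


def keep_clean(values, pattern=ORIGINAL):
    """Keep only the values that the pattern does not flag."""
    return [v for v in values if not is_flagged(v, pattern)]


values = ["val|ue1", "@value2", "value3", "o'value4"]
print(keep_clean(values))                   # ['value3', "o'value4"]
print(keep_clean(values, WITH_APOSTROPHE))  # ['value3']
```

    The leading and trailing .* make the pattern consume the whole value, which is why fullmatch is used here: any value containing one of the characters is matched in its entirety.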

  • Railsman
    Railsman New Altair Community Member
    Cool, thanks Haddock!

    I have some experience with the old RapidMiner versions and am still getting to grips with V5, but it's a marvelous tool.