"[SOLVED] Filter data from examples set"

User: "tomkowski"
New Altair Community Member
Updated by Jocelyn
Hi,

I'm a beginner with RapidMiner. As a first step, I am trying to extract some data from an Access database, perform a few operations on it, and display the result at the end.

I'm stuck on how to select certain rows from the data set.

What I have done so far: I created a repository with the data from MS Access, then used Select Attributes to keep two text columns, A and B, and then Generate Attributes to create column C, which joins the strings from A and B. All columns contain words (text). For example, column A: "Gurund", column B: "Corporation", and column C: "Gurund Corporation". Of course, column B does not only contain "Corporation"; there are many other values as well.

Next, I would like to keep only the rows that contain the word "Corporation" and display them. I have tried different operators, such as Filter Documents and Filter Examples, but I have not found one that helps. Do you have any suggestions?

    User: "frito"
    New Altair Community Member
    Try the Filter Examples operator:
    condition class: attribute_value_filter
    parameter string: B="Corporation"
    User: "tomkowski"
    New Altair Community Member
    OP
    Thank you for your answer.

    I tried this operator, but the problem is that the value in column B (or A) may consist of one or more words. For example, column B may contain "Corporation Europe" or "Corp.", which mean the same thing to me. I think the best solution would be an operator that accepts a regular expression, but I can't find anything like Filter Examples with regex support. Or maybe I just don't know how to write the correct expression for the Filter Examples operator.
    User: "MariusHelf"
    New Altair Community Member
    Heya,

    A rework of the Filter Examples operator is planned. Until then, you have to use a workaround with Generate Attributes: it evaluates a condition and creates a new indicator attribute, to which you can then apply Filter Examples.

    Please have a look at the attached process.

    Best, Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.005">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.005" expanded="true" name="Process">
        <process expanded="true" height="116" width="681">
          <operator activated="true" class="generate_nominal_data" compatibility="5.2.005" expanded="true" height="60" name="Generate Nominal Data" width="90" x="112" y="30">
            <parameter key="number_of_attributes" value="1"/>
          </operator>
          <operator activated="true" class="replace" compatibility="5.2.005" expanded="true" height="76" name="Replace" width="90" x="246" y="30">
            <parameter key="replace_what" value="value0"/>
            <parameter key="replace_by" value="Car Truck Moto"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.2.005" expanded="true" height="76" name="Generate Attributes" width="90" x="380" y="30">
            <list key="function_descriptions">
              <parameter key="indicator" value="matches(att1, &quot;.*Truck.*&quot;)"/>
            </list>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="5.2.005" expanded="true" height="76" name="Filter Examples" width="90" x="514" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="indicator=true"/>
          </operator>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Replace" to_port="example set input"/>
          <connect from_op="Replace" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
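
    Applied to the original question, the same workaround might look like the sketch below. It reuses the structure of the process above and only changes the sample text and the pattern. The sample value "Gurund Corp. Europe" is made up for illustration, and the pattern `.*Corp.*` is an assumption about what should count as a match: it is case-sensitive and would also accept unrelated words that merely start with "Corp". With real data, the first two operators would be replaced by the Access import, and `att1` by the actual column name (B or C).

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.005">
      <operator activated="true" class="process" compatibility="5.2.005" expanded="true" name="Process">
        <process expanded="true" height="116" width="681">
          <!-- Dummy data: one nominal attribute named att1 -->
          <operator activated="true" class="generate_nominal_data" compatibility="5.2.005" expanded="true" height="60" name="Generate Nominal Data" width="90" x="112" y="30">
            <parameter key="number_of_attributes" value="1"/>
          </operator>
          <!-- Illustrative sample text (hypothetical value) -->
          <operator activated="true" class="replace" compatibility="5.2.005" expanded="true" height="76" name="Replace" width="90" x="246" y="30">
            <parameter key="replace_what" value="value0"/>
            <parameter key="replace_by" value="Gurund Corp. Europe"/>
          </operator>
          <!-- indicator is true when att1 contains "Corp" anywhere -->
          <operator activated="true" class="generate_attributes" compatibility="5.2.005" expanded="true" height="76" name="Generate Attributes" width="90" x="380" y="30">
            <list key="function_descriptions">
              <parameter key="indicator" value="matches(att1, &quot;.*Corp.*&quot;)"/>
            </list>
          </operator>
          <!-- Keep only the rows flagged by the indicator -->
          <operator activated="true" class="filter_examples" compatibility="5.2.005" expanded="true" height="76" name="Filter Examples" width="90" x="514" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="indicator=true"/>
          </operator>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Replace" to_port="example set input"/>
          <connect from_op="Replace" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
        </process>
      </operator>
    </process>
    ```

    Because `matches` has to match the whole value, the leading and trailing `.*` are what turn the pattern into a "contains" test, so "Corporation", "Corporation Europe" and "Corp." should all be kept.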
    User: "frito"
    New Altair Community Member
    tomkowski wrote:

    Thank you for your answer.

    I tried this operator, but the problem is that the value in column B (or A) may consist of one or more words. For example, column B may contain "Corporation Europe" or "Corp.", which mean the same thing to me. I think the best solution would be an operator that accepts a regular expression, but I can't find anything like Filter Examples with regex support. Or maybe I just don't know how to write the correct expression for the Filter Examples operator.
    Hi,
    I think I have used regular expressions before to filter examples containing a word.
    The regular expressions supported by RapidMiner are described here:

    http://rapid-i.com/wiki/index.php?title=Regular_expressions

    I am not sure whether regular expressions work in Filter Examples with attribute_value_filter, so give it a try.
    If not, they definitely work in Generate Attributes, as Marius suggested.
    Good luck!

    User: "tomkowski"
    New Altair Community Member
    OP
    Hi all,

    Thanks, Marius, for your suggestion. I experimented with the Generate Attributes operator and got the desired result.
    User: "zahrahnnx"
    New Altair Community Member
    Marius wrote:

    Heya,

    a rework of the Filter Examples operator is planned. Until then you have to use a workaround with Generate Attributes: it checks a condition and creates a new indicator attribute, on which you can then apply Filter Examples.

    Please have a look at the attached process.

    Best, Marius
    This shows all rows that contain "...Truck...". What if we want to check whether two words occur together? For example, "Truck" and "Car" appearing next to each other or with 1 to 4 words in between, e.g. "...truck, (some words), car...".
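
    One possible approach, staying with the Generate Attributes workaround, is to encode the word distance in the regular expression itself. The fragment below is only a sketch: it assumes that `matches` follows Java regular expression syntax (including the `(?i)` case-insensitivity flag), treats any run of ASCII letters as a "word", and only checks the order "truck" followed by "car". The attribute name `att1` is taken from the quoted process.

    ```xml
    <operator activated="true" class="generate_attributes" compatibility="5.2.005" expanded="true" height="76" name="Generate Attributes" width="90" x="380" y="30">
      <list key="function_descriptions">
        <!-- true if "truck" is followed by "car" with 0 to 4 words in between -->
        <parameter key="indicator" value="matches(att1, &quot;(?i)(.*[^a-z])?truck([^a-z]+[a-z]+){0,4}[^a-z]+car([^a-z].*)?&quot;)"/>
      </list>
    </operator>
    ```

    Here `([^a-z]+[a-z]+){0,4}` consumes up to four intermediate words, and the optional groups at both ends keep longer words such as "carpet" from counting as "car". Changing `{0,4}` to `{1,4}` would require at least one word in between. To accept either order, a second indicator with the two words swapped could be generated and the two combined, and Filter Examples with `indicator=true` is then applied as before.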

    User: "Marco_Boeck"
    New Altair Community Member
    Hi,

    This is easily done with the Filter Examples operator in Studio 6.3: you specify the words you want, then choose at the bottom whether ALL of them must be included or whether ANY occurrence is sufficient.

    Regards,
    Marco