"[SOLVED] Filter data from examples set"

tomkowski
New Altair Community Member
Hi,
I'm a beginner with RapidMiner, so as a first step I'm trying to extract some data from an Access database, perform a few operations on it, and display the result.
I'm stuck on how to select certain rows from the data set.
What I do: I create a repository with data from MS Access, use Select Attributes to keep two text columns, A and B, and then use Generate Attributes to create column C, which joins the strings from A and B. All columns contain words (text). For example, column A: "Gurund", column B: "Corporation", and column C: "Gurund Corporation". Of course, column B holds many other values besides "Corporation".
Next I would like to keep only the rows that contain the word "Corporation" and display them. I tried different operators such as Filter Documents and Filter Examples, but I haven't found one that helps. Do you have any suggestions?
Answers
-
Try the Filter Examples operator:
condition class: attribute_value_filter
parameter string: B="Corporation"
-
Thank you for your answer.
I tried this operator, but the problem is that a value in column B (or A) may consist of one or more words. For example, column B may contain "Corporation Europe" or "Corp.", which mean the same to me. I think the best solution would be an operator that accepts a regular expression, but I can't find anything like Filter Examples with regexp support. Or maybe I just don't know how to write the correct expression for the Filter Examples operator.
-
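To make the requirement above concrete, here is a minimal sketch of a regular expression that accepts "Corporation", "Corporation Europe" and "Corp." but rejects look-alikes such as "Corporate". It is written in Python only so the pattern can be tried outside RapidMiner; the name `CORP_PATTERN`, the helper `mentions_corporation` and the sample values are made up for illustration. RapidMiner uses Java regular expressions, and the constructs used here (`\b`, `(?:...)`, `?`) exist in both dialects, but how backslashes have to be escaped inside a RapidMiner expression is not verified here.

```python
import re

# Hypothetical pattern: "Corp", optionally followed by "oration", as a whole
# word, with an optional trailing period (so "Corp." is covered as well).
CORP_PATTERN = re.compile(r"\bCorp(?:oration)?\b\.?")


def mentions_corporation(value: str) -> bool:
    """Return True if value contains "Corp", "Corp." or "Corporation" as a word."""
    return CORP_PATTERN.search(value) is not None


# Made-up sample values for column B.
column_b = ["Corporation", "Corporation Europe", "Corp.", "Limited", "Corporate"]

# Keep only the rows whose value mentions a corporation.
kept = [value for value in column_b if mentions_corporation(value)]
print(kept)
```

The pattern is case-sensitive on purpose; prefixing it with `(?i)` would make it case-insensitive in both Python and Java.
-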
Heya,
a rework of the Filter Examples operator is planned. Until then you have to use a workaround with Generate Attributes: it checks a condition and creates a new indicator attribute, on which you can then apply Filter Examples.
Please have a look at the attached process.
Best, Marius
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.005">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.005" expanded="true" name="Process">
<process expanded="true" height="116" width="681">
<operator activated="true" class="generate_nominal_data" compatibility="5.2.005" expanded="true" height="60" name="Generate Nominal Data" width="90" x="112" y="30">
<parameter key="number_of_attributes" value="1"/>
</operator>
<operator activated="true" class="replace" compatibility="5.2.005" expanded="true" height="76" name="Replace" width="90" x="246" y="30">
<parameter key="replace_what" value="value0"/>
<parameter key="replace_by" value="Car Truck Moto"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.2.005" expanded="true" height="76" name="Generate Attributes" width="90" x="380" y="30">
<list key="function_descriptions">
<parameter key="indicator" value="matches(att1, &quot;.*Truck.*&quot;)"/>
</list>
</operator>
<operator activated="true" class="filter_examples" compatibility="5.2.005" expanded="true" height="76" name="Filter Examples" width="90" x="514" y="30">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="indicator=true"/>
</operator>
<connect from_op="Generate Nominal Data" from_port="output" to_op="Replace" to_port="example set input"/>
<connect from_op="Replace" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
-
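The process above uses `.*Truck.*` rather than plain `Truck`, which suggests that `matches` tests the whole attribute value against the pattern (as Java's `String.matches` does); that reading is an assumption, not something the thread states. The sketch below reproduces the same two-step idea in Python, with `re.fullmatch` standing in for `matches` and a few made-up rows: first compute an indicator, then filter on it.

```python
import re

# Made-up rows; "Car Truck Moto" is the value the Replace operator produces.
rows = ["Car Truck Moto", "Car Moto", "Truck"]

# Whole-value matching: the pattern has to cover the entire string.
bare = [bool(re.fullmatch(r"Truck", row)) for row in rows]
wrapped = [bool(re.fullmatch(r".*Truck.*", row)) for row in rows]

# Step 1 (Generate Attributes): the indicator column.
indicator = wrapped

# Step 2 (Filter Examples, indicator=true): keep rows whose indicator is True.
filtered = [row for row, keep in zip(rows, indicator) if keep]

print(bare)
print(wrapped)
print(filtered)
```

With whole-value matching, the bare pattern only accepts a value that is exactly "Truck", which is why the wildcard on both sides is needed to express "contains".
-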
Hi,
I think I have used a regular expression before to filter examples containing a word.
The RapidMiner regular expression reference is here:
http://rapid-i.com/wiki/index.php?title=Regular_expressions
I am not sure whether regular expressions work in the attribute_value_filter condition of Filter Examples, so give it a try.
If not, they definitely work in Generate Attributes, as Marius suggested.
Good luck
-
Hi All,
Thanks, Marius, for your suggestion. I experimented with the Generate Attributes operator and got the desired result.
-
Marius's process shows all rows that contain "...Truck...". What if we want to check whether two words occur together? For example, "Truck" and "car" directly next to each other or with 1 to 4 words in between, e.g. "...truck, (some words), car...".
-
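One possible way to express "truck, then car, with at most four words in between" is a proximity pattern. The sketch below is in Python so it can be tested outside RapidMiner; the name `NEAR`, the helper `truck_near_car` and the sample sentences are hypothetical. It assumes the order truck before car (a second pattern would be needed for the reverse order) and matches case-insensitively. To use the same idea with `matches` in Generate Attributes, the pattern would presumably need `.*` on both sides and `(?i)` in front, and the backslash escaping there is not verified here.

```python
import re

# "truck" as a whole word, then 0 to 4 intermediate words, then "car" as a
# whole word. Each intermediate word is a separator (\W+) followed by a
# run of word characters (\w+).
NEAR = re.compile(r"\btruck\b(?:\W+\w+){0,4}?\W+car\b", re.IGNORECASE)


def truck_near_car(text: str) -> bool:
    """Return True if "car" follows "truck" with at most four words between."""
    return NEAR.search(text) is not None


print(truck_near_car("Truck car"))
print(truck_near_car("a truck, big red car"))
print(truck_near_car("truck one two three four five car"))
```

Changing `{0,4}` to `{1,4}` would require at least one word between the two terms.
-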
Hi,
this is easily done with the Filter Examples operator in Studio 6.3: you just specify the words you want, then choose at the bottom whether ALL of them must be included or whether ANY occurrence is sufficient.
Regards,
Marco
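-
The ALL versus ANY choice described above can be illustrated with a short sketch. This only shows the logic; it is not how Studio implements the operator, and the function name `contains_words`, its `mode` parameter and the case-insensitive substring test are assumptions made for the example.

```python
def contains_words(value: str, words: list[str], mode: str = "any") -> bool:
    """Check whether value contains all of the words or any of them.

    Uses a case-insensitive substring test (an assumption for this sketch).
    """
    if mode not in ("all", "any"):
        raise ValueError("mode must be 'all' or 'any'")
    lowered = value.lower()
    present = [word.lower() in lowered for word in words]
    return all(present) if mode == "all" else any(present)


# Made-up rows, filtered once with each mode.
rows = ["Car Truck Moto", "Car Moto", "Moto"]
words = ["truck", "car"]
print([row for row in rows if contains_words(row, words, "all")])
print([row for row in rows if contains_words(row, words, "any")])
```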