"FP_Growth : must_contain"
rukawa10
New Altair Community Member
Hi,
I use RapidMiner 5.0 and want to run FPGrowth with the "must_contain" parameter. I hoped this parameter would help the algorithm (FP-Growth) use less memory.
So I used the Golf data set, a sample data set that comes with RapidMiner, and set must_contain to "CAR = true".
In the result, every item set contains CAR = true in the rightmost position. It seems that this parameter (must_contain) does nothing but append its value to every frequent item set in the final result.
Can anyone give me some advice?
Thank you in advance.
Answers
Hi,
this is a known bug that will vanish with the next update.
Greetings,
Sebastian
What is the expected output of FPGrowth in the described case?
RM 5.2, Transactions data set, sample process "25_FPGrowth", with must_contain changed to "CAR = true".
It returns a single item set: 1 0.667 CAR = true
Is there any way to return all item sets in which at least one of the items matches the pattern? (This is what I expected must_contain to do.)
In other words, I want to mimic the UI's 'Contains Item' behavior and get:
1 0.667 CAR = true
2 0.333 CAR = true APPARTEMENT = true
2 0.333 CAR = true VILLA = true
2 0.333 CAR = true RICH = true
2 0.333 CAR = true AVERAGE = true
3 0.333 CAR = true APPARTEMENT = true AVERAGE = true
3 0.333 CAR = true VILLA = true RICH = true
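The 'Contains Item' behavior described here can be sketched as a post-filter over an already computed list of frequent item sets. This is only an illustration in Python, not RapidMiner code: the helper name `contains_item` is hypothetical, the first seven item sets are the ones listed above, and the last two sets (with their support values) are made up to show what gets filtered out. It also assumes the pattern has to match a whole item name.

```python
import re

# (support, items) pairs. The first seven are the sets listed above;
# the last two are hypothetical sets without CAR, with made-up supports.
frequent_sets = [
    (0.667, ["CAR = true"]),
    (0.333, ["CAR = true", "APPARTEMENT = true"]),
    (0.333, ["CAR = true", "VILLA = true"]),
    (0.333, ["CAR = true", "RICH = true"]),
    (0.333, ["CAR = true", "AVERAGE = true"]),
    (0.333, ["CAR = true", "APPARTEMENT = true", "AVERAGE = true"]),
    (0.333, ["CAR = true", "VILLA = true", "RICH = true"]),
    (0.5, ["VILLA = true"]),                 # hypothetical
    (0.5, ["VILLA = true", "RICH = true"]),  # hypothetical
]


def contains_item(sets, pattern):
    """Keep every item set in which at least one item matches the pattern.

    Assumption: the pattern must match the whole item name.
    """
    rx = re.compile(pattern)
    return [
        (support, items)
        for support, items in sets
        if any(rx.fullmatch(item) for item in items)
    ]


# Print size, support and items for each set that survives the filter.
for support, items in contains_item(frequent_sets, "CAR = true"):
    print(len(items), support, " ".join(items))
```

Filtering after mining does not save memory during the FP-Growth run itself; it only reproduces the list shown above from a full result.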
Thank you,
Bemoose
Sebastian, anybody else,
Any clarification on the expected behavior of must_contain is appreciated.
Thank you,
Bemoose
Hi,
there was still a bug in FP-Growth in combination with must_contain. It has been fixed and the fix will be included in the next version.
Best, Marius
Hi Marius,
Which version were you referring to in your last post as containing the fix?
I upgraded to 5.2.008, and must_contain still does not seem to work right.
Thanks,
Bemoose
Hi,
FP Growth was patched with version 5.2.008.
Best,
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.009">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.0.000" expanded="true" name="Root">
<description><p>This process uses two important preprocessing operators: First the frequency discretization operator, which discretizes numerical attributes by putting the values into bins of equal size. Second, the filter operator nominal to binominal creates for each possible nominal value of a polynominal attribute a new binominal (binary) feature which is true if the example had the particular nominal value.</p><p>These preprocessing operators are necessary since particular learning schemes cannot handle attributes of certain value types. For example, the very efficient frequent item set mining operator FPGrowth used in this process setup can only handle binominal features and no numerical or polynominal ones.</p><p>The next operator is the frequent item set mining operator FPGrowth. This operator efficiently calculates attribute value sets often occurring together. From these so-called frequent item sets the most confident rules are calculated with the association rule generator.</p> <p> The result will be displayed in a rule browser where the desired conclusion can be selected in a selection list on the left side. As for all other tables available in RapidMiner you can sort the columns by clicking on the column header. Pressing CTRL during these clicks allows the selection for up to three sorting columns. </p> </description>
<parameter key="logverbosity" value="warning"/>
<process expanded="true" height="584" width="918">
<operator activated="true" class="retrieve" compatibility="5.0.000" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
<parameter key="repository_entry" value="../../data/Transactions"/>
</operator>
<operator activated="true" class="nominal_to_binominal" compatibility="5.0.000" expanded="true" height="94" name="Nominal2Binominal" width="90" x="180" y="30">
<parameter key="transform_binominal" value="true"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.0.000" expanded="true" height="76" name="AttributeFilter" width="90" x="315" y="30">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value=".*true.*"/>
</operator>
<operator activated="true" class="multiply" compatibility="5.2.009" expanded="true" height="94" name="Multiply" width="90" x="447" y="120"/>
<operator activated="true" class="fp_growth" compatibility="5.0.000" expanded="true" height="76" name="With must contain" width="90" x="581" y="30">
<parameter key="must_contain" value="CAR = true"/>
</operator>
<operator activated="true" class="fp_growth" compatibility="5.2.009" expanded="true" height="76" name="Without must contain" width="90" x="581" y="165"/>
<connect from_op="Retrieve" from_port="output" to_op="Nominal2Binominal" to_port="example set input"/>
<connect from_op="Nominal2Binominal" from_port="example set output" to_op="AttributeFilter" to_port="example set input"/>
<connect from_op="AttributeFilter" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="With must contain" to_port="example set"/>
<connect from_op="Multiply" from_port="output 2" to_op="Without must contain" to_port="example set"/>
<connect from_op="With must contain" from_port="frequent sets" to_port="result 1"/>
<connect from_op="Without must contain" from_port="frequent sets" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="18"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Nils
Hi Bemoose,
what exactly does not work? Can you provide a sample process and sample data which cause the problem?
Best,
Marius
Marius wrote:
Hi Bemoose,
what exactly does not work? Can you provide a sample process and sample data which cause the problem?
Best,
Marius
RM 5.2.008, Transactions data set, sample process "25_FPGrowth":
1. Change must_contain in FPGrowth operator to "CAR = true"
Throws exception
Aug 21, 2012 5:02:40 PM SEVERE: Process failed: operator cannot be executed. Check the log messages...
Aug 21, 2012 5:02:40 PM SEVERE: Here: Root[1] (Process)
subprocess 'Main Process'
+- Retrieve[1] (Retrieve)
+- Nominal2Binominal[1] (Nominal to Binominal)
+- AttributeFilter[1] (Select Attributes)
+- FPGrowth[1] (FP-Growth)
==> +- AssociationRuleGenerator[1] (Create Association Rules)
Aug 21, 2012 5:02:40 PM SEVERE: java.lang.NullPointerException
2. Change must_contain in FPGrowth operator to "CAR"
Erroneously (?) returns FrequentItemSets that do not contain CAR, e.g. "VILLA = true".
Hi,
the FPGrowth operator works as it should, as you can see in the process I posted above.
You have to set the must_contain parameter to CAR = true, and you will get FrequentItemSets that contain CAR = true.
The problem you describe concerns the AssociationRuleGenerator operator. We are aware of it and hope to provide a bugfix with the next release.
Best,
Nils
Nils, you are right, thanks for the clarification. Please post here when AssociationRuleGenerator is fixed.
What about my case #2? It looks like FPGrowth should not return anything in this case, but it returns all item sets. To make it clearer, we can set must_contain in the FPGrowth operator to "nomatch": it still returns all the sets. It is minor, though.
If must_contain does not match anything, it is ignored, just as if it were not set at all. I admit that this may not be the best behavior, but we left it like that for historical reasons. Maybe we'll change the behavior in a future release.
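The fallback described here can be illustrated with a small Python sketch. It is not the RapidMiner implementation: the helper `apply_must_contain` is hypothetical, it works as a filter over finished item sets, and it assumes the pattern is a regular expression that must match a whole item name. Under that assumption, "CAR" alone matches no item, which would explain why case #2 returns every set.

```python
import re


def apply_must_contain(item_sets, must_contain):
    """Hypothetical model of the behavior described above.

    Assumption: must_contain is a regular expression that has to match
    a whole item name, e.g. "CAR = true".
    """
    if not must_contain:
        return list(item_sets)
    rx = re.compile(must_contain)
    all_items = {item for items in item_sets for item in items}
    # If the pattern matches no item at all, it is ignored.
    if not any(rx.fullmatch(item) for item in all_items):
        return list(item_sets)
    # Otherwise keep only the sets with at least one matching item.
    return [
        items for items in item_sets
        if any(rx.fullmatch(item) for item in items)
    ]


# Toy item sets using item names from this thread.
sets = [["CAR = true"], ["VILLA = true"], ["CAR = true", "VILLA = true"]]

print(apply_must_contain(sets, "CAR = true"))  # only sets containing CAR = true
print(apply_must_contain(sets, "nomatch"))     # pattern ignored, all sets
print(apply_must_contain(sets, "CAR"))         # no whole-name match, all sets
```

In this model a pattern such as "CAR.*" would match the item name again and restrict the result to the sets containing CAR = true.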
Best, Marius