"FP_Growth : must_contain"

rukawa10
rukawa10 New Altair Community Member
edited November 5 in Community Q&A
Hi,

I use RapidMiner 5.0, and I want to run FPGrowth with the parameter "must_contain". I hope that this parameter helps the algorithm (FP-Growth) use less memory.

I use the Golf data set, a sample data set that comes with RapidMiner, and I set must_contain to "CAR = true".

In the result, every item set contains CAR = true in the rightmost position. It seems that this parameter (must_contain) does nothing except append its value to every frequent item set in the final result.

Can anyone give me some advice?

Thank you in advance.

Answers

  • land
    land New Altair Community Member
    Hi,
    This is a known bug that will be fixed in the next update.

    Greetings,
      Sebastian
  • bmoose
    bmoose New Altair Community Member
    What is the expected output of FPGrowth in the case described?

    RM 5.2, Transactions data set, sample process "25_FPGrowth", change must_contain to "CAR = true"
    Returns a single item set: 1 0.667 CAR = true

    Is there any way to return all item sets in which at least one of the items matches the pattern? (This is what I expected must_contain to do.)
    In other words, I want to mimic the UI's 'Contains Item' behavior and get:
    Size  Support  Items
    1     0.667    CAR = true
    2     0.333    CAR = true, APPARTEMENT = true
    2     0.333    CAR = true, VILLA = true
    2     0.333    CAR = true, RICH = true
    2     0.333    CAR = true, AVERAGE = true
    3     0.333    CAR = true, APPARTEMENT = true, AVERAGE = true
    3     0.333    CAR = true, VILLA = true, RICH = true
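The 'Contains Item' behavior described here can be sketched outside RapidMiner as a plain filter over already-mined item sets. This is only an illustration, not RapidMiner's implementation: the function name `contains_item`, the sample item sets, and the choice to treat the pattern as a regular expression that must match a whole item name are all assumptions.

```python
import re

# Hypothetical frequent item sets as (support, items) pairs. The values are
# loosely modelled on the listing in this post, not produced by RapidMiner.
item_sets = [
    (0.667, ("CAR = true",)),
    (0.333, ("VILLA = true",)),
    (0.333, ("CAR = true", "VILLA = true")),
    (0.333, ("VILLA = true", "RICH = true")),
    (0.333, ("CAR = true", "VILLA = true", "RICH = true")),
]


def contains_item(item_sets, pattern):
    """Keep item sets in which at least one item fully matches the pattern.

    Assumption: the pattern is a regular expression matched against the
    whole item name.
    """
    regex = re.compile(pattern)
    return [
        (support, items)
        for support, items in item_sets
        if any(regex.fullmatch(item) for item in items)
    ]


# Print size, support, and items for every set that contains "CAR = true".
for support, items in contains_item(item_sets, "CAR = true"):
    print(len(items), support, ", ".join(items))
```

With this sample input, the two item sets without "CAR = true" are dropped and the other three are kept with their support unchanged.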

    Thank you,
    Bemoose
  • bmoose
    bmoose New Altair Community Member
    Sebastian, anybody else,

    Any clarification on the expected behavior of must_contain is appreciated.

    Thank you,
    Bemoose
  • MariusHelf
    MariusHelf New Altair Community Member
    Hi,

    There was still a bug in FP-Growth when it is used with must_contain. It has been fixed, and the fix will be included in the next version.

    Best, Marius
  • bmoose
    bmoose New Altair Community Member
    Hi Marius.

    Which version includes the fix you referred to in your last post?

    I upgraded to 5.2.008, and must_contain still does not seem to work correctly.

    Thanks,
    Bemoose
  • Nils_Woehler
    Nils_Woehler New Altair Community Member
    Hi,

    FP-Growth was patched in version 5.2.008.


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.009">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.000" expanded="true" name="Root">
        <description>&lt;p&gt;This process uses two important preprocessing operators: First the frequency discretization operator,  which discretizes numerical attributes by putting the values into bins of equal size. Second, the filter operator nominal to binominal creates for each possible nominal value of a polynominal attribute a new binominal (binary) feature which is true if the example had the particular nominal value.&lt;/p&gt;&lt;p&gt;These preprocessing operators are necessary since particular learning schemes can not handle attributes  of certain value types. For example, the very efficient frequent item set mining operator FPGrowth used in this process setup can only handle binominal features and no numerical or polynominal ones.&lt;/p&gt;&lt;p&gt;The next operator is the frequent item set mining operator FPGrowth. This operator  efficiently calculates attribute value sets often occuring together. From these  so called frequent item sets the most confident rules are calculated. with the association rule generator.&lt;/p&gt; &lt;p&gt; The result will be displayed in a rule browser where desired conclusion can be selected in a selection list on the left side. As for all other tables available in RapidMiner you can sort the columns by clicking on the column header. Pressing CTRL during these clicks allows the selection for up to three sorting columns. &lt;/p&gt; </description>
        <parameter key="logverbosity" value="warning"/>
        <process expanded="true" height="584" width="918">
          <operator activated="true" class="retrieve" compatibility="5.0.000" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="../../data/Transactions"/>
          </operator>
          <operator activated="true" class="nominal_to_binominal" compatibility="5.0.000" expanded="true" height="94" name="Nominal2Binominal" width="90" x="180" y="30">
            <parameter key="transform_binominal" value="true"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.0.000" expanded="true" height="76" name="AttributeFilter" width="90" x="315" y="30">
            <parameter key="attribute_filter_type" value="regular_expression"/>
            <parameter key="regular_expression" value=".*true.*"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.2.009" expanded="true" height="94" name="Multiply" width="90" x="447" y="120"/>
          <operator activated="true" class="fp_growth" compatibility="5.0.000" expanded="true" height="76" name="With must contain" width="90" x="581" y="30">
            <parameter key="must_contain" value="CAR = true"/>
          </operator>
          <operator activated="true" class="fp_growth" compatibility="5.2.009" expanded="true" height="76" name="Without must contain" width="90" x="581" y="165"/>
          <connect from_op="Retrieve" from_port="output" to_op="Nominal2Binominal" to_port="example set input"/>
          <connect from_op="Nominal2Binominal" from_port="example set output" to_op="AttributeFilter" to_port="example set input"/>
          <connect from_op="AttributeFilter" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="With must contain" to_port="example set"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Without must contain" to_port="example set"/>
          <connect from_op="With must contain" from_port="frequent sets" to_port="result 1"/>
          <connect from_op="Without must contain" from_port="frequent sets" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="18"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    Best,
    Nils
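For readers without RapidMiner at hand, the preprocessing in the process above (Nominal to Binominal followed by Select Attributes with the regular expression `.*true.*`) can be approximated in a few lines of Python. This is a rough sketch under assumptions: the rows are invented, the function names are hypothetical, and the column naming "<attribute> = <value>" is inferred from the item names that appear in this thread.

```python
import re

# Hypothetical rows; the real Transactions sample data is not reproduced here.
rows = [
    {"CAR": "true", "VILLA": "false"},
    {"CAR": "true", "VILLA": "true"},
    {"CAR": "false", "VILLA": "false"},
]


def nominal_to_binominal(rows):
    """Create one boolean column "<attribute> = <value>" per observed value."""
    pairs = sorted({(name, value) for row in rows for name, value in row.items()})
    return [
        {f"{name} = {value}": row.get(name) == value for name, value in pairs}
        for row in rows
    ]


def select_attributes(rows, pattern):
    """Keep only the columns whose name fully matches the regular expression."""
    regex = re.compile(pattern)
    return [
        {name: value for name, value in row.items() if regex.fullmatch(name)}
        for row in rows
    ]


binary = select_attributes(nominal_to_binominal(rows), ".*true.*")
print(binary[0])
```

Only the "... = true" columns survive the filter, which is why the items in the frequent item sets carry names such as "CAR = true".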
  • MariusHelf
    MariusHelf New Altair Community Member
    Hi Bemoose,

    what exactly does not work? Can you provide a sample process and sample data which cause the problem?

    Best,
    Marius
  • bmoose
    bmoose New Altair Community Member
    Marius wrote:

    Hi Bemoose,

    what exactly does not work? Can you provide a sample process and sample data which cause the problem?

    Best,
    Marius

    RM 5.2.008, Transactions data set, sample process "25_FPGrowth":
    1. Change must_contain in the FPGrowth operator to "CAR = true".
    This throws an exception:
    Aug 21, 2012 5:02:40 PM SEVERE: Process failed: operator cannot be executed. Check the log messages...
    Aug 21, 2012 5:02:40 PM SEVERE: Here:           Root[1] (Process)
              subprocess 'Main Process'
                +- Retrieve[1] (Retrieve)
                +- Nominal2Binominal[1] (Nominal to Binominal)
                +- AttributeFilter[1] (Select Attributes)
                +- FPGrowth[1] (FP-Growth)
          ==>   +- AssociationRuleGenerator[1] (Create Association Rules)
    Aug 21, 2012 5:02:40 PM SEVERE: java.lang.NullPointerException

    2. Change must_contain in the FPGrowth operator to "CAR".
    This erroneously (?) returns FrequentItemSets that do not contain CAR, e.g. "VILLA = true".
  • Nils_Woehler
    Nils_Woehler New Altair Community Member
    Hi,

    The FPGrowth operator works as it should, as you can see in the process I posted above.
    You have to set the must_contain parameter to CAR = true, and you will get FrequentItemSets that contain CAR = true.
    The problem you describe concerns the AssociationRuleGenerator operator. We are aware of it and hope to provide a bugfix with the next release.

    Best,
    Nils
  • bmoose
    bmoose New Altair Community Member
    Nils, you are right, thanks for the clarification. Please post here when AssociationRuleGenerator is fixed.

    What about my case #2? It looks like FPGrowth should not return anything in this case, but it returns all item sets. To make this clearer, we can set must_contain in the FPGrowth operator to "nomatch": it still returns all the sets. It is a minor issue, though.
  • MariusHelf
    MariusHelf New Altair Community Member
    If must_contain does not match anything it is ignored, just as if it were not set at all. I admit that this may not be the best behavior, but we left it like that for historical reasons. Maybe we'll change the behavior in a feature release.

    Best, Marius
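The fallback Marius describes (a must_contain value that matches nothing is ignored) can be illustrated with a small Python sketch. It is not RapidMiner's code: the function name `apply_must_contain` and the sample item sets are hypothetical, and matching the pattern against the whole item name is an assumption, chosen because it is consistent with "CAR" returning all item sets in case #2.

```python
import re


def apply_must_contain(item_sets, pattern):
    """Filter item sets by pattern; ignore the pattern if no item matches it.

    Assumption: the pattern is a regular expression that must match a
    whole item name.
    """
    regex = re.compile(pattern)
    all_items = {item for items in item_sets for item in items}
    if not any(regex.fullmatch(item) for item in all_items):
        # Nothing matches: behave as if must_contain were not set at all.
        return list(item_sets)
    return [
        items
        for items in item_sets
        if any(regex.fullmatch(item) for item in items)
    ]


# Hypothetical item sets for illustration only.
item_sets = [("CAR = true",), ("VILLA = true",), ("CAR = true", "VILLA = true")]

print(apply_must_contain(item_sets, "CAR = true"))  # only sets with CAR = true
print(apply_must_contain(item_sets, "nomatch"))     # pattern ignored, all sets
print(apply_must_contain(item_sets, "CAR"))         # no full match, all sets
```

Under these assumptions, a pattern such as "CAR.*" would select the same sets as "CAR = true", while "CAR" alone matches no item name and therefore leaves the result unfiltered.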