"Association rules"

bkruger New Altair Community Member
edited November 5 in Community Q&A
Hi,

I have built a model with the FP-Growth and Association rules operators on training data. Can anyone please show me an example of how to apply the results back to a new (test) set of data?

Thanks
BK

Answers

  • bkruger New Altair Community Member
    I searched a bit and found this to be really easy and simple. Interpreting the results, on the other hand, takes a bit more thinking.
  • haddock New Altair Community Member
    Hi there,

    I'm developing an Association Rules generator in CUDA and use RapidMiner (RM) to compare results. To convert the RM Association Rules into an example set I use a Groovy script inside an Execute Script operator, like this:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.000" expanded="true" name="Root">
        <parameter key="logverbosity" value="warning"/>
        <process expanded="true" height="217" width="745">
          <operator activated="true" class="retrieve" compatibility="5.0.000" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="subprocess" compatibility="5.0.000" expanded="true" height="76" name="Preprocessing" width="90" x="180" y="30">
            <process expanded="true">
              <operator activated="true" class="discretize_by_frequency" compatibility="5.0.000" expanded="true" name="FrequencyDiscretization">
                <parameter key="number_of_bins" value="5"/>
              </operator>
              <operator activated="true" class="nominal_to_binominal" compatibility="5.0.000" expanded="true" name="Nominal2Binominal">
                <parameter key="transform_binominal" value="true"/>
              </operator>
              <connect from_port="in 1" to_op="FrequencyDiscretization" to_port="example set input"/>
              <connect from_op="FrequencyDiscretization" from_port="example set output" to_op="Nominal2Binominal" to_port="example set input"/>
              <connect from_op="Nominal2Binominal" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="fp_growth" compatibility="5.0.000" expanded="true" height="76" name="FPGrowth" width="90" x="313" y="30">
            <parameter key="find_min_number_of_itemsets" value="false"/>
            <parameter key="min_support" value="0.1"/>
          </operator>
          <operator activated="true" class="create_association_rules" compatibility="5.0.000" expanded="true" height="76" name="AssociationRuleGenerator" width="90" x="313" y="165">
            <parameter key="min_confidence" value="0.7"/>
          </operator>
          <operator activated="true" class="execute_script" compatibility="5.0.000" expanded="true" height="76" name="Execute Script" width="90" x="581" y="75">
            <parameter key="script" value="import com.rapidminer.tools.Ontology;&#13;&#13;&#10;import com.rapidminer.operator.learner.associations.*;&#13;&#13;&#10;&#13;&#10;AssociationRules rules = input[0];&#13;&#13;&#10;&#10;&#13;// construct attribute set&#13;&#10;Attribute[] attributes= new Attribute[11];&#10;attributes[0] = AttributeFactory.createAttribute(&quot;Premise&quot;, Ontology.STRING);&#13;&#13;&#10;attributes[1] = AttributeFactory.createAttribute(&quot;Premise Items&quot;, Ontology.INTEGER);&#10;attributes[2] = AttributeFactory.createAttribute(&quot;Conclusion&quot;, Ontology.STRING);&#13;&#10;attributes[3] = AttributeFactory.createAttribute(&quot;Conclusion Items&quot;, Ontology.INTEGER);&#13;&#10;attributes[4] = AttributeFactory.createAttribute(&quot;Confidence&quot;, Ontology.REAL);&#13;&#10;attributes[5] = AttributeFactory.createAttribute(&quot;Conviction&quot;, Ontology.REAL);&#13;&#10;attributes[6] = AttributeFactory.createAttribute(&quot;Gain&quot;, Ontology.REAL);&#13;&#10;attributes[7] = AttributeFactory.createAttribute(&quot;Laplace&quot;, Ontology.REAL);&#13;&#13;&#10;attributes[8] = AttributeFactory.createAttribute(&quot;Lift&quot;, Ontology.REAL);&#13;&#10;attributes[9] = AttributeFactory.createAttribute(&quot;Ps&quot;, Ontology.REAL);&#10;&#13;&#13;attributes[10] = AttributeFactory.createAttribute(&quot;Total Support&quot;, Ontology.REAL);&#10;&#13;&#13;&#13;&#10;MemoryExampleTable table = new MemoryExampleTable(attributes);&#10;DataRowFactory ROW_FACTORY = new DataRowFactory(0);&#13;&#10;&#13;String[] strings= new String[11];&#13;&#10;&#10;for (AssociationRule rule : rules) {&#10;&#9;&#9;// construct example data&#10;        strings[0]=rule.toPremiseString();&#13;&#10;        strings[1]=rule.premise.size().toString();&#13;&#10;        strings[2]=rule.toConclusionString();&#13;&#10;        strings[3]=rule.conclusion.size().toString();&#13;&#10;        strings[4]=rule.getConfidence().toString();&#13;&#10;        
strings[5]=rule.getConviction().toString();&#13;&#10;        strings[6]=rule.getGain().toString();&#13;&#10;        strings[7]=rule.getLaplace().toString();&#13;&#10;        strings[8]=rule.getLift().toString();&#13;&#10;&#13;        strings[9]=rule.getPs().toString();&#13;&#10;        strings[10]=rule.getTotalSupport().toString();&#13;&#13;&#10;        // make and add row&#13;&#10;        DataRow row = ROW_FACTORY.create(strings, attributes); &#13;&#10;        table.addDataRow(row);&#9;&#10;&#9;&#9;}&#10;&#13;&#10;ExampleSet exampleSet = table.createExampleSet();&#10;return exampleSet;&#10;"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Preprocessing" to_port="in 1"/>
          <connect from_op="Preprocessing" from_port="out 1" to_op="FPGrowth" to_port="example set"/>
          <connect from_op="FPGrowth" from_port="frequent sets" to_op="AssociationRuleGenerator" to_port="item sets"/>
          <connect from_op="AssociationRuleGenerator" from_port="rules" to_op="Execute Script" to_port="input 1"/>
          <connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    I'm also eager to hear from anybody with packages other than RM, such as Clementine, as I have a frequent item set benchmarking process and data which I would like to time on other platforms.

  • bkruger New Altair Community Member
    Hi,

    Thanks for the response! I have never used Groovy scripts before, but I have added the part you showed me to my process. I get the results I expect, matching the Iris example, BUT the process still won't write them to either a DB or a CSV file.

    Any suggestions?

    Thanks
    BK
  • haddock New Altair Community Member
    Hi,

    You need to deliver the results explicitly through the script operator's output port, like this:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Root">
        <parameter key="logverbosity" value="warning"/>
        <process expanded="true" height="341" width="815">
          <operator activated="true" class="retrieve" compatibility="5.1.006" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="subprocess" compatibility="5.1.006" expanded="true" height="76" name="Preprocessing" width="90" x="180" y="30">
            <process expanded="true" height="463" width="896">
              <operator activated="true" class="discretize_by_frequency" compatibility="5.1.006" expanded="true" height="94" name="FrequencyDiscretization" width="90" x="112" y="30">
                <parameter key="number_of_bins" value="5"/>
              </operator>
              <operator activated="true" class="nominal_to_binominal" compatibility="5.1.006" expanded="true" height="94" name="Nominal2Binominal" width="90" x="470" y="30">
                <parameter key="transform_binominal" value="true"/>
              </operator>
              <connect from_port="in 1" to_op="FrequencyDiscretization" to_port="example set input"/>
              <connect from_op="FrequencyDiscretization" from_port="example set output" to_op="Nominal2Binominal" to_port="example set input"/>
              <connect from_op="Nominal2Binominal" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="fp_growth" compatibility="5.1.006" expanded="true" height="76" name="FPGrowth" width="90" x="313" y="30">
            <parameter key="find_min_number_of_itemsets" value="false"/>
            <parameter key="min_support" value="0.1"/>
          </operator>
          <operator activated="true" class="create_association_rules" compatibility="5.1.006" expanded="true" height="76" name="AssociationRuleGenerator" width="90" x="447" y="30">
            <parameter key="min_confidence" value="0.7"/>
          </operator>
          <operator activated="true" class="execute_script" compatibility="5.1.006" expanded="true" height="76" name="Execute Script" width="90" x="581" y="30">
            <parameter key="script" value="import com.rapidminer.tools.Ontology;&#13;&#13;&#10;import com.rapidminer.operator.learner.associations.*;&#13;&#13;&#10;&#13;exampleSetOutput = operator.getOutputPorts().getPortByIndex(0);&#10;AssociationRules rules = input[0];&#13;&#13;&#10;&#10;&#13;// construct attribute set&#13;&#10;Attribute[] attributes= new Attribute[11];&#10;attributes[0] = AttributeFactory.createAttribute(&quot;Premise&quot;, Ontology.STRING);&#13;&#13;&#10;attributes[1] = AttributeFactory.createAttribute(&quot;Premise Items&quot;, Ontology.INTEGER);&#10;attributes[2] = AttributeFactory.createAttribute(&quot;Conclusion&quot;, Ontology.STRING);&#13;&#10;attributes[3] = AttributeFactory.createAttribute(&quot;Conclusion Items&quot;, Ontology.INTEGER);&#13;&#10;attributes[4] = AttributeFactory.createAttribute(&quot;Confidence&quot;, Ontology.REAL);&#13;&#10;attributes[5] = AttributeFactory.createAttribute(&quot;Conviction&quot;, Ontology.REAL);&#13;&#10;attributes[6] = AttributeFactory.createAttribute(&quot;Gain&quot;, Ontology.REAL);&#13;&#10;attributes[7] = AttributeFactory.createAttribute(&quot;Laplace&quot;, Ontology.REAL);&#13;&#13;&#10;attributes[8] = AttributeFactory.createAttribute(&quot;Lift&quot;, Ontology.REAL);&#13;&#10;attributes[9] = AttributeFactory.createAttribute(&quot;Ps&quot;, Ontology.REAL);&#10;&#13;&#13;attributes[10] = AttributeFactory.createAttribute(&quot;Total Support&quot;, Ontology.REAL);&#10;&#13;&#13;&#13;&#10;&#10;table = new MemoryExampleTable(attributes);&#10;ROW_FACTORY = new DataRowFactory(0);&#13;&#10;&#13;String[] strings= new String[11];&#13;&#10;&#10;for (AssociationRule rule : rules) {&#10;&#9;&#9;// construct example data&#10;        strings[0]=rule.toPremiseString();&#13;&#10;        strings[1]=rule.premise.size().toString();&#13;&#10;        strings[2]=rule.toConclusionString();&#13;&#10;        strings[3]=rule.conclusion.size().toString();&#13;&#10;        
strings[4]=rule.getConfidence().toString();&#13;&#10;        strings[5]=rule.getConviction().toString();&#13;&#10;        strings[6]=rule.getGain().toString();&#13;&#10;        strings[7]=rule.getLaplace().toString();&#13;&#10;        strings[8]=rule.getLift().toString();&#13;&#10;&#13;        strings[9]=rule.getPs().toString();&#13;&#10;        strings[10]=rule.getTotalSupport().toString();&#13;&#13;&#10;        // make and add row&#13;&#10;        DataRow row = ROW_FACTORY.create(strings, attributes); &#13;&#10;        table.addDataRow(row);&#9;&#10;&#9;&#9;}&#10;&#13;&#10;exampleSet = table.createExampleSet();&#10;exampleSetOutput.deliver(exampleSet);&#10;&#10;"/>
          </operator>
          <operator activated="true" class="write_csv" compatibility="5.1.006" expanded="true" height="60" name="Write CSV" width="90" x="722" y="29">
            <parameter key="csv_file" value="C:\Documents and Settings\Administrator.KNOWLEDG-P6715Y\My Documents\bla.csv"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Preprocessing" to_port="in 1"/>
          <connect from_op="Preprocessing" from_port="out 1" to_op="FPGrowth" to_port="example set"/>
          <connect from_op="FPGrowth" from_port="frequent sets" to_op="AssociationRuleGenerator" to_port="item sets"/>
          <connect from_op="AssociationRuleGenerator" from_port="rules" to_op="Execute Script" to_port="input 1"/>
          <connect from_op="Execute Script" from_port="output 1" to_op="Write CSV" to_port="input"/>
          <connect from_op="Write CSV" from_port="through" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Before execution, the CSV writer cannot know that it will in fact receive examples, so it reports an error. Just ignore it, as the process runs fine.

  • land New Altair Community Member
    Hi,
    In order to apply Association Rules, you can use the Association Rules Applier. It will create one column for each suggested item of a conclusion. The value in the column depends on the confidence. If an item is suggested by more than one rule, the confidences have to be aggregated, and a parameter controls how this is done.
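
    To make the idea concrete, here is a minimal sketch in Python of what applying rules to a new basket amounts to. It is not the operator's actual implementation: the rule list, the item names, the `apply_rules` function, and the choice to skip items already in the basket are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rules: (premise items, conclusion items, confidence).
RULES = [
    (frozenset({"bread"}), frozenset({"butter"}), 0.8),
    (frozenset({"bread", "milk"}), frozenset({"butter"}), 0.9),
    (frozenset({"beer"}), frozenset({"chips"}), 0.75),
]


def apply_rules(basket, rules, aggregate=max):
    """Return {suggested item: aggregated confidence} for one basket."""
    basket = set(basket)
    fired = defaultdict(list)
    for premise, conclusion, confidence in rules:
        # A rule fires only if its whole premise is in the basket.
        if premise <= basket:
            # Assumption: only suggest items the basket does not contain yet.
            for item in conclusion - basket:
                fired[item].append(confidence)
    # Several rules may suggest the same item, so their confidences
    # are combined with the chosen aggregation function.
    return {item: aggregate(confs) for item, confs in fired.items()}


suggestions = apply_rules({"bread", "milk"}, RULES)
print(suggestions)
```

    Here two rules suggest butter, and `max` keeps the stronger one; passing `aggregate=min` would keep the weaker one instead. Each key of the result corresponds to one of the per-item columns described above.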

    @Haddock,
    have you ever thought of building an extension that provides all these pieces of code as a regular operator?

    Greetings,
    Sebastian
  • bkruger New Altair Community Member
    Thanks Haddock, I got it working. I also changed the Groovy script slightly and now it can write to the DB as well. The problem was with "conviction": the column is of type float, and when a rule's conviction comes out as "infinity", SQL returns an error about a non-float data type.
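
    For anyone hitting the same error: conviction is usually defined as (1 - support of the conclusion) / (1 - confidence), so any rule with confidence 1 has infinite conviction. The sketch below, in Python rather than Groovy, shows one possible way to handle that before a database write by storing NULL. The function names are made up for illustration, and this is not necessarily the change made to the script above.

```python
import math


def conviction(conclusion_support, confidence):
    """Conviction of a rule: (1 - supp(conclusion)) / (1 - confidence)."""
    if confidence >= 1.0:
        # The denominator is zero, so conviction is infinite by definition.
        return math.inf
    return (1.0 - conclusion_support) / (1.0 - confidence)


def to_sql_value(value):
    """Map non-finite floats (inf, -inf, NaN) to None, i.e. SQL NULL."""
    return value if math.isfinite(value) else None


for supp, conf in [(0.5, 0.75), (0.4, 1.0)]:
    print(conf, to_sql_value(conviction(supp, conf)))
```

    Capping the value at some large finite number instead of using NULL would also work, at the cost of making those rows look like ordinary measurements.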

    Sebastian, thanks to you too. I have managed to apply the association rules, but my biggest problem is still how to run FP-Growth and Association rules PER customer and not across the whole bunch of customers. Currently my SQL uses where customer = 'xyz', and then I repeat this with where customer = 'abc', but that is such a long and dumb way of doing it. I want to run one process and get the Association rules PER customer. Any ideas? I explained this problem in detail in my other post on Association rules in this forum.
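
    To illustrate the shape of what I am after, here is a rough sketch in Python: read all rows once, group them by customer, and mine each group separately. The row layout and function names are hypothetical, and the simple single-item rule counter only stands in for FP-Growth and the rule generator; it is not FP-Growth. The thresholds mirror the min_support of 0.1 and min_confidence of 0.7 used in the process above.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical rows as they might come from one SQL query:
# (customer, transaction id, item).
ROWS = [
    ("abc", 1, "bread"), ("abc", 1, "butter"),
    ("abc", 2, "bread"), ("abc", 2, "butter"),
    ("abc", 3, "bread"),
    ("xyz", 4, "beer"), ("xyz", 4, "chips"),
    ("xyz", 5, "beer"), ("xyz", 5, "chips"),
]


def group_by_customer(rows):
    """Return {customer: [set of items per transaction, ...]}."""
    baskets = defaultdict(lambda: defaultdict(set))
    for customer, transaction, item in rows:
        baskets[customer][transaction].add(item)
    return {c: list(t.values()) for c, t in baskets.items()}


def pair_rules(baskets, min_support=0.1, min_confidence=0.7):
    """Find single-item rules a -> b as (a, b, confidence) tuples."""
    n = len(baskets)
    items = set().union(*baskets)
    rules = []
    for a, b in permutations(sorted(items), 2):
        both = sum(1 for t in baskets if a in t and b in t)
        has_a = sum(1 for t in baskets if a in t)
        if has_a and both / n >= min_support and both / has_a >= min_confidence:
            rules.append((a, b, both / has_a))
    return rules


# One pass over the data, one rule set per customer.
rules_per_customer = {
    customer: pair_rules(baskets)
    for customer, baskets in group_by_customer(ROWS).items()
}
print(rules_per_customer)
```

    The point is only the outer loop: the query runs once without a customer filter, and the mining step is repeated per group.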

    Cheers
    BK
  • land New Altair Community Member
    OK, then we should continue the discussion over there.

    Greetings,
    Sebastian