"Association rules"
bkruger
New Altair Community Member
Hi,
I have built a model with the FP-Growth and Association rules operators on training data. Can anyone please show me an example of how to apply the results back to a new (test) set of data?
Thanks
BK
Answers
-
I searched a bit and found this to be really easy and simple. Interpreting the results, on the other hand, takes a bit more thinking.
-
Hi there,
I'm developing an Association Rules generator in CUDA, and I use RapidMiner (RM) to compare results. To convert RM Association Rules to an example set I use a Groovy script, like this:
I'm also eager to hear from anybody with packages other than RM, such as Clementine, as I have a frequent item set benchmarking process and data which I would like to time on other platforms.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.006">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.0.000" expanded="true" name="Root">
<parameter key="logverbosity" value="warning"/>
<process expanded="true" height="217" width="745">
<operator activated="true" class="retrieve" compatibility="5.0.000" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="subprocess" compatibility="5.0.000" expanded="true" height="76" name="Preprocessing" width="90" x="180" y="30">
<process expanded="true">
<operator activated="true" class="discretize_by_frequency" compatibility="5.0.000" expanded="true" name="FrequencyDiscretization">
<parameter key="number_of_bins" value="5"/>
</operator>
<operator activated="true" class="nominal_to_binominal" compatibility="5.0.000" expanded="true" name="Nominal2Binominal">
<parameter key="transform_binominal" value="true"/>
</operator>
<connect from_port="in 1" to_op="FrequencyDiscretization" to_port="example set input"/>
<connect from_op="FrequencyDiscretization" from_port="example set output" to_op="Nominal2Binominal" to_port="example set input"/>
<connect from_op="Nominal2Binominal" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="fp_growth" compatibility="5.0.000" expanded="true" height="76" name="FPGrowth" width="90" x="313" y="30">
<parameter key="find_min_number_of_itemsets" value="false"/>
<parameter key="min_support" value="0.1"/>
</operator>
<operator activated="true" class="create_association_rules" compatibility="5.0.000" expanded="true" height="76" name="AssociationRuleGenerator" width="90" x="313" y="165">
<parameter key="min_confidence" value="0.7"/>
</operator>
<operator activated="true" class="execute_script" compatibility="5.0.000" expanded="true" height="76" name="Execute Script" width="90" x="581" y="75">
<parameter key="script" value="import com.rapidminer.tools.Ontology;&#10;import com.rapidminer.operator.learner.associations.*;&#10;&#10;AssociationRules rules = input[0];&#10;&#10;// construct attribute set&#10;Attribute[] attributes = new Attribute[11];&#10;attributes[0] = AttributeFactory.createAttribute(&quot;Premise&quot;, Ontology.STRING);&#10;attributes[1] = AttributeFactory.createAttribute(&quot;Premise Items&quot;, Ontology.INTEGER);&#10;attributes[2] = AttributeFactory.createAttribute(&quot;Conclusion&quot;, Ontology.STRING);&#10;attributes[3] = AttributeFactory.createAttribute(&quot;Conclusion Items&quot;, Ontology.INTEGER);&#10;attributes[4] = AttributeFactory.createAttribute(&quot;Confidence&quot;, Ontology.REAL);&#10;attributes[5] = AttributeFactory.createAttribute(&quot;Conviction&quot;, Ontology.REAL);&#10;attributes[6] = AttributeFactory.createAttribute(&quot;Gain&quot;, Ontology.REAL);&#10;attributes[7] = AttributeFactory.createAttribute(&quot;Laplace&quot;, Ontology.REAL);&#10;attributes[8] = AttributeFactory.createAttribute(&quot;Lift&quot;, Ontology.REAL);&#10;attributes[9] = AttributeFactory.createAttribute(&quot;Ps&quot;, Ontology.REAL);&#10;attributes[10] = AttributeFactory.createAttribute(&quot;Total Support&quot;, Ontology.REAL);&#10;&#10;MemoryExampleTable table = new MemoryExampleTable(attributes);&#10;DataRowFactory ROW_FACTORY = new DataRowFactory(0);&#10;String[] strings = new String[11];&#10;&#10;for (AssociationRule rule : rules) {&#10;    // construct example data&#10;    strings[0] = rule.toPremiseString();&#10;    strings[1] = rule.premise.size().toString();&#10;    strings[2] = rule.toConclusionString();&#10;    strings[3] = rule.conclusion.size().toString();&#10;    strings[4] = rule.getConfidence().toString();&#10;    strings[5] = rule.getConviction().toString();&#10;    strings[6] = rule.getGain().toString();&#10;    strings[7] = rule.getLaplace().toString();&#10;    strings[8] = rule.getLift().toString();&#10;    strings[9] = rule.getPs().toString();&#10;    strings[10] = rule.getTotalSupport().toString();&#10;    // make and add row&#10;    DataRow row = ROW_FACTORY.create(strings, attributes);&#10;    table.addDataRow(row);&#10;}&#10;&#10;ExampleSet exampleSet = table.createExampleSet();&#10;return exampleSet;"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Preprocessing" to_port="in 1"/>
<connect from_op="Preprocessing" from_port="out 1" to_op="FPGrowth" to_port="example set"/>
<connect from_op="FPGrowth" from_port="frequent sets" to_op="AssociationRuleGenerator" to_port="item sets"/>
<connect from_op="AssociationRuleGenerator" from_port="rules" to_op="Execute Script" to_port="input 1"/>
<connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
-
Hi,
Thanks for the response! I have never used a Groovy script before, but I have added the part you showed me to my process. I get all the results as I think they are intended to be, compared to the Iris example, but the process still won't write them to either a DB or a CSV file.
Any suggestions?
Thanks
BK
-
Hi,
You need to deliver the results explicitly from the script's output port, as in the process below.
Prior to execution the CSV writer cannot know that it will in fact get examples, so it posts an error. Just ignore it, as the process runs fine.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.006">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Root">
<parameter key="logverbosity" value="warning"/>
<process expanded="true" height="341" width="815">
<operator activated="true" class="retrieve" compatibility="5.1.006" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="subprocess" compatibility="5.1.006" expanded="true" height="76" name="Preprocessing" width="90" x="180" y="30">
<process expanded="true" height="463" width="896">
<operator activated="true" class="discretize_by_frequency" compatibility="5.1.006" expanded="true" height="94" name="FrequencyDiscretization" width="90" x="112" y="30">
<parameter key="number_of_bins" value="5"/>
</operator>
<operator activated="true" class="nominal_to_binominal" compatibility="5.1.006" expanded="true" height="94" name="Nominal2Binominal" width="90" x="470" y="30">
<parameter key="transform_binominal" value="true"/>
</operator>
<connect from_port="in 1" to_op="FrequencyDiscretization" to_port="example set input"/>
<connect from_op="FrequencyDiscretization" from_port="example set output" to_op="Nominal2Binominal" to_port="example set input"/>
<connect from_op="Nominal2Binominal" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="fp_growth" compatibility="5.1.006" expanded="true" height="76" name="FPGrowth" width="90" x="313" y="30">
<parameter key="find_min_number_of_itemsets" value="false"/>
<parameter key="min_support" value="0.1"/>
</operator>
<operator activated="true" class="create_association_rules" compatibility="5.1.006" expanded="true" height="76" name="AssociationRuleGenerator" width="90" x="447" y="30">
<parameter key="min_confidence" value="0.7"/>
</operator>
<operator activated="true" class="execute_script" compatibility="5.1.006" expanded="true" height="76" name="Execute Script" width="90" x="581" y="30">
<parameter key="script" value="import com.rapidminer.tools.Ontology;&#10;import com.rapidminer.operator.learner.associations.*;&#10;&#10;exampleSetOutput = operator.getOutputPorts().getPortByIndex(0);&#10;AssociationRules rules = input[0];&#10;&#10;// construct attribute set&#10;Attribute[] attributes = new Attribute[11];&#10;attributes[0] = AttributeFactory.createAttribute(&quot;Premise&quot;, Ontology.STRING);&#10;attributes[1] = AttributeFactory.createAttribute(&quot;Premise Items&quot;, Ontology.INTEGER);&#10;attributes[2] = AttributeFactory.createAttribute(&quot;Conclusion&quot;, Ontology.STRING);&#10;attributes[3] = AttributeFactory.createAttribute(&quot;Conclusion Items&quot;, Ontology.INTEGER);&#10;attributes[4] = AttributeFactory.createAttribute(&quot;Confidence&quot;, Ontology.REAL);&#10;attributes[5] = AttributeFactory.createAttribute(&quot;Conviction&quot;, Ontology.REAL);&#10;attributes[6] = AttributeFactory.createAttribute(&quot;Gain&quot;, Ontology.REAL);&#10;attributes[7] = AttributeFactory.createAttribute(&quot;Laplace&quot;, Ontology.REAL);&#10;attributes[8] = AttributeFactory.createAttribute(&quot;Lift&quot;, Ontology.REAL);&#10;attributes[9] = AttributeFactory.createAttribute(&quot;Ps&quot;, Ontology.REAL);&#10;attributes[10] = AttributeFactory.createAttribute(&quot;Total Support&quot;, Ontology.REAL);&#10;&#10;table = new MemoryExampleTable(attributes);&#10;ROW_FACTORY = new DataRowFactory(0);&#10;String[] strings = new String[11];&#10;&#10;for (AssociationRule rule : rules) {&#10;    // construct example data&#10;    strings[0] = rule.toPremiseString();&#10;    strings[1] = rule.premise.size().toString();&#10;    strings[2] = rule.toConclusionString();&#10;    strings[3] = rule.conclusion.size().toString();&#10;    strings[4] = rule.getConfidence().toString();&#10;    strings[5] = rule.getConviction().toString();&#10;    strings[6] = rule.getGain().toString();&#10;    strings[7] = rule.getLaplace().toString();&#10;    strings[8] = rule.getLift().toString();&#10;    strings[9] = rule.getPs().toString();&#10;    strings[10] = rule.getTotalSupport().toString();&#10;    // make and add row&#10;    DataRow row = ROW_FACTORY.create(strings, attributes);&#10;    table.addDataRow(row);&#10;}&#10;&#10;exampleSet = table.createExampleSet();&#10;// hand the example set to output port 1 so that Write CSV receives it&#10;exampleSetOutput.deliver(exampleSet);"/>
</operator>
<operator activated="true" class="write_csv" compatibility="5.1.006" expanded="true" height="60" name="Write CSV" width="90" x="722" y="29">
<parameter key="csv_file" value="C:\Documents and Settings\Administrator.KNOWLEDG-P6715Y\My Documents\bla.csv"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Preprocessing" to_port="in 1"/>
<connect from_op="Preprocessing" from_port="out 1" to_op="FPGrowth" to_port="example set"/>
<connect from_op="FPGrowth" from_port="frequent sets" to_op="AssociationRuleGenerator" to_port="item sets"/>
<connect from_op="AssociationRuleGenerator" from_port="rules" to_op="Execute Script" to_port="input 1"/>
<connect from_op="Execute Script" from_port="output 1" to_op="Write CSV" to_port="input"/>
<connect from_op="Write CSV" from_port="through" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
-
Hi,
In order to apply association rules, you can use the Association Rules Applier. It will create one column for each suggested item of a conclusion. The value in the column depends on the confidence. If an item is suggested by more than one rule, the confidences have to be aggregated; a parameter controls how this is done.
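To make the idea concrete outside RapidMiner, here is a minimal Python sketch of what applying rules to a new basket could look like. It is not the operator's actual implementation: the rule representation, the function name `apply_rules`, and the choice of `max` as default aggregation are all assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical rule representation: (premise items, conclusion items, confidence)
Rule = Tuple[FrozenSet[str], FrozenSet[str], float]


def apply_rules(
    basket: Iterable[str],
    rules: List[Rule],
    aggregate: Callable[[List[float]], float] = max,
) -> Dict[str, float]:
    """Return {suggested item: aggregated confidence} for one new basket."""
    items = frozenset(basket)
    fired: Dict[str, List[float]] = {}
    for premise, conclusion, confidence in rules:
        # A rule fires only if every premise item is present in the basket.
        if premise <= items:
            for item in conclusion:
                fired.setdefault(item, []).append(confidence)
    # Several rules may suggest the same item; combine their confidences.
    return {item: aggregate(confs) for item, confs in fired.items()}


rules = [
    (frozenset({"bread"}), frozenset({"butter"}), 0.8),
    (frozenset({"bread", "jam"}), frozenset({"butter"}), 0.9),
    (frozenset({"beer"}), frozenset({"chips"}), 0.75),
]

print(apply_rules({"bread", "jam"}, rules))                 # butter via two rules, max taken
print(apply_rules({"bread", "jam"}, rules, aggregate=min))  # same rules, min taken
```

Each key of the returned dictionary corresponds to one of the "suggested item" columns described above, and the `aggregate` argument plays the role of the aggregation parameter.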
@Haddock,
have you ever thought of building an extension that packages all these pieces of code as regular operators?
Greetings,
Sebastian
-
Thanks Haddock, I got it working. I also changed the Groovy script slightly and now it can write to the DB as well. The problem was with "conviction": it is of type float, and when its value is "infinity", SQL returns an error about a non-float data type.
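For anyone hitting the same error: conviction is commonly defined as (1 - support(conclusion)) / (1 - confidence), so it becomes infinite whenever a rule has confidence 1. The post does not say what the script change was; the Python sketch below only illustrates one plausible workaround, which is to map non-finite values to NULL before inserting. The helper names `conviction` and `db_safe` are hypothetical.

```python
import math
from typing import Optional


def conviction(conclusion_support: float, confidence: float) -> float:
    """Standard conviction measure; infinite for rules with confidence 1."""
    if confidence >= 1.0:
        return math.inf
    return (1.0 - conclusion_support) / (1.0 - confidence)


def db_safe(value: float) -> Optional[float]:
    """Replace infinity/NaN with None, which DB drivers typically store as NULL."""
    return value if math.isfinite(value) else None


print(db_safe(conviction(0.5, 0.75)))  # finite value passes through unchanged
print(db_safe(conviction(0.5, 1.0)))   # infinite conviction becomes None
```

Capping the value at a large constant would work as well; which option is better depends on how the column is used downstream.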
Sebastian, thanks to you too, I have managed to apply the association rules, but my biggest problem still is how to run the FP-Growth and Association rules PER customer and not across the whole bunch of customers. Currently my SQL uses where customer = 'xyz' and then I repeat this for where customer = 'abc', but that is such a long and dumb way of doing it. I want to run one process and get the Association rules PER customer. Any ideas? I explained this problem in detail in my other post on Association rules in this forum.
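The general shape of a "one run, rules per customer" process is to group the transactions by customer first and then mine each group separately. The Python sketch below shows that grouping under loud assumptions: the input layout `(customer, transaction_id, item)` is invented for illustration, and a simple pairwise rule count stands in for FP-Growth to keep the example short. Inside RapidMiner the equivalent would presumably be a loop over the customer values wrapped around the FP-Growth and rule generation operators.

```python
from collections import defaultdict
from itertools import permutations


def rules_per_customer(rows, min_support=0.1, min_confidence=0.7):
    """rows: iterable of (customer, transaction_id, item) tuples (assumed layout).

    Returns {customer: [(premise, conclusion, support, confidence), ...]}
    using single-item premises and conclusions only (not full FP-Growth).
    """
    # customer -> transaction_id -> set of items
    baskets = defaultdict(lambda: defaultdict(set))
    for customer, tid, item in rows:
        baskets[customer][tid].add(item)

    result = {}
    for customer, by_tid in baskets.items():
        transactions = list(by_tid.values())
        n = len(transactions)
        item_count = defaultdict(int)
        pair_count = defaultdict(int)
        for t in transactions:
            for item in t:
                item_count[item] += 1
            # ordered pairs (a, b) represent candidate rules a -> b
            for a, b in permutations(sorted(t), 2):
                pair_count[(a, b)] += 1
        rules = []
        for (a, b), count in sorted(pair_count.items()):
            support = count / n                # share of this customer's transactions
            confidence = count / item_count[a]
            if support >= min_support and confidence >= min_confidence:
                rules.append((a, b, support, confidence))
        result[customer] = rules
    return result


rows = [
    ("abc", 1, "bread"), ("abc", 1, "butter"),
    ("abc", 2, "bread"), ("abc", 2, "butter"),
    ("abc", 3, "bread"),
    ("xyz", 1, "beer"), ("xyz", 1, "chips"),
    ("xyz", 2, "beer"),
]
for customer, rules in rules_per_customer(rows).items():
    print(customer, rules)
```

Support and confidence are computed relative to each customer's own transactions, which is what distinguishes this from running the mining once across all customers.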
Cheers
BK
-
OK, then we should continue the discussion there...
Greetings,
Sebastian