Association Rules
Spark22
New Altair Community Member
Hi there,
First of all, let me apologize for my poor English. I'm from Germany and English isn't one of my best skills.
I'm totally new to RapidMiner and I have to do a little homework for my university. I have to analyse 100k records for association rules. The records aren't really complex, but I have zero experience in working with RapidMiner. One record consists of one customer ID, one article ID and an integer variable between 0 and 2 with the meaning: 0 = article was viewed, 1 = article was placed in the shopping basket and 2 = article was purchased. Here are some examples:
ID article action
1 23 0
1 92 0
1 40 2
2 92 1
2 12 0
In this example customer 1 viewed articles 23 and 92 and then bought article 40. Customer 2 put article 92 in his basket and viewed article 12. Each customer has exactly one action per article: there are no two records with the same customer ID and the same article ID but a different action. I hope you get the idea of this data.
Now I have to find rules like: if article 23 and article 92 were viewed, then article 40 was bought. But I have no clue how to find those rules with my RapidMiner Community Edition 5.3. My data is in an Excel file, so I read it in and, since the FP-Growth algorithm needs binominal attributes, I converted every attribute to binominal. Finally I connected the FP-Growth operator to the Create Association Rules operator. But the results I get are totally wrong. They look like: "If Article_ID then Action" or "If Article_ID and Customer_ID then Action". Instead I would like to get rules like: "If article_ID = 23 and action = 0, and article_ID = 92 and action = 0, then article_ID = 40 and action = 2".
Can anybody explain to me how association rules in RapidMiner work? Do I have to transform my data, or are there operator settings I have to change? Please help me.
Btw. again, sorry for my bad English.
Answers
-
I'd first think of concatenating "article" and "action" (23_0, 92_0, 40_2, ...). Then, for example, use the template from your samples repository (samples/processes/01_learner/25_fpgrowth) plus a final "Item Sets to Data" and export the result to Excel, where you can filter easily without Java coding etc.
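To show why the concatenation helps, here is a minimal sketch in plain Python (not RapidMiner). It assumes the five sample rows from the question; the function name `to_transactions` is made up for illustration. Once "article" and "action" are glued into one label, each customer becomes one basket of items, which is the shape FP-Growth expects.

```python
from collections import defaultdict

# Sample rows from the question: (customer ID, article ID, action code)
rows = [
    (1, 23, 0),
    (1, 92, 0),
    (1, 40, 2),
    (2, 92, 1),
    (2, 12, 0),
]

def to_transactions(rows):
    """Group rows by customer; each (article, action) pair becomes one item label."""
    baskets = defaultdict(set)
    for customer_id, article, action in rows:
        baskets[customer_id].add(f"{article}_{action}")
    return dict(baskets)

transactions = to_transactions(rows)
for customer_id, items in sorted(transactions.items()):
    print(customer_id, sorted(items))
```

Without the concatenation, "article" and "action" stay separate columns, so the only rules you can get are between whole columns (like "If Article_ID then Action") instead of between concrete items such as 23_0 and 40_2.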
-
Thanks for your answer. But I don't get why concatenating is helpful in this situation. I also looked at the template and didn't understand which attribute I had to filter. Can you help me with that?
-
Hmm, here is a quick & dirty example process. You get rules like "If article = 76 and action = 2 then article = 46 and action = 2" and so on.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_transaction_data" compatibility="5.3.008" expanded="true" height="60" name="Generate Transaction Data" width="90" x="45" y="30">
        <parameter key="number_transactions" value="100000"/>
      </operator>
      <operator activated="true" class="rename" compatibility="5.3.008" expanded="true" height="76" name="Rename" width="90" x="179" y="30">
        <parameter key="old_name" value="Id"/>
        <parameter key="new_name" value="ID"/>
        <list key="rename_additional_attributes">
          <parameter key="Item" value="article"/>
        </list>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Generate Attributes" width="90" x="313" y="30">
        <list key="function_descriptions">
          <parameter key="action" value="round(rand()*2+1)"/>
        </list>
      </operator>
      <operator activated="true" class="generate_concatenation" compatibility="5.3.008" expanded="true" height="76" name="CONCAT" width="90" x="447" y="30">
        <parameter key="first_attribute" value="article"/>
        <parameter key="second_attribute" value="action"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Attributes" width="90" x="581" y="30">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Amount||ID|article_action"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="pivot" compatibility="5.3.008" expanded="true" height="76" name="Pivot" width="90" x="112" y="165">
        <parameter key="group_attribute" value="ID"/>
        <parameter key="index_attribute" value="article_action"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="5.3.008" expanded="true" height="94" name="Replace Missing Values" width="90" x="246" y="165">
        <parameter key="default" value="zero"/>
        <list key="columns"/>
      </operator>
      <operator activated="true" class="numerical_to_binominal" compatibility="5.3.008" expanded="true" height="76" name="Numerical to Binominal" width="90" x="380" y="165"/>
      <operator activated="true" class="fp_growth" compatibility="5.3.008" expanded="true" height="76" name="FP-Growth" width="90" x="112" y="300">
        <parameter key="min_support" value="0.1"/>
        <parameter key="max_items" value="2"/>
      </operator>
      <operator activated="true" class="create_association_rules" compatibility="5.3.008" expanded="true" height="76" name="Create Association Rules" width="90" x="246" y="300">
        <parameter key="min_confidence" value="0.1"/>
      </operator>
      <operator activated="true" class="item_sets_to_data" compatibility="5.3.008" expanded="true" height="76" name="Item Sets to Data" width="90" x="380" y="300"/>
      <operator activated="true" class="write_excel" compatibility="5.3.008" expanded="true" height="76" name="Write Excel" width="90" x="313" y="390">
        <parameter key="excel_file" value="c:\itemsets.xls"/>
      </operator>
      <operator activated="true" class="split" compatibility="5.3.008" expanded="true" height="76" name="Split" width="90" x="447" y="390">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Items"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="5.3.008" expanded="true" height="76" name="Filter Examples" width="90" x="581" y="390">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="Size=2"/>
      </operator>
      <connect from_op="Generate Transaction Data" from_port="output" to_op="Rename" to_port="example set input"/>
      <connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="CONCAT" to_port="example set input"/>
      <connect from_op="CONCAT" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Pivot" to_port="example set input"/>
      <connect from_op="Pivot" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
      <connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
      <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
      <connect from_op="Create Association Rules" from_port="item sets" to_op="Item Sets to Data" to_port="frequent item sets"/>
      <connect from_op="Item Sets to Data" from_port="example set" to_op="Write Excel" to_port="input"/>
      <connect from_op="Write Excel" from_port="through" to_op="Split" to_port="example set input"/>
      <connect from_op="Split" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
-
I don't get the rules. After testing your process, I opened the new Excel file, but I simply don't understand how to read those rules.
Items                                     Size   Frequenzy   Support   Score
Amount_Item 72_1.0                        1,0    156,0       ,2        1,0
Amount_Item 72_3.0                        1,0    155,0       ,2        1,0
Amount_Item 25_1.0                        1,0    151,0       ,2        1,0
Amount_Item 76_2.0, Amount_Item 46_2.0    2,0    319,0       ,3        1,0
Amount_Item 76_2.0, Amount_Item 53_2.0    2,0    335,0       ,3        1,1
Amount_Item 76_2.0, Amount_Item 31_2.0    2,0    328,0       ,3        1,1
Amount_Item 76_2.0, Amount_Item 3_2.0     2,0    326,0       ,3        1,1
Those are some examples. And I don't know what the 3 in Amount_Item 72_3.0 stands for.
Can you please help me one last time?
-
Sure. It stands for "article 72 was purchased". (I mapped view/cart/purchase to 1, 2, 3 instead of 0, 1, 2.)
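If you want to decode the labels automatically, a small Python sketch could look like this. It assumes the labels always have the form "Amount_Item <article>_<action>.0" as in the Excel output quoted in this thread, and it uses the 1, 2, 3 mapping of the example process; the names `ACTIONS`, `LABEL` and `decode` are made up for illustration.

```python
import re

# Assumed mapping used by the example process (1, 2, 3 instead of 0, 1, 2).
ACTIONS = {1: "viewed", 2: "placed in basket", 3: "purchased"}

# "Amount_" comes from the pivoted attribute, "Item 72" is the article,
# and "3.0" is the action code stored as a real number.
LABEL = re.compile(r"^Amount_Item (\d+)_(\d+)(?:\.0)?$")

def decode(label):
    """Split an item label into (article number, action description)."""
    match = LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unexpected item label: {label!r}")
    article = int(match.group(1))
    action = ACTIONS[int(match.group(2))]
    return article, action

print(decode("Amount_Item 72_3.0"))
```

With your own data the codes would be 0, 1, 2 and the prefix may differ, so the pattern and the mapping would need adjusting.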
-
Ah, ok. But I still wonder how to read those rules. What does:
Items                 Size   Frequenzy   Support   Score
Amount_Item 72_1.0    1,0    156,0       ,2        1,0
mean?
-
It's not a "rule" but just info on that single item. IMHO, a rule has a Size >= 2. (That's the reason why I used "Filter Examples" in the example process above.) Frequenzy [sic] and Support tell you how often that item or item set occurs (156 times = .2 = 20% of customers). Score is a measure of how interesting the rule is; I don't know how it's calculated. You probably have to look into the source code.
Please note that every operator offers detailed help that will explain most of these things.
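For what it's worth, here is a rough Python sketch of how frequency and support are usually defined, plus the confidence of a rule. The baskets are invented toy data, the function names are made up, and this is the textbook definition, not necessarily what RapidMiner does internally. It does not cover the Score column.

```python
# Toy baskets: one set of "article_action" labels per customer (invented data).
baskets = [
    {"76_2", "46_2", "72_1"},
    {"76_2", "46_2"},
    {"76_2", "53_2"},
    {"72_1"},
    {"25_1", "46_2"},
]

def frequency(itemset, baskets):
    """Number of baskets that contain every item of the item set."""
    return sum(1 for basket in baskets if itemset <= basket)

def support(itemset, baskets):
    """Frequency divided by the total number of baskets."""
    return frequency(itemset, baskets) / len(baskets)

def confidence(premise, conclusion, baskets):
    """Share of the baskets containing the premise that also contain the conclusion."""
    return frequency(premise | conclusion, baskets) / frequency(premise, baskets)

pair = {"76_2", "46_2"}
print(frequency(pair, baskets))
print(support(pair, baskets))
print(confidence({"76_2"}, {"46_2"}, baskets))
```

So a row with Size 1 only says how common one item is, while a row with Size 2 can be read as a candidate rule "if 76_2 then 46_2" whose confidence tells you how often the conclusion holds when the premise does.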