Input for AssociationRuleGenerator
Legacy User
New Altair Community Member
Hi,
I'm new to RapidMiner, so the question might be dumb, but I hope you'll help me anyway.
I want to use the AssociationRuleGenerator and I found the tutorial on how to use it, but my input format is different and I'm not sure how to configure RapidMiner to work with my input.
I have two formats available:
1. 1NF
A CSV file with two columns. The first column contains the transaction ID, the second the items.
Example:
TID | ITEM |
0 | 1 |
0 | 2 |
0 | 3 |
1 | 1 |
1 | 3 |
... | ... |
2. Binary Bitmap
A CSV file with the transaction ID and the items as columns. The values for the items are 0 and 1 to indicate whether the transaction contains the item or not.
Example:
TID | 1 | 2 | 3 |
0 | 1 | 1 | 1 |
1 | 1 | 0 | 1 |
... | ... | ... | ... |
Can anyone tell me how I can use one or both of the formats for generating association rules? I would personally prefer the first format, since we need to convert the data to get the second, but any solution that makes it work will really help me.
Thanks a lot in advance,
Stampede
Answers
Hi,
you could simply use the second option: just load the data set with one of the file based input operators and transform the numbers into binominal values with the corresponding preprocessing operators. Then you can apply FPGrowth.
Cheers,
Ingo
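For readers starting from the first (1NF) format, the conversion to the bitmap layout can be sketched outside RapidMiner. This is only a minimal illustration in Python, not a RapidMiner operator: the function name `pairs_to_bitmap` is made up for this example, the sample data is the TID/ITEM example from the question, and writing "true"/"false" instead of 1/0 is an assumption made so that no numeric-to-binominal step is needed afterwards.

```python
import csv
import io


def pairs_to_bitmap(rows):
    """Turn (tid, item) pairs into one true/false row per transaction."""
    transactions = {}
    for tid, item in rows:
        transactions.setdefault(tid, set()).add(item)
    # One column per distinct item, in sorted order (string sort).
    items = sorted({item for found in transactions.values() for item in found})
    header = ["TID"] + items
    table = [
        [tid] + ["true" if item in transactions[tid] else "false" for item in items]
        for tid in sorted(transactions)
    ]
    return header, table


# Sample data copied from the 1NF example in the question.
sample = "TID,ITEM\n0,1\n0,2\n0,3\n1,1\n1,3\n"
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header line
header, table = pairs_to_bitmap((tid, item) for tid, item in reader)

for line in [header] + table:
    print(",".join(line))
```

Whether RapidMiner then treats "true" as the positive value depends on how the file is loaded, which is exactly the issue discussed further down in this thread.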
Hi,
thanks for the answer, but I knew that much from the tutorial. Maybe I'm just slow on this one, but I just can't find a fitting input and preprocessing operator.
My main problem is: I tried a lot of programs for association rule generation, but all of them interpreted the zeros as values and not as "false". So I got negative association rules. I hope that RapidMiner will solve this problem, since I read that it can handle bitmaps.
If someone could just tell me the fitting input operator and, if needed, the correct preprocessing operator, that would be really great.
Thanks a lot,
Stampede
Hi,
sorry for the inconvenience, but I found a partially working solution.
I get association rules from my input data now, but the rules make no sense (on a manually generated example).
For example, I have this (slightly silly) data set:
CAR | APPARTEMENT | VILLA | POOR | AVERAGE | RICH |
false | true | false | true | false | false |
true | true | false | false | true | false |
true | false | true | false | false | true |
But I get rules like:
CAR -> POOR
AVERAGE -> POOR
RICH -> POOR
VILLA -> POOR
CAR v APPARTEMENT v VILLA -> POOR
CAR v APPARTEMENT v VILLA v RICH -> POOR
And I don't know where these rules come from. I used CSVExampleSource, FPGrowth and AssociationRuleGenerator. It worked without any configuration.
If anyone can tell me whether I made any mistakes here or missed something (for example preprocessing, but I can't think of any), I would be really thankful!
Greetings,
Stampede
Hi,
the reason is that you have to define which value should be regarded as "negative" and which value should be regarded as "positive". You can do this by using an .aml file and using the ExampleSource operator (I have attached the .aml file and the corresponding .dat file to this message). Then the result will be like you would expect it. Please note that the first value which is defined for each attribute in the .aml file will be seen as positive - in this case it is "true".
And here are the generated rules:
[VILLA] --> [RICH] (confidence: 1.000)
[RICH] --> [VILLA] (confidence: 1.000)
[CAR] --> [VILLA] (confidence: 1.000)
[CAR] --> [RICH] (confidence: 1.000)
[APPARTEMENT] --> [POOR] (confidence: 1.000)
[CAR] --> [AVERAGE] (confidence: 1.000)
[APPARTEMENT] --> [AVERAGE] (confidence: 1.000)
[VILLA, POOR] --> [RICH] (confidence: 1.000)
[RICH, POOR] --> [VILLA] (confidence: 1.000)
[VILLA, AVERAGE] --> [RICH] (confidence: 1.000)
[RICH, AVERAGE] --> [VILLA] (confidence: 1.000)
[CAR] --> [VILLA, RICH] (confidence: 1.000)
[VILLA, CAR] --> [RICH] (confidence: 1.000)
[RICH, CAR] --> [VILLA] (confidence: 1.000)
...
Hope that helps,
Ingo
[attachment deleted by admin]
Hi,
Thanks a lot for the answer and the help, but it still doesn't work. In your example result (I was able to produce the same results), the following rules are generated:
mierswa wrote:
[VILLA, POOR] --> [RICH] (confidence: 1.000)
[RICH, POOR] --> [VILLA] (confidence: 1.000)
[VILLA, AVERAGE] --> [RICH] (confidence: 1.000)
[RICH, AVERAGE] --> [VILLA] (confidence: 1.000)
Since there is no transaction where someone is POOR and RICH or AVERAGE and RICH, this doesn't make sense. I'll keep trying, but I hope you might have some ideas.
Thanks a lot,
Stampede
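The objection above can be checked by hand. The following is a minimal Python sketch, not RapidMiner's FPGrowth implementation: it counts support and confidence directly on the three-transaction CAR/APPARTEMENT/VILLA example posted earlier, treating only "true" as "item present". The helper names `count`, `support` and `confidence` are made up for this illustration.

```python
ITEMS = ["CAR", "APPARTEMENT", "VILLA", "POOR", "AVERAGE", "RICH"]
ROWS = [
    [False, True, False, True, False, False],
    [True, True, False, False, True, False],
    [True, False, True, False, False, True],
]
# Each transaction becomes the set of items whose flag is true.
TRANSACTIONS = [{name for name, flag in zip(ITEMS, row) if flag} for row in ROWS]


def count(itemset):
    """Number of transactions that contain every item of the itemset."""
    wanted = set(itemset)
    return sum(1 for transaction in TRANSACTIONS if wanted <= transaction)


def support(itemset):
    """Fraction of transactions that contain the itemset."""
    return count(itemset) / len(TRANSACTIONS)


def confidence(premise, conclusion):
    """Confidence of premise -> conclusion, or None if the premise never occurs."""
    premise_count = count(premise)
    if premise_count == 0:
        return None
    return count(set(premise) | set(conclusion)) / premise_count


print(support(["VILLA", "POOR"]))             # premise of the disputed rule
print(confidence(["VILLA", "POOR"], ["RICH"]))
print(confidence(["VILLA"], ["RICH"]))
print(confidence(["CAR"], ["VILLA"]))
```

Under this reading the premise {VILLA, POOR} occurs in no transaction, so a rule built on it has no support at all, and CAR -> VILLA holds in only one of the two CAR transactions.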
Hi,
thanks again for this note - I totally missed this. I tried FPGrowth on this data set after first applying the operator Nominal2Binominal, and then the results seem to be correct:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="C:\Dokumente und Einstellungen\Mierswa\Eigene Dateien\rm_workspace\fp_growth\transformed.aml"/>
    </operator>
    <operator name="Nominal2Binominal" class="Nominal2Binominal">
    </operator>
    <operator name="FPGrowth" class="FPGrowth">
        <parameter key="find_min_number_of_itemsets" value="false"/>
        <parameter key="min_support" value="0.3"/>
    </operator>
    <operator name="AssociationRuleGenerator" class="AssociationRuleGenerator">
        <parameter key="gain_theta" value="0.0"/>
        <parameter key="keep_frequent_item_sets" value="true"/>
    </operator>
</operator>
The drawback, however, is that the rules also contain XXX=false items, which are often not desired. We will check the behavior of FPGrowth, but we will not manage this before the next release.
Thanks again and cheers,
Ingo
Hi,
thank you very much for your help. It works now and I'll just try to remove the negative association rules afterwards.
Thanks again,
Stampede
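Removing the negative rules afterwards could look roughly like the Python sketch below. It assumes the rules have been exported as (premise, conclusion) pairs whose items are written as "NAME = value"; that textual format, the helper names, and the three sample rules are all made up for illustration and are not actual RapidMiner output.

```python
def is_negative(item):
    """True for items of the assumed form 'NAME = false'."""
    _name, separator, value = item.partition("=")
    return separator == "=" and value.strip().lower() == "false"


def drop_negative_rules(rules):
    """Keep only rules in which no premise or conclusion item is negative."""
    return [
        (premise, conclusion)
        for premise, conclusion in rules
        if not any(is_negative(item) for item in list(premise) + list(conclusion))
    ]


# Hypothetical input, invented for this example.
rules = [
    (["VILLA = true"], ["RICH = true"]),
    (["VILLA = true"], ["POOR = false"]),
    (["CAR = false"], ["APPARTEMENT = true"]),
]
kept = drop_negative_rules(rules)
print(kept)
```

Filtering after the fact only hides the unwanted rules; the frequent item sets containing "= false" items are still computed.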
Hello,
you could also get rid of the ... = false features by removing them first with the FeatureNameRemoval or the new AttributeFilter operator. This should also reduce running time.
Cheers,
Ingo