How to use RapidMiner to implement association rule modelling?
vilency
Sorry, I'm a newcomer and I'm confused about how to use association rule modelling. I have semi-binary data. As I read in the tutorial, the process must be Retrieve -> preprocessing -> FP-Growth -> association rules. I have already prepared the data for Retrieve (imported from Excel), but I don't know how to do the preprocessing. Can you help me? Thank you for your attention.
idjual k226 k227 k228 k229 k230 k231 k232 k233 k237 k239
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 1 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 1 0 0 0 0
7 0 0 0 0 0 1 0 0 0 0
8 0 0 0 0 0 1 0 0 1 0
9 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 1 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0
12 0 0 0 0 0 1 0 0 0 0
13 0 0 0 0 0 1 0 0 0 0
14 0 0 0 1 0 1 0 0 0 0
15 0 0 0 0 1 1 0 0 0 0
16 0 0 0 0 0 0 0 0 0 0
17 0 0 0 0 0 0 0 0 0 0
18 1 0 0 1 1 0 0 0 0 0
19 0 1 0 0 0 0 0 0 0 0
20 0 0 0 0 0 0 0 0 0 0
21 0 0 0 0 0 0 0 0 0 0
22 0 0 0 0 0 0 0 0 0 0
23 0 0 0 0 0 1 0 0 0 0
24 0 0 0 0 0 1 0 0 0 0
25 0 0 0 0 0 0 0 0 0 0
26 0 0 0 0 0 1 0 0 0 0
27 0 0 0 0 0 1 0 0 0 0
28 0 0 0 0 0 0 0 0 0 0
29 0 0 0 0 0 0 0 0 0 0
30 0 0 0 0 0 1 0 0 0 0
31 0 0 0 0 0 0 0 0 0 0
32 0 0 0 0 0 0 0 0 0 0
33 0 0 0 0 0 0 0 0 0 0
34 0 1 0 1 0 0 0 0 1 0
35 0 0 0 0 0 0 0 0 0 0
36 0 0 0 0 0 0 0 0 0 0
37 0 0 0 0 0 1 0 0 0 0
38 0 0 0 0 0 1 0 0 0 0
39 0 0 0 0 0 0 0 0 0 0
40 0 0 0 0 0 0 0 0 0 0
41 0 0 0 0 0 0 0 0 0 0
42 0 1 0 0 0 1 0 0 0 0
43 0 0 0 0 0 0 0 0 0 0
44 0 0 0 0 0 0 0 0 0 0
45 0 0 0 0 0 0 0 0 0 0
46 0 0 0 0 0 0 0 1 0 0
47 1 0 0 0 0 1 0 0 0 0
48 0 0 0 0 0 0 0 0 0 0
49 0 0 0 0 0 0 0 0 0 0
50 0 0 0 0 0 0 0 0 0 0
51 0 0 0 0 0 1 1 0 0 0
52 1 0 0 0 0 0 0 0 0 0
53 0 0 0 0 0 0 0 0 0 0
54 0 0 0 0 0 1 0 0 1 0
55 0 0 0 0 0 0 0 0 0 0
56 0 0 0 0 0 0 0 0 0 0
57 0 0 1 0 0 0 0 0 0 0
58 0 0 0 0 0 1 0 0 0 0
idjual is the sales id.
The k columns (k226, k227, ...) are categories.
Answers
Hi there,
You're right: if a column contains only 0s and 1s, I would assume it was binary/binominal, but RapidMiner (RM) needs to be told. Anyway, if I save your data as a CSV file, I can generate rules from it, but only if I set the bar very low! Here's how:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="353" width="934">
<operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="96" y="61">
<parameter key="file_name" value="C:\Haddock\vilency.csv"/>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="246" y="75">
<parameter key="name" value="id"/>
<parameter key="target_role" value="id"/>
</operator>
<operator activated="true" class="numerical_to_binominal" expanded="true" height="76" name="Numerical to Binominal" width="90" x="380" y="75"/>
<operator activated="true" class="fp_growth" expanded="true" height="76" name="FP-Growth" width="90" x="506" y="72">
<parameter key="min_support" value="0.1"/>
</operator>
<operator activated="true" class="create_association_rules" expanded="true" height="60" name="Create Association Rules" width="90" x="648" y="120">
<parameter key="min_confidence" value="0.5"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
<connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
<connect from_op="FP-Growth" from_port="example set" to_port="result 1"/>
<connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
<connect from_op="Create Association Rules" from_port="rules" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
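To see what the process above computes, here is a minimal Python sketch of the same pipeline on the 58 sample rows from the question, using the same min_support (0.1) and min_confidence (0.5). Assumptions: each sales id is reduced to the set of categories marked 1 (ids with no 1s are empty baskets that still count toward the total); frequent item sets are found by brute-force enumeration instead of FP-Growth (same frequent sets, just slower); `frequent_itemsets` and `association_rules` are hypothetical helpers, not RapidMiner APIs; and 0.03 is an arbitrary lower threshold picked for illustration. RapidMiner's own output may differ depending on its other operator settings.

```python
from itertools import combinations

# Categories marked 1 for each sales id, copied from the 58-row sample in the question.
# Ids not listed here (1, 2, 3, 5, ...) bought nothing but still count as transactions.
K231_ONLY = {4, 6, 7, 10, 12, 13, 23, 24, 26, 27, 30, 37, 38, 58}
MIXED = {
    8: {"k231", "k237"}, 14: {"k229", "k231"}, 15: {"k230", "k231"},
    18: {"k226", "k229", "k230"}, 19: {"k227"}, 34: {"k227", "k229", "k237"},
    42: {"k227", "k231"}, 46: {"k233"}, 47: {"k226", "k231"},
    51: {"k231", "k232"}, 52: {"k226"}, 54: {"k231", "k237"}, 57: {"k228"},
}
BASKETS = [MIXED.get(i, {"k231"} if i in K231_ONLY else set()) for i in range(1, 59)]


def frequent_itemsets(baskets, min_support):
    """Brute-force stand-in for FP-Growth: every item set with support >= min_support."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            count = sum(1 for basket in baskets if set(combo) <= basket)
            if count / n >= min_support:
                frequent[frozenset(combo)] = count / n
                found = True
        if not found:  # no frequent set of this size => none of any larger size
            break
    return frequent


def association_rules(frequent, min_confidence):
    """Split each frequent set into premise -> conclusion; keep rules above min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for premise in map(frozenset, combinations(sorted(itemset), size)):
                confidence = support / frequent[premise]  # subsets of frequent sets are frequent
                if confidence >= min_confidence:
                    rules.append((set(premise), set(itemset - premise), support, confidence))
    return rules


for min_support in (0.1, 0.03):
    sets = frequent_itemsets(BASKETS, min_support)
    rules = association_rules(sets, min_confidence=0.5)
    print(f"min_support={min_support}: {len(sets)} frequent sets, {len(rules)} rules")
    for premise, conclusion, support, confidence in rules:
        print(f"  {sorted(premise)} -> {sorted(conclusion)} "
              f"support={support:.3f} confidence={confidence:.3f}")
```

With min_support=0.1 only {k231} (21 of 58 sales) is frequent, and a single item cannot form a rule, which is why the bar has to be set so low on this sample. At 0.03 the sketch finds one rule, k237 -> k231, with support 2/58 and confidence 2/3. Brute force is fine for ten categories but grows exponentially with the number of items, which is one reason FP-Growth is used on real data.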
1. It's helpful. Thanks, Haddock, I really appreciate it. You said it would only generate rules if the bar is set low; you mean minimum support and minimum confidence, right?
2. Is that because the data sample is so small? The real data has 80,000 records. Does that affect the result if I put it into RapidMiner?
3. What should I set the minimum support and minimum confidence to for 80,000 records, to get accurate results for prediction?
4. I'm still confused by the results in RapidMiner's table view, maybe because I'm new to mining.
What do laplace, gain, p-s and conviction mean? I only understand support and confidence. Sorry for asking this...
Thanks
Hi there,
Here are my answers, in order:
1. Yep.
2. More data, longer run.
3. Whatever convinces you! There is no single correct answer.
4. I have this bookmarked http://michael.hahsler.net/research/association_rules/measures.html
Have fun!
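To make answers 3 and 4 a little more concrete, here is a Python sketch of support, confidence, lift and conviction using their standard definitions. The counts are taken from the 58-row sample in the question for the rule k237 -> k231 (k237 appears in 3 sales, k231 in 21, both together in 2). `rule_measures` is a hypothetical helper, and RapidMiner's own implementation may differ in details; exact fractions are used so the results are easy to check by hand.

```python
from fractions import Fraction


def rule_measures(n, n_xy, n_x, n_y):
    """Support, confidence, lift and conviction of X -> Y from transaction counts."""
    supp_xy = Fraction(n_xy, n)   # share of transactions containing X and Y
    supp_x = Fraction(n_x, n)     # share containing the premise X
    supp_y = Fraction(n_y, n)     # share containing the conclusion Y
    confidence = supp_xy / supp_x
    lift = confidence / supp_y
    # Conviction is undefined (infinite) when the rule is never wrong (confidence == 1).
    conviction = (1 - supp_y) / (1 - confidence) if confidence < 1 else float("inf")
    return {"support": supp_xy, "confidence": confidence,
            "lift": lift, "conviction": conviction}


# Rule k237 -> k231 in the sample data.
m = rule_measures(n=58, n_xy=2, n_x=3, n_y=21)
for name, value in m.items():
    print(f"{name:>10}: {float(value):.4f}")
```

A lift of about 1.84 means k231 is bought with k237 more often than its overall rate would suggest, and a conviction of about 1.91 means the rule would be wrong roughly 1.9 times as often if the two categories were independent. With only 3 supporting sales, though, such numbers are fragile.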
Hi,
I want to ask: does putting Set Role before the Numerical to Binominal operator tell RM which fields must be converted to binominal and which fields don't need to be converted?
Thanks
Hi,
By default, special attributes (attributes with a role other than regular) are excluded from the transformation to binominal. This behavior is controlled by the attribute subset selection parameters of the Numerical to Binominal operator.
So if you set the first column to the special role "id", it will be excluded from this transformation unless you change those parameter settings.
Greetings,
Sebastian
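Sebastian's explanation can be sketched in a few lines of Python. Assumptions: the id column is named idjual, as in the question, and the conversion mirrors Numerical to Binominal's default range, where 0 becomes false and any other value becomes true; `to_binominal` is a hypothetical helper, not a RapidMiner API.

```python
def to_binominal(row, special=("idjual",), low=0.0, high=0.0):
    """Turn regular numeric attributes into 'true'/'false'; leave special ones untouched.

    Values inside [low, high] become 'false' and everything else 'true'
    (assumed here to mirror Numerical to Binominal with its default range [0, 0]).
    """
    return {
        name: value if name in special
        else ("false" if low <= value <= high else "true")
        for name, value in row.items()
    }


# Sales id 8 from the sample table: categories k231 and k237 are marked 1.
header = "idjual k226 k227 k228 k229 k230 k231 k232 k233 k237 k239".split()
row = dict(zip(header, [int(v) for v in "8 0 0 0 0 0 1 0 0 1 0".split()]))

print(to_binominal(row))               # idjual keeps its value 8
print(to_binominal(row, special=()))   # idjual would become 'true'
```

The second call shows why Set Role comes first: without the id role, every non-zero sales id would turn into the item idjual=true, which would then show up in almost every frequent item set.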
Thanks, Sebastian.
I have already built the process up to Create Association Rules, as Haddock described, and it successfully generated some rules.
1. What I'm wondering is what the next operators do, such as Apply Association Rules, Generalized Sequential Patterns and Unify Item Sets.
2. If I just want to generate association rules, where should I stop the process: at Create Association Rules or at Unify Item Sets?
3. In the results view, the association rules result has Table View, Text View, Graph View and Annotation tabs. What is Annotation for, and when should I use it?
4. In the meta data view of the example set produced by Numerical to Binominal, there are statistics such as "mode=false(12709), least=true(4161)". What does that mean?
Sorry for asking so much, and thanks again for replying to my post. Long live RapidMiner! hehe ^^
Hi,
I will answer in order:
1. Take a look at the operator documentation to get insight into what each operator does.
2. If you want association rules, I suggest you stop after generating them.
3. You don't have to use the annotation view. Some operators annotate the results they generate; for example, the Read Database operator attaches the query used to retrieve the example set to that example set.
4. The mode is the most frequently occurring nominal value of an attribute; least is the one that occurs least often. Big surprise, isn't it? :P
Greetings,
Sebastian
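Answer 4 can be reproduced with a few lines of Python: counting the values of one binominal attribute gives its mode (most frequent value) and its least frequent value, the two statistics shown in the meta data view. The column below is a hypothetical toy example, not the poster's data, and `mode_and_least` is an illustrative helper.

```python
from collections import Counter


def mode_and_least(values):
    """Return (value, count) for the most and the least frequent value of one attribute."""
    counts = Counter(values).most_common()  # sorted from most to least frequent
    return counts[0], counts[-1]


# Hypothetical binominal column: category bought (true) or not (false) in 5 sales.
column = ["false", "false", "true", "false", "true"]
mode, least = mode_and_least(column)
print(f"mode={mode[0]}({mode[1]}), least={least[0]}({least[1]})")
```

Read the same way, "mode=false(12709), least=true(4161)" means that for that category, 12709 sales did not include it (false) and 4161 did (true).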
Hi there,
The association rule results include support, confidence, laplace, gain, conviction, lift and p-s.
I already understand support, confidence, lift and conviction.
I'd like to ask how laplace, gain and p-s are calculated, and what they are used for in association rules. Does anyone have a tutorial about them?
Thank you.
Hi,
I would suggest taking a look at Wikipedia. Each measure should be explained there; otherwise, a Google search will help you. Of course, we also eagerly offer an introduction to all these measures and other topics related to association rule mining in our webinars. See the shop for details.
Greetings,
Sebastian
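For the three measures asked about here, a Python sketch using commonly cited definitions (see also the measures page Haddock linked earlier): Laplace-corrected confidence (n(X and Y) + 1) / (n(X) + k), gain supp(X and Y) - θ·supp(X), and Piatetsky-Shapiro (p-s, also called leverage) supp(X and Y) - supp(X)·supp(Y). The constants k and θ are user choices; k = 2 and θ = 0.5 below are assumptions for illustration, and RapidMiner's defaults and exact formulas may differ. Counts are those of the rule k237 -> k231 in the sample data (n = 58, both together 2, k237 in 3, k231 in 21).

```python
from fractions import Fraction


def laplace(n_xy, n_x, k=2):
    """Laplace-corrected confidence: shrinks confidence toward 1/k when counts are small."""
    return Fraction(n_xy + 1, n_x + k)


def gain(n, n_xy, n_x, theta=Fraction(1, 2)):
    """Gain: rule support minus theta times premise support (positive iff confidence > theta)."""
    return Fraction(n_xy, n) - theta * Fraction(n_x, n)


def ps(n, n_xy, n_x, n_y):
    """Piatetsky-Shapiro (leverage): joint support minus the support expected under independence."""
    return Fraction(n_xy, n) - Fraction(n_x, n) * Fraction(n_y, n)


n, n_xy, n_x, n_y = 58, 2, 3, 21  # rule k237 -> k231 in the sample data
print(f"laplace: {float(laplace(n_xy, n_x)):.4f}")
print(f"gain:    {float(gain(n, n_xy, n_x)):.4f}")
print(f"p-s:     {float(ps(n, n_xy, n_x, n_y)):.4f}")
```

Laplace pulls the raw confidence of 2/3 down to 3/5 because it rests on only 3 sales; a positive p-s means the two categories are bought together more often than independence would predict; and a positive gain simply says the rule's confidence exceeds the chosen θ.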