Using FP-Growth and Weka-Apriori
neokyrgyz
New Altair Community Member
Hi, all
To flatten the learning curve, would it be possible to make a short step-by-step tutorial for beginners? I mean complete beginners.
I can't even build a working example of FP-Growth and Weka-Apriori on a generated transaction data set, although this should be a really easy process.
Does anyone know if such a tutorial exists? Or could you give a step-by-step tutorial for the example above?
I spent 2 days getting to know the general layout and running some processes, but it seems it will take a month before I can do what I want.
Thanks and Regards.
Hoping to be understood and not taken for a lazy "user".
Answers
-
Hi there,
Firstly, welcome to the world of pattern mining. As to finding the tutorial, this might be a rather embarrassing answer for you, but from within RapidMiner try Help->RapidMiner Tutorial. Then click Next -> Next in the window that appears and you will see a working example of FP-Growth. It is a smart move to go through that tutorial several times and to become familiar with all the examples.
Have fun!
0 -
Hi,
Thank you very much.
Sometimes this kind of "pointing" can save a lot of time.
I tried to do the same as in the tutorial, but it is not working. I went step by step without FP-Growth, writing the output after each step, and that works fine. But as soon as I insert FP-Growth, it gives the following error:
The method getNominalMapping() is not supported by numeric attributes! You probably tried to execute an operator on a numeric data which is only able to handle nominal values.
So, basically, it means that it can do nominal2binominal without FP-Growth. Is this a bug, or am I doing something wrong?
Thanks in advance.
My file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<parameter key="logverbosity" value="all"/>
<parameter key="logfile" value="C:\AfterRuleAccoss.log"/>
<parameter key="resultfile" value="C:\afterRuleAccos.res"/>
<process expanded="true" height="601" width="784">
<operator activated="true" class="read_aml" expanded="true" height="60" name="Read AML" width="90" x="45" y="120">
<parameter key="attributes" value="C:\labor-negotiations.aml"/>
</operator>
<operator activated="true" class="replace_missing_values" expanded="true" height="94" name="Replace Missing Values" width="90" x="45" y="300">
<parameter key="attributes" value="duration|wage-inc-1st|wage-inc-2nd|wage-inc-3rd|working-hours|standby-pay|shift-differential|statutory-holidays"/>
<list key="columns"/>
</operator>
<operator activated="true" class="discretize_by_frequency" expanded="true" height="94" name="Discretize" width="90" x="179" y="300">
<parameter key="range_name_type" value="short"/>
</operator>
<operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal" width="90" x="313" y="300"/>
<operator activated="true" class="fp_growth" expanded="true" height="76" name="FP-Growth" width="90" x="447" y="210"/>
<operator activated="true" class="write_excel" expanded="true" height="60" name="Write Excel" width="90" x="514" y="30">
<parameter key="excel_file" value="C:\result_afterRMVDiscretizeNom2BinomFPGrowth.xls"/>
</operator>
<connect from_op="Read AML" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_op="Discretize" to_port="example set input"/>
<connect from_op="Discretize" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
<connect from_op="Nominal to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
<connect from_op="FP-Growth" from_port="example set" to_op="Write Excel" to_port="input"/>
<connect from_op="FP-Growth" from_port="frequent sets" to_port="result 2"/>
<connect from_op="Write Excel" from_port="through" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
0 -
Hi there,
So close, and yet so far! If you had just ticked the "transform_binominal" tick box in the nominal_to_binominal operator, all would have worked fine, like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<parameter key="logverbosity" value="all"/>
<parameter key="logfile" value="C:\AfterRuleAccoss.log"/>
<parameter key="resultfile" value="C:\afterRuleAccos.res"/>
<process expanded="true" height="404" width="915">
<operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="30" y="53">
<parameter key="repository_entry" value="//Samples/data/Labor-Negotiations"/>
</operator>
<operator activated="true" class="replace_missing_values" expanded="true" height="94" name="Replace Missing Values" width="90" x="45" y="297">
<parameter key="attributes" value="duration|wage-inc-1st|wage-inc-2nd|wage-inc-3rd|working-hours|standby-pay|shift-differential|statutory-holidays"/>
<list key="columns"/>
</operator>
<operator activated="true" class="discretize_by_frequency" expanded="true" height="94" name="Discretize" width="90" x="179" y="300">
<parameter key="range_name_type" value="short"/>
</operator>
<operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal" width="90" x="380" y="300">
<parameter key="transform_binominal" value="true"/>
</operator>
<operator activated="true" breakpoints="before,after" class="fp_growth" expanded="true" height="76" name="FP-Growth" width="90" x="581" y="255"/>
<operator activated="true" class="create_association_rules" expanded="true" height="60" name="Create Association Rules" width="90" x="726" y="263"/>
<operator activated="true" class="write_excel" expanded="true" height="60" name="Write Excel" width="90" x="514" y="30">
<parameter key="excel_file" value="C:\result_afterRMVDiscretizeNom2BinomFPGrowth.xls"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_op="Discretize" to_port="example set input"/>
<connect from_op="Discretize" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
<connect from_op="Nominal to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
<connect from_op="FP-Growth" from_port="example set" to_op="Write Excel" to_port="input"/>
<connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
<connect from_op="Create Association Rules" from_port="rules" to_port="result 2"/>
<connect from_op="Write Excel" from_port="through" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
0 -
Thank you very much for your answer. It was really helpful. Learning step 1 is complete.
I'm stuck again on step 2.
I'm trying to use W-Apriori on my data:
1) I want to count only TRUE values. For instance, I am not interested in whether someone did not buy something; if someone bought something, I am interested in what else he/she bought.
beer,bread,jam,butter,cheese,chips,soda,chocolate
TRUE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE
FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE
FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE
FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE
TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE
TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE
FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,FALSE
TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE
TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE
2) Even if I ignore the first requirement (assuming that, since RapidMiner counts the FALSE values, this must be a correct way), there is a problem. I set M=0.4, but the output is not what I expect: I expect it to show the itemsets with a minimum support of 0.4, but it shows only some of them.
For the example above (where I expected beer=TRUE 7, bread=TRUE 9, ...) it gives:
beer=FALSE 4
jam=FALSE 5
butter=TRUE 5
cheese=TRUE 4
chips=FALSE 5
soda=TRUE 5
chocolate=FALSE 5
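As a sanity check on the expected numbers, the TRUE counts per item can be recomputed directly from the table above. The following is only a minimal Python sketch, independent of RapidMiner and Weka; the names `CSV_TEXT` and `true_counts` are made up for illustration.

```python
import csv
import io

# Transaction table copied from the post (TRUE = the item was bought).
CSV_TEXT = """beer,bread,jam,butter,cheese,chips,soda,chocolate
TRUE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE
FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE
FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE
FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE
TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE
TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE
FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,FALSE
TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE
TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE
"""


def true_counts(csv_text):
    """Return ({item: number of rows where it is TRUE}, number of rows)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = {}
    for row in rows:
        for item, value in row.items():
            # A bool adds as 0 or 1, so this counts the TRUE cells per column.
            counts[item] = counts.get(item, 0) + (value == "TRUE")
    return counts, len(rows)


counts, n_rows = true_counts(CSV_TEXT)
for item, count in counts.items():
    print(f"{item}=TRUE {count} (support {count / n_rows:.2f})")
```

For this data the sketch reports 7 for beer and 9 for bread, which matches the expectation stated above; the beer=FALSE 4 line in the W-Apriori listing is simply the remaining 11 - 7 rows.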
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<parameter key="logverbosity" value="all"/>
<parameter key="logfile" value="C:\part1_log.log"/>
<parameter key="resultfile" value="C:\part1_res.res"/>
<process expanded="true" height="601" width="784">
<operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="68" y="121">
<parameter key="file_name" value="C:\part1_data.csv"/>
</operator>
<operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal" width="90" x="246" y="120"/>
<operator activated="true" class="weka:W-Apriori" expanded="true" height="60" name="W-Apriori" width="90" x="447" y="165">
<parameter key="C" value="0.6"/>
<parameter key="M" value="0.4"/>
<parameter key="I" value="true"/>
<parameter key="R" value="true"/>
<parameter key="V" value="true"/>
<parameter key="c" value="1.0"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
<connect from_op="Nominal to Binominal" from_port="example set output" to_op="W-Apriori" to_port="example set"/>
<connect from_op="W-Apriori" from_port="associator" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
What am I doing wrong? What do I need to do to get what I want?
0 -
Hola,
If you want to thin out the Premises or Conclusions you may find this post interesting.
http://rapid-i.com/rapidforum/index.php/topic,1887.msg7366.html#msg7366
It shows how you can convert Association Rules to an ExampleSet, which of course means that all the regular thinning agents can be applied.
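To illustrate the idea in plain terms: once rules are rows in a table, thinning them is ordinary row filtering. The following is a rough Python sketch, not the RapidMiner process from the linked post. It uses the data set posted earlier in this thread, is restricted to rules with a single item on each side for brevity, and the names `support_count` and `rules_table` are hypothetical.

```python
from itertools import permutations

ITEMS = ["beer", "bread", "jam", "butter", "cheese", "chips", "soda", "chocolate"]
# Rows copied from the data set posted earlier in this thread.
ROWS = """
TRUE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE
FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE
FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE
FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE
TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE
TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE
FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,FALSE
TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE
TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE
""".split()

# Each transaction keeps only the items that were bought (TRUE).
TRANSACTIONS = [
    frozenset(item for item, value in zip(ITEMS, row.split(",")) if value == "TRUE")
    for row in ROWS
]


def support_count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def rules_table(min_confidence):
    """One row (dict) per single-item rule 'premise -> conclusion'."""
    table = []
    for premise, conclusion in permutations(ITEMS, 2):
        premise_count = support_count({premise})
        if premise_count == 0:
            continue  # confidence is undefined for a premise that never occurs
        both_count = support_count({premise, conclusion})
        confidence = both_count / premise_count
        if confidence >= min_confidence:
            table.append({
                "premise": premise,
                "conclusion": conclusion,
                "support": both_count / len(TRANSACTIONS),
                "confidence": confidence,
            })
    return table


# 0.6 mirrors the C parameter in the W-Apriori process above.
table = rules_table(min_confidence=0.6)

# Thinning by conclusion is now a plain filter over table rows.
bread_rules = [row for row in table if row["conclusion"] == "bread"]
for row in bread_rules:
    print(f'{row["premise"]} -> {row["conclusion"]} (confidence {row["confidence"]:.2f})')
```

Filtering on the premise, or on a confidence or support range, works the same way.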
Just a thought.
0 -
Hi, haddock
I tried to understand what you wrote, but it does not seem to be the answer or the way. I'm not sure, though.
My problem is that I'm trying to get a result from W-Apriori, but the result is not what I expect.
It's not a minor difference that could come from different implementations; it is totally different from what it should be.
FP-Growth gives: {bread}, {beer}, {jam}, {chips}, {chocolate}, {bread, jam}, {bread, beer}
I expect W-Apriori to give a result at least 50% similar to the above for such a small data set.
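For reference, the FP-Growth list above can be cross-checked by brute force on such a small data set. The following is a minimal Python sketch that counts only TRUE values and searches level by level (Apriori-style); it is neither RapidMiner's FP-Growth nor Weka's implementation. The 0.5 threshold is an assumption, since the setting actually used is not stated here, and `frequent_itemsets` is a made-up name.

```python
from itertools import combinations

ITEMS = ["beer", "bread", "jam", "butter", "cheese", "chips", "soda", "chocolate"]
# Rows copied from the data set posted earlier in this thread.
ROWS = """
TRUE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE
FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE
FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE
FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE
TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE
TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE
FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,FALSE
TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE
TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE
""".split()

# Each transaction keeps only the items that were bought (TRUE).
TRANSACTIONS = [
    frozenset(item for item, value in zip(ITEMS, row.split(",")) if value == "TRUE")
    for row in ROWS
]


def frequent_itemsets(transactions, min_support):
    """Level-wise search: {itemset: count} for itemsets with support >= min_support."""
    n = len(transactions)
    candidates = [frozenset([item]) for item in sorted(set().union(*transactions))]
    frequent = {}
    size = 1
    while candidates:
        level = {}
        for candidate in candidates:
            count = sum(1 for t in transactions if candidate <= t)
            if count / n >= min_support:
                level[candidate] = count
        frequent.update(level)
        # Build candidates one item larger and keep only those whose
        # subsets of the current size were all frequent.
        survivors = sorted(set().union(*level))
        size += 1
        candidates = [
            frozenset(combo)
            for combo in combinations(survivors, size)
            if all(frozenset(sub) in level for sub in combinations(combo, size - 1))
        ]
    return frequent


# min_support=0.5 is an assumption; the thread does not state the FP-Growth setting.
result = frequent_itemsets(TRANSACTIONS, min_support=0.5)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), -kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
```

With the assumed threshold of 0.5 this yields the same seven itemsets as the FP-Growth list above. With 0.4 it returns more, since an itemset then needs only 5 of the 11 rows.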
This makes me think that I'm doing something wrong, such as failing to tick some checkbox, which was the case in the problem above.
As you can guess, I have spent a week on this but still could not solve it.
Any ideas? Or any working W-Apriori processes?
Thanks in advance.
0 -
Perhaps you can post the code?
0