Mining CARs with FP-Growth (Urgent)
choose_username
New Altair Community Member
Hi there,
I wanted to know whether it is possible to mine rules whose right-hand side contains only one specific attribute.
I have a data set of people in which one attribute shows whether a given person earns more or less than 50k dollars.
One class is <=50k and the other is >=50k.
Is it possible to mine association rules with FP-Growth like
Occupation = manager /\ Marital-Status = married => >=50k
This attribute is called 'class'.
For example, with regular expressions?
It is really urgent! If it is possible, what do I have to write in the regular expression field?
Thanks in advance
User
0
Answers
-
Is it understandable what I mean?
0 -
What?
0 -
For example, W-Apriori has this option, but I badly need it for FP-Growth. It is called 'class association rules'.
The table is, for example:
Occupation | Marital-Status | Relationship | Earning
manager | married | husband | >=50k
Cleaner | divorced | Not-in-Family | <=50k
Cleaner | separated | Not-in-Family | >=50k
I need association rules like the following two:
Occupation = manager /\ Marital-Status = married /\ Relationship = husband => >=50k
Occupation = Cleaner /\ Marital-Status = divorced /\ Relationship = Not-in-Family => >=50k
The right-hand side must be only >=50k or <=50k.
_____________________
But with FP-Growth I get:
Occupation = manager /\ Marital-Status = married => Relationship = husband
Occupation = Cleaner /\ Relationship = Not-in-Family => Marital-Status = divorced
Relationship = Not-in-Family => Marital-Status = divorced, Occupation = Cleaner
I get different attributes on the right-hand side, but I need only the Earning attribute there.
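What is being asked for here is usually called class association rule mining: the conclusion is fixed to the class attribute and only the premise varies. As a rough illustration (plain Python, not RapidMiner, and brute-force counting rather than FP-Growth), the sketch below enumerates premises over the non-class columns of the three example rows and keeps only rules that end in an Earning value. The helper name `class_association_rules` and both thresholds are made up for this example.

```python
from itertools import combinations

# The three example rows from the table above.
rows = [
    {"Occupation": "manager", "Marital-Status": "married",
     "Relationship": "husband", "Earning": ">=50k"},
    {"Occupation": "Cleaner", "Marital-Status": "divorced",
     "Relationship": "Not-in-Family", "Earning": "<=50k"},
    {"Occupation": "Cleaner", "Marital-Status": "separated",
     "Relationship": "Not-in-Family", "Earning": ">=50k"},
]


def class_association_rules(rows, class_attr, min_support=0.3, min_confidence=0.7):
    """Return (premise, class_value, support, confidence) tuples.

    Hypothetical helper: the conclusion is always a value of class_attr,
    so no rule can end in any other attribute.
    """
    n = len(rows)
    premise_counts = {}  # premise -> number of rows matching the premise
    rule_counts = {}     # (premise, class value) -> rows matching both
    for row in rows:
        items = sorted((a, v) for a, v in row.items() if a != class_attr)
        for size in range(1, len(items) + 1):
            for premise in combinations(items, size):
                premise_counts[premise] = premise_counts.get(premise, 0) + 1
                key = (premise, row[class_attr])
                rule_counts[key] = rule_counts.get(key, 0) + 1

    rules = []
    for (premise, cls), count in rule_counts.items():
        support = count / n
        confidence = count / premise_counts[premise]
        if support >= min_support and confidence >= min_confidence:
            rules.append((premise, cls, support, confidence))
    return rules


for premise, cls, support, confidence in class_association_rules(rows, "Earning"):
    lhs = " /\\ ".join(f"{a} = {v}" for a, v in premise)
    print(f"{lhs} => {cls}  (support {support:.2f}, confidence {confidence:.2f})")
```

With only three rows the numbers mean little; the point is just that Relationship or Marital-Status can never show up on the right-hand side, because the class column is kept out of the premise and is the only thing ever used as the conclusion.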
greetings
User
0 -
Hi there,
There are probably other ways to do this, but here is a way of doing it with a Groovy script.
The script converts rules to examples if their conclusion contains the 'target' string.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Root">
<parameter key="logverbosity" value="warning"/>
<process expanded="true" height="217" width="745">
<operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="subprocess" expanded="true" height="76" name="Preprocessing" width="90" x="180" y="30">
<process expanded="true">
<operator activated="true" class="discretize_by_frequency" expanded="true" name="FrequencyDiscretization">
<parameter key="number_of_bins" value="5"/>
</operator>
<operator activated="true" class="nominal_to_binominal" expanded="true" name="Nominal2Binominal">
<parameter key="transform_binominal" value="true"/>
</operator>
<connect from_port="in 1" to_op="FrequencyDiscretization" to_port="example set input"/>
<connect from_op="FrequencyDiscretization" from_port="example set output" to_op="Nominal2Binominal" to_port="example set input"/>
<connect from_op="Nominal2Binominal" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="fp_growth" expanded="true" height="76" name="FPGrowth" width="90" x="313" y="30">
<parameter key="find_min_number_of_itemsets" value="false"/>
<parameter key="min_support" value="0.1"/>
</operator>
<operator activated="true" class="create_association_rules" expanded="true" height="60" name="AssociationRuleGenerator" width="90" x="313" y="165">
<parameter key="min_confidence" value="0.7"/>
</operator>
<operator activated="true" class="execute_script" expanded="true" height="76" name="Execute Script" width="90" x="581" y="75">
<parameter key="script" value='import com.rapidminer.tools.Ontology;
import com.rapidminer.operator.learner.associations.*;

/* Keep only rules whose conclusion contains this string. */
String target = "a1 = range5";
AssociationRules rules = input[0];

/* Construct the attribute set. */
Attribute[] attributes = new Attribute[11];
attributes[0] = AttributeFactory.createAttribute("Premise", Ontology.STRING);
attributes[1] = AttributeFactory.createAttribute("Premise Items", Ontology.INTEGER);
attributes[2] = AttributeFactory.createAttribute("Conclusion", Ontology.STRING);
attributes[3] = AttributeFactory.createAttribute("Conclusion Items", Ontology.INTEGER);
attributes[4] = AttributeFactory.createAttribute("Confidence", Ontology.REAL);
attributes[5] = AttributeFactory.createAttribute("Conviction", Ontology.REAL);
attributes[6] = AttributeFactory.createAttribute("Gain", Ontology.REAL);
attributes[7] = AttributeFactory.createAttribute("Laplace", Ontology.REAL);
attributes[8] = AttributeFactory.createAttribute("Lift", Ontology.REAL);
attributes[9] = AttributeFactory.createAttribute("Ps", Ontology.REAL);
attributes[10] = AttributeFactory.createAttribute("Total Support", Ontology.REAL);

MemoryExampleTable table = new MemoryExampleTable(attributes);
DataRowFactory ROW_FACTORY = new DataRowFactory(0);
String[] strings = new String[11];

for (AssociationRule rule : rules) {
    /* Construct the example data for matching rules only. */
    if (rule.toConclusionString().contains(target)) {
        strings[0] = rule.toPremiseString();
        strings[1] = rule.premise.size().toString();
        strings[2] = rule.toConclusionString();
        strings[3] = rule.conclusion.size().toString();
        strings[4] = rule.getConfidence().toString();
        strings[5] = rule.getConviction().toString();
        strings[6] = rule.getGain().toString();
        strings[7] = rule.getLaplace().toString();
        strings[8] = rule.getLift().toString();
        strings[9] = rule.getPs().toString();
        strings[10] = rule.getTotalSupport().toString();
        /* Make and add the row. */
        DataRow row = ROW_FACTORY.create(strings, attributes);
        table.addDataRow(row);
    }
}
ExampleSet exampleSet = table.createExampleSet();
return exampleSet;'/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Preprocessing" to_port="in 1"/>
<connect from_op="Preprocessing" from_port="out 1" to_op="FPGrowth" to_port="example set"/>
<connect from_op="FPGrowth" from_port="frequent sets" to_op="AssociationRuleGenerator" to_port="item sets"/>
<connect from_op="AssociationRuleGenerator" from_port="rules" to_op="Execute Script" to_port="input 1"/>
<connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
0 -
I looked at the script: it is almost a simple Java program. If I understand it correctly, it works like a filter,
and in the target string I should put the name that should appear on the right-hand side?
Am I correct?
greetings
User
0 -
Yes, actually the way it is worded is such that the conclusion should contain the target.
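To make the "conclusion should contain the target" wording concrete, here is a small Python sketch of the same filtering idea, outside RapidMiner. Rules are represented as plain (premise, conclusion) tuples, which is an assumption for illustration only; the first two rules are taken from the thread, and the last two are made up so that something passes the filter.

```python
def conclusion_contains(rules, target):
    """Substring match on the joined conclusion, like contains() in the script."""
    return [(p, c) for p, c in rules if target in ", ".join(c)]


def conclusion_is_exactly(rules, target):
    """Stricter variant: the conclusion is the target item and nothing else."""
    return [(p, c) for p, c in rules if c == (target,)]


rules = [
    (("Occupation = manager", "Marital-Status = married"), ("Relationship = husband",)),
    (("Relationship = Not-in-Family",), ("Marital-Status = divorced", "Occupation = Cleaner")),
    # The two rules below are invented for this example.
    (("Occupation = manager",), ("Earning = >=50k",)),
    (("Occupation = manager",), ("Relationship = husband", "Earning = >=50k")),
]

target = "Earning = >=50k"
print(len(conclusion_contains(rules, target)))    # rules that mention the target
print(len(conclusion_is_exactly(rules, target)))  # rules that conclude only the target
```

Note that a "contains" check also lets through rules whose conclusion has extra items next to the target. Either way, this only filters what the rule generator has already produced; it cannot add rules that were never mined.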
-
I get the error:
Script1.groovy:12:expecting anything but "\n", got it anyway @ line 12, column 16: 1 error.
But the Groovy script just takes the association rules and filters them. If no extracted rule has my desired right-hand side, then no rule is processed by the Groovy script.
Did I get something wrong?
My problem is that FP-Growth should extract those rules, and it doesn't.
0 -
I get very bored by this - you say there was an error, but if I run the code I posted there is no error. It is not rocket science to suppose that if the association rule builder finds no rules then no rules will be displayed. I did say that there are other ways of doing this - figure them out for yourself.
0 -
I think the problem here is:
choose_username wrote:
Occupation = manager /\ Marital-Status= married /\ Relationship=husband => >=50k
You wish to do classification, using a rule learner.
@ Haddock
Nice script, yes it runs without errors, and as an added bonus it's fast.
0 -
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="512" width="705">
<operator activated="false" class="read_c4.5" expanded="true" height="60" name="Read C4.5" width="90" x="45" y="30">
<parameter key="c45_filestem" value="/home/wessel/Desktop/census/census-income.data"/>
</operator>
<operator activated="false" class="store" expanded="true" height="60" name="Store" width="90" x="180" y="30">
<parameter key="repository_entry" value="CensusData"/>
</operator>
<operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="315" y="30">
<parameter key="repository_entry" value="CensusData"/>
</operator>
<operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal" width="90" x="450" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="label"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="585" y="30">
<parameter key="condition_class" value="no_missing_labels"/>
</operator>
<operator activated="true" class="filter_example_range" expanded="true" height="76" name="Filter Example Range" width="90" x="45" y="120">
<parameter key="first_example" value="1"/>
<parameter key="last_example" value="20000"/>
</operator>
<operator activated="true" class="weka:W-JRip" expanded="true" height="76" name="W-JRip" width="90" x="180" y="120"/>
<connect from_op="Read C4.5" from_port="output" to_op="Store" to_port="input"/>
<connect from_op="Retrieve" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
<connect from_op="Nominal to Binominal" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
<connect from_op="Filter Example Range" from_port="example set output" to_op="W-JRip" to_port="training set"/>
<connect from_op="W-JRip" from_port="model" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="126"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Result
JRIP rules:
===========
(weeks worked in year >= 46) and (dividends from stocks >= 1) and (sex = Male) and (capital gains >= 7688) => label=50000+ (101.0/10.0)
(weeks worked in year >= 49) and (dividends from stocks >= 1) and (sex = Male) and (age >= 35) and (major occupation code = Executive admin and managerial) and (education = Bachelors degree(BA AB BS)) => label=50000+ (62.0/13.0)
(weeks worked in year >= 48) and (dividends from stocks >= 1) and (sex = Male) and (age >= 37) and (major occupation code = Professional specialty) and (instance weight = 1504.5) => label=50000+ (55.0/9.0)
(weeks worked in year >= 49) and (major occupation code = Executive admin and managerial) and (sex = Male) and (education = Masters degree(MA MS MEng MEd MSW MBA)) => label=50000+ (78.0/18.0)
(weeks worked in year >= 46) and (dividends from stocks >= 1) and (sex = Male) and (capital losses >= 1887) => label=50000+ (40.0/11.0)
(weeks worked in year >= 50) and (dividends from stocks >= 1) and (sex = Male) and (num persons worked for employer >= 6) and (wage per hour = 0) and (own business or self employed = 0) and (instance weight >= 1011.69) and (education = Bachelors degree(BA AB BS)) => label=50000+ (38.0/6.0)
(weeks worked in year >= 51) and (major occupation code = Professional specialty) and (sex = Male) and (age >= 32) and (education = Prof school degree (MD DDS DVM LLB JD)) => label=50000+ (48.0/11.0)
(weeks worked in year >= 46) and (capital gains >= 7298) and (capital gains >= 9562) => label=50000+ (77.0/14.0)
(weeks worked in year >= 46) and (major occupation code = Professional specialty) and (education = Doctorate degree(PhD EdD)) => label=50000+ (71.0/29.0)
(weeks worked in year >= 48) and (sex = Male) and (age >= 33) and (education = Bachelors degree(BA AB BS)) and (detailed household and family stat = Spouse of householder) => label=50000+ (37.0/17.0)
(weeks worked in year >= 51) and (age >= 35) and (sex = Male) and (major occupation code = Executive admin and managerial) and (major industry code = Manufacturing-nondurable goods) and (age >= 39) => label=50000+ (20.0/2.0)
(weeks worked in year >= 49) and (dividends from stocks >= 1) and (num persons worked for employer >= 6) and (age >= 35) and (education = Masters degree(MA MS MEng MEd MSW MBA)) and (full or part time employment stat = Children or Armed Forces) => label=50000+ (23.0/7.0)
(weeks worked in year >= 39) and (age >= 35) and (sex = Male) and (num persons worked for employer >= 5) and (education = Bachelors degree(BA AB BS)) and (detailed occupation recode = 2) => label=50000+ (18.0/5.0)
(weeks worked in year >= 46) and (age >= 35) and (sex = Male) and (major occupation code = Professional specialty) and (detailed occupation recode = 4) and (marital stat = Married-civilian spouse present) => label=50000+ (40.0/13.0)
=> label=- 50000 (19292.0/675.0)
Number of Rules : 15
0 -
I found out that the Create Association Rules operator can restrict the 'Conclusion' to my desired class.
But the problem is that my desired class does not appear in that conclusion overview, because it is very rare in the data set.
The name of the attribute I wish to conclude on is 'A_B = High'.
Can I use the 'must contain' field in the FP-Growth operator to force the itemsets to contain that attribute?
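One general point about rare classes, independent of any particular tool: an itemset can never have higher support than its rarest item, so a class item whose own frequency is below the minimum support cannot appear in any frequent itemset or in any rule built from them. The Python sketch below shows this on made-up toy data (100 transactions, 5 of which contain 'A_B = High'; the other item names and all counts are invented). Whether this is what is happening in the process above is only a guess.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


# Made-up toy data: the class item "A_B = High" occurs in 5 of 100 transactions.
transactions = (
    [{"Occupation = Cleaner", "A_B = Low"}] * 95
    + [{"Occupation = manager", "A_B = High"}] * 5
)

rare = support(transactions, {"A_B = High"})
joint = support(transactions, {"Occupation = manager", "A_B = High"})
print("support(A_B = High):", rare)
print("support(manager, A_B = High):", joint)

# The rare item only survives pruning once min_support drops to its frequency.
for min_support in (0.1, 0.05):
    print("min_support", min_support, "keeps A_B = High:", rare >= min_support)
```

So if the class really is rare, lowering the minimum support (and then restricting the conclusion) is the generic remedy, at the cost of many more itemsets being generated.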
____________________________________________
These are the itemsets I get from FP-Growth:
Size | Support | Item 1 | Item 2
1 | 0.248 | A_B = High |
2 | 0.953 | A_B = High | Capital_loss
Is the null pointer error coming from the itemset that contains only one item? Does the Create Association Rules operator
have problems with this?
Is it possible to filter out the first row?
greetings
User
0 -
Hi,
I think the relation can be identified using a decision tree, with the 'Earning' attribute from the first post selected as the label. If there is any relation, it will be reflected in the tree, as Haddock said.
By
Ratheesan
0 -
Yes, that was my point also.
ratheesan wrote:
Hi,
I think the relation can be identified using a decision tree, with the 'Earning' attribute from the first post selected as the label. If there is any relation, it will be reflected in the tree, as Haddock said.
By
Ratheesan
Finding this relation is not association rule learning, but classification.
0