"Mining CAR'S with FP-Growth (Urgent)"

choose_username · June 2010

Hi there,

I wanted to know whether it is possible to mine rules whose right-hand side is only one specific attribute.

I have a data set of people in which one attribute shows whether a given person earns more or less than 50k dollars.
One class is <=50k and the other is >=50k.

Is it possible to mine association rules with FP-Growth like

Occupation = manager /\ Marital-Status = married => >=50k
This attribute is called 'class'.

For example, with regular expressions?

It is really urgent! If it is possible, what do I have to write in the regular expression field?

thanks in advance

User

choose_username · June 2010

Is it understandable what I mean?

haddock · June 2010

What?

choose_username · June 2010

For example, W-Apriori has this option, but I badly need it for FP-Growth. It is called 'class association rules'.

Table is for example:

Occupation | Marital-Status | Relationship | Earning
manager | married | husband | >=50k
Cleaner | divorced | Not-in-Family | <=50k
Cleaner | separated | Not-in-Family | >=50k

I need association rules like the following two:

Occupation = manager /\ Marital-Status= married /\ Relationship=husband => >=50k
Occupation = Cleaner /\ Marital-Status= divorced /\ Relationship = Not-in-Family => <=50k

The right side has to be only >=50k or <=50k.

_____________________

But with FP-Growth I get:

Occupation = manager /\ Marital-Status= married => Relationship=husband
Occupation = Cleaner /\ Relationship = Not-in-Family => Marital-Status= divorced
Relationship = Not-in-Family => Marital-Status= divorced, Occupation = Cleaner

I get different attributes on the right side, but I need only the Earning attribute on the right side.
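
To make the request concrete: a class association rule is an ordinary association rule whose conclusion is restricted to a value of the class attribute. The Python sketch below illustrates that restriction on the three example rows from this post. It is a brute-force enumeration for illustration only, not FP-Growth and not RapidMiner code, and the function name `class_association_rules` is hypothetical.

```python
from itertools import combinations

# Toy rows copied from the example table in this post; "Earning" is the class attribute.
rows = [
    {"Occupation": "manager", "Marital-Status": "married",
     "Relationship": "husband", "Earning": ">=50k"},
    {"Occupation": "Cleaner", "Marital-Status": "divorced",
     "Relationship": "Not-in-Family", "Earning": "<=50k"},
    {"Occupation": "Cleaner", "Marital-Status": "separated",
     "Relationship": "Not-in-Family", "Earning": ">=50k"},
]


def class_association_rules(rows, class_attr, min_support=0.0, min_confidence=0.0):
    """Enumerate rules 'premise => class value' by brute force (hypothetical helper)."""
    n = len(rows)
    premise_counts = {}  # premise -> number of rows matching the premise
    rule_counts = {}     # (premise, class value) -> number of rows matching both
    for row in rows:
        # Only non-class attributes may appear in the premise.
        items = sorted((a, v) for a, v in row.items() if a != class_attr)
        for size in range(1, len(items) + 1):
            for premise in combinations(items, size):
                premise_counts[premise] = premise_counts.get(premise, 0) + 1
                key = (premise, row[class_attr])
                rule_counts[key] = rule_counts.get(key, 0) + 1
    rules = []
    for (premise, label), count in rule_counts.items():
        support = count / n
        confidence = count / premise_counts[premise]
        if support >= min_support and confidence >= min_confidence:
            rules.append((premise, label, support, confidence))
    return rules


rules = class_association_rules(rows, "Earning", min_confidence=0.7)
for premise, label, support, confidence in rules:
    lhs = " /\\ ".join(f"{a} = {v}" for a, v in premise)
    print(f"{lhs} => {label} (support={support:.2f}, confidence={confidence:.2f})")
```

Every rule printed this way has an Earning value on the right side, because the class attribute is never allowed into the premise and nothing else is allowed into the conclusion.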

greetings

User

haddock · June 2010

Hi there,

There are probably other ways to do this, but here is a way of doing it with a Groovy script.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Root">
    <parameter key="logverbosity" value="warning"/>
    <process expanded="true" height="217" width="745">
      <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="subprocess" expanded="true" height="76" name="Preprocessing" width="90" x="180" y="30">
        <process expanded="true">
          <operator activated="true" class="discretize_by_frequency" expanded="true" name="FrequencyDiscretization">
            <parameter key="number_of_bins" value="5"/>
          </operator>
          <operator activated="true" class="nominal_to_binominal" expanded="true" name="Nominal2Binominal">
            <parameter key="transform_binominal" value="true"/>
          </operator>
          <connect from_port="in 1" to_op="FrequencyDiscretization" to_port="example set input"/>
          <connect from_op="FrequencyDiscretization" from_port="example set output" to_op="Nominal2Binominal" to_port="example set input"/>
          <connect from_op="Nominal2Binominal" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="fp_growth" expanded="true" height="76" name="FPGrowth" width="90" x="313" y="30">
        <parameter key="find_min_number_of_itemsets" value="false"/>
        <parameter key="min_support" value="0.1"/>
      </operator>
      <operator activated="true" class="create_association_rules" expanded="true" height="60" name="AssociationRuleGenerator" width="90" x="313" y="165">
        <parameter key="min_confidence" value="0.7"/>
      </operator>
      <operator activated="true" class="execute_script" expanded="true" height="76" name="Execute Script" width="90" x="581" y="75">
        <parameter key="script" value="import com.rapidminer.tools.Ontology;&#13;&#13;&#10;import com.rapidminer.operator.learner.associations.*;&#13;&#13;&#10;&#13;String target=&quot;a1 = range5&quot;&#10;AssociationRules rules = input[0];&#13;&#13;&#10;&#10;&#13;// construct attribute set&#13;&#10;Attribute[] attributes= new Attribute[11];&#10;attributes[0] = AttributeFactory.createAttribute(&quot;Premise&quot;, Ontology.STRING);&#13;&#13;&#10;attributes[1] = AttributeFactory.createAttribute(&quot;Premise Items&quot;, Ontology.INTEGER);&#10;attributes[2] = AttributeFactory.createAttribute(&quot;Conclusion&quot;, Ontology.STRING);&#13;&#10;attributes[3] = AttributeFactory.createAttribute(&quot;Conclusion Items&quot;, Ontology.INTEGER);&#13;&#10;attributes[4] = AttributeFactory.createAttribute(&quot;Confidence&quot;, Ontology.REAL);&#13;&#10;attributes[5] = AttributeFactory.createAttribute(&quot;Conviction&quot;, Ontology.REAL);&#13;&#10;attributes[6] = AttributeFactory.createAttribute(&quot;Gain&quot;, Ontology.REAL);&#13;&#10;attributes[7] = AttributeFactory.createAttribute(&quot;Laplace&quot;, Ontology.REAL);&#13;&#13;&#10;attributes[8] = AttributeFactory.createAttribute(&quot;Lift&quot;, Ontology.REAL);&#13;&#10;attributes[9] = AttributeFactory.createAttribute(&quot;Ps&quot;, Ontology.REAL);&#10;&#13;&#13;attributes[10] = AttributeFactory.createAttribute(&quot;Total Support&quot;, Ontology.REAL);&#10;&#13;&#13;&#13;&#10;MemoryExampleTable table = new MemoryExampleTable(attributes);&#10;DataRowFactory ROW_FACTORY = new DataRowFactory(0);&#13;&#10;&#13;String[] strings= new String[11];&#13;&#10;&#10;for (AssociationRule rule : rules) {&#10;&#9;&#9;// construct example data&#10;        if(rule.toConclusionString().contains(target))&#13;&#10;        {&#13;&#10;        strings[0]=rule.toPremiseString();&#13;&#10;        strings[1]=rule.premise.size().toString();&#13;&#10;        strings[2]=rule.toConclusionString();&#13;&#10;        
strings[3]=rule.conclusion.size().toString();&#13;&#10;        strings[4]=rule.getConfidence().toString();&#13;&#10;        strings[5]=rule.getConviction().toString();&#13;&#10;        strings[6]=rule.getGain().toString();&#13;&#10;        strings[7]=rule.getLaplace().toString();&#13;&#10;        strings[8]=rule.getLift().toString();&#13;&#10;&#13;        strings[9]=rule.getPs().toString();&#13;&#10;        strings[10]=rule.getTotalSupport().toString();&#13;&#13;&#10;        // make and add row&#13;&#10;        DataRow row = ROW_FACTORY.create(strings, attributes); &#13;&#10;        table.addDataRow(row);&#9;&#10;&#9;&#9;}&#13;&#10;}&#10;&#13;&#10;ExampleSet exampleSet = table.createExampleSet();&#10;return exampleSet;&#10;"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Preprocessing" to_port="in 1"/>
      <connect from_op="Preprocessing" from_port="out 1" to_op="FPGrowth" to_port="example set"/>
      <connect from_op="FPGrowth" from_port="frequent sets" to_op="AssociationRuleGenerator" to_port="item sets"/>
      <connect from_op="AssociationRuleGenerator" from_port="rules" to_op="Execute Script" to_port="input 1"/>
      <connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

The script converts rules to examples if their conclusion contains the 'target' string.
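
The filtering idea itself can be sketched outside RapidMiner in a few lines of Python. This is only an illustration under stated assumptions: the `Rule` class and `filter_by_conclusion` are hypothetical stand-ins, not RapidMiner's API, the confidence values are made up, and the sketch matches whole items in the conclusion rather than substrings of the conclusion string as the Groovy script does.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical stand-in for a mined rule; not RapidMiner's AssociationRule class.
    premise: tuple
    conclusion: tuple
    confidence: float


def filter_by_conclusion(rules, target, exact=False):
    """Keep rules whose conclusion contains `target`.

    With exact=True, keep only rules whose conclusion is `target` and nothing else.
    """
    if exact:
        return [r for r in rules if r.conclusion == (target,)]
    return [r for r in rules if target in r.conclusion]


# Made-up rules for illustration; item names follow the "a1 = range5" style of the script.
rules = [
    Rule(("a2 = range1",), ("a1 = range5",), 0.9),
    Rule(("a2 = range1",), ("a1 = range5", "a3 = range2"), 0.8),
    Rule(("a1 = range5",), ("a2 = range1",), 0.75),
]

kept = filter_by_conclusion(rules, "a1 = range5")
strict = filter_by_conclusion(rules, "a1 = range5", exact=True)
print(len(kept), len(strict))
```

The `exact` flag shows the difference between "the conclusion contains the target" and "the conclusion is only the target"; the second is closer to a class association rule.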

choose_username · June 2010

I looked at the script: it is almost a simple Java program. If I understand it right, it works like a filter,
and in the target string I should place the name that should be on the right side?

Am I correct?

greetings

User

haddock · June 2010

Yes. Strictly speaking, the way it is written, a rule is kept when its conclusion contains the target.

choose_username · June 2010

I get the error:

Script1.groovy:12:expecting anything but "\n", got it anyway @ line 12, column 16: 1 error.

But the Groovy script just takes the association rules and filters them. If no rules are extracted that have my desired right side, then no rule is processed by the Groovy script.

Did I get something wrong?

My problem is that FP-Growth should extract those rules, and it doesn't.

haddock · June 2010

I get very bored by this - you say there was an error, but if I run the code I posted there is no error. It is not rocket science to suppose that if the association rule builder finds no rules then no rules will be displayed. I did say that there are other ways of doing this - figure them out for yourself.

wessel · June 2010

choose_username wrote:

Occupation = manager /\ Marital-Status= married /\ Relationship=husband => >=50k

I think the problem here is:
You wish to do classification, using a rule learner.

@ Haddock

Nice script. Yes, it runs without errors, and as an added bonus it's fast.

wessel · June 2010

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="512" width="705">
      <operator activated="false" class="read_c4.5" expanded="true" height="60" name="Read C4.5" width="90" x="45" y="30">
        <parameter key="c45_filestem" value="/home/wessel/Desktop/census/census-income.data"/>
      </operator>
      <operator activated="false" class="store" expanded="true" height="60" name="Store" width="90" x="180" y="30">
        <parameter key="repository_entry" value="CensusData"/>
      </operator>
      <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="315" y="30">
        <parameter key="repository_entry" value="CensusData"/>
      </operator>
      <operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal" width="90" x="450" y="30">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="585" y="30">
        <parameter key="condition_class" value="no_missing_labels"/>
      </operator>
      <operator activated="true" class="filter_example_range" expanded="true" height="76" name="Filter Example Range" width="90" x="45" y="120">
        <parameter key="first_example" value="1"/>
        <parameter key="last_example" value="20000"/>
      </operator>
      <operator activated="true" class="weka:W-JRip" expanded="true" height="76" name="W-JRip" width="90" x="180" y="120"/>
      <connect from_op="Read C4.5" from_port="output" to_op="Store" to_port="input"/>
      <connect from_op="Retrieve" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
      <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
      <connect from_op="Filter Example Range" from_port="example set output" to_op="W-JRip" to_port="training set"/>
      <connect from_op="W-JRip" from_port="model" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="126"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Result

JRIP rules:
===========

(weeks worked in year >= 46) and (dividends from stocks >= 1) and (sex = Male) and (capital gains >= 7688) => label=50000+ (101.0/10.0)
(weeks worked in year >= 49) and (dividends from stocks >= 1) and (sex = Male) and (age >= 35) and (major occupation code = Executive admin and managerial) and (education = Bachelors degree(BA AB BS)) => label=50000+ (62.0/13.0)
(weeks worked in year >= 48) and (dividends from stocks >= 1) and (sex = Male) and (age >= 37) and (major occupation code = Professional specialty) and (instance weight = 1504.5) => label=50000+ (55.0/9.0)
(weeks worked in year >= 49) and (major occupation code = Executive admin and managerial) and (sex = Male) and (education = Masters degree(MA MS MEng MEd MSW MBA)) => label=50000+ (78.0/18.0)
(weeks worked in year >= 46) and (dividends from stocks >= 1) and (sex = Male) and (capital losses >= 1887) => label=50000+ (40.0/11.0)
(weeks worked in year >= 50) and (dividends from stocks >= 1) and (sex = Male) and (num persons worked for employer >= 6) and (wage per hour = 0) and (own business or self employed = 0) and (instance weight >= 1011.69) and (education = Bachelors degree(BA AB BS)) => label=50000+ (38.0/6.0)
(weeks worked in year >= 51) and (major occupation code = Professional specialty) and (sex = Male) and (age >= 32) and (education = Prof school degree (MD DDS DVM LLB JD)) => label=50000+ (48.0/11.0)
(weeks worked in year >= 46) and (capital gains >= 7298) and (capital gains >= 9562) => label=50000+ (77.0/14.0)
(weeks worked in year >= 46) and (major occupation code = Professional specialty) and (education = Doctorate degree(PhD EdD)) => label=50000+ (71.0/29.0)
(weeks worked in year >= 48) and (sex = Male) and (age >= 33) and (education = Bachelors degree(BA AB BS)) and (detailed household and family stat = Spouse of householder) => label=50000+ (37.0/17.0)
(weeks worked in year >= 51) and (age >= 35) and (sex = Male) and (major occupation code = Executive admin and managerial) and (major industry code = Manufacturing-nondurable goods) and (age >= 39) => label=50000+ (20.0/2.0)
(weeks worked in year >= 49) and (dividends from stocks >= 1) and (num persons worked for employer >= 6) and (age >= 35) and (education = Masters degree(MA MS MEng MEd MSW MBA)) and (full or part time employment stat = Children or Armed Forces) => label=50000+ (23.0/7.0)
(weeks worked in year >= 39) and (age >= 35) and (sex = Male) and (num persons worked for employer >= 5) and (education = Bachelors degree(BA AB BS)) and (detailed occupation recode = 2) => label=50000+ (18.0/5.0)
(weeks worked in year >= 46) and (age >= 35) and (sex = Male) and (major occupation code = Professional specialty) and (detailed occupation recode = 4) and (marital stat = Married-civilian spouse present) => label=50000+ (40.0/13.0)
=> label=- 50000 (19292.0/675.0)

Number of Rules : 15

choose_username · June 2010

I found out that the Create Association Rules operator can restrict the 'Conclusion' to my desired class.
But the problem is that my desired class does not appear in the conclusion overview, because it is very rare in the data set.
The name of the attribute I wish to conclude on is 'A_B = High'.
Can I use the 'must contain' field in the FP-Growth operator to force the results to contain that attribute?
____________________________________________

These are the itemsets I get from FP-Growth:

Size | Support | Item 1 | Item 2
1 | 0.248 | A_B = High |
2 | 0.953 | A_B = High | Capital_loss

Is the null pointer error coming from the itemset that contains only one item? Does the Create Association Rules operator
have problems with this?

Is it possible to filter the first row out?
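
For what it is worth, a rule needs a non-empty premise and a non-empty conclusion, so an itemset with a single item cannot produce a rule on its own. Whether that is what causes the null pointer is not clear from this thread. The Python sketch below only illustrates the filtering being asked about: it keeps itemsets that contain the target item and have at least two items. It is not RapidMiner code, the helper name `itemsets_for_target` is hypothetical, and the supports are copied as posted above.

```python
def itemsets_for_target(itemsets, target, min_size=2):
    """Keep (items, support) pairs that contain `target` and have at least `min_size` items."""
    return [
        (items, support)
        for items, support in itemsets
        if target in items and len(items) >= min_size
    ]


# Itemsets and supports copied from the table in this post.
itemsets = [
    (frozenset({"A_B = High"}), 0.248),
    (frozenset({"A_B = High", "Capital_loss"}), 0.953),
]

kept = itemsets_for_target(itemsets, "A_B = High")
for items, support in kept:
    print(sorted(items), support)
```

With the two rows above, only the two-item set survives; the single-item row is dropped.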

greetings

User

ratheesan · June 2010

Hi,

I think the relation can be identified using a decision tree in which the label attribute is 'Earning' from the first post. If there is any relation, it will be reflected in the tree, as Haddock said.

By
Ratheesan

wessel · June 2010

ratheesan wrote:

Hi,

I think the relation can be identified using a decision tree in which the label attribute is 'Earning' from the first post. If there is any relation, it will be reflected in the tree, as Haddock said.

By
Ratheesan

Yes, that was my point too.
Finding this relation is not association rule learning, but classification.