"Mining CAR'S with FP-Growth (Urgent)"

choose_username
choose_username New Altair Community Member
edited November 5 in Community Q&A
Hi there,

I would like to know whether it is possible to mine rules whose right-hand side contains only one specific attribute.

I have a data set of people in which one attribute shows whether a person earns more or less than 50k dollars.
One class is <=50k and the other is >=50k.


Is it possible to mine association rules with FP-Growth like

Occupation = manager  /\  Marital-Status = married  =>  >=50k
This attribute is called 'class'.

For example, with regular expressions?

It is really urgent! If it is possible, what do I have to write in the regular expression field?


Thanks in advance


User

Answers

  • choose_username
    choose_username New Altair Community Member
    Is it understandable what I mean?

  • haddock
    haddock New Altair Community Member
    What?
  • choose_username
    choose_username New Altair Community Member
    For example, W-Apriori has this option, but I badly need it for FP-Growth. It is called 'class association rules'.

    The table is, for example:


    Occupation | Marital-Status | Relationship  | Earning
    manager    | married        | husband       | >=50k
    Cleaner    | divorced       | Not-in-Family | <=50k
    Cleaner    | separated      | Not-in-Family | >=50k


    I need association rules like the following two:

    Occupation = manager  /\  Marital-Status = married   /\  Relationship = husband        =>  >=50k
    Occupation = Cleaner  /\  Marital-Status = divorced  /\  Relationship = Not-in-Family  =>  <=50k

    The right side must contain only >=50k or <=50k.

    _____________________


    But with FP-Growth I get:


    Occupation = manager  /\  Marital-Status = married      =>  Relationship = husband
    Occupation = Cleaner  /\  Relationship = Not-in-Family  =>  Marital-Status = divorced
    Relationship = Not-in-Family  =>  Marital-Status = divorced, Occupation = Cleaner

    I get different attributes on the right side, but I need only the Earning attribute on the right side.





    greetings

    User
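
What is being asked for here is usually called class association rule (CAR) mining: rules whose conclusion is a single value of the class attribute. As a minimal sketch of the idea, independent of RapidMiner, the Python below takes every combination of attribute values found in a row as a premise and scores it against that row's class. The three rows are the example table from the post; the function name `mine_class_rules` and the thresholds are illustrative assumptions, and this brute-force enumeration is not FP-Growth and is only practical for small tables.

```python
from collections import Counter
from itertools import combinations

# Example table from the post; the second element of each row is the class value.
COLUMNS = ["Occupation", "Marital-Status", "Relationship"]
ROWS = [
    (("manager", "married", "husband"), ">=50k"),
    (("Cleaner", "divorced", "Not-in-Family"), "<=50k"),
    (("Cleaner", "separated", "Not-in-Family"), ">=50k"),
]


def mine_class_rules(rows, columns, min_support=0.0, min_confidence=0.0):
    """Return (premise, class, support, confidence) tuples; the conclusion is always the class."""
    premise_counts = Counter()  # how often each premise occurs
    rule_counts = Counter()     # how often a premise occurs together with a class value
    for values, label in rows:
        items = tuple(zip(columns, values))
        for size in range(1, len(items) + 1):
            for premise in combinations(items, size):
                premise_counts[premise] += 1
                rule_counts[(premise, label)] += 1

    n = len(rows)
    rules = []
    for (premise, label), count in rule_counts.items():
        support = count / n
        confidence = count / premise_counts[premise]
        if support >= min_support and confidence >= min_confidence:
            rules.append((premise, label, support, confidence))
    return rules


rules = mine_class_rules(ROWS, COLUMNS, min_confidence=1.0)
for premise, label, support, confidence in rules:
    lhs = " /\\ ".join(f"{attr} = {value}" for attr, value in premise)
    print(f"{lhs}  =>  {label}  (support={support:.2f}, confidence={confidence:.2f})")
```

Because the class value is never placed in a premise, every rule produced has only the class on its right-hand side, which is the restriction described in the post.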

  • haddock
    haddock New Altair Community Member
    Hi there,

    There are probably other ways to do this, but here is one way of doing it with a Groovy script:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Root">
        <parameter key="logverbosity" value="warning"/>
        <process expanded="true" height="217" width="745">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="subprocess" expanded="true" height="76" name="Preprocessing" width="90" x="180" y="30">
            <process expanded="true">
              <operator activated="true" class="discretize_by_frequency" expanded="true" name="FrequencyDiscretization">
                <parameter key="number_of_bins" value="5"/>
              </operator>
              <operator activated="true" class="nominal_to_binominal" expanded="true" name="Nominal2Binominal">
                <parameter key="transform_binominal" value="true"/>
              </operator>
              <connect from_port="in 1" to_op="FrequencyDiscretization" to_port="example set input"/>
              <connect from_op="FrequencyDiscretization" from_port="example set output" to_op="Nominal2Binominal" to_port="example set input"/>
              <connect from_op="Nominal2Binominal" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="fp_growth" expanded="true" height="76" name="FPGrowth" width="90" x="313" y="30">
            <parameter key="find_min_number_of_itemsets" value="false"/>
            <parameter key="min_support" value="0.1"/>
          </operator>
          <operator activated="true" class="create_association_rules" expanded="true" height="60" name="AssociationRuleGenerator" width="90" x="313" y="165">
            <parameter key="min_confidence" value="0.7"/>
          </operator>
          <operator activated="true" class="execute_script" expanded="true" height="76" name="Execute Script" width="90" x="581" y="75">
            <parameter key="script" value="import com.rapidminer.tools.Ontology;&#13;&#13;&#10;import com.rapidminer.operator.learner.associations.*;&#13;&#13;&#10;&#13;String target=&quot;a1 = range5&quot;&#10;AssociationRules rules = input[0];&#13;&#13;&#10;&#10;&#13;// construct attribute set&#13;&#10;Attribute[] attributes= new Attribute[11];&#10;attributes[0] = AttributeFactory.createAttribute(&quot;Premise&quot;, Ontology.STRING);&#13;&#13;&#10;attributes[1] = AttributeFactory.createAttribute(&quot;Premise Items&quot;, Ontology.INTEGER);&#10;attributes[2] = AttributeFactory.createAttribute(&quot;Conclusion&quot;, Ontology.STRING);&#13;&#10;attributes[3] = AttributeFactory.createAttribute(&quot;Conclusion Items&quot;, Ontology.INTEGER);&#13;&#10;attributes[4] = AttributeFactory.createAttribute(&quot;Confidence&quot;, Ontology.REAL);&#13;&#10;attributes[5] = AttributeFactory.createAttribute(&quot;Conviction&quot;, Ontology.REAL);&#13;&#10;attributes[6] = AttributeFactory.createAttribute(&quot;Gain&quot;, Ontology.REAL);&#13;&#10;attributes[7] = AttributeFactory.createAttribute(&quot;Laplace&quot;, Ontology.REAL);&#13;&#13;&#10;attributes[8] = AttributeFactory.createAttribute(&quot;Lift&quot;, Ontology.REAL);&#13;&#10;attributes[9] = AttributeFactory.createAttribute(&quot;Ps&quot;, Ontology.REAL);&#10;&#13;&#13;attributes[10] = AttributeFactory.createAttribute(&quot;Total Support&quot;, Ontology.REAL);&#10;&#13;&#13;&#13;&#10;MemoryExampleTable table = new MemoryExampleTable(attributes);&#10;DataRowFactory ROW_FACTORY = new DataRowFactory(0);&#13;&#10;&#13;String[] strings= new String[11];&#13;&#10;&#10;for (AssociationRule rule : rules) {&#10;&#9;&#9;// construct example data&#10;        if(rule.toConclusionString().contains(target))&#13;&#10;        {&#13;&#10;        strings[0]=rule.toPremiseString();&#13;&#10;        strings[1]=rule.premise.size().toString();&#13;&#10;        strings[2]=rule.toConclusionString();&#13;&#10;        
strings[3]=rule.conclusion.size().toString();&#13;&#10;        strings[4]=rule.getConfidence().toString();&#13;&#10;        strings[5]=rule.getConviction().toString();&#13;&#10;        strings[6]=rule.getGain().toString();&#13;&#10;        strings[7]=rule.getLaplace().toString();&#13;&#10;        strings[8]=rule.getLift().toString();&#13;&#10;&#13;        strings[9]=rule.getPs().toString();&#13;&#10;        strings[10]=rule.getTotalSupport().toString();&#13;&#13;&#10;        // make and add row&#13;&#10;        DataRow row = ROW_FACTORY.create(strings, attributes); &#13;&#10;        table.addDataRow(row);&#9;&#10;&#9;&#9;}&#13;&#10;}&#10;&#13;&#10;ExampleSet exampleSet = table.createExampleSet();&#10;return exampleSet;&#10;"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Preprocessing" to_port="in 1"/>
          <connect from_op="Preprocessing" from_port="out 1" to_op="FPGrowth" to_port="example set"/>
          <connect from_op="FPGrowth" from_port="frequent sets" to_op="AssociationRuleGenerator" to_port="item sets"/>
          <connect from_op="AssociationRuleGenerator" from_port="rules" to_op="Execute Script" to_port="input 1"/>
          <connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    The script converts rules to examples if their conclusion contains the 'target' string.
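
The filtering step can be pictured with a small, self-contained Python sketch. The `Rule` class, the sample rules, and their confidence values are stand-ins invented for illustration and are not RapidMiner's `AssociationRule` API; only the keep-the-rule-if-its-conclusion-contains-the-target logic mirrors the script. Because that test is a substring match, rules that conclude the target together with other items also pass; the stricter variant keeps only rules whose conclusion is exactly the target item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    premise: tuple      # items on the left-hand side
    conclusion: tuple   # items on the right-hand side
    confidence: float   # made-up values, for illustration only


def conclusion_contains(rules, target):
    """Keep rules whose conclusion text mentions the target (the script's behaviour)."""
    return [r for r in rules if target in ", ".join(r.conclusion)]


def conclusion_is(rules, target):
    """Keep rules whose conclusion is exactly the single target item."""
    return [r for r in rules if r.conclusion == (target,)]


rules = [
    Rule(("Occupation = manager", "Marital-Status = married"), ("Earning = >=50k",), 0.9),
    Rule(("Occupation = manager",), ("Relationship = husband",), 0.8),
    Rule(("Relationship = husband",), ("Earning = >=50k", "Marital-Status = married"), 0.75),
]

target = "Earning = >=50k"
loose = conclusion_contains(rules, target)
strict = conclusion_is(rules, target)
print(f"substring match keeps {len(loose)} rules, exact match keeps {len(strict)}")
```

Either way the filter can only select from rules that the rule generator has already produced; it does not create new ones.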

  • choose_username
    choose_username New Altair Community Member
    I looked at the script: it is almost a simple Java program. If I understand it correctly, it works like a filter,
    and in the target string I should put the name of the item that should be on the right side?

    Am I correct?


    greetings

    User
  • haddock
    haddock New Altair Community Member
    Yes, actually the way it is worded is such that the conclusion should contain the target.
  • choose_username
    choose_username New Altair Community Member
    I get the error:


    Script1.groovy:12:expecting anything but "\n", got it anyway @ line 12, column 16: 1 error.


    But the Groovy script just takes the association rules and filters them. If no extracted rule has my desired right side, then no rule is processed by the Groovy script.

    Did I get something wrong?

    My problem is that FP-Growth should extract those rules, and it doesn't.
  • haddock
    haddock New Altair Community Member
    I get very bored by this - you say there was an error, but if I run the code I posted there is no error. It is not rocket science to suppose that if the association rule builder finds no rules then no rules will be displayed. I did say that there are other ways of doing this - figure them out for yourself.

  • wessel
    wessel New Altair Community Member
    choose_username wrote:

    Occupation = manager   /\   Marital-Status= married  /\  Relationship=husband          =>      >=50k  
    I think the problem here is:
    You wish to do classification, using a rule learner.

    @ Haddock

    Nice script. Yes, it runs without errors, and as an added bonus it's fast.
  • wessel
    wessel New Altair Community Member
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="512" width="705">
          <operator activated="false" class="read_c4.5" expanded="true" height="60" name="Read C4.5" width="90" x="45" y="30">
            <parameter key="c45_filestem" value="/home/wessel/Desktop/census/census-income.data"/>
          </operator>
          <operator activated="false" class="store" expanded="true" height="60" name="Store" width="90" x="180" y="30">
            <parameter key="repository_entry" value="CensusData"/>
          </operator>
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="315" y="30">
            <parameter key="repository_entry" value="CensusData"/>
          </operator>
          <operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal" width="90" x="450" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="label"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="585" y="30">
            <parameter key="condition_class" value="no_missing_labels"/>
          </operator>
          <operator activated="true" class="filter_example_range" expanded="true" height="76" name="Filter Example Range" width="90" x="45" y="120">
            <parameter key="first_example" value="1"/>
            <parameter key="last_example" value="20000"/>
          </operator>
          <operator activated="true" class="weka:W-JRip" expanded="true" height="76" name="W-JRip" width="90" x="180" y="120"/>
          <connect from_op="Read C4.5" from_port="output" to_op="Store" to_port="input"/>
          <connect from_op="Retrieve" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
          <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
          <connect from_op="Filter Example Range" from_port="example set output" to_op="W-JRip" to_port="training set"/>
          <connect from_op="W-JRip" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="126"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


    Result



    JRIP rules:
    ===========

    (weeks worked in year >= 46) and (dividends from stocks >= 1) and (sex = Male) and (capital gains >= 7688) => label=50000+ (101.0/10.0)
    (weeks worked in year >= 49) and (dividends from stocks >= 1) and (sex = Male) and (age >= 35) and (major occupation code = Executive admin and managerial) and (education = Bachelors degree(BA AB BS)) => label=50000+ (62.0/13.0)
    (weeks worked in year >= 48) and (dividends from stocks >= 1) and (sex = Male) and (age >= 37) and (major occupation code = Professional specialty) and (instance weight = 1504.5) => label=50000+ (55.0/9.0)
    (weeks worked in year >= 49) and (major occupation code = Executive admin and managerial) and (sex = Male) and (education = Masters degree(MA MS MEng MEd MSW MBA)) => label=50000+ (78.0/18.0)
    (weeks worked in year >= 46) and (dividends from stocks >= 1) and (sex = Male) and (capital losses >= 1887) => label=50000+ (40.0/11.0)
    (weeks worked in year >= 50) and (dividends from stocks >= 1) and (sex = Male) and (num persons worked for employer >= 6) and (wage per hour = 0) and (own business or self employed = 0) and (instance weight >= 1011.69) and (education = Bachelors degree(BA AB BS)) => label=50000+ (38.0/6.0)
    (weeks worked in year >= 51) and (major occupation code = Professional specialty) and (sex = Male) and (age >= 32) and (education = Prof school degree (MD DDS DVM LLB JD)) => label=50000+ (48.0/11.0)
    (weeks worked in year >= 46) and (capital gains >= 7298) and (capital gains >= 9562) => label=50000+ (77.0/14.0)
    (weeks worked in year >= 46) and (major occupation code = Professional specialty) and (education = Doctorate degree(PhD EdD)) => label=50000+ (71.0/29.0)
    (weeks worked in year >= 48) and (sex = Male) and (age >= 33) and (education = Bachelors degree(BA AB BS)) and (detailed household and family stat = Spouse of householder) => label=50000+ (37.0/17.0)
    (weeks worked in year >= 51) and (age >= 35) and (sex = Male) and (major occupation code = Executive admin and managerial) and (major industry code = Manufacturing-nondurable goods) and (age >= 39) => label=50000+ (20.0/2.0)
    (weeks worked in year >= 49) and (dividends from stocks >= 1) and (num persons worked for employer >= 6) and (age >= 35) and (education = Masters degree(MA MS MEng MEd MSW MBA)) and (full or part time employment stat = Children or Armed Forces) => label=50000+ (23.0/7.0)
    (weeks worked in year >= 39) and (age >= 35) and (sex = Male) and (num persons worked for employer >= 5) and (education = Bachelors degree(BA AB BS)) and (detailed occupation recode = 2) => label=50000+ (18.0/5.0)
    (weeks worked in year >= 46) and (age >= 35) and (sex = Male) and (major occupation code = Professional specialty) and (detailed occupation recode = 4) and (marital stat = Married-civilian spouse present) => label=50000+ (40.0/13.0)
    => label=- 50000 (19292.0/675.0)

    Number of Rules : 15
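
The two numbers in parentheses after each JRip rule are usually read as the number of training examples the rule covers and the number of those it misclassifies. Under that reading, a rule's accuracy on the training data is (covered - errors) / covered. A short sketch that parses two of the rules above; the regular expression and the function name `rule_accuracy` are illustrative assumptions, not part of Weka:

```python
import re

# Matches the tail of a JRip rule line: "=> label=<class> (<covered>/<errors>)"
RULE_STATS = re.compile(
    r"=> label=(?P<label>.+?) \((?P<covered>[\d.]+)/(?P<errors>[\d.]+)\)\s*$"
)


def rule_accuracy(line):
    """Return (label, covered, errors, accuracy) for one JRip rule line."""
    match = RULE_STATS.search(line)
    if match is None:
        raise ValueError(f"not a JRip rule line: {line!r}")
    covered = float(match.group("covered"))
    errors = float(match.group("errors"))
    return match.group("label"), covered, errors, (covered - errors) / covered


lines = [
    "(weeks worked in year >= 46) and (capital gains >= 7298) and (capital gains >= 9562) => label=50000+ (77.0/14.0)",
    "=> label=- 50000 (19292.0/675.0)",
]
for line in lines:
    label, covered, errors, accuracy = rule_accuracy(line)
    print(f"{label}: covers {covered:.0f}, misclassifies {errors:.0f}, accuracy {accuracy:.3f}")
```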
  • choose_username
    choose_username New Altair Community Member
    I found out that the Create Association Rules operator can restrict the conclusion to my desired class.
    But the problem is that my desired class does not appear in that conclusion overview, because it is very rare in the data set.
    The name of the item I wish to conclude on is 'A_B = High'.
    Can I use the 'must contain' field in the FP-Growth operator to force the result to contain that attribute?
    ____________________________________________

    These are the itemsets I get from FP-Growth:

    Size | Support | Item 1     | Item 2
    1    | 0.248   | A_B = High |
    2    | 0.953   | A_B = High | Capital_loss

    Is the null pointer error coming from the itemset that contains only one item? Does the Create Association Rules operator
    have problems with this?

    Is it possible to filter out the first row?


    greetings

    User
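
One general point bears on a rare class value: an itemset can never be more frequent than any single item it contains, so no itemset containing the class item can reach a minimum support higher than the class item's own frequency. The sketch below checks this on made-up transactions (the data and the helper `support` are assumptions for illustration, not RapidMiner output). It suggests keeping the FP-Growth minimum support below the frequency of the rare class value.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    wanted = set(itemset)
    return sum(wanted <= t for t in transactions) / len(transactions)


# Ten made-up transactions; the class item "A_B = High" is rare (2 of 10).
transactions = (
    [{"A_B = High", "Capital_loss"}, {"A_B = High"}]
    + [{"Capital_loss"} for _ in range(5)]
    + [{"Other"} for _ in range(3)]
)

class_item = "A_B = High"
class_support = support(transactions, [class_item])
pair_support = support(transactions, [class_item, "Capital_loss"])

# Adding items to an itemset can only keep or lower its support.
assert pair_support <= class_support

for min_support in (0.3, 0.1):
    reachable = class_support >= min_support
    print(f"min_support={min_support}: class item can be frequent: {reachable}")
```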
  • ratheesan
    ratheesan New Altair Community Member
    Hi,

    I think the relation can be identified using a decision tree, where we select 'Earning' from the first post as the label attribute. If there is any relation, it will be reflected in the tree, as Haddock said.

    By
    Ratheesan


  • wessel
    wessel New Altair Community Member
    ratheesan wrote:

    Hi,

    I think the relation can be identified using a decision tree, where we select 'Earning' from the first post as the label attribute. If there is any relation, it will be reflected in the tree, as Haddock said.

    By
    Ratheesan
    Yes, that was my point also.
    Finding this relation is not association rule learning, but classification.
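
To illustrate the classification view, here is a minimal sketch of OneR, a very simple rule learner, applied to the three-row table from earlier in the thread. OneR is swapped in here only because it fits in a few lines; it is neither JRip nor a decision tree, and the function name `one_r` is an illustrative assumption. The point is that a classifier trained with Earning as the label can only ever put the label on the right-hand side of its rules.

```python
from collections import Counter, defaultdict

# Example table from the thread; the second element of each row is the label.
COLUMNS = ["Occupation", "Marital-Status", "Relationship"]
ROWS = [
    (("manager", "married", "husband"), ">=50k"),
    (("Cleaner", "divorced", "Not-in-Family"), "<=50k"),
    (("Cleaner", "separated", "Not-in-Family"), ">=50k"),
]


def one_r(rows, columns):
    """Pick the attribute whose value -> majority-label rules make the fewest training errors."""
    best = None
    for index, name in enumerate(columns):
        labels_by_value = defaultdict(Counter)
        for values, label in rows:
            labels_by_value[values[index]][label] += 1
        # One rule per attribute value: predict the most frequent label for that value.
        rules = {v: c.most_common(1)[0][0] for v, c in labels_by_value.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in labels_by_value.values())
        if best is None or errors < best[2]:
            best = (name, rules, errors)
    return best


attribute, rules, errors = one_r(ROWS, COLUMNS)
for value, label in rules.items():
    print(f"{attribute} = {value}  =>  {label}")
print(f"training errors: {errors}")
```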