Problem with FPGrowth?

I'm using the attached dataset to illustrate the problem. It is a very basic program to compute association rules. I read the binary matrix, transform the 1/0 values to true/false, and compute the frequent itemsets with the FPGrowth operator, and this is where the problems start. "Blouse" is an item that appears in only 3 of 20 transactions, yet the program reports a support of 0.85. Obviously, the error carries over to the rule calculation part.

Here's my code in case I did something silly.

<operator name="Root" class="Process" expanded="yes">
    <operator name="CSVExampleSource" class="CSVExampleSource" breakpoints="after">
        <parameter key="filename"	value="K:\clothingstore.csv"/>
        <parameter key="id_name"	value="tid"/>
    </operator>
    <operator name="Numerical2Binominal" class="Numerical2Binominal" breakpoints="after">
    </operator>
    <operator name="FPGrowth" class="FPGrowth" breakpoints="after">
        <parameter key="min_support"	value="0.2"/>
    </operator>
    <operator name="AssociationRuleGenerator" class="AssociationRuleGenerator">
        <parameter key="min_confidence"	value="0.7"/>
    </operator>
</operator>

If I try the Apriori algorithm from the Weka list everything is fine. I've noticed this problem with other (bigger) datasets. Can you replicate my problem? I'm using version 4.3.

[attachment deleted by admin]

land

Hi,
your process is just fine. However, RapidMiner does something in this situation that might be surprising: it treats the first nominal value as "false" and the second as "true". The Numerical2Binominal operator sometimes gives "true" the index 0, which causes this problem. If you invert the meaning, a support of 0.85 is exactly correct: 17 of the 20 transactions do not contain "Blouse", and 17/20 = 0.85.
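To make the arithmetic concrete, here is a minimal Python sketch of the effect. It is not RapidMiner code: the column contents, the `mapping` dictionary and the helper `support_index_one` are made up for illustration. Only the counts (3 of 20) come from the original post.

```python
# Hypothetical "Blouse" column: 3 of 20 transactions contain the item.
# Only the counts come from the post; the positions are invented.
blouse = ["true"] + ["false"] * 10 + ["true"] * 2 + ["false"] * 7

# Assumed internal mapping in which "true" happened to receive index 0.
mapping = {"true": 0, "false": 1}
indices = [mapping[value] for value in blouse]


def support_index_one(index_column):
    """Fraction of rows whose stored index is 1 (read as 'item present')."""
    return sum(1 for i in index_column if i == 1) / len(index_column)


reported = support_index_one(indices)            # counts the "false" rows
expected = blouse.count("true") / len(blouse)    # what the user expects

print(reported)  # 0.85
print(expected)  # 0.15
```

Counting index 1 as "present" gives 17/20 whenever "true" sits at index 0, which matches the support reported in the question.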

The solution is quite easy: store your data in a RapidMiner file using the ExampleSetWriter and then sort the nominal mappings. Sorry for the inconvenience, but we are working to solve this problem once and for all in RapidMiner 5.0.
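Conceptually, sorting a nominal mapping means re-encoding the column so that "false" ends up at index 0 and "true" at index 1. The Python sketch below shows only that idea. The function `sort_mapping` and its arguments are hypothetical and do not correspond to any RapidMiner API or file format.

```python
def sort_mapping(indices, mapping, order=("false", "true")):
    """Re-encode stored indices so that order[0] -> 0 and order[1] -> 1.

    indices: list of stored integer indices for one attribute
    mapping: dict from nominal label to its current index
    """
    new_mapping = {label: i for i, label in enumerate(order)}
    label_of = {i: label for label, i in mapping.items()}
    new_indices = [new_mapping[label_of[i]] for i in indices]
    return new_indices, new_mapping


# Assumed "bad" state: "true" was seen first and received index 0.
old_mapping = {"true": 0, "false": 1}
old_indices = [0, 1, 1, 0, 1]  # decodes to true, false, false, true, false

new_indices, new_mapping = sort_mapping(old_indices, old_mapping)
print(new_mapping)  # {'false': 0, 'true': 1}
print(new_indices)  # [1, 0, 0, 1, 0]
```

The decoded labels stay the same. Only the index behind each label changes, so an operator that reads index 1 as "true" then sees the intended meaning.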

Greetings,
Sebastian

earmijo

Thanks Sebastian. I'm still a bit confused though. If I understand you correctly, the problem is created by the Numerical2Binominal operator. Am I right? But when I place a breakpoint after the conversion, everything looks fine. There are only 3 "true"s for blouse, for instance. I even did the true/false coding myself in Excel, read the file, and I still have the problem. I thought the problem was FPGrowth, since the Weka Apriori operator doesn't have any problems.

land

Hi,
the trouble comes from the internal data handling within RapidMiner. I'll try to summarize briefly how nominal data is handled:
Nominal attributes hold a mapping from numbers to the actual nominal string values, so internally nominal values are just numbers. Binominal attributes are not restricted to true/false; they can have any two nominal values, such as "1"/"0" or "yes"/"no". The FPGrowth operator therefore assumes that the first nominal value (index 0) means false and the second (index 1) means true.
If true is mapped to 0 and false to 1, the meaning is switched.
This happened because the Numerical2Binominal operator simply adds values to the mapping in the order in which they first occur, so the first value encountered gets index 0. If that value happens to be true, true gets index 0.
To overcome this problem, you can save the data with the ExampleSetWriter. The .aml file contains information about the mapping, and the mapping can be switched there. If this is impractical because you have too many attributes, you could instead add an artificial example as the first one, containing only numbers that are mapped to false.
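The first-occurrence behaviour and the artificial-example workaround can be sketched in a few lines of Python. This is an illustration of the description above, not RapidMiner's actual implementation. `build_mapping` and the sample column are assumptions.

```python
def build_mapping(column):
    """Assign indices in order of first occurrence, as described above."""
    mapping = {}
    for value in column:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


# Hypothetical attribute whose first example happens to be "true".
column = ["true", "false", "false", "true"]
print(build_mapping(column))      # {'true': 0, 'false': 1}

# Workaround: put an artificial all-false example in front.
with_dummy = ["false"] + column
print(build_mapping(with_dummy))  # {'false': 0, 'true': 1}
```

One caveat with this workaround: the artificial example counts as an extra transaction, so the support values shift slightly unless it is filtered out again afterwards.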

Greetings,
Sebastian

PS: We are currently working hard on removing this troublesome rough edge of RapidMiner.