"Market Basket not getting results"

online360
online360 New Altair Community Member
edited November 5 in Community Q&A
Hi everyone!

This is my first post in this board so I gotta tell you that I love rapidminer.
I think I'm going to use it very often in the future.

At the moment I'm trying to create a market basket analysis for the following data set:
About 350,000 transactions
Transaction-Id;Item-Id;Sales Value (I also inserted an "amount" value on how many pieces of a product were bought)

An example:
Transaction-Id;Item-Id;Sales Value
525344;585555;24,80
525344;158065;12,85
524634;158065;12,85
...


I went through all the templates and tutorials in RM 7 and also tried several solutions from this board and from external pages (always renaming the column titles and choosing the correct attribute types, even though I can't find all of the suggested ones in RM 7), but I can't get any results because either
1. The process runs out of memory (I tried it on Macs with 4 GB and 16 GB as well as on a 64-bit Windows 10 machine with 4 GB), or
2. The process ends but doesn't show any results.

For 1: I also tried splitting the data so that the number of rows gets smaller.

Does anyone have an idea on how to get this done?

Thank you very much in advance!

Answers

  • JEdward
    JEdward New Altair Community Member
    Which part of your process is running out of memory? 
    Is it the conversion to binominal? 
  • online360
    online360 New Altair Community Member
    Hi!

    Yes, most of the time it happened there but I think also sometimes at fp-growth or at create attribute sets.

    At the moment, it always runs all the way to the results view but there is no result shown.
  • JEdward
    JEdward New Altair Community Member
    Let's try not running the whole process in one go.
    First, get the dataset with the binominal conversion stored (use the Store operator).

    This has two advantages. First, it saves memory by breaking the process up.
    Second, it saves time: if your FP-Growth is set up in a way that doesn't actually find associations, you don't need to wait for the binominal conversion before you try again.
  • online360
    online360 New Altair Community Member
    Thanks for the hint. I think saving the data before running FP-Growth shows the problem:
    The saved data set only has three columns (Row No.; Invoice; sum(Orders)), where every value in sum(Orders) is "true".

    Also, it only contains 68,515 examples out of over 300,000 in the original data set.

    What does that mean?

    Thanks!
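A note on the row count: a drop from 300,000+ rows to 68,515 is not necessarily a problem in itself. The pivot step is meant to produce one row per invoice, so the count would plausibly equal the number of distinct invoices. The minimal Python sketch below shows the effect using the three sample rows from the first post; the function name `pivot_to_baskets` is made up for illustration and is not part of RapidMiner.

```python
from collections import defaultdict

# The three sample rows from the first post, reduced to (transaction id, item id).
rows = [
    ("525344", "585555"),
    ("525344", "158065"),
    ("524634", "158065"),
]


def pivot_to_baskets(rows):
    """Collapse long-format rows into one set of items per transaction."""
    baskets = defaultdict(set)
    for transaction_id, item_id in rows:
        baskets[transaction_id].add(item_id)
    return dict(baskets)


baskets = pivot_to_baskets(rows)
print(len(rows), "input rows ->", len(baskets), "transactions")
```

Three input rows collapse into two transactions here. What looks suspicious in the stored data is therefore the number of columns rather than the number of rows.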
  • JEdward
    JEdward New Altair Community Member
    Would you mind sharing your process XML?
    It sounds like there is something not quite right there, possibly an Aggregate operator in the wrong place.

    (Don't worry about the data, just the process XML is fine. You can get it by going to View -> Show Panel -> XML.)
  • online360
    online360 New Altair Community Member
    Hi!

    Here is the xml, hope this helps:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.0.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.0.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.0.001" expanded="true" height="68" name="Load Transactions" width="90" x="112" y="187">
            <parameter key="repository_entry" value="//Local Repository/data/t123_transactions"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="7.0.001" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="136">
            <list key="function_descriptions">
              <parameter key="Orders" value="1"/>
            </list>
          </operator>
          <operator activated="true" class="rename" compatibility="7.0.001" expanded="true" height="82" name="Rename" width="90" x="447" y="136">
            <parameter key="old_name" value="rechnung"/>
            <parameter key="new_name" value="Invoice"/>
            <list key="rename_additional_attributes">
              <parameter key="artikel" value="product 1"/>
            </list>
          </operator>
          <operator activated="true" class="aggregate" compatibility="6.0.006" expanded="true" height="82" name="Aggregate" width="90" x="112" y="336">
            <list key="aggregation_attributes">
              <parameter key="Orders" value="sum"/>
            </list>
            <parameter key="group_by_attributes" value="Invoice|product 1"/>
          </operator>
          <operator activated="true" class="pivot" compatibility="7.0.001" expanded="true" height="82" name="Pivot" width="90" x="246" y="336">
            <parameter key="group_attribute" value="Invoice"/>
            <parameter key="index_attribute" value="product 1"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="7.0.001" expanded="true" height="82" name="Rename by Replacing" width="90" x="380" y="336">
            <parameter key="attribute" value="Invoice"/>
            <parameter key="replace_what" value="sum\(Orders\)_"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="7.0.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="112" y="442">
            <parameter key="default" value="zero"/>
            <list key="columns"/>
          </operator>
          <operator activated="true" class="numerical_to_binominal" compatibility="6.0.003" expanded="true" height="82" name="Numerical to Binominal" width="90" x="246" y="442"/>
          <operator activated="true" class="set_role" compatibility="7.0.001" expanded="true" height="82" name="Set Role" width="90" x="380" y="442">
            <parameter key="attribute_name" value="Invoice"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="store" compatibility="7.0.001" expanded="true" height="68" name="Store" width="90" x="514" y="748">
            <parameter key="repository_entry" value="//Local Repository/data/t123_bimominal-transactions_2015"/>
          </operator>
          <operator activated="true" class="fp_growth" compatibility="7.0.001" expanded="true" height="82" name="FP-Growth" width="90" x="648" y="289">
            <parameter key="find_min_number_of_itemsets" value="false"/>
            <parameter key="positive_value" value="true"/>
            <parameter key="min_support" value="0.005"/>
          </operator>
          <operator activated="true" class="create_association_rules" compatibility="7.0.001" expanded="true" height="82" name="Create Association Rules" width="90" x="648" y="442">
            <parameter key="min_confidence" value="0.1"/>
          </operator>
          <connect from_op="Load Transactions" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Pivot" to_port="example set input"/>
          <connect from_op="Pivot" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
          <connect from_op="Numerical to Binominal" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Store" to_port="input"/>
          <connect from_op="Store" from_port="through" to_op="FP-Growth" to_port="example set"/>
          <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
          <connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
          <connect from_op="Create Association Rules" from_port="item sets" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="147"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="42"/>
          <description align="left" color="yellow" colored="false" height="70" resized="false" width="850" x="20" y="25">MARKET BASKET ANALYSIS&lt;br&gt;Model associations between products by determining sets of items frequently purchased together and building association rules to derive recommendations.</description>
          <description align="left" color="blue" colored="true" height="185" resized="true" width="550" x="20" y="105">Step 1:&lt;br/&gt;Load transaction data containing a transaction id, a product id and a quantifier. The data denotes how many times a certain product has been purchased as part of a transactions.</description>
          <description align="left" color="purple" colored="true" height="341" resized="true" width="549" x="20" y="300">&lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; Step 2:&lt;br&gt;Edit, transform &amp;amp; load (ETL) - Aggregate transaction data to account for multiple occurrences of the same product in a transaction. Pivot the data so that each transaction is represented by a row. Transform purchase amounts to binary &amp;quot;product purchased yes/no &amp;quot; indicators.&lt;br&gt;</description>
          <description align="left" color="green" colored="true" height="310" resized="true" width="290" x="580" y="105">Step 3:&lt;br/&gt;Using FP-Growth, determine frequent item sets. A frequent item sets denotes that the items (products) in the set have been purchased together frequently, i.e. in a certain ratio of transactions. This ratio is given by the support of the item set.</description>
          <description align="left" color="green" colored="true" height="215" resized="true" width="286" x="579" y="425">&lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; Step 4:&lt;br/&gt;Create association rules which can be used for product recommendations depending on the confidences of the rules.&lt;br&gt;</description>
          <description align="left" color="yellow" colored="false" height="35" resized="true" width="849" x="20" y="655">Outputs: association rules, frequent item set&lt;br&gt;</description>
        </process>
      </operator>
    </process>
    Thanks
  • online360
    online360 New Altair Community Member
    Does anyone have an idea why this process just runs through (in a few seconds now) without showing any results?

    Thanks!
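For context on the thresholds in the process above: `min_support` is 0.005 and `min_confidence` is 0.1. With 68,515 rows, an item set would need to appear in roughly 343 of them (0.005 × 68,515) to count as frequent. The sketch below is a brute-force pair counter in Python, not FP-Growth and not RapidMiner's implementation; it only illustrates how support and confidence are computed. The function name and toy baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support, min_confidence):
    """Return (premise, conclusion, support, confidence) for item pairs.

    Brute-force illustration only; FP-Growth finds the same frequent sets
    far more efficiently and also handles sets larger than two items.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for items in baskets:
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of transactions containing both items
        if support < min_support:
            continue
        for premise, conclusion in ((a, b), (b, a)):
            confidence = count / item_counts[premise]
            if confidence >= min_confidence:
                rules.append((premise, conclusion, support, confidence))
    return sorted(rules)


# Hypothetical toy data: each basket is the set of items in one transaction.
baskets = [{"A", "B"}, {"A", "B"}, {"A"}, {"C"}]
rules = pair_rules(baskets, min_support=0.5, min_confidence=0.6)
for rule in rules:
    print(rule)

# If every transaction only ever exposes a single item column,
# no pairs exist and therefore no rules can be produced.
single_column = [{"X"}, {"X"}, {"X"}]
print(pair_rules(single_column, min_support=0.005, min_confidence=0.1))
```

The last call returns an empty list, which would be consistent with a process that finishes quickly and shows no rules when its input has only one item column.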
  • JEdward
    JEdward New Altair Community Member
    If you add a breakpoint before FP Growth what do you get? 

    What about after? 
  • online360
    online360 New Altair Community Member
    I get three columns:
    Row No.; Invoice; sum(Orders)

    Orders is generated automatically by the "Generate Attributes" operator, which sets each value to "1".

    Why is "Orders" important?
    Wouldn't it be sufficient to have only the invoice and product numbers?

    Thanks
  • JEdward
    JEdward New Altair Community Member
    Just 3 columns sounds like the problem.

    Run this process (from the RapidMiner 7 templates) and have a look at the breakpoint before FP-Growth.
    I would expect to see something similar in your process:

    Invoice - Product1 - Product2 - Productetc
    999999  - True     - False    - False
    etc

    Check over your process and data again, and convert the data into this format just before FP-Growth.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.0.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.0.001" expanded="true" name="Process">
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.0.001" expanded="true" height="68" name="Load Transactions" width="90" x="112" y="187">
            <parameter key="repository_entry" value="//Samples/Templates/Market Basket Analysis/Transactions"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="6.0.006" expanded="true" height="82" name="Aggregate" width="90" x="112" y="336">
            <list key="aggregation_attributes">
              <parameter key="Orders" value="sum"/>
            </list>
            <parameter key="group_by_attributes" value="Invoice|product 1"/>
          </operator>
          <operator activated="true" class="pivot" compatibility="7.0.001" expanded="true" height="82" name="Pivot" width="90" x="246" y="336">
            <parameter key="group_attribute" value="Invoice"/>
            <parameter key="index_attribute" value="product 1"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="7.0.001" expanded="true" height="82" name="Rename by Replacing" width="90" x="380" y="336">
            <parameter key="attribute" value="Invoice"/>
            <parameter key="replace_what" value="sum\(Orders\)_"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="7.0.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="112" y="442">
            <parameter key="default" value="zero"/>
            <list key="columns"/>
          </operator>
          <operator activated="true" class="numerical_to_binominal" compatibility="6.0.003" expanded="true" height="82" name="Numerical to Binominal" width="90" x="246" y="442"/>
          <operator activated="true" class="set_role" compatibility="7.0.001" expanded="true" height="82" name="Set Role" width="90" x="380" y="442">
            <parameter key="attribute_name" value="Invoice"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" breakpoints="before" class="fp_growth" compatibility="7.0.001" expanded="true" height="82" name="FP-Growth" width="90" x="648" y="289">
            <parameter key="find_min_number_of_itemsets" value="false"/>
            <parameter key="positive_value" value="true"/>
            <parameter key="min_support" value="0.005"/>
          </operator>
          <operator activated="true" class="create_association_rules" compatibility="7.0.001" expanded="true" height="82" name="Create Association Rules" width="90" x="648" y="442">
            <parameter key="min_confidence" value="0.1"/>
          </operator>
          <connect from_op="Load Transactions" from_port="output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Pivot" to_port="example set input"/>
          <connect from_op="Pivot" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
          <connect from_op="Numerical to Binominal" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
          <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
          <connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
          <connect from_op="Create Association Rules" from_port="item sets" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="147"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="42"/>
          <description align="left" color="yellow" colored="false" height="70" resized="false" width="850" x="20" y="25">MARKET BASKET ANALYSIS&lt;br&gt;Model associations between products by determining sets of items frequently purchased together and building association rules to derive recommendations.</description>
          <description align="left" color="blue" colored="true" height="185" resized="true" width="550" x="20" y="105">Step 1:&lt;br/&gt;Load transaction data containing a transaction id, a product id and a quantifier. The data denotes how many times a certain product has been purchased as part of a transactions.</description>
          <description align="left" color="purple" colored="true" height="341" resized="true" width="549" x="20" y="300">&lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; Step 2:&lt;br&gt;Edit, transform &amp;amp; load (ETL) - Aggregate transaction data to account for multiple occurrences of the same product in a transaction. Pivot the data so that each transaction is represented by a row. Transform purchase amounts to binary &amp;quot;product purchased yes/no &amp;quot; indicators.&lt;br&gt;</description>
          <description align="left" color="green" colored="true" height="310" resized="true" width="290" x="580" y="105">Step 3:&lt;br/&gt;Using FP-Growth, determine frequent item sets. A frequent item sets denotes that the items (products) in the set have been purchased together frequently, i.e. in a certain ratio of transactions. This ratio is given by the support of the item set.</description>
          <description align="left" color="green" colored="true" height="215" resized="true" width="286" x="579" y="425">&lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; Step 4:&lt;br/&gt;Create association rules which can be used for product recommendations depending on the confidences of the rules.&lt;br&gt;</description>
          <description align="left" color="yellow" colored="false" height="35" resized="true" width="849" x="20" y="655">Outputs: association rules, frequent item set&lt;br&gt;</description>
        </process>
      </operator>
    </process>
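The target layout described above (one row per invoice, one true/false column per product) can be sketched outside RapidMiner as well. The Python below is only an illustration of the shape to look for at the breakpoint; the function name `to_binominal_table` and the sample baskets are assumptions, not part of the template.

```python
def to_binominal_table(baskets):
    """Build a header and rows with one boolean column per distinct product."""
    products = sorted({p for items in baskets.values() for p in items})
    header = ["Invoice"] + products
    table = [
        [invoice] + [product in items for product in products]
        for invoice, items in sorted(baskets.items())
    ]
    return header, table


# Hypothetical baskets: invoice id -> set of product ids bought on it.
baskets = {
    "525344": {"585555", "158065"},
    "524634": {"158065"},
}

header, table = to_binominal_table(baskets)
print(header)
for row in table:
    print(row)

# Sanity check worth doing at the breakpoint: the column count should be
# 1 (the invoice id) plus the number of distinct products.
distinct_products = {p for items in baskets.values() for p in items}
print(len(header) == 1 + len(distinct_products))
```

If the data at the breakpoint has only an invoice column and a single `sum(Orders)` column, the product values were probably never spread into columns, so the grouping and index attributes of the pivot are the first thing to check.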
  • online360
    online360 New Altair Community Member
    Well, my process is also based on the template provided by RM.
    I just create the attribute "Orders" on the fly using an operator.

    I have now created the attribute "Orders" in Excel and imported the data again, so it looks exactly like the sample provided by RM (including the same attribute types and roles).
    Unfortunately, I still only get three columns, where the last one (sum(Orders)) says "true" in each row.

    Is "Orders" really necessary?
    As I understand it, this column shows how many pieces of a specific product were bought in one order, correct?
    Or is it about how many products are in an order? (This would make a huge difference.)

    Thanks!
  • online360
    online360 New Altair Community Member
    Ok, it shouldn't be about how many products are in one Invoice, as the example set contains:
    Invoice "647991", 4x "Product 15", each set to Orders "1".

    Why is one product even listed several times in one Invoice?
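On the role of "Orders": the template's own Step 2 note says the Aggregate step exists "to account for multiple occurrences of the same product in a transaction", and the later Numerical to Binominal step reduces the summed quantity to purchased yes/no. The Python sketch below mirrors that idea under those assumptions. The invoice "647991" with four "Product 15" lines is taken from the post above; "Product 7" and the function name are hypothetical additions for illustration.

```python
from collections import Counter

# Invoice 647991 lists "Product 15" four times, each with Orders = 1.
# "Product 7" is a made-up extra line so there is more than one product.
lines = [("647991", "Product 15", 1)] * 4 + [("647991", "Product 7", 1)]


def aggregate_orders(lines):
    """Sum the Orders value per (invoice, product) pair."""
    totals = Counter()
    for invoice, product, orders in lines:
        totals[(invoice, product)] += orders
    return totals


totals = aggregate_orders(lines)
print(totals[("647991", "Product 15")])

# Binarize: any positive quantity becomes "purchased".
purchased = {key: quantity > 0 for key, quantity in totals.items()}
print(purchased)
```

After binarizing, four lines of the same product and a single line give the same result, which suggests that a constant Orders value of 1 should be enough for this analysis. Why the sample data lists one product on several lines is not stated in the template; separate line items on the same invoice would be one plausible reason.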
  • online360
    online360 New Altair Community Member
    I'm still working on this.

    Does anyone have an idea on how to get this analysis done?

    Thanks!