"What is the best dataset format for mining with the FP-Growth algorithm in RM?"

brenda_natasha New Altair Community Member
edited November 5 in Community Q&A
Does anyone know the best criteria, or at least the rules, for a dataset that is going to be mined with FP-Growth?
And regarding the format, which one is better?
1. order_id | item1 | item2 | item3
or
2. order_id | item {} 
or
3. order_id | book (T/F) | pencil (T/F) | bag (T/F)

Every example I have read uses form #2, but what about forms #1 and #3?

Answers

  • Marco_Barradas
    Altair Employee
    Hi @brenda_natasha
    It would not affect the outcome, as long as you have the order ID and the item IDs.
    The real difference is in performance when you explore your data.
    In forms 1 and 3 you may have a column for each product. Depending on your use case that could be any number of columns, and as it grows the array gets bigger and uses more of your computer's resources.
    The main difference between forms 1 and 3 would be binary encoding versus the quantity of each product in the order. Since the amount ordered of each product doesn't affect the resulting rules, either way is OK.
    In the end, the process transforms the dataset (DS) into a binary matrix.
    I prefer form 2, since you only need two columns in your DS and it's easier to obtain that structure from any transactional software.
    Hope this answers your question.
    Best regards.
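
To illustrate the point that every form ends up as the same binary matrix, here is a minimal plain-Python sketch. It is not RapidMiner code and does not show what any RM operator does internally; the sample orders and the helper names (`baskets_from_long`, `baskets_from_wide`, `to_binary_matrix`) are made up for illustration. It reads the same three orders in form 2 (one row per order/item pair) and in form 1 (one row per order with item slots) and turns both into form 3 (one True/False column per item).

```python
from collections import defaultdict

# Hypothetical sample data in form 2: one row per (order_id, item) pair.
long_rows = [
    (1, "book"), (1, "pencil"),
    (2, "pencil"), (2, "bag"),
    (3, "book"), (3, "pencil"), (3, "bag"),
]

# The same hypothetical orders in form 1: one row per order, item slots.
wide_rows = [
    (1, "book", "pencil", None),
    (2, "pencil", "bag", None),
    (3, "book", "pencil", "bag"),
]


def baskets_from_long(rows):
    """Group (order_id, item) pairs into {order_id: set_of_items}."""
    baskets = defaultdict(set)
    for order_id, item in rows:
        baskets[order_id].add(item)
    return dict(baskets)


def baskets_from_wide(rows):
    """Collect the non-empty item slots of each order into a set."""
    return {order_id: {i for i in items if i is not None}
            for order_id, *items in rows}


def to_binary_matrix(baskets):
    """Return (sorted item names, {order_id: [True/False per item]})."""
    items = sorted(set().union(*baskets.values()))
    matrix = {oid: [item in basket for item in items]
              for oid, basket in baskets.items()}
    return items, matrix


# Form 2 -> form 3: one boolean column per distinct item.
items, matrix = to_binary_matrix(baskets_from_long(long_rows))
print(items)
for order_id, flags in sorted(matrix.items()):
    print(order_id, flags)
```

Both input forms produce the same baskets, so the choice mainly affects how many columns you have to carry around before the conversion, not the rules you get out.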
  • jykim
    jykim New Altair Community Member
    edited May 2020
    Thanks. I found the answer to my previous question.