"What is the best dataset format for mining with the FP-Growth algorithm in RM?"
brenda_natasha
New Altair Community Member
Does anyone know the best criteria, or at least the rules, for a dataset that is going to be mined using FP-Growth?
And about the form, which one is better?
1. order_id | item1 | item 2 | item 3
or
2. order_id | item {}
or
3. order_id | book (T/F) | pencil (T/F) | bag (T/F)
Every example I have read uses form #2, but what about #1 and #3?
Answers
Hi @brenda_natasha
It will not affect the outcome, as long as your data contains the order ID and the item IDs.
The real difference is in performance when you explore your data.
In forms 1 and 3 you may end up with one column per product. Depending on your use case that can be any number of columns, and as it grows the table gets wider and uses more of your computer's resources.
The main difference between 1 and 3 would be binary encoding versus the quantity of each product in the order. Since the quantity ordered of each product doesn't affect the resulting rules, either way is OK.
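To illustrate why quantities do not matter here, below is a minimal sketch in plain Python (not RapidMiner itself). It assumes a wide table that stores a quantity per product; the order IDs, quantities, and the `to_flags` helper are all made up for the example. Once the quantities are reduced to "bought or not", the result is exactly form 3.

```python
# Hypothetical orders: each maps a product name to the quantity ordered.
orders_with_quantities = {
    101: {"book": 2, "pencil": 0, "bag": 1},
    102: {"book": 0, "pencil": 5, "bag": 0},
}


def to_flags(order_quantities):
    """Turn per-product quantities into True/False purchase flags."""
    return {
        order_id: {product: qty > 0 for product, qty in products.items()}
        for order_id, products in order_quantities.items()
    }


flags = to_flags(orders_with_quantities)
print(flags[101])
print(flags[102])
```

Frequent-itemset mining only asks whether an item appears in a transaction, so the 2 books in order 101 count the same as 1 book would.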
In the end, the process transforms the dataset (DS) into a binary matrix.
I prefer form 2, since you only need two columns in your DS and it's easier to obtain that structure from any transactional software.
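As a rough sketch of what that transformation looks like, here is a plain Python version (again, not RapidMiner's own implementation) that pivots form 2, one row per order ID and item, into a binary matrix. The sample rows and the `to_binary_matrix` helper are hypothetical and only there to show the idea.

```python
# Hypothetical form 2 data: one (order_id, item) pair per row.
rows = [
    (101, "book"), (101, "bag"),
    (102, "pencil"),
    (103, "book"), (103, "pencil"), (103, "bag"),
]


def to_binary_matrix(rows):
    """Pivot (order_id, item) pairs into one True/False row per order."""
    # Column order: every distinct item, sorted alphabetically.
    items = sorted({item for _, item in rows})

    # Group the items of each order into a set (duplicates collapse).
    baskets = {}
    for order_id, item in rows:
        baskets.setdefault(order_id, set()).add(item)

    # One row per order: True where the order contains that item.
    matrix = {
        order_id: [item in basket for item in items]
        for order_id, basket in sorted(baskets.items())
    }
    return items, matrix


items, matrix = to_binary_matrix(rows)
print(items)
for order_id, flags in matrix.items():
    print(order_id, flags)
```

The two-column input stays narrow no matter how many products exist; the wide matrix is only built at the point where the mining step needs it.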
Hope this answers your question.
Best regards.
Thanks. I found the answer to my previous question.