"What is the best dataset form to mining using fp-growth algorithm in RM?"
brenda_natasha
Does anyone know the best criteria, or at least the rules, for a dataset that is going to be mined with FP-Growth?
And about the form, which one is better?
1. order_id | item1 | item2 | item3
or
2. order_id | item {}
or
3. order_id | book (T/F) | pencil (T/F) | bag (T/F)
Every example I have read uses form #2, but what about forms #1 and #3?
Marco_Barradas
Hi @brenda_natasha,
The form would not affect the outcome as long as you have the order ID and the item IDs for each order.
The real difference is in performance when you explore your data.
In forms 1 and 3 you may have a column for each product. Depending on your use case that could be any number of columns, and as the number grows the array gets bigger and so do the resources your computer uses.
The main difference between forms 1 and 3 is binary encoding versus the quantity of each product on the order. Since the quantity ordered of each product doesn't affect the resulting rules, either way is OK.
In the end, the process transforms the dataset (DS) into a binary matrix.
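To illustrate the point about the binary matrix, here is a minimal sketch in plain Python (not a RapidMiner process). It assumes three hypothetical orders over the items book, pencil and bag, written in each of the three forms from the question, and shows that all three reduce to the same order-by-item binary matrix. The function names and the sample data are made up for illustration.

```python
# Hypothetical sample data: the same three orders written in each form.
ITEM_NAMES = ["book", "pencil", "bag"]

# Form 1: order_id | item1 | item2 | item3 (None marks an empty slot)
form1 = [(1, "book", "pencil", None),
         (2, "bag", None, None),
         (3, "book", "pencil", "bag")]

# Form 2: order_id | item (one row per item in the order)
form2 = [(1, "book"), (1, "pencil"),
         (2, "bag"),
         (3, "book"), (3, "pencil"), (3, "bag")]

# Form 3: order_id | book (T/F) | pencil (T/F) | bag (T/F)
form3 = [(1, True, True, False),
         (2, False, False, True),
         (3, True, True, True)]


def baskets_from_item_columns(rows):
    """Form 1 -> {order_id: set of items}, skipping empty slots."""
    return {order_id: {item for item in items if item}
            for order_id, *items in rows}


def baskets_from_long(rows):
    """Form 2 -> {order_id: set of items}, grouping rows by order."""
    baskets = {}
    for order_id, item in rows:
        baskets.setdefault(order_id, set()).add(item)
    return baskets


def baskets_from_flags(rows, item_names):
    """Form 3 -> {order_id: set of items}, keeping items flagged True."""
    return {order_id: {name for name, flag in zip(item_names, flags) if flag}
            for order_id, *flags in rows}


def to_binary_matrix(baskets):
    """Return (sorted item names, {order_id: 0/1 row in that item order})."""
    items = sorted({item for basket in baskets.values() for item in basket})
    matrix = {order_id: [int(item in basket) for item in items]
              for order_id, basket in sorted(baskets.items())}
    return items, matrix


result1 = to_binary_matrix(baskets_from_item_columns(form1))
result2 = to_binary_matrix(baskets_from_long(form2))
result3 = to_binary_matrix(baskets_from_flags(form3, ITEM_NAMES))

# All three forms carry the same information about which items are in which order.
print(result1 == result2 == result3)
```

Form 2 needs only two columns however many distinct items exist, while forms 1 and 3 grow wider as the catalogue or the largest order grows.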
I prefer form 2, since you only need two columns in your DS and it's easier to obtain that structure from any transactional software.
Hope this answers your question.
Best regards.
jykim
Thanks. I found the answer for my previous question.