Market Basket Analysis
Spcalan14
New Altair Community Member
I have a very simple Excel file with 2 columns:
Invoice #
Item #
About 12k entries
I want to know what the most commonly purchased products are..
If Product A is always sold with Product B.. we can make a package deal.
Thoughts ?
0
Answers
-
Sorry to be vague.
I have opened the Market Basket Analysis template, imported data, and I get the Association rules output..
Largest support is 0.026 for Product 12 and Product 15..
29 sets... Support = 0.047 ( highest )...
So does this mean Product 12 and Product 15 are most commonly purchased together, and there are 29 sets to confirm it?
Can't be... Item 15 was only purchased once, and Product 12 was purchased 32 times...
0 -
Hi @Spcalan14,
Yes, you can take a look at the process template called "Market Basket Analysis", which includes the following 2 operators:
- FP-Growth
- Create Association Rules
Hope this helps,
Regards,
Lionel
1 -
Thank you.. Yes, this is the one I used.
But how do I interpret the results ?
0 -
According to the Association Rules, Products 12, 27, and 20 have the largest number of sets, with a Support value of 0.006.
Of the 12k data points... they only make up 1084...
0 -
Should I focus on the greatest number of sets, Support, Confidence, or Lift?
I don't want to predict.. I just want to know which 2 items are most commonly purchased on the same invoice.
0 -
Hi @Spcalan14, there are some good materials on association mining on the Academy:
https://academy.rapidminer.com/learn/article/cross-selling-do-you-want-fries-with-that
https://academy.rapidminer.com/learn/video/text-association-rules
1 -
Product 20 is the most commonly purchased product (1042 of 12k ), followed by Product 33 (887 of 12k ).
I would assume that these would be in the mix..
0 -
@Spcalan14,
The "support" of an association between X and Y is defined as the proportion of all transactions T that contain both X and Y.
So I would say that to find "the 2 most commonly purchased items on the same invoice" you have to find the association with the highest value of "support" (for that you can sort the results of the Create Association Rules operator).
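For what it's worth, the same definition can be checked by hand outside RapidMiner. Below is a minimal Python sketch, assuming the Excel sheet has been read into a list of (invoice, item) rows. The sample rows and the function name `pair_support` are made up for illustration. Note that the denominator is the number of invoices, not the number of rows in the sheet.

```python
from collections import defaultdict
from itertools import combinations

def pair_support(rows):
    """Return {(item_a, item_b): support} for every pair seen on the same invoice.

    rows: iterable of (invoice_id, item_id) tuples, one per line of the sheet.
    Support = invoices containing both items / total number of invoices.
    """
    baskets = defaultdict(set)
    for invoice, item in rows:
        baskets[invoice].add(item)  # a set ignores repeated lines of the same item

    counts = defaultdict(int)
    for items in baskets.values():
        # sorted() gives each pair one canonical order, so (A, B) == (B, A)
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1

    n = len(baskets)
    return {pair: c / n for pair, c in counts.items()}

# Hypothetical sample data, not taken from the thread
rows = [
    (1001, "CC-TT"), (1001, "CC-TTG"),
    (1002, "CC-TT"), (1002, "CC-TTG"), (1002, "LGL-TEE"),
    (1003, "LGL-TEE"),
    (1004, "CC-TT"),
]
support = pair_support(rows)
best_pair = max(support, key=support.get)  # pair with the highest support
```

With these made-up rows, the pair bought together most often is the one found on 2 of the 4 invoices.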
Regards,
Lionel
1 -
Question: how can I see the actual name of the product instead of "Product #"?
My descriptions are "CC-TT", and "CC-TTG".. not numbers..
0 -
I don't know what the output is referring to, since I am not using the exact words Product 1/2/3/ ...
0 -
Thanks Lionel ...
How can I see the actual Product Description ( instead of Product 1, 2, 3 ) ?
0 -
@Spcalan14,
1. Go to the results of "Association Rules" generated by the operator Create Association Rules.
2. Sort the table by descending order of "support" by clicking on the name of the column "Support"
3. The first row (Premises and Conclusion) indicates the "2 most commonly purchased items on the same invoice"
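If the rules are exported, the steps above can also be sketched in Python. The rule entries below are hypothetical and only mimic the Premises / Conclusion / Support columns. The sketch also keeps every rule tied for the top support value, since the first row alone can hide ties.

```python
# Hypothetical rules mimicking the Premises / Conclusion / Support columns
rules = [
    {"premises": "A", "conclusion": "B", "support": 0.010},
    {"premises": "C", "conclusion": "D", "support": 0.017},
    {"premises": "D", "conclusion": "C", "support": 0.017},
    {"premises": "E", "conclusion": "F", "support": 0.004},
]

# Sort by descending support, like sorting on the "Support" column
ranked = sorted(rules, key=lambda r: r["support"], reverse=True)

# Keep every rule tied for the top support value, not just the first row
top_support = ranked[0]["support"]
top_rules = [r for r in ranked if r["support"] == top_support]

# C -> D and D -> C describe the same pair, so ignore the direction
top_pairs = {frozenset((r["premises"], r["conclusion"])) for r in top_rules}
```

Here two rules share the top support, but they are the same pair read in both directions, so `top_pairs` holds a single pair.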
Regards,
Lionel
1 -
My question is: what are Product 12 and Product 15?
0 -
This is very clear..
1. Go to the results of "Association Rules" generated by the operator Create Association Rules.
2. Sort the table by descending order of "support" by clicking on the name of the column "Support"
3. The first row (Premises and Conclusion) indicates the "2 most commonly purchased items on the same invoice"
But what is Product 12 and Product 15 ?
I need my product names...
0 -
@Spcalan14
My screenshots come from the RapidMiner template, which uses fictitious example data, not your own data...
As said, run the process with your own data and go to the Association Rules results, and you will see the "2 most commonly purchased items on the same invoice" for your own data...
If you are lost after this explanation, please share your data...
Regards,
Lionel
0 -
Wow.. don't I feel like a dumbxxx.
See below...
This makes MUCH more sense ( considering this is my data )..
LGL-TEE and CL-TEE make much more sense....
But since I have 4 groups with the same Support (0.017), they are even..
0 -
So the Support is the key indicator..
What does the first column represent ?
0 -
Does the first column represent the number of bundles that included these 2 products ?
If I look at the first column..
Then STP-Tee and CP-TEE have a value of (59)...
Does that mean there were 59 instances of that specific bundle ?
0 -
@Spcalan14
Yes, Support is the key indicator.
I must admit that I don't know what the first column represents...
Regards,
Lionel
1 -
ah ok.. thank you very much for your help!
My first time post and you saved the day !
0 -
@Spcalan14,
After reflection, the first column is a kind of "Id": the number of the association rule...
By playing with the "Min. Criterion Value", you will see more or fewer association rules.
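To illustrate why the threshold changes the number of rules, here is a rough Python sketch. It assumes the criterion being thresholded is confidence, and it only builds single-item rules. The baskets and the function name `rules_from_baskets` are hypothetical.

```python
from itertools import permutations

def rules_from_baskets(baskets, min_confidence):
    """List (premise, conclusion, support, confidence, lift) for single-item rules."""
    n = len(baskets)
    item_count = {}
    pair_count = {}
    for basket in baskets:
        for item in basket:
            item_count[item] = item_count.get(item, 0) + 1
        # permutations() yields both directions: (a, b) and (b, a)
        for a, b in permutations(sorted(basket), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    rules = []
    for (a, b), both in pair_count.items():
        support = both / n                        # share of baskets with a and b
        confidence = both / item_count[a]         # share of a-baskets that also have b
        lift = confidence / (item_count[b] / n)   # confidence relative to b's base rate
        if confidence >= min_confidence:
            rules.append((a, b, support, confidence, lift))
    return rules

# Hypothetical baskets, one set of items per invoice
baskets = [{"A", "B"}, {"A", "B"}, {"A"}, {"B", "C"}, {"C"}]
strict = rules_from_baskets(baskets, 0.6)  # higher threshold, fewer rules
loose = rules_from_baskets(baskets, 0.3)   # lower threshold, more rules
```

Lowering the threshold only lets more rules through; the support of each pair stays the same, so the ranking by support is unaffected.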
Regards,
Lionel
1