Market Basket Analysis
Spcalan14
I have a very simple Excel file with 2 columns:
Invoice #
Item #
About 12k entries
I want to know what the most commonly purchased products are..
If Product A is always sold with Product B.. we can make a package deal.
Thoughts ?
All comments
Spcalan14
Sorry to be vague.
I have opened the Market Basket Analysis template, imported data, and I get the Association rules output..
Largest support is 0.026 for Product 12 and Product 15..
29 sets... Support = 0.047 ( highest )...
So does this mean Product 12 and Product 15 are most commonly purchased together and there are 29 sets to confirm ?
Can't be... Item 15 was only purchased 1x.. and Product 12 was purchased 32 times...
lionelderkrikor
Hi @Spcalan14,
Yes, you can take a look at the process template called "Market Basket Analysis", which includes the following 2 operators:
- FP-Growth
- Create Association Rules
Hope this helps,
Regards,
Lionel
Spcalan14
Thank you.. Yes, this is the one I used.
But how do I interpret the results ?
Spcalan14
According to the Association Rules.. Product 12, 27, and 20 have the largest number of sets, with a Support value of 0.006.
Of the 12k data points... they only make up 1084...
Spcalan14
Should I focus on the greatest number of sets, Support, Confidence, Lift ?
I don't want to predict.. I just want to know what is my 2 most commonly purchased items on the same invoice
sgenzer
Hi @Spcalan14,
there are some good materials on association mining on the Academy:
https://academy.rapidminer.com/learn/article/cross-selling-do-you-want-fries-with-that
https://academy.rapidminer.com/learn/video/text-association-rules
Spcalan14
Product 20 is the most commonly purchased product (1042 of 12k ), followed by Product 33 (887 of 12k ).
I would assume that these would be in the mix..
lionelderkrikor
@Spcalan14,
The "support" of a rule X → Y is the proportion of all transactions T that contain both X and Y.
So I would say that to find "the 2 most commonly purchased items on the same invoice" you have to find the association with the highest value of "support" (for that you can sort the results of the Create Association Rules operator).
Regards,
Lionel
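For anyone who wants to cross-check the template's output outside the tool, here is a minimal sketch of the same support calculation in plain Python. It assumes the two-column file has already been read into (invoice, item) rows; the sample rows and the `pair_support` helper are made up for illustration and are not part of RapidMiner.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample rows in the same shape as the two-column file: (invoice, item)
rows = [
    ("INV-1", "CC-TT"), ("INV-1", "CC-TTG"),
    ("INV-2", "CC-TT"), ("INV-2", "CC-TTG"), ("INV-2", "LGL-TEE"),
    ("INV-3", "LGL-TEE"), ("INV-3", "CL-TEE"),
    ("INV-4", "CC-TT"),
]

def pair_support(rows):
    """Return {(item_a, item_b): support}, where support is the share of invoices containing both items."""
    # Group items by invoice; a set ignores the same item listed twice on one invoice
    baskets = {}
    for invoice, item in rows:
        baskets.setdefault(invoice, set()).add(item)

    # Count every unordered pair of items once per invoice
    pair_counts = Counter()
    for items in baskets.values():
        pair_counts.update(combinations(sorted(items), 2))

    n_invoices = len(baskets)
    return {pair: count / n_invoices for pair, count in pair_counts.items()}

support = pair_support(rows)
top_pair = max(support, key=support.get)
print(top_pair, support[top_pair])
```

Note that the denominator is the number of invoices, not the number of rows in the file. Multiplying a support value by the number of invoices gives back the raw count of invoices that contain the pair.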
Spcalan14
Question.. How can I see the actual name of the product.. instead of "Product #"?
My descriptions are "CC-TT", and "CC-TTG".. not numbers..
Spcalan14
I don't know what the output is referring to, since I am not using the exact words Product 1/2/3/ ...
Spcalan14
Thanks
Lionel ...
How can I see the actual Product Description ( instead of Product 1, 2, 3 ) ?
lionelderkrikor
@Spcalan14,
1. Go to the results of "Association Rules" generated by the operator Create Association Rules.
2. Sort the table by descending order of "support" by clicking on the name of the column "Support".
3. The first row (Premises and Conclusion) indicates the "2 most commonly purchased items on the same invoice".
Regards,
Lionel
Spcalan14
My question is.. what are Product 12 and Product 15 ?
Spcalan14
This is very clear..
1. Go to the results of "Association Rules" generated by the operator Create Association Rules.
2. Sort the table by descending order of "support" by clicking on the name of the column "Support"
3. The first row (Premises and Conclusion) indicates the "2 most commonly purchased items on the same invoice"
But what is Product 12 and Product 15 ?
I need my product names...
lionelderkrikor
@Spcalan14,
My screenshots come from the RapidMiner template, which uses fictitious example data, not your own data...
As said, run the process with your own data, go to the Association Rules results, and you will see the "2 most commonly purchased items on the same invoice" for your own data....
If you are still lost after this explanation, please share your data...
Regards,
Lionel
Spcalan14
Wow.. don't I feel like a dumbxxx.
See below...
This makes MUCH more sense ( considering this is my data )..
LGL-TEE and CL-TEE makes much more sense....
But since I have 4 groups with the same Support (0.017).. they are even..
Capture.JPG
Spcalan14
So the Support is the key indicator..
What does the first column represent ?
Spcalan14
Does the first column represent the number of bundles that included these 2 products ?
If I look at the first column..
Then STP-Tee and CP-TEE have a value of (59)...
Does that mean there were 59 instances of that specific bundle ?
Spcalan14
See below..
Capture.JPG
lionelderkrikor
@Spcalan14,
Yes, Support is the key indicator.
I must admit that I don't know what the first column represents...
Regards,
Lionel
Spcalan14
ah ok.. thank you very much for your help!
My first time post and you saved the day !
lionelderkrikor
@Spcalan14,
On reflection, the first column is a kind of "Id", the number of the association rule...
By playing with the "Min. Criterion Value", you will see that more or fewer association rules are generated.
Regards,
Lionel
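To make the effect of the threshold concrete, here is a rough sketch in plain Python, assuming the criterion in use is confidence (the share of invoices with the premise that also contain the conclusion). The baskets and the `rules` helper are invented for illustration and are not the operator's actual implementation.

```python
from collections import Counter
from itertools import permutations

# Hypothetical invoices, each reduced to the set of items on it
baskets = [
    {"CC-TT", "CC-TTG"},
    {"CC-TT", "CC-TTG", "LGL-TEE"},
    {"LGL-TEE", "CL-TEE"},
    {"CC-TT"},
]

def rules(baskets, min_confidence):
    """Return (premise, conclusion, support, confidence) for single-item rules, sorted by support."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    # Ordered pairs, because A -> B and B -> A are different rules
    pair_counts = Counter(
        pair for basket in baskets for pair in permutations(sorted(basket), 2)
    )
    kept = []
    for (premise, conclusion), both in pair_counts.items():
        confidence = both / item_counts[premise]
        if confidence >= min_confidence:
            kept.append((premise, conclusion, both / n, confidence))
    return sorted(kept, key=lambda rule: rule[2], reverse=True)

for threshold in (0.8, 0.5):
    print(threshold, len(rules(baskets, threshold)))
```

Under that assumption, lowering the threshold lets more rules through. It also means a pair with high support may not show up in the table at all if its confidence falls below the threshold, so it can be worth lowering the value before sorting by Support.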