"Market Basket Data Format"
svpriyan
New Altair Community Member
Hello Colleagues,
I have a relation with transaction IDs (TID) and item IDs (ITEM).
TID ITEM
1 1
1 2
1 3
2 1
3 4
3 5
3 6
Now I would like to convert it into market basket data format, which might look like this:
TID ITEM
1 1 2 3
2 1
3 4 5 6
4 1 8
Is it possible to do this with RapidMiner?
Could anyone help me with this?
Thanks
Priyan
Answers
Hi,
this is indeed possible. First, binarize your ITEM attribute using the Nominal2Binominal operator. You will then get one column for every possible value of ITEM, and each row will contain exactly one 1, in the column of its item.
You can then aggregate over TID using the Aggregation operator, building the sum over all examples that share the same TID. In the end there is only one row per transaction, with the sold items marked in the corresponding attributes.
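To make the two steps concrete outside of RapidMiner, here is a minimal sketch in plain Python. It is not the operators' implementation, only an illustration of the idea (binarize ITEM, then sum per TID) on the data from the question. The variable names `rows`, `binarized` and `basket_matrix` are made up for this example.

```python
from collections import defaultdict

# Transaction data from the original post: (TID, ITEM) pairs, one item per row.
rows = [(1, 1), (1, 2), (1, 3), (2, 1), (3, 4), (3, 5), (3, 6)]

# Step 1: binarize ITEM only. Every row gets one 0/1 column per distinct item,
# with a single 1 in the column of the item it holds. TID is left untouched.
items = sorted({item for _, item in rows})
binarized = [(tid, [1 if item == i else 0 for i in items]) for tid, item in rows]

# Step 2: aggregate by TID, summing the indicator columns of all rows
# that belong to the same transaction.
totals = defaultdict(lambda: [0] * len(items))
for tid, flags in binarized:
    totals[tid] = [a + b for a, b in zip(totals[tid], flags)]

# One row per transaction, one column per item.
basket_matrix = {tid: totals[tid] for tid in sorted(totals)}

print("TID", *[f"ITEM={i}" for i in items])
for tid, counts in basket_matrix.items():
    print(tid, *counts)
```

The result has one row per TID, and a 1 in every item column that occurred in that transaction.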
Greetings,
Sebastian
Hi,
Thanks for the information. I tried what you explained here, but I still get an error. Could you suggest how to fix it?
ERROR:- TID does not exists.
Thanks
Priyan
<operator name="Root" class="Process" expanded="yes">
<operator name="CSVExampleSource" class="CSVExampleSource">
<parameter key="filename" value="C:\rapid.csv"/>
</operator>
<operator name="Nominal2Binominal" class="Nominal2Binominal">
</operator>
<operator name="ChangeAttributeRole" class="ChangeAttributeRole">
<parameter key="name" value="ITEM"/>
</operator>
<operator name="Aggregation" class="Aggregation">
<list key="aggregation_attributes">
<parameter key="TID" value="sum"/>
</list>
</operator>
<operator name="Example2AttributePivoting" class="Example2AttributePivoting">
<parameter key="group_attribute" value="TID"/>
<parameter key="index_attribute" value="ITEM"/>
</operator>
<operator name="ResultWriter" class="ResultWriter">
<parameter key="result_file" value="C:\result16.res"/>
</operator>
</operator>
Hi,
it is probably exactly what it states: TID doesn't exist. Please set a breakpoint before the failing operator and check whether this attribute still exists. I assume that it has been binominalised and hence no longer exists. You have to ensure that only ITEM is binominalised. To do that, change the role of TID to id and make sure that ITEM is binominal.
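As a rough illustration of why the attribute disappears, here is a small Python sketch. It only mimics the effect and is not how RapidMiner works internally; the helper `binarize` and the `COLUMN=value` naming are assumptions made for this example.

```python
# A few rows of the transaction data, with named columns.
rows = [{"TID": 1, "ITEM": 1}, {"TID": 1, "ITEM": 2}, {"TID": 2, "ITEM": 1}]

def binarize(rows, columns):
    """Replace each named column with one 0/1 column per distinct value."""
    values = {c: sorted({r[c] for r in rows}) for c in columns}
    out = []
    for r in rows:
        # Columns that are not binarized are copied over unchanged.
        new = {k: v for k, v in r.items() if k not in columns}
        for c in columns:
            for v in values[c]:
                new[f"{c}={v}"] = 1 if r[c] == v else 0
        out.append(new)
    return out

# Binarizing every column replaces TID by TID=1, TID=2, ...
everything = binarize(rows, ["TID", "ITEM"])
tid_lost = "TID" not in everything[0]

# Binarizing only ITEM keeps TID available for the later aggregation.
item_only = binarize(rows, ["ITEM"])
tid_kept = "TID" in item_only[0]

print(sorted(everything[0]))
print(sorted(item_only[0]))
```

In the first case a later step that asks for a column called TID has nothing to find, which matches the error message above.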
Greetings,
Sebastian
Hi,
Thanks for the info. I changed the process according to your feedback, but I still have a problem.
Example2AttributePivoting has two parameters: group attribute and index attribute.
What should I set these two parameters to?
Thanks
<operator name="Root" class="Process" expanded="yes">
<operator name="CSVExampleSource" class="CSVExampleSource">
<parameter key="filename" value="C:\Documents and Settings\Administrator\Desktop\excel\rapid.csv"/>
</operator>
<operator name="Nominal2Binominal" class="Nominal2Binominal">
</operator>
<operator name="ChangeAttributeRole" class="ChangeAttributeRole">
<parameter key="name" value="TID"/>
<parameter key="target_role" value="id"/>
</operator>
<operator name="Aggregation" class="Aggregation">
<list key="aggregation_attributes">
<parameter key="TID" value="sum"/>
</list>
</operator>
<operator name="Example2AttributePivoting" class="Example2AttributePivoting">
<parameter key="group_attribute" value="ITEM"/>
<parameter key="index_attribute" value="TID"/>
</operator>
<operator name="ResultWriter" class="ResultWriter">
<parameter key="result_file" value="C:\Documents and Settings\Administrator\Desktop\answer16.res"/>
</operator>
</operator>
Priyan
Hi,
I have no idea what this operator does; I have never used it. But I don't think you need it in this context. The data should already be in the correct format before this operator.
Greetings,
Sebastian
Hi,
I am really sorry to ask again, but I am stuck on this. Sorry for troubling you.
What should I use for the group-by attribute in the Aggregation operator (image 1)?
This is the process I used:
<operator name="Root" class="Process" expanded="yes">
<operator name="CSVExampleSource" class="CSVExampleSource">
<parameter key="filename" value="C:\Documents and Settings\Administrator\Desktop\excel\rapid.csv"/>
</operator>
<operator name="Nominal2Binominal" class="Nominal2Binominal">
</operator>
<operator name="ChangeAttributeRole" class="ChangeAttributeRole">
<parameter key="name" value="TID"/>
<parameter key="target_role" value="id"/>
</operator>
<operator name="Aggregation" class="Aggregation">
<list key="aggregation_attributes">
<parameter key="TID" value="sum"/>
</list>
</operator>
<operator name="ResultWriter" class="ResultWriter">
<parameter key="result_file" value="C:\Documents and Settings\Administrator\Desktop\answer16.res"/>
</operator>
</operator>
Do I need to add anything more than this to format the data?
This is the only result I finally get (image 2).
It would be a great help if you could give me some feedback.
Thanks
Priyan
[attachment deleted by admin]
Hi,
starting from your original post (and using the data you provided there), this is the process that performs the desired transformation from transactional data to the basket data format:
<operator name="Root" class="Process" expanded="yes">
<operator name="SimpleExampleSource" class="SimpleExampleSource">
<parameter key="filename" value="C:\Dokumente und Einstellungen\Mierswa\Desktop\market_data.txt"/>
<parameter key="read_attribute_names" value="true"/>
</operator>
<operator name="IdTagging" class="IdTagging">
</operator>
<operator name="ChangeAttributeRole" class="ChangeAttributeRole">
<parameter key="name" value="id"/>
</operator>
<operator name="Example2AttributePivoting" class="Example2AttributePivoting">
<parameter key="group_attribute" value="TID"/>
<parameter key="index_attribute" value="ITEM"/>
</operator>
<operator name="Numerical2Polynominal" class="Numerical2Polynominal">
</operator>
<operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
<parameter key="condition_class" value="attribute_name_filter"/>
<parameter key="attribute_name_regex" value="TID"/>
<parameter key="invert_selection" value="true"/>
<operator name="Mapping" class="Mapping">
<parameter key="attributes" value=".*"/>
<list key="value_mappings">
</list>
<parameter key="replace_what" value="?"/>
<parameter key="replace_by" value="false"/>
<parameter key="add_default_mapping" value="true"/>
<parameter key="default_value" value="true"/>
</operator>
</operator>
<operator name="FPGrowth" class="FPGrowth">
</operator>
</operator>
Please note that you have to adapt the input operator.
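For readers who want to check the expected result outside of RapidMiner, the idea behind the process (pivot by TID and ITEM, then map missing cells to false and everything else to true) can be sketched in plain Python. This is only an approximation of what the operators do, and the names `pivot`, `boolean_table` and `baskets` are made up for this example.

```python
# Transaction data from the original post: (TID, ITEM) pairs, one item per row.
rows = [(1, 1), (1, 2), (1, 3), (2, 1), (3, 4), (3, 5), (3, 6)]

items = sorted({item for _, item in rows})
tids = sorted({tid for tid, _ in rows})

# Pivot: one row per TID and one column per ITEM. A cell stays None (missing)
# unless that (TID, ITEM) pair occurs in the data.
pivot = {tid: {item: None for item in items} for tid in tids}
for tid, item in rows:
    pivot[tid][item] = item

# Mapping: missing cells become False, every other value becomes True.
boolean_table = {
    tid: {item: cell is not None for item, cell in cols.items()}
    for tid, cols in pivot.items()
}

# The basket lines from the question are the items marked True per TID.
baskets = {
    tid: [item for item, present in cols.items() if present]
    for tid, cols in boolean_table.items()
}

for tid, basket in baskets.items():
    print(tid, *basket)
```

The true/false table in `boolean_table` corresponds to the kind of input a frequent itemset miner expects, and `baskets` is the compact one-line-per-transaction view asked for in the original post.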
Cheers,
Ingo
Dear Ingo,
Thank you very much.
I was able to find the way.
Thanks
Priyan