"Market Basket Data Format"

svpriyan
svpriyan New Altair Community Member
edited November 5 in Community Q&A
Hello Colleagues,
I have a relation with transaction IDs (TID) and item IDs (ITEM):

TID    ITEM
1      1
1      2
1      3
2      1
3      4
3      5
3      6

Now I would like to convert this into the market basket data format, which might look like this:

TID    ITEM
1      1 2 3
2      1
3      4 5 6
4      1 8

Is that possible to do with RapidMiner?
Could anyone help me with this?

Thanks
Priyan

Answers

  • land
    Hi,
    This is indeed possible. First, binarize your ITEM attribute using the Nominal2Binominal operator. You will then get one column for every possible value of ITEM, and each row contains exactly one 1, in the column of its item.
    Then aggregate over TID using the Aggregation operator, building the sum over all examples that share the same TID. In the end there is only one row per transaction, with the sold items marked in the corresponding attributes.
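As a rough illustration outside RapidMiner, the two steps can be sketched in plain Python on the sample data from the question. This is a minimal sketch under stated assumptions, not RapidMiner's implementation: the function names `binarize` and `aggregate_by_tid` are hypothetical, and a 0/1 list stands in for the binominal columns.

```python
# Transactional data from the question: (TID, ITEM) pairs.
rows = [(1, 1), (1, 2), (1, 3), (2, 1), (3, 4), (3, 5), (3, 6)]


def binarize(rows):
    """Step 1, like Nominal2Binominal on ITEM: one 0/1 column per distinct item.

    Returns the sorted item values (the column order) and, for every input row,
    its TID together with a 0/1 vector that contains exactly one 1.
    """
    items = sorted({item for _, item in rows})
    return items, [(tid, [int(item == value) for value in items]) for tid, item in rows]


def aggregate_by_tid(items, binarized):
    """Step 2, like Aggregation grouped by TID: sum the 0/1 columns per TID."""
    sums = {}
    for tid, vector in binarized:
        total = sums.setdefault(tid, [0] * len(items))
        for i, bit in enumerate(vector):
            total[i] += bit
    return sums


items, binarized = binarize(rows)
baskets = aggregate_by_tid(items, binarized)
print("TID", items)
for tid in sorted(baskets):
    print(tid, baskets[tid])
```

For the sample data this gives one row per transaction, for example `[1, 1, 1, 0, 0, 0]` for TID 1 (items 1, 2 and 3 sold).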

    Greetings,
      Sebastian
  • svpriyan
    Hi,
    Thanks for the information. I tried what you explained, but I still get an error. Could you suggest how to fix it?
    ERROR:- TID does not exists.
    Thanks
    Priyan


    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="C:\rapid.csv"/>
        </operator>
        <operator name="Nominal2Binominal" class="Nominal2Binominal">
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="ITEM"/>
        </operator>
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="TID" value="sum"/>
            </list>
        </operator>
        <operator name="Example2AttributePivoting" class="Example2AttributePivoting">
            <parameter key="group_attribute" value="TID"/>
            <parameter key="index_attribute" value="ITEM"/>
        </operator>
        <operator name="ResultWriter" class="ResultWriter">
            <parameter key="result_file" value="C:\result16.res"/>
        </operator>
    </operator>

  • land
    Hi,
    Probably it is exactly what it says: TID does not exist. Please set a breakpoint before the failing operator and check whether this attribute still exists. My guess is that it has been binominalized and hence no longer exists. You have to ensure that only ITEM is binominalized. Therefore, change the role of TID to id and make sure that ITEM is binominal.
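A small Python sketch of why the error would appear, assuming (as guessed above) that the binarization step converted TID as well as ITEM. The helper `binarize` and the `COLUMN = value` naming are hypothetical stand-ins, not the exact attribute names RapidMiner produces; the point is only that a binarized column disappears under its original name.

```python
def binarize(table, columns):
    """Replace each listed column by one 0/1 column per distinct value.

    The 'COLUMN = value' naming is a hypothetical stand-in for the
    attribute names a Nominal2Binominal-style step produces.
    """
    values = {c: sorted({row[c] for row in table}) for c in columns}
    result = []
    for row in table:
        # Keep every column that is not binarized unchanged.
        new_row = {k: v for k, v in row.items() if k not in columns}
        for c in columns:
            for v in values[c]:
                new_row[f"{c} = {v}"] = int(row[c] == v)
        result.append(new_row)
    return result


table = [{"TID": 1, "ITEM": 1}, {"TID": 1, "ITEM": 2}, {"TID": 2, "ITEM": 1}]

# Binarizing every column: the original TID column no longer exists.
everything = binarize(table, ["TID", "ITEM"])
# Binarizing only ITEM (TID left alone, as when its role is id): TID survives.
only_item = binarize(table, ["ITEM"])

print("TID" in everything[0], sorted(everything[0]))
print("TID" in only_item[0], sorted(only_item[0]))
```

Any later step that groups by `TID` would fail on `everything` but work on `only_item`.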

    Greetings,
      Sebastian
  • svpriyan
    Hi,
    Thanks for the info. I changed the process according to your feedback, but I still have a problem.
    Example2AttributePivoting has two parameters, group attribute and index attribute.
    What should I set them to?

    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="C:\Documents and Settings\Administrator\Desktop\excel\rapid.csv"/>
        </operator>
        <operator name="Nominal2Binominal" class="Nominal2Binominal">
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="TID"/>
            <parameter key="target_role" value="id"/>
        </operator>
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="TID" value="sum"/>
            </list>
        </operator>
        <operator name="Example2AttributePivoting" class="Example2AttributePivoting">
            <parameter key="group_attribute" value="ITEM"/>
            <parameter key="index_attribute" value="TID"/>
        </operator>
        <operator name="ResultWriter" class="ResultWriter">
            <parameter key="result_file" value="C:\Documents and Settings\Administrator\Desktop\answer16.res"/>
        </operator>
    </operator>
    Thanks

    Priyan
  • land
    Hi,
    I have no idea what this operator does; I have never used it. But I don't think you need it in this context: the data should already be in the correct format before this operator.

    Greetings,
      Sebastian
  • svpriyan
    Hi,
    I am really sorry to ask again, but I am stuck with this. Sorry for troubling you.
    This is the process I used:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="C:\Documents and Settings\Administrator\Desktop\excel\rapid.csv"/>
        </operator>
        <operator name="Nominal2Binominal" class="Nominal2Binominal">
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="TID"/>
            <parameter key="target_role" value="id"/>
        </operator>
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="TID" value="sum"/>
            </list>
        </operator>
        <operator name="ResultWriter" class="ResultWriter">
            <parameter key="result_file" value="C:\Documents and Settings\Administrator\Desktop\answer16.res"/>
        </operator>
    </operator>
    What should I use as the group-by attribute of the Aggregation operator (see image 1)?
    Do I need to add anything else to format the data?
    In the end I only get this result (image 2).

    It would be a great help if you could give me some feedback.

    Thanks
    Priyan


    [attachment deleted by admin]
  • IngoRM
    Hi,

    Starting from your original post (and using the data you provided there), this is the process that performs the desired transformation from transactional data to the basket data format:

    <operator name="Root" class="Process" expanded="yes">
        <!-- read the TID/ITEM table; adapt the file name to your own data -->
        <operator name="SimpleExampleSource" class="SimpleExampleSource">
            <parameter key="filename" value="C:\Dokumente und Einstellungen\Mierswa\Desktop\market_data.txt"/>
            <parameter key="read_attribute_names" value="true"/>
        </operator>
        <operator name="IdTagging" class="IdTagging">
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="id"/>
        </operator>
        <!-- one example per TID, one column per ITEM value -->
        <operator name="Example2AttributePivoting" class="Example2AttributePivoting">
            <parameter key="group_attribute" value="TID"/>
            <parameter key="index_attribute" value="ITEM"/>
        </operator>
        <!-- convert the numerical attributes into nominal ones -->
        <operator name="Numerical2Polynominal" class="Numerical2Polynominal">
        </operator>
        <!-- apply the inner Mapping to all attributes except TID -->
        <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="attribute_name_regex" value="TID"/>
            <parameter key="invert_selection" value="true"/>
            <!-- missing values (?) become false, all other values become true -->
            <operator name="Mapping" class="Mapping">
                <parameter key="attributes" value=".*"/>
                <list key="value_mappings">
                </list>
                <parameter key="replace_what" value="?"/>
                <parameter key="replace_by" value="false"/>
                <parameter key="add_default_mapping" value="true"/>
                <parameter key="default_value" value="true"/>
            </operator>
        </operator>
        <!-- mine frequent item sets from the true/false basket table -->
        <operator name="FPGrowth" class="FPGrowth">
        </operator>
    </operator>

    Please note that you have to adapt the input operator.
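To make the pivoting and mapping steps concrete, here is a plain-Python sketch of what the process does to the sample data. It is an illustration under assumptions, not RapidMiner's code: `pivot` and `to_boolean` are hypothetical helpers, and the `id_<item>` column names only approximate the attribute names RapidMiner generates. The group attribute (TID) decides which rows are merged into one; the index attribute (ITEM) decides which column a value lands in; combinations that never occur stay missing, which the mapping then turns into false.

```python
# Sample data from the question, with a running id added (mirroring IdTagging).
data = [(1, 1), (1, 2), (1, 3), (2, 1), (3, 4), (3, 5), (3, 6)]
rows = [{"id": n, "TID": tid, "ITEM": item} for n, (tid, item) in enumerate(data, start=1)]


def pivot(rows, group, index):
    """One output row per `group` value; every other attribute is spread into
    one column per `index` value. Combinations that never occur stay None."""
    index_values = sorted({r[index] for r in rows})
    others = [k for k in rows[0] if k not in (group, index)]
    table = {}
    for r in rows:
        out = table.setdefault(
            r[group], {f"{a}_{v}": None for a in others for v in index_values}
        )
        for a in others:
            out[f"{a}_{r[index]}"] = r[a]
    return table


def to_boolean(table):
    """Like the Mapping step: missing -> False, any other value -> True."""
    return {g: {col: val is not None for col, val in cols.items()} for g, cols in table.items()}


baskets = to_boolean(pivot(rows, group="TID", index="ITEM"))
for tid, cols in sorted(baskets.items()):
    # Print the columns that are true, i.e. the items in this basket.
    print(tid, " ".join(col for col, present in cols.items() if present))
```

Only the presence of a value matters after the mapping, so the pivoted id numbers themselves are irrelevant; they just mark which items occur in each transaction.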

    Cheers,
    Ingo
  • svpriyan
    Dear Ingo,
    Thank you very much.
    I was able to find the way.
    Thanks
    Priyan