Customer basket analysis (Data preparation)

sara20
sara20 New Altair Community Member
edited November 5 in Community Q&A
Hello 
I work on customer basket analysis. I have a dataset like this for each customer basket:

Name               Number                 price           customer
pen                       2                            10                  1
bread                   10                           20                  1
fish                       2                             30                  1
...................................................................................................
bread                      3                          15                   2
salt                          1                            2                   2
...................................................................................................
pen                          3                           15                  3
bag                          1                            50                 3

Now I want to make columns for each Name and delete duplicate Names so I should change each row to the column then analyse them but I don't know how it is possible with RM? Also I should separate each customer basket from each other in to new rows. So it must be like this: ( for each Name which is in the basket of customer I should have "True" and which is not "False" in the row of that customer.)

Customer        pen           bread          fish          salt            bag  
1                      True            True            True         False         False
2                       False         True            False         True          False
3                      True            False          False          False        True

Can any body guide me? Also it is big data

Thank you in advance
Sara

Answers

  • Pavithra_Rao
    Pavithra_Rao New Altair Community Member
    Hi @sara20,

    Here's a good tutorial on how to prepare the data and implement market basket analysis in RapidMiner.

    https://academy.rapidminer.com/learn/article/cross-selling-do-you-want-fries-with-that

    Cheers,
    Pavithra
  • sara20
    sara20 New Altair Community Member
    @Pavithra_Rao

    Hello

    Thank you for the answer and the link was great but I don't know how to clean my data? 
    I don't know how to change the place of rows to columns?
    Please read my question and the sample data again.

    Regards
    Sara
  • Pavithra_Rao
    Pavithra_Rao New Altair Community Member
    Hi @sara20,

    The easiest way is to use Pivot option in 'Turbo Prep'



    Then just replace missing with false and no-missing with true (you can do this in Turbo Prep or open the process in 'Design' mode and use operators). Rename the columns accordingly.

    I hope this helps.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.7.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.7.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value="prao@rapidminer.com"/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.7.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="Name,Number,price,Customer&#10;pen,2,10,1&#10;bread,10,20,1&#10;fish,2,30,1&#10;bread,3,15,2&#10;salt,1,2,2&#10;pen, 3,15,3&#10;bag,1,50,3&#10;"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="blending:pivot" compatibility="9.7.000" expanded="true" height="82" name="Pivot" origin="GENERATED_TURBOPREP" width="90" x="179" y="34">
            <parameter key="group_by_attributes" value="Customer"/>
            <parameter key="column_grouping_attribute" value="Name"/>
            <list key="aggregation_attributes">
              <parameter key="Number" value="count"/>
            </list>
            <parameter key="use_default_aggregation" value="false"/>
            <parameter key="default_aggregation_function" value="first"/>
          </operator>
          <operator activated="true" class="order_attributes" compatibility="9.7.000" expanded="true" height="82" name="Reorder Attributes" origin="GENERATED_TURBOPREP" width="90" x="313" y="34">
            <parameter key="sort_mode" value="user specified"/>
            <parameter key="attribute_ordering" value="Customer|count(Number)_pen|count(Number)_bread|count(Number)_fish|count(Number)_salt|count(Number)_bag"/>
            <parameter key="use_regular_expressions" value="false"/>
            <parameter key="handle_unmatched" value="append"/>
            <parameter key="sort_direction" value="ascending"/>
          </operator>
          <operator activated="true" class="numerical_to_polynominal" compatibility="9.7.000" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="447" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Customer"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="real"/>
            <parameter key="block_type" value="value_series"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_series_end"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="numerical_to_binominal" compatibility="9.7.000" expanded="true" height="82" name="Numerical to Binominal" width="90" x="581" y="34">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="real"/>
            <parameter key="block_type" value="value_series"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_series_end"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="min" value="0.0"/>
            <parameter key="max" value="0.1"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="9.7.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="715" y="34">
            <parameter key="return_preprocessing_model" value="false"/>
            <parameter key="create_view" value="false"/>
            <parameter key="attribute_filter_type" value="value_type"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="binominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="default" value="value"/>
            <list key="columns"/>
            <parameter key="replenishment_value" value="false"/>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Pivot" to_port="input"/>
          <connect from_op="Pivot" from_port="output" to_op="Reorder Attributes" to_port="example set input"/>
          <connect from_op="Reorder Attributes" from_port="example set output" to_op="Numerical to Polynominal" to_port="example set input"/>
          <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
          <connect from_op="Numerical to Binominal" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Cheers
    Pavithra
  • sara20
    sara20 New Altair Community Member
    edited June 2020
    @Pavithra_Rao

    Thank you very much for your help but how can I change each row to the column?


    Regards
    Sara

  • hbajpai
    hbajpai New Altair Community Member
    @sara20

    The best way to achieve that in my opinion is to convert the columns to text and use value map and replace operators.



    Check out the XML of the process.
    <?xml version="1.0" encoding="UTF-8"?><process version="9.7.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.7.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.7.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="34">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="Name ,              Number      ,           price        ,   customer&#10;pen,                 2   ,                         10  ,                1&#10;bread,                 10     ,                      20  ,                1&#10;fish,                     2    ,                         30   ,               1&#10;bread,                   3    ,                      15  ,                 2&#10;salt,                      1  ,                          2   ,                2&#10;pen,                      3     ,                      15   ,               3&#10;bag,                      1  ,                          50    ,             3"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="blending:pivot" compatibility="9.7.000" expanded="true" height="82" name="Pivot" width="90" x="246" y="34">
            <parameter key="group_by_attributes" value="customer"/>
            <parameter key="column_grouping_attribute" value="Name"/>
            <list key="aggregation_attributes">
              <parameter key="Number" value="count"/>
            </list>
            <parameter key="use_default_aggregation" value="false"/>
            <parameter key="default_aggregation_function" value="first"/>
          </operator>
          <operator activated="true" class="subprocess" compatibility="9.7.000" expanded="true" height="82" name="Rename" width="90" x="380" y="34">
            <process expanded="true">
              <operator activated="true" class="rename_by_replacing" compatibility="9.7.000" expanded="true" height="82" name="Rename by Replacing" width="90" x="45" y="34">
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="replace_what" value="count"/>
              </operator>
              <operator activated="true" class="rename_by_replacing" compatibility="9.7.000" expanded="true" height="82" name="Rename by Replacing (2)" width="90" x="179" y="34">
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="replace_what" value="Number"/>
              </operator>
              <operator activated="true" class="rename_by_replacing" compatibility="9.7.000" expanded="true" height="82" name="Rename by Replacing (3)" width="90" x="313" y="34">
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="replace_what" value="[_()]"/>
              </operator>
              <connect from_port="in 1" to_op="Rename by Replacing" to_port="example set input"/>
              <connect from_op="Rename by Replacing" from_port="example set output" to_op="Rename by Replacing (2)" to_port="example set input"/>
              <connect from_op="Rename by Replacing (2)" from_port="example set output" to_op="Rename by Replacing (3)" to_port="example set input"/>
              <connect from_op="Rename by Replacing (3)" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.7.000" expanded="true" height="82" name="Set Role" width="90" x="514" y="34">
            <parameter key="attribute_name" value="customer"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="numerical_to_polynominal" compatibility="9.7.000" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="648" y="34">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="numeric"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="real"/>
            <parameter key="block_type" value="value_series"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_series_end"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="9.7.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="782" y="34">
            <parameter key="return_preprocessing_model" value="false"/>
            <parameter key="create_view" value="false"/>
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="default" value="value"/>
            <list key="columns"/>
            <parameter key="replenishment_value" value="False"/>
          </operator>
          <operator activated="true" class="replace" compatibility="9.7.000" expanded="true" height="82" name="Replace" width="90" x="916" y="34">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value="1"/>
            <parameter key="replace_by" value="True"/>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Pivot" to_port="input"/>
          <connect from_op="Pivot" from_port="output" to_op="Rename" to_port="in 1"/>
          <connect from_op="Rename" from_port="out 1" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Numerical to Polynominal" to_port="example set input"/>
          <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Replace" to_port="example set input"/>
          <connect from_op="Replace" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>