Generalized Sequential Patterns (GSP) dataset format

abdero
abdero New Altair Community Member
edited November 2024 in Community Q&A
Hello,

I have seen some posts about this subject, but I didn't find a good answer in any of them.

Can anyone tell me the format of the input dataset for GSP?

The only format that has given me any results (bad ones) looks like this:

Client_id, time , feature 1, feature 2, ....
1,1,0,1,0,...
1,2,1,1,1,....
2,1,0,0,0

Answers

  • land
    land New Altair Community Member
    Hi,
    this is already the correct format. You only need to turn the feature 1, feature 2, ... attributes into binominal ones. Use the Numerical to Binominal operator for this.

    Greetings,
      Sebastian
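
As an illustration outside RapidMiner, the layout and the conversion described above can be sketched in plain Python. The column names, the sample rows, and the rule "0 becomes false, anything else becomes true" are assumptions made for this sketch; the exact behavior of Numerical to Binominal depends on its parameters.

```python
# Hypothetical rows in the layout from the question: one row per
# (client, time) pair and one 0/1 column per feature.
rows = [
    {"client_id": 1, "time": 1, "feature_1": 0, "feature_2": 1, "feature_3": 0},
    {"client_id": 1, "time": 2, "feature_1": 1, "feature_2": 1, "feature_3": 1},
    {"client_id": 2, "time": 1, "feature_1": 0, "feature_2": 0, "feature_3": 0},
]

# The two key columns stay numeric; only the feature columns are converted.
KEY_COLUMNS = {"client_id", "time"}


def to_binominal(row):
    """Map every feature column to a two-valued label (assumed rule: 0 -> false)."""
    return {
        name: value if name in KEY_COLUMNS else ("true" if value != 0 else "false")
        for name, value in row.items()
    }


binominal_rows = [to_binominal(row) for row in rows]
for row in binominal_rows:
    print(row)
```

The key point is that the customer and time columns keep their numeric type while every item column ends up with exactly two possible values.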
  • willgouldin
    willgouldin New Altair Community Member
    abdero,
    Can you post the XML of how you got your data in the format:

    Client_id, time , feature 1, feature 2, ....
    1,1,0,1,0,...
    1,2,1,1,1,....
    2,1,0,0,0

    Every time I try to pivot my data from this format:
    Customer, Time, Item
    1,1,a
    1,1,b
    1,2,a
    2,1,c
    etc

    I fail to get your format. 
    Thanks,
    Will
  • MariusHelf
    MariusHelf New Altair Community Member
    Hi, unfortunately the Pivot operator can currently group by only a single attribute. You therefore have to combine client id and time before the Pivot operator and separate them again afterwards. Please have a look at the attached process.

    Best regards,
    Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.005">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="subprocess" compatibility="5.3.005" expanded="true" height="76" name="Generate Data" width="90" x="45" y="30">
            <process expanded="true">
              <operator activated="true" class="generate_transaction_data" compatibility="5.3.005" expanded="true" height="60" name="Generate Transaction Data" width="90" x="45" y="30"/>
              <operator activated="true" class="set_role" compatibility="5.3.005" expanded="true" height="76" name="Set Role" width="90" x="180" y="30">
                <parameter key="name" value="Id"/>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="generate_id" compatibility="5.3.005" expanded="true" height="76" name="Generate ID" width="90" x="315" y="30"/>
              <operator activated="true" class="rename" compatibility="5.3.005" expanded="true" height="76" name="Rename" width="90" x="450" y="30">
                <parameter key="old_name" value="id"/>
                <parameter key="new_name" value="time"/>
                <list key="rename_additional_attributes"/>
              </operator>
              <operator activated="true" class="set_role" compatibility="5.3.005" expanded="true" height="76" name="Set Role (2)" width="90" x="585" y="30">
                <parameter key="name" value="time"/>
                <parameter key="target_role" value="id"/>
                <list key="set_additional_roles"/>
              </operator>
              <connect from_op="Generate Transaction Data" from_port="output" to_op="Set Role" to_port="example set input"/>
              <connect from_op="Set Role" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
              <connect from_op="Generate ID" from_port="example set output" to_op="Rename" to_port="example set input"/>
              <connect from_op="Rename" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
              <connect from_op="Set Role (2)" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="generate_concatenation" compatibility="5.3.005" expanded="true" height="76" name="Generate Concatenation" width="90" x="179" y="30">
            <parameter key="first_attribute" value="Id"/>
            <parameter key="second_attribute" value="time"/>
          </operator>
          <operator activated="true" class="pivot" compatibility="5.3.005" expanded="true" height="76" name="Pivot" width="90" x="313" y="30">
            <parameter key="group_attribute" value="Id_time"/>
            <parameter key="index_attribute" value="Item"/>
            <parameter key="skip_constant_attributes" value="false"/>
          </operator>
          <operator activated="true" class="split" compatibility="5.3.005" expanded="true" height="76" name="Split" width="90" x="447" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Id_time"/>
            <parameter key="split_pattern" value="_"/>
          </operator>
          <connect from_op="Generate Data" from_port="out 1" to_op="Generate Concatenation" to_port="example set input"/>
          <connect from_op="Generate Concatenation" from_port="example set output" to_op="Pivot" to_port="example set input"/>
          <connect from_op="Pivot" from_port="example set output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
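
The three steps in the process above (concatenate the two keys, pivot on the combined key, split the key again) can be sketched in plain Python as follows. This is only an illustration of the reshaping, not of how RapidMiner implements it; the function name and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical long-format input: one row per (customer, time, item).
long_rows = [
    (1, 1, "a"),
    (1, 1, "b"),
    (1, 2, "a"),
    (2, 1, "c"),
]


def pivot_by_customer_and_time(rows, separator="_"):
    """Concatenate both keys, pivot on the combined key, then split it again."""
    items = sorted({item for _, _, item in rows})

    # Steps 1 and 2: build the combined key and collect the items seen for it.
    grouped = defaultdict(set)
    for customer, time, item in rows:
        grouped[f"{customer}{separator}{time}"].add(item)

    # Step 3: split the combined key back into its two parts.
    wide = []
    for key, present in grouped.items():
        customer, time = key.split(separator)
        row = {"customer": int(customer), "time": int(time)}
        for item in items:
            row[item] = 1 if item in present else 0
        wide.append(row)
    return wide


wide = pivot_by_customer_and_time(long_rows)
for row in wide:
    print(row)
```

One difference to keep in mind: this sketch writes 0 for every item that is absent from a transaction, whereas the Pivot operator leaves those cells missing, so a separate replacement step is needed in RapidMiner.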
  • willgouldin
    willgouldin New Altair Community Member
    Marius,
    Thanks for the timely response, I will examine the code you provided.

    Will
  • willgouldin
    willgouldin New Altair Community Member
    Marius,
    I actually applied your logic in my SQL and concatenated the two columns before RapidMiner, which speeds up processing.

    The trouble I have now is that when I pivot and then try to replace missing values, the replacement doesn't work.

    The process runs without errors (green light), but I still have '?' values in my pivot table.

    Example of my data:

    Time_Customer Item Count
    1_9 a 1
    2_9 b 1
    3_9 c 1
    3_9 d 1
    3_9 e 1
    3_9 f 1
    3_9 e 1
    3_9 b 1
    4_9 c 1
    4_9 b 1
    1_22 c 1
    1_27 c 1
    1_27 a 1
    1_27 g 1
    2_27 c 1
    2_27 h 1
    2_27 g 1
    3_27 c 1


    My code is below:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.005">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="5.3.005" expanded="true" height="60" name="Read Excel" width="90" x="112" y="30">
            <parameter key="excel_file" value="C:\MYFILE"/>
            <parameter key="sheet_number" value="2"/>
            <parameter key="imported_cell_range" value="A1:C32256"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Time_Customer.true.polynominal.attribute"/>
              <parameter key="1" value="Item.true.polynominal.attribute"/>
              <parameter key="2" value="Count.true.polynominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="pivot" compatibility="5.3.005" expanded="true" height="76" name="Pivot" width="90" x="246" y="30">
            <parameter key="group_attribute" value="Time_Customer"/>
            <parameter key="index_attribute" value="Item"/>
            <parameter key="consider_weights" value="false"/>
            <parameter key="skip_constant_attributes" value="false"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="5.3.005" expanded="true" height="94" name="Replace Missing Values" width="90" x="447" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Time_Customer"/>
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="default" value="value"/>
            <list key="columns"/>
            <parameter key="replenishment_value" value="0"/>
          </operator>
          <operator activated="true" class="split" compatibility="5.3.005" expanded="true" height="76" name="Split" width="90" x="648" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Time_Customer"/>
            <parameter key="split_pattern" value="_"/>
          </operator>
          <operator activated="true" class="rename" compatibility="5.3.005" expanded="true" height="76" name="Rename" width="90" x="782" y="30">
            <parameter key="old_name" value="Time_Customer_1"/>
            <parameter key="new_name" value="Time"/>
            <list key="rename_additional_attributes">
              <parameter key="Time_Customer_2" value="Customer"/>
            </list>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Pivot" to_port="example set input"/>
          <connect from_op="Pivot" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>



    I greatly appreciate any help you all can offer.

    Will
  • MariusHelf
    MariusHelf New Altair Community Member
    Hi,

    please examine your Replace Missing Values operator. You are replacing the values of only one attribute, but in reality you probably want to replace missing values in *all* attributes, right?

    Best regards,
    Marius
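
The difference between replacing missing values in one attribute and in all attributes can be illustrated with a small Python sketch. The rows, the helper names, and the use of `None` for a missing value are assumptions made for this illustration.

```python
# Hypothetical pivot output; None stands for a missing value
# (shown as '?' in the RapidMiner results view).
pivoted = [
    {"Time_Customer": "1_9", "a": 1, "b": None, "c": None},
    {"Time_Customer": "2_9", "a": None, "b": 1, "c": None},
    {"Time_Customer": "1_22", "a": None, "b": None, "c": 1},
]


def replace_missing(rows, replacement, columns=None):
    """Replace None in the given columns, or in every column if columns is None."""
    return [
        {
            name: replacement
            if value is None and (columns is None or name in columns)
            else value
            for name, value in row.items()
        }
        for row in rows
    ]


def count_missing(rows):
    """Count the cells that are still missing."""
    return sum(value is None for row in rows for value in row.values())


# Restricting the replacement to the key column changes nothing,
# because the key column has no missing values to begin with.
only_key = replace_missing(pivoted, 0, columns={"Time_Customer"})
all_columns = replace_missing(pivoted, 0)

print(count_missing(only_key))
print(count_missing(all_columns))
```

Restricting the filter to the grouping attribute leaves every item column untouched, which matches the symptom of a process that runs cleanly but still shows '?' values.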
  • willgouldin
    willgouldin New Altair Community Member
    Marius,
    Thank you for your help; I got it to work. The code is provided below for reference. I do have one more snag: the GSP output displays correctly on a Mac OS X install but not on Windows 7.

    On Windows 7, I see summary data in the results overview tab, but when I switch to the GSPSet(GSP) tab, all I see are the annotation options. On the Mac OS X instance, everything appears as one would expect.

    Not sure if I should submit a bug report or what.

    Thanks for your help!

    Will
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.007">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="5.3.007" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
            <parameter key="excel_file" value="C:myfile.xls"/>
            <parameter key="sheet_number" value="2"/>
            <parameter key="imported_cell_range" value="A1:C32256"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Time_Customer.true.polynominal.attribute"/>
              <parameter key="1" value="Item.true.polynominal.attribute"/>
              <parameter key="2" value="Count.true.binominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="pivot" compatibility="5.3.007" expanded="true" height="76" name="Pivot" width="90" x="179" y="30">
            <parameter key="group_attribute" value="Time_Customer"/>
            <parameter key="index_attribute" value="Item"/>
            <parameter key="consider_weights" value="false"/>
            <parameter key="skip_constant_attributes" value="false"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="5.3.007" expanded="true" height="94" name="Replace Missing Values" width="90" x="313" y="30">
            <parameter key="attribute" value="Time_Customer"/>
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="default" value="value"/>
            <list key="columns"/>
            <parameter key="replenishment_value" value="0"/>
          </operator>
          <operator activated="true" class="split" compatibility="5.3.007" expanded="true" height="76" name="Split" width="90" x="45" y="255">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Time_Customer"/>
            <parameter key="split_pattern" value="_"/>
          </operator>
          <operator activated="true" class="rename" compatibility="5.3.007" expanded="true" height="76" name="Rename" width="90" x="179" y="255">
            <parameter key="old_name" value="Time_Customer_1"/>
            <parameter key="new_name" value="Time"/>
            <list key="rename_additional_attributes">
              <parameter key="Time_Customer_2" value="Customer"/>
            </list>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.3.007" expanded="true" height="94" name="Nominal to Numerical" width="90" x="380" y="255">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Time"/>
            <parameter key="coding_type" value="unique integers"/>
            <list key="comparison_groups"/>
          </operator>
          <operator activated="true" class="generalized_sequential_patterns" compatibility="5.3.007" expanded="true" height="76" name="GSP" width="90" x="581" y="210">
            <parameter key="customer_id" value="Customer"/>
            <parameter key="time_attribute" value="Time"/>
            <parameter key="min_support" value="0.1"/>
            <parameter key="window_size" value="1.0"/>
            <parameter key="max_gap" value="18.0"/>
            <parameter key="min_gap" value="13.0"/>
            <parameter key="positive_value" value="1"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Pivot" to_port="example set input"/>
          <connect from_op="Pivot" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_op="GSP" to_port="example set"/>
          <connect from_op="GSP" from_port="patterns" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf
    MariusHelf New Altair Community Member
    Hey Will,

    are you using RapidMiner 5.3.7 on both of your machines?

    Best regards,
    Marius
  • willgouldin
    willgouldin New Altair Community Member
    Yes Sir.  Updated this morning and it still produces the "error".

    Will
  • MariusHelf
    MariusHelf New Altair Community Member
    I could reproduce that behavior under Windows, and it is obviously a bug. I created an internal bug report for it, so there is no need to submit one from your side.

    Best regards,
    Marius
  • willgouldin
    willgouldin New Altair Community Member
    Outstanding Marius,
    Thank you for your assistance!

    Will
  • willgouldin
    willgouldin New Altair Community Member
    Marius,
    Another question concerning GSP: I receive the same result sets regardless of my Window, Min Gap and Max Gap settings.
    My raw data uses days between events as the time element.

    Is this a function of the same bug we previously found?


    Thanks,
    Will
  • MariusHelf
    MariusHelf New Altair Community Member
    I can't imagine that the two issues are related.
    Did you inspect your data and make sure that the entered values actually would make a difference?

    Best regards,
    Marius
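
One way to carry out the inspection suggested above is to list the time gaps that actually occur within each customer and see how many of them fall inside the chosen gap window. The sketch below is plain Python, not RapidMiner. The (customer, time) pairs are taken from the sample rows posted earlier in the thread, which are only an excerpt. The comparison `min_gap < gap <= max_gap` is an assumed convention; the thread does not say how the GSP operator treats the boundaries.

```python
from collections import defaultdict
from itertools import combinations

# (customer, time) pairs from the sample rows in the thread,
# e.g. "1_9" is time 1 for customer 9. Time is measured in days.
events = [(9, 1), (9, 2), (9, 3), (9, 4), (22, 1), (27, 1), (27, 2), (27, 3)]


def gaps_per_customer(pairs):
    """Return every positive time difference between two transactions of one customer."""
    times = defaultdict(set)
    for customer, time in pairs:
        times[customer].add(time)
    gaps = []
    for customer_times in times.values():
        for earlier, later in combinations(sorted(customer_times), 2):
            gaps.append(later - earlier)
    return gaps


def share_within(gaps, min_gap, max_gap):
    """Fraction of gaps with min_gap < gap <= max_gap (0.0 if there are no gaps)."""
    if not gaps:
        return 0.0
    return sum(min_gap < gap <= max_gap for gap in gaps) / len(gaps)


gaps = gaps_per_customer(events)
print(sorted(gaps))
# Gap settings from the process posted earlier in the thread.
print(share_within(gaps, min_gap=13.0, max_gap=18.0))
```

If no observed gap, or every observed gap, falls inside the window for all the settings being compared, identical result sets would not be surprising. It is also worth checking that the time attribute still holds the original day values after any type conversion.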
  • Marco_Boeck
    Marco_Boeck New Altair Community Member
    Hi,

    we've just fixed the "empty GSP results" bug. You can either check out the latest SVN version (see here, updated around midnight) and build RapidMiner yourself, or wait for the next release.

    Regards,
    Marco
  • willgouldin
    willgouldin New Altair Community Member
    Marco,
    Thanks for the response, I'll check my updates!
    Will
  • lvane
    lvane New Altair Community Member
    Hello dear Rapid-I developers,

    my empty GSP results problem still exists. How can I update my RapidMiner, or do I need to wait for the next official update? Could anyone tell me when that will be?

    Thank you!

  • willgouldin
    willgouldin New Altair Community Member
    I am also curious when the next release that covers this will be.

    Thanks,
    Will
  • MariusHelf
    MariusHelf New Altair Community Member
    Will, we don't have any release schedule targeted at the general public yet.

    Best regards,
    Marius
  • willgouldin
    willgouldin New Altair Community Member
    Not to dig up an old topic, but I am still having trouble with the data layout for the GSP operator.

    I have combined the time (in day-of-year format) with my customer ID per your instructions. I have a column for the item and a binominal value for the "qty".

    When I import the Excel sheet, pivot, replace the missing values with the value "false" and then split, everything looks good.

    When I attempt to convert the split columns for time and customer from nominal to numerical, as the GSP operator requires, my pivot is ruined.

    I expect :

    Customer, time, item a, item b, ......
    1,1,TRUE, FALSE
    1,3,TRUE, FALSE
    2,4, FALSE, FALSE
    etc

    However, it turns time into multiple columns within the pivot as well.

    I can provide a larger data sample if required for troubleshooting.
    Any help that can be provided is appreciated.

    Will
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.015">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="5.3.015" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
            <parameter key="excel_file" value="C:\Users\me\Desktop\input.xls"/>
            <parameter key="sheet_number" value="2"/>
            <parameter key="imported_cell_range" value="A1:C7768"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="time_customer.true.polynominal.attribute"/>
              <parameter key="1" value="Item.true.polynominal.attribute"/>
              <parameter key="2" value="Qty.true.binominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="pivot" compatibility="5.3.015" expanded="true" height="76" name="Pivot" width="90" x="45" y="120">
            <parameter key="group_attribute" value="time_customer"/>
            <parameter key="index_attribute" value="Item"/>
            <parameter key="consider_weights" value="false"/>
            <parameter key="skip_constant_attributes" value="false"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="5.3.015" expanded="true" height="94" name="Replace Missing Values" width="90" x="45" y="210">
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="default" value="value"/>
            <list key="columns"/>
            <parameter key="replenishment_value" value="false"/>
          </operator>
          <operator activated="true" class="split" compatibility="5.3.015" expanded="true" height="76" name="Split" width="90" x="179" y="210">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="time_customer"/>
            <parameter key="split_pattern" value="_"/>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.3.015" expanded="true" height="94" name="Nominal to Numerical" width="90" x="313" y="210">
            <parameter key="create_view" value="true"/>
            <list key="comparison_groups"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Pivot" to_port="example set input"/>
          <connect from_op="Pivot" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
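
One possible reading of the process above, not confirmed anywhere in the thread: Nominal to Numerical has no attribute filter there, so it would apply to every nominal attribute, and if its coding type is dummy coding, each distinct value becomes its own column. The plain Python sketch below contrasts that with converting only the two key columns to numbers. The function names and sample rows are hypothetical and do not correspond to RapidMiner operators.

```python
# Hypothetical rows after Pivot, Replace Missing Values and Split:
# every column, including the two key parts, is still nominal (text).
rows = [
    {"time": "1", "customer": "1", "item_a": "true", "item_b": "false"},
    {"time": "3", "customer": "1", "item_a": "true", "item_b": "false"},
    {"time": "4", "customer": "2", "item_a": "false", "item_b": "false"},
]


def dummy_code(rows, columns):
    """Create one 0/1 column per distinct value of each listed column."""
    values = {c: sorted({row[c] for row in rows}) for c in columns}
    coded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in columns}
        for c in columns:
            for v in values[c]:
                new_row[f"{c} = {v}"] = 1 if row[c] == v else 0
        coded.append(new_row)
    return coded


def parse_key_columns(rows, columns):
    """Convert only the listed columns from text to integers; leave the rest alone."""
    return [
        {k: int(v) if k in columns else v for k, v in row.items()}
        for row in rows
    ]


# Applied to every column, dummy coding spreads "time" over several columns.
exploded = dummy_code(rows, ["time", "customer", "item_a", "item_b"])
# Restricted to the two key columns, the pivot layout is preserved.
parsed = parse_key_columns(rows, {"time", "customer"})

print(list(exploded[0]))
print(parsed[1])
```

The earlier process in this thread restricted Nominal to Numerical to a single attribute with the "unique integers" coding type. Whether that coding keeps the original day values, rather than assigning arbitrary integer codes, is something to verify before relying on the gap settings.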
  • usct01
    usct01 New Altair Community Member
    Hi
    Is there an operator to apply GSP rules?

    Thanks
  • MBM
    MBM New Altair Community Member

    this is a really good question