Generalized Sequential Patterns - Howto

Arandor New Altair Community Member
edited November 5 in Community Q&A
Hello,

I need a little help with the GSP operator. I cannot make it work, no matter what I try. Everything (the data, the operators, and the parameters) looks OK to me, but when I run the process, the GSP result screen shows only a blank grey page with the label "GSPSet" and no results. I connected both the example set and the patterns output to the result "slots". (The example set output works correctly.)

For example, I have data that looks like this
(a few movie rental records):

Person ID, Movie ID, Sequence
pers1,movie1,1
pers1,movie2,2
pers1,StarWars1,3
pers1,StarWars2,4
pers1,StarWars3,5
pers1,movie3,6
pers2,movie4,1
pers2,movie13,2
pers2,StarWars1,3
pers2,StarWars2,4
pers2,StarWars3,5
pers2,movie53,6
pers3,StarWars1,1
pers3,movie2,2
pers3,movie5,3
pers3,StarWars2,4
pers3,StarWars3,5
pers4,movie5,1
pers4,movie63,2
pers4,movie2,3
pers5,movie12,1
pers5,movie54,2
pers5,movie1,3
pers5,StarWars1,4
pers5,movie5,5
pers6,movie45,1
pers6,movie4,2
pers7,StarWars1,1
pers7,StarWars2,2
pers7,StarWars3,3
pers7,movie44,4
pers8,movie3,1
pers8,movie5,2
pers8,movie8,3
pers9,movie1,1
pers9,movie11,1
pers9,movie56,2
pers9,movie34,3
pers9,StarWars1,4
pers9,movie5,5
pers9,StarWars2,6
pers9,StarWars3,7
pers9,movie4,8
pers9,StarWars1,9
pers9,StarWars2,10
pers9,StarWars3,11
pers10,movie1,1
And here is the XML source of the process I made (a data reading operator, a Nominal to Numerical converter, and the GSP operator):

<process version="5.0">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.0.10" expanded="true" name="Process">
   <process expanded="true" height="640" width="748">
     <operator activated="true" class="read_arff" compatibility="5.0.10" expanded="true" height="60" name="Read ARFF" width="90" x="45" y="120">
       <parameter key="data_file" value="C:\Users\Csabi\Desktop\Diploma\Példák\Filmpélda\Filmpélda.arff"/>
       <list key="data_set_meta_data_information"/>
     </operator>
     <operator activated="true" class="nominal_to_numerical" compatibility="5.0.10" expanded="true" height="94" name="Nominal to Numerical" width="90" x="246" y="120">
       <parameter key="attribute_filter_type" value="single"/>
       <parameter key="attribute" value="Seq"/>
     </operator>
     <operator activated="true" class="generalized_sequential_patterns" compatibility="5.0.10" expanded="true" height="76" name="GSP" width="90" x="447" y="120">
       <parameter key="customer_id" value="person ID"/>
       <parameter key="time_attribute" value="Seq"/>
       <parameter key="min_support" value="0.5"/>
       <parameter key="window_size" value="4.0"/>
       <parameter key="max_gap" value="5.0"/>
       <parameter key="min_gap" value="0.0"/>
     </operator>
     <connect from_op="Read ARFF" from_port="output" to_op="Nominal to Numerical" to_port="example set input"/>
     <connect from_op="Nominal to Numerical" from_port="example set output" to_op="GSP" to_port="example set"/>
     <connect from_op="GSP" from_port="example set" to_port="result 1"/>
     <connect from_op="GSP" from_port="patterns" to_port="result 2"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="90"/>
     <portSpacing port="sink_result 3" spacing="36"/>
   </process>
 </operator>
</process>
In WEKA I managed to make it work with the example data I posted, and I would like to see similar results in RapidMiner (or any results at all from the GSP algorithm, to prove that the algorithm/operator works correctly). I think RapidMiner's GSP would be more customizable, if it worked.

For example, I expect a result like this:
Seq. pattern: (Star Wars 1, Star Wars 2, Star Wars 3)
(If somebody rents Star Wars 1, there is a good chance that they will rent the other Star Wars movies.)
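
As a sanity check on the expected pattern, the following is a minimal Python sketch that counts how many people in the posted sample rent StarWars1, StarWars2 and StarWars3 in that order. It is not RapidMiner's or WEKA's GSP implementation: it only tests ordered containment and ignores window_size, min_gap and max_gap. The names RAW, build_sequences, contains_in_order and support are made up for this illustration.

```python
from collections import defaultdict

# Rows copied from the sample data in the post: person,movie,sequence.
RAW = """
pers1,movie1,1 pers1,movie2,2 pers1,StarWars1,3 pers1,StarWars2,4
pers1,StarWars3,5 pers1,movie3,6
pers2,movie4,1 pers2,movie13,2 pers2,StarWars1,3 pers2,StarWars2,4
pers2,StarWars3,5 pers2,movie53,6
pers3,StarWars1,1 pers3,movie2,2 pers3,movie5,3 pers3,StarWars2,4
pers3,StarWars3,5
pers4,movie5,1 pers4,movie63,2 pers4,movie2,3
pers5,movie12,1 pers5,movie54,2 pers5,movie1,3 pers5,StarWars1,4
pers5,movie5,5
pers6,movie45,1 pers6,movie4,2
pers7,StarWars1,1 pers7,StarWars2,2 pers7,StarWars3,3 pers7,movie44,4
pers8,movie3,1 pers8,movie5,2 pers8,movie8,3
pers9,movie1,1 pers9,movie11,1 pers9,movie56,2 pers9,movie34,3
pers9,StarWars1,4 pers9,movie5,5 pers9,StarWars2,6 pers9,StarWars3,7
pers9,movie4,8 pers9,StarWars1,9 pers9,StarWars2,10 pers9,StarWars3,11
pers10,movie1,1
"""


def build_sequences(raw):
    """Group the rows by person and sort each person's rentals by position."""
    sequences = defaultdict(list)
    for row in raw.split():
        person, movie, position = row.split(",")
        sequences[person].append((int(position), movie))
    return {person: sorted(events) for person, events in sequences.items()}


def contains_in_order(events, pattern):
    """True if the pattern items occur at strictly increasing positions."""
    last_time = float("-inf")
    for wanted in pattern:
        # Greedy match: take the earliest occurrence after the previous item.
        times = [t for t, movie in events if movie == wanted and t > last_time]
        if not times:
            return False
        last_time = min(times)
    return True


def support(sequences, pattern):
    """Return (fraction of people supporting the pattern, list of those people)."""
    hits = [p for p, ev in sequences.items() if contains_in_order(ev, pattern)]
    return len(hits) / len(sequences), hits


SEQUENCES = build_sequences(RAW)
PATTERN = ("StarWars1", "StarWars2", "StarWars3")
ratio, supporters = support(SEQUENCES, PATTERN)
print(ratio, supporters)
```

On these rows, 5 of the 10 people (pers1, pers2, pers3, pers7, pers9) contain the pattern, so the sketch reports a support of 0.5, which is exactly the min_support value set in the process. Whether the operator treats that threshold as inclusive is not something this sketch can tell.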

I hope you understand my problem and can help me.
Thank you!
Answers

  • land
    land New Altair Community Member
    Hi,
    did you try to lower the minimal support? 0.5 seems too high for your example data.

    Greetings,
      Sebastian
  • Arandor
    Arandor New Altair Community Member
    Hello!

    Yes, I did, of course. I tried smaller values for the minimal support. (But if I lower it too much, I get a warning message: "PM WARNING: Found only 3.0 sequences. Together with the small minimal support, this could result in very many patterns and a long calculation time". OK, this is understandable.)

    Anyway, the example data set I posted is a smaller version of my real data set. I also tried to make GSP work with a real database containing a lot of data. No results again. I don't think the problem is the small data set or the value of the minimal support, because I have a lot of data and I tried smaller minimal support values too. (I also tried various window size values.)
    Is it possible that the GSP operator does not work?

    Thank you for the answer!
      Csaba

    P.S.: Sorry for my bad English :)