Generalized Sequential Patterns - Howto
Arandor
New Altair Community Member
Hello,
I need a little help with the GSP operator. I cannot make it work, no matter what I try. Everything (the data, the operators, and the parameters) seems OK to me, but when I run the process I don't see anything on the GSP result screen: just a blank grey page with the label "GSPSet" and no results. I connected both the example set and the patterns set to the result ports. (The example set output works correctly.)
For example, I have data that looks like this (a few movie rental records), followed by the XML source of the process I made (a data reading operator, a Nominal to Numerical converter, and the GSP operator):
Person ID, Movie ID, Sequence
pers1,movie1,1
pers1,movie2,2
pers1,StarWars1,3
pers1,StarWars2,4
pers1,StarWars3,5
pers1,movie3,6
pers2,movie4,1
pers2,movie13,2
pers2,StarWars1,3
pers2,StarWars2,4
pers2,StarWars3,5
pers2,movie53,6
pers3,StarWars1,1
pers3,movie2,2
pers3,movie5,3
pers3,StarWars2,4
pers3,StarWars3,5
pers4,movie5,1
pers4,movie63,2
pers4,movie2,3
pers5,movie12,1
pers5,movie54,2
pers5,movie1,3
pers5,StarWars1,4
pers5,movie5,5
pers6,movie45,1
pers6,movie4,2
pers7,StarWars1,1
pers7,StarWars2,2
pers7,StarWars3,3
pers7,movie44,4
pers8,movie3,1
pers8,movie5,2
pers8,movie8,3
pers9,movie1,1
pers9,movie11,1
pers9,movie56,2
pers9,movie34,3
pers9,StarWars1,4
pers9,movie5,5
pers9,StarWars2,6
pers9,StarWars3,7
pers9,movie4,8
pers9,StarWars1,9
pers9,StarWars2,10
pers9,StarWars3,11
pers10,movie1,1
In WEKA I managed to make it work with the example data I posted, and I would like to see similar results in RapidMiner (or any results from the GSP algorithm at all, to prove that the algorithm/operator works correctly). I think RapidMiner's GSP would be more customizable, if it worked.
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.0.10" expanded="true" name="Process">
<process expanded="true" height="640" width="748">
<operator activated="true" class="read_arff" compatibility="5.0.10" expanded="true" height="60" name="Read ARFF" width="90" x="45" y="120">
<parameter key="data_file" value="C:\Users\Csabi\Desktop\Diploma\Példák\Filmpélda\Filmpélda.arff"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="nominal_to_numerical" compatibility="5.0.10" expanded="true" height="94" name="Nominal to Numerical" width="90" x="246" y="120">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Seq"/>
</operator>
<operator activated="true" class="generalized_sequential_patterns" compatibility="5.0.10" expanded="true" height="76" name="GSP" width="90" x="447" y="120">
<parameter key="customer_id" value="person ID"/>
<parameter key="time_attribute" value="Seq"/>
<parameter key="min_support" value="0.5"/>
<parameter key="window_size" value="4.0"/>
<parameter key="max_gap" value="5.0"/>
<parameter key="min_gap" value="0.0"/>
</operator>
<connect from_op="Read ARFF" from_port="output" to_op="Nominal to Numerical" to_port="example set input"/>
<connect from_op="Nominal to Numerical" from_port="example set output" to_op="GSP" to_port="example set"/>
<connect from_op="GSP" from_port="example set" to_port="result 1"/>
<connect from_op="GSP" from_port="patterns" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="90"/>
<portSpacing port="sink_result 3" spacing="36"/>
</process>
</operator>
</process>
As a result I would expect, for example:
Seq. pattern: (Star Wars 1, Star Wars 2, Star Wars 3)
(If somebody rents Star Wars 1, there is a good chance that they will rent the other SW movies.)
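As a sanity check on the expected pattern, here is a minimal Python sketch that counts how many persons in the sample data rented the three Star Wars movies in that order. It is not RapidMiner's or WEKA's GSP implementation: it only checks item order and ignores window_size, min_gap, and max_gap. The names HISTORIES, contains_in_order, and support are made up for this illustration.

```python
# Rental histories from the sample data, ordered by the Sequence column.
# (pers9 has two rentals at sequence 1; their relative order is arbitrary here.)
HISTORIES = {
    "pers1": ["movie1", "movie2", "StarWars1", "StarWars2", "StarWars3", "movie3"],
    "pers2": ["movie4", "movie13", "StarWars1", "StarWars2", "StarWars3", "movie53"],
    "pers3": ["StarWars1", "movie2", "movie5", "StarWars2", "StarWars3"],
    "pers4": ["movie5", "movie63", "movie2"],
    "pers5": ["movie12", "movie54", "movie1", "StarWars1", "movie5"],
    "pers6": ["movie45", "movie4"],
    "pers7": ["StarWars1", "StarWars2", "StarWars3", "movie44"],
    "pers8": ["movie3", "movie5", "movie8"],
    "pers9": ["movie1", "movie11", "movie56", "movie34", "StarWars1", "movie5",
              "StarWars2", "StarWars3", "movie4", "StarWars1", "StarWars2",
              "StarWars3"],
    "pers10": ["movie1"],
}


def contains_in_order(history, pattern):
    """True if the items of `pattern` occur in `history` in the same order
    (not necessarily adjacent)."""
    remaining = iter(history)
    # `item in remaining` consumes the iterator up to the first match,
    # so each pattern item must be found after the previous one.
    return all(item in remaining for item in pattern)


def support(histories, pattern):
    """Fraction of persons whose history contains `pattern` in order."""
    hits = sum(contains_in_order(h, pattern) for h in histories.values())
    return hits / len(histories)


PATTERN = ["StarWars1", "StarWars2", "StarWars3"]
pattern_support = support(HISTORIES, PATTERN)
print(pattern_support)  # 0.5: pers1, pers2, pers3, pers7 and pers9 match
```

On this data the pattern is contained in 5 of the 10 histories, so its order-only support equals the min_support of 0.5 set in the process. Whether the operator treats that threshold as inclusive is not something this sketch can tell.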
I hope you understand my problem and can help me.
Thank you!
Answers
Hi,
did you try lowering the minimal support? 0.5 seems too high for your example data.
Greetings,
Sebastian
Hello!
Yes, I did (of course). I tried smaller values for the minimal support. But if I lower it too much, I get a warning message: "PM WARNING: Found only 3.0 sequences. Together with the small minimal support, this could result in very many patterns and a long calculation time". OK, this is understandable.
Anyway, the example data set I posted is a smaller version of my data set. I also tried to make GSP work with a real database containing a lot of data, and again got no results. I don't think the problem is the small data set or the value of the minimal support, because I have a lot of data and I tried smaller minimal support values too (and I tried various window size values as well).
Is it possible that the GSP operator does not work?
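To get a feeling for how the gap parameters could change the count on the sample data, here is a rough Python sketch. It assumes the gap semantics of the original GSP paper for single-item elements (the time difference between consecutive pattern items must be greater than min_gap and at most max_gap). Whether RapidMiner's operator applies exactly these rules is an assumption, and window_size is ignored. RENTALS, matches, and support are hypothetical names used only for this illustration.

```python
# (sequence number, movie) pairs per person, copied from the sample data.
RENTALS = {
    "pers1": [(1, "movie1"), (2, "movie2"), (3, "StarWars1"),
              (4, "StarWars2"), (5, "StarWars3"), (6, "movie3")],
    "pers2": [(1, "movie4"), (2, "movie13"), (3, "StarWars1"),
              (4, "StarWars2"), (5, "StarWars3"), (6, "movie53")],
    "pers3": [(1, "StarWars1"), (2, "movie2"), (3, "movie5"),
              (4, "StarWars2"), (5, "StarWars3")],
    "pers4": [(1, "movie5"), (2, "movie63"), (3, "movie2")],
    "pers5": [(1, "movie12"), (2, "movie54"), (3, "movie1"),
              (4, "StarWars1"), (5, "movie5")],
    "pers6": [(1, "movie45"), (2, "movie4")],
    "pers7": [(1, "StarWars1"), (2, "StarWars2"), (3, "StarWars3"),
              (4, "movie44")],
    "pers8": [(1, "movie3"), (2, "movie5"), (3, "movie8")],
    "pers9": [(1, "movie1"), (1, "movie11"), (2, "movie56"), (3, "movie34"),
              (4, "StarWars1"), (5, "movie5"), (6, "StarWars2"),
              (7, "StarWars3"), (8, "movie4"), (9, "StarWars1"),
              (10, "StarWars2"), (11, "StarWars3")],
    "pers10": [(1, "movie1")],
}


def matches(rentals, pattern, min_gap, max_gap, prev_time=None):
    """Depth-first search for an occurrence of `pattern` in which each step
    between consecutive pattern items satisfies min_gap < gap <= max_gap."""
    if not pattern:
        return True
    for time, movie in rentals:
        if movie != pattern[0]:
            continue
        if prev_time is not None:
            gap = time - prev_time
            if gap <= min_gap or gap > max_gap:
                continue
        # Try to match the rest of the pattern starting from this rental.
        if matches(rentals, pattern[1:], min_gap, max_gap, time):
            return True
    return False


def support(rentals_by_person, pattern, min_gap, max_gap):
    """Fraction of persons with at least one valid occurrence of `pattern`."""
    hits = sum(matches(r, pattern, min_gap, max_gap)
               for r in rentals_by_person.values())
    return hits / len(rentals_by_person)


PATTERN = ["StarWars1", "StarWars2", "StarWars3"]
print(support(RENTALS, PATTERN, min_gap=0, max_gap=5))  # 0.5
print(support(RENTALS, PATTERN, min_gap=0, max_gap=2))  # 0.4 (pers3 drops out)
```

Under these assumptions, the settings from the posted process (min_gap 0, max_gap 5) do not exclude any of the five matching persons, while a max_gap of 2 would exclude pers3, whose StarWars1 and StarWars2 rentals are 3 steps apart.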
Thank you for the answer!
Csaba
P.S.: Sorry for my bad English.