GSP Output Format
swissruss
New Altair Community Member
Hi,
I got the GSP operator running nicely with pre-processed data from the transaction data generator. My question is: how can I use the generated patterns? For Frequent Item Sets, new attributes can be generated for each item set, flagging whether an example supports that item set. Is there an equivalent for the GSP sequential patterns, or can the patterns only be output to results at present?
Thanks for any help anyone can offer!
Russ
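For readers with the same question, the kind of flagging described above can at least be sketched outside RapidMiner. The following is a minimal Python sketch, not a RapidMiner feature: the function names, the attribute naming scheme, and the toy sequences are all assumptions made for illustration. It treats a pattern as supported when its items occur in the sequence in the same order, not necessarily adjacent, and ignores gap and window constraints.

```python
def supports(sequence, pattern):
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the first match,
    # so each pattern item must be found after the previous one.
    return all(item in remaining for item in pattern)


def flag_patterns(sequences, patterns):
    """Build one boolean attribute per pattern for every example (hypothetical naming)."""
    return {
        example_id: {
            "supports_" + "_".join(pattern): supports(sequence, pattern)
            for pattern in patterns
        }
        for example_id, sequence in sequences.items()
    }


# Made-up toy data: one ordered item sequence per example.
sequences = {"c1": ["a", "b", "c"], "c2": ["b", "a"]}
patterns = [("a", "b"), ("b", "c")]

flags = flag_patterns(sequences, patterns)
for example_id, attributes in flags.items():
    print(example_id, attributes)
```

Each inner dictionary corresponds to the new attributes one example would receive, analogous to the flags generated for frequent item sets.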
Answers
Hi Russ,
Unfortunately I have no answer to your question, but I will have the same question once I get GSP running on my data.
That is exactly why I am addressing you directly: you seem to have quite some experience with GSP already, and I hope you don't mind.
I just started working with the GSP algorithm and ran into a problem with how the results are displayed.
I ran a rather simple main process consisting of a retrieve-data operator and the GSP operator.
Regarding the data format, I believe, from what I have seen in earlier discussions, that my data is in the right format for GSP:
Customer | Time_label | Sequence
customerA, 1, TypA
customerA, 2, TypA
customerA, 3, TypB
customerB, 1, TypB
customerB, 2, TypA
customerB, 3, TypB
customerC, 1, TypA
customerC, 2, TypA
customerC, 3, TypB
etc.
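To make the expected output concrete, the three example customers above can be checked by hand. This is a minimal Python sketch, not what the GSP operator does internally: the function names are made up, it only uses the nine rows shown (the real data continues after "etc."), and it ignores window size and gap constraints.

```python
from collections import defaultdict

# The nine example rows listed above: (Customer, Time_label, Sequence).
rows = [
    ("customerA", 1, "TypA"), ("customerA", 2, "TypA"), ("customerA", 3, "TypB"),
    ("customerB", 1, "TypB"), ("customerB", 2, "TypA"), ("customerB", 3, "TypB"),
    ("customerC", 1, "TypA"), ("customerC", 2, "TypA"), ("customerC", 3, "TypB"),
]


def build_sequences(rows):
    """Group rows by customer and order each customer's items by time label."""
    events = defaultdict(list)
    for customer, time_label, item in rows:
        events[customer].append((time_label, item))
    return {c: [item for _, item in sorted(ev)] for c, ev in events.items()}


def contains(sequence, pattern):
    """True if the pattern items occur in the sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(sequences, pattern):
    """Fraction of customers whose sequence contains the pattern."""
    hits = sum(contains(seq, pattern) for seq in sequences.values())
    return hits / len(sequences)


sequences = build_sequences(rows)
print(support(sequences, ["TypA", "TypA"]))  # customers A and C, not B
print(support(sequences, ["TypA", "TypB"]))  # all three customers
```

On these rows alone, <TypA> <TypA> is supported by two of the three customers and <TypA> <TypB> by all three, so the result should list actual item values of this kind.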
Although the GSP operator seems to work fine (no error messages), the resulting patterns do not seem to be displayed correctly. All I get is:
0.320: <Sequence> <Sequence>
0.135: <Sequence> <Sequence> <Sequence>
0.040: <Sequence> <Sequence> <Sequence> <Sequence>
0.012: <Sequence> <Sequence> <Sequence> <Sequence> <Sequence>
What I expected to see instead is, for example:
0.320: <TypA> <TypA>
0.135: <TypA> <TypA> <TypB>
Why is only the column name displayed in the patterns instead of the actual values?
What do I have to adjust or change?
Any help is highly appreciated. Thank you in advance.
Best regards,
Stefan
Hi Stefan,
Very strange! Your input data looks correct to me - what are your parameter settings? Be warned that I currently have a bug posted relating to GSP (http://bugzilla.rapid-i.com/show_bug.cgi?id=936) as I can't understand the results I'm getting! But I'm sure the guys at Rapid-i will set me straight or fix the bug if it is one, so let me know your parameter values and I'll try to get you as far as I am!
Regards,
Russ
P.S. If it's ok for you, we can continue in this thread - can you link to it from yours? Thanks.
Hi Russ,
I obtained the results described above with the following parameters:
window size=11 (as my data is a sequence of 11 years)
max gap=11
min gap=0
min support=0.01
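As a rough illustration of how the gap parameters can restrict which patterns are counted, here is a minimal Python sketch. It assumes the definitions from the original GSP algorithm (the time between consecutive pattern elements must be greater than min gap and at most max gap), handles only single-item elements, and ignores window size. Whether RapidMiner's operator applies exactly these rules is an assumption, and the function name is made up.

```python
def matches(events, pattern, min_gap=0, max_gap=float("inf")):
    """Check whether `pattern` occurs in `events` under simple gap constraints.

    events  : list of (time, item) tuples for one customer
    pattern : list of items, one item per pattern element
    Assumed rule: min_gap < time difference <= max_gap between consecutive matches.
    """
    def search(position, previous_time):
        if position == len(pattern):
            return True
        for time, item in events:
            if item != pattern[position]:
                continue
            if previous_time is not None:
                gap = time - previous_time
                if gap <= min_gap or gap > max_gap:
                    continue
            # Try this occurrence, and fall back to later ones if it fails.
            if search(position + 1, time):
                return True
        return False

    return search(0, None)


customer_a = [(1, "TypA"), (2, "TypA"), (3, "TypB")]
print(matches(customer_a, ["TypA", "TypB"], min_gap=0, max_gap=11))
print(matches(customer_a, ["TypA", "TypB"], min_gap=2, max_gap=11))
```

With min gap=0 and max gap=11 on an 11-year sequence, these two constraints should exclude almost nothing in this simplified model, so under these assumptions they alone would not explain a very small number of patterns.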
I also tested various other parameter values; however, the way the results are displayed did not change.
Another thing that strikes me:
Meanwhile I managed to get GSP running with Weka. There, even with a higher min support, several hundred patterns (all of which seem reasonable) are returned. With RM I only get four patterns, which I don't understand.
Do you have any ideas?
Thanks,
Stefan