But my main concern is how to deal with variable-length click sequences.
mierswa wrote:I am not sure if this is directly possible with standard operators (I think there is an Audit / Log file input operator in the Text plugin, but I would have to check this myself...)
mierswa wrote:You could of course determine the maximum number of events that can occur in a sequence, build one attribute per position up to that maximum, and set the unused attributes of shorter sequences to missing values. But I could ask a former colleague who works on sequence mining with RapidMiner how he represents this.
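A minimal sketch of the padding idea described above, written in plain Python rather than as a RapidMiner process. The function name `pad_sequences`, the `event_N` attribute names, and the sample click data are all hypothetical and only serve as illustration; `None` stands in for a missing value.

```python
from typing import List, Optional, Sequence


def pad_sequences(sequences: Sequence[Sequence[str]]) -> List[List[Optional[str]]]:
    """Pad every click sequence to the length of the longest one.

    Positions that a shorter sequence does not reach are filled with None,
    which plays the role of a missing value here.
    """
    # Length of the longest sequence; 0 if there are no sequences at all.
    max_len = max((len(s) for s in sequences), default=0)
    return [list(s) + [None] * (max_len - len(s)) for s in sequences]


# Hypothetical click sequences of different lengths (not taken from real logs).
clicks = [
    ["home", "search", "item"],
    ["home"],
    ["home", "cart", "checkout", "pay"],
]

table = pad_sequences(clicks)

# One attribute per position: event_1, event_2, ...
header = [f"event_{i + 1}" for i in range(len(table[0]))] if table else []

print(header)
for row in table:
    print(row)
```

Each row now has the same number of attributes, so it fits a fixed-width example set. The cost is that short sequences carry many missing values, which any later rarity or distance computation has to handle explicitly.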
Atif Abdul-Rahman wrote:I tried it out; there is an operator under IO->web->Server2LogTransactions in the Text Mining plugin. But I assume it expects a standard web log from an HTTP-based server, as I don't see any parameters to set in its options. Unsurprisingly, it does not work on my server logs, since I am using a custom server communication protocol stack. I will check out the operator development option, but a ready-made solution is always appreciated. I had also thought of the missing-values approach, but once I build profiles of user clicks I have to measure rarity, and the missing values are an added disadvantage there. I also came across some techniques from relational mining that use a similar reverse pivoting, but they store summaries of the clickstreams rather than the clickstreams themselves, which does not seem applicable to my problem. Still scratching my head...