How to serialize log data
atifshaikh4514
New Altair Community Member
I have audit log data in which each tuple represents an event associated with a particular user ID, plus a list of other attributes, both nominal and numerical. What is the best way to transform the data into a set of user web clicks using RapidMiner?
Off the top of my head, I can think of quantifying all attributes and serializing them. But my main concern is how to deal with variable-length click sequences.
I actually need to create user click profiles as the end result.
Answers
Hello Atif,
I am not sure if this is directly possible with standard operators (I think there is an Audit / Log file input operator in the Text plugin, but I would have to check this myself...). If this does not help, you may have to code your own operator for this. But maybe someone else knows a better solution that works with existing operators.
You could of course determine the maximum number of possible events in a sequence, build attributes for that maximum number, and set the attributes of shorter sequences to missing values. But I could ask a former colleague who works on sequence mining with RapidMiner how he represents this.
Cheers,
Ingo
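The padding idea from the reply above (one attribute per position up to the longest sequence, with missing values for shorter ones) could be sketched roughly as follows. This is plain Python, not a RapidMiner operator, and the `(user_id, timestamp, action)` event layout and the function name `pad_click_sequences` are assumptions made for illustration only; the real audit log will have a different schema.

```python
from collections import defaultdict


def pad_click_sequences(events, missing=None):
    """Turn (user_id, timestamp, action) events into fixed-width rows.

    The tuple layout is a hypothetical schema, not the poster's actual log format.
    Returns {user_id: [action_1, ..., action_max]} padded with `missing`.
    """
    # Collect every event under its user.
    per_user = defaultdict(list)
    for user_id, timestamp, action in events:
        per_user[user_id].append((timestamp, action))

    # Order each user's events by timestamp and keep only the action.
    sequences = {
        user: [action for _, action in sorted(evts, key=lambda e: e[0])]
        for user, evts in per_user.items()
    }

    # Pad every sequence to the length of the longest one.
    max_len = max((len(seq) for seq in sequences.values()), default=0)
    return {
        user: seq + [missing] * (max_len - len(seq))
        for user, seq in sequences.items()
    }


events = [
    ("u1", 3, "logout"),
    ("u1", 1, "login"),
    ("u2", 2, "login"),
    ("u1", 2, "view"),
]
rows = pad_click_sequences(events)
```

Each row then has the same number of columns, at the cost of introducing missing values for every user with fewer clicks than the most active one.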
Thanks Ingo for the apt response.
mierswa wrote:
"I am not sure if this is directly possible with standard operators (I think there is an Audit / Log file input operator in the Text plugin, but I would have to check this myself...)"
I tried it out: there is an operator under IO->web->Server2LogTransactions in the Text Mining plugin. But I assume it expects a standard web log from an HTTP-based server, as I don't see any parameters to set in its options. It obviously does not work on my server logs, as I am using a custom server communication protocol stack. I will check out the operator development option, but a ready-made solution is always appreciated.
mierswa wrote:
"You could of course determine the maximum number of possible events in a sequence, build attributes for that maximum number, and set the attributes of shorter sequences to missing values. But I could ask a former colleague who works on sequence mining with RapidMiner how he represents this."
I had also thought of this approach, but the missing values are a problem: once I build profiles of user clicks, I have to find rarity, and missing values are an added disadvantage there.
I came across some techniques from relational mining where a similar reverse pivoting is used, but instead of representing the clickstreams as they are, their summaries are saved. That does not seem applicable to my problem.
Still scratching my head...
Have a nice weekend.
Atif.
Hi,
As far as I know, the operator expects Apache log files, but I could be mistaken. So maybe developing your own input operator is the only option right now, sorry.
Cheers,
Ingo
Atif Abdul-Rahman wrote:
"I tried it out: there is an operator under IO->web->Server2LogTransactions in the Text Mining plugin. But I assume it expects a standard web log from an HTTP-based server, as I don't see any parameters to set in its options. It obviously does not work on my server logs, as I am using a custom server communication protocol stack. I will check out the operator development option, but a ready-made solution is always appreciated.
"I had also thought of this approach, but the missing values are a problem: once I build profiles of user clicks, I have to find rarity, and missing values are an added disadvantage there.
"I came across some techniques from relational mining where a similar reverse pivoting is used, but instead of representing the clickstreams as they are, their summaries are saved. That does not seem applicable to my problem.
"Still scratching my head..."
Maybe I am missing a specific detail, but is there any reason why you don't just maintain a list of transactions, each related to a specific user? Then you define a user session as the subset of events for that user. This is how I handle sequences in my GSP operator plugin for RapidMiner. It is in fact the basic structure on which GSP (Srikant, Agrawal) is built.
But maybe you can be a bit more specific about what you intend to do with your data...
Regards,
Christian
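The transactional layout described in this reply (one row per event, with a user's session being the subset of events belonging to that user) might look roughly like the sketch below. It is not the GSP plugin's code; the `Event` fields and the `user_sessions` helper are hypothetical names chosen for illustration. Sequences keep their natural, variable length, so no padding with missing values is needed.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass(frozen=True)
class Event:
    """One logged event; the field names are assumptions, not the real log schema."""
    user_id: str
    timestamp: float
    action: str


def user_sessions(events):
    """Group events by user and return each user's actions in time order."""
    # Sort by user first, then by time, so groupby sees each user as one run.
    ordered = sorted(events, key=attrgetter("user_id", "timestamp"))
    return {
        user: [event.action for event in group]
        for user, group in groupby(ordered, key=attrgetter("user_id"))
    }


log = [
    Event("u2", 5.0, "search"),
    Event("u1", 2.0, "open"),
    Event("u1", 1.0, "login"),
]
sessions = user_sessions(log)
```

Here `sessions` maps each user to an ordered action list of whatever length that user produced, which is the kind of input a sequence-mining step would consume.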