How to serialize log data

atifshaikh4514
atifshaikh4514 New Altair Community Member
edited November 5 in Community Q&A
I have audit log data in which each tuple represents an event associated with a particular user ID and a list of other attributes, both nominal and numerical. What is the best way to transform this data into a set of user web clicks using RapidMiner?

Off the top of my head, I can think of quantifying all attributes and serializing them. But my main concern is how to deal with variable length click sequences?

I actually need to create user click profiles as an end result.

Answers

  • IngoRM
    IngoRM New Altair Community Member
    Hello Atif,

    I am not sure if this is directly possible with standard operators (I think there is an Audit / Log file input operator in the Text plugin, but I would have to check this myself...). If this does not help, you may have to code your own operator for this. But maybe someone else knows a better solution using existing operators.
    But my main concern is how to deal with variable length click sequences?
    You could, of course, determine the maximum number of events in a sequence, build attributes for that maximum, and set the attributes of shorter sequences to missing values. But I could ask a former colleague who works on sequence mining with RapidMiner how he represents this.

    Cheers,
    Ingo
  • atifshaikh4514
    atifshaikh4514 New Altair Community Member
    Thanks Ingo for the apt response.
    mierswa wrote:

    I am not sure if this is directly possible with standard operators (I think there is an Audit / Log file input operator in the Text plugin, but I would have to check this myself...)
    I tried it out; there is an operator under IO->web->Server2LogTransactions in the Text Mining plugin. But I assume it expects a standard web log from an HTTP-based server, as I don't see any parameters to set in its options. It obviously does not work on my server logs, as I am using a custom server communication protocol stack. I will look into operator development, but a ready-made solution is always appreciated :D
    mierswa wrote:

    You could, of course, determine the maximum number of events in a sequence, build attributes for that maximum, and set the attributes of shorter sequences to missing values. But I could ask a former colleague who works on sequence mining with RapidMiner how he represents this.
    I had also thought of this approach, but the missing values are a problem: once I build profiles of user clicks, I have to find rarity, and missing values are an added disadvantage there.
    I came to know of some techniques from relational mining where a similar reverse pivoting is used, but instead of representing clickstreams as they are, their summaries are saved, and that does not seem applicable to my problem.
    Still scratching my head...

    Have a nice weekend.
    Atif.

  • IngoRM
    IngoRM New Altair Community Member
    Hi,

    As far as I know, the operator expects Apache log files, but I could be mistaken. So developing your own input operator may be the only option right now, sorry.

    Cheers,
    Ingo
  • chris
    chris New Altair Community Member
    Atif Abdul-Rahman wrote:

    I tried it out; there is an operator under IO->web->Server2LogTransactions in the Text Mining plugin. But I assume it expects a standard web log from an HTTP-based server, as I don't see any parameters to set in its options. It obviously does not work on my server logs, as I am using a custom server communication protocol stack. I will look into operator development, but a ready-made solution is always appreciated :D

    I had also thought of this approach, but the missing values are a problem: once I build profiles of user clicks, I have to find rarity, and missing values are an added disadvantage there.
    I came to know of some techniques from relational mining where a similar reverse pivoting is used, but instead of representing clickstreams as they are, their summaries are saved, and that does not seem applicable to my problem.
    Still scratching my head...
    Maybe I am missing a specific detail, but is there any reason why you don't just maintain a list of transactions, each related to a specific user? Then you define a user session as the subset of events for that user. This is how I handle sequences in my GSP operator plugin for RapidMiner. This is in fact the basic structure on which Srikant and Agrawal built the GSP algorithm.

    But maybe you can be a bit more specific about what you intend to do with your data...

    Regards,
        Christian
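
For readers comparing the two representations discussed in this thread, here is a minimal Python sketch outside RapidMiner. It is not the GSP plugin's code or any RapidMiner operator. The event layout (user ID, timestamp, action) and the names `group_by_user` and `pad_sequences` are assumptions made for illustration, since the real audit log carries more attributes. The first function keeps one variable-length event list per user, as Christian suggests. The second pads every list to the longest length with missing values, as Ingo describes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical event record: (user_id, timestamp, action).
# These three fields are assumed; they are not taken from the original log.
Event = Tuple[str, int, str]


def group_by_user(events: Iterable[Event]) -> Dict[str, List[str]]:
    """Collect each user's actions in timestamp order (variable length)."""
    per_user: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for user_id, timestamp, action in events:
        per_user[user_id].append((timestamp, action))
    # Sorting the (timestamp, action) pairs orders by time; ties fall back
    # to the action name.
    return {
        user: [action for _, action in sorted(items)]
        for user, items in per_user.items()
    }


def pad_sequences(
    sequences: Dict[str, List[str]]
) -> Dict[str, List[Optional[str]]]:
    """Fixed-width view: pad shorter sequences with None (missing values)."""
    width = max((len(seq) for seq in sequences.values()), default=0)
    return {
        user: seq + [None] * (width - len(seq))
        for user, seq in sequences.items()
    }


# Made-up sample events, deliberately out of order.
events = [
    ("u1", 3, "logout"),
    ("u2", 1, "login"),
    ("u1", 1, "login"),
    ("u1", 2, "view"),
]
sequences = group_by_user(events)
padded = pad_sequences(sequences)
```

The grouped form avoids missing values entirely, which matters if rarity has to be computed on the profiles later. The padded form is only needed when a downstream step requires a fixed number of attributes per user.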