Modelling events with loose association

I'm searching for the best way to do this and, being fairly new to data modelling, I would appreciate ideas or guidance!

I have two sets of events, A and B, both of which may be triggered by root cause events (set C, which I don't have). See the diagram below. The events in set A may (or may not) lead to events in set B. Set A contains around 10k possible distinct items (of which maybe 500 are particularly useful), and set B contains around 1000 items. There is a time lag between A and B, and the closer in time an A event is to a B event, the more relevant the association. Both A and B are polynominal (multi-valued nominal) attributes.

At present I want to develop a prediction model for A->B (what is likely to occur in B given events in A?). However, if there is any way to determine the elements of C from A and B... I'm all ears.

I'm thinking that FP-growth may be a good starting point. Anyone with experience of this?

Thomas_Ott

So this sounds like market basket analysis, provided your C set does have a connection with the A and B sets.

So what you'll need to do is load your C data set that contains A and B instances and use a Numerical to Binominal (or some other conversion operator) to set your data set to trues and falses. Then feed it into the FP-Growth algorithm and the Create Association Rules operator.
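Outside of the operators, the same idea can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the transactions and item names are made up, and a simple pair count stands in for FP-Growth itself (it finds frequent pairs only, not larger itemsets). It shows the true/false conversion followed by rule generation with support and confidence thresholds.

```python
from itertools import combinations
from collections import Counter

# Hypothetical transactions: each one pools the A and B events observed
# together (for example, in one time window). Item names are made up.
transactions = [
    {"A1", "A2", "B1"},
    {"A1", "B1"},
    {"A2", "B2"},
    {"A1", "A2", "B1"},
]

# Step 1: binarize, giving one true/false column per distinct item.
items = sorted(set().union(*transactions))
binary_rows = [{item: item in t for item in items} for t in transactions]

# Step 2: count single items and item pairs. This is a stand-in for
# frequent itemset mining, not an FP-Growth implementation.
n = len(binary_rows)
item_counts = Counter()
pair_counts = Counter()
for row in binary_rows:
    present = [item for item in items if row[item]]
    item_counts.update(present)
    pair_counts.update(combinations(present, 2))

# Step 3: turn frequent pairs into rules, filtered by support and confidence.
def pair_rules(min_support=0.5, min_confidence=0.7):
    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for antecedent, consequent in ((x, y), (y, x)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)

for antecedent, consequent, support, confidence in pair_rules():
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

With 10k distinct A items the true/false table gets very wide, so in practice you would probably restrict it to the few hundred items that matter before mining.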

Telcontar120

But if you really don't have any access to events "C" (as your original post implies) and it is instead some kind of hypothetical root cause, then you will have to model directly on A and B, which you can also do using FP-Growth as @Thomas_Ott explains.

mizunooto

Thanks. As mentioned, I don't have access to, or information about, set C. The FP-growth algorithm might be appropriate, though it makes for a lot of binominal fields from my long list of polynominal data items. The other thing is that FP-growth seems to look at one set of data and try to find associations within that set (potential combinations of items within a transaction), rather than associations between items in different sets. I guess what I'm really looking for is clusters of A relating to clusters of B, but I'm not sure if there is an appropriate model to use here.
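To make the cross-set idea concrete, here is a rough sketch of one way to score A -> B pairs directly rather than mining within a single basket. Everything in it is an assumption for illustration: the event log, the `max_lag` window, and the `half_life` decay are made-up choices, and this is a simple time-weighted co-occurrence score, not FP-growth or a clustering method. Each A event that precedes a B event within the window contributes a weight that halves every `half_life` time units, so closer pairs count for more.

```python
from collections import defaultdict

# Hypothetical event log: (timestamp, set, item). All values are made up.
events = [
    (0, "A", "a1"), (2, "B", "b1"),
    (10, "A", "a2"), (11, "A", "a1"), (12, "B", "b1"),
    (20, "A", "a2"), (29, "B", "b2"),
    (40, "A", "a1"),
]

def lagged_scores(events, max_lag=10.0, half_life=3.0):
    """Score each (a, b) pair by time-decayed co-occurrence of a before b."""
    a_events = [(t, item) for t, s, item in events if s == "A"]
    b_events = [(t, item) for t, s, item in events if s == "B"]

    # How often each A item occurs at all, used for normalisation below.
    a_counts = defaultdict(int)
    for _, a in a_events:
        a_counts[a] += 1

    weights = defaultdict(float)
    for tb, b in b_events:
        for ta, a in a_events:
            lag = tb - ta
            # Only A events that precede B within the window contribute.
            if 0 < lag <= max_lag:
                weights[(a, b)] += 0.5 ** (lag / half_life)

    # Divide by the A item's frequency so common A items are not favoured
    # simply for being common (a confidence-like score).
    return {pair: w / a_counts[pair[0]] for pair, w in weights.items()}

scores = lagged_scores(events)
for (a, b), score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{a} -> {b}: {score:.3f}")
```

The nested loop is quadratic in the number of events, which is fine for a sketch; on a real log you would sort by timestamp and only scan the A events inside each window.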