I'm searching for the best way to do this and, being fairly new to data modelling, I would appreciate ideas or guidance!
I have two sets of events, A and B, both of which may be triggered by root cause events (set C, which I don't have). See the diagram below. The events in set A may (or may not) lead to events in set B. Set A contains around 10k possible distinct items (of which maybe 500 are particularly useful), and set B contains around 1,000 items. There is a time lag between A and B, and the closer an A event is to a B event in time, the more relevant the association. A and B are polynomials.
At present I want to develop a prediction model for A->B (what is likely to occur in B, given events in A?). However, if there is any way to determine the elements of C from A and B, I'm all ears.
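To make the A->B idea concrete, here is a minimal sketch of what I have in mind: score each (A item, B item) pair by how often B follows A, with closer pairs counting more. It is plain co-occurrence counting with an exponential decay, not a fitted model. The function name `lag_weighted_scores`, the `(timestamp, item)` tuple layout, and the parameters `max_lag` and `tau` are all assumptions made up for illustration.

```python
import math
from bisect import bisect_left
from collections import defaultdict


def lag_weighted_scores(a_events, b_events, max_lag=60.0, tau=20.0):
    """Score (a, b) pairs by time-decayed co-occurrence.

    a_events, b_events: lists of (timestamp, item) tuples (assumed layout).
    max_lag: ignore B events more than this long after an A event.
    tau: decay constant; smaller values favour short lags more strongly.
    Returns {a_item: {b_item: score}}, where score is the summed decay
    weight of b following a, divided by the number of occurrences of a.
    """
    b_sorted = sorted(b_events)
    b_times = [t for t, _ in b_sorted]

    a_counts = defaultdict(int)
    weights = defaultdict(lambda: defaultdict(float))

    for t_a, a in a_events:
        a_counts[a] += 1
        # Jump to the first B event at or after this A event.
        start = bisect_left(b_times, t_a)
        for t_b, b in b_sorted[start:]:
            lag = t_b - t_a
            if lag > max_lag:
                break  # B events are sorted, so the rest are too late
            weights[a][b] += math.exp(-lag / tau)

    return {
        a: {b: w / a_counts[a] for b, w in bs.items()}
        for a, bs in weights.items()
    }
```

The result is a ranking score, not a probability: it can exceed 1 if the same B item fires several times inside one window. For a given A item, sorting its inner dictionary by score would give the B items most likely to follow under these assumptions.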
I'm thinking that FP-growth may be a good starting point. Does anyone have experience with it?
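Since FP-growth works on transactions (baskets of items) rather than timestamped events, I assume the first step would be to turn the two event streams into baskets. Below is a rough sketch of one way to do that: one basket per B event, holding that B item plus every A item seen in the preceding time window. The function name `build_transactions`, the `window` parameter, the `(timestamp, item)` layout, and the `A:`/`B:` prefixes are my own placeholders, not part of any library.

```python
from bisect import bisect_left, bisect_right


def build_transactions(a_events, b_events, window=60.0):
    """Build one transaction per B event.

    Each transaction contains the B item plus every distinct A item that
    occurred within `window` time units before it (inclusive). Items are
    prefixed with "A:" or "B:" so the two sets cannot collide.
    B events with no preceding A event in the window are skipped.
    """
    a_sorted = sorted(a_events)
    a_times = [t for t, _ in a_sorted]

    transactions = []
    for t_b, b in sorted(b_events):
        lo = bisect_left(a_times, t_b - window)   # first A inside the window
        hi = bisect_right(a_times, t_b)           # last A at or before B
        items = {f"A:{a}" for _, a in a_sorted[lo:hi]}
        if items:
            transactions.append(sorted(items | {f"B:{b}"}))
    return transactions
```

The resulting list of lists could then be one-hot encoded and passed to an FP-growth implementation such as `fpgrowth` in `mlxtend.frequent_patterns`. One caveat with this construction: every basket contains a B item, so any confidence computed for a rule A->B would be inflated. Baskets for windows where A occurred but no B followed would need to be added to get meaningful confidence values.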
