Mining Sequential Association rules / Sequential Pattern Mining
I am a little bit confused about whether we have the same thing in mind. I want to extract association rules from data with a given date, like the following:
Customer-ID, Product, Date
10150, softdrink, 1.5.2010
10150, fruitveg, 1.5.2010
10236, frozenmeal, 1.5.2010
10236, beer, 15.5.2010
10360, fish, 21.6.2010
10360, cannedveg, 21.6.2010
10360, beer, 26.6.2010
And I need association rules like:
"If customer A buys fish and cannedveg on 21.6.2010, then he will buy beer on 26.6.2010."
Fish and cannedveg on 21.6 => beer on 26.6
If you have misunderstood me, I must apologize for that.
Greetings
Lotus
Hmm, this is a very hard problem, because your hypothesis space is extremely large:
"If a person buys some things at some time, how will this affect his buying in the future?"
A more specific hypothesis:
"If a person buys some things at some time, how will this affect his buying the next time he enters the shop?"
This problem is still hard, but more manageable than the first.
To solve this problem I would convert the data.
Edit: This might be possible in RapidMiner using the Windowing operator, but it is tricky.
ID, softdrink, fruitveg, frozenmeal, fish, cannedveg, beer, softdrink2, fruitveg2, frozenmeal2, fish2, cannedveg2, beer2
10150, 1, 1, 0, 0, 0, 0, ?, ?, ?, ?, ?, ? (this customer buys softdrink and fruitveg, no info on the next visit)
10236, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1 (this customer buys frozenmeal, next time beer)
10236, 0, 0, 0, 0, 0, 1, ?, ?, ?, ?, ?, ? (second visit of the same customer, no info on the visit after it)
10360, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1 (this customer buys fish and cannedveg, next time beer)
10360, 0, 0, 0, 0, 0, 1, ?, ?, ?, ?, ?, ? (second visit of the same customer, again no info on the visit after it)
(The columns ending in 2 describe the basket of the customer's next visit. You did not give me much data, so you get a lot of ? symbols.)
You can run any unsupervised learning algorithm on this data.
If you want to solve the more general problem ("If a person buys some things at some time, how will this affect his buying in the future?"),
you will get many more attributes in your dataset. It is possible, but unlikely to yield good results.
Edit:
You might also want to add the attribute "number of days since last visit"
to account for the fact that shop visits do not occur at equal intervals.
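A minimal Python sketch of this conversion, using only the seven rows from the first post as input. The function name build_next_visit_table and the column name days_between_visits are made up for illustration; "?" marks a missing value as in the table above. This is one possible way to do it under those assumptions, not a RapidMiner workflow.

```python
from collections import defaultdict
from datetime import datetime

# Transactions copied from the first post: (customer id, product, date).
TRANSACTIONS = [
    (10150, "softdrink", "1.5.2010"),
    (10150, "fruitveg", "1.5.2010"),
    (10236, "frozenmeal", "1.5.2010"),
    (10236, "beer", "15.5.2010"),
    (10360, "fish", "21.6.2010"),
    (10360, "cannedveg", "21.6.2010"),
    (10360, "beer", "26.6.2010"),
]

PRODUCTS = ["softdrink", "fruitveg", "frozenmeal", "fish", "cannedveg", "beer"]


def build_next_visit_table(transactions, products):
    """Return one row per shop visit: current basket, next basket, day gap."""
    # Group products into baskets keyed by (customer, visit date).
    baskets = defaultdict(set)
    for customer, product, date_text in transactions:
        date = datetime.strptime(date_text, "%d.%m.%Y").date()
        baskets[(customer, date)].add(product)

    # Collect the visit dates of each customer in chronological order.
    visits = defaultdict(list)
    for customer, date in sorted(baskets):
        visits[customer].append(date)

    rows = []
    for customer, dates in visits.items():
        for i, date in enumerate(dates):
            row = {"ID": customer}
            current = baskets[(customer, date)]
            for product in products:
                row[product] = 1 if product in current else 0
            if i + 1 < len(dates):
                # A later visit exists: describe it in the "...2" columns.
                following = baskets[(customer, dates[i + 1])]
                for product in products:
                    row[product + "2"] = 1 if product in following else 0
                # Hypothetical extra attribute: days between the two visits.
                row["days_between_visits"] = (dates[i + 1] - date).days
            else:
                # No later visit is known, so everything about it is missing.
                for product in products:
                    row[product + "2"] = "?"
                row["days_between_visits"] = "?"
            rows.append(row)
    return rows


if __name__ == "__main__":
    for row in build_next_visit_table(TRANSACTIONS, PRODUCTS):
        print(row)
```

Each customer's last known visit still produces a row, but with "?" in all next-visit columns, which matches the table above.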
OK, for my task the two definitions of the problem are equivalent. So I choose your second, more manageable one:
"If a person buys some things at some time, how will this affect his buying the next time he enters the shop?"
Do you do this conversion with the Windowing operator?
If I understand it correctly, you get a new basket for every new date. Is that correct?
And then I should run FP-Growth on that data?
Could you give me a workflow for this? I have never used the Windowing operator...
Greetings, Lotus
I tried but could not get it to work in RapidMiner. I normally use Python for preprocessing like this.
Maybe this process can help; it is from the RapidMiner market basket template:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="558" width="567">
<operator activated="true" breakpoints="after" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30"/>
<operator activated="true" class="set_macro" expanded="true" height="76" name="Define Item Count" width="90" x="179" y="30">
<parameter key="macro" value="itemCountAttributeName"/>
<parameter key="value" value="itemCount"/>
</operator>
<operator activated="true" class="set_macro" expanded="true" height="76" name="Define Customer" width="90" x="313" y="30">
<parameter key="macro" value="customerIdAttributeName"/>
<parameter key="value" value="customerId"/>
</operator>
<operator activated="true" class="set_macro" expanded="true" height="76" name="Define Item" width="90" x="447" y="30">
<parameter key="macro" value="itemIdAttributeName"/>
<parameter key="value" value="itemId"/>
</operator>
<operator activated="true" class="aggregate" expanded="true" height="76" name="Aggregate" width="90" x="45" y="210">
<list key="aggregation_attributes">
<parameter key="%{itemCountAttributeName}" value="sum"/>
</list>
<parameter key="group_by_attributes" value="%{customerIdAttributeName}|%{itemIdAttributeName}"/>
</operator>
<operator activated="true" class="pivot" expanded="true" height="76" name="Pivot" width="90" x="179" y="210">
<parameter key="group_attribute" value="%{customerIdAttributeName}"/>
<parameter key="index_attribute" value="%{itemIdAttributeName}"/>
</operator>
<operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="210">
<parameter key="name" value="%{customerIdAttributeName}"/>
<parameter key="target_role" value="id"/>
</operator>
<operator activated="true" class="fp_growth" expanded="true" height="76" name="FP-Growth" width="90" x="447" y="210"/>
<operator activated="true" class="create_association_rules" expanded="true" height="60" name="Create Association Rules" width="90" x="447" y="345"/>
<connect from_op="Retrieve" from_port="output" to_op="Define Item Count" to_port="through 1"/>
<connect from_op="Define Item Count" from_port="through 1" to_op="Define Customer" to_port="through 1"/>
<connect from_op="Define Customer" from_port="through 1" to_op="Define Item" to_port="through 1"/>
<connect from_op="Define Item" from_port="through 1" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Pivot" to_port="example set input"/>
<connect from_op="Pivot" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
<connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
<connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
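For anyone who prefers to follow the steps in Python, here is a rough sketch of what the Aggregate and Pivot operators in this process do before FP-Growth runs. The attribute names customerId, itemId and itemCount are the ones set by the macros in the process; the sample rows, the function name aggregate_and_pivot and the choice of 0 for absent customer/item pairs are my assumptions for illustration. FP-Growth itself is not reproduced. Note that the process groups only by customer and item, so purchase dates play no role in it.

```python
from collections import defaultdict

# Hypothetical input rows using the attribute names from the macros.
# The values are illustrative only and are not taken from a real data set.
ROWS = [
    {"customerId": 10150, "itemId": "softdrink", "itemCount": 1},
    {"customerId": 10150, "itemId": "fruitveg", "itemCount": 2},
    {"customerId": 10150, "itemId": "softdrink", "itemCount": 3},
    {"customerId": 10236, "itemId": "beer", "itemCount": 1},
]


def aggregate_and_pivot(rows):
    """Sum itemCount per (customer, item), then build a customer x item table."""
    # Aggregate step: one total per (customerId, itemId) pair.
    totals = defaultdict(int)
    for row in rows:
        totals[(row["customerId"], row["itemId"])] += row["itemCount"]

    # Pivot step: one row per customer, one column per item.
    # Assumption: pairs that never occur are filled with 0.
    items = sorted({item for _, item in totals})
    table = {}
    for (customer, item), count in totals.items():
        table.setdefault(customer, {name: 0 for name in items})[item] = count
    return table


if __name__ == "__main__":
    for customer, counts in aggregate_and_pivot(ROWS).items():
        print(customer, counts)
```

The resulting table has one row per customer, which is the plain market basket layout; it does not yet say anything about the order of purchases.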
Ah cool, I googled the paper:
Ramakrishnan Srikant, Rakesh Agrawal (1996). Mining Sequential Patterns: Generalizations and Performance Improvements.
What should the input to RapidMiner look like?
Like figure 1, like figure 2, or something else?
http://img441.imageshack.us/img441/9206/inputx.jpg

For my task, the input in figure 1 is the most suitable.
greetings
Lotus
______________________
@ B_Miner
The problem is that this would mean using additional algorithms from Weka, and I can only use the algorithms from RapidMiner (this is a requirement of the task).
And now it looks like RapidMiner can't do a sequential pattern analysis...
But thanks for the tip.
greetings
Lotus
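For a data set as small as the one in the first post, a single candidate rule such as "fish and cannedveg => beer at a later visit" can at least be checked by hand or with a few lines of Python. The sketch below is a brute-force count for one given rule, not the algorithm from the Srikant and Agrawal paper and not a RapidMiner operator. The function names customer_sequences and rule_stats are made up for illustration, and support is taken here as the share of all customers whose purchase history matches the whole rule.

```python
from collections import defaultdict
from datetime import datetime

# Transactions copied from the first post: (customer id, product, date).
TRANSACTIONS = [
    (10150, "softdrink", "1.5.2010"),
    (10150, "fruitveg", "1.5.2010"),
    (10236, "frozenmeal", "1.5.2010"),
    (10236, "beer", "15.5.2010"),
    (10360, "fish", "21.6.2010"),
    (10360, "cannedveg", "21.6.2010"),
    (10360, "beer", "26.6.2010"),
]


def customer_sequences(transactions):
    """Map each customer to a date-ordered list of baskets (sets of products)."""
    baskets = defaultdict(set)
    for customer, product, date_text in transactions:
        date = datetime.strptime(date_text, "%d.%m.%Y").date()
        baskets[(customer, date)].add(product)
    sequences = defaultdict(list)
    for customer, date in sorted(baskets):
        sequences[customer].append(baskets[(customer, date)])
    return sequences


def rule_stats(sequences, antecedent, consequent):
    """Support and confidence of 'antecedent in one visit => consequent later'."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = 0
    with_both = 0
    for baskets in sequences.values():
        # Index of the earliest visit whose basket contains the antecedent.
        first = next((i for i, b in enumerate(baskets) if antecedent <= b), None)
        if first is None:
            continue
        with_antecedent += 1
        # The consequent must appear in a strictly later visit.
        if any(consequent <= b for b in baskets[first + 1:]):
            with_both += 1
    support = with_both / len(sequences) if sequences else 0.0
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


if __name__ == "__main__":
    sequences = customer_sequences(TRANSACTIONS)
    print(rule_stats(sequences, {"fish", "cannedveg"}, {"beer"}))
```

On the seven example rows only customer 10360 matches the rule, so one of three customers supports it and every customer who bought fish and cannedveg later bought beer. With so little data these numbers say nothing about real buying behaviour.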