"How to join TID"
wasylni
New Altair Community Member
Hello,
I am trying to find some association rules using RapidMiner 4.4.000
I am connecting to MySQL database and dig data from it by entering below SQL statements:
SELECT
`orders_products`.`orders_id` AS `TID`, `orders_products`.`products_name` AS
`ITEM`
FROM
`orders_products`;
after executing above I am getting data :
TID ITEM
1 Speed 2: Cruise Control
2 Red Corner
3 Speed 2: Cruise Control
4 Microsoft IntelliMouse Explorer
5 The Replacement Killers
5 Courage Under Fire
6 The Matrix
7 There's Something About Mary
7 Speed 2: Cruise Control
8 Under Siege 2 - Dark Territory
8 SWAT 3: Close Quarters Battle
.
.
.
up to 25000 records (15000 transactions)
Can someone advise how to distinct TID to have all products for given TID in one row (like below)
TID ITEM
1 Speed 2: Cruise Control
2 Red Corner
3 Speed 2: Cruise Control
4 Microsoft IntelliMouse Explorer
5 The Replacement Killers, Courage Under Fire
6 The Matrix
7 There's Something About Mary, Speed 2: Cruise Control
8 Under Siege 2 - Dark Territory, SWAT 3: Close Quarters Battle
???
Thanks,
wasyl
I am trying to find some association rules using RapidMiner 4.4.000
I am connecting to MySQL database and dig data from it by entering below SQL statements:
SELECT
`orders_products`.`orders_id` AS `TID`, `orders_products`.`products_name` AS
`ITEM`
FROM
`orders_products`;
after executing above I am getting data :
TID ITEM
1 Speed 2: Cruise Control
2 Red Corner
3 Speed 2: Cruise Control
4 Microsoft IntelliMouse Explorer
5 The Replacement Killers
5 Courage Under Fire
6 The Matrix
7 There's Something About Mary
7 Speed 2: Cruise Control
8 Under Siege 2 - Dark Territory
8 SWAT 3: Close Quarters Battle
.
.
.
up to 25000 records (15000 transactions)
Can someone advise how to distinct TID to have all products for given TID in one row (like below)
TID ITEM
1 Speed 2: Cruise Control
2 Red Corner
3 Speed 2: Cruise Control
4 Microsoft IntelliMouse Explorer
5 The Replacement Killers, Courage Under Fire
6 The Matrix
7 There's Something About Mary, Speed 2: Cruise Control
8 Under Siege 2 - Dark Territory, SWAT 3: Close Quarters Battle
???
Thanks,
wasyl
0
Answers
-
when Nominal2Binominal is applied I am getting
1.0 "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
2.0 "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
3.0 "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
4.0 "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
5.0 "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
5.0 "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
6.0 "false" "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
7.0 "false" "false" "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
7.0 "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
8.0 "false" "false" "false" "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
8.0 "false" "false" "false" "false" "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
9.0 "false" "false" "false" "false" "false" "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
9.0 "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
but still can not Distinct unique TID
0 -
Hi,
you have to use the [tt]Examples2AttributesPivoting[/tt] operator. Presumably, you also have to generate a column consisting of ones to acutally fill the newly created attributes with some values.
Kind regards,
Tobias0 -
Thanks,
can you give me some tip on how to get results like these below (pivoting does not work or I am doing something wrong)
example transaction 5
5.0 "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
5.0 "false" "false" "false" "false" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
i would like to get distinct data (in one row) like
5.0 "false" "false" "false" "true" "true" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false" "false"
0 -
Hello
I studied your problem and tried various approaches, but I wasnt successful, too. The best result I could achieve with the Example2AttributePivoting was this:
... giving data in csv format as in your first post.
<operator name="Root" class="Process" expanded="yes">
<operator name="CSVExampleSource" class="CSVExampleSource">
<parameter key="filename" value=""/>
</operator>
<operator name="Numerical2Polynominal" class="Numerical2Polynominal">
</operator>
<operator name="ValueIterator" class="ValueIterator" expanded="yes">
<parameter key="attribute" value="TID"/>
<operator name="MaterializeDataInMemory" class="MaterializeDataInMemory">
</operator>
<operator name="ExampleFilter" class="ExampleFilter">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="TID=%{a}"/>
</operator>
<operator name="IdTagging" class="IdTagging">
</operator>
</operator>
<operator name="ExampleSetMerge" class="ExampleSetMerge">
</operator>
<operator name="Example2AttributePivoting" class="Example2AttributePivoting">
<parameter key="group_attribute" value="TID"/>
<parameter key="index_attribute" value="id"/>
</operator>
</operator>
The result of this approach breaks the one column = one item - policy, which is not preferred in general.
So what now: The only thing I can suggest is to apply Nominal2Binominal, Sorting using TID and then your own selfwritten - operator which aggregates the rows using the boolean "OR"-operator for every TID for every column. Maybe there is an approach using only available operators, but I cannot think of any.
regards,
Steffen
PS @RapidMiner-Team: This is another usecase for scripting, isnt it ?
0 -
How can I build my own operator,
Can I apply somecomplex SQL/PLSQL statements or something similar to distinct mentioned above rows ?
regards,
wasylni0 -
This can be done entirely with MySQL's GROUP_CONCAT operator http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html#function_group-concat, and doesn't need RM at all (assuming I've understood the question correctly):
SELECT
`orders_products`.`orders_id` AS `TID`,
GROUP_CONCAT(`orders_products`.`products_name` ORDER BY 1 SEPARATOR ',') as `ITEM`
FROM
`orders_products`;
GROUP BY `orders_products`.`orders_id`
I use Oracle and MS SQL Server, which don't have GROUP_CONCAT, and so haven't tested this in MySQL, but the link above is from the MySQL documentation, so if it doesn't work, it should be easy to figure out why.
0 -
Hi,
finally I had another look into your issue. I have to correct my former advise to create a column of ones before applying the pivoting operator. Nevertheless you have to create kind of dummy column that has to have different values for each dataset row. Here is the code:
Hope that solves your problem,
<operator name="Root" class="Process" expanded="yes">
<operator name="CSVExampleSource" class="CSVExampleSource">
<parameter key="filename" value="tdata.csv"/>
</operator>
<operator name="IdTagging" class="IdTagging">
</operator>
<operator name="ChangeAttributeRole" class="ChangeAttributeRole">
<parameter key="name" value="id"/>
</operator>
<operator name="Example2AttributePivoting" class="Example2AttributePivoting">
<parameter key="group_attribute" value="TID"/>
<parameter key="index_attribute" value="ITEM"/>
</operator>
<operator name="Numerical2Polynominal" class="Numerical2Polynominal">
</operator>
<operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
<parameter key="condition_class" value="attribute_name_filter"/>
<parameter key="attribute_name_regex" value="TID"/>
<parameter key="invert_selection" value="true"/>
<operator name="Mapping" class="Mapping">
<parameter key="attributes" value=".*"/>
<list key="value_mappings">
</list>
<parameter key="replace_what" value="?"/>
<parameter key="replace_by" value="false"/>
<parameter key="add_default_mapping" value="true"/>
<parameter key="default_value" value="true"/>
</operator>
</operator>
</operator>
Tobias0 -
Hello,
Thanks for all your replies guys!
I have tested all proposed solutions to my problem and it looks like the best for further progres will be simplest solution proposed by keith,
as I am running out of memory on the one proposed by steffen, and there seems to be some problem with Tobias solution(or maybe its me ???).
What I have done is:
connected to MySQL database
I`ve applied sql<parameter key="query" value="SELECT `orders_products`.`orders_id` AS `TID`, GROUP_CONCAT(`orders_products`.`products_id` ORDER BY 1 SEPARATOR ',') as `ITEM` FROM `orders_products` GROUP BY `orders_products`.`orders_id`;"/>
and I have result as shown below:
<parameter key="label_attribute" value="ITEM"/>
<parameter key="id_attribute" value="TID"/>
Row No TID ITEM
1 1.0 18
2 2.0 14
3 3.0 18
4 4.0 26
5 5.0 4,16
6 6.0 6
7 7.0 18,19
8 8.0 10,21
9 9.0 7,14
10 10.0 26
11 11.0 12,17
12 12.0 12
13 13.0 1
14 14.0 2,14,23,25
.
.
.
up to transactions15000
can you please advise what do I have to apply next in order to get some rules of above data set?
Thanks,
wasylni
0 -
I think I might have found solution:
I will apply
1)GROUP_CONCAT `orders_products`.`products_id` as ITEM
2)split
3)apply FPGrowth
4)appl AM
what do you think ?<operator name="Root" class="Process" expanded="yes">
<operator name="DatabaseExampleSource" class="DatabaseExampleSource">
SESION DETAILS HERE
<parameter key="query" value="SELECT GROUP_CONCAT(`orders_products`.`products_id` ORDER BY 1 SEPARATOR ',') as `ITEM` FROM `orders_products` GROUP BY `orders_products`.`orders_id`;"/>
</operator>
<operator name="Split" class="Split">
<parameter key="attributes" value="ITEM"/>
<parameter key="apply_to_special_features" value="true"/>
<parameter key="split_mode" value="unordered_split"/>
</operator>
<operator name="FPGrowth" class="FPGrowth">
<parameter key="find_min_number_of_itemsets" value="false"/>
<parameter key="min_support" value="0.0010"/>
</operator>
<operator name="AssociationRuleGenerator" class="AssociationRuleGenerator">
<parameter key="keep_frequent_item_sets" value="true"/>
<parameter key="min_confidence" value="0.5"/>
</operator>
</operator>0 -
Hi,
if it works (?) this seems to be a solution to your problem! The steps after having loaded the data seems to be ok, if the data is loaded as you have posted.
Kind regards,
Tobias0