Data Preprocessing - Urgent help needed
fdhhaki
New Altair Community Member
Hello everybody,
I want to transform a given data set, but I don't know how to do it in RapidMiner.
Given data:
Client Date Product
Client 1 14.02.1980 Product 1
Client 1 14.02.1980 Product 2
Client 1 14.02.1980 Product 3
Client 1 14.02.1980 Product 4
Client 1 14.02.1980 Product 5
Client 1 13.02.1934 Product 1
Client 1 13.02.1934 Product 2
Client 1 13.02.1934 Product 3
Client 1 13.02.1934 Product 4
Client 3 14.02.1934 Product 1
Client 3 14.02.1934 Product 2
Client 3 14.02.1934 Product 3
Client 4 15.02.1934 Product 1
Client 4 15.02.1934 Product 2
Client 5 16.02.1934 Product 1
This is what I want:
Client Date Product 1 Product 2 Product 3 Product 4 Product 5
Client 1 14.02.1980 1 1 1 1 1
Client 1 13.02.1934 1 1 1 1 0
Client 3 14.02.1934 1 1 1 0 0
Client 4 15.02.1934 1 1 0 0 0
Client 5 16.02.1934 1 0 0 0 0
I would be very happy if someone could help me. It is very urgent and important.
Greetings!
Answers
-
Use the Nominal to Binominal operator,
found under Data Transformation, Type Conversion.
-
Hi,
Thank you for your answer.
Yes, I can use it for type conversion.
But how can I get this table structure?
For example: if a client buys some articles on 14.02.1980, that represents one data set (one row).
If the same client buys articles on another day, that represents another data set, and so on.
Can you show me an example workflow in RapidMiner?
Thanks!!
-
Hi there,
I think this may help..
Here's the data...
Client, Date, Product
Client 1, 14.02.1980, Product 1
Client 1, 14.02.1980, Product 2
Client 1, 14.02.1980, Product 3
Client 1, 14.02.1980, Product 4
Client 1, 14.02.1980, Product 5
Client 1, 13.02.1934, Product 1
Client 1, 13.02.1934, Product 2
Client 1, 13.02.1934, Product 3
Client 1, 13.02.1934, Product 4
Client 3, 14.02.1934, Product 1
Client 3, 14.02.1934, Product 2
Client 3, 14.02.1934, Product 3
Client 4, 15.02.1934, Product 1
Client 4, 15.02.1934, Product 2
Client 5, 16.02.1934, Product 1
And here's the code..
Hope so, good weekend to all!
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
<process expanded="true" height="251" width="748">
<operator activated="true" class="read_csv" compatibility="5.0.11" expanded="true" height="60" name="Read CSV" width="90" x="46" y="58">
<parameter key="file_name" value="C:\Documents and Settings\Administrator.KNOWLEDG-P6715Y\My Documents\RM5\a.csv"/>
<parameter key="column_separators" value=","/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="nominal_to_binominal" compatibility="5.0.11" expanded="true" height="94" name="Nominal to Binominal" width="90" x="179" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Product"/>
<parameter key="use_underscore_in_name" value="true"/>
</operator>
<operator activated="true" class="nominal_to_numerical" compatibility="5.0.11" expanded="true" height="94" name="Nominal to Numerical" width="90" x="313" y="30">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="Pr.*"/>
</operator>
<operator activated="true" class="aggregate" compatibility="5.0.11" expanded="true" height="76" name="Aggregate" width="90" x="447" y="30">
<list key="aggregation_attributes">
<parameter key="Product_Product 1" value="sum"/>
<parameter key="Product_Product 2" value="sum"/>
<parameter key="Product_Product 3" value="sum"/>
<parameter key="Product_Product 4" value="sum"/>
<parameter key="Product_Product 5" value="sum"/>
</list>
<parameter key="group_by_attributes" value="Client|Date"/>
</operator>
<operator activated="true" class="rename_by_replacing" compatibility="5.0.11" expanded="true" height="76" name="Rename by Replacing" width="90" x="581" y="30">
<parameter key="replace_what" value="sum\(Product_|\)"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
<connect from_op="Nominal to Binominal" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
<connect from_op="Nominal to Numerical" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
<connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
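For anyone who wants to check the logic outside RapidMiner, here is a minimal Python sketch of what I understand the process above to be doing: one column per product, then a sum per Client/Date group. It is not RapidMiner code. The `rows` list is a hand-typed copy of the data above, and `pivot_counts` is a made-up helper name, so treat it as an illustration only.

```python
from collections import defaultdict

# Hand-typed copy of the sample data (Client, Date, Product); an assumption, not a.csv itself
rows = [
    ("Client 1", "14.02.1980", "Product 1"), ("Client 1", "14.02.1980", "Product 2"),
    ("Client 1", "14.02.1980", "Product 3"), ("Client 1", "14.02.1980", "Product 4"),
    ("Client 1", "14.02.1980", "Product 5"), ("Client 1", "13.02.1934", "Product 1"),
    ("Client 1", "13.02.1934", "Product 2"), ("Client 1", "13.02.1934", "Product 3"),
    ("Client 1", "13.02.1934", "Product 4"), ("Client 3", "14.02.1934", "Product 1"),
    ("Client 3", "14.02.1934", "Product 2"), ("Client 3", "14.02.1934", "Product 3"),
    ("Client 4", "15.02.1934", "Product 1"), ("Client 4", "15.02.1934", "Product 2"),
    ("Client 5", "16.02.1934", "Product 1"),
]

def pivot_counts(rows):
    """Return (products, table): one entry per (client, date), one count per product."""
    # Every distinct product becomes a column, like the dummy attributes in the process
    products = sorted({product for _, _, product in rows})
    # Each new (client, date) group starts with a count of 0 in every product column
    table = defaultdict(lambda: dict.fromkeys(products, 0))
    for client, date, product in rows:
        # Summing the dummy values per group is the same as counting occurrences
        table[(client, date)][product] += 1
    return products, dict(table)

products, table = pivot_counts(rows)
for (client, date), counts in table.items():
    print(client, date, [counts[p] for p in products])
```

With this sample every product appears at most once per client and date, so the sums are already 0 or 1.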
-
Good, thank you very much, haddock!
Have a good weekend!
-
My pleasure!
Thanks for acknowledging, far too often folks don't bother to do that.
Have fun!
PS After I posted I thought it might be better to aggregate the data, here's how...
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
<process expanded="true" height="251" width="748">
<operator activated="true" class="read_csv" compatibility="5.0.11" expanded="true" height="60" name="Read CSV" width="90" x="46" y="58">
<parameter key="file_name" value="C:\Documents and Settings\Administrator.KNOWLEDG-P6715Y\My Documents\RM5\a.csv"/>
<parameter key="column_separators" value=","/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="nominal_to_binominal" compatibility="5.0.11" expanded="true" height="94" name="Nominal to Binominal" width="90" x="179" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Product"/>
<parameter key="use_underscore_in_name" value="true"/>
</operator>
<operator activated="true" class="nominal_to_numerical" compatibility="5.0.11" expanded="true" height="94" name="Nominal to Numerical" width="90" x="313" y="30">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="Pr.*"/>
</operator>
<operator activated="true" class="aggregate" compatibility="5.0.11" expanded="true" height="76" name="Aggregate" width="90" x="447" y="30">
<list key="aggregation_attributes">
<parameter key="Product_Product 1" value="sum"/>
<parameter key="Product_Product 2" value="sum"/>
<parameter key="Product_Product 3" value="sum"/>
<parameter key="Product_Product 4" value="sum"/>
<parameter key="Product_Product 5" value="sum"/>
</list>
<parameter key="group_by_attributes" value="Client|Date"/>
</operator>
<operator activated="true" class="rename_by_replacing" compatibility="5.0.11" expanded="true" height="76" name="Rename by Replacing" width="90" x="581" y="30">
<parameter key="replace_what" value="sum\(Product_|\)"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
<connect from_op="Nominal to Binominal" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
<connect from_op="Nominal to Numerical" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
<connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
-
Hello,
The workflow works correctly, but there is a small problem.
Here is the data:
Client Date Product
Client 1 14.02.1980 Product 1
Client 1 14.02.1980 Product 1
Client 1 14.02.1980 Product 1
Client 1 15.02.1980 Product 1
Client 1 14.02.1980 Product 3
Client 1 14.02.1980 Product 4
Client 1 14.02.1980 Product 5
Client 1 13.02.1934 Product 1
Client 1 13.02.1934 Product 2
Client 1 13.02.1934 Product 3
Client 1 13.02.1934 Product 4
Client 3 14.02.1934 Product 1
Client 3 14.02.1934 Product 2
Client 3 14.02.1934 Product 3
Client 4 15.02.1934 Product 1
Client 4 15.02.1934 Product 2
Client 5 16.02.1934 Product 1
The output of the workflow is:
Row Client Date Prod.1 Prod.2 Prod.3 Prod.4 Prod.5
1 Client 1 14.02.1980 3.0 0.0 1.0 1.0 1.0
2 Client 1 15.02.1980 1.0 0.0 0.0 0.0 0.0
3 Client 1 13.02.1934 1.0 1.0 1.0 1.0 0.0
4 Client 3 14.02.1934 1.0 1.0 1.0 0.0 0.0
5 Client 4 15.02.1934 1.0 1.0 0.0 0.0 0.0
6 Client 5 16.02.1934 1.0 0.0 0.0 0.0 0.0
It aggregates the number of products, so Product 1 gets 3.0 for Client 1 on 14.02.1980.
What I want is:
1. If a customer buys several products on the same date, that is one data set (transaction) in the table, so that I know which products were bought together. The workflow does this correctly.
2. If the same client buys on another day, that is a new row and a new transaction. So the grouping by date is also correct here.
MY PROBLEM IS:
I don't want the sum per product. I just want a "1" for bought and a "0" for not bought.
How can I do this, haddock?
-
Hi there,
You can add a Discretize by User Specification operator, which puts values in bands: here a sum of 0 goes into class "0" and anything above that into class "1". Like this...
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
<process expanded="true" height="251" width="815">
<operator activated="true" class="read_csv" compatibility="5.0.11" expanded="true" height="60" name="Read CSV" width="90" x="46" y="58">
<parameter key="file_name" value="C:\Documents and Settings\Administrator.KNOWLEDG-P6715Y\My Documents\RM5\a.csv"/>
<parameter key="column_separators" value=","/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="nominal_to_binominal" compatibility="5.0.11" expanded="true" height="94" name="Nominal to Binominal" width="90" x="179" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Product"/>
<parameter key="use_underscore_in_name" value="true"/>
</operator>
<operator activated="true" class="nominal_to_numerical" compatibility="5.0.11" expanded="true" height="94" name="Nominal to Numerical" width="90" x="313" y="30">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="Pr.*"/>
</operator>
<operator activated="true" class="aggregate" compatibility="5.0.11" expanded="true" height="76" name="Aggregate" width="90" x="447" y="30">
<list key="aggregation_attributes">
<parameter key="Product_Product 1" value="sum"/>
<parameter key="Product_Product 2" value="sum"/>
<parameter key="Product_Product 3" value="sum"/>
<parameter key="Product_Product 4" value="sum"/>
<parameter key="Product_Product 5" value="sum"/>
</list>
<parameter key="group_by_attributes" value="Client|Date"/>
</operator>
<operator activated="true" class="rename_by_replacing" compatibility="5.0.11" expanded="true" height="76" name="Rename by Replacing" width="90" x="581" y="30">
<parameter key="replace_what" value="sum\(Product_|\)"/>
</operator>
<operator activated="true" class="discretize_by_user_specification" compatibility="5.0.11" expanded="true" height="94" name="Discretize" width="90" x="701" y="34">
<list key="classes">
<parameter key="0" value="0.0"/>
<parameter key="1" value="Infinity"/>
</list>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
<connect from_op="Nominal to Binominal" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
<connect from_op="Nominal to Numerical" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
<connect from_op="Rename by Replacing" from_port="example set output" to_op="Discretize" to_port="example set input"/>
<connect from_op="Discretize" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
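If it helps to see the banding step on its own, here is a small Python sketch of the same idea, outside RapidMiner: sum per Client/Date group, then map every sum above 0 to 1. The `rows` list is a hand-typed copy of the data with the repeated Product 1 purchases, and `pivot_flags` is a hypothetical helper name, so this is only an illustration of the logic, not a replacement for the process.

```python
from collections import defaultdict

# Hand-typed copy of the data with repeated purchases (an assumption, not a.csv itself)
rows = [
    ("Client 1", "14.02.1980", "Product 1"), ("Client 1", "14.02.1980", "Product 1"),
    ("Client 1", "14.02.1980", "Product 1"), ("Client 1", "15.02.1980", "Product 1"),
    ("Client 1", "14.02.1980", "Product 3"), ("Client 1", "14.02.1980", "Product 4"),
    ("Client 1", "14.02.1980", "Product 5"), ("Client 1", "13.02.1934", "Product 1"),
    ("Client 1", "13.02.1934", "Product 2"), ("Client 1", "13.02.1934", "Product 3"),
    ("Client 1", "13.02.1934", "Product 4"), ("Client 3", "14.02.1934", "Product 1"),
    ("Client 3", "14.02.1934", "Product 2"), ("Client 3", "14.02.1934", "Product 3"),
    ("Client 4", "15.02.1934", "Product 1"), ("Client 4", "15.02.1934", "Product 2"),
    ("Client 5", "16.02.1934", "Product 1"),
]

def pivot_flags(rows):
    """Return (products, flags): 1 if the product was bought in that transaction, else 0."""
    products = sorted({product for _, _, product in rows})
    # Step 1: sum per (client, date) group, as the Aggregate step does
    sums = defaultdict(lambda: dict.fromkeys(products, 0))
    for client, date, product in rows:
        sums[(client, date)][product] += 1
    # Step 2: band the sums, 0 stays 0 and anything above 0 becomes 1
    flags = {
        key: {product: int(count > 0) for product, count in row.items()}
        for key, row in sums.items()
    }
    return products, flags

products, flags = pivot_flags(rows)
for (client, date), row in flags.items():
    print(client, date, [row[p] for p in products])
```

Taking a maximum per group instead of a sum would give the same 0/1 values in this sketch; whether that is more convenient inside RapidMiner is something I have not checked here.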