"How can I Iterate by dividing the database"
Anki
New Altair Community Member
Hi all,
I am new to RM. I have a database (SQL Server) with 1 millon bank transactions of 1400 customers. I want to find out outliers (In case of Transaction amount) customerwise. I read all data usung batch processing into Database in RM.
Can anyone please suggest me, how can I get transactions data customer wise and Iterate the process for each customer.
Thank you
I am new to RM. I have a database (SQL Server) with 1 millon bank transactions of 1400 customers. I want to find out outliers (In case of Transaction amount) customerwise. I read all data usung batch processing into Database in RM.
Can anyone please suggest me, how can I get transactions data customer wise and Iterate the process for each customer.
Thank you
0
Answers
-
Hi Anki,
how are the customers identified in your database? Do they have some sort of ID? If so, you can simply loop over these IDs either using a list of all customer IDs (maybe generated by a simple SQL query) or using the "Loop Values" operator for the ID attribute (if you loaded the entire database to RapidMiner). Inside this loop you can use "Filter Examples" to consider only the transactions from this customer and then start your outlier detection.
I would prefer just loading the data from a single customer by defining a proper query when importing the data ("Read Database" operator allows this). In this case you don't need to filter the examples and start analysing the data instantly.
This is just a simple way you could iterate over all of the customers...
Regards
Matthias0 -
Hi Matthias,
Thank you verymuch for your reply. Your solution exactly as per my requirements.
Customer Ids are like this
CUST_ID TRAN_ID TRAN_AMT
------------------------------------------------------------------
25051610 TM2134 5400
25051610 TT3245 12390
25051610 CD9807 120
25051610 CD235 1298011
......
25051610 CF9087 12
833282731 ........... ..........
......
319021623
From your reply I got as ..First connect RM to read Database, Write Query ( Seclect ............Where Cust_ID=25051610). Outliers belong to that customer.
After that How can I repeat for every customer? This question may be silly, But please help me.....Because I am new to RM...I am unable to find the process to repeat.
Please help me
Thank you once again Matthias0 -
Hi Anki,
if you don't want to load the whole table from database at once, you can use two "Read Database" operators. I would try it this way: first "Read Database" delivers all customer IDs (SELECT DISTINCT Cust_ID ...). Then you can use the "Loop Examples" operator to execute the embedded process of this operator for every row/customer from your list. You will need to extract the current ID via "Extract Macro" and start a second database query ("Read Database" again) using the macro for the customer ID to filter the desired entries (WHERE clause).
Regards
Matthias0 -
Hi Matthias,
The solution is seems to be simple but I have struggled for whole day but I am unable to solve this, becuse I am new to RM and not so familier with these operators.
Can you please give me an example process to solve this, so that It will help me a lot.
Thank you Matthias
Yours
Anki0 -
Hi Anki,
just built a quick example without testing it, since I don't have the time to create some data. I described a process like this:<?xml version="1.0" encoding="UTF-8" standalone="no"?>
Your outlier detection has to be applied after the inner "Read Database" operator inside "Loop Examples". If you don't write back to database inside the loop, you will have to append the multiple example sets afterwards. But the "Append" operator is very memory consuming, so I would prefer incremental database storage.
<process version="5.1.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.008" expanded="true" name="Process">
<process expanded="true" height="607" width="773">
<operator activated="true" class="read_database" compatibility="5.1.008" expanded="true" height="60" name="Read Database" width="90" x="45" y="30">
<parameter key="connection" value="localDB"/>
<parameter key="query" value="SELECT DISTINCT `Cust_ID` FROM `tabellenname`"/>
<enumeration key="parameters"/>
</operator>
<operator activated="true" class="loop_examples" compatibility="5.1.008" expanded="true" height="94" name="Loop Examples" width="90" x="179" y="30">
<process expanded="true" height="607" width="773">
<operator activated="true" class="extract_macro" compatibility="5.1.008" expanded="true" height="60" name="Extract Macro" width="90" x="45" y="30">
<parameter key="macro" value="customer_id"/>
<parameter key="macro_type" value="data_value"/>
<parameter key="attribute_name" value="Cust_ID"/>
<parameter key="example_index" value="%{example}"/>
</operator>
<operator activated="true" class="read_database" compatibility="5.1.008" expanded="true" height="60" name="Read Database (2)" width="90" x="45" y="120">
<parameter key="connection" value="localDB"/>
<parameter key="query" value="SELECT * FROM `tabellenname` WHERE `Cust_ID`=%{customer_id}"/>
<enumeration key="parameters"/>
</operator>
<connect from_port="example set" to_op="Extract Macro" to_port="example set"/>
<connect from_op="Extract Macro" from_port="example set" to_port="example set"/>
<connect from_op="Read Database (2)" from_port="output" to_port="output 1"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_example set" spacing="0"/>
<portSpacing port="sink_output 1" spacing="72"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read Database" from_port="output" to_op="Loop Examples" to_port="example set"/>
<connect from_op="Loop Examples" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Regards
Matthias0 -
Hi Matthias,
Thank you verymuch. It is workig perfectly.
Thanks again for your valuable time
Yours
Anki0 -
Hi Matthies,
Thanks a lot for your continues support for me in solving this problem.
Here while I am trying to append the output into Database ( with write Database operator ), it showing error after 4 iterations (I checked using break point)
The error is like
Database error occured: Data Truncation
Error may be because of lack of previlages....
It is working fine when I tried to write into CSV files ( into different file each iteration time).
What could be the reason...
Thank you Matthias
Yours
Anki
0