iterate and extract according to value

vabm
New Altair Community Member
Hi,
I am new to RapidMiner and I am struggling with some simple iterations. I have a dataset that has different user ratings for a number of products, and I need to extract the best product for each user according to these ratings. The file looks something like this:
ID-user,ID-product,Rating
003,040,3
004,330,4
034,330,5
003,032,3
(...)
I can extract the best product for each user using something like: read_csv -> select attribute (set user id) -> sort (best to worst) -> filter examples (index=1), but this is really inconvenient if I have a lot of users to process.
I know this can be done with 'loop attributes' and macros, but I can't find an example to use as a guide.
Any help/guidance would be more than welcome, thanks!
Answers
Hi,
I think the easiest way to do this is to combine an Aggregate with a Join. See the attached process. Please be aware that this process produces two lines for a customer if two products share that customer's best rating. You can use either Remove Duplicates or another Aggregate to handle this.
~Martin
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.0.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.0.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" breakpoints="after" class="subprocess" compatibility="7.0.000" expanded="true" height="82" name="Subprocess" width="90" x="45" y="136">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.0.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.0.000" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.0.000" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
<list key="function_descriptions">
<parameter key="ID-user" value="round(rand()*50)"/>
<parameter key="ID-product" value="round(rand()*10)"/>
<parameter key="Rating" value="round(rand()*15)"/>
</list>
</operator>
<connect from_op="Retrieve Iris" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Generate a fitting example set</description>
</operator>
<operator activated="true" class="multiply" compatibility="7.0.000" expanded="true" height="103" name="Multiply" width="90" x="179" y="136"/>
<operator activated="true" class="aggregate" compatibility="7.0.000" expanded="true" height="82" name="Aggregate" width="90" x="313" y="85">
<list key="aggregation_attributes">
<parameter key="Rating" value="maximum"/>
</list>
<parameter key="group_by_attributes" value="ID-user"/>
</operator>
<operator activated="true" class="join" compatibility="7.0.000" expanded="true" height="82" name="Join" width="90" x="447" y="136">
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="ID-user" value="ID-user"/>
<parameter key="maximum(Rating)" value="Rating"/>
</list>
</operator>
<connect from_op="Subprocess" from_port="out 1" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Multiply" from_port="output 2" to_op="Join" to_port="right"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Join" from_port="join" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="105"/>
<portSpacing port="sink_result 2" spacing="42"/>
</process>
</operator>
</process>
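For readers who want to see the logic outside RapidMiner, here is a minimal Python sketch of the same idea: aggregate (maximum Rating per ID-user), join back on both ID-user and Rating, then collapse ties. It is not the process above, only an illustration; the sample rows and the helper names (`best_rating_per_user`, `join_on_user_and_rating`, `one_row_per_user`) are assumptions made up for this example, and user 003 is given a tie on purpose.

```python
# Illustrative rows in the thread's format: (ID-user, ID-product, Rating).
# The values are made up; user "003" has two products with the same top rating.
rows = [
    ("003", "040", 3),
    ("004", "330", 4),
    ("034", "330", 5),
    ("003", "032", 3),
    ("004", "040", 2),
]


def best_rating_per_user(rows):
    """Aggregate step: maximum Rating grouped by ID-user."""
    best = {}
    for user, _product, rating in rows:
        if user not in best or rating > best[user]:
            best[user] = rating
    return best


def join_on_user_and_rating(rows, best):
    """Join step: keep rows whose (ID-user, Rating) matches the aggregate."""
    return [row for row in rows if best[row[0]] == row[2]]


def one_row_per_user(joined):
    """Tie handling: keep only the first matching row for each user."""
    first_seen = {}
    for row in joined:
        first_seen.setdefault(row[0], row)
    return list(first_seen.values())


best = best_rating_per_user(rows)
joined = join_on_user_and_rating(rows, best)
unique = one_row_per_user(joined)

print(joined)  # user "003" appears twice because of the tie
print(unique)  # one row per user
```

Keeping the first row per user is an arbitrary choice here; which tied product to keep depends on your data.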
That's amazing, thank you so much. Is there any good book or tutorial list you could recommend? Sometimes it is difficult to understand the documentation.
And how do I mark this thread as solved?
Cheers
Well, in this special case it is very hard to recommend a book. Joins and aggregates are standard operations in a SQL database. If you are familiar with this way of thinking, you can creatively combine them.
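To make the SQL comparison concrete, here is a small sketch using Python's built-in sqlite3 module. The table name `ratings` and the column names `id_user`, `id_product` and `rating` are assumptions chosen for this example (the original attribute names contain hyphens, which are awkward in SQL), and the rows are illustrative only.

```python
import sqlite3

# In-memory table with the thread's three columns; names and rows are
# assumptions for illustration, not taken from a real dataset.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings (id_user TEXT, id_product TEXT, rating INTEGER)"
)
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [("003", "040", 3), ("004", "330", 4), ("034", "330", 5), ("003", "032", 3)],
)

# The subquery is the aggregate (maximum rating per user); the outer query
# joins it back onto the original rows using both user and rating as keys.
query = """
SELECT r.id_user, r.id_product, r.rating
FROM ratings AS r
JOIN (
    SELECT id_user, MAX(rating) AS max_rating
    FROM ratings
    GROUP BY id_user
) AS best
  ON r.id_user = best.id_user AND r.rating = best.max_rating
ORDER BY r.id_user, r.id_product
"""
best_rows = conn.execute(query).fetchall()
conn.close()

for row in best_rows:
    print(row)
```

As with the RapidMiner process, a user whose top rating is shared by several products comes back with one row per tied product.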
In general there are some references like:
https://rapidminer.com/resource/data-mining-masses/ - Very basic
http://www.amazon.com/Exploring-Data-RapidMiner-Andrew-Chisholm/dp/1782169334/ref=sr_1_3?s=books&ie=UTF8&qid=1454406443&sr=1-3&keywords=rapidminer - A bit more advanced, I think; Andrew also helps out here in the forums
http://www.amazon.com/Predictive-Analytics-Data-Mining-RapidMiner/dp/0128014601/ref=sr_1_1?s=books&ie=UTF8&qid=1454406443&sr=1-1&keywords=rapidminer - My favorite when it comes to learning predictive analytics without pure math
I was thinking about putting together some kind of guide with tips and tricks. I started to do so on my blog. Let's see, maybe I will create a document sometime soon.
~Martin
P.S.: My blog can be found at: http://data-analytics.ghost.io/
Thanks Martin!