iterate and extract according to value

vabm · February 2016

Hi,
I am new to RapidMiner and I am struggling with some simple iterations. I have a dataset that contains user ratings for a number of products, and I need to extract the best-rated product for each user. The file looks something like this:


ID-user,ID-product,Rating
003,040,3
004,330,4
034,330,5
003,032,3
(...)

I can extract the best product for a single user using something like: read_csv -> select attribute (set user id) -> sort (best to worst) -> filter examples (index=1), but this is really inconvenient if I have a lot of users to process.
I know this can be done with 'loop attributes' and macros, but I can't find an example to use as a guide.

Any help/guidance would be more than welcome, thanks !!

MartinLiebig · February 2016

Hi,

I think the easiest way to do this is to combine an Aggregate with a Join. See the attached process. Please be aware that this process produces two lines for a customer if two products share that customer's best rating. You can use either Remove Duplicates or another Aggregate to handle this.

~Martin


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.0.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.0.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" breakpoints="after" class="subprocess" compatibility="7.0.000" expanded="true" height="82" name="Subprocess" width="90" x="45" y="136">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.0.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="7.0.000" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="7.0.000" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
            <list key="function_descriptions">
              <parameter key="ID-user" value="round(rand()*50)"/>
              <parameter key="ID-product" value="round(rand()*10)"/>
              <parameter key="Rating" value="round(rand()*15)"/>
            </list>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Generate a fitting example set</description>
      </operator>
      <operator activated="true" class="multiply" compatibility="7.0.000" expanded="true" height="103" name="Multiply" width="90" x="179" y="136"/>
      <operator activated="true" class="aggregate" compatibility="7.0.000" expanded="true" height="82" name="Aggregate" width="90" x="313" y="85">
        <list key="aggregation_attributes">
          <parameter key="Rating" value="maximum"/>
        </list>
        <parameter key="group_by_attributes" value="ID-user"/>
      </operator>
      <operator activated="true" class="join" compatibility="7.0.000" expanded="true" height="82" name="Join" width="90" x="447" y="136">
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="ID-user" value="ID-user"/>
          <parameter key="maximum(Rating)" value="Rating"/>
        </list>
      </operator>
      <connect from_op="Subprocess" from_port="out 1" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Join" to_port="right"/>
      <connect from_op="Aggregate" from_port="example set output" to_op="Join" to_port="left"/>
      <connect from_op="Join" from_port="join" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="105"/>
      <portSpacing port="sink_result 2" spacing="42"/>
    </process>
  </operator>
</process>
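
For readers who want to check the logic outside RapidMiner, here is a minimal Python sketch of what the process above does: compute the maximum Rating per ID-user (the Aggregate), then keep the original rows that match on user and rating (the Join). The sample rows follow the layout from the question; the last row and the helper names `best_products` and `drop_ties` are made up for illustration and are not part of the process.

```python
import csv
import io

# Hypothetical sample in the layout from the question (last row is made up).
SAMPLE = """ID-user,ID-product,Rating
003,040,3
004,330,4
034,330,5
003,032,3
004,111,2
"""


def best_products(rows):
    """Return every row whose Rating equals its user's maximum Rating."""
    rows = list(rows)
    # "Aggregate": maximum Rating per ID-user.
    best = {}
    for row in rows:
        user, rating = row["ID-user"], int(row["Rating"])
        if user not in best or rating > best[user]:
            best[user] = rating
    # "Join": keep the original rows matching (ID-user, maximum(Rating)).
    return [r for r in rows if int(r["Rating"]) == best[r["ID-user"]]]


def drop_ties(rows):
    """Keep only the first best row per user, similar to Remove Duplicates on ID-user."""
    seen = set()
    unique = []
    for row in rows:
        if row["ID-user"] not in seen:
            seen.add(row["ID-user"])
            unique.append(row)
    return unique


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
joined = best_products(rows)  # user 003 appears twice here because of the tie
for row in drop_ties(joined):
    print(row["ID-user"], row["ID-product"], row["Rating"])
```

User 003 rated two products equally, so `joined` holds two rows for that user until `drop_ties` keeps the first one. Which tied product survives is arbitrary here; it simply depends on row order.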

vabm · February 2016

That's amazing, thank you so much. Is there any good book or tutorial list you could recommend? Sometimes it is difficult to understand the documentation.
And how do I mark this thread as solved?
Cheers

MartinLiebig · February 2016

Well, in this special case it is very hard to recommend a book. Joins and Aggregates are standard operations you do in a SQL database. If you are familiar with this way of thinking, you can creatively combine them.
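
To make the SQL analogy concrete, here is a small sketch using Python's built-in sqlite3 module. The table name `ratings` and the column names are assumptions chosen for this example; the rows are the sample from the question. The subquery plays the role of the Aggregate and the outer JOIN the role of the Join operator.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings (user_id TEXT, product_id TEXT, rating INTEGER)"
)
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [("003", "040", 3), ("004", "330", 4), ("034", "330", 5), ("003", "032", 3)],
)

# Subquery = aggregate (max rating per user); JOIN = match back to full rows.
QUERY = """
SELECT r.user_id, r.product_id, r.rating
FROM ratings AS r
JOIN (
    SELECT user_id, MAX(rating) AS max_rating
    FROM ratings
    GROUP BY user_id
) AS m
  ON r.user_id = m.user_id AND r.rating = m.max_rating
ORDER BY r.user_id, r.product_id
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

As with the RapidMiner process, a user with tied best ratings (003 in this sample) comes back with one row per tied product.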

In general there are some references like:
https://rapidminer.com/resource/data-mining-masses/ - Very basic

http://www.amazon.com/Exploring-Data-RapidMiner-Andrew-Chisholm/dp/1782169334/ref=sr_1_3?s=books&ie=UTF8&qid=1454406443&sr=1-3&keywords=rapidminer - A bit more advanced, I think. Andrew also helps out here in the forums

http://www.amazon.com/Predictive-Analytics-Data-Mining-RapidMiner/dp/0128014601/ref=sr_1_1?s=books&ie=UTF8&qid=1454406443&sr=1-1&keywords=rapidminer - My favorite when it comes to learning predictive analytics without pure math

I was thinking about putting together some kind of guide with tips and tricks. I started to do so on my blog. Let's see, maybe I will create some document sometime soon.

~Martin

P.S: My blog can be found at: http://data-analytics.ghost.io/

vabm · February 2016

Thanks Martin!
