iterate and extract according to value

vabm New Altair Community Member
edited November 2024 in Community Q&A
Hi,
I am new to RapidMiner and I am struggling with some simple iterations. I have a dataset with ratings from different users for a number of products, and I need to extract the best-rated product for each user. The file looks something like this:

ID-user,ID-product,Rating
003,040,3
004,330,4
034,330,5
003,032,3
(...)
I can extract the best product for one user with something like: read_csv -> select attribute (set user id) -> sort (best to worst) -> filter examples (index=1), but this is really inconvenient when I have a lot of users to process.
I know this can be done with 'loop attributes' and macros, but I can't find an example to use as a guide.
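The loop-and-macro idea boils down to repeating the manual pipeline once per user. The Python sketch below is not RapidMiner code; it only illustrates that logic under stated assumptions: the data is the four sample rows above held in memory, IDs are kept as strings so leading zeros survive, and `best_product_per_user` is a hypothetical helper name. When two products tie, the row that comes first in the file wins.

```python
import csv
import io

# The four sample rows from above; IDs stay strings so "003" keeps its zeros.
SAMPLE = """ID-user,ID-product,Rating
003,040,3
004,330,4
034,330,5
003,032,3
"""


def best_product_per_user(rows):
    """Hypothetical helper: run 'filter -> sort -> take first' once per user."""
    result = {}
    # The distinct user IDs are what a loop operator would iterate over.
    for user in sorted({r["ID-user"] for r in rows}):
        # Filter: keep only this user's ratings (what the macro would select).
        user_rows = [r for r in rows if r["ID-user"] == user]
        # Sort best to worst; Python's sort is stable, so ties keep file order.
        user_rows.sort(key=lambda r: int(r["Rating"]), reverse=True)
        # Take the first row (index = 1 in RapidMiner's 1-based terms).
        result[user] = user_rows[0]["ID-product"]
    return result


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(best_product_per_user(rows))
```

This works, but it scans the data once per user, which is why a single aggregate-and-join pass is usually preferable for many users.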

Any help/guidance would be more than welcome, thanks!

Answers

  • MartinLiebig
    Altair Employee
    Hi,

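    I think the easiest way to do this is to combine an Aggregate with a Join: the Aggregate computes the maximum Rating per ID-user, and the Join matches that maximum back to the original rows on ID-user and Rating. See the process below. Please be aware that this process produces two rows for a user if two products share that user's best rating. You can handle this with either Remove Duplicates or a second Aggregate.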

    ~Martin

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.0.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.0.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" breakpoints="after" class="subprocess" compatibility="7.0.000" expanded="true" height="82" name="Subprocess" width="90" x="45" y="136">
            <process expanded="true">
              <operator activated="true" class="retrieve" compatibility="7.0.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
                <parameter key="repository_entry" value="//Samples/data/Iris"/>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="7.0.000" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
                <parameter key="invert_selection" value="true"/>
                <parameter key="include_special_attributes" value="true"/>
              </operator>
              <operator activated="true" class="generate_attributes" compatibility="7.0.000" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
                <list key="function_descriptions">
                  <parameter key="ID-user" value="round(rand()*50)"/>
                  <parameter key="ID-product" value="round(rand()*10)"/>
                  <parameter key="Rating" value="round(rand()*15)"/>
                </list>
              </operator>
              <connect from_op="Retrieve Iris" from_port="output" to_op="Select Attributes" to_port="example set input"/>
              <connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
              <connect from_op="Generate Attributes" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">Generate a fitting example set</description>
          </operator>
          <operator activated="true" class="multiply" compatibility="7.0.000" expanded="true" height="103" name="Multiply" width="90" x="179" y="136"/>
          <operator activated="true" class="aggregate" compatibility="7.0.000" expanded="true" height="82" name="Aggregate" width="90" x="313" y="85">
            <list key="aggregation_attributes">
              <parameter key="Rating" value="maximum"/>
            </list>
            <parameter key="group_by_attributes" value="ID-user"/>
          </operator>
          <operator activated="true" class="join" compatibility="7.0.000" expanded="true" height="82" name="Join" width="90" x="447" y="136">
            <parameter key="use_id_attribute_as_key" value="false"/>
            <list key="key_attributes">
              <parameter key="ID-user" value="ID-user"/>
              <parameter key="maximum(Rating)" value="Rating"/>
            </list>
          </operator>
          <connect from_op="Subprocess" from_port="out 1" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Join" to_port="right"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Join" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="105"/>
          <portSpacing port="sink_result 2" spacing="42"/>
        </process>
      </operator>
    </process>
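For readers who want to see the logic outside RapidMiner, here is a minimal Python sketch of the same Aggregate -> Join idea, assuming the four sample rows from the question are in memory (variable names are illustrative and not part of Martin's process). The join is written as a lookup against a set of (ID-user, maximum rating) pairs, which behaves like an inner join on both keys. The last step mimics the suggested tie handling by keeping only the first matching row per user.

```python
import csv
import io

# The four sample rows from the question; IDs stay strings to keep leading zeros.
SAMPLE = """ID-user,ID-product,Rating
003,040,3
004,330,4
034,330,5
003,032,3
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Aggregate: maximum Rating per ID-user.
max_rating = {}
for r in rows:
    user, rating = r["ID-user"], int(r["Rating"])
    max_rating[user] = max(rating, max_rating.get(user, rating))

# Join: keep original rows whose (ID-user, Rating) pair matches an aggregated pair.
keys = set(max_rating.items())
best = [r for r in rows if (r["ID-user"], int(r["Rating"])) in keys]

# User 003 rates both 040 and 032 with 3, so this prints two rows for 003.
for r in best:
    print(r["ID-user"], r["ID-product"], r["Rating"])

# Tie handling (similar in spirit to Remove Duplicates on ID-user): keep the first match.
deduped = {}
for r in best:
    deduped.setdefault(r["ID-user"], r["ID-product"])
print(deduped)
```

The data is read only twice regardless of how many users there are, which is the main advantage over looping per user.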
  • vabm
    vabm New Altair Community Member
    That's amazing, thank you so much. Is there a good book or tutorial list you could recommend? Sometimes it is difficult to understand the documentation.
    And how do I mark this thread as solved?
    Cheers
  • MartinLiebig
    Altair Employee
    Well, in this particular case it is very hard to recommend a book. Joins and aggregates are standard operations in SQL databases; if you are familiar with that way of thinking, you can combine them creatively.
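To make the SQL analogy concrete, here is a sketch using Python's built-in sqlite3 module and an in-memory table holding the sample rows from the question; the table name `ratings` and the alias `max_rating` are assumptions for illustration. The subquery plays the role of the Aggregate operator (GROUP BY with MAX), and the outer JOIN on user and rating corresponds to the Join operator. As in the RapidMiner process, user 003 comes back twice because both of their products are rated 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hyphenated column names must be double-quoted in SQL.
conn.execute('CREATE TABLE ratings ("ID-user" TEXT, "ID-product" TEXT, "Rating" INTEGER)')
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [("003", "040", 3), ("004", "330", 4), ("034", "330", 5), ("003", "032", 3)],
)

# Subquery = aggregate (best rating per user); JOIN = match it back to all rows.
query = """
SELECT r."ID-user", r."ID-product", r."Rating"
FROM ratings AS r
JOIN (
    SELECT "ID-user", MAX("Rating") AS max_rating
    FROM ratings
    GROUP BY "ID-user"
) AS m
  ON r."ID-user" = m."ID-user" AND r."Rating" = m.max_rating
ORDER BY r."ID-user", r."ID-product"
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```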

    In general there are some references like:
    https://rapidminer.com/resource/data-mining-masses/ - Very basic

    http://www.amazon.com/Exploring-Data-RapidMiner-Andrew-Chisholm/dp/1782169334/ref=sr_1_3?s=books&;ie=UTF8&qid=1454406443&sr=1-3&keywords=rapidminer - A bit more advanced, I think. Andrew, the author, also helps out here in the forums.

    http://www.amazon.com/Predictive-Analytics-Data-Mining-RapidMiner/dp/0128014601/ref=sr_1_1?s=books&;ie=UTF8&qid=1454406443&sr=1-1&keywords=rapidminer - My favorite when it comes to learning predictive analytics without heavy math

    I have been thinking about putting together some kind of collection of tips and tricks, and I started doing so on my blog. Let's see; maybe I will put a document together soon.

    ~Martin

    P.S.: My blog can be found at http://data-analytics.ghost.io/
  • vabm
    vabm New Altair Community Member
    Thanks Martin!