"How to get the last row after windowing"

WinKad
WinKad New Altair Community Member
edited November 5 in Community Q&A
Hi everybody,
I use the following test data (each value is written as row,column, so the whole file is in matrix form):
1,1;1,2;1,3;1,4;1,5
2,1;2,2;2,3;2,4;2,5
3,1;3,2;3,3;3,4;3,5
4,1;4,2;4,3;4,4;4,5
5,1;5,2;5,3;5,4;5,5
6,1;6,2;6,3;6,4;6,5
7,1;7,2;7,3;7,4;7,5
8,1;8,2;8,3;8,4;8,5
9,1;9,2;9,3;9,4;9,5
10,1;10,2;10,3;10,4;10,5

After windowing with window size = 3 for processing, I want to take the last row of the same data windowed with window size = 2 and feed it to the process as unlabelled data.

Perhaps this question has already been posted in another form, but I didn't find it.

Here is my code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.0.11" expanded="true" name="Process">
    <process expanded="true" height="521" width="415">
      <operator activated="true" class="read_csv" compatibility="5.0.11" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="file_name" value="D:\Eigene Dateien\Meine Projekte\Lotto\Rapidminer\Test.csv"/>
        <parameter key="encoding" value="windows-1252"/>
        <parameter key="trim_lines" value="true"/>
        <parameter key="use_first_row_as_attribute_names" value="false"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="attribute_0.true.1.regular"/>
          <parameter key="1" value="attribute_1.true.1.regular"/>
          <parameter key="2" value="attribute_2.true.1.regular"/>
          <parameter key="3" value="attribute_3.true.1.regular"/>
          <parameter key="4" value="attribute_4.true.1.regular"/>
        </list>
        <parameter key="attribute_names_already_defined" value="true"/>
      </operator>
      <operator activated="true" class="rename_by_replacing" compatibility="5.0.11" expanded="true" height="76" name="Rename by Replacing" width="90" x="179" y="30">
        <parameter key="replace_what" value="(attribute_)"/>
        <parameter key="replace_by" value="Z"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="5.0.11" expanded="true" height="112" name="Multiply" width="90" x="45" y="120"/>
      <operator activated="true" class="series:windowing" compatibility="5.0.2" expanded="true" height="76" name="Windowing" width="90" x="179" y="165">
        <parameter key="window_size" value="3"/>
      </operator>
      <operator activated="true" class="series:windowing" compatibility="5.0.2" expanded="true" height="76" name="Windowing (2)" width="90" x="179" y="255">
        <parameter key="window_size" value="2"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
      <connect from_op="Rename by Replacing" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_port="result 1"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Windowing" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 3" to_op="Windowing (2)" to_port="example set input"/>
      <connect from_op="Windowing" from_port="example set output" to_port="result 2"/>
      <connect from_op="Windowing (2)" from_port="example set output" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="90"/>
      <portSpacing port="sink_result 2" spacing="36"/>
      <portSpacing port="sink_result 3" spacing="162"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>

Is there a special operator for this?

Answers

  • wessel
    wessel New Altair Community Member
    What do you want to achieve with this last row?

    Make a prediction for the last window in your data?
  • WinKad
    WinKad New Altair Community Member
    Hello wessel,
    yes, that's what I want.
    But when I tried to use both windowing outputs together (one with window size = 3, the other with window size = 2), RM refused. I suppose there is a problem with the names and/or the order of the columns.
    I have just looked at the output row with 9,1 10,1 9,2 10,2 9,3 10,3 ... 9,5 10,5, and that is exactly what I want to get. Do I have to rename the columns (with a macro iterator)?
    Ciao
    Winkad
  • wessel
    wessel New Altair Community Member
    I did something very similar but I was not happy with my solution, so I hope someone else can suggest something better.

    If you have a dataset, let's say:
    x
    1
    2
    3
    4
    5
    6
    7
    8
    9

    and you have windowSize = 3, horizon = 2, you get
    x-2 x-1 x-0 label  (where label is x+2)
    1   2   3   5
    2   3   4   6
    3   4   5   7
    4   5   6   8
    5   6   7   9

    so what you want is
    7  8  9  ?

    You can get this with Filter Example Range, taking examples 7 to 9,
    which gives
    x
    7
    8
    9

    if you do windowing on this dataset without a horizon you get
    x-2 x-1 x-0
    7  8  9

    RapidMiner automatically adds the label attribute. It will give a warning that the label is missing, but it will work.
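The example above can be traced with a small plain-Python sketch. It is not RapidMiner code: the function names `window_with_horizon` and `last_window` are made up for illustration, and it assumes a step size of 1 and that the label is the value `horizon` steps after the last element of each window, as described in the post.

```python
def window_with_horizon(series, window_size, horizon):
    """Return (windows, labels) for a univariate series.

    Each window holds `window_size` consecutive values; its label is the
    value `horizon` steps after the window's last element (assumed
    convention, taken from the "label is x+2" description).
    """
    windows, labels = [], []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        end = start + window_size
        windows.append(series[start:end])
        labels.append(series[end - 1 + horizon])
    return windows, labels


def last_window(series, window_size):
    """Return the final `window_size` values, i.e. the window without a label."""
    return series[-window_size:]


x = list(range(1, 10))  # the series 1..9 from the post
train_windows, train_labels = window_with_horizon(x, window_size=3, horizon=2)
unlabelled = last_window(x, window_size=3)

for window, label in zip(train_windows, train_labels):
    print(window, label)
print(unlabelled, "?")
```

The point of the sketch is that the unlabelled row uses the same window size as the training rows, so it has the same attributes; only the label is missing.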
  • WinKad
    WinKad New Altair Community Member
    Hi everybody,
    oh, how stupid of me. I thought that Filter Example Range filtered on the content of the rows...
    Now here is what I came up with:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.10" expanded="true" name="Process">
        <process expanded="true" height="386" width="480">
          <operator activated="true" class="subprocess" compatibility="5.0.10" expanded="true" height="130" name="Subprocess" width="90" x="45" y="30">
            <parameter key="parallelize_nested_chain" value="true"/>
            <process expanded="true" height="431" width="567">
              <operator activated="true" class="read_csv" compatibility="5.0.10" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
                <parameter key="file_name" value="D:\Eigene Dateien\Meine Projekte\Lotto\Rapidminer\Test.csv"/>
                <parameter key="encoding" value="windows-1252"/>
                <parameter key="trim_lines" value="true"/>
                <parameter key="use_first_row_as_attribute_names" value="false"/>
                <list key="data_set_meta_data_information">
                  <parameter key="0" value="attribute_0.true.1.regular"/>
                  <parameter key="1" value="attribute_1.true.1.regular"/>
                  <parameter key="2" value="attribute_2.true.1.regular"/>
                  <parameter key="3" value="attribute_3.true.1.regular"/>
                  <parameter key="4" value="attribute_4.true.1.regular"/>
                </list>
                <parameter key="attribute_names_already_defined" value="true"/>
              </operator>
              <operator activated="true" class="rename_by_replacing" compatibility="5.0.11" expanded="true" height="76" name="Rename by Replacing" width="90" x="179" y="30">
                <parameter key="attribute_filter_type" value="subset"/>
                <parameter key="attributes" value="attribute_4|attribute_3|attribute_2|attribute_1|attribute_0"/>
                <parameter key="regular_expression" value="(attribute_)"/>
                <parameter key="replace_what" value="(attribute_)"/>
                <parameter key="replace_by" value="col"/>
              </operator>
              <operator activated="true" class="multiply" compatibility="5.0.11" expanded="true" height="94" name="Multiply" width="90" x="45" y="165"/>
              <operator activated="true" class="series:windowing" compatibility="5.0.2" expanded="true" height="76" name="Windowing2" width="90" x="179" y="255">
                <parameter key="window_size" value="2"/>
              </operator>
              <operator activated="true" class="filter_example_range" compatibility="5.0.11" expanded="true" height="76" name="Filter Example Range" width="90" x="313" y="255">
                <parameter key="first_example" value="4"/>
                <parameter key="last_example" value="4"/>
              </operator>
              <operator activated="true" class="series:windowing" compatibility="5.0.2" expanded="true" height="76" name="Windowing3" width="90" x="179" y="165">
                <parameter key="window_size" value="3"/>
              </operator>
              <operator activated="true" class="set_role" compatibility="5.0.11" expanded="true" height="76" name="Set Role" width="90" x="313" y="165">
                <parameter key="name" value="col0-0"/>
                <parameter key="target_role" value="label"/>
              </operator>
              <operator activated="true" class="naive_bayes" compatibility="5.0.11" expanded="true" height="76" name="Naive Bayes" width="90" x="447" y="165"/>
              <operator activated="true" class="apply_model" compatibility="5.0.11" expanded="true" height="76" name="Apply Model" width="90" x="447" y="255">
                <list key="application_parameters"/>
              </operator>
              <connect from_op="Read CSV" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
              <connect from_op="Rename by Replacing" from_port="example set output" to_op="Multiply" to_port="input"/>
              <connect from_op="Multiply" from_port="output 1" to_op="Windowing3" to_port="example set input"/>
              <connect from_op="Multiply" from_port="output 2" to_op="Windowing2" to_port="example set input"/>
              <connect from_op="Windowing2" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
              <connect from_op="Filter Example Range" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Windowing3" from_port="example set output" to_op="Set Role" to_port="example set input"/>
              <connect from_op="Set Role" from_port="example set output" to_op="Naive Bayes" to_port="training set"/>
              <connect from_op="Naive Bayes" from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_op="Apply Model" from_port="labelled data" to_port="out 1"/>
              <connect from_op="Apply Model" from_port="model" to_port="out 2"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="234"/>
              <portSpacing port="sink_out 2" spacing="0"/>
              <portSpacing port="sink_out 3" spacing="0"/>
              <portSpacing port="sink_out 4" spacing="0"/>
              <portSpacing port="sink_out 5" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Subprocess" from_port="out 1" to_port="result 1"/>
          <connect from_op="Subprocess" from_port="out 2" to_port="result 2"/>
          <connect from_op="Subprocess" from_port="out 3" to_port="result 3"/>
          <connect from_op="Subprocess" from_port="out 4" to_port="result 4"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
          <portSpacing port="sink_result 5" spacing="0"/>
        </process>
      </operator>
    </process>
    The 'Naive Bayes' operator makes no sense here; I only want to check whether the filtered row is accepted by the 'Apply Model' operator.
    But now, what is the meaning of these warnings?
    PM WARNING: SimpleDistribution: The number of regular attributes of the given example set does not fit the number of attributes of the training example set, training: 14, application: 10
    PM WARNING: SimpleDistribution: The given example set does not contain a regular attribute with name 'col0-2'. This might cause problems for some models depending on this particular attribute.
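The numbers in these warnings can be reproduced by counting attribute names. The following plain-Python sketch is only an illustration, not RapidMiner code: the helper `windowed_names` is hypothetical, and it assumes the `<column>-<lag>` naming seen in the process (`col0-0`, `col0-2`) with five input columns `col0` to `col4`.

```python
def windowed_names(columns, window_size):
    """Attribute names after windowing: <column>-<lag>, oldest lag first."""
    return [f"{col}-{lag}"
            for col in columns
            for lag in range(window_size - 1, -1, -1)]


columns = [f"col{i}" for i in range(5)]  # col0 .. col4 after renaming
label = "col0-0"                         # role set by "Set Role" on the training branch

# Training branch: window size 3, label removed from the regular attributes.
training = [name for name in windowed_names(columns, 3) if name != label]
# Application branch: window size 2, no role change.
application = windowed_names(columns, 2)

# Attributes the model was trained on but the application set lacks.
missing = [name for name in training if name not in application]

print(len(training), len(application))
print(missing)
```

Under these assumptions the training set has 5 × 3 − 1 = 14 regular attributes and the application set has 5 × 2 = 10, and every lag-2 attribute, starting with `col0-2`, is absent from the application set. That would suggest windowing both branches with the same window size.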

  • WinKad
    WinKad New Altair Community Member
    Additional question: how can I get the number of the last row?
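One way to reason about the number of the last row is to compute it from the input length. This is a minimal sketch in plain Python, not a RapidMiner operator; the function name `last_window_row` is made up, and it assumes windowing without a horizon.

```python
def last_window_row(n_rows, window_size, step_size=1):
    """1-based index of the last row produced by windowing (no horizon)."""
    if n_rows < window_size:
        raise ValueError("series is shorter than the window")
    return (n_rows - window_size) // step_size + 1


# Five input rows with window size 2 give four windowed rows, so the last
# row is number 4, the value used for first_example and last_example above.
print(last_window_row(5, 2))
print(last_window_row(10, 2))
```

For the ten-row test file from the first post and window size 2, the same formula gives 9.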
  • WinKad
    WinKad New Altair Community Member
    Hi,
    :'( Note: I suppose there is an error!  Let's see...
    Windowing with window size = 3 on an original data set of 2 columns, named C0 and C1, gives (cell values in Excel notation):
    C0-2  C0-1  C0-0  C1-2  C1-1  C1-0
    A1    A2    A3    B1    B2    B3
    A2    A3    A4    B2    B3    B4
    A3    A4    A5    B3    B4    B5

    Windowing with window size = 2 gives:
    C0-1  C0-0  C1-1  C1-0
    A1    A2    B1    B2
    A2    A3    B2    B3
    A3    A4    B3    B4
    A4    A5    B4    B5

    When these 2 example sets are used with Apply Model, the second one as the unlabelled data, they don't match.
    It's a great pity!
    I cannot make head or tail of it. ???

    Ciao
    WinKad