"Time series forecast (with Rapid Miner)"

DaiWizard
DaiWizard New Altair Community Member
edited November 5 in Community Q&A
Hi!

I've set up a model exactly as described by Thomas Ott of 'neuralmarkettrends' in videos 8-10 - and it's working well so far.

What I still need, though, is the probability of each predicted label (horizon = 1). The model only reports averaged values, in the form of
prediction_trend_accuracy: 0.807 +/- 0.067 (mikro: 0.807).


Thanks for your help !

Answers

  • wessel
    wessel New Altair Community Member
    Hello.

    I'm now using Google to find the video you describe.
    Next time please use a direct link to the video that is of interest.
    Video link:
    https://www.youtube.com/watch?v=UmGIGEJMmN8

    Can you upload your process?

    As far as I understand the process is as follows:
    - Order your data by date
    - Split your data into two parts
    - Use data before date X for training, use data after date X for testing.
    - Features for training are created using windowing
    - SVM is used as learner
    * This process does not deal with horizons very well; neuralmarkettrends1 is aware of this but does not want to complicate his video.
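
To make the windowing and date-based split above concrete, here is a minimal Python sketch. It is not the RapidMiner implementation: the function names `make_windows` and `split_by_position` are made up for illustration, and it assumes that horizon means "how many steps after the last value in the window the label lies", which may differ from how the Windowing operator counts.

```python
def make_windows(series, window_size, horizon):
    """Turn a 1-D series into (features, label) rows.

    Features are `window_size` consecutive values. The label is the value
    `horizon` steps after the last value in the window (assumed convention).
    """
    rows = []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        window = series[start:start + window_size]
        label = series[start + window_size - 1 + horizon]
        rows.append((window, label))
    return rows


def split_by_position(rows, cut):
    """Chronological split: rows before `cut` are training, the rest testing."""
    return rows[:cut], rows[cut:]


# The series must already be ordered by date.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rows = make_windows(series, window_size=3, horizon=1)
train, test = split_by_position(rows, cut=3)
```

The split is by position rather than random, so no future value leaks into the training rows.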

    Now to answer your question:
    My suggestion would be to rescale absolute error to fall into range 0 to 1, and use this as a measure of probability.
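
As a rough illustration of the rescaling idea, here is a small Python sketch. It assumes min-max scaling of the absolute errors, and the final step (1 minus the scaled error as a confidence-like score) is one possible reading of the suggestion, not a calibrated probability. The function name and the sample numbers are made up.

```python
def rescale_abs_error(predictions, labels):
    """Min-max rescale the absolute errors |pred - label| into [0, 1]."""
    errors = [abs(p - y) for p, y in zip(predictions, labels)]
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All errors are identical, so there is nothing to rescale.
        return [0.0 for _ in errors]
    return [(e - lo) / (hi - lo) for e in errors]


# Made-up example values, already normalized to the range 0..1.
preds = [0.5, 0.25, 1.0, 0.75]
labels = [0.5, 0.75, 0.75, 0.5]

scaled = rescale_abs_error(preds, labels)
# Assumption: treat a small scaled error as high confidence.
confidence = [1.0 - s for s in scaled]
```

Note that this needs the true label, so it can only score predictions for which the actual value is already known.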

    This is the best answer I can give right now.
    You need to provide better information to get a better answer.

    Best regards,

    Wessel
  • DaiWizard
    DaiWizard New Altair Community Member
    Thank you wessel for your answer!

    You are right, the question was a bit too imprecise. However, you understood it correctly; that's the way I'm doing it.

    Unfortunately, I don't know exactly how to apply your suggestion "to rescale absolute error to fall into range 0 to 1, and use this as a measure of probability".

    Where do I get the absolute error from ?

    Thank you in advance !
  • wessel
    wessel New Altair Community Member
    Using this process you can define any performance measure you want.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data" compatibility="5.3.008" expanded="true" height="60" name="Gen TS" width="90" x="45" y="30">
            <parameter key="target_function" value="driller oscillation timeseries"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Create Sum" width="90" x="180" y="30">
            <list key="function_descriptions">
              <parameter key="sum" value="str(11*att1+22*att2+33*att3+44*att4+att5)"/>
            </list>
          </operator>
          <operator activated="true" class="guess_types" compatibility="5.3.008" expanded="true" height="76" name="Guess Types" width="90" x="315" y="30"/>
          <operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Sum" width="90" x="450" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="sum"/>
          </operator>
          <operator activated="true" class="normalize" compatibility="5.3.008" expanded="true" height="94" name="Normalize" width="90" x="585" y="30">
            <parameter key="method" value="range transformation"/>
          </operator>
          <operator activated="true" class="series:windowing" compatibility="5.3.000" expanded="true" height="76" name="Win 3 2" width="90" x="720" y="30">
            <parameter key="window_size" value="3"/>
            <parameter key="create_label" value="true"/>
            <parameter key="label_attribute" value="sum"/>
            <parameter key="horizon" value="2"/>
          </operator>
          <operator activated="true" class="series:predict_series" compatibility="5.3.000" expanded="true" height="60" name="Predict: 22 5 22" width="90" x="45" y="120">
            <parameter key="window_width" value="15"/>
            <parameter key="horizon" value="2"/>
            <parameter key="max_training_set_size" value="15"/>
            <process expanded="true">
              <operator activated="true" class="relevance_vector_machine" compatibility="5.3.008" expanded="true" height="76" name="Relevance Vector Machine" width="90" x="45" y="30"/>
              <connect from_port="window example set" to_op="Relevance Vector Machine" to_port="training set"/>
              <connect from_op="Relevance Vector Machine" from_port="model" to_port="prediction model"/>
              <portSpacing port="source_window example set" spacing="0"/>
              <portSpacing port="sink_prediction model" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="rename" compatibility="5.3.008" expanded="true" height="76" name="Rename" width="90" x="180" y="120">
            <parameter key="old_name" value="prediction(label)"/>
            <parameter key="new_name" value="pred"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Generate Attributes" width="90" x="315" y="120">
            <list key="function_descriptions">
              <parameter key="pred_times_label" value="pred*label"/>
              <parameter key="pred_times_label_greater_0" value="if(pred*label&gt;=0, 1, 0)"/>
              <parameter key="abs_pred_minus_label" value="abs(pred-label)"/>
            </list>
          </operator>
          <operator activated="true" class="extract_performance" compatibility="5.3.008" expanded="true" height="76" name="Performance" width="90" x="469" y="119">
            <parameter key="performance_type" value="statistics"/>
            <parameter key="attribute_name" value="abs_pred_minus_label"/>
          </operator>
          <connect from_op="Gen TS" from_port="output" to_op="Create Sum" to_port="example set input"/>
          <connect from_op="Create Sum" from_port="example set output" to_op="Guess Types" to_port="example set input"/>
          <connect from_op="Guess Types" from_port="example set output" to_op="Select Sum" to_port="example set input"/>
          <connect from_op="Select Sum" from_port="example set output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="Win 3 2" to_port="example set input"/>
          <connect from_op="Win 3 2" from_port="example set output" to_op="Predict: 22 5 22" to_port="example set"/>
          <connect from_op="Predict: 22 5 22" from_port="example set" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Performance" to_port="example set"/>
          <connect from_op="Performance" from_port="performance" to_port="result 1"/>
          <connect from_op="Performance" from_port="example set" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • wessel
    wessel New Altair Community Member
    You should get a result looking like this:
    (I have problems uploading images and will edit this image later. Just go into the results dataset and plot "pred" and "label", and maybe "abs_pred_minus_label".)

    Try to figure out why absolute error is different from average(abs_pred_minus_label).
    Also note that I'm not using a fixed split. Instead I'm using a sliding window validation, because this is the proper way to validate time series models.
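
For readers who want to see the sliding window idea outside RapidMiner, here is a minimal Python sketch. It is not what the Sliding Window Validation operator does internally: the learner is a stand-in (the mean of the training window), the test window is a single value, and all names are hypothetical.

```python
def sliding_window_validation(values, train_width, horizon, fit, predict):
    """Slide a training window over the series and test on one later value.

    For each position, fit on `train_width` values and score the prediction
    against the value `horizon` steps after the end of the training window.
    Returns one absolute error per window position.
    """
    abs_errors = []
    last_start = len(values) - train_width - horizon
    for start in range(last_start + 1):
        train = values[start:start + train_width]
        target = values[start + train_width - 1 + horizon]
        model = fit(train)
        abs_errors.append(abs(predict(model) - target))
    return abs_errors


def fit_mean(train):
    """Stand-in learner: the 'model' is just the mean of the training window."""
    return sum(train) / len(train)


def predict_mean(model):
    """The stand-in model always predicts its stored mean."""
    return model


values = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
errors = sliding_window_validation(
    values, train_width=2, horizon=1, fit=fit_mean, predict=predict_mean
)
mean_abs_error = sum(errors) / len(errors)
```

Unlike a fixed split, every prediction here comes from a model trained only on the values just before it, which is closer to how a forecast is used in practice.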




    This XML shows how you can use the Regression Performance Operator.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="subprocess" compatibility="5.3.008" expanded="true" height="76" name="Generate Data (6)" width="90" x="45" y="30">
           <process expanded="true">
             <operator activated="true" class="generate_data" compatibility="5.3.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
               <parameter key="target_function" value="driller oscillation timeseries"/>
               <parameter key="number_examples" value="200"/>
             </operator>
             <operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Generate Sum" width="90" x="180" y="30">
               <list key="function_descriptions">
                 <parameter key="sum" value="str(11*att1+22*att2+33*att3+44*att4+att5)"/>
               </list>
             </operator>
             <operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Sum" width="90" x="319" y="29">
               <parameter key="attribute_filter_type" value="single"/>
               <parameter key="attribute" value="sum"/>
             </operator>
             <operator activated="true" class="parse_numbers" compatibility="5.3.008" expanded="true" height="76" name="Parse Numbers (2)" width="90" x="441" y="26">
               <parameter key="attribute_filter_type" value="single"/>
               <parameter key="attribute" value="sum"/>
             </operator>
             <operator activated="true" class="normalize" compatibility="5.3.008" expanded="true" height="94" name="Normalize" width="90" x="561" y="27">
               <parameter key="method" value="range transformation"/>
             </operator>
             <operator activated="true" class="rename" compatibility="5.3.008" expanded="true" height="76" name="Rename Label" width="90" x="699" y="28">
               <parameter key="old_name" value="sum"/>
               <parameter key="new_name" value="label"/>
               <list key="rename_additional_attributes"/>
             </operator>
             <connect from_op="Generate Data" from_port="output" to_op="Generate Sum" to_port="example set input"/>
             <connect from_op="Generate Sum" from_port="example set output" to_op="Select Sum" to_port="example set input"/>
             <connect from_op="Select Sum" from_port="example set output" to_op="Parse Numbers (2)" to_port="example set input"/>
             <connect from_op="Parse Numbers (2)" from_port="example set output" to_op="Normalize" to_port="example set input"/>
             <connect from_op="Normalize" from_port="example set output" to_op="Rename Label" to_port="example set input"/>
             <connect from_op="Rename Label" from_port="example set output" to_port="out 1"/>
             <portSpacing port="source_in 1" spacing="0"/>
             <portSpacing port="sink_out 1" spacing="0"/>
             <portSpacing port="sink_out 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="series:windowing" compatibility="5.3.000" expanded="true" height="76" name="Win 3 2" width="90" x="187" y="32">
           <parameter key="window_size" value="3"/>
           <parameter key="create_label" value="true"/>
           <parameter key="label_attribute" value="label"/>
           <parameter key="horizon" value="2"/>
         </operator>
         <operator activated="true" class="multiply" compatibility="5.3.008" expanded="true" height="94" name="Multiply" width="90" x="309" y="34"/>
         <operator activated="true" class="series:sliding_window_validation" compatibility="5.3.000" expanded="true" height="112" name="Validation" width="90" x="515" y="30">
           <parameter key="training_window_width" value="15"/>
           <parameter key="test_window_width" value="1"/>
           <parameter key="horizon" value="2"/>
           <parameter key="average_performances_only" value="false"/>
           <process expanded="true">
             <operator activated="true" class="relevance_vector_machine" compatibility="5.3.008" expanded="true" height="76" name="Relevance VM (2)" width="90" x="152" y="50"/>
             <connect from_port="training" to_op="Relevance VM (2)" to_port="training set"/>
             <connect from_op="Relevance VM (2)" from_port="model" to_port="model"/>
             <portSpacing port="source_training" spacing="0"/>
             <portSpacing port="sink_model" spacing="0"/>
             <portSpacing port="sink_through 1" spacing="0"/>
           </process>
           <process expanded="true">
             <operator activated="true" class="apply_model" compatibility="5.3.008" expanded="true" height="76" name="Apply Model" width="90" x="91" y="12">
               <list key="application_parameters"/>
             </operator>
             <operator activated="true" class="performance_regression" compatibility="5.3.008" expanded="true" height="76" name="Performance" width="90" x="282" y="61">
               <parameter key="root_mean_squared_error" value="false"/>
               <parameter key="absolute_error" value="true"/>
             </operator>
             <connect from_port="model" to_op="Apply Model" to_port="model"/>
             <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
             <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
             <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
             <portSpacing port="source_model" spacing="0"/>
             <portSpacing port="source_test set" spacing="0"/>
             <portSpacing port="source_through 1" spacing="0"/>
             <portSpacing port="sink_averagable 1" spacing="0"/>
             <portSpacing port="sink_averagable 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="series:predict_series" compatibility="5.3.000" expanded="true" height="60" name="Predict: 22 5 22" width="90" x="78" y="331">
           <parameter key="window_width" value="15"/>
           <parameter key="horizon" value="2"/>
           <parameter key="max_training_set_size" value="15"/>
           <process expanded="true">
             <operator activated="true" class="relevance_vector_machine" compatibility="5.3.008" expanded="true" height="76" name="Relevance VM" width="90" x="412" y="29"/>
             <connect from_port="window example set" to_op="Relevance VM" to_port="training set"/>
             <connect from_op="Relevance VM" from_port="model" to_port="prediction model"/>
             <portSpacing port="source_window example set" spacing="0"/>
             <portSpacing port="sink_prediction model" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="rename" compatibility="5.3.008" expanded="true" height="76" name="Rename" width="90" x="263" y="330">
           <parameter key="old_name" value="prediction(label)"/>
           <parameter key="new_name" value="pred"/>
           <list key="rename_additional_attributes"/>
         </operator>
         <operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Generate Attributes" width="90" x="439" y="335">
           <list key="function_descriptions">
             <parameter key="abs_pred_minus_label" value="abs(pred-label)"/>
           </list>
         </operator>
         <operator activated="true" class="extract_performance" compatibility="5.3.008" expanded="true" height="76" name="Performance (2)" width="90" x="657" y="349">
           <parameter key="performance_type" value="statistics"/>
           <parameter key="attribute_name" value="abs_pred_minus_label"/>
         </operator>
         <connect from_op="Generate Data (6)" from_port="out 1" to_op="Win 3 2" to_port="example set input"/>
         <connect from_op="Win 3 2" from_port="example set output" to_op="Multiply" to_port="input"/>
         <connect from_op="Multiply" from_port="output 1" to_op="Validation" to_port="training"/>
         <connect from_op="Multiply" from_port="output 2" to_op="Predict: 22 5 22" to_port="example set"/>
         <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
         <connect from_op="Predict: 22 5 22" from_port="example set" to_op="Rename" to_port="example set input"/>
         <connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
         <connect from_op="Generate Attributes" from_port="example set output" to_op="Performance (2)" to_port="example set"/>
         <connect from_op="Performance (2)" from_port="performance" to_port="result 2"/>
         <connect from_op="Performance (2)" from_port="example set" to_port="result 3"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
         <portSpacing port="sink_result 4" spacing="0"/>
       </process>
     </operator>
    </process>
  • DaiWizard
    DaiWizard New Altair Community Member
    Dear Wessel!

    Thank you so much for your answer. Since I'm a beginner, I don't know how to import your XML as a new operator into my process from videos 8 to 10, and I'm not sure where in the chain to place this operator.



    Best regards, Dai Wizard!



  • wessel
    wessel New Altair Community Member
    Click View.

    Create a new perspective.

    In Show View, tick XML and untick all others.

    In the XML tab, paste the XML code.

    Click the green V symbol.

    Return to your standard view.
  • DaiWizard
    DaiWizard New Altair Community Member
    Hi!

    Thank you wessel for your tips, but I'm afraid it looks too complicated for me; I don't think I can fully understand it. Therefore I've created a PDF file that you can view using this link: http://www.professor-heusenstamm.com/model.pdf

    Bild 1 shows my original process, Bild 2 is the content of the validation operator.
    Bild 3 shows the general performance output.

    Bild 4 is my latest progress :-) I've inserted the Log operator and defined there the values for performance and prediction accuracy.

    Bild 5 shows the result of the latter.

    My question is: did I insert the Log operator at the correct position in the process (Bild 4), so that it delivers the performance of the predicted n+1 value, which is the content of "Read Excel (2)", or do I have to rearrange or add something?

    As usual, I'm looking forward to anybody's comments.

  • wessel
    wessel New Altair Community Member
    My process (I call it a process, not a model) looks like this:

    http://i.snag.gy/STABy.jpg

    I used this button to create a new perspective (I named this perspective XML):
    http://i.snag.gy/A53kc.jpg

    So now my screen looks like:
    http://i.snag.gy/6QXgV.jpg

    This is easy for sharing processes.