Time Series using Windowing operator in RapidMiner

User: "rainaadi"
New Altair Community Member
Updated by Jocelyn

I'm trying to use a time series model in RapidMiner to forecast premium paid to an insurance company. Specifically, I have an entry for each month from January 2009 to December 2015, and I want to forecast the next 12 months (January 2016 to December 2016).

I'm having trouble understanding how the Windowing operator works, and I have a few questions:

1) What goes into selecting a window size? If I want to forecast Premium over the next 12 months, is my window size 12? And if so, why do I get 12 attributes for each original attribute in my data set (the original Premium amount is one of these 12)? I get that these are supposed to explain the corresponding label value (which is just the next row's original Premium; I'm not sure why that happens either), but where do these numbers come from, and why does RapidMiner generate them?

2) What does the option "create single attributes" do?

3) The horizon field: If this is the distance between the last window value and the value to predict, does this mean I can't predict the next 12 months of data at once? Even if I enter the horizon as 1 (which I take to mean "give me the prediction for January 2016", since the last data point is for December 2015), why is there no label value for December 2015 or January 2016 in the output when I run the process?

I'm a beginner, and I would really appreciate any help!

    User: "Thomas_Ott"
    New Altair Community Member
    Accepted Answer

    Hi rainaadi,

     

    I'm the author of those old videos and you're right, I didn't explain why I chose the Windowing parameters as I did.

     

    First off, there's another (older) and more detailed explanation of the Windowing operator in our community, so check that out too: http://community.rapidminer.com/t5/RapidMiner-Studio/Prediction-Forecasting-with-RM/td-p/210

     

    Great questions. Let me preface this by saying that the Series extension is fantastic for forecasting trend directions, and it's decent at point forecasts too, but if a point forecast is what you're after, I'd mash up the R Forecast library in Studio. It's pretty easy to do.

     

    Note that a lot of the parameters I chose are only a starting point. I make a "best guess" and from there use a Parameter Optimization to vary parameters such as Window Size, Training/Testing Window Width, Step Size, etc.

     

    I think Simafore's blog said it best: using the Windowing operator is like taking a "cross section of data" in time. You can have multiple attributes (columns) with different data points that help describe your label (target variable). For example, let's take this simple stock close dataset. It has XOM, FB, and MSFT closing values. We're interested in forecasting the trend of XOM_CLOSE, using it as the label (target variable) and the FB and MSFT closing prices as part of the input. You want to create a multivariate data set to describe XOM.

     

     

    WindowingExample 1.png

    So how do you use FB_CLOSE and MSFT_CLOSE in your forecast? That's where the Windowing operator comes in. I want to take that data and make a "window" of FB/MSFT data points that describe some XOM data point in time. The question is what size window to use. That's where a bit of domain knowledge comes in: you'll have to make your first "best guess," remembering that you can change the Window Size when you use Parameter Optimization later.

     

    For this argument, let's take a 5-day window (the trading week is typically 5 days). That is the Window Size. The Step Size is how far you want to advance the window each time. Setting the Step Size also requires a bit of domain knowledge, because you could be forecasting weekly, monthly, or quarterly data. For our example, we advance it by 1 (the next day).

     

    You should see something like this:

     

    WindowingExample 2.png

    I put red boxes on the image above to highlight an important concept. In example row 1, the Date-4 column lines up with XOM_CLOSE-4 and MSFT_CLOSE-4, the closing prices of XOM and MSFT on that date (FB was cut off in the screenshot). Likewise, in example row 3, Date-3 lines up with XOM_CLOSE-3 and MSFT_CLOSE-3. Now you have a 5-day window of data on an example-by-example (row-by-row) basis. This is good, but we're not done yet.
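To make the reshaping concrete, here is a minimal Python sketch of the idea. It is not RapidMiner's implementation: the function name `window_examples`, the made-up prices, and the `<attribute>-<lag>` naming (chosen to mirror the column names in the screenshot) are all assumptions for illustration.

```python
def window_examples(rows, window_size=5, step_size=1):
    """Turn time-ordered rows into windowed examples (hypothetical helper).

    Each output example holds, for every attribute, the values from
    `window_size` consecutive rows. Columns are named '<attr>-<lag>',
    where lag 0 is the most recent row in the window.
    """
    examples = []
    for start in range(0, len(rows) - window_size + 1, step_size):
        window = rows[start:start + window_size]
        example = {}
        for pos, row in enumerate(window):
            lag = window_size - 1 - pos  # oldest row gets the largest lag
            for name, value in row.items():
                example[f"{name}-{lag}"] = value
        examples.append(example)
    return examples


# Made-up closing prices, for illustration only.
rows = [{"XOM_CLOSE": 10.0 + i, "MSFT_CLOSE": 50.0 + i} for i in range(7)]

examples = window_examples(rows, window_size=5, step_size=1)
print(len(examples))      # number of windows that fit in 7 rows
print(len(examples[0]))   # columns per example: attributes x window size
print(examples[0])
```

Under these assumptions, every original attribute turns into `window_size` columns in each example, which is why a window size of 12 yields 12 generated attributes per original attribute.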

     

    Why is it important to rotate your data series from columns to rows? You could easily just take a simple univariate column and do a Linear Regression on it, which is fine, but what if you want to use more than one variable and eventually test the performance (i.e., the trend accuracy)? For that you have to transform the data set into the form shown in the screenshot above, because that preps it for the Sliding Window Validation operator (which is how you backtest your multivariate data series).

     

    Before you can do that, you'll have to create a label from the data set above. You have to tell the Windowing operator which column (attribute) the model should be trained to predict. There are two main parameters to use here: the Create a Label toggle and the Horizon parameter. Together they tell RapidMiner which attribute to use for the label column (XOM_CLOSE) and which value you want to forecast, in this case the XOM_CLOSE value on Jan 6, 2016 (73.69).

     

    WindowingExample 3.png

    That looks like this:

     

    WindowingExample 4.png
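The label and horizon idea can be sketched in a few lines of Python. This is an illustrative approximation, not the operator's actual code: `window_with_label` and the sample numbers are hypothetical, and the label is assumed to be the value `horizon` steps after the last value in each window.

```python
def window_with_label(series, window_size, horizon=1, step_size=1):
    """Build (window, label) pairs from a univariate series (hypothetical helper).

    The label is assumed to be the value `horizon` steps after the last
    value in the window. A window is only kept if that value exists.
    """
    pairs = []
    last_start = len(series) - window_size - horizon
    for start in range(0, last_start + 1, step_size):
        window = series[start:start + window_size]
        label = series[start + window_size - 1 + horizon]
        pairs.append((window, label))
    return pairs


# Made-up monthly values, for illustration only.
premium = [100, 110, 120, 130, 140, 150, 160, 170]

pairs = window_with_label(premium, window_size=3, horizon=1)
for window, label in pairs:
    print(window, "->", label)

# With a larger horizon, fewer windows have a known label.
pairs_h2 = window_with_label(premium, window_size=3, horizon=2)
print(len(pairs), len(pairs_h2))
```

In this sketch the last labeled example ends one horizon before the end of the data, because a window that ends on the final observation would need a label from the future. That is one plausible reading of why the final rows of a series show no label value in the windowed output.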

    The next step would be to feed this data into a Sliding Window Validation operator and nest an algorithm in there to backtest your assumptions.
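For intuition, the backtesting scheme can be approximated as a train/test split that rolls forward through time. The sketch below is an assumption-laden illustration, not the Sliding Window Validation operator itself: the function `sliding_window_splits` is hypothetical, its arguments are loosely named after the Training/Testing Window Width and Step Size parameters mentioned earlier, and it ignores any other options the operator has.

```python
def sliding_window_splits(n_examples, training_width, test_width, step_size):
    """Yield (train_indices, test_indices) that move forward in time.

    Hypothetical helper: training always uses examples that come
    strictly before the test examples, so no future data leaks in.
    """
    start = 0
    while start + training_width + test_width <= n_examples:
        train = range(start, start + training_width)
        test = range(start + training_width,
                     start + training_width + test_width)
        yield train, test
        start += step_size


# 10 windowed examples, train on 4, test on the next 2, then advance by 2.
splits = [(list(tr), list(te))
          for tr, te in sliding_window_splits(10, 4, 2, 2)]
for train_idx, test_idx in splits:
    print("train", train_idx, "test", test_idx)
```

A model would be fitted on each training slice and scored on the test slice that follows it, and the scores averaged across splits to judge a given choice of window parameters.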

     

    Hope this helps.