Time Series Forecasting for many examples

User: "Noel"
Hi All-
[Apologies in advance for any confusing or vague language I may use; I'm not a data scientist, so I don't know the proper terminology.]

Say I have a data set of sales volume over time for a retailer that sells screwdrivers. Their product catalog really runs the gamut: flathead, Phillips, Torx, long, short, every color you can think of, and on and on. If you wanted to forecast demand, you could build one model per product series (e.g. short, yellow, flathead screwdrivers, then medium-length, purple, Torx drivers with fat handles, etc.), or you could aggregate sales for all Phillips head screwdrivers, or for all screwdriver types together, to collapse them into one series.
For some reason, though, let's say you wanted to train a single model on the data from every individual type of screwdriver. For each date, you would have a data point for every type of screwdriver in inventory.

What is the "right way" to represent this in RapidMiner?
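To make the question concrete, here is a minimal sketch of the two layouts such data usually takes: a "long" table with one row per date and product, and a "wide" table with one row per date and one column per product. This is plain Python, not a RapidMiner process, and the product names and sales figures are made up for illustration.

```python
from collections import defaultdict

# Hypothetical long-format records: (date, product_id, units_sold).
long_rows = [
    ("2024-01-01", "flathead_short_yellow", 12),
    ("2024-01-01", "torx_medium_purple", 5),
    ("2024-01-02", "flathead_short_yellow", 9),
    ("2024-01-02", "torx_medium_purple", 7),
]


def long_to_wide(rows):
    """Pivot (date, product, value) rows into one row per date, one column per product."""
    products = sorted({product for _, product, _ in rows})
    by_date = defaultdict(dict)
    for date, product, value in rows:
        by_date[date][product] = value
    # Missing (date, product) pairs become None rather than 0,
    # because "no record" and "zero sales" are not the same thing.
    wide = [
        {"date": date, **{p: by_date[date].get(p) for p in products}}
        for date in sorted(by_date)  # ISO date strings sort chronologically
    ]
    return products, wide


products, wide_rows = long_to_wide(long_rows)
```

In the wide layout each product column is its own series, so any one of them can serve as the forecast target while the others are available as candidate inputs.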

@sgenzer @tftemme

    User: "Telcontar120"
    Accepted Answer
    If you don't want to aggregate any sales data, then aren't you back to forecasting sales for each individual item (or at whatever level of granularity your data currently exists)? I thought that was what you said you did not want to do in the OP. RapidMiner will certainly support it, though: you just need to iterate through the items and provide a different target forecast attribute each time.
    If that isn't correct, you'll probably need to post a sample data file to make clearer what exactly you are trying to accomplish.
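The "iterate and switch the target each time" idea can be sketched outside RapidMiner as a simple loop. This is only an illustration in plain Python: the moving-average forecaster is a stand-in baseline, not the model anyone in this thread used, and the series names and values are hypothetical.

```python
def moving_average_forecast(history, window=3):
    """Naive baseline: forecast the next value as the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


# Hypothetical data: one list of sales per product, aligned by date.
series_by_product = {
    "flathead_short_yellow": [12, 9, 11, 10],
    "torx_medium_purple": [5, 7, 6, 8],
}

# Each pass treats a different product as the forecast target.
forecasts = {}
for product, history in series_by_product.items():
    forecasts[product] = moving_average_forecast(history, window=3)
```

Swapping the baseline for a real learner keeps the same shape: one pass of the loop per target series.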

    User: "hughesfleming68"
    Accepted Answer
    Hi Noel,

    I see what you are trying to do. In most cases simpler is better. Treat each ID as an independent prediction and try to determine which of your attributes actually contains any signal. Select the attribute that you feel is contributing the most and, with a series of joins, build a table that consists of your assets and one windowed attribute, then run that through your cross validation. A real-world example would be using data from sector ETFs to predict overall market direction. Remember to set your cross validation to linear sampling or, better still, use a sliding window validation so that the model is never tested on data older than its training data. Also take a look at your normalization. If you normalize each asset first and then combine them, you put them all on the same scale and lose the relationship between their levels. You might want this, but there are cases where you might not.
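For anyone unsure what a sliding window validation does, here is a minimal sketch in plain Python. It is not RapidMiner's operator; the function name and its parameters are made up for illustration. The point is only that every test window lies strictly after its training window.

```python
def sliding_window_splits(n_samples, train_size, test_size, step=None):
    """Return (train_indices, test_indices) pairs that move forward through time."""
    step = step or test_size  # by default, advance by one test window per split
    splits = []
    start = 0
    while start + train_size + test_size <= n_samples:
        train_idx = list(range(start, start + train_size))
        test_idx = list(range(start + train_size, start + train_size + test_size))
        splits.append((train_idx, test_idx))
        start += step
    return splits


# Ten time-ordered rows, train on 4 and test on the next 2.
splits = sliding_window_splits(n_samples=10, train_size=4, test_size=2)
for train_idx, test_idx in splits:
    print("train", train_idx, "test", test_idx)
```

A shuffled cross validation would mix future rows into the training set, which is the leak that linear sampling or a sliding window avoids.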

    I am not sure that combining the attributes the way you are suggesting will give you the results you are looking for. Working up from the simplest model is always best, as it is already hard to separate signal from noise.

    Be aware that differencing a series in order to make it stationary may actually result in over-differencing. A partial solution is to use fractional differentiation together with the Augmented Dickey-Fuller (ADF) test to estimate how much differencing is actually necessary to achieve a stationary time series. This may or may not be necessary, but it is worth investigating whether it gives you better results. PM me if you would like the Python code to test this. Rather than using ADF tests, I prefer to loop over a range of values for the fractional differentiation order and see what effect each has on my prediction. RapidMiner is great for this kind of testing.
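As a rough illustration of the loop described above (this is not Alex's code, just a minimal sketch in plain Python), fixed-window fractional differencing can be written with the binomial weights of (1 - B)^d, where B is the backshift operator. The function names, the window length, and the sample values are all assumptions made for the example.

```python
def frac_diff_weights(d, n_weights):
    """Binomial-expansion weights of (1 - B)^d, starting with the lag-0 weight."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


def frac_diff(series, d, window):
    """Fixed-window fractional differencing; the first window-1 points are dropped."""
    w = frac_diff_weights(d, window)
    out = []
    for t in range(window - 1, len(series)):
        # w[0] multiplies the current value, w[k] the value k steps back.
        out.append(sum(w[k] * series[t - k] for k in range(window)))
    return out


series = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]  # made-up values

# Try several orders; d = 0 leaves the series as is, d = 1 is ordinary differencing.
results = {}
for d in (0.0, 0.25, 0.5, 0.75, 1.0):
    results[d] = frac_diff(series, d, window=3)
```

In practice each transformed series would be fed to the forecasting model and the prediction error compared across values of d; that step is left out here because it depends on the model being used.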

    Regards,

    Alex