"Time Series Analysis with two sources of data"

drobertson123 New Altair Community Member
edited November 5 in Community Q&A
Hello

I am relatively new to RapidMiner but have been doing complex data analysis for years. I am trying to get up to speed for a project and am hoping someone can point me in the right direction.

My project consists of analysing two time series data files. The first file contains inventory changes and transaction data from multiple sources. Each time a change happens, a new line is logged containing a snapshot of all the different data sources. The data is logged at the moment of the change and is not normalized to any fixed time period, so the observations are irregularly spaced in time.

The second file is a timestamped listing of quoted price changes. Once again, this data is irregularly spaced in time.

Based on the information I have, I would like to determine whether I have any ability to predict the following things:

1.  Will there be a quoted price change in the next X period?
2.  What direction/magnitude will the price change be?
3.  Will the price be stable or revert back?

I am not looking for anyone to do my project for me, but I would greatly appreciate someone pointing me in the right direction. Being new to RapidMiner, I am not sure where to find good examples or whether this is even possible. Any advice or links to similar projects would be greatly appreciated. If I am successful, I will happily share my approach in case it is useful to others.

Thanks for the help,
Doug Robertson

Answers

  • IngoRM
    IngoRM New Altair Community Member
    Hi Doug,

    Good news first: in principle, building models with RapidMiner for those tasks is definitely possible. Of course, you are perfectly aware that this holds only if there is something to be predicted at all. If there is no pattern at all, or if the selected modeling schemes do not match the patterns, the results will be bad. That's always the case (but it cannot be repeated often enough  ;) )

    Of course, I cannot go into too much detail here, but here are some general hints:
    • the first task would of course be to merge both data files in a way that lets you identify price changes within time periods. This can be a bit tricky, but you should be fine if you first aggregate the irregular information to common time periods of a fixed length, do the same with the prices, and then join both according to those periods
    • define labels (the "target variables" in RapidMiner) according to the three questions stated in your post
    • use a windowing approach to transform the data into a form where you can use the classification and regression schemes provided by RapidMiner (this is the RapidMiner way of time series forecasting; searching for "windowing" here in the forum will probably deliver many results)
    Hope that gives you at least an idea of which terms you should look for.

    All the best,
    Ingo
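
For readers who want to see the three hints above (aggregate to fixed periods and join, define labels, apply windowing) in concrete form, here is a minimal sketch in Python with pandas rather than RapidMiner operators. It is only an illustration under assumptions: the column names `timestamp`, `qty`, and `price`, the function name `build_windowed_dataset`, the 5-minute period, and the window size are all hypothetical and do not come from the thread. It covers only the first two questions (whether the price changes in the next period, and by how much); a label for the stable-or-revert question would need its own definition.

```python
import pandas as pd


def build_windowed_dataset(inventory, prices, period="5min", window=3):
    """Aggregate two irregular series to fixed periods, join, label, and window.

    Assumed (hypothetical) inputs:
      inventory: DataFrame with datetime column "timestamp" and numeric "qty"
      prices:    DataFrame with datetime column "timestamp" and numeric "price"
    """
    # Step 1a: aggregate irregular inventory snapshots to fixed periods.
    # Keep the last snapshot in each period; carry it forward over empty periods.
    inv = inventory.set_index("timestamp")["qty"].resample(period).last().ffill()

    # Step 1b: do the same for the quoted prices.
    px = prices.set_index("timestamp")["price"].resample(period).last().ffill()

    # Step 1c: join both series on the common period index (overlap only).
    df = pd.concat([inv, px], axis=1, join="inner")

    # Step 2: regression label = price change from this period to the next one.
    df["label_delta"] = df["price"].shift(-1) - df["price"]

    # Step 3: windowing. Each row gets the previous `window` values as features.
    for lag in range(1, window + 1):
        df[f"qty_lag{lag}"] = df["qty"].shift(lag)
        df[f"price_lag{lag}"] = df["price"].shift(lag)

    # Drop rows without a full window of history or without a next period.
    df = df.dropna()

    # Classification label = 1 if the quoted price changes in the next period.
    df["label_change"] = (df["label_delta"] != 0).astype(int)
    return df
```

Each row of the result is one fixed-length period with lagged inventory and price values as features, so any ordinary classifier (for `label_change`) or regressor (for `label_delta`) can be trained on it. Taking the last value per period is one aggregation choice among several; sums, counts of changes, or min/max per period may suit transaction data better.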