Multivariate time series simulation in RapidMiner?
vpmail
I'm just getting started with RapidMiner (RM). I'm an R expert but a total newbie to RM. The problem at hand is analysing and forecasting the dispersion of multivariate time series (in finance). The R code we have is very difficult for economists to follow. I see RM as a real alternative because of its visual, "self-documenting" style and its simplicity in ETL tasks. I fitted some models to my data as an experiment, and that was also very expressive. What are the experiences of others? Is it possible to SIMULATE a series model in RM (or in another simple, visual tool)? The looping operators were not enough to do this because (for example, in a VAR(1) setup) this kind of model simulation requires perturbing the error terms with some noise that affects the subsequent predictions. In the extreme case it would also be necessary to modify the VAR(1) parameters (to modify the base cases, e.g. the trajectory of the point prediction). It's possible in R, but not very intuitive, so it would be great to implement this kind of model in RM.
(The problem in short: /multivariate historical data/ -> /few factors/ -> /multivariate time series model fitted to these factors with endogenous and exogenous variables and lags/ -> /simulated factors/ -> /simulated data/ -> /transform and visualize dispersion of data/)
I know that RM is not designed for this kind of task; it was built for other purposes. In any case, I'm sure that we could use it for much more than as an (on-site) ETL tool.
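To make the /simulated factors/ step concrete, here is a minimal plain-Python sketch (outside RM) of simulating a VAR(1) model on two factors. Everything in it is an assumption for illustration: the function name `simulate_var1`, the coefficient values, Gaussian shocks, and the lower-triangular factor `chol` used to correlate the shocks. None of it comes from a fitted model.

```python
import random

def simulate_var1(c, A, chol, x0, steps, rng):
    """Simulate one path of x(t) = c + A x(t-1) + e(t).

    e(t) = chol * z(t), where z(t) are independent standard normals and
    chol is a lower-triangular matrix (assumed, e.g. from a Cholesky
    factorisation of the residual covariance).
    """
    k = len(x0)
    x = list(x0)
    path = [x]
    for _ in range(steps):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # Correlated shock: lower-triangular factor times independent normals.
        e = [sum(chol[i][j] * z[j] for j in range(i + 1)) for i in range(k)]
        # The shock becomes part of the state, so it affects all later steps.
        x = [c[i] + sum(A[i][j] * x[j] for j in range(k)) + e[i] for i in range(k)]
        path.append(x)
    return path

# Hypothetical two-factor parameters, not fitted to any real data.
c = [0.0, 0.0]
A = [[0.8, 0.1],
     [0.0, 0.5]]
chol = [[0.2, 0.0],
        [0.05, 0.1]]

rng = random.Random(42)
n_paths = 2000
# Keep only the state after 12 steps from each simulated path.
final_states = [simulate_var1(c, A, chol, [1.0, -1.0], 12, rng)[-1]
                for _ in range(n_paths)]
```

Changing `c` or `A` before the loop is one way to express a modified base case; the simulated factors would then still have to be mapped back to the original variables.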
Thanks, regards
Answers
Hi,
to be honest, I did not completely understand what you want to do. What do you mean by "simulating a model"? Can you please be a bit more specific?
Best,
Marius
Hi Marius,
sorry if I wasn't precise enough.
Let me show an example: a one-dimensional AR(1) process is given by X(t) = constant + A*X(t-1) + error, where t is the time index and the constant and A are parameters fitted to the data. The error has zero mean and constant variance. For the sake of simplicity, assume that the training data is spaced monthly and I want to know the process's one-year-ahead dispersion (e.g. as a histogram). In this case I would run many (e.g. ten thousand) 12-step-ahead simulations of the process, applying the estimated parameters and error terms drawn with a random number generator, and then build a histogram from the recorded 12th values X(t+12). The effects of all earlier error term realisations persist along the trajectory (within each of the ten thousand runs) of the process.
Of course I know that in this simplified case there is an analytic solution as well, but I would like to experiment with more complicated models in a very self-explanatory, visual way (which is why I tried RapidMiner).
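As a reference point, the AR(1) simulation described above might look like the following plain-Python sketch. The parameter values and the function name `simulate_ar1_endpoints` are hypothetical, and Gaussian errors are an assumption (the description only requires zero mean and constant variance). The closed-form mean and standard deviation are computed alongside for comparison.

```python
import random
import statistics

def simulate_ar1_endpoints(constant, a, sigma, x0, steps, n_paths, seed=0):
    """Run n_paths simulations of X(t) = constant + a*X(t-1) + error
    and return the value reached after `steps` steps in each run."""
    rng = random.Random(seed)
    endpoints = []
    for _ in range(n_paths):
        x = x0
        for _ in range(steps):
            # Each drawn error feeds into x and so persists in later steps.
            x = constant + a * x + rng.gauss(0.0, sigma)
        endpoints.append(x)
    return endpoints

# Hypothetical parameters; in practice these would come from the fitted model.
constant, a, sigma, x0, steps = 0.5, 0.9, 1.0, 2.0, 12

endpoints = simulate_ar1_endpoints(constant, a, sigma, x0, steps, n_paths=10_000)
mean_12 = statistics.fmean(endpoints)
std_12 = statistics.stdev(endpoints)

# Closed-form moments of X(t+12) given X(t) = x0, for comparison.
analytic_mean = constant * (1 - a ** steps) / (1 - a) + a ** steps * x0
analytic_std = (sigma ** 2 * (1 - a ** (2 * steps)) / (1 - a ** 2)) ** 0.5

print(f"simulated mean {mean_12:.3f}, analytic mean {analytic_mean:.3f}")
print(f"simulated std  {std_12:.3f}, analytic std  {analytic_std:.3f}")
```

The list `endpoints` holds the ten thousand X(t+12) values, so it can be passed to any histogram tool.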
Hope I was clearer now.
Many thanks,
Peter
Hi Peter,
as you know, in RapidMiner all standard operators are based on example sets and only work on one row (i.e. example) at a time. So if you want to apply standard methods to time series data, you have to encode the values of several points in time into one example and set the label to a future value (in your case t+12). The Windowing operator from the Series extension can do this for you, so there is no need for loops. The window_size parameter specifies how many examples of past data are encoded into each example, and the horizon parameter specifies how far to look ahead (12 in your example).
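The encoding described above can be illustrated with a small plain-Python sketch. This is not the Windowing operator's implementation; the function name `make_windows` is hypothetical, and the exact index conventions of the operator may differ from the ones assumed here.

```python
def make_windows(series, window_size, horizon):
    """Return (window, label) pairs.

    Each window holds `window_size` consecutive values; the label is the
    value `horizon` steps after the last value in the window (assumed
    convention for this sketch).
    """
    rows = []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        window = series[start:start + window_size]
        label = series[start + window_size - 1 + horizon]
        rows.append((window, label))
    return rows

series = [10, 11, 12, 13, 14, 15]
rows = make_windows(series, window_size=2, horizon=1)
for window, label in rows:
    print(window, "->", label)
```

Each resulting row is an ordinary example that a standard learner can handle; a series shorter than window_size + horizon produces no rows.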
Does this make sense for your task?
Happy mining!
Marius
Hello Marius,
thank you for replying. The model is already fitted; I'll paste the XML at the end of this post. The next step is harder: I would like to run many (e.g. ten thousand) 12-step-ahead simulations by applying the fitted model. The Predict Series operator looks like it would be an elegant solution, but, if I'm not mistaken, I cannot use random numbers as error terms with it.
Is this perhaps not possible in RM?
Best regards,
Peter
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="648" width="1036">
<operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve" width="90" x="53" y="419">
<parameter key="repository_entry" value="../data/yc_ts"/>
</operator>
<operator activated="true" class="normalize" compatibility="5.2.008" expanded="true" height="94" name="Normalize" width="90" x="112" y="210"/>
<operator activated="true" class="principal_component_analysis" compatibility="5.2.008" expanded="true" height="94" name="PCA" width="90" x="179" y="75">
<parameter key="dimensionality_reduction" value="fixed number"/>
<parameter key="number_of_components" value="3"/>
</operator>
<operator activated="true" class="series:windowing" compatibility="5.2.000" expanded="true" height="76" name="Windowing" width="90" x="313" y="75">
<parameter key="horizon" value="1"/>
<parameter key="window_size" value="2"/>
</operator>
<operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role" width="90" x="447" y="30">
<parameter key="name" value="pc_1-0"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles">
<parameter key="pc_2-0" value="label2"/>
<parameter key="pc_3-0" value="label3"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="514" y="210">
<parameter key="attributes" value="pc_1-0|pc_1-1|Dátum"/>
</operator>
<operator activated="true" class="vector_linear_regression" compatibility="5.2.008" expanded="true" height="76" name="Vector Linear Regression" width="90" x="648" y="165"/>
<connect from_op="Retrieve" from_port="output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="PCA" to_port="example set input"/>
<connect from_op="PCA" from_port="example set output" to_op="Windowing" to_port="example set input"/>
<connect from_op="Windowing" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Vector Linear Regression" to_port="training set"/>
<connect from_op="Vector Linear Regression" from_port="model" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Hi, where do you want to insert the error terms? If I understand you correctly, you want to apply a model and add the error to the prediction? That would, in my opinion, not make much sense. I would rather add noise to the training data, such that your model is not perfect and thus contains some errors because of the non-perfect input data.
However, you can use the Add Noise operator for both tasks: for noise prior to model learning, add the noise to the label; if you really want to add artificial noise to the predictions, do it after applying the regression model. Which brings us to the next topic:
Simply pass the output of the regression operator to an Apply Model operator. Additionally pass in the test data to that operator. The output will contain the original data plus a prediction attribute with the values estimated by the model.
To know if your chosen learning algorithm is suited for the data, you should evaluate it with the X-Validation. Additionally, for regression tasks I like to visualize the outcome of the model by plotting the prediction versus the true label. If you don't have a dedicated test set, use X-Prediction to avoid applying the model to the training data.
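The idea behind cross-validated predictions can be sketched in plain Python as follows. This only illustrates the general technique, not the X-Validation or X-Prediction operators themselves; the synthetic data, the one-variable linear model, the helper names and the interleaved fold assignment are all assumptions made for the example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def cross_val_predict(xs, ys, k):
    """Predict every row with a model that was trained without that row's fold."""
    n = len(xs)
    preds = [None] * n
    for fold in range(k):
        # Interleaved folds: row i belongs to fold i % k (an assumption).
        train = [i for i in range(n) if i % k != fold]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in range(fold, n, k):
            preds[i] = intercept + slope * xs[i]
    return preds

# Synthetic data, made up for this sketch only.
rng = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

preds = cross_val_predict(xs, ys, k=5)
rmse = (sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)) ** 0.5
print(f"cross-validated RMSE: {rmse:.3f}")
```

Plotting `preds` against `ys` gives the prediction-versus-true-label view mentioned above. For time series, folds that respect the time order may be more appropriate than interleaved ones.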
Happy Mining!
Marius
Hi Marius,
I'll do some experiments this weekend, thanks,
regards,
Peter