Hello all,
I'm new to RapidMiner and I'm having trouble arranging my operators to carry out univariate time series forecasting. I need some help.
I have a dataset consisting of a single attribute, the amount of beer produced each month. There are approximately 476 rows in the dataset, and each row represents the beer production for one month. I manually divided this dataset into 70% for training and 30% for testing. After that, I set up the operators in RapidMiner as follows:
- Apply the Series2WindowExamples operator to window the series.
- Let a learner (such as NeuralNet or LibSVMLearner) produce a model from the training data. This is done inside a cross-validation scheme.
- Assuming the steps above give me a correct model, load my testing dataset (the 30% portion) and apply the stored model to it.
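To make the windowing step concrete, here is how I understand it, sketched in plain Python rather than RapidMiner. This is only my assumption of what windowing does (each example is the previous `window_size` values, and the label is the next value); the function name `make_windows` and the sample numbers are made up for illustration and are not my real data.

```python
def make_windows(series, window_size):
    """Turn a 1-D series into (window, next_value) training examples.

    Assumption: a forecasting horizon of one step, i.e. the label is the
    value immediately following the window.
    """
    examples = []
    for start in range(len(series) - window_size):
        window = series[start:start + window_size]
        label = series[start + window_size]
        examples.append((window, label))
    return examples


# Made-up monthly values, only to show the shape of the result.
production = [93.2, 96.0, 95.2, 77.1, 70.9, 64.8, 70.1, 77.3]

for window, label in make_windows(production, window_size=3):
    print(window, "->", label)
```

With 8 values and a window of 3, this gives 5 examples, so the windowed training set is always a little shorter than the raw series.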
My XML code looks more or less like this:
<operator name="Root" class="Process" expanded="yes">
    <operator name="ExcelExampleSource" class="ExcelExampleSource">
        <parameter key="excel_file" value="C:\Documents and Settings\Wendy\Desktop\newelec.xls"/>
        <parameter key="sheet_number" value="2"/>
    </operator>
    <operator name="Series2WindowExamples" class="Series2WindowExamples">
        <parameter key="series_representation" value="encode_series_by_examples"/>
        <parameter key="window_size" value="10"/>
    </operator>
    <operator name="XValidation" class="XValidation" expanded="yes">
        <parameter key="number_of_validations" value="2"/>
        <operator name="OperatorChain" class="OperatorChain" expanded="yes">
            <operator name="NeuralNet" class="NeuralNet">
                <list key="hidden_layer_types">
                </list>
                <parameter key="training_cycles" value="1000"/>
                <parameter key="learning_rate" value="0.7"/>
                <parameter key="momentum" value="0.7"/>
            </operator>
            <operator name="ModelWriter" class="ModelWriter">
                <parameter key="model_file" value="C:\Documents and Settings\Wendy\Desktop\newelec_model.mod"/>
            </operator>
        </operator>
        <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
            <operator name="ModelApplier" class="ModelApplier">
                <list key="application_parameters">
                </list>
            </operator>
            <operator name="Performance" class="Performance">
            </operator>
        </operator>
    </operator>
    <operator name="ExcelExampleSource (2)" class="ExcelExampleSource">
        <parameter key="excel_file" value="C:\Documents and Settings\Wendy\Desktop\newelec.xls"/>
        <parameter key="sheet_number" value="3"/>
    </operator>
    <operator name="ModelLoader" class="ModelLoader">
        <parameter key="model_file" value="C:\Documents and Settings\Wendy\Desktop\newelec_model.mod"/>
    </operator>
    <operator name="ModelApplier (2)" class="ModelApplier">
        <list key="application_parameters">
        </list>
    </operator>
</operator>
Now that I come to think of it, I believe I have arranged the operators wrongly, but I don't know the correct way to do it. Regarding the cross-validation operator, I've read some threads in the forum and found out that it can lead to the model being trained on values that come after the values it is supposed to forecast. I guess I would have to use SlidingWindowValidation instead.
I've also realized that there's something wrong with my testing part (the one that begins with ExcelExampleSource (2), followed by ModelLoader and ModelApplier (2)). I'm supposed to forecast the values for the remaining 30% of the original dataset, but this testing dataset already contains the actual values. What I actually need is to compare the forecasts with these actual values at the end.
I'm so confused about this. Should I actually not divide my original dataset? How do I let RapidMiner learn the model by feeding the first 70% of the data so that it can produce forecast values of the following 30% of the data?
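In case it helps to show what I'm after, here is a plain-Python sketch (not RapidMiner) of the evaluation I have in mind: split the series chronologically, then forecast each month of the last 30% using only the values that came before it, and compare each forecast with the actual value. The names `split_series`, `window_mean_forecast` and `evaluate_holdout` are hypothetical, and the "model" is just the mean of the last window, a placeholder standing in for a trained learner such as NeuralNet. I'm also assuming one-step-ahead forecasting, where each actual value becomes available before the next forecast is made.

```python
def split_series(series, train_fraction=0.7):
    """Chronological split: the first part is for training, the rest for testing."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def window_mean_forecast(window):
    """Placeholder model: forecast the next value as the mean of the window."""
    return sum(window) / len(window)


def evaluate_holdout(series, window_size, train_fraction=0.7):
    """Mean absolute error of one-step-ahead forecasts over the test part."""
    train, test = split_series(series, train_fraction)
    history = list(train)
    abs_errors = []
    for actual in test:
        # The forecast sees only values that precede the month being forecast.
        forecast = window_mean_forecast(history[-window_size:])
        abs_errors.append(abs(forecast - actual))
        # The actual value is revealed only after the forecast has been made.
        history.append(actual)
    return sum(abs_errors) / len(abs_errors)
```

With 476 rows and a 0.7 fraction, this split gives 333 training rows and 143 test rows. The point of the loop is that the actual test values are used only for comparison and as history for later forecasts, never for fitting.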
I'm sorry if anything in my post is unclear. I've searched this forum's archive on the subject and I still don't understand. I'd be very grateful if anyone could help. (Sorry for the very long post ;p)
Thanks in advance,
Wendy