Tomorrow and the day after tomorrow..
walden21
New Altair Community Member
Hi there!
I ran a validation and test job for stock price forecasting, as described below.
Could you tell me whether there is anything wrong with my understanding?
(1) Data: I have a 780 x 5 data set (like the one below).
-------------------------------------------------------------------------------
Date         ND_C    DJ_C    KSP_O   PL
2006-03-30    0.13   -0.58    0.09    2.80
2006-03-31   -0.04   -0.37    1.85    1.55
2006-04-03   -0.13    0.32    1.04    1.05
2006-04-04    0.37    0.53    0.67    1.05
...
2009-06-01    0.06    3.02    2.57   -3.35
-------------------------------------------------------------------------------
(2) Validation: I trained my PolynomialRegression model with SlidingWindowValidation and wrote this model to a file.
(3) Test: I loaded that model and applied it to the SAME data set that was used in step (2).
Here's the XML for validation:
<operator name="Root" class="Process" expanded="yes">
<operator name="ExcelExampleSource" class="ExcelExampleSource">
<parameter key="excel_file" value="C:\NDDJ_3cls.xls"/>
<parameter key="sheet_number" value="2"/>
<parameter key="first_row_as_names" value="true"/>
<parameter key="create_label" value="true"/>
<parameter key="label_column" value="5"/>
<parameter key="create_id" value="true"/>
</operator>
<operator name="ExampleVisualizer" class="ExampleVisualizer">
</operator>
<operator name="SlidingWindowValidation" class="SlidingWindowValidation" expanded="yes">
<parameter key="training_window_width" value="75"/>
<parameter key="training_window_step_size" value="1"/>
<parameter key="test_window_width" value="1"/>
<operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
<operator name="PolynomialRegression" class="PolynomialRegression">
</operator>
<operator name="ModelWriter" class="ModelWriter">
<parameter key="model_file" value="C:\DJ_NN_SW.mod"/>
</operator>
</operator>
<operator name="OperatorChain" class="OperatorChain" expanded="yes">
<operator name="ModelApplier" class="ModelApplier">
<list key="application_parameters">
</list>
</operator>
<operator name="Performance" class="Performance">
</operator>
</operator>
</operator>
</operator>
Here's the XML for test:
<operator name="Root" class="Process" expanded="yes">
<operator name="ExcelExampleSource" class="ExcelExampleSource">
<parameter key="excel_file" value="C:\NDDJ_3cls.xls"/>
<parameter key="sheet_number" value="3"/>
<parameter key="first_row_as_names" value="true"/>
<parameter key="label_column" value="4"/>
<parameter key="create_id" value="true"/>
</operator>
<operator name="ModelLoader" class="ModelLoader">
<parameter key="model_file" value="C:\DJ_NN_SW.mod"/>
</operator>
<operator name="ModelApplier" class="ModelApplier">
<list key="application_parameters">
</list>
</operator>
</operator>
(4) SlidingWindow parameters:
1. Training_Window_Width: 75
2. Training_Window_Step_size: 1
3. Test_window_width: 1
4. Horizon: 1
Did I use "the day after tomorrow's data" to predict "tomorrow's price" in the test process, even after the 75th data point?
Answers
Hi there Walden21,
No, but you did use data that comes after the test date to train the model, except on the last test example. The model that gets used in your phase 3 is the one written in the final iteration, trained on the 75 examples immediately before the last one. So if you had 1000 examples, it would have been trained on examples 925-999 and validated on example 1000. If you give the examples IDs and put breakpoints before the learner and ModelApplier operators, you can see the point.
Hope that doesn't make things more confusing...
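To make the window arithmetic concrete, here is a minimal Python sketch that just enumerates the index ranges. It is not RapidMiner code; the function name `sliding_windows` is made up for illustration, and it assumes that a horizon of 1 means the test window starts directly after the last training example.

```python
def sliding_windows(n_examples, train_width=75, step=1, test_width=1, horizon=1):
    """Yield (train_range, test_range) pairs as 1-based inclusive index ranges.

    Assumption: the test window starts `horizon` examples after the last
    training example, so horizon=1 means "the very next example".
    """
    start = 1
    while True:
        train_end = start + train_width - 1
        test_start = train_end + horizon
        test_end = test_start + test_width - 1
        if test_end > n_examples:
            break
        yield (start, train_end), (test_start, test_end)
        start += step


windows = list(sliding_windows(1000))
first_train, first_test = windows[0]
last_train, last_test = windows[-1]

print("first iteration:", first_train, "->", first_test)
print("last iteration: ", last_train, "->", last_test)
print("iterations:     ", len(windows))
```

Under these assumptions, the model written in the last iteration has seen everything up to the second-to-last example. Applying it to an early row of the same data set therefore predicts a date that lies well before the end of its training window.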
I am not sure what you are actually trying to achieve, but it looks like your processes are not doing what you expect them to do.
First, the ModelWriter will be executed for each iteration of the SlidingWindowValidation and since you are using a constant file name, your model will be overwritten again and again. Your second process will read only the result of the last iteration. To avoid this behaviour, you can use %{a} in the filename to append the iteration number to the filename. In that case, you will end up with several models, so you have to modify your second process.
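The overwriting effect is easy to reproduce outside RapidMiner. The following Python sketch only imitates the behaviour: `run_validation` is a hypothetical stand-in for the validation loop, and the `{a}` placeholder plays the role of the `%{a}` macro. It does not call any RapidMiner API.

```python
import tempfile
from pathlib import Path


def run_validation(out_dir, name_template, n_iterations=5):
    """Pretend to train one model per iteration and write it to disk.

    name_template may contain "{a}", used here as a stand-in for the
    iteration number that %{a} would insert into the file name.
    """
    for i in range(1, n_iterations + 1):
        path = Path(out_dir) / name_template.format(a=i)
        path.write_text(f"model trained in iteration {i}")


# Constant file name: every iteration overwrites the previous model.
with tempfile.TemporaryDirectory() as tmp:
    run_validation(tmp, "DJ_NN_SW.mod")
    constant = sorted(p.name for p in Path(tmp).iterdir())
    survivor = (Path(tmp) / "DJ_NN_SW.mod").read_text()

# Iteration number in the file name: one file per iteration.
with tempfile.TemporaryDirectory() as tmp:
    run_validation(tmp, "DJ_NN_SW_{a}.mod")
    numbered = sorted(p.name for p in Path(tmp).iterdir())

print(constant)
print(survivor)
print(numbered)
```

With the constant name only one file is left, and it holds the model from the final iteration. With the numbered name all five files are kept.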
Apart from that, you are not really training on a time series: your data contains one entry for each point in time, so each example carries no information about the preceding days. To transform this series into windows, you can use, e.g., the MultivariateSeries2WindowExamples operator.
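As a rough illustration of what windowing means, here is a small Python sketch. It is not the implementation of MultivariateSeries2WindowExamples; the function `series_to_windows`, the window width of 2, and the choice of PL as the label column are assumptions made for the example. The four rows are the first four rows of the data shown in the question.

```python
def series_to_windows(rows, window_width, label_index, horizon=1):
    """Turn a multivariate series into windowed examples.

    rows: list of equal-length value lists, oldest first.
    Each example consists of `window_width` consecutive rows flattened
    into one feature vector, plus the label column taken `horizon` rows
    after the end of the window.
    """
    examples = []
    last_start = len(rows) - window_width - horizon
    for start in range(last_start + 1):
        window = rows[start:start + window_width]
        features = [value for row in window for value in row]
        label = rows[start + window_width + horizon - 1][label_index]
        examples.append((features, label))
    return examples


# Columns: ND_C, DJ_C, KSP_O, PL (first four rows of the posted data).
series = [
    [0.13, -0.58, 0.09, 2.80],   # 2006-03-30
    [-0.04, -0.37, 1.85, 1.55],  # 2006-03-31
    [-0.13, 0.32, 1.04, 1.05],   # 2006-04-03
    [0.37, 0.53, 0.67, 1.05],    # 2006-04-04
]

examples = series_to_windows(series, window_width=2, label_index=3)
for features, label in examples:
    print(features, "->", label)
```

Each resulting example now contains the values of two consecutive days as attributes and the PL value of the following day as its label, which is the kind of input a learner needs to pick up temporal structure.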
Best,
Simon