"Prediction accuracy problem"

Hi everyone, this is my first post. My problem is predicting stock movements. I created an Excel spreadsheet with daily closing prices for 3 stocks and a label to learn against, which takes the values OUT, LONG and SHORT.
I run the prediction and get 85% accuracy:
               true out   true long   true short
pred. out          1626          85           73
pred. long           77         660           93
pred. short          62          73          433
class recall     92.12%      80.68%       72.29%
I then run the saved model on 1 year of test data that was not used for training, and the predictions match the true values only 53% of the time.
What have I done wrong?

I can post the XML files and the Excel sheet if required, or answer in more detail if requested.
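
Editor's note: the figures in the confusion matrix above can be checked with a few lines of arithmetic. The sketch below is plain Python, not part of the original RapidMiner process, and the variable names are illustrative. It also computes the share of the largest class, which is roughly what a model that always predicts OUT would score on this data; the class mix in the separate test year is not given in the post, so treat that comparison as indicative only.

```python
# Confusion matrix from the post: rows are predictions, columns are true classes.
classes = ["out", "long", "short"]
matrix = [
    [1626, 85, 73],   # pred. out
    [77, 660, 93],    # pred. long
    [62, 73, 433],    # pred. short
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(classes)))
accuracy = correct / total

# Recall for a class = correctly predicted examples / all true examples of that class.
true_counts = [sum(row[j] for row in matrix) for j in range(len(classes))]
recall = {name: matrix[j][j] / true_counts[j] for j, name in enumerate(classes)}

# Accuracy of a trivial model that always predicts the most frequent class.
majority_baseline = max(true_counts) / total

print(f"examples: {total}, accuracy: {accuracy:.2%}")
for name, value in recall.items():
    print(f"recall {name}: {value:.2%}")
print(f"majority-class baseline: {majority_baseline:.2%}")
```

This reproduces the 85% accuracy and the three recall values from the table (3182 examples in total), and puts the majority-class baseline at about 55%, close to the 53% seen on the held-out year.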

haddock

Welcome to the whacky world of RM!

Without seeing the XML and data it is almost impossible to give a useful answer. That being said, my experience is that, when it comes to financial prediction, the more realistic the setup, the lower the accuracy, dammit >:( You can check out globestreetjournal.com to see what I mean.

c1borg

I was going to attach the files but can't work out how to, so here are the 2 XML files:

<?xml version="1.0" encoding="windows-1252"?>
<process version="4.4">

<operator name="Root" class="Process" expanded="yes">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="encoding" value="SYSTEM"/>
<operator name="ExcelExampleSource" class="ExcelExampleSource">
<parameter key="excel_file" value="C:\Files\Rapidminer system\OS Prediction Daily\GoldOSinput.xls"/>
<parameter key="sheet_number" value="1"/>
<parameter key="row_offset" value="0"/>
<parameter key="column_offset" value="0"/>
<parameter key="first_row_as_names" value="true"/>
<parameter key="create_label" value="true"/>
<parameter key="label_column" value="5"/>
<parameter key="create_id" value="true"/>
<parameter key="id_column" value="1"/>
<parameter key="decimal_point_character" value="."/>
<parameter key="datamanagement" value="double_array"/>
</operator>
<operator name="ExampleVisualizer" class="ExampleVisualizer" breakpoints="after">
</operator>
<operator name="XValidation" class="XValidation" expanded="yes">
<parameter key="keep_example_set" value="false"/>
<parameter key="create_complete_model" value="false"/>
<parameter key="average_performances_only" value="true"/>
<parameter key="leave_one_out" value="false"/>
<parameter key="number_of_validations" value="10"/>
<parameter key="sampling_type" value="stratified sampling"/>
<parameter key="local_random_seed" value="-1"/>
<operator name="OperatorChain" class="OperatorChain" expanded="yes">
<operator name="W-IBk" class="W-IBk">
<parameter key="keep_example_set" value="false"/>
<parameter key="I" value="false"/>
<parameter key="F" value="false"/>
<parameter key="K" value="1.0"/>
<parameter key="E" value="false"/>
<parameter key="W" value="0.0"/>
<parameter key="X" value="false"/>
<parameter key="A" value="weka.core.neighboursearch.LinearNNSearch -A &quot;weka.core.EuclideanDistance -R first-last&quot;"/>
</operator>
<operator name="ModelWriter" class="ModelWriter">
<parameter key="model_file" value="C:\Files\Rapidminer system\OS Prediction Daily\OS Prediction Daily.mod"/>
<parameter key="overwrite_existing_file" value="true"/>
<parameter key="output_type" value="XML Zipped"/>
</operator>
</operator>
<operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
<operator name="ModelApplier" class="ModelApplier">
<parameter key="keep_model" value="false"/>
<list key="application_parameters">
</list>
<parameter key="create_view" value="false"/>
</operator>
<operator name="PerformanceEvaluator" class="PerformanceEvaluator">
<parameter key="keep_example_set" value="false"/>
<parameter key="main_criterion" value="first"/>
<parameter key="root_mean_squared_error" value="false"/>
<parameter key="absolute_error" value="true"/>
<parameter key="relative_error" value="true"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="true"/>
<parameter key="squared_correlation" value="true"/>
<parameter key="prediction_average" value="false"/>
<parameter key="prediction_trend_accuracy" value="false"/>
<parameter key="AUC" value="false"/>
<parameter key="cross-entropy" value="false"/>
<parameter key="margin" value="false"/>
<parameter key="soft_margin_loss" value="false"/>
<parameter key="logistic_loss" value="false"/>
<parameter key="accuracy" value="true"/>
<parameter key="classification_error" value="true"/>
<parameter key="kappa" value="false"/>
<parameter key="weighted_mean_recall" value="false"/>
<parameter key="weighted_mean_precision" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
<list key="class_weights">
</list>
</operator>
</operator>
</operator>
</operator>

</process>

<?xml version="1.0" encoding="windows-1252"?>
<process version="4.4">

<operator name="Root" class="Process" expanded="yes">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="encoding" value="SYSTEM"/>
<operator name="ExcelExampleSource" class="ExcelExampleSource">
<parameter key="excel_file" value="C:\Files\Rapidminer system\OS Prediction Daily\GoldOSinput.xls"/>
<parameter key="sheet_number" value="2"/>
<parameter key="row_offset" value="0"/>
<parameter key="column_offset" value="0"/>
<parameter key="first_row_as_names" value="true"/>
<parameter key="create_label" value="false"/>
<parameter key="label_column" value="1"/>
<parameter key="create_id" value="true"/>
<parameter key="id_column" value="1"/>
<parameter key="decimal_point_character" value="."/>
<parameter key="datamanagement" value="double_array"/>
</operator>
<operator name="ModelLoader" class="ModelLoader">
<parameter key="model_file" value="C:\Files\Rapidminer system\OS Prediction Daily\OS Prediction Daily.mod"/>
</operator>
<operator name="ModelApplier" class="ModelApplier">
<parameter key="keep_model" value="false"/>
<list key="application_parameters">
</list>
<parameter key="create_view" value="false"/>
</operator>
</operator>

</process>

If you can point me in the right direction so I can post attachments, I will do that.

haddock

Hi,

I'll take a closer look tomorrow, but one point is clear: stratified sampling cannot be right for the validation, because the folds are drawn without regard to date, so you could end up training on the future if you think about it. Try sliding window validation instead.

Gottarush, cheers.
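
Editor's note: a minimal sketch of the "training on the future" point, in plain Python rather than RapidMiner. It assumes the examples are ordered by date and uses a simplified shuffled 10-fold split; real stratified sampling also balances the class labels per fold, which is ignored here. The function names, the seed and the example count (3182, the total from the confusion matrix) are illustrative choices, not taken from the original process.

```python
import random


def shuffled_folds(n_examples, n_folds, seed):
    """Yield (train, test) index lists for a shuffled k-fold split.

    Index i stands for the i-th trading day, so a larger index means a later date.
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


def future_share(train, test):
    """Fraction of training examples dated after the earliest test example."""
    first_test = min(test)
    return sum(1 for i in train if i > first_test) / len(train)


shares = [future_share(train, test)
          for train, test in shuffled_folds(n_examples=3182, n_folds=10, seed=2001)]
for k, share in enumerate(shares, start=1):
    print(f"fold {k}: {share:.1%} of training days fall after the first test day")
```

In every fold most of the training days lie after the first test day, so the model is tested on days that sit between days it has already seen.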

c1borg

OK, many thanks. If you need the data file, let me know, but I might have to email it to you, as I can't find the attachment option.

haddock

G'Day c1borg!

No need to send in the data. The core of your problem is that the results of validating your model are so different from the results you get when you apply it to unseen data. Applying the model is fine, so you need to concentrate on the validation end. Validation splits the data into training and test sets, making the model from the former and applying it to the latter. So the key notion is to make sure that this splitting is done sensibly.

For your problem you need to be certain that the training is done on examples that occur before the examples to be tested. If you check out http://en.wikipedia.org/wiki/Stratified_sampling you will see that stratified sampling does not guarantee this. However, sliding a window down your examples ensures that the model is never trained on examples later than the ones it is tested on, so that would be a possibility.

Happy mining, and good luck!
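
Editor's note: a minimal sketch of the sliding window idea described above, in plain Python rather than as a RapidMiner operator. It assumes the examples are sorted by date; the function name and the window sizes (1000 training days, 250 test days) are hypothetical values chosen only for illustration, and 3182 is the example count from the confusion matrix earlier in the thread.

```python
def sliding_window_splits(n_examples, train_size, test_size, step=None):
    """Yield (train, test) index ranges for date-ordered examples.

    Each test window starts immediately after its training window, so every
    training example is older than every test example in the same split.
    """
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_examples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


splits = list(sliding_window_splits(n_examples=3182, train_size=1000, test_size=250))
for train, test in splits:
    # The newest training day always precedes the oldest test day.
    assert train[-1] < test[0]
    print(f"train days {train[0]}-{train[-1]}, test days {test[0]}-{test[-1]}")
```

Averaging the accuracy over these windows should give an estimate closer to what the model achieves on a genuinely unseen year than a shuffled or stratified cross-validation does, although how close depends on the data.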

c1borg

OK, many thanks, I will take your advice.