"Prediction accuracy problem"
c1borg
Hi everyone, this is my first post. I am trying to predict stock movements. I created an Excel spreadsheet with daily closing prices for 3 stocks and a label column to learn against, with the values OUT, LONG and SHORT.
When I run the prediction I get 85% accuracy:
                true out    true long    true short
pred. out           1626           85            73
pred. long            77          660            93
pred. short           62           73           433
class recall      92.12%       80.68%        72.29%
I then run the saved model on 1 year of test data that was not used to build the model, and the predicted values match the true values only 53% of the time.
What have I done wrong?
I can post the XML files and the Excel sheet, or answer in more detail, if needed.
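As a quick sanity check, the figures in the confusion matrix above can be recomputed by hand. The sketch below is only an illustration (the function names are made up for this example); it assumes rows are predictions and columns are true classes, as in the table. It also computes what always predicting the most frequent class would score on the same data, which is a useful yardstick when judging a result like 53%, although the class mix of the one-year test set is not given in the post.

```python
# Confusion matrix from the post: rows are predictions, columns are true classes.
labels = ["out", "long", "short"]
confusion = [
    [1626, 85, 73],   # pred. out
    [77, 660, 93],    # pred. long
    [62, 73, 433],    # pred. short
]


def accuracy(matrix):
    """Fraction of examples on the diagonal (prediction equals true class)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def class_recall(matrix):
    """Per true class: correctly predicted examples / all examples of that class."""
    recalls = []
    for j in range(len(matrix)):
        column_total = sum(row[j] for row in matrix)
        recalls.append(matrix[j][j] / column_total)
    return recalls


def majority_baseline(matrix):
    """Accuracy obtained by always predicting the most frequent true class."""
    column_totals = [sum(row[j] for row in matrix) for j in range(len(matrix))]
    return max(column_totals) / sum(column_totals)


print(f"accuracy: {accuracy(confusion):.2%}")
for name, recall in zip(labels, class_recall(confusion)):
    print(f"recall for {name}: {recall:.2%}")
print(f"majority-class baseline: {majority_baseline(confusion):.2%}")
```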
haddock
Welcome to the wacky world of RM!
Without seeing the XML and data it is almost impossible to give a useful answer. That said, my experience with financial prediction is that the more realistic the setup, the lower the accuracy, dammit. You can check out globestreetjournal.com to see what I mean.
c1borg
I was going to attach the files but can't work out how to. So here are the two XML files:
<?xml version="1.0" encoding="windows-1252"?>
<process version="4.4">
<operator name="Root" class="Process" expanded="yes">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="encoding" value="SYSTEM"/>
<operator name="ExcelExampleSource" class="ExcelExampleSource">
<parameter key="excel_file" value="C:\Files\Rapidminer system\OS Prediction Daily\GoldOSinput.xls"/>
<parameter key="sheet_number" value="1"/>
<parameter key="row_offset" value="0"/>
<parameter key="column_offset" value="0"/>
<parameter key="first_row_as_names" value="true"/>
<parameter key="create_label" value="true"/>
<parameter key="label_column" value="5"/>
<parameter key="create_id" value="true"/>
<parameter key="id_column" value="1"/>
<parameter key="decimal_point_character" value="."/>
<parameter key="datamanagement" value="double_array"/>
</operator>
<operator name="ExampleVisualizer" class="ExampleVisualizer" breakpoints="after">
</operator>
<operator name="XValidation" class="XValidation" expanded="yes">
<parameter key="keep_example_set" value="false"/>
<parameter key="create_complete_model" value="false"/>
<parameter key="average_performances_only" value="true"/>
<parameter key="leave_one_out" value="false"/>
<parameter key="number_of_validations" value="10"/>
<parameter key="sampling_type" value="stratified sampling"/>
<parameter key="local_random_seed" value="-1"/>
<operator name="OperatorChain" class="OperatorChain" expanded="yes">
<operator name="W-IBk" class="W-IBk">
<parameter key="keep_example_set" value="false"/>
<parameter key="I" value="false"/>
<parameter key="F" value="false"/>
<parameter key="K" value="1.0"/>
<parameter key="E" value="false"/>
<parameter key="W" value="0.0"/>
<parameter key="X" value="false"/>
<parameter key="A" value="weka.core.neighboursearch.LinearNNSearch -A &quot;weka.core.EuclideanDistance -R first-last&quot;"/>
</operator>
<operator name="ModelWriter" class="ModelWriter">
<parameter key="model_file" value="C:\Files\Rapidminer system\OS Prediction Daily\OS Prediction Daily.mod"/>
<parameter key="overwrite_existing_file" value="true"/>
<parameter key="output_type" value="XML Zipped"/>
</operator>
</operator>
<operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
<operator name="ModelApplier" class="ModelApplier">
<parameter key="keep_model" value="false"/>
<list key="application_parameters">
</list>
<parameter key="create_view" value="false"/>
</operator>
<operator name="PerformanceEvaluator" class="PerformanceEvaluator">
<parameter key="keep_example_set" value="false"/>
<parameter key="main_criterion" value="first"/>
<parameter key="root_mean_squared_error" value="false"/>
<parameter key="absolute_error" value="true"/>
<parameter key="relative_error" value="true"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="true"/>
<parameter key="squared_correlation" value="true"/>
<parameter key="prediction_average" value="false"/>
<parameter key="prediction_trend_accuracy" value="false"/>
<parameter key="AUC" value="false"/>
<parameter key="cross-entropy" value="false"/>
<parameter key="margin" value="false"/>
<parameter key="soft_margin_loss" value="false"/>
<parameter key="logistic_loss" value="false"/>
<parameter key="accuracy" value="true"/>
<parameter key="classification_error" value="true"/>
<parameter key="kappa" value="false"/>
<parameter key="weighted_mean_recall" value="false"/>
<parameter key="weighted_mean_precision" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
<list key="class_weights">
</list>
</operator>
</operator>
</operator>
</operator>
</process>
<?xml version="1.0" encoding="windows-1252"?>
<process version="4.4">
<operator name="Root" class="Process" expanded="yes">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="encoding" value="SYSTEM"/>
<operator name="ExcelExampleSource" class="ExcelExampleSource">
<parameter key="excel_file" value="C:\Files\Rapidminer system\OS Prediction Daily\GoldOSinput.xls"/>
<parameter key="sheet_number" value="2"/>
<parameter key="row_offset" value="0"/>
<parameter key="column_offset" value="0"/>
<parameter key="first_row_as_names" value="true"/>
<parameter key="create_label" value="false"/>
<parameter key="label_column" value="1"/>
<parameter key="create_id" value="true"/>
<parameter key="id_column" value="1"/>
<parameter key="decimal_point_character" value="."/>
<parameter key="datamanagement" value="double_array"/>
</operator>
<operator name="ModelLoader" class="ModelLoader">
<parameter key="model_file" value="C:\Files\Rapidminer system\OS Prediction Daily\OS Prediction Daily.mod"/>
</operator>
<operator name="ModelApplier" class="ModelApplier">
<parameter key="keep_model" value="false"/>
<list key="application_parameters">
</list>
<parameter key="create_view" value="false"/>
</operator>
</operator>
</process>
If you can point me in the right direction so I can post attachments I will do that.
haddock
Hi,
I'll take a better look tomorrow, but one point is clear, namely that stratified sampling cannot be right for the validation, as you could end up training on the future if you think about it... try sliding window validation instead.
Gottarush, cheers.
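To make the "training on the future" point concrete, here is a minimal sketch in plain Python, not RapidMiner. It assumes the rows are sorted by date and uses a plain random fold assignment as a stand-in for stratified sampling (real stratification also balances the class counts, which does not change the point). The row count of 3182 is the total of the confusion matrix earlier in the thread, and the seed simply mirrors the process file; the function names are made up for this illustration.

```python
import random


def shuffled_folds(n_rows, k, seed=2001):
    """Assign row indices to k folds at random, ignoring their date order."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[i::k]) for i in range(k)]


def future_rows_in_training(n_rows, k, seed=2001):
    """For each fold used as the test set, count the training rows that are
    dated after the earliest test row."""
    folds = shuffled_folds(n_rows, k, seed)
    counts = []
    for test in folds:
        test_set = set(test)
        first_test = test[0]  # folds are sorted, so this is the earliest test row
        train = [i for i in range(n_rows) if i not in test_set]
        counts.append(sum(1 for i in train if i > first_test))
    return counts


counts = future_rows_in_training(n_rows=3182, k=10)
for fold, count in enumerate(counts, start=1):
    print(f"fold {fold}: {count} training rows are later than the first test row")
```

Every fold ends up with training rows that come after some of its test rows, so the model is scored on days whose near neighbours in time it has already seen.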
c1borg
OK, many thanks. If you need the data file, let me know, but I might have to email it to you as I can't find the attachment option.
haddock
G'Day c1borg!
No need to send in the data. The core of your problem is that the results of validating your model are so different from the results you get when you apply it to unseen data. Applying the model is fine, so you need to concentrate on the validation end. Validation splits the data into training and test sets, making the model from the former and applying it to the latter. So the key notion is to make sure that this splitting is done sensibly.
For your problem you need to be certain that the training is done on examples that occur before the examples to be tested. If you check out http://en.wikipedia.org/wiki/Stratified_sampling you will see that stratified sampling does not do this. However, sliding a window down your examples ensures that this cannot happen, so that would be a possibility.
Happy mining, and good luck!
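The sliding window idea can be sketched in a few lines of plain Python. This is an illustration of the general technique under stated assumptions, not RapidMiner's own operator: rows are assumed to be sorted by date, the window widths are arbitrary, the function names are hypothetical, and a majority-class guess stands in for the real learner to keep the example short.

```python
from collections import Counter


def sliding_window_splits(n_rows, train_width, test_width, step=None):
    """Yield (train, test) index ranges in which every training row
    comes before every test row."""
    if step is None:
        step = test_width  # by default, test windows do not overlap
    start = 0
    while start + train_width + test_width <= n_rows:
        train = range(start, start + train_width)
        test = range(start + train_width, start + train_width + test_width)
        yield train, test
        start += step


def walk_forward_accuracy(labels, train_width, test_width):
    """Accuracy over all windows, using the most frequent training label
    as a placeholder model."""
    correct = 0
    total = 0
    for train, test in sliding_window_splits(len(labels), train_width, test_width):
        guess = Counter(labels[i] for i in train).most_common(1)[0][0]
        correct += sum(1 for i in test if labels[i] == guess)
        total += len(test)
    return correct / total


# Toy label series, invented for the example.
toy_labels = ["OUT"] * 6 + ["LONG"] * 4
for train, test in sliding_window_splits(len(toy_labels), train_width=4, test_width=2):
    print(f"train rows {train.start}-{train.stop - 1}, test rows {test.start}-{test.stop - 1}")
print(f"walk-forward accuracy: {walk_forward_accuracy(toy_labels, 4, 2):.2%}")
```

Because each test window lies strictly after its training window, the score is measured the same way the saved model is later used on unseen data, so the two figures should be much closer to each other.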
c1borg
OK, many thanks, I will take your advice.