"Financial modeling, with a small twist"
chanelops
New Altair Community Member
Hello all,
I'm developing a forecasting model for financial securities. I've done this many times, but this will be the first time with RM.
One thing that is a bit unusual is that I want to optimize the model on total trade returns, not accuracy. Since I'm a newbie to RM, it's a bit unclear to me how to do that. I'm thinking that maybe I could use either the Performance operator or the Performance (User-Based) operator.
On the other hand, there may well be a better and easier way to do this. I'm not looking for someone to solve this problem for me, but if anyone has any hints or can provide a gentle nudge in which direction to go, it would be much appreciated.
And if there are any examples out there of someone already having done this, that would be wonderful. I have done some looking, but so far I have not found anything. (I assume that it would have to work with time series, by the way, unless someone can convince me otherwise.)
Thanks much.
Answers
-
Hi,
You could use the "Extract Performance" operator to get a performance value from a data set. If you have an attribute column that gives you the trade return of each example, you can use the "statistics" mode and enter the attribute's name. The performance will then be the sum of all returns divided by the number of trades, which should be a suitable target for optimizing the inner modelling algorithm.
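As a rough illustration of the calculation described above (sum of all returns divided by the number of trades), here is a minimal sketch in plain Python rather than RapidMiner. The function name and the sample returns are invented for illustration only.

```python
def mean_trade_return(trade_returns):
    """Average return per trade: sum of returns divided by number of trades."""
    if not trade_returns:
        raise ValueError("need at least one trade")
    return sum(trade_returns) / len(trade_returns)


# Hypothetical per-trade returns (as fractions), made up for this example.
example_returns = [0.02, -0.01, 0.03, -0.02]
score = mean_trade_return(example_returns)
print(score)
```

When the number of trades is fixed, maximizing this average is equivalent to maximizing the total return, since the two differ only by a constant factor.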
Greetings,
Sebastian
-
Hi Sebastian,
Thank you very much for the pointer. Let me get to work on it, and I'll report back.
-
I'm slightly stuck, due to a combination of ignorance and inexperience with RM. To implement Sebastian's solution (which makes sense to me), I need to have an attribute that reflects the trade profit or loss for that trade. But, of course, while I can predict with 100% accuracy what the *absolute* value of the trade profit or loss will be, determining whether that value will be a profit (positive) or a loss (negative) depends on the model's decision as to how to classify that day's trade -- did it buy or sell?
Would it be possible to use a Generate Attributes operator, and apply an If-then-else function, so depending on whether the model said to sell or buy for that day, then the sign of the trade profit/loss could be determined? If so, can the Generate Attributes operator modify an attribute for a given example depending on what the model's prediction is for that same example? Or would I have to shift the trades by a day, so that if the model says to buy yesterday, then when I see what the closing prices are today, I would write the profit or loss into today's row, instead of yesterday's? Either way would be fine, if it would work.
Or perhaps there is a better and completely different way to do this? All suggestions are appreciated.
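The sign logic asked about above can be sketched in plain Python (not RapidMiner). This assumes, purely for illustration, that the signal for day i is decided at that day's close and the profit or loss is realized at the next day's close; the function name, the "buy"/"sell" labels, and the prices are all hypothetical.

```python
def signed_trade_returns(signals, closes):
    """Return one signed return per trade.

    signals[i] is "buy" or "sell", decided at the close of day i.
    The trade's result is the price change from day i to day i + 1,
    kept as-is for a buy and sign-flipped for a sell.
    """
    returns = []
    for i in range(len(closes) - 1):
        change = (closes[i + 1] - closes[i]) / closes[i]
        direction = 1.0 if signals[i] == "buy" else -1.0
        returns.append(direction * change)
    return returns


# Hypothetical closing prices and model signals, made up for this example.
closes = [100.0, 102.0, 101.0, 104.0]
signals = ["buy", "buy", "sell"]
trade_returns = signed_trade_returns(signals, closes)
print(trade_returns)
```

Here each trade's result is stored against the day the signal was given; writing it into the following day's row instead would be the shifted variant mentioned in the question.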
-
Hello
For fun, I had a go at an artificial example.
Andrew
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.002">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.002" expanded="true" name="Process">
<process expanded="true" height="512" width="955">
<operator activated="true" class="generate_data" compatibility="5.1.002" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="target_function" value="sum"/>
</operator>
<operator activated="true" class="add_noise" compatibility="5.1.002" expanded="true" height="94" name="Add Noise" width="90" x="45" y="120">
<parameter key="label_noise" value="0.0"/>
<parameter key="default_attribute_noise" value="0.05"/>
<list key="noise"/>
</operator>
<operator activated="true" class="linear_regression" compatibility="5.1.002" expanded="true" height="94" name="Linear Regression" width="90" x="45" y="255">
<parameter key="feature_selection" value="none"/>
</operator>
<operator activated="true" class="apply_model" compatibility="5.1.002" expanded="true" height="76" name="Apply Model" width="90" x="246" y="255">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="rename" compatibility="5.1.002" expanded="true" height="76" name="Rename" width="90" x="380" y="255">
<parameter key="old_name" value="prediction(label)"/>
<parameter key="new_name" value="pred"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.1.002" expanded="true" height="76" name="Generate Attributes" width="90" x="514" y="255">
<list key="function_descriptions">
<parameter key="generatedPerformance" value="pred-(att1+att2+att3+att4+att5)"/>
</list>
<parameter key="use_standard_constants" value="false"/>
</operator>
<operator activated="true" class="extract_performance" compatibility="5.1.002" expanded="true" height="76" name="Performance" width="90" x="648" y="255">
<parameter key="performance_type" value="statistics"/>
<parameter key="attribute_name" value="generatedPerformance"/>
<parameter key="optimization_direction" value="minimize"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Add Noise" to_port="example set input"/>
<connect from_op="Add Noise" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Linear Regression" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Performance" to_port="example set"/>
<connect from_op="Performance" from_port="performance" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
-
Hi there,
If you are trying to predict N time slices ahead you should really take a peep at the 'Sliding Window Validation' operator, which allows you to vary your lookback and prediction horizon.
Once you have your time slices sorted you can measure the difference between Actual and Predicted to give a performance measure for parameter optimisation (and there will be many to tweak, so have time and pooter poke available in abundance).
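To make the lookback and prediction-horizon idea concrete, here is a minimal sketch in plain Python of how sliding train/test windows can be laid out over a time-ordered data set. It is not the 'Sliding Window Validation' operator itself; the function and its parameter names (train_size, horizon, step) are assumptions chosen for illustration and need not match the operator's parameters.

```python
def sliding_windows(n_examples, train_size, horizon, step=1):
    """Yield (train_indices, test_index) pairs over time-ordered examples.

    Each window trains on `train_size` consecutive examples and tests on
    the single example `horizon` steps after the end of the training block.
    """
    start = 0
    while start + train_size + horizon <= n_examples:
        train = range(start, start + train_size)
        test = start + train_size + horizon - 1
        yield train, test
        start += step


# Six time-ordered examples, look back 3, predict 1 step ahead.
for train, test in sliding_windows(n_examples=6, train_size=3, horizon=1):
    print(list(train), "->", test)
```

Because every test index lies after its training block, no future data leaks into training, which is the main reason to prefer this layout over random cross-validation for time series.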
As a wealth warning: I use predictions from RapidMiner as inputs to a trading platform and its strategy optimisation routines. That way I can incorporate cash management rules, stop-loss limits, etc., that are independent of the prediction mechanism, and which would be, shall we say, an interesting RM challenge for the unwary!
-
awchisholm wrote:
Hello
For fun, I had a go at an artificial example.
Andrew
Hi Andrew,
Many thanks, I'm playing with it now and will get back to you if I have questions. (that should probably be "when", not "if" !)
-
Mr. Haddock, sir,
Thanks very much for your suggestion. I actually was aware of the "Sliding Window Validation" operator, although I have yet to use it. I thought I would try some concepts on regular systems first, before venturing into the dreaded Series folder. (walk before run and all that, although in my case it's more like slither before crawling)
However, I do know that sooner or later (probably later) I will end up there, and probably wind up living out of that folder. That's the nature of making time series predictions, I guess.
I think my current approach may be similar to yours. I generate my trading signals in external programs, and then bring them into Excel, where I apply liberal doses of stops and/or profit targets, as the case may be. And then there is another external program that does position sizing/money management. While it might be nice to bring much of that into RM and have an integrated system, I doubt that is going to happen in my lifetime. (at least not if I have to do it!)
But my current system is really working OK, I just need to improve the trading signal performance a little, which is where I hope RM will come into play.
I look forward to further discussions with you on how to do this, but I will also try to refrain from being a pest.
P.S. I wasn't sure what you meant by "pooter poke", so I spent some time trying to figure out how to acquire this apparently needed item. I finally did get what you meant (or at least I think I did), but not before I found out that pooter is an "artificial flatulence maker"! See http://thepooter.com/
-
Andrew,
As promised, I'm back with questions:
1 - What is the purpose of the "Add Noise" operator here? Just to add some additional randomness to the data? I didn't see it having much effect.
2 - More importantly, shouldn't the "per" output of the Extract Performance operator loop back to provide some feedback and optimization? Or did you leave that as an exercise for the reader? (which is perfectly fine, of course) I just didn't know if I was missing something here.
Thanks for putting this together.
-
Hello
The noise was only there to make the example slightly more realistic, so it's not necessary. If you leave it out, the performance score is zero, because generatedPerformance is calculated with the same formula (the sum of the attributes) that produces the label in the original data. The feedback part is indeed an exercise for the reader.
regards
Andrew
-
Andrew,
Got it. Thanks again.
=C=