Calculating dataset total profit/loss based upon each example
BAMBAMBAM
New Altair Community Member
Hi Everyone,
I think I am trying to solve a reinforcement learning problem using supervised learning, and this is how I'm trying to do it:
I am building a model to recommend one of three actions for each example. The actions are "A", "B", or "Do Nothing". Only one action can be performed for each example.
Each example has a "true action" attribute, a "profit" amount attribute (numerical), and a "loss" amount attribute (numerical). When the true action is performed, the example yields its profit amount. When the "Do Nothing" action is performed, the example yields nothing, and when the wrong action is performed ("A" when the true action is "B", or vice versa), the example incurs its loss amount.
I would like to optimize this model so that the entire group's "profit" is maximized, or, failing to optimize it, at least calculate the group's profit after learning has been completed and a test set is evaluated.
To determine the group's profit, we need to sum up the profits for all the model recommendations which were "right", and subtract from that all the losses for which the model recommendation was "wrong". Examples whose recommended action was "Do Nothing" contribute no profit or loss to the total.
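To make that concrete, here is a small sketch of the scoring I have in mind, written in plain Python rather than RapidMiner (the attribute names and the numbers in the toy data are made up just to show the calculation):

def realized_value(recommended, true_action, profit, loss):
    """Per-example contribution under the rules above."""
    if recommended == "Do Nothing":
        return 0.0                    # no action: no profit, no loss
    if recommended == true_action:
        return profit                 # right call: collect the profit
    return -loss                      # wrong call: incur the loss

# Toy data: (true_action, profit, loss) per example, values invented.
examples = [("A", 100.0, 20.0), ("B", 50.0, 80.0), ("A", 10.0, 200.0)]
recommendations = ["A", "Do Nothing", "B"]

total = sum(realized_value(r, t, p, l)
            for r, (t, p, l) in zip(recommendations, examples))
print(total)  # 100 + 0 - 200 = -100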
However, there is not necessarily a "correct" action for each example, since the correct action for each example really depends on the accuracy of the model. If the model is inaccurate, it is probably better to choose the "Do Nothing" action more frequently, in order to avoid large losses. If the model is accurate, then it is better for the model to choose actions "A" or "B" more frequently.
So my first question is: What are your ideas on how I can use RapidMiner to build a model that recommends the best actions?
My second question is more nuts-and-bolts, since I have already tried an approach and need a little help getting over the finish line.
This is what I have done so far: I assigned a numerical label called "ActionNumber" to each example. ActionNumber ranges from -1 to +1. When ActionNumber is close to -1, the true action is "A", the profit is large, and the loss is small. When the ActionNumber is close to 0, the profit is small compared to the loss, and so it is too dangerous to choose either action "A" or action "B" - the best action is to Do Nothing. And when the ActionNumber is close to +1, the true action is "B" and the profit is large and the loss is small.
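For illustration, one possible way such a score could be derived (this is just an assumed encoding, not necessarily the exact formula behind my label) is to let the true action set the sign and the profit-to-loss ratio set the magnitude:

def action_number(true_action, profit, loss):
    # Assumed encoding (requires profit + loss > 0): the magnitude is near 1
    # when the profit dwarfs the loss and near 0 when the loss dominates,
    # which is exactly where "Do Nothing" becomes the safer choice.
    strength = profit / (profit + loss)
    return -strength if true_action == "A" else strength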
Then I built a regression model to predict the ActionNumber. It appears to do so well, but I really have no idea how much profit or loss would be achieved using this model.
I am trying to figure out how to use "threshold finder" so that profits can be optimized. I will need two thresholds (t1 and t2) - i.e. when p(ActionNumber) < t1 then perform action "A", when p(ActionNumber) > t2 then perform action "B", and otherwise choose action "Do Nothing". I have looked at using MetaCost and CostEvaluator to do this, but they don't allow for different costs (profits and losses) for each example. Perhaps there is some way to do this using macros?
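In plain Python terms (just a sketch of the logic, not RapidMiner operators, reusing the realized_value() function from my first sketch above), the decision rule and a naive search for the most profitable thresholds would look roughly like this:

def decide(score, t1, t2):
    # Two-threshold rule: below t1 -> "A", above t2 -> "B", else "Do Nothing".
    if score < t1:
        return "A"
    if score > t2:
        return "B"
    return "Do Nothing"

def total_profit(scores, true_actions, profits, losses, t1, t2):
    return sum(realized_value(decide(s, t1, t2), a, p, l)
               for s, a, p, l in zip(scores, true_actions, profits, losses))

def best_thresholds(scores, true_actions, profits, losses, step=0.05):
    # Brute-force grid search over (t1, t2) with t1 <= t2, both in [-1, 1],
    # picking the pair with the highest total profit on the given data.
    grid = [-1.0 + i * step for i in range(int(round(2.0 / step)) + 1)]
    candidates = [(t1, t2) for t1 in grid for t2 in grid if t1 <= t2]
    return max(candidates,
               key=lambda ts: total_profit(scores, true_actions,
                                           profits, losses, *ts))

Ideally the thresholds would be searched on a validation set rather than on the data used to fit the regression model.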
Any insight you guys might have on solving this problem would be greatly appreciated.
Thanks,
John
Answers
Hi John,
if you already have your cost attributes and the prediction of the model, you could use the AttributeConstruction operator to derive the real cost (profit or loss) for each example. Then you could use the Data2Performance operator to derive the average value of this constructed attribute. That average is delivered as a PerformanceVector and hence makes optimization possible.
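Roughly speaking, in plain Python rather than RapidMiner operators (and reusing the realized_value() sketch from the first post, with made-up stand-in data), the two operators together compute something like this:

# Stand-in for the scored test set: (prediction, true action, profit, loss).
scored = [("A", "A", 100.0, 20.0),
          ("Do Nothing", "B", 50.0, 80.0),
          ("B", "A", 10.0, 200.0)]

# The constructed attribute holds one realized value per example...
realized = [realized_value(pred, act, profit, loss)
            for pred, act, profit, loss in scored]

# ...and its average becomes the single number in the PerformanceVector.
performance = sum(realized) / len(realized)
print(performance)  # (100 + 0 - 200) / 3 = -33.33...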
Greetings,
Sebastian.
Thanks Sebastian!
I have constructed something like what you described, but now the problem is that the AttributeConstruction operator is in an XValidation loop that is in a YAGGA loop. So I get these errors the second time through the loop:
[Fatal] IllegalArgumentException occured in 2nd application of AttributeConstruction (AttributeConstruction)
[Fatal] Process failed: operator cannot be executed (Duplicate attribute name: ProfitLoss). Check the log messages...
I have tried moving the AttributeConstruction operator out of the loops but the prediction(RRRatio) attribute isn't available there. I know there is a "delete..." operator and am thinking of trying to delete the Attribute at the beginning of each loop iteration, but this seems like the wrong approach. How would you deal with the problem?
Here's the XML:
<operator name="YAGGA" class="YAGGA" expanded="no">
<parameter key="population_size" value="100"/>
<parameter key="maximum_number_of_generations" value="100"/>
<parameter key="generations_without_improval" value="10"/>
<parameter key="keep_best_individual" value="true"/>
<parameter key="p_initialize" value="1.0"/>
<parameter key="use_plus" value="false"/>
<parameter key="use_diff" value="true"/>
<parameter key="use_div" value="true"/>
<operator name="XValidation" class="XValidation" breakpoints="after" expanded="no">
<parameter key="keep_example_set" value="true"/>
<parameter key="create_complete_model" value="true"/>
<parameter key="number_of_validations" value="3"/>
<parameter key="sampling_type" value="shuffled sampling"/>
<operator name="W-REPTree" class="W-REPTree" breakpoints="after">
<parameter key="keep_example_set" value="true"/>
<parameter key="M" value="600.0"/>
</operator>
<operator name="ApplierChain" class="OperatorChain" expanded="yes">
<operator name="Applier" class="ModelApplier">
<parameter key="keep_model" value="true"/>
<list key="application_parameters">
</list>
</operator>
<operator name="ChangeAttributeName" class="ChangeAttributeName">
<parameter key="old_name" value="prediction(RRRatio)"/>
<parameter key="new_name" value="pred"/>
</operator>
<operator name="AttributeConstruction" class="AttributeConstruction">
<list key="function_descriptions">
<parameter key="ProfitLoss" value="if(pred>%{longT}, Rise, if(pred<%{shortT},-Rise, 0), 0)"/>
</list>
<parameter key="use_standard_constants" value="false"/>
</operator>
<operator name="Data2Performance" class="Data2Performance" breakpoints="after">
<parameter key="keep_example_set" value="true"/>
<parameter key="performance_type" value="data_value"/>
<parameter key="attribute_name" value="ProfitLoss"/>
<parameter key="example_index" value="1"/>
</operator>
<operator name="MinMaxWrapper" class="MinMaxWrapper">
<parameter key="minimum_weight" value="0.9"/>
</operator>
</operator>
</operator>
<operator name="ProcessLog" class="ProcessLog">
<list key="log">
<parameter key="generation" value="operator.YAGGA.value.generation"/>
<parameter key="best" value="operator.YAGGA.value.best"/>
<parameter key="len" value="operator.YAGGA.value.best_length"/>
<parameter key="perf" value="operator.YAGGA.value.performance"/>
</list>
</operator>
</operator>
Hi,
I think you will have to delete it. Otherwise, the attribute will be taken into account by the learner in the second iteration, because it is regular...
But it should be no problem to delete it after the performance measure has been calculated...
Greetings,
Sebastian
I think I've determined that constantly deleting and re-adding the attribute is the reason RapidMiner always runs out of memory. Even with a very small process using LinearRegression, I can get RM to run out of memory in 15 minutes.
AttributeConstruction is the only operator I've found that allows me to set the value of an attribute using a formula (SetData only allows me to change the value to a fixed constant). Is there another operator like SetData that allows the use of a formula the way AttributeConstruction does?