The regression performance measure prediction_trend_accuracy currently compares both the correct label and the predicted label to the rightmost attribute value in each example. When using multivariateseries2window (and maybe in other cases too), that rightmost value may not be the previous value of the series being predicted.
For example, take a hypothetical series with 2 attributes (e.g. stock market data). We start with data like this:
price1 volume1
price2 volume2
price3 volume3
price4 volume4
After windowing to predict the next period's price it becomes this (assuming label dimension=0 and window_size=1):
price1 volume1 label1(price2)
price2 volume2 label2(price3)
price3 volume3 label3(price4)
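To make the windowing step concrete, here is a minimal Python sketch of it. This is not the multivariateseries2window source; the function name window_examples and the price/volume numbers are made up for illustration.

```python
def window_examples(rows, label_dim=0, window_size=1):
    """Hypothetical helper: turn a multivariate series into windowed examples.

    Each example holds the flattened attribute values of the window and,
    as its label, the value of dimension `label_dim` in the following row.
    """
    examples = []
    for start in range(len(rows) - window_size):
        window = rows[start:start + window_size]
        attributes = [value for row in window for value in row]
        label = rows[start + window_size][label_dim]
        examples.append((attributes, label))
    return examples


# Made-up (price, volume) rows, only for illustration.
rows = [(10.0, 500.0), (11.0, 450.0), (10.5, 520.0), (12.0, 610.0)]
examples = window_examples(rows)

for attributes, label in examples:
    # The rightmost attribute is the volume, not the previous price.
    print(attributes, "label:", label, "rightmost:", attributes[-1])
```

Note that the rightmost attribute in every example is the volume, even though the label is a price.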
Then the learner adds its predictions:
price1 volume1 label1(price2) pred1
price2 volume2 label2(price3) pred2
price3 volume3 label3(price4) pred3
And finally we evaluate it with prediction_trend_accuracy. The formula, from the source code, would be,
COUNTIF( {(pred1-volume1)*(label1-volume1), (pred2-volume2)*(label2-volume2), (pred3-volume3)*(label3-volume3)}, >=0) / 3
However one would expect it to use this formula,
COUNTIF( {(pred1-price1)*(label1-price1), (pred2-price2)*(label2-price2), (pred3-price3)*(label3-price3)}, >=0) / 3
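The difference between the two formulas can be sketched in Python as follows. This is only my reading of the behaviour, not the PredictionTrendAccuracy code; the function trend_accuracy, its reference_index parameter, and all numbers are assumptions made up for the example.

```python
def trend_accuracy(attribute_rows, labels, predictions, reference_index):
    """Share of examples where prediction and label move in the same
    direction relative to the chosen reference attribute (hypothetical)."""
    hits = 0
    for attributes, label, pred in zip(attribute_rows, labels, predictions):
        ref = attributes[reference_index]
        if (pred - ref) * (label - ref) >= 0:
            hits += 1
    return hits / len(labels)


# Made-up windowed examples: [price, volume] per row.
attribute_rows = [[10.0, 500.0], [11.0, 450.0], [10.5, 520.0]]
labels = [11.0, 10.5, 12.0]        # next period's price
predictions = [10.5, 11.5, 11.0]   # made-up learner output

# Reference = rightmost attribute (volume), as in the first formula.
by_volume = trend_accuracy(attribute_rows, labels, predictions, -1)
# Reference = previous price, as in the second formula.
by_price = trend_accuracy(attribute_rows, labels, predictions, 0)

print("rightmost (volume) reference:", by_volume)
print("previous price reference:", by_price)
```

With these numbers the volume is far larger than any price, so every product is positive and the rightmost-attribute version reports a perfect score, while the price-based version counts only 2 of 3 trends as correct.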
I recommend at least adding a note in the description, since the problem is hard to recognize. As a workaround, the operator could let the user pick the correct column instead of always choosing the rightmost attribute.
Also, in the source code for PredictionTrendAccuracy the comment explaining it is missing some parts of the formula used to calculate the measure. Here is what it says,
This performance measure then calculates the actuals trend between the last time point in the series (T3 here) and the actual label (L) and compares it to the trend between T3 and the prediction (P), sums the products between both trends, and divides this sum by the total number of examples, i.e. [(v4-v3)*(p1-v3)+(v5-v4)*(p2-v4)+...] / 7 in this example.
To agree with the code, the formula should be, [(if ((v4-v3)*(p1-v3)>=0), 1, 0) + (if ((v5-v4)*(p2-v4)>=0), 1, 0) +...] / 7
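To show that the two formulas really give different numbers, here is a small Python sketch of both. It only illustrates the formula in the comment versus the formula I believe the code uses; the function names and the data are made up.

```python
def mean_of_products(last_values, labels, predictions):
    """Formula as written in the source comment: average of the products."""
    products = [(l - v) * (p - v)
                for v, l, p in zip(last_values, labels, predictions)]
    return sum(products) / len(products)


def share_of_matching_trends(last_values, labels, predictions):
    """Formula as I read the code: count products >= 0, divide by n."""
    hits = sum(1 for v, l, p in zip(last_values, labels, predictions)
               if (l - v) * (p - v) >= 0)
    return hits / len(labels)


# Made-up values: last point of each window, actual label, prediction.
last_values = [3.0, 4.0, 5.0, 4.0]
labels = [4.0, 5.0, 4.0, 6.0]
predictions = [5.0, 3.0, 4.5, 5.0]

print(mean_of_products(last_values, labels, predictions))          # 0.875
print(share_of_matching_trends(last_values, labels, predictions))  # 0.75
```

The first value depends on the scale of the series and is not bounded by 1, while the second is always a fraction between 0 and 1, which is what one would expect from an accuracy.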
I'm just trying to help polish the software, not be picky, so don't worry if there isn't time to fix this immediately. If either of these two issues is actually me misinterpreting what is supposed to happen, then I'm happy to be corrected. If you'd like a clearer explanation, I can try to provide that too. Thanks
Regards,
Max