Hello,
I am struggling to set up a cost matrix correctly for the MetaCost operator. The documentation on it is quite sparse, and even after reading many posts on this forum, I cannot find an answer.
Here is the cost matrix for the default tutorial process for the MetaCost operator (distinguishing mines from rocks in the Sonar dataset):
Class 1 is Rock; Class 2 is Mine.
- I assume that the 2.0 and 3.0 are costs (penalties) for misclassification, since they are for wrong predictions. The Matlab instructions say that the diagonal, i.e. the true positive (TP) and true negative (TN) cells, is supposed to be left at 0, but this does not make sense to me if correct predictions bring benefits. Would the diagonal entries not be negative (the opposite of costs) in that case?
Here is my business scenario on an actual (but sample) dataset. A bank is trying to contact customers to offer a financial product. The cost of calling a customer is 5€. If a customer accepts the offer and purchases the product, the bank expects to receive revenue of 50€ from that customer. So, the profit from a successful contact is 50€ - 5€ = 45€. The loss from calling a customer who declines is 5€. The bank has data from past customers and wants to create a model that can be used on new customers. The data is quite unbalanced; approximately 9% of customers said yes, and 91% said no. So, I would like to use MetaCost to indicate my priorities to the machine learner. How should I configure MetaCost in such a situation?
Here is what I would think:
That is, with "yes" as the positive class:
- True positive: earns 45€, so cost is -45
- True negative: we spend nothing and gain nothing, so cost is 0
- False positive: we spent 5€ to call a customer but gained nothing, so cost is 5
- False negative: we spent nothing, but missed the opportunity of receiving 45€ profit, so cost is 45
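To make the four entries above concrete, here is a minimal Python sketch (not RapidMiner code) of what they imply if the class with the lowest expected cost is chosen for each customer. The row/column orientation, the function names, and the assumption that the operator applies exactly this minimum-expected-cost rule are illustrative assumptions, not taken from the MetaCost documentation.

```python
# Cost matrix from the list above.
# Assumed layout (illustrative only): rows = predicted class, columns = true class,
# both in the order ("yes", "no").
COST = [
    [-45.0, 5.0],  # predict "yes": true positive = -45, false positive = 5
    [45.0, 0.0],   # predict "no":  false negative = 45, true negative = 0
]

CLASSES = ("yes", "no")


def expected_costs(p_yes, cost=COST):
    """Expected cost of predicting "yes" and "no", given P(true class = yes)."""
    p_no = 1.0 - p_yes
    return [row[0] * p_yes + row[1] * p_no for row in cost]


def min_cost_prediction(p_yes, cost=COST):
    """Class with the lowest expected cost for this probability."""
    costs = expected_costs(p_yes, cost)
    return CLASSES[costs.index(min(costs))]


def break_even(cost=COST):
    """P(yes) at which both predictions have the same expected cost."""
    # Solve cost[0][0]*p + cost[0][1]*(1-p) == cost[1][0]*p + cost[1][1]*(1-p) for p.
    fp_penalty = cost[0][1] - cost[1][1]
    fn_penalty = cost[1][0] - cost[0][0]
    return fp_penalty / (fp_penalty + fn_penalty)


if __name__ == "__main__":
    print(break_even(COST))                    # matrix as listed above
    print(break_even([[-45.0, 5.0], [0.0, 0.0]]))  # same matrix with false negative set to 0
```

Under these assumptions, the matrix as listed gives a break-even probability of 5/95 (about 0.053), because the 45€ appears both as a benefit for a true positive and as a penalty for a false negative. With the false negative entry set to 0 instead, the break-even is 5/50 = 0.10, which is the point where the expected revenue of a call (50€ times the probability of a yes) equals the 5€ cost of calling.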
However, when I run my data with that cost matrix, my results are always unsatisfactory. I don't want to get into the details now (though I could if necessary), but when I calculate my total earnings in euros, the result is always negative: I always end up losing money. Of course, this has to do with the difficulty of my data (the learners rarely attain above 55% recall on the "yes" class), but still, I wonder if I am configuring the cost matrix correctly.
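For reference, this is roughly how the total earnings can be computed from a confusion matrix, as a minimal Python sketch. The function names are illustrative and the counts in the example are hypothetical, chosen only to resemble a 9% "yes" rate with about 55% recall; they are not my actual results.

```python
def net_earnings(tp, fp, profit_per_sale=45.0, cost_per_call=5.0):
    """Total earnings in euros: only customers predicted "yes" are called.

    tp: called customers who bought (each earns 50 - 5 = 45 euros)
    fp: called customers who declined (each costs 5 euros)
    Customers predicted "no" are not called, so they add nothing.
    """
    return tp * profit_per_sale - fp * cost_per_call


def min_precision_for_profit(profit_per_sale=45.0, cost_per_call=5.0):
    """Precision on the "yes" class at which earnings are exactly zero."""
    return cost_per_call / (profit_per_sale + cost_per_call)


if __name__ == "__main__":
    # Hypothetical counts: 1000 customers, 90 of them "yes";
    # the model finds 50 of the 90 but also flags 600 "no" customers.
    tp, fp = 50, 600
    print(net_earnings(tp, fp))          # -750.0
    print(tp / (tp + fp))                # precision of the "yes" predictions
    print(min_precision_for_profit())    # 0.1
```

With these figures, earnings are positive only when more than 10% of the customers predicted "yes" actually buy, regardless of recall.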
So, I would appreciate clear guidance on how to correctly configure the cost matrix.
Regards,
Chitu