How-to: Classification via multi-response linear regression

Gabriele
edited November 5 in Community Q&A
I have a classification problem. Instead of a model that tells me the most probable class of a sample, I would like a model that tells me the probability that the current sample belongs to each possible class.
This means that, instead of a single column containing the predicted class, the labeled examples should have an additional column for each class, containing a real value between 0 and 1 that approximates the likelihood that the sample belongs to that class.
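For training, the per-class target columns are typically 0/1 indicators, and the model's outputs then approximate values between 0 and 1. The following is a minimal plain-Python sketch of that label layout. It expands the nominal label into one indicator column per class. The function name `indicator_columns` is hypothetical and is not part of RapidMiner or Weka.

```python
def indicator_columns(labels):
    """Expand a nominal label column into one 0/1 column per class."""
    classes = sorted(set(labels))  # fixed column order for the classes
    rows = [[1.0 if y == c else 0.0 for c in classes] for y in labels]
    return classes, rows


classes, targets = indicator_columns(["a", "b", "a", "c"])
print(classes)     # ['a', 'b', 'c']
print(targets[1])  # [0.0, 1.0, 0.0]
```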

In the Weka community I was told that this is called "classification using multi-response linear regression" (or at least that MLR is the best or most common way to do it), and that the Weka component ClassificationViaRegression does exactly that. However, I couldn't find this component in RapidMiner, even though RapidMiner does contain most Weka components. I tried several linear regression components in RapidMiner, but they all require a single numerical label instead of my nominal label, most probably because they do not support multiple responses.

Could you please tell me:

1- Whether the above-mentioned Weka component is available in RapidMiner, and where I can find it.
2- Whether there are other components that do something similar (maybe even better).
3- Whether there are other approaches that solve the problem in a similar way and that you believe may work well.
4- If no component supports MLR natively, some guidance (a guide, a tutorial, an example) on how to use single-response linear regression components to approximate MLR in RapidMiner.
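Regarding point 4: Weka's ClassificationViaRegression binarizes the class and builds one regression model per class value. The sketch below imitates that idea in plain Python. It uses ordinary least squares and only single-response regressions, one per indicator column. All function names are hypothetical, and this is not RapidMiner's or Weka's API.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(xs, ys):
    """Ordinary least squares for ONE numeric target (a single-response regression)."""
    rows = [[1.0] + list(x) for x in xs]  # prepend an intercept term
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(xtx, xty)  # normal equations: (X^T X) w = X^T y


def fit_mlr(xs, labels):
    """Multi-response linear regression: one single-response model per class."""
    models = {}
    for c in sorted(set(labels)):
        target = [1.0 if y == c else 0.0 for y in labels]  # indicator column for class c
        models[c] = fit_linear(xs, target)
    return models


def predict_scores(models, x):
    """Raw per-class scores; the class with the highest score is the prediction."""
    row = [1.0] + list(x)
    return {c: sum(w * v for w, v in zip(ws, row)) for c, ws in models.items()}


xs = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
labels = ["low", "low", "low", "high", "high", "high"]
models = fit_mlr(xs, labels)
scores = predict_scores(models, [4.5])
print(scores)  # roughly {'high': 1.014, 'low': -0.014}
```

The raw scores are not constrained to [0, 1]. For a least-squares fit with an intercept, the per-class scores do sum to 1, but individual values can be negative or exceed 1, as in this example.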

Thank you.

Answers

  • Gabriele
    I haven't fully understood what "Generate Prediction Ranking" does, but I believe it's not what I'm looking for. The example you posted trains a standard decision tree and then creates a "confidence" for each class (I admit it's not clear to me how this is done), while I would like a model that actually produces "confidences" such that the error is minimized with respect to the "confidence", not just the classification error.

    In the example, the decision tree more or less minimizes the classification error and the confidences are then computed somehow, whereas the criterion I'm referring to is a training heuristic that minimizes the error on the "confidences" themselves, i.e. it should be built into the labeled examples "by construction".
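The difference between minimizing the classification error and minimizing the error on the confidences can be made concrete. Consider the squared error between each confidence vector and the 0/1 indicator targets (the Brier score). For a per-class least-squares fit, this is, up to a factor of 1/n, the quantity minimized on the training data. The sketch below uses hypothetical function names. It shows two sets of confidences with the same classification error but very different squared errors.

```python
def brier_score(confidences, labels, classes):
    """Mean (over examples) of the summed squared error against 0/1 indicator targets."""
    total = 0.0
    for conf, y in zip(confidences, labels):
        total += sum((conf[c] - (1.0 if y == c else 0.0)) ** 2 for c in classes)
    return total / len(labels)


def error_rate(confidences, labels):
    """Fraction of examples whose highest-confidence class is not the true class."""
    wrong = sum(1 for conf, y in zip(confidences, labels) if max(conf, key=conf.get) != y)
    return wrong / len(labels)


labels = ["a", "b"]
sharp = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]
vague = [{"a": 0.55, "b": 0.45}, {"a": 0.45, "b": 0.55}]

print(error_rate(sharp, labels), error_rate(vague, labels))  # 0.0 0.0
print(brier_score(sharp, labels, ["a", "b"]))  # about 0.05
print(brier_score(vague, labels, ["a", "b"]))  # about 0.405
```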

  • land
    Hi,
    If you apply any classification model inside RapidMiner, you will get so-called confidence values, which express the estimated probability that the example belongs to the referenced class. There is one confidence value for each class, and they always sum up to 1.
    The estimation of course depends on the classification model used and might differ from model to model.
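When the per-class scores come from regression models, they need not lie in [0, 1], so some post-processing is needed to obtain confidences that behave as described here. One simple heuristic, assumed here and not necessarily what RapidMiner does internally, is to clip negative scores to 0 and renormalize. A softmax over the scores would be another option. The function name `to_confidences` is hypothetical.

```python
def to_confidences(scores):
    """Clip negative scores to 0 and rescale so the values sum to 1 (one heuristic among several)."""
    clipped = {c: max(s, 0.0) for c, s in scores.items()}
    total = sum(clipped.values())
    if total == 0.0:
        # no positive score for any class: fall back to a uniform distribution
        return {c: 1.0 / len(scores) for c in scores}
    return {c: s / total for c, s in clipped.items()}


conf = to_confidences({"a": 0.7, "b": 0.4, "c": -0.1})
print(conf)  # roughly {'a': 0.636, 'b': 0.364, 'c': 0.0}
```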

    Greetings,
      Sebastian