Explicitly listing probabilities in a classification task

Jane (New Altair Community Member)
edited November 5 in Community Q&A

I am using RapidMiner with a database of medical information to estimate the probability that a patient will be diagnosed with a certain class of ailment (e.g., gastrointestinal, cancer, respiratory) based on their sociodemographic data. My dataset contains almost one million records, each representing a patient. For each patient and each ailment category, the label is "true" if the patient has been diagnosed with an ailment in that category and "false" otherwise.

I would like RapidMiner to learn the classification rules from a training set and then return, for each record in the test set, the probability that the record belongs to the class "true". I have found many useful tools for performing the classification, but I can't find a routine that reports the value of P(true) once the model has been applied. If anyone has any suggestions about how to do this, I would be very grateful. Thanks in advance!

-- Jane

Answers

  • TobiasMalbrecht (New Altair Community Member)
    Hi Jane,

    After learning the classification model on the training set, you can simply apply it to the data you wish to classify. When you apply the model, two columns are added to the example set. They contain the confidences (not the probabilities) that each example belongs to one class or the other.

    Regards,
    Tobias
  • Jane (New Altair Community Member)
    Hi Tobias,

    Thanks so much for your help! After viewing your response, I was able to find the appropriate columns in my data. It was just what I needed.

    -- Jane
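
For readers who want to see the idea outside the RapidMiner GUI, the workflow Tobias describes (learn a model on a training set, apply it to new records, read the per-class confidence columns) can be sketched in plain Python. This is only an illustration under stated assumptions, not what RapidMiner does internally: the learner is a small categorical naive Bayes chosen for brevity, the attribute names (age_group, smoker) and all records are made up, and the column names confidence(true) and confidence(false) are assumed to mirror the columns mentioned in the answer. As Tobias notes, these values are confidences: they sum to 1 across classes, but they are not guaranteed to be well-calibrated probabilities.

```python
from collections import Counter, defaultdict


def train(rows, label_key="label"):
    """Count classes and (attribute, value, class) triples for naive Bayes."""
    class_counts = Counter(r[label_key] for r in rows)
    value_counts = Counter()          # (attribute, value, class) -> count
    values_seen = defaultdict(set)    # attribute -> distinct training values
    for r in rows:
        for attr, val in r.items():
            if attr == label_key:
                continue
            value_counts[(attr, val, r[label_key])] += 1
            values_seen[attr].add(val)
    return class_counts, value_counts, values_seen


def apply_model(model, rows):
    """Return copies of rows with per-class confidence columns added."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scored = []
    for r in rows:
        scores = {}
        for cls, n_cls in class_counts.items():
            score = n_cls / total  # class prior
            for attr, vals in values_seen.items():
                # Laplace smoothing avoids zero scores for unseen combinations.
                count = value_counts[(attr, r[attr], cls)]
                score *= (count + 1) / (n_cls + len(vals))
            scores[cls] = score
        norm = sum(scores.values())
        out = dict(r)
        for cls, s in scores.items():
            # Normalised so the confidences sum to 1 for each record.
            out[f"confidence({cls})"] = s / norm
        out["prediction"] = max(scores, key=scores.get)
        scored.append(out)
    return scored


# Hypothetical toy records; real data would have far more rows and attributes.
training = [
    {"age_group": "60+", "smoker": "yes", "label": "true"},
    {"age_group": "60+", "smoker": "no", "label": "true"},
    {"age_group": "40-59", "smoker": "yes", "label": "true"},
    {"age_group": "40-59", "smoker": "no", "label": "false"},
    {"age_group": "18-39", "smoker": "yes", "label": "false"},
    {"age_group": "18-39", "smoker": "no", "label": "false"},
]
test = [
    {"age_group": "60+", "smoker": "yes"},
    {"age_group": "18-39", "smoker": "no"},
]

scored = apply_model(train(training), test)
for row in scored:
    print(row["age_group"], row["smoker"], round(row["confidence(true)"], 3))
```

The value Jane was looking for corresponds to the confidence(true) entry of each scored record. Whether it can be read as P(true) depends on how well the chosen learner's confidences are calibrated, which is worth checking on held-out data.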