"Which Learning Algorithm to use for probability estimation?"

Ghostrider
I have several (around 30) attributes that I want to feed into a learning algorithm.  The attributes are all numeric.  The result I am after is a probability of whether a single event will or will not happen (I'm only trying to predict the probability of one event, not doing multi-class classification).  The probability of the event has a non-linear dependence on the attributes.  What I mean by this is that sometimes a 70% chance of the event occurring can be inferred from the conditions of several attributes taken as a whole, and sometimes from the condition of one attribute in particular.  The example space is huge, so a fast algorithm would be preferred.  Can anyone recommend a learning algorithm to use?  If it's not part of RM but has an open-source Java library, I'd still consider it.

EDIT/UPDATE: One example of what I am looking for is more commonly known as a probabilistic neural network.  Link: http://www.statsoft.com/textbook/neural-networks/.  The disadvantage of such a network, however, is that the model stores the training data.  Does anyone know of a learning algorithm which outputs a probability for each class (in my case, only one...maybe 3 eventually) that does not require storing all training examples?

Answers

  • land
    Hi,
    you can use Naive Bayes if you want a straightforward probability calculation (see the sketch at the end of this post).

    But I wonder why you have the constraint that the result must come from a probability calculation?

    Greetings,
      Sebastian
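
    For the open-source Java route mentioned in the question, a minimal sketch of this with Weka's NaiveBayes classifier is below; the file name "events.arff" and the assumption that the event label is the last (nominal) attribute are illustrative only.

    ```java
    // Minimal sketch: class probabilities from Weka's Naive Bayes.
    // Assumes a hypothetical ARFF file "events.arff" whose last attribute
    // is the nominal event label (e.g. {no, yes}).
    import weka.classifiers.bayes.NaiveBayes;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class NaiveBayesProbability {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("events.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1); // label is the last attribute

            NaiveBayes nb = new NaiveBayes();
            nb.buildClassifier(data); // keeps per-class statistics only, not the training examples

            // probability distribution over the classes for one (here: the first) example
            double[] dist = nb.distributionForInstance(data.instance(0));
            System.out.println("P(event) = " + dist[1]); // index of the positive class in the ARFF header
        }
    }
    ```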
  • steffen
    Hello,

    I recommend Logistic Regression since you only have numeric predictors and a binary response variable. It is indeed slower than Naive Bayes, but its output is generally a better approximation of the probability you are after: Naive Bayes probabilities are not well calibrated and tend to clump near 0 and 1.
    Regarding general model quality (AUC etc.), logistic regression and Naive Bayes both perform well. A sketch of the logistic regression route is at the end of this post.

    greetings,

    steffen
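
    A comparable sketch with Weka's Logistic classifier is below (same illustrative "events.arff" assumption as in the answer above); note that the fitted model keeps only the learned coefficients, not the training examples, which addresses the storage concern from the question.

    ```java
    // Minimal sketch: class probabilities from Weka's logistic regression.
    // "events.arff" is an illustrative file name; the last attribute is
    // assumed to be the nominal event label.
    import weka.classifiers.functions.Logistic;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class LogisticProbability {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("events.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            Logistic model = new Logistic();
            model.buildClassifier(data); // fits P(event | x) via the logistic function of a linear score

            double[] dist = model.distributionForInstance(data.instance(0));
            System.out.println("P(event) = " + dist[1]);
        }
    }
    ```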