"Which Learning Algorithm to use for probability estimation?"

Ghostrider New Altair Community Member
edited November 2024 in Community Q&A
I have several (around 30) attributes that I want to feed into a learning algorithm. The attributes are all numeric. The result I am after is the probability that one event will or will not happen (I'm only trying to predict the probability of one event, not multiple events / classification). The probability of the event depends non-linearly on the attributes. By this I mean that sometimes a 70% chance of the event occurring follows from the conditions of several attributes taken as a whole, while at other times a 70% chance can be inferred from the condition of one particular attribute alone. The example space is huge, so a fast algorithm would be preferred. Can anyone recommend a learning algorithm to use? If it's not part of RM but has an open-source Java library, I'd still consider it.

EDIT/UPDATE: One example of what I am looking for is commonly known as a probabilistic neural network (link: http://www.statsoft.com/textbook/neural-networks/). The disadvantage of such a network, however, is that the model stores the training data. Does anyone know of a learning algorithm that outputs a probability for each class (in my case only one... maybe 3 eventually) without having to store all the training examples?
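For reference, the probabilistic neural network mentioned in the edit can be sketched in a few lines of pure Python. This is a minimal illustration, assuming a Gaussian (Parzen-window) kernel with a single smoothing parameter `sigma`; the function name and `sigma` are hypothetical and not taken from the linked page or any library. It shows why the model has to keep the training data: every stored example contributes one kernel term to every prediction.

```python
import math
from collections import Counter, defaultdict


def pnn_predict_proba(train_x, train_y, x, sigma=1.0):
    """Class probabilities from a Parzen-window probabilistic neural network.

    Every training example is stored and acts as one Gaussian kernel
    (the "pattern layer"), so memory and prediction time grow with the
    training set. `sigma` is a hypothetical smoothing parameter.
    """
    kernel_sums = defaultdict(float)  # "summation layer": one sum per class
    for xi, yi in zip(train_x, train_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        kernel_sums[yi] += math.exp(-sq_dist / (2.0 * sigma ** 2))

    total = sum(kernel_sums.values())
    if total == 0.0:
        # All kernels underflowed (query far from every stored example):
        # fall back to the class frequencies in the training data.
        counts = Counter(train_y)
        return {label: c / len(train_y) for label, c in counts.items()}
    # Summing kernels per class implicitly weights each class by its
    # frequency, so normalising the sums gives P(class | x).
    return {label: s / total for label, s in kernel_sums.items()}


# Toy data: two numeric attributes, label 1 = "event happened".
train_x = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
train_y = [0, 0, 1, 1]
probs = pnn_predict_proba(train_x, train_y, [0.1, 0.0], sigma=0.3)
print("P(event) =", probs[1])
```

In the two-class case, `probs[1]` is the estimated probability of the event. The loop over `train_x` on every call is exactly the storage and speed cost raised in the edit.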

Answers

  • land
    land New Altair Community Member
    Hi,
    you can use Naive Bayes if you want a straightforward probability calculation.

    But I wonder why you have the constraint that the result must come from a probability calculation.

    Greetings,
      Sebastian
  • steffen
    steffen New Altair Community Member
    Hello,

    I recommend Logistic Regression, since you only have numeric predictors and a binary response variable. It is indeed slower than Naive Bayes, but its output is generally a better approximation of the probability you are trying to estimate. Naive Bayes probabilities are not that well calibrated and tend to clump near 0 and 1.
    In terms of general model quality (AUC etc.), logistic regression and Naive Bayes both perform well.

    greetings,

    steffen
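To make the two suggestions concrete, here is a minimal pure-Python sketch of both models for a binary event, assuming numeric attributes and labels coded 0/1. The function names and the parameters `lr`, `epochs` and `var_smoothing` are hypothetical choices for this illustration, not the API of RapidMiner or of any Java library. Unlike the network in the edit, both models keep only a handful of parameters (weights and a bias, or per-class means and variances) rather than the training examples.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(xs, ys, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba_lr(model, x):
    # Probability of label 1 ("event happens").
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_gaussian_nb(xs, ys, var_smoothing=1e-9):
    """Per class: prior, per-attribute mean and variance (attributes assumed independent)."""
    model = {}
    for label in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == label]
        cols = list(zip(*rows))
        means = [sum(c) / len(rows) for c in cols]
        variances = [sum((v - m) ** 2 for v in c) / len(rows) + var_smoothing
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(xs), means, variances)
    return model


def predict_proba_nb(model, x):
    # Sum log prior and Gaussian log-likelihoods, then normalise over classes.
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        s = math.log(prior)
        for v, m, var in zip(x, means, variances):
            s -= 0.5 * math.log(2 * math.pi * var) + (v - m) ** 2 / (2 * var)
        log_scores[label] = s
    top = max(log_scores.values())
    exps = {k: math.exp(s - top) for k, s in log_scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


# Toy data: one numeric attribute, label 1 = "event happened".
xs = [[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5]]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
lr_model = fit_logistic_regression(xs, ys)
nb_model = fit_gaussian_nb(xs, ys)
for x in ([0.0], [1.75], [3.5]):
    print(x, round(predict_proba_lr(lr_model, x), 3),
          round(predict_proba_nb(nb_model, x)[1], 3))
```

One design caveat: logistic regression is linear in the log-odds of its inputs, so a non-linear dependence like the one described in the question would need derived attributes (for example products or binned versions of the originals) or a more flexible model; the sketch does not attempt that.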
