Specifying Prior Probabilities

tobyb
tobyb New Altair Community Member
edited November 5 in Community Q&A
Is there a way to specify prior probabilities in RapidMiner? For example, say I have a dataset that is 80% one class and 20% another, and I create a subset that has 50% of each class. I would like to be able to specify that the prior probabilities were 80% and 20%.

Answers

  • haddock
    haddock New Altair Community Member
    Hi there,

    You could do this by filtering and counting using data macros, but a quick and sneaky fix sometimes has its place, like this...
    <operator name="Root" class="Process" expanded="yes">
        <!-- Generate a sample classification data set -->
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="simple non linear classification"/>
        </operator>
        <!-- Add a weight attribute that gives every class the same total weight -->
        <operator name="EqualLabelWeighting" class="EqualLabelWeighting">
        </operator>
    </operator>
    Good weekend to all!

  • keith
    keith New Altair Community Member
    haddock wrote:

    Hi there,

    You could do this by filtering and counting using data macros, but a quick and sneaky fix sometimes has its place, like this...
    <process omitted>

    I'm probably missing something obvious, but it seems like this is backwards.  The original question was about data with a true (prior) probability of 80/20, but with the minority label oversampled such that the training data was 50/50.  Wouldn't EqualLabelWeighting be more like taking an 80/20 sample to a 50/50 prior?

    Keith
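
Keith's point can be checked with a little arithmetic. The Python sketch below is not RapidMiner code: `prior_weights` is a hypothetical helper, and it assumes the learner downstream honours per-example weights. For a 50/50 subset it computes the class weights that would make the weighted class shares match the original 80/20 priors, which is the opposite direction from equal label weighting.

```python
from collections import Counter


def prior_weights(labels, priors):
    """Return one weight per class so that the weighted class shares
    of `labels` equal the requested `priors` (hypothetical helper)."""
    counts = Counter(labels)
    n = len(labels)
    # weight = desired share / observed share in the sample
    return {c: priors[c] / (counts[c] / n) for c in counts}


# A balanced subset drawn from data whose true class split was 80/20.
labels = ["a"] * 50 + ["b"] * 50
priors = {"a": 0.8, "b": 0.2}

weights = prior_weights(labels, priors)

# Weighted share of class "a" after applying the weights.
total = sum(weights[y] for y in labels)
share_a = sum(weights[y] for y in labels if y == "a") / total

print(weights)
print(share_a)
```

Each example of the majority class counts for more than one example and each example of the minority class for less, so the weighted data looks like the original population again.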


  • haddock
    haddock New Altair Community Member
    Hi Keith,

    Have you not heard? Backwards is the new forwards! Perhaps I should have been more explicit: because we know the number of classes and the 'equal weight' each class receives, we can use the weights to keep track of the original distribution. In the binominal case each class gets a total weight of 0.5, so an example's weight is 0.5 divided by the number of examples in its class, and dividing 0.5 by the weight gives that count back, like this...
    <operator name="Root" class="Process" expanded="yes">
        <!-- Generate a sample classification data set -->
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="simple non linear classification"/>
        </operator>
        <!-- Add a weight attribute that gives every class the same total weight -->
        <operator name="EqualLabelWeighting" class="EqualLabelWeighting">
        </operator>
        <!-- Two classes: each class's weights sum to 0.5, so 0.5/weight is the class count -->
        <operator name="AttributeConstruction" class="AttributeConstruction">
            <list key="function_descriptions">
                <parameter key="Count" value="0.5/weight"/>
            </list>
        </operator>
    </operator>
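
The arithmetic behind the `0.5/weight` expression can be reproduced outside RapidMiner. The Python sketch below only mimics the idea and is not the operator's implementation: `equal_label_weights` is a hypothetical function, and it assumes, as the 0.5 in the post implies, that the weights over all examples sum to 1.0 and are split evenly between the classes.

```python
from collections import Counter


def equal_label_weights(labels, total_weight=1.0):
    """Give every class the same total weight, spread evenly over its
    examples (hypothetical stand-in for equal label weighting)."""
    counts = Counter(labels)
    per_class = total_weight / len(counts)  # 0.5 for two classes
    return [per_class / counts[y] for y in labels]


# An 80/20 data set, as in the original question.
labels = ["pos"] * 80 + ["neg"] * 20
weights = equal_label_weights(labels)

# The construction from the process: Count = 0.5 / weight.
# round() absorbs floating-point error in the division.
counts = {y: round(0.5 / w) for y, w in zip(labels, weights)}

# Class counts divided by the total give the original priors.
n = sum(counts.values())
priors = {y: c / n for y, c in counts.items()}

print(counts)
print(priors)
```

With more than two classes the 0.5 would become the total weight divided by the number of classes, under the same assumption.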