Rule Induction Model Results

Panda_Pie New Altair Community Member
edited November 5 in Community Q&A
Hi,

I'm new to RapidMiner and data mining in general. I'm using RapidMiner 5 and I'm having a problem interpreting the results of the Rule Induction model. Below is a section of the results:


RuleModel
if marital-status = Never-married then <=50K  (1304 / 52)
if education-num = 10.500 and sex = Female and relationship = Unmarried then <=50K  (188 / 7)
if education-num = 10.500 and capital-gain = 4225 and relationship = Not-in-family then <=50K  (267 / 27)
if marital-status = Married-civ-spouse and age > 27.500 then >50K  (47 / 65)

I'm basically just a little confused about what the numbers in parentheses actually mean. I've noticed that when the result is <=50K the numbers are always (high / low), as in the first rule (1304 / 52), and when the result is >50K the numbers are always (low / high), as in the last rule (47 / 65).

At first I thought that maybe it could mean, for example with the first one, that people who are never married earn <=50K, and that this was true for 52 of the 1304 people sampled... But that wouldn't make sense for the second one, because the numbers are switched.

Any clarification on this would be greatly appreciated.

Thanks very much

Noel

Answers

  • land
    land New Altair Community Member
    Hi Noel,
    these numbers show the class distribution of the examples covered by the rule. In your case the first number is the count of examples belonging to class <=50K and the second is the count belonging to >50K. We don't show a correct/wrong split here because, with more than two classes, that would hide information: sometimes it matters which classes get mixed up with one another.

    Greetings,
      Sebastian
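
To make that reading concrete, here is a minimal Python sketch that parses the counts from a rule line and reports the majority class. It is only an illustration and not part of RapidMiner: the helper names (`parse_rule_counts`, `summarize`) are hypothetical, and the class order `["<=50K", ">50K"]` is an assumption taken from the explanation above.

```python
import re

# Assumed class order, following the explanation above:
# first count is <=50K, second count is >50K.
CLASSES = ["<=50K", ">50K"]


def parse_rule_counts(rule_line):
    """Extract the per-class counts from the trailing '(a / b)' of a rule line."""
    match = re.search(r"\(([\d\s/]+)\)\s*$", rule_line)
    if match is None:
        raise ValueError("no class counts found in: " + rule_line)
    return [int(part) for part in match.group(1).split("/")]


def summarize(rule_line, classes=CLASSES):
    """Return how many examples the rule covers and which class dominates."""
    counts = parse_rule_counts(rule_line)
    total = sum(counts)
    best = max(range(len(counts)), key=counts.__getitem__)
    return {
        "covered": total,
        "majority_class": classes[best],
        "share": counts[best] / total,
    }


rules = [
    "if marital-status = Never-married then <=50K  (1304 / 52)",
    "if marital-status = Married-civ-spouse and age > 27.500 then >50K  (47 / 65)",
]

for rule in rules:
    print(summarize(rule))
```

Read this way, the first rule covers 1304 + 52 = 1356 examples, most of them <=50K, while the last rule covers 47 + 65 = 112 examples, most of them >50K. The rule predicts whichever class has the larger count, which is why the numbers look "switched" between the two.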