Decision Tree Data exploration with numerical value

b00122599 New Altair Community Member
edited November 5 in Community Q&A
Hey folks,

I am fairly new to data science but wish to use a decision tree to explore a dataset. The dataset has no label, so I am assigning one: an attribute with a numerical value of 1-20. Would it be possible to have my label target only the high scorers on that attribute, so that the class label covers only those objects which score 15-20 on the attribute I select as the label? If this makes sense, would anyone have any ideas on how to do so in RapidMiner?

Any help is much appreciated.

Neil. 

Best Answer

  • b00122599
    b00122599 New Altair Community Member
    Answer ✓
    Thanks very much for the pointers, guys. Much appreciated.

Answers

  • varunm1
    varunm1 New Altair Community Member
    Hi @b00122599

    I'm trying to understand what you want. You are adding a label column whose values range from 1 to 20 (1, 2, 3, ..., 20), but you want to predict only the labels between 15 and 20, which you treat as high scores. If you apply a decision tree for classification, it will train on all the labels unless you remove the unwanted ones from the data. You can train a model only on labels 15 to 20 by filtering examples, so that the model does not train on the examples labeled 1 to 14.
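
The filtering idea can be sketched outside RapidMiner in a few lines of plain Python. This is only an illustration: the rows, the attribute names (`age`, `score`) and the helper `filter_high_scorers` are made up for the example and are not from the original question.

```python
# Hypothetical data: each dict is one example; "score" is the 1-20
# attribute chosen as the label (names and values are assumptions).
rows = [
    {"age": 23, "score": 4},
    {"age": 35, "score": 15},
    {"age": 41, "score": 18},
    {"age": 29, "score": 11},
    {"age": 52, "score": 20},
]


def filter_high_scorers(examples, label="score", low=15, high=20):
    """Keep only the examples whose label lies in [low, high]."""
    return [ex for ex in examples if low <= ex[label] <= high]


high_rows = filter_high_scorers(rows)

# Distinct label values left for the tree to learn from.
remaining_labels = sorted({ex["score"] for ex in high_rows})

print(len(high_rows), remaining_labels)  # 3 [15, 18, 20]
```

A tree trained on this subset never sees a low scorer, so it can only tell the values 15 to 20 apart from each other; it cannot learn what separates high scorers from low ones.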
  • Telcontar120
    Telcontar120 New Altair Community Member
    Or perhaps an even better solution would be to discretize your numerical label and turn it into a nominal attribute instead, where values of 15-20 get the class "high" and the others get the class "low."  This can be done with multiple operators in RapidMiner including Discretize by User Specification or Generate Attributes.
    Then you will simply use that as your label and you will have a typical classification problem, which your Decision Tree learner should handle easily.
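
As a rough sketch of the same discretization idea in plain Python (not RapidMiner itself), the numeric score can be mapped to a two-class nominal label. The function name `to_class`, the sample scores and the cut-off parameter are assumptions made for illustration; the sketch also assumes scores never exceed 20, so "15 or more" is the same as "15-20".

```python
from collections import Counter


def to_class(score, threshold=15):
    """Map a numeric score to a nominal class: 'high' for 15-20, else 'low'.

    Assumes scores are in the range 1-20, so score >= threshold means 15-20.
    """
    return "high" if score >= threshold else "low"


# Hypothetical scores, one per example in the dataset.
scores = [4, 15, 18, 11, 20, 14]

# The new nominal label column that a decision tree learner would use.
labels = [to_class(s) for s in scores]

print(labels)           # ['low', 'high', 'high', 'low', 'high', 'low']
print(Counter(labels))  # class counts, useful for spotting imbalance
```

Unlike filtering, this keeps every example in the training data, so the tree sees both classes and can learn what distinguishes the high scorers. Checking the class counts first is worthwhile, since a small "high" group can leave the tree biased toward predicting "low".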