"Balanced sampling decision trees"

ddr (New Altair Community Member)
edited November 2024 in Community Q&A
Hi everyone,

I'm just starting to use RapidMiner and I have a problem with decision trees. I am working with a fairly large dataset (approximately 500,000 cases) and am trying to use decision trees to predict whether users are willing to buy a product. The problem is that the buying rate is very low: 0.5%. When I use stratified sampling with a ratio of 50% in the "Sample" operator, as suggested in a similar thread on the forum, my tree is always biased towards the majority class, so the results are totally useless. Is there any way to balance the outcome variable to a 50/50 rate, do the modeling, and then rebalance the samples to their original rates? I have searched the forum, but trying all the answers and looking through many operators in RapidMiner didn't give me any results.

Thanks a lot in advance!

Answers

  • MariusHelf (New Altair Community Member)
    The Sample operator can downsample the majority class (i.e. discard some of its examples) if you enable the balance_data option. You can then specify how many examples of each class to use for learning.

    Is that sufficient for you?

    Best regards,
    Marius
  • abbasi_samira (New Altair Community Member)
    Hello,
    How can I balance the number of examples per class (50/50) for two features?

    Please help me.

    Thanks

  • sgenzer (Altair Employee)
    Answered in the other thread where you posted the same question.

    Scott
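
For readers who want to see the idea outside RapidMiner, here is a rough Python sketch of the two steps discussed in this thread: downsampling the majority class to a 50/50 split (what the balance_data option is described as doing above), and then adjusting a predicted probability back to the original 0.5% buying rate. Nothing here is RapidMiner code. The function names, the toy data and the "buys" field are all made up for illustration, and the second function uses the standard prior-shift correction, which is an assumption on my part about what "rebalancing to the original rates" should mean. It only applies if the model outputs probabilities (for a tree, the class share in each leaf) and the downsampling was random within each class.

```python
import random
from collections import defaultdict


def downsample_to_balance(rows, label_of, seed=0):
    """Randomly discard rows so every class keeps as many rows as the rarest class.

    Hypothetical helper for illustration; not a RapidMiner API.
    """
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    n_keep = min(len(group) for group in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_class.values():
        # Sample without replacement within each class.
        balanced.extend(rng.sample(group, n_keep))
    rng.shuffle(balanced)
    return balanced


def correct_for_original_rate(p_balanced, original_rate, sampled_rate=0.5):
    """Map a probability from a model trained at sampled_rate back to original_rate.

    Standard prior-shift correction: reweight the positive and negative
    evidence by the ratio of original to sampled class priors, then renormalise.
    """
    pos = p_balanced * original_rate / sampled_rate
    neg = (1.0 - p_balanced) * (1.0 - original_rate) / (1.0 - sampled_rate)
    return pos / (pos + neg)


# Toy data mirroring the thread: 0.5% of the cases are buyers.
rows = [{"id": i, "buys": i % 200 == 0} for i in range(20000)]
balanced = downsample_to_balance(rows, lambda r: r["buys"])

n_buyers = sum(1 for r in balanced if r["buys"])
print(len(balanced), n_buyers)  # balanced set size and number of buyers in it

# A leaf that is 90% buyers on the balanced data is far less pure at the original rate.
print(correct_for_original_rate(0.9, original_rate=0.005))
```

The tree itself would be trained on `balanced`. The correction is then applied only to its predicted probabilities when scoring the full, unbalanced data, so no rows need to be "added back".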