How many trees and how much data are needed to optimize a random forest?

IqbalMalikAlfaruq
IqbalMalikAlfaruq New Altair Community Member
edited November 5 in Community Q&A
I'm building a model that predicts one of three classes: fast, medium, or slow. I used 100 trees and around 1,000 training examples, but the model always predicts fast.

Best Answer

  • BalazsBarany
    BalazsBarany New Altair Community Member
    Answer ✓
    Hi!

    Do I understand correctly that you're doing classification and your classes are fast, medium and slow?

    Sometimes a dataset is not suitable for a particular machine learning algorithm or for its default parameters. Sometimes the data are imbalanced, and then the "best" approach for a machine learning algorithm is to predict the majority class.

    Take a look at your data. Is fast overrepresented by a large margin? If it is, can you downsample that class?
    Do decision trees, Naive Bayes, or k-NN give you the same result, or are they better able to cope with the data?

    There are videos in the RapidMiner Academy for topics like sampling and validation that could help you.

    Regards,
    Balázs
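
To make the majority-class point concrete, here is a minimal sketch in plain Python (not RapidMiner). It assumes the class counts the asker reports further down the thread (1000 fast, 60 medium, 10 slow) and computes how a model that always answers "fast" would score. The variable names are illustrative only.

```python
from collections import Counter

# Class counts as reported by the asker later in this thread.
labels = ["fast"] * 1000 + ["medium"] * 60 + ["slow"] * 10

counts = Counter(labels)
majority_class, majority_count = counts.most_common(1)[0]

# Accuracy of a model that ignores its inputs and always predicts the majority class.
baseline_accuracy = majority_count / len(labels)

# Per-class recall of that same always-majority model.
recall = {cls: (1.0 if cls == majority_class else 0.0) for cls in counts}

print(counts)
print(f"always-{majority_class} accuracy: {baseline_accuracy:.3f}")
print(recall)
```

With these counts the always-fast model is right on roughly 93% of examples while never finding a single medium or slow case. That is why overall accuracy alone can hide the problem and why looking at per-class recall is worthwhile.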

Answers

  • IqbalMalikAlfaruq
    IqbalMalikAlfaruq New Altair Community Member
    Yes, that's what I mean. There are 1000 examples of fast, 60 of medium, and 10 of slow. How can I downsample it?
    I have tried those algorithms and they still give the same result.
  • BalazsBarany
    BalazsBarany New Altair Community Member
    Hi,

    The videos here explain how to sample or weight examples for a more balanced dataset:
    https://academy.rapidminer.com/catalog?query=balance

    Your data are massively imbalanced. You could also try other approaches, such as merging medium and slow into one class (possibly followed by a second model that decides between those two), or attribute generation (deriving new attributes that capture relationships between variables that the models don't find on their own, something like area = length * width).

    Regards,
    Balázs
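
As a rough illustration of two of the suggestions above (downsampling the majority class, and merging medium and slow into one class), here is a minimal sketch in plain Python rather than RapidMiner. The helper names `downsample` and `merge_rare`, the cap of 60 rows per class, and the placeholder rows are all assumptions made for this example; they are not part of any RapidMiner operator.

```python
import random
from collections import Counter, defaultdict

def downsample(rows, label_of, max_per_class, seed=0):
    """Keep at most max_per_class randomly chosen rows of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    sampled = []
    for members in by_class.values():
        if len(members) > max_per_class:
            members = rng.sample(members, max_per_class)
        sampled.extend(members)
    rng.shuffle(sampled)  # avoid leaving the rows grouped by class
    return sampled

def merge_rare(label):
    """Collapse medium and slow into one class for a first-stage model."""
    return "fast" if label == "fast" else "not_fast"

# Placeholder rows of the form (dummy feature, label), using the counts from the thread.
rows = (
    [(i, "fast") for i in range(1000)]
    + [(i, "medium") for i in range(60)]
    + [(i, "slow") for i in range(10)]
)

# Option 1: cap every class at the size of the second-largest class.
balanced = downsample(rows, label_of=lambda r: r[1], max_per_class=60)
balanced_counts = Counter(label for _, label in balanced)

# Option 2: relabel for a fast vs. not_fast model; a second model would
# then separate medium from slow among the not_fast predictions.
merged_counts = Counter(merge_rare(label) for _, label in rows)

print(balanced_counts)
print(merged_counts)
```

After downsampling, the training set holds 60 fast, 60 medium, and 10 slow rows; after merging, the split is 1000 fast to 70 not_fast. Either step would normally be applied to the training data only, so that validation still reflects the real class distribution.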