Generate more examples based on our dataset data

mansour_ebrahim New Altair Community Member
edited November 2024 in Community Q&A
Hi all
I asked the following question before but haven't received any good advice so far. I would appreciate it if anyone could help. (I don't want to use upsampling or the SMOTE operator.)

Is there an operator in RapidMiner that increases the number of examples in a dataset? I mean an operator that generates more samples from all groups and so increases the total number of examples. I am running a DL model on my dataset, but I don't have enough samples and cannot collect more, so I have to generate additional samples from all groups.
Also, I do not want to balance the number of samples across classes; I just want to increase the size of the dataset, let's say threefold.
Regards.
Mansour

Answers

  • BalazsBaranyRM New Altair Community Member
    Hi! 

    If you just want to repeat your existing examples, use Multiply to copy your example set and Append to join the copies, as many times as you want.

    You can optionally add some noise by randomly changing some attribute values. 

    However, this won't really change your model, because the copies contain no new information. You usually can't cheat machine learning algorithms by inventing more data than you actually have.

    Regards,

    Balázs
  • kypexin New Altair Community Member
    Hi @mansour_ebrahim

    For your purpose, you can use the Sample (Bootstrapping) operator, which does exactly what you want: it increases the number of examples by sampling with replacement, without creating any synthetic examples. But as @BalazsBarany already said, this technique won't have any significant effect on model performance.
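
For readers who want to see what the two suggestions in the answers amount to outside RapidMiner, here is a minimal Python sketch. It is not the RapidMiner implementation: the function names `repeat_with_noise` and `bootstrap_sample`, the `factor` and `noise_scale` parameters, and the toy `data` list are all hypothetical, chosen only for illustration. The first function mimics copying an example set and appending the copies, with optional Gaussian noise on numeric attributes; the second mimics sampling with replacement up to a larger size.

```python
import random

def repeat_with_noise(rows, factor=3, noise_scale=0.0, seed=0):
    """Return `factor` copies of `rows`, optionally jittering the extra copies.

    rows: list of (features, label) pairs, where features is a list of floats.
    noise_scale: standard deviation of the Gaussian noise added to each
    feature of the extra copies (0.0 means plain duplication).
    """
    rng = random.Random(seed)
    # The first copy is the original data, left untouched.
    out = [(list(features), label) for features, label in rows]
    for _ in range(factor - 1):
        for features, label in rows:
            noisy = [x + rng.gauss(0.0, noise_scale) for x in features]
            out.append((noisy, label))
    return out

def bootstrap_sample(rows, factor=3, seed=0):
    """Draw factor * len(rows) rows uniformly at random, with replacement."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(factor * len(rows))]

# Hypothetical toy data: two numeric attributes and a class label.
data = [([1.0, 2.0], "a"), ([3.0, 4.0], "b"), ([5.0, 6.0], "a")]

repeated = repeat_with_noise(data, factor=3, noise_scale=0.1)
boot = bootstrap_sample(data, factor=3)
print(len(repeated), len(boot))  # both are three times the original size
```

Neither function adds information that was not already in `data`, which is the point both answers make. If you do enlarge a dataset this way, it is probably safest to do it on the training partition only, so that copies of the same example do not end up in both training and validation data.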