Meaning of sample ratio in ArffExampleSource

lotusinsnow
edited November 5 in Community Q&A
Dear all,

I have a very large dataset, so RapidMiner can't finish clustering it successfully, and it also takes a long time. When I set sample_ratio=0.1 in ArffExampleSource, the process executed successfully. Could you please tell me what kind of sampling mechanism RapidMiner uses, so that I have an idea of what the data looks like after sampling with sample_ratio?

Many thanks,
Jing

Answers

  • lotusinsnow
    I looked at the code, and the examples are chosen randomly according to the ratio.

    Jing
  • land
    Hi Jing,
    You are correct. For more sophisticated sampling algorithms, see the preprocessing/data/sampling group. There we provide operators such as Kennard-Stone sampling and StratifiedSampling. Of course, your data has to fit entirely into memory in order to sample it with these operators.

    Greetings,
      Sebastian
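Jing's reading of the code, that the sample is randomly chosen by the ratio, can be pictured with the minimal Python sketch that follows. It assumes each row is kept independently with probability equal to the ratio (a Bernoulli sample) while the file is streamed, which is why this kind of sampling does not need the whole dataset in memory. That is an assumption about what "randomly chosen by the ratio" means, not a copy of RapidMiner's source; the function name sample_by_ratio and its seed parameter are hypothetical.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_by_ratio(rows: Iterable[T], ratio: float, seed: int = 42) -> Iterator[T]:
    """Hypothetical sketch: keep each row independently with probability `ratio`.

    Rows are consumed one at a time, so the full dataset never has to be held
    in memory. The expected sample size is ratio * n, but the actual size
    varies with the seed.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    rng = random.Random(seed)  # assumed fixed seed, so the same sample is drawn on every run
    for row in rows:
        # rng.random() is uniform on [0, 1), so this test succeeds with probability `ratio`
        if rng.random() < ratio:
            yield row
```

With a ratio of 0.1, a stream of 100,000 rows yields roughly 10,000 rows, in their original order, but the exact count changes with the seed. Because every row has the same chance of being kept, the sample mirrors the overall data distribution on average, though a small cluster can end up with very few points.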
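The stratified sampling Sebastian mentions differs in that it keeps each class's share of the data fixed, not just on average. The generic Python sketch that follows illustrates the idea and also why it needs the whole dataset in memory: every row must be grouped by its label before any rows are picked. It is not the implementation of RapidMiner's StratifiedSampling operator, and the names stratified_sample and label_of are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def stratified_sample(
    rows: Sequence[T],
    label_of: Callable[[T], Hashable],
    ratio: float,
    seed: int = 42,
) -> List[T]:
    """Hypothetical sketch: draw `ratio` of the rows from each label group."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    # Group row indices by label; this pass over all rows is why the
    # whole dataset has to be available in memory.
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        groups[label_of(row)].append(i)
    rng = random.Random(seed)
    chosen: List[int] = []
    for indices in groups.values():
        k = round(ratio * len(indices))  # same fraction from every stratum, up to rounding
        chosen.extend(rng.sample(indices, k))  # draw k indices without replacement
    chosen.sort()  # return the sampled rows in their original order
    return [rows[i] for i in chosen]
```

For 900 rows labelled "a" and 100 labelled "b", a ratio of 0.1 returns exactly 90 "a" rows and 10 "b" rows, whereas the Bernoulli approach would only hit those counts on average.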