Cluster Sampling in RapidMiner

StefanRei
StefanRei New Altair Community Member
edited November 5 in Altair RapidMiner
Hi, 

I would like to use the cluster sampling method in RapidMiner (see, e.g., the Towards Data Science article on sampling techniques).

Do you have any suggestions? 

Thank you very much.

Bes

Comments

  • Telcontar120
    Telcontar120 New Altair Community Member
    You'll have to incorporate this via a Python or R script, since there is no native RapidMiner operator that implements this particular sampling method.
  • varunm1
    varunm1 New Altair Community Member
    edited July 2019
    Hello @StefanRei

    I am not sure whether there is a dedicated operator in RM for this. If it is implemented in Python or R, you can use the script operators to embed it in your RM process.

    One disadvantage, in my view, is that cluster sampling takes its entire sample from just a few clusters, which can over-represent or under-represent parts of the data distribution. The consequence is high variance (low precision) in the results. The major advantage is speed, since it doesn't go through all the examples in the dataset. If you want more precise results, you can use stratified sampling, which draws from every group.

    Based on the concept, one way to do what you need is to run a clustering algorithm to generate clusters, select a few of those clusters, then test your process on them and see how it goes. I haven't tried this myself; it's just an idea based on the concept.

    Hope this helps.
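
For anyone who wants to try the script route suggested above, here is a minimal sketch of one-stage cluster sampling in plain Python, with a stratified sampler next to it for comparison. It is not a RapidMiner operator and not an official implementation: the function names (`group_by`, `cluster_sample`, `stratified_sample`), the `cluster` column, and the toy data are all made up for illustration. It assumes each row already carries a cluster label (a natural grouping, or the output of a clustering step as described above), and that inside a script operator you would adapt it to whatever table type the operator hands over.

```python
import random
from collections import defaultdict


def group_by(rows, key):
    """Group a list of dict rows by the value stored under `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def cluster_sample(rows, cluster_key, n_clusters, seed=None):
    """One-stage cluster sampling: pick whole clusters at random
    and keep every row that belongs to a chosen cluster."""
    rng = random.Random(seed)
    groups = group_by(rows, cluster_key)
    if n_clusters > len(groups):
        raise ValueError("n_clusters exceeds the number of clusters")
    chosen = rng.sample(sorted(groups), n_clusters)
    return [row for c in chosen for row in groups[c]]


def stratified_sample(rows, stratum_key, fraction, seed=None):
    """Stratified sampling: draw the same fraction of rows from every group."""
    rng = random.Random(seed)
    sample = []
    for _, members in sorted(group_by(rows, stratum_key).items()):
        k = max(1, round(len(members) * fraction))  # at least one row per group
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical toy data: 6 clusters of 10 rows each.
rows = [{"cluster": c, "value": 10 * c + i} for c in range(6) for i in range(10)]

by_cluster = cluster_sample(rows, "cluster", n_clusters=2, seed=42)
by_stratum = stratified_sample(rows, "cluster", fraction=0.3, seed=42)

# Number of distinct clusters represented in each sample.
print(len({r["cluster"] for r in by_cluster}))
print(len({r["cluster"] for r in by_stratum}))
```

On this toy data the cluster sample contains all 20 rows of 2 clusters and nothing from the other 4, while the stratified sample contains 3 rows from each of the 6 clusters. That is the trade-off mentioned above: cluster sampling only touches a few groups, so it is cheap but can miss parts of the distribution.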