"Terminology on Data Sampling"

shaihulud
shaihulud New Altair Community Member
edited November 5 in Community Q&A
Hi

simple question:

I have a scenario where I use cluster analysis to divide a data set into groups of homogeneous entities. Then I extract one entity from each group as a representative. What is the terminology for that? I would call it something like data sampling, but googling "data sampling" wasn't very successful. For example, "sampling" (Wikipedia) seems to concentrate on investigating populations and the like.

However, searching this forum, I think that sampling might nevertheless be what I am looking for. I would appreciate any help on the terminology, and also if somebody could recommend some literature on the topic.

greetings
shai

 

Answers

  • el_chief
    el_chief New Altair Community Member
  • shaihulud
    shaihulud New Altair Community Member
    I've already read about cluster sampling, but it seemed to be just a subclass of what I am looking for. First of all, because it is based on population data, differentiating by geographical and similar criteria, while my focus is on any kind of data set containing objects with attributes. Furthermore, I don't think that cluster analysis is the only technique to group data. What would be the supertopic of cluster sampling?
  • el_chief
    el_chief New Altair Community Member
    The word you are looking for may be Stratified or Quota.

    Quota sampling is a variant of stratified sampling in which the sample proportions are kept similar to the population proportions of the groups.
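
To make the idea of matching group proportions concrete, here is a minimal sketch of proportional stratified sampling over arbitrary records, not only population data. The function name `stratified_sample`, the `fraction` parameter, and the example records are assumptions made up for illustration; only the Python standard library is used.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw roughly `fraction` of each group, but at least one record per group."""
    rng = random.Random(seed)
    # Split the records into groups (strata) by the value of key(record).
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    sample = []
    for label in sorted(groups):
        members = groups[label]
        k = max(1, round(len(members) * fraction))
        # Sample without replacement inside each group.
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical data: 60 objects of type "a", 30 of type "b", 10 of type "c".
records = (
    [("a", i) for i in range(60)]
    + [("b", i) for i in range(30)]
    + [("c", i) for i in range(10)]
)
picked = stratified_sample(records, key=lambda r: r[0], fraction=0.1)
```

With a fraction of 0.1 the sample keeps the 6:3:1 ratio of the three groups. The grouping key can be any attribute, or a cluster label assigned beforehand.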

  • shaihulud
    shaihulud New Altair Community Member
    OK, this sounds much better, but why is it always about populations?
    Populations are not the only kind of data that needs to be analysed. I am a little bit puzzled about that.
  • IngoRM
    IngoRM New Altair Community Member
    Another hint might be "prototypes" for each group. At least this is a term which can be used quite generally, everybody gets the idea, and it is frequently used by many clustering people. Another term describing this might be "relevance vector", coming from the Relevance Vector Machine, which concentrates on the prototypical points for each class instead of the points describing the borders, as Support Vector Machines do.

    Cheers,
    Ingo
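
The scenario from the question, one representative per group, can be sketched by picking a prototype for each cluster. The sketch below assumes the cluster labels already exist (from any grouping method) and chooses the medoid, i.e. the member with the smallest total distance to the other members of its group. Choosing the medoid is one possible way to define a prototype, not the method any poster in the thread prescribes, and it is not a Relevance Vector Machine. The names `medoid` and `prototypes` and the sample points are hypothetical.

```python
import math
from collections import defaultdict

def medoid(points):
    """Return the member with the smallest total distance to all members."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

def prototypes(points, labels):
    """Group points by cluster label and return one medoid per group."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {label: medoid(members) for label, members in groups.items()}

# Hypothetical objects with two numeric attributes, plus the cluster
# labels that some clustering step is assumed to have assigned.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
          (10.0, 10.0), (10.0, 12.0), (10.0, 11.0)]
labels = [0, 0, 0, 1, 1, 1]
reps = prototypes(points, labels)
```

Because the medoid is always an actual member of the group, the representative is a real object from the data set rather than an averaged, synthetic point such as a centroid.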