Creating equally sized clusters that are representative for the population
Kristjan_Mar
Hi all,
I have a set of data (population) with individuals that have signed up to be a part of a group. When they signed up they gave some background information, leaving me with 5 variables that I am mostly focusing on.
What I want to do is create 4 equally sized groups that are as representative of the whole population as possible. That is, I want to create 4 homogeneous groups.
Also, I have some other columns in the dataset that are important for handling/using the dataset. I would like this information to be carried into each of the groups (subsamples) so that every row still matches the respondent it belongs to.
In short: How can I create four homogeneous subsamples that are representative of the population, using only selected variables from the dataset?
Cheers, K
Tagged with: AI Studio, Sampling
Accepted answers
Marco_Barradas
Hi @Kristjan_Mar,
It seems you need to create 4 stratified samples of your data.
For that, use the Split Data operator with the sampling type set to stratified.
Hope that helps you.
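For readers working outside AI Studio, the stratified idea can be sketched in Python with pandas. This is only an illustration, not what Split Data does internally: the column names `segment` and `respondent_id` are hypothetical, and the function assumes the DataFrame has a unique index.

```python
import numpy as np
import pandas as pd

def stratified_groups(df, strata_col, n_groups=4, seed=0):
    """Add a 'group' column so each group gets a near-equal share of every stratum.

    Assumes df has a unique index (e.g. the default RangeIndex).
    """
    rng = np.random.default_rng(seed)
    # Shuffle the rows so the assignment inside each stratum is random.
    shuffled = df.iloc[rng.permutation(len(df))]
    # Deal rows round-robin within each stratum: 0, 1, 2, 3, 0, 1, ...
    # Per stratum, group sizes then differ by at most one.
    group = shuffled.groupby(strata_col).cumcount() % n_groups
    # assign() aligns on the index, so all other columns stay with their row.
    return df.assign(group=group)

# Hypothetical data: 'segment' is the nominal attribute to stratify on.
people = pd.DataFrame({
    "respondent_id": range(40),
    "segment": ["a"] * 20 + ["b"] * 12 + ["c"] * 8,
})
result = stratified_groups(people, "segment")
print(pd.crosstab(result["group"], result["segment"]))
```

Every original column is kept, so each subsample can be pulled out with a filter such as `result[result["group"] == 0]`.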
Telcontar120
I think I am confused by how you describe your intended outcome: "as representative of the whole population as possible" and "homogeneous" are typically not synonymous. If you want the groups to be as representative of the whole as possible, you basically want random subsets, which you can get easily with Split Data by choosing the shuffled sampling type. You would only need the stratified sampling type if you first choose a nominal attribute as your label to stratify on and want each resulting partition to contain the same proportions of the label classes. I suggest you have a look at the tutorial and help text for the Split Data operator. (You can use Select Attributes before the split to bring in only the 5 attributes you are interested in, if those are all you want to look at.)
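As a rough illustration of the shuffled approach outside AI Studio, here is a minimal pandas sketch. It is not the Split Data implementation, and the column names (`respondent_id`, `age`, `income`) are hypothetical stand-ins for the poster's variables, assumed numeric here. The second function gives a quick check of how far each group's means drift from the population means.

```python
import numpy as np
import pandas as pd

def random_equal_groups(df, n_groups=4, seed=0):
    """Add a 'group' column that splits the rows at random into near-equal groups."""
    rng = np.random.default_rng(seed)
    # Labels 0..n_groups-1 repeated to the length of df, then shuffled.
    labels = np.arange(len(df)) % n_groups
    return df.assign(group=rng.permutation(labels))

def compare_to_population(df, columns, group_col="group"):
    """Per-group mean of each (numeric) column minus the population mean."""
    return df.groupby(group_col)[columns].mean() - df[columns].mean()

# Hypothetical data standing in for the sign-up variables.
rng = np.random.default_rng(1)
people = pd.DataFrame({
    "respondent_id": range(100),
    "age": rng.integers(18, 80, 100),
    "income": rng.normal(50, 10, 100),
})
result = random_equal_groups(people)
print(result["group"].value_counts())
print(compare_to_population(result, ["age", "income"]))
```

Differences close to zero suggest the groups resemble the population on those variables. If one group is clearly off, rerunning with a different `seed` is a simple remedy.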
All comments
Kristjan_Mar
Thank you @MarcoBarradas and @Telcontar120!