
"Using a Kmeans model (clm file) created w/ a sample to cluster my population"

User: "Keithr"
Updated by Jocelyn
I created a k-means model with 7 clusters using a sample of 10K records. I now want to cluster my whole population (~1 million records). I am using the ClusterModelReader operator to read the cluster model I created with the sample (clm file) and the ClusterModel2ExampleSet operator to cluster the entire population.

Your description for this operator states: "This Operator clusters an exampleset given a cluster model. If an exampleSet does not contain id attributes it is probably not the same as the cluster model has been created on. Since cluster models depend on a static nature of the id attributes, the outcome on another exampleset with different values but same ids will be unpredictable." Does this mean that it will only cluster the records that I used to create the model and will not assign clusters to any new records?

The process below finishes without errors, but it only clustered the records that were already in the sample file. All other records have a blank cluster number in the output file.
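
To show what this looks like to me, here is a minimal Python sketch of my guess at the behavior. It is not RapidMiner code and not the operator's actual implementation; I am only assuming the model acts like a lookup table from record id to cluster, and the ids and cluster numbers are made up.

```python
# Hypothetical illustration only: a cluster model treated as an
# id -> cluster lookup table (made-up ids and cluster numbers).
sample_model = {101: 3, 102: 0, 103: 5}

def lookup_cluster(record_id, model):
    """Return the stored cluster for a known id, or None for an unseen id."""
    return model.get(record_id)

# Population contains the three sampled ids plus two new ones.
population_ids = [101, 102, 103, 104, 105]
assignments = {rid: lookup_cluster(rid, sample_model) for rid in population_ids}
print(assignments)
```

Under that assumption, ids 104 and 105 come back with no cluster, which matches the blank values I see in my output file.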

Is there a way to use the model I created to cluster new records, or do I have to run the k-means algorithm on the 1 million record file and not use the clm file created from the sample data?
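
To be clear about what I mean by using the model on new records, here is a minimal Python sketch of the idea: each new record is assigned to the nearest centroid found on the sample. This is not RapidMiner code, the centroids and records are made up, and I am assuming Euclidean distance.

```python
import math

# Made-up centroids for illustration (2 attributes, 3 clusters);
# in my case these would be the 7 centroids learned from the 10K sample.
centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]

def nearest_centroid(record, centroids):
    """Return the index of the centroid closest to record (Euclidean distance)."""
    distances = [math.dist(record, c) for c in centroids]
    return distances.index(min(distances))

# Records that were not part of the sample used to build the model.
new_records = [(1.0, 0.5), (6.0, 4.0), (9.0, 1.0)]
clusters = [nearest_centroid(r, centroids) for r in new_records]
print(clusters)
```

This is the kind of assignment I was hoping ClusterModel2ExampleSet would do for the records that were not in the sample.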

Thanks in advance.

Keith


      <operator name="ClusterModelReader" class="ClusterModelReader">
          <description text="The cluster model 8051_Lifestyle_Matches_Excel.clm is the exact model I used for the Excel study so use it to cluster the population"/>
          <parameter key="cluster_model_file" value="C:\Documents and Settings\krobinson\My Documents\rm_workspace\Clustering\8051_Lifestyle_Matches_Excel.clm"/>
      </operator>
      <operator name="ClusterModel2ExampleSet" class="ClusterModel2ExampleSet">
      </operator>
      <operator name="OperatorChain" class="OperatorChain" expanded="yes">
          <operator name="PSVExampleSetWriter" class="CSVExampleSetWriter">
              <parameter key="column_separator" value="|"/>
              <parameter key="csv_file" value="C:\Documents and Settings\krobinson\My Documents\rm_workspace\Clustering\8051_Population_Lifestyle.psv"/>
          </operator>
      </operator>
