cluster attribute

dan_agape New Altair Community Member
edited November 2024 in Community Q&A
Hi,

I have two simple processes. The first builds a clustering model and stores it in the repository:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.0.10" expanded="true" name="Process">
    <process expanded="true" height="404" width="599">
      <operator activated="true" class="generate_data" compatibility="5.0.10" expanded="true" height="60" name="Generate Data (3)" width="90" x="45" y="165">
        <parameter key="number_examples" value="1000"/>
        <parameter key="use_local_random_seed" value="true"/>
        <parameter key="local_random_seed" value="188886"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.0.10" expanded="true" height="76" name="Select Attributes (2)" width="90" x="179" y="165">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="k_means" compatibility="5.0.10" expanded="true" height="76" name="Clustering" width="90" x="313" y="165">
        <parameter key="k" value="3"/>
      </operator>
      <operator activated="true" class="store" compatibility="5.0.10" expanded="true" height="60" name="Store" width="90" x="447" y="165">
        <parameter key="repository_entry" value="../models/tmp_kmeans_mod"/>
      </operator>
      <connect from_op="Generate Data (3)" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Clustering" from_port="cluster model" to_op="Store" to_port="input"/>
      <connect from_op="Clustering" from_port="clustered set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="180"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
The second process uses this model to cluster a new data set and then looks for explanations (profiles) of the clusters via the rules of a decision tree. The process works, but the attribute cluster, although present in the metadata, is not seen by the downstream operators (which display the corresponding errors), yet it is used anyway when the process runs.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.0.10" expanded="true" name="Process">
    <process expanded="true" height="404" width="599">
      <operator activated="true" class="generate_data" compatibility="5.0.10" expanded="true" height="60" name="Generate Data (2)" width="90" x="45" y="165">
        <parameter key="number_examples" value="2000"/>
        <parameter key="use_local_random_seed" value="true"/>
        <parameter key="local_random_seed" value="20090"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.0.10" expanded="true" height="76" name="Select Attributes" width="90" x="45" y="255">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="5.0.10" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
        <parameter key="repository_entry" value="../models/tmp_kmeans_mod"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.0.10" expanded="true" height="76" name="Apply Model" width="90" x="179" y="165">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.0.10" expanded="true" height="76" name="Set Role" width="90" x="313" y="165">
        <parameter key="name" value="cluster"/>
        <parameter key="target_role" value="label"/>
      </operator>
      <operator activated="true" class="decision_tree" compatibility="5.0.10" expanded="true" height="76" name="Decision Tree" width="90" x="447" y="165">
        <parameter key="minimal_gain" value="0.15"/>
        <parameter key="maximal_depth" value="15"/>
      </operator>
      <connect from_op="Generate Data (2)" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Retrieve" from_port="output" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Decision Tree" to_port="training set"/>
      <connect from_op="Decision Tree" from_port="model" to_port="result 1"/>
      <connect from_op="Decision Tree" from_port="exampleSet" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="126"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="90"/>
    </process>
  </operator>
</process>
Since the model was stored in the repository, could these error messages perhaps have been avoided?
Also, the Set Role operator does not offer the attribute cluster as a choice.

Thanks,
Dan

Answers

  • land
    land New Altair Community Member
    Hi,
    it's possible that a metadata rule is missing for the cluster model. Would you be so kind as to file a bug report for this?

    Greetings,
      Sebastian
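
To illustrate what a missing metadata rule could look like, here is a minimal Python sketch. It is not RapidMiner code: every name in it (apply_model_metadata, apply_model_run, check_set_role, the rule tables) is hypothetical and only models the general idea that design-time metadata is predicted by rules, separately from what the model does when the process actually runs. If the rule for a model type is absent, downstream checks complain even though the run itself produces the attribute.

```python
# Hypothetical illustration only; none of these names are RapidMiner APIs.

def apply_model_metadata(input_attrs, model_kind, rules):
    """Predict output attribute names at design time (no data is processed)."""
    rule = rules.get(model_kind)
    # Without a rule for this model type, the prediction is the input unchanged.
    return rule(input_attrs) if rule else set(input_attrs)


def apply_model_run(rows, model_kind):
    """Run time: the model adds its attribute regardless of any metadata rule."""
    new_attr = "cluster" if model_kind == "cluster_model" else "prediction"
    return [{**row, new_attr: None} for row in rows]


def check_set_role(predicted_attrs, name):
    """Design-time check done by a downstream operator such as Set Role."""
    return [] if name in predicted_attrs else [f"attribute '{name}' not found"]


inputs = {"att1", "att2"}

# Assumed situation: the rule table has no entry for cluster models.
rules_missing = {"prediction_model": lambda attrs: set(attrs) | {"prediction"}}
# The same table with the missing entry added.
rules_fixed = {**rules_missing,
               "cluster_model": lambda attrs: set(attrs) | {"cluster"}}

warnings_before = check_set_role(
    apply_model_metadata(inputs, "cluster_model", rules_missing), "cluster")
warnings_after = check_set_role(
    apply_model_metadata(inputs, "cluster_model", rules_fixed), "cluster")
result = apply_model_run([{"att1": 0.1, "att2": 0.2}], "cluster_model")

print(warnings_before)         # design-time complaint with the rule missing
print(warnings_after)          # no complaint once the rule exists
print("cluster" in result[0])  # the run adds the attribute either way
```

In this toy model the symptoms match the ones described above: the design-time check reports the attribute as missing, while the executed process still contains it. Whether this is what actually happens inside RapidMiner is an assumption that the bug report would need to confirm.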