[SOLVED] "Hierarchical Classification" operator

mdc New Altair Community Member
edited November 2024 in Community Q&A
Hello,

I'm trying to do hierarchical classification of documents, and I believe the 'Hierarchical Classification' operator is the way to go, as recommended here in the forum. My problem is that I couldn't figure out how to use this operator or what output to expect from it. I couldn't find any usage example in the forum either. Can somebody post a sample process using this operator?

thanks in advance,
Matthew

Answers

  • Andrew2
    Andrew2 New Altair Community Member
    Hello

    Here's an example of top-down clustering. It uses the Top Down Clustering operator, which itself contains another clustering operator; in this case Expectation Maximization with k = 2. From what I have observed, it works roughly like this: the outer operator invokes the inner one, which splits the example set into k = 2 clusters. The outer operator then repeats this on the examples of each of these 2 clusters, and the inner operator duly splits each of them into 2 more clusters. This continues down to the depth set by the outer operator's max depth parameter. I believe the Flatten Clustering operator is what is needed to extract a particular flat clustering from the hierarchy, and to prove this to myself I added a Map Clustering on Labels operator followed by a Performance operator to see how well the clusters map to the ground truth.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Retrieve Iris" width="90" x="112" y="75">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="top_down_clustering" compatibility="5.3.008" expanded="true" height="76" name="Clustering (2)" width="90" x="112" y="165">
            <parameter key="max_depth" value="2"/>
            <process expanded="true">
              <operator activated="true" class="expectation_maximization_clustering" compatibility="5.3.008" expanded="true" height="76" name="Clustering" width="90" x="179" y="75"/>
              <connect from_port="example set" to_op="Clustering" to_port="example set"/>
              <connect from_op="Clustering" from_port="cluster model" to_port="cluster model"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_cluster model" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="flatten_clustering" compatibility="5.3.008" expanded="true" height="76" name="Flatten Clustering" width="90" x="112" y="255"/>
          <operator activated="true" class="map_clustering_on_labels" compatibility="5.3.008" expanded="true" height="76" name="Map Clustering on Labels" width="90" x="380" y="210"/>
          <operator activated="true" class="performance" compatibility="5.3.008" expanded="true" height="76" name="Performance" width="90" x="514" y="75"/>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Clustering (2)" to_port="example set"/>
          <connect from_op="Clustering (2)" from_port="cluster model" to_op="Flatten Clustering" to_port="hierarchical"/>
          <connect from_op="Clustering (2)" from_port="clustered set" to_op="Flatten Clustering" to_port="example set"/>
          <connect from_op="Flatten Clustering" from_port="flat" to_op="Map Clustering on Labels" to_port="cluster model"/>
          <connect from_op="Flatten Clustering" from_port="example set" to_op="Map Clustering on Labels" to_port="example set"/>
          <connect from_op="Map Clustering on Labels" from_port="example set" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Map Clustering on Labels" from_port="cluster model" to_port="result 1"/>
          <connect from_op="Performance" from_port="performance" to_port="result 2"/>
          <connect from_op="Performance" from_port="example set" to_port="result 3"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
    regards

    Andrew
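    The recursive splitting described above can be sketched in plain Python. This is only an illustration of the idea, not RapidMiner's implementation: a simple deterministic 2-means stands in for the Expectation Maximization operator, and the hypothetical `flatten` helper cuts the tree at a chosen depth, which may differ from how the Flatten Clustering operator actually selects its clusters.

```python
from typing import Dict, List, Optional

Point = List[float]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def two_means(points: List[Point], iters: int = 20) -> List[int]:
    """Split points into two groups (0/1) with plain k-means, k = 2.

    Stand-in for the inner EM operator. Seeding is deterministic: the
    first point and the point farthest away from it.
    """
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [0 if dist2(p, c0) <= dist2(p, c1) else 1 for p in points]
        g0 = [p for p, a in zip(points, assign) if a == 0]
        g1 = [p for p, a in zip(points, assign) if a == 1]
        if not g0 or not g1:
            break  # cannot split (e.g. all points identical)
        c0, c1 = mean(g0), mean(g1)
    return assign


def top_down(points: List[Point], max_depth: int,
             indices: Optional[List[int]] = None, depth: int = 0) -> Dict:
    """Recursively bisect the examples until max_depth is reached.

    Each node stores the example indices it covers and its child nodes.
    """
    if indices is None:
        indices = list(range(len(points)))
    node: Dict = {"indices": indices, "children": []}
    if depth >= max_depth or len(indices) < 2:
        return node
    assign = two_means([points[i] for i in indices])
    left = [i for i, a in zip(indices, assign) if a == 0]
    right = [i for i, a in zip(indices, assign) if a == 1]
    if left and right:
        node["children"] = [top_down(points, max_depth, left, depth + 1),
                            top_down(points, max_depth, right, depth + 1)]
    return node


def flatten(node: Dict, level: int) -> List[List[int]]:
    """Cut the tree at `level` and return one flat cluster per node there.

    A branch that stopped splitting earlier contributes its own leaf.
    """
    if level == 0 or not node["children"]:
        return [node["indices"]]
    groups: List[List[int]] = []
    for child in node["children"]:
        groups.extend(flatten(child, level - 1))
    return groups


pts = [[0.0], [1.0], [10.0], [11.0], [100.0], [101.0]]
tree = top_down(pts, max_depth=2)
print(flatten(tree, 1))  # [[0, 1, 2, 3], [4, 5]]
print(flatten(tree, 2))  # [[0, 1], [2, 3], [4], [5]]
```

    With max_depth = 2 every cluster is split at most twice, so cutting at level 2 yields up to 2 × 2 = 4 flat clusters, matching the k = 2, max depth = 2 setup in the process.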
  • mdc
    mdc New Altair Community Member
    Thanks for the reply, Andrew.
    But I'm looking for hierarchical classification, specifically the 'Hierarchical Classification' operator. I have hierarchical labels, which I can enter in the operator's hierarchy table, but beyond that I have no idea how to use it (what input it expects and what output it produces).

    Matthew
  • Andrew2
    Andrew2 New Altair Community Member
    Hello Matthew

    Good point. I didn't pay attention to the question and substituted clustering for classification.

    I'm not familiar with hierarchical classification in the context of machine learning, but I'm guessing it has something to do with dividing the example set into smaller and smaller pieces based on a rule at each stage. That's roughly what the clustering example does, with the proviso that the rule is not controllable, because the same clustering algorithm is used at every stage. It also produces a prediction, so it can be used as a classifier, again with one proviso: the cluster labels are not derived from the training labels, so there is ambiguity about which true class each cluster corresponds to.


    regards

    Andrew
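    To make the "usable as a classifier" point concrete, here is a minimal sketch that turns cluster ids into class predictions by majority vote against the true labels. It assumes a majority-vote mapping; the Map Clustering on Labels operator may use a different matching strategy, and the function names here are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def majority_mapping(clusters: Sequence[str], labels: Sequence[str]) -> Dict[str, str]:
    """Map each cluster id to the most frequent true label among its members.

    Note: two clusters may end up mapped to the same class, and a class
    may receive no cluster at all.
    """
    counts: Dict[str, Counter] = {}
    for cluster, label in zip(clusters, labels):
        counts.setdefault(cluster, Counter())[label] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in counts.items()}


def apply_mapping(clusters: Sequence[str], mapping: Dict[str, str]) -> List[str]:
    """Turn cluster ids into class predictions."""
    return [mapping[c] for c in clusters]


def accuracy(pred: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of examples whose prediction equals the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


clusters = ["c0", "c0", "c0", "c1", "c1", "c2"]
labels = ["a", "a", "b", "b", "b", "c"]
mapping = majority_mapping(clusters, labels)
pred = apply_mapping(clusters, mapping)
print(mapping)                 # {'c0': 'a', 'c1': 'b', 'c2': 'c'}
print(accuracy(pred, labels))  # 0.8333333333333334
```

    The mapping needs the true labels, which is exactly the ambiguity mentioned above: without them there is no way to say which class a cluster stands for.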
  • mdc
    mdc New Altair Community Member

    Hi,

    A couple of years ago I built a hierarchical classification process similar to what you describe: modelling and applying a different set of labels to each subset of the divided example set. The label sets are hierarchical. But since there is a dedicated 'Hierarchical Classification' operator, I thought it could make the process simpler.

    Anyway, if anybody has a sample process, please post it, or at least give a hint on how the operator works. ???

    thanks,
    Matthew
  • mdc
    mdc New Altair Community Member

    Anybody? :( Any hint on how to use the 'Hierarchical Classification' operator? ???
  • MariusHelf
    MariusHelf New Altair Community Member
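    The following process performs a hierarchical classification on Iris. You have to define the hierarchy in tabular form, starting from a "root" node: each row pairs a parent node with one of its children, and the leaves are the actual label values. In this example, "root" has the children Iris-setosa and versicolor_virginica, and versicolor_virginica in turn has the children Iris-versicolor and Iris-virginica.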
    Please have a look at the process below and come back with any questions you have.

    Best regards,
    Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="hierarchical_multi_class_classification" compatibility="5.3.008" expanded="true" height="76" name="Hierarchical Classification" width="90" x="179" y="30">
            <list key="hierarchy">
              <parameter key="versicolor_virginica" value="Iris-versicolor"/>
              <parameter key="versicolor_virginica" value="Iris-virginica"/>
              <parameter key="root" value="Iris-setosa"/>
              <parameter key="root" value="versicolor_virginica"/>
            </list>
            <process expanded="true">
              <operator activated="true" class="support_vector_machine" compatibility="5.3.008" expanded="true" height="112" name="SVM" width="90" x="179" y="30"/>
              <connect from_port="training set" to_op="SVM" to_port="training set"/>
              <connect from_op="SVM" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Hierarchical Classification" to_port="training set"/>
          <connect from_op="Hierarchical Classification" from_port="model" to_port="result 2"/>
          <connect from_op="Hierarchical Classification" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
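    One common way to use such a hierarchy is to train one local model per parent node (root: Iris-setosa vs. versicolor_virginica; versicolor_virginica: Iris-versicolor vs. Iris-virginica) and to classify by walking down from root. The sketch below assumes that top-down scheme; it is not the operator's source and its internals may differ, a nearest-centroid learner stands in for the SVM, and the toy data points are made up for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple

Point = Sequence[float]

# (parent, child) rows, mirroring the "hierarchy" list in the process above.
HIERARCHY: List[Tuple[str, str]] = [
    ("versicolor_virginica", "Iris-versicolor"),
    ("versicolor_virginica", "Iris-virginica"),
    ("root", "Iris-setosa"),
    ("root", "versicolor_virginica"),
]


def children_map(rows: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    kids: Dict[str, List[str]] = {}
    for parent, child in rows:
        kids.setdefault(parent, []).append(child)
    return kids


def leaves_under(node: str, kids: Dict[str, List[str]]) -> Set[str]:
    """All leaf labels (actual class values) below `node`."""
    if node not in kids:
        return {node}
    found: Set[str] = set()
    for child in kids[node]:
        found |= leaves_under(child, kids)
    return found


class NearestCentroid:
    """Tiny stand-in for the inner learner (the SVM in the process above)."""

    def fit(self, X: List[Point], y: List[str]) -> "NearestCentroid":
        groups: Dict[str, List[Point]] = {}
        for x, t in zip(X, y):
            groups.setdefault(t, []).append(x)
        self.centroids = {t: [sum(col) / len(pts) for col in zip(*pts)]
                          for t, pts in groups.items()}
        return self

    def predict(self, x: Point) -> str:
        return min(self.centroids, key=lambda t: sum(
            (a - b) ** 2 for a, b in zip(x, self.centroids[t])))


class HierarchicalClassifier:
    """One local model per parent node; prediction walks down from root."""

    def __init__(self, rows: List[Tuple[str, str]], root: str = "root"):
        self.kids = children_map(rows)
        self.root = root
        self.models: Dict[str, NearestCentroid] = {}

    def fit(self, X: List[Point], y: List[str]) -> "HierarchicalClassifier":
        for parent, children in self.kids.items():
            # Relabel each example with the child subtree its leaf label falls in;
            # examples outside this parent's subtree are skipped.
            owner = {leaf: child for child in children
                     for leaf in leaves_under(child, self.kids)}
            sub = [(x, owner[t]) for x, t in zip(X, y) if t in owner]
            self.models[parent] = NearestCentroid().fit(
                [x for x, _ in sub], [c for _, c in sub])
        return self

    def predict_path(self, x: Point) -> List[str]:
        node, path = self.root, [self.root]
        while node in self.kids:
            node = self.models[node].predict(x)
            path.append(node)
        return path

    def predict(self, x: Point) -> str:
        return self.predict_path(x)[-1]


X = [[1.0, 0.2], [1.2, 0.3], [4.0, 1.3], [4.2, 1.4], [5.5, 2.0], [5.8, 2.2]]
y = ["Iris-setosa"] * 2 + ["Iris-versicolor"] * 2 + ["Iris-virginica"] * 2
clf = HierarchicalClassifier(HIERARCHY).fit(X, y)
print(clf.predict_path([5.6, 2.1]))
# ['root', 'versicolor_virginica', 'Iris-virginica']
```

    Note that `predict` returns only the leaf, i.e. one of the original Iris-* values; the intermediate nodes are only visible in the path.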
  • mdc
    mdc New Altair Community Member

    Thanks, Marius. It works, but when I apply the model to an example set, the result shows only the original labels (Iris-*), not the hierarchical ones. Is there a way to make the prediction include the parent labels too, e.g. as another column?

    Matthew

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.008" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="hierarchical_multi_class_classification" compatibility="5.3.008" expanded="true" height="76" name="Hierarchical Classification" width="90" x="179" y="30">
            <list key="hierarchy">
              <parameter key="versicolor_virginica" value="Iris-versicolor"/>
              <parameter key="versicolor_virginica" value="Iris-virginica"/>
              <parameter key="root" value="Iris-setosa"/>
              <parameter key="root" value="versicolor_virginica"/>
            </list>
            <process expanded="true">
              <operator activated="true" class="support_vector_machine" compatibility="5.3.008" expanded="true" height="112" name="SVM" width="90" x="179" y="30"/>
              <connect from_port="training set" to_op="SVM" to_port="training set"/>
              <connect from_op="SVM" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="75">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="label"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.3.008" expanded="true" height="76" name="Apply Model" width="90" x="447" y="30">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Hierarchical Classification" to_port="training set"/>
          <connect from_op="Hierarchical Classification" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Hierarchical Classification" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
          <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>

  • MariusHelf
    MariusHelf New Altair Community Member
    Matthew,

    Unfortunately, that is not possible with a single operator. It is possible to build a custom process that creates hierarchical labels, but that is much more complex.

    Best regards,
    Marius
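    One possible building block for such a custom process, assuming the hierarchy is a proper tree (every node has exactly one parent), is to look up the ancestors of each predicted leaf label in the hierarchy table and add them as extra columns. The sketch below does this outside RapidMiner as a post-processing step; the column names `prediction_level_N` and all helpers are hypothetical.

```python
from typing import Dict, List, Tuple

# (parent, child) rows, mirroring the "hierarchy" list in the process above.
HIERARCHY: List[Tuple[str, str]] = [
    ("versicolor_virginica", "Iris-versicolor"),
    ("versicolor_virginica", "Iris-virginica"),
    ("root", "Iris-setosa"),
    ("root", "versicolor_virginica"),
]


def parent_map(rows: List[Tuple[str, str]]) -> Dict[str, str]:
    """child -> parent; a proper tree gives every node at most one parent."""
    parents: Dict[str, str] = {}
    for parent, child in rows:
        if parents.get(child, parent) != parent:
            raise ValueError(f"{child!r} has more than one parent")
        parents[child] = parent
    return parents


def ancestry(leaf: str, parents: Dict[str, str]) -> List[str]:
    """Path from the top node down to `leaf`."""
    path = [leaf]
    while path[-1] in parents:
        path.append(parents[path[-1]])
        if len(path) > len(parents) + 1:
            raise ValueError("hierarchy contains a cycle")
    return path[::-1]


def add_level_columns(predictions: List[str], rows: List[Tuple[str, str]],
                      root: str = "root") -> List[Dict[str, str]]:
    """Turn each predicted leaf label into hypothetical per-level columns."""
    parents = parent_map(rows)
    table = []
    for leaf in predictions:
        levels = [n for n in ancestry(leaf, parents) if n != root]
        table.append({f"prediction_level_{i + 1}": n for i, n in enumerate(levels)})
    return table


for row in add_level_columns(["Iris-setosa", "Iris-virginica"], HIERARCHY):
    print(row)
# {'prediction_level_1': 'Iris-setosa'}
# {'prediction_level_1': 'versicolor_virginica', 'prediction_level_2': 'Iris-virginica'}
```

    Because each leaf has a single parent chain, the parent labels follow directly from the leaf prediction; leaves that sit higher in the tree (like Iris-setosa here) simply get fewer level columns.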
  • mdc
    mdc New Altair Community Member

    Thanks. That's good to know.

    Matthew
  • ahkcsit
    ahkcsit New Altair Community Member

    Why don't you view this process graphically? I think that will be much easier than trying to imagine the connections from the XML code above.

  • mattia_fumagall
    mattia_fumagall New Altair Community Member
    Dear All,
    Sorry, maybe I am a little bit late, but can you please provide some hints on how to set up the custom process you suggested? I am referring to the following post:
    Matthew,

    unfortunately that is not possible with a single operator. It is possible to build a custom process that creates hierarchical labels, but that is way more complex.

    Best regards,
    Marius
    Thank you in advance!
    Mattia

  • sgenzer
    sgenzer
    Altair Employee
    Hi @mattia_fumagall, yes, this is an OLD thread. :) Can you please start a new discussion (click the "Ask a Question" button at the top) and describe exactly what you want to do? Reusing the code above from RapidMiner 5.3 is just not going to get us very far...

    Scott