Looping Clusters and store them in Repository

New Altair Community Member

Jun 28, 2018

Updated Nov 5, 2024 by Jocelyn

Hi everybody,

My dataset consists 4000 examples, 4 special attributes (ID, cluster, text and outlier), and 570 regular attributes from textprocessing. What I have done with the data so far was only to cluster it. Now I have 37 clusters and I want to store the 1 example set for each cluster in my repository.

Thats where my problem is: I think it should be possible with macros, "loop cluster" - and the "store" -operator, but I cant figure out how to set the parameters right.

I have a snippet attached from the data.

And the XML of my process so far:

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Daten KAM clustered (opt.)" width="90" x="112" y="34">
        <parameter key="repository_entry" value="//Datenbearbeitung MA/Filter Outliers/Daten KAM clustered (opt.)"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="ID|label|text"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="380" y="34">
        <parameter key="attribute_name" value="label"/>
        <parameter key="target_role" value="cluster"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="loop_clusters" compatibility="8.2.000" expanded="true" height="82" name="Loop Clusters" width="90" x="648" y="34">
        <process expanded="true">
          <operator activated="true" class="filter_examples" compatibility="8.2.000" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="34">
            <list key="filters_list">
              <parameter key="filters_entry_key" value="label.equals.%{myMacro_0}"/>
            </list>
          </operator>
          <operator activated="true" class="store" compatibility="8.2.000" expanded="true" height="68" name="Store" width="90" x="648" y="34">
            <parameter key="repository_entry" value="999TEST"/>
          </operator>
          <connect from_port="cluster subset" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Store" to_port="input"/>
          <connect from_op="Store" from_port="through" to_port="out 1"/>
          <portSpacing port="source_cluster subset" spacing="0"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="set_macros" compatibility="8.2.000" expanded="true" height="68" name="Set Macros" width="90" x="313" y="136">
        <list key="macros">
          <parameter key="myMacro_0" value="&quot;cluster_0&quot;"/>
        </list>
      </operator>
      <connect from_op="Retrieve Daten KAM clustered (opt.)" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Loop Clusters" to_port="example set"/>
      <connect from_op="Loop Clusters" from_port="out 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

My goal is to apply the "Extract Topics from Document (LDA)" operator on every cluster with number of topics = 1 so that I can see the top words for each cluster.

Thank you all in advance

flo

Find more posts tagged with

AI Studio

Clustering

Loops + Branches

🎉Community Raffle - Win $25

Looping Clusters and store them in Repository

Find more posts tagged with

Quick Links