Loop data sets and dynamically generated file path

Serek91
Serek91 New Altair Community Member
edited November 2024 in Community Q&A
Hi,

I have a subprocess with a Write CSV operator, and it is repeated about 70 times. The output file has a path like "{category_id}/{set_id}/filename.csv", so I want to generate it dynamically. Is there a way to do that, for example by passing two custom variables into the subprocess and then using them in the file path?

EDIT:
I'm using the Loop Data Sets operator, but in each iteration I somehow have to obtain the index of the current iteration and generate the file path from it...



Process added as attachment.

Answers

  • kayman
    kayman New Altair Community Member
    It seems you need two nested Loop Values operators.
    Use the first one to loop through the category_id values and the second one to loop through the set_id values, then do your logic inside the inner loop. You can then save the result using the iteration macros for both the category and the set ID, as in the simplified example below:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="UTF-8"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.3.001" expanded="true" height="68" name="Create ExampleSet" width="90" x="112" y="34">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="category_id,set_id,something&#10;1,1,x&#10;1,1,y&#10;1,2,z&#10;2,1,a&#10;2,2,b&#10;2,2,c&#10;"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="true"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="concurrency:loop_values" compatibility="9.3.001" expanded="true" height="82" name="Loop Values" width="90" x="246" y="34">
            <parameter key="attribute" value="category_id"/>
            <parameter key="iteration_macro" value="cid"/>
            <parameter key="reuse_results" value="false"/>
            <parameter key="enable_parallel_execution" value="false"/>
            <process expanded="true">
              <operator activated="true" class="filter_examples" compatibility="9.3.001" expanded="true" height="103" name="Filter Examples" width="90" x="45" y="34">
                <parameter key="parameter_expression" value=""/>
                <parameter key="condition_class" value="custom_filters"/>
                <parameter key="invert_filter" value="false"/>
                <list key="filters_list">
                  <parameter key="filters_entry_key" value="category_id.equals.%{cid}"/>
                </list>
                <parameter key="filters_logic_and" value="true"/>
                <parameter key="filters_check_metadata" value="true"/>
              </operator>
              <operator activated="true" class="concurrency:loop_values" compatibility="9.3.001" expanded="true" height="82" name="Loop Values (2)" width="90" x="179" y="34">
                <parameter key="attribute" value="set_id"/>
                <parameter key="iteration_macro" value="sid"/>
                <parameter key="reuse_results" value="false"/>
                <parameter key="enable_parallel_execution" value="false"/>
                <process expanded="true">
                  <operator activated="true" class="filter_examples" compatibility="9.3.001" expanded="true" height="103" name="Filter Examples (2)" width="90" x="45" y="34">
                    <parameter key="parameter_expression" value=""/>
                    <parameter key="condition_class" value="custom_filters"/>
                    <parameter key="invert_filter" value="false"/>
                    <list key="filters_list">
                      <parameter key="filters_entry_key" value="set_id.equals.%{sid}"/>
                    </list>
                    <parameter key="filters_logic_and" value="true"/>
                    <parameter key="filters_check_metadata" value="true"/>
                  </operator>
                  <operator activated="true" breakpoints="before" class="write_csv" compatibility="9.3.001" expanded="true" height="82" name="Write CSV" width="90" x="179" y="34">
                    <parameter key="csv_file" value="mypath/%{cid}/%{sid}/filename.csv"/>
                    <parameter key="column_separator" value=";"/>
                    <parameter key="write_attribute_names" value="true"/>
                    <parameter key="quote_nominal_values" value="true"/>
                    <parameter key="format_date_attributes" value="true"/>
                    <parameter key="append_to_file" value="false"/>
                    <parameter key="encoding" value="UTF-8"/>
                  </operator>
                  <connect from_port="input 1" to_op="Filter Examples (2)" to_port="example set input"/>
                  <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Write CSV" to_port="input"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Loop Values (2)" to_port="input 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Loop Values" to_port="input 1"/>
          <connect from_op="Loop Values" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
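
For readers who do not have RapidMiner at hand, the nested-loop idea above can be sketched in plain Python. This is only an illustration under stated assumptions: `write_nested` and `ROWS` are hypothetical names, the rows mirror the `input_csv_text` sample from the process, and the semicolon delimiter follows the Write CSV setting. It is not how RapidMiner implements the operators.

```python
import csv
import tempfile
from pathlib import Path

# Same toy rows as the input_csv_text parameter in the process above.
ROWS = [
    {"category_id": "1", "set_id": "1", "something": "x"},
    {"category_id": "1", "set_id": "1", "something": "y"},
    {"category_id": "1", "set_id": "2", "something": "z"},
    {"category_id": "2", "set_id": "1", "something": "a"},
    {"category_id": "2", "set_id": "2", "something": "b"},
    {"category_id": "2", "set_id": "2", "something": "c"},
]


def write_nested(rows, base_dir):
    """Write one filename.csv per (category_id, set_id) pair under base_dir."""
    written = []
    # Outer loop: one pass per distinct category_id (plays the role of "cid").
    for cid in sorted({row["category_id"] for row in rows}):
        in_category = [row for row in rows if row["category_id"] == cid]
        # Inner loop: one pass per distinct set_id within that category ("sid").
        for sid in sorted({row["set_id"] for row in in_category}):
            subset = [row for row in in_category if row["set_id"] == sid]
            target = Path(base_dir) / cid / sid / "filename.csv"
            target.parent.mkdir(parents=True, exist_ok=True)
            with target.open("w", newline="", encoding="utf-8") as handle:
                writer = csv.DictWriter(
                    handle, fieldnames=list(subset[0]), delimiter=";"
                )
                writer.writeheader()
                writer.writerows(subset)
            written.append(target)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_nested(ROWS, tmp):
            print(path.relative_to(tmp))
```

The two loop variables play the same role as the `cid` and `sid` iteration macros: each is known at the moment the file is written, so the path can be assembled from them.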
    


  • Serek91
    Serek91 New Altair Community Member
    edited August 2019
    Hi, I modified my previous post.

    The custom values added to the path of the CSV file are not obtained from the example set. It is just an index. I mean something like:


    $index = 0;
    $exampleSets = ['A', 'B', 'C', 'D'];
    foreach ($exampleSets as $exampleSet) {
        ++$index;  // 1-based position of the current example set
        $path = $index . '/example.csv';
    }
  • Marco_Boeck
    Marco_Boeck New Altair Community Member
    edited August 2019 Answer ✓
    Hi,

    Just define a macro before the loop, use it inside the loop, and increment it in each iteration. Some loop operators do this for you, but for your loop you have to do it yourself. I would also recommend checking out Loop Collection; then you don't have to define 70 connections to the Loop Data Sets operator. Anyway, it's quite easy; see the small example below:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.4.001-SNAPSHOT">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.4.001-SNAPSHOT" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.4.001-SNAPSHOT" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="set_macro" compatibility="9.4.001-SNAPSHOT" expanded="true" height="82" name="Prepare counter" width="90" x="179" y="34">
            <parameter key="macro" value="i"/>
            <parameter key="value" value="1"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.4.001-SNAPSHOT" expanded="true" height="124" name="Multiply" width="90" x="313" y="34"/>
          <operator activated="true" class="loop_data_sets" compatibility="9.4.001-SNAPSHOT" expanded="true" height="124" name="Loop Data Sets" width="90" x="447" y="34">
            <parameter key="only_best" value="false"/>
            <process expanded="true">
              <operator activated="true" class="store" compatibility="9.4.001-SNAPSHOT" expanded="true" height="68" name="Store" width="90" x="179" y="34">
                <parameter key="repository_entry" value="%{i} - myData"/>
              </operator>
              <operator activated="true" class="generate_macro" compatibility="9.4.001-SNAPSHOT" expanded="true" height="82" name="Increment counter" width="90" x="380" y="34">
                <list key="function_descriptions">
                  <parameter key="i" value="eval(%{i})+1"/>
                </list>
              </operator>
              <connect from_port="example set" to_op="Store" to_port="input"/>
              <connect from_op="Store" from_port="through" to_op="Increment counter" to_port="through 1"/>
              <connect from_op="Increment counter" from_port="through 1" to_port="output 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Prepare counter" to_port="through 1"/>
          <connect from_op="Prepare counter" from_port="through 1" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Loop Data Sets" to_port="example set 1"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Loop Data Sets" to_port="example set 2"/>
          <connect from_op="Multiply" from_port="output 3" to_op="Loop Data Sets" to_port="example set 3"/>
          <connect from_op="Loop Data Sets" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    

    Regards,
    Marco
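
The counter pattern in the process above (set a macro to 1 before the loop, use it in the path, increment it at the end of each iteration) can be sketched in Python as follows. This is a minimal illustration, not RapidMiner code; `indexed_paths` and its parameters are hypothetical names chosen for this example.

```python
def indexed_paths(data_sets, filename="example.csv", start=1):
    """Pair each data set with a path built from a manually kept counter."""
    i = start  # corresponds to the "Prepare counter" Set Macro step
    result = []
    for data_set in data_sets:
        # Use the current counter value in the path, like %{i} in the process.
        result.append((f"{i}/{filename}", data_set))
        i += 1  # corresponds to the "Increment counter" Generate Macro step
    return result


if __name__ == "__main__":
    for path, data_set in indexed_paths(["A", "B", "C", "D"]):
        print(path, data_set)
```

In Python one would normally write `enumerate(data_sets, start=1)` instead of keeping the counter by hand; the explicit variable is kept here only because it mirrors the macro steps one to one.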
  • Serek91
    Serek91 New Altair Community Member
    edited August 2019
    Thanks! One last request: can you check my process now (and sorry for the Polish descriptions above the operators)? I hope it is OK now...




  • Marco_Boeck
    Marco_Boeck New Altair Community Member
    Hi,

    If you put the input CSV files into one folder, you could use Loop Files with a single Read CSV instead of several separate ones, but other than that, the macro approach looks fine.

    Regards,
    Marco
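
The suggestion to keep all input CSV files in one folder and read them with a single reader inside a file loop can be sketched in Python like this. It is only an analogy to Loop Files plus one Read CSV, under the assumption that all files share the same delimiter; `read_all_csv` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def read_all_csv(folder, delimiter=","):
    """Read every *.csv file in folder with one reader; return {file stem: rows}."""
    tables = {}
    # Sorted so the iteration order is stable across platforms.
    for csv_path in sorted(Path(folder).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            tables[csv_path.stem] = list(csv.DictReader(handle, delimiter=delimiter))
    return tables
```

Adding another input then only means dropping a file into the folder, rather than wiring up one more reader.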