How to copy rows (samples) based on numerical value of defined attribute?

CausalityvsCorr New Altair Community Member
edited November 5 in Community Q&A

I have a dataset with a few thousand rows and tens of attributes, one of which contains integers between 1 and roughly 100. It can be treated as a kind of sampling weight. I need to copy each row as many times as the value of that attribute says (i.e. from 1 to around a hundred times) and create a new dataset from the result.

I cannot find any operator dedicated to this kind of task, but I am sure it can be done with RapidMiner. But how?

Best Answer

  • Thomas_Ott
    Thomas_Ott New Altair Community Member
    Answer ✓

    @CausalityvsCorr This is pretty simple to do with loops and macros. Something like this?

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data" compatibility="8.1.000" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
    <parameter key="target_function" value="sum classification"/>
    <parameter key="number_of_attributes" value="10"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="8.1.000" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34"/>
    <operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
    <parameter key="attribute_name" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="extract_macro" compatibility="8.1.000" expanded="true" height="68" name="Extract Macro" width="90" x="447" y="34">
    <parameter key="macro" value="num"/>
    <list key="additional_macros"/>
    </operator>
    <operator activated="true" class="concurrency:loop" compatibility="8.1.000" expanded="true" height="82" name="Loop" width="90" x="581" y="34">
    <parameter key="number_of_iterations" value="%{num}"/>
    <process expanded="true">
    <operator activated="true" class="filter_examples" compatibility="8.1.000" expanded="true" height="103" name="Filter Examples" width="90" x="112" y="34">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="id.eq.%{iteration}"/>
    </list>
    </operator>
    <operator activated="true" class="store" compatibility="8.1.000" expanded="true" height="68" name="Store" width="90" x="313" y="34">
    <parameter key="repository_entry" value="//Local Repository/data/%{iteration}_data"/>
    </operator>
    <connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_op="Store" to_port="input"/>
    <connect from_op="Store" from_port="through" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
    <connect from_op="Extract Macro" from_port="example set" to_op="Loop" to_port="input 1"/>
    <connect from_op="Loop" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
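
As a cross-check of the result the question asks for (each row repeated as many times as its weight attribute says), here is a minimal Python sketch. It is not RapidMiner code and not part of the process above; the function name `replicate_rows`, the attribute name `weight`, and the sample rows are all made up for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def replicate_rows(rows: List[Row], weight_key: str) -> List[Row]:
    """Return a new list in which each row appears row[weight_key] times."""
    expanded: List[Row] = []
    for row in rows:
        count = int(row[weight_key])
        if count < 1:
            # The question states the weights are integers from 1 upward.
            raise ValueError(f"weight must be >= 1, got {count}")
        # dict(row) creates an independent copy for every repetition.
        expanded.extend(dict(row) for _ in range(count))
    return expanded


# Hypothetical sample data: three rows with weights 3, 1 and 2.
rows = [
    {"id": 1, "att1": 0.5, "weight": 3},
    {"id": 2, "att1": 1.5, "weight": 1},
    {"id": 3, "att1": 2.5, "weight": 2},
]

expanded = replicate_rows(rows, "weight")
print(len(expanded))                # 6
print([r["id"] for r in expanded])  # [1, 1, 1, 2, 3, 3]
```

The size of the new dataset equals the sum of the weights, which is a quick sanity check for any process that is meant to do the same thing.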

Answers

  • land
    land New Altair Community Member

    Hi,

    depending on why you need the rows copied, you may also be able to avoid copying the data by setting the numerical attribute as the weight. Many algorithms support weighted examples; see their operator capabilities.

    Greetings,

     Sebastian
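
To illustrate why a weight can stand in for physical copies: for a statistic such as a mean, counting a row w times gives the same result as copying it w times. The following is a minimal Python sketch with made-up numbers, not RapidMiner code; whether a particular operator actually honors weights still has to be checked in its capabilities.

```python
import math

# Hypothetical attribute values and their integer weights.
values = [10, 20, 40]
weights = [3, 1, 2]

# Variant 1: physically copy each value `weight` times, then average.
replicated = [v for v, w in zip(values, weights) for _ in range(w)]
mean_of_copies = sum(replicated) / len(replicated)

# Variant 2: keep one row per value and use the weight in the formula.
weighted_mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Both variants compute 130 / 6.
print(mean_of_copies, weighted_mean)
print(math.isclose(mean_of_copies, weighted_mean))
```

The weighted variant never materializes the copies, which matters when the weights sum to many times the original row count.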

  • u1111082
    u1111082 New Altair Community Member
    edited June 2020
    I'm also trying to copy a row with, say, an attribute count value of 10 into 10 identical rows, so that I can then run the FP-Growth operator. I'm not sure whether the solutions above will work in my situation. I'd appreciate any comments.