Separate dataset attributes

binsetyawan
binsetyawan New Altair Community Member
edited November 2024 in Community Q&A

Hello everyone,

For example, I have a dataset with attributes like this:

att1 att2 att3 attx atty attz

and I want to separate the dataset into example sets with attributes like this:

att1 attx atty attz

att2 attx atty attz

att3 attx atty attz

I've tried Loop Attributes and got stuck on it.
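To make the goal concrete, here is a minimal plain-Python sketch of the requested split. It is not RapidMiner code. It assumes each example set is a list of row dictionaries, and `split_by_attribute` is a hypothetical helper. Every attribute whose name fully matches `att[0-9]` (the pattern the accepted answer uses) gets its own subset, and each subset also keeps all the shared attributes.

```python
import re


def split_by_attribute(rows, varying_pattern=r"att[0-9]"):
    """Return one subset per attribute whose name fully matches varying_pattern.

    Each subset keeps that single attribute plus every attribute that does
    not match the pattern (the shared attx, atty, attz in this example).
    Illustrative only: rows are plain dicts, not RapidMiner example sets.
    """
    names = list(rows[0].keys()) if rows else []
    varying = [n for n in names if re.fullmatch(varying_pattern, n)]
    shared = [n for n in names if n not in varying]
    subsets = {}
    for name in varying:
        keep = [name] + shared
        subsets[name] = [{k: row[k] for k in keep} for row in rows]
    return subsets


data = [{"att1": 100, "att2": 200, "att3": 300, "attx": 400, "atty": 500, "attz": 600}]
for name, subset in split_by_attribute(data).items():
    print(name, list(subset[0].keys()))
# att1 ['att1', 'attx', 'atty', 'attz']
# att2 ['att2', 'attx', 'atty', 'attz']
# att3 ['att3', 'attx', 'atty', 'attz']
```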


Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Did you check out this thread? It looks similar to what you want to do, and there is a sample process: http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Combining-values-from-common-examples-into-new-attributes/m-p/38375#U38375

  • binsetyawan
    binsetyawan New Altair Community Member

    Ah, I'm sorry, but I think it's different: that process converts three rows of data into one row, while what I need is to convert one dataset into three example sets. I tried Loop Attributes with the subset attx, atty, attz, but I'm confused about how to refer to each attribute (att1, att2, att3).

  • Thomas_Ott
    Thomas_Ott New Altair Community Member
    Answer ✓

    Ah, then I think you want something like this?

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data_user_specification" compatibility="7.5.000" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="34">
    <list key="attribute_values">
    <parameter key="att1" value="100"/>
    <parameter key="att2" value="200"/>
    <parameter key="att3" value="300"/>
    <parameter key="attx" value="400"/>
    <parameter key="atty" value="500"/>
    <parameter key="attz" value="600"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="7.5.000" expanded="true" height="103" name="Multiply" width="90" x="179" y="136"/>
    <operator activated="true" class="select_attributes" compatibility="7.5.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="313" y="187">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value="att[0-9]"/>
    <parameter key="invert_selection" value="true"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.5.000" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value="att[0-9]"/>
    </operator>
    <operator activated="true" class="extract_macro" compatibility="7.5.000" expanded="true" height="68" name="Extract Macro" width="90" x="447" y="34">
    <parameter key="macro" value="num_att"/>
    <parameter key="macro_type" value="number_of_attributes"/>
    <list key="additional_macros"/>
    </operator>
    <operator activated="true" class="concurrency:loop_attributes" compatibility="7.5.000" expanded="true" height="103" name="Loop Attributes" width="90" x="782" y="34">
    <process expanded="true">
    <operator activated="true" class="select_attributes" compatibility="7.5.000" expanded="true" height="82" name="Select Attributes (3)" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="%{loop_attribute}"/>
    </operator>
    <operator activated="true" class="cartesian_product" compatibility="7.5.000" expanded="true" height="82" name="Cartesian" width="90" x="313" y="187"/>
    <connect from_port="input 1" to_op="Select Attributes (3)" to_port="example set input"/>
    <connect from_port="input 2" to_op="Cartesian" to_port="right"/>
    <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Cartesian" to_port="left"/>
    <connect from_op="Cartesian" from_port="join" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="source_input 3" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Generate Data by User Specification" from_port="output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Loop Attributes" to_port="input 2"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
    <connect from_op="Extract Macro" from_port="example set" to_op="Loop Attributes" to_port="input 1"/>
    <connect from_op="Loop Attributes" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
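The process above can be followed step by step with a rough plain-Python analogue. It assumes rows are dictionaries and that the regular-expression filter must match the whole attribute name. `select_attributes` stands in for the two Select Attributes operators (normal and inverted). The loop stands in for Loop Attributes iterating `%{loop_attribute}`. `cartesian_product` pairs every left row with every right row, as Cartesian Product does. All function names are hypothetical, and this is not RapidMiner's API.

```python
import re
from itertools import product


def select_attributes(rows, pattern, invert=False):
    """Keep attributes whose names fully match pattern (or those that don't, if invert)."""
    return [{k: v for k, v in row.items() if bool(re.fullmatch(pattern, k)) != invert}
            for row in rows]


def cartesian_product(left, right):
    """Combine every left row with every right row: len(left) * len(right) rows."""
    return [{**l, **r} for l, r in product(left, right)]


def loop_attributes(varying_rows, shared_rows):
    """Produce one output per attribute of varying_rows, like iterating %{loop_attribute}."""
    names = list(varying_rows[0]) if varying_rows else []
    results = []
    for loop_attribute in names:
        # Equivalent of Select Attributes (3) with filter type "single".
        single = select_attributes(varying_rows, re.escape(loop_attribute))
        results.append(cartesian_product(single, shared_rows))
    return results


data = [{"att1": 100, "att2": 200, "att3": 300, "attx": 400, "atty": 500, "attz": 600}]
varying = select_attributes(data, r"att[0-9]")             # like "Select Attributes"
shared = select_attributes(data, r"att[0-9]", invert=True)  # like "Select Attributes (2)"
for result in loop_attributes(varying, shared):
    print(result)
# [{'att1': 100, 'attx': 400, 'atty': 500, 'attz': 600}]
# [{'att2': 200, 'attx': 400, 'atty': 500, 'attz': 600}]
# [{'att3': 300, 'attx': 400, 'atty': 500, 'attz': 600}]
```

With the single generated row this gives exactly one row per subset. With more rows, the Cartesian step multiplies the row count, which is the behavior reported later in the thread.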
  • binsetyawan
    binsetyawan New Altair Community Member

    Wow, great! Thank you so much, I'll try it with my data.

  • Telcontar120
    Telcontar120 New Altair Community Member

    And here's just a small addition to the @Thomas_Ott process to rename the first attribute to a generic name and append them all back together again into one dataset.
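The attached process is not reproduced here, so the idea can only be sketched. The following plain-Python version assumes rows are dictionaries. It uses a hypothetical generic name `value` for the renamed first attribute and hypothetical helpers `rename_first_attribute` and `append`. Each subset's first attribute (att1, att2 or att3) gets the same name, and then the subsets are stacked row-wise.

```python
def rename_first_attribute(rows, new_name="value"):
    """Rename the first attribute of every row to new_name, keeping the rest in order."""
    renamed = []
    for row in rows:
        items = list(row.items())
        _, first_value = items[0]
        renamed.append({new_name: first_value, **dict(items[1:])})
    return renamed


def append(*example_sets):
    """Stack example sets row-wise; in this sketch every row must have the same attribute names."""
    combined = []
    for rows in example_sets:
        for row in rows:
            if combined and set(row) != set(combined[0]):
                raise ValueError(f"attribute names differ: {sorted(row)} vs {sorted(combined[0])}")
            combined.append(row)
    return combined


# The three subsets produced by the loop, using the generated values from the thread.
subsets = [
    [{"att1": 100, "attx": 400, "atty": 500, "attz": 600}],
    [{"att2": 200, "attx": 400, "atty": 500, "attz": 600}],
    [{"att3": 300, "attx": 400, "atty": 500, "attz": 600}],
]
combined = append(*(rename_first_attribute(s) for s in subsets))
for row in combined:
    print(row)
# {'value': 100, 'attx': 400, 'atty': 500, 'attz': 600}
# {'value': 200, 'attx': 400, 'atty': 500, 'attz': 600}
# {'value': 300, 'attx': 400, 'atty': 500, 'attz': 600}
```

Without the rename step, this sketch's `append` rejects the subsets, because their first attributes have different names. That is why a generic name is needed before stacking them.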

  • binsetyawan
    binsetyawan New Altair Community Member

    When I used Cartesian Product to join the two datasets, it produced duplicate values taken from other rows, so I replaced Cartesian Product with the Join operator, and the result is what I expected. Thank you so much, sir!
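The difference between the two operators can be shown with two rows. This plain-Python sketch assumes each example carries an `id` attribute. The thread does not say which join type or key was used, so an inner join on `id` is an assumption here, and `cartesian_pairs` and `inner_join` are hypothetical helpers. The Cartesian product pairs every row with every other row, so some output rows mix values from different examples. The join only pairs rows with matching ids.

```python
from itertools import product


def cartesian_pairs(left, right):
    """Cartesian product: every left row paired with every right row."""
    return list(product(left, right))


def inner_join(left, right, key="id"):
    """Inner join: merge a left and a right row only when their key values match."""
    by_key = {}
    for row in right:
        by_key.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in by_key.get(l[key], [])]


left = [{"id": 1, "att1": 100}, {"id": 2, "att1": 110}]
right = [{"id": 1, "attx": 400}, {"id": 2, "attx": 410}]

print([(l["id"], r["id"]) for l, r in cartesian_pairs(left, right)])
# [(1, 1), (1, 2), (2, 1), (2, 2)]  <- (1, 2) and (2, 1) mix values from different rows
print(inner_join(left, right))
# [{'id': 1, 'att1': 100, 'attx': 400}, {'id': 2, 'att1': 110, 'attx': 410}]
```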

  • binsetyawan
    binsetyawan New Altair Community Member

    Can I ask one more question? After the core dataset is divided into several example sets, I run a forecast with the Windowing and ANN (artificial neural network) operators on each example set, so I get a prediction attribute for each example set. I want to join the prediction attributes from all the example sets into one dataset whose number of attributes equals the number of example sets. To do that, I use the Select Attributes operator with the attribute filter type set to regular expression and the regular expression set to label, so that only the prediction attribute appears. At the end, I use Append to join all the example sets. The problem is that the result has only 2 attributes, the id and the prediction: the rows from each example set are placed below the rows of the previous one, instead of each example set adding a new prediction attribute.

    The result is like this:

    example set 1 : id prediction att1 attx

    example set 2 : id prediction att2 atty

    joined : id prediction

                  1 example set 1

                  2 example set 2

    The result that I want:

    example set 1 : id prediction att1 attx

    example set 2 : id prediction att2 atty

    joined : id predictionexample1 predictionexample2
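The desired layout is a join on `id` rather than an append. Append stacks rows, while a join adds the second set's attributes as new columns. In RapidMiner terms, this reading would mean renaming each prediction attribute (for example with Rename) and combining the sets with Join. That mapping is a suggestion, not something confirmed in the thread.

The plain-Python sketch below assumes rows are dictionaries with unique `id` values, and it assumes attribute names `id` and `prediction` as written in the question. The prediction values are made up for illustration, and the helper names are hypothetical.

```python
def keep_and_rename_prediction(rows, new_name, key="id", prediction="prediction"):
    """Keep only the key and prediction attributes, renaming prediction to new_name."""
    return [{key: row[key], new_name: row[prediction]} for row in rows]


def join_on_key(left, right, key="id"):
    """Inner join on key: right's attributes become new columns (assumes unique keys)."""
    by_key = {row[key]: row for row in right}
    return [{**l, **by_key[l[key]]} for l in left if l[key] in by_key]


# Two example sets shaped like the question's; prediction values are invented.
example_sets = [
    [{"id": 1, "prediction": 0.4, "att1": 100, "attx": 400},
     {"id": 2, "prediction": 0.6, "att1": 110, "attx": 410}],
    [{"id": 1, "prediction": 0.7, "att2": 200, "atty": 500},
     {"id": 2, "prediction": 0.9, "att2": 210, "atty": 510}],
]

joined = None
for i, rows in enumerate(example_sets, start=1):
    renamed = keep_and_rename_prediction(rows, f"predictionexample{i}")
    joined = renamed if joined is None else join_on_key(joined, renamed)

for row in joined:
    print(row)
# {'id': 1, 'predictionexample1': 0.4, 'predictionexample2': 0.7}
# {'id': 2, 'predictionexample1': 0.6, 'predictionexample2': 0.9}
```

Renaming before joining matters. If both sets kept the name `prediction`, the second set's column would collide with the first instead of becoming a new attribute.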