De-pivot?

earmijo
earmijo New Altair Community Member
edited November 5 in Community Q&A

Imagine that I have an example set (after using clustering) that looks like this:

 

id,cluster,product1,product2
1,clust0,10.5,20.3
2,clust1,15.75,30.12
3,clust0,5.25,27.2
4,clust1,12.75,8.25
5,clust0,18.05,22.3
6,clust1,1.25,3.75

And I need it in the following format:

id,cluster,value,type
1,clust0,10.5,product1
2,clust1,15.75,product1
3,clust0,5.25,product1
4,clust1,12.75,product1
5,clust0,18.05,product1
6,clust1,1.25,product1
1,clust0,20.3,product2
2,clust1,30.12,product2
3,clust0,27.2,product2
4,clust1,8.25,product2
5,clust0,22.3,product2
6,clust1,3.75,product2

How could I do it? If I didn't have id and cluster I could easily use de-pivot. But I don't know how to do it in the presence of the other two attributes.  

 

 

 

Best Answer

  • YYH
    YYH
    Altair Employee
    Answer ✓

    Hi @earmijo,

    You do not have to get rid of id or cluster columns to have your table de-pivoted,

     

    check out this process:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.4.001" expanded="true" height="68" name="Create Document" width="90" x="112" y="34">
    <parameter key="text" value="id,cluster,product1,product2&#10;1,clust0,10.5,20.3&#10;2,clust1,15.75,30.12&#10;3,clust0,5.25,27.2&#10;4,clust1,12.75,8.25&#10;5,clust0,18.05,22.3&#10;6,clust1,1.25,3.75"/>
    </operator>
    <operator activated="true" class="text:write_document" compatibility="7.4.001" expanded="true" height="82" name="Write Document" width="90" x="246" y="34"/>
    <operator activated="true" class="read_csv" compatibility="7.3.001" expanded="true" height="68" name="Read CSV" width="90" x="380" y="136">
    <parameter key="column_separators" value=","/>
    <list key="annotations"/>
    <list key="data_set_meta_data_information"/>
    </operator>
    <operator activated="true" class="de_pivot" compatibility="7.3.001" expanded="true" height="82" name="De-Pivot" width="90" x="514" y="136">
    <list key="attribute_name">
    <parameter key="value" value="product.*"/>
    </list>
    <parameter key="index_attribute" value="type"/>
    <parameter key="create_nominal_index" value="true"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="Write Document" to_port="document"/>
    <connect from_op="Write Document" from_port="file" to_op="Read CSV" to_port="file"/>
    <connect from_op="Read CSV" from_port="output" to_op="De-Pivot" to_port="example set input"/>
    <connect from_op="De-Pivot" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Happy RapidMining!

    YY

Answers

  • Andrew
    Andrew New Altair Community Member

    For fun, I made a process that does what you want but doesn't use de-pivot.  It uses the Loop Attributes operator to iterate over each regular attribute to be selected which allows an example set to be created containing the single regular attribute and all other special ones. Some renaming and generation is then done so that the output from the loop operation is a collection that can be appended together to yield the answer.

     

    Andrew

  • YYH
    YYH
    Altair Employee
    Answer ✓

    Hi @earmijo,

    You do not have to get rid of id or cluster columns to have your table de-pivoted,

     

    check out this process:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.4.001" expanded="true" height="68" name="Create Document" width="90" x="112" y="34">
    <parameter key="text" value="id,cluster,product1,product2&#10;1,clust0,10.5,20.3&#10;2,clust1,15.75,30.12&#10;3,clust0,5.25,27.2&#10;4,clust1,12.75,8.25&#10;5,clust0,18.05,22.3&#10;6,clust1,1.25,3.75"/>
    </operator>
    <operator activated="true" class="text:write_document" compatibility="7.4.001" expanded="true" height="82" name="Write Document" width="90" x="246" y="34"/>
    <operator activated="true" class="read_csv" compatibility="7.3.001" expanded="true" height="68" name="Read CSV" width="90" x="380" y="136">
    <parameter key="column_separators" value=","/>
    <list key="annotations"/>
    <list key="data_set_meta_data_information"/>
    </operator>
    <operator activated="true" class="de_pivot" compatibility="7.3.001" expanded="true" height="82" name="De-Pivot" width="90" x="514" y="136">
    <list key="attribute_name">
    <parameter key="value" value="product.*"/>
    </list>
    <parameter key="index_attribute" value="type"/>
    <parameter key="create_nominal_index" value="true"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="Write Document" to_port="document"/>
    <connect from_op="Write Document" from_port="file" to_op="Read CSV" to_port="file"/>
    <connect from_op="Read CSV" from_port="output" to_op="De-Pivot" to_port="example set input"/>
    <connect from_op="De-Pivot" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Happy RapidMining!

    YY

  • earmijo
    earmijo New Altair Community Member

    Thank you very much Andrew for taking the time to play with my problem. It works perfectly.

  • earmijo
    earmijo New Altair Community Member

    Thank you very much Andrew. It works perfectly. 

  • earmijo
    earmijo New Altair Community Member

    Thank you very much yyhuang. It works perfectly in this example. 

    How would I generalize it when the columns instead of being named "product1, product2, product 3..." were named "supermarket,gasoline,hotels, ...."?  In the first case I can use (as you did) "product.*". I tried ".*" in my case and it doesn't work.

     
  • YYH
    YYH
    Altair Employee

    Very good question @earmijo ! Actually you still can use de-pivot for the columns that are not easily selecte by regex. You just need extra step for table join.

    check out this:

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.4.001" expanded="true" height="68" name="Create Document" width="90" x="112" y="34">
    <parameter key="text" value="customer_id,cluster,gas,grocery&#10;11,clust0,10.5,20.3&#10;22,clust1,15.75,30.12&#10;33,clust0,5.25,27.2&#10;44,clust1,12.75,8.25&#10;55,clust0,18.05,22.3&#10;66,clust1,1.25,3.75"/>
    </operator>
    <operator activated="true" class="text:write_document" compatibility="7.4.001" expanded="true" height="82" name="Write Document" width="90" x="246" y="34"/>
    <operator activated="true" class="read_csv" compatibility="7.3.001" expanded="true" height="68" name="Read CSV" width="90" x="380" y="136">
    <parameter key="column_separators" value=","/>
    <list key="annotations"/>
    <list key="data_set_meta_data_information"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="7.3.001" expanded="true" height="82" name="Generate ID" width="90" x="514" y="136"/>
    <operator activated="true" class="select_attributes" compatibility="7.3.001" expanded="true" height="82" name="Select Attributes" width="90" x="648" y="136">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="cluster|customer_id"/>
    <parameter key="invert_selection" value="true"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.3.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="782" y="187">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="cluster|customer_id|id"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="de_pivot" compatibility="7.3.001" expanded="true" height="82" name="De-Pivot" width="90" x="782" y="34">
    <list key="attribute_name">
    <parameter key="value" value="(?!^id$)(^.*$)"/>
    </list>
    <parameter key="index_attribute" value="type"/>
    <parameter key="create_nominal_index" value="true"/>
    </operator>
    <operator activated="true" class="join" compatibility="7.3.001" expanded="true" height="82" name="Join" width="90" x="983" y="136">
    <parameter key="join_type" value="left"/>
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="id" value="id"/>
    </list>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="Write Document" to_port="document"/>
    <connect from_op="Write Document" from_port="file" to_op="Read CSV" to_port="file"/>
    <connect from_op="Read CSV" from_port="output" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="De-Pivot" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="original" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="De-Pivot" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Join" from_port="join" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>