De-pivot?

earmijo · February 2017

Imagine that I have an example set (after using clustering) that looks like this:

id,cluster,product1,product2
1,clust0,10.5,20.3
2,clust1,15.75,30.12
3,clust0,5.25,27.2
4,clust1,12.75,8.25
5,clust0,18.05,22.3
6,clust1,1.25,3.75

And I need it in the following format:

id,cluster,value,type
1,clust0,10.5,product1
2,clust1,15.75,product1
3,clust0,5.25,product1
4,clust1,12.75,product1
5,clust0,18.05,product1
6,clust1,1.25,product1
1,clust0,20.3,product2
2,clust1,30.12,product2
3,clust0,27.2,product2
4,clust1,8.25,product2
5,clust0,22.3,product2
6,clust1,3.75,product2

How could I do it? If I didn't have id and cluster I could easily use de-pivot. But I don't know how to do it in the presence of the other two attributes.

YYH · February 2017

Hi @earmijo,

You do not have to get rid of id or cluster columns to have your table de-pivoted,

check out this process:

<?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="7.4.001" expanded="true" height="68" name="Create Document" width="90" x="112" y="34">
        <parameter key="text" value="id,cluster,product1,product2&#10;1,clust0,10.5,20.3&#10;2,clust1,15.75,30.12&#10;3,clust0,5.25,27.2&#10;4,clust1,12.75,8.25&#10;5,clust0,18.05,22.3&#10;6,clust1,1.25,3.75"/>
      </operator>
      <operator activated="true" class="text:write_document" compatibility="7.4.001" expanded="true" height="82" name="Write Document" width="90" x="246" y="34"/>
      <operator activated="true" class="read_csv" compatibility="7.3.001" expanded="true" height="68" name="Read CSV" width="90" x="380" y="136">
        <parameter key="column_separators" value=","/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="true" class="de_pivot" compatibility="7.3.001" expanded="true" height="82" name="De-Pivot" width="90" x="514" y="136">
        <list key="attribute_name">
          <parameter key="value" value="product.*"/>
        </list>
        <parameter key="index_attribute" value="type"/>
        <parameter key="create_nominal_index" value="true"/>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Write Document" to_port="document"/>
      <connect from_op="Write Document" from_port="file" to_op="Read CSV" to_port="file"/>
      <connect from_op="Read CSV" from_port="output" to_op="De-Pivot" to_port="example set input"/>
      <connect from_op="De-Pivot" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Happy RapidMining!

YY

Andrew · February 2017

For fun, I made a process that does what you want but doesn't use de-pivot. It uses the Loop Attributes operator to iterate over each regular attribute to be selected which allows an example set to be created containing the single regular attribute and all other special ones. Some renaming and generation is then done so that the output from the loop operation is a collection that can be appended together to yield the answer.

Andrew

YYH · February 2017

Hi @earmijo,

You do not have to get rid of id or cluster columns to have your table de-pivoted,

check out this process:

<?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="7.4.001" expanded="true" height="68" name="Create Document" width="90" x="112" y="34">
        <parameter key="text" value="id,cluster,product1,product2&#10;1,clust0,10.5,20.3&#10;2,clust1,15.75,30.12&#10;3,clust0,5.25,27.2&#10;4,clust1,12.75,8.25&#10;5,clust0,18.05,22.3&#10;6,clust1,1.25,3.75"/>
      </operator>
      <operator activated="true" class="text:write_document" compatibility="7.4.001" expanded="true" height="82" name="Write Document" width="90" x="246" y="34"/>
      <operator activated="true" class="read_csv" compatibility="7.3.001" expanded="true" height="68" name="Read CSV" width="90" x="380" y="136">
        <parameter key="column_separators" value=","/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="true" class="de_pivot" compatibility="7.3.001" expanded="true" height="82" name="De-Pivot" width="90" x="514" y="136">
        <list key="attribute_name">
          <parameter key="value" value="product.*"/>
        </list>
        <parameter key="index_attribute" value="type"/>
        <parameter key="create_nominal_index" value="true"/>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Write Document" to_port="document"/>
      <connect from_op="Write Document" from_port="file" to_op="Read CSV" to_port="file"/>
      <connect from_op="Read CSV" from_port="output" to_op="De-Pivot" to_port="example set input"/>
      <connect from_op="De-Pivot" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Happy RapidMining!

YY

earmijo · February 2017

Thank you very much Andrew for taking the time to play with my problem. It works perfectly.

earmijo · February 2017

Thank you very much Andrew. It works perfectly.

earmijo · February 2017

Thank you very much yyhuang. It works perfectly in this example.

How would I generalize it when the columns instead of being named "product1, product2, product 3..." were named "supermarket,gasoline,hotels, ...."? In the first case I can use (as you did) "product.*". I tried ".*" in my case and it doesn't work.

YYH · February 2017

Very good question @earmijo ! Actually you still can use de-pivot for the columns that are not easily selecte by regex. You just need extra step for table join.

check out this:

<?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="7.4.001" expanded="true" height="68" name="Create Document" width="90" x="112" y="34">
        <parameter key="text" value="customer_id,cluster,gas,grocery&#10;11,clust0,10.5,20.3&#10;22,clust1,15.75,30.12&#10;33,clust0,5.25,27.2&#10;44,clust1,12.75,8.25&#10;55,clust0,18.05,22.3&#10;66,clust1,1.25,3.75"/>
      </operator>
      <operator activated="true" class="text:write_document" compatibility="7.4.001" expanded="true" height="82" name="Write Document" width="90" x="246" y="34"/>
      <operator activated="true" class="read_csv" compatibility="7.3.001" expanded="true" height="68" name="Read CSV" width="90" x="380" y="136">
        <parameter key="column_separators" value=","/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="7.3.001" expanded="true" height="82" name="Generate ID" width="90" x="514" y="136"/>
      <operator activated="true" class="select_attributes" compatibility="7.3.001" expanded="true" height="82" name="Select Attributes" width="90" x="648" y="136">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="cluster|customer_id"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.3.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="782" y="187">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="cluster|customer_id|id"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="de_pivot" compatibility="7.3.001" expanded="true" height="82" name="De-Pivot" width="90" x="782" y="34">
        <list key="attribute_name">
          <parameter key="value" value="(?!^id$)(^.*$)"/>
        </list>
        <parameter key="index_attribute" value="type"/>
        <parameter key="create_nominal_index" value="true"/>
      </operator>
      <operator activated="true" class="join" compatibility="7.3.001" expanded="true" height="82" name="Join" width="90" x="983" y="136">
        <parameter key="join_type" value="left"/>
        <parameter key="use_id_attribute_as_key" value="false"/>
        <list key="key_attributes">
          <parameter key="id" value="id"/>
        </list>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Write Document" to_port="document"/>
      <connect from_op="Write Document" from_port="file" to_op="Read CSV" to_port="file"/>
      <connect from_op="Read CSV" from_port="output" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="De-Pivot" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="original" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Join" to_port="right"/>
      <connect from_op="De-Pivot" from_port="example set output" to_op="Join" to_port="left"/>
      <connect from_op="Join" from_port="join" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

De-pivot?

Welcome!

Best Answer

Answers

Welcome!

Welcome!

Quick Links

Categories