How can I get a melt function in RapidMiner?

smmsamm
smmsamm New Altair Community Member
edited November 2024 in Community Q&A

I am a beginner in data mining.

I have a list of 10,000 rows and about 200 columns, like this:

 

look,1,2,3,4,5,6,7,8

book,4,5,6,7,8,102,104,107

look,6,7,8,9

hook,100,101,102

cook,7,8,9

build,102,103,104,107

hook,103,104,105

...

 

First, I need to build a list with one row per unique word, merging the values of rows that share the same word:

look,1,2,3,4,5,6,7,8,9

book,4,5,6,7,8,102,104,107

hook,100,101,102,103,104,105

cook,7,8,9

build,102,103,104,107
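
For reference, the merge step described above can be sketched outside RapidMiner in plain Python. This is only an illustration under assumptions: the helper name `merge_rows` is hypothetical, and the rows are assumed to be comma-separated text with the word first, as in the sample.

```python
def merge_rows(lines):
    """Merge rows that start with the same word into one sorted list of unique values."""
    merged = {}
    for line in lines:
        word, *values = line.strip().split(",")
        # Collect values per word in a set so duplicates collapse automatically.
        merged.setdefault(word, set()).update(int(v) for v in values if v)
    # Dicts keep insertion order, so words stay in order of first appearance.
    return {word: sorted(vals) for word, vals in merged.items()}


rows = [
    "look,1,2,3,4,5,6,7,8",
    "book,4,5,6,7,8,102,104,107",
    "look,6,7,8,9",
    "hook,100,101,102",
    "cook,7,8,9",
    "build,102,103,104,107",
    "hook,103,104,105",
]

unique = merge_rows(rows)
for word, values in unique.items():
    print(word + "," + ",".join(map(str, values)))
```

Run on the sample rows, this prints the five merged rows in the same order as the list above.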

 

Next, I need to find rows that share at least 3 (or n) values and generate a new list:

 

look,1,2,3,4,5,6,7,8,9

book,4,5,6,7,8,102,104,107

cook,7,8,9

*************

book,4,5,6,7,8,102,104,107

build,102,103,104,107

*************

hook,100,101,102,103,104,105

build,102,103,104,107

*************
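
The grouping shown above can be read as: for each word, list every later word that shares at least n values with it. A minimal Python sketch of that reading follows; the function name `similar_groups` is hypothetical, and the interpretation of "similar" as "shared values" is an assumption based on the sample output.

```python
def similar_groups(table, n=3):
    """For each word, collect the later words that share at least n values with it."""
    words = list(table)
    sets = {word: set(values) for word, values in table.items()}
    groups = []
    for i, anchor in enumerate(words):
        # Only look at later words so each pair is compared once.
        matches = [
            other for other in words[i + 1:]
            if len(sets[anchor] & sets[other]) >= n
        ]
        if matches:
            groups.append([anchor] + matches)
    return groups


table = {
    "look": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "book": [4, 5, 6, 7, 8, 102, 104, 107],
    "hook": [100, 101, 102, 103, 104, 105],
    "cook": [7, 8, 9],
    "build": [102, 103, 104, 107],
}

groups = similar_groups(table, n=3)
for group in groups:
    for word in group:
        print(word + "," + ",".join(map(str, table[word])))
    print("*************")
```

This compares every pair of words, so the cost grows with the square of the number of unique words.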

 

Please help in any way you can.

Thank you.

Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member
    What is a melting function?
  • smmsamm
    smmsamm New Altair Community Member

    I searched the internet and someone said that Python's melt can help me, but I don't know how to do the same thing in RapidMiner!

  • MartinLiebig
    MartinLiebig
    Altair Employee

    Hi,

    from the pandas doc for melt:

    “Unpivots” a DataFrame from wide format to long format, optionally leaving identifier variables set.

    I guess it maps to something along the lines of De-Pivot.  

    Best,

    Martin
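
To make the wide-to-long idea concrete, here is a small pure-Python sketch of what an unpivot does. It is not the pandas implementation or the De-Pivot operator: the helper `melt_rows` is hypothetical, the column names `att1` to `att4` are assumed for illustration, and skipping empty cells is a simplification of this sketch.

```python
def melt_rows(rows, id_column, var_name="variable", value_name="value"):
    """Turn each wide row into one long row per non-empty value column."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            # Keep the identifier on every output row; skip empty cells.
            if column == id_column or value is None:
                continue
            long_rows.append(
                {id_column: row[id_column], var_name: column, value_name: value}
            )
    return long_rows


wide = [
    {"att1": "hook", "att2": 100, "att3": 101, "att4": 102},
    {"att1": "cook", "att2": 7, "att3": 8, "att4": None},
]

long_format = melt_rows(wide, id_column="att1")
for row in long_format:
    print(row)
```

Each wide row becomes several (word, column, value) rows, which is the shape that grouping and counting steps usually expect.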

  • Thomas_Ott
    Thomas_Ott New Altair Community Member
    I guess I learned something new today!
  • sgenzer
    sgenzer
    Altair Employee

    so that's a fun puzzle.  I would begin like this (you will need @land's Statistics Extension to run this process):

     

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="7.6.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
        <process expanded="true">
          <!-- Load the example set from the repository -->
          <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve smmsamm" width="90" x="45" y="85">
            <parameter key="repository_entry" value="smmsamm"/>
          </operator>
          <!-- Unpivot att2..att9 into one attribute "foo"; "bar" holds the source index -->
          <operator activated="true" class="de_pivot" compatibility="7.6.001" expanded="true" height="82" name="De-Pivot" width="90" x="179" y="85">
            <list key="attribute_name">
              <parameter key="foo" value="att[2-9]"/>
            </list>
            <parameter key="index_attribute" value="bar"/>
          </operator>
          <!-- Drop the index attribute "bar" (inverted selection) -->
          <operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="85">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="bar"/>
            <parameter key="invert_selection" value="true"/>
          </operator>
          <!-- Treat the numeric values in "foo" as categories -->
          <operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="447" y="85">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="foo"/>
          </operator>
          <!-- Cross-tabulate words (att1) against values (foo); requires the Statistics Extension -->
          <operator activated="true" class="rmx_stat:cross_table" compatibility="1.3.000" expanded="true" height="82" name="Extract Cross Table" width="90" x="581" y="85">
            <parameter key="group_attribute_a" value="att1"/>
            <parameter key="group_attribute_b" value="foo"/>
          </operator>
          <connect from_op="Retrieve smmsamm" from_port="output" to_op="De-Pivot" to_port="example set input"/>
          <connect from_op="De-Pivot" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Numerical to Polynominal" to_port="example set input"/>
          <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Extract Cross Table" to_port="example set input"/>
          <connect from_op="Extract Cross Table" from_port="cross table output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    That said I am certain there is a cleverer way to do this!


    Scott

     

  • smmsamm
    smmsamm New Altair Community Member

    I updated RapidMiner and installed the Statistics Extension:

    !error0.jpg

    but I get an error:

    !error1.jpg
    and I cannot find the missing extension:

    !error2.jpg

    Would you please help again?

    Thank you

  • sgenzer
    sgenzer
    Altair Employee

    hmm I'm not sure the extension in the marketplace is up-to-date (Sebastian?).  I would go directly to the website: https://oldworldcomputing.com/products/statistics-extension-for-rapidminer

     

    Scott

  • smmsamm
    smmsamm New Altair Community Member

    This is my CSV file.
    Would you please test the process with it?

  • sgenzer
    sgenzer
    Altair Employee

    so the process I posted was not intended to be a finished product - just something to get you in the right direction.  :)  If you take that csv file and put it in my process, you get the attached result.

     

    Scott

  • smmsamm
    smmsamm New Altair Community Member

    Oh, thank you, sir. You are the master!
    But that was only sample data for testing.
    My real data has about 100,000 different values. With this method, will I end up with about 100,000 columns?
    Is it possible to convert the list into the list I want?

     

    look,1,2,3,4,5,6,7,8,9

    book,4,5,6,7,8,102,104,107

    cook,7,8,9

    *************

    book,4,5,6,7,8,102,104,107

    build,102,103,104,107

    *************

    hook,100,101,102,103,104,105

    build,102,103,104,107

    *************
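
One way to get this list without a very wide table is to stay in long format and build an inverted index from each value to the words that contain it, then count how often two words meet under the same value. The Python sketch below illustrates that idea only; it is not a RapidMiner process, and the function name `shared_counts` is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def shared_counts(table):
    """Count, for every pair of words, how many values they have in common."""
    # Inverted index: value -> words containing it, in table order.
    index = defaultdict(list)
    for word, values in table.items():
        for value in set(values):
            index[value].append(word)
    counts = Counter()
    for words in index.values():
        # Every pair of words under the same value shares that value.
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


table = {
    "look": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "book": [4, 5, 6, 7, 8, 102, 104, 107],
    "hook": [100, 101, 102, 103, 104, 105],
    "cook": [7, 8, 9],
    "build": [102, 103, 104, 107],
}

counts = shared_counts(table)
pairs = {pair for pair, count in counts.items() if count >= 3}
for first, second in sorted(pairs):
    print(first, second, counts[(first, second)])
```

Memory then depends on the number of (word, value) entries rather than on the number of distinct values, although a value shared by very many words still produces many pairs.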

  • smmsamm
    smmsamm New Altair Community Member

     

    !error0.jpg

    I mean, can these columns be converted to rows, using the header values?

  • sgenzer
    sgenzer
    Altair Employee

    Your flattery is noted and not deserved.  There are many here who are far more masterful than I.  That said, I think at this point I would recommend getting more knowledgeable with RapidMiner Studio before moving forward with large data sets like the one you describe - actions such as renaming attributes and so forth are the beginning of a long journey.  I would highly recommend starting with the "Getting Started with RapidMiner" YouTube playlist.  The whole beauty of RapidMiner is that you can learn to create your own processes and be a master yourself!

     

    Scott

  • land
    land New Altair Community Member

    Hi all,

    I just published the most recent version of our extensions on the marketplace. So if that was the problem, it should be gone now. At least I can use it with the most recent version of RM.

     

    Greetings,

     Sebastian