How can I get something like a melt function in RapidMiner?

User: "smmsamm"
New Altair Community Member
Updated by Jocelyn

I am a beginner in data mining.

I have a list of 10000 rows and about 200 columns like this:

 

look,1,2,3,4,5,6,7,8

book,4,5,6,7,8,102,104,107

look,6,7,8,9

hook,100,101,102

cook,7,8,9

build,102,103,104,107

hook,103,104,105

...

 

First, I need to build a list with one row per unique word, merging the values of duplicate words:

look,1,2,3,4,5,6,7,8,9

book,4,5,6,7,8,102,104,107

hook,100,101,102,103,104,105

cook,7,8,9

build,102,103,104,107
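For reference, here is a minimal Python sketch of this merge step outside RapidMiner. It assumes each input line is a word followed by comma-separated integers; the names `raw_rows` and `merge_rows` are made up for illustration.

```python
# Sample rows copied from the question; real data would come from the CSV file.
raw_rows = [
    "look,1,2,3,4,5,6,7,8",
    "book,4,5,6,7,8,102,104,107",
    "look,6,7,8,9",
    "hook,100,101,102",
    "cook,7,8,9",
    "build,102,103,104,107",
    "hook,103,104,105",
]

def merge_rows(lines):
    """Collect all values seen for each word into one set per word."""
    merged = {}
    for line in lines:
        word, *values = [field.strip() for field in line.split(",")]
        # Skip empty fields so short rows in a wide CSV do not break int().
        merged.setdefault(word, set()).update(int(v) for v in values if v)
    return merged

unique = merge_rows(raw_rows)
for word, values in unique.items():
    print(word + "," + ",".join(str(v) for v in sorted(values)))
```

Because a plain dict keeps insertion order, the words come out in the order they first appear in the input.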

 

Next, I need to find the rows that share at least 3 (or n) values and generate a new list:

 

look,1,2,3,4,5,6,7,8,9

book,4,5,6,7,8,102,104,107

cook,7,8,9

*************

book,4,5,6,7,8,102,104,107

build,102,103,104,107

*************

hook,100,101,102,103,104,105

build,102,103,104,107

*************
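One possible reading of the grouping above, sketched in Python: take each word in order and list the later words that share at least n values with it. This rule is an assumption inferred from the sample output, and `overlap_groups` is a made-up helper name.

```python
# Merged word -> values mapping from the unique list above.
word_sets = {
    "look": {1, 2, 3, 4, 5, 6, 7, 8, 9},
    "book": {4, 5, 6, 7, 8, 102, 104, 107},
    "hook": {100, 101, 102, 103, 104, 105},
    "cook": {7, 8, 9},
    "build": {102, 103, 104, 107},
}

def overlap_groups(sets_by_word, n=3):
    """For each word, group it with the later words sharing >= n values."""
    words = list(sets_by_word)
    groups = []
    for i, anchor in enumerate(words):
        partners = [
            other
            for other in words[i + 1:]
            if len(sets_by_word[anchor] & sets_by_word[other]) >= n
        ]
        if partners:  # words with no qualifying partner produce no group
            groups.append([anchor] + partners)
    return groups

groups = overlap_groups(word_sets, n=3)
for group in groups:
    for word in group:
        print(word + "," + ",".join(str(v) for v in sorted(word_sets[word])))
    print("*************")
```

Note that members of a group only have to overlap with the first word, not with each other: book and cook share just 7 and 8, yet both appear in the look group.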

 

Please help me in any way you can.

Thank you.

    User: "Thomas_Ott"
    New Altair Community Member
    What is melting function?
    User: "smmsamm"
    New Altair Community Member
    OP

    I searched the internet and someone said Python's melt can help me, but I don't know how to do this in RapidMiner!

    Hi,

    from the pandas doc for melt:

    “Unpivots” a DataFrame from wide format to long format, optionally leaving identifier variables set.

    I guess it maps to something along the lines of De-Pivot.  
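To make the wide-to-long idea concrete, here is a small pandas sketch. The column names `att1` to `att5`, `bar`, and `foo` are placeholders chosen for illustration, and the three rows are taken from the question.

```python
import pandas as pd

# Wide format: one row per word, one column per value slot (None pads short rows).
wide = pd.DataFrame(
    [
        ["look", 6, 7, 8, 9],
        ["hook", 100, 101, 102, None],
        ["cook", 7, 8, 9, None],
    ],
    columns=["att1", "att2", "att3", "att4", "att5"],
)

# Long format: one row per (word, value) pair.
long_df = (
    wide.melt(id_vars=["att1"], var_name="bar", value_name="foo")
    .dropna(subset=["foo"])        # drop the padding cells
    .astype({"foo": int})          # padding turned the values into floats
    .drop(columns="bar")           # the original column name is not needed
    .sort_values(["att1", "foo"])
    .reset_index(drop=True)
)
print(long_df)
```

Each word now appears once per value it carries, which is roughly the shape a de-pivot step is expected to produce.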

    Best,

    Martin

    User: "Thomas_Ott"
    New Altair Community Member
    I guess I learned something new today!
    User: "sgenzer"
    Altair Employee

    so that's a fun puzzle.  I would begin like this (you will need @land's Statistics Extension to run this process):

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve smmsamm" width="90" x="45" y="85">
    <parameter key="repository_entry" value="smmsamm"/>
    </operator>
    <operator activated="true" class="de_pivot" compatibility="7.6.001" expanded="true" height="82" name="De-Pivot" width="90" x="179" y="85">
    <list key="attribute_name">
    <parameter key="foo" value="att[2-9]"/>
    </list>
    <parameter key="index_attribute" value="bar"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="bar"/>
    <parameter key="invert_selection" value="true"/>
    </operator>
    <operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="447" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="foo"/>
    </operator>
    <operator activated="true" class="rmx_stat:cross_table" compatibility="1.3.000" expanded="true" height="82" name="Extract Cross Table" width="90" x="581" y="85">
    <parameter key="group_attribute_a" value="att1"/>
    <parameter key="group_attribute_b" value="foo"/>
    </operator>
    <connect from_op="Retrieve smmsamm" from_port="output" to_op="De-Pivot" to_port="example set input"/>
    <connect from_op="De-Pivot" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Numerical to Polynominal" to_port="example set input"/>
    <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Extract Cross Table" to_port="example set input"/>
    <connect from_op="Extract Cross Table" from_port="cross table output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    That said I am certain there is a cleverer way to do this!
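For comparison, a rough pandas analogue of the cross table step, plus one possible next step that is not part of the process above: multiplying the table by its transpose to count shared values per pair of words. This is a sketch under assumptions. The (word, value) pairs are a small subset of the sample data, and `att1` and `foo` mirror the attribute names used in the process.

```python
import pandas as pd

# Long-format (word, value) pairs, as they would look after the de-pivot step.
pairs = [
    ("book", 102), ("book", 104), ("book", 107),
    ("build", 102), ("build", 103), ("build", 104), ("build", 107),
    ("hook", 102), ("hook", 103), ("hook", 104),
]
long_df = pd.DataFrame(pairs, columns=["att1", "foo"])

# Cross table: one row per word, one column per value, cells hold counts.
cross = pd.crosstab(long_df["att1"], long_df["foo"])

# With 0/1 cells, (cross x cross transposed) counts the values two words share.
shared = cross.dot(cross.T)
print(cross)
print(shared)
```

The off-diagonal cells of `shared` hold the overlap sizes, so filtering them for values of 3 or more would give the qualifying pairs. This only holds if each (word, value) pair occurs once; duplicates would inflate the counts.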


    Scott

     

    User: "smmsamm"
    New Altair Community Member
    OP

    I updated my RapidMiner and installed the Statistics extension:

    !error0.jpg

    but I get an error:

    !error1.jpg
    and I cannot find the missing extension:

    !error2.jpg

    Would you please help again?

    Thank you

    User: "sgenzer"
    Altair Employee

    hmm I'm not sure the extension in the marketplace is up-to-date (Sebastian?).  I would go directly to the website: https://oldworldcomputing.com/products/statistics-extension-for-rapidminer

     

    Scott

    User: "smmsamm"
    New Altair Community Member
    OP

    This is my CSV file.
    Would you please test with it?

    User: "sgenzer"
    Altair Employee

    so the process I posted was not intended to be a finished product - just something to get you in the right direction.  :)  If you take that csv file and put it in my process, you get the attached result.

     

    Scott

    User: "smmsamm"
    New Altair Community Member
    OP

    Oh, thank you sir, you are the master.
    But these were sample data for testing.
    My real data has about 100000 different values. With this method, will I have about 100000 columns?
    Is it possible to convert the list to the list I want?

     

    look,1,2,3,4,5,6,7,8,9

    book,4,5,6,7,8,102,104,107

    cook,7,8,9

    *************

    book,4,5,6,7,8,102,104,107

    build,102,103,104,107

    *************

    hook,100,101,102,103,104,105

    build,102,103,104,107

    *************
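On the concern about 100000 columns: outside RapidMiner, one way to avoid a wide table altogether is an inverted index that maps each value to the words containing it, then counts shared values per pair of words. The following is a plain Python sketch under that assumption; `shared_counts` is a made-up helper, not a RapidMiner operator.

```python
from collections import defaultdict
from itertools import combinations

word_sets = {
    "look": {1, 2, 3, 4, 5, 6, 7, 8, 9},
    "book": {4, 5, 6, 7, 8, 102, 104, 107},
    "hook": {100, 101, 102, 103, 104, 105},
    "cook": {7, 8, 9},
    "build": {102, 103, 104, 107},
}

def shared_counts(sets_by_word):
    """Count shared values per word pair without building a wide matrix."""
    # Inverted index: value -> words that contain it, in input order.
    index = defaultdict(list)
    for word, values in sets_by_word.items():
        for value in values:
            index[value].append(word)
    # Every value shared by two words adds 1 to that pair's count.
    counts = defaultdict(int)
    for words in index.values():
        for first, second in combinations(words, 2):
            counts[(first, second)] += 1
    return counts

n = 3
pairs = {pair: c for pair, c in shared_counts(word_sets).items() if c >= n}
for (first, second), count in pairs.items():
    print(first, second, count)
```

Memory then grows with the number of (word, value) pairs rather than with words times distinct values. A value that occurs in very many words still generates many pairs, so very common values may need to be capped or filtered first.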

    User: "smmsamm"
    New Altair Community Member
    OP

     

    !error0.jpg

    I mean, can these columns be converted to rows with the header values?

    User: "sgenzer"
    Altair Employee

    Your flattery is noted and not deserved.  There are many here who are far more masterful than I.  That said, I think at this point I would recommend getting more knowledgeable with RapidMiner Studio before moving forward with large data sets like the one you describe - actions such as renaming attributes and so forth are the beginning of a long journey.  I would highly recommend starting with the "Getting Started with RapidMiner" YouTube playlist.  The whole beauty of RapidMiner is that you can learn to create your own processes and be a master yourself!

     

    Scott

    User: "land"
    New Altair Community Member

    Hi all,

    I just published the most recent version of our extensions on the marketplace. So if that was the problem, it should be gone now. At least I can use it with the most recent version of RM.

     

    Greetings,

     Sebastian